Two proposed modifications to the PostgreSQL FDW

Started by Chris Travers over 7 years ago · 17 messages
#1Chris Travers
chris.travers@adjust.com

Hi all;

I am looking at trying to make two modifications to the PostgreSQL FDW and
would like feedback on this before I do.

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written, which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert, and therefore nothing would change, but
where needed people could specify COPY, which would stream the data out.
Updates would still be unaffected.
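
To make this concrete, here is a sketch of how the proposed option might be
spelled; the table and server names are hypothetical and the exact syntax is
open to discussion:

-- Hypothetical DDL for the proposed per-table option:
CREATE FOREIGN TABLE measurements (
    id      bigint,
    payload text
) SERVER remote_eu
OPTIONS (insertmethod 'copy');  -- default would be 'insert'

-- Or added to an existing foreign table:
ALTER FOREIGN TABLE measurements OPTIONS (ADD insertmethod 'copy');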

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.
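
For illustration, the statements issued to each participating remote under
this scheme would presumably be the standard two-phase commands; the
transaction identifier format here is hypothetical:

-- In the pre-commit hook, on each remote that was written to:
PREPARE TRANSACTION 'fdw_gxid_42_1';

-- In the commit hook (best effort, never raising an error):
COMMIT PREPARED 'fdw_gxid_42_1';

-- Or, if the global transaction is rolling back:
ROLLBACK PREPARED 'fdw_gxid_42_1';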

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#2Andres Freund
andres@anarazel.de
In reply to: Chris Travers (#1)
Re: Two proposed modifications to the PostgreSQL FDW

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

That has a *lot* of semantics issues, because you suddenly don't get
synchronous error reports anymore. I don't think that's OK on a
per-table basis. If we invented something like this, it IMO should be a
per-statement explicit opt in that'd allow streaming.

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

See https://commitfest.postgresql.org/19/1574/ and the referenced discussions.

Greetings,

Andres Freund

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Travers (#1)
Re: Two proposed modifications to the PostgreSQL FDW

Chris Travers <chris.travers@adjust.com> writes:

I am looking at trying to make two modifications to the PostgreSQL FDW and
would like feedback on this before I do.

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

It seems unlikely to me that an FDW option would be at all convenient
for this. What about selecting it dynamically based on the planner's
estimate of the number of rows to be inserted?

A different thing we could think about is enabling COPY TO/FROM a
foreign table.

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

ENOINFRASTRUCTURE ... and the FDW pieces of that hardly seem like the
place to start.

regards, tom lane

#4Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#2)
Re: Two proposed modifications to the PostgreSQL FDW

Greetings,

* Andres Freund (andres@anarazel.de) wrote:

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

That has a *lot* of semantics issues, because you suddenly don't get
synchronous error reports anymore. I don't think that's OK on a
per-table basis. If we invented something like this, it IMO should be a
per-statement explicit opt in that'd allow streaming.

Doing some kind of decoration on a per-statement level to do something
different for FDWs doesn't really seem very clean..

On reading this, a thought I had was that maybe we should just perform a
COPY to the FDW when COPY is what's been specified by the user (eg:

COPY my_foreign_table FROM STDIN;

), but that wouldn't help when someone wants to bulk copy data from a
local table into a foreign table.

COPY is already non-standard though, so we can extend it, and one option
might be to extend it like so:

COPY my_local_table TO TABLE my_foreign_table;

Which could be made to work for both foreign tables and local ones,
where it'd basically be:

INSERT INTO my_foreign_table SELECT * FROM my_local_table;

The COPY TO case already supports queries, such that you could then do:

COPY (SELECT c1,c2,c3 FROM my_local_table) TO TABLE my_foreign_table;

I'd also think we'd want to support this kind of 'bulk COPY-like'
operation for multiple FDWs (I wonder if maybe file_fdw could be made to
support this new method, thus allowing users to write out to files with
it, which we don't support today at all).

Just some brain-storming and ideas about where this could possibly go.

Thanks!

Stephen

#5Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#4)
Re: Two proposed modifications to the PostgreSQL FDW

Hi,

On 2018-08-20 10:56:39 -0400, Stephen Frost wrote:

* Andres Freund (andres@anarazel.de) wrote:

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

That has a *lot* of semantics issues, because you suddenly don't get
synchronous error reports anymore. I don't think that's OK on a
per-table basis. If we invented something like this, it IMO should be a
per-statement explicit opt in that'd allow streaming.

Doing some kind of decoration on a per-statement level to do something
different for FDWs doesn't really seem very clean..

I think it's required. The semantics of an INSERT statement
*drastically* change if you don't insert remotely. Constraints aren't
evaluated once the command finishes, sequences aren't increased until
later, there'll be weird interactions with savepoints, ... Without
executing immediately on the remote side it basically isn't a normal INSERT
anymore.

Note bulk INSERT and single row INSERT are very different here.

That's not to say it's not useful to pipeline. To the contrary.

On reading this, a thought I had was that maybe we should just perform a
COPY to the FDW when COPY is what's been specified by the user (eg:

COPY my_foreign_table FROM STDIN;

Right. There'd not even need to be an option since that's already
pipelined.

), but that wouldn't help when someone wants to bulk copy data from a
local table into a foreign table.

That possibly still is doable, just with INSERT, as you don't need (may
not even, in plenty of cases) to see the effects of the statement until the
CommandCounterIncrement(). So we can delay the flush a bit.

Greetings,

Andres Freund

#6Chris Travers
chris.travers@adjust.com
In reply to: Andres Freund (#2)
Re: Two proposed modifications to the PostgreSQL FDW

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

That has a *lot* of semantics issues, because you suddenly don't get
synchronous error reports anymore. I don't think that's OK on a
per-table basis. If we invented something like this, it IMO should be a
per-statement explicit opt in that'd allow streaming.

I want to push back a bit and justify the per table decision here.

This is primarily useful when two things are present:
1. There is significant lag between the servers
2. There are significant bulk operations on the table.

Only when both are present does it make sense to use this option. If you
have a low latency connection, don't use it. If you are only writing a few
rows here and there, don't use it. If you have a high latency connection
*and* you are inserting large numbers of rows at a time, then this is where
you run into a problem and the current FDW approach is, frankly, unusable
in these cases. These characteristics follow the server (and hence the
table) in the first case and the table only in the second case. Therefore
the characteristics of the foreign table determine whether it makes sense
to consider this option. Doing this on the statement level poses more
problems, in my view, than it fixes.

Moreover the idea of batching inserts doesn't help either (and actually
would be worse in many cases than the current approach) since the current
approach uses a prepared statement which gets called for each row. Doing
a batch insert might save you on the per-row round trip but it adds a great
deal of complexity to the overall operation.

Now, a per-statement opt-in poses a number of questions. Obviously we'd
have to allow this on local tables too, but it would have no effect. So
suddenly you have a semantic construct which *may* or *may not* pose the
problems you mention and in fact may or may not pose the problem on foreign
tables because whether the construct does anything depends on the FDW
driver. I don't see what exactly one buys in shifting the problem in this
way given that when this is useful it will be used on virtually all inserts
and when it is not useful it won't be used at all.

Now, the entire problem is the expectation of synchronous errors on a per
row basis. This is what causes the problem with inserting a million rows
over a transcontinental link. (insert row, check error, insert row, check
error etc).

So on to the question of containing the problems. I think the best way to
contain this is to have one COPY statement per insert planner node. And
that makes sure that the basic guarantee that a planner node succeeds or
errors is met. For sequences, I would expect that supplying a column list
without the relevant sequenced value should behave in a tolerable way
(otherwise two concurrent copies would have problems).
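
For example (a sketch; the table names are hypothetical), a bulk write that
omits the sequenced column from the column list leaves value assignment to
the remote sequence, so two concurrent streams do not collide:

-- id is a remote serial column; omitting it from the column list
-- lets the remote side assign values from its sequence:
INSERT INTO events_remote (payload, created_at)
SELECT payload, created_at FROM events_staging;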

So that's the primary defense. Now let's look at the main alternatives.

1. You can use a table where you logically replicate inserts, and where
you insert then truncate or delete. This works fine until you have enough
WAL traffic that you cannot tolerate a replication outage.....
2. You can write an extension which allows you to COPY a current table
into a foreign table. The problem here is that SPI has a bunch of
protections to keep you from copying to stdout, so when I helped with this
proof of concept we copied to a temp file in a ramdisk and copied from
there, which is frankly quite a bit worse than this approach.

If you want a per statement opt-in, I guess the questions that come to my
mind are:

1. What would this opt-in look like?
2. How important is it that we avoid breaking current FDW interfaces to
implement it?
3. How would you expect it to behave if directed at a local table or a
foreign table on, say, a MySQL db?

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also.

Yeah. Also the transaction manager code for this I wrote while helping
with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation with the patches you
mention and the disagreement was part of why I thought about going this
direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically and I make no
assumptions about what might happen between prepare and an attempted local
commit.
prepare goes into the precommit hook
commit goes into the commit hook and we never raise errors if it fails
(because you cannot rollback at that point). Instead a warning is raised
and cleanup commences.
rollback goes into the rollback hook and we never raise errors if it
fails (because you are already rolling back).

2. By treating this as a property of a table rather than a property of a
foreign data wrapper or a server, we can better prevent prepared
transactions where they have not been enabled.
This also ensures that we know whether we are guaranteeing two phase
commit or not by looking at the table.

3. By making this opt-in it avoids a lot of problems with regards to
incorrect configuration etc since if the DBA says "use two phase commit"
and failed to enable prepared transactions on the other side...

On to failure modes:
1. It's possible that under high load too many foreign transactions are
prepared and things start rolling back instead of committing. Oh well....
2. In the event that a foreign server goes away between prepare and
commit, we continue to retry via the background worker. The background
worker is very pessimistic and checks every remote system for the named
transaction.
3. Administrator action is required in two specific cases. In the first
case, the controlling server crashes, restarts, etc. along with the cleanup
worker while cleanup is running. It would be possible to patch the backend
to provide a cleanup service of this sort on startup. However this would
require some design decisions to be made now. In the second case, a
transaction has been around for a while and is causing vacuum horizon
issues, so it needs to be manually resolved. Maybe the reason here is beyond
our control, such as another precommit hook stalling in another extension.

Part of the reason for this discussion is to try to get feedback not only
from the current developers but also those (like Tom) who objected the last
time around.

For example, if it is a requirement that we have automatic cleanup in the
event of a server restart, then the backend needs to know what to do about
these. If not, then we leave the whole code in the FDW. If the backend
has to know, we probably need a file header sufficient for the backend to
decide that this is a PostgreSQL-FDW global transaction log vs maybe one
that could be added for the Oracle FDW later.... Maybe mentioning the .so
and symbol for the background worker entry point?

See https://commitfest.postgresql.org/19/1574/ and the referenced
discussions.

Greetings,

Andres Freund

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#7Chris Travers
chris.travers@adjust.com
In reply to: Chris Travers (#6)
2 attachment(s)
Re: Two proposed modifications to the PostgreSQL FDW

For the record, here's the proof of concept code for the transaction
manager which works off libpq connections.

It is not ready yet by any means. But it is included for design
discussion. If the previous patch gets in instead, that's fine, but I figure
it is worth including here for discussion purposes.

Two things which are currently missing are a) an ability to specify in the
log file where the cleanup routine is located for a background worker and
b) a separation of backend responsibility for restarting cleanup efforts on
server start.

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Attachments:

twophasecommit.h (application/octet-stream)
twophasecommit.c (application/octet-stream)
#8Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Chris Travers (#6)
Re: Two proposed modifications to the PostgreSQL FDW

On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris.travers@adjust.com> wrote:

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

Thank you for the proposal. I agree that it's a major problem that
postgres_fdw (or PostgreSQL core API) doesn't support two-phase
commit.

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also.

Yeah. Also the transaction manager code for this I wrote while helping with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation with the patches you mention and the disagreement was part of why I thought about going this direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically and I make no assumptions about what might happen between prepare and an attempted local commit.
prepare goes into the precommit hook
commit goes into the commit hook and we never raise errors if it fails (because you cannot rollback at that point). Instead a warning is raised and cleanup commences.
rollback goes into the rollback hook and we never raise errors if it fails (because you are already rolling back).

2. By treating this as a property of a table rather than a property of a foreign data wrapper or a server, we can better prevent prepared transactions where they have not been enabled.
This also ensures that we know whether we are guaranteeing two phase commit or not by looking at the table.

3. By making this opt-in it avoids a lot of problems with regards to incorrect configuration etc since if the DBA says "use two phase commit" and failed to enable prepared transactions on the other side...

On to failure modes:
1. It's possible that under high load too many foreign transactions are prepared and things start rolling back instead of committing. Oh well....
2. In the event that a foreign server goes away between prepare and commit, we continue to retry via the background worker. The background worker is very pessimistic and checks every remote system for the named transaction.

If some participant servers fail during COMMIT PREPARED, will the
client get a "committed"? or an "aborted"? If the client gets
"aborted", that's not correct because the local changes are already
committed at that point. On the other hand, if the client gets
"committed" it might break the current user semantics because the
subsequent reads may not be able to see their own committed writes. Also,
since we don't want to wait for COMMIT PREPARED to complete, we need to
consider that users could cancel the query anytime. To not break the
current semantics we cannot raise an error during the 2nd phase of two-phase
commit, but that's not realistic because even palloc() can raise an
error.

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#9Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#3)
Re: Two proposed modifications to the PostgreSQL FDW

On 2018/08/20 23:43, Tom Lane wrote:

Chris Travers <chris.travers@adjust.com> writes:

I am looking at trying to make two modifications to the PostgreSQL FDW and
would like feedback on this before I do.

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

It seems unlikely to me that an FDW option would be at all convenient
for this. What about selecting it dynamically based on the planner's
estimate of the number of rows to be inserted?

A different thing we could think about is enabling COPY TO/FROM a
foreign table.

Fwiw, the following commit did introduce COPY FROM support for foreign
tables, although using an FDW INSERT interface, so not exactly optimized
for bulk-loading yet.

commit 3d956d9562aa4811b5eaaaf5314d361c61be2ae0
Author: Robert Haas <rhaas@postgresql.org>
Date: Fri Apr 6 19:16:11 2018 -0400

Allow insert and update tuple routing and COPY for foreign tables.

Thanks,
Amit

#10Chris Travers
chris.travers@adjust.com
In reply to: Masahiko Sawada (#8)
Re: Two proposed modifications to the PostgreSQL FDW

On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris.travers@adjust.com>
wrote:

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de>

wrote:

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

Thank you for the proposal. I agree that it's a major problem that
postgres_fdw (or PostgreSQL core API) doesn't support two-phase
commit.

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also.

Yeah. Also the transaction manager code for this I wrote while helping
with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation with the patches you
mention and the disagreement was part of why I thought about going this
direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically and I make
no assumptions about what might happen between prepare and an attempted
local commit.
prepare goes into the precommit hook
commit goes into the commit hook and we never raise errors if it fails
(because you cannot rollback at that point). Instead a warning is raised
and cleanup commences.
rollback goes into the rollback hook and we never raise errors if it
fails (because you are already rolling back).

2. By treating this as a property of a table rather than a property of
a foreign data wrapper or a server, we can better prevent prepared
transactions where they have not been enabled.
This also ensures that we know whether we are guaranteeing two phase
commit or not by looking at the table.

3. By making this opt-in it avoids a lot of problems with regards to
incorrect configuration etc since if the DBA says "use two phase commit"
and failed to enable prepared transactions on the other side...

On to failure modes:
1. It's possible that under high load too many foreign transactions are
prepared and things start rolling back instead of committing. Oh well....
2. In the event that a foreign server goes away between prepare and
commit, we continue to retry via the background worker. The background
worker is very pessimistic and checks every remote system for the named
transaction.

If some participant servers fail during COMMIT PREPARED, will the
client get a "committed"? or an "aborted"? If the client gets
"aborted", that's not correct because the local changes are already
committed at that point.

Ok so let's discuss this in more detail here.

You have basically 6 states a TPC global transaction can be in.
1. We haven't gotten to the point of trying to commit (BEGIN)
2. We are trying to commit (PREPARE)
3. We have committed to committing all transactions (COMMIT)
4. We have committed to rolling back all transactions (ROLLBACK)
5. We have successfully committed OR rolled back all transactions (COMPLETE)
6. We tried to commit or rollback all transactions and got some errors (INCOMPLETE)

During COMMIT PREPARED we cannot raise errors to PostgreSQL. We have
already committed to committing and therefore the only way forward is to
fix the problem.

On the other hand, if the client gets
"committed" it might break the current user semantics because the
subsequent reads may not be able to see their own committed writes.

Actually it is worse than that, and this is why automatic attempted recovery
is an absolute requirement. If you cannot commit prepared, then you have a
prepared transaction that is stuck on the remote side. This holds back
autovacuum horizons and causes some other nastiness. So we have to note it,
move on, and try to fix it.

Moreover since COMMIT PREPARED occurs during the commit hook, not the
precommit hook, it is too late to roll back the local transaction. We
cannot raise errors since this causes a conflict in the commit status of
the local transaction. So when we commit the local transaction we commit
to committing all prepared transactions as soon as possible. Note some
changes need to be made to make this usable in the FDW context, so what I
am hoping is that this dialog helps inform the discussion and the options
going forward.

Also, since we don't want to wait for COMMIT PREPARED to complete, we need
to consider that users could cancel the query anytime. To not break the
current semantics we cannot raise an error during the 2nd phase of two-phase
commit, but that's not realistic because even palloc() can raise an
error.

We don't palloc. All memory used here is on the stack. I do allow for
dramatic precondition checks to cause errors but those should never happen
absent some other programmer doing something dramatically unsafe anyway.
For example if you try to double-commit a transaction set.....

There is a possibility of system errors if one can no longer write to the
log file, but at this point, as long as we have logged the phase change to
commit, we are able to recover later.

So in the event where one sees an error here one continues on to the next
transaction in the global transaction set and tries to commit it, etc.
until it runs through the entire set of prepared transactions. Then if
there were any errors it fires off a background worker which re-reads the
log file and goes out to the various foreign servers, checks to see if
there is a prepared transaction, and if so commits it. If the transaction
set state was in rollback, it tries to roll it back instead. If this
errors, it sleeps for a second and then loops through those which errored
and retries until all are complete.
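
To sketch the per-remote check involved (the gid naming is hypothetical),
the worker can probe pg_prepared_xacts on each foreign server and finish
the transaction according to the state recorded in the log:

-- On each remote, check whether the named transaction is still prepared:
SELECT 1 FROM pg_prepared_xacts WHERE gid = 'fdw_gxid_42_1';

-- If present and the set state was commit:
COMMIT PREPARED 'fdw_gxid_42_1';

-- If present and the set state was rollback:
ROLLBACK PREPARED 'fdw_gxid_42_1';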

The other thing is we record whether we are committing or rolling back the
transaction when we hit the commit or rollback hook. This is critical
because we can imagine a world where the Oracle FDW supports similar
semantics. In that case everything works and is not ordering dependent.
I.e. we can prepare our transactions. Oracle can try and fail, and
rollback, and we roll back all the transactions everywhere. And all we have
to know is that we got to the precommit hook and then we rolled back.

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

In this approach we make a best effort to commit or rollback (as
appropriate in the state of the global transaction) *all* remote
transactions during global commit or global rollback. It is not guaranteed
but it avoids breaking semantics as much as we can. Also the background
worker here does not need to attach to shared memory since the log has
everything required. COMMIT PREPARED ought to be a fast operation unless
there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle,
one on DB2, and one on PostgreSQL. You successfully prepare your
transaction. DB2 successfully prepares, and then the Oracle db errors for
some reason (maybe a deferred constraint). Does the background worker have
enough information to know to roll back your transaction on the remote side?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#11Chris Travers
chris.travers@adjust.com
In reply to: Tom Lane (#3)
Re: Two proposed modifications to the PostgreSQL FDW

On Mon, Aug 20, 2018 at 4:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chris Travers <chris.travers@adjust.com> writes:

I am looking at trying to make two modifications to the PostgreSQL FDW and
would like feedback on this before I do.

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

It seems unlikely to me that an FDW option would be at all convenient
for this. What about selecting it dynamically based on the planner's
estimate of the number of rows to be inserted?

A different thing we could think about is enabling COPY TO/FROM a
foreign table.

Actually, as I start to understand some aspects of Andres's concern above,
there are issues beyond numbers of rows. But yes, selecting dynamically
would be preferable.

Two major things I think we cannot support on this are RETURNING clauses
and ON CONFLICT clauses. So anywhere we need to worry about those a copy
node could not be used. So it is more complex than merely row estimates.
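
For instance (a sketch; the table names are hypothetical), the first two
statements need per-row results or conflict handling from the remote and so
could not go through a COPY node, while the last one could:

-- Needs a per-row result back from the remote; no COPY node:
INSERT INTO remote_tab (payload) VALUES ('x') RETURNING id;

-- Needs conflict handling on the remote; no COPY node:
INSERT INTO remote_tab (id, payload) VALUES (1, 'x')
    ON CONFLICT (id) DO NOTHING;

-- Neither clause; could stream through COPY:
INSERT INTO remote_tab (payload) SELECT payload FROM staging;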

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

ENOINFRASTRUCTURE ... and the FDW pieces of that hardly seem like the
place to start.

I disagree about the lack of infrastructure. We have every piece of
infrastructure we need for a minimum viable offering.
1. Two Phase Commit in PostgreSQL
2. Custom Background Workers
3. Pre/Post Commit/Rollback hooks for callbacks.

Those are sufficient to handle the vast majority of error cases.

The one thing we *might* want that we don't have is a startup process to
scan a directory of background worker status files and fire off appropriate
background workers on database start. That hardly seems difficult though.

regards, tom lane

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Chris Travers (#10)
Re: Two proposed modifications to the PostgreSQL FDW

On Tue, Aug 21, 2018 at 5:36 PM Chris Travers <chris.travers@adjust.com> wrote:

On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris.travers@adjust.com> wrote:

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

Thank you for the proposal. I agree that it's a major problem that
postgres_fdw (or PostgreSQL core API) doesn't support two-phase
commit.

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also.

Yeah. Also the transaction manager code for this I wrote while helping with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation with the patches you mention and the disagreement was part of why I thought about going this direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically and I make no assumptions about what might happen between prepare and an attempted local commit.
prepare goes into the precommit hook
commit goes into the commit hook and we never raise errors if it fails (because you cannot rollback at that point). Instead a warning is raised and cleanup commences.
rollback goes into the rollback hook and we never raise errors if it fails (because you are already rolling back).

2. By treating this as a property of a table rather than a property of a foreign data wrapper or a server, we can better prevent prepared transactions where they have not been enabled.
This also ensures that we know whether we are guaranteeing two phase commit or not by looking at the table.

3. By making this opt-in it avoids a lot of problems with regards to incorrect configuration etc since if the DBA says "use two phase commit" and failed to enable prepared transactions on the other side...

On to failure modes:
1. It's possible that under high load too many foreign transactions are prepared and things start rolling back instead of committing. Oh well....
2. In the event that a foreign server goes away between prepare and commit, we continue to retry via the background worker. The background worker is very pessimistic and checks every remote system for the named transaction.

If some participant servers fail during COMMIT PREPARED, will the
client get a "committed"? or an "aborted"? If the client gets
"aborted", that's not correct because the local changes are already
committed at that point.

Ok so let's discuss this in more detail here.

You have basically 6 states a TPC global transaction can be in.
1. We haven't gotten to the point of trying to commit (BEGIN)
2. We are trying to commit (PREPARE)
3. We have committed to committing all transactions (COMMIT)
4. We have committed to rolling back all transactions (ROLLBACK)
5. We have successfully committed OR rolled back all transactions (COMPLETE)
6. We tried to commit or rollback all transactions and got some errors (INCOMPLETE)

During COMMIT PREPARED we cannot raise errors to PostgreSQL. We have already committed to committing and therefore the only way forward is to fix the problem.

Agreed. I described the case where the client gets an "aborted", but it
should not happen.

On the other hand, if the client gets
"committed" it might break the current user semantics because the
subsequent reads may not be able to see their own committed writes.

Actually it is worse than that, and this is why automatic attempted recovery is an absolute requirement. If you cannot commit prepared, then you have a prepared transaction that is stuck on the remote side. This holds back autovacuum horizons and causes some other nastiness. So we have to note it, move on, and try to fix it.

Yeah, in my patch the background worker will continue to try to fix it if errors occur.

Moreover since COMMIT PREPARED occurs during the commit hook, not the precommit hook, it is too late to roll back the local transaction. We cannot raise errors since this causes a conflict in the commit status of the local transaction. So when we commit the local transaction we commit to committing all prepared transactions as soon as possible. Note some changes need to be made to make this usable in the FDW context, so what I am hoping is that this dialog helps inform the discussion and the options going forward.

Also, since we don't want to wait for COMMIT PREPARED to complete, we need
to consider that users could cancel the query anytime. To not break the
current semantics we cannot raise an error during the 2nd phase of two-phase
commit, but that's not realistic because even palloc() can raise an
error.

We don't palloc. All memory used here is on the stack. I do allow for dramatic precondition checks to cause errors but those should never happen absent some other programmer doing something dramatically unsafe anyway. For example if you try to double-commit a transaction set.....

Sorry, palloc() is just an example. I'm not sure all FDWs can
implement all callbacks for two-phase commit without code that could
emit errors.

There is a possibility of system errors if one can no longer write to the log file, but at this point, as long as we have logged the phase change to commit, we are able to recover later.

So in the event where one sees an error here one continues on to the next transaction in the global transaction set and tries to commit it, etc. until it runs through the entire set of prepared transactions. Then if there were any errors it fires off a background worker which re-reads the log file and goes out to the various foreign servers, checks to see if there is a prepared transaction, and if so commits it. If the transaction set state was in rollback, it tries to roll it back instead. If this errors, it sleeps for a second and then loops through those which errored and retries until all are complete.

Yeah, the patch has similar functionality.

The other thing is we record whether we are committing or rolling back the transaction when we hit the commit or rollback hook. This is critical because we can imagine a world where the Oracle FDW supports similar semantics. In that case everything works and is not ordering dependent. I.e. we can prepare our transactions. Oracle can try and fail, and rollback, and we roll back all the transactions everywhere. And all we have to know is that we got to the precommit hook and then we rolled back.

In my patch, the global transaction manager manages the status of each
foreign server participating in a global transaction, with WAL logging.
The fate of a transaction on a foreign server will be determined according
to the state of the local transaction and those statuses. WAL logging is
important not only in terms of speed but also for supporting
streaming replication.

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

In this approach we make a best effort to commit or rollback (as appropriate in the state of the global transaction) *all* remote transactions during global commit or global rollback. It is not guaranteed but it avoids breaking semantics as much as we can. Also the background worker here does not need to attach to shared memory since the log has everything required. COMMIT PREPARED ought to be a fast operation unless there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle, one on DB2, and one on PostgreSQL. You successfully prepare your transaction. DB2 successfully prepares, and then the Oracle db errors for some reason (maybe a deferred constraint). Does the background worker have enough information to know to roll back your transaction on the remote side?

I think that what the background worker needs to know to roll back
remote transactions is how to roll back and what to roll back. How to
roll back is defined in each FDW. The state of each global transaction
is stored in the GTM's shared memory, and the transaction identifiers of
each foreign server are associated with its global transaction
identifier, so the background worker can find out the foreign-server-side
transaction identifier to roll back by looking at such information.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#13Chris Travers
chris.travers@adjust.com
In reply to: Masahiko Sawada (#12)
Re: Two proposed modifications to the PostgreSQL FDW

On Wed, Aug 22, 2018 at 3:12 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Tue, Aug 21, 2018 at 5:36 PM Chris Travers <chris.travers@adjust.com>
wrote:

On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada <sawada.mshk@gmail.com>

wrote:

On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris.travers@adjust.com>

wrote:

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de>

wrote:

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.

On precommit, a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful, we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

Thank you for the proposal. I agree that it's a major problem that
postgres_fdw (or PostgreSQL core API) doesn't support two-phase
commit.

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also.

Yeah. Also the transaction manager code for this I wrote while helping
with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation with the patches you
mention and the disagreement was part of why I thought about going this
direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically and I make
no assumptions about what might happen between prepare and an attempted
local commit.
prepare goes into the precommit hook
commit goes into the commit hook and we never raise errors if it fails
(because you cannot rollback at that point). Instead a warning is raised
and cleanup commences.
rollback goes into the rollback hook and we never raise errors if it
fails (because you are already rolling back).

2. By treating this as a property of a table rather than a property of
a foreign data wrapper or a server, we can better prevent prepared
transactions where they have not been enabled.
This also ensures that we know whether we are guaranteeing two phase
commit or not by looking at the table.

3. By making this opt-in it avoids a lot of problems with regards to
incorrect configuration etc since if the DBA says "use two phase commit"
and failed to enable prepared transactions on the other side...

On to failure modes:
1. It's possible that under high load too many foreign transactions are
prepared and things start rolling back instead of committing. Oh well....
2. In the event that a foreign server goes away between prepare and
commit, we continue to retry via the background worker. The background
worker is very pessimistic and checks every remote system for the named
transaction.

If some participant servers fail during COMMIT PREPARED, will the
client get a "committed"? or an "aborted"? If the client gets
"aborted", that's not correct because the local changes are already
committed at that point.

Ok so let's discuss this in more detail here.

You have basically 6 states a TPC global transaction can be in.
1. We haven't gotten to the point of trying to commit (BEGIN)
2. We are trying to commit (PREPARE)
3. We have committed to committing all transactions (COMMIT)
4. We have committed to rolling back all transactions (ROLLBACK)
5. We have successfully committed OR rolled back all transactions (COMPLETE)
6. We tried to commit or rollback all transactions and got some errors (INCOMPLETE)

During COMMIT PREPARED we cannot raise errors to PostgreSQL. We have
already committed to committing and therefore the only way forward is to
fix the problem.

Agreed. I described the case where the client gets an "aborted", but it
should not happen.

It is possible an administrator could log in and roll back the prepared
transaction, but that's beyond the scope of any possible patch.
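
For reference, manual resolution on the remote would look something like
this sketch (the gid shown is hypothetical):

-- On the remote server, list lingering prepared transactions:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- Then resolve the stuck one by hand:
ROLLBACK PREPARED 'fdw_gxid_42_1';  -- or COMMIT PREPARED, as appropriate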

On the other hand, if the client gets
"committed" it might break the current user semantics because the
subsequent reads may not be able to see their own committed writes.

Actually it is worse than that, and this is why automatic attempted
recovery is an absolute requirement. If you cannot commit prepared, then
you have a prepared transaction that is stuck on the remote side. This
holds back autovacuum horizons and causes some other nastiness. So we have
to note it, move on, and try to fix it.

Yeah, in my patch the background worker will continue to try to fix things
if errors occur.

The first thing I would suggest is that rather than auto-detecting (if I
understand your patch correctly) whether prepared transactions are
possible on the other system, you make it an option on the foreign server
or foreign table. Otherwise one might enable prepared transactions for one
set of operations on one database and have it automatically cause
headaches in another context.

The other thing I wasn't quite sure about on your patch was what happens
if, say, someone trips over a power cord while the background worker is
trying to commit things, whether the information is available on the
initiating server when it comes back, whether a DBA has to go out and try
to figure out what needs to be committed remotely, and how this would be
done. If you could explain that process, that would be helpful to me.

(In my approach these would be recorded on the master and an SQL function
could re-initiate the background worker.)
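
As a rough sketch of what such a re-initiation function might look like
(the SQL function name and the fdw_cleanup_main entry point are invented
here; only the background-worker API itself is from core):

    #include "postgres.h"
    #include "fmgr.h"
    #include "postmaster/bgworker.h"

    PG_FUNCTION_INFO_V1(pg_fdw_resolve_global_txns);

    /* SQL-callable function that fires off the cleanup worker again. */
    Datum
    pg_fdw_resolve_global_txns(PG_FUNCTION_ARGS)
    {
        BackgroundWorker worker;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres_fdw");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "fdw_cleanup_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "fdw 2pc cleanup worker");

        PG_RETURN_BOOL(RegisterDynamicBackgroundWorker(&worker, NULL));
    }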

Moreover since COMMIT PREPARED occurs during the commit hook, not the
precommit hook, it is too late to roll back the local transaction. We
cannot raise errors since this causes a conflict in the commit status of
the local transaction. So when we commit the local transaction we commit
to committing all prepared transactions as soon as possible. Note some
changes need to be made to make this usable in the FDW context, so what I
am hoping is that the dialog helps impact the discussion and options going
forward.

Also
since we don't want to wait for COMMIT PREPARED to complete we need to
consider that users could cancel the query anytime. To not break the
current semantics we cannot raise error during 2nd phase of two-phase
commit but it's not realistic because even the palloc() can raise an
error.

We don't palloc. All memory used here is on the stack. I do allow for
dramatic precondition checks to cause errors but those should never happen
absent some other programmer doing something dramatically unsafe anyway.
For example if you try to double-commit a transaction set.....

Sorry, palloc() is just an example. I'm not sure all FDWs can
implement all callbacks for two-phase commit without codes that could
emit errors.

Yeah, but if you are in the commit hook and someone emits an error, that's
wrong because that then tries to rollback an already committed transaction
and the backend rightfully panics. In fact I should probably strip out the
precondition check errors there and issue a warning. It might sometimes
happen when something goes seriously wrong on a system level, but....

There is a possibility of system errors if one can no longer write to the
log file, but at this point, as long as we have logged the phase change to
commit, we are able to recover later.

So in the event where one sees an error here one continues on to the next
transaction in the global transaction set and tries to commit it, etc.
until it runs through the entire set of prepared transactions. Then if
there were any errors it fires off a background worker which re-reads the
log file and goes out to the various foreign servers, checks to see if
there is a prepared transaction, and if so commits it. If the transaction
set state was in rollback, it tries to roll it back instead. If this
errors, it sleeps for a second and then loops through those which errored
and retries until all are complete.
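
A skeleton of that retry loop might look like the following (illustrative
only; the participant bookkeeping and resolve_one_participant() are
assumed helpers, not code from either patch):

    #include "postgres.h"
    #include "miscadmin.h"

    extern int  n_participants;
    extern bool participant_done[];
    extern bool resolve_one_participant(int i); /* COMMIT/ROLLBACK PREPARED */

    static void
    resolve_until_complete(void)
    {
        for (;;)
        {
            bool all_done = true;
            int  i;

            for (i = 0; i < n_participants; i++)
            {
                if (participant_done[i])
                    continue;
                if (resolve_one_participant(i))
                    participant_done[i] = true; /* succeeded */
                else
                    all_done = false;           /* remember to retry */
            }

            if (all_done)
                break;

            CHECK_FOR_INTERRUPTS();
            pg_usleep(1000000L);    /* sleep a second, then retry */
        }
    }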

Yeah, the patch has the similar functionality.

The other thing is we record whether we are committing or rolling back
the transaction when we hit the commit or rollback hook. This is critical
because we can imagine a world where the Oracle FDW supports similar
semantics. In that case everything works and is not ordering dependent.
I.e. we can prepare our transactions; Oracle can try and fail, and roll
back, and we roll back all the transactions everywhere. All we have to
know is that we got to the precommit hook and then we rolled back.

In my patch the global transaction manager manages the status of each
foreign server participating in global transactions, with WAL logging.
The fate of a transaction on a foreign server will be determined according
to the state of the local transaction and those statuses. WAL logging is
important not only in terms of speedup but also for supporting streaming
replication.

So you are optimizing for large numbers of prepared transactions or at a
high rate?

Also does the background worker get fired again on recovery as needed?

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

In this approach we make a best effort to commit or rollback (as
appropriate in the state of the global transaction) *all* remote
transactions during global commit or global rollback. It is not guaranteed
but it avoids breaking semantics as much as we can. Also the background
worker here does not need to attach to shared memory since the log has
everything required. COMMIT PREPARED ought to be a fast operation unless
there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle,
one on DB2, and one on PostgreSQL. You successfully prepare your
transaction. DB2 successfully prepares, and then the Oracle db errors for
some reason (maybe a deferred constraint). Does the background worker have
enough information to know to roll back your transaction on the remote
side?

I think that what the background worker needs to know to roll back remote
transactions is how to roll back and what to roll back. How to roll back
is defined in each FDW.

Agreed. And naturally same with commits.

My assumption is that each foreign data wrapper would have to set its own
precommit/commit hook callbacks. I think your patch extends the fdw
structure to try to ensure these are done automatically?

The state of each global transaction is stored in the GTM's shared memory,
and the transaction identifiers of each foreign server are associated with
its global transaction identifier, so the background worker can find out
the transaction identifier on the foreign server side to roll back by
looking at that information.

Seems sane.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Chris Travers (#13)
Re: Two proposed modifications to the PostgreSQL FDW

On Wed, Aug 22, 2018 at 1:20 PM Chris Travers <chris.travers@adjust.com> wrote:

On Wed, Aug 22, 2018 at 3:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 21, 2018 at 5:36 PM Chris Travers <chris.travers@adjust.com> wrote:

On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris.travers@adjust.com> wrote:

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:

2. TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together. With this patch an option would
be applied to foreign tables such that they could be set to use two phase
commit. When this is done, the first write to each backend would register a
connection with a global transaction handler and a pre-commit and commit
hooks would be set up to properly process these.

On precommit a per-global-transaction file would be opened in the data
directory and prepare statements logged to the file. On error, we simply
roll back our local transaction.

On the commit hook, we go through and start to commit the remote global
transactions. At this point we make a best effort but track whether or not
we were successful on all. If successful on all, we delete the file. If
unsuccessful we fire a background worker which re-reads the file and is
responsible for cleanup. If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process. On rollback, we do like commit but we roll back all transactions
in the set. The file has enough information to determine whether we should
be committing or rolling back on cleanup.

I would like to push these both for Pg 12. Is there any feedback on the
concepts and the problems first?

Thank you for the proposal. I agree that it's a major problem that
postgres_fdw (or PostgreSQL core API) doesn't support two-phase
commit.

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also.

Yeah. Also the transaction manager code for this I wrote while helping with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation with the patches you mention and the disagreement was part of why I thought about going this direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically and I make no assumptions about what might happen between prepare and an attempted local commit.
prepare goes into the precommit hook
commit goes into the commit hook and we never raise errors if it fails (because you cannot rollback at that point). Instead a warning is raised and cleanup commences.
rollback goes into the rollback hook and we never raise errors if it fails (because you are already rolling back).

2. By treating this as a property of a table rather than a property of a foreign data wrapper or a server, we can better prevent prepared transactions where they have not been enabled.
This also ensures that we know whether we are guaranteeing two phase commit or not by looking at the table.

3. By making this opt-in it avoids a lot of problems with regards to incorrect configuration etc since if the DBA says "use two phase commit" and failed to enable prepared transactions on the other side...

On to failure modes:
1. It's possible that under high load too many foreign transactions are prepared and things start rolling back instead of committing. Oh well....
2. In the event that a foreign server goes away between prepare and commit, we continue to retry via the background worker. The background worker is very pessimistic and checks every remote system for the named transaction.

If some participant servers fail during COMMIT PREPARED, will the
client get a "committed"? or an "aborted"? If the client gets
"aborted", that's not correct because the local changes are already
committed at that point.

Ok so let's discuss this in more detail here.

You have basically 6 states a TPC global transaction can be in.
1. We haven't gotten to the point of trying to commit (BEGIN)
2. We are trying to commit (PREPARE)
3. We have committed to committing all transactions (COMMIT)
4. We have committed to rolling back all transactions (ROLLBACK)
5. We have successfully committed OR rolled back all transactions (COMPLETE)
6. We tried to commit or rollback all transactions and got some errors (INCOMPLETE)

During COMMIT PREPARED we cannot raise errors to PostgreSQL. We have already committed to committing and therefore the only way forward is to fix the problem.

Agreed. I wrote the case where the client gets an "aborted" but it
should not happen.

It is possible an administrator could log in and roll back the prepared transaction but that's beyond the scope of any possible patch.

On the other hand, if the client gets
"committed" it might break the current user semantics because the
subsequent reads may not be able to see their own committed writes.

Actually it is worse than that and this is why automatic attempted recovery is an absolute requirement. If you cannot commit prepared, then you have a prepared transaction that is stuck on the remote side. This holds back autovacuum horizons and causes some other nastiness. So we have to note, move on, and try to fix.

Yeah, in my patch the background worker will continue to try to fix things if errors occur.

The first thing I would suggest is that rather than auto-detecting (if I understand your patch correctly) whether prepared transactions are possible on the other system, you make it an option on the foreign server or foreign table. Otherwise one might enable prepared transactions for one set of operations on one database and have it automatically cause headaches in another context.

Yeah, currently it's an option for foreign servers. The patch adds a
new option "two_phase_commit" to postgres_fdw.

The other thing I wasn't quite sure about on your patch was what happens if, say, someone trips over a power cord while the background worker is trying to commit things, whether the information is available on the initiating server when it comes back, whether a DBA has to go out and try to figure out what needs to be committed remotely, and how this would be done. If you could explain that process, that would be helpful to me.

(In my approach these would be recorded on the master and an SQL function could re-initiate the background worker.)

My approach is almost the same as yours. For details: in the
pre-commit we do WAL-logging for each participant server before
preparing transactions on the remote sides. The WAL has information on
the participant foreign servers (foreign server oid, database oid, etc.)
and the global transaction identifier. Even if the plug is pulled while
trying to commit, we can recover the global transactions that are not yet
completed, and their participant information, from the WAL. After the
recovery, users need to execute the SQL function to fix the incomplete
global transactions; the function can find out whether a remote
transaction should be committed or rolled back by checking the CLOG. Does
my answer make sense?
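
Roughly speaking, each participant record carries information of this
shape (a simplified sketch for discussion, not the exact structs in the
patch):

    #include "postgres.h"

    /* Sketch of per-participant information logged to WAL. */
    typedef struct FdwXactParticipantRecord
    {
        Oid     serverid;   /* foreign server OID */
        Oid     dbid;       /* local database OID */
        Oid     userid;     /* for the user mapping lookup */
        char    gid[200];   /* global transaction identifier used in
                             * PREPARE TRANSACTION on the remote; 200 is
                             * the GID length limit */
    } FdwXactParticipantRecord;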

Moreover since COMMIT PREPARED occurs during the commit hook, not the precommit hook, it is too late to roll back the local transaction. We cannot raise errors since this causes a conflict in the commit status of the local transaction. So when we commit the local transaction we commit to committing all prepared transactions as soon as possible. Note some changes need to be made to make this usable in the FDW context, so what I am hoping is that the dialog helps impact the discussion and options going forward.

Also
since we don't want to wait for COMMIT PREPARED to complete we need to
consider that users could cancel the query anytime. To not break the
current semantics we cannot raise error during 2nd phase of two-phase
commit but it's not realistic because even the palloc() can raise an
error.

We don't palloc. All memory used here is on the stack. I do allow for dramatic precondition checks to cause errors but those should never happen absent some other programmer doing something dramatically unsafe anyway. For example if you try to double-commit a transaction set.....

Sorry, palloc() is just an example. I'm not sure all FDWs can
implement all callbacks for two-phase commit without codes that could
emit errors.

Yeah, but if you are in the commit hook and someone emits an error, that's wrong because that then tries to rollback an already committed transaction and the backend rightfully panics. In fact I should probably strip out the precondition check errors there and issue a warning. It might sometimes happen when something goes seriously wrong on a system level, but....

In my patch, since the commit hook is performed by the background
worker, not by the backends, it's no problem if someone emits an error
in the commit hook. After the backend has prepared transactions on all
the remote sides, it enqueues itself onto the wait queue. The background
worker takes the global transaction waiting to be completed and commits
the prepared transactions on all the remote sides. After completing the
global transaction, the background worker dequeues it.
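
Conceptually the backend side then looks something like this (a sketch
under assumed names; only the condition-variable API is from core):

    #include "postgres.h"
    #include "pgstat.h"
    #include "storage/condition_variable.h"

    enum { GTX_PENDING, GTX_COMPLETE };     /* abbreviated state set */

    typedef struct GlobalTxn
    {
        ConditionVariable cv;   /* resolver broadcasts on completion */
        int         state;
    } GlobalTxn;

    extern GlobalTxn *myGlobalTxn;          /* shared-memory queue entry */
    extern void fdw_prepare_all_remotes(void);
    extern void enqueue_global_txn(GlobalTxn *txn);

    static void
    backend_commit_wait(void)
    {
        /* 1st phase: PREPARE TRANSACTION on every participant. */
        fdw_prepare_all_remotes();

        /* Hand the 2nd phase to the resolver worker and wait for it. */
        enqueue_global_txn(myGlobalTxn);
        ConditionVariablePrepareToSleep(&myGlobalTxn->cv);
        while (myGlobalTxn->state != GTX_COMPLETE)
            ConditionVariableSleep(&myGlobalTxn->cv, PG_WAIT_EXTENSION);
        ConditionVariableCancelSleep();
    }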

There is a possibility of system errors if one can no longer write to the log file, but at this point, as long as we have logged the phase change to commit, we are able to recover later.

So in the event where one sees an error here one continues on to the next transaction in the global transaction set and tries to commit it, etc. until it runs through the entire set of prepared transactions. Then if there were any errors it fires off a background worker which re-reads the log file and goes out to the various foreign servers, checks to see if there is a prepared transaction, and if so commits it. If the transaction set state was in rollback, it tries to roll it back instead. If this errors, it sleeps for a second and then loops through those which errored and retries until all are complete.

Yeah, the patch has the similar functionality.

The other thing is we record whether we are committing or rolling back the transaction when we hit the commit or rollback hook. This is critical because we can imagine a world where the Oracle FDW supports similar semantics. In that case everything works and is not ordering dependent. I.e. we can prepare our transactions; Oracle can try and fail, and roll back, and we roll back all the transactions everywhere. All we have to know is that we got to the precommit hook and then we rolled back.

In my patch the global transaction manager manages the status of each
foreign server participating in global transactions, with WAL logging.
The fate of a transaction on a foreign server will be determined according
to the state of the local transaction and those statuses. WAL logging is
important not only in terms of speedup but also for supporting streaming
replication.

So you are optimizing for large numbers of prepared transactions or at a high rate?

I haven't done much optimization in the patch as this is the first
implementation. Once the basic feature is committed I will do that.

Also does the background worker get fired again on recovery as needed?

No. I added a new SQL function to fix global transactions. We need to
execute that function manually after recovery.

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

In this approach we make a best effort to commit or rollback (as appropriate in the state of the global transaction) *all* remote transactions during global commit or global rollback. It is not guaranteed but it avoids breaking semantics as much as we can. Also the background worker here does not need to attach to shared memory since the log has everything required. COMMIT PREPARED ought to be a fast operation unless there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle, one on DB2, and one on PostgreSQL. You successfully prepare your transaction. DB2 successfully prepares, and then the Oracle db errors for some reason (maybe a deferred constraint). Does the background worker have enough information to know to roll back your transaction on the remote side?

I think that what the background worker needs to know to roll back remote
transactions is how to roll back and what to roll back. How to roll back
is defined in each FDW.

Agreed. And naturally same with commits.

My assumption is that each foreign data wrapper would have to set its own precommit/commit hook callbacks. I think your patch extends the fdw structure to try to ensure these are done automatically?

Yes. The patch adds new FDW APIs for atomic commit: prepare, commit,
rollback, and resolve (the 2nd phase of 2PC). FDW developers who want to
make their FDW support atomic commit need to define these APIs and call
the registration function when the transaction starts. If the FDW of a
registered foreign server doesn't support the FDW atomic commit API, the
transaction emits an error.
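
To give a rough idea of the shape, something like the following (names
and signatures here are guesses for illustration, not the declarations in
the patch):

    #include "postgres.h"

    /* Sketch of per-FDW atomic-commit callbacks. */
    typedef struct FdwXactParticipant FdwXactParticipant;

    typedef struct FdwXactCallbacks
    {
        void (*prepare) (FdwXactParticipant *p);    /* PREPARE TRANSACTION */
        void (*commit) (FdwXactParticipant *p);     /* one-phase commit */
        void (*rollback) (FdwXactParticipant *p);   /* abort path */
        void (*resolve) (FdwXactParticipant *p);    /* 2nd phase of 2PC */
    } FdwXactCallbacks;

    /* The FDW registers its callbacks when a transaction first touches
     * one of its servers (registration function name invented): */
    extern void FdwXactRegisterParticipant(Oid serverid,
                                           const FdwXactCallbacks *cbs);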

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#15Chris Travers
chris.travers@adjust.com
In reply to: Masahiko Sawada (#14)
Re: Two proposed modifications to the PostgreSQL FDW

On Wed, Aug 22, 2018 at 9:09 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Wed, Aug 22, 2018 at 1:20 PM Chris Travers <chris.travers@adjust.com>
wrote:

The first thing I would suggest is that rather than auto-detecting (if I
understand your patch correctly) whether prepared transactions are
possible on the other system, you make it an option on the foreign server
or foreign table. Otherwise one might enable prepared transactions for one
set of operations on one database and have it automatically cause
headaches in another context.

Yeah, currently it's an option for foreign servers. The patch adds a
new option "two_phase_commit" to postgres_fdw.

Seems sane.

The other thing I wasn't quite sure about on your patch was what happens
if, say, someone trips over a power cord while the background worker is
trying to commit things, whether the information is available on the
initiating server when it comes back, whether a DBA has to go out and try
to figure out what needs to be committed remotely, and how this would be
done. If you could explain that process, that would be helpful to me.

(In my approach these would be recorded on the master and an SQL function
could re-initiate the background worker.)

My approach is almost the same as yours. For details: in the
pre-commit we do WAL-logging for each participant server before
preparing transactions on the remote sides. The WAL has information on
the participant foreign servers (foreign server oid, database oid, etc.)
and the global transaction identifier. Even if the plug is pulled while
trying to commit, we can recover the global transactions that are not yet
completed, and their participant information, from the WAL. After the
recovery, users need to execute the SQL function to fix the incomplete
global transactions; the function can find out whether a remote
transaction should be committed or rolled back by checking the CLOG. Does
my answer make sense?

Yeah. That is probably more elegant than my solution. I do wonder though
if the next phase would not be to add some sort of hook to automatically
start the background worker in this case.

Moreover since COMMIT PREPARED occurs during the commit hook, not the
precommit hook, it is too late to roll back the local transaction. We
cannot raise errors since this causes a conflict in the commit status of
the local transaction. So when we commit the local transaction we commit
to committing all prepared transactions as soon as possible. Note some
changes need to be made to make this usable in the FDW context, so what I
am hoping is that the dialog helps impact the discussion and options going
forward.

Also since we don't want to wait for COMMIT PREPARED to complete we need
to consider that users could cancel the query anytime. To not break the
current semantics we cannot raise error during 2nd phase of two-phase
commit but it's not realistic because even the palloc() can raise an
error.

We don't palloc. All memory used here is on the stack. I do allow for
dramatic precondition checks to cause errors but those should never happen
absent some other programmer doing something dramatically unsafe anyway.
For example if you try to double-commit a transaction set.....

Sorry, palloc() is just an example. I'm not sure all FDWs can
implement all callbacks for two-phase commit without codes that could
emit errors.

Yeah, but if you are in the commit hook and someone emits an error,
that's wrong because that then tries to rollback an already committed
transaction and the backend rightfully panics. In fact I should probably
strip out the precondition check errors there and issue a warning. It
might sometimes happen when something goes seriously wrong on a system
level, but....

In my patch, since the commit hook is performed by the background
worker, not by the backends, it's no problem if someone emits an error
in the commit hook. After the backend has prepared transactions on all
the remote sides, it enqueues itself onto the wait queue. The background
worker takes the global transaction waiting to be completed and commits
the prepared transactions on all the remote sides. After completing the
global transaction, the background worker dequeues it.

Seems sane. I was firing off one background worker per global transaction
that needed cleanup. This might be an area to think about in terms of
larger parallelism in remote writes.

There is a possibility of system errors if one can no longer write to the
log file, but at this point, as long as we have logged the phase change to
commit, we are able to recover later.

So in the event where one sees an error here one continues on to the next
transaction in the global transaction set and tries to commit it, etc.
until it runs through the entire set of prepared transactions. Then if
there were any errors it fires off a background worker which re-reads the
log file and goes out to the various foreign servers, checks to see if
there is a prepared transaction, and if so commits it. If the transaction
set state was in rollback, it tries to roll it back instead. If this
errors, it sleeps for a second and then loops through those which errored
and retries until all are complete.

Yeah, the patch has the similar functionality.

The other thing is we record whether we are committing or rolling back
the transaction when we hit the commit or rollback hook. This is critical
because we can imagine a world where the Oracle FDW supports similar
semantics. In that case everything works and is not ordering dependent.
I.e. we can prepare our transactions; Oracle can try and fail, and roll
back, and we roll back all the transactions everywhere. All we have to
know is that we got to the precommit hook and then we rolled back.

In my patch the global transaction manager manages the status of each
foreign server participating in global transactions, with WAL logging.
The fate of a transaction on a foreign server will be determined according
to the state of the local transaction and those statuses. WAL logging is
important not only in terms of speedup but also for supporting streaming
replication.

So you are optimizing for large numbers of prepared transactions or at a
high rate?

I haven't done much optimization in the patch as this is the first
implementation. Once the basic feature is committed I will do that.

Also does the background worker get fired again on recovery as needed?

No. I added a new SQL function to fix global transactions. We need to
execute that function manually after recovery.

And that is solely in the case where the db does a full restart. Is it
possible the background worker could survive a backend restart?

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

In this approach we make a best effort to commit or rollback (as
appropriate in the state of the global transaction) *all* remote
transactions during global commit or global rollback. It is not guaranteed
but it avoids breaking semantics as much as we can. Also the background
worker here does not need to attach to shared memory since the log has
everything required. COMMIT PREPARED ought to be a fast operation unless
there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle,
one on DB2, and one on PostgreSQL. You successfully prepare your
transaction. DB2 successfully prepares, and then the Oracle db errors for
some reason (maybe a deferred constraint). Does the background worker have
enough information to know to roll back your transaction on the remote
side?

I think that what the background worker needs to know to roll back remote
transactions is how to roll back and what to roll back. How to roll back
is defined in each FDW.

Agreed. And naturally same with commits.

My assumption is that each foreign data wrapper would have to set its own
precommit/commit hook callbacks. I think your patch extends the fdw
structure to try to ensure these are done automatically?

Yes. The patch adds new FDW APIs for atomic commit: prepare, commit,
rollback, and resolve (the 2nd phase of 2PC). FDW developers who want to
make their FDW support atomic commit need to define these APIs and call
the registration function when the transaction starts. If the FDW of a
registered foreign server doesn't support the FDW atomic commit API, the
transaction emits an error.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Chris Travers (#15)
Re: Two proposed modifications to the PostgreSQL FDW

On Wed, Aug 22, 2018 at 5:13 PM Chris Travers <chris.travers@adjust.com> wrote:

On Wed, Aug 22, 2018 at 9:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 22, 2018 at 1:20 PM Chris Travers <chris.travers@adjust.com> wrote:

The first thing I would suggest is that rather than auto-detecting (if I understand your patch correctly) whether prepared transactions are possible on the other system, you make it an option on the foreign server or foreign table. Otherwise one might enable prepared transactions for one set of operations on one database and have it automatically cause headaches in another context.

Yeah, currently it's an option for foreign servers. The patch adds a
new option "two_phase_commit" to postgres_fdw.

Seems sane.

The other thing I wasn't quite sure about on your patch was what happens if, say, someone trips over a power cord while the background worker is trying to commit things, whether the information is available on the initiating server when it comes back, whether a DBA has to go out and try to figure out what needs to be committed remotely, and how this would be done. If you could explain that process, that would be helpful to me.

(In my approach these would be recorded on the master and an SQL function could re-initiate the background worker.)

My approach is almost the same as yours. For details: in the
pre-commit we do WAL-logging for each participant server before
preparing transactions on the remote sides. The WAL has information on
the participant foreign servers (foreign server oid, database oid, etc.)
and the global transaction identifier. Even if the plug is pulled while
trying to commit, we can recover the global transactions that are not yet
completed, and their participant information, from the WAL. After the
recovery, users need to execute the SQL function to fix the incomplete
global transactions; the function can find out whether a remote
transaction should be committed or rolled back by checking the CLOG. Does
my answer make sense?

Yeah. That is probably more elegant than my solution. I do wonder though if the next phase would not be to add some sort of hook to automatically start the background worker in this case.

Moreover since COMMIT PREPARED occurs during the commit hook, not the precommit hook, it is too late to roll back the local transaction. We cannot raise errors since this causes a conflict in the commit status of the local transaction. So when we commit the local transaction we commit to committing all prepared transactions as soon as possible. Note some changes need to be made to make this usable in the FDW context, so what I am hoping is that the dialog helps impact the discussion and options going forward.

Also
since we don't want to wait for COMMIT PREPARED to complete we need to
consider that users could cancel the query anytime. To not break the
current semantics we cannot raise error during 2nd phase of two-phase
commit but it's not realistic because even the palloc() can raise an
error.

We don't palloc. All memory used here is on the stack. I do allow for dramatic precondition checks to cause errors but those should never happen absent some other programmer doing something dramatically unsafe anyway. For example if you try to double-commit a transaction set.....

Sorry, palloc() is just an example. I'm not sure all FDWs can
implement all callbacks for two-phase commit without codes that could
emit errors.

Yeah, but if you are in the commit hook and someone emits an error, that's wrong because that then tries to rollback an already committed transaction and the backend rightfully panics. In fact I should probably strip out the precondition check errors there and issue a warning. It might sometimes happen when something goes seriously wrong on a system level, but....

In my patch, since the commit hook is performed by the background
worker, not by the backends, it's no problem if someone emits an error
in the commit hook. After the backend has prepared transactions on all
the remote sides, it enqueues itself onto the wait queue. The background
worker takes the global transaction waiting to be completed and commits
the prepared transactions on all the remote sides. After completing the
global transaction, the background worker dequeues it.

Seems sane. I was firing off one background worker per global transaction that needed cleanup. This might be an area to think about in terms of larger parallelism in remote writes.

There is a possibility of system errors if one can no longer write to the log file, but at this point, as long as we have logged the phase change to commit, we are able to recover later.

So in the event where one sees an error here one continues on to the next transaction in the global transaction set and tries to commit it, etc. until it runs through the entire set of prepared transactions. Then if there were any errors it fires off a background worker which re-reads the log file and goes out to the various foreign servers, checks to see if there is a prepared transaction, and if so commits it. If the transaction set state was in rollback, it tries to roll it back instead. If this errors, it sleeps for a second and then loops through those which errored and retries until all are complete.

Yeah, the patch has the similar functionality.

The other thing is we record whether we are committing or rolling back the transaction when we hit the commit or rollback hook. This is critical because we can imagine a world where the Oracle FDW supports similar semantics. In that case everything works and is not ordering dependent. I.e. we can prepare our transactions; Oracle can try and fail, and roll back, and we roll back all the transactions everywhere. All we have to know is that we got to the precommit hook and then we rolled back.

In my patch the global transaction manager manages the status of each
foreign server participating in global transactions, with WAL logging.
The fate of a transaction on a foreign server will be determined according
to the state of the local transaction and those statuses. WAL logging is
important not only in terms of speedup but also for supporting streaming
replication.

So you are optimizing for large numbers of prepared transactions or at a high rate?

I haven't done much optimization in the patch as this is the first
implementation. Once the basic feature is committed I will do that.

Also does the background worker get fired again on recovery as needed?

No. I added a new SQL function to fix global transactions. We need to
execute that function manually after recovery.

Oops, sorry, I was wrong. The background workers will get fired
automatically by the launcher process after recovery.

And that is solely in the case where the db does a full restart. Is it possible the background worker could survive a backend restart?

Yes, all background workers resolving foreign transactions are launched
by the launcher, so a backend failure doesn't affect the background
workers, and even if they fail the launcher will launch them again as
long as there is a not-yet-completed global transaction.

The design the patch chose is making backends do only PREPARE and wait
for the background worker to complete COMMIT PREPARED. In this design
the clients get a "committed" only either when successful in commit on
all participants or when they cancel the query explicitly. In other
words, the client will wait for completion of 2nd phase of two-phase
commit forever unless it cancels.

In this approach we make a best effort to commit or rollback (as appropriate in the state of the global transaction) *all* remote transactions during global commit or global rollback. It is not guaranteed but it avoids breaking semantics as much as we can. Also the background worker here does not need to attach to shared memory since the log has everything required. COMMIT PREPARED ought to be a fast operation unless there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle, one on DB2, and one on PostgreSQL. You successfully prepare your transaction. DB2 successfully prepares, and then the Oracle db errors for some reason (maybe a deferred constraint). Does the background worker have enough information to know to roll back your transaction on the remote side?

I think that what the background worker needs to know to roll back remote
transactions is how to roll back and what to roll back. How to roll back
is defined in each FDW.

Agreed. And naturally same with commits.

My assumption is that each foreign data wrapper would have to set its own precommit/commit hook callbacks. I think your patch extends the fdw structure to try to ensure these are done automatically?

Yes. The patch adds new FDW APIs for atomic commit: prepare, commit,
rollback, and resolve (the 2nd phase of 2PC). FDW developers who want to
make their FDW support atomic commit need to define these APIs and call
the registration function when the transaction starts. If the FDW of a
registered foreign server doesn't support the FDW atomic commit API, the
transaction emits an error.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#9)
Re: Two proposed modifications to the PostgreSQL FDW

(2018/08/21 16:03), Amit Langote wrote:

On 2018/08/20 23:43, Tom Lane wrote:

Chris Travers <chris.travers@adjust.com> writes:

I am looking at trying to make two modifications to the PostgreSQL FDW and
would like feedback on this before I do.

1. INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency. This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent. The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

A different thing we could think about is enabling COPY TO/FROM a
foreign table.

Fwiw, the following commit did introduce COPY FROM support for foreign
tables, although using a FDW INSERT interface, so not exactly optimized
for bulk-loading yet.

commit 3d956d9562aa4811b5eaaaf5314d361c61be2ae0
Author: Robert Haas <rhaas@postgresql.org>
Date: Fri Apr 6 19:16:11 2018 -0400

Allow insert and update tuple routing and COPY for foreign tables.

That's right. To improve the efficiency, I plan to work on COPY FROM/TO a
foreign table for PG12 (in [1], I proposed new FDW APIs for COPY FROM).
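
Just to sketch the direction (these names are illustrative only, not the
proposed API):

    #include "postgres.h"
    #include "utils/rel.h"

    /* Illustrative shape for FDW bulk-copy callbacks. */
    typedef struct CopyFdwRoutine
    {
        void (*BeginForeignCopy) (Relation rel);        /* open COPY stream */
        void (*ExecForeignCopy) (Relation rel,
                                 const char *data,
                                 int len);              /* push a raw chunk */
        void (*EndForeignCopy) (Relation rel);          /* flush and finish */
    } CopyFdwRoutine;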

Sorry, I'm late to the party.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/23990375-45a6-5823-b0aa-a6a7a6a957f0@lab.ntt.co.jp