Stopping logical replication protocol

Started by Vladimir Gordiychukabout 10 years ago42 messageshackers

folyga@gmail.com

about 10 years ago

Hi all,

During implementing logical replication protocol for pgjdbc
https://github.com/pgjdbc/pgjdbc/pull/550 I faced with strange behavior of
the *walcender.c*:

1. When WAL consumer catchup master and change his state to streaming,
not available normally complete replication by send CopyDone message until
will not generate/create new WAL record. It occurs because logical decoding
located in *WalSndWaitForWal* until will not available next WAL record,
and it method receive message from consumer even reply on CopyDone with
CopyDone but ignore exit from loop and we can wait many times waiting
CommandStatus & ReadyForQuery packages on consumer.
2. Logical decoding ignore message from consumer during decoding and
writing transaction in socket(*WalSndWriteData*). It affect long
transactions with many changes. Because for example if we start decoding
transaction that insert 1 million records and after consume 1% of it date
we
decide stop replication, it will be not available until whole million
record will not send to consumer. And I see the following problems because
of this:

- It will generate many not necessary network traffic.
- Not available normally stop logical replication, because consumer will
fail with timeout and it broke scenario when consumer put data to external
system asynchronously and by success callback send feedback to master
about flushed LSN, but by fail callback stop replication, and restart it
from the last successfully sent LSN to external system, because slot
will be busy and it will increase latency for streaming system.
- Consumer can send keepalive message with required reply flag to master
that right now decoding and sending huge transaction, and after some time
disconnect master because because reply on keepalive will not get. I
faced with it problem during research bottledwater-pg
<https://github.com/confluentinc/bottledwater-pg> extension that fail
each time during receive transaction in that was modify 1 million
record(restart_lsn was before huge transaction so extension fail again and
again) disconnect master because not keep keepalive package too long.

I prepare small patch that fix problems describe below. Now *WalSndWriteData
*first check message from consumer and during decode transaction check that
replication still active. KeppAlive message now not send if was get
CopyDone package(keep alive now not necessary we preparing to
complete). *WalSndWaitForWal
*after get CopyDone exit from loop. With apply this patch I get next
measurements
Before
-----
logical start and stopping: *15446ms*
logical stopping: *13820ms*

physical start and stopping: 462ms
physical stopping: 348

After
-----
logical start and stopping: 2424ms
logical stopping: *26ms*

physical start and stopping: 458ms
physical stopping: 329ms

Where for measurements was use code from pgjdbc

For physical replicaion:

LogSequenceNumber startLSN = getCurrentLSN();

Statement st = sqlConnection.createStatement();
st.execute("insert into test_logic_table\n"
+ " select id, md5(random()::text) as name from
generate_series(1, 1000000) as id;");
st.close();

long start = System.nanoTime();

PGReplicationStream stream =
pgConnection
.replicationStream()
.physical()
.withStartPosition(startLSN)
.start();

//read single message
stream.read();
long startStopping = System.nanoTime();

stream.close();

long now = System.nanoTime();

long startAndStopTime = now - start;
long stopTime = now - startStopping;

System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

For logical replication:

LogSequenceNumber startLSN = getCurrentLSN();

Statement st = sqlConnection.createStatement();
st.execute("insert into test_logic_table\n"
+ " select id, md5(random()::text) as name from
generate_series(1, 1000000) as id;");
st.close();

long start = System.nanoTime();

PGReplicationStream stream =
pgConnection
.replicationStream()
.logical()
.withSlotName(SLOT_NAME)
.withStartPosition(startLSN)
.withSlotOption("include-xids", false)
.withSlotOption("skip-empty-xacts", true)
.start();

//read single message
stream.read();
long startStopping = System.nanoTime();

stream.close();

long now = System.nanoTime();

long startAndStopTime = now - start;
long stopTime = now - startStopping;

System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

https://github.com/Gordiychuk/postgres/tree/stopping_logical_replication

Shulgin, Oleksandr

oleksandr.shulgin@zalando.de

about 10 years ago

In reply to: Vladimir Gordiychuk (#1)

Re: Stopping logical replication protocol

On Fri, May 6, 2016 at 5:23 PM, Vladimir Gordiychuk <folyga@gmail.com>
wrote:

Hi all,

During implementing logical replication protocol for pgjdbc
https://github.com/pgjdbc/pgjdbc/pull/550 I faced with strange behavior
of the *walcender.c*:

1. When WAL consumer catchup master and change his state to streaming,
not available normally complete replication by send CopyDone message until
will not generate/create new WAL record. It occurs because logical decoding
located in *WalSndWaitForWal* until will not available next WAL
record, and it method receive message from consumer even reply on CopyDone
with CopyDone but ignore exit from loop and we can wait many times waiting
CommandStatus & ReadyForQuery packages on consumer.
2. Logical decoding ignore message from consumer during decoding and
writing transaction in socket(*WalSndWriteData*). It affect long
transactions with many changes. Because for example if we start decoding
transaction that insert 1 million records and after consume 1% of it date
we
decide stop replication, it will be not available until whole million
record will not send to consumer.

How exactly are you stopping the replication? If you just stop reading

you'll likely to hit some problems, but what if you also close the
connection? I don't think there is any other supported way to do it.

I was working last year on adding support for replication protocol to the
Python driver: https://github.com/psycopg/psycopg2/pull/322 It would be
nice if you could skim through this implementation, since it didn't receive
a whole lot of review.

Cheers.
--
Alex

Vladimir Gordiychuk

folyga@gmail.com

about 10 years ago

In reply to: Shulgin, Oleksandr (#2)

Re: Stopping logical replication protocol

Replication work via Copy API, so for stop replication walcender.c wait
CopyDone. My communication seems like this

FE=> StartReplication(query: START_REPLICATION SLOT
pgjdbc_logical_replication_slot LOGICAL 0/18FCFD0 ("include-xids" 'false',
"skip-empty-xacts" 'true'))
FE=> Query(CopyStart)
<=BE CopyBothResponse
FE=> StandbyStatusUpdate(received: 0/18FCFD0, flushed: 0/0, applied: 0/0,
clock: Tue May 03 22:13:14 MSK 2016)
FE=> CopyData(34)
<=BE CopyData
<=BE CopyData
<=BE Keepalive(lastServerWal: 0/18FCFD0, clock: Tue May 03 22:13:14 MSK
2016 needReply: false)
<=BE XLogData(currWal: 0/0, lastServerWal: 0/0, clock: 0)
<=BE CopyData
<=BE XLogData(currWal: 0/18FD0A0, lastServerWal: 0/18FD0A0, clock: 0)
<=BE CopyData
<=BE XLogData(currWal: 0/18FD1B0, lastServerWal: 0/18FD1B0, clock: 0)
FE=> StopReplication
<=BE CopyData
FE=> CopyDone
<=BE CopyDone
<=BE CopyData
... after few seconds
<=BE CommandStatus(COPY 0)
<=BE ReadyForQuery(I)

The main idea that i want work with same connection without close it.

2016-05-06 19:45 GMT+03:00 Oleksandr Shulgin <oleksandr.shulgin@zalando.de>:

Show quoted text

On Fri, May 6, 2016 at 5:23 PM, Vladimir Gordiychuk <folyga@gmail.com>
wrote:

Hi all,

During implementing logical replication protocol for pgjdbc
https://github.com/pgjdbc/pgjdbc/pull/550 I faced with strange behavior
of the *walcender.c*:

1. When WAL consumer catchup master and change his state to
streaming, not available normally complete replication by send CopyDone
message until will not generate/create new WAL record. It occurs because
logical decoding located in *WalSndWaitForWal* until will not
available next WAL record, and it method receive message from consumer even
reply on CopyDone with CopyDone but ignore exit from loop and we can wait
many times waiting CommandStatus & ReadyForQuery packages on consumer.
2. Logical decoding ignore message from consumer during decoding and
writing transaction in socket(*WalSndWriteData*). It affect long
transactions with many changes. Because for example if we start decoding
transaction that insert 1 million records and after consume 1% of it date
we
decide stop replication, it will be not available until whole million
record will not send to consumer.

How exactly are you stopping the replication? If you just stop reading

you'll likely to hit some problems, but what if you also close the
connection? I don't think there is any other supported way to do it.

I was working last year on adding support for replication protocol to the
Python driver: https://github.com/psycopg/psycopg2/pull/322 It would be
nice if you could skim through this implementation, since it didn't receive
a whole lot of review.

Cheers.
--
Alex

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Vladimir Gordiychuk (#1)

Re: Stopping logical replication protocol

On 6 May 2016 at 23:23, Vladimir Gordiychuk <folyga@gmail.com> wrote:

I prepare small patch that fix problems describe below. Now *WalSndWriteData
*first check message from consumer and during decode transaction check
that replication still active. KeppAlive message now not send if was get
CopyDone package(keep alive now not necessary we preparing to complete). *WalSndWaitForWal
*after get CopyDone exit from loop. With apply this patch I get next
measurements

I'll review this, but based on the description and concept I agree it's
useful.

I have been frustrated in the past by the inability to terminate the
logical replication stream and return to command mode without a
disconnect/reconnect so I'd like to have this.

AFAIK there's no particular reason we can't do it, it's just not
implemented. It does have to be a clean exit though, where the logical
decoding context is destroyed, the xact it was running in is closed, etc.
Like for the SQL interface.

You still won't want to do it too often because there's a cost to stopping
decoding and restarting it. We have to re-read from the restart_lsn and
reassemble transactions again up to the point where we can start sending
them to the client again. Especially if you're most of the way through
decoding a big xact, or if there's a long-running xact holding restart_lsn
down this might cost a bit. But no worse than repeatedly calling the SQL
logical decoding interface.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Vladimir Gordiychuk (#1)

Re: Stopping logical replication protocol

On 6 May 2016 at 23:23, Vladimir Gordiychuk <folyga@gmail.com> wrote:

I prepare small patch that fix problems describe below. Now *WalSndWriteData
*first check message from consumer and during decode transaction check
that replication still active.

OK, upon looking closer I'm not sure I agree with this approach.

In WalSndLoop(...) we currently call ProcessRepliesIfAny() which handles a
client-initiated CopyDone by replying with its own CopyDone. WalSndLoop
then just waits until the send buffer is flushed or the client disconnects
and returns. That looks to be pretty much what's needed... but only when we
actually execute that code path.

XLogSendLogical calls WalSndWaitForWal(...) (via xlogreader and
logical_read_xlog_page) when waiting for something to process, when the
client will be blocked on a socket read. It also processes client CopyDone,
but it doesn't seem to test streamingDoneSending or streamingDoneReceiving
like the outer WalSndLoop does. So it looks like it continues waiting for
WAL when we know the client sent CopyDone and doesn't want anything more.

I think we should be testing that in WalSndWaitForWal, like we do in
WalSndLoop, and bailing out. You've done that part, and I agree that's good.

It seems like what you're also trying to allow interruption deeper than
that, when we're in the middle of processing a reorder buffer commit record
and streaming that to the client. You're introducing an is_active member
(actually a callback, though name suggests it's a flag) in struct
ReorderBuffer to check whether a CopyDone is received, and you're skipping
ReorderBuffer commit processing when the callback returns false. The
callback returns "!streamingDoneReceiving && !streamingDoneSending" i.e.
it's false if either end has sent CopyDone. streamingDoneSending and
streamingDoneSending are only set in ProcessRepliesIfAny, called by
WalSndLoop and WalSndWaitForWal. So the idea is, presumably, that if we're
waiting for WAL from XLogSendLogical we skip processing of any commit
records and exit.

That seems overcomplicated.

When WalSndWaitForWAL is called
by logical_read_xlog_page, logical_read_xlog_page can just test
streamingDoneReceiving and streamingDoneSending. If they're set it can skip
the page read and return -1, which will cause the xlogreader to return a
null record to XLogSendLogical. That'll skip the decoding calls and return
to WalSndLoop, where we'll notice it's time to exit.

That way there's no need to modify the reorder buffer structs, add
callbacks, etc.

Make sense?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Craig Ringer (#5)

Re: Stopping logical replication protocol

I've created a CF entry for this patch:

https://commitfest.postgresql.org/10/621/

set as waiting-on-author. Vladimir, I didn't find a PostgreSQL community
user account for you, so I couldn't set you up as the author. What's your
PostgreSQL community username?

Vladimir Gordiychuk

folyga@gmail.com

about 10 years ago

In reply to: Craig Ringer (#6)

Re: Stopping logical replication protocol

What's your PostgreSQL community username?

gordiychuk

It seems like what you're also trying to allow interruption deeper than

that, when we're in the middle of processing a reorder buffer commit record
and streaming that to the client. You're introducing an is_active member
(actually a callback, though name suggests it's a flag) in struct
ReorderBuffer to check whether a CopyDone is received, and you're skipping
ReorderBuffer commit processing when the callback returns false. The
callback returns "!streamingDoneReceiving && !streamingDoneSending" i.e.
it's false if either end has sent CopyDone. streamingDoneSending and
streamingDoneSending are only set in ProcessRepliesIfAny, called by
WalSndLoop and WalSndWaitForWal. So the idea is, presumably, that if we're
waiting for WAL from XLogSendLogical we skip processing of any commit
records and exit.

That seems overcomplicated.

When WalSndWaitForWAL is called
by logical_read_xlog_page, logical_read_xlog_page can just test
streamingDoneReceiving and streamingDoneSending. If they're set it can skip
the page read and return -1, which will cause the xlogreader to return a
null record to XLogSendLogical. That'll skip the decoding calls and return
to WalSndLoop, where we'll notice it's time to exit.

ProcessRepliesIfAny also now executes in WalSdnWriteData. Because during
send data we should also check message from client(client can send
CopyDone, KeepAlive, Terminate).

@@ -1086,14 +1089,6 @@ WalSndWriteData(LogicalDecodingContext *ctx,
XLogRecPtr lsn, TransactionId xid,
memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
tmpbuf.data, sizeof(int64));

- /* fast path */
- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- if (!pq_is_send_pending())
- return;
-

The main idea is that we can get CopyDone from client in the next
functions: WalSdnLoop, WalSndWaitForWal, WalSndWriteData. All of this
methods can take a long time, because WalSndWaitForWal can wait new
transaction and on not active db it can take long enough, WalSndWriteData
can send big transaction that also lead to ignore messages from client
until long time(In my example above for 1 million object changes, walsender
ignore messages 13 seconds and not allow reuse connection). When client
send CopyDone they don't want receive message anymore for current LSN. For
example physical replication can be interrupt in the middle of transaction
that affect more than one LSN.

Maybe I not correct undestand documentation, but I want reuse same
connection without reopen it, because open new connection takes too long.
Is it correct use case or CopyDOne it side effect of copy protocol and for
complete replication need use always Terminate package and reopen
connection?

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Vladimir Gordiychuk (#7)

Re: Stopping logical replication protocol

ProcessRepliesIfAny also now executes in WalSdnWriteData. Because during
send data we should also check message from client(client can send
CopyDone, KeepAlive, Terminate).

Ah, I didn't spot that ProcessRepliesIfAny() is already called from
WalSndWriteData in the current codebase.

I agree that it would be useful to check for client-initiated CopyDone
there, clear the walsender write queue, close the decoding session and
return to command mode.

Nothing gets sent until we've formatted the data we want to send into a
StringInfo, though. The comments on WalSndPrepareWrite even note that
there's no guarantee anything will get done with the data. There should be
no need to test whether we're processing a Datum, since we won't call
ProcessRepliesIfAny() or WalSndWriteData() while we're doing that.

I think all you should need to do is test streamingDoneSending and
streamingDoneReceiving in WalSndWriteData() and WalSndWaitForWal().

I outlined how I think WalSndWaitForWal() should be handled earlier.
Test streamingDoneReceiving
and streamingDoneSending in logical_read_xlog_page and return -1.

For WalSndWriteData() it's more complicated because there's data waiting to
flush. I don't see any way to remove data from the send buffer in the libpq
async API, so you'll have to send whatever's already been passed to libpq
for sending using pq_putmessage_noblock(...). This is necessary for
protocol integrity too, since you need to send a complete CopyData packet.
The length is specified at the start of the CopyData, so once you start
sending it you are committed to finishing it. Otherwise the client has no
way to know when to stop expecting CopyData and look for the next protocol
message. It can wait until its socket read blocks, but that doesn't mean
there isn't more of the CopyData in the upstream server's send buffers,
buffered in a router, etc.

If you want a way to abort in the middle of sending a datum (or in fact an
individual logical change) you'd need a much bigger change to the
replication protocol, where we chunk up big Datum values into multiple
CopyData messages with continuations. That'd be quite nice to have,
especially if it let us stream the Datum out without assembling the whole
thing in memory in a StringInfo first, but it's a much bigger change.

If you just return from WalSndWriteData(), the data is still queued by
libpq, and libpq will still finish sending it when you try to send
something else. So I don't see any point taking much action in response to
CopyDone in WalSndWriteData(). Maybe you could check if the message we're
about to send is large and if it is, check for client input before we start
sending and bail out before the pq_putmessage_noblock() call. But once
we've done that we're committed to finishing sending that message and have
to wait until pq_is_send_pending() return false before we can send our own
CopyDone and exit.

Since WalSndWriteData() must write a whole message from the decoding plugin
before it can return, and when it returns we'll return
from LogicalDecodingProcessRecord(..) to XLogSendLogical(...), we can just
test for streamingDoneReceiving and streamingDoneSending there.

I guess if you add a flag or callback to the logical decoding context you
could have the logical decoding plugin interrupt processing of an
insert/update/delete row change message between processing of individual
datums within it and return without calling OutputPluginWrite(...) so the
data is never queued for WalSndWriteData(...). This would make a difference
if you have wide rows... but don't currently process client writes during
output plugin callbacks. You'd have to add a logical decoding API function
clients could call to process client replies and set the flag. (They can't
call ProcessRepliesIfAny from walsender.c as it's static and not meant to
be called from within a decoding plugin anyway).

I want reuse same connection without reopen it, because open new

connection takes too long.

Fair enough. Though I don't understand why you'd be doing this often enough
that you'd care about reopening connections. What is the problem you are
trying to solve with this? The underlying reason you need this change?

Is it correct use case or CopyDone it side effect of copy protocol and

for complete replication need use always Terminate package and reopen
connection?

Client-initiated CopyDone for COPY BOTH should be just fine in protocol
terms. There's partial support for it in the walsender already.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Craig Ringer (#8)

Re: Stopping logical replication protocol

On 10 May 2016 at 09:50, Craig Ringer <craig@2ndquadrant.com> wrote:

I outlined how I think WalSndWaitForWal() should be handled earlier. Test streamingDoneReceiving
and streamingDoneSending in logical_read_xlog_page and return -1.

OK, so thinking about this some more, I see why you've added the callback
within the reorder buffer code. You want to stop processing of a
transaction after we've decoded the commit and are looping over the changes
within ReorderBufferCommit(...), which doesn't know anything about the
walsender. So we could loop for a long time within WalSndLoop ->
XLogSendLogical -> LogicalDecodingProcessRecord if the record is a commit,
as we process each change and send it to the client.

So aborting processing and returning between individual changes in
ReorderBufferCommit(...) seems very reasonable. I agree that some kind of
callback is needed because the walsender uses static globals to control its
send/receive stop logic. I don't like the naming "is_active"; maybe reverse
the sense and call it "stop_decoding_cb" ? Or "continue_decoding_cb"?
Unsure.

Anyway, here's a draft patch that does the parts other than the reorder
buffer processing stop. It passes 'make check' and the src/test/recovery
tests, but I haven't written anything to verify the client-initiated abort
handling. You have test code for this and I'd be interested to see the
results.

This patch doesn't attempt to allow decoding to be aborted during
processing of an xlog record, including a commit. So it'll still attempt to
send whole transactions. But it should allow clients to send CopyDone when
we're waiting for new WAL to be generated and return to command mode then.

I think it's worth making the next step, where you allow reorder buffer
commit processing to be interrupted, into a separate patch on top of this
one. They're two separate changes IMO.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10

Vladimir Gordiychuk

folyga@gmail.com

about 10 years ago

In reply to: Craig Ringer (#9)

Re: Stopping logical replication protocol

Fair enough. Though I don't understand why you'd be doing this often
enough that you'd care about reopening connections. What is the problem you
are trying to solve with this? The underlying reason you need this change?

First reason it clear API in pgjdc. Second reason it ability fast enough
rollack to one of the previous LSN and repeat it. My use case for second
reason - I have logical decoding extension that prepare only data as
key-value pair without information about (insert, update, delete) for
example if it delete as key I use primary key for table and as value null. Via
pgjdc by replication protocol connects receiver data, consumer group
changes to batch and send it to Kafka. If some problem occurs during
delivery to kafka consumer, I should stop current replication, go back to
success LSN and repeat all messages from success LSN. I think it operation
can be quite common, but reopen connection or not stopping replication will
increase latency.

Anyway, here's a draft patch that does the parts other than the reorder

buffer processing stop. It passes 'make check' and the src/test/recovery
tests, but I haven't written anything to verify the client-initiated abort
handling. You have test code for this and I'd be interested to see the
results.

What about keepAlive package when we already send/receive CopyDone? Is It
really necessary?

I measure current fix without include long transaction, only start
replication and stop it, when on postgresql absent active transactions and
get next results:

*Before:*
12:35:31.403 (2) FE=> StartReplication(query: START_REPLICATION SLOT
pgjdbc_logical_replication_slot LOGICAL 0/7287AC00 ("include-xids" 'false',
"skip-empty-xacts" 'true'))
12:35:31.403 (2) FE=> Query(CopyStart)
12:35:31.404 (2) <=BE CopyBothResponse
12:35:31.406 (2) FE=> StopReplication
12:35:31.406 (2) FE=> CopyDone
12:35:31.406 (2) <=BE CopyData
12:35:31.407 (2) [107, 0, 0, 0, 0, 114, -121, -84, 0, 0, 1, -43, 120, 106,
53, -85, 21, 0]
12:35:31.407 (2) <=BE CopyDone
12:35:31.407 (2) <=BE CopyData
12:35:31.407 (2) [107, 0, 0, 0, 0, 114, -121, -84, 0, 0, 1, -43, 120, 106,
53, -76, 29, 0]
12:35:52.120 (2) <=BE CommandStatus(COPY 0)
12:35:52.120 (2) <=BE CommandStatus(SELECT)
12:35:52.120 (2) <=BE ReadyForQuery(I)
12:35:52.126 (2) FE=> Terminate

*Timing:*
Start and stopping time: 20729ms
Stopping time: 20720ms

*After:*
14:30:02.050 (2) FE=> StartReplication(query: START_REPLICATION SLOT
pgjdbc_logical_replication_slot LOGICAL 0/728A06C0 ("include-xids" 'false',
"skip-empty-xacts" 'true'))
14:30:02.050 (2) FE=> Query(CopyStart)
14:30:02.051 (2) <=BE CopyBothResponse
14:30:02.056 (2) FE=> StopReplication
14:30:02.057 (2) FE=> CopyDone
14:30:02.057 (2) <=BE CopyData
14:30:02.057 (2) [107, 0, 0, 0, 0, 114, -118, 6, -64, 0, 1, -43, 122, 3,
-69, 107, 76, 0]
14:30:02.058 (2) <=BE CopyDone
14:30:02.058 (2) <=BE CommandStatus(COPY 0)
14:30:02.058 (2) <=BE CommandStatus(SELECT)
14:30:02.058 (2) <=BE ReadyForQuery(I)
14:30:02.068 (2) FE=> StandbyStatusUpdate(received: 0/728A06C0, flushed:
0/0, applied: 0/0, clock: Tue May 10 14:30:02 MSK 2016)

*Timing:*
Start and stopping time: 27ms
Stopping time: 11ms

So aborting processing and returning between individual changes in

ReorderBufferCommit(...) seems very reasonable. I agree that some kind of
callback is needed because the walsender uses static globals to control its
send/receive stop logic. I don't like the naming "is_active"; maybe reverse
the sense and call it "stop_decoding_cb" ? Or "continue_decoding_cb"?
Unsure.

continue_decoding_cb sounds good.

I think it's worth making the next step, where you allow reorder buffer

commit processing to be interrupted, into a separate patch on top of this
one. They're two separate changes IMO.

We will continue in the current thread, or new? I interesting in both patch
for my solution and pgjbc driver.

2016-05-10 5:49 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

Show quoted text

On 10 May 2016 at 09:50, Craig Ringer <craig@2ndquadrant.com> wrote:

I outlined how I think WalSndWaitForWal() should be handled earlier.
Test streamingDoneReceiving and streamingDoneSending in logical_read_xlog_page
and return -1.

OK, so thinking about this some more, I see why you've added the callback
within the reorder buffer code. You want to stop processing of a
transaction after we've decoded the commit and are looping over the changes
within ReorderBufferCommit(...), which doesn't know anything about the
walsender. So we could loop for a long time within WalSndLoop ->
XLogSendLogical -> LogicalDecodingProcessRecord if the record is a commit,
as we process each change and send it to the client.

So aborting processing and returning between individual changes in
ReorderBufferCommit(...) seems very reasonable. I agree that some kind of
callback is needed because the walsender uses static globals to control its
send/receive stop logic. I don't like the naming "is_active"; maybe reverse
the sense and call it "stop_decoding_cb" ? Or "continue_decoding_cb"?
Unsure.

Anyway, here's a draft patch that does the parts other than the reorder
buffer processing stop. It passes 'make check' and the src/test/recovery
tests, but I haven't written anything to verify the client-initiated abort
handling. You have test code for this and I'd be interested to see the
results.

This patch doesn't attempt to allow decoding to be aborted during
processing of an xlog record, including a commit. So it'll still attempt to
send whole transactions. But it should allow clients to send CopyDone when
we're waiting for new WAL to be generated and return to command mode then.

I think it's worth making the next step, where you allow reorder buffer
commit processing to be interrupted, into a separate patch on top of this
one. They're two separate changes IMO.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Vladimir Gordiychuk (#10)

Re: Stopping logical replication protocol

On 10 May 2016 at 19:41, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Fair enough. Though I don't understand why you'd be doing this often

enough that you'd care about reopening connections. What is the problem you
are trying to solve with this? The underlying reason you need this change?

First reason it clear API in pgjdc. Second reason it ability fast enough
rollack to one of the previous LSN and repeat it. My use case for second
reason - I have logical decoding extension that prepare only data as
key-value pair without information about (insert, update, delete) for
example if it delete as key I use primary key for table and as value null. Via
pgjdc by replication protocol connects receiver data, consumer group
changes to batch and send it to Kafka. If some problem occurs during
delivery to kafka consumer, I should stop current replication, go back to
success LSN and repeat all messages from success LSN. I think it operation
can be quite common, but reopen connection or not stopping replication
will increase latency.

It will, but not tons. The biggest cost (at least if there are any long
running xacts) will be replaying since the restart_lsn when you restart the
decoding session, which will happen whether you reconnect or just stop
decoding and return to command mode.

Anyway, here's a draft patch that does the parts other than the reorder

buffer processing stop. It passes 'make check' and the src/test/recovery
tests, but I haven't written anything to verify the client-initiated abort
handling. You have test code for this and I'd be interested to see the
results.

What about keepAlive package when we already send/receive CopyDone? Is It
really necessary?

No, I don't think it is, and I applied the change you made to suppress it.

I think it's worth making the next step, where you allow reorder buffer

commit processing to be interrupted, into a separate patch on top of this
one. They're two separate changes IMO.

We will continue in the current thread, or new? I interesting in both
patch for my solution and pgjbc driver.

Same thread, I just think these are two somewhat separate changes. One is
just in the walsender and allows return to command mode during waiting for
WAL. The other is more intrusive into the reorder buffer etc and allows
aborting decoding during commit processing. So two separate patches make
sense here IMO, one on top of the other.

I use git and just "git format-patch -2" to prepare a stack of two patches
from two separate commits.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12

Vladimir Gordiychuk

folyga@gmail.com

about 10 years ago

In reply to: Craig Ringer (#11)

Re: Stopping logical replication protocol

in which release can be include first part?

2016-05-10 15:15 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

Show quoted text

On 10 May 2016 at 19:41, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Fair enough. Though I don't understand why you'd be doing this often

enough that you'd care about reopening connections. What is the problem you
are trying to solve with this? The underlying reason you need this change?

First reason it clear API in pgjdc. Second reason it ability fast enough
rollack to one of the previous LSN and repeat it. My use case for second
reason - I have logical decoding extension that prepare only data as
key-value pair without information about (insert, update, delete) for
example if it delete as key I use primary key for table and as value null. Via
pgjdc by replication protocol connects receiver data, consumer group
changes to batch and send it to Kafka. If some problem occurs during
delivery to kafka consumer, I should stop current replication, go back to
success LSN and repeat all messages from success LSN. I think it operation
can be quite common, but reopen connection or not stopping replication
will increase latency.

It will, but not tons. The biggest cost (at least if there are any long
running xacts) will be replaying since the restart_lsn when you restart the
decoding session, which will happen whether you reconnect or just stop
decoding and return to command mode.

Anyway, here's a draft patch that does the parts other than the reorder

buffer processing stop. It passes 'make check' and the src/test/recovery
tests, but I haven't written anything to verify the client-initiated abort
handling. You have test code for this and I'd be interested to see the
results.

What about keepAlive package when we already send/receive CopyDone? Is It
really necessary?

No, I don't think it is, and I applied the change you made to suppress it.

I think it's worth making the next step, where you allow reorder buffer

commit processing to be interrupted, into a separate patch on top of this
one. They're two separate changes IMO.

We will continue in the current thread, or new? I interesting in both
patch for my solution and pgjbc driver.

Same thread, I just think these are two somewhat separate changes. One is
just in the walsender and allows return to command mode during waiting for
WAL. The other is more intrusive into the reorder buffer etc and allows
aborting decoding during commit processing. So two separate patches make
sense here IMO, one on top of the other.

I use git and just "git format-patch -2" to prepare a stack of two patches
from two separate commits.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Vladimir Gordiychuk (#12)

Re: Stopping logical replication protocol

On 11 May 2016 at 01:15, Vladimir Gordiychuk <folyga@gmail.com> wrote:

in which release can be include first part?

Since it's not a bug fix, I don't think it can go in before 9.7.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14

Vladimir Gordiychuk

folyga@gmail.com

about 10 years ago

In reply to: Craig Ringer (#13)

Re: Stopping logical replication protocol

Same thread, I just think these are two somewhat separate changes. One is
just in the walsender and allows return to command mode during waiting for
WAL. The other is more intrusive into the reorder buffer etc and allows
aborting decoding during commit processing. So two separate patches make
sense here IMO, one on top of the other.

About the second part of the patch. What the reason decode and send whole
transaction? Why we can't process logical decoding via WalSndLoop LSN by
LSN as it work in physycal replication? For example if transaction contains
in more them one LSN, first we decode and send "begin", "part data from
current LSN" and then returns to WalSndLoop on the next iteration we send
"another part data", "commit". I don't research in this way, because I
think it will be big changes in comparison callback that stop sending.

2016-05-10 20:22 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

Show quoted text

On 11 May 2016 at 01:15, Vladimir Gordiychuk <folyga@gmail.com> wrote:

in which release can be include first part?

Since it's not a bug fix, I don't think it can go in before 9.7.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Vladimir Gordiychuk (#14)

Re: Stopping logical replication protocol

On 11 May 2016 at 06:47, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Same thread, I just think these are two somewhat separate changes. One is

just in the walsender and allows return to command mode during waiting for
WAL. The other is more intrusive into the reorder buffer etc and allows
aborting decoding during commit processing. So two separate patches make
sense here IMO, one on top of the other.

About the second part of the patch. What the reason decode and send whole
transaction? Why we can't process logical decoding via WalSndLoop LSN by
LSN as it work in physycal replication? For example if transaction contains
in more them one LSN, first we decode and send "begin", "part data from
current LSN" and then returns to WalSndLoop on the next iteration we send
"another part data", "commit". I don't research in this way, because I
think it will be big changes in comparison callback that stop sending.

There are two parts to that. First, why do we reorder at all, accumulating
a whole transaction in a reorder buffer until we see a commit then sending
it all at once? Second, when sending, why don't we return control to the
walsender between messages?

For the first: reordering xacts server-side lets the client not worry about
replay order. It just applies them as it receives them. It means the server
can omit uncommitted transactions from the stream entirely and clients can
be kept simple(r). IIRC there are also some issues around relcache
invalidation handling and time travel that make it desirable to wait until
commit before building a snapshot and decoding, but I haven't dug into the
details. Andres is the person who knows that area best.

As for why we don't return control to the walsender between change
callbacks when processing a reorder buffer at commit time, I'm not really
sure but suspect it's mostly down to easy API and development. If control
returned to the walsender between each change we'd need an async api for
the reorder buffer where you test to see if there's more unprocessed work
and call back into the reorder buffer again if there is. So the reorder
buffer has to keep state for the progress of replaying a commit in a
separate struct, handle repeated calls to process work, etc. Also, since
many individual changes are very small that could lead to a fair bit of
overhead; it'd probably want to process a batch of changes then return.
Which makes it even more complicated.

If it returned control to the caller between changes then each caller would
also need to have the logic to test for more work and call back into the
reorder buffer. Both the walsender and SQL interface would need it. The way
it is, the loop is just in one place.

It probably makes more sense to have a callback that can test state and
abort processing, like you've introduced. The callback could probably even
periodically check to see if there's client input to process and consume it.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16

Vladimir Gordiychuk

folyga@gmail.com

almost 10 years ago

In reply to: Craig Ringer (#15)

Re: Stopping logical replication protocol

Hi. It has already passed a few months but patch still have required review
state. Can I help to speed up the review, or should i wait commitfest?
I plane complete changes in pgjdbc drive inside PR
https://github.com/pgjdbc/pgjdbc/pull/550 but PR blocked current problem
with not available stop logical replication.

2016-05-11 7:25 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

Show quoted text

On 11 May 2016 at 06:47, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Same thread, I just think these are two somewhat separate changes. One is

just in the walsender and allows return to command mode during waiting for
WAL. The other is more intrusive into the reorder buffer etc and allows
aborting decoding during commit processing. So two separate patches make
sense here IMO, one on top of the other.

About the second part of the patch. What the reason decode and send whole
transaction? Why we can't process logical decoding via WalSndLoop LSN by
LSN as it work in physycal replication? For example if transaction contains
in more them one LSN, first we decode and send "begin", "part data from
current LSN" and then returns to WalSndLoop on the next iteration we send
"another part data", "commit". I don't research in this way, because I
think it will be big changes in comparison callback that stop sending.

There are two parts to that. First, why do we reorder at all, accumulating
a whole transaction in a reorder buffer until we see a commit then sending
it all at once? Second, when sending, why don't we return control to the
walsender between messages?

For the first: reordering xacts server-side lets the client not worry
about replay order. It just applies them as it receives them. It means the
server can omit uncommitted transactions from the stream entirely and
clients can be kept simple(r). IIRC there are also some issues around
relcache invalidation handling and time travel that make it desirable to
wait until commit before building a snapshot and decoding, but I haven't
dug into the details. Andres is the person who knows that area best.

As for why we don't return control to the walsender between change
callbacks when processing a reorder buffer at commit time, I'm not really
sure but suspect it's mostly down to easy API and development. If control
returned to the walsender between each change we'd need an async api for
the reorder buffer where you test to see if there's more unprocessed work
and call back into the reorder buffer again if there is. So the reorder
buffer has to keep state for the progress of replaying a commit in a
separate struct, handle repeated calls to process work, etc. Also, since
many individual changes are very small that could lead to a fair bit of
overhead; it'd probably want to process a batch of changes then return.
Which makes it even more complicated.

If it returned control to the caller between changes then each caller
would also need to have the logic to test for more work and call back into
the reorder buffer. Both the walsender and SQL interface would need it. The
way it is, the loop is just in one place.

It probably makes more sense to have a callback that can test state and
abort processing, like you've introduced. The callback could probably even
periodically check to see if there's client input to process and consume it.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17

Craig Ringer

craig@2ndquadrant.com

almost 10 years ago

In reply to: Vladimir Gordiychuk (#16)

Re: Stopping logical replication protocol

On 25 August 2016 at 03:26, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Hi. It has already passed a few months but patch still have required review
state. Can I help to speed up the review, or should i wait commitfest?
I plane complete changes in pgjdbc drive inside PR
https://github.com/pgjdbc/pgjdbc/pull/550 but PR blocked current problem
with not available stop logical replication.

The latest patch is the updated one I posted, which was the part of
yours that allowed return to command mode when logical decoding is
idle.

If you want to submit an updated version of the second patch, with the
callback to allow logical decoding cancellation during
ReorderBufferCommit, that'd be handy, but I don't think it's as
important as the first one.

As far as I'm concerned the first patch is ready. Please take a look
and verify that you're happy with my updates and test the updated
patch. I'll mark it ready to go if you are. It's linked to at
https://commitfest.postgresql.org/10/621/ .

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Craig Ringer

craig@2ndquadrant.com

almost 10 years ago

In reply to: Craig Ringer (#17)

Re: Stopping logical replication protocol

On 25 August 2016 at 09:22, Craig Ringer <craig@2ndquadrant.com> wrote:

On 25 August 2016 at 03:26, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Hi. It has already passed a few months but patch still have required review
state. Can I help to speed up the review, or should i wait commitfest?
I plane complete changes in pgjdbc drive inside PR
https://github.com/pgjdbc/pgjdbc/pull/550 but PR blocked current problem
with not available stop logical replication.

The latest patch is the updated one I posted, which was the part of
yours that allowed return to command mode when logical decoding is
idle.

If you want to submit an updated version of the second patch, with the
callback to allow logical decoding cancellation during
ReorderBufferCommit, that'd be handy, but I don't think it's as
important as the first one.

As far as I'm concerned the first patch is ready. Please take a look
and verify that you're happy with my updates and test the updated
patch. I'll mark it ready to go if you are. It's linked to at
https://commitfest.postgresql.org/10/621/ .

By the way, I now think that the second part of your patch, to allow
interruption during ReorderBufferCommit processing, is also very
desirable.

Not just for that feature, but because we should be processing client
messages during long ReorderBufferCommit executions so that we can
respond to feedback from the client. Right now, long running
ReorderBufferCommit runs can trigger walsender_timeout because there's
no feedback processed.

Alternately, ReorderBufferCommit() could call back into the walsender
with a progress update, without requiring the client to send feedback.
It knows the client is making progress because it's consuming data on
the socket. But I'd rather have client feedback processed, since it
could be useful later for client=->server messages to output plugin
callbacks too.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Craig Ringer

craig@2ndquadrant.com

over 9 years ago

In reply to: Craig Ringer (#18)

Re: Stopping logical replication protocol

On 25 August 2016 at 13:04, Craig Ringer <craig@2ndquadrant.com> wrote:

By the way, I now think that the second part of your patch, to allow
interruption during ReorderBufferCommit processing, is also very
desirable.

I've updated your patch, rebasing it on top of 10.0 master and
splitting it into two pieces. I thought you were planning on following
up with an update to the second part, but maybe that was unclear from
our prior discussion, so I'm doing this to help get this moving again.

The first patch is what I posted earlier - the part of your patch that
allows the walsender to exit COPY BOTH mode and return to command mode
in response to a client-initiated CopyDone when it's in the
WalSndLoop. For logical decoding this means that it will notice a
client CopyDone when it's between transactions or while it's ingesting
transaction changes. It won't react to CopyDone while it's in
ReorderBufferCommit and sending a transaction to an output plugin
though.

The second piece is the other half of your original patch. Currently
unmodified from what you wrote. It adds a hook in ReorderBufferCommit
that calls back into the walsender to check for new client input and
interrupt processing. I haven't yet investigated this in detail.

Review comments on the 2nd patch, i.e. the 2nd half of your original patch:

* Other places in logical decoding use the CB suffix for callback
types. This should do the same.

* I'm not too keen on the name is_active for the callback. We
discussed the name continue_decoding_cb in our prior conversation.

* Instead of having LogicalContextAlwaysActive, would it be better to
test if the callback pointer is null and skip it if so?

* Lots of typos in LogicalContextAlwaysActive comments, and wording is unclear

* comment added to arguments of pg_logical_slot_get_changes_guts is
not in PostgreSQL style - it goes way wide on the line. Probably
better to put the comment above the call and mention which argument it
refers to.

* A comment somewhere is needed - probably on the IsStreamingActive()
callback - to point readers at the fact that WalSndWriteData() calls
ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
had to look at the email discussion history to remind myself how this
worked when I was re-reading the code.

* I'm not keen on

typedef ReorderBufferIsActive LogicalDecondingContextIsActive;

I think instead a single callback name that encompasses both should
be used. ContinueDecodingCB ?

* There are no tests. I don't see any really simple way to test this, though.

I suggest that you apply these updated/split patches to a fresh copy
of master then see if you can update it based on this feedback.
Something like:

git checkout master
git pull
git checkout -b stop-decoding-v10
git am /path/to/0001-Respect-client-initiated-CopyDone-in-walsender.patch
git am /path/to/0002-Second-part-of-Vladimir-Gordiychuk-s-patch.patch

then modify the code per the above, and "git commit --amend" the
changes and send an updated set of patches with "git format-patch -2"
.

I am setting this patch as "waiting on author" in the commitfest.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20

Vladimir Gordiychuk

folyga@gmail.com

over 9 years ago

In reply to: Craig Ringer (#19)

Re: Stopping logical replication protocol

Review comments on the 2nd patch, i.e. the 2nd half of your original patch:

* Other places in logical decoding use the CB suffix for callback
types. This should do the same.

* I'm not too keen on the name is_active for the callback. We
discussed the name continue_decoding_cb in our prior conversation.

* Instead of having LogicalContextAlwaysActive, would it be better to
test if the callback pointer is null and skip it if so?

* Lots of typos in LogicalContextAlwaysActive comments, and wording is

unclear

* comment added to arguments of pg_logical_slot_get_changes_guts is
not in PostgreSQL style - it goes way wide on the line. Probably
better to put the comment above the call and mention which argument it
refers to.

* A comment somewhere is needed - probably on the IsStreamingActive()
callback - to point readers at the fact that WalSndWriteData() calls
ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
had to look at the email discussion history to remind myself how this
worked when I was re-reading the code.

* I'm not keen on

typedef ReorderBufferIsActive LogicalDecondingContextIsActive;

I think instead a single callback name that encompasses both should
be used. ContinueDecodingCB ?

Fixed patch in attach.

* There are no tests. I don't see any really simple way to test this,

though.

I will be grateful if you specify the best way how to do it. I tests this
patches by pgjdbc driver tests that also build on head of postgres. You can
check test scenario for both patches in PRhttps://github.com/pgjdbc/
pgjdbc/pull/550

org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive
org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive

2016-09-06 4:10 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

Show quoted text

On 25 August 2016 at 13:04, Craig Ringer <craig@2ndquadrant.com> wrote:

By the way, I now think that the second part of your patch, to allow
interruption during ReorderBufferCommit processing, is also very
desirable.

I've updated your patch, rebasing it on top of 10.0 master and
splitting it into two pieces. I thought you were planning on following
up with an update to the second part, but maybe that was unclear from
our prior discussion, so I'm doing this to help get this moving again.

The first patch is what I posted earlier - the part of your patch that
allows the walsender to exit COPY BOTH mode and return to command mode
in response to a client-initiated CopyDone when it's in the
WalSndLoop. For logical decoding this means that it will notice a
client CopyDone when it's between transactions or while it's ingesting
transaction changes. It won't react to CopyDone while it's in
ReorderBufferCommit and sending a transaction to an output plugin
though.

The second piece is the other half of your original patch. Currently
unmodified from what you wrote. It adds a hook in ReorderBufferCommit
that calls back into the walsender to check for new client input and
interrupt processing. I haven't yet investigated this in detail.

Review comments on the 2nd patch, i.e. the 2nd half of your original patch:

* Other places in logical decoding use the CB suffix for callback
types. This should do the same.

* I'm not too keen on the name is_active for the callback. We
discussed the name continue_decoding_cb in our prior conversation.

* Instead of having LogicalContextAlwaysActive, would it be better to
test if the callback pointer is null and skip it if so?

* Lots of typos in LogicalContextAlwaysActive comments, and wording is
unclear

* comment added to arguments of pg_logical_slot_get_changes_guts is
not in PostgreSQL style - it goes way wide on the line. Probably
better to put the comment above the call and mention which argument it
refers to.

* A comment somewhere is needed - probably on the IsStreamingActive()
callback - to point readers at the fact that WalSndWriteData() calls
ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
had to look at the email discussion history to remind myself how this
worked when I was re-reading the code.

* I'm not keen on

typedef ReorderBufferIsActive LogicalDecondingContextIsActive;

I think instead a single callback name that encompasses both should
be used. ContinueDecodingCB ?

* There are no tests. I don't see any really simple way to test this,
though.

I suggest that you apply these updated/split patches to a fresh copy
of master then see if you can update it based on this feedback.
Something like:

git checkout master
git pull
git checkout -b stop-decoding-v10
git am /path/to/0001-Respect-client-initiated-CopyDone-in-walsender.patch
git am /path/to/0002-Second-part-of-Vladimir-Gordiychuk-s-patch.patch

then modify the code per the above, and "git commit --amend" the
changes and send an updated set of patches with "git format-patch -2"
.

I am setting this patch as "waiting on author" in the commitfest.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services