Keepalive for max_standby_delay

Started by Simon Riggs on 15 May 2010 · 104 messages · pgsql-hackers
#1Simon Riggs
simon@2ndQuadrant.com

Patch adds a keepalive message to ensure max_standby_delay is useful.

No WAL format changes, no libpq changes. Just an additional message type
for the streaming replication protocol, sent once per main loop in
WALsender. Plus docs.
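
For orientation, a rough sketch of the kind of payload such a message could
carry; the field names and the 'k' tag are illustrative assumptions, not
taken from the patch:

#include <stdint.h>

/*
 * Illustrative only: what a walsender keepalive might carry.  The real
 * message type and layout are whatever the patch defines; the point is
 * simply "WAL position sent so far, plus the primary's clock at send time".
 */
typedef struct
{
    char     msgtype;         /* e.g. 'k' for keepalive (assumption)        */
    uint32_t walEnd_xlogid;   /* WAL location sent so far, high 32 bits     */
    uint32_t walEnd_xrecoff;  /* WAL location sent so far, low 32 bits      */
    int64_t  sendTime;        /* primary's timestamp when message was built */
} KeepaliveMessage;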

Comments?

--
Simon Riggs www.2ndQuadrant.com

Attachments:

hs_keepalive.patch (text/x-patch; charset=UTF-8), +97/-5
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: Keepalive for max_standby_delay

Simon Riggs <simon@2ndQuadrant.com> writes:

Patch adds a keepalive message to ensure max_standby_delay is useful.

The proposed placement of the keepalive-send is about the worst it could
possibly be. It needs to be done right before pq_flush to ensure
minimum transfer delay. Otherwise any attempt to measure clock skew
using the timestamp will be seriously off. Where you've placed it
guarantees maximum delay not minimum.

I'm also extremely dubious that it's a good idea to set
recoveryLastXTime from this. Using both that and the timestamps from
the wal log flies in the face of everything I remember about control
theory. We should be doing only one or only the other, I think.

regards, tom lane

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#2)
Re: Keepalive for max_standby_delay

On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Patch adds a keepalive message to ensure max_standby_delay is useful.

The proposed placement of the keepalive-send is about the worst it could
possibly be. It needs to be done right before pq_flush to ensure
minimum transfer delay. Otherwise any attempt to measure clock skew
using the timestamp will be seriously off. Where you've placed it
guarantees maximum delay not minimum.

I don't understand. WalSndKeepAlive() contains a pq_flush() immediately
after the timestamp is set. I did it that way for exactly the same reasons
you've said.

Do you mean you only want to see one pq_flush()? I used two so that the
actual data is never delayed by a keepalive. WAL Sender was going to
sleep anyway, so it shouldn't be a problem.
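
To make the ordering concrete, a rough sketch of the shape being described
(the message body here is just a stand-in; only the stamp-then-flush
ordering is the point):

static void
WalSndKeepAlive(void)
{
    TimestampTz now;

    now = GetCurrentTimestamp();                    /* stamp as late as possible */
    pq_putmessage('d', (char *) &now, sizeof(now)); /* body is a stand-in        */
    pq_flush();                                     /* flush immediately         */
}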

I'm also extremely dubious that it's a good idea to set
recoveryLastXTime from this. Using both that and the timestamps from
the wal log flies in the face of everything I remember about control
theory. We should be doing only one or only the other, I think.

I can change it so that the recoveryLastXTime will not be updated if we
are using the value from the keepalives. So we have one, or the other.
Remember that replication can switch backwards and forwards between
modes, so it seems sensible to have a common timing value whichever mode
we're in.

--
Simon Riggs www.2ndQuadrant.com

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#3)
Re: Keepalive for max_standby_delay

Simon Riggs wrote:

On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:

I'm also extremely dubious that it's a good idea to set
recoveryLastXTime from this. Using both that and the timestamps from
the wal log flies in the face of everything I remember about control
theory. We should be doing only one or only the other, I think.

I can change it so that the recoveryLastXTime will not be updated if we
are using the value from the keepalives. So we have one, or the other.
Remember that replication can switch backwards and forwards between
modes, so it seems sensible to have a common timing value whichever mode
we're in.

That means that recoveryLastXTime can jump forwards and backwards.
Doesn't feel right to me either. If you want to expose the
keepalive-time to queries, it should be a separate field, something like
lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
read it.
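
As a hedged sketch of what that could look like (lastMasterKeepaliveTime
and pg_last_master_keepalive() are the names proposed above; neither exists
today, and the shared-memory placement is an assumption):

Datum
pg_last_master_keepalive(PG_FUNCTION_ARGS)
{
    TimestampTz ts;

    SpinLockAcquire(&WalRcv->mutex);
    ts = WalRcv->lastMasterKeepaliveTime;   /* hypothetical field */
    SpinLockRelease(&WalRcv->mutex);

    if (ts == 0)
        PG_RETURN_NULL();                   /* no keepalive received yet */
    PG_RETURN_TIMESTAMPTZ(ts);
}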

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#4)
Re: Keepalive for max_standby_delay

On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:

I'm also extremely dubious that it's a good idea to set
recoveryLastXTime from this. Using both that and the timestamps from
the wal log flies in the face of everything I remember about control
theory. We should be doing only one or only the other, I think.

I can change it so that the recoveryLastXTime will not be updated if we
are using the value from the keepalives. So we have one, or the other.
Remember that replication can switch backwards and forwards between
modes, so it seems sensible to have a common timing value whichever mode
we're in.

That means that recoveryLastXTime can jump forwards and backwards.

That behaviour would be bad, so why not just prevent that from
happening?

Doesn't feel right to me either. If you want to expose the
keepalive-time to queries, it should be a separate field, something like
lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
read it.

That wouldn't be good because then you couldn't easily monitor the
delay? You'd have to run two different functions depending on the state
of replication (for which we would need yet another function). Users
would just wrap that back up into a single function.

--
Simon Riggs www.2ndQuadrant.com

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#5)
Re: Keepalive for max_standby_delay

Simon Riggs wrote:

On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:

Doesn't feel right to me either. If you want to expose the
keepalive-time to queries, it should be a separate field, something like
lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
read it.

That wouldn't be good because then you couldn't easily monitor the
delay? You'd have to run two different functions depending on the state
of replication (for which we would need yet another function). Users
would just wrap that back up into a single function.

What exactly is the user trying to monitor? If it's "how far behind is
the standby", the difference between pg_current_xlog_insert_location()
in the master and pg_last_xlog_replay_location() in the standby seems
more robust and well-defined to me. It's a measure of XLOG location (ie.
bytes) instead of time, but time is a complicated concept.
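
As a rough illustration of the byte-based view, the two text-form locations
can be compared numerically on the monitoring side; this standalone sketch
treats "XXXXXXXX/YYYYYYYY" as a flat 64-bit byte offset, which is an
approximation but close enough for monitoring:

#include <stdint.h>
#include <stdio.h>

static uint64_t
wal_location_to_bytes(const char *loc)
{
    unsigned int xlogid, xrecoff;

    if (sscanf(loc, "%X/%X", &xlogid, &xrecoff) != 2)
        return 0;
    return ((uint64_t) xlogid << 32) | xrecoff;
}

int
main(void)
{
    /* values as returned by pg_current_xlog_insert_location() on the
     * primary and pg_last_xlog_replay_location() on the standby */
    uint64_t primary = wal_location_to_bytes("0/5000428");
    uint64_t standby = wal_location_to_bytes("0/4FF0000");

    printf("standby is %llu bytes behind\n",
           (unsigned long long) (primary - standby));
    return 0;
}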

Also note that as the patch stands, if you receive a keep-alive from the
master at point X, it doesn't mean that the standby is fully up-to-date.
It's possible that the walsender just finished sending a huge batch of
accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
During that time, a lot more WAL has accumulated, yet walsender sends a
keep-alive with the current timestamp.

In general, the purpose of a keep-alive is to keep the connection alive,
but you're trying to accomplish something else too, and I don't fully
understand what it is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#6)
Re: Keepalive for max_standby_delay

On Sat, 2010-05-15 at 20:05 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:

Doesn't feel right to me either. If you want to expose the
keepalive-time to queries, it should be a separate field, something like
lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
read it.

That wouldn't be good because then you couldn't easily monitor the
delay? You'd have to run two different functions depending on the state
of replication (for which we would need yet another function). Users
would just wrap that back up into a single function.

What exactly is the user trying to monitor? If it's "how far behind is
the standby", the difference between pg_current_xlog_insert_location()
in the master and pg_last_xlog_replay_location() in the standby seems
more robust and well-defined to me. It's a measure of XLOG location (ie.
bytes) instead of time, but time is a complicated concept.

Maybe, but it's meaningful to users, and that is the point.

Also note that as the patch stands, if you receive a keep-alive from the
master at point X, it doesn't mean that the standby is fully up-to-date.
It's possible that the walsender just finished sending a huge batch of
accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
During that time, a lot more WAL has accumulated, yet walsender sends a
keep-alive with the current timestamp.

Not at all. The timestamp for the keepalive is calculated after the
pq_flush for the main WAL data, so even if it takes 10 years to send the
next blob of WAL data the timestamp will be current.

However, a point you made in an earlier thread is still true here. It
sounds like it would be best if the startup process didn't re-access shared
memory for the next location until it had fully replayed all the WAL up
to the point it last accessed. That way the changing value of the shared
timestamp would have no effect on the calculated value at any point in
time. I will recode using that concept.

In general, the purpose of a keep-alive is to keep the connection alive,
but you're trying to accomplish something else too, and I don't fully
understand what it is.

That surprises me. If nothing else, it's in the title of the thread,
though since you personally added this to the Hot Standby todo more than
6 months ago I'd hope you of all people would understand the purpose.

--
Simon Riggs www.2ndQuadrant.com

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#7)
Re: Keepalive for max_standby_delay

On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:

I will recode using that concept.

There's some behaviours that aren't helpful here:

Startup gets a new pointer when it runs out of data to replay. That might
or might not include an updated keepalive timestamp, since there's no
exact relationship between chunks sent and chunks received. Startup
might ask for a new chunk when half a chunk has been received, or when
multiple chunks have been received.

WALSender doesn't chunk up what it sends, though libpq does at a lower
level. So there's no way to make a keepalive happen every X bytes
without doing this from within libpq.

WALSender sleeps even when it might have more WAL to send, it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

All in all, I think we should be calling this "burst replication" not
streaming replication. That does cause an issue in trying to monitor
what's going on cos there's so much jitter.

--
Simon Riggs www.2ndQuadrant.com

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#8)
Re: Keepalive for max_standby_delay

Simon Riggs wrote:

WALSender sleeps even when it might have more WAL to send, it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals, while it's
sending a large batch. That's analogous to the issue that was addressed
recently in the archiver process.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#9)
Re: Keepalive for max_standby_delay

Heikki Linnakangas wrote:

Simon Riggs wrote:

WALSender sleeps even when it might have more WAL to send, it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals, while it's
sending a large batch. That's analogous to the issue that was addressed
recently in the archiver process.

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.
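
In outline, the reshuffled loop behaves something like this (a simplified
sketch of the behaviour described above; flag and helper names are
approximations, not the patch text):

for (;;)
{
    bool    caughtup = false;

    if (got_SIGHUP)                         /* walsender's signal flags */
    {
        got_SIGHUP = false;
        ProcessConfigFile(PGC_SIGHUP);
    }
    if (shutdown_requested)
        break;

    /* Sends at most MAX_SEND_SIZE bytes, then returns even if more remains */
    if (!XLogSend(output_message, &caughtup))
        break;                              /* connection lost */

    if (caughtup)
        pg_usleep(WalSndDelay * 1000L);     /* nothing left: nap briefly */
    /* else loop again immediately, there is still WAL to send */
}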

Comments, have I screwed up anything?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

dont-sleep-if-wal-to-send.patch (text/x-diff), +166/-157
#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#10)
Re: Keepalive for max_standby_delay

On Sun, 2010-05-16 at 00:05 +0300, Heikki Linnakangas wrote:

Heikki Linnakangas wrote:

Simon Riggs wrote:

WALSender sleeps even when it might have more WAL to send, it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals, while it's
sending a large batch. That's analogous to the issue that was addressed
recently in the archiver process.

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments

8MB at a time still seems like a large batch to me.

libpq is going to send it in smaller chunks anyway, so I don't see the
importance of trying to keep the batch too large. It just introduces
delay into the sending process. We should be sending chunks that match
libpq better.

--
Simon Riggs www.2ndQuadrant.com

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#8)
Re: Keepalive for max_standby_delay

On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote:

On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:

I will recode using that concept.

Startup gets a new pointer when it runs out of data to replay. That might
or might not include an updated keepalive timestamp, since there's no
exact relationship between chunks sent and chunks received. Startup
might ask for a new chunk when half a chunk has been received, or when
multiple chunks have been received.

New version, with some other cleanup of wait processing.

The new logic is that when Startup asks for the next applychunk of WAL it
saves the lastChunkTimestamp. That is then the base time used by
WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
Since multiple receivechunks can arrive from the primary before Startup
asks for the next applychunk, we use the oldest receivechunk timestamp,
not the latest. Doing it this way means the lastChunkTimestamp doesn't
change when new keepalives arrive, so we have a stable and reasonably
accurate recordSendTimestamp for each WAL record.
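
Sketched out, the base-time selection reads roughly like this (variable and
function names follow the description above; the real patch may differ in
detail, and MaxStandbyDelay is assumed to be in milliseconds here):

static bool
WaitExceedsMaxStandbyDelay(void)
{
    TimestampTz baseTime = lastChunkTimestamp;

    /* Use the newer of the chunk-request timestamp and the latest
     * commit/abort timestamp seen in WAL. */
    if (latestXLogTime > baseTime)
        baseTime = latestXLogTime;

    return TimestampDifferenceExceeds(baseTime,
                                      GetCurrentTimestamp(),
                                      MaxStandbyDelay);
}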

The keepalive is sent as the first part of a new message, if any. So
partial chunks of data always have an accurate timestamp, even if that
is slightly older as a result. Doesn't make much difference except with
very large chunks.

I think that addresses the points raised on this thread and others.

--
Simon Riggs www.2ndQuadrant.com

Attachments:

hs_keepalive.v2.patch (text/x-patch; charset=UTF-8), +221/-22
#13Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#10)
Re: Keepalive for max_standby_delay

On Sun, May 16, 2010 at 6:05 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Heikki Linnakangas wrote:

Simon Riggs wrote:

WALSender sleeps even when it might have more WAL to send, it doesn't
check, it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals, while it's
sending a large batch. That's analogous to the issue that was addressed
recently in the archiver process.

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments

The main loop also needs to respond quickly to the 'X' message from the
standby?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#14Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#12)
Re: Keepalive for max_standby_delay

On Mon, May 17, 2010 at 1:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote:

On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:

I will recode using that concept.

Startup gets a new pointer when it runs out of data to replay. That might
or might not include an updated keepalive timestamp, since there's no
exact relationship between chunks sent and chunks received. Startup
might ask for a new chunk when half a chunk has been received, or when
multiple chunks have been received.

New version, with some other cleanup of wait processing.

The new logic is that when Startup asks for the next applychunk of WAL it
saves the lastChunkTimestamp. That is then the base time used by
WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
Since multiple receivechunks can arrive from the primary before Startup
asks for the next applychunk, we use the oldest receivechunk timestamp,
not the latest. Doing it this way means the lastChunkTimestamp doesn't
change when new keepalives arrive, so we have a stable and reasonably
accurate recordSendTimestamp for each WAL record.

The keepalive is sent as the first part of a new message, if any. So
partial chunks of data always have an accurate timestamp, even if that
is slightly older as a result. Doesn't make much difference except with
very large chunks.

I think that addresses the points raised on this thread and others.

Is it OK that this keepalive message cannot be used by HS in file-based
log-shipping? Even in SR, the startup process cannot use the keepalive
until walreceiver has been started up.

WalSndKeepAlive() always calls initStringInfo(), which seems to cause
a memory leak.
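
The pattern at issue, roughly (a hedged sketch, not the patch's actual
code): initStringInfo() palloc's a fresh buffer on every call, so without a
matching pfree() each keepalive leaks its buffer in the walsender's memory
context.

static void
WalSndKeepAlive(void)
{
    StringInfoData buf;
    TimestampTz    now = GetCurrentTimestamp();

    initStringInfo(&buf);                     /* palloc's a new buffer  */
    pq_sendbyte(&buf, 'k');                   /* keepalive tag (sketch) */
    pq_sendbytes(&buf, (char *) &now, sizeof(now));
    pq_putmessage('d', buf.data, buf.len);
    pq_flush();

    pfree(buf.data);                          /* the cleanup that was missing */
}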

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#14)
Re: Keepalive for max_standby_delay

On Mon, 2010-05-17 at 11:51 +0900, Fujii Masao wrote:

Is it OK that this keepalive message cannot be used by HS in file-based
log-shipping? Even in SR, the startup process cannot use the keepalive
until walreceiver has been started up.

Yes, I see those things.

We already have archive_timeout to handle the keepalive case in
file-based log shipping.

When starting up the delay is high anyway, so accuracy doesn't really
matter much - though we do use latestXLogTime in that case.

WalSndKeepAlive() always calls initStringInfo(), which seems to cause
a memory leak.

Thanks.

--
Simon Riggs www.2ndQuadrant.com

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#11)
Re: Keepalive for max_standby_delay

On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote:

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments

8MB at a time still seems like a large batch to me.

libpq is going to send it in smaller chunks anyway, so I don't see the
importance of trying to keep the batch too large. It just introduces
delay into the sending process. We should be sending chunks that match
libpq better.

More to the point the logic will fail if XLOG_BLCKSZ > PQ_BUFFER_SIZE
because it will send partial pages.

Having MAX_SEND_SIZE > PQ_BUFFER_SIZE is pointless, as libpq currently
stands.

--
Simon Riggs www.2ndQuadrant.com

#17Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Heikki Linnakangas (#6)
Re: Keepalive for max_standby_delay

On May 15, 2010, at 12:05 PM, Heikki Linnakangas wrote:

What exactly is the user trying to monitor? If it's "how far behind is
the standby", the difference between pg_current_xlog_insert_location()
in the master and pg_last_xlog_replay_location() in the standby seems
more robust and well-defined to me. It's a measure of XLOG location (ie.
bytes) instead of time, but time is a complicated concept.

I can tell you that end users *will* want a time-based indication of how far behind we are. DBAs will understand "we're this many transactions behind", but managers and end users won't. Unless it's unreasonable to provide that info, we should do so.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#16)
Re: Keepalive for max_standby_delay

On 17/05/10 04:40, Simon Riggs wrote:

On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote:

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments

8MB at a time still seems like a large batch to me.

libpq is going to send it in smaller chunks anyway, so I don't see the
importance of trying to keep the batch too large. It just introduces
delay into the sending process. We should be sending chunks that match
libpq better.

More to the point the logic will fail if XLOG_BLCKSZ > PQ_BUFFER_SIZE
because it will send partial pages.

I don't see a failure. We rely on not splitting WAL records across
messages, but we're talking about libpq-level CopyData messages, not TCP
messages.

Having MAX_SEND_SIZE > PQ_BUFFER_SIZE is pointless, as libpq currently
stands.

Well, it does affect the size of the read() in walsender, and I'm sure
there's some overhead in setting the ps display and the other small
stuff we do once per message. But you're probably right that we could
easily make MAX_SEND_SIZE much smaller with no noticeable effect on
performance, while making walsender more responsive to signals. I'll
decrease it to, say, 512 kB.
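
For reference, the constant in question; the 512 kB figure is the value
mentioned above, and the exact definition in the committed code may differ:

/* How much WAL XLogSend() pushes out per round before returning to the
 * main loop.  Half a segment is 8 MB with the default 16 MB segments. */
#define MAX_SEND_SIZE   (XLOG_SEG_SIZE / 2)

/* Proposed smaller round, to keep walsender responsive to signals:
 * #define MAX_SEND_SIZE   (512 * 1024)
 */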

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jim Nasby (#17)
Re: Keepalive for max_standby_delay

On 17/05/10 12:36, Jim Nasby wrote:

On May 15, 2010, at 12:05 PM, Heikki Linnakangas wrote:

What exactly is the user trying to monitor? If it's "how far behind is
the standby", the difference between pg_current_xlog_insert_location()
in the master and pg_last_xlog_replay_location() in the standby seems
more robust and well-defined to me. It's a measure of XLOG location (ie.
bytes) instead of time, but time is a complicated concept.

I can tell you that end users *will* want a time-based indication of how far behind we are. DBAs will understand "we're this many transactions behind", but managers and end users won't. Unless it's unreasonable to provide that info, we should do so.

No doubt about that, the problem is that it's hard to provide a reliable
time-based indication.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#18)
Re: Keepalive for max_standby_delay

On Tue, 2010-05-18 at 17:06 -0400, Heikki Linnakangas wrote:

On 17/05/10 04:40, Simon Riggs wrote:

On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote:

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments

8MB at a time still seems like a large batch to me.

libpq is going to send it in smaller chunks anyway, so I don't see the
importance of trying to keep the batch too large. It just introduces
delay into the sending process. We should be sending chunks that match
libpq better.

More to the point the logic will fail if XLOG_BLCKSZ > PQ_BUFFER_SIZE
because it will send partial pages.

I don't see a failure. We rely on not splitting WAL records across
messages, but we're talking about libpq-level CopyData messages, not TCP
messages.

OK

Having MAX_SEND_SIZE > PQ_BUFFER_SIZE is pointless, as libpq currently
stands.

Well, it does affect the size of the read() in walsender, and I'm sure
there's some overhead in setting the ps display and the other small
stuff we do once per message. But you're probably right that we could
easily make MAX_SEND_SIZE much smaller with no noticeable effect on
performance, while making walsender more responsive to signals. I'll
decrease it to, say, 512 kB.

I'm pretty certain we don't need to set the ps display once per message.
ps doesn't need an update 5 times per second on average.

There's no reason that the buffer size we use for XLogRead() should be
the same as the send buffer, if you're worried about that. My point is
that pq_putmessage contains internal flushes so at the libpq level you
gain nothing by big batches. The read() will be buffered anyway with
readahead, so I'm not sure what the issue is. We'll have to do this for sync
rep anyway, so what's the big deal? Just do it now, once. Do we really
want 9.1 code to differ here?

--
Simon Riggs www.2ndQuadrant.com

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#19)
#22Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#20)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#22)
#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#24)
#26Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#24)
#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#12)
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#26)
#29Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#28)
#30Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#29)
#31Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#30)
#32Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#30)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#32)
#34Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#31)
#35Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#30)
#37Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#36)
#38Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#36)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#37)
#40Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#39)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#40)
#42Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#36)
#43Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#36)
#44Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#43)
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#36)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#42)
#47Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#45)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#47)
#49Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#45)
#50Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#36)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#49)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#50)
#53Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#51)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#45)
#55Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#46)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#55)
#57Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#56)
#58Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#52)
#59Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#50)
#60Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#59)
#61Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#48)
#62Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#60)
#63Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#62)
#64Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#52)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#61)
#66Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#56)
#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#66)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#45)
#69Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#67)
#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#69)
#71Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#70)
#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#43)
#73Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#58)
#74Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#72)
#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#74)
#76Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#75)
#77Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#74)
#78Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#77)
#79Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#78)
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#79)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#80)
#82Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#81)
#83Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#82)
#84Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Robert Haas (#83)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Ron Mayer (#84)
#86Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#82)
#87Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#86)
#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#86)
#89Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#88)
#90Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#88)
#91Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#90)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#91)
#93Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#89)
#94Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#93)
#95Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#90)
#96Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#95)
#97Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#96)
#98Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#97)
#99Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#98)
#100Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#99)
#101Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#97)
#102Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#101)
#103Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#102)
#104Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#103)