pgsql: Send new protocol keepalive messages to standby servers.

Started by Simon Riggs over 14 years ago | 33 messages | hackers
#1Simon Riggs
simon@2ndQuadrant.com

Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/64233902d22ba42846397cb7551894217522fad4

Modified Files
--------------
doc/src/sgml/protocol.sgml | 48 +++++++++++++++++++++
src/backend/access/transam/xlog.c | 43 +++++++++++++++++++
src/backend/replication/walreceiver.c | 47 ++++++++++++++++++++-
src/backend/replication/walreceiverfuncs.c | 63 ++++++++++++++++++++++++++++
src/backend/replication/walsender.c | 42 ++++++++++++-------
src/include/access/xlog.h | 1 +
src/include/replication/walprotocol.h | 22 ++++++++++
src/include/replication/walreceiver.h | 8 ++++
8 files changed, 258 insertions(+), 16 deletions(-)

#2Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#1)
Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.

pq_flush_if_writable() needs to be called just after WalSndKeepalive().
Otherwise, the keepalive packet is not sent for a while.

+static void
+ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)

walEnd is not used in ProcessWalSndrMessage() at all. Can't we remove it?
If yes, walEnd field in WalSndrMessage is also not used anywhere, so ISTM
we can remove it.

+	elog(DEBUG2, "sendtime %s receipttime %s replication apply delay %d transfer latency %d",
+					timestamptz_to_str(sendTime),
+					timestamptz_to_str(lastMsgReceiptTime),
+					GetReplicationApplyDelay(),
+					GetReplicationTransferLatency());

The units of replication apply delay and transfer latency should be included
in the log message.

GetReplicationApplyDelay() and GetReplicationTransferLatency() are called
whenever the standby receives the message from the master, which might
degrade the performance of replication a bit. So should we skip the above elog
unless log_min_messages is set to DEBUG2 or a more verbose level?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
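
The last point in the message above (avoid computing values that are only used by a debug-level log line) can be sketched roughly as follows. This is a minimal standalone illustration, not the walreceiver code: the `sk_` names, the level values, and the placeholder delay are all assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical log levels for this sketch: a lower value is more verbose. */
enum sk_level { SK_DEBUG2 = 1, SK_DEBUG1 = 2, SK_LOG = 3 };

/* Stand-in for the configured logging threshold. */
static enum sk_level sk_min_level = SK_LOG;

/* Counts how often the costly helper actually runs. */
static int sk_delay_calls = 0;

/* Stand-in for a costly helper such as GetReplicationApplyDelay(). */
static int sk_apply_delay_ms(void)
{
    sk_delay_calls++;
    return 42;                  /* placeholder value, not a real measurement */
}

/*
 * Emit the debug line, with units, only when DEBUG2 output is enabled.
 * The costly helper is not called otherwise. Returns 1 if a line was logged.
 */
static int sk_report_delay(void)
{
    if (sk_min_level > SK_DEBUG2)
        return 0;               /* threshold is less verbose than DEBUG2 */

    fprintf(stderr, "replication apply delay %d ms\n", sk_apply_delay_ms());
    return 1;
}
```

With the threshold at `SK_LOG` the helper is never invoked; lowering it to `SK_DEBUG2` makes each report pay for exactly one call.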

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#2)
Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Wed, Jan 11, 2012 at 2:05 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.

Thanks for further review.

pq_flush_if_writable() needs to be called just after WalSndKeepalive().
Otherwise, the keepalive packet is not sent for a while.

It will get sent though won't it? Maybe not immediately. I guess we
may as well flush though, since we're not doing anything else - by
definition. Will add.

+static void
+ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)

walEnd is not used in ProcessWalSndrMessage() at all. Can't we remove it?
If yes, walEnd field in WalSndrMessage is also not used anywhere, so ISTM
we can remove it.

It's there to allow extension of the message processing to be more
complex than it currently is. Changing the protocol is much harder
than changing a function call.

I'd like to keep it since it doesn't have any negative effects.

+       elog(DEBUG2, "sendtime %s receipttime %s replication apply delay %d transfer latency %d",
+                                       timestamptz_to_str(sendTime),
+                                       timestamptz_to_str(lastMsgReceiptTime),
+                                       GetReplicationApplyDelay(),
+                                       GetReplicationTransferLatency());

The units of replication apply delay and transfer latency should be included
in the log message.

OK, will do.

GetReplicationApplyDelay() and GetReplicationTransferLatency() are called
whenever the standby receives the message from the master, which might
degrade the performance of replication a bit. So should we skip the above elog
unless log_min_messages is set to DEBUG2 or a more verbose level?

OK, will put in a specific test for you.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
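
The flush question discussed above comes down to the difference between queuing a message in an output buffer and pushing that buffer to the socket. A minimal sketch under stated assumptions: the `sk_conn` structure, the function names, and the one-byte placeholder message are hypothetical, and the "socket" is a counter that always accepts data.

```c
#include <stddef.h>
#include <string.h>

#define SK_BUF_SIZE 64

/* Hypothetical connection with an output buffer. */
struct sk_conn {
    char   pending[SK_BUF_SIZE];   /* bytes queued but not yet on the wire */
    size_t pending_len;
    size_t sent_total;             /* bytes handed to the pretend socket */
};

/* Queue a message; nothing reaches the peer yet. Returns -1 if it does not fit. */
static int sk_queue(struct sk_conn *c, const void *msg, size_t len)
{
    if (len > SK_BUF_SIZE - c->pending_len)
        return -1;
    memcpy(c->pending + c->pending_len, msg, len);
    c->pending_len += len;
    return 0;
}

/* Push everything queued so far; the pretend socket always accepts it. */
static void sk_flush(struct sk_conn *c)
{
    c->sent_total += c->pending_len;
    c->pending_len = 0;
}

/* Queue a keepalive and flush right away so it does not sit in the buffer. */
static int sk_send_keepalive(struct sk_conn *c)
{
    static const char keepalive = 'k';   /* placeholder one-byte message */

    if (sk_queue(c, &keepalive, 1) != 0)
        return -1;
    sk_flush(c);
    return 0;
}
```

Without the `sk_flush()` call the keepalive would still go out eventually, but only when some later operation flushed the buffer, which is the delay the review comment points at.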

#4Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#3)
Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, Jan 12, 2012 at 12:20 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

+static void
+ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)

walEnd is not used in ProcessWalSndrMessage() at all. Can't we remove it?
If yes, walEnd field in WalSndrMessage is also not used anywhere, so ISTM
we can remove it.

It's there to allow extension of the message processing to be more
complex than it currently is. Changing the protocol is much harder
than changing a function call.

I'd like to keep it since it doesn't have any negative effects.

OK. Another problem about walEnd is that WalDataMessageHeader.walEnd is not
the same kind of location as WalSndrMessage.walEnd. The former indicates the
location up to which WAL has already been flushed (maybe not sent yet), i.e.,
the "send request location". OTOH, the latter indicates the location up to
which WAL has already been sent. Is this inconsistency intentional?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#4)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, Jan 12, 2012 at 3:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jan 12, 2012 at 12:20 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

+static void
+ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)

walEnd is not used in ProcessWalSndrMessage() at all. Can't we remove it?
If yes, walEnd field in WalSndrMessage is also not used anywhere, so ISTM
we can remove it.

It's there to allow extension of the message processing to be more
complex than it currently is. Changing the protocol is much harder
than changing a function call.

I'd like to keep it since it doesn't have any negative effects.

OK. Another problem about walEnd is that WalDataMessageHeader.walEnd is not
the same kind of location as WalSndrMessage.walEnd. The former indicates the
location up to which WAL has already been flushed (maybe not sent yet), i.e.,
the "send request location". OTOH, the latter indicates the location up to
which WAL has already been sent. Is this inconsistency intentional?

WalSndrMessage isn't set to anything, it's just a definition.

PrimaryKeepaliveMessage is a message type that uses WalSndrMessage.
That message type is only sent when the WalSndr is quiet, so what is
the difference, in that case?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#5)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, Jan 12, 2012 at 5:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

PrimaryKeepaliveMessage is a message type that uses WalSndrMessage.
That message type is only sent when the WalSndr is quiet, so what is
the difference, in that case?

Oh, you are right. There is no difference.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#6)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, Jan 12, 2012 at 10:37 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jan 12, 2012 at 5:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

PrimaryKeepaliveMessage is a message type that uses WalSndrMessage.
That message type is only sent when the WalSndr is quiet, so what is
the difference, in that case?

Oh, you are right. There is no difference.

Here are the changes we discussed. Further comments before commit?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

fujii_review_keepalives.v1.patch (text/x-patch; charset=US-ASCII), +6 -1
#8Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#7)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Fri, Jan 13, 2012 at 4:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Thu, Jan 12, 2012 at 10:37 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jan 12, 2012 at 5:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

PrimaryKeepaliveMessage is a message type that uses WalSndrMessage.
That message type is only sent when the WalSndr is quiet, so what is
the difference, in that case?

Oh, you are right. There is no difference.

Here are the changes we discussed. Further comments before commit?

Can you add the test to avoid useless calls of GetReplicationApplyDelay()
and GetReplicationTransferLatency()?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#9Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#1)
Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.

Is there a plan to implement such external functions before the 9.2 release?
If not, the keepalive protocol seems to be almost useless because there is
no use of it for a user and the increase in the number of packets might
increase the replication performance overhead slightly. No?

Regards,

--
Fujii Masao

#10Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#9)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Wed, May 23, 2012 at 2:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.

Is there a plan to implement such external functions before the 9.2 release?
If not, the keepalive protocol seems to be almost useless because there is
no use of it for a user and the increase in the number of packets might
increase the replication performance overhead slightly. No?

Good point. IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 23, 2012 at 2:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Is there a plan to implement such external functions before the 9.2 release?
If not, the keepalive protocol seems to be almost useless because there is
no use of it for a user and the increase in the number of packets might
increase the replication performance overhead slightly. No?

Good point. IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

I see no reason to rip the code out if we have plans to make use of it
in the near future. I am also not for going back into development mode
on 9.2, which is what adding new functions now would amount to. What's
wrong with leaving well enough alone? It's not like there is no
unfinished work anywhere else in Postgres ...

regards, tom lane

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, May 24, 2012 at 2:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 23, 2012 at 2:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Is there a plan to implement such external functions before the 9.2 release?
If not, the keepalive protocol seems to be almost useless because there is
no use of it for a user and the increase in the number of packets might
increase the replication performance overhead slightly. No?

Good point.  IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

I see no reason to rip the code out if we have plans to make use of it
in the near future.  I am also not for going back into development mode
on 9.2, which is what adding new functions now would amount to.  What's
wrong with leaving well enough alone?  It's not like there is no
unfinished work anywhere else in Postgres ...

So, extra TCP overhead for no user-visible benefit doesn't bother you?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#12)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On 24 May 2012 21:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 24, 2012 at 2:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 23, 2012 at 2:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Is there a plan to implement such external functions before the 9.2 release?
If not, the keepalive protocol seems to be almost useless because there is
no use of it for a user and the increase in the number of packets might
increase the replication performance overhead slightly. No?

Good point.  IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

I see no reason to rip the code out if we have plans to make use of it
in the near future.  I am also not for going back into development mode
on 9.2, which is what adding new functions now would amount to.  What's
wrong with leaving well enough alone?  It's not like there is no
unfinished work anywhere else in Postgres ...

So, extra TCP overhead for no user-visible benefit doesn't bother you?

Other changes occurred such that WAL messages don't get sent at all in
many cases on an idle server. The keep alive replaces that, so is of
value in itself.

The new functions would have made most sense if file based keepalives
had been approved. But that didn't make it in, and hence this is incomplete.

Adding functions is the work of a few hours, but not worth starting
that if you intend to block it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#14Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#13)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Wed, May 30, 2012 at 4:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 24 May 2012 21:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 24, 2012 at 2:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 23, 2012 at 2:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Is there a plan to implement such external functions before the 9.2 release?
If not, the keepalive protocol seems to be almost useless because there is
no use of it for a user and the increase in the number of packets might
increase the replication performance overhead slightly. No?

Good point.  IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

I see no reason to rip the code out if we have plans to make use of it
in the near future.  I am also not for going back into development mode
on 9.2, which is what adding new functions now would amount to.  What's
wrong with leaving well enough alone?  It's not like there is no
unfinished work anywhere else in Postgres ...

So, extra TCP overhead for no user-visible benefit doesn't bother you?

Other changes occurred such that WAL messages don't get sent at all in
many cases on an idle server. The keep alive replaces that, so is of
value in itself.

The new functions would have made most sense if file based keepalives
had been approved. But that didn't make it in, and hence this is incomplete.

Even if we don't have file based keepalives, the new function enables us
to calculate the network latency, so it seems worth exposing the function.

OTOH, I wonder whether we really need to send keepalive messages
periodically to calculate a network latency. ISTM we don't unless a network
latency varies from situation to situation so frequently and we'd like to
monitor that in almost real time.

Regards,

--
Fujii Masao

#15Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#14)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Wed, May 30, 2012 at 12:17 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

OTOH, I wonder whether we really need to send keepalive messages
periodically to calculate a network latency. ISTM we don't unless a network
latency varies from situation to situation so frequently and we'd like to
monitor that in almost real time.

I didn't look at this patch too carefully when it was committed.
Looking at it more carefully now, it looks to me like this patch does
two different things. One is to add a function called
GetReplicationApplyDelay(), which returns the number of milliseconds
since replay was fully caught up. So if you were last caught up 5
minutes ago and you have replayed 4 minutes and 50 seconds worth of
WAL during that time, this function will return 5 minutes, not 10
seconds. That is not what I would call "apply delay", which I would
define as how far behind you are NOW, not how long it's been since you
weren't behind at all.

The second thing it does is add a function called
GetReplicationTransferLatency(). The return value of this function is
the difference between the slave's clock at the time the last master
keepalive was processed and the master's clock at the time that
keepalive was generated. I think that in practice, unless network
time synchronization is in use, this is mostly going to be computing
the clock skew between the master and the slave. If time
synchronization is in use, then as you say it'll be a very jittery
measure of master-slave network latency, which can be monitored
perfectly well from outside PG.

Now, measuring time skew is potentially a useful thing to do, if we
believe that this will actually give us an accurate measurement of
what the time skew is, because there are a whole series of things that
people want to do which involve subtracting a slave timestamp from a
master timestamp. Tom has persistently rebuffed all such proposals on
the grounds that there might be time skew, so in theory we could make
those things possible by having a way to measure time skew, which this
does. Here's what we do: given a slave timestamp, add the estimated
time skew to find an equivalent master timestamp, and then subtract.
Using a method of this type would allow us to compute a *real* apply
delay. Woohoo! Unfortunately, if time synchronization IS in use,
then the system clocks are probably already synchronized three to six
orders of magnitude more precisely than what this method can measure,
so the effect of using GetReplicationTransferLatency() to adjust slave
timestamps will be to massively reduce the accuracy of such
calculations. However, I've thus far been unable to convince anyone
that this is a bad idea, so maybe this is where we're gonna end up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
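
The two measurements described in the message above can be put into numbers. The following is a simplified model, not the committed functions: the `sk_` names, the millisecond type, and the idea of a per-host "clock offset" from true time are assumptions introduced only for illustration.

```c
#include <stdint.h>

typedef int64_t sk_ms;   /* milliseconds; hypothetical type for this sketch */

/* The value described above: time elapsed since replay was last fully caught up. */
static sk_ms sk_since_caught_up(sk_ms now, sk_ms last_caught_up)
{
    return now - last_caught_up;
}

/*
 * "How far behind you are NOW": elapsed time minus the amount of WAL time
 * already replayed. Assumes WAL was generated continuously in the interval.
 */
static sk_ms sk_behind_now(sk_ms now, sk_ms last_caught_up, sk_ms wal_replayed)
{
    return (now - last_caught_up) - wal_replayed;
}

/*
 * Receipt time on the standby's clock minus send time on the primary's clock.
 * Each clock is modelled as true time plus a fixed offset, so the result is
 * the true latency plus the clock skew between the two hosts.
 */
static sk_ms sk_measured_latency(sk_ms send_true_time, sk_ms true_latency,
                                 sk_ms primary_offset, sk_ms standby_offset)
{
    sk_ms send_stamp = send_true_time + primary_offset;
    sk_ms receipt_stamp = send_true_time + true_latency + standby_offset;

    return receipt_stamp - send_stamp;
}
```

For the example in the message, `sk_since_caught_up(300000, 0)` yields five minutes, while `sk_behind_now(300000, 0, 290000)` yields ten seconds. With a 2 ms link and a standby clock one hour ahead, `sk_measured_latency(1000, 2, 0, 3600000)` is dominated by the skew rather than by the network.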

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

Robert Haas <robertmhaas@gmail.com> writes:

Now, measuring time skew is potentially a useful thing to do, if we
believe that this will actually give us an accurate measurement of
what the time skew is, because there are a whole series of things that
people want to do which involve subtracting a slave timestamp from a
master timestamp. Tom has persistently rebuffed all such proposals on
the grounds that there might be time skew, so in theory we could make
those things possible by having a way to measure time skew, which this
does. Here's what we do: given a slave timestamp, add the estimated
time skew to find an equivalent master timestamp, and then subtract.
Using a method of this type would allow us to compute a *real* apply
delay. Woohoo! Unfortunately, if time synchronization IS in use,
then the system clocks are probably already synchronized three to six
orders of magnitude more precisely than what this method can measure,
so the effect of using GetReplicationTransferLatency() to adjust slave
timestamps will be to massively reduce the accuracy of such
calculations. However, I've thus far been unable to convince anyone
that this is a bad idea, so maybe this is where we're gonna end up.

Hmm ... first question is do we actually care whether the clocks are
synced to the millisecond level, ie what is it you'd do differently
if you know that the master and slave clocks are synced more closely
than you can measure at the protocol level.

But if there is a reason to care, perhaps we could have a setting that
says "we're using NTP, so trust the clocks to be synced"? What I object
to is assuming that without any evidence, or being unable to operate
correctly in an environment where it's not true.

regards, tom lane

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, May 31, 2012 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... first question is do we actually care whether the clocks are
synced to the millisecond level, ie what is it you'd do differently
if you know that the master and slave clocks are synced more closely
than you can measure at the protocol level.

But if there is a reason to care, perhaps we could have a setting that
says "we're using NTP, so trust the clocks to be synced"?  What I object
to is assuming that without any evidence, or being unable to operate
correctly in an environment where it's not true.

In general, we are happy to leave to the operating system - or some
other operating service - those tasks which are best handled in that
way. I don't understand why this should be an exception. If we're
not going to implement a filesystem inside PostgreSQL - which is
actually relatively closely related to our core mission as a data
store - then why the heck do we want to implement time
synchronization? If this were an easy problem I wouldn't care, but
it's not. The solution Simon has implemented here, besides being
vulnerable to network jitter that can't be eliminated without
reimplementing some sort of complex ntp-like protocol inside the
backend - won't work with log shipping, which is why (or part of why?)
Simon proposed keepalive files to allow this information to be passed
through the archive. To me, this is massive over-engineering. I'll
support the keepalive feature if it's the only way to get you to agree
to adding the capabilities we need to be competitive with other
replication solutions - but that's about the only redeeming value it
has IMV.

Now, mind you, I am not saying that we should randomly and arbitrarily
make ourselves vulnerable to clock skew when there is a reasonable
alternative design. For example, you were able to come up with a way
to make max_standby_delay work sensibly without having to compare
master and slave timestamps, and that's good. But in cases where no
such design exists - and a time-based notion of replication delay
seems to be one of those times - I don't see any real merit in
reinventing the wheel, especially since it seems likely that the wheel
is going to be dodecagonal. Aside from network jitter and the need
for archive keepalives, suppose the two machines really do have clocks
that are an hour off from each other. And the master system is really
busy so the slave runs about a minute behind. We detect the time skew
and correct for it, so the replication delay shows up correctly. Life
is good. But then the system administrator notices that there's a
problem and fires up ntpd to fix it. Our keepalive system will now
notice and decide that the "replication transfer latency" is now 0 s
instead of +/- 3600 s. However, we're replaying records from a minute
ago, before the time change, so now for the next minute our
replication delay is either 61 minutes or -59 minutes, depending on
the direction of the skew, and then it goes back to normal. Not the
end of the world, but weird. It's the sort of thing that we probably
won't even try to document, because it'll affect very few people, but
anyone who is affected will have to understand the system pretty
deeply to understand what's gone wrong. IME, users hate that.

On the other hand, if we simply say "PostgreSQL computes the
replication delay by subtracting the time at which the WAL was
generated, as recorded on the master, from the time at which it is
replayed by the slave" then, hey, we still have a wart, but it's
pretty clear what the wart is and how to fix it, and we can easily
document that. Again, if we could get rid of the failure modes and
make this really water-tight, I think I'd be in favor of that, but it
seems to me that we are in the process of expending a lot of energy
and an even larger amount of calendar time to create a system that
will misbehave in numerous subtle ways instead of one straightforward
one. I don't see that as a good trade.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
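
The "61 minutes or -59 minutes" figures in the message above follow from simple arithmetic. The sketch below works in whole minutes and assumes one particular reading of the scenario: the master's clock is the one that was wrong, the record being replayed was stamped before ntpd corrected it, and the skew estimate is defined as standby clock minus master clock. The function name and sign convention are illustrative assumptions.

```c
/*
 * Skew-corrected replication delay, in minutes.
 *   record_stamp  : timestamp written into the WAL record by the master
 *   replay_clock  : standby clock reading when the record is replayed
 *   est_skew      : current estimate of (standby clock - master clock)
 */
static long sk_reported_delay_min(long record_stamp, long replay_clock,
                                  long est_skew)
{
    return replay_clock - record_stamp - est_skew;
}
```

Take a record generated at true time 0 and replayed at true time 1, with a correct standby clock. If the master ran 60 minutes slow, the stamp is -60: with the skew estimate at 60 the result is 1, but once ntpd fixes the master and the estimate drops to 0, the same record reports 61. If the master ran 60 minutes fast, the stamp is 60: the estimate of -60 gives 1, and after the fix the result is -59. Records stamped after the correction report 1 again.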

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

Robert Haas <robertmhaas@gmail.com> writes:

On the other hand, if we simply say "PostgreSQL computes the
replication delay by subtracting the time at which the WAL was
generated, as recorded on the master, from the time at which it is
replayed by the slave" then, hey, we still have a wart, but it's
pretty clear what the wart is and how to fix it, and we can easily
document that. Again, if we could get rid of the failure modes and
make this really water-tight, I think I'd be in favor of that, but it
seems to me that we are in the process of expending a lot of energy
and an even larger amount of calendar time to create a system that
will misbehave in numerous subtle ways instead of one straightforward
one. I don't see that as a good trade.

Well, okay, but let's document "if you use this feature, it's incumbent
on you to make sure the master and slave clocks are synced. We
recommend running NTP." or words to that effect.

regards, tom lane

#19Michael Nolan
htfoot@gmail.com
In reply to: Tom Lane (#18)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On 6/2/12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On the other hand, if we simply say "PostgreSQL computes the
replication delay by subtracting the time at which the WAL was
generated, as recorded on the master, from the time at which it is
replayed by the slave" then, hey, we still have a wart, but it's
pretty clear what the wart is and how to fix it, and we can easily
document that. Again, if we could get rid of the failure modes and
make this really water-tight, I think I'd be in favor of that, but it
seems to me that we are in the process of expending a lot of energy
and an even larger amount of calendar time to create a system that
will misbehave in numerous subtle ways instead of one straightforward
one. I don't see that as a good trade.

Well, okay, but let's document "if you use this feature, it's incumbent
on you to make sure the master and slave clocks are synced. We
recommend running NTP." or words to that effect.

What if the two servers are in different time zones?
--
Mike Nolan

#20Chris Browne
cbbrowne@acm.org
In reply to: Michael Nolan (#19)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Sat, Jun 2, 2012 at 12:01 PM, Michael Nolan <htfoot@gmail.com> wrote:

On 6/2/12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On the other hand, if we simply say "PostgreSQL computes the
replication delay by subtracting the time at which the WAL was
generated, as recorded on the master, from the time at which it is
replayed by the slave" then, hey, we still have a wart, but it's
pretty clear what the wart is and how to fix it, and we can easily
document that.  Again, if we could get rid of the failure modes and
make this really water-tight, I think I'd be in favor of that, but it
seems to me that we are in the process of expending a lot of energy
and an even larger amount of calendar time to create a system that
will misbehave in numerous subtle ways instead of one straightforward
one.  I don't see that as a good trade.

Well, okay, but let's document "if you use this feature, it's incumbent
on you to make sure the master and slave clocks are synced.  We
recommend running NTP." or words to that effect.

What if the two servers are in different time zones?

NTP shouldn't have any problem; it uses UTC underneath. As does
PostgreSQL, underneath.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#10)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#21)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#23)
In reply to: Tom Lane (#24)
#26Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#24)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#21)
#28Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#24)
#29Bruce Momjian
bruce@momjian.us
In reply to: Chris Browne (#20)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#28)
#31Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#30)
#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#24)
#33Magnus Hagander
magnus@hagander.net
In reply to: Simon Riggs (#32)