Question about StartLogicalReplication() error path

Started by Jeff Davisover 4 years ago20 messages
#1Jeff Davis
pgsql@j-davis.com

StartLogicalReplication() calls CreateDecodingContext(), which says:

else if (start_lsn < slot->data.confirmed_flush)
{
/*
* It might seem like we should error out in this case, but it's
* pretty common for a client to acknowledge a LSN it doesn't
have to
* do anything for, and thus didn't store persistently, because the
* xlog records didn't result in anything relevant for logical
* decoding. Clients have to be able to do that to support
synchronous
* replication.
*/
...
start_lsn = slot->data.confirmed_flush;
}

But what about LSNs that are way in the past? Physical replication will
throw an error in that case (e.g. "requested WAL segment %s has already
been removed"), but StartLogicalReplication() ends up just starting
from confirmed_flush, which doesn't seem right.

I'm not sure I understand the comment overall. Why would the client
request something that it has already acknowledged, and why would the
server override that and just advance to the confirmed_lsn?

Regards,
Jeff Davis

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Jeff Davis (#1)
Re: Question about StartLogicalReplication() error path

On Fri, Jun 11, 2021 at 7:38 AM Jeff Davis <pgsql@j-davis.com> wrote:

StartLogicalReplication() calls CreateDecodingContext(), which says:

else if (start_lsn < slot->data.confirmed_flush)
{
/*
* It might seem like we should error out in this case, but it's
* pretty common for a client to acknowledge a LSN it doesn't
have to
* do anything for, and thus didn't store persistently, because the
* xlog records didn't result in anything relevant for logical
* decoding. Clients have to be able to do that to support
synchronous
* replication.
*/
...
start_lsn = slot->data.confirmed_flush;
}

..
..

I'm not sure I understand the comment overall. Why would the client
request something that it has already acknowledged,

Because sometimes clients don't have to do anything for xlog records.
One example is WAL for DDL where logical decoding didn't produce
anything for the client but later with keepalive we send the LSN of
WAL where DDL has finished and the client just responds with the
position sent by the server as it doesn't have any other pending
transactions.

and why would the
server override that and just advance to the confirmed_lsn?

I think because there is no need to process the WAL that has been
confirmed by the client. Do you see any problems with this scheme?

--
With Regards,
Amit Kapila.

#3Jeff Davis
pgsql@j-davis.com
In reply to: Amit Kapila (#2)
Re: Question about StartLogicalReplication() error path

On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:

Because sometimes clients don't have to do anything for xlog records.
One example is WAL for DDL where logical decoding didn't produce
anything for the client but later with keepalive we send the LSN of
WAL where DDL has finished and the client just responds with the
position sent by the server as it doesn't have any other pending
transactions.

If I understand correctly, in this situation it avoids the cost of a
write on the client just to update its stored LSN progress value when
there's no real data to be written. In that case the client would need
to rely on the server's confirmed_flush_lsn instead of its own stored
LSN progress value.

That's a reasonable thing for the *client* to do explicitly, e.g. by
just reading the slot's confirmed_flush_lsn and comparing to its own
stored lsn. But I don't think it's reasonable for the server to just
skip over data requested by the client because it thinks it knows best.

I think because there is no need to process the WAL that has been
confirmed by the client. Do you see any problems with this scheme?

Several:

* Replication setups are complex, and it can be easy to misconfigure
something or have a bug in some control code. An error is valuable to
detect the problem closer to the source.

* There are plausible configurations where things could go badly wrong.
For instance, if you are storing the decoded data in another postgres
server with syncrhonous_commit=off, and acknowledging LSNs before they
are durable. A crash of the destination system would be consistent, but
it would be missing some data earlier than the confirmed_flush_lsn. The
client would then request the data starting at its stored lsn progress
value, but the server would skip ahead to the confirmed_flush_lsn;
silently missing data.

* It's contradicted by the docs: "Instructs server to start streaming
WAL for logical replication, starting at WAL location XXX/XXX."

* The comment acknowledges that a user might expect an error in that
case; but doesn't really address why the user would expect an error,
and why it's OK to violate that expectation.

Regards,
Jeff Davis

#4Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#3)
Re: Question about StartLogicalReplication() error path

On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis <pgsql@j-davis.com> wrote:

* The comment acknowledges that a user might expect an error in that
case; but doesn't really address why the user would expect an error,
and why it's OK to violate that expectation.

This code was written by Andres, so he'd be the best person to comment
on it, but it seems to me that the comment does explain this, and that
it's basically the same explanation as what Amit said. If the client
doesn't have to do anything for a certain range of WAL and just
acknowledges it, it would under your proposal have to also durably
record that it had chosen to do nothing, which might cause extra
fsyncs, potentially lots of extra fsyncs if this happens frequently
e.g. because most tables are filtered out and the replicated ones are
only modified occasionally. I'm not sure that it would be a good
trade-off to have a tighter sanity check at the expense of adding that
overhead.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#4)
Re: Question about StartLogicalReplication() error path

Hi,

On 2021-06-11 13:15:11 -0400, Robert Haas wrote:

On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis <pgsql@j-davis.com> wrote:

* The comment acknowledges that a user might expect an error in that
case; but doesn't really address why the user would expect an error,
and why it's OK to violate that expectation.

This code was written by Andres, so he'd be the best person to comment
on it, but it seems to me that the comment does explain this, and that
it's basically the same explanation as what Amit said. If the client
doesn't have to do anything for a certain range of WAL and just
acknowledges it, it would under your proposal have to also durably
record that it had chosen to do nothing, which might cause extra
fsyncs, potentially lots of extra fsyncs if this happens frequently
e.g. because most tables are filtered out and the replicated ones are
only modified occasionally.

Yes, that's the motivation.

I'm not sure that it would be a good trade-off to have a tighter
sanity check at the expense of adding that overhead.

Especially because it very well might break existing working setups...

Greetings,

Andres Freund

#6Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#4)
Re: Question about StartLogicalReplication() error path

On Fri, 2021-06-11 at 13:15 -0400, Robert Haas wrote:

on it, but it seems to me that the comment does explain this, and
that
it's basically the same explanation as what Amit said.

It only addresses the "pro" side of the behavior, not the "con". It's a
bit like saying "Given that we are in the U.S., it might seem like we
should be driving on the right side of the road, but that side has
traffic and we are in a hurry."

Why might it seem that we should error out? If we don't error out, what
bad things might happen? How do these "con"s weigh against the "pro"s?

I'm not sure that it would be a good
trade-off to have a tighter sanity check at the expense of adding
that
overhead.

It doesn't add any overhead.

All the client would have to do is "SELECT confirmed_flush_lsn FROM
pg_replication_slots WHERE slot_name='myslot'", and compare it to the
stored value. If the stored value is earlier than the
confirmed_flush_lsn, the *client* can decide to start replication at
the confirmed_flush_lsn. That makes sense because the client knows more
about its behavior and configuration, and whether that's a safe choice
or not.

The only difference is whether the server is safe-by-default with
intuitive semantics that match the documentation, or unsafe-by-default
with unexpected semantics that don't match the documentation.

Regards,
Jeff Davis

#7Jeff Davis
pgsql@j-davis.com
In reply to: Andres Freund (#5)
Re: Question about StartLogicalReplication() error path

On Fri, 2021-06-11 at 10:40 -0700, Andres Freund wrote:

Especially because it very well might break existing working
setups...

Please address my concerns[1]/messages/by-id/e22a4606333ce1032e29fe2fb1aa9036e6f0ca98.camel@j-davis.com.

Regards,
Jeff Davis

[1]: /messages/by-id/e22a4606333ce1032e29fe2fb1aa9036e6f0ca98.camel@j-davis.com
/messages/by-id/e22a4606333ce1032e29fe2fb1aa9036e6f0ca98.camel@j-davis.com

#8Andres Freund
andres@anarazel.de
In reply to: Jeff Davis (#6)
Re: Question about StartLogicalReplication() error path

On 2021-06-11 11:49:19 -0700, Jeff Davis wrote:

All the client would have to do is "SELECT confirmed_flush_lsn FROM
pg_replication_slots WHERE slot_name='myslot'", and compare it to the
stored value.

That doesn't work as easily in older versions because there was no SQL
support in replication connections until PG 10...

#9Jeff Davis
pgsql@j-davis.com
In reply to: Andres Freund (#8)
Re: Question about StartLogicalReplication() error path

On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote:

That doesn't work as easily in older versions because there was no
SQL
support in replication connections until PG 10...

9.6 will be EOL this year. I don't really see why such old versions are
relevant to this discussion.

Regards,
Jeff Davis

#10Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#6)
Re: Question about StartLogicalReplication() error path

On Fri, Jun 11, 2021 at 2:49 PM Jeff Davis <pgsql@j-davis.com> wrote:

It doesn't add any overhead.

All the client would have to do is "SELECT confirmed_flush_lsn FROM
pg_replication_slots WHERE slot_name='myslot'", and compare it to the
stored value. If the stored value is earlier than the
confirmed_flush_lsn, the *client* can decide to start replication at
the confirmed_flush_lsn. That makes sense because the client knows more
about its behavior and configuration, and whether that's a safe choice
or not.

True, but it doesn't seem very nice to forcethe client depend on SQL
when that wouldn't otherwise be needed. The SQL is a lot more likely
to fail than a replication command, for example due to some
permissions issue. So I think if we want to make this an optional
behavior, it would be better to add a flag to the START_REPLICATION
flag to say whether it's OK for the server to fast-forward like this.

You seem to see this as some kind of major problem and I guess I don't
agree. I think it's pretty clear what the motivation was for the
current behavior, because I believe it's well-explained by the comment
and the three people who have tried to answer your question. I also
think it's pretty clear why somebody might find it surprising: someone
might think that fast-forwarding is harmful and risky rather than a
useful convenience. As evidence for the fact that someone might think
that, I offer the fact that you seem to think exactly that thing. I
also think that there's pretty good evidence that the behavior as it
exists is not really so bad. As far as I know, and I certainly might
have missed something, you're the first one to complain about behavior
that we've had for quite a long time now, and you seem to be saying
that it might cause problems for somebody, not that you know it
actually did. So, I don't know, I'm not opposed to talking about
potential improvements here, but to the extent that you're suggesting
this is unreasonable behavior, I think that's too harsh.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#10)
Re: Question about StartLogicalReplication() error path

Hi,

On 2021-06-11 16:05:10 -0400, Robert Haas wrote:

You seem to see this as some kind of major problem and I guess I don't
agree. I think it's pretty clear what the motivation was for the
current behavior, because I believe it's well-explained by the comment
and the three people who have tried to answer your question. I also
think it's pretty clear why somebody might find it surprising: someone
might think that fast-forwarding is harmful and risky rather than a
useful convenience. As evidence for the fact that someone might think
that, I offer the fact that you seem to think exactly that thing. I
also think that there's pretty good evidence that the behavior as it
exists is not really so bad. As far as I know, and I certainly might
have missed something, you're the first one to complain about behavior
that we've had for quite a long time now, and you seem to be saying
that it might cause problems for somebody, not that you know it
actually did. So, I don't know, I'm not opposed to talking about
potential improvements here, but to the extent that you're suggesting
this is unreasonable behavior, I think that's too harsh.

Yea. I think it'd be a different matter if streaming logical decoding
had been added this cycle and we'd started out with supporting queries
over replication connection - but it's been long enough that it's likely
that people rely on the current behaviour, and I don't see the gain in
reliability outweigh the compat issues.

Your argument that one can just check kinda goes both ways - you can do
that with the current behaviour too...

Greetings,

Andres Freund

#12Andres Freund
andres@anarazel.de
In reply to: Jeff Davis (#9)
Re: Question about StartLogicalReplication() error path

On 2021-06-11 12:07:24 -0700, Jeff Davis wrote:

On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote:

That doesn't work as easily in older versions because there was no
SQL
support in replication connections until PG 10...

9.6 will be EOL this year. I don't really see why such old versions are
relevant to this discussion.

It's relevant to understand how we ended up here.

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Jeff Davis (#3)
Re: Question about StartLogicalReplication() error path

On Fri, Jun 11, 2021 at 11:52 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:

I think because there is no need to process the WAL that has been
confirmed by the client. Do you see any problems with this scheme?

Several:

* Replication setups are complex, and it can be easy to misconfigure
something or have a bug in some control code. An error is valuable to
detect the problem closer to the source.

* There are plausible configurations where things could go badly wrong.
For instance, if you are storing the decoded data in another postgres
server with syncrhonous_commit=off, and acknowledging LSNs before they
are durable. A crash of the destination system would be consistent, but
it would be missing some data earlier than the confirmed_flush_lsn. The
client would then request the data starting at its stored lsn progress
value, but the server would skip ahead to the confirmed_flush_lsn;
silently missing data.

AFAIU, currently, in such a case, the subscriber (client) won't
advance the flush location (confirmed_flush_lsn). So, we won't lose
any data.

--
With Regards,
Amit Kapila.

#14Jeff Davis
pgsql@j-davis.com
In reply to: Amit Kapila (#13)
Re: Question about StartLogicalReplication() error path

On Sat, 2021-06-12 at 16:17 +0530, Amit Kapila wrote:

AFAIU, currently, in such a case, the subscriber (client) won't
advance the flush location (confirmed_flush_lsn). So, we won't lose
any data.

I think you are talking about the official Logical Replication
specifically, rather than an arbitrary client that's using the logical
replication protocol based on the protocol docs.

It seems that there's not much agreement in a behavior change here. I
suggest one or more of the following:

1. change the logical rep protocol docs to match the current behavior
a. also briefly explain in the docs why it's different from
physical replication (which does always start at the provided LSN as
far as I can tell)

2. Change the comment to add something like "Starting at a different
LSN than requested might not catch certain kinds of client errors.
Clients should be careful to check confirmed_flush_lsn if starting at
the requested LSN is required."

3. upgrade DEBUG1 message to a WARNING

Can I get agreement on any of the above suggestions?

Regards,
Jeff Davis

#15Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#14)
Re: Question about StartLogicalReplication() error path

On Mon, Jun 14, 2021 at 12:50 PM Jeff Davis <pgsql@j-davis.com> wrote:

It seems that there's not much agreement in a behavior change here. I
suggest one or more of the following:

1. change the logical rep protocol docs to match the current behavior
a. also briefly explain in the docs why it's different from
physical replication (which does always start at the provided LSN as
far as I can tell)

2. Change the comment to add something like "Starting at a different
LSN than requested might not catch certain kinds of client errors.
Clients should be careful to check confirmed_flush_lsn if starting at
the requested LSN is required."

3. upgrade DEBUG1 message to a WARNING

Can I get agreement on any of the above suggestions?

I'm happy to hear other opinions, but I think I would be inclined to
vote in favor of #1 and/or #2 but against #3.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#15)
Re: Question about StartLogicalReplication() error path

On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:

I'm happy to hear other opinions, but I think I would be inclined to
vote in favor of #1 and/or #2 but against #3.

What about upgrading it to, say, LOG? It seems like it would happen
pretty infrequently, and in the event something strange happens, might
rule out some possibilities.

Regards,
Jeff Davis

#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Jeff Davis (#16)
Re: Question about StartLogicalReplication() error path

At Mon, 14 Jun 2021 14:51:35 -0700, Jeff Davis <pgsql@j-davis.com> wrote in

On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:

I'm happy to hear other opinions, but I think I would be inclined to
vote in favor of #1 and/or #2 but against #3.

What about upgrading it to, say, LOG? It seems like it would happen
pretty infrequently, and in the event something strange happens, might
rule out some possibilities.

I don't think the message is neded, but I don't oppose it as far as
the level is LOG and the messages were changed as something like this:

-		elog(DEBUG1, "cannot stream from %X/%X, minimum is %X/%X, forwarding",
+		elog(LOG, "%X/%X has been already streamed, forwarding to %X/%X",

FWIW, I most prefer #1. I see #2 as optional. and see #3 as the above.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Jeff Davis (#16)
Re: Question about StartLogicalReplication() error path

On Tue, Jun 15, 2021 at 3:21 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:

I'm happy to hear other opinions, but I think I would be inclined to
vote in favor of #1 and/or #2 but against #3.

What about upgrading it to, say, LOG? It seems like it would happen
pretty infrequently, and in the event something strange happens, might
rule out some possibilities.

I don't see any problem with changing it to LOG if that helps
especially because it won't happen too often.

--
With Regards,
Amit Kapila.

#19Jeff Davis
pgsql@j-davis.com
In reply to: Kyotaro Horiguchi (#17)
1 attachment(s)
Re: Question about StartLogicalReplication() error path

On Tue, 2021-06-15 at 15:19 +0900, Kyotaro Horiguchi wrote:

I don't think the message is neded, but I don't oppose it as far as
the level is LOG and the messages were changed as something like
this:

-               elog(DEBUG1, "cannot stream from %X/%X, minimum is
%X/%X, forwarding",
+               elog(LOG, "%X/%X has been already streamed,
forwarding to %X/%X",

FWIW, I most prefer #1. I see #2 as optional. and see #3 as the
above.

Attached.

Regards,
Jeff Davis

Attachments:

logical-start-doc-fix.difftext/x-patch; charset=UTF-8; name=logical-start-doc-fix.diffDownload
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index bc2a2feb0b9..4543d772781 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2396,10 +2396,22 @@ The commands accepted in replication mode are:
     <term><literal>START_REPLICATION</literal> <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> <literal>LOGICAL</literal> <replaceable class="parameter">XXX/XXX</replaceable> [ ( <replaceable>option_name</replaceable> [ <replaceable>option_value</replaceable> ] [, ...] ) ]</term>
     <listitem>
      <para>
-      Instructs server to start streaming WAL for logical replication, starting
-      at WAL location <replaceable class="parameter">XXX/XXX</replaceable>. The server can
-      reply with an error, for example if the requested section of WAL has already
-      been recycled. On success, server responds with a CopyBothResponse
+      Instructs server to start streaming WAL for logical replication,
+      starting at either WAL location <replaceable
+      class="parameter">XXX/XXX</replaceable> or the slot's
+      <literal>confirmed_flush_lsn</literal> (see <xref
+      linkend="view-pg-replication-slots"/>), whichever is greater. This
+      behavior makes it easier for clients to avoid updating their local LSN
+      status when there is no data to process. However, starting at a
+      different LSN than requested might not catch certain kinds of client
+      errors; so the client may wish to check that
+      <literal>confirmed_flush_lsn</literal> matches its expectations before
+      issuing <literal>START_REPLICATION</literal>.
+     </para>
+
+     <para>
+      The server can reply with an error, for example if the
+      slot does not exist. On success, server responds with a CopyBothResponse
       message, and then starts to stream WAL to the frontend.
      </para>
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index ffc6160e9f3..b3090979115 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -518,8 +518,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 		 * xlog records didn't result in anything relevant for logical
 		 * decoding. Clients have to be able to do that to support synchronous
 		 * replication.
+		 *
+		 * Starting at a different LSN than requested might not catch certain
+		 * kinds of client errors; so the client may wish to check that
+		 * confirmed_flush_lsn matches its expectations.
 		 */
-		elog(DEBUG1, "cannot stream from %X/%X, minimum is %X/%X, forwarding",
+		elog(LOG, "%X/%X has been already streamed, forwarding to %X/%X"
 			 LSN_FORMAT_ARGS(start_lsn),
 			 LSN_FORMAT_ARGS(slot->data.confirmed_flush));
 
#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Jeff Davis (#19)
Re: Question about StartLogicalReplication() error path

On Thu, Jun 17, 2021 at 4:25 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2021-06-15 at 15:19 +0900, Kyotaro Horiguchi wrote:

I don't think the message is neded, but I don't oppose it as far as
the level is LOG and the messages were changed as something like
this:

-               elog(DEBUG1, "cannot stream from %X/%X, minimum is
%X/%X, forwarding",
+               elog(LOG, "%X/%X has been already streamed,
forwarding to %X/%X",

FWIW, I most prefer #1. I see #2 as optional. and see #3 as the
above.

Attached.

LGTM.

--
With Regards,
Amit Kapila.