pg_receivewal starting position
Hello,
I've noticed that pg_receivewal's logic for deciding which LSN to start
streaming at consists of:
- looking up the latest WAL file in our destination folder, and resuming
from there
- if there isn't one, using the current flush location instead.
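That current selection logic could be sketched roughly like this (a simplified illustration only; the 16MB segment size, the `.partial` handling and the function name are my assumptions, not pg_receivewal's actual code):

```python
import os
import re
import tempfile

WAL_SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size (an assumption)

def choose_start_lsn(target_dir, server_flush_lsn):
    """Roughly mimic pg_receivewal's current start-position choice:
    resume from the latest WAL file in the directory, else fall back
    to the server's current flush location."""
    wal_re = re.compile(r'^[0-9A-F]{24}(\.partial)?$')
    names = sorted(f for f in os.listdir(target_dir) if wal_re.match(f))
    if not names:
        return server_flush_lsn
    latest = names[-1]
    segs_per_logid = 0x100000000 // WAL_SEG_SIZE
    segno = int(latest[8:16], 16) * segs_per_logid + int(latest[16:24], 16)
    if not latest.endswith('.partial'):
        segno += 1  # latest segment is complete: start at the next one
    return segno * WAL_SEG_SIZE

# tiny self-check on a scratch directory
demo = tempfile.mkdtemp()
empty_case = choose_start_lsn(demo, 0x1234)   # empty dir: server flush LSN
open(os.path.join(demo, "000000010000000000000001"), "w").close()
resume_case = choose_start_lsn(demo, 0x1234)  # resume after that segment
```

Note the replication slot's restart_lsn plays no part in this decision, which is what the rest of the thread discusses.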
This behaviour surprised me when using it with a replication slot: I was
expecting it to start streaming at the last flushed location from the
replication slot instead. If you consider a backup tool which will take
pg_receivewal's output and transfer it somewhere else, using the replication
slot position would be the easiest way to ensure we don't miss WAL files.
Does that make sense?
I don't know if it should be the default, toggled by a command line flag, or if
we even should let the user provide an LSN.
I'd be happy to implement any of that if we agree.
--
Ronan Dunklau
At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
Hello,
I've noticed that pg_receivewal logic for deciding which LSN to start
streaming at consists of:
- looking up the latest WAL file in our destination folder, and resume from
here
- if there isn't, use the current flush location instead.
This behaviour surprised me when using it with a replication slot: I was
expecting it to start streaming at the last flushed location from the
replication slot instead. If you consider a backup tool which will take
pg_receivewal's output and transfer it somewhere else, using the replication
slot position would be the easiest way to ensure we don't miss WAL files.
Does that make sense ?
I don't know if it should be the default, toggled by a command line flag, or if
we even should let the user provide a LSN.
*I* think it is completely reasonable (or at least convenient, or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use. The tool already decides the clean-start LSN
in a somewhat unusual way. And it seems to me that the proposed behavior
can be the default when -S is specified.
I'd be happy to implement any of that if we agree.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
wrote in
Hello,
I've noticed that pg_receivewal logic for deciding which LSN to start
streaming at consists of:
- looking up the latest WAL file in our destination folder, and resume
from here
- if there isn't, use the current flush location instead.
This behaviour surprised me when using it with a replication slot: I was
expecting it to start streaming at the last flushed location from the
replication slot instead. If you consider a backup tool which will take
pg_receivewal's output and transfer it somewhere else, using the
replication slot position would be the easiest way to ensure we don't
miss WAL files.
Does that make sense ?
I don't know if it should be the default, toggled by a command line flag,
or if we even should let the user provide a LSN.
*I* think it is completely reasonable (or at least convenient or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use. The tool already decides the clean-start LSN
a bit unusual way. And it seems to me that proposed behavior can be
the default when -S is specified.
As of now we can't get the replication_slot restart_lsn with a replication
connection AFAIK.
This implies that the patch could require the user to specify a maintenance-db
parameter, and we would use that if provided to fetch the replication slot
info, or fall back to the previous behaviour. I don't really like this
approach, as the behaviour changing depending on whether we supply a
maintenance-db parameter or not is error-prone for the user.
Another option would be to add a new replication command (for example
ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
current one, and return some info about it (restart_lsn at least for a
physical slot).
I don't see any reason not to make it work for logical replication connections
/ slots, but it wouldn't be that useful since we can query the database in
that case.
Acquiring the replication slot instead of just reading it would make sure that
no other process could start the replication between the time we read the
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
check if we already have a replication slot, and ensure it is the same one as
the one we're trying to use.
From pg_receivewal's point of view, this would amount to:
- check if we currently have wal in the target directory.
- if we do, proceed as currently done, by computing the start lsn and
timeline from the last archived wal
- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
restart_lsn as the start lsn if there is one, and don't provide a timeline
- if we still don't have a start_lsn, fallback to using the current server
wal position as is done.
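The precedence the steps above describe boils down to a simple fallback chain; a minimal sketch (names are illustrative, not actual pg_receivewal variables):

```python
def pick_start_position(dir_lsn, slot_restart_lsn, server_flush_lsn):
    """Order of preference proposed above: existing WAL in the target
    directory first, then the slot's restart_lsn, then the current
    server position."""
    if dir_lsn is not None:
        return dir_lsn
    if slot_restart_lsn is not None:
        return slot_restart_lsn
    return server_flush_lsn

# examples of each branch
from_dir = pick_start_position(0x3000000, 0x1000000, 0x5000000)   # directory wins
from_slot = pick_start_position(None, 0x1000000, 0x5000000)       # slot restart_lsn
from_server = pick_start_position(None, None, 0x5000000)          # server position
```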
What do you think? Which information should we provide about the slot?
--
Ronan Dunklau
At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
wrote in
I don't know if it should be the default, toggled by a command line flag,
or if we even should let the user provide a LSN.
*I* think it is completely reasonable (or at least convenient or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use. The tool already decides the clean-start LSN
a bit unusual way. And it seems to me that proposed behavior can be
the default when -S is specified.
As of now we can't get the replication_slot restart_lsn with a replication
connection AFAIK.
This implies that the patch could require the user to specify a maintenance-db
parameter, and we would use that if provided to fetch the replication slot
info, or fallback to the previous behaviour. I don't really like this approach
as the behaviour changing whether we supply a maintenance-db parameter or not
is error-prone for the user.
Another option would be to add a new replication command (for example
ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
current one, and return some info about it (restart_lsn at least for a
physical slot).
I didn't think it through in detail. But I forgot that ordinary SQL commands
are prohibited on a physical replication connection. So we need a
new replication command, but that's not a big deal.
I don't see any reason not to make it work for logical replication connections
/ slots, but it wouldn't be that useful since we can query the database in
that case.
Ordinary SQL queries are usable on a logical replication slot, so
I'm not sure how a logical replication connection would use the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.
Acquiring the replication slot instead of just reading it would make sure that
no other process could start the replication between the time we read the
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
check if we already have a replication slot, and ensure it is the same one as
the one we're trying to use.
I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle. One possible problem is the case where START_REPLICATION
successfully acquires the slot after the new command failed. But that
case doesn't seem worse than the case where someone advances the slot in
our absence. So I think READ_REPLICATION_SLOT is sufficient.
From pg_receivewal's point of view, this would amount to:
- check if we currently have wal in the target directory.
- if we do, proceed as currently done, by computing the start lsn and
timeline from the last archived wal
- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
restart_lsn as the start lsn if there is one, and don't provide a timeline
- if we still don't have a start_lsn, fallback to using the current server
wal position as is done.
That's pretty much it.
What do you think ? Which information should we provide about the slot ?
We need the timeline id to start with when using restart_lsn. The
current timeline can be used in most cases but there's a case where
the LSN is historical.
pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment. Thus
next time it will start from there, creating a duplicate segment.
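The duplicate-segment scenario described here can be illustrated with plain segment arithmetic (16MB segments assumed for the example):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024

def segment_of(lsn):
    """Which WAL segment a byte position belongs to."""
    return lsn // WAL_SEG_SIZE

# pg_receivewal flushed exactly one full segment, but since no status
# report is sent at segment end, the slot's restart_lsn may still point
# at that segment's start.
flushed_up_to = WAL_SEG_SIZE   # end of segment 0 == start of segment 1
slot_restart_lsn = 0           # feedback never advanced past segment 0

# restarting from the slot would stream segment 0 over again
first_resent_segment = segment_of(slot_restart_lsn)
already_archived_segment = segment_of(flushed_up_to - 1)
```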
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Le jeudi 29 juillet 2021, 07:32:37 CEST Kyotaro Horiguchi a écrit :
I didn't thought in details. But I forgot that ordinary SQL commands
have been prohibited in physical replication connection. So we need a
new replication command but it's not that a big deal.
Thank you for your feedback !
I don't see any reason not to make it work for logical replication
connections / slots, but it wouldn't be that useful since we can query
the database in that case.
Ordinary SQL queries are usable on a logical replication slot so
I'm not sure how logical replication connection uses the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.
Ok done in the attached patch.
I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle. One possible problem is the case where START_REPLICATION
successfully acquire the slot after the new command failed. But that
case doesn't seem worse than the case someone advances the slot while
absence. So I think READ_REPLICATION_SLOT is sufficient.
Ok, I implemented it like this. I tried to follow the pg_get_replication_slots
approach with regards to how to prevent concurrent modification while reading
the slot.
From pg_receivewal's point of view, this would amount to:
- check if we currently have wal in the target directory.
- if we do, proceed as currently done, by computing the start lsn and
timeline from the last archived wal
- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
restart_lsn as the start lsn if there is one, and don't provide a timeline
- if we still don't have a start_lsn, fallback to using the current
server wal position as is done.
That's pretty much it.
Great.
What do you think ? Which information should we provide about the slot ?
We need the timeline id to start with when using restart_lsn. The
current timeline can be used in most cases but there's a case where
the LSN is historical.
Ok, see below.
pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment. Thus
next time it will start from there, creating a duplicate segment.
I'm not sure I see where the problem is here. If we don't keep the segments in
pg_receivewal's target directory, then it would be the responsibility of
whoever moved them to make sure we don't have duplicates, or to handle them
gracefully.
Even if we were forcing a feedback after a segment is finished, there could
still be a problem if the feedback never made it to the server but the segment
was there. It might be interesting to send a feedback anyway.
Please find attached two patches implementing what we've been discussing.
Patch 0001 adds the new READ_REPLICATION_SLOT command.
It returns for a given slot the type, restart_lsn, flush_lsn,
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history, and
finding the timeline where we may find the record. I didn't find an explicit
test for e.g. IDENTIFY_SYSTEM, so I didn't write one for this new command, but
it is tested indirectly in patch 0002.
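The timeline lookup described for patch 0001 amounts to walking the timeline history, much as a standby does when following a switch; a hedged sketch (the tuple format mirrors a .history file's entries, but the function name is made up):

```python
def timeline_containing(history, current_tli, lsn):
    """history: list of (tli, switchpoint_lsn) pairs, oldest first, as
    parsed from a timeline history file. An LSN belongs to the first
    listed timeline whose switchpoint lies beyond it; otherwise it is
    on the current timeline."""
    for tli, switchpoint in history:
        if lsn < switchpoint:
            return tli
    return current_tli

# timeline 1 switched to timeline 2 at 0/3000000
hist = [(1, 0x3000000)]
old_record = timeline_containing(hist, 2, 0x1000000)  # before the switch
new_record = timeline_containing(hist, 2, 0x4000000)  # after the switch
```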
Patch 0002 makes pg_receivewal use that command if we use a replication slot
and the command is available, and use the restart_lsn and restart_lsn_timeline
as a starting point. It also adds a small test to check that we start back
from the previous restart_lsn instead of the current flush position when our
destination directory does not contain any WAL file.
I also noticed we don't test following a timeline switch. It would probably be
good to add that, both for the case where we determine the previous timeline
from the archived segments and when it comes from the new command. What do you
think ?
Regards,
--
Ronan Dunklau
Attachments:
- v1-0001-Add-READ_REPLICATION_SLOT-command.patch (text/x-patch, +197 −2)
- v1-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patch (text/x-patch, +103 −5)
Le jeudi 29 juillet 2021, 11:09:40 CEST Ronan Dunklau a écrit :
Patch 0001 adds the new READ_REPLICATION_SLOT command.
It returns for a given slot the type, restart_lsn, flush_lsn,
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history, and
finding the timeline where we may find the record. I didn't find explicit
test for eg IDENTIFY_SYSTEM so didn't write one either for this new
command, but it is tested indirectly in patch 0002.
Patch 0002 makes pg_receivewal use that command if we use a replication slot
and the command is available, and use the restart_lsn and
restart_lsn_timeline as a starting point. It also adds a small test to
check that we start back from the previous restart_lsn instead of the
current flush position when our destination directory does not contain any
WAL file.
I also noticed we don't test following a timeline switch. It would probably
be good to add that, both for the case where we determine the previous
timeline from the archived segments and when it comes from the new command.
What do you think ?
Following the discussion at [1], I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup itself in
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.
Best regards,
[1]: /messages/by-id/flat/CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com
--
Ronan Dunklau
Attachments:
- v2-0001-Add-READ_REPLICATION_SLOT-command.patch (text/x-patch, +197 −2)
- v2-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patch (text/x-patch, +131 −5)
- v2-0003-Check-slot-existence-in-pg_basebackup.patch (text/x-patch, +15 −5)
On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
Following the discussion at [1], I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup itself in
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.
Thanks for the split. That helps a lot.
+
+
/*
* Run IDENTIFY_SYSTEM through a given connection and give back to caller
The patch series has some noise diffs here and there, you may want to
clean up that for clarity.
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
LWLocks are released on ERROR, so no need for LWLockRelease() here.
+ <listitem>
+ <para>
+ Read information about the named replication slot. This is
useful to determine which WAL location we should be asking the server
to start streaming at.
A nit. You may want to be more careful with the indentation of the
documentation. Things are usually limited in width for readability.
More <literal> markups would be nice for the field names used in the
descriptions.
+ if (slot == NULL || !slot->in_use) [...]
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("replication slot \"%s\" does not exist",
+ cmd->slotname)));
[...]
+ if (PQntuples(res) == 0)
+ {
+ pg_log_error("replication slot %s does not exist", slot_name);
+ PQclear(0);
+ return false;
So, the backend and ReadReplicationSlot() report an ERROR if a slot
does not exist but pg_basebackup's GetSlotInformation() does the same
if there are no tuples returned. That's inconsistent. Wouldn't it be
more instinctive to return a NULL tuple instead if the slot does not
exist to be able to check after real ERRORs in frontends using this
interface? A slot in use exists, so the error is a bit confusing here
anyway, no?
+ * XXX: should we allow the caller to specify which target timeline it wants
+ * ?
+ */
What are you thinking about here?
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
Typo.
+ TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+ INT4OID, -1, 0);
+ TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_lsn_timeline",
+ INT4OID, -1, 0);
I would call these restart_tli and confirmed_flush_tli, without the
"lsn" part.
The patch for READ_REPLICATION_SLOT could have some tests using a
connection that has replication=1 in some TAP tests. We do that in
001_stream_rep.pl with SHOW, as one example.
- 'slot0'
+ 'slot0', '-p',
+ "$port"
Something we are missing here?
--
Michael
On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.
Thanks for the patches. Here are some comments:
1) While the intent of these patches looks good, I have following
concern with new replication command READ_REPLICATION_SLOT: what if
the pg_receivewal exits (because user issued a SIGINT or for some
reason) after flushing the received WAL to disk, before it sends
sendFeedback to postgres server's walsender so that it doesn't get a
chance to update the restart_lsn in the replication slot via
PhysicalConfirmReceivedLocation. If the pg_receivewal is started
again, isn't it going to get the previous restart_lsn and receive the
last chunk of flushed WAL again?
2) What is the significance of READ_REPLICATION_SLOT for logical
replication slots? I read above that somebody suggested to restrict
the walsender to handle READ_REPLICATION_SLOT for physical replication
slots so that the callers will see a command failure. But I tend to
think that it is clean to have this command common for both physical
and logical replication slots and the callers can have an Assert(type
== 'physical').
3) Isn't it useful to send active, active_pid info of the replication
slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
true && active_pid == getpid()) as an assertion to ensure that it is
the sole owner of the replication slot? Also, is it good to send the
wal_status info?
4) I think below messages should start with lower case letter and also
there are some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+ "resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot
information, "
+ "resuming from current server position instead");
5) How about emitting the above messages in case of "verbose"?
6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+ stream.startpos = serverpos;
+ stream.timeline = servertli;
+ }
+
+ Assert(stream.startpos != InvalidXLogRecPtr);
7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
8) Just an idea, how about we store pg_receivewal's lastFlushPosition
in a file before pg_receivewal exits and compare it with the
restart_lsn that it received from the replication slot, if
lastFlushPosition == received_restart_lsn well and good, if not, then
something would have happened and we always start at the
lastFlushPosition ?
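Idea 8 could be sketched like this (the state-file format and all names here are purely hypothetical, just to make the proposal concrete):

```python
import os
import tempfile

def resolve_start(state_file, slot_restart_lsn):
    """If a previously saved lastFlushPosition exists, trust it over the
    slot's restart_lsn (which may lag behind the last flush, as discussed
    above); otherwise use the slot's position."""
    if not os.path.exists(state_file):
        return slot_restart_lsn
    with open(state_file) as f:
        last_flush = int(f.read().strip(), 16)
    # when they disagree, something happened in between: prefer the
    # position we know we flushed
    return last_flush

# example with a scratch state file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("2000000")  # lastFlushPosition = 0/2000000
resolved = resolve_start(path, 0x1000000)          # saved position wins
missing = resolve_start(path + ".nope", 0x1000000) # no file: use the slot
```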
Regards,
Bharath Rupireddy.
Le vendredi 27 août 2021, 05:44:32 CEST Michael Paquier a écrit :
On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
Following the discussion at [1], I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup
itself in order to fail early if the replication slot doesn't exist, so
please find attached v2 for that.
Thanks for the split. That helps a lot.
Thank you very much for the review, please find attached an updated patchset.
I've also taken into account some remarks made by Bharath Rupireddy.
+
+
 /*
  * Run IDENTIFY_SYSTEM through a given connection and give back to caller
The patch series has some noise diffs here and there, you may want to
clean up that for clarity.
Ok, sorry about that.
+ if (slot == NULL || !slot->in_use)
+ {
+     LWLockRelease(ReplicationSlotControlLock);
+
+     ereport(ERROR,
+             (errcode(ERRCODE_UNDEFINED_OBJECT),
LWLocks are released on ERROR, so no need for LWLockRelease() here.
Following your suggestion of not erroring out on a nonexistent slot, this
point is no longer relevant, but thanks for pointing it out anyway.
+ <listitem>
+  <para>
+   Read information about the named replication slot. This is
useful to determine which WAL location we should be asking the server
to start streaming at.
A nit. You may want to be more careful with the indentation of the
documentation. Things are usually limited in width for readability.
More <literal> markups would be nice for the field names used in the
descriptions.
Ok.
+ if (slot == NULL || !slot->in_use)
[...]
+ ereport(ERROR,
+         (errcode(ERRCODE_UNDEFINED_OBJECT),
+          errmsg("replication slot \"%s\" does not exist",
+                 cmd->slotname)));
[...]
+ if (PQntuples(res) == 0)
+ {
+     pg_log_error("replication slot %s does not exist", slot_name);
+     PQclear(0);
+     return false;
So, the backend and ReadReplicationSlot() report an ERROR if a slot
does not exist but pg_basebackup's GetSlotInformation() does the same
if there are no tuples returned. That's inconsistent. Wouldn't it be
more instinctive to return a NULL tuple instead if the slot does not
exist to be able to check after real ERRORs in frontends using this
interface?
The attached patch returns no tuple at all when the replication slot doesn't
exist. I'm not sure if that's what you meant by returning a NULL tuple ?
A slot in use exists, so the error is a bit confusing here
anyway, no?
From my understanding, a slot *not* in use doesn't exist anymore, as such I
don't really understand this point. Could you clarify ?
+ * XXX: should we allow the caller to specify which target timeline it wants
+ * ?
+ */
What are you thinking about here?
I was thinking that maybe instead of walking back the timeline history from
where we currently are on the server, we could allow an additional argument
for the client to specify which timeline it wants. But I guess a replication
slot cannot be present for a past, divergent timeline? I have removed that
suggestion.
-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
Typo.
Ok.
+ TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+                           INT4OID, -1, 0);
+ TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_lsn_timeline",
+                           INT4OID, -1, 0);
I would call these restart_tli and confirmed_flush_tli, without the
"lsn" part.
Ok.
The patch for READ_REPLICATION_SLOT could have some tests using a
connection that has replication=1 in some TAP tests. We do that in
001_stream_rep.pl with SHOW, as one example.
Ok. I added the physical part to 001_stream_rep.pl, using the protocol
interface directly for creating / dropping the slot, and some tests for
logical replication slots to 006_logical_decoding.pl.
- 'slot0'
+ 'slot0', '-p',
+ "$port"
Something we are missing here?
The thing we're missing here is a wrapper for command_fails_like. I've added
this to PostgresNode.pm.
Best regards,
--
Ronan Dunklau
Attachments:
- v3-0001-Add-READ_REPLICATION_SLOT-command.patch (text/x-patch, +279 −4)
- v3-0002-Use-READ_REPLICATION_SLOT-command-in-pg_receivewa.patch (text/x-patch, +154 −5)
- v3-0003-Check-slot-existence-in-pg_basebackup.patch (text/x-patch, +41 −4)
Le samedi 28 août 2021, 14:10:25 CEST Bharath Rupireddy a écrit :
On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.
Thanks for the patches. Here are some comments:
Thank you for this review! Please see the other side of the thread, where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.
1) While the intent of these patches looks good, I have following
concern with new replication command READ_REPLICATION_SLOT: what if
the pg_receivewal exits (because user issued a SIGINT or for some
reason) after flushing the received WAL to disk, before it sends
sendFeedback to postgres server's walsender so that it doesn't get a
chance to update the restart_lsn in the replication slot via
PhysicalConfirmReceivedLocation. If the pg_receivewal is started
again, isn't it going to get the previous restart_lsn and receive the
last chunk of flushed WAL again?
I've kept the existing directory as the source of truth if we have any WAL
there already. If we don't, we fall back to the restart_lsn on the replication
slot.
So in the event that we start it again from somewhere else where we don't have
access to those WALs anymore, we could be receiving it again, which in my
opinion is better than losing everything in between in that case.
2) What is the significance of READ_REPLICATION_SLOT for logical
replication slots? I read above that somebody suggested to restrict
the walsender to handle READ_REPLICATION_SLOT for physical replication
slots so that the callers will see a command failure. But I tend to
think that it is clean to have this command common for both physical
and logical replication slots and the callers can have an Assert(type
== 'physical').
I've updated the patch to make it easy for the caller to check the slot's type
and added a verification for those cases.
In general, I tried to implement the meaning of the different fields exactly as
it's done in the pg_replication_slots view.
3) Isn't it useful to send active, active_pid info of the replication
slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
true && active_pid == getpid()) as an assertion to ensure that it is
the sole owner of the replication slot? Also, is it good send
wal_status info
Typically we read the slot before attaching to it, since what we want to do is
check if it exists. It may be worthwhile to check that it's not already used by
another backend though.
4) I think below messages should start with lower case letter and also
there are some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+ "resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot
information, "
+ "resuming from current server position instead");
I've rephrased it a bit in v3, let me know if that's what you had in mind.
5) How about emitting the above messages in case of "verbose"?
I think it is useful to warn the user even if not in the verbose case, but if
the consensus is to move it to verbose only output I can change it.
6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+     stream.startpos = serverpos;
+     stream.timeline = servertli;
+ }
+
+ Assert(stream.startpos != InvalidXLogRecPtr);
Good idea.
7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful?
8) Just an idea, how about we store pg_receivewal's lastFlushPosition
in a file before pg_receivewal exits and compare it with the
restart_lsn that it received from the replication slot, if
lastFlushPosition == received_restart_lsn well and good, if not, then
something would have happened and we always start at the
lastFlushPosition ?
The patch idea originally came from the fact that some utilities use
pg_receivewal to fetch WALs and move them elsewhere. In that sense I don't
really see what value this brings compared to the existing (and unmodified)
way of computing the restart position from the already stored files?
Best regards,
--
Ronan Dunklau
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
Thank you for this review ! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.
Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.
1) Missing full stops "." at the end.
+ <literal>logical</literal>
+ when following the current timeline history
+ position, when following the current timeline history
2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?
3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
same as <link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link>
view.
4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?
5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't just taking the pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;
+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot->data.database;
+ restart_lsn = slot->data.restart_lsn;
+ confirmed_flush = slot->data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);
Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?
Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));
+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);
7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?
8) I still have the question like, how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot it is
requesting the info? As I said upthread, why don't we send "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check if the "wal_status" is not "lost"?
9) There are 2 blank lines at the end of ReadReplicationSlot. We leave
only one blank line after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
* Handle TIMELINE_HISTORY command.
*/
10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+ 'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+ extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+ "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
You can just rename the test case name from:
+ "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+ "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
Regards,
Bharath Rupireddy.
On Tue, Aug 31, 2021 at 4:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
Thank you for this review ! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.
Continuing my review on the v3 patch set:
0002:
1) I think the following message
"could not fetch replication slot LSN: got %d rows and %d fields,
expected %d rows and %d or more fields"
should be:
"could not read replication slot: got %d rows and %d fields, expected
%d rows and %d or more fields"
2) Why is GetSlotInformation returning InvalidXLogRecPtr? Change it to
return false instead.
3) Alignment issue:
Change below 2 lines:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+ slot_name);
To 1 line, as it will be under 80 char limit:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
4) Shouldn't your GetSlotInformation return whatever output columns the
READ_REPLICATION_SLOT command gets, to be more generic? Callers would
pass non-null/null pointers for the fields they do or do not need.
Please refer to RunIdentifySystem for how it does that.
GetSlotInformation can just read the tuple fields the callers are
interested in.
bool
GetSlotInformation(PGconn *conn, const char *slot_name, char **slot_type,
XLogRecPtr *restart_lsn, uint32 *restart_lsn_tli,
XLogRecPtr *confirmed_lsn, XLogRecPtr *confirmed_lsn_tli)
{
if (slot_type)
{
/* get the slot_type value from the received tuple */
}
if (restart_lsn)
{
/* get the restart_lsn value from the received tuple */
}
if (restart_lsn_tli)
{
/* get the restart_lsn_tli value from the received tuple */
}
if (confirmed_lsn)
{
/* get the confirmed_lsn value from the received tuple */
}
if (confirmed_lsn_tli)
{
/* get the confirmed_lsn_tli value from the received tuple */
}
}
5) How about below as GetSlotInformation function comment?
/*
* Run READ_REPLICATION_SLOT through a given connection and give back to caller
* following information of the slot if requested:
* - type
* - restart lsn
* - restart lsn timeline
* - confirmed lsn
* - confirmed lsn timeline
*/
6) Do you need +#include "pqexpbuffer.h" in pg_receivewal.c?
7) Missing "," after information and it is not required to use "the"
in the messages.
Change below
+ pg_log_warning("could not fetch the replication_slot \"%s\" information "
+ "resuming from the current server position instead", replication_slot);
to:
+ pg_log_warning("could not fetch replication_slot \"%s\" information, "
+ "resuming from current server position instead", replication_slot);
8) A typo "suport". Ignore this comment, if you incorporate review comment #10.
Change below
pg_log_warning("server does not suport fetching the slot's position, "
"resuming from the current server position instead");
to:
pg_log_warning("server does not support getting start LSN from
replication slot, "
"resuming from current server position instead");
9) I think you should free the memory allocated for slot_type by
GetSlotInformation:
+ if (strcmp(slot_type, "physical") != 0)
+ {
+ pg_log_error("slot \"%s\" is not a physical replication slot",
+ replication_slot);
+ exit(1);
+ }
+
+ pg_free(slot_type);
10) Isn't it PQclear(res);?
+ PQclear(0);
11) I don't think you need to check for the null value of
replication_slot. In the StreamLog it can't be null, so you can safely
remove the below if condition.
+ if (replication_slot)
+ {
12) How about
/* Try to get start position from server's replication slot information */
instead of
+ /* Try to get it from the slot if any, and the server supports it */
13) When you say that the server supports the new
READ_REPLICATION_SLOT command only if version >= 150000, shouldn't
the function GetSlotInformation do the following:
bool
GetSlotInformation(PGconn *conn,....,bool *is_supported)
{
if (PQserverVersion(conn) < 150000)
{
*is_supported = false;
return false;
}
*is_supported = true;
}
So, the callers will just do:
/* Try to get start position from server's replication slot information */
char *slot_type = NULL;
bool is_supported;
if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline, &slot_type, &is_supported))
{
if (!is_supported)
pg_log_warning("server does not support getting start LSN from
replication slot, "
"resuming from current server position instead");
else
pg_log_warning("could not fetch replication_slot \"%s\" information, "
"resuming from current server position instead",
replication_slot);
}
if (slot_type && strcmp(slot_type, "physical") != 0)
{
pg_log_error("slot \"%s\" is not a physical replication slot",
replication_slot);
exit(1);
}
pg_free(slot_type);
}
14) Instead of just
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)
0003:
1) The message should start with lower case: "slot \"%s\" is not a
physical replication slot".
+ pg_log_error("Slot \"%s\" is not a physical replication slot",
2)
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)
Regards,
Bharath Rupireddy.
Le mardi 31 août 2021, 13:17:22 CEST Bharath Rupireddy a écrit :
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
Thank you for this review ! Please see the other side of the thread where
I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.
Thank you ! I will send a new version shortly, once I address your remarks
concerning patch 0002 (and hopefully 0003 :-) )
1) Missing full stops "." at the end.
+ <literal>logical</literal>
+ when following the current timeline history
+ position, when following the current timeline history
Good catch, I will take care of it for the next version.
2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?
You're right, it makes more sense like this.
3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
same as <link
linkend="view-pg-replication-slots"><structname>pg_replication_slots</struc
tname></link> view.
Same as above, thanks.
4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?
That must have been a leftover from an earlier version of the patch, I will fix
it also.
5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't it just taking a pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;
+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot->data.database;
+ restart_lsn = slot->data.restart_lsn;
+ confirmed_flush = slot->data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);
It's probably simpler that way.
6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);
Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?
Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));
+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);
As for the API, I think it's cleaner to just send an empty result set instead
of a null tuple in that case but I won't fight over it if there is consensus on
having an all-nulls tuple value instead.
There is indeed a bug, but in the second patch rather than here: I still
test the slot type even when we didn't fetch anything. So I will add a test
for that too.
7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?
You're right.
8) I still have the question like, how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot it is
requesting the info? As I said upthread, why don't we send "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check if the "wal_status" is not "lost"?
As for pg_receivewal, it can only check that it's not active at that time,
since we only acquire the replication slot once we know the start_lsn. For the
basebackup case it's the same thing as we only want to check if it exists.
As such, I didn't add them as I didn't see the need, but if it can be useful
why not ? I will do that in the next version.
9) There are 2 blank lines at the end of ReadReplicationSlot. We leave
only one blank line after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
* Handle TIMELINE_HISTORY command.
*/
Ok !
10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+ 'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+ extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+ "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
You can just rename the test case name from:
+ "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+ "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
I wanted to test both the case where no slot by this name exists, and the case
where it has been dropped hence still referenced but marked as not "in_use".
Maybe it's not worth it and we can remove the first case as you suggest.
Best regards,
--
Ronan Dunklau
On Mon, Aug 30, 2021 at 11:55:42AM +0200, Ronan Dunklau wrote:
Le vendredi 27 août 2021, 05:44:32 CEST Michael Paquier a écrit :
+ if (slot == NULL || !slot->in_use)
[...]
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("replication slot \"%s\" does not exist",
+ cmd->slotname)));
[...]
+ if (PQntuples(res) == 0)
+ {
+ pg_log_error("replication slot %s does not exist", slot_name);
+ PQclear(0);
+ return false;

So, the backend and ReadReplicationSlot() report an ERROR if a slot does
not exist but pg_basebackup's GetSlotInformation() does the same if there
are no tuples returned. That's inconsistent. Wouldn't it be more
instinctive to return a NULL tuple instead if the slot does not exist to
be able to check after real ERRORs in frontends using this interface?

The attached patch returns no tuple at all when the replication slot doesn't
exist. I'm not sure if that's what you meant by returning a NULL tuple ?
Just return a tuple filled only with NULL values. I would tend to
code things so as we set all the flags of nulls[] to true by default,
remove has_value and define the number of columns in a #define, as of:
#define READ_REPLICATION_SLOT_COLS 5
[...]
Datum values[READ_REPLICATION_SLOT_COLS];
bool nulls[READ_REPLICATION_SLOT_COLS];
[...]
MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
Assert(i == READ_REPLICATION_SLOT_COLS); // when filling values.
This would make ReadReplicationSlot() cleaner by removing all the
else{} blocks coded now to handle the NULL values, and that would be
more in-line with the documentation where we state that one tuple is
returned. Note that this is the same kind of behavior for similar
in-core functions where objects are queried if they don't exist.
I would also suggest a reword of some of the docs, say:
+ <listitem>
+ <para>
+ Read the information of a replication slot. Returns a tuple with
+ <literal>NULL</literal> values if the replication slot does not
+ exist.
+ </para>
A slot in use exists, so the error is a bit confusing here
anyway, no?

From my understanding, a slot *not* in use doesn't exist anymore, as such I
don't really understand this point. Could you clarify ?
Yeah, sorry about that. I did not recall the exact meaning of
in_use. Considering the slot as undefined if the flag is false is the
right thing to do.
I was thinking that maybe instead of walking back the timeline history from
where we currently are on the server, we could allow an additional argument
for the client to specify which timeline it wants. But I guess a replication
slot can not be present for a past, divergent timeline ? I have removed that
suggestion.
The parent TLI history is linear, so I'd find that a bit strange in
concept, FWIW.
- 'slot0'
+ 'slot0', '-p',
+ "$port"
Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like. I've added
this to PostgresNode.pm.
It may be better to apply this bit separately, then.
--
Michael
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?
IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?
Regards,
Bharath Rupireddy.
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?
If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.
"server does not suport fetching the slot's position, resuming from the current server position instead"
(The message looks a bit too long, though..)
However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.
--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.
I don't think the option needs to accept multiple values at once.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?

If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.

"server does not suport fetching the slot's position, resuming from the current server position instead"
(The message looks a bit too long, though..)
However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.

I don't think the option needs to accept multiple values at once.
If --start-source = 'wal' fails, then pg_receivewal should show up an
error saying "cannot find start position from <<user-specified-wal>>
directory, try with "server" or "slot" for --start-source". We might
end up having similar errors for other options as well. Isn't this going
to create unnecessary complexity?
The existing way the pg_receivewal fetches start pos i.e. first from
wal directory and then from server start position, isn't known to the
user at all, no verbose message or anything specified in the docs. Why
do we need to expose this with the --start-source option? IMO, we can
keep it that way and we can just have a way to turn off the new
behaviour that we are proposing here, i.e. fetching the start position
from the slot's restart_lsn.
Regards,
Bharath Rupireddy.
Le jeudi 2 septembre 2021, 08:42:22 CEST Bharath Rupireddy a écrit :
On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote in

On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an
option?

From my point of view, I already expected it to use something like
that when using a replication slot. Maybe an option to turn it off
could be useful ?

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reasons the
pg_receivewal(running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT, knowing that the server doesn't support
READ_REPLICATION_SLOT why should user let pg_receivewal run an
unnecessary code?

If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.

"server does not suport fetching the slot's position, resuming from the
current server position instead"

(The message looks a bit too long, though..)
However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.

I don't think the option needs to accept multiple values at once.
If --start-source = 'wal' fails, then pg_receivewal should show up an
error saying "cannot find start position from <<user-specified-wal>>
directory, try with "server" or "slot" for --start-source". We might
end up having similar errors for other options as well. Isn't this going
to create unnecessary complexity?

The existing way the pg_receivewal fetches start pos i.e. first from
wal directory and then from server start position, isn't known to the
user at all, no verbose message or anything specified in the docs. Why
do we need to expose this with the --start-source option? IMO, we can
keep it that way and we can just have a way to turn off the new
behaviour that we are proposing here, i.e. fetching the start position
from the slot's restart_lsn.
Then it should probably be documented. We write in the docs that it is
strongly recommended to use a replication slot, but do not mention how we
resume from what has already been processed.
If someone really cares about having control over how the start position is
defined instead of relying on the auto detection, it would be wiser to add
a --startpos parameter similar to the endpos one, which would override everything
else, instead of different flags for different behaviours.
Regards,
--
Ronan Dunklau
On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
If I read the patch correctly the situation above is warned by the
following message then continue to the next step giving up to identify
start position from slot data.
Better to fall back to the past behavior if attempting to use a
pg_receivewal >= 15 with a PG instance older than 14.
"server does not suport fetching the slot's position, resuming from the current server position instead"
(The message looks a bit too long, though..)
Agreed. Falling back to a warning is not the best answer we can have
here, as there could be various failure types, and for some of them a
hard failure is adapted;
- Failure in the backend while running READ_REPLICATION_SLOT. This
should imply a hard failure, no?
- Slot that does not exist. In this case, we could fall back to the
current write position of the server by default if the slot information
cannot be retrieved.
Something that's disturbing me in patch 0002 is that we would ignore
the results of GetSlotInformation() if any error happens, even if
there is a problem in the backend, like an OOM. We should be careful
about the semantics here.
However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.
I don't think the option needs to accept multiple values at once.
What is the difference between "wal" and "server"? "wal" stands for
the start position of the set of files stored in the location
directory, and "server" is the location that we'd receive from the
server? I don't think that we need that because, when using a slot,
we know that we can rely on the LSN that the slot retains for
pg_receivewal as that should be the same point as what has been
streamed last. Could there be an argument instead for changing the
default and rely on the slot information rather than scanning the
local WAL archive path for the start point when using --slot? When
using pg_receivewal as a service, relying on a scan of the WAL archive
directory if there is no slot and fallback to an invalid LSN if there
is nothing is fine by me, but I think that just relying on the slot
information is saner as the backend makes sure that nothing is
missing. That's also more useful when streaming changes from a single
slot from multiple locations (stream to location 1 with a slot, stop
pg_receivewal, stream to location 2 that completes 1 with the same
slot).
+ pg_log_error("Slot \"%s\" is not a physical replication slot",
+ replication_slot);
In 0003, the format of this error is not really project-like.
Something like that perhaps would be more adapted:
"cannot use the slot provided, physical slot expected."
I am not really convinced about the need of getting the active state
and the PID used in the backend when fetching the slot data,
particularly if that's just for some frontend-side checks. The
backend has safeguards already for all that.
While looking at that, I have applied de1d4fe to add
PostgresNode::command_fails_like(), coming from 0003, and put my hands
on 0001 as per the attached, as the starting point. That basically
comes down to all the points raised upthread, plus some tweaks for
things I bumped into to get the semantics of the command to what looks
like the right shape.
--
Michael
Attachments:
v4-0001-Add-READ_REPLICATION_SLOT-command.patch (text/plain)
Le jeudi 2 septembre 2021, 09:28:29 CEST Michael Paquier a écrit :
On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote in If I read the patch
correctly the situation above is warned by the following message then
continue to the next step giving up to identify start position from slot
data.Better to fallback to the past behavior if attempting to use a
pg_receivewal >= 15 with a PG instance older than 14."server does not suport fetching the slot's position, resuming from the
current server position instead">(The message looks a bit too long, though..)
Agreed. Falling back to a warning is not the best answer we can have
here, as there could be various failure types, and for some of them a
hard failure is adapted;
- Failure in the backend while running READ_REPLICATION_SLOT. This
should imply a hard failure, no?
- Slot that does not exist. In this case, we could fall back to the
current write position of the server by default if the slot information
cannot be retrieved.
Something that's disturbing me in patch 0002 is that we would ignore
the results of GetSlotInformation() if any error happens, even if
there is a problem in the backend, like an OOM. We should be careful
about the semantics here.
Ok !
However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from unexpected position, which is a kind of
disaster. So I'm inclined to agree to Bharath, but rather I imagine of
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot] specify starting-LSN source, default is
trying all of them in the order of wal, slot, server.
I don't think the option needs to accept multiple values at once.
What is the difference between "wal" and "server"? "wal" stands for
the start position of the set of files stored in the location
directory, and "server" is the location that we'd receive from the
server? I don't think that we need that because, when using a slot,
we know that we can rely on the LSN that the slot retains for
pg_receivewal as that should be the same point as what has been
streamed last. Could there be an argument instead for changing the
default and rely on the slot information rather than scanning the
local WAL archive path for the start point when using --slot? When
using pg_receivewal as a service, relying on a scan of the WAL archive
directory if there is no slot and fallback to an invalid LSN if there
is nothing is fine by me, but I think that just relying on the slot
information is saner as the backend makes sure that nothing is
missing. That's also more useful when streaming changes from a single
slot from multiple locations (stream to location 1 with a slot, stop
pg_receivewal, stream to location 2 that completes 1 with the same
slot).
One benefit I see from first trying to get it from the local WAL stream is that
we may end up in a state where it has been flushed to disk but we couldn't
advance the replication slot. In that case it is better to resume from the
point on disk. Maybe taking the max(slot_lsn, local_file_lsn) would work best
for the use case you're describing.
+ pg_log_error("Slot \"%s\" is not a physical replication slot",
+ replication_slot);
In 0003, the format of this error is not really project-like.
Something like that perhaps would be more adapted:
"cannot use the slot provided, physical slot expected."

I am not really convinced about the need of getting the active state
and the PID used in the backend when fetcing the slot data,
particularly if that's just for some frontend-side checks. The
backend has safeguards already for all that.
I could see the use for sending active_pid for use within pg_basebackup: at
least we could fail early if the slot is already in use. But at the same time,
maybe it won't be in use anymore once we need it.
While looking at that, I have applied de1d4fe to add
PostgresNode::command_fails_like(), coming from 0003, and put my hands
on 0001 as per the attached, as the starting point. That basically
comes down to all the points raised upthread, plus some tweaks for
things I bumped into to get the semantics of the command to what looks
like the right shape.
Thanks, I was about to send a new patchset with basically the same thing. It
would be nice to know we work on the same thing concurrently in the future to
avoid duplicate efforts. I'll rebase and send the updated version for patches
0002 and 0003 of my original proposal once we reach an agreement over the
behaviour / options of pg_receivewal.
Also considering the number of different fields to be filled by the
GetSlotInformation function, my local branch groups them into a dedicated
struct which is more convenient than having X possibly null arguments.
--
Ronan Dunklau