restore_command has precedence over physical slots?
Hello everyone,
I'm seeing a behaviour and I can't tell whether it is a
misconfiguration or intended:
1) I set up a primary server (a.k.a. db1) with an archive_command that
writes to a storage
2) I set up a replica (a.k.a. db2) that created a slot named slot_2 and
that has restore_command set to read archived WAL from the storage.
If I shut down replica db2 during a pgbench run, I see the safe_wal_size
queried from pg_replication_slots on the primary decrease to a certain
amount but stay within the max_slot_wal_keep_size window. Even if I restart
the replica db2 before the slot's wal_status changes to unreserved or lost,
I see that the replica gets the WAL it needs from the storage using
restore_command and doesn't use the slot on the primary.
Only if I comment out the restore_command in the replica's .conf does it
use the slot.
If this is intended behaviour, I can't see the need for slots on the
primary.
I hope someone can explain this to me. Thanks in advance, Giovanni
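For readers who want to reproduce the observation above, the amount of WAL a slot holds back can be estimated from two LSNs. What follows is a minimal sketch, not the server's own computation of safe_wal_size: it only converts LSNs of the textual form 'X/Y' into byte positions and compares the distance against a limit. The function names and the sample LSNs are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000148' to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def within_limit(current_lsn: str, restart_lsn: str, max_keep_bytes: int) -> bool:
    """Rough check: is the slot still inside the max_slot_wal_keep_size window?

    Assumption: this ignores segment rounding and checkpoint timing, which
    the server takes into account when it reports wal_status and
    safe_wal_size.
    """
    return retained_bytes(current_lsn, restart_lsn) <= max_keep_bytes


if __name__ == "__main__":
    # Hypothetical values: the replica stopped at 0/5000000 and the
    # primary has since advanced to 0/9000000.
    print(retained_bytes("0/9000000", "0/5000000"))           # 67108864
    print(within_limit("0/9000000", "0/5000000", 1024 ** 3))  # True
```

The two LSN arguments correspond to what pg_current_wal_lsn() and the slot's restart_lsn column would return on the primary; the limit stands in for max_slot_wal_keep_size expressed in bytes.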
On Fri, 2022-08-19 at 16:54 +0200, Giovanni Biscontini wrote:
> If this is intended behaviour, I can't see the need for slots on the primary.
This is normal behavior and is no problem.
After the standby has caught up using "restore_command", it will connect to
the primary as defined in "primary_conninfo" and stream WAL from there.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
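The sequence described in this reply can be sketched as a toy model. It is a simplification under stated assumptions, not PostgreSQL's recovery code: segments are plain integers, `archive` and `primary_pg_wal` are hypothetical sets of segment numbers, and the archived segments are assumed to form a contiguous run from the starting point. The standby takes a segment from the archive whenever it is there and streams from the primary only when the archive cannot supply it.

```python
def replay_sources(start_segment, last_segment, archive, primary_pg_wal):
    """Return (segment, source) pairs for a standby that is catching up.

    Toy model: for every segment the standby first asks the archive (the
    role of restore_command) and only streams from the primary (the role
    of primary_conninfo plus the slot) when the archive does not have it.
    """
    plan = []
    for seg in range(start_segment, last_segment + 1):
        if seg in archive:
            plan.append((seg, "archive"))
        elif seg in primary_pg_wal:
            plan.append((seg, "stream"))
        else:
            # Neither source has the segment: the standby cannot continue.
            raise LookupError(f"segment {seg} is not available anywhere")
    return plan


if __name__ == "__main__":
    # Hypothetical state: segments 5-8 are already archived, the slot has
    # kept 5-10 in the primary's pg_wal, and segment 10 is being written.
    archive = {5, 6, 7, 8}
    primary_pg_wal = {5, 6, 7, 8, 9, 10}
    for seg, source in replay_sources(5, 10, archive, primary_pg_wal):
        print(seg, source)
```

In this model the segments the slot retained are read from the archive whenever they have been archived too; only the segments that are not archived yet come from the primary.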
At Fri, 19 Aug 2022 18:37:53 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in
> This is normal behavior and is no problem.
> After the standby has caught up using "restore_command", it will connect to
> the primary as defined in "primary_conninfo" and stream WAL from there.
The reason that db2 ran recovery beyond the slot LSN is that db2's
restore_command (I guess) points to db1's archive. If db2 had its own
archive directory or no archive (that is, restore_command is empty),
archive recovery would stop at (approximately) the slot LSN and replication
would start from there (from the beginning of the segment, to be
exact).
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
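The remark that replication starts "from the beginning of the segment" can be illustrated with a little LSN arithmetic. This is a sketch that assumes the default 16 MB WAL segment size and timeline 1. The helper names are hypothetical, and the file-name layout follows the usual 24-hex-digit WAL naming scheme.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default segment size


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000148' to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def int_to_lsn(pos: int) -> str:
    """Format a byte position back into the 'X/Y' textual form."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"


def segment_start(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Round an LSN down to the first byte of the segment that contains it."""
    pos = lsn_to_int(lsn)
    return int_to_lsn(pos - pos % seg_size)


def wal_file_name(lsn: str, timeline: int = 1,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segno = lsn_to_int(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")


if __name__ == "__main__":
    # Hypothetical slot LSN in the middle of a segment.
    print(segment_start("0/3000148"))   # 0/3000000
    print(wal_file_name("0/3000148"))   # 000000010000000000000003
```

On a running server, pg_walfile_name() gives the file name for an LSN directly; the sketch only makes the rounding visible.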
On Wed, 2022-08-24 at 14:18 +0900, Kyotaro Horiguchi wrote:
> The reason that db2 ran recovery beyond the slot LSN is that db2's
> restore_command (I guess) points to db1's archive. If db2 had its own
> archive directory or no archive (that is, restore_command is empty),
> archive recovery would stop at (approximately) the slot LSN and replication
> would start from there (from the beginning of the segment, to be
> exact).
Is it a problem if archive recovery proceeds past the replication slot's LSN?
I guess I don't see the problem.
Yours,
Laurenz Albe
On Wed, 24 Aug 2022 at 13:00, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Is it a problem if archive recovery proceeds past the replication slot's LSN?
> I guess I don't see the problem.
Hi and thanks all, my thoughts:
a) I thought setting up a slot would be useful for two reasons:
a.1) it keeps a "per replica" reference to the WAL that must be retained,
a.2) after a disconnection, when the replica (db2) reconnects, I think
it is quicker to get the missing WAL referenced by the slot from the
primary's pg_wal than to restore it from the archive, especially if the
archive is on an S3 bucket (so yes, db2's restore_command points to the
same archive as db1)
b) The archive, and consequently the restore_command, are in my mind "the
safety net" if the replica falls behind wal_keep_size or (in this case)
behind max_slot_wal_keep_size
c) I understand that, maybe, the idea behind giving precedence to
restore_command is: "a restore_command is present, so don't even try the
slot, because it may be lost; go for the safety of the archive, which is
what it is meant for".
but... in this case, if I set up a slot plus a restore_command, the slot,
and with it the risk of filling the disk, is useless: the replica always
uses the restore_command.
So, if I may say so, the problem is: I configure a slot that in every case
costs more setup time, more disk usage and more configuration, but
is useless...
Thanks in advance and best regards, Giovanni
p.s. I forgot to specify before: the pg version is 14.5