Replication with Patroni not working after killing secondary and starting again
Hi,
I am new to Patroni and PostgreSQL.We have set up a cluster with etcd (3
nodes), Patroni (2 nodes) and PostgreSQL (2 nodes) with replication from
primary to secondary. Seemed to be working fine and we started some tests.
One of the tests gave us unsatisfactory results. Specifically when we start
a long transaction with multiple inserts (we use remote Java app for that)
and during execution of this transaction we kill the secondary database by
using the following:
first list the processes:
ps aux | egrep '(patroni|postgresql)'
and
uses the kill -9 command for the two processes that have postgres in the
name then the following happens:
1) Java app continues with the long transaction which is OK
2) Patroni is not restarting automatically again on the secondary, which I
assume is as expected (?)
3) When the transaction finishes and we start the patroni on secondary by
using the commands:
systemctl enable patroni
systemctl start patroni
the database starts on secondary after a while but the replication from
the primary is not working anymore. Trying new transactions does not
replicate the data either. The same set up worked before in other scenarios
- replicated data (e.g. when using short transactions, killing the
secondary, restarting secondary).
Thus my questions:
1) Is it normal that replication stops working if we kill secondary
postgres and start it again using patroni? Do we need to do any additional
steps except the commands above that start patroni?
2) Is it normal that patroni is not started again automatically after we
kill it and postgres on secondary?
3) Or there is something wrong with our setup and the replication should be
recovered automatically after we kill the secondary and start the patroni
again on secondary?
Thanks,
Zbigniew
On 2022-04-27 15:27:34 +0200, Zb B wrote:
Hi,
I am new to Patroni and PostgreSQL.We have set up a cluster with etcd (3
nodes), Patroni (2 nodes) and PostgreSQL (2 nodes) with replication from
primary to secondary.
Pretty much the setup we have.
Seemed to be working fine and we started some tests. One of the tests
gave us unsatisfactory results. Specifically when we start a long
transaction with multiple inserts (we use remote Java app for that)
and during execution of this transaction we kill the secondary
database by using the following:
[...]
the database starts on secondary after a while but the replication from the
primary is not working anymore.
[...]
Thus my questions:
1) Is it normal that replication stops working if we kill secondary postgres
and start it again using patroni? Do we need to do any additional steps except
the commands above that start patroni?
No.
When the secondary starts up it should continue replicating from where
it stopped. However, it can only do this if the necessary information is
still available. If WAL files have been deleted in the mean time. it
can't replay them. There should be error messages in your logs on what
went wrong.
2) Is it normal that patroni is not started again automatically after we kill
it and postgres on secondary?
Depends on your system setup. Did you tell systemd to automatically
restart patroni? Patroni obviously can't do anything by itself after
it's been killed.
3) Or there is something wrong with our setup and the replication should be
recovered automatically after we kill the secondary and start the patroni again
on secondary?
I assume (but you really haven't given us enough information, so I could
be wrong) that you haven't configured a replication slot and you haven't
enough WAL segments to last through the downtime naturally.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
When the secondary starts up it should continue replicating from where
it stopped. However, it can only do this if the necessary information is
still available. If WAL files have been deleted in the mean time. it
can't replay them. There should be error messages in your logs on what
went wrong
I did another test using different wal_sender_timeout parameter, as the
time of the secondary being shut down was longer than the default 60s for
this parameter.
I was hoping it would help but the result was the same (records were not
replicated to the secondary after the patroni start). Well, I just verified
again that the records were replicated after about 15 minutes to the
secondary, so probably the timeout setting helped, or I was not patient
enough before. Is it normal to wait so long for the replication? (the
original transaction in primary took about 5 minutes and was about 3000
small records). I am providing more details for completeness below:
I get the following errors on the primary DB:
2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, yet
archive_command is not set
2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot
"xyzd3riardb05" does not exist
2022-04-28 04:37:34.893 EDT [14755] STATEMENT: START_REPLICATION SLOT
"xyzd3riardb05" 0/7000000 TIMELINE 18
2022-04-28 04:37:34.915 EDT [14756] ERROR: replication slot
"xyzd3riardb05" does not exist
2022-04-28 04:37:34.915 EDT [14756] STATEMENT: START_REPLICATION SLOT
"xyzd3riardb05" 0/7000000 TIMELINE 18
2022-04-28 04:37:39.925 EDT [14763] ERROR: replication slot
"xyzd3riardb05" does not exist
2022-04-28 04:37:39.925 EDT [14763] STATEMENT: START_REPLICATION SLOT
"xyzd3riardb05" 0/7000000 TIMELINE 18
2022-04-28 04:37:44.924 EDT [14768] ERROR: replication slot
"xyzd3riardb05" does not exist
2022-04-28 04:37:44.924 EDT [14768] STATEMENT: START_REPLICATION SLOT
"xyzd3riardb05" 0/7000000 TIMELINE 18
and after some time such errors stop to appear.
and when I execute:
su postgres -c "psql -c \"SELECT * FROM pg_replication_slots;\""
I get the following the slot seems to exist:
slot_name | plugin | slot_type | datoid | database | temporary | active
| active_pid | xmin | catalog_xmin | restart_lsn | confirmed_f
lush_lsn | wal_status | safe_wal_size | two_phase
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+------------
---------+------------+---------------+-----------
xyzd3riardb05 | | physical | | | f | f
| | | | 0/73289E8 |
| reserved | | f
pdc2b | | physical | | | f | f
| | | | 0/726D398 |
| reserved | | f
pdc2b_standby | | physical | | | f | f
| | | | 0/726D398 |
| reserved | | f
(3 rows)
And as I said I just verified that the records were replicated after about
15 minutes to the secondary.
On 2022-04-28 11:09:12 +0200, Zb B wrote:
When the secondary starts up it should continue replicating from where
it stopped. However, it can only do this if the necessary information is
still available. If WAL files have been deleted in the mean time. it
can't replay them. There should be error messages in your logs on what
went wrongI did another test using different wal_sender_timeout parameter, as the time of
the secondary being shut down was longer than the default 60s for this
parameter.
I don't think this will help. It will just make the primary slower in
noticing that the secondary is gone.
I was hoping it would help but the result was the same (records were not
replicated to the secondary after the patroni start). Well, I just verified
again that the records were replicated after about 15 minutes to the secondary,
so probably the timeout setting helped, or I was not patient enough before.
The latter, I suspect. Although I'm surprised that it takes so long. In
my experience, that takes only a few seconds, certainly less than a
minute for replication to start (how long it takes to finish depends on
the amount of data, of course).
Patroni can nuke the secondary database and create a fresh copy
(using basebackup). That might take 15 minutes (depending on the
database size). I don't think it does that automatically, though. Also I
think you would have noticed that.
What does `patronictl list` show during that interval?
Is it normal to wait so long for the replication? (the original
transaction in primary took about 5 minutes and was about 3000 small
records). I am providing more details for completeness below:I get the following errors on the primary DB:
2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, yet
archive_command is not set
2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot "xyzd3riardb05"
does not exist
2022-04-28 04:37:34.893 EDT [14755] STATEMENT: START_REPLICATION SLOT
"xyzd3riardb05" 0/7000000 TIMELINE 18
...
and after some time such errors stop to appear.
So the replication slot is probably created after some time and then
replication starts to work.
I think that replication slot is managed by Patroni. So the question
would be: Why does Patroni take so long to create it? Did it log
anything?
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
What does `patronictl list` show during that interval?
Well. I can't repeat the situation anymore. Now the replication starts
immediately after starting the patroni on secondary. I did several
switchover commands meanwhile though
Meanwhile I did another test where I run a Java app with a large number of
*short* transactions (inserts) and during execution of this app I do the
patroni switchover command:
patronictl -c /etc/patroni/patroni.yml switchover
It turned out the records were not replicated to the secondary and when I
tried to execute the switchover command on the primary I got the following
error:
Error: This cluster has no master
When I tried to execute the switchover command on the secondary it worked
but because there was a discrepancy between the primary and secondary the
records on the old primary were rolled back (the number of records on
primary and secondary became the same - the same as it was on the old
secondary)
Apparently there is something wrong with my cluster. How to debug i?. Do I
need to configure anything so the replication is synchronous?
pt., 29 kwi 2022 o 22:33 Peter J. Holzer <hjp-pgsql@hjp.at> napisał(a):
Show quoted text
On 2022-04-28 11:09:12 +0200, Zb B wrote:
When the secondary starts up it should continue replicating from where
it stopped. However, it can only do this if the necessary informationis
still available. If WAL files have been deleted in the mean time. it
can't replay them. There should be error messages in your logs on what
went wrongI did another test using different wal_sender_timeout parameter, as the
time of
the secondary being shut down was longer than the default 60s for this
parameter.I don't think this will help. It will just make the primary slower in
noticing that the secondary is gone.I was hoping it would help but the result was the same (records were not
replicated to the secondary after the patroni start). Well, I justverified
again that the records were replicated after about 15 minutes to the
secondary,
so probably the timeout setting helped, or I was not patient enough
before.
The latter, I suspect. Although I'm surprised that it takes so long. In
my experience, that takes only a few seconds, certainly less than a
minute for replication to start (how long it takes to finish depends on
the amount of data, of course).Patroni can nuke the secondary database and create a fresh copy
(using basebackup). That might take 15 minutes (depending on the
database size). I don't think it does that automatically, though. Also I
think you would have noticed that.What does `patronictl list` show during that interval?
Is it normal to wait so long for the replication? (the original
transaction in primary took about 5 minutes and was about 3000 small
records). I am providing more details for completeness below:I get the following errors on the primary DB:
2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, yet
archive_command is not set
2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot"xyzd3riardb05"
does not exist
2022-04-28 04:37:34.893 EDT [14755] STATEMENT: START_REPLICATION SLOT
"xyzd3riardb05" 0/7000000 TIMELINE 18...
and after some time such errors stop to appear.
So the replication slot is probably created after some time and then
replication starts to work.I think that replication slot is managed by Patroni. So the question
would be: Why does Patroni take so long to create it? Did it log
anything?hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On 2022-05-04 10:21:56 +0200, Zb B wrote:
Apparently there is something wrong with my cluster. How to debug i?.
Do I need to configure anything so the replication is synchronous?
Does https://patroni.readthedocs.io/en/latest/replication_modes.html
help?
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Does https://patroni.readthedocs.io/en/latest/replication_modes.html
help?
Thanks. I have found the same meanwhile. The effects I experienced were
caused by the fact that Patroni configures async replication by default.
After changing it to sync everything worked as expected