pg_rewind after promote

Started by Emond Papegaaij · about 2 years ago · 4 messages · general
#1Emond Papegaaij
emond.papegaaij@gmail.com

Hi,

We develop an application that uses PostgreSQL in combination with Pgpool
as a database backend for a Jakarta EE application (on WildFly). This
application supports running in a clustered setup with 3 nodes, providing
both high availability and load balancing. Every node runs an instance of
the database, a pgpool and the application server. Pgpool manages the
PostgreSQL replication using async streaming replication, with 1 primary
and 2 standby nodes.

The versions used are (containerized on debian:bullseye-slim):
PostgreSQL version 12.18
Pgpool2 version 4.5.0

The problem we are seeing happens during planned maintenance, for example,
when updates are installed and the hosts need to reboot. We take the hosts
out of the cluster one at a time, perform the updates and reboot, and bring
the host back into the cluster. If the host that needs to be taken out has
the role of the primary database, we need to perform a failover. For this,
we perform several steps:
* we detach the primary database backend, forcing a failover
* pgpool selects a new primary database and promotes it
* the other 2 nodes (the old primary and the other standby) are rewound
and streaming is resumed from the new primary
* the node that needed to be taken out of the cluster (the old primary) is
shutdown and rebooted
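
For reference, the four steps above can be sketched as a script like the following. It is a dry run that only prints each command instead of executing it; the backend id, host, pcp port, and data directory are hypothetical placeholders, not our actual configuration:

```shell
#!/bin/sh
# Dry-run sketch of the failover sequence above: 'run' PRINTS each command
# rather than executing it. Backend id, host, pcp port, and PGDATA are
# hypothetical.
OLD_PRIMARY_ID=0                   # pgpool backend id of the node leaving
NEW_PRIMARY_HOST=10.0.0.2          # standby that pgpool will promote
PGDATA=/var/lib/postgresql/12/main

run() { echo "$@"; }               # swap for real execution when adapting

# 1. Detach the primary backend, forcing pgpool to fail over.
run pcp_detach_node -h localhost -p 9898 -U pgpool "$OLD_PRIMARY_ID"

# 2. pgpool promotes the selected standby via its failover_command.

# 3. On each remaining node: rewind against the new primary and restart
#    as a standby so streaming resumes (v12 uses standby.signal).
run pg_rewind --target-pgdata="$PGDATA" \
    --source-server="host=$NEW_PRIMARY_HOST user=postgres dbname=postgres"
run touch "$PGDATA/standby.signal"
run pg_ctl -D "$PGDATA" start

# 4. The old primary can now be shut down and rebooted.
```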

This works fine most of the time, but sometimes we see this message on one
of the nodes:
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
This message seems timing related, as the first node might report that,
while the second reports something like:
pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
pg_rewind: Done!

If we ignore the response from pg_rewind, streaming will break on the node
that reported no rewind was required. On the new primary, we do observe the
database moving from timeline 21 to 22, but it seems this takes some time
to materialize and become observable by pg_rewind. This window, in which the
new timeline exists but is not yet observed by pg_rewind, makes our failover
much less reliable. So, I've got 2 questions:

1. Is my observation about the starting of a new timeline correct?
2. If yes, is there anything we can do to block the promotion process
until the new timeline has fully materialized, either by waiting or,
preferably, by forcing the new timeline to be started?
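
For diagnosis, the timeline that pg_rewind compares appears to come from the latest checkpoint recorded in the control file, which would explain why the promotion is not immediately visible to it. These commands show that value on the new primary; they are printed as a dry run, and the data directory is a hypothetical placeholder:

```shell
# Dry run: print the commands that expose the checkpoint timeline that
# pg_rewind compares. PGDATA is a hypothetical data directory.
PGDATA=/var/lib/postgresql/12/main
echo pg_controldata "$PGDATA"     # see "Latest checkpoint's TimeLineID"
echo psql -Atc "SELECT timeline_id FROM pg_control_checkpoint();"
```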

Best regards,
Emond Papegaaij

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Emond Papegaaij (#1)
Re: pg_rewind after promote

On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:

 * we detach the primary database backend, forcing a failover
 * pgpool selects a new primary database and promotes it
 * the other 2 nodes (the old primary and the other standby) are rewound
and streaming is resumed from the new primary
 * the node that needed to be taken out of the cluster (the old primary)
is shutdown and rebooted

This works fine most of the time, but sometimes we see this message on one of the nodes:
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
This message seems timing related, as the first node might report that,
while the second reports something like:
pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
pg_rewind: Done!

If we ignore the response from pg_rewind, streaming will break on the node that reported
no rewind was required. On the new primary, we do observe the database moving from timeline
21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.

1. Is my observation about the starting of a new timeline correct?
2. If yes, is there anything we can do to block the promotion process until the new
timeline has fully materialized, either by waiting or, preferably, by forcing the new
timeline to be started?

This must be the problem addressed by commit 009eeee746 [1].

You'd have to upgrade to PostgreSQL v16, which would be a good idea anyway, given
that you are running v12.

A temporary workaround could be to explicitly trigger a checkpoint right after
promotion.

Yours,
Laurenz Albe

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=009eeee746825090ec7194321a3db4b298d6571e

#3Emond Papegaaij
emond.papegaaij@gmail.com
In reply to: Laurenz Albe (#2)
Re: pg_rewind after promote

On Thu, 28 Mar 2024 at 16:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:

This works fine most of the time, but sometimes we see this message on one of the nodes:
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
This message seems timing related, as the first node might report that,
while the second reports something like:
pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
pg_rewind: Done!

If we ignore the response from pg_rewind, streaming will break on the node that reported
no rewind was required. On the new primary, we do observe the database moving from timeline
21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.

This must be the problem addressed by commit 009eeee746 [1].

Thanks for the quick help!
This commit does seem to exactly address the problem we are seeing. Great
to hear it's fixed in the latest version!

You'd have to upgrade to PostgreSQL v16, which would be a good idea anyway, given
that you are running v12.

This is quite high on our roadmap. We were at v12 when we introduced our HA
setup. Before then, upgrading PostgreSQL was as simple as running
pg_upgrade, but now we need to deal with upgrading an entire cluster. We
are thinking about setting up logical replication to a single v16 node, and
resync the cluster from that node. We will make sure to upgrade before v12
is EOL (November this year).
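
Sketched out, that plan would look roughly like this. The SQL is printed as a dry run, and the host, database, and publication/subscription names are all hypothetical placeholders:

```shell
# Dry run of the v12 -> v16 logical-replication migration sketched above;
# prints the SQL for each side. All names are hypothetical.
OLD=v12-primary
NEW=v16-node
echo "-- on $OLD (current primary):"
echo "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"
echo "-- on $NEW, after loading the schema (pg_dump --schema-only):"
echo "CREATE SUBSCRIPTION upgrade_sub CONNECTION 'host=$OLD dbname=app' PUBLICATION upgrade_pub;"
```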

A temporary workaround could be to explicitly trigger a checkpoint right after
promotion.

Would this be as simple as sending a CHECKPOINT to the new primary just
after promoting? This would work fine for us until we've migrated to v16.

Best regards,
Emond Papegaaij

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Emond Papegaaij (#3)
Re: pg_rewind after promote

On Thu, 2024-03-28 at 17:17 +0100, Emond Papegaaij wrote:

Op do 28 mrt 2024 om 16:21 schreef Laurenz Albe <laurenz.albe@cybertec.at>:

On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:

pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required

If we ignore the response from pg_rewind, streaming will break on the node that reported
no rewind was required. On the new primary, we do observe the database moving from timeline
21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.

This must be the problem addressed by commit 009eeee746 [1]. 

A temporary workaround could be to explicitly trigger a checkpoint right after
promotion.

Would this be as simple as sending a CHECKPOINT to the new primary just after promoting?
This would work fine for us until we've migrated to v16.

Yes, that would be the idea.
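
In script form, the workaround would amount to something like the following dry run, which only prints the two commands; the data directory and psql connection options are hypothetical:

```shell
# Dry run of the workaround: promote, wait for promotion to finish, then
# force a checkpoint so the new timeline lands in the control file where
# pg_rewind looks. PGDATA and psql options are hypothetical.
PGDATA=/var/lib/postgresql/12/main
echo pg_ctl -D "$PGDATA" promote --wait
echo psql -U postgres -c "CHECKPOINT;"
```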

Yours,
Laurenz Albe