how reliable is pg_rewind?

Started by Curt Kolovson over 5 years ago | 5 messages | general
#1 Curt Kolovson
curt@kolovson.org

When trying to resync an old primary to become a new standby, I have found
that pg_rewind only works occasionally. How reliable/robust is pg_rewind,
and what are its limitations? We have observed that approx half our FPIs in
the WALs are due to XLOG/FPI_FOR_HINT. The only reason we've set
wal_log_hints=on is so that we can use pg_rewind. But if pg_rewind is
unreliable, we would rather turn off wal_log_hints. Any info on the
reliability of pg_rewind and its limitations would be appreciated.

Thanks, Curt

#2 Michael Paquier
michael@paquier.xyz
In reply to: Curt Kolovson (#1)
Re: how reliable is pg_rewind?

On Sat, Aug 01, 2020 at 10:35:37AM -0700, Curt Kolovson wrote:

> When trying to resync an old primary to become a new standby, I have found
> that pg_rewind only works occasionally. How reliable/robust is pg_rewind,
> and what are its limitations? We have observed that approx half our FPIs in
> the WALs are due to XLOG/FPI_FOR_HINT. The only reason we've set
> wal_log_hints=on is so that we can use pg_rewind. But if pg_rewind is
> unreliable, we would rather turn off wal_log_hints. Any info on the
> reliability of pg_rewind and its limitations would be appreciated.

FWIW, we have used it in production to accelerate the redeployment of
standbys in HA configurations for 4 years now in at least one product,
and it has been present upstream since 9.5, for 5 years now. So the
tool is rather baked at this stage of the game.
--
Michael

#3 Paul Förster
paul.foerster@gmail.com
In reply to: Michael Paquier (#2)
Re: how reliable is pg_rewind?

Hi Curt, hi Michael,

On 03. Aug, 2020, at 03:58, Michael Paquier <michael@paquier.xyz> wrote:

> On Sat, Aug 01, 2020 at 10:35:37AM -0700, Curt Kolovson wrote:
>
>> Any info on the reliability of pg_rewind and its limitations would be appreciated.
>
> FWIW, we have used it in production to accelerate the redeployment of
> standbys in HA configurations for 4 years now in at least one product,
> and it has been present upstream since 9.5, for 5 years now. So the
> tool is rather baked at this stage of the game.

Same here. We have used it with Patroni in failover cluster setups for about 2-3 years now. It has not failed us yet.

Cheers,
Paul

#4 Curt Kolovson
curt@kolovson.org
In reply to: Paul Förster (#3)
Re: how reliable is pg_rewind?

Thanks, Paul and Michael. I forgot to mention that we're using postgres
v10.12.

#5 Paul Förster
paul.foerster@gmail.com
In reply to: Curt Kolovson (#4)
Re: how reliable is pg_rewind?

Hi Curt,

On 03. Aug, 2020, at 08:25, Curt Kolovson <curt@kolovson.org> wrote:
> Thanks, Paul and Michael. I forgot to mention that we're using postgres v10.12.

11.6 and 12.3 here.

Also, please don't top-post, thanks.

Cheers,
Paul
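
For reference, the trade-off raised in #1 (turning off wal_log_hints to avoid FPI_FOR_HINT records) can be sketched as a small prerequisite check. pg_rewind needs the target cluster to have either wal_log_hints=on or data checksums enabled, and full_page_writes=on. The helper name and its arguments below are hypothetical; the settings dictionary is assumed to hold values as reported by SHOW on the target server, and this is only a sketch of the documented rule, not a check that pg_rewind itself exposes.

```python
from typing import Dict, List


def rewind_prerequisite_problems(settings: Dict[str, str],
                                 data_checksums: bool) -> List[str]:
    """Return the reasons a target cluster cannot be rewound (empty if none).

    `settings` is assumed to map parameter names to the strings that
    SHOW returns ("on"/"off"). `data_checksums` says whether the cluster
    was initialized with data checksums. Both inputs are illustrative.
    """
    problems: List[str] = []

    # wal_log_hints defaults to off in PostgreSQL.
    hints_on = settings.get("wal_log_hints", "off").lower() == "on"
    if not (hints_on or data_checksums):
        problems.append(
            "enable wal_log_hints or use a cluster with data checksums"
        )

    # full_page_writes defaults to on in PostgreSQL.
    if settings.get("full_page_writes", "on").lower() != "on":
        problems.append("full_page_writes must be on")

    return problems


if __name__ == "__main__":
    # Setup from #1 with wal_log_hints switched off and no checksums.
    for problem in rewind_prerequisite_problems({"wal_log_hints": "off"}, False):
        print(problem)
```

Under these assumptions, switching wal_log_hints off on a cluster without data checksums means pg_rewind can no longer be used on it, so the old primary would have to be rebuilt from a fresh base backup instead.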