[Proposal] pg_rewind integration into core
Hi,
It's possible to have a good number of standbys (in the context of async
streaming replication) as part of the client architecture. Rather than
asking the client to look into the intricacies of comparing the LSN of each
standby with that of the primary and running pg_rewind, isn't it a good
idea to integrate pg_rewind into the startup logic and perform
pg_rewind on an as-needed basis?
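To illustrate the manual comparison mentioned above, here is a minimal Python sketch. It assumes the LSNs have already been collected as pg_lsn text (two hexadecimal numbers separated by a slash); the function names and the node names in the usage note are hypothetical. A plain LSN comparison is only a first approximation, since pg_rewind itself finds the point of divergence from the timeline history.

```python
from typing import Dict, List


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a 64-bit integer."""
    hi, lo = text.strip().split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def standbys_ahead_of(promoted_lsn: str, standby_lsns: Dict[str, str]) -> List[str]:
    """Return the (hypothetical) names of standbys whose LSN is past promoted_lsn.

    These are the candidates that may need a rewind; the comparison
    alone does not prove that their history has actually diverged.
    """
    limit = parse_lsn(promoted_lsn)
    return sorted(
        name for name, lsn in standby_lsns.items() if parse_lsn(lsn) > limit
    )
```

For example, standbys_ahead_of("0/5000000", {"async1": "0/5000100", "async2": "0/4FFFFFF"}) flags only async1.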
Consider the scenarios where the primary is ahead of the sync standbys: upon
promotion of a standby, pg_rewind is needed on the old primary if it has to
come back up as a standby. Similarly, in scenarios where async standbys (in
a physical replication context) get ahead of the sync standbys, upon
promotion of a standby, pg_rewind needs to be performed on the async
standbys that are ahead of the sync standby being promoted.
With these scenarios under consideration, integrating pg_rewind into
postgres core might be a better option IMO. We could optionally choose to
have a pg_rewind dry run performed during standby startup and, based on
the need, perform the rewind and bring the standby in sync with the primary.
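As a rough picture of the "dry run first, then rewind" flow done from outside the server today, here is a Python sketch. The options --target-pgdata, --source-server and --dry-run are existing pg_rewind options; the function names, the injectable run parameter, and the paths and connection string in the usage note are assumptions made for illustration. The target server is expected to be stopped before any of this runs.

```python
import subprocess
from typing import Callable, List


def pg_rewind_cmd(target_pgdata: str, source_conninfo: str, dry_run: bool) -> List[str]:
    """Build a pg_rewind command line for the given target and source."""
    cmd = [
        "pg_rewind",
        f"--target-pgdata={target_pgdata}",
        f"--source-server={source_conninfo}",
    ]
    if dry_run:
        # With --dry-run, pg_rewind does everything except modify the target.
        cmd.append("--dry-run")
    return cmd


def rewind_with_dry_run(
    target_pgdata: str,
    source_conninfo: str,
    run: Callable[[List[str]], int] = subprocess.call,
) -> bool:
    """Run a dry run first; do the real rewind only if the dry run exits with 0.

    Returns True only when the real rewind ran and also exited with 0.
    """
    if run(pg_rewind_cmd(target_pgdata, source_conninfo, dry_run=True)) != 0:
        return False
    return run(pg_rewind_cmd(target_pgdata, source_conninfo, dry_run=False)) == 0
```

A call such as rewind_with_dry_run("/path/to/standby/data", "host=primary dbname=postgres") would invoke pg_rewind at most twice. An in-core version would have to make the same decision inside the startup sequence.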
I would like to invite more thoughts from the hackers.
Regards,
RKN
On Wed, Mar 23, 2022 at 05:13:47PM +0530, RKN Sai Krishna wrote:
> Consider the scenarios where the primary is ahead of the sync standbys: upon
> promotion of a standby, pg_rewind is needed on the old primary if it has to
> come back up as a standby. Similarly, in scenarios where async standbys (in
> a physical replication context) get ahead of the sync standbys, upon
> promotion of a standby, pg_rewind needs to be performed on the async
> standbys that are ahead of the sync standby being promoted.
> With these scenarios under consideration, integrating pg_rewind into
> postgres core might be a better option IMO. We could optionally choose to
> have a pg_rewind dry run performed during standby startup and, based on
> the need, perform the rewind and bring the standby in sync with the primary.
pg_rewind is already part of the core code as a binary tool, but what
you mean is to integrate a portion of it into the backend code, as part of
the startup sequence (with the node to rewind using primary_conninfo for
the source?). One thing that we would need to be careful about
is that none of the assumptions a rewind relies on are broken in any way
at the step where the rewind begins. One such assumption is that the
standby has completed crash recovery correctly, so you would need
to further complicate the startup sequence, which is already a
complicated and sensitive piece of logic, with more internal
dependencies between each piece. I am not really convinced that we
need to add more technical debt in this area, particularly now that
pg_rewind is able to enforce recovery on the target node so that it
has a clean state when the rewind begins, which keeps the assumptions
around crash recovery and rewind cleanly separated.
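To make the clean-state assumption concrete, here is a small Python sketch that checks it from the outside by reading the "Database cluster state" line of pg_controldata output. It assumes the output is in English and has already been captured as text; the function names are hypothetical, and this is only an illustration of the precondition, not how pg_rewind checks it internally.

```python
# Cluster states that indicate a clean shutdown, as printed by pg_controldata.
CLEAN_STATES = {"shut down", "shut down in recovery"}


def cluster_state(controldata_output: str) -> str:
    """Extract the value of the 'Database cluster state' line."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Database cluster state":
            return value.strip()
    raise ValueError("no 'Database cluster state' line found")


def cleanly_shut_down(controldata_output: str) -> bool:
    """Return True if the target looks cleanly shut down, i.e. safe to rewind."""
    return cluster_state(controldata_output) in CLEAN_STATES
```

A target reported as "in production" or "in crash recovery" fails this check, which is the case where recovery has to be completed before the rewind can begin.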
--
Michael