New recovery_target_timeline=primary option

Started by Efrain J. Berdecia, 4 months ago, 8 messages
#1 Efrain J. Berdecia
ejberdecia@yahoo.com

One-line Summary: This new recovery_target_timeline option would ensure that, when rebuilding a replica cluster, recovery stays on the primary cluster's timeline, making the rebuild foolproof and avoiding recovery timeline inconsistencies.

Business Use-case: Reduce human interaction when rebuilding replicas where unwanted timelines might have been archived in the repo and speed up recovery.

User impact with the change: New parameter option available 

Implementation details: I would need a subject matter expert to please make this feature a reality 

Estimated Development Time: unknown 

Category: Include the text: Restore, replication

Thanks in advance Efrain J Berdecia 

#2 Euler Taveira
euler@eulerto.com
In reply to: Efrain J. Berdecia (#1)
Re: New recovery_target_timeline=primary option

On Thu, Sep 11, 2025, at 9:17 PM, Efrain J. Berdecia wrote:

*One-line Summary:* This new recovery_target_timeline option would
ensure that when rebuilding a replica cluster, the recovery stays in
the primary cluster's timeline making it fool proof and avoiding
recovery timeline inconsistencies.

Do you understand what the timeline is for? [1] You are proposing to implement
exactly what it is protecting you from: overwriting previously archived WAL
after a recovery.

[1]: https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES

--
Euler Taveira
EDB https://www.enterprisedb.com/

#3 Efrain J. Berdecia
ejberdecia@yahoo.com
In reply to: Euler Taveira (#2)
Re: New recovery_target_timeline=primary option

This option would apply only when the standby.signal file is present, that is, when a cluster is being restored for the purpose of establishing a standby replica.

#4 David G. Johnston
david.g.johnston@gmail.com
In reply to: Efrain J. Berdecia (#1)
Re: New recovery_target_timeline=primary option

On Thursday, September 11, 2025, Efrain J. Berdecia <ejberdecia@yahoo.com>
wrote:

*One-line Summary:* This new recovery_target_timeline option would ensure
that when rebuilding a replica cluster, the recovery stays in the primary
cluster's timeline making it fool proof and avoiding recovery timeline
inconsistencies.

*Business Use-case:* Reduce human interaction when rebuilding replicas
where unwanted timelines might have been archived in the repo and speed up
recovery.

*User impact with the change: New parameter option available *

*Implementation details:* I would need a subject matter expert to please
make this feature a reality

*Estimated Development Time: unknown *

Category: Include the text: Restore, replication

Feature requests with this little info are probably better discussed on the
-general list to garner support for the idea.

David J.

#5 Efrain J. Berdecia
ejberdecia@yahoo.com
In reply to: Euler Taveira (#2)
Re: New recovery_target_timeline=primary option

The error I would like to address with this feature is the following:
FATAL: highest timeline xxx of the primary is behind timeline yyy
It occurs when the restored standby has, for some reason, applied WAL files that took it beyond the primary's current timeline.
It seems to me that Postgres already has more than enough logic to keep the restored standby's timeline in sync with the primary, but it chooses to raise a fatal error instead. This forces human intervention, because the exact timeline needed to match the primary has to be specified by hand. I think this could be covered by the proposed option.
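
For illustration, the condition behind this error can be sketched in Python. This is a simplified model and not PostgreSQL's actual code: the function name and arguments are hypothetical, and the message text follows the wording quoted above.

```python
def check_standby_can_follow(primary_highest_tli: int, standby_tli: int) -> None:
    """Simplified model of the check a standby makes when it connects.

    primary_highest_tli: highest timeline ID the primary reports (assumed input).
    standby_tli: timeline the standby has already recovered onto (assumed input).
    """
    # A standby that has replayed WAL from a timeline the primary never
    # reached cannot stream from that primary, so startup fails.
    if primary_highest_tli < standby_tli:
        raise RuntimeError(
            f"highest timeline {primary_highest_tli} of the primary "
            f"is behind timeline {standby_tli}"
        )


# A standby on the same or an older timeline than the primary passes the check.
check_standby_can_follow(primary_highest_tli=3, standby_tli=3)
check_standby_can_follow(primary_highest_tli=3, standby_tli=2)
```

Calling it with a primary on timeline 2 and a standby already on timeline 3 raises the error, which is the situation described in this message.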

#6 Euler Taveira
euler@eulerto.com
In reply to: Efrain J. Berdecia (#5)
Re: New recovery_target_timeline=primary option

On Thu, Sep 11, 2025, at 10:07 PM, Efrain J. Berdecia wrote:

The error I would like to address with this feature is the following:

FATAL: highest timeline xxx of the primary is behind timeline yyy

It seems your procedure to set up a standby is incorrect. See [1]. You are not
using the base backup from the primary server.

You didn't describe the whole procedure so it is hard to point out where the
problem is.

[1]: https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-SETUP

--
Euler Taveira
EDB https://www.enterprisedb.com/

#7 Efrain J. Berdecia
ejberdecia@yahoo.com
In reply to: David G. Johnston (#4)
Re: New recovery_target_timeline=primary option

A typical scenario is a high-availability setup with two replicated clusters, a primary and a standby, with Patroni managing automatic failover and a backup tool such as pgBackRest taking full backups and archiving the WAL files.
Suppose Patroni starts flapping between the clusters and promotes both clusters several times before finally settling on a primary whose timeline is older than the newest timeline in the pgBackRest repo. When we then try to reinit or restore the standby, it will by default attempt to recover to the latest timeline.
That leaves the admins to figure out the correct timeline to restore to, which at the end of the day needs to match the primary's timeline anyway, regardless of the latest timeline files in the pgBackRest repo.
It is either that, or the admins need to go into the archive repo and manually delete the WAL files of the timeline that doesn't match the primary to prevent conflicts.
This is a common scenario.
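
To make the mismatch concrete, here is a small Python sketch that finds the newest timeline in an archive listing and compares it with the primary's timeline. It relies on the PostgreSQL convention that timeline history files are named with the timeline ID as eight hexadecimal digits followed by ".history" (timeline 1 has no history file). The listing and the primary's timeline below are made-up inputs, and taking the maximum is a simplification of how recovery actually searches for the newest timeline.

```python
import re

# Timeline history files look like "00000003.history" (timeline ID in hex).
HISTORY_RE = re.compile(r"^([0-9A-F]{8})\.history$")


def archived_timelines(filenames):
    """Return the sorted timeline IDs that have a history file in the listing."""
    tlis = set()
    for name in filenames:
        match = HISTORY_RE.match(name)
        if match:
            tlis.add(int(match.group(1), 16))
    return sorted(tlis)


def newest_timeline(filenames, backup_tli):
    """Newest timeline visible to recovery: the backup's own or any archived one."""
    return max([backup_tli, *archived_timelines(filenames)])


# Hypothetical archive contents after several promotions.
listing = [
    "00000002.history",
    "00000003.history",
    "000000030000000000000010",  # a WAL segment, ignored by the pattern
]
primary_tli = 2  # assumed: the timeline the surviving primary is running on

target = newest_timeline(listing, backup_tli=1)
print(f"latest in archive: {target}, primary is on: {primary_tli}")
```

With these inputs the archive's newest timeline is 3 while the primary runs on 2, so a standby restored with the default setting would recover past the primary.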

#8 Efrain J. Berdecia
ejberdecia@yahoo.com
In reply to: Euler Taveira (#6)
Re: New recovery_target_timeline=primary option

Even the documentation states/warns:
"Set restore_command to a simple command to copy files from the WAL archive. If you plan to have multiple standby servers for high availability purposes, make sure that recovery_target_timeline is set to latest (the default), to make the standby server follow the timeline change that occurs at failover to another standby."
By default, recovery_target_timeline is set to latest. What I'm recommending is an option that makes recovery simply follow, or stay within, the primary's timeline, without hitting the fatal error mentioned earlier, which ends up stopping the recovery of the standby.
Suppose we have timelines 1-3 archived in our backup repo. Currently our streaming replication setup is running on timeline 3, but now we need to restore the primary to timeline 2. We can specify recovery_target_timeline=2 to initially restore the primary. But when I go to reinit or rebuild the standby, why not just add a new option, recovery_target_timeline=primary, that forces the standby to stay on the primary's timeline without anyone having to figure out the correct timeline for the standby?
Without this new parameter, or without specifying the timeline when restoring the standby, the restore will take the standby to timeline 3 and produce the fatal error message. This happens a lot on setups using tools like Patroni.
I'm just trying to make the lives of administrators and HA tools a little easier when setting up a standby.
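
The difference between the existing settings and the proposed one can be sketched in Python, using the numbers from the scenario above. The values "current", "latest", and a numeric timeline ID are existing recovery_target_timeline settings; "primary" is the option proposed in this thread and does not exist in PostgreSQL. The function and its arguments are hypothetical, and the sketch assumes the primary's timeline is known to the caller.

```python
def resolve_target_timeline(setting, backup_tli, archived_tlis, primary_tli):
    """Return the timeline recovery would aim for under a given setting.

    setting: value of recovery_target_timeline as a string.
    backup_tli: timeline the base backup was taken on.
    archived_tlis: timeline IDs found in the WAL archive.
    primary_tli: timeline the primary is running on (assumed to be known).
    """
    if setting == "current":
        # Stay on the timeline that was current when the backup was taken.
        return backup_tli
    if setting == "latest":
        # Follow the newest timeline found in the archive (the default).
        return max([backup_tli, *archived_tlis])
    if setting == "primary":
        # Proposed behaviour only: stay on whatever timeline the primary is on.
        return primary_tli
    # Otherwise the setting is an explicit timeline ID.
    return int(setting)


# Scenario from this message: timelines 1-3 archived, primary restored to 2.
args = dict(backup_tli=1, archived_tlis=[2, 3], primary_tli=2)
for setting in ("latest", "2", "primary"):
    print(setting, "->", resolve_target_timeline(setting, **args))
```

Under these assumptions "latest" resolves to timeline 3, while the explicit "2" and the proposed "primary" both resolve to timeline 2, which is the value the administrator currently has to work out by hand.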