BUG #14109: pg_rewind fails to update target control file in one scenario

Started by John Lumby almost 10 years ago, 18 messages, bugs
#1John Lumby
johnlumby@hotmail.com

The following bug has been logged on the website:

Bug reference: 14109
Logged by: John Lumby
Email address: johnlumby@hotmail.com
PostgreSQL version: 9.5.1
Operating system: linux 64-bit
Description:

scenario :
two systems currently in an operating streaming replication relationship :
Primary systemA Standby SystemB
with no WAL queued and no inserts/updates/deletes now being performed on
systemA

then in chronological sequence :
. shut down SystemA
. pg_ctl promote SystemB
and verify systemB is running correctly stand-alone
. pg_rewind SystemA
output is something like
connected to server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/0000000D.history", length 388
servers diverged at WAL position 9/A90002A8 on timeline 12
no rewind required

. set up correct recovery.conf on SystemA
. start SystemA postgres server

At this point, both systemB and systemA appear to be running correctly,
but any insert/update/delete now performed on systemB is not replicated to
systemA.
Also pg_stat_replication view on systemB shows state 'startup' , not
'streaming'

I believe there is a bug in pg_rewind for this scenario, where it finds that
the following conditions are true :
1 - source and target cluster are not on the same timeline
2 - the histories diverged exactly at the end of the
shutdown checkpoint record on the target,
so there are no WAL records in the target
that don't belong in the source's history

The code then concludes that no rewind is needed.

Which is true --
However, what I believe *is* needed is to update the target control file
with the new timeline and other information from the source.

This patch seems to fix the problem on my system :

--- src/bin/pg_rewind/pg_rewind.c.orig	2016-02-08 16:12:28.000000000 -0500
+++ src/bin/pg_rewind/pg_rewind.c	2016-04-24 14:50:52.646737233 -0400
@@ -247,7 +247,14 @@ main(int argc, char **argv)
 			 * needed.
 			 */
 			if (chkptendrec == divergerec)
+			{
 				rewind_needed = false;
+                /*  however we must still copy the control file from source to target
+                 *  because of the timeline change.
+                 */
+				printf(_("no rewind required but will update global control file from source for increase in timeline.\n"));
+			    goto updateControlFile;
+			}
 			else
 				rewind_needed = true;
 		}
@@ -318,6 +325,7 @@ main(int argc, char **argv)
 	pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
 	createBackupLabel(chkptredo, chkpttli, chkptrec);

+ updateControlFile:
/*
* Update control file of target. Make it ready to perform archive
* recovery when restarting.
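
The condition described above (condition 2, with the clusters already known to be on different timelines) can be sketched in isolation. This is only a minimal illustration, not pg_rewind's source: `parse_lsn` and `rewind_needed_after_divergence` are hypothetical names introduced here, and the `XLogRecPtr` typedef is a stand-in for PostgreSQL's 64-bit WAL position type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's 64-bit WAL position type (assumption). */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: convert the "hi/lo" text form printed by pg_rewind
 * (for example "9/A90002A8") into a single 64-bit position.
 */
static bool
parse_lsn(const char *text, XLogRecPtr *out)
{
	unsigned int hi;
	unsigned int lo;

	assert(text != NULL && out != NULL);
	if (sscanf(text, "%X/%X", &hi, &lo) != 2)
		return false;
	*out = ((XLogRecPtr) hi << 32) | lo;
	return true;
}

/*
 * Hypothetical restatement of the check quoted in the patch context.
 * Assumes source and target are on different timelines (condition 1).
 * If the histories diverged exactly at the end of the target's shutdown
 * checkpoint record, the target holds no WAL that is foreign to the
 * source's history, so nothing has to be rewound.
 */
static bool
rewind_needed_after_divergence(XLogRecPtr chkptendrec, XLogRecPtr divergerec)
{
	return chkptendrec != divergerec;
}
```

With the divergence point from the output above, passing the same position for both arguments models the reported scenario and yields "no rewind required"; any target WAL written past that point would make the two positions differ.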

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Michael Paquier
michael@paquier.xyz
In reply to: John Lumby (#1)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Mon, Apr 25, 2016 at 4:25 AM, <johnlumby@hotmail.com> wrote:

However, what I believe *is* needed is to update the target control file
with the new timeline and other information from the source.

No, this is incorrect. There is no need to update the control file of
a node that has not been rewound, and pg_rewind should not mess up
with that if there is no divergence point between the target and the
source nodes or it would update the minimum recovery point of a node
without real need to do so. It should be able to join back the cluster
depending on its initial shutdown state (when you shut down systemA).
What are the logs of your system A telling you regarding its startup
state?
--
Michael


#3John Lumby
johnlumby@hotmail.com
In reply to: Michael Paquier (#2)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

Thanks Michael,

After the pg_rewind in the scenario I described,

1) on System B (new Primary) I see

Sat Apr 23 14:19:18 EDT 2016

control file indicates
last check point WAL id : 0000000C00000009000000A3

 client_addr |         backend_start         |  state  | sent_location | write_location | flush_location | replay_location
-------------+-------------------------------+---------+---------------+----------------+----------------+-----------------
 10.19.0.1   | 2016-04-23 18:19:50.812509+00 | startup | 9/A30000D0    | 9/A30000D0     | 9/A30000D0     | 9/A30000D0

2) whereas on System A after pg_rewind  I see

Sat Apr 23 14:19:54 EDT 2016

control file indicates

last check point WAL id : 0000000B00000009000000A3

 pg_last_xlog_receive_location() , pg_last_xlog_replay_location() indicates

 pg_last_xlog_receive_location | pg_last_xlog_replay_location
-------------------------------+------------------------------
 9/A3000000                    | 9/A30000D0
(1 row)

Note the difference in timeline

and then,  as I described,   no WAL is replicated from B to A.

Did you try this scenario yourself?     I hope you agree it is a bug?
I will defer to you on what part of the code is the true cause,
but to me it looks very much as though pg_rewind ought to update the control file in this scenario.
That certainly does fix it.
If not that,   then what?

Cheers,   John
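
The timeline difference in the two "last check point WAL id" values above is carried by the first eight hex digits of the name. As a rough sketch of how such a name splits into fields (the helper `parse_wal_name` is hypothetical and not a PostgreSQL function; the field layout of timeline, high 32 bits of the WAL position, then segment number is assumed):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split a 24-character WAL segment file name into
 * three 8-hex-digit fields. Returns 1 on success, 0 on malformed input.
 */
static int
parse_wal_name(const char *name, uint32_t *tli, uint32_t *xlogid,
			   uint32_t *segno)
{
	unsigned int t;
	unsigned int x;
	unsigned int s;

	assert(name != NULL);
	if (strlen(name) != 24)
		return 0;
	/* Each %8X consumes at most eight hex digits. */
	if (sscanf(name, "%8X%8X%8X", &t, &x, &s) != 3)
		return 0;
	*tli = t;
	*xlogid = x;
	*segno = s;
	return 1;
}
```

Applied to the values in this message, `0000000C00000009000000A3` (System B) gives timeline 12 and `0000000B00000009000000A3` (System A) gives timeline 11, with the remaining two fields identical.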


#4Michael Paquier
michael@paquier.xyz
In reply to: John Lumby (#3)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:

Thanks Michael,

After the pg_rewind in the scenario I described,

1) on System B (new Primary) I see

Sat Apr 23 14:19:18 EDT 2016

control file indicates
last check point WAL id : 0000000C00000009000000A3

client_addr | backend_start | state | sent_location | write_location | flush_location | replay_location
-------------+-------------------------------+---------+---------------+----------------+----------------+-----------------
10.19.0.1 | 2016-04-23 18:19:50.812509+00 | startup | 9/A30000D0 | 9/A30000D0 | 9/A30000D0 | 9/A30000D0

2) whereas on System A after pg_rewind I see

(pg_rewind is a no-op here). It has done nothing to the source node.
When you ran it, it was clearly mentioned that "no rewind is needed".

Sat Apr 23 14:19:54 EDT 2016

control file indicates
last check point WAL id : 0000000B00000009000000A3

pg_last_xlog_receive_location() , pg_last_xlog_replay_location() indicates

pg_last_xlog_receive_location | pg_last_xlog_replay_location
-------------------------------+------------------------------
9/A3000000 | 9/A30000D0
(1 row)

Note the difference in timeline

Yes, and? System A is still on its previous timeline 11, and will jump
to timeline 12 once it has connected back. That's possible since 9.3.

and then, as I described, no WAL is replicated from B to A.
Did you try this scenario yourself?

Yes.

I hope you agree it is a bug?

No. In this case pg_rewind is a no-op: system A was shut down *before*
B was promoted, so it knows about the shutdown checkpoint record of
system A. No rewind would be needed here. One potential issue with
repetitive rewinds in such configurations is that after promotion of
system B the control file information is not up to date to the new
timeline, and pg_rewind runs and fetches the control data file of the
source node which still has the outdated timeline information. You may
want to issue a checkpoint on the source node after its promotion to
ensure that its control file is in correct shape, and pointing to the
latest timeline.

Again there is no bug here.
--
Michael


#5John Lumby
johnlumby@hotmail.com
In reply to: Michael Paquier (#4)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

----------------------------------------

Date: Mon, 25 Apr 2016 23:03:23 +0900
Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: pgsql-bugs@postgresql.org

On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:

Note the difference in timeline

Yes, and? System A is still on its previous timeline 11, and will jump
to timeline 12 once it has connected back. That's possible since 9.3.

Answering your "and?" : In my testing, 
        "and no WAL replicated from B, new Primary , to A , new Standby".
That is why I concluded there is a bug.
Also on System A (new Standby) timeline did not jump up to 12,  it remained at 11.
Are you indicating that the admin (me) needs to do something to make that happen?
If so what?

and then, as I described, no WAL is replicated from B to A.
Did you try this scenario yourself?

Yes.

And did you perform updates on System B (new Primary)
and then observe them replicated to System A?
I consistently see that *not* happening.

I hope you agree it is a bug?

No. In this case pg_rewind is a no-op: system A was shut down *before*
B was promoted, so it knows about the shutdown checkpoint record of
system A. No rewind would be needed here. One potential issue with

Yes,  no rewind is needed,  I agree.   But,  as you stated earlier,
the timeline on System A needs to be incremented.
So the question is,  what is supposed to make that happen?

Again there is no bug here.
--
Michael


#6Julien Rouhaud
rjuju123@gmail.com
In reply to: John Lumby (#5)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On 25/04/2016 16:36, John Lumby wrote:

From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: pgsql-bugs@postgresql.org

On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:

Note the difference in timeline

Yes, and? System A is still on its previous timeline 11, and will jump
to timeline 12 once it has connected back. That's possible since 9.3.

Answering your "and?" : In my testing,
"and no WAL replicated from B, new Primary , to A , new Standby".
That is why I concluded there is a bug.
Also on System A (new Standby) timeline did not jump up to 12, it remained at 11.
Are you indicating that the admin (me) needs to do something to make that happen?
If so what?

Did you set the recovery_target_timeline parameter to "latest" in the
recovery.conf file? (see
http://www.postgresql.org/docs/current/static/recovery-target-settings.html).

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


#7John Lumby
johnlumby@hotmail.com
In reply to: Julien Rouhaud (#6)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

----------------------------------------

Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
To: johnlumby@hotmail.com; michael.paquier@gmail.com
CC: pgsql-bugs@postgresql.org
From: julien.rouhaud@dalibo.com
Date: Mon, 25 Apr 2016 16:53:19 +0200

On 25/04/2016 16:36, John Lumby wrote:

From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: pgsql-bugs@postgresql.org

On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:

Note the difference in timeline

Did you set the recovery_target_timeline parameter to "latest" in the
recovery.conf file? (see
http://www.postgresql.org/docs/current/static/recovery-target-settings.html).

Thanks Julien,  no  I did not,     I will re-test with that later,  on 9.5.2
Meanwhile one question on that.   The documentation states
   "Setting this to latest recovers
to the latest timeline found in the archive, which ..."
However I am running my systems with
archive_mode = off   
I am wondering whether
recovery_target_timeline parameter = "latest"
is expected to work reliably  to fix my particular problem when archiving is not in effect.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


#8Guillaume Lelarge
guillaume@lelarge.info
In reply to: John Lumby (#7)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

2016-04-25 17:45 GMT+02:00 John Lumby <johnlumby@hotmail.com>:

----------------------------------------

Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
To: johnlumby@hotmail.com; michael.paquier@gmail.com
CC: pgsql-bugs@postgresql.org
From: julien.rouhaud@dalibo.com
Date: Mon, 25 Apr 2016 16:53:19 +0200

On 25/04/2016 16:36, John Lumby wrote:

From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: pgsql-bugs@postgresql.org

On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:

Note the difference in timeline

Did you set the recovery_target_timeline parameter to "latest" in the
recovery.conf file? (see
http://www.postgresql.org/docs/current/static/recovery-target-settings.html).

Thanks Julien, no I did not, I will re-test with that later, on 9.5.2
Meanwhile one question on that. The documentation states
"Setting this to latest recovers
to the latest timeline found in the archive, which ..."
However I am running my systems with
archive_mode = off
I am wondering whether
recovery_target_timeline parameter = "latest"
is expected to work reliably to fix my particular problem when archiving is not in effect.

It also works with the streaming replication since at least 9.3. So, yes,
it should work.

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

#9Julien Rouhaud
rjuju123@gmail.com
In reply to: John Lumby (#7)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On 25/04/2016 17:45, John Lumby wrote:

From: julien.rouhaud@dalibo.com

Did you set the recovery_target_timeline parameter to "latest" in the
recovery.conf file? (see
http://www.postgresql.org/docs/current/static/recovery-target-settings.html).

Thanks Julien, no I did not, I will re-test with that later, on 9.5.2
Meanwhile one question on that. The documentation states
"Setting this to latest recovers
to the latest timeline found in the archive, which ..."
However I am running my systems with
archive_mode = off
I am wondering whether
recovery_target_timeline parameter = "latest"
is expected to work reliably to fix my particular problem when archiving is not in effect.

I'm not sure where it's documented, but the timeline change works with
log shipping, and since 9.3 with streaming replication.

That's why Michael previously said: "System A is still on its previous
timeline 11, and will jump
to timeline 12 once it has connected back. That's possible since 9.3."

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


#10John Lumby
johnlumby@hotmail.com
In reply to: Julien Rouhaud (#9)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

I've tested same scenario but with the setting

recovery_target_timeline = 'latest'

and on 9.5.2,

and now the new Standby receives new WAL correctly after the pg_rewind and restart

So,  assuming this is reliable (will work without requiring archiving)
then my problem is solved.

Thanks for everyone's help,  please close the bug as user error

Cheers,  John


#11Michael Paquier
michael@paquier.xyz
In reply to: John Lumby (#10)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Tue, Apr 26, 2016 at 7:15 AM, John Lumby <johnlumby@hotmail.com> wrote:

So, assuming this is reliable (will work without requiring archiving)
then my problem is solved.

Depending on the checkpoint frequency and the activity on your
systems, you may face problems with missing WAL segments at some point
because past WAL segments need to be recycled or removed by the server
to move on with its life. One way to take care of this class of
problems is to use wal_keep_segments. An even better one is called
replication slot. This solely depends on how your system is working,
so perhaps you will not need some extra configuration.
--
Michael


#12John Lumby
johnlumby@hotmail.com
In reply to: Michael Paquier (#11)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

Thanks Michael,

----------------------------------------

Date: Tue, 26 Apr 2016 08:04:58 +0900
Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: julien.rouhaud@dalibo.com; pgsql-bugs@postgresql.org

On Tue, Apr 26, 2016 at 7:15 AM, John Lumby <johnlumby@hotmail.com> wrote:

So, assuming this is reliable (will work without requiring archiving)
then my problem is solved.

Depending on the checkpoint frequency and the activity on your
systems, you may face problems with missing WAL segments at some point
because past WAL segments need to be recycled or removed by the server
to move on with its life.

Yes,  I fear I could be caught out by that --
in fact that is why I now always "stabilize" the replication by halting ins/upd/del activity
and then shut the current Primary down first before promoting current Standby.
I *think* that then should guarantee there cannot be any missing WAL segments
when I then rewind the old Primary to become new Standby.

One way to take care of this class of
problems is to use wal_keep_segments. An even better one is called
replication slot.

Regarding replication slots  --   Actually I do use them (I think it is unsafe to run
streaming replication without either archiving or a replication slot)
but even that would still not guarantee success
if I did not take the precaution of shutting down current primary first before flip.

And  ..   we discussed this very point in pgsql-general just a month ago  --

/messages/by-id/COL131-W804D45E77B0D0FB1EF08B1A3890@phx.gbl

I did not get any answer to my suggestion in that post but I think it might be useful.

This solely depends on how your system is working,
so perhaps you will not need some extra configuration.
--
Michael

I think there needs to be some clear instructions on exactly what configuration is needed
to be able to run streaming replication and always be able to flip
Standby->Primary   ,  <some actions>   ,   Primary-> Standby

and in those posts in pgsql-general I wrote a suggested addition to the wiki page
but was unable to edit it myself.

Cheers,    John


#13Michael Paquier
michael@paquier.xyz
In reply to: John Lumby (#12)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Tue, Apr 26, 2016 at 10:37 PM, John Lumby <johnlumby@hotmail.com> wrote:

I wrote:

One way to take care of this class of
problems is to use wal_keep_segments. An even better one is called
replication slot.

Regarding replication slots -- Actually I do use them (I think it is unsafe to run
streaming replication without either archiving or a replication slot)
but even that would still not guarantee success
if I did not take the precaution of shutting down current primary first before flip.

And .. we discussed this very point in pgsql-general just a month ago --

/messages/by-id/COL131-W804D45E77B0D0FB1EF08B1A3890@phx.gbl

My memory is so short-lived lately... I did not recall that :)

I did not get any answer to my suggestion in that post but I think it might be useful.

Replication slots are perfectly able to retain WAL segments from a
prior timeline, so I am not sure that this would be much of a gain. And
as they can be used on standbys as well, you could create/drop slots on
them at regular intervals. Or more simply use a WAL archive.
--
Michael


#14Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#13)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Wed, Apr 27, 2016 at 9:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Apr 26, 2016 at 10:37 PM, John Lumby <johnlumby@hotmail.com> wrote:

I did not get any answer to my suggestion in that post but I think it might be useful.

Replication slots are perfectly able to retain WAL segments from a
prior timeline, so I am not sure that this would be much a gain.

Or to put it in other words, a slot that activates itself automatically
at promotion to retain WAL from the previous timeline would be
interesting for pg_rewind, but the only use case we have here is
pg_rewind, so it seems a bit narrow to bother modifying the core code
to add this support in the replication slot code. And this can be
solved lightly with an archive. Thoughts from others are welcome.
--
Michael


#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#14)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

Michael Paquier wrote:

On Wed, Apr 27, 2016 at 9:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Replication slots are perfectly able to retain WAL segments from a
prior timeline, so I am not sure that this would be much a gain.

Or to put it in other words, a slot that activates itself automatically
at promotion to retain WAL from the previous timeline would be
interesting for pg_rewind, but the only use case we have here is
pg_rewind, so it seems a bit narrow to bother modifying the core code
to add this support in the replication slot code. And this can be
solved lightly with an archive. Thoughts from others are welcome.

Sounds to me like it should be enough to document (in pg_rewind's page)
that a slot should be activated prior to the promotion. Maybe we could have
"pg_ctl promote" have an option to create a slot, to make this simpler?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#16Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#15)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Thu, Apr 28, 2016 at 8:28 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Michael Paquier wrote:

On Wed, Apr 27, 2016 at 9:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Replication slots are perfectly able to retain WAL segments from a
prior timeline, so I am not sure that this would be much a gain.

Or to put it in other words, a slot that activates itself automatically
at promotion to retain WAL from the previous timeline would be
interesting for pg_rewind, but the only use case we have here is
pg_rewind, so it seems a bit narrow to bother modifying the core code
to add this support in the replication slot code. And this can be
solved lightly with an archive. Thoughts from others are welcome.

Sounds to me like it should be enough to document (in pg_rewind's page)
that a slot should be activated prior to the promotion. Maybe we could have
"pg_ctl promote" have an option to create a slot, to make this simpler?

That would be actually creating a slot before the last common
checkpoint before promotion, which is a bit more tricky to evaluate.
With luck, perhaps both of them would be on the same segment, but
there is no way to be sure about that... It seems to me that what we
are looking for here is a way to tell through
pg_create_physical_replication_slot to reserve WAL from a position
that caller has decided.
--
Michael


#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#16)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

Michael Paquier wrote:

On Thu, Apr 28, 2016 at 8:28 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Sounds to me like it should be enough to document (in pg_rewind's page)
that a slot should be activated prior to the promotion. Maybe we could have
"pg_ctl promote" have an option to create a slot, to make this simpler?

That would be actually creating a slot before the last common
checkpoint before promotion, which is a bit more tricky to evaluate.
With luck, perhaps both of them would be on the same segment, but
there is no way to be sure about that... It seems to me that what we
are looking for here is a way to tell through
pg_create_physical_replication_slot to reserve WAL from a position
that caller has decided.

Well, that sounds like just a SMOP, doesn't it?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#18Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#17)
Re: BUG #14109: pg_rewind fails to update target control file in one scenario

On Fri, Apr 29, 2016 at 8:42 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Well, that sounds like just a SMOP, doesn't it?

Oh, actually, looking at ReplicationSlotReserveWal(), what is used as
a restart LSN is the last redo position. I hadn't noticed that until
now. So indeed if you create a slot just before the promotion that
would actually be fine.
--
Michael
