PG upgrade with hard links: when to start/stop master and replicas
Hello everyone!
We are about to upgrade a 6-instance cluster from pg92 to pg10 using pg_upgrade with hard links and rsync.
Our preliminary tests have been working really well so far, but one question has popped up that we feel is critical because it has an important impact on our failover plan.
After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
```
You will not be running pg_upgrade on the standby servers, but rather rsync on the
primary. Do not start any servers yet.
```
I'm understanding the `any` as both the primary and the standbys.
On the other hand, we've been doing tests that start the primary instance as soon as pg_upgrade is done. These tests have worked perfectly fine so far. We make the rsync call with the primary instance running, and the standby can start later on after rsync is done and we copy the new configuration files.
If what we are doing is wrong and we need to run `rsync` before starting the primary instance, that would mean that neither the primary nor the standby is usable if pg10 doesn't start correctly on the primary, right?
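For reference, the sequence we have been testing looks roughly like this (paths and host names are placeholders, and the rsync invocation is modeled on the one in the pg_upgrade docs, not our exact scripts):

```shell
# Sketch of the sequence we've been testing -- all paths and host
# names below are placeholders for our real setup.
pg_ctl -D /var/lib/pgsql/9.2/data stop -m fast             # stop old primary
ssh standby pg_ctl -D /var/lib/pgsql/9.2/data stop -m fast # stop old standby

pg_upgrade --link \
  --old-datadir=/var/lib/pgsql/9.2/data \
  --new-datadir=/var/lib/pgsql/10/data \
  --old-bindir=/usr/pgsql-9.2/bin \
  --new-bindir=/usr/pgsql-10/bin

pg_ctl -D /var/lib/pgsql/10/data start    # we start the primary here...

# ...and only then rsync to the standby (invocation modeled on the
# pg_upgrade docs; run from the directory above the data directories):
rsync --archive --delete --hard-links --size-only --no-inc-recursive \
  /var/lib/pgsql/9.2/data /var/lib/pgsql/10/data \
  standby:/var/lib/pgsql
```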
I hope my question is clear enough!
Best,
Martín
Martín Fernández wrote:
After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
```
You will not be running pg_upgrade on the standby servers, but rather rsync on the
primary. Do not start any servers yet.
```
Immediately following, you can read:
```
If you did not use link mode, do not have or do not want to use rsync, or want an easier
solution, skip the instructions in this section and simply recreate the standby servers
once pg_upgrade completes and the new primary is running.
```
So this is not compulsory; it's just an efficient method to quickly get the standby
server updated.
There is nothing wrong with rebuilding the standby later.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Hello Martín,
Well, if you follow the sequence on the PostgreSQL documentation page
https://www.postgresql.org/docs/10/pgupgrade.html
Usage
(...)
7. Stop both servers
(...)
10. Upgrade Streaming Replication and Log-Shipping standby servers
(...)
12. Start the new server
*The new server can now be safely started, and then any rsync'ed standby
servers.*
I would think you have to wait for all the rsyncs of the replicas to finish
before starting the master... But I have an additional question as well: if
I have 2 or more replicas, could you first synchronize the master with one
replica (rsync), start the master, and then use that synchronized replica
to synchronize the rest of the replicas while the master is already up and
running?
--
Cordially,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Laurenz and Hellmuth, replying to both of you!
Thanks for the quick replies BTW!
Martín
On Mon, Feb 18, 2019 at 5:32 PM Hellmuth Vargas <hivs77@gmail.com> wrote:
I would think you have to wait for all the rsyncs of the replicas to finish
before starting the master... But I have an additional question as well: if
I have 2 or more replicas, could you first synchronize the master with one
replica (rsync), start the master, and then use that synchronized replica
to synchronize the rest of the replicas while the master is already up and
running?
I asked myself that question as well. I'm basically trying to minimize the
number of standbys that are affected by the upgrade. The documentation
states that you can do something similar to what you describe; the only
thing you need to make sure of is not to start the standby if you are going
to use it as an rsync source for another standby.
Would that make sense?
On Mon, Feb 18, 2019 at 3:21 PM, Laurenz Albe (laurenz.albe@cybertec.at) wrote:
If you did not use link mode, do not have or do not want to use rsync, or
want an easier solution, skip the instructions in this section and simply
recreate the standby servers once pg_upgrade completes and the new primary
is running.
So this is not compulsory, it's just an efficient method to quickly get the
standby server updated. There is nothing wrong with rebuilding the standby
later.
I think that by `recreate` here we are talking about pg_basebackup, right?
That won't be acceptable for us because rebuilding our database that way
would take days. We need to use rsync and leverage the hard links.
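(Side note on why the hard links make this viable for us: after pg_upgrade --link, each old relation file and its pg10 counterpart are two names for the same inode, so `rsync --hard-links` only has to recreate links on the standby instead of re-sending the data. A toy illustration with plain files, nothing cluster-specific:)

```shell
# Plain-file illustration of what pg_upgrade --link does per relation
# file; uses GNU stat, so Linux is assumed.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"
echo "relation data" > "$tmp/old/16384"
ln "$tmp/old/16384" "$tmp/new/16384"   # second name, same inode, no data copied

old_inode=$(stat -c %i "$tmp/old/16384")
new_inode=$(stat -c %i "$tmp/new/16384")
link_count=$(stat -c %h "$tmp/old/16384")
echo "inodes: $old_inode = $new_inode, link count: $link_count"
rm -rf "$tmp"
```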
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
```
You will not be running pg_upgrade on the standby servers, but rather rsync on the
primary. Do not start any servers yet.
```
I'm understanding the `any` as primary and standbys.
Yes, that's correct, you shouldn't start up anything yet.
On the other hand, we've been doing tests that start the primary instance as soon as pg_upgrade is done. These tests have worked perfectly fine so far. We make the rsync call with the primary instance running, and the standby can start later on after rsync is done and we copy the new configuration files.
This is like taking an online backup of the primary without actually
doing pg_start_backup / pg_stop_backup and following the protocol for
that, meaning that the replica will start up without a backup_label and
will think it's at whatever point in the WAL stream the pg_control
file says it's at as of whenever the rsync copies that file.
That is NOT SAFE and it's a sure way to end up with corruption.
The rsync while everything is down should be pretty fast, unless you
have unlogged tables that are big (in which case, you should truncate
them before shutting down the primary) or temporary tables left around
(which you should clean up) or just generally other things that a
replica doesn't normally have.
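A sketch of that pre-shutdown cleanup (database name, table name, and path below are made up for illustration):

```shell
# Hypothetical cleanup right before the final shutdown, so the offline
# rsync has less to transfer -- all names below are placeholders.
psql -d appdb -c 'TRUNCATE bulk_unlogged_events;'   # large unlogged table
pg_ctl -D /var/lib/pgsql/9.2/data stop -m fast      # clean shutdown, then rsync
```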
If you can't have any downtime during this process then, imv, the answer
is to build out a new replica that will essentially be a 'throw-away',
move all the read load over to it and then go through the documented
pg_upgrade process with the primary and the other replicas, then flip
the traffic back to the primary + original replicas and then you can
either throw away the replica that was kept online or rebuild it using
the traditional methods of pg_basebackup (or for a larger system, you
could use pgbackrest which can run in parallel and is much, much faster
than pg_basebackup).
If what we are doing is wrong and we need to run `rsync` before starting the primary instance, that would mean that neither the primary nor the standby is usable if pg10 doesn't start correctly on the primary, right?
This is another reason why it's good to have an independent replica, as
it can be a fail-safe if things go completely south (you can just
promote it and have it be the primary and then rebuild replicas using
the regular backup+restore method and figure out what went wrong with
the pg10 migration).
Thanks!
Stephen
Hi
But could you do the following procedure?:
pg_upgrade the master
rsync with a hot standby
start the master
start the hot standby
stop the hot standby and rsync the other hot standby from the migrated hot
standby?
On Tue, Feb 19, 2019 at 6:12 AM, Stephen Frost (sfrost@snowman.net)
wrote:
--
Cordially,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Greetings,
* Hellmuth Vargas (hivs77@gmail.com) wrote:
But could you do the following procedure?:
pg_upgrade of the master
rsync with a hot standby
The above should be alright provided both the primary and the standby
are down and the instructions in the pg_upgrade docs are followed.
start the master
start the hot standby
So, start both the primary and the replica? That part should be fine by
itself.
stop hot standby and rsync the other hot standby with the migrated hot
standby?
At some later point, shut down the replica completely, then do an rsync
from that replica to the other replica and build its hard-link tree that
way, and update anything that's changed while the 'migrated' replica was
online? I don't see any obvious issue with that as the result should
mean that the two replicas are identical from PG's perspective from that
point moving forward.
Ultimately, it really depends on your specific environment though, of
course. It also might not be a bad idea to do a regular backup of the
upgraded primary and then restore that to the second replica, just to
make sure you have that whole process working and to test out your
restore process.
Thanks!
Stephen
Thank you Stephen 👍
On Tue, Feb 19, 2019 at 10:05 AM, Stephen Frost (sfrost@snowman.net)
wrote:
--
Cordially,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Stephen,
Thanks for the information! I've refactored our migration scripts to follow the suggestions.
One extra question popped up. As long as we don't start the standby (after running rsync), we can always `rm -rf $PGDATA_10` and promote the standby if necessary for failover, right? We would also need to `mv` pg_control.old back to pg_control in the old data directory.
Thanks!
Martín
On Tue, Feb 19th, 2019 at 12:39 PM, Hellmuth Vargas <hivs77@gmail.com> wrote:
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
Thanks for information! I've refactor our migration scripts to follow the suggestions.
Please don't top-post on these mailing lists.
One extra question popped up. As long as we don't start the standby (after running rsync), we can always `rm -rf $PGDATA_10` and promote the standby if necessary for failover, right? We would also need to `mv` pg_control.old back to pg_control in the old data directory.
Not sure which standby we're talking about here, but in general, yes, as
long as you haven't actually started the system after the
pg_upgrade/rsync, you should be able to blow away the new cluster that
pg_upgrade/rsync created and start the old cluster back up and promote
it (if necessary) and use it.
Note that you should *not* need to do anything with pg_control, I have
no idea what you're referring to there, but the old cluster should have
the pg_control file and all the catalog tables in place from before the
pg_upgrade/rsync (those aren't touched during the pg_upgrade/rsync
process) and you would just need to start up the old binaries pointing
at the old PG data directory and everything should just work.
Thanks!
Stephen
Stephen,
@bilby91
I did some successful tests yesterday around this scenario. The standby in
this context is the one that received the rsync from the master but was
never started. The old data directory stays intact except for the fact that
global/pg_control was renamed with a .old suffix.
I have found the documentation on pg_upgrade that states this:
```
If you ran pg_upgrade without --link or did not start the new server, the
old cluster was not modified except that, if linking started, a .old suffix
was appended to $PGDATA/global/pg_control. To reuse the old cluster,
possibly remove the .old suffix from $PGDATA/global/pg_control; you can
then restart the old cluster.
```
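The rollback step we scripted from that paragraph, simulated below on a throwaway directory rather than a real cluster (a real rollback would also remove the new pg10 data directory and restart the old binaries):

```shell
# Simulated on a scratch directory -- $PGDATA here is NOT a real cluster.
set -e
PGDATA=$(mktemp -d)
mkdir -p "$PGDATA/global"
printf 'control data' > "$PGDATA/global/pg_control.old"  # as left by pg_upgrade --link

# To reuse the old cluster, put pg_control back in place:
mv "$PGDATA/global/pg_control.old" "$PGDATA/global/pg_control"

restored=$([ -f "$PGDATA/global/pg_control" ] && echo yes)
leftover=$([ -e "$PGDATA/global/pg_control.old" ] || echo gone)
rm -rf "$PGDATA"
echo "pg_control restored: $restored"
```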
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
I have found the documentation on pg_upgrade that states this:
If you ran pg_upgrade without --link or did not start the new server, the
old cluster was not modified except that, if linking started, a .old suffix
was appended to $PGDATA/global/pg_control. To reuse the old cluster,
possibly remove the .old suffix from $PGDATA/global/pg_control; you can
then restart the old cluster.
Ah, right, I forgot that it did that, fair enough.
I've never been thrilled with that particular approach due to the
inherent risks of people messing directly with files like pg_control,
but that's how it is for now.
Thanks!
Stephen
On Tue, Feb 19, 2019 at 12:25:24PM -0500, Stephen Frost wrote:
Ah, right, I forgot that it did that, fair enough.
I've never been thrilled with that particular approach due to the
inherent risks of people messing directly with files like pg_control,
but that's how it is for now.
There was too much concern that users would accidentally start the old
server at some later point, and its files would be hard linked to the
new live server, leading to disaster.
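(A plain-file demonstration of that failure mode: with the two clusters sharing inodes via hard links, a write through the old cluster's file name silently changes what the new cluster reads. Nothing below touches a real cluster.)

```shell
# Toy demo of the shared-inode hazard after a hard-link upgrade.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"
echo "page v1" > "$tmp/old/16384"
ln "$tmp/old/16384" "$tmp/new/16384"    # new cluster's file is the same inode
echo "page v2 from the old server" > "$tmp/old/16384"  # the "old server" writes
seen_by_new=$(cat "$tmp/new/16384")     # the "new" cluster sees the overwrite
echo "new cluster reads: $seen_by_new"
rm -rf "$tmp"
```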
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
There was too much concern that users would accidentally start the old
server at some later point, and its files would be hard linked to the
new live server, leading to disaster.
Sure, I understand that concern; I just wish there was a better approach
we could use for "DO NOT START THIS SERVER" than moving the
pg_control file.
Thanks!
Stephen
On Thu, Feb 21, 2019 at 09:31:32PM -0500, Stephen Frost wrote:
Sure, I understand that concern, just wish there was a better approach
we could use for "DO NOT START THIS SERVER" rather than moving of the
pg_control file.
As ugly as it is, I have never heard of a better solution.
Bruce Momjian <bruce@momjian.us> writes:
As ugly as it is, I have never heard of a better solution.
system("rm -rf $OLDPGDATA") ... nah, that is not a better idea.
regards, tom lane
On Fri, Feb 22, 2019 at 2:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
There was too much concern that users would accidentally start the old
server at some later point, and its files would be hard linked to the
new live server, leading to disaster.
I think this is a great solution. Knowing that neither a human nor a
supervisor can mess up the hard links is something I really appreciate.