Improve documentation for pg_upgrade, standbys and rsync

Started by Laurenz Albeover 4 years ago13 messages
#1Laurenz Albe
laurenz.albe@cybertec.at
1 attachment(s)

I revently tried to upgrade a standby following the documentation,
but I found it hard to understand, and it took me several tries to
get it right. This is of course owing to my lack of expertise with
rsync, but I think the documentation and examples could be clearer.

I think it would be a good idea to recommend the --relative option
of rsync.

Here is a patch that does that, as well as update the versions in
the code samples to something more recent. Also, I think it makes
sense to place the data directory in the sample in /var/lib/postgresql,
which is similar to what many people will have in real life.

Yours,
Laurenz Albe

Attachments:

0001-Improve-doc-for-pg_upgrade-and-standby-servers.patchtext/x-patch; charset=UTF-8; name=0001-Improve-doc-for-pg_upgrade-and-standby-servers.patchDownload
From 0ce2de70811ced07eb75215197cbb83cd7a7d2b9 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Tue, 18 May 2021 19:42:55 +0200
Subject: [PATCH] Improve doc for pg_upgrade and standby servers

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".
---
 doc/src/sgml/ref/pgupgrade.sgml | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..f3d6df8877 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -528,26 +528,26 @@ pg_upgrade.exe
 
       <para>
        When using link mode, standby servers can be quickly upgraded using
-       <application>rsync</application>.  To accomplish this, from a directory on
+       <application>rsync</application>.  To accomplish this, change into a directory on
        the primary server that is above the old and new database cluster
-       directories, run this on the <emphasis>primary</emphasis> for each standby
+       directories and run this on the <emphasis>primary</emphasis> for each standby
        server:
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive old_cluster new_cluster remote_dir
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative old_cluster new_cluster remote_dir
 </programlisting>
 
        where <option>old_cluster</option> and <option>new_cluster</option> are relative
        to the current directory on the primary, and <option>remote_dir</option>
-       is <emphasis>above</emphasis> the old and new cluster directories
-       on the standby.  The directory structure under the specified
-       directories on the primary and standbys must match.  Consult the
+       is the directory on the standby that corresponds to your current directory
+       on the primary.  The directory structure under the specified
+       directories on the primary and standbys must be the same.  Consult the
        <application>rsync</application> manual page for details on specifying the
        remote directory, e.g.,
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/PostgreSQL/9.5 \
-      /opt/PostgreSQL/9.6 standby.example.com:/opt/PostgreSQL
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative 9.6 13 \
+      standby.example.com:/var/lib/postgresql
 </programlisting>
 
        You can verify what the command will do using
@@ -576,8 +576,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/Postgr
        <application>rsync</application> command for each tablespace directory, e.g.:
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tblsp/PG_9.5_201510051 \
-      /vol1/pg_tblsp/PG_9.6_201608131 standby.example.com:/vol1/pg_tblsp
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative \
+      PG_9.6_201608131 PG_13_202007201 standby.example.com:/vol1/tblsp
 </programlisting>
 
        If you have relocated <filename>pg_wal</filename> outside the data
-- 
2.26.3

#2Stephen Frost
sfrost@snowman.net
In reply to: Laurenz Albe (#1)
Re: Improve documentation for pg_upgrade, standbys and rsync

Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

I revently tried to upgrade a standby following the documentation,
but I found it hard to understand, and it took me several tries to
get it right. This is of course owing to my lack of expertise with
rsync, but I think the documentation and examples could be clearer.

I think it would be a good idea to recommend the --relative option
of rsync.

Here is a patch that does that, as well as update the versions in
the code samples to something more recent. Also, I think it makes
sense to place the data directory in the sample in /var/lib/postgresql,
which is similar to what many people will have in real life.

Haven't had a chance to look at this in depth but improving things here
would be good.

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

Thanks,

Stephen

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Stephen Frost (#2)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

I revently tried to upgrade a standby following the documentation,
but I found it hard to understand, [...]

Haven't had a chance to look at this in depth but improving things here
would be good.

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

Thanks for the feedback and the suggestion.
CCing -hackers so that I can add it to the commitfest.

Yours,
Laurenz Albe

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Stephen Frost (#2)
1 attachment(s)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

I revently tried to upgrade a standby following the documentation,
but I found it hard to understand, and it took me several tries to
get it right. This is of course owing to my lack of expertise with
rsync, but I think the documentation and examples could be clearer.

I think it would be a good idea to recommend the --relative option
of rsync.

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

I have thought about that some more, and I am not certain that we should
unconditionally recommend that. Perhaps the pain of rebuilding the
unlogged table on the primary would be worse than rsyncing it to the
standby.

The documentation already mentions

"Unfortunately, rsync needlessly copies files associated with temporary
and unlogged tables because these files don't normally exist on standby
servers."

I'd say that is good enough, and people can draw their conclusions from
that.

Attached is a new patch with an added reminder to create "standby.signal",
as mentioned in [1]https://www.postgr.es/m/1A5A1B6E-7BB6-47EB-8443-40222B769404@iris.washington.edu.

Yours,
Laurenz Albe

[1]: https://www.postgr.es/m/1A5A1B6E-7BB6-47EB-8443-40222B769404@iris.washington.edu

Attachments:

0001-Improve-doc-for-pg_upgrade-and-standby-servers.V2.patchtext/x-patch; charset=UTF-8; name=0001-Improve-doc-for-pg_upgrade-and-standby-servers.V2.patchDownload
From 47b685b700548af06ab08673187bdd1df7236464 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Fri, 16 Jul 2021 07:45:22 +0200
Subject: [PATCH] Improve doc for pg_upgrade and standby servers

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

Add a reminder that "standby.signal" needs to be created.
---
 doc/src/sgml/ref/pgupgrade.sgml | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..7aff00833a 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -528,26 +528,26 @@ pg_upgrade.exe
 
       <para>
        When using link mode, standby servers can be quickly upgraded using
-       <application>rsync</application>.  To accomplish this, from a directory on
+       <application>rsync</application>.  To accomplish this, change into a directory on
        the primary server that is above the old and new database cluster
-       directories, run this on the <emphasis>primary</emphasis> for each standby
+       directories and run this on the <emphasis>primary</emphasis> for each standby
        server:
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive old_cluster new_cluster remote_dir
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative old_cluster new_cluster remote_dir
 </programlisting>
 
        where <option>old_cluster</option> and <option>new_cluster</option> are relative
        to the current directory on the primary, and <option>remote_dir</option>
-       is <emphasis>above</emphasis> the old and new cluster directories
-       on the standby.  The directory structure under the specified
-       directories on the primary and standbys must match.  Consult the
+       is the directory on the standby that corresponds to your current directory
+       on the primary.  The directory structure under the specified
+       directories on the primary and standbys must be the same.  Consult the
        <application>rsync</application> manual page for details on specifying the
        remote directory, e.g.,
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/PostgreSQL/9.5 \
-      /opt/PostgreSQL/9.6 standby.example.com:/opt/PostgreSQL
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative 9.6 13 \
+      standby.example.com:/var/lib/postgresql
 </programlisting>
 
        You can verify what the command will do using
@@ -576,8 +576,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/Postgr
        <application>rsync</application> command for each tablespace directory, e.g.:
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tblsp/PG_9.5_201510051 \
-      /vol1/pg_tblsp/PG_9.6_201608131 standby.example.com:/vol1/pg_tblsp
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative \
+      PG_9.6_201608131 PG_13_202007201 standby.example.com:/vol1/tblsp
 </programlisting>
 
        If you have relocated <filename>pg_wal</filename> outside the data
@@ -593,7 +593,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
        Configure the servers for log shipping.  (You do not need to run
        <function>pg_start_backup()</function> and <function>pg_stop_backup()</function>
        or take a file system backup as the standbys are still synchronized
-       with the primary.)
+       with the primary.)  Don't forget to create <filename>standby.signal</filename>
+       on the standby server.
       </para>
      </step>
 
-- 
2.26.3

#5Stephen Frost
sfrost@snowman.net
In reply to: Laurenz Albe (#4)
Re: Improve documentation for pg_upgrade, standbys and rsync

Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

On Wed, 2021-05-19 at 10:31 -0400, Stephen Frost wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

I revently tried to upgrade a standby following the documentation,
but I found it hard to understand, and it took me several tries to
get it right. This is of course owing to my lack of expertise with
rsync, but I think the documentation and examples could be clearer.

I think it would be a good idea to recommend the --relative option
of rsync.

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

I have thought about that some more, and I am not certain that we should
unconditionally recommend that. Perhaps the pain of rebuilding the
unlogged table on the primary would be worse than rsyncing it to the
standby.

I disagree entirely. The reason to even consider using this approach is
to minimize the time required to get things back online and there's no
question that having the unlogged tables get rsync'd across would
increase the time required.

The documentation already mentions

"Unfortunately, rsync needlessly copies files associated with temporary
and unlogged tables because these files don't normally exist on standby
servers."

I'd say that is good enough, and people can draw their conclusions from
that.

I disagree. Instead, we should have explicit steps included which
detail how to find and truncate unlogged tables and what to do to remove
or exclude temporary files once the server is shut down.

Attached is a new patch with an added reminder to create "standby.signal",
as mentioned in [1].

Yours,
Laurenz Albe

[1]: https://www.postgr.es/m/1A5A1B6E-7BB6-47EB-8443-40222B769404@iris.washington.edu

From 47b685b700548af06ab08673187bdd1df7236464 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Fri, 16 Jul 2021 07:45:22 +0200
Subject: [PATCH] Improve doc for pg_upgrade and standby servers

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

I'm not really convinced that this is actually a positive change, though
I don't know that it's really a negative one either. In general, I
prefer fully qualified paths to try and make things very clear about
what's happening, but this is also a bit of an odd case due to hard
links, etc.

Add a reminder that "standby.signal" needs to be created.

This makes sense to include, certainly, but it should be an explicit
step, not just a "don't forget" note at the end. I'm not really sure
why we talk about "log shipping" either..? Wouldn't it make more sense
to have something like:

g. Configure standby servers

Review the prior configuration of the standby servers and set up the
same configuration in the newly rsync'd directory.

1. touch /path/to/replica/standby.signal
2. Configure restore_command to pull from WAL archive
3. For streaming replicas, configure primary_conninfo

Probably back-patched all the way, with adjustments made for the pre-12
releases accordingly.

Thanks,

Stephen

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Stephen Frost (#5)
1 attachment(s)
Re: Improve documentation for pg_upgrade, standbys and rsync

Thanks for looking at this!

On Fri, 2021-07-16 at 09:17 -0400, Stephen Frost wrote:

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

I have thought about that some more, and I am not certain that we should
unconditionally recommend that. Perhaps the pain of rebuilding the
unlogged table on the primary would be worse than rsyncing it to the
standby.

I disagree entirely. The reason to even consider using this approach is
to minimize the time required to get things back online and there's no
question that having the unlogged tables get rsync'd across would
increase the time required.

I am not totally convinced that minimal down time is always more important
than keeping your unlogged tables, but I have adapted the patch accordingly.

The documentation already mentions

"Unfortunately, rsync needlessly copies files associated with temporary
and unlogged tables because these files don't normally exist on standby
servers."

I'd say that is good enough, and people can draw their conclusions from
that.

I disagree. Instead, we should have explicit steps included which
detail how to find and truncate unlogged tables and what to do to remove
or exclude temporary files once the server is shut down.

Ok, done.

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

I'm not really convinced that this is actually a positive change, though
I don't know that it's really a negative one either. In general, I
prefer fully qualified paths to try and make things very clear about
what's happening, but this is also a bit of an odd case due to hard
links, etc.

I normally prefer absolute paths as well.
But that is the only way I got it to run, and I think that in this
case it adds clarity to have the data directories relative to your
current working directory.

Add a reminder that "standby.signal" needs to be created.

This makes sense to include, certainly, but it should be an explicit
step, not just a "don't forget" note at the end. I'm not really sure
why we talk about "log shipping" either..? Wouldn't it make more sense
to have something like:

g. Configure standby servers

Review the prior configuration of the standby servers and set up the
same configuration in the newly rsync'd directory.

1. touch /path/to/replica/standby.signal
2. Configure restore_command to pull from WAL archive
3. For streaming replicas, configure primary_conninfo

Ok, I have modified the final step like this. That is better than
talking about log shipping.

Patch V3 attached.

Yours,
Laurenz Albe

Attachments:

0001-Improve-doc-for-pg_upgrade-and-standby-servers.V3.patchtext/x-patch; charset=UTF-8; name=0001-Improve-doc-for-pg_upgrade-and-standby-servers.V3.patchDownload
From 43453dc7379f87ca6638c80c9ec6bf528f8e2e28 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Thu, 22 Jul 2021 15:33:59 +0200
Subject: [PATCH] Improve doc for pg_upgrade and standby servers

Recommend truncating or removing unlogged and temporary
tables to speed up "rsync".  Since this is best done in
the step "Prepare for standby server upgrades", move that
step to precede "Stop both servers".

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

Rewrite the final substep to not mention "log shipping".
Rather, provide a list of the necessary configuration steps.
---
 doc/src/sgml/ref/pgupgrade.sgml | 96 +++++++++++++++++++++------------
 1 file changed, 63 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..3ccb311ff7 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -324,6 +324,35 @@ make prefix=/usr/local/pgsql.new install
     </para>
    </step>
 
+   <step id="prepare-standby-upgrade">
+    <title>Prepare for standby server upgrades</title>
+
+    <para>
+     If you are upgrading standby servers using methods outlined in section <xref
+     linkend="pgupgrade-step-replicas"/>, you should consider dropping temporary
+     tables and truncating unlogged tables on the primary, since that will speed up
+     <application>rsync</application> and keep the down time short.
+     You could run the following <application>psql</application> commands
+     in all databases:
+
+<programlisting>
+SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE relpersistence = 't' \gexec
+SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE relpersistence = 'u' \gexec
+</programlisting>
+
+     After stopping the primary servers as described in the following step, verify that
+     the old standby servers have caught up by running
+     <application>pg_controldata</application> against the old primary and
+     standby clusters.  Verify that the <quote>Latest checkpoint location</quote>
+     values match in all clusters.
+     (There will be a mismatch if old standby servers were shut down
+     before the old primary or if the old standby servers are still running.)
+     Also, make sure <varname>wal_level</varname> is not set to
+     <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
+     new primary cluster.
+    </para>
+   </step>
+
    <step>
     <title>Stop both servers</title>
 
@@ -349,23 +378,6 @@ NET STOP postgresql-&majorversion;
     </para>
    </step>
 
-   <step>
-    <title>Prepare for standby server upgrades</title>
-
-    <para>
-     If you are upgrading standby servers using methods outlined in section <xref
-     linkend="pgupgrade-step-replicas"/>, verify that the old standby
-     servers are caught up by running <application>pg_controldata</application>
-     against the old primary and standby clusters.  Verify that the
-     <quote>Latest checkpoint location</quote> values match in all clusters.
-     (There will be a mismatch if old standby servers were shut down
-     before the old primary or if the old standby servers are still running.)
-     Also, make sure <varname>wal_level</varname> is not set to
-     <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
-     new primary cluster.
-    </para>
-   </step>
-
    <step>
     <title>Run <application>pg_upgrade</application></title>
 
@@ -528,26 +540,26 @@ pg_upgrade.exe
 
       <para>
        When using link mode, standby servers can be quickly upgraded using
-       <application>rsync</application>.  To accomplish this, from a directory on
+       <application>rsync</application>.  To accomplish this, change into a directory on
        the primary server that is above the old and new database cluster
-       directories, run this on the <emphasis>primary</emphasis> for each standby
+       directories and run this on the <emphasis>primary</emphasis> for each standby
        server:
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive old_cluster new_cluster remote_dir
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative old_cluster new_cluster remote_dir
 </programlisting>
 
        where <option>old_cluster</option> and <option>new_cluster</option> are relative
        to the current directory on the primary, and <option>remote_dir</option>
-       is <emphasis>above</emphasis> the old and new cluster directories
-       on the standby.  The directory structure under the specified
-       directories on the primary and standbys must match.  Consult the
+       is the directory on the standby that corresponds to your current directory
+       on the primary.  The directory structure under the specified
+       directories on the primary and standbys must be the same.  Consult the
        <application>rsync</application> manual page for details on specifying the
        remote directory, e.g.,
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/PostgreSQL/9.5 \
-      /opt/PostgreSQL/9.6 standby.example.com:/opt/PostgreSQL
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative 9.6 13 \
+      standby.example.com:/var/lib/postgresql
 </programlisting>
 
        You can verify what the command will do using
@@ -568,7 +580,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/Postgr
        small.)  This provides rapid standby upgrades.  Unfortunately,
        <application>rsync</application> needlessly copies files associated with
        temporary and unlogged tables because these files don't normally
-       exist on standby servers.
+       exist on standby servers, so you should have truncated or dropped
+       such tables as described in <xref linkend="prepare-standby-upgrade"/>.
       </para>
 
       <para>
@@ -576,8 +589,8 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /opt/Postgr
        <application>rsync</application> command for each tablespace directory, e.g.:
 
 <programlisting>
-rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tblsp/PG_9.5_201510051 \
-      /vol1/pg_tblsp/PG_9.6_201608131 standby.example.com:/vol1/pg_tblsp
+rsync --archive --delete --hard-links --size-only --no-inc-recursive --relative \
+      PG_9.6_201608131 PG_13_202007201 standby.example.com:/vol1/tblsp
 </programlisting>
 
        If you have relocated <filename>pg_wal</filename> outside the data
@@ -587,13 +600,30 @@ rsync --archive --delete --hard-links --size-only --no-inc-recursive /vol1/pg_tb
      </step>
 
      <step>
-      <title>Configure streaming replication and log-shipping standby servers</title>
+      <title>Configure standby servers</title>
 
       <para>
-       Configure the servers for log shipping.  (You do not need to run
-       <function>pg_start_backup()</function> and <function>pg_stop_backup()</function>
-       or take a file system backup as the standbys are still synchronized
-       with the primary.)
+       Review the prior configuration of the standby servers and set up the
+       same configuration in the newly <application>rsync</application>'ed
+       directory.
+
+       <itemizedlist>
+        <listitem>
+         <para>
+          touch <filename>/path/to/replica/standby.signal</filename>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          configure <varname>restore_command</varname> to pull from WAL archive
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          for streaming replicas, configure <varname>primary_conninfo</varname>
+         </para>
+        </listitem>
+       </itemizedlist>
       </para>
      </step>
 
-- 
2.26.3

#7Stephen Frost
sfrost@snowman.net
In reply to: Laurenz Albe (#6)
Re: Improve documentation for pg_upgrade, standbys and rsync

Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

Thanks for looking at this!

Sure. Thanks for working on it!

On Fri, 2021-07-16 at 09:17 -0400, Stephen Frost wrote:

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

I have thought about that some more, and I am not certain that we should
unconditionally recommend that. Perhaps the pain of rebuilding the
unlogged table on the primary would be worse than rsyncing it to the
standby.

I disagree entirely. The reason to even consider using this approach is
to minimize the time required to get things back online and there's no
question that having the unlogged tables get rsync'd across would
increase the time required.

I am not totally convinced that minimal down time is always more important
than keeping your unlogged tables, but I have adapted the patch accordingly.

Having the unlogged tables end up on replicas seems awkward also because
they really shouldn't be there and they'd never end up getting cleaned
up unless the replica crashed or was rebuilt..

The documentation already mentions

"Unfortunately, rsync needlessly copies files associated with temporary
and unlogged tables because these files don't normally exist on standby
servers."

I'd say that is good enough, and people can draw their conclusions from
that.

I disagree. Instead, we should have explicit steps included which
detail how to find and truncate unlogged tables and what to do to remove
or exclude temporary files once the server is shut down.

Ok, done.

Great, thanks, it's not quite this simple, unfortunately, more below..

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

I'm not really convinced that this is actually a positive change, though
I don't know that it's really a negative one either. In general, I
prefer fully qualified paths to try and make things very clear about
what's happening, but this is also a bit of an odd case due to hard
links, etc.

I normally prefer absolute paths as well.
But that is the only way I got it to run, and I think that in this
case it adds clarity to have the data directories relative to your
current working directory.

I'm pretty curious that you weren't able to get it to run with absolute
paths..

Add a reminder that "standby.signal" needs to be created.

This makes sense to include, certainly, but it should be an explicit
step, not just a "don't forget" note at the end. I'm not really sure
why we talk about "log shipping" either..? Wouldn't it make more sense
to have something like:

g. Configure standby servers

Review the prior configuration of the standby servers and set up the
same configuration in the newly rsync'd directory.

1. touch /path/to/replica/standby.signal
2. Configure restore_command to pull from WAL archive
3. For streaming replicas, configure primary_conninfo

Ok, I have modified the final step like this. That is better than
talking about log shipping.

Yup, glad you agree on that.

From 43453dc7379f87ca6638c80c9ec6bf528f8e2e28 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Thu, 22 Jul 2021 15:33:59 +0200
Subject: [PATCH] Improve doc for pg_upgrade and standby servers

Recommend truncating or removing unlogged and temporary
tables to speed up "rsync". Since this is best done in
the step "Prepare for standby server upgrades", move that
step to precede "Stop both servers".

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

Rewrite the final substep to not mention "log shipping".
Rather, provide a list of the necessary configuration steps.
---
doc/src/sgml/ref/pgupgrade.sgml | 96 +++++++++++++++++++++------------
1 file changed, 63 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..3ccb311ff7 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -324,6 +324,35 @@ make prefix=/usr/local/pgsql.new install
</para>
</step>
+   <step id="prepare-standby-upgrade">
+    <title>Prepare for standby server upgrades</title>
+
+    <para>
+     If you are upgrading standby servers using methods outlined in section <xref
+     linkend="pgupgrade-step-replicas"/>, you should consider dropping temporary
+     tables and truncating unlogged tables on the primary, since that will speed up
+     <application>rsync</application> and keep the down time short.
+     You could run the following <application>psql</application> commands
+     in all databases:
+
+<programlisting>
+SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE relpersistence = 't' \gexec
+SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE relpersistence = 'u' \gexec
+</programlisting>

Temporary tables aren't actually visible across different backends, nor
should they exist once the system has been shut down, but sometimes they
do get left around due to a crash, so the above won't actually work and
isn't the way to deal with those. The same can also happen with
temporary files that we create which end up in pgsql_tmp.

We could possibly exclude pgsql_tmp in the rsync command, but cleaning
up the temporary table files would involve something more complicated
like using 'find' to search for any '^t[0-9]+_[0-9]+.*$' files or
something along those lines.

Though, for that matter we should really be looking through all of the
directories and files that pg_basebackup excludes and considering if
they should somehow be excluded. There's no easy way to exclude
everything that pg_basebackup would with just an rsync because the logic
is a bit complicated (which is why I was saying we really need a proper
tool...) but we could probably provide a somewhat better rsync command
by going through that list and excluding what makes sense to exclude.
We could also provide another explicit before-rsync step to review all
the temp table files and move them or remove them, depending on how
comfortable one is with hacking around in the data directory.

This, of course, all comes back to the original complaint I had about
documenting this approach, which is that these things should only be
done by someone extremely familiar with the PG codebase, until and
unless we write an actual tool to do this.

+     (There will be a mismatch if old standby servers were shut down
+     before the old primary or if the old standby servers are still running.)

Would probably be good to note that if the standby's were shut down
before the primary then this method can *not* be used safely... The
above leaves it unclear about if the mismatch is an issue or not. I get
that this was in the original docs, but still would be good to improve
it.

Thanks!

Stephen

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Stephen Frost (#7)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Mon, 2021-07-26 at 15:11 -0400, Stephen Frost wrote:

An additional thing that we should really be mentioning is to tell
people to go in and TRUNCATE all of their UNLOGGED tables before going
through this process, otherwise the rsync will end up spending a bunch
of time copying the files for UNLOGGED relations which you really don't
want.

Ok, done.

Great, thanks, it's not quite this simple, unfortunately, more below..

+    <para>
+     If you are upgrading standby servers using methods outlined in section <xref
+     linkend="pgupgrade-step-replicas"/>, you should consider dropping temporary
+     tables and truncating unlogged tables on the primary, since that will speed up
+     <application>rsync</application> and keep the down time short.
+     You could run the following <application>psql</application> commands
+     in all databases:
+
+<programlisting>
+SELECT format('DROP TABLE %s', oid::regclass) FROM pg_class WHERE relpersistence = 't' \gexec
+SELECT format('TRUNCATE %s', oid::regclass) FROM pg_class WHERE relpersistence = 'u' \gexec
+</programlisting>

Temporary tables aren't actually visible across different backends, nor
should they exist once the system has been shut down, but sometimes they
do get left around due to a crash, so the above won't actually work and
isn't the way to deal with those. The same can also happen with
temporary files that we create which end up in pgsql_tmp.

We could possibly exclude pgsql_tmp in the rsync command, but cleaning
up the temporary table files would involve something more complicated
like using 'find' to search for any '^t[0-9]+_[0-9]+.*$' files or
something along those lines.

Though, for that matter we should really be looking through all of the
directories and files that pg_basebackup excludes and considering if
they should somehow be excluded. There's no easy way to exclude
everything that pg_basebackup would with just an rsync because the logic
is a bit complicated (which is why I was saying we really need a proper
tool...) but we could probably provide a somewhat better rsync command
by going through that list and excluding what makes sense to exclude.
We could also provide another explicit before-rsync step to review all
the temp table files and move them or remove them, depending on how
comfortable one is with hacking around in the data directory.

This, of course, all comes back to the original complaint I had about
documenting this approach, which is that these things should only be
done by someone extremely familiar with the PG codebase, until and
unless we write an actual tool to do this.

I agree with what you write, but that sounds like you are arguing for
a code patch rather than for documentation to enable the user to do
that manually, which is what I believe you said initially.

My two statements will get rid of temporary tables left behind after
a crash and truncate unlogged tables, which should be an improvement.

Of course it would be good to get rid of orphaned files left behind
after a crash, but, as you say, that is not so easy.

I'd say that writing tools to do better than my two SQL statements
is nice to have, but beyond the scope of this documentation patch.

Recommend using the --relative option of rsync for clarity
and adapt the code samples accordingly.
Using relative paths makes clearer what is meant by "current
directory" and "remote_dir".

I normally prefer absolute paths as well.
But that is the only way I got it to run, and I think that in this
case it adds clarity to have the data directories relative to your
current working directory.

I'm pretty curious that you weren't able to get it to run with absolute
paths..

I tried a couple of times with a test cluster and failed.

Part of the confustion for me is that you are supposed to run the
rsync from a certain directory, which seems weird if paths are absolute.
Run from *any* directory above the old and the new cluster?

"Relative to my current directory" makes more sense to me here.

+     (There will be a mismatch if old standby servers were shut down
+     before the old primary or if the old standby servers are still running.)

Would probably be good to note that if the standby's were shut down
before the primary then this method can *not* be used safely... The
above leaves it unclear about if the mismatch is an issue or not. I get
that this was in the original docs, but still would be good to improve
it.

Agreed.

Yours,
Laurenz Albe

#9Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#7)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Mon, Jul 26, 2021 at 3:11 PM Stephen Frost <sfrost@snowman.net> wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

Thanks for looking at this!

Sure. Thanks for working on it!

Stephen, do you intend to do something about this patch in terms of
getting it committed? You're the only reviewer but haven't responded
to the thread for more than 5 months.

I don't feel that I know this area of the documentation well enough to
feel comfortable passing judgement on whether this change is an
improvement or not. However I do feel somewhat uncomfortable with
this:

- <step>
- <title>Prepare for standby server upgrades</title>
-
- <para>
- If you are upgrading standby servers using methods outlined in
section <xref
- linkend="pgupgrade-step-replicas"/>, verify that the old standby
- servers are caught up by running <application>pg_controldata</application>
- against the old primary and standby clusters. Verify that the
- <quote>Latest checkpoint location</quote> values match in all clusters.
- (There will be a mismatch if old standby servers were shut down
- before the old primary or if the old standby servers are still running.)
- Also, make sure <varname>wal_level</varname> is not set to
- <literal>minimal</literal> in the
<filename>postgresql.conf</filename> file on the
- new primary cluster.
- </para>
- </step>

Right now, we say that you should stop the standby servers and then
prepared for standby server upgrades. With this patch, we say that you
should first prepare for standby server upgrades, and then stop the
standby servers. But the last part of the text about preparing for
standby server upgrades now mentions things to be done after carrying
out the next step where the servers are actually stopped. That seems
confusing. Perhaps we need two separate steps here, one to be
performed before stopping both servers and the other after.

Also, let me express my general terror at the idea of anyone actually
using this procedure.

Regards,

--
Robert Haas
EDB: http://www.enterprisedb.com

#10Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#9)
Re: Improve documentation for pg_upgrade, standbys and rsync

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Mon, Jul 26, 2021 at 3:11 PM Stephen Frost <sfrost@snowman.net> wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

Thanks for looking at this!

Sure. Thanks for working on it!

Stephen, do you intend to do something about this patch in terms of
getting it committed? You're the only reviewer but haven't responded
to the thread for more than 5 months.

I tried to be clear in the last email on the thread, the one which you
just responded to, here:

* Stephen Frost (sfrost@snowman.net) wrote:

This, of course, all comes back to the original complaint I had about
documenting this approach, which is that these things should only be
done by someone extremely familiar with the PG codebase, until and
unless we write an actual tool to do this.

To be more explicit though- we should write a tool to do this. We
shouldn't try to document a way to do it because it's hard to get right.
While rsync is very capable, what's needed to really do this goes beyond
what we could reasonably put into any rsync command or really even into
a documented procedure. I get that we already have it documented (and
I'll note that doing so was against my recommendation..) and that some
folks (likely those who follow this mailing list) have had success using
it, but I'd really rather we just take it out and put it on a wiki
somewhere as a "we need a tool that does this stuff" and hope that
someone finds time to write one.

I don't feel that I know this area of the documentation well enough to
feel comfortable passing judgement on whether this change is an
improvement or not. However I do feel somewhat uncomfortable with
this:

- <step>
- <title>Prepare for standby server upgrades</title>
-
- <para>
- If you are upgrading standby servers using methods outlined in
section <xref
- linkend="pgupgrade-step-replicas"/>, verify that the old standby
- servers are caught up by running <application>pg_controldata</application>
- against the old primary and standby clusters. Verify that the
- <quote>Latest checkpoint location</quote> values match in all clusters.
- (There will be a mismatch if old standby servers were shut down
- before the old primary or if the old standby servers are still running.)
- Also, make sure <varname>wal_level</varname> is not set to
- <literal>minimal</literal> in the
<filename>postgresql.conf</filename> file on the
- new primary cluster.
- </para>
- </step>

Right now, we say that you should stop the standby servers and then
prepared for standby server upgrades. With this patch, we say that you
should first prepare for standby server upgrades, and then stop the
standby servers. But the last part of the text about preparing for
standby server upgrades now mentions things to be done after carrying
out the next step where the servers are actually stopped. That seems
confusing. Perhaps we need two separate steps here, one to be
performed before stopping both servers and the other after.

It should really be both- things to do on the primary ahead of time
(truncate all unlogged tables, make sure there aren't any orphaned
temporary tables, etc), and then things to do on the replicas after
shutting the primary down (basically, make sure they are fully caught up
with where the primary was at shutdown). I tried to explain that in my
prior email but perhaps didn't do a very good job.

Also, let me express my general terror at the idea of anyone actually
using this procedure.

I mean, yeah, I agree.

Thanks,

Stephen

#11Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#10)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Tue, Apr 5, 2022 at 01:10:38PM -0400, Stephen Frost wrote:

To be more explicit though- we should write a tool to do this. We
shouldn't try to document a way to do it because it's hard to get right.
While rsync is very capable, what's needed to really do this goes beyond
what we could reasonably put into any rsync command or really even into
a documented procedure. I get that we already have it documented (and
I'll note that doing so was against my recommendation..) and that some
folks (likely those who follow this mailing list) have had success using
it, but I'd really rather we just take it out and put it on a wiki
somewhere as a "we need a tool that does this stuff" and hope that
someone finds time to write one.

Well, I think pg_upgrade needs a tool, let alone for standby upgrades,
but 13 years in, no one has written one, so I am not holding my breath.
Also, we need to document the procedure _somewhere_ --- if we don't the
only procedure is embedded in a tool. and that seems even worse than
what we have now.

It should really be both- things to do on the primary ahead of time
(truncate all unlogged tables, make sure there aren't any orphaned
temporary tables, etc), and then things to do on the replicas after
shutting the primary down (basically, make sure they are fully caught up
with where the primary was at shutdown). I tried to explain that in my
prior email but perhaps didn't do a very good job.

Also, let me express my general terror at the idea of anyone actually
using this procedure.

I mean, yeah, I agree.

I thought that was true for pg_upgrade in general? ;-)

Seems like a pull up your sleeves and hold your nose --- I am good at
those tasks. ;-) Should I work on this? Tangentially, I see that my
old macros fastgetattr and heap_getattr have finally been retired by
commit e27f4ee0a7. :-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#12Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Robert Haas (#9)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Tue, 2022-04-05 at 12:38 -0400, Robert Haas wrote:

Also, let me express my general terror at the idea of anyone actually
using this procedure.

I did, and I couldn't get it to work with absolute paths, and using
relative paths seemed to me to be more intuitive anyway, hence the patch.

Originally that was the only change I wanted to make to the documentation,
but you know how it is: as soon as you touch something like this, someone
will (rightly so) prod you and say "while you change this, that other
thing there should also be improved", and the patch gets more
and more invasive.

I agree with the scariness of this, but I prefer to have it in the
documentation anyway; at least as long as we have nothing better (which
is always the enemy of the good).

Yours,
Laurenz Albe

#13Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Stephen Frost (#10)
Re: Improve documentation for pg_upgrade, standbys and rsync

On Tue, 2022-04-05 at 13:10 -0400, Stephen Frost wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Jul 26, 2021 at 3:11 PM Stephen Frost <sfrost@snowman.net> wrote:

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:

Thanks for looking at this!

Sure.  Thanks for working on it!

Stephen, do you intend to do something about this patch in terms of
getting it committed? You're the only reviewer but haven't responded
to the thread for more than 5 months.

I tried to be clear in the last email on the thread, the one which you
just responded to, here:

* Stephen Frost (sfrost@snowman.net) wrote:

This, of course, all comes back to the original complaint I had about
documenting this approach, which is that these things should only be
done by someone extremely familiar with the PG codebase, until and
unless we write an actual tool to do this.

To be more explicit though- we should write a tool to do this.  We
shouldn't try to document a way to do it because it's hard to get right.

I see no agreement on this patch. I'll withdraw it from the commitfest
to avoid hogging resources. Thanks everyone for the review.

Yours,
Laurenz Albe