Doc: Move standalone backup section, mention -X argument

Started by David G. Johnstonover 1 year ago6 messages

david.g.johnston@gmail.com

over 1 year ago

1 attachment(s)

A documentation comment came in [1]/messages/by-id/CAKFQuwZ=WxdWJ6O66yQ9dnWTLO12p7h3HpfhowCj+0U_bNrzdg@mail.gmail.com causing me to review some of our backup
documentation and I left the current content and location of the standalone
backups was odd. I propose to move it to a better place, under file system
backups.

Adding to commitfest.

David J.

[1]: /messages/by-id/CAKFQuwZ=WxdWJ6O66yQ9dnWTLO12p7h3HpfhowCj+0U_bNrzdg@mail.gmail.com
/messages/by-id/CAKFQuwZ=WxdWJ6O66yQ9dnWTLO12p7h3HpfhowCj+0U_bNrzdg@mail.gmail.com

Attachments:

v1-0001-docs-move-standalone-pg_basebackup-docs-to-file-syst.patchapplication/octet-stream; name=v1-0001-docs-move-standalone-pg_basebackup-docs-to-file-syst.patchDownload

From 0512ec27c52401b9ed7b468e7b68ff9ebec1584e Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <david.g.johnston@gmail.com>
Date: Fri, 2 Feb 2024 12:32:51 -0700
Subject: [PATCH] docs: move standalone pg_basebackup docs to file system
 section

---
 doc/src/sgml/backup.sgml | 47 ++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index b3468eea3c..1c5e48d6da 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -351,6 +351,29 @@ pg_dump -j <replaceable class="parameter">num</replaceable> -F d -f <replaceable
  <sect1 id="backup-file">
   <title>File System Level Backup</title>
 
+  <para>
+   In constrast to the logical backup that pg_dump performs there is
+   also the option to perform a physical backup.
+   <productname>PostgreSQL</productname> provides a tool,
+   <literal>pg_basebackup</literal>, that can produce a similar standalone
+   backup to the one produced by pg_dump, though the restoration point
+   of the physical backup is as of when the backup ended instead of when
+   it began. You may also choose to roll your own solution using the tools
+   available in your operating system.  This section describes both options.
+  </para>
+
+  <sect2 id="backup-standalone-pgbasebackup">
+    <title>Built-In Standalone Backups</title>
+    <para>
+     If all you want is a simple standalone backup of your cluster at some point in time
+     (specifically at roughly the moment the backup ended) you can just save the archive
+     produced by the <xref linkend="app-pgbasebackup"/> command. It handles, by default,
+     saving the write-ahead log (WAL) files produced during the backup to the archive.
+    </para>
+  </sect2>
+
+  <sect2 id="backup-standalone-custom">
+  <title>Custom Physical Backups</title>
   <para>
    An alternative backup strategy is to directly copy the files that
    <productname>PostgreSQL</productname> uses to store the data in the database;
@@ -462,6 +485,7 @@ tar -cf backup.tar /usr/local/pgsql/data
    the contents of indexes for example, just the commands to recreate
    them.)  However, taking a file system backup might be faster.
   </para>
+  </sect2>
  </sect1>
 
  <sect1 id="continuous-archiving">
@@ -1442,29 +1466,6 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
    <para>
     Some tips for configuring continuous archiving are given here.
    </para>
-
-    <sect3 id="backup-standalone">
-     <title>Standalone Hot Backups</title>
-
-     <para>
-      It is possible to use <productname>PostgreSQL</productname>'s backup facilities to
-      produce standalone hot backups. These are backups that cannot be used
-      for point-in-time recovery, yet are typically much faster to backup and
-      restore than <application>pg_dump</application> dumps.  (They are also much larger
-      than <application>pg_dump</application> dumps, so in some cases the speed advantage
-      might be negated.)
-     </para>
-
-     <para>
-      As with base backups, the easiest way to produce a standalone
-      hot backup is to use the <xref linkend="app-pgbasebackup"/>
-      tool. If you include the <literal>-X</literal> parameter when calling
-      it, all the write-ahead log required to use the backup will be
-      included in the backup automatically, and no special action is
-      required to restore the backup.
-     </para>
-    </sect3>
-
     <sect3 id="compressed-archive-logs">
      <title>Compressed Archive Logs</title>
 
-- 
2.34.1

Marlene Reiterer

marlene.reiterer.03@gmail.com

over 1 year ago

In reply to: David G. Johnston (#1)

Re: Doc: Move standalone backup section, mention -X argument

I compiled the patch and it worked without any problems.

I think the patch makes sense, because of the structure of the current
docs. It seems more logical to have this section in this part of the
documentation, where it is useful and not only described for another
chapter, because it won't even work with the current chapter it is
referenced in ("Continous Archiving and Point-in-Time Recovery
(PITR)").

I am still new to Postgres, so I can't tell whether it can be written
more detailed or not. But I really like it, that is in a more fitting
chapter in my opinion.

Regards,
Marlene Reiterer

Am Mo., 16. Sept. 2024 um 10:35 Uhr schrieb David G. Johnston
<david.g.johnston@gmail.com>:

Show quoted text

A documentation comment came in [1] causing me to review some of our backup documentation and I left the current content and location of the standalone backups was odd. I propose to move it to a better place, under file system backups.

Adding to commitfest.

David J.

[1] /messages/by-id/CAKFQuwZ=WxdWJ6O66yQ9dnWTLO12p7h3HpfhowCj+0U_bNrzdg@mail.gmail.com

Álvaro Herrera

alvherre@alvh.no-ip.org

11 months ago

In reply to: David G. Johnston (#1)

Re: Doc: Move standalone backup section, mention -X argument

On 2024-Jun-28, David G. Johnston wrote:

A documentation comment came in [1] causing me to review some of our backup
documentation and I left the current content and location of the standalone
backups was odd. I propose to move it to a better place, under file system
backups.

Even before this patch, these sections are all a bit incoherent, because
we spend a lot of vertical space explaining WAL archiving before even
mentioning how they would be used, with pg_basebackup mentioned halfway
down the page. Your patch makes it a bit better, but I think it doesn't
go far enough. Even after the patch, If the reader skips 25.2, then
section 25.3 reads a bit incoherent until you're halfway down the (quite
long) page and pg_basebackup is mentioned. I think it would be better
to move 25.2 out of the way moving it to the end of the chapter, and do
something like this

25.1. SQL Dump
25.1.1. Restoring the Dump
25.1.2. Using pg_dumpall
25.1.3. Handling Large Databases

25.2. Physical Backups Using Continuous Archiving
David's text: "In constrast to logical backups ... "
25.2.1. Built-In Standalone Backups
"If all you want is a standalone ..."
25.2.2. Setting Up WAL Archiving
25.2.3. Making a Base Backup
25.2.4. Making an Incremental Backup
25.2.5. Making a Base Backup Using the Low Level API
25.2.6. Recovering Using a Continuous Archive Backup
25.2.7. Timelines
25.2.8. Tips and Examples
25.2.9. Caveats

25.3. File System Level Backup
Start current 25.2 with a few additional words: "An older and largely
deprecated technique to take a backup is to directly copy the files ... "

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Nunca confiaré en un traidor. Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)

David G. Johnston

david.g.johnston@gmail.com

11 months ago

In reply to: Álvaro Herrera (#3)

Re: Doc: Move standalone backup section, mention -X argument

On Monday, February 10, 2025, Álvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

On 2024-Jun-28, David G. Johnston wrote:

A documentation comment came in [1] causing me to review some of our

backup

documentation and I left the current content and location of the

standalone

backups was odd. I propose to move it to a better place, under file

system

backups.

Even before this patch, these sections are all a bit incoherent, because
we spend a lot of vertical space explaining WAL archiving before even
mentioning how they would be used, with pg_basebackup mentioned halfway
down the page. Your patch makes it a bit better, but I think it doesn't
go far enough. Even after the patch, If the reader skips 25.2, then
section 25.3 reads a bit incoherent until you're halfway down the (quite
long) page and pg_basebackup is mentioned. I think it would be better
to move 25.2 out of the way moving it to the end of the chapter, and do
something like this

25.1. SQL Dump
25.1.1. Restoring the Dump
25.1.2. Using pg_dumpall
25.1.3. Handling Large Databases

25.2. Physical Backups Using Continuous Archiving
David's text: "In constrast to logical backups ... "
25.2.1. Built-In Standalone Backups
"If all you want is a standalone ..."
25.2.2. Setting Up WAL Archiving
25.2.3. Making a Base Backup
25.2.4. Making an Incremental Backup
25.2.5. Making a Base Backup Using the Low Level API
25.2.6. Recovering Using a Continuous Archive Backup
25.2.7. Timelines
25.2.8. Tips and Examples
25.2.9. Caveats

25.3. File System Level Backup
Start current 25.2 with a few additional words: "An older and largely
deprecated technique to take a backup is to directly copy the files ...
"

Thanks. There is another comment floating about saying a similar thing.
I’m good with giving a more comprehensive patch a go.

David J.

vignesh C

vignesh21@gmail.com

10 months ago

In reply to: David G. Johnston (#1)

Re: Doc: Move standalone backup section, mention -X argument

On Sat, 29 Jun 2024 at 02:27, David G. Johnston
<david.g.johnston@gmail.com> wrote:

A documentation comment came in [1] causing me to review some of our backup documentation and I left the current content and location of the standalone backups was odd. I propose to move it to a better place, under file system backups.

Adding to commitfest.

I noticed that Alvaro's comment from [1]/messages/by-id/202502101154.bmb536npfl5e@alvherre.pgsql is not yet addressed, I have
changed the status of commitfest entry to Waiting on Author, please
address them and change the status back to Needs review.
[1]: /messages/by-id/202502101154.bmb536npfl5e@alvherre.pgsql

Regards,
Vignesh

Benoit Lobréau

benoit.lobreau@dalibo.com

9 months ago

In reply to: vignesh C (#5)

1 attachment(s)

Re: Doc: Move standalone backup section, mention -X argument

On 3/16/25 2:19 PM, vignesh C wrote:

I noticed that Alvaro's comment from [1] is not yet addressed, I have
changed the status of commitfest entry to Waiting on Author, please
address them and change the status back to Needs review.
[1] - /messages/by-id/202502101154.bmb536npfl5e@alvherre.pgsql

Regards,
Vignesh

Hi,

You will find a patch for the proposed changes attached to this mail.

The menu is now:

25.1. SQL Dump
25.1.1. Restoring the Dump
25.1.2. Using pg_dumpall
25.1.3. Handling Large Databases
25.2. Physical Backups Using Continuous Archiving
25.2.1. Built-In Standalone Backups
25.2.2. Setting Up WAL Archiving
25.2.3. Making a Base Backup
25.2.4. Making an Incremental Backup
25.2.5. Making a Base Backup Using the Low Level API
25.2.6. Recovering Using a Continuous Archive Backup
25.2.7. Timelines
25.2.8. Tips and Examples
25.2.9. Caveats
25.3. File System Level Backup

I slightly modified section 25.2.1 and 25.3 as proposed.

--
Benoit Lobréau
Consultant
http://dalibo.com

Attachments:

v2-0001-Reorganize-the-backup-section.patchtext/x-patch; charset=UTF-8; name=v2-0001-Reorganize-the-backup-section.patchDownload

From 16813b396a45c4061b5c2d21a9091e3fb372567c Mon Sep 17 00:00:00 2001
From: benoit <benoit.lobreau@dalibo.com>
Date: Tue, 15 Apr 2025 15:25:08 +0200
Subject: [PATCH] Reorganize the backup section

The standalone backup of the backup documentation lacks visibility. The
solution described in the file level backup section, while still usable,
is not the preferred method. This patch attempts to remedy this by moving
things around.
---
 doc/src/sgml/backup.sgml | 278 +++++++++++++++++++--------------------
 1 file changed, 139 insertions(+), 139 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 25b8904baf7..c167fb5b6b6 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -354,124 +354,8 @@ pg_dump -j <replaceable class="parameter">num</replaceable> -F d -f <replaceable
   </sect2>
  </sect1>
 
- <sect1 id="backup-file">
-  <title>File System Level Backup</title>
-
-  <para>
-   An alternative backup strategy is to directly copy the files that
-   <productname>PostgreSQL</productname> uses to store the data in the database;
-   <xref linkend="creating-cluster"/> explains where these files
-   are located.  You can use whatever method you prefer
-   for doing file system backups; for example:
-
-<programlisting>
-tar -cf backup.tar /usr/local/pgsql/data
-</programlisting>
-  </para>
-
-  <para>
-   There are two restrictions, however, which make this method
-   impractical, or at least inferior to the <application>pg_dump</application>
-   method:
-
-   <orderedlist>
-    <listitem>
-     <para>
-      The database server <emphasis>must</emphasis> be shut down in order to
-      get a usable backup. Half-way measures such as disallowing all
-      connections will <emphasis>not</emphasis> work
-      (in part because <command>tar</command> and similar tools do not take
-      an atomic snapshot of the state of the file system,
-      but also because of internal buffering within the server).
-      Information about stopping the server can be found in
-      <xref linkend="server-shutdown"/>.  Needless to say, you
-      also need to shut down the server before restoring the data.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      If you have dug into the details of the file system layout of the
-      database, you might be tempted to try to back up or restore only certain
-      individual tables or databases from their respective files or
-      directories. This will <emphasis>not</emphasis> work because the
-      information contained in these files is not usable without
-      the commit log files,
-      <filename>pg_xact/*</filename>, which contain the commit status of
-      all transactions. A table file is only usable with this
-      information. Of course it is also impossible to restore only a
-      table and the associated <filename>pg_xact</filename> data
-      because that would render all other tables in the database
-      cluster useless.  So file system backups only work for complete
-      backup and restoration of an entire database cluster.
-     </para>
-    </listitem>
-   </orderedlist>
-  </para>
-
-  <para>
-   An alternative file-system backup approach is to make a
-   <quote>consistent snapshot</quote> of the data directory, if the
-   file system supports that functionality (and you are willing to
-   trust that it is implemented correctly).  The typical procedure is
-   to make a <quote>frozen snapshot</quote> of the volume containing the
-   database, then copy the whole data directory (not just parts, see
-   above) from the snapshot to a backup device, then release the frozen
-   snapshot.  This will work even while the database server is running.
-   However, a backup created in this way saves
-   the database files in a state as if the database server was not
-   properly shut down; therefore, when you start the database server
-   on the backed-up data, it will think the previous server instance
-   crashed and will replay the WAL log.  This is not a problem; just
-   be aware of it (and be sure to include the WAL files in your backup).
-   You can perform a <command>CHECKPOINT</command> before taking the
-   snapshot to reduce recovery time.
-  </para>
-
-  <para>
-   If your database is spread across multiple file systems, there might not
-   be any way to obtain exactly-simultaneous frozen snapshots of all
-   the volumes.  For example, if your data files and WAL log are on different
-   disks, or if tablespaces are on different file systems, it might
-   not be possible to use snapshot backup because the snapshots
-   <emphasis>must</emphasis> be simultaneous.
-   Read your file system documentation very carefully before trusting
-   the consistent-snapshot technique in such situations.
-  </para>
-
-  <para>
-   If simultaneous snapshots are not possible, one option is to shut down
-   the database server long enough to establish all the frozen snapshots.
-   Another option is to perform a continuous archiving base backup (<xref
-   linkend="backup-base-backup"/>) because such backups are immune to file
-   system changes during the backup.  This requires enabling continuous
-   archiving just during the backup process; restore is done using
-   continuous archive recovery (<xref linkend="backup-pitr-recovery"/>).
-  </para>
-
-  <para>
-   Another option is to use <application>rsync</application> to perform a file
-   system backup.  This is done by first running <application>rsync</application>
-   while the database server is running, then shutting down the database
-   server long enough to do an <command>rsync --checksum</command>.
-   (<option>--checksum</option> is necessary because <command>rsync</command> only
-   has file modification-time granularity of one second.)  The
-   second <application>rsync</application> will be quicker than the first,
-   because it has relatively little data to transfer, and the end result
-   will be consistent because the server was down.  This method
-   allows a file system backup to be performed with minimal downtime.
-  </para>
-
-  <para>
-   Note that a file system backup will typically be larger
-   than an SQL dump. (<application>pg_dump</application> does not need to dump
-   the contents of indexes for example, just the commands to recreate
-   them.)  However, taking a file system backup might be faster.
-  </para>
- </sect1>
-
  <sect1 id="continuous-archiving">
-  <title>Continuous Archiving and Point-in-Time Recovery (PITR)</title>
+  <title>Physical Backups Using Continuous Archiving</title>
 
   <indexterm zone="backup">
    <primary>continuous archiving</primary>
@@ -569,6 +453,28 @@ tar -cf backup.tar /usr/local/pgsql/data
    archiving WAL files.
   </para>
 
+  <sect2 id="backup-standalone">
+   <title>Built-In Standalone Backups</title>
+
+   <para>
+    If all you want is a standalone backup, it is possible to use <productname>
+    PostgreSQL</productname>'s backup facilities to produce standalone hot backups.
+    These are backups that cannot be used for point-in-time recovery, yet are
+    typically much faster to backup and restore than <application>pg_dump</application>
+    dumps. (They are also much larger than <application>pg_dump</application> dumps,
+    so in some cases the speed advantage might be negated.)
+   </para>
+
+   <para>
+    The easiest way to produce a standalone
+    hot backup is to use the <xref linkend="app-pgbasebackup"/>
+    tool. If you include the <literal>-X</literal> parameter when calling
+    it, all the write-ahead log required to use the backup will be
+    included in the backup automatically, and no special action is
+    required to restore the backup.
+   </para>
+  </sect2>
+
   <sect2 id="backup-archiving-wal">
    <title>Setting Up WAL Archiving</title>
 
@@ -1464,28 +1370,6 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
     Some tips for configuring continuous archiving are given here.
    </para>
 
-    <sect3 id="backup-standalone">
-     <title>Standalone Hot Backups</title>
-
-     <para>
-      It is possible to use <productname>PostgreSQL</productname>'s backup facilities to
-      produce standalone hot backups. These are backups that cannot be used
-      for point-in-time recovery, yet are typically much faster to backup and
-      restore than <application>pg_dump</application> dumps.  (They are also much larger
-      than <application>pg_dump</application> dumps, so in some cases the speed advantage
-      might be negated.)
-     </para>
-
-     <para>
-      As with base backups, the easiest way to produce a standalone
-      hot backup is to use the <xref linkend="app-pgbasebackup"/>
-      tool. If you include the <literal>-X</literal> parameter when calling
-      it, all the write-ahead log required to use the backup will be
-      included in the backup automatically, and no special action is
-      required to restore the backup.
-     </para>
-    </sect3>
-
     <sect3 id="compressed-archive-logs">
      <title>Compressed Archive Logs</title>
 
@@ -1617,4 +1501,120 @@ archive_command = 'local_backup_script.sh "%p" "%f"'
   </sect2>
  </sect1>
 
+ <sect1 id="backup-file">
+  <title>File System Level Backup</title>
+
+  <para>
+   An older and largely deprecated technique to take a backup is to directly copy
+   the files that <productname>PostgreSQL</productname> uses to store the data in
+   the database; <xref linkend="creating-cluster"/> explains where these files
+   are located. You can use whatever method you prefer for doing file system
+   backups; for example:
+
+<programlisting>
+tar -cf backup.tar /usr/local/pgsql/data
+</programlisting>
+  </para>
+
+  <para>
+   There are two restrictions, however, which make this method
+   impractical, or at least inferior to the <application>pg_dump</application>
+   method:
+
+   <orderedlist>
+    <listitem>
+     <para>
+      The database server <emphasis>must</emphasis> be shut down in order to
+      get a usable backup. Half-way measures such as disallowing all
+      connections will <emphasis>not</emphasis> work
+      (in part because <command>tar</command> and similar tools do not take
+      an atomic snapshot of the state of the file system,
+      but also because of internal buffering within the server).
+      Information about stopping the server can be found in
+      <xref linkend="server-shutdown"/>.  Needless to say, you
+      also need to shut down the server before restoring the data.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      If you have dug into the details of the file system layout of the
+      database, you might be tempted to try to back up or restore only certain
+      individual tables or databases from their respective files or
+      directories. This will <emphasis>not</emphasis> work because the
+      information contained in these files is not usable without
+      the commit log files,
+      <filename>pg_xact/*</filename>, which contain the commit status of
+      all transactions. A table file is only usable with this
+      information. Of course it is also impossible to restore only a
+      table and the associated <filename>pg_xact</filename> data
+      because that would render all other tables in the database
+      cluster useless.  So file system backups only work for complete
+      backup and restoration of an entire database cluster.
+     </para>
+    </listitem>
+   </orderedlist>
+  </para>
+
+  <para>
+   An alternative file-system backup approach is to make a
+   <quote>consistent snapshot</quote> of the data directory, if the
+   file system supports that functionality (and you are willing to
+   trust that it is implemented correctly).  The typical procedure is
+   to make a <quote>frozen snapshot</quote> of the volume containing the
+   database, then copy the whole data directory (not just parts, see
+   above) from the snapshot to a backup device, then release the frozen
+   snapshot.  This will work even while the database server is running.
+   However, a backup created in this way saves
+   the database files in a state as if the database server was not
+   properly shut down; therefore, when you start the database server
+   on the backed-up data, it will think the previous server instance
+   crashed and will replay the WAL log.  This is not a problem; just
+   be aware of it (and be sure to include the WAL files in your backup).
+   You can perform a <command>CHECKPOINT</command> before taking the
+   snapshot to reduce recovery time.
+  </para>
+
+  <para>
+   If your database is spread across multiple file systems, there might not
+   be any way to obtain exactly-simultaneous frozen snapshots of all
+   the volumes.  For example, if your data files and WAL log are on different
+   disks, or if tablespaces are on different file systems, it might
+   not be possible to use snapshot backup because the snapshots
+   <emphasis>must</emphasis> be simultaneous.
+   Read your file system documentation very carefully before trusting
+   the consistent-snapshot technique in such situations.
+  </para>
+
+  <para>
+   If simultaneous snapshots are not possible, one option is to shut down
+   the database server long enough to establish all the frozen snapshots.
+   Another option is to perform a continuous archiving base backup (<xref
+   linkend="backup-base-backup"/>) because such backups are immune to file
+   system changes during the backup.  This requires enabling continuous
+   archiving just during the backup process; restore is done using
+   continuous archive recovery (<xref linkend="backup-pitr-recovery"/>).
+  </para>
+
+  <para>
+   Another option is to use <application>rsync</application> to perform a file
+   system backup.  This is done by first running <application>rsync</application>
+   while the database server is running, then shutting down the database
+   server long enough to do an <command>rsync --checksum</command>.
+   (<option>--checksum</option> is necessary because <command>rsync</command> only
+   has file modification-time granularity of one second.)  The
+   second <application>rsync</application> will be quicker than the first,
+   because it has relatively little data to transfer, and the end result
+   will be consistent because the server was down.  This method
+   allows a file system backup to be performed with minimal downtime.
+  </para>
+
+  <para>
+   Note that a file system backup will typically be larger
+   than an SQL dump. (<application>pg_dump</application> does not need to dump
+   the contents of indexes for example, just the commands to recreate
+   them.)  However, taking a file system backup might be faster.
+  </para>
+ </sect1>
+
 </chapter>
-- 
2.48.1