archive modules

Started by Bossart, Nathan about 4 years ago (102 messages)

#1 Bossart, Nathan <bossartn@amazon.com>
1 attachment(s)

This thread is a continuation of the thread with the subject
"parallelizing the archiver" [0]. That thread had morphed into an
effort to allow creating archive modules, so I've created a new one to
ensure that this topic has the proper visibility.

I've attached the latest patch from the previous thread. This patch
does a few things. First, it adds the archive_library GUC that
specifies a library to use in place of archive_command. If
archive_library is set to "shell" (the default), archive_command is
still used. The archive_library is preloaded, so its _PG_init() can
do anything that libraries loaded via shared_preload_libraries can do.
Like logical decoding output plugins, archive modules must define an
initialization function and some callbacks. The patch also introduces
the basic_archive module to ensure test coverage on the new
infrastructure.
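
To make the shape of the interface concrete, here is a minimal sketch of what
a module written against the callback struct in this patch might look like.
It is illustrative only: the typedefs are local stand-ins copied from the
patch's documentation so that the snippet compiles outside the server tree,
and demo_archive_directory, demo_check_configured, demo_archive_file, and
demo_archiver_try are hypothetical names that do not appear in the patch. A
real module would include the server headers, would most likely expose its
setting as a GUC, and would copy the file durably instead of printing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Local stand-ins for the typedefs the patch documents (assumption: a real
 * module gets these from the server headers instead). */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

/* Hypothetical module setting; empty means "not configured yet". */
static const char *demo_archive_directory = "";

/* Report whether the module is ready to accept WAL files. */
static bool
demo_check_configured(void)
{
    return demo_archive_directory != NULL && demo_archive_directory[0] != '\0';
}

/* Archive one WAL file.  This sketch only logs what it would do; a real
 * module would copy "path" to durable storage and return false on failure. */
static bool
demo_archive_file(const char *file, const char *path)
{
    printf("would archive %s (from %s) into %s\n",
           file, path, demo_archive_directory);
    return true;
}

/* Initialization function: fill in both required callbacks. */
void
_PG_archive_module_init(ArchiveModuleCallbacks *cb)
{
    assert(cb != NULL);
    cb->check_configured_cb = demo_check_configured;
    cb->archive_file_cb = demo_archive_file;
}

/* Stand-in for the archiver's per-file step: skip the file while the module
 * is unconfigured, otherwise hand it to the archive callback. */
static bool
demo_archiver_try(const ArchiveModuleCallbacks *cb,
                  const char *file, const char *path)
{
    if (!cb->check_configured_cb())
        return false;
    return cb->archive_file_cb(file, path);
}
```

The demo_archiver_try helper stands in for the archiver's loop: while the
check callback returns false nothing is archived, and once the module is
configured each file is passed to the archive callback.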

Nathan

[0]: /messages/by-id/BC4D6BB2-6976-4397-A417-A6A30EEDC63E@amazon.com

Attachments:

v7-0001-Introduce-archive-module-infrastructure.patch (application/octet-stream)
From 72e606ca7ab3b411de2971600b3ed0a64e2644ec Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 27 Oct 2021 03:22:04 +0000
Subject: [PATCH v7 1/1] Introduce archive module infrastructure.

This feature allows custom archive libraries to be used in place of
archive_command.  A new GUC called archive_library specifies the
archive module that should be used.  The library is preloaded, so
its _PG_init() can do anything that libraries loaded via
shared_preload_libraries can do.  Like logical decoding output
plugins, archive modules must define an initialization function and
some callbacks.  If archive_library is set to "shell" (which is the
default for backward compatibility), archive_command is used.
---
 doc/src/sgml/archive-modules.sgml                  | 133 +++++++++++++++
 doc/src/sgml/backup.sgml                           |  83 +++++----
 doc/src/sgml/config.sgml                           |  37 +++-
 doc/src/sgml/filelist.sgml                         |   1 +
 doc/src/sgml/high-availability.sgml                |   6 +-
 doc/src/sgml/postgres.sgml                         |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml                |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml                |   6 +-
 doc/src/sgml/wal.sgml                              |   2 +-
 src/backend/access/transam/xlog.c                  |   2 +-
 src/backend/postmaster/Makefile                    |   1 +
 src/backend/postmaster/pgarch.c                    | 182 +++++++-------------
 src/backend/postmaster/postmaster.c                |   2 +
 src/backend/postmaster/shell_archive.c             | 156 +++++++++++++++++
 src/backend/utils/init/miscinit.c                  |  27 +++
 src/backend/utils/misc/guc.c                       |  15 +-
 src/backend/utils/misc/postgresql.conf.sample      |   1 +
 src/include/access/xlog.h                          |   1 -
 src/include/miscadmin.h                            |   2 +
 src/include/postmaster/pgarch.h                    |  45 +++++
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 189 +++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 27 files changed, 802 insertions(+), 173 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 src/backend/postmaster/shell_archive.c
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d69b462578
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,133 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archiving modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs, register background
+  workers, and implement SQL functions).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many of the server resources.  Administrators wishing to enable archive
+   modules should exercise extreme caution.  Only carefully audited modules
+   should be loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+
+  <para>
+   Archive libraries are preloaded in a similar fashion as
+   <xref linkend="guc-shared-preload-libraries"/>.  This means that it is
+   possible to do things in the module's <function>_PG_init</function> function
+   that can only be done at server start.  The
+   <varname>process_archive_library_in_progress</varname> will be set to
+   <literal>true</literal> when the archive library is being preloaded during
+   server startup.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, FATAL is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index de77f14573..1e6ab34913 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is preloaded and is used for
+        archiving.  For more information, see
+        <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies to archive via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index c43f214020..f4e5e9420b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 9fde2fd2ef..10ee107000 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -465,11 +465,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f547efd294..6350656a8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8795,7 +8795,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..f0e437f820 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,8 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -85,6 +81,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Flags set by interrupt handlers for later service in the main loop.
@@ -103,6 +101,7 @@ static bool pgarch_readyXlog(char *xlog);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -198,6 +197,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	pgarch_MainLoop();
 
 	proc_exit(0);
@@ -358,11 +362,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +447,31 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
@@ -716,3 +615,44 @@ HandlePgArchInterrupts(void)
 		ProcessConfigFile(PGC_SIGHUP);
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveContext = palloc0(sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		shell_archive_init(ArchiveContext);
+	else
+	{
+		ArchiveModuleInit archive_init;
+
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+		if (archive_init == NULL)
+			ereport(ERROR,
+					(errmsg("archive modules have to declare the "
+							"_PG_archive_module_init symbol")));
+
+		archive_init(ArchiveContext);
+	}
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e2a76ba055..f43c6b4cdc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1024,6 +1024,7 @@ PostmasterMain(int argc, char *argv[])
 	 * process any libraries that should be preloaded at postmaster start
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -5011,6 +5012,7 @@ SubPostmasterMain(int argc, char *argv[])
 	 * non-EXEC_BACKEND behavior.
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/* Run backend or appropriate child */
 	if (strcmp(argv[1], "--forkbackend") == 0)
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..7298dda6ee
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,156 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..9f2766ed04 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -1614,6 +1615,9 @@ char	   *local_preload_libraries_string = NULL;
 /* Flag telling that we are loading shared_preload_libraries */
 bool		process_shared_preload_libraries_in_progress = false;
 
+/* Flag telling that we are loading archive_library */
+bool		process_archive_library_in_progress = false;
+
 /*
  * load the shared libraries listed in 'libraries'
  *
@@ -1696,6 +1700,29 @@ process_session_preload_libraries(void)
 				   true);
 }
 
+/*
+ * process the archive library
+ */
+void
+process_archive_library(void)
+{
+	process_archive_library_in_progress = true;
+
+	/*
+	 * The shell archiving code is in the core server, so there's nothing
+	 * to load for that.
+	 */
+	if (!ShellArchivingEnabled())
+	{
+		load_file(XLogArchiveLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded archive library \"%s\"",
+								 XLogArchiveLibrary)));
+	}
+
+	process_archive_library_in_progress = false;
+}
+
 void
 pg_bindtextdomain(const char *domain)
 {
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..9204f608fc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_POSTMASTER, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
@@ -8961,7 +8971,8 @@ init_custom_variable(const char *name,
 	 * module might already have hooked into.
 	 */
 	if (context == PGC_POSTMASTER &&
-		!process_shared_preload_libraries_in_progress)
+		!process_shared_preload_libraries_in_progress &&
+		!process_archive_library_in_progress)
 		elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");
 
 	/*
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1cbc9feeb6..dc4a20b014 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..7093e3390f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -157,7 +157,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90a3016065..8717fed0dc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -464,6 +464,7 @@ extern void BaseInit(void);
 /* in utils/init/miscinit.c */
 extern bool IgnoreSystemIndexes;
 extern PGDLLIMPORT bool process_shared_preload_libraries_in_progress;
+extern PGDLLIMPORT bool process_archive_library_in_progress;
 extern char *session_preload_libraries_string;
 extern char *shared_preload_libraries_string;
 extern char *local_preload_libraries_string;
@@ -477,6 +478,7 @@ extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
+extern void process_archive_library(void);
 extern void pg_bindtextdomain(const char *domain);
 extern bool has_rolreplication(Oid roleid);
 
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..7d09d2665e 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,49 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
+
 #endif							/* _PGARCH_H */
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..322049d45f
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,189 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	if (!process_archive_library_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via \"archive_library\"")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6

#2Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Bossart, Nathan (#1)
Re: archive modules

On 2021/11/02 3:54, Bossart, Nathan wrote:

> This thread is a continuation of the thread with the subject
> "parallelizing the archiver" [0]. That thread had morphed into an
> effort to allow creating archive modules, so I've created a new one to
> ensure that this topic has the proper visibility.

What is the main motivation for this patch? I thought it was
for parallelizing WAL archiving, but from a brief read of
the patch, WAL file names are still passed to the archive
callback function one at a time.

Are you planning to extend this mechanism to other WAL
archiving-related commands like restore_command? I can imagine
that those who use an archive library (rather than the shell) would
like to use the same mechanism for WAL restore.

> I've attached the latest patch from the previous thread. This patch
> does a few things. First, it adds the archive_library GUC that
> specifies a library to use in place of archive_command. If
> archive_library is set to "shell" (the default), archive_command is
> still used. The archive_library is preloaded, so its _PG_init() can
> do anything that libraries loaded via shared_preload_libraries can do.
> Like logical decoding output plugins, archive modules must define an
> initialization function and some callbacks. The patch also introduces
> the basic_archive module to ensure test coverage on the new
> infrastructure.

I think it's worth adding this module to core
rather than treating it as a test module. It provides only a very basic
WAL archiving feature, but (I guess) that's enough for some users.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#3Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#2)
Re: archive modules

On Tue, Nov 02, 2021 at 01:43:54PM +0900, Fujii Masao wrote:

> On 2021/11/02 3:54, Bossart, Nathan wrote:
>
>> This thread is a continuation of the thread with the subject
>> "parallelizing the archiver" [0]. That thread had morphed into an
>> effort to allow creating archive modules, so I've created a new one to
>> ensure that this topic has the proper visibility.
>
> What is the main motivation for this patch? I thought it was
> for parallelizing WAL archiving, but from a brief read of
> the patch, WAL file names are still passed to the archive
> callback function one at a time.

It seems to me that this patch is not moving in the right direction
implementation-wise (I have read the arguments about
backward-compatibility that led to the introduction of archive_library
and its shell mode). It looks like a duplicate of
shared_preload_libraries, but with an extra code path dedicated to the
archiver; couldn't we just have a hook instead? We have been
talking for some time now about making the archiver process more
bgworker-ish, so that we end up with something closer to what the
logical replication launcher is.
--
Michael

#4Bossart, Nathan
bossartn@amazon.com
In reply to: Fujii Masao (#2)
Re: archive modules

On 11/1/21, 9:44 PM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:

> What is the main motivation for this patch? I thought it was
> for parallelizing WAL archiving, but from a brief read of
> the patch, WAL file names are still passed to the archive
> callback function one at a time.

The main motivation is to provide a way to archive without shelling out.
This reduces the amount of overhead, which can improve the archival rate
significantly. It should also make it easier to archive more safely.
For example, many of the common shell commands used for archiving
won't fsync the data, but it isn't too hard to do so via C. The
current proposal doesn't introduce any extra infrastructure for
batching or parallelism, but either is probably still possible. I would
like to eventually add batching, but for now I'm only focused on
introducing basic archive module support.
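
As a rough illustration of the fsync point above, here is a minimal sketch in plain POSIX C. It is not the patch's basic_archive code, which relies on PostgreSQL's copy_file() and durable_rename_excl(); the names archive_copy_durably() and fsync_path() and the fixed buffer sizes are hypothetical. It copies the file to a temporary name, fsyncs it, renames it into place, and fsyncs the directory, refusing to overwrite an existing archive file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fsync a file or directory by opening it read-only. */
static bool
fsync_path(const char *path)
{
    int     fd = open(path, O_RDONLY);
    bool    ok;

    if (fd < 0)
        return false;
    ok = (fsync(fd) == 0);
    close(fd);
    return ok;
}

/*
 * Hypothetical sketch: durably copy "src" to "dir/name".
 * Returns false if the destination already exists or on any error.
 */
static bool
archive_copy_durably(const char *src, const char *dir, const char *name)
{
    char        dst[1024];
    char        tmp[1024];
    char        buf[8192];
    struct stat st;
    ssize_t     n;
    int         in, out;
    bool        ok = true;

    assert(src != NULL && dir != NULL && name != NULL);

    if (snprintf(dst, sizeof(dst), "%s/%s", dir, name) >= (int) sizeof(dst) ||
        snprintf(tmp, sizeof(tmp), "%s/archtemp", dir) >= (int) sizeof(tmp))
        return false;           /* path too long */

    if (stat(dst, &st) == 0)
        return false;           /* already archived; do not overwrite */
    if (errno != ENOENT)
        return false;           /* could not stat the destination */

    if ((in = open(src, O_RDONLY)) < 0)
        return false;
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(in);
        return false;
    }

    /* Copy the contents; a short write is treated as a failure. */
    while ((n = read(in, buf, sizeof(buf))) > 0)
    {
        if (write(out, buf, (size_t) n) != n)
        {
            ok = false;
            break;
        }
    }
    if (n < 0)
        ok = false;

    /* Flush the file contents to stable storage before the rename. */
    if (ok && fsync(out) != 0)
        ok = false;
    close(in);
    if (close(out) != 0)
        ok = false;

    /* Move into place, then fsync the directory so the rename persists. */
    if (ok && rename(tmp, dst) != 0)
        ok = false;
    if (ok && !fsync_path(dir))
        ok = false;

    if (!ok)
        unlink(tmp);            /* best-effort cleanup of the temp file */
    return ok;
}
```

Unlike durable_rename_excl() in the patch, plain rename() replaces an existing destination, so the stat() check in this sketch leaves a small race window; a real module would want the exclusive variant.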

> Are you planning to extend this mechanism to other WAL
> archiving-related commands like restore_command? I can imagine
> that those who use an archive library (rather than the shell) would
> like to use the same mechanism for WAL restore.

I would like to do this eventually, but my current proposal is limited
to archive_command.

> I think it's worth adding this module to core
> rather than treating it as a test module. It provides only a very basic
> WAL archiving feature, but (I guess) that's enough for some users.

Do you think it should go into contrib?

Nathan

#5Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#3)
Re: archive modules

On Tue, Nov 2, 2021 at 1:24 AM Michael Paquier <michael@paquier.xyz> wrote:

> It seems to me that this patch is not moving in the right direction
> implementation-wise (I have read the arguments about
> backward-compatibility that led to the introduction of archive_library
> and its shell mode). It looks like a duplicate of
> shared_preload_libraries, but with an extra code path dedicated to the
> archiver; couldn't we just have a hook instead? We have been
> talking for some time now about making the archiver process more
> bgworker-ish, so that we end up with something closer to what the
> logical replication launcher is.

Why in the world would you want a plain hook rather than something
closer to the way logical replication works?

Plain hooks are annoying to use. If you load things at the wrong time,
it silently doesn't work. It's also impossible to unload anything. If
you want to change to a different module, you probably have to bounce
the whole server instead of just changing the GUC and letting it load
the new module when you run 'pg_ctl reload'.

Blech! :-)

--
Robert Haas
EDB: http://www.enterprisedb.com

#6Bossart, Nathan
bossartn@amazon.com
In reply to: Michael Paquier (#3)
Re: archive modules

I've just realized I forgot to CC the active participants on the last
thread to this one, so I've attempted to do that now. I didn't
intentionally leave anyone out, but I'm sorry if I missed someone.

On 11/1/21, 10:24 PM, "Michael Paquier" <michael@paquier.xyz> wrote:

It seems to me that this patch is not moving in the right direction
implementation-wise (I have read the arguments about
backward-compatibility that led to the introduction of archive_library
and its shell mode), for what looks like a duplicate of
shared_preload_libraries but for an extra code path dedicated to the
archiver, where we could just have a hook instead? We have been
talking for some time now to make the archiver process more
bgworker-ish, so as we finish with something closer to what the
logical replication launcher is.

Hm. It sounds like you want something more like what was discussed
earlier in the other thread [0]. This approach would basically just
add a hook and call it a day. My patch for this approach [1] moved
the shell archive logic to a test module, but the general consensus
seems to be that we'd need to have it be present in core (and the
default).

Nathan

[0]: /messages/by-id/8B7BF404-29D4-4662-A2DF-1AC4C98463EB@amazon.com
[1]: /messages/by-id/attachment/127385/v2-0001-Replace-archive_command-with-a-hook.patch

#7Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#5)
Re: archive modules

On 11/2/21, 8:11 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

Why in the world would you want a plain hook rather than something
closer to the way logical replication works?

Plain hooks are annoying to use. If you load things at the wrong time,
it silently doesn't work. It's also impossible to unload anything. If
you want to change to a different module, you probably have to bounce
the whole server instead of just changing the GUC and letting it load
the new module when you run 'pg_ctl reload'.

Well, the current patch does require a reload since the modules are
preloaded, but maybe there is some way to avoid that. However, I
agree with the general argument that a dedicated archive module
interface will be easier to use correctly. With a hook, it could be
hard to know which library's archive function we are actually using.
With archive_library, we know the exact library's callback functions
we are using.
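
As a rough sketch of what that looks like from the module's side, the
following standalone C program models the callback struct from the patch's
documentation. The struct and callback typedefs mirror the patch; everything
prefixed with demo_ and the archive_pending_files() driver are hypothetical
stand-ins (in a real module the init function would be named
_PG_archive_module_init, and the server, not the module, would drive the
loop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Callback types and struct as shown in the patch's documentation. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

/* Hypothetical module state: a "configured" flag and a success counter. */
static bool demo_configured = false;
static int  demo_archived_count = 0;

static bool
demo_check_configured(void)
{
    return demo_configured;
}

static bool
demo_archive_file(const char *file, const char *path)
{
    assert(file != NULL && path != NULL);
    demo_archived_count++;      /* a real module would copy the file here */
    return true;
}

/* Stand-in for _PG_archive_module_init: fill in both required callbacks. */
static void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_check_configured;
    cb->archive_file_cb = demo_archive_file;
}

/*
 * Simplified stand-in for the archiver: files are handed to the module one
 * at a time, and processing stops at the first file that cannot be archived
 * so it can be retried later.  Returns the number of files archived.
 */
static int
archive_pending_files(const ArchiveModuleCallbacks *cb,
                      const char *const *files, int nfiles)
{
    int done = 0;

    for (int i = 0; i < nfiles; i++)
    {
        char path[128];

        if (!cb->check_configured_cb())
            break;
        snprintf(path, sizeof(path), "pg_wal/%s", files[i]);
        if (!cb->archive_file_cb(files[i], path))
            break;
        done++;
    }
    return done;
}
```

Because the callbacks are collected in one struct that a single library
fills in, there is no ambiguity about whose archive function is being called.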

I think an argument for just adding a hook is that it's simpler and
easier to maintain. But continuous archiving is a pretty critical
piece of functionality, so I think it's a great candidate for some
sturdy infrastructure.

Nathan

#8Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#7)
Re: archive modules

On Tue, Nov 2, 2021 at 11:26 AM Bossart, Nathan <bossartn@amazon.com> wrote:

Well, the current patch does require a reload since the modules are
preloaded, but maybe there is some way to avoid that.

I think we could set things up so that if the value changes, you call
a shutdown hook for the old library, load the new one, and call any
startup hook for that one.

That wouldn't work with a hook-based approach.
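
The sequence described above could look roughly like the following
standalone C model. Nothing here is PostgreSQL API: DemoModule, its startup
and shutdown members, and reload_archive_module() are hypothetical names
used only to show the order of operations when the configured value changes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a loaded archive module. */
typedef struct DemoModule
{
    const char *name;
    void      (*startup) (void);
    void      (*shutdown) (void);
} DemoModule;

static const DemoModule *current_module = NULL;
static char event_log[128] = "";

/* Record each hook call so the order can be inspected afterwards. */
static void
log_event(const char *event)
{
    strncat(event_log, event, sizeof(event_log) - strlen(event_log) - 1);
}

static void a_startup(void)  { log_event("A+"); }
static void a_shutdown(void) { log_event("A-"); }
static void b_startup(void)  { log_event("B+"); }
static void b_shutdown(void) { log_event("B-"); }

static const DemoModule module_a = {"module_a", a_startup, a_shutdown};
static const DemoModule module_b = {"module_b", b_startup, b_shutdown};

/*
 * Called when the configuration is re-read.  If the wanted module differs
 * from the current one, shut down the old module, switch, and start the new
 * one.  An unchanged value is a no-op.
 */
static void
reload_archive_module(const DemoModule *wanted)
{
    assert(wanted != NULL);

    if (current_module != NULL &&
        strcmp(current_module->name, wanted->name) == 0)
        return;

    if (current_module != NULL && current_module->shutdown != NULL)
        current_module->shutdown();

    current_module = wanted;

    if (current_module->startup != NULL)
        current_module->startup();
}
```

With a bare hook there is no owner to call the old module's shutdown step,
which is the gap this model is meant to highlight.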

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#8)
Re: archive modules

On 11/2/21, 8:57 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Tue, Nov 2, 2021 at 11:26 AM Bossart, Nathan <bossartn@amazon.com> wrote:

Well, the current patch does require a reload since the modules are
preloaded, but maybe there is some way to avoid that.

I think we could set things up so that if the value changes, you call
a shutdown hook for the old library, load the new one, and call any
startup hook for that one.

Yes, that seems doable. My point is that I've intentionally chosen to
preload the libraries at the moment so that it's possible to define
PGC_POSTMASTER GUCs and to use RegisterBackgroundWorker(). If we
think that switching archive modules without restarting is more
important, I believe we will need to take on a few restrictions.

Nathan

#10Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#9)
Re: archive modules

On Tue, Nov 2, 2021 at 12:10 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Yes, that seems doable. My point is that I've intentionally chosen to
preload the libraries at the moment so that it's possible to define
PGC_POSTMASTER GUCs and to use RegisterBackgroundWorker(). If we
think that switching archive modules without restarting is more
important, I believe we will need to take on a few restrictions.

I guess I'm failing to understand what the problem is. You can set
GUCs of the form foo.bar in postgresql.conf anyway, right?

--
Robert Haas
EDB: http://www.enterprisedb.com

#11Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#10)
Re: archive modules

On 11/2/21, 9:17 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Tue, Nov 2, 2021 at 12:10 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Yes, that seems doable. My point is that I've intentionally chosen to
preload the libraries at the moment so that it's possible to define
PGC_POSTMASTER GUCs and to use RegisterBackgroundWorker(). If we
think that switching archive modules without restarting is more
important, I believe we will need to take on a few restrictions.

I guess I'm failing to understand what the problem is. You can set
GUCs of the form foo.bar in postgresql.conf anyway, right?

I must not be explaining it well, sorry. I'm mainly thinking about
the following code snippets.

In guc.c:
    /*
     * Only allow custom PGC_POSTMASTER variables to be created during shared
     * library preload; any later than that, we can't ensure that the value
     * doesn't change after startup. This is a fatal elog if it happens; just
     * erroring out isn't safe because we don't know what the calling loadable
     * module might already have hooked into.
     */
    if (context == PGC_POSTMASTER &&
        !process_shared_preload_libraries_in_progress)
        elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");

In bgworker.c:
    if (!process_shared_preload_libraries_in_progress &&
        strcmp(worker->bgw_library_name, "postgres") != 0)
    {
        if (!IsUnderPostmaster)
            ereport(LOG,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("background worker \"%s\": must be registered in shared_preload_libraries",
                            worker->bgw_name)));
        return;
    }

You could still introduce GUCs in _PG_init(), but they couldn't be
defined as PGC_POSTMASTER. Also, you could still use
RegisterDynamicBackgroundWorker() to register a background worker, but
you couldn't use RegisterBackgroundWorker(). These might be
acceptable restrictions if swapping archive libraries on the fly seems
more important, but I wanted to bring that front and center to make
sure everyone understands the tradeoffs.

It's also entirely possible I'm misunderstanding something here...

Nathan

#12Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#11)
Re: archive modules

On Tue, Nov 2, 2021 at 12:39 PM Bossart, Nathan <bossartn@amazon.com> wrote:

On 11/2/21, 9:17 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
You could still introduce GUCs in _PG_init(), but they couldn't be
defined as PGC_POSTMASTER.

It seems like PGC_POSTMASTER isn't very desirable anyway. Wouldn't you
want PGC_SIGHUP? I mean I'm not saying there couldn't be a case where
that wouldn't work, because you could need a big chunk of shared
memory allocated at startup time or something. But that's probably
not typical, and if it does happen well then that particular archive
module has to be preloaded. If you know that you have several archive
modules that you want to use, you can still preload them all if for
this or any other reason you want to do that. But in a lot of cases
it's not going to be necessary.

In other words, if we use hooks, then you're guaranteed to need a
server restart to change anything. If we use something like what you
have now, there can be corner cases where you need that or benefits of
preloading, but it's not a hard requirement, and in many cases you can
get by without it. That seems strictly better to me ... but maybe I'm
still confused.

Also, you could still use
RegisterDynamicBackgroundWorker() to register a background worker, but
you couldn't use RegisterBackgroundWorker(). These might be
acceptable restrictions if swapping archive libraries on the fly seems
more important, but I wanted to bring that front and center to make
sure everyone understands the tradeoffs.

RegisterDynamicBackgroundWorker() seems way better, though. It's hard
for me to understand why this would be a problem for anybody. And
again, if somebody does have that need, they can always fall back to
saying that their particular module should be preloaded if you want to
use it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#12)
Re: archive modules

On 11/2/21, 9:46 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Tue, Nov 2, 2021 at 12:39 PM Bossart, Nathan <bossartn@amazon.com> wrote:

On 11/2/21, 9:17 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
You could still introduce GUCs in _PG_init(), but they couldn't be
defined as PGC_POSTMASTER.

It seems like PGC_POSTMASTER isn't very desirable anyway. Wouldn't you
want PGC_SIGHUP? I mean I'm not saying there couldn't be a case where
that wouldn't work, because you could need a big chunk of shared
memory allocated at startup time or something. But that's probably
not typical, and if it does happen well then that particular archive
module has to be preloaded. If you know that you have several archive
modules that you want to use, you can still preload them all if for
this or any other reason you want to do that. But in a lot of cases
it's not going to be necessary.

In other words, if we use hooks, then you're guaranteed to need a
server restart to change anything. If we use something like what you
have now, there can be corner cases where you need that or benefits of
preloading, but it's not a hard requirement, and in many cases you can
get by without it. That seems strictly better to me ... but maybe I'm
still confused.

Also, you could still use
RegisterDynamicBackgroundWorker() to register a background worker, but
you couldn't use RegisterBackgroundWorker(). These might be
acceptable restrictions if swapping archive libraries on the fly seems
more important, but I wanted to bring that front and center to make
sure everyone understands the tradeoffs.

RegisterDynamicBackgroundWorker() seems way better, though. It's hard
for me to understand why this would be a problem for anybody. And
again, if somebody does have that need, they can always fall back to
saying that their particular module should be preloaded if you want to
use it.

I agree. I'll make sure the archive library can be changed via SIGHUP
in the next revision.

Nathan

#14Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#12)
1 attachment(s)
Re: archive modules

On 11/2/21, 10:29 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

I agree. I'll make sure the archive library can be changed via SIGHUP
in the next revision.

And here it is.

Nathan

Attachments:

v8-0001-Introduce-archive-module-infrastructure.patch (application/octet-stream)
From 8a565f9090a6d564a678a72574d82c8b757b4dea Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Tue, 2 Nov 2021 20:55:24 +0000
Subject: [PATCH v8 1/1] Introduce archive module infrastructure.

This feature allows custom archive libraries to be used in place of
archive_command.  A new GUC called archive_library specifies the
archive module that should be used.  Like logical decoding output
plugins, archive modules must define an initialization function and
some callbacks.  If archive_library is set to "shell" (which is the
default for backward compatibility), archive_command is used.
---
 doc/src/sgml/archive-modules.sgml                  | 123 +++++++++++++
 doc/src/sgml/backup.sgml                           |  83 +++++----
 doc/src/sgml/config.sgml                           |  37 +++-
 doc/src/sgml/filelist.sgml                         |   1 +
 doc/src/sgml/high-availability.sgml                |   6 +-
 doc/src/sgml/postgres.sgml                         |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml                |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml                |   6 +-
 doc/src/sgml/wal.sgml                              |   2 +-
 src/backend/access/transam/xlog.c                  |   2 +-
 src/backend/postmaster/Makefile                    |   1 +
 src/backend/postmaster/pgarch.c                    | 192 ++++++++-------------
 src/backend/postmaster/shell_archive.c             | 153 ++++++++++++++++
 src/backend/utils/init/miscinit.c                  |   1 +
 src/backend/utils/misc/guc.c                       |  12 +-
 src/backend/utils/misc/postgresql.conf.sample      |   1 +
 src/include/access/xlog.h                          |   1 -
 src/include/postmaster/pgarch.h                    |  45 +++++
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 185 ++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 25 files changed, 763 insertions(+), 172 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 src/backend/postmaster/shell_archive.c
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d52aaaf1f5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,123 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archiving modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, FATAL is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index de77f14573..9042510438 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index c43f214020..f4e5e9420b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 9fde2fd2ef..10ee107000 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -465,11 +465,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a0771a18e..32e59b1e78 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8806,7 +8806,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..7200145a44 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,8 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -85,6 +81,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Flags set by interrupt handlers for later service in the main loop.
@@ -103,6 +101,7 @@ static bool pgarch_readyXlog(char *xlog);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -198,6 +197,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	pgarch_MainLoop();
 
 	proc_exit(0);
@@ -358,11 +362,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +447,31 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
@@ -714,5 +613,56 @@ HandlePgArchInterrupts(void)
 	{
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		/*
+		 * Load the archive_library in case it changed.  Ideally, this would
+		 * first unload any pre-existing loaded archive library to release
+		 * custom GUCs, decommission background workers, etc., but there is
+		 * presently no mechanism for unloading a library.  For more
+		 * information, see the comment above internal_unload_library().
+		 */
+		LoadArchiveLibrary();
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	if (ArchiveContext == NULL)
+		ArchiveContext = palloc(sizeof(ArchiveModuleCallbacks));
+
+	memset(ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	archive_init(ArchiveContext);
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..f3cdbc97fe
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,153 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..358d9ed029 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..57a9255fc2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" specifies archiving via shell.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1cbc9feeb6..dc4a20b014 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index c0a560204b..850de4a857 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -157,7 +157,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..7d09d2665e 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,49 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
+
 #endif							/* _PGARCH_H */
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..84e923bb7e
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6

#15Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Bossart, Nathan (#4)
Re: archive modules

On 2021/11/03 0:03, Bossart, Nathan wrote:

On 11/1/21, 9:44 PM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:

What is the main motivation of this patch? I was thinking that
it's for parallelizing WAL archiving. But from my very brief read
of the patch, WAL file names are still passed to
the archive callback function one by one.

The main motivation is to provide a way to archive without shelling out.
This reduces the amount of overhead, which can improve archival rate
significantly.

It would be helpful if you could share how much this approach
reduces the amount of overhead.

It should also make it easier to archive more safely.
For example, many of the common shell commands used for archiving
won't fsync the data, but it isn't too hard to do so via C.

But probably we can do the same thing even by using the existing
shell interface? For example, we could implement and provide
a C program for the archive command that fsyncs the file?
Users could just use it in archive_command.
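
A minimal sketch of what the core of such a program might look like,
assuming POSIX I/O. copy_and_fsync is a hypothetical name and nothing
like it is part of the patch; it only illustrates copying a file,
refusing to overwrite an existing destination, and fsyncing before
reporting success:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): copy src to dst, refusing to
 * overwrite an existing dst, and fsync the new file before reporting success.
 * Returns 0 on success and -1 on failure.
 */
int
copy_and_fsync(const char *src, const char *dst)
{
	char		buf[65536];
	ssize_t		nread;
	int			in = open(src, O_RDONLY);
	int			out;

	if (in < 0)
		return -1;

	/* O_EXCL makes the open fail if dst already exists. */
	out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (out < 0)
	{
		close(in);
		return -1;
	}

	while ((nread = read(in, buf, sizeof(buf))) > 0)
	{
		ssize_t		off = 0;

		/* write() may be partial, so loop until the chunk is written. */
		while (off < nread)
		{
			ssize_t		nwritten = write(out, buf + off, nread - off);

			if (nwritten <= 0)
				goto fail;
			off += nwritten;
		}
	}

	/* Flush the data to stable storage before claiming success. */
	if (nread < 0 || fsync(out) != 0)
		goto fail;

	close(in);
	return close(out) == 0 ? 0 : -1;

fail:
	close(in);
	close(out);
	unlink(dst);				/* remove the partial copy we created */
	return -1;
}
```

A small main() could call this with the values substituted for %p and
%f and exit nonzero on failure. A production tool would presumably also
fsync the destination directory and retry on EINTR, both of which are
omitted here.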

The
current proposal doesn't introduce any extra infrastructure for
batching or parallelism, but it is probably still possible. I would
like to eventually add batching, but for now I'm only focused on
introducing basic archive module support.

Understood. I agree that it's reasonable to implement them gradually.

Are you planning to extend this mechanism to other WAL
archiving-related commands like restore_command? I can imagine
that those who use an archive library (rather than shell) would
like to use the same mechanism for WAL restore.

I would like to do this eventually, but my current proposal is limited
to archive_command.

Understood.

I think that it's worth adding this module into core
rather than handling it as a test module. It provides a very basic
WAL archiving feature, but (I guess) it's enough for some users.

Do you think it should go into contrib?

Yes, at least for me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#16David Steele
david@pgmasters.net
In reply to: Fujii Masao (#15)
Re: archive modules

On 11/7/21 1:04 AM, Fujii Masao wrote:

On 2021/11/03 0:03, Bossart, Nathan wrote:

On 11/1/21, 9:44 PM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:

What is the main motivation of this patch? I was thinking that
it's for parallelizing WAL archiving. But from my very brief read
of the patch, WAL file names are still passed to
the archive callback function one by one.

The main motivation is to provide a way to archive without shelling out.
This reduces the amount of overhead, which can improve archival rate
significantly.

It would be helpful if you could share how much this approach
reduces the amount of overhead.

FWIW we have a test for this in pgBackRest. Running
`archive_command=pgbackrest archive-push ...` 1000 times via system()
yields an average of 3ms per execution. pgBackRest reports ~1ms of time
here so the system() overhead is ~2ms. These times are on my very fast
workstation and in my experience servers are quite a bit slower.
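
As a rough illustration of how this kind of measurement could be taken
(this is not pgBackRest's actual test; average_system_ms is a
hypothetical helper and the use of clock_gettime() is an assumption),
something like the following would time repeated system() calls:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical benchmark helper: runs cmd through system() the given number
 * of times and returns the mean wall-clock time per call in milliseconds,
 * or -1.0 if iterations is not positive or any call reports failure.
 */
double
average_system_ms(const char *cmd, int iterations)
{
	struct timespec start;
	struct timespec end;

	if (iterations <= 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < iterations; i++)
	{
		/* Each call forks a shell, which is the overhead being measured. */
		if (system(cmd) != 0)
			return -1.0;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	return ((end.tv_sec - start.tv_sec) * 1000.0 +
			(end.tv_nsec - start.tv_nsec) / 1e6) / iterations;
}
```

Subtracting the time the command reports for its own work from this
average gives an estimate of the system() overhead itself.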

This doesn't tell the entire story, though, because in this test
pgBackRest is just checking notifications being returned by an async
process that was spawned earlier. This complexity exists to save the
startup costs of, e.g. establishing an SSH connection, which is often >
1 second.

This module would make it far easier to pay those startup costs a single
time, or at least only occasionally, making it possible to write
performant archivers with less complexity than is currently possible.

It should also make it easier to archive more safely.
For example, many of the common shell commands used for archiving
won't fsync the data, but it isn't too hard to do so via C.

But probably we can do the same thing even by using the existing
shell interface? For example, we could implement and provide
a C program for the archive command that fsyncs the file?
Users could just use it in archive_command.

It is far more common to be writing WAL segments to another host or
object storage. In either case I believe a local fsync file command is
not very useful.

I think that it's worth adding this module into core
rather than handling it as a test module. It provides a very basic
WAL archiving feature, but (I guess) it's enough for some users.

Do you think it should go into contrib?

I would prefer this module to be in core as our standard implementation
and loaded by default in a vanilla install.

Regards,
--
-David
david@pgmasters.net

#17Bossart, Nathan
bossartn@amazon.com
In reply to: David Steele (#16)
Re: archive modules

On 11/10/21, 8:10 AM, "David Steele" <david@pgmasters.net> wrote:

On 11/7/21 1:04 AM, Fujii Masao wrote:

It would be helpful if you could share how much this approach
reduces the amount of overhead.

FWIW we have a test for this in pgBackRest. Running
`archive_command=pgbackrest archive-push ...` 1000 times via system()
yields an average of 3ms per execution. pgBackRest reports ~1ms of time
here so the system() overhead is ~2ms. These times are on my very fast
workstation and in my experience servers are quite a bit slower.

In the previous thread [0], I noted a 50% speedup for a basic
archiving strategy that involved copying the file to a different
directory.

I would prefer this module to be in core as our standard implementation
and loaded by default in a vanilla install.

Hm. I think I disagree with putting it in contrib and with making it
the default archive library. The first reason is backward
compatibility. There has already been quite a bit of discussion about
this, and I don't see how we can get away with anything except for
maintaining the existing behavior for now. Maybe we could move to a
better default down the road, but I'm hesitant to press that issue too
much at the moment.

The second reason is that the basic_archive module has a couple of
deficiencies. For example, it doesn't handle inconvenient server
crashes well (e.g., after archiving but before we've renamed the
.ready file). A way to fix this might be to compare the archive file
with the to-be-archived file and to succeed if they are exactly the
same. Another problem is that there is no handling for multiple
servers using basic_archive to write WAL to the same location. This
is because basic_archive first copies data to a temporary file that is
always named "archtemp." This might be fixed by appending a random
string to the temporary file or by locking it somehow, but there are
still a few things left to figure out.
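
To illustrate the first idea, here is a rough sketch of a content
comparison in plain C. files_identical is a hypothetical helper, not
something in the patch, and a real implementation would presumably use
the server's file APIs and error reporting instead of stdio:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): returns 1 if the two files
 * have identical contents, 0 if they differ, and -1 if either file cannot
 * be opened or read.
 */
int
files_identical(const char *path_a, const char *path_b)
{
	FILE	   *a = fopen(path_a, "rb");
	FILE	   *b = fopen(path_b, "rb");
	char		buf_a[8192];
	char		buf_b[8192];
	int			result = 1;

	if (a == NULL || b == NULL)
		result = -1;

	while (result == 1)
	{
		size_t		n_a = fread(buf_a, 1, sizeof(buf_a), a);
		size_t		n_b = fread(buf_b, 1, sizeof(buf_b), b);

		if (ferror(a) || ferror(b))
			result = -1;
		else if (n_a != n_b || memcmp(buf_a, buf_b, n_a) != 0)
			result = 0;			/* different length or different bytes */
		else if (n_a == 0)
			break;				/* both files reached EOF together */
	}

	if (a != NULL)
		fclose(a);
	if (b != NULL)
		fclose(b);
	return result;
}
```

With something along these lines, basic_archive_file could report
success rather than failure when the destination already exists and
the comparison returns 1.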

I think it'd be awesome to eventually fix all these issues in
basic_archive and to recommend it as a proper archiving strategy, but
I'm worried that this will introduce a significant amount of
complexity to this patch. I really only intended for basic_archive to
be used for testing and to demonstrate that it's possible to use the
archive module infrastructure to do something useful. If folks really
want it in contrib, I'd at least add a big warning about the
aforementioned problems in its docs.

Nathan

[0]: /messages/by-id/E9035E94-EC76-436E-B6C9-1C03FBD8EF54@amazon.com

#18David Steele
david@pgmasters.net
In reply to: Bossart, Nathan (#17)
Re: archive modules

On 11/10/21 1:22 PM, Bossart, Nathan wrote:

On 11/10/21, 8:10 AM, "David Steele" <david@pgmasters.net> wrote:

I would prefer this module to be in core as our standard implementation
and loaded by default in a vanilla install.

Hm. I think I disagree with putting it in contrib and with making it
the default archive library. The first reason is backward
compatibility. There has already been quite a bit of discussion about
this, and I don't see how we can get away with anything except for
maintaining the existing behavior for now. Maybe we could move to a
better default down the road, but I'm hesitant to press that issue too
much at the moment.

OK, I haven't had time to go over the patch in detail so I didn't realize the
module was not backwards compatible. I'll have a closer look soon.

The second reason is that the basic_archive module has a couple of
deficiencies. For example, it doesn't handle inconvenient server
crashes well (e.g., after archiving but before we've renamed the
.ready file). A way to fix this might be to compare the archive file
with the to-be-archived file and to succeed if they are exactly the
same. Another problem is that there is no handling for multiple
servers using basic_archive to write WAL to the same location. This
is because basic_archive first copies data to a temporary file that is
always named "archtemp." This might be fixed by appending a random
string to the temporary file or by locking it somehow, but there are
still a few things left to figure out.

Honestly, I'm not sure to what extent it makes sense to delve into these
problems for an archiver that basically just copies to another
directory. This is not a very realistic solution for the common
storage requirements we are seeing these days.

I think it'd be awesome to eventually fix all these issues in
basic_archive and to recommend it as a proper archiving strategy, but
I'm worried that this will introduce a significant amount of
complexity to this patch. I really only intended for basic_archive to
be used for testing and to demonstrate that it's possible to use the
archive module infrastructure to do something useful. If folks really
want it in contrib, I'd at least add a big warning about the
aforementioned problems in its docs.

I'll have more to say once I've had a closer look, but in general I
agree with what you have said here. Keeping it in test for now is likely
to be the best approach.

Regards,
--
-David
david@pgmasters.net

#19Bossart, Nathan
bossartn@amazon.com
In reply to: David Steele (#18)
Re: archive modules

On 11/10/21, 10:42 AM, "David Steele" <david@pgmasters.net> wrote:

OK, I haven't had time to go over the patch in detail so I didn't realize the
module was not backwards compatible. I'll have a closer look soon.

It's backward-compatible in the sense that you'd be able to switch
archive_library to "shell" to continue using archive_command, but
archive_command is otherwise unused. The proposed patch sets
archive_library to "shell" by default.

Honestly, I'm not sure to what extent it makes sense to delve into these
problems for an archiver that basically just copies to another
directory. This is not a very realistic solution for the common
storage requirements we are seeing these days.

Agreed.

I'll have more to say once I've had a closer look, but in general I
agree with what you have said here. Keeping it in test for now is likely
to be the best approach.

Looking forward to your feedback.

Nathan

#20Bossart, Nathan
bossartn@amazon.com
In reply to: David Steele (#18)
1 attachment(s)
Re: archive modules

On 11/10/21, 10:53 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

Looking forward to your feedback.

Here is a rebased patch (beb4e9b broke v8).

Nathan

Attachments:

v9-0001-Introduce-archive-module-infrastructure.patch (application/octet-stream)
From 24b859d38bf38a6e1cd17cda98dd3e747321b5fb Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 11 Nov 2021 22:00:35 +0000
Subject: [PATCH v9 1/1] Introduce archive module infrastructure.

This feature allows custom archive libraries to be used in place of
archive_command.  A new GUC called archive_library specifies the
archive module that should be used.  Like logical decoding output
plugins, archive modules must define an initialization function and
some callbacks.  If archive_library is set to "shell" (which is the
default for backward compatibility), archive_command is used.
---
 doc/src/sgml/archive-modules.sgml                  | 123 +++++++++++++
 doc/src/sgml/backup.sgml                           |  83 +++++----
 doc/src/sgml/config.sgml                           |  37 +++-
 doc/src/sgml/filelist.sgml                         |   1 +
 doc/src/sgml/high-availability.sgml                |   6 +-
 doc/src/sgml/postgres.sgml                         |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml                |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml                |   6 +-
 doc/src/sgml/wal.sgml                              |   2 +-
 src/backend/access/transam/xlog.c                  |   2 +-
 src/backend/postmaster/Makefile                    |   1 +
 src/backend/postmaster/pgarch.c                    | 192 ++++++++-------------
 src/backend/postmaster/shell_archive.c             | 153 ++++++++++++++++
 src/backend/utils/init/miscinit.c                  |   1 +
 src/backend/utils/misc/guc.c                       |  12 +-
 src/backend/utils/misc/postgresql.conf.sample      |   1 +
 src/include/access/xlog.h                          |   1 -
 src/include/postmaster/pgarch.h                    |  45 +++++
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 185 ++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 25 files changed, 763 insertions(+), 172 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 src/backend/postmaster/shell_archive.c
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d52aaaf1f5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,123 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archiving modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, FATAL is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f806740d5..23b3ff338c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index c43f214020..f4e5e9420b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 5de80f8c64..a6b6ba91fb 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -488,11 +488,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e073121a7e..2ad816c9ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8871,7 +8871,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3b33e01d95..34a8588eb6 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,19 +25,13 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "lib/binaryheap.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -93,6 +87,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -100,6 +96,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -135,6 +133,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -231,6 +230,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Initialize our max-heap for prioritizing files to archive. */
 	arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 									ready_file_comparator, NULL);
@@ -398,11 +402,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -483,136 +487,31 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
@@ -855,5 +754,56 @@ HandlePgArchInterrupts(void)
 	{
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		/*
+		 * Load the archive_library in case it changed.  Ideally, this would
+		 * first unload any pre-existing loaded archive library to release
+		 * custom GUCs, decommission background workers, etc., but there is
+		 * presently no mechanism for unloading a library.  For more
+		 * information, see the comment above internal_unload_library().
+		 */
+		LoadArchiveLibrary();
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	if (ArchiveContext == NULL)
+		ArchiveContext = palloc(sizeof(ArchiveModuleCallbacks));
+
+	memset(ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	archive_init(ArchiveContext);
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..f3cdbc97fe
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,153 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
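
The %p / %f expansion performed by shell_archive_file() can be tried in isolation. What follows is a minimal standalone sketch and not code from the patch: expand_archive_command() is a hypothetical name, it takes the buffer size as a parameter instead of assuming MAXPGPATH, and it omits make_native_path(), so it only approximates what the server does. Truncation at the end of the buffer is also slightly more generous than strlcpy() in the original loop.

```c
#include <stddef.h>
#include <string.h>

/*
 * Expand %p (path), %f (file name) and %% in "cmd" into "out", which has
 * room for "outsz" bytes including the terminating NUL.  Output that does
 * not fit is silently truncated.  Returns the length of the result.
 *
 * Hypothetical helper for illustration; the patch does this inline.
 */
size_t
expand_archive_command(const char *cmd, const char *path, const char *file,
                       char *out, size_t outsz)
{
    char       *dp = out;
    char       *endp;
    const char *sp;

    if (outsz == 0)
        return 0;
    endp = out + outsz - 1;     /* last byte is reserved for the NUL */

    for (sp = cmd; *sp; sp++)
    {
        const char *sub = NULL;

        if (*sp == '%' && sp[1] == 'p')
            sub = path;
        else if (*sp == '%' && sp[1] == 'f')
            sub = file;

        if (sub != NULL)
        {
            /* copy as much of the substitution as still fits */
            size_t      room = (size_t) (endp - dp);
            size_t      n = strlen(sub);

            if (n > room)
                n = room;
            memcpy(dp, sub, n);
            dp += n;
            sp++;               /* skip the 'p' or 'f' */
        }
        else
        {
            if (*sp == '%' && sp[1] == '%')
                sp++;           /* %% collapses to a single % */
            if (dp < endp)
                *dp++ = *sp;    /* any other % is copied literally */
        }
    }
    *dp = '\0';
    return (size_t) (dp - out);
}
```

With cmd set to "cp %p /arch/%f", path "pg_wal/0001" and file "0001", the buffer ends up holding "cp pg_wal/0001 /arch/0001". An unrecognized sequence such as "%x" passes through unchanged, matching the default branch of the switch in the patch.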
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..358d9ed029 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..57a9255fc2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" specifies archiving via shell.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1cbc9feeb6..dc4a20b014 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 898df2ee03..942eb4d55c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 732615be57..03b5ab7c22 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,4 +33,49 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
+
 #endif							/* _PGARCH_H */
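
The callback contract declared in pgarch.h, and the checks LoadArchiveLibrary() applies to it, can be modeled without a server. The sketch below is an illustration under stated assumptions, not code from the patch: the typedefs are redeclared locally so it compiles on its own, load_callbacks() stands in for LoadArchiveLibrary() and returns a message where the server would ereport(ERROR), and demo_directory, demo_module_init and incomplete_module_init are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the typedefs the patch adds to pgarch.h. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/* Hypothetical module setting, standing in for a custom GUC. */
const char *demo_directory = "";

static bool
demo_configured(void)
{
    return demo_directory[0] != '\0';
}

static bool
demo_archive_file(const char *file, const char *path)
{
    (void) file;
    (void) path;
    return true;                /* pretend the copy succeeded */
}

/* What a module's _PG_archive_module_init() would look like. */
void
demo_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_configured;
    cb->archive_file_cb = demo_archive_file;
}

/* A broken module that forgets to register the archive callback. */
void
incomplete_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_configured;
}

/*
 * Model of the validation in LoadArchiveLibrary(): returns NULL on
 * success, or the error text the server would report.
 */
const char *
load_callbacks(ArchiveModuleInit init, ArchiveModuleCallbacks *cb)
{
    memset(cb, 0, sizeof(*cb));

    if (init == NULL)
        return "archive modules have to declare the _PG_archive_module_init symbol";

    init(cb);

    if (cb->check_configured_cb == NULL)
        return "archive modules must register a check callback";
    if (cb->archive_file_cb == NULL)
        return "archive modules must register an archive callback";
    return NULL;
}
```

Passing demo_module_init yields a usable callback struct whose check_configured_cb returns false until demo_directory is set, which is the state in which the archiver would emit its "archiving is not configured" warning. Passing incomplete_module_init or NULL yields an error string, mirroring the two failure paths in the patch.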
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..84e923bb7e
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Fills in the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6
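
For readers who want to try the copy-then-rename pattern from basic_archive_file() outside the server, here is a minimal standalone sketch in plain POSIX C. It is not the module's code. archive_one_file(), copy_and_sync() and SKETCH_MAXPATH are hypothetical names, failures are reported by returning false rather than through ereport(), and link() plus unlink() stands in for durable_rename_excl(). The sketch does not fsync the containing directory, so it should be assumed to give weaker durability than the real helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define SKETCH_MAXPATH 1024     /* hypothetical stand-in for MAXPGPATH */

/* Copy src to a newly created dst and fsync it; false on any error. */
static bool
copy_and_sync(const char *src, const char *dst)
{
    char        buf[8192];
    ssize_t     n;
    bool        ok = true;
    int         in = open(src, O_RDONLY);
    int         out;

    if (in < 0)
        return false;
    out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (out < 0)
    {
        close(in);
        return false;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0)
    {
        /* A short write is treated as a failure to keep the sketch small. */
        if (write(out, buf, (size_t) n) != n)
        {
            ok = false;
            break;
        }
    }
    if (n < 0 || fsync(out) != 0)
        ok = false;
    close(in);
    if (close(out) != 0)
        ok = false;
    return ok;
}

/* Archive one file into archive_dir; true only if it was newly archived. */
static bool
archive_one_file(const char *archive_dir, const char *file, const char *path)
{
    char        destination[SKETCH_MAXPATH];
    char        temp[SKETCH_MAXPATH];
    struct stat st;

    assert(archive_dir != NULL && file != NULL && path != NULL);

    snprintf(destination, sizeof(destination), "%s/%s", archive_dir, file);
    snprintf(temp, sizeof(temp), "%s/%s", archive_dir, "archtemp");

    /* Refuse to overwrite an existing archive file (or to guess on error). */
    if (stat(destination, &st) == 0 || errno != ENOENT)
        return false;

    /* Clear out a temporary file left behind by an earlier failed attempt. */
    if (unlink(temp) != 0 && errno != ENOENT)
        return false;

    if (!copy_and_sync(path, temp))
        return false;

    /* link() fails with EEXIST if the destination appeared in the meantime. */
    if (link(temp, destination) != 0)
        return false;

    (void) unlink(temp);
    return true;
}
```

Calling archive_one_file() twice with the same file name should succeed the first time and return false the second time, which is the "refuse to overwrite" behavior the module's stat() check is aiming for.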

#21Bossart, Nathan
bossartn@amazon.com
In reply to: Bossart, Nathan (#20)
4 attachment(s)
Re: archive modules

CC'd a few folks who were on earlier messages in this thread but have
since been left out for whatever reason.

On 11/11/21, 3:02 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

Here is a rebased patch (beb4e9b broke v8).

I went ahead and split the patch into 4 separate patches in an effort
to ease review. 0001 just refactors the shell archiving logic to its
own file. 0002 introduces the archive modules infrastructure. 0003
introduces the basic_archive test module. And 0004 is the docs.

Nathan

Attachments:

v10-0001-Refactor-shell-archive-function-to-its-own-file.patch (application/octet-stream)
From e0495040eab191dabc4383879934df69bca799f8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:01:56 +0000
Subject: [PATCH v10 1/4] Refactor shell archive function to its own file.

---
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 123 ++-----------------------------
 src/backend/postmaster/shell_archive.c | 131 +++++++++++++++++++++++++++++++++
 src/include/postmaster/pgarch.h        |   3 +
 4 files changed, 141 insertions(+), 117 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3b33e01d95..7fa1644889 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,19 +25,13 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "lib/binaryheap.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -490,129 +484,24 @@ pgarch_ArchiverCopyLoop(void)
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = shell_archive_file(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..5a873c4fd2
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,131 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 732615be57..9c4bd69b56 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,4 +33,7 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
+/* in shell_archive.c */
+extern bool shell_archive_file(const char *file, const char *path);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6
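
The placeholder handling that 0001 moves into shell_archive_file() can be exercised on its own. The following is a rough standalone sketch of the %p / %f / %% expansion, not the patch's code. expand_archive_command() is a hypothetical helper, it takes the template as an argument instead of reading XLogArchiveCommand, it skips make_native_path(), and it truncates with memcpy() rather than strlcpy(), so the exact truncation point may differ by a byte from the server's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand an archive_command-style template into dst.
 *   %p -> path, %f -> file, %% -> a single '%'
 * Any other use of '%' is copied through unchanged.  Output that does not
 * fit in dstlen bytes is silently truncated; dst is always NUL-terminated.
 */
static void
expand_archive_command(char *dst, size_t dstlen, const char *tmpl,
                       const char *file, const char *path)
{
    char       *dp = dst;
    char       *endp;
    const char *sp;

    assert(dst != NULL && dstlen > 0);
    endp = dst + dstlen - 1;    /* reserve room for the terminating NUL */

    for (sp = tmpl; *sp; sp++)
    {
        const char *sub = NULL;

        if (*sp == '%' && sp[1] == 'p')
        {
            sub = path;
            sp++;
        }
        else if (*sp == '%' && sp[1] == 'f')
        {
            sub = file;
            sp++;
        }
        else if (*sp == '%' && sp[1] == '%')
            sp++;               /* fall through and emit one '%' below */

        if (sub != NULL)
        {
            size_t      n = strlen(sub);
            size_t      room = (size_t) (endp - dp);

            if (n > room)
                n = room;
            memcpy(dp, sub, n);
            dp += n;
        }
        else if (dp < endp)
            *dp++ = *sp;
    }
    *dp = '\0';
}
```

With the template "cp %p /arch/%f", a file name of "0001" and a path of "pg_wal/0001", the buffer ends up holding "cp pg_wal/0001 /arch/0001".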

v10-0004-Add-documentation-for-archive-modules.patch (application/octet-stream)
From 208139df46020c4c488a3ccde7307da91fbb6ba2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v10 4/4] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 123 ++++++++++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 +++++++++++++++---------
 doc/src/sgml/config.sgml            |  37 ++++++++---
 doc/src/sgml/filelist.sgml          |   1 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 9 files changed, 215 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d52aaaf1f5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,123 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archiving modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command,
+    <literal>FATAL</literal> is emitted if the command is terminated by a
+    signal (other than <systemitem>SIGTERM</systemitem> that is used as part of
+    a server shutdown) or an error by the shell with an exit status greater
+    than 125 (such as command not found).  In such cases, the failure is not
+    reported in <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f806740d5..23b3ff338c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index c43f214020..f4e5e9420b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 5de80f8c64..a6b6ba91fb 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -488,11 +488,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.16.6

v10-0003-Add-test-archive-module.patch (application/octet-stream)
From 089594ce3c5a5f7de1a141684dc63d1cc687d9e2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v10 3/4] Add test archive module.

---
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 185 +++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 7 files changed, 264 insertions(+)
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..84e923bb7e
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6

v10-0002-Introduce-archive-modules-infrastructure.patch (application/octet-stream)
From 8eb3d24ed866887b0139812cb4b33cc5cdf78cc9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v10 2/4] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/pgarch.c               | 71 +++++++++++++++++++++++++--
 src/backend/postmaster/shell_archive.c        | 24 ++++++++-
 src/backend/utils/init/miscinit.c             |  1 +
 src/backend/utils/misc/guc.c                  | 12 ++++-
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 -
 src/include/postmaster/pgarch.h               | 46 ++++++++++++++++-
 8 files changed, 147 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1616448368..621b9b49e0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8867,7 +8867,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7fa1644889..34a8588eb6 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -87,6 +87,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -94,6 +96,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -129,6 +133,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -225,6 +230,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Initialize our max-heap for prioritizing files to archive. */
 	arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 									ready_file_comparator, NULL);
@@ -392,11 +402,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -477,7 +487,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -494,7 +504,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -744,5 +754,56 @@ HandlePgArchInterrupts(void)
 	{
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		/*
+		 * Load the archive_library in case it changed.  Ideally, this would
+		 * first unload any pre-existing loaded archive library to release
+		 * custom GUCs, decommission background workers, etc., but there is
+		 * presently no mechanism for unloading a library.  For more
+		 * information, see the comment above internal_unload_library().
+		 */
+		LoadArchiveLibrary();
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	if (ArchiveContext == NULL)
+		ArchiveContext = palloc(sizeof(ArchiveModuleCallbacks));
+
+	memset(ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	archive_init(ArchiveContext);
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index 5a873c4fd2..f3cdbc97fe 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2021, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -16,7 +20,25 @@
 #include "access/xlog.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..358d9ed029 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..57a9255fc2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" specifies archiving via shell.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1cbc9feeb6..dc4a20b014 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 898df2ee03..942eb4d55c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 9c4bd69b56..03b5ab7c22 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,49 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.16.6

#22Bossart, Nathan
bossartn@amazon.com
In reply to: Bossart, Nathan (#21)
4 attachment(s)
Re: archive modules

On 11/19/21, 11:24 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

> I went ahead and split the patch into 4 separate patches in an effort
> to ease review. 0001 just refactors the shell archiving logic to its
> own file. 0002 introduces the archive modules infrastructure. 0003
> introduces the basic_archive test module. And 0004 is the docs.

Here is a rebased patch set (1b06d7b broke v10).
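
For anyone following along without the tree checked out, here is a rough
standalone sketch of the callback contract that 0002 sets up. It is not
code from the patch set: the typedefs are copied locally from pgarch.h so
that it compiles without the server headers, and every demo_* name is made
up for illustration. demo_load() imitates the checks in
LoadArchiveLibrary() but returns false where the real code would
ereport(ERROR).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the typedefs that 0002 adds to postmaster/pgarch.h. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
	ArchiveCheckConfiguredCB check_configured_cb;
	ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/* Hypothetical module state; a real module would define a GUC instead. */
static const char *demo_directory = "";
static int	demo_files_archived = 0;

/* Check callback: the module is ready once a directory has been set. */
static bool
demo_configured(void)
{
	return demo_directory != NULL && demo_directory[0] != '\0';
}

/* Archive callback: only counts calls here; a real one copies the file. */
static bool
demo_archive_file(const char *file, const char *path)
{
	(void) file;
	(void) path;
	demo_files_archived++;
	return true;
}

/* Stand-in for a module's _PG_archive_module_init(). */
void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
	cb->check_configured_cb = demo_configured;
	cb->archive_file_cb = demo_archive_file;
}

/*
 * Imitates LoadArchiveLibrary(): zero the struct, call the init function,
 * and require both callbacks.  Returns false where the server would
 * raise an ERROR.
 */
bool
demo_load(ArchiveModuleInit init, ArchiveModuleCallbacks *cb)
{
	memset(cb, 0, sizeof(*cb));
	if (init == NULL)
		return false;
	init(cb);
	return cb->check_configured_cb != NULL && cb->archive_file_cb != NULL;
}

/* Imitates one archiver step: do nothing until the module is configured. */
bool
demo_archive_one(ArchiveModuleCallbacks *cb, const char *file,
				 const char *path)
{
	if (!cb->check_configured_cb())
		return false;
	return cb->archive_file_cb(file, path);
}
```

The point of the sketch is only the ordering: the archiver asks
check_configured_cb first and calls archive_file_cb only when that returns
true, which is how the old XLogArchiveCommandSet() test is generalized.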

Nathan

Attachments:

v11-0004-Add-documentation-for-archive-modules.patch (application/octet-stream)
From e9f26cabed415a0a06f820bf83fa08d21130a1b7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v11 4/4] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 123 ++++++++++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 +++++++++++++++---------
 doc/src/sgml/config.sgml            |  37 ++++++++---
 doc/src/sgml/filelist.sgml          |   1 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 9 files changed, 215 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d52aaaf1f5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,123 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, <literal>FATAL</literal> is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f806740d5..23b3ff338c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index c43f214020..f4e5e9420b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 5de80f8c64..a6b6ba91fb 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -488,11 +488,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.16.6
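
To make the callback contract described in archive-modules.sgml above easier to follow, here is a minimal standalone sketch. It is not part of the patch and does not link against PostgreSQL: the typedefs are copied from the documentation text, while every `demo_*` name, the driver function, and the example directory are assumptions made purely for illustration. A real module would include `postmaster/pgarch.h` and export `_PG_archive_module_init` instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins mirroring the typedefs shown in the documentation above. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

/* Hypothetical module setting; a real module would define a GUC for this. */
static const char *demo_archive_directory = "";

void
demo_set_archive_directory(const char *dir)
{
    demo_archive_directory = dir;
}

/* Check callback: ready only once a destination directory has been set. */
bool
demo_configured(void)
{
    return demo_archive_directory != NULL && demo_archive_directory[0] != '\0';
}

/*
 * Archive callback: 'file' is the bare WAL file name and 'path' is the full
 * path including that name.  This sketch only validates the arguments and
 * prints what it would do; it does not copy anything.
 */
bool
demo_archive_file(const char *file, const char *path)
{
    size_t      flen = strlen(file);
    size_t      plen = strlen(path);

    /* Report failure if 'path' does not end with 'file'. */
    if (flen == 0 || plen < flen || strcmp(path + plen - flen, file) != 0)
        return false;

    printf("would copy \"%s\" to \"%s/%s\"\n",
           path, demo_archive_directory, file);
    return true;
}

/* Plays the role of _PG_archive_module_init: fill in both callbacks. */
void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_configured;
    cb->archive_file_cb = demo_archive_file;
}

/*
 * Hypothetical stand-in for the server side (not the real archiver code).
 * Returns true only when the WAL file may be recycled or removed; false
 * means the file is kept and archiving is retried later.
 */
bool
demo_server_archive_one(const ArchiveModuleCallbacks *cb,
                        const char *file, const char *path)
{
    /* Both callbacks are required. */
    assert(cb->check_configured_cb != NULL && cb->archive_file_cb != NULL);

    if (!cb->check_configured_cb())
        return false;

    return cb->archive_file_cb(file, path);
}
```

In this sketch, a caller would zero-initialize an `ArchiveModuleCallbacks`, pass it to `demo_archive_module_init`, and then call `demo_server_archive_one` once per completed segment. Until `demo_set_archive_directory` has been given a non-empty value, every call returns false, which corresponds to the "archiving will not proceed" case in the check callback documentation.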

v11-0003-Add-test-archive-module.patchapplication/octet-stream; name=v11-0003-Add-test-archive-module.patchDownload
From 7a423e76b2a32184ea6f80c5f88d5e3fe58c43a5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v11 3/4] Add test archive module.

---
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 185 +++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 7 files changed, 264 insertions(+)
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..84e923bb7e
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6

v11-0002-Introduce-archive-modules-infrastructure.patch (application/octet-stream)
From f9a8a394cc6674b9986d93602ca673d18b7f9014 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v11 2/4] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/pgarch.c               | 71 +++++++++++++++++++++++++--
 src/backend/postmaster/shell_archive.c        | 24 ++++++++-
 src/backend/utils/init/miscinit.c             |  1 +
 src/backend/utils/misc/guc.c                  | 12 ++++-
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 -
 src/include/postmaster/pgarch.h               | 46 ++++++++++++++++-
 8 files changed, 147 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 221e4cb34f..ce942c817a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8868,7 +8868,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7fa1644889..34a8588eb6 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -87,6 +87,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -94,6 +96,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -129,6 +133,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -225,6 +230,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Initialize our max-heap for prioritizing files to archive. */
 	arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 									ready_file_comparator, NULL);
@@ -392,11 +402,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -477,7 +487,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -494,7 +504,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -744,5 +754,56 @@ HandlePgArchInterrupts(void)
 	{
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		/*
+		 * Load the archive_library in case it changed.  Ideally, this would
+		 * first unload any pre-existing loaded archive library to release
+		 * custom GUCs, decommission background workers, etc., but there is
+		 * presently no mechanism for unloading a library.  For more
+		 * information, see the comment above internal_unload_library().
+		 */
+		LoadArchiveLibrary();
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	if (ArchiveContext == NULL)
+		ArchiveContext = palloc(sizeof(ArchiveModuleCallbacks));
+
+	memset(ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	archive_init(ArchiveContext);
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index 3c51b620b7..5809040295 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2021, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..358d9ed029 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..57a9255fc2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1cbc9feeb6..dc4a20b014 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 898df2ee03..942eb4d55c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 9c4bd69b56..03b5ab7c22 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,49 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.16.6

v11-0001-Refactor-shell-archive-function-to-its-own-file.patch (application/octet-stream)
From 133a79791642133d0dbbb9a3ab6d13a507c0220e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 22 Nov 2021 17:49:18 +0000
Subject: [PATCH v11 1/4] Refactor shell archive function to its own file.

---
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 126 ++----------------------------
 src/backend/postmaster/shell_archive.c | 135 +++++++++++++++++++++++++++++++++
 src/include/postmaster/pgarch.h        |   3 +
 4 files changed, 145 insertions(+), 120 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 434939be9b..7fa1644889 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,19 +25,13 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "lib/binaryheap.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -490,132 +484,24 @@ pgarch_ArchiverCopyLoop(void)
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
-	rc = system(xlogarchcmd);
-	pgstat_report_wait_end();
-
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = shell_archive_file(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..3c51b620b7
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,135 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "pgstat.h"
+#include "postmaster/pgarch.h"
+
+bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
+	rc = system(xlogarchcmd);
+	pgstat_report_wait_end();
+
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 732615be57..9c4bd69b56 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,4 +33,7 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
+/* in shell_archive.c */
+extern bool shell_archive_file(const char *file, const char *path);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6

#23 Bossart, Nathan
bossartn@amazon.com
In reply to: Bossart, Nathan (#22)
Re: archive modules

On 11/2/21, 8:07 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

The main motivation is to provide a way to archive without shelling out.
This reduces the amount of overhead, which can improve the archival rate
significantly. It should also make it easier to archive more safely.
For example, many of the common shell commands used for archiving
won't fsync the data, but it isn't too hard to do so via C. The
current proposal doesn't introduce any extra infrastructure for
batching or parallelism, but it is probably still possible. I would
like to eventually add batching, but for now I'm only focused on
introducing basic archive module support.

As noted above, the latest patch set (v11) doesn't add any batching or
parallelism. Now that beb4e9b is committed (which causes the archiver
to gather multiple files to archive in each scan of archive_status),
it seems like a good time to discuss this a bit further. I think
there are some interesting design considerations.

As is, the current archive module infrastructure in the v11 patch set
should reduce the per-file overhead quite a bit, and I
observed a noticeable speedup with a basic file-copying archive
strategy (although this is likely not representative of real-world
workloads). I believe it would be possible for archive module authors
to implement batching/parallelism, but AFAICT it would still require
hacks similar to what folks do today with archive_command. For
example, you could look ahead in archive_status, archive a bunch of
files in a batch or in parallel with background workers, and then
quickly return true when the archive_library is called for later files
in the batch.
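
As a rough illustration of the look-ahead hack described above (this is
not part of the patch set), a module could remember which files it has
already archived as part of an earlier batch and answer immediately when
the archiver later asks about them. Everything here except the
(file, path) callback shape is hypothetical: the helper names, the size
limits, and the placeholder where a real module would scan
archive_status and do the actual copying.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_BATCH_MAX 64		/* assumed cap on remembered files */
#define SKETCH_NAME_LEN 64		/* assumed maximum file name length */

/* Names of files already archived by an earlier look-ahead batch. */
static char done_names[SKETCH_BATCH_MAX][SKETCH_NAME_LEN + 1];
static int	done_count = 0;

/* Counts how often the expensive path ran (for demonstration only). */
static int	slow_path_calls = 0;

/* Hypothetical helper: record that "file" was archived ahead of time. */
static bool
remember_archived(const char *file)
{
	if (done_count >= SKETCH_BATCH_MAX || strlen(file) > SKETCH_NAME_LEN)
		return false;
	strcpy(done_names[done_count++], file);
	return true;
}

/* Hypothetical helper: forget "file" if remembered; true if it was. */
static bool
forget_if_archived(const char *file)
{
	for (int i = 0; i < done_count; i++)
	{
		if (strcmp(done_names[i], file) == 0)
		{
			/* Fill the hole with the last entry. */
			done_count--;
			if (i != done_count)
				strcpy(done_names[i], done_names[done_count]);
			return true;
		}
	}
	return false;
}

/* Same shape as the archive callback in the patch: (file, path) -> bool. */
static bool
sketch_archive_file(const char *file, const char *path)
{
	(void) path;

	/* Fast path: this file went out with an earlier batch. */
	if (forget_if_archived(file))
		return true;

	/*
	 * Slow path.  A real module would look ahead in archive_status here,
	 * archive this file plus a batch of later ones, and call
	 * remember_archived() for each of the later files.
	 */
	slow_path_calls++;
	return true;
}
```

The cost of this approach is that the module has to duplicate the
archiver's directory scan and keep its own bookkeeping consistent with
the .ready/.done files.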

Alternatively, we could offer some kind of built-in batching support
in the archive module infrastructure. One simple approach would be to
just have pgarch_readyXlog() optionally return the entire list of
files gathered from the directory scan of archive_status (presently up
to 64 files). Or we could provide a GUC like archive_batch_size that
would allow users to limit how many files are sent to the
archive_library each time. This list would be given to
pgarch_archiveXlog(), which would return which files were successfully
archived and which failed. I think this could be done for
archive_command as well, although it might be tricky to determine
which files were archived successfully. To handle that, we might just
need to fail the whole batch if the archive_command return value
indicates failure.
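
To make the built-in batching idea above a bit more concrete, here is a
minimal sketch under stated assumptions. Only the ArchiveFileCB typedef
comes from the v11 patch; batch_length(), archive_batch(),
mark_whole_batch(), the treatment of archive_batch_size <= 0 as "no
limit", and the demo callback are all made up for illustration.

```c
#include <stdbool.h>

/* Per-file callback type, as declared in the v11 pgarch.h. */
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

/*
 * Hypothetical: number of files to hand over at once, given the number
 * gathered by the directory scan and an archive_batch_size setting
 * (assumed here to mean "no limit" when <= 0).
 */
static int
batch_length(int nready, int archive_batch_size)
{
	if (archive_batch_size > 0 && nready > archive_batch_size)
		return archive_batch_size;
	return nready;
}

/*
 * Hypothetical: run a per-file callback over a batch and record which
 * files succeeded.  Returns the number of successes.
 */
static int
archive_batch(ArchiveFileCB cb, const char **files, const char **paths,
			  int nfiles, bool *results)
{
	int			ok = 0;

	for (int i = 0; i < nfiles; i++)
	{
		results[i] = cb(files[i], paths[i]);
		if (results[i])
			ok++;
	}
	return ok;
}

/*
 * Hypothetical: the archive_command case, where a single exit status
 * covers the whole batch, so every file shares the same result.
 */
static void
mark_whole_batch(bool command_succeeded, int nfiles, bool *results)
{
	for (int i = 0; i < nfiles; i++)
		results[i] = command_succeeded;
}

/* Stand-in callback: pretends that names starting with 'X' fail. */
static bool
demo_archive_file(const char *file, const char *path)
{
	(void) path;
	return file[0] != 'X';
}
```

A module-provided batch callback would replace archive_batch() with its
own logic; the per-file loop is only the simplest possible fallback.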

Another interesting change is that the special timeline file handling
added in beb4e9b becomes less useful. Presently, if a timeline
history file is marked ready for archival, we force pgarch_readyXlog()
to do a new scan of archive_status the next time it is called in order
to pick it up as soon as possible (ordinarily it just returns the
files gathered in a previous scan until it runs out). If we are
sending a list of files to the archive module, it will be more
difficult to ensure timeline history files are picked up so quickly.
Perhaps this is a reasonable tradeoff to make when archive batching is
enabled.

I think the retry logic can stay roughly the same. If any files in a
batch cannot be archived, wait a second before retrying. If that
happens a few times in a row, stop archiving for a bit. It wouldn't
be quite as precise as what's there today because the failures could
be for different files each time, but I don't know if that is terribly
important.
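
The retry behavior described above could look roughly like the
following. This is a sketch rather than the archiver's actual loop: the
retry limit of 3 is an assumption standing in for "a few times in a
row", and the function-pointer types, the demo_* helpers, and the
injectable sleep function exist only so the example can be exercised
without really sleeping.

```c
#include <stdbool.h>

#define SKETCH_MAX_RETRIES 3	/* assumed value for "a few times in a row" */

/* Attempts one batch; returns true only if every file was archived. */
typedef bool (*BatchAttemptFn) (void *state);

/* Sleeps for the given number of seconds. */
typedef void (*SleepFn) (unsigned int seconds);

/*
 * Try a batch, waiting one second between failed attempts.  Returns
 * false after SKETCH_MAX_RETRIES consecutive failures, at which point
 * the caller is expected to stop archiving for a while.
 */
static bool
archive_with_retries(BatchAttemptFn attempt, void *state, SleepFn nap)
{
	int			failures = 0;

	while (failures < SKETCH_MAX_RETRIES)
	{
		if (attempt(state))
			return true;

		failures++;
		if (failures < SKETCH_MAX_RETRIES)
			nap(1);
	}
	return false;
}

/* Stand-ins for demonstration: fail a fixed number of times, count naps. */
struct demo_state
{
	int			failures_left;
	int			attempts;
};

static int	demo_naps = 0;

static bool
demo_attempt(void *arg)
{
	struct demo_state *s = arg;

	s->attempts++;
	if (s->failures_left > 0)
	{
		s->failures_left--;
		return false;
	}
	return true;
}

static void
demo_nap(unsigned int seconds)
{
	demo_naps += (int) seconds;
}
```

Note that the failure counter here is per batch attempt, not per file,
which is the loss of precision mentioned above.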

Finally, I wonder if batching support is something we should bother
with at all for the first round of archive module support. I believe
it is something that could be easily added later, although it might
require archive modules to adjust the archiving callback to accept and
return a list of files. IMO the archive modules infrastructure is
still an improvement even without batching, and it seems to fit nicely
into the existing behavior of the archiver process. I'm curious what
others think about all this.

Nathan

#24 Bossart, Nathan
bossartn@amazon.com
In reply to: Bossart, Nathan (#23)
4 attachment(s)
Re: archive modules

On 11/22/21, 10:01 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

On 11/19/21, 11:24 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

I went ahead and split the patch into 4 separate patches in an effort
to ease review. 0001 just refactors the shell archiving logic to its
own file. 0002 introduces the archive modules infrastructure. 0003
introduces the basic_archive test module. And 0004 is the docs.

Here is a rebased patch set (1b06d7b broke v10).

I'm attempting to make cfbot happy again with v12. It looked like
there was a missing #include for Windows.

Nathan

Attachments:

v12-0001-Refactor-shell-archive-function-to-its-own-file.patch (application/octet-stream)
From e44a714575df136409951ffa55cb5f75f1ba6cbd Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 22 Nov 2021 17:49:18 +0000
Subject: [PATCH v12 1/4] Refactor shell archive function to its own file.

---
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 125 ++----------------------------
 src/backend/postmaster/shell_archive.c | 135 +++++++++++++++++++++++++++++++++
 src/include/postmaster/pgarch.h        |   3 +
 4 files changed, 145 insertions(+), 119 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 5b6bf9f4e0..dc68dd570e 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,19 +25,14 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "lib/binaryheap.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -503,132 +498,24 @@ pgarch_ArchiverCopyLoop(void)
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
-	rc = system(xlogarchcmd);
-	pgstat_report_wait_end();
-
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = shell_archive_file(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..b54e701da4
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,135 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "pgstat.h"
+#include "postmaster/pgarch.h"
+
+bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
+	rc = system(xlogarchcmd);
+	pgstat_report_wait_end();
+
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 732615be57..9c4bd69b56 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,4 +33,7 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
+/* in shell_archive.c */
+extern bool shell_archive_file(const char *file, const char *path);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6
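
For anyone reading the placeholder handling that moves into shell_archive_file() above, here is a minimal standalone sketch of the same %p / %f / %% expansion. It is not part of the patch: expand_archive_command() and its parameters are hypothetical names, and the sketch assumes plain memcpy() in place of strlcpy() and skips make_native_path(), so truncation may differ by a byte from the real loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): expand %p, %f and %% from "tmpl"
 * into "dst", which holds "dstsize" bytes.  Output that does not fit is
 * silently truncated.  Returns the length of the resulting string.
 */
size_t
expand_archive_command(char *dst, size_t dstsize, const char *tmpl,
                       const char *path, const char *file)
{
    char       *dp = dst;
    char       *endp;
    const char *sp;

    assert(dst != NULL && dstsize > 0);
    endp = dst + dstsize - 1;   /* last byte is reserved for the terminator */

    for (sp = tmpl; *sp; sp++)
    {
        const char *subst = NULL;

        if (*sp == '%' && sp[1] == 'p')
            subst = path;       /* %p: path of the source file */
        else if (*sp == '%' && sp[1] == 'f')
            subst = file;       /* %f: file name only */

        if (subst != NULL)
        {
            size_t      room = (size_t) (endp - dp);
            size_t      len = strlen(subst);

            if (len > room)
                len = room;
            memcpy(dp, subst, len);
            dp += len;
            sp++;               /* skip the placeholder letter */
        }
        else
        {
            if (*sp == '%' && sp[1] == '%')
                sp++;           /* collapse %% into a single % */
            if (dp < endp)
                *dp++ = *sp;    /* any other % is copied as-is */
        }
    }
    *dp = '\0';
    return (size_t) (dp - dst);
}
```

Called with the template "cp %p /mnt/archive/%f", a path and a file name, this fills the buffer with the command string that would then be handed to system().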

v12-0002-Introduce-archive-modules-infrastructure.patchapplication/octet-stream; name=v12-0002-Introduce-archive-modules-infrastructure.patchDownload
From 76e0d1b1c32cba4d1b4cfe9f3d71c9d43a3e9b6c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v12 2/4] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/pgarch.c               | 71 +++++++++++++++++++++++++--
 src/backend/postmaster/shell_archive.c        | 24 ++++++++-
 src/backend/utils/init/miscinit.c             |  1 +
 src/backend/utils/misc/guc.c                  | 12 ++++-
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 -
 src/include/postmaster/pgarch.h               | 46 ++++++++++++++++-
 8 files changed, 147 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87cd05c945..fa13bf4d21 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8832,7 +8832,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index dc68dd570e..61e88ff133 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -88,6 +88,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -95,6 +97,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -139,6 +143,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -235,6 +240,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Create workspace for pgarch_readyXlog() */
 	arch_files = palloc(sizeof(struct arch_files_state));
 	arch_files->arch_files_size = 0;
@@ -406,11 +416,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -491,7 +501,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -508,7 +518,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -761,5 +771,56 @@ HandlePgArchInterrupts(void)
 	{
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		/*
+		 * Load the archive_library in case it changed.  Ideally, this would
+		 * first unload any pre-existing loaded archive library to release
+		 * custom GUCs, decommission background workers, etc., but there is
+		 * presently no mechanism for unloading a library.  For more
+		 * information, see the comment above internal_unload_library().
+		 */
+		LoadArchiveLibrary();
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	if (ArchiveContext == NULL)
+		ArchiveContext = palloc(sizeof(ArchiveModuleCallbacks));
+
+	memset(ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	archive_init(ArchiveContext);
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..358d9ed029 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f9504d3aec..2554923e1d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3869,13 +3869,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" is set to \"shell\" or an empty string.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a1acd46b61..e8bdd1fe13 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 34f6c89f06..b7dfb580ad 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 9c4bd69b56..03b5ab7c22 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,49 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.16.6
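
To illustrate the callback contract that 0002 introduces, here is a rough standalone sketch that compiles without any server headers. The typedefs are copied from the pgarch.h hunk above; everything else (the demo_* and broken_* names, load_callbacks(), archive_one()) is hypothetical. The sketch also assumes failures are reported through return values, whereas LoadArchiveLibrary() uses ereport(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shapes as the typedefs the patch adds to pgarch.h. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/* Hypothetical module state, standing in for a module's GUC. */
const char *demo_directory = "";
int         demo_files_archived = 0;

bool
demo_configured(void)
{
    return demo_directory[0] != '\0';
}

bool
demo_archive_file(const char *file, const char *path)
{
    (void) file;
    (void) path;
    demo_files_archived++;      /* a real module would copy the file here */
    return true;
}

/* A well-behaved init function registers both callbacks. */
void
demo_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_configured;
    cb->archive_file_cb = demo_archive_file;
}

/* A broken init function that forgets the archive callback. */
void
broken_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_configured;
}

/* Zero the struct, run the init function, then insist on both callbacks. */
bool
load_callbacks(ArchiveModuleInit init, ArchiveModuleCallbacks *cb)
{
    assert(cb != NULL);
    memset(cb, 0, sizeof(*cb));
    if (init == NULL)
        return false;
    init(cb);
    return cb->check_configured_cb != NULL && cb->archive_file_cb != NULL;
}

/* Check-then-archive, in the order pgarch_ArchiverCopyLoop() uses. */
bool
archive_one(const ArchiveModuleCallbacks *cb, const char *file,
            const char *path)
{
    if (!cb->check_configured_cb())
        return false;
    return cb->archive_file_cb(file, path);
}
```

Zeroing the struct before calling the init function is what makes a forgotten callback detectable as a NULL pointer rather than stale data from a previously loaded library.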

v12-0003-Add-test-archive-module.patchapplication/octet-stream; name=v12-0003-Add-test-archive-module.patchDownload
From 7936c8ff9faff34ca5e7de85f4092480c8cb11c9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v12 3/4] Add test archive module.

---
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 185 +++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 7 files changed, 264 insertions(+)
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..66cbbaa7b5
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Fills in the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6
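
The copy-to-temp-then-move approach that basic_archive_file() takes can be sketched with plain POSIX calls, for readers without a server tree at hand. This is only an approximation under stated assumptions: copy_via_temp() is a hypothetical name, errors are reported as a bool instead of via ereport(), EINTR is not retried, and the parent directory is not fsync'd. Using link() so that an existing destination is never overwritten is this sketch's own choice, not a claim about how durable_rename_excl() is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy "src" to "temp", flush it, then publish it as
 * "dest" without overwriting an existing file.  Returns true on success.
 */
bool
copy_via_temp(const char *src, const char *temp, const char *dest)
{
    char        buf[8192];
    ssize_t     nread;
    int         srcfd;
    int         tmpfd;
    bool        ok = false;

    assert(src != NULL && temp != NULL && dest != NULL);

    srcfd = open(src, O_RDONLY);
    if (srcfd < 0)
        return false;

    /* Discard leftovers from an earlier, interrupted attempt. */
    if (unlink(temp) != 0 && errno != ENOENT)
    {
        close(srcfd);
        return false;
    }

    tmpfd = open(temp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (tmpfd < 0)
    {
        close(srcfd);
        return false;
    }

    while ((nread = read(srcfd, buf, sizeof(buf))) > 0)
    {
        ssize_t     off = 0;

        while (off < nread)
        {
            ssize_t     nwritten = write(tmpfd, buf + off,
                                         (size_t) (nread - off));

            if (nwritten <= 0)
                goto done;      /* short or failed write: give up */
            off += nwritten;
        }
    }
    if (nread < 0)
        goto done;

    /* Make sure the data is on disk before the file becomes visible. */
    if (fsync(tmpfd) != 0)
        goto done;

    /* link() fails with EEXIST instead of replacing an existing file. */
    if (link(temp, dest) != 0)
        goto done;
    ok = true;

done:
    close(tmpfd);
    close(srcfd);
    unlink(temp);               /* temp name is no longer needed either way */
    return ok;
}
```

The point of the temporary file is that a crash in the middle of the copy leaves, at worst, a stray temp file; the destination name only ever refers to a complete, flushed file.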

v12-0004-Add-documentation-for-archive-modules.patchapplication/octet-stream; name=v12-0004-Add-documentation-for-archive-modules.patchDownload
From 23ab54853e1ed3b5533f1017b159469268e771ea Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v12 4/4] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 123 ++++++++++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 +++++++++++++++---------
 doc/src/sgml/config.sgml            |  37 ++++++++---
 doc/src/sgml/filelist.sgml          |   1 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 9 files changed, 215 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d52aaaf1f5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,123 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the equivalent is for the command to exit with status
+    <literal>0</literal> if it succeeds and with a nonzero status if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command,
+    <literal>FATAL</literal> is emitted if the command is terminated by a
+    signal (other than <systemitem>SIGTERM</systemitem> that is used as part
+    of a server shutdown) or if the shell reports an error with an exit status
+    greater than 125 (such as command not found).  In such cases, the failure
+    is not reported in <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..d8b5152930 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 5de80f8c64..a6b6ba91fb 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -488,11 +488,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.16.6
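
For readers following along, here is a minimal sketch of what a callback with the documented ArchiveFileCB signature (const char *file, const char *path, returning bool) might look like. It is not part of the patch and is written as plain POSIX C so it can be compiled outside the server. The function name sketch_archive_file and the archive_dir variable are hypothetical; a real module would presumably take its destination from a GUC and report failures with ereport(). The sketch returns true only if the copy succeeded, and it refuses to overwrite an existing archive file, as the documentation recommends.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical destination directory; not a name used by the patch. */
const char *archive_dir = "/tmp/archive";

/*
 * Copy the WAL file at "path" to archive_dir/"file".
 * Returns true only if the whole file was copied and flushed.
 */
bool
sketch_archive_file(const char *file, const char *path)
{
	char		dest[1024];
	char		buf[8192];
	int			src;
	int			dst;
	ssize_t		n;
	bool		ok = true;

	assert(file != NULL && path != NULL);

	/* Fail rather than silently truncate an over-long destination name. */
	if (snprintf(dest, sizeof(dest), "%s/%s", archive_dir, file) >= (int) sizeof(dest))
		return false;

	src = open(path, O_RDONLY);
	if (src < 0)
		return false;

	/* O_EXCL makes open() fail if the file already exists in the archive. */
	dst = open(dest, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (dst < 0)
	{
		close(src);
		return false;
	}

	while ((n = read(src, buf, sizeof(buf))) > 0)
	{
		/* For simplicity, a short write is treated as a failure. */
		if (write(dst, buf, (size_t) n) != n)
		{
			ok = false;
			break;
		}
	}
	if (n < 0)
		ok = false;

	/* Make sure the data reached storage before reporting success. */
	if (fsync(dst) != 0)
		ok = false;

	close(src);
	if (close(dst) != 0)
		ok = false;

	/* Do not leave a partial file behind; it would block later retries. */
	if (!ok)
		unlink(dest);

	return ok;
}
```

Returning false for an already-archived file matches the "refuse to overwrite" advice in backup.sgml. A more careful implementation would likely write to a temporary name and rename it into place, so that a crash cannot leave a partial file under the final name.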

#25 Bossart, Nathan
bossartn@amazon.com
In reply to: Bossart, Nathan (#24)
4 attachment(s)
Re: archive modules

On 1/5/22, 3:14 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 11/22/21, 10:01 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>> On 11/19/21, 11:24 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>>> I went ahead and split the patch into 4 separate patches in an effort
>>> to ease review. 0001 just refactors the shell archiving logic to its
>>> own file. 0002 introduces the archive modules infrastructure. 0003
>>> introduces the basic_archive test module. And 0004 is the docs.
>>
>> Here is a rebased patch set (1b06d7b broke v10).
>
> I'm attempting to make cfbot happy again with v12. It looked like
> there was a missing #include for Windows.

Here is another rebase for cfbot.

Nathan

Attachments:

v13-0001-Refactor-shell-archive-function-to-its-own-file.patch (application/octet-stream)
From 4a52e3060f16513f6eb86021c0c75dd01eab42cb Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 22 Nov 2021 17:49:18 +0000
Subject: [PATCH v13 1/4] Refactor shell archive function to its own file.

---
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 125 ++----------------------------
 src/backend/postmaster/shell_archive.c | 135 +++++++++++++++++++++++++++++++++
 src/include/postmaster/pgarch.h        |   3 +
 4 files changed, 145 insertions(+), 119 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 1121e4fb29..6e3fcedc97 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,19 +25,14 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "lib/binaryheap.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -504,132 +499,24 @@ pgarch_ArchiverCopyLoop(void)
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
-	rc = system(xlogarchcmd);
-	pgstat_report_wait_end();
-
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = shell_archive_file(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..b54e701da4
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,135 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "pgstat.h"
+#include "postmaster/pgarch.h"
+
+bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
+	rc = system(xlogarchcmd);
+	pgstat_report_wait_end();
+
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index ed55d6646b..991a6d0616 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,4 +33,7 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
+/* in shell_archive.c */
+extern bool shell_archive_file(const char *file, const char *path);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6
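
As an aside for reviewers, the %p / %f / %% substitution that 0001 moves into shell_archive_file() can be exercised on its own. The following is a standalone sketch, not code from the patch: the helper name expand_archive_command is hypothetical, it uses memcpy() instead of strlcpy() so that it builds without the PostgreSQL port library, and it omits the make_native_path() call that the real code applies to %p. Like the original loop, it truncates rather than overflows when the expanded command does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand an archive_command-style template into "out":
 *   %p -> path, %f -> file, %% -> a literal %.
 * Any other %-sequence is copied through unchanged.  The result is
 * always NUL-terminated and is truncated if it does not fit.
 */
void
expand_archive_command(char *out, size_t outsize, const char *tmpl,
					   const char *file, const char *path)
{
	char	   *dp = out;
	char	   *endp;
	const char *sp;

	assert(outsize > 0);
	endp = out + outsize - 1;	/* leave room for the terminator */

	for (sp = tmpl; *sp; sp++)
	{
		const char *sub = NULL;

		if (*sp == '%' && sp[1] == 'p')
			sub = path;
		else if (*sp == '%' && sp[1] == 'f')
			sub = file;

		if (sub != NULL)
		{
			size_t		room = (size_t) (endp - dp);
			size_t		len = strlen(sub);

			if (len > room)
				len = room;
			memcpy(dp, sub, len);
			dp += len;
			sp++;				/* skip the 'p' or 'f' */
		}
		else
		{
			if (*sp == '%' && sp[1] == '%')
				sp++;			/* collapse %% to a single % */
			if (dp < endp)
				*dp++ = *sp;
		}
	}
	*dp = '\0';
}
```

With the template "cp %p /archive/%f", file "0001" and path "pg_wal/0001", this produces "cp pg_wal/0001 /archive/0001", which is the string the real function would then hand to system().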

v13-0002-Introduce-archive-modules-infrastructure.patch (application/octet-stream)
From af56f5aa0ac9b1c67a453d132f86c6d6c6a7ca64 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v13 2/4] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/pgarch.c               | 71 +++++++++++++++++++++++++--
 src/backend/postmaster/shell_archive.c        | 24 ++++++++-
 src/backend/utils/init/miscinit.c             |  1 +
 src/backend/utils/misc/guc.c                  | 12 ++++-
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 -
 src/include/postmaster/pgarch.h               | 46 ++++++++++++++++-
 8 files changed, 147 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c9d4cbf3ff..c036783ca7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8832,7 +8832,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..11a1737aac 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -236,6 +241,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Create workspace for pgarch_readyXlog() */
 	arch_files = palloc(sizeof(struct arch_files_state));
 	arch_files->arch_files_size = 0;
@@ -407,11 +417,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 +502,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -509,7 +519,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -763,9 +773,60 @@ HandlePgArchInterrupts(void)
 	{
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		/*
+		 * Load the archive_library in case it changed.  Ideally, this would
+		 * first unload any pre-existing loaded archive library to release
+		 * custom GUCs, decommission background workers, etc., but there is
+		 * presently no mechanism for unloading a library.  For more
+		 * information, see the comment above internal_unload_library().
+		 */
+		LoadArchiveLibrary();
 	}
 
 	/* Perform logging of memory contexts of this process */
 	if (LogMemoryContextPending)
 		ProcessLogMemoryContextInterrupt();
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	if (ArchiveContext == NULL)
+		ArchiveContext = palloc(sizeof(ArchiveModuleCallbacks));
+
+	memset(ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	archive_init(ArchiveContext);
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f2570d626..0868e5a24f 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fc5cbc09a..bbe3c1832b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3869,13 +3869,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" specifies archiving via shell.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a1acd46b61..e8bdd1fe13 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index bb0c52686a..85114b2e5f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 991a6d0616..8c9bf203cc 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,49 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.16.6
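
The callback contract introduced by this patch can be tried out without a server build. What follows is a minimal standalone sketch, not PostgreSQL code: the typedefs are copied from the patch's pgarch.h, while demo_directory, demo_configured, demo_archive_file, demo_init, and load_callbacks are hypothetical names made up for illustration. load_callbacks imitates the checks in LoadArchiveLibrary() but returns false where the patch would ereport(ERROR).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Callback types, copied from the patch's pgarch.h. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);

typedef struct ArchiveModuleCallbacks
{
	ArchiveCheckConfiguredCB check_configured_cb;
	ArchiveFileCB archive_file_cb;
} ArchiveModuleCallbacks;

typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/* Hypothetical module state, standing in for a module's GUC. */
static const char *demo_directory = "";

/* Hypothetical check callback: configured once a directory is set. */
static bool
demo_configured(void)
{
	return demo_directory[0] != '\0';
}

/* Hypothetical archive callback: only reports what it would do. */
static bool
demo_archive_file(const char *file, const char *path)
{
	printf("would archive %s (from %s) into %s\n", file, path, demo_directory);
	return true;
}

/* Hypothetical counterpart of _PG_archive_module_init(). */
static void
demo_init(ArchiveModuleCallbacks *cb)
{
	cb->check_configured_cb = demo_configured;
	cb->archive_file_cb = demo_archive_file;
}

/*
 * Hypothetical stand-in for LoadArchiveLibrary(): zero the struct, run the
 * init function, and require both callbacks.  Returns false where the
 * patch would ereport(ERROR).
 */
static bool
load_callbacks(ArchiveModuleInit init, ArchiveModuleCallbacks *cb)
{
	memset(cb, 0, sizeof(*cb));
	if (init == NULL)
		return false;
	init(cb);
	return cb->check_configured_cb != NULL && cb->archive_file_cb != NULL;
}
```

Calling load_callbacks(demo_init, &cb) and then cb.check_configured_cb() before each cb.archive_file_cb() call follows the same order of operations as pgarch_ArchiverCopyLoop() in the patch.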

v13-0003-Add-test-archive-module.patch (application/octet-stream)
From 6c785dc16ec1fd52ae2d9b717ef24830aa87366f Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v13 3/4] Add test archive module.

---
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 185 +++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 7 files changed, 264 insertions(+)
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..66cbbaa7b5
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6
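
The copy-to-temporary-file-then-move approach used by basic_archive_file() can also be illustrated outside the server. The sketch below uses plain POSIX calls and assumes link() is available and that the temporary file and destination are on the same file system. copy_and_sync and archive_excl are hypothetical helpers, not functions from the patch, and they return false instead of raising errors. The sketch does not fsync the containing directory, which a fully durable version would need to do.

```c
/* Request POSIX declarations (fsync, link) even under strict -std modes. */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to tmp and fsync the result.  Returns false on any error. */
static bool
copy_and_sync(const char *src, const char *tmp)
{
	char		buf[8192];
	ssize_t		nread = 0;
	bool		ok = true;
	int			in;
	int			out;

	in = open(src, O_RDONLY);
	if (in < 0)
		return false;
	out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (out < 0)
	{
		close(in);
		return false;
	}

	/* Stop at end of file, on a read error, or on a short write. */
	while (ok && (nread = read(in, buf, sizeof(buf))) > 0)
		ok = (write(out, buf, (size_t) nread) == nread);
	if (nread < 0)
		ok = false;

	if (ok && fsync(out) != 0)
		ok = false;
	close(in);
	if (close(out) != 0)
		ok = false;
	return ok;
}

/*
 * Hypothetical helper: archive src as dir/name without overwriting an
 * existing file.  link() fails with EEXIST if the destination appears
 * between the stat() check and the final move.
 */
static bool
archive_excl(const char *src, const char *dir, const char *name)
{
	char		dst[1024];
	char		tmp[1024];
	struct stat st;

	snprintf(dst, sizeof(dst), "%s/%s", dir, name);
	snprintf(tmp, sizeof(tmp), "%s/archtemp", dir);

	if (stat(dst, &st) == 0)
		return false;			/* already archived, so refuse */
	if (errno != ENOENT)
		return false;

	/* Remove a leftover temporary file from an earlier failed attempt. */
	if (unlink(tmp) != 0 && errno != ENOENT)
		return false;

	if (!copy_and_sync(src, tmp))
		return false;

	if (link(tmp, dst) != 0)
	{
		unlink(tmp);
		return false;
	}
	unlink(tmp);
	return true;
}
```

A second call to archive_excl() with the same name returns false, which matches the patch's choice to fail rather than overwrite when the archive file already exists.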

v13-0004-Add-documentation-for-archive-modules.patch (application/octet-stream)
From b177c620cbe7ff0ddfa0471f7be433e14ea1e4e5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v13 4/4] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 123 ++++++++++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 +++++++++++++++---------
 doc/src/sgml/config.sgml            |  37 ++++++++---
 doc/src/sgml/filelist.sgml          |   1 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 9 files changed, 215 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d52aaaf1f5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,123 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via a shell command, <literal>FATAL</literal> is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..d8b5152930 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies to archive via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 5de80f8c64..a6b6ba91fb 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -488,11 +488,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.16.6

#26Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#25)
Re: archive modules

On Thu, Jan 13, 2022 at 2:38 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Here is another rebase for cfbot.

I've committed 0001 now. I don't see anything particularly wrong with
the rest of this either, but here are a few comments:

- I wonder whether it might be better to promote the basic archiving
module to contrib (as compared with src/test/modules) and try to
harden it to the extent that such hardening is required. I think a lot
of people would get good use out of that. It might not be a completely
baked solution, but a solution doesn't have to be completely baked to
be a massive improvement over the stupidity endorsed by our current
documentation.

- I wonder whether it's a good idea to silently succeed if the file
exists and has the same contents as the file we're trying to archive.
ISTR that being necessary behavior for robustness, because what if we
archive the file and then die before recording the fact that we
archived it?

- If we load a new archive library, should we give the old one a
callback so it can shut down? And should the archiver consider
exiting since we can't unload? It isn't necessary but it might be
nicer.

- I believe we decided some time back to invoke function pointers
(*like)(this) rather than like(this) for clarity. It was judged that
something->like(this) was fine because that can only be a function
pointer, so no need to write (*(something->like))(this), but
like(this) could make you think that "like" is a plain function rather
than a function pointer.
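
To illustrate the calling convention in the last point, here is a minimal standalone sketch. The names like_fn, Something, add_one, call_bare_pointer and call_member_pointer are hypothetical and exist only for this example; nothing here comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type and struct, used only for illustration. */
typedef int (*like_fn) (int value);

typedef struct Something
{
	like_fn		like;
} Something;

static int
add_one(int value)
{
	return value + 1;
}

/*
 * A bare function-pointer variable is invoked with an explicit dereference,
 * so the reader can tell that "like" is not a plain function.
 */
static int
call_bare_pointer(like_fn like, int arg)
{
	assert(like != NULL);
	return (*like) (arg);		/* rather than like(arg) */
}

/*
 * A struct member reached through -> can only be a function pointer, so the
 * shorter form is unambiguous.
 */
static int
call_member_pointer(const Something *something, int arg)
{
	assert(something != NULL && something->like != NULL);
	return something->like(arg);
}
```

Both forms compile to the same call; the difference is purely about what the reader can infer at the call site.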

--
Robert Haas
EDB: http://www.enterprisedb.com

#27Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#26)
Re: archive modules

On Fri, Jan 28, 2022 at 02:06:50PM -0500, Robert Haas wrote:

I've committed 0001 now. I don't see anything particularly wrong with
the rest of this either, but here are a few comments:

Thanks!

- I wonder whether it might be better to promote the basic archiving
module to contrib (as compared with src/test/modules) and try to
harden it to the extent that such hardening is required. I think a lot
of people would get good use out of that. It might not be a completely
baked solution, but a solution doesn't have to be completely baked to
be a massive improvement over the stupidity endorsed by our current
documentation.

This has been suggested a few times in this thread, so I'll go ahead and
move it to contrib. I am clearly outnumbered! :)

I discussed the two main deficiencies I'm aware of with basic_archive
earlier [0]. The first one is the issue with "inconvenient" server crashes
(mentioned below). The second is that there is no handling for multiple
servers writing to the same location since the temporary file is always
named "archtemp." I thought about a few ways to pick a unique file name
(or at least one that is _probably_ unique), but that began adding a lot of
complexity for something I intended as a test module. I can spend some
more time on this if you think it's worth fixing for a contrib module.

- I wonder whether it's a good idea to silently succeed if the file
exists and has the same contents as the file we're trying to archive.
ISTR that being necessary behavior for robustness, because what if we
archive the file and then die before recording the fact that we
archived it?

Yes. The only reason I didn't proceed with this earlier is because the
logic became a sizable chunk of the module. I will add this in the next
revision.
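
As a rough idea of the succeed-if-same check discussed here, a standalone sketch in plain C follows. It is not the code from basic_archive; files_have_same_contents, decide_archive_action and the ArchiveDecision enum are hypothetical names, and the sketch assumes that a destination that cannot be opened is simply absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of checking the destination before archiving. */
typedef enum
{
	ARCHIVE_COPY,				/* destination absent: copy the file */
	ARCHIVE_ALREADY_DONE,		/* identical file exists: report success */
	ARCHIVE_CONFLICT			/* different file exists: report failure */
} ArchiveDecision;

/* Return true if both files can be opened and have identical contents. */
static bool
files_have_same_contents(const char *path_a, const char *path_b)
{
	FILE	   *fa;
	FILE	   *fb;
	bool		same = true;

	assert(path_a != NULL && path_b != NULL);

	fa = fopen(path_a, "rb");
	if (fa == NULL)
		return false;
	fb = fopen(path_b, "rb");
	if (fb == NULL)
	{
		fclose(fa);
		return false;
	}

	while (same)
	{
		char		buf_a[8192];
		char		buf_b[8192];
		size_t		na = fread(buf_a, 1, sizeof(buf_a), fa);
		size_t		nb = fread(buf_b, 1, sizeof(buf_b), fb);

		if (na != nb || memcmp(buf_a, buf_b, na) != 0)
			same = false;
		else if (na == 0)
			break;				/* both streams ended at the same point */
	}

	/* Treat a read error as "not the same" to stay on the safe side. */
	if (ferror(fa) || ferror(fb))
		same = false;

	fclose(fa);
	fclose(fb);
	return same;
}

/* Assumption: failing to open the destination means it does not exist. */
static ArchiveDecision
decide_archive_action(const char *source, const char *destination)
{
	FILE	   *existing = fopen(destination, "rb");

	if (existing == NULL)
		return ARCHIVE_COPY;
	fclose(existing);

	return files_have_same_contents(source, destination) ?
		ARCHIVE_ALREADY_DONE : ARCHIVE_CONFLICT;
}
```

The ARCHIVE_ALREADY_DONE case covers the crash scenario above: the file was archived, but the server died before recording that fact, so the retry should report success.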

- If we load a new archive library, should we give the old one a
callback so it can shut down? And should the archiver consider
exiting since we can't unload? It isn't necessary but it might be
nicer.

Good idea. I'll look into this.

- I believe we decided some time back to invoke function pointers
(*like)(this) rather than like(this) for clarity. It was judged that
something->like(this) was fine because that can only be a function
pointer, so no need to write (*(something->like))(this), but
like(this) could make you think that "like" is a plain function rather
than a function pointer.

Will fix.

[0]: /messages/by-id/A30D8D33-8944-4898-BCA8-C77C18258247@amazon.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#28Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#27)
Re: archive modules

On Fri, Jan 28, 2022 at 3:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I discussed the two main deficiencies I'm aware of with basic_archive
earlier [0]. The first one is the issue with "inconvenient" server crashes
(mentioned below).

Seems easy enough to rectify, if it's just a matter of silently-succeed-if-same.

The second is that there is no handling for multiple
servers writing to the same location since the temporary file is always
named "archtemp." I thought about a few ways to pick a unique file name
(or at least one that is _probably_ unique), but that began adding a lot of
complexity for something I intended as a test module. I can spend some
more time on this if you think it's worth fixing for a contrib module.

Well, my first thought was to wonder whether we even care about that
scenario, but I guess we probably do, at least a little bit.

How about:

1. Name temporary files like
archive_temp.${FINAL_NAME}.${PID}.${SOME_RANDOM_NUMBER}. Create them
with O_EXCL. If it fails, die.

2. Try not to leave files like this behind, perhaps installing an
on_proc_exit callback or similar, but accept that crashes and unlink()
failures will make it inevitable in some cases.

3. Document that crashes or other strange failure cases may leave
archive_temp.* files behind in the archive directory, and recommend
that users remove them before restarting the database after a crash
(or, with caution, removing them while the database is running if the
user is sure that the files are old and unrelated to any archiving
still in progress).

I'm not arguing that this is exactly the right idea. But I am arguing
that it shouldn't take a ton of engineering to come up with something
reasonable here.
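
For step 1, something along these lines might do, shown as a standalone POSIX sketch rather than backend code. build_temp_path and create_exclusive are hypothetical helper names, and rand() is only a stand-in for whatever random source would really be used.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Build "<archive_dir>/archive_temp.<final_name>.<pid>.<random>" into buf.
 * Returns 0 on success, or -1 if the name does not fit in the buffer.
 */
static int
build_temp_path(char *buf, size_t buflen,
				const char *archive_dir, const char *final_name)
{
	int			written;

	assert(buf != NULL && archive_dir != NULL && final_name != NULL);

	/* rand() is a placeholder for SOME_RANDOM_NUMBER in this sketch. */
	written = snprintf(buf, buflen, "%s/archive_temp.%s.%d.%d",
					   archive_dir, final_name, (int) getpid(), rand());

	return (written < 0 || (size_t) written >= buflen) ? -1 : 0;
}

/*
 * Create the file exclusively.  With O_CREAT | O_EXCL, open() fails with
 * errno set to EEXIST instead of reusing a file that is already there.
 */
static int
create_exclusive(const char *path)
{
	assert(path != NULL);
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

A caller would treat a -1 from create_exclusive() as fatal, per "If it fails, die", and would unlink() the temporary file on the way out where possible, per step 2.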

--
Robert Haas
EDB: http://www.enterprisedb.com

#29Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#28)
Re: archive modules

On Fri, Jan 28, 2022 at 03:20:41PM -0500, Robert Haas wrote:

On Fri, Jan 28, 2022 at 3:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I discussed the two main deficiencies I'm aware of with basic_archive
earlier [0]. The first one is the issue with "inconvenient" server crashes
(mentioned below).

Seems easy enough to rectify, if it's just a matter of silently-succeed-if-same.

Yes.

The second is that there is no handling for multiple
servers writing to the same location since the temporary file is always
named "archtemp." I thought about a few ways to pick a unique file name
(or at least one that is _probably_ unique), but that began adding a lot of
complexity for something I intended as a test module. I can spend some
more time on this if you think it's worth fixing for a contrib module.

Well, my first thought was to wonder whether we even care about that
scenario, but I guess we probably do, at least a little bit.

How about:

1. Name temporary files like
archive_temp.${FINAL_NAME}.${PID}.${SOME_RANDOM_NUMBER}. Create them
with O_EXCL. If it fails, die.

2. Try not to leave files like this behind, perhaps installing an
on_proc_exit callback or similar, but accept that crashes and unlink()
failures will make it inevitable in some cases.

3. Document that crashes or other strange failure cases may leave
archive_temp.* files behind in the archive directory, and recommend
that users remove them before restarting the database after a crash
(or, with caution, removing them while the database is running if the
user is sure that the files are old and unrelated to any archiving
still in progress).

I'm not arguing that this is exactly the right idea. But I am arguing
that it shouldn't take a ton of engineering to come up with something
reasonable here.

This is roughly what I had in mind. I'll give it a try.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#30Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#29)
3 attachment(s)
Re: archive modules

Here is a new revision. I've moved basic_archive to contrib, hardened it
as suggested, and added shutdown support for archive modules.
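
For anyone skimming rather than reading the patches, the callback contract can be sketched as a standalone C program fragment. The three callback typedefs and the struct mirror the declarations in the patch's pgarch.h; the ArchiveModuleInit signature is inferred from how LoadArchiveLibrary() calls it. Everything prefixed demo_, and load_callbacks itself, is hypothetical. A real module would export the _PG_archive_module_init symbol instead of demo_archive_module_init and would be built against the server headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Callback types, mirroring the declarations in the patch's pgarch.h. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);
typedef void (*ArchiveShutdownCB) (void);

typedef struct ArchiveModuleCallbacks
{
	ArchiveCheckConfiguredCB check_configured_cb;
	ArchiveFileCB archive_file_cb;
	ArchiveShutdownCB shutdown_cb;
} ArchiveModuleCallbacks;

/* Inferred from the call site: (*archive_init) (&ArchiveContext). */
typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/* Hypothetical module state, only here so the callbacks are observable. */
static int	demo_files_archived = 0;
static bool demo_shut_down = false;

static bool
demo_check_configured(void)
{
	return true;
}

static bool
demo_archive_file(const char *file, const char *path)
{
	assert(file != NULL && path != NULL);
	demo_files_archived++;		/* a real module would copy "path" somewhere */
	return true;
}

static void
demo_shutdown(void)
{
	demo_shut_down = true;
}

/* Stand-in for the module's _PG_archive_module_init. */
static void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
	cb->check_configured_cb = demo_check_configured;
	cb->archive_file_cb = demo_archive_file;
	cb->shutdown_cb = demo_shutdown;
}

/*
 * Simplified imitation of the checks in LoadArchiveLibrary(): zero the
 * struct, call the init function, and require an archive callback.
 */
static bool
load_callbacks(ArchiveModuleInit archive_init, ArchiveModuleCallbacks *cb)
{
	memset(cb, 0, sizeof(*cb));
	if (archive_init == NULL)
		return false;
	(*archive_init) (cb);
	return cb->archive_file_cb != NULL;
}
```

Only archive_file_cb is mandatory; the archiver checks check_configured_cb and shutdown_cb for NULL before calling them.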

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v14-0001-Introduce-archive-modules-infrastructure.patch (text/x-diff; charset=us-ascii)
From f62fea53b93ba7181dfe084b4100eba59eb82aaa Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v14 1/3] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/pgarch.c               | 93 +++++++++++++++++--
 src/backend/postmaster/shell_archive.c        | 24 ++++-
 src/backend/utils/init/miscinit.c             |  1 +
 src/backend/utils/misc/guc.c                  | 12 ++-
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 -
 src/include/postmaster/pgarch.h               | 52 ++++++++++-
 8 files changed, 172 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..958220c495 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8831,7 +8831,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..d4a7ca97ca 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks ArchiveContext;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -236,6 +241,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Create workspace for pgarch_readyXlog() */
 	arch_files = palloc(sizeof(struct arch_files_state));
 	arch_files->arch_files_size = 0;
@@ -407,11 +417,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (ArchiveContext.check_configured_cb != NULL &&
+				!ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 +503,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -509,7 +520,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -759,13 +770,79 @@ HandlePgArchInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Perform logging of memory contexts of this process */
+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
+
 	if (ConfigReloadPending)
 	{
+		char	   *archiveLib = pstrdup(XLogArchiveLibrary);
+		bool		archiveLibChanged;
+
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
+		pfree(archiveLib);
+
+		if (archiveLibChanged)
+		{
+			/*
+			 * Call the currently loaded archive module's shutdown callback, if
+			 * one is defined.
+			 */
+			if (ArchiveContext.shutdown_cb != NULL)
+				ArchiveContext.shutdown_cb();
+
+			/*
+			 * Ideally, we would simply unload the previous archive module and
+			 * load the new one, but there is presently no mechanism for
+			 * unloading a library (see the comment above
+			 * internal_unload_library()).  To deal with this, we simply restart
+			 * the archiver.  The new archive module will be loaded when the new
+			 * archiver process starts up.
+			 */
+			ereport(LOG,
+					(errmsg("restarting archiver process because value of "
+							"\"archive_library\" was changed")));
+
+			proc_exit(0);
+		}
 	}
+}
 
-	/* Perform logging of memory contexts of this process */
-	if (LogMemoryContextPending)
-		ProcessLogMemoryContextInterrupt();
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	(*archive_init) (&ArchiveContext);
+
+	if (ArchiveContext.archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
 }
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f2570d626..0868e5a24f 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4c94f09c64..86b223821f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3881,13 +3881,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 817d5f5324..b4376d76aa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index bb0c52686a..85114b2e5f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 991a6d0616..732b12c0ba 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,55 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Callback called to shut down an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+	ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.25.1
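
Editor's sketch of the callback contract declared in the pgarch.h hunk above. The typedefs are repeated from that hunk so the file compiles on its own, without PostgreSQL headers. Every name starting with sketch_ is hypothetical and exists only for illustration. In particular, sketch_archive_one() is not the archiver loop from this patch (that code is outside this excerpt). It only assumes what the documentation patch later in the series states: archive_file_cb is required, and a missing check_configured_cb means the module is treated as configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Types repeated from the pgarch.h hunk so the sketch compiles alone. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);
typedef void (*ArchiveShutdownCB) (void);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
    ArchiveShutdownCB shutdown_cb;
} ArchiveModuleCallbacks;

typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/*
 * Same test as the ShellArchivingEnabled() macro, written as a function
 * that takes the archive_library value as an argument.
 */
bool
sketch_shell_archiving_enabled(const char *archive_library)
{
    return archive_library[0] == '\0' ||
        strcmp(archive_library, "shell") == 0;
}

/* Hypothetical module state: counts files handed to the module. */
int sketch_files_archived = 0;

/* Hypothetical archive callback: pretends every file was archived. */
static bool
sketch_archive_cb(const char *file, const char *path)
{
    (void) file;
    (void) path;
    sketch_files_archived++;
    return true;
}

/* Hypothetical init function: sets only the required callback. */
void
sketch_module_init(ArchiveModuleCallbacks *cb)
{
    cb->archive_file_cb = sketch_archive_cb;
}

/*
 * Hypothetical caller: load the callbacks, then archive one file.
 * Returns false if the module is unusable, unconfigured, or fails.
 */
bool
sketch_archive_one(ArchiveModuleInit init, const char *file, const char *path)
{
    ArchiveModuleCallbacks cb;

    assert(init != NULL);

    /* Zero the struct so optional callbacks left unset stay NULL. */
    memset(&cb, 0, sizeof(cb));
    init(&cb);

    if (cb.archive_file_cb == NULL)
        return false;           /* required callback is missing */
    if (cb.check_configured_cb != NULL && !cb.check_configured_cb())
        return false;           /* module says it is not ready yet */

    return cb.archive_file_cb(file, path);
}
```

Zeroing the struct before calling the init function is a choice made for this sketch: it lets a module leave the optional callbacks unset, and the caller can then test each pointer against NULL.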

v14-0002-Add-test-archive-module.patch (text/x-diff; charset=us-ascii)
From cd4844b313f006111a94186b5039a48f5960dac1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v14 2/3] Add test archive module.

---
 contrib/Makefile                              |   1 +
 contrib/basic_archive/.gitignore              |   4 +
 contrib/basic_archive/Makefile                |  20 ++
 contrib/basic_archive/basic_archive.c         | 287 ++++++++++++++++++
 contrib/basic_archive/basic_archive.conf      |   3 +
 .../basic_archive/expected/basic_archive.out  |  29 ++
 contrib/basic_archive/sql/basic_archive.sql   |  22 ++
 7 files changed, 366 insertions(+)
 create mode 100644 contrib/basic_archive/.gitignore
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c
 create mode 100644 contrib/basic_archive/basic_archive.conf
 create mode 100644 contrib/basic_archive/expected/basic_archive.out
 create mode 100644 contrib/basic_archive/sql/basic_archive.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..e3e221308b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/.gitignore b/contrib/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..14d036e1c4
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..ac6d75f51c
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,287 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static bool compare_files(const char *file1, const char *file2);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char		destination[MAXPGPATH];
+	char		temp[MAXPGPATH + 64];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+
+	/*
+	 * First, check if the file has already been archived.  If it already exists
+	 * and has the same contents as the file we're trying to archive, we can
+	 * return success (after ensuring the file is persisted to disk). This
+	 * scenario is possible if the server crashed after archiving the file but
+	 * before renaming its .ready file to .done.
+	 *
+	 * If the archive file already exists but has different contents, something
+	 * might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (compare_files(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with identical contents",
+							destination)));
+
+			fsync_fname(destination, false);
+			fsync_fname(archive_directory, true);
+
+			return true;
+		}
+
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Pick a sufficiently random name for the temporary file so that a
+	 * collision is unlikely.  This helps avoid problems in case a temporary
+	 * file was left around after a crash or another server happens to be
+	 * archiving to the same directory.
+	 */
+	snprintf(temp, sizeof(temp), "%s/%s.%s.%d.%d", archive_directory,
+			 "archtemp", file, MyProcPid, (int) (random() & 0x7fff));
+
+	/*
+	 * Copy the file to its temporary destination.  Note that this will fail if
+	 * temp already exists.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 * This will fail if destination already exists.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+/*
+ * compare_files
+ *
+ * Returns whether the contents of the files are the same.
+ */
+static bool
+compare_files(const char *file1, const char *file2)
+{
+#define CMP_BUF_SIZE (4096)
+	char		buf1[CMP_BUF_SIZE];
+	char		buf2[CMP_BUF_SIZE];
+	int			fd1;
+	int			fd2;
+	bool		ret = true;
+
+	fd1 = OpenTransientFile(file1, O_RDONLY | PG_BINARY);
+	if (fd1 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file1)));
+
+	fd2 = OpenTransientFile(file2, O_RDONLY | PG_BINARY);
+	if (fd2 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file2)));
+
+	for (;;)
+	{
+		int		nbytes = 0;
+		int		buf1_len = 0;
+		int		buf2_len = 0;
+
+		while (buf1_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd1, buf1 + buf1_len, CMP_BUF_SIZE - buf1_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file1)));
+			else if (nbytes == 0)
+				break;
+
+			buf1_len += nbytes;
+		}
+
+		while (buf2_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd2, buf2 + buf2_len, CMP_BUF_SIZE - buf2_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file2)));
+			else if (nbytes == 0)
+				break;
+
+			buf2_len += nbytes;
+		}
+
+		if (buf1_len != buf2_len || memcmp(buf1, buf2, buf1_len) != 0)
+		{
+			ret = false;
+			break;
+		}
+		else if (buf1_len == 0)
+			break;
+	}
+
+	if (CloseTransientFile(fd1) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file1)));
+
+	if (CloseTransientFile(fd2) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file2)));
+
+	return ret;
+}
diff --git a/contrib/basic_archive/basic_archive.conf b/contrib/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/contrib/basic_archive/expected/basic_archive.out b/contrib/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/contrib/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/contrib/basic_archive/sql/basic_archive.sql b/contrib/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/contrib/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.25.1
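
Editor's sketch of the copy-to-temporary-then-move pattern that basic_archive_file() in the patch above relies on, written against plain POSIX calls so it can be tried outside the server. All sketch_ names are hypothetical. Several simplifications are assumptions of this sketch, not behavior of the patch: an existing destination is always refused (the patch first compares contents and accepts an identical file), the temporary name has no random suffix, errors are returned as false instead of raised with ereport(), and link(2) is used to get fail-if-exists semantics for the final move. Whether durable_rename_excl() works that way is not shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fsync a file or directory given its path. */
static bool
sketch_fsync_path(const char *path)
{
    int         fd = open(path, O_RDONLY);
    bool        ok;

    if (fd < 0)
        return false;
    ok = (fsync(fd) == 0);
    return (close(fd) == 0) && ok;
}

/* Hypothetical helper: copy src to tmp, failing if tmp already exists. */
static bool
sketch_copy_excl(const char *src, const char *tmp)
{
    char        buf[8192];
    ssize_t     nread;
    bool        ok = true;
    int         in = open(src, O_RDONLY);
    int         out;

    if (in < 0)
        return false;
    out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (out < 0)
    {
        close(in);
        return false;
    }

    while (ok && (nread = read(in, buf, sizeof(buf))) > 0)
    {
        ssize_t     off = 0;

        /* write() may be partial, so loop until the chunk is written. */
        while (off < nread)
        {
            ssize_t     nwritten = write(out, buf + off, nread - off);

            if (nwritten < 0)
            {
                ok = false;
                break;
            }
            off += nwritten;
        }
    }
    if (ok && nread < 0)
        ok = false;

    /* Flush the temporary file before it becomes visible as the archive. */
    if (ok && fsync(out) != 0)
        ok = false;
    close(in);
    if (close(out) != 0)
        ok = false;
    if (!ok)
        unlink(tmp);
    return ok;
}

/* Archive src as dir/name without ever overwriting an existing file. */
bool
sketch_archive_file(const char *src, const char *dir, const char *name)
{
    char        dst[1024];
    char        tmp[1100];
    struct stat st;

    assert(src != NULL && dir != NULL && name != NULL);

    if (snprintf(dst, sizeof(dst), "%s/%s", dir, name) >= (int) sizeof(dst))
        return false;           /* path too long */
    if (stat(dst, &st) == 0 || errno != ENOENT)
        return false;           /* already archived, or stat failed */

    if (snprintf(tmp, sizeof(tmp), "%s/archtemp.%s.%d", dir, name,
                 (int) getpid()) >= (int) sizeof(tmp))
        return false;
    if (!sketch_copy_excl(src, tmp))
        return false;

    /* link() fails with EEXIST if dst appeared since the stat() above. */
    if (link(tmp, dst) != 0)
    {
        unlink(tmp);
        return false;
    }
    unlink(tmp);

    /* Persist the new directory entry. */
    return sketch_fsync_path(dir);
}
```

Calling sketch_archive_file() a second time with the same name returns false, which matches the "refuse to overwrite" guidance in the documentation patch.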

v14-0003-Add-documentation-for-archive-modules.patch (text/x-diff; charset=us-ascii)
From 4c6f42839ebce326f91d577fb3d3160086ea1e24 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v14 3/3] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 141 ++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 ++++++++++------
 doc/src/sgml/basic-archive.sgml     |  81 ++++++++++++++++
 doc/src/sgml/config.sgml            |  37 ++++++--
 doc/src/sgml/contrib.sgml           |   1 +
 doc/src/sgml/filelist.sgml          |   2 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 11 files changed, 316 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 doc/src/sgml/basic-archive.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..722dde0d42
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,141 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>contrib/basic_archive</filename> module contains a working
+  example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+    ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Only the <function>archive_file_cb</function> callback is required.  The
+   others are optional.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.  If no
+    <function>check_configured_cb</function> is defined, the server always
+    assumes the module is configured.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-shutdown">
+   <title>Shutdown Callback</title>
+   <para>
+    The <function>shutdown_cb</function> callback is called when the value of
+    <xref linkend="guc-archive-library"/> changes (and before the new archive
+    library is loaded).  If no <function>shutdown_cb</function> is defined, no
+    special action is taken before loading the new archive library.
+
+<programlisting>
+typedef void (*ArchiveShutdownCB) (void);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command,
+    <literal>FATAL</literal> is emitted if the command is terminated by a signal
+    (other than <systemitem>SIGTERM</systemitem> that is used as part of a
+    server shutdown) or an error by the shell with an exit status greater than
+    125 (such as command not found).  In such cases, the failure is not reported
+    in <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/basic-archive.sgml b/doc/src/sgml/basic-archive.sgml
new file mode 100644
index 0000000000..0b650f17a8
--- /dev/null
+++ b/doc/src/sgml/basic-archive.sgml
@@ -0,0 +1,81 @@
+<!-- doc/src/sgml/basic-archive.sgml -->
+
+<sect1 id="basic-archive" xreflabel="basic_archive">
+ <title>basic_archive</title>
+
+ <indexterm zone="basic-archive">
+  <primary>basic_archive</primary>
+ </indexterm>
+
+ <para>
+  <filename>basic_archive</filename> is an example of an archive module.  This
+  module copies completed WAL segment files to the specified directory.  This
+  may not be especially useful, but it can serve as a starting point for
+  developing your own archive module.  For more information about archive
+  modules, see <xref linkend="archive-modules"/>.
+ </para>
+
+ <para>
+  In order to function, this module must be loaded via
+  <xref linkend="guc-archive-library"/>, and <xref linkend="guc-archive-mode"/>
+  must be enabled.
+ </para>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>basic_archive.archive_directory</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>basic_archive.archive_directory</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The directory where the server should copy WAL segment files.  This
+      directory must already exist.  The default is an empty string, which
+      effectively halts WAL archiving, but if <xref linkend="guc-archive-mode"/>
+      is enabled, the server will accumulate WAL segment files in the
+      expectation that a value will soon be provided.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   These parameters must be set in <filename>postgresql.conf</filename>.
+   Typical usage might be:
+  </para>
+
+<programlisting>
+# postgresql.conf
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '/path/to/archive/directory'
+</programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Notes</title>
+
+  <para>
+   Server crashes may leave temporary files with the prefix
+   <filename>archtemp</filename> in the archive directory.  It is recommended to
+   delete such files before restarting the server after a crash.  It is safe to
+   remove such files while the server is running as long as they are unrelated
+   to any archiving still in progress, but users should use extra caution when
+   doing so.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Author</title>
+
+  <para>
+   Nathan Bossart
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 692d8a2a17..1836e35ac4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..be9711c6f2 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -99,6 +99,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
  &amcheck;
  &auth-delay;
  &auto-explain;
+ &basic-archive;
  &bloom;
  &btree-gin;
  &btree-gist;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..328cd1f378 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
@@ -112,6 +113,7 @@
 <!ENTITY amcheck         SYSTEM "amcheck.sgml">
 <!ENTITY auth-delay      SYSTEM "auth-delay.sgml">
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
+<!ENTITY basic-archive   SYSTEM "basic-archive.sgml">
 <!ENTITY bloom           SYSTEM "bloom.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1546f10c0d..e7ae29ec3d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index b2e41ea814..b846213fb7 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -487,11 +487,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.25.1

#31 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#30)
3 attachment(s)
Re: archive modules

On Sat, Jan 29, 2022 at 12:50:18PM -0800, Nathan Bossart wrote:

> Here is a new revision. I've moved basic_archive to contrib, hardened it
> as suggested, and added shutdown support for archive modules.

cfbot was unhappy with v14, so here's another attempt. One other change I
am pondering is surrounding pgarch_MainLoop() with PG_TRY/PG_FINALLY so
that we can also call the shutdown callback in the event of an ERROR. This
might be necessary for an archive module that uses background workers.
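
For illustration, the PG_TRY/PG_FINALLY idea can be sketched outside the
server roughly as follows. This is a standalone approximation, not code from
the patch: setjmp/longjmp stand in for PostgreSQL's error machinery, the
DemoCallbacks struct and every demo_* name are hypothetical, and unlike a
real PG_FINALLY block the sketch reports the failure through its return
value instead of re-throwing the error.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's ArchiveModuleCallbacks. */
typedef struct DemoCallbacks
{
	bool		(*archive_file_cb) (const char *file);
	void		(*shutdown_cb) (void);
} DemoCallbacks;

static jmp_buf demo_error_jmp;	/* stand-in for the server's error stack */
static int	shutdown_calls = 0; /* counts shutdown callback invocations */

/* Stand-in for ereport(ERROR, ...): unwinds to the nearest handler. */
static void
demo_raise_error(void)
{
	longjmp(demo_error_jmp, 1);
}

/* Demo archive callback: an empty file name simulates an ERROR. */
static bool
demo_archive_file(const char *file)
{
	if (file[0] == '\0')
		demo_raise_error();
	return true;
}

/* Demo shutdown callback: only records that it ran. */
static void
demo_shutdown(void)
{
	shutdown_calls++;
}

/*
 * Stand-in for an archiver main loop wrapped in PG_TRY/PG_FINALLY.
 * Returns true on normal exit and false if an error unwound the loop.
 * The shutdown callback, if any, runs in both cases.
 */
static bool
demo_main_loop(const DemoCallbacks *cb, const char *const *files,
			   size_t nfiles)
{
	volatile bool ok = true;

	/* As in the patch, an archive callback is mandatory. */
	assert(cb != NULL && cb->archive_file_cb != NULL);

	if (setjmp(demo_error_jmp) == 0)
	{
		/* "try" part: archive each file in turn */
		for (size_t i = 0; i < nfiles; i++)
			(void) cb->archive_file_cb(files[i]);
	}
	else
		ok = false;				/* reached via longjmp from an "ERROR" */

	/* "finally" part: always give the module a chance to clean up */
	if (cb->shutdown_cb != NULL)
		cb->shutdown_cb();

	return ok;
}
```

Calling demo_main_loop() with a list containing an empty file name takes the
error path, and shutdown_calls still increases by one, which is the behavior
such a wrapper would be meant to guarantee.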

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v15-0001-Introduce-archive-modules-infrastructure.patch (text/x-diff; charset=us-ascii)
From f62fea53b93ba7181dfe084b4100eba59eb82aaa Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v15 1/3] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/pgarch.c               | 93 +++++++++++++++++--
 src/backend/postmaster/shell_archive.c        | 24 ++++-
 src/backend/utils/init/miscinit.c             |  1 +
 src/backend/utils/misc/guc.c                  | 12 ++-
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 -
 src/include/postmaster/pgarch.h               | 52 ++++++++++-
 8 files changed, 172 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..958220c495 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8831,7 +8831,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..d4a7ca97ca 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks ArchiveContext;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,7 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -236,6 +241,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	/* Create workspace for pgarch_readyXlog() */
 	arch_files = palloc(sizeof(struct arch_files_state));
 	arch_files->arch_files_size = 0;
@@ -407,11 +417,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (ArchiveContext.check_configured_cb != NULL &&
+				!ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 +503,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -509,7 +520,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -759,13 +770,79 @@ HandlePgArchInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Perform logging of memory contexts of this process */
+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
+
 	if (ConfigReloadPending)
 	{
+		char	   *archiveLib = pstrdup(XLogArchiveLibrary);
+		bool		archiveLibChanged;
+
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
+		pfree(archiveLib);
+
+		if (archiveLibChanged)
+		{
+			/*
+			 * Call the currently loaded archive module's shutdown callback, if
+			 * one is defined.
+			 */
+			if (ArchiveContext.shutdown_cb != NULL)
+				ArchiveContext.shutdown_cb();
+
+			/*
+			 * Ideally, we would simply unload the previous archive module and
+			 * load the new one, but there is presently no mechanism for
+			 * unloading a library (see the comment above
+			 * internal_unload_library()).  To deal with this, we simply restart
+			 * the archiver.  The new archive module will be loaded when the new
+			 * archiver process starts up.
+			 */
+			ereport(LOG,
+					(errmsg("restarting archiver process because value of "
+							"\"archive_library\" was changed")));
+
+			proc_exit(0);
+		}
 	}
+}
 
-	/* Perform logging of memory contexts of this process */
-	if (LogMemoryContextPending)
-		ProcessLogMemoryContextInterrupt();
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	(*archive_init) (&ArchiveContext);
+
+	if (ArchiveContext.archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
 }
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f2570d626..0868e5a24f 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4c94f09c64..86b223821f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3881,13 +3881,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 817d5f5324..b4376d76aa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index bb0c52686a..85114b2e5f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 991a6d0616..732b12c0ba 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,55 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Callback called to shut down an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+	ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.25.1

v15-0002-Add-test-archive-module.patch (text/x-diff; charset=us-ascii)
From 7bfc811c475e16f449ed29ee395d84d77166f047 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v15 2/3] Add test archive module.

---
 contrib/Makefile                              |   1 +
 contrib/basic_archive/.gitignore              |   4 +
 contrib/basic_archive/Makefile                |  20 ++
 contrib/basic_archive/basic_archive.c         | 288 ++++++++++++++++++
 contrib/basic_archive/basic_archive.conf      |   3 +
 .../basic_archive/expected/basic_archive.out  |  29 ++
 contrib/basic_archive/sql/basic_archive.sql   |  22 ++
 7 files changed, 367 insertions(+)
 create mode 100644 contrib/basic_archive/.gitignore
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c
 create mode 100644 contrib/basic_archive/basic_archive.conf
 create mode 100644 contrib/basic_archive/expected/basic_archive.out
 create mode 100644 contrib/basic_archive/sql/basic_archive.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..e3e221308b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/.gitignore b/contrib/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..14d036e1c4
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..faba349fd8
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static bool compare_files(const char *file1, const char *file2);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char		destination[MAXPGPATH];
+	char		temp[MAXPGPATH + 64];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+
+	/*
+	 * First, check if the file has already been archived.  If it already exists
+	 * and has the same contents as the file we're trying to archive, we can
+	 * return success (after ensuring the file is persisted to disk). This
+	 * scenario is possible if the server crashed after archiving the file but
+	 * before renaming its .ready file to .done.
+	 *
+	 * If the archive file already exists but has different contents, something
+	 * might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (compare_files(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with identical contents",
+							destination)));
+
+			fsync_fname(destination, false);
+			fsync_fname(archive_directory, true);
+
+			return true;
+		}
+
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Pick a sufficiently random name for the temporary file so that a
+	 * collision is unlikely.  This helps avoid problems in case a temporary
+	 * file was left around after a crash or another server happens to be
+	 * archiving to the same directory.
+	 */
+	snprintf(temp, sizeof(temp), "%s/%s.%s.%d.%d", archive_directory,
+			 "archtemp", file, MyProcPid, (int) (random() & 0x7fff));
+
+	/*
+	 * Copy the file to its temporary destination.  Note that this will fail if
+	 * temp already exists.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 * This will fail if destination already exists.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+/*
+ * compare_files
+ *
+ * Returns whether the contents of the files are the same.
+ */
+static bool
+compare_files(const char *file1, const char *file2)
+{
+#define CMP_BUF_SIZE (4096)
+	char		buf1[CMP_BUF_SIZE];
+	char		buf2[CMP_BUF_SIZE];
+	int			fd1;
+	int			fd2;
+	bool		ret = true;
+
+	fd1 = OpenTransientFile(file1, O_RDONLY | PG_BINARY);
+	if (fd1 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file1)));
+
+	fd2 = OpenTransientFile(file2, O_RDONLY | PG_BINARY);
+	if (fd2 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file2)));
+
+	for (;;)
+	{
+		int		nbytes = 0;
+		int		buf1_len = 0;
+		int		buf2_len = 0;
+
+		while (buf1_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd1, buf1 + buf1_len, CMP_BUF_SIZE - buf1_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file1)));
+			else if (nbytes == 0)
+				break;
+
+			buf1_len += nbytes;
+		}
+
+		while (buf2_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd2, buf2 + buf2_len, CMP_BUF_SIZE - buf2_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file2)));
+			else if (nbytes == 0)
+				break;
+
+			buf2_len += nbytes;
+		}
+
+		if (buf1_len != buf2_len || memcmp(buf1, buf2, buf1_len) != 0)
+		{
+			ret = false;
+			break;
+		}
+		else if (buf1_len == 0)
+			break;
+	}
+
+	if (CloseTransientFile(fd1) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file1)));
+
+	if (CloseTransientFile(fd2) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file2)));
+
+	return ret;
+}
diff --git a/contrib/basic_archive/basic_archive.conf b/contrib/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/contrib/basic_archive/expected/basic_archive.out b/contrib/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/contrib/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/contrib/basic_archive/sql/basic_archive.sql b/contrib/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/contrib/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.25.1
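
An aside on the comparison step in compare_files() above: the same chunked
technique can be tried outside the server with plain POSIX calls.  The
following is a minimal sketch, not the patch's code.  read_full() and
files_identical() are hypothetical names, errors are reported through return
values instead of ereport(), and interrupted reads (EINTR) are not retried.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CMP_BUF_SIZE 4096

/*
 * Read until the buffer is full or EOF is reached.  Returns the number of
 * bytes read, or -1 on error.  Filling the buffer completely matters because
 * read() may return short counts, and the two files must be compared at the
 * same offsets.
 */
static ssize_t
read_full(int fd, char *buf, size_t len)
{
	size_t		total = 0;

	while (total < len)
	{
		ssize_t		n = read(fd, buf + total, len - total);

		if (n < 0)
			return -1;
		if (n == 0)
			break;				/* EOF */
		total += (size_t) n;
	}

	return (ssize_t) total;
}

/*
 * Returns 1 if both files have identical contents, 0 if they differ, and -1
 * if either file could not be opened or read.
 */
static int
files_identical(const char *file1, const char *file2)
{
	char		buf1[CMP_BUF_SIZE];
	char		buf2[CMP_BUF_SIZE];
	int			result = 1;
	int			fd1;
	int			fd2;

	fd1 = open(file1, O_RDONLY);
	if (fd1 < 0)
		return -1;

	fd2 = open(file2, O_RDONLY);
	if (fd2 < 0)
	{
		close(fd1);
		return -1;
	}

	for (;;)
	{
		ssize_t		len1 = read_full(fd1, buf1, sizeof(buf1));
		ssize_t		len2 = read_full(fd2, buf2, sizeof(buf2));

		if (len1 < 0 || len2 < 0)
		{
			result = -1;
			break;
		}
		if (len1 != len2 || memcmp(buf1, buf2, (size_t) len1) != 0)
		{
			result = 0;			/* different length or contents */
			break;
		}
		if (len1 == 0)
			break;				/* both files hit EOF together */
	}

	close(fd1);
	close(fd2);

	return result;
}
```

In terms of basic_archive_file(), a result of 1 would correspond to the
"already exists with identical contents" case, and 0 to the WARNING case.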

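The copy-to-temporary-then-move step in basic_archive_file() can be sketched
in the same standalone way.  This is an illustration under assumptions, not
the patch's implementation: copy_file_excl() and archive_file_excl() are
hypothetical names, link() is used here as one way to get "fail if the
destination exists" semantics and is not claimed to match
durable_rename_excl(), the temporary name uses only the PID, and the archive
directory itself is not fsync'd.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PATH_BUF 1024

/*
 * Copy src to dst, creating dst exclusively and syncing it before returning.
 * Returns 0 on success, -1 on failure.  A partially written dst is removed.
 */
static int
copy_file_excl(const char *src, const char *dst)
{
	char		buf[8192];
	ssize_t		nread = 0;
	int			ok = 1;
	int			in;
	int			out;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;

	/* O_EXCL makes this fail if a file named dst already exists */
	out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (out < 0)
	{
		close(in);
		return -1;
	}

	while (ok && (nread = read(in, buf, sizeof(buf))) > 0)
	{
		ssize_t		off = 0;

		while (off < nread)
		{
			ssize_t		nwritten = write(out, buf + off, (size_t) (nread - off));

			if (nwritten < 0)
			{
				ok = 0;
				break;
			}
			off += nwritten;
		}
	}

	if (nread < 0 || fsync(out) != 0)
		ok = 0;

	close(in);
	if (close(out) != 0)
		ok = 0;

	if (!ok)
	{
		unlink(dst);			/* we created it, so we may remove it */
		return -1;
	}

	return 0;
}

/*
 * Archive src as dir/name without ever exposing a partial file under the
 * final name.  Returns 0 on success, -1 on failure (including the case where
 * dir/name already exists).
 */
static int
archive_file_excl(const char *src, const char *dir, const char *name)
{
	char		dest[PATH_BUF];
	char		temp[PATH_BUF];
	int			n;
	int			rc;

	n = snprintf(dest, sizeof(dest), "%s/%s", dir, name);
	if (n < 0 || (size_t) n >= sizeof(dest))
		return -1;

	n = snprintf(temp, sizeof(temp), "%s/archtemp.%s.%d", dir, name,
				 (int) getpid());
	if (n < 0 || (size_t) n >= sizeof(temp))
		return -1;

	if (copy_file_excl(src, temp) != 0)
		return -1;

	/* link() fails with EEXIST if dest exists, so nothing is overwritten */
	rc = link(temp, dest);
	unlink(temp);

	return (rc == 0) ? 0 : -1;
}
```

Calling archive_file_excl() a second time for the same file name fails rather
than overwriting, which is the behavior the backup documentation recommends
for archive commands.
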
v15-0003-Add-documentation-for-archive-modules.patch (text/x-diff; charset=us-ascii)
From 45b00ef07e12385708da45da1a10e087d1463fa9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v15 3/3] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 141 ++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 ++++++++++------
 doc/src/sgml/basic-archive.sgml     |  81 ++++++++++++++++
 doc/src/sgml/config.sgml            |  37 ++++++--
 doc/src/sgml/contrib.sgml           |   1 +
 doc/src/sgml/filelist.sgml          |   2 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 11 files changed, 316 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 doc/src/sgml/basic-archive.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..722dde0d42
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,141 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>contrib/basic_archive</filename> module contains a working
+  example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+    ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Only the <function>archive_file_cb</function> callback is required.  The
+   others are optional.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.  If no
+    <function>check_configured_cb</function> is defined, the server always
+    assumes the module is configured.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-shutdown">
+   <title>Shutdown Callback</title>
+   <para>
+    The <function>shutdown_cb</function> callback is called when the value of
+    <xref linkend="guc-archive-library"/> changes (and before the new archive
+    library is loaded).  If no <function>shutdown_cb</function> is defined, no
+    special action is taken before loading the new archive library.
+
+<programlisting>
+typedef void (*ArchiveShutdownCB) (void);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, <literal>FATAL</literal> is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/basic-archive.sgml b/doc/src/sgml/basic-archive.sgml
new file mode 100644
index 0000000000..0b650f17a8
--- /dev/null
+++ b/doc/src/sgml/basic-archive.sgml
@@ -0,0 +1,81 @@
+<!-- doc/src/sgml/basic-archive.sgml -->
+
+<sect1 id="basic-archive" xreflabel="basic_archive">
+ <title>basic_archive</title>
+
+ <indexterm zone="basic-archive">
+  <primary>basic_archive</primary>
+ </indexterm>
+
+ <para>
+  <filename>basic_archive</filename> is an example of an archive module.  This
+  module copies completed WAL segment files to the specified directory.  This
+  may not be especially useful, but it can serve as a starting point for
+  developing your own archive module.  For more information about archive
+  modules, see <xref linkend="archive-modules"/>.
+ </para>
+
+ <para>
+  In order to function, this module must be loaded via
+  <xref linkend="guc-archive-library"/>, and <xref linkend="guc-archive-mode"/>
+  must be enabled.
+ </para>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>basic_archive.archive_directory</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>basic_archive.archive_directory</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The directory where the server should copy WAL segment files.  This
+      directory must already exist.  The default is an empty string, which
+      effectively halts WAL archiving, but if <xref linkend="guc-archive-mode"/>
+      is enabled, the server will accumulate WAL segment files in the
+      expectation that a value will soon be provided.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   These parameters must be set in <filename>postgresql.conf</filename>.
+   Typical usage might be:
+  </para>
+
+<programlisting>
+# postgresql.conf
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '/path/to/archive/directory'
+</programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Notes</title>
+
+  <para>
+   Server crashes may leave temporary files with the prefix
+   <filename>archtemp</filename> in the archive directory.  It is recommended to
+   delete such files before restarting the server after a crash.  It is safe to
+   remove such files while the server is running as long as they are unrelated
+   to any archiving still in progress, but users should use extra caution when
+   doing so.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Author</title>
+
+  <para>
+   Nathan Bossart
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 692d8a2a17..1836e35ac4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..be9711c6f2 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -99,6 +99,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
  &amcheck;
  &auth-delay;
  &auto-explain;
+ &basic-archive;
  &bloom;
  &btree-gin;
  &btree-gist;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..328cd1f378 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
@@ -112,6 +113,7 @@
 <!ENTITY amcheck         SYSTEM "amcheck.sgml">
 <!ENTITY auth-delay      SYSTEM "auth-delay.sgml">
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
+<!ENTITY basic-archive   SYSTEM "basic-archive.sgml">
 <!ENTITY bloom           SYSTEM "bloom.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1546f10c0d..e7ae29ec3d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index b2e41ea814..b846213fb7 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -487,11 +487,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.25.1

#32Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#31)
3 attachment(s)
Re: archive modules

On Sat, Jan 29, 2022 at 04:31:48PM -0800, Nathan Bossart wrote:
> On Sat, Jan 29, 2022 at 12:50:18PM -0800, Nathan Bossart wrote:
>> Here is a new revision. I've moved basic_archive to contrib, hardened it
>> as suggested, and added shutdown support for archive modules.
>
> cfbot was unhappy with v14, so here's another attempt. One other change I
> am pondering is surrounding pgarch_MainLoop() with PG_TRY/PG_FINALLY so
> that we can also call the shutdown callback in the event of an ERROR. This
> might be necessary for an archive module that uses background workers.

Ugh. Apologies for the noise. cfbot still isn't happy, so here's yet
another attempt. This new patch set also ensures the shutdown callback is
called when the archiver process exits.
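
For anyone skimming the thread, the callback contract in 0001 can be sketched
outside the server roughly as follows. This is a standalone mock, not code
from the patch set: the typedefs and the ArchiveModuleCallbacks struct mirror
the ones added to pgarch.h, but every demo_* name is hypothetical, and the
real archiver's error handling (PG_ENSURE_ERROR_CLEANUP, ereport) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* These mirror the callback types that 0001 adds to pgarch.h. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);
typedef void (*ArchiveShutdownCB) (void);

typedef struct ArchiveModuleCallbacks
{
	ArchiveCheckConfiguredCB check_configured_cb;	/* optional */
	ArchiveFileCB archive_file_cb;					/* required */
	ArchiveShutdownCB shutdown_cb;					/* optional */
} ArchiveModuleCallbacks;

/* Hypothetical module state, for illustration only. */
const char *demo_directory = "";
int			demo_files_archived = 0;
int			demo_shutdown_calls = 0;

/* Configured only once a destination directory has been set. */
bool
demo_configured(void)
{
	return demo_directory[0] != '\0';
}

/* A real module would copy "path" somewhere; this one only counts calls. */
bool
demo_archive_file(const char *file, const char *path)
{
	(void) file;
	(void) path;
	demo_files_archived++;
	return true;
}

void
demo_shutdown(void)
{
	demo_shutdown_calls++;
}

/* Stand-in for a module's _PG_archive_module_init(). */
void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
	assert(cb != NULL);
	cb->check_configured_cb = demo_configured;
	cb->archive_file_cb = demo_archive_file;
	cb->shutdown_cb = demo_shutdown;
}

/* Simplified stand-in for one iteration of the archiver's copy loop. */
bool
demo_archive_one(ArchiveModuleCallbacks *ctx, const char *file, const char *path)
{
	/* Skip archiving if the module reports that it is not configured. */
	if (ctx->check_configured_cb != NULL && !ctx->check_configured_cb())
		return false;

	return ctx->archive_file_cb(file, path);
}

/* Stand-in for call_archive_module_shutdown_callback(). */
void
demo_shutdown_archiver(ArchiveModuleCallbacks *ctx)
{
	if (ctx->shutdown_cb != NULL)
		ctx->shutdown_cb();
}
```

The point of the sketch is that only archive_file_cb is mandatory. The
archiver treats a NULL check_configured_cb as "always configured" and a NULL
shutdown_cb as "nothing to clean up", which is why both call sites test the
pointer before using it.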

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v16-0001-Introduce-archive-modules-infrastructure.patch (text/x-diff; charset=us-ascii)
From aab6a7cbb2ae7d0d181062f972d2e559bbd4cef6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v16 1/3] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/postmaster/pgarch.c               | 111 ++++++++++++++++--
 src/backend/postmaster/shell_archive.c        |  24 +++-
 src/backend/utils/init/miscinit.c             |   1 +
 src/backend/utils/misc/guc.c                  |  12 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h                     |   1 -
 src/include/postmaster/pgarch.h               |  52 +++++++-
 8 files changed, 189 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..958220c495 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8831,7 +8831,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..865f1930df 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks ArchiveContext;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,8 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
+static void call_archive_module_shutdown_callback(int code, Datum arg);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -244,7 +250,16 @@ PgArchiverMain(void)
 	arch_files->arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 												ready_file_comparator, NULL);
 
-	pgarch_MainLoop();
+	/* Load the archive_library. */
+	LoadArchiveLibrary();
+
+	PG_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+	{
+		pgarch_MainLoop();
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+
+	call_archive_module_shutdown_callback(0, 0);
 
 	proc_exit(0);
 }
@@ -407,11 +422,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (ArchiveContext.check_configured_cb != NULL &&
+				!ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 +508,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -509,7 +525,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -759,13 +775,90 @@ HandlePgArchInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Perform logging of memory contexts of this process */
+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
+
 	if (ConfigReloadPending)
 	{
+		char	   *archiveLib = pstrdup(XLogArchiveLibrary);
+		bool		archiveLibChanged;
+
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
+		pfree(archiveLib);
+
+		if (archiveLibChanged)
+		{
+			/*
+			 * Call the currently loaded archive module's shutdown callback, if
+			 * one is defined.
+			 */
+			call_archive_module_shutdown_callback(0, 0);
+
+			/*
+			 * Ideally, we would simply unload the previous archive module and
+			 * load the new one, but there is presently no mechanism for
+			 * unloading a library (see the comment above
+			 * internal_unload_library()).  To deal with this, we simply restart
+			 * the archiver.  The new archive module will be loaded when the new
+			 * archiver process starts up.
+			 */
+			ereport(LOG,
+					(errmsg("restarting archiver process because value of "
+							"\"archive_library\" was changed")));
+
+			proc_exit(0);
+		}
 	}
+}
 
-	/* Perform logging of memory contexts of this process */
-	if (LogMemoryContextPending)
-		ProcessLogMemoryContextInterrupt();
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	(*archive_init) (&ArchiveContext);
+
+	if (ArchiveContext.archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
+
+/*
+ * call_archive_module_shutdown_callback
+ *
+ * Calls the loaded archive module's shutdown callback, if one is defined.
+ */
+static void
+call_archive_module_shutdown_callback(int code, Datum arg)
+{
+	if (ArchiveContext.shutdown_cb != NULL)
+		ArchiveContext.shutdown_cb();
 }
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f2570d626..0868e5a24f 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4c94f09c64..86b223821f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3881,13 +3881,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" indicates that archiving via shell is enabled.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 817d5f5324..b4376d76aa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index bb0c52686a..85114b2e5f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -155,7 +155,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 991a6d0616..732b12c0ba 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,55 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Called to shut down an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+	ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.25.1

v16-0002-Add-test-archive-module.patch (text/x-diff; charset=us-ascii)
From 730e2ea2da57d58aaf76040512cf4eefe389db45 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v16 2/3] Add test archive module.

---
 contrib/Makefile                              |   1 +
 contrib/basic_archive/.gitignore              |   4 +
 contrib/basic_archive/Makefile                |  20 ++
 contrib/basic_archive/basic_archive.c         | 287 ++++++++++++++++++
 contrib/basic_archive/basic_archive.conf      |   3 +
 .../basic_archive/expected/basic_archive.out  |  29 ++
 contrib/basic_archive/sql/basic_archive.sql   |  22 ++
 7 files changed, 366 insertions(+)
 create mode 100644 contrib/basic_archive/.gitignore
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c
 create mode 100644 contrib/basic_archive/basic_archive.conf
 create mode 100644 contrib/basic_archive/expected/basic_archive.out
 create mode 100644 contrib/basic_archive/sql/basic_archive.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..e3e221308b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/.gitignore b/contrib/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..14d036e1c4
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..a91fc78814
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,287 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static bool compare_files(const char *file1, const char *file2);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char		destination[MAXPGPATH];
+	char		temp[MAXPGPATH + 64];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+
+	/*
+	 * First, check if the file has already been archived.  If it already exists
+	 * and has the same contents as the file we're trying to archive, we can
+	 * return success (after ensuring the file is persisted to disk). This
+	 * scenario is possible if the server crashed after archiving the file but
+	 * before renaming its .ready file to .done.
+	 *
+	 * If the archive file already exists but has different contents, something
+	 * might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (compare_files(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with identical contents",
+							destination)));
+
+			fsync_fname(destination, false);
+			fsync_fname(archive_directory, true);
+
+			return true;
+		}
+
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Pick a sufficiently unique name for the temporary file so that a
+	 * collision is unlikely.  This helps avoid problems in case a temporary
+	 * file was left around after a crash or another server happens to be
+	 * archiving to the same directory.
+	 */
+	snprintf(temp, sizeof(temp), "%s/%s.%s.%d", archive_directory,
+			 "archtemp", file, MyProcPid);
+
+	/*
+	 * Copy the file to its temporary destination.  Note that this will fail if
+	 * temp already exists.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 * This will fail if destination already exists.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+/*
+ * compare_files
+ *
+ * Returns whether the contents of the files are the same.
+ */
+static bool
+compare_files(const char *file1, const char *file2)
+{
+#define CMP_BUF_SIZE (4096)
+	char		buf1[CMP_BUF_SIZE];
+	char		buf2[CMP_BUF_SIZE];
+	int			fd1;
+	int			fd2;
+	bool		ret = true;
+
+	fd1 = OpenTransientFile(file1, O_RDONLY | PG_BINARY);
+	if (fd1 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file1)));
+
+	fd2 = OpenTransientFile(file2, O_RDONLY | PG_BINARY);
+	if (fd2 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file2)));
+
+	for (;;)
+	{
+		int		nbytes = 0;
+		int		buf1_len = 0;
+		int		buf2_len = 0;
+
+		while (buf1_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd1, buf1 + buf1_len, CMP_BUF_SIZE - buf1_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file1)));
+			else if (nbytes == 0)
+				break;
+
+			buf1_len += nbytes;
+		}
+
+		while (buf2_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd2, buf2 + buf2_len, CMP_BUF_SIZE - buf2_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file2)));
+			else if (nbytes == 0)
+				break;
+
+			buf2_len += nbytes;
+		}
+
+		if (buf1_len != buf2_len || memcmp(buf1, buf2, buf1_len) != 0)
+		{
+			ret = false;
+			break;
+		}
+		else if (buf1_len == 0)
+			break;
+	}
+
+	if (CloseTransientFile(fd1) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file1)));
+
+	if (CloseTransientFile(fd2) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file2)));
+
+	return ret;
+}
diff --git a/contrib/basic_archive/basic_archive.conf b/contrib/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/contrib/basic_archive/expected/basic_archive.out b/contrib/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/contrib/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/contrib/basic_archive/sql/basic_archive.sql b/contrib/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/contrib/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.25.1

v16-0003-Add-documentation-for-archive-modules.patch (text/x-diff; charset=us-ascii)
From abdf3ded37cede5718636fd2b4bc6953cfb1b530 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v16 3/3] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 142 ++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 ++++++++++------
 doc/src/sgml/basic-archive.sgml     |  81 ++++++++++++++++
 doc/src/sgml/config.sgml            |  37 ++++++--
 doc/src/sgml/contrib.sgml           |   1 +
 doc/src/sgml/filelist.sgml          |   2 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 11 files changed, 317 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 doc/src/sgml/basic-archive.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..c42cc5e423
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,142 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>contrib/basic_archive</filename> module contains a working
+  example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+    ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Only the <function>archive_file_cb</function> callback is required.  The
+   others are optional.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.  If no
+    <function>check_configured_cb</function> is defined, the server always
+    assumes the module is configured.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-shutdown">
+   <title>Shutdown Callback</title>
+   <para>
+    The <function>shutdown_cb</function> callback is called when the archiver
+    process exits (e.g., after an error) or the value of
+    <xref linkend="guc-archive-library"/> changes.  If no
+    <function>shutdown_cb</function> is defined, no special action is taken in
+    these situations.
+
+<programlisting>
+typedef void (*ArchiveShutdownCB) (void);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, <literal>FATAL</literal> is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/basic-archive.sgml b/doc/src/sgml/basic-archive.sgml
new file mode 100644
index 0000000000..0b650f17a8
--- /dev/null
+++ b/doc/src/sgml/basic-archive.sgml
@@ -0,0 +1,81 @@
+<!-- doc/src/sgml/basic-archive.sgml -->
+
+<sect1 id="basic-archive" xreflabel="basic_archive">
+ <title>basic_archive</title>
+
+ <indexterm zone="basic-archive">
+  <primary>basic_archive</primary>
+ </indexterm>
+
+ <para>
+  <filename>basic_archive</filename> is an example of an archive module.  This
+  module copies completed WAL segment files to the specified directory.  This
+  may not be especially useful, but it can serve as a starting point for
+  developing your own archive module.  For more information about archive
+  modules, see <xref linkend="archive-modules"/>.
+ </para>
+
+ <para>
+  In order to function, this module must be loaded via
+  <xref linkend="guc-archive-library"/>, and <xref linkend="guc-archive-mode"/>
+  must be enabled.
+ </para>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>basic_archive.archive_directory</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>basic_archive.archive_directory</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The directory where the server should copy WAL segment files.  This
+      directory must already exist.  The default is an empty string, which
+      effectively halts WAL archiving, but if <xref linkend="guc-archive-mode"/>
+      is enabled, the server will accumulate WAL segment files in the
+      expectation that a value will soon be provided.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   This parameter must be set in <filename>postgresql.conf</filename>.
+   Typical usage might be:
+  </para>
+
+<programlisting>
+# postgresql.conf
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '/path/to/archive/directory'
+</programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Notes</title>
+
+  <para>
+   Server crashes may leave temporary files with the prefix
+   <filename>archtemp</filename> in the archive directory.  It is recommended to
+   delete such files before restarting the server after a crash.  It is safe to
+   remove such files while the server is running as long as they are unrelated
+   to any archiving still in progress, but users should use extra caution when
+   doing so.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Author</title>
+
+  <para>
+   Nathan Bossart
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 692d8a2a17..1836e35ac4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..be9711c6f2 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -99,6 +99,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
  &amcheck;
  &auth-delay;
  &auto-explain;
+ &basic-archive;
  &bloom;
  &btree-gin;
  &btree-gist;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..328cd1f378 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
@@ -112,6 +113,7 @@
 <!ENTITY amcheck         SYSTEM "amcheck.sgml">
 <!ENTITY auth-delay      SYSTEM "auth-delay.sgml">
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
+<!ENTITY basic-archive   SYSTEM "basic-archive.sgml">
 <!ENTITY bloom           SYSTEM "bloom.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1546f10c0d..e7ae29ec3d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index b2e41ea814..b846213fb7 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -487,11 +487,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.25.1

#33 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#32)
3 attachment(s)
Re: archive modules

On Sat, Jan 29, 2022 at 09:01:41PM -0800, Nathan Bossart wrote:

On Sat, Jan 29, 2022 at 04:31:48PM -0800, Nathan Bossart wrote:

On Sat, Jan 29, 2022 at 12:50:18PM -0800, Nathan Bossart wrote:

Here is a new revision. I've moved basic_archive to contrib, hardened it
as suggested, and added shutdown support for archive modules.

cfbot was unhappy with v14, so here's another attempt. One other change I
am pondering is surrounding pgarch_MainLoop() with PG_TRY/PG_FINALLY so
that we can also call the shutdown callback in the event of an ERROR. This
might be necessary for an archive module that uses background workers.

Ugh. Apologies for the noise. cfbot still isn't happy, so here's yet
another attempt. This new patch set also ensures the shutdown callback is
called when the archiver process exits.

If basic_archive is to be in contrib, we probably want to avoid restarting
the archiver every time the module ERRORs. I debated trying to add a
generic exception handler that all archive modules could use, but I suspect
many will have unique cleanup requirements. Plus, AFAICT restarting the
archiver isn't terrible, it just causes most of the normal retry logic to
be skipped.

I also looked into rewriting basic_archive to avoid ERRORs and return false
for all failures, but this was rather tedious. Instead, I just introduced
a custom exception handler for basic_archive's archive callback. This
allows us to ERROR as necessary (which helps ensure that failures show up
in the server logs), and the archiver can treat it like a normal failure
and avoid restarting.
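
To illustrate the shape of that handler, here is a minimal standalone sketch.
It is not the code from the patch: it mimics the PG_TRY/PG_CATCH pattern with
plain setjmp/longjmp so that it compiles without the server headers. The names
raise_error, error_stack, last_error, archive_file_internal, and
sketch_archive_file are all hypothetical stand-ins. Only the callback signature
(two const char * arguments, bool result) is taken from ArchiveFileCB.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the server's error stack; NOT PostgreSQL's real API. */
static jmp_buf *error_stack = NULL;
static char last_error[256];

/* Hypothetical analogue of raising an ERROR: record the message and unwind. */
static void
raise_error(const char *msg)
{
    snprintf(last_error, sizeof(last_error), "%s", msg);
    assert(error_stack != NULL);    /* no handler installed: cannot recover */
    longjmp(*error_stack, 1);
}

/* Internal routine that reports every failure by raising an error. */
static void
archive_file_internal(const char *file, const char *path)
{
    if (file == NULL || file[0] == '\0')
        raise_error("empty WAL file name");
    if (path == NULL || strstr(path, file) == NULL)
        raise_error("path does not contain file name");
    /* A real module would copy the file into the archive here. */
}

/*
 * Archive callback with the same signature as ArchiveFileCB.  Errors raised
 * below this point are caught, logged, and turned into a "false" result, so
 * the caller sees an ordinary failure and can retry later.
 */
static bool
sketch_archive_file(const char *file, const char *path)
{
    jmp_buf     local;
    jmp_buf    *saved = error_stack;

    if (setjmp(local) != 0)
    {
        /* An error was raised: restore the previous handler and report. */
        error_stack = saved;
        fprintf(stderr, "archiving failed: %s\n", last_error);
        return false;
    }

    error_stack = &local;
    archive_file_internal(file, path);
    error_stack = saved;
    return true;
}
```

The point of the design is that the internal routine can still raise errors
freely (so the message reaches the log), while the callback boundary converts
them into the boolean result the archiver expects. Any module-specific cleanup
would go in the branch that handles the raised error.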

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v17-0001-Introduce-archive-modules-infrastructure.patch (text/x-diff; charset=us-ascii)
From d5f91d973e2fab0951e76c6841e8fd827849a0ae Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v17 1/3] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/postmaster/pgarch.c               | 111 ++++++++++++++++--
 src/backend/postmaster/shell_archive.c        |  24 +++-
 src/backend/utils/init/miscinit.c             |   1 +
 src/backend/utils/misc/guc.c                  |  12 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h                     |   1 -
 src/include/postmaster/pgarch.h               |  52 +++++++-
 8 files changed, 189 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..958220c495 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8831,7 +8831,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..865f1930df 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks ArchiveContext;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,8 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
+static void call_archive_module_shutdown_callback(int code, Datum arg);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -244,7 +250,16 @@ PgArchiverMain(void)
 	arch_files->arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 												ready_file_comparator, NULL);
 
-	pgarch_MainLoop();
+	/* Load the archive_library. */
+	LoadArchiveLibrary();
+
+	PG_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+	{
+		pgarch_MainLoop();
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+
+	call_archive_module_shutdown_callback(0, 0);
 
 	proc_exit(0);
 }
@@ -407,11 +422,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (ArchiveContext.check_configured_cb != NULL &&
+				!ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 +508,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -509,7 +525,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -759,13 +775,90 @@ HandlePgArchInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Perform logging of memory contexts of this process */
+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
+
 	if (ConfigReloadPending)
 	{
+		char	   *archiveLib = pstrdup(XLogArchiveLibrary);
+		bool		archiveLibChanged;
+
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
+		pfree(archiveLib);
+
+		if (archiveLibChanged)
+		{
+			/*
+			 * Call the currently loaded archive module's shutdown callback, if
+			 * one is defined.
+			 */
+			call_archive_module_shutdown_callback(0, 0);
+
+			/*
+			 * Ideally, we would simply unload the previous archive module and
+			 * load the new one, but there is presently no mechanism for
+			 * unloading a library (see the comment above
+			 * internal_unload_library()).  To deal with this, we simply restart
+			 * the archiver.  The new archive module will be loaded when the new
+			 * archiver process starts up.
+			 */
+			ereport(LOG,
+					(errmsg("restarting archiver process because value of "
+							"\"archive_library\" was changed")));
+
+			proc_exit(0);
+		}
 	}
+}
 
-	/* Perform logging of memory contexts of this process */
-	if (LogMemoryContextPending)
-		ProcessLogMemoryContextInterrupt();
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	(*archive_init) (&ArchiveContext);
+
+	if (ArchiveContext.archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
+
+/*
+ * call_archive_module_shutdown_callback
+ *
+ * Calls the loaded archive module's shutdown callback, if one is defined.
+ */
+static void
+call_archive_module_shutdown_callback(int code, Datum arg)
+{
+	if (ArchiveContext.shutdown_cb != NULL)
+		ArchiveContext.shutdown_cb();
 }
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f2570d626..0868e5a24f 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b3fd42e0f1..f636b93b9c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3881,13 +3881,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 817d5f5324..b4376d76aa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5f934dd65a..a4b1c1286f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 991a6d0616..732b12c0ba 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,55 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Callback called to shut down an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+	ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
 
 #endif							/* _PGARCH_H */
-- 
2.25.1

v17-0002-Add-test-archive-module.patch (text/x-diff; charset=us-ascii)
From 2d467805ad545cf16b49fa5f372b88defccb321a Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v17 2/3] Add test archive module.

---
 contrib/Makefile                              |   1 +
 contrib/basic_archive/.gitignore              |   4 +
 contrib/basic_archive/Makefile                |  20 +
 contrib/basic_archive/basic_archive.c         | 358 ++++++++++++++++++
 contrib/basic_archive/basic_archive.conf      |   3 +
 .../basic_archive/expected/basic_archive.out  |  29 ++
 contrib/basic_archive/sql/basic_archive.sql   |  22 ++
 7 files changed, 437 insertions(+)
 create mode 100644 contrib/basic_archive/.gitignore
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c
 create mode 100644 contrib/basic_archive/basic_archive.conf
 create mode 100644 contrib/basic_archive/expected/basic_archive.out
 create mode 100644 contrib/basic_archive/sql/basic_archive.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..e3e221308b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/.gitignore b/contrib/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..14d036e1c4
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..198427ac69
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,358 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+static MemoryContext basic_archive_context;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static void basic_archive_file_internal(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static bool compare_files(const char *file1, const char *file2);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+
+	basic_archive_context = AllocSetContextCreate(TopMemoryContext,
+												  "basic_archive",
+												  ALLOCSET_DEFAULT_SIZES);
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	sigjmp_buf	local_sigjmp_buf;
+	MemoryContext oldcontext;
+
+	/*
+	 * We run basic_archive_file_internal() in our own memory context so that we
+	 * can easily reset it during error recovery (thus avoiding memory leaks).
+	 */
+	oldcontext = MemoryContextSwitchTo(basic_archive_context);
+
+	/*
+	 * Since the archiver operates at the bottom of the exception stack, ERRORs
+	 * turn into FATALs and cause the archiver process to restart.  However,
+	 * using ereport(ERROR, ...) when there are problems is easy to code and
+	 * maintain.  Therefore, we create our own exception handler to catch ERRORs
+	 * and return false instead of restarting the archiver whenever there is a
+	 * failure.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error and clear ErrorContext for next time */
+		EmitErrorReport();
+		FlushErrorState();
+
+		/* Close any files left open by copy_file() */
+		AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);
+
+		/* Reset our memory context and switch back to the original one */
+		MemoryContextSwitchTo(oldcontext);
+		MemoryContextReset(basic_archive_context);
+
+		/* Remove our exception handler */
+		PG_exception_stack = NULL;
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Report failure so that the archiver retries this file */
+		return false;
+	}
+
+	/* Enable our exception handler */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* Archive the file! */
+	basic_archive_file_internal(file, path);
+
+	/* Remove our exception handler */
+	PG_exception_stack = NULL;
+
+	/* Reset our memory context and switch back to the original one */
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(basic_archive_context);
+
+	return true;
+}
+
+static void
+basic_archive_file_internal(const char *file, const char *path)
+{
+	char		destination[MAXPGPATH];
+	char		temp[MAXPGPATH + 64];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+
+	/*
+	 * First, check if the file has already been archived.  If it already exists
+	 * and has the same contents as the file we're trying to archive, we can
+	 * return success (after ensuring the file is persisted to disk). This
+	 * scenario is possible if the server crashed after archiving the file but
+	 * before renaming its .ready file to .done.
+	 *
+	 * If the archive file already exists but has different contents, something
+	 * might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (compare_files(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with identical contents",
+							destination)));
+
+			fsync_fname(destination, false);
+			fsync_fname(archive_directory, true);
+
+			return;
+		}
+
+		ereport(ERROR,
+				(errmsg("archive file \"%s\" already exists", destination)));
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Pick a sufficiently unique name for the temporary file so that a
+	 * collision is unlikely.  This helps avoid problems in case a temporary
+	 * file was left around after a crash or another server happens to be
+	 * archiving to the same directory.
+	 */
+	snprintf(temp, sizeof(temp), "%s/%s.%s.%d", archive_directory,
+			 "archtemp", file, MyProcPid);
+
+	/*
+	 * Copy the file to its temporary destination.  Note that this will fail if
+	 * temp already exists.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 * This will fail if destination already exists.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return;
+}
+
+/*
+ * compare_files
+ *
+ * Returns whether the contents of the files are the same.
+ */
+static bool
+compare_files(const char *file1, const char *file2)
+{
+#define CMP_BUF_SIZE (4096)
+	char		buf1[CMP_BUF_SIZE];
+	char		buf2[CMP_BUF_SIZE];
+	int			fd1;
+	int			fd2;
+	bool		ret = true;
+
+	fd1 = OpenTransientFile(file1, O_RDONLY | PG_BINARY);
+	if (fd1 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file1)));
+
+	fd2 = OpenTransientFile(file2, O_RDONLY | PG_BINARY);
+	if (fd2 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file2)));
+
+	for (;;)
+	{
+		int		nbytes = 0;
+		int		buf1_len = 0;
+		int		buf2_len = 0;
+
+		while (buf1_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd1, buf1 + buf1_len, CMP_BUF_SIZE - buf1_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file1)));
+			else if (nbytes == 0)
+				break;
+
+			buf1_len += nbytes;
+		}
+
+		while (buf2_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd2, buf2 + buf2_len, CMP_BUF_SIZE - buf2_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file2)));
+			else if (nbytes == 0)
+				break;
+
+			buf2_len += nbytes;
+		}
+
+		if (buf1_len != buf2_len || memcmp(buf1, buf2, buf1_len) != 0)
+		{
+			ret = false;
+			break;
+		}
+		else if (buf1_len == 0)
+			break;
+	}
+
+	if (CloseTransientFile(fd1) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file1)));
+
+	if (CloseTransientFile(fd2) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file2)));
+
+	return ret;
+}
diff --git a/contrib/basic_archive/basic_archive.conf b/contrib/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/contrib/basic_archive/expected/basic_archive.out b/contrib/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/contrib/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/contrib/basic_archive/sql/basic_archive.sql b/contrib/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/contrib/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.25.1

v17-0003-Add-documentation-for-archive-modules.patch (text/x-diff; charset=us-ascii)
From 7eb48e17e07eafc774bec4cd8f6c4cb70559856d Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v17 3/3] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 142 ++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  83 ++++++++++------
 doc/src/sgml/basic-archive.sgml     |  81 ++++++++++++++++
 doc/src/sgml/config.sgml            |  37 ++++++--
 doc/src/sgml/contrib.sgml           |   1 +
 doc/src/sgml/filelist.sgml          |   2 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 11 files changed, 317 insertions(+), 48 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 doc/src/sgml/basic-archive.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..c42cc5e423
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,142 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>contrib/basic_archive</filename> module contains a working
+  example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+    ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Only the <function>archive_file_cb</function> callback is required.  The
+   others are optional.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.  If no
+    <function>check_configured_cb</function> is defined, the server always
+    assumes the module is configured.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-shutdown">
+   <title>Shutdown Callback</title>
+   <para>
+    The <function>shutdown_cb</function> callback is called when the archiver
+    process exits (e.g., after an error) or the value of
+    <xref linkend="guc-archive-library"/> changes.  If no
+    <function>shutdown_cb</function> is defined, no special action is taken in
+    these situations.
+
+<programlisting>
+typedef void (*ArchiveShutdownCB) (void);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via a shell command, <literal>FATAL</literal> is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/basic-archive.sgml b/doc/src/sgml/basic-archive.sgml
new file mode 100644
index 0000000000..0b650f17a8
--- /dev/null
+++ b/doc/src/sgml/basic-archive.sgml
@@ -0,0 +1,81 @@
+<!-- doc/src/sgml/basic-archive.sgml -->
+
+<sect1 id="basic-archive" xreflabel="basic_archive">
+ <title>basic_archive</title>
+
+ <indexterm zone="basic-archive">
+  <primary>basic_archive</primary>
+ </indexterm>
+
+ <para>
+  <filename>basic_archive</filename> is an example of an archive module.  This
+  module copies completed WAL segment files to the specified directory.  This
+  may not be especially useful, but it can serve as a starting point for
+  developing your own archive module.  For more information about archive
+  modules, see <xref linkend="archive-modules"/>.
+ </para>
+
+ <para>
+  In order to function, this module must be loaded via
+  <xref linkend="guc-archive-library"/>, and <xref linkend="guc-archive-mode"/>
+  must be enabled.
+ </para>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>basic_archive.archive_directory</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>basic_archive.archive_directory</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The directory where the server should copy WAL segment files.  This
+      directory must already exist.  The default is an empty string, which
+      effectively halts WAL archiving, but if <xref linkend="guc-archive-mode"/>
+      is enabled, the server will accumulate WAL segment files in the
+      expectation that a value will soon be provided.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   These parameters must be set in <filename>postgresql.conf</filename>.
+   Typical usage might be:
+  </para>
+
+<programlisting>
+# postgresql.conf
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '/path/to/archive/directory'
+</programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Notes</title>
+
+  <para>
+   Server crashes may leave temporary files with the prefix
+   <filename>archtemp</filename> in the archive directory.  It is recommended to
+   delete such files before restarting the server after a crash.  It is safe to
+   remove such files while the server is running as long as they are unrelated
+   to any archiving still in progress, but users should use extra caution when
+   doing so.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Author</title>
+
+  <para>
+   Nathan Bossart
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 692d8a2a17..1836e35ac4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is used for archiving.  For more
+        information, see <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..be9711c6f2 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -99,6 +99,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
  &amcheck;
  &auth-delay;
  &auto-explain;
+ &basic-archive;
  &bloom;
  &btree-gin;
  &btree-gist;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..328cd1f378 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
@@ -112,6 +113,7 @@
 <!ENTITY amcheck         SYSTEM "amcheck.sgml">
 <!ENTITY auth-delay      SYSTEM "auth-delay.sgml">
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
+<!ENTITY basic-archive   SYSTEM "basic-archive.sgml">
 <!ENTITY bloom           SYSTEM "bloom.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1546f10c0d..e7ae29ec3d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index b2e41ea814..b846213fb7 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -487,11 +487,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.25.1

#34Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#33)
Re: archive modules

On Mon, Jan 31, 2022 at 8:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

If basic_archive is to be in contrib, we probably want to avoid restarting
the archiver every time the module ERRORs. I debated trying to add a
generic exception handler that all archive modules could use, but I suspect
many will have unique cleanup requirements. Plus, AFAICT restarting the
archiver isn't terrible, it just causes most of the normal retry logic to
be skipped.

I also looked into rewriting basic_archive to avoid ERRORs and return false
for all failures, but this was rather tedious. Instead, I just introduced
a custom exception handler for basic_archive's archive callback. This
allows us to ERROR as necessary (which helps ensure that failures show up
in the server logs), and the archiver can treat it like a normal failure
and avoid restarting.

I think avoiding ERROR is going to be impractical. Catching it in the
contrib module seems OK. Catching it in the generic code is probably
also possible to do in a reasonable way. Not catching the error also
seems like it would be OK, since we expect errors to be infrequent.
I'm not objecting to anything you did here, but I'm uncertain why
adding basic_archive alongside shell_archive changes the calculus here in
any way. It just seems like a separate problem.

+       /* Perform logging of memory contexts of this process */
+       if (LogMemoryContextPending)
+               ProcessLogMemoryContextInterrupt();

Any special reason for moving this up higher? Not really an issue, just curious.

+ gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")

This contains a double negative. We could describe it more positively:
This is used only if \"archive_library\" specifies archiving via
shell. But that's actually a little confusing, because the way you've
set it up, archiving via shell can be specified by writing either
archive_library = '' or archive_library = 'shell'. I don't see any
particularly good reason to allow that to be spelled in two ways.
Let's pick one. Then here we can write either:

(a) This is used only if archive_library = 'shell'.
-or-
(b) This is used only if archive_library is not set.

IMHO, either of those would be clearer than what you have right now,
and it would definitely be shorter.

+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Called to shutdown an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);

I think that this is the wrong amount of comments. One theory is that
the reader should refer to the documentation to understand how these
callbacks work. In that case, having a separate comment for each one
that doesn't really say anything is just taking up space. It would be
better to have one comment for all three lines referring the reader to
the documentation. Alternatively, one could take the position that the
explanation should go into these comments, and then perhaps we don't
even really need documentation. A one-line comment that doesn't really
say anything non-obvious seems like the worst amount.

+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>

Maybe I'm just old and jaded, but do we really need this? I know we
have the same thing for background workers, but if anything that seems
like an argument against duplicating it elsewhere. Lots of copies of
essentially identical warnings aren't the way to great documentation;
if we copy this here, we'll probably copy it to more places. And also,
it seems a bit like warning people that they shouldn't give their
complete financial records to total strangers about whom they have
little or no information. Do tell.

+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.

It's not obvious from reading this why anyone would want to provide
this callback, or have it do anything other than 'return true'. But
there actually is a behavior difference if you provide this and have
it return false, vs. just having archiving itself fail. At least, the
message "archive_mode enabled, yet archiving is not configured" will
be emitted. So that's something we could mention here.
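
To make the difference concrete, here is a rough standalone mock of one
archiver cycle. It is only a sketch: the callback types are copied from the
patch's documentation, but demo_archive_directory, the demo_* callbacks,
demo_archive_module_init and mock_archive_one are hypothetical names invented
for illustration, and the real archiver loop is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Callback types as shown in the patch's documentation. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);
typedef void (*ArchiveShutdownCB) (void);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
    ArchiveShutdownCB shutdown_cb;
} ArchiveModuleCallbacks;

/* Hypothetical module setting, standing in for a GUC such as
 * basic_archive.archive_directory.  Empty means "not configured". */
static const char *demo_archive_directory = "";

/* Reports whether the module has what it needs to accept WAL files. */
static bool
demo_check_configured(void)
{
    return demo_archive_directory != NULL && demo_archive_directory[0] != '\0';
}

/* Pretends to archive one file; a real module would copy it. */
static bool
demo_archive_file(const char *file, const char *path)
{
    printf("would copy %s to %s/%s\n", path, demo_archive_directory, file);
    return true;
}

/* Hypothetical stand-in for the module's _PG_archive_module_init. */
static void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = demo_check_configured;
    cb->archive_file_cb = demo_archive_file;
    cb->shutdown_cb = NULL;     /* optional, so left unset */
}

/* Mock of one archiver cycle; returns true if the file was archived. */
static bool
mock_archive_one(const ArchiveModuleCallbacks *cb,
                 const char *file, const char *path)
{
    /* Only archive_file_cb is required. */
    assert(cb->archive_file_cb != NULL);

    if (cb->check_configured_cb != NULL && !cb->check_configured_cb())
    {
        /* Distinct message, and archive_file_cb is never called. */
        fprintf(stderr, "WARNING:  archive_mode enabled, yet archiving is not configured\n");
        return false;
    }
    return cb->archive_file_cb(file, path);
}
```

With demo_archive_directory left empty, mock_archive_one() emits the "not
configured" warning and never reaches archive_file_cb; once the directory is
set, the same call goes through to the archive callback.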

I would suggest s/if it eventually/only when it/

--
Robert Haas
EDB: http://www.enterprisedb.com

#35Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#34)
3 attachment(s)
Re: archive modules

Thanks for the review!

On Wed, Feb 02, 2022 at 01:42:55PM -0500, Robert Haas wrote:

I think avoiding ERROR is going to be impractical. Catching it in the
contrib module seems OK. Catching it in the generic code is probably
also possible to do in a reasonable way. Not catching the error also
seems like it would be OK, since we expect errors to be infrequent.
I'm not objecting to anything you did here, but I'm uncertain why
adding basic_archive alongside shell_archive changes the calculus here in
any way. It just seems like a separate problem.

The main scenario I'm thinking about is when there is no space left for
archives. The shell archiving logic is pretty good about avoiding ERRORs,
so when there is a problem executing the command, the archiver will retry
the command a few times before giving up for a while. If basic_archive
just ERROR'd due to ENOSPC, it would cause the archiver to restart. Until
space frees up, I believe the archiver will end up restarting every 10
seconds.

I thought some more about adding a generic exception handler for the
archiving callback. I think we'd need to add a new callback function that
would perform any required cleanup (e.g., closing any files that might be
left open). That part isn't too bad. However, module authors will also
need to keep in mind that the archiving callback runs in its own transient
memory context. If the module needs to palloc() something that needs to
stick around for a while, it will need to do so in a different memory
context. With sufficient documentation, maybe this part isn't too bad
either, but in the end, all of this is to save an optional ~15 lines of
code in the module. It's not crucial to do your own ERROR handling in your
archive module, but if you want to, you can use basic_archive as a good
starting point.
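
For illustration only, the "catch the ERROR inside the module and return false" pattern described above can be sketched in standalone C. This is not the code from the patch: it uses plain setjmp/longjmp in place of the server's sigsetjmp-based exception stack, and every name in it (exception_stack, raise_error, sketch_archive_file) is a hypothetical stand-in for PostgreSQL machinery. A real module would also need to emit the error report, close any open files, and reset its memory context in the error path.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the server's exception stack pointer. */
static jmp_buf *exception_stack = NULL;

/* Hypothetical stand-in for ereport(ERROR, ...): jump to the handler. */
static void
raise_error(void)
{
	assert(exception_stack != NULL);
	longjmp(*exception_stack, 1);
}

/* Does the "real work"; pretends to fail when the file name is empty. */
static void
archive_file_internal(const char *file)
{
	if (file[0] == '\0')
		raise_error();
}

/*
 * Callback-style wrapper: an error raised below is converted into a
 * "false" return value, so the caller can retry instead of exiting.
 */
bool
sketch_archive_file(const char *file)
{
	jmp_buf		local_buf;

	if (setjmp(local_buf) != 0)
	{
		/* Error path: remove our handler and report failure. */
		exception_stack = NULL;
		return false;
	}

	/* Enable our handler, do the work, then remove the handler. */
	exception_stack = &local_buf;
	archive_file_internal(file);
	exception_stack = NULL;

	return true;
}
```

Calling sketch_archive_file() with an empty name takes the error path and returns false, and a later call with a valid name still succeeds, which is the retry behavior the archiver relies on.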

tl;dr - I left it the same.

+       /* Perform logging of memory contexts of this process */
+       if (LogMemoryContextPending)
+               ProcessLogMemoryContextInterrupt();

Any special reason for moving this up higher? Not really an issue, just curious.

Since archive_library changes cause the archiver to restart, I thought it
might be good to move this before the process might exit in case
LogMemoryContextPending and ConfigReloadPending are both true.

+ gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.")

This contains a double negative. We could describe it more positively:
This is used only if \"archive_library\" specifies archiving via
shell. But that's actually a little confusing, because the way you've
set it up, archiving via shell can be specified by writing either
archive_library = '' or archive_library = 'shell'. I don't see any
particularly good reason to allow that to be spelled in two ways.
Let's pick one. Then here we can write either:

(a) This is used only if archive_library = 'shell'.
-or-
(b) This is used only if archive_library is not set.

IMHO, either of those would be clearer than what you have right now,
and it would definitely be shorter.

I went with (b). That felt a bit more natural to me, and it was easier to
code because I don't have to add error checking for an empty string.
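
As a rough standalone sketch of option (b), the selection logic can be modeled as follows: an empty library name picks the built-in shell initializer, and anything else goes through a lookup. The struct, the lookup function, and the name field are all hypothetical simplifications for illustration. The patch itself resolves the symbol with load_external_function() and raises an ERROR instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*ArchiveFileCB) (const char *file, const char *path);

/* Simplified, hypothetical version of the callback struct. */
typedef struct SketchCallbacks
{
	ArchiveFileCB archive_file_cb;
	const char *name;			/* illustration only, not in the patch */
} SketchCallbacks;

typedef void (*SketchInit) (SketchCallbacks *cb);

static bool
noop_archive_file(const char *file, const char *path)
{
	(void) file;
	(void) path;
	return true;
}

static void
sketch_shell_init(SketchCallbacks *cb)
{
	cb->archive_file_cb = noop_archive_file;
	cb->name = "shell";
}

static void
sketch_module_init(SketchCallbacks *cb)
{
	cb->archive_file_cb = noop_archive_file;
	cb->name = "basic_archive";
}

/* Hypothetical stand-in for looking up a library's init symbol. */
static SketchInit
sketch_lookup(const char *library)
{
	if (strcmp(library, "basic_archive") == 0)
		return sketch_module_init;
	return NULL;
}

/* Returns false where the real code would raise an ERROR. */
bool
sketch_load_archive_library(const char *library, SketchCallbacks *cb)
{
	SketchInit	init;

	assert(library != NULL && cb != NULL);
	memset(cb, 0, sizeof(*cb));

	/* Empty string means "archive via the shell command". */
	if (library[0] == '\0')
		init = sketch_shell_init;
	else
		init = sketch_lookup(library);

	if (init == NULL)
		return false;

	init(cb);

	/* The archive callback is the only required one. */
	return cb->archive_file_cb != NULL;
}
```

With only one spelling for shell archiving, the empty string needs no separate validation: it is simply the branch that skips the library lookup.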

+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Called to shutdown an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);

I think that this is the wrong amount of comments. One theory is that
the reader should refer to the documentation to understand how these
callbacks work. In that case, having a separate comment for each one
that doesn't really say anything is just taking up space. It would be
better to have one comment for all three lines referring the reader to
the documentation. Alternatively, one could take the position that the
explanation should go into these comments, and then perhaps we don't
even really need documentation. A one-line comment that doesn't really
say anything non-obvious seems like the worst amount.

In my quest to write well-commented code, I sometimes overdo it. I
adjusted these comments in the new revision.

+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  </para>
+ </warning>

Maybe I'm just old and jaded, but do we really need this? I know we
have the same thing for background workers, but if anything that seems
like an argument against duplicating it elsewhere. Lots of copies of
essentially identical warnings aren't the way to great documentation;
if we copy this here, we'll probably copy it to more places. And also,
it seems a bit like warning people that they shouldn't give their
complete financial records to total strangers about whom they have
little or no information. Do tell.

I removed this in the new revision.

+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.

It's not obvious from reading this why anyone would want to provide
this callback, or have it do anything other than 'return true'. But
there actually is a behavior difference if you provide this and have
it return false, vs. just having archiving itself fail. At least, the
message "archive_mode enabled, yet archiving is not configured" will
be emitted. So that's something we could mention here.

The blurb just above this provides a bit more information, but I tried to
add some additional context in the new revision anyway.

I would suggest s/if it eventually/only when it/

Done.
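
To make the check-configured semantics concrete, here is a minimal standalone sketch, assuming a simplified one-argument archive callback and a hypothetical sketch_configured flag (neither is in the patch). It models one pass of the copy loop: when the optional callback exists and returns false, a warning is printed and nothing is archived, and archiving proceeds only once the callback returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef bool (*CheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file);

/* Hypothetical module state: false until a destination has been set. */
bool		sketch_configured = false;

bool
sketch_check_configured(void)
{
	return sketch_configured;
}

/* Pretends to archive one file; fails only for an empty name. */
bool
sketch_archive_one(const char *file)
{
	return file != NULL && file[0] != '\0';
}

/*
 * One pass over the pending files.  Returns the number of files archived.
 * A NULL check callback means "always configured".
 */
int
sketch_copy_loop(CheckConfiguredCB check, ArchiveFileCB archive,
				 const char *const *files, int nfiles)
{
	int			archived = 0;

	assert(archive != NULL);

	if (check != NULL && !check())
	{
		fprintf(stderr,
				"WARNING:  archive_mode enabled, yet archiving is not configured\n");
		return 0;
	}

	for (int i = 0; i < nfiles; i++)
	{
		/* Stop at the first failure so it can be retried later. */
		if (!archive(files[i]))
			break;
		archived++;
	}

	return archived;
}
```

The distinction from a failing archive callback is visible here: an unconfigured module never reaches the per-file loop, so the only symptom is the warning.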

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v18-0001-Introduce-archive-modules-infrastructure.patch (text/x-diff; charset=us-ascii)
From 91c63675974f3694bf0e2a644720480979c1b20f Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:04:41 +0000
Subject: [PATCH v18 1/3] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c             |   2 +-
 src/backend/postmaster/pgarch.c               | 111 ++++++++++++++++--
 src/backend/postmaster/shell_archive.c        |  24 +++-
 src/backend/utils/init/miscinit.c             |   1 +
 src/backend/utils/misc/guc.c                  |  12 +-
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/include/access/xlog.h                     |   1 -
 src/include/postmaster/pgarch.h               |  38 +++++-
 8 files changed, 177 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..958220c495 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8831,7 +8831,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..dfca746337 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks ArchiveContext;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,8 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
+static void call_archive_module_shutdown_callback(int code, Datum arg);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -244,7 +250,16 @@ PgArchiverMain(void)
 	arch_files->arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 												ready_file_comparator, NULL);
 
-	pgarch_MainLoop();
+	/* Load the archive_library. */
+	LoadArchiveLibrary();
+
+	PG_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+	{
+		pgarch_MainLoop();
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+
+	call_archive_module_shutdown_callback(0, 0);
 
 	proc_exit(0);
 }
@@ -407,11 +422,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (ArchiveContext.check_configured_cb != NULL &&
+				!ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 +508,7 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
@@ -509,7 +525,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = shell_archive_file(xlog, pathname);
+	ret = ArchiveContext.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -759,13 +775,90 @@ HandlePgArchInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Perform logging of memory contexts of this process */
+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
+
 	if (ConfigReloadPending)
 	{
+		char	   *archiveLib = pstrdup(XLogArchiveLibrary);
+		bool		archiveLibChanged;
+
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
+
+		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
+		pfree(archiveLib);
+
+		if (archiveLibChanged)
+		{
+			/*
+			 * Call the currently loaded archive module's shutdown callback, if
+			 * one is defined.
+			 */
+			call_archive_module_shutdown_callback(0, 0);
+
+			/*
+			 * Ideally, we would simply unload the previous archive module and
+			 * load the new one, but there is presently no mechanism for
+			 * unloading a library (see the comment above
+			 * internal_unload_library()).  To deal with this, we simply restart
+			 * the archiver.  The new archive module will be loaded when the new
+			 * archiver process starts up.
+			 */
+			ereport(LOG,
+					(errmsg("restarting archiver process because value of "
+							"\"archive_library\" was changed")));
+
+			proc_exit(0);
+		}
 	}
+}
 
-	/* Perform logging of memory contexts of this process */
-	if (LogMemoryContextPending)
-		ProcessLogMemoryContextInterrupt();
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveModuleInit archive_init;
+
+	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (XLogArchiveLibrary[0] == '\0')
+		archive_init = shell_archive_init;
+	else
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+	if (archive_init == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules have to declare the "
+						"_PG_archive_module_init symbol")));
+
+	(*archive_init) (&ArchiveContext);
+
+	if (ArchiveContext.archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
+
+/*
+ * call_archive_module_shutdown_callback
+ *
+ * Calls the loaded archive module's shutdown callback, if one is defined.
+ */
+static void
+call_archive_module_shutdown_callback(int code, Datum arg)
+{
+	if (ArchiveContext.shutdown_cb != NULL)
+		ArchiveContext.shutdown_cb();
 }
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index b54e701da4..19e240c205 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -2,6 +2,10 @@
  *
  * shell_archive.c
  *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -17,7 +21,25 @@
 #include "pgstat.h"
 #include "postmaster/pgarch.h"
 
-bool
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
 shell_archive_file(const char *file, const char *path)
 {
 	char		xlogarchcmd[MAXPGPATH];
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f2570d626..0868e5a24f 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b3fd42e0f1..f505413a7f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3881,13 +3881,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" is not set.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_SIGHUP, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("An empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 817d5f5324..56d0bee6d9 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,9 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = ''		# library to use to archive a logfile segment
+				# (empty string indicates archive_command should
+				# be used)
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5f934dd65a..a4b1c1286f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 991a6d0616..9bc7593a2d 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -33,7 +33,41 @@ extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 extern void PgArchForceDirScan(void);
 
-/* in shell_archive.c */
-extern bool shell_archive_file(const char *file, const char *path);
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Archive module callbacks
+ *
+ * These callback functions should be defined by archive libraries and returned
+ * via _PG_archive_module_init().  ArchiveFileCB is the only required callback.
+ * For more information about the purpose of each callback, refer to the
+ * archive modules documentation.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+typedef void (*ArchiveShutdownCB) (void);
+
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+	ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
 
 #endif							/* _PGARCH_H */
-- 
2.25.1

v18-0002-Add-test-archive-module.patch (text/x-diff; charset=us-ascii)
From 3a4f6a90ba59b0cefe6fe950b3e220a2c94a4519 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:05:43 +0000
Subject: [PATCH v18 2/3] Add test archive module.

---
 contrib/Makefile                              |   1 +
 contrib/basic_archive/.gitignore              |   4 +
 contrib/basic_archive/Makefile                |  20 +
 contrib/basic_archive/basic_archive.c         | 365 ++++++++++++++++++
 contrib/basic_archive/basic_archive.conf      |   3 +
 .../basic_archive/expected/basic_archive.out  |  29 ++
 contrib/basic_archive/sql/basic_archive.sql   |  22 ++
 7 files changed, 444 insertions(+)
 create mode 100644 contrib/basic_archive/.gitignore
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c
 create mode 100644 contrib/basic_archive/basic_archive.conf
 create mode 100644 contrib/basic_archive/expected/basic_archive.out
 create mode 100644 contrib/basic_archive/sql/basic_archive.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..e3e221308b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/.gitignore b/contrib/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..14d036e1c4
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..324284f9ad
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,365 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+
+#include "common/int.h"
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+static MemoryContext basic_archive_context;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static void basic_archive_file_internal(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static bool compare_files(const char *file1, const char *file2);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+
+	basic_archive_context = AllocSetContextCreate(TopMemoryContext,
+												  "basic_archive",
+												  ALLOCSET_DEFAULT_SIZES);
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	sigjmp_buf	local_sigjmp_buf;
+	MemoryContext oldcontext;
+
+	/*
+	 * We run basic_archive_file_internal() in our own memory context so that we
+	 * can easily reset it during error recovery (thus avoiding memory leaks).
+	 */
+	oldcontext = MemoryContextSwitchTo(basic_archive_context);
+
+	/*
+	 * Since the archiver operates at the bottom of the exception stack, ERRORs
+	 * turn into FATALs and cause the archiver process to restart.  However,
+	 * using ereport(ERROR, ...) when there are problems is easy to code and
+	 * maintain.  Therefore, we create our own exception handler to catch ERRORs
+	 * and return false instead of restarting the archiver whenever there is a
+	 * failure.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error and clear ErrorContext for next time */
+		EmitErrorReport();
+		FlushErrorState();
+
+		/* Close any files left open by copy_file() or compare_files() */
+		AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);
+
+		/* Reset our memory context and switch back to the original one */
+		MemoryContextSwitchTo(oldcontext);
+		MemoryContextReset(basic_archive_context);
+
+		/* Remove our exception handler */
+		PG_exception_stack = NULL;
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Report failure so that the archiver retries this file */
+		return false;
+	}
+
+	/* Enable our exception handler */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* Archive the file! */
+	basic_archive_file_internal(file, path);
+
+	/* Remove our exception handler */
+	PG_exception_stack = NULL;
+
+	/* Reset our memory context and switch back to the original one */
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(basic_archive_context);
+
+	return true;
+}
+
+static void
+basic_archive_file_internal(const char *file, const char *path)
+{
+	char		destination[MAXPGPATH];
+	char		temp[MAXPGPATH + 256];
+	struct stat st;
+	struct timeval tv;
+	uint64		epoch;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+
+	/*
+	 * First, check if the file has already been archived.  If it already exists
+	 * and has the same contents as the file we're trying to archive, we can
+	 * return success (after ensuring the file is persisted to disk). This
+	 * scenario is possible if the server crashed after archiving the file but
+	 * before renaming its .ready file to .done.
+	 *
+	 * If the archive file already exists but has different contents, something
+	 * might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (compare_files(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with identical contents",
+							destination)));
+
+			fsync_fname(destination, false);
+			fsync_fname(archive_directory, true);
+
+			return;
+		}
+
+		ereport(ERROR,
+				(errmsg("archive file \"%s\" already exists", destination)));
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Pick a sufficiently unique name for the temporary file so that a
+	 * collision is unlikely.  This helps avoid problems in case a temporary
+	 * file was left around after a crash or another server happens to be
+	 * archiving to the same directory.
+	 */
+	gettimeofday(&tv, NULL);
+	if (pg_mul_u64_overflow((uint64) 1000, (uint64) tv.tv_sec, &epoch) ||
+		pg_add_u64_overflow(epoch, (uint64) tv.tv_usec, &epoch))
+		elog(ERROR, "could not generate temporary file name for archiving");
+
+	snprintf(temp, sizeof(temp), "%s/%s.%s.%d." UINT64_FORMAT,
+			 archive_directory, "archtemp", file, MyProcPid, epoch);
+
+	/*
+	 * Copy the file to its temporary destination.  Note that this will fail if
+	 * temp already exists.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 * This will fail if destination already exists.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+}
+
+/*
+ * compare_files
+ *
+ * Returns whether the contents of the files are the same.
+ */
+static bool
+compare_files(const char *file1, const char *file2)
+{
+#define CMP_BUF_SIZE (4096)
+	char		buf1[CMP_BUF_SIZE];
+	char		buf2[CMP_BUF_SIZE];
+	int			fd1;
+	int			fd2;
+	bool		ret = true;
+
+	fd1 = OpenTransientFile(file1, O_RDONLY | PG_BINARY);
+	if (fd1 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file1)));
+
+	fd2 = OpenTransientFile(file2, O_RDONLY | PG_BINARY);
+	if (fd2 < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file2)));
+
+	for (;;)
+	{
+		int		nbytes = 0;
+		int		buf1_len = 0;
+		int		buf2_len = 0;
+
+		while (buf1_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd1, buf1 + buf1_len, CMP_BUF_SIZE - buf1_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file1)));
+			else if (nbytes == 0)
+				break;
+
+			buf1_len += nbytes;
+		}
+
+		while (buf2_len < CMP_BUF_SIZE)
+		{
+			nbytes = read(fd2, buf2 + buf2_len, CMP_BUF_SIZE - buf2_len);
+			if (nbytes < 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not read file \"%s\": %m", file2)));
+			else if (nbytes == 0)
+				break;
+
+			buf2_len += nbytes;
+		}
+
+		if (buf1_len != buf2_len || memcmp(buf1, buf2, buf1_len) != 0)
+		{
+			ret = false;
+			break;
+		}
+		else if (buf1_len == 0)
+			break;
+	}
+
+	if (CloseTransientFile(fd1) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file1)));
+
+	if (CloseTransientFile(fd2) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file2)));
+
+	return ret;
+}
diff --git a/contrib/basic_archive/basic_archive.conf b/contrib/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/contrib/basic_archive/expected/basic_archive.out b/contrib/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/contrib/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/contrib/basic_archive/sql/basic_archive.sql b/contrib/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/contrib/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.25.1

v18-0003-Add-documentation-for-archive-modules.patch (text/x-diff; charset=us-ascii)
From d127eb5428308c86154212aa759199343b09fb5d Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Fri, 19 Nov 2021 01:06:01 +0000
Subject: [PATCH v18 3/3] Add documentation for archive modules.

---
 doc/src/sgml/archive-modules.sgml   | 136 ++++++++++++++++++++++++++++
 doc/src/sgml/backup.sgml            |  85 ++++++++++-------
 doc/src/sgml/basic-archive.sgml     |  81 +++++++++++++++++
 doc/src/sgml/config.sgml            |  37 ++++++--
 doc/src/sgml/contrib.sgml           |   1 +
 doc/src/sgml/filelist.sgml          |   2 +
 doc/src/sgml/high-availability.sgml |   6 +-
 doc/src/sgml/postgres.sgml          |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml |   6 +-
 doc/src/sgml/wal.sgml               |   2 +-
 11 files changed, 312 insertions(+), 49 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 doc/src/sgml/basic-archive.sgml

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..f1189ddcd5
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs and register background
+  workers).
+ </para>
+
+ <para>
+  The <filename>contrib/basic_archive</filename> module contains a working
+  example, which demonstrates some useful techniques.
+ </para>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+    ArchiveShutdownCB shutdown_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Only the <function>archive_file_cb</function> callback is required.  The
+   others are optional.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files (e.g.,
+    its configuration parameters are set to valid values).  If no
+    <function>check_configured_cb</function> is defined, the server always
+    assumes the module is configured.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed, and the archiver will emit the following message to the server log:
+<screen>
+WARNING:  archive_mode enabled, yet archiving is not configured
+</screen>
+    In the latter case, the server will periodically call this function, and
+    archiving will proceed only when it returns <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-shutdown">
+   <title>Shutdown Callback</title>
+   <para>
+    The <function>shutdown_cb</function> callback is called when the archiver
+    process exits (e.g., after an error) or the value of
+    <xref linkend="guc-archive-library"/> changes.  If no
+    <function>shutdown_cb</function> is defined, no special action is taken in
+    these situations.
+
+<programlisting>
+typedef void (*ArchiveShutdownCB) (void);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..0d69851bb1 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    an empty string and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, FATAL is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -757,8 +776,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     turned on during execution of one of these statements, WAL would not
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
-    server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    server start.  However, <varname>archive_library</varname> can be changed with a
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = ''  # use shell command
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
diff --git a/doc/src/sgml/basic-archive.sgml b/doc/src/sgml/basic-archive.sgml
new file mode 100644
index 0000000000..0b650f17a8
--- /dev/null
+++ b/doc/src/sgml/basic-archive.sgml
@@ -0,0 +1,81 @@
+<!-- doc/src/sgml/basic-archive.sgml -->
+
+<sect1 id="basic-archive" xreflabel="basic_archive">
+ <title>basic_archive</title>
+
+ <indexterm zone="basic-archive">
+  <primary>basic_archive</primary>
+ </indexterm>
+
+ <para>
+  <filename>basic_archive</filename> is an example of an archive module.  This
+  module copies completed WAL segment files to the specified directory.  This
+  may not be especially useful, but it can serve as a starting point for
+  developing your own archive module.  For more information about archive
+  modules, see <xref linkend="archive-modules"/>.
+ </para>
+
+ <para>
+  In order to function, this module must be loaded via
+  <xref linkend="guc-archive-library"/>, and <xref linkend="guc-archive-mode"/>
+  must be enabled.
+ </para>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>basic_archive.archive_directory</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>basic_archive.archive_directory</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The directory where the server should copy WAL segment files.  This
+      directory must already exist.  The default is an empty string, which
+      effectively halts WAL archiving, but if <xref linkend="guc-archive-mode"/>
+      is enabled, the server will accumulate WAL segment files in the
+      expectation that a value will soon be provided.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <para>
+   These parameters must be set in <filename>postgresql.conf</filename>.
+   Typical usage might be:
+  </para>
+
+<programlisting>
+# postgresql.conf
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '/path/to/archive/directory'
+</programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Notes</title>
+
+  <para>
+   Server crashes may leave temporary files with the prefix
+   <filename>archtemp</filename> in the archive directory.  It is recommended to
+   delete such files before restarting the server after a crash.  It is safe to
+   remove such files while the server is running as long as they are unrelated
+   to any archiving still in progress, but users should use extra caution when
+   doing so.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Author</title>
+
+  <para>
+   Nathan Bossart
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 692d8a2a17..fc63172efd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        an empty string (the default), archiving via shell is enabled, and
+        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        shared library is used for archiving.  For more information, see
+        <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies archiving via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..be9711c6f2 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -99,6 +99,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
  &amcheck;
  &auth-delay;
  &auto-explain;
+ &basic-archive;
  &bloom;
  &btree-gin;
  &btree-gist;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..328cd1f378 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
@@ -112,6 +113,7 @@
 <!ENTITY amcheck         SYSTEM "amcheck.sgml">
 <!ENTITY auth-delay      SYSTEM "auth-delay.sgml">
 <!ENTITY auto-explain    SYSTEM "auto-explain.sgml">
+<!ENTITY basic-archive   SYSTEM "basic-archive.sgml">
 <!ENTITY bloom           SYSTEM "bloom.sgml">
 <!ENTITY btree-gin       SYSTEM "btree-gin.sgml">
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a265409f02..437712762a 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1546f10c0d..e7ae29ec3d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index b2e41ea814..b846213fb7 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -487,11 +487,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.25.1
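
The callback contract described in the archive-modules documentation above can be sketched in standalone C. This is only a mock under stated assumptions: the typedefs are copied from the documentation patch, but it does not include postgres.h, and every name prefixed with demo_ is hypothetical (a real module would export _PG_archive_module_init, read its directory from a GUC, and actually copy the file).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Callback types as shown in the documentation patch. */
typedef bool (*ArchiveCheckConfiguredCB) (void);
typedef bool (*ArchiveFileCB) (const char *file, const char *path);
typedef void (*ArchiveShutdownCB) (void);

typedef struct ArchiveModuleCallbacks
{
    ArchiveCheckConfiguredCB check_configured_cb;
    ArchiveFileCB archive_file_cb;
    ArchiveShutdownCB shutdown_cb;
} ArchiveModuleCallbacks;

/* Hypothetical module state; a real module would use a GUC instead. */
const char *demo_archive_directory = "";
int         demo_files_archived = 0;

void
demo_set_archive_directory(const char *dir)
{
    demo_archive_directory = dir;
}

/* Ready only once a target directory has been configured. */
static bool
demo_check_configured(void)
{
    return demo_archive_directory[0] != '\0';
}

/* Return true if and only if the file was archived. */
static bool
demo_archive_file(const char *file, const char *path)
{
    if (file == NULL || path == NULL)
        return false;
    /* A real module would copy "path" into the archive here. */
    demo_files_archived++;
    return true;
}

/* Same shape as _PG_archive_module_init; renamed because this is a mock. */
void
demo_archive_module_init(ArchiveModuleCallbacks *cb)
{
    assert(cb != NULL);
    cb->check_configured_cb = demo_check_configured;
    cb->archive_file_cb = demo_archive_file;    /* the only required callback */
    cb->shutdown_cb = NULL;                     /* optional */
}

/* Hypothetical stand-in for the archiver's per-file logic. */
bool
demo_server_archive_one(const ArchiveModuleCallbacks *cb,
                        const char *file, const char *path)
{
    /* A missing check callback means "assume configured". */
    if (cb->check_configured_cb != NULL && !cb->check_configured_cb())
    {
        fprintf(stderr,
                "WARNING:  archive_mode enabled, yet archiving is not configured\n");
        return false;           /* keep the WAL file and retry later */
    }
    return cb->archive_file_cb(file, path);
}
```

Calling demo_server_archive_one() before demo_set_archive_directory() returns false and leaves the file for a later retry; after a directory is set, the same call returns true, which is the point at which the server would be free to recycle or remove the segment.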

#36 Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#35)
Re: archive modules

On Wed, Feb 2, 2022 at 5:44 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I would suggest s/if it eventually/only when it/

Done.

Committed. I'm going to be 0% surprised if the buildfarm turns pretty
colors, but I don't know how to know what it's going to be unhappy
about except by trying it, so here goes.

--
Robert Haas
EDB: http://www.enterprisedb.com

#37Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#36)
Re: archive modules

On Thu, Feb 03, 2022 at 02:11:18PM -0500, Robert Haas wrote:

Committed. I'm going to be 0% surprised if the buildfarm turns pretty
colors, but I don't know how to know what it's going to be unhappy
about except by trying it, so here goes.

Thanks! I'll keep an eye on the buildfarm and will send any new patches
that are needed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#38Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#37)
Re: archive modules

On Thu, Feb 3, 2022 at 2:27 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Thu, Feb 03, 2022 at 02:11:18PM -0500, Robert Haas wrote:

Committed. I'm going to be 0% surprised if the buildfarm turns pretty
colors, but I don't know how to know what it's going to be unhappy
about except by trying it, so here goes.

Thanks! I'll keep an eye on the buildfarm and will send any new patches
that are needed.

Andres just pointed out to me that thorntail is unhappy:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-02-03%2019%3A54%3A42

It says:

==~_~===-=-===~_~==
pgsql.build/contrib/basic_archive/log/postmaster.log
==~_~===-=-===~_~==
2022-02-03 23:17:49.019 MSK [1253623:1] FATAL: WAL archival cannot be
enabled when wal_level is "minimal"

The notes for the machine say:

UBSan; force_parallel_mode; wal_level=minimal; OS bug breaks truncate()

So apparently we need to either skip this test when wal_level=minimal,
or force a higher wal_level to be used for this particular test. Not
sure what the existing precedents are, if any.

--
Robert Haas
EDB: http://www.enterprisedb.com

#39Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#38)
Re: archive modules

On Thu, Feb 03, 2022 at 04:04:33PM -0500, Robert Haas wrote:

So apparently we need to either skip this test when wal_level=minimal,
or force a higher wal_level to be used for this particular test. Not
sure what the existing precedents are, if any.

The only precedent I've found so far is test_decoding, which sets wal_level
to "logical." Perhaps we can just set it to "replica" in
basic_archive.conf.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#40Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#39)
Re: archive modules

On Thu, Feb 3, 2022 at 4:11 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Thu, Feb 03, 2022 at 04:04:33PM -0500, Robert Haas wrote:

So apparently we need to either skip this test when wal_level=minimal,
or force a higher wal_level to be used for this particular test. Not
sure what the existing precedents are, if any.

The only precedent I've found so far is test_decoding, which sets wal_level
to "logical." Perhaps we can just set it to "replica" in
basic_archive.conf.

Yeah, that seems to make sense.

--
Robert Haas
EDB: http://www.enterprisedb.com

#41Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#40)
1 attachment(s)
Re: archive modules

On Thu, Feb 03, 2022 at 04:15:30PM -0500, Robert Haas wrote:

On Thu, Feb 3, 2022 at 4:11 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Thu, Feb 03, 2022 at 04:04:33PM -0500, Robert Haas wrote:

So apparently we need to either skip this test when wal_level=minimal,
or force a higher wal_level to be used for this particular test. Not
sure what the existing precedents are, if any.

The only precedent I've found so far is test_decoding, which sets wal_level
to "logical." Perhaps we can just set it to "replica" in
basic_archive.conf.

Yeah, that seems to make sense.

024_archive_recovery.pl seems to do something similar. Patch attached.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

fix-basic-archive-test.patchtext/x-diff; charset=us-asciiDownload
diff --git a/contrib/basic_archive/basic_archive.conf b/contrib/basic_archive/basic_archive.conf
index b26b2d4144..db029f4b8e 100644
--- a/contrib/basic_archive/basic_archive.conf
+++ b/contrib/basic_archive/basic_archive.conf
@@ -1,3 +1,4 @@
 archive_mode = 'on'
 archive_library = 'basic_archive'
 basic_archive.archive_directory = '.'
+wal_level = 'replica'
#42Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#41)
Re: archive modules

On Thu, Feb 3, 2022 at 4:25 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

024_archive_recovery.pl seems to do something similar. Patch attached.

Committed. I think this is mostly an issue for pg_regress tests, as
opposed to 024_archive_recovery.pl, which is a TAP test. Maybe I'm
wrong about that, but it looks to me like most TAP tests choose what
they want explicitly, while pg_regress tests tend to inherit the
value.

--
Robert Haas
EDB: http://www.enterprisedb.com

#43Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#42)
Re: archive modules

On Thu, Feb 03, 2022 at 04:45:52PM -0500, Robert Haas wrote:

On Thu, Feb 3, 2022 at 4:25 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

024_archive_recovery.pl seems to do something similar. Patch attached.

Committed. I think this is mostly an issue for pg_regress tests, as
opposed to 024_archive_recovery.pl, which is a TAP test. Maybe I'm
wrong about that, but it looks to me like most TAP tests choose what
they want explicitly, while pg_regress tests tend to inherit the
value.

Thanks!

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#44talk to ben
blo.talkto@gmail.com
In reply to: Nathan Bossart (#43)
Re: archive modules

Hi,

I am not sure why, but I can't find "basic_archive.archive_directory" in
pg_settings the same way I would find for example :
"pg_stat_statements.max".

[local]:5656 benoit@postgres=# SELECT count(*) FROM pg_settings WHERE name
= 'basic_archive.archive_directory';
count
-------
0
(1 row)

show can find it if I use the complete name but tab completion can't find
the guc:

[local]:5656 benoit@postgres=# show basic_archive.archive_directory;
basic_archive.archive_directory
---------------------------------
/home/benoit/tmp/tmp/archives
(1 row)

The archiver is configured with "basic_archive" and is working fine. I use
this version of pg:

PostgreSQL 15beta2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.3.1
20220421 (Red Hat 11.3.1-2), 64-bit

#45Nathan Bossart
nathandbossart@gmail.com
In reply to: talk to ben (#44)
Re: archive modules

On Wed, Jul 06, 2022 at 06:21:24PM +0200, talk to ben wrote:

I am not sure why, but I can't find "basic_archive.archive_directory" in
pg_settings the same way I would find for example :
"pg_stat_statements.max".

[local]:5656 benoit@postgres=# SELECT count(*) FROM pg_settings WHERE name
= 'basic_archive.archive_directory';
count
-------
0
(1 row)

show can find it if I use the complete name but tab completion can't find
the guc:

[local]:5656 benoit@postgres=# show basic_archive.archive_directory;
basic_archive.archive_directory
---------------------------------
/home/benoit/tmp/tmp/archives
(1 row)

I think the reason is that only the archiver process loads the library, so
the GUC isn't registered at startup like you'd normally see with
shared_preload_libraries. IIUC the server will still create a placeholder
GUC during startup for custom parameters, which is why it shows up for SHOW
commands.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
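
As a rough illustration of the behaviour described above, here is a toy Python model of per-process settings. It is only a sketch: the class and method names (ProcessGucs, load_library, and so on) are invented for this example and are not taken from PostgreSQL's source. It assumes, as the message suggests, that every process keeps a placeholder for a custom parameter found in the configuration file, and that only a process which has loaded the library knows the parameter's real definition.

```python
# Toy model only: the class and method names below are invented for
# illustration and do not correspond to PostgreSQL's GUC implementation.

class ProcessGucs:
    """Per-process table of settings, loosely mimicking one server process."""

    def __init__(self, config_file):
        # Every process reads the same configuration file. A dotted name with
        # no definition yet is kept as a placeholder holding the raw string.
        self.placeholders = {k: v for k, v in config_file.items() if "." in k}
        self.defined = {}

    def load_library(self, prefix, definitions):
        # Loading a library registers its parameters in this process only and
        # turns any matching placeholder into a real setting.
        for name, default in definitions.items():
            full = f"{prefix}.{name}"
            self.defined[full] = self.placeholders.pop(full, default)

    def show(self, name):
        # SHOW-like lookup: real settings first, then placeholders.
        if name in self.defined:
            return self.defined[name]
        return self.placeholders[name]

    def pg_settings(self):
        # View-like listing: only settings whose definition is known here.
        return sorted(self.defined)


config = {"basic_archive.archive_directory": "/home/benoit/tmp/tmp/archives"}

archiver = ProcessGucs(config)
archiver.load_library("basic_archive", {"archive_directory": ""})

backend = ProcessGucs(config)  # an ordinary session: library never loaded

name = "basic_archive.archive_directory"
print(backend.show(name))              # the placeholder value is still returned
print(name in backend.pg_settings())   # False
print(name in archiver.pg_settings())  # True
```

In this model, calling backend.load_library(...) afterwards makes the parameter appear in the backend's listing too, which corresponds to the LOAD workaround discussed later in the thread.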

#46talk to ben
blo.talkto@gmail.com
In reply to: Nathan Bossart (#45)
Re: archive modules

I think the reason is that only the archiver process loads the library, so
the GUC isn't registered at startup like you'd normally see with
shared_preload_libraries. IIUC the server will still create a placeholder
GUC during startup for custom parameters, which is why it shows up for SHOW
commands.

Thanks for the quick answer !
That's a little surprising at first but I understand better now.

Will there be a facility to check archive_library GUCs later on? It might
come in handy with more GUC-rich archive modules.

#47talk to ben
blo.talkto@gmail.com
In reply to: talk to ben (#46)
Re: archive modules

(Sorry for the spam Nathan)

With the list in CC and additional information :

The modified archive module parameters are visible in pg_file_settings.
They don't show up in \dconfig+, which I understand given the query used by
the meta command, but I find a little confusing from an end user POV.

#48Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: talk to ben (#47)
Re: archive modules

On 2022-Jul-07, talk to ben wrote:

The modified archive module parameters are visible in pg_file_settings.
They don't show up in \dconfig+, which I understand given the query used by
the meta command, but I find a little confusing from an end user POV.

Well, this does sound unsatisfactory. I suppose one answer would be to
load the module in all backends, in case the user wants to look at the
value. But that would be wasteful. Maybe we should have a warning
about it in the docs -- tell people to LOAD the library if they want to
examine the configuration?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#48)
Re: archive modules

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Well, this does sound unsatisfactory. I suppose one answer would be to
load the module in all backends, in case the user wants to look at the
value. But that would be wasteful. Maybe we should have a warning
about it in the docs -- tell people to LOAD the library if they want to
examine the configuration?

The underlying issue here is that the pg_settings view doesn't show
placeholder GUCs because we lack satisfactory values to put in
most of the columns. I don't know if revisiting that conclusion
would be appropriate or not. The purist approach would be to show
NULL for any unknown column, but how many applications would that
break? And even the "known" values could change unexpectedly when
the module does get loaded, for example if the GUC has units and
the value in the config file is expressed in a non-canonical way.
(To say nothing of what a show hook might do...)

regards, tom lane
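
To make the concern about units concrete, here is a small Python sketch. It is a simplified assumption of how a size value might be parsed and redisplayed, not PostgreSQL's actual unit handling, and the parameter name my_module.buffer_size is hypothetical. It shows a placeholder row whose unknown columns are left empty, and how the displayed value can differ once the defining module is loaded and the raw string is interpreted.

```python
# Toy model: the parsing rules and names here are simplified assumptions,
# not PostgreSQL's actual unit handling.

UNITS_IN_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def parse_size_kb(text):
    """Convert a string such as '1024kB' or '1MB' to an integer number of kB."""
    for suffix, factor in UNITS_IN_KB.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # bare number: treated as already being in kB


def canonical(kb):
    """Render a kB count using the largest unit that divides it evenly."""
    for suffix in ("GB", "MB"):
        factor = UNITS_IN_KB[suffix]
        if kb != 0 and kb % factor == 0:
            return f"{kb // factor}{suffix}"
    return f"{kb}kB"


raw = "1024kB"  # what the configuration file says

# Before the module is loaded only the raw string is known, so a
# pg_settings-like row would have to leave the other columns unknown.
placeholder_row = {
    "name": "my_module.buffer_size",  # hypothetical parameter name
    "setting": raw,
    "unit": None,
    "vartype": None,
    "min_val": None,
}

# After the module is loaded the value is parsed and shown canonically.
loaded_value = canonical(parse_size_kb(raw))

print(placeholder_row["setting"])  # 1024kB
print(loaded_value)                # 1MB
```

The same setting thus reads differently before and after the module is loaded, which is the kind of unexpected change the message warns about.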

#50Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#49)
Re: archive modules

On Thu, Jul 07, 2022 at 11:48:20AM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Well, this does sound unsatisfactory. I suppose one answer would be to
load the module in all backends, in case the user wants to look at the
value. But that would be wasteful. Maybe we should have a warning
about it in the docs -- tell people to LOAD the library if they want to
examine the configuration?

Yeah, for something like basic_archive, it should be fine to load it via
shared_preload_libraries or LOAD as well as archive_library, but not all
archive libraries might be written to handle that correctly. And this is
not the most user-friendly.

The underlying issue here is that the pg_settings view doesn't show
placeholder GUCs because we lack satisfactory values to put in
most of the columns. I don't know if revisiting that conclusion
would be appropriate or not. The purist approach would be to show
NULL for any unknown column, but how many applications would that
break? And even the "known" values could change unexpectedly when
the module does get loaded, for example if the GUC has units and
the value in the config file is expressed in a non-canonical way.
(To say nothing of what a show hook might do...)

Perhaps the "category" could indicate that this is a placeholder value, and
the pg_settings documentation could explain exactly what that means (i.e.,
unknown to any libraries loaded in the current process, but may have
meaning to others).

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#51talk to ben
blo.talkto@gmail.com
In reply to: Nathan Bossart (#50)
1 attachment(s)
Re: archive modules

Would this addition to the documentation be OK? I hope the English is not
too broken.

Attachments:

0001-basic_archive-parameter-visibility-doc-patch.patchtext/x-patch; charset=US-ASCII; name=0001-basic_archive-parameter-visibility-doc-patch.patchDownload
From 8ea8c21413eeac8fbd37527e64820cbdca3a5d7a Mon Sep 17 00:00:00 2001
From: benoit <benoit.lobreau@dalibo.com>
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the basic_archive module and system views
documentation.
---
 doc/src/sgml/basic-archive.sgml | 13 +++++++++++++
 doc/src/sgml/system-views.sgml  |  5 ++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/basic-archive.sgml b/doc/src/sgml/basic-archive.sgml
index 0b650f17a8..65e70b795b 100644
--- a/doc/src/sgml/basic-archive.sgml
+++ b/doc/src/sgml/basic-archive.sgml
@@ -68,6 +68,19 @@ basic_archive.archive_directory = '/path/to/archive/directory'
    to any archiving still in progress, but users should use extra caution when
    doing so.
   </para>
+
+  <para>
+   The archive module is loaded by the archiver process. Therefore, the
+   parameters defined in the module are not set outside this process and cannot
+   be seen from the <structname>pg_settings</structname> view or the
+   \dconfig meta-command.
+   These parameters values can be shown from the server's configuration
+   file(s) through the <structname>pg_file_settings</structname> view.
+   If you want to check the actual values applied by the archiver, you can
+   <command>LOAD</command> the module before reading
+   <structname>pg_settings</structname>. It's also possible to search
+   for  the options directly with the <command>SHOW</command> command.
+  </para>
  </sect2>
 
  <sect2>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9728039e71..c8f0f3843c 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,10 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    This view does not display <link linkend="runtime-config-custom">customized options</link>
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded. For instance,
+   since an archive module is loaded by the archiver process, its customized
+   options are not visible from other sessions, unless they load the module
+   themselves.
   </para>
 
   <para>
-- 
2.37.1

#52Nathan Bossart
nathandbossart@gmail.com
In reply to: talk to ben (#51)
Re: archive modules

On Tue, Aug 23, 2022 at 04:18:52PM +0200, talk to ben wrote:

--- a/doc/src/sgml/basic-archive.sgml
+++ b/doc/src/sgml/basic-archive.sgml
@@ -68,6 +68,19 @@ basic_archive.archive_directory = '/path/to/archive/directory'
to any archiving still in progress, but users should use extra caution when
doing so.
</para>
+
+  <para>
+   The archive module is loaded by the archiver process. Therefore, the
+   parameters defined in the module are not set outside this process and cannot
+   be seen from the <structname>pg_settings</structname> view or the
+   \dconfig meta-command.
+   These parameters values can be shown from the server's configuration
+   file(s) through the <structname>pg_file_settings</structname> view.
+   If you want to check the actual values applied by the archiver, you can
+   <command>LOAD</command> the module before reading
+   <structname>pg_settings</structname>. It's also possible to search
+   for  the options directly with the <command>SHOW</command> command.
+  </para>

I don't know if it makes sense to document this in basic_archive. On one
hand, it seems like folks will commonly encounter this behavior with this
module, so this feels like a natural place for such a note. But on the
other hand, this is generic behavior for any library that is dynamically
loaded in a separate process.

Overall, I think I'm +1 for this patch. I haven't thought too much about
the exact wording, but provided others support it as well, I will try to
take a deeper look soon.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#52)
Re: archive modules

Nathan Bossart <nathandbossart@gmail.com> writes:

I don't know if it makes sense to document this in basic_archive. On one
hand, it seems like folks will commonly encounter this behavior with this
module, so this feels like a natural place for such a note. But on the
other hand, this is generic behavior for any library that is dynamically
loaded in a separate process.

Yeah, I don't think this material is at all specific to basic_archive.
Maybe it could be documented near the relevant views, if it isn't already.

Also, I think the proposed text neglects the case of including the
module in shared_preload_libraries.

regards, tom lane

#54talk to ben
blo.talkto@gmail.com
In reply to: Tom Lane (#53)
1 attachment(s)
Re: archive modules

Nathan Bossart <nathandbossart@gmail.com> writes:

On one hand, it seems like folks will commonly encounter this behavior with this
module, so this feels like a natural place for such a note.

Yes, I looked there first.

Would this addition to the pg_settings description be better ?

Attachments:

0001-basic_archive-parameter-visibility-doc-patch.patchtext/x-patch; charset=US-ASCII; name=0001-basic_archive-parameter-visibility-doc-patch.patchDownload
From 5346a8a0451e222e6592baacb994e6a0f884898d Mon Sep 17 00:00:00 2001
From: benoit <benoit.lobreau@dalibo.com>
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the pg_settings system view documentation.
---
 doc/src/sgml/system-views.sgml | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9728039e71..929838dfa8 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,11 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    This view does not display <link linkend="runtime-config-custom">customized options</link>
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded. Therefore, any
+   option defined in a library that is dynamically loaded in a separate process
+   will not be visible in the view, unless the module is manually loaded
+   beforehand. This case applies for example to an archive module loaded by the
+   archiver process.
   </para>
 
   <para>
-- 
2.37.1

#55Nathan Bossart
nathandbossart@gmail.com
In reply to: talk to ben (#54)
Re: archive modules

On Wed, Aug 24, 2022 at 10:05:55AM +0200, talk to ben wrote:

This view does not display <link linkend="runtime-config-custom">customized options</link>
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded. Therefore, any
+   option defined in a library that is dynamically loaded in a separate process
+   will not be visible in the view, unless the module is manually loaded
+   beforehand. This case applies for example to an archive module loaded by the
+   archiver process.

I would suggest something like:

This view does not display customized options until the extension
module that defines them has been loaded by the backend process
executing the query (e.g., via shared_preload_libraries, the LOAD
command, or a call to a user-defined C function). For example, since
the archive_library is only loaded by the archiver process, this view
will not display any customized options defined by archive modules
unless special action is taken to load them into the backend process
executing the query.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#56talk to ben
blo.talkto@gmail.com
In reply to: Nathan Bossart (#55)
1 attachment(s)
Re: archive modules

Here is a patch with the proposed wording.

Attachments:

0001-basic_archive-parameter-visibility-doc-patch.patchtext/x-patch; charset=US-ASCII; name=0001-basic_archive-parameter-visibility-doc-patch.patchDownload
From 7fce0073f8a53b3e9ba84fa10fbc7b8efef36e97 Mon Sep 17 00:00:00 2001
From: benoit <benoit.lobreau@dalibo.com>
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the pg_settings system view documentation.
---
 doc/src/sgml/system-views.sgml | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9728039e71..92aad11c71 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    This view does not display <link linkend="runtime-config-custom">customized options</link>
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded by the backend
+   process executing the query (e.g., via shared_preload_libraries, the
+   <command>LOAD</command> command, or a call to a user-defined C function).
+   For example, since the archive_library is only loaded by the archiver
+   process, this view will not display any customized options defined by
+   archive modules unless special action is taken to load them into the backend
+   process executing the query.
   </para>
 
   <para>
-- 
2.37.1

#57Nathan Bossart
nathandbossart@gmail.com
In reply to: talk to ben (#56)
1 attachment(s)
Re: archive modules

On Thu, Aug 25, 2022 at 03:29:41PM +0200, talk to ben wrote:

Here is a patch with the proposed wording.

Here is the same patch with a couple more links.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-basic_archive-parameter-visibility-doc-patch.patchtext/x-diff; charset=us-asciiDownload
From 92a6d8669d9e5b527a7ac9af7eb359a86526775b Mon Sep 17 00:00:00 2001
From: benoit <benoit.lobreau@dalibo.com>
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH v4 1/1] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the pg_settings system view documentation.
---
 doc/src/sgml/system-views.sgml | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 44aa70a031..18ac4620f0 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    This view does not display <link linkend="runtime-config-custom">customized options</link>
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded by the backend
+   process executing the query (e.g., via
+   <xref linkend="guc-shared-preload-libraries"/>, the
+   <link linkend="sql-load"><command>LOAD</command></link> command, or a call
+   to a <link linkend="xfunc-c">user-defined C function</link>).  For example,
+   since the <xref linkend="guc-archive-library"/> is only loaded by the
+   archiver process, this view will not display any customized options defined
+   by <link linkend="archive-modules">archive modules</link> unless special
+   action is taken to load them into the backend process executing the query.
   </para>
 
   <para>
-- 
2.25.1

#58Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#57)
Re: archive modules

On Thu, Aug 25, 2022 at 01:06:00PM -0700, Nathan Bossart wrote:

On Thu, Aug 25, 2022 at 03:29:41PM +0200, talk to ben wrote:

Here is a patch with the proposed wording.

Here is the same patch with a couple more links.

I would advise that you create a commitfest entry for your patch so that it
isn't forgotten.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#59Benoit Lobréau
benoit.lobreau@gmail.com
In reply to: Nathan Bossart (#58)
Re: archive modules

On Tue, Aug 30, 2022 at 12:46 AM Nathan Bossart <nathandbossart@gmail.com>
wrote:

I would advise that you create a commitfest entry for your patch so that it
isn't forgotten.

Ok done, https://commitfest.postgresql.org/39/3856/ (is that fine if I
added you as a reviewer ?)

and thanks for the added links in the patch.

#60Nathan Bossart
nathandbossart@gmail.com
In reply to: Benoit Lobréau (#59)
Re: archive modules

On Tue, Aug 30, 2022 at 09:49:20AM +0200, Benoit Lobréau wrote:

Ok done, https://commitfest.postgresql.org/39/3856/ (is that fine if I
added you as a reviewer ?)

Of course. I've marked it as ready-for-committer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#60)
Re: archive modules

Nathan Bossart <nathandbossart@gmail.com> writes:

Of course. I've marked it as ready-for-committer.

Pushed with a bit of additional wordsmithing.

regards, tom lane

#62Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#61)
Re: archive modules

On Sat, Sep 10, 2022 at 04:44:16PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

Of course. I've marked it as ready-for-committer.

Pushed with a bit of additional wordsmithing.

Thanks!

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#63Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#43)
Re: archive modules

I noticed that this patch has gone around and mostly purged mentions of
archive_command from the documentation and replaced them with
archive_library. I don't think this is helpful, since people still use
archive_command and will want to see what the documentation has to say
about it. I suggest we rewind that a bit and for example replace things
like

     removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
     fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>

with

removed or recycled until they are archived. If WAL archiving cannot keep up
with the pace that WAL is generated, or if <varname>archive_command</varname>
or <varname>archive_library</varname>
fail repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>

#64Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#63)
Re: archive modules

On Wed, Sep 14, 2022 at 06:37:38AM +0200, Peter Eisentraut wrote:

I noticed that this patch has gone around and mostly purged mentions of
archive_command from the documentation and replaced them with
archive_library. I don't think this is helpful, since people still use
archive_command and will want to see what the documentation has to say
about it. I suggest we rewind that a bit and for example replace things
like

removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>

with

removed or recycled until they are archived. If WAL archiving cannot keep up
with the pace that WAL is generated, or if <varname>archive_command</varname>
or <varname>archive_library</varname>
fail repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>

Yep. Some references to archive_library have been changed by 31e121
to do exactly that. There seem to be more spots in need of an
update.
--
Michael

#65Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Michael Paquier (#64)
1 attachment(s)
Re: archive modules

On 14.09.22 07:25, Michael Paquier wrote:

removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>

with

removed or recycled until they are archived. If WAL archiving cannot keep up
with the pace that WAL is generated, or if <varname>archive_command</varname>
or <varname>archive_library</varname>
fail repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>

Yep. Some references to archive_library have been changed by 31e121
to do exactly that. There seem to be more spots in need of an
update.

I don't see anything in 31e121 about that.

Here is a patch that addresses this.

Attachments:

0001-Restore-archive_command-documentation.patchtext/plain; charset=UTF-8; name=0001-Restore-archive_command-documentation.patchDownload
From 51512cd9cb59d169b041d10d62fc6a282011675c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 14 Sep 2022 21:26:10 +0200
Subject: [PATCH] Restore archive_command documentation

Commit 5ef1eefd76f404ddc59b885d50340e602b70f05f, which added
archive_library, purged most mentions of archive_command from the
documentation.  This is inappropriate, since archive_command is still
a feature in use and users will want to see information about it.

This restores all the removed mentions and rephrases things so that
archive_command and archive_library are presented as alternatives of
each other.
---
 doc/src/sgml/backup.sgml            | 50 +++++++++++++++++--------
 doc/src/sgml/config.sgml            | 58 +++++++++++++++--------------
 doc/src/sgml/high-availability.sgml |  6 +--
 doc/src/sgml/ref/pg_receivewal.sgml |  7 +++-
 doc/src/sgml/wal.sgml               |  3 +-
 5 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index dee59bb422..a6d7105836 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,7 +593,7 @@ <title>Setting Up WAL Archiving</title>
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify an archive library to be executed to copy a
+    the administrator specify a shell command or an archive library to be executed to copy a
     completed segment file to wherever it needs to go.  This could be as simple
     as a shell command that uses <literal>cp</literal>, or it could invoke a
     complex C function &mdash; it's all up to you.
@@ -603,13 +603,15 @@ <title>Setting Up WAL Archiving</title>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the library to use in the <xref
+    specify the shell command to use in the <xref
+    linkend="guc-archive-command"/> configuration parameter
+    or specify the library to use in the <xref
     linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
-    One simple way to archive is to set <varname>archive_library</varname> to
-    an empty string and to specify a shell command in
-    <xref linkend="guc-archive-command"/>.
+   </para>
+
+   <para>
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -633,6 +635,24 @@ <title>Setting Up WAL Archiving</title>
     A similar command will be generated for each new file to be archived.
    </para>
 
+   <para>
+    It is important that the archive command return zero exit status if and
+    only if it succeeds.  Upon getting a zero result,
+    <productname>PostgreSQL</productname> will assume that the file has been
+    successfully archived, and will remove or recycle it.  However, a nonzero
+    status tells <productname>PostgreSQL</productname> that the file was not archived;
+    it will try again periodically until it succeeds.
+   </para>
+
+   <para>
+    When the archive command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server
+    shutdown) or an error by the shell with an exit status greater than
+    125 (such as command not found), the archiver process aborts and gets
+    restarted by the postmaster. In such cases, the failure is
+    not reported in <xref linkend="pg-stat-archiver-view"/>.
+   </para>
+
    <para>
     Another way to archive is to use a custom archive module as the
     <varname>archive_library</varname>.  Since such modules are written in
@@ -678,7 +698,7 @@ <title>Setting Up WAL Archiving</title>
    </para>
 
    <para>
-    The archive library should generally be designed to refuse to overwrite
+    Archive commands and libraries should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -686,9 +706,9 @@ <title>Setting Up WAL Archiving</title>
    </para>
 
    <para>
-    It is advisable to test your proposed archive library to ensure that it
+    It is advisable to test your proposed archive command or library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    <literal>false</literal> in this case</emphasis>.
+    nonzero status or <literal>false</literal>, respectively, in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -700,7 +720,7 @@ <title>Setting Up WAL Archiving</title>
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive library fails repeatedly because some aspect requires
+    the archive command or library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -715,7 +735,7 @@ <title>Setting Up WAL Archiving</title>
    </para>
 
    <para>
-    The speed of the archive library is unimportant as long as it can keep up
+    The speed of the archive command or library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -727,7 +747,7 @@ <title>Setting Up WAL Archiving</title>
    </para>
 
    <para>
-    In writing your archive library, you should assume that the file names to
+    In writing your archive command or library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
     preserve the original relative path but it is necessary to preserve the file
@@ -748,7 +768,7 @@ <title>Setting Up WAL Archiving</title>
    </para>
 
    <para>
-    The archive function is only invoked on completed WAL segments.  Hence,
+    The archive command or function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -777,7 +797,7 @@ <title>Setting Up WAL Archiving</title>
     turned on during execution of one of these statements, WAL would not
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
-    server start.  However, <varname>archive_library</varname> can be changed with a
+    server start.  However, <varname>archive_command</varname> and <varname>archive_library</varname> can be changed with a
     configuration file reload.  If you are archiving via shell and wish to
     temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
@@ -947,12 +967,12 @@ <title>Making a Base Backup Using the Low Level API</title>
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_backup_stop</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_library</varname> or
+     already configured <varname>archive_command</varname> or <varname>archive_library</varname> or
      <varname>archive_command</varname>.
      In most cases this happens quickly, but you are advised to monitor your
      archive system to ensure there are no delays.
      If the archive process has fallen behind because of failures of the
-     archive library or archive command, it will keep retrying
+     archive command or library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_backup_stop</function>, set an appropriate
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5aac7110b1..09c7a6116a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3496,7 +3496,7 @@ <title>Checkpoints</title>
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_library</varname>, or a high
+        heavy load, a failing <varname>archive_command</varname> or <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3545,6 +3545,7 @@ <title>Archiving</title>
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
+        <xref linkend="guc-archive-command"/> or
         <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
@@ -3555,6 +3556,9 @@ <title>Archiving</title>
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
+        <varname>archive_mode</varname> and <varname>archive_command</varname> are
+        separate variables so that <varname>archive_command</varname> can be
+        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3562,28 +3566,6 @@ <title>Archiving</title>
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-archive-library" xreflabel="archive_library">
-      <term><varname>archive_library</varname> (<type>string</type>)
-      <indexterm>
-       <primary><varname>archive_library</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        The library to use for archiving completed WAL file segments.  If set to
-        an empty string (the default), archiving via shell is enabled, and
-        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
-        shared library is used for archiving.  For more information, see
-        <xref linkend="backup-archiving-wal"/> and
-        <xref linkend="archive-modules"/>.
-       </para>
-       <para>
-        This parameter can only be set in the
-        <filename>postgresql.conf</filename> file or on the server command line.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3607,10 +3589,10 @@ <title>Archiving</title>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
         <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> specifies to archive via shell command.
+        <varname>archive_library</varname> is set to an empty string.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
-        specifies archiving via shell, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
+        is set to an empty string), WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3622,6 +3604,28 @@ <title>Archiving</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        an empty string (the default), archiving via shell is enabled, and
+        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        shared library is used for archiving.  For more information, see
+        <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-timeout" xreflabel="archive_timeout">
       <term><varname>archive_timeout</varname> (<type>integer</type>)
       <indexterm>
@@ -3630,7 +3634,7 @@ <title>Archiving</title>
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-library"/> is only invoked for
+        The <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 3df4cda716..b2b3129397 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ <title>Replication Slots</title>
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-library"/>.
+    <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ <title>Continuous Archiving in Standby</title>
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_library</varname> must
+     be handled similarly, but the <varname>archive_command</varname> or <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_library</varname>, as it must
+     <varname>archive_command</varname> or <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 4fe9e1a874..7138a8e3f6 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,8 @@ <title>Description</title>
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-library"/> does.
+   for segments to complete like <xref linkend="guc-archive-command"/> and
+   <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -486,11 +487,13 @@ <title>Notes</title>
 
   <para>
    When using <application>pg_receivewal</application> instead of
+   <xref linkend="guc-archive-command"/> or
    <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-library"/> or the replication slots, about
+   from <xref linkend="guc-archive-command"/> or
+   <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4b6ef283c1..27fb020a06 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,8 @@ <title><acronym>WAL</acronym> Configuration</title>
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_library</varname>
+   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   or <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
-- 
2.37.3

#66Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#62)
Re: archive modules

Another question on this feature: Currently, if archive_library is set,
archive_command is ignored. I think if both are set, it should be an
error. Compare for example what happens if you set multiple
recovery_target_xxx settings. I don't think silently turning off one
setting by setting another is a good behavior.

#67Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#66)
Re: archive modules

On Wed, Sep 14, 2022 at 09:33:46PM +0200, Peter Eisentraut wrote:

Another question on this feature: Currently, if archive_library is set,
archive_command is ignored. I think if both are set, it should be an error.
Compare for example what happens if you set multiple recovery_target_xxx
settings. I don't think silently turning off one setting by setting another
is a good behavior.

I originally did it this way, but changed it based on this feedback [0]. I
have no problem with the general idea, but the recovery_target_* logic does
have the following note:

* XXX this code is broken by design. Throwing an error from a GUC assign
* hook breaks fundamental assumptions of guc.c. So long as all the variables
* for which this can happen are PGC_POSTMASTER, the consequences are limited,
* since we'd just abort postmaster startup anyway. Nonetheless it's likely
* that we have odd behaviors such as unexpected GUC ordering dependencies.

[0]: /messages/by-id/CA+Tgmoaf4Y7_U+_W+Sg5DoAta_FMssr=52mx7-_tJnfaD1VubQ@mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#68Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#67)
Re: archive modules

On 14.09.22 22:03, Nathan Bossart wrote:

On Wed, Sep 14, 2022 at 09:33:46PM +0200, Peter Eisentraut wrote:

Another question on this feature: Currently, if archive_library is set,
archive_command is ignored. I think if both are set, it should be an error.
Compare for example what happens if you set multiple recovery_target_xxx
settings. I don't think silently turning off one setting by setting another
is a good behavior.

I originally did it this way, but changed it based on this feedback [0]. I
have no problem with the general idea, but the recovery_target_* logic does
have the following note:

* XXX this code is broken by design. Throwing an error from a GUC assign
* hook breaks fundamental assumptions of guc.c. So long as all the variables
* for which this can happen are PGC_POSTMASTER, the consequences are limited,
* since we'd just abort postmaster startup anyway. Nonetheless it's likely
* that we have odd behaviors such as unexpected GUC ordering dependencies.

Ah yes, that won't work. But maybe we can just check it at run time,
like in LoadArchiveLibrary().

#69Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#68)
Re: archive modules

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 14.09.22 22:03, Nathan Bossart wrote:

I originally did it this way, but changed it based on this feedback [0]. I
have no problem with the general idea, but the recovery_target_* logic does
have the following note:

* XXX this code is broken by design. Throwing an error from a GUC assign
* hook breaks fundamental assumptions of guc.c. So long as all the variables
* for which this can happen are PGC_POSTMASTER, the consequences are limited,
* since we'd just abort postmaster startup anyway. Nonetheless it's likely
* that we have odd behaviors such as unexpected GUC ordering dependencies.

Ah yes, that won't work. But maybe we can just check it at run time,
like in LoadArchiveLibrary().

Yeah, the objection there is only to trying to enforce such
interrelationships in GUC hooks. In this case it seems to me that
we could easily check and complain at the point where we're about
to use the GUC values.

regards, tom lane

#70Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#65)
Re: archive modules

On Wed, Sep 14, 2022 at 09:31:04PM +0200, Peter Eisentraut wrote:

Here is a patch that addresses this.

My intent was to present archive_command as the built-in archive library,
but I can see how this might cause confusion, so this change seems
reasonable to me.

+   <para>
+    It is important that the archive command return zero exit status if and
+    only if it succeeds.  Upon getting a zero result,
+    <productname>PostgreSQL</productname> will assume that the file has been
+    successfully archived, and will remove or recycle it.  However, a nonzero
+    status tells <productname>PostgreSQL</productname> that the file was not archived;
+    it will try again periodically until it succeeds.
+   </para>
+
+   <para>
+    When the archive command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server
+    shutdown) or an error by the shell with an exit status greater than
+    125 (such as command not found), the archiver process aborts and gets
+    restarted by the postmaster. In such cases, the failure is
+    not reported in <xref linkend="pg-stat-archiver-view"/>.
+   </para>

This wording is very similar to the existing wording in the archive library
section below it. I think the second paragraph covers the shell command case
explicitly, too. Perhaps these should be combined.

+        <varname>archive_mode</varname> and <varname>archive_command</varname> are
+        separate variables so that <varname>archive_command</varname> can be
+        changed without leaving archiving mode.

I believe this applies to archive_library, too.

-   for segments to complete like <xref linkend="guc-archive-library"/> does.
+   for segments to complete like <xref linkend="guc-archive-command"/> and
+   <xref linkend="guc-archive-library"/> does.

nitpick: s/does/do

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
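
For illustration, the exit-status rules in the documentation paragraphs quoted above can be modelled in a few lines of C. This is only a sketch and is not taken from pgarch.c: the enum and the function name classify_archive_status are hypothetical, and treating SIGTERM and unrecognized statuses as ordinary retryable failures is an assumption based on the wording "other than SIGTERM".

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical outcome categories; these names do not exist in PostgreSQL. */
typedef enum
{
	ARCHIVE_OK,					/* exit status 0: segment may be removed or recycled */
	ARCHIVE_RETRY,				/* ordinary failure: try the same file again later */
	ARCHIVE_ABORT				/* signal or shell error: the archiver itself exits */
} ArchiveOutcome;

/*
 * Classify a wait status as returned by system() or waitpid(), following
 * the rules described in the documentation text.
 */
ArchiveOutcome
classify_archive_status(int wait_status)
{
	if (WIFEXITED(wait_status))
	{
		int			code = WEXITSTATUS(wait_status);

		assert(code >= 0 && code <= 255);

		if (code == 0)
			return ARCHIVE_OK;
		/* Exit statuses above 125 come from the shell (e.g. command not found). */
		if (code > 125)
			return ARCHIVE_ABORT;
		return ARCHIVE_RETRY;
	}

	if (WIFSIGNALED(wait_status))
	{
		/* Assumption: SIGTERM (server shutdown) counts as an ordinary failure. */
		if (WTERMSIG(wait_status) == SIGTERM)
			return ARCHIVE_RETRY;
		return ARCHIVE_ABORT;
	}

	/* Assumption: anything else is treated as an ordinary failure. */
	return ARCHIVE_RETRY;
}
```

With this model, a command that exits with status 1 is retried, while one that exits with 127 or is killed by a signal other than SIGTERM takes down the archiver, which is the case that does not show up in pg_stat_archiver.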

#71Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#69)
Re: archive modules

On Wed, Sep 14, 2022 at 04:47:23PM -0400, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 14.09.22 22:03, Nathan Bossart wrote:

I originally did it this way, but changed it based on this feedback [0]. I
have no problem with the general idea, but the recovery_target_* logic does
have the following note:

* XXX this code is broken by design. Throwing an error from a GUC assign
* hook breaks fundamental assumptions of guc.c. So long as all the variables
* for which this can happen are PGC_POSTMASTER, the consequences are limited,
* since we'd just abort postmaster startup anyway. Nonetheless it's likely
* that we have odd behaviors such as unexpected GUC ordering dependencies.

Ah yes, that won't work. But maybe we can just check it at run time,
like in LoadArchiveLibrary().

Yeah, the objection there is only to trying to enforce such
interrelationships in GUC hooks. In this case it seems to me that
we could easily check and complain at the point where we're about
to use the GUC values.

I think the cleanest way to do something like that would be to load a
check_configured_cb that produces a WARNING. IIRC failing in
LoadArchiveLibrary() would just cause the archiver process to restart over
and over. HandlePgArchInterrupts() might need some work as well.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#71)
Re: archive modules

Nathan Bossart <nathandbossart@gmail.com> writes:

On Wed, Sep 14, 2022 at 04:47:23PM -0400, Tom Lane wrote:

Yeah, the objection there is only to trying to enforce such
interrelationships in GUC hooks. In this case it seems to me that
we could easily check and complain at the point where we're about
to use the GUC values.

I think the cleanest way to do something like that would be to load a
check_configured_cb that produces a WARNING. IIRC failing in
LoadArchiveLibrary() would just cause the archiver process to restart over
and over. HandlePgArchInterrupts() might need some work as well.

Hm. Maybe consistency-check these settings in the postmaster, sometime
after we've absorbed all GUC settings but before we launch any children?
That could provide a saner implementation for the recovery_target_*
variables too.

regards, tom lane

#73Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#72)
1 attachment(s)
Re: archive modules

On Wed, Sep 14, 2022 at 06:12:09PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

On Wed, Sep 14, 2022 at 04:47:23PM -0400, Tom Lane wrote:

Yeah, the objection there is only to trying to enforce such
interrelationships in GUC hooks. In this case it seems to me that
we could easily check and complain at the point where we're about
to use the GUC values.

I think the cleanest way to do something like that would be to load a
check_configured_cb that produces a WARNING. IIRC failing in
LoadArchiveLibrary() would just cause the archiver process to restart over
and over. HandlePgArchInterrupts() might need some work as well.

Hm. Maybe consistency-check these settings in the postmaster, sometime
after we've absorbed all GUC settings but before we launch any children?
That could provide a saner implementation for the recovery_target_*
variables too.

Both archive_command and archive_library are PGC_SIGHUP, so IIUC that
wouldn't be sufficient. I attached a quick sketch that seems to provide
the desired behavior. It's nowhere near committable yet, but it
demonstrates what I'm thinking.

For recovery_target_*, something like you are describing seems reasonable.
I believe PostmasterMain() already performs some similar checks.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

handle_archive_misconfiguration.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6ce361707d..1d0c6029a5 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -422,8 +422,15 @@ pgarch_ArchiverCopyLoop(void)
 			HandlePgArchInterrupts();
 
 			/* can't do anything if not configured ... */
-			if (ArchiveContext.check_configured_cb != NULL &&
-				!ArchiveContext.check_configured_cb())
+			if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+			{
+				ereport(WARNING,
+						(errmsg("archive_mode enabled, but archiving is misconfigured"),
+						 errdetail("Only one of archive_command, archive_library may be set.")));
+				return;
+			}
+			else if (ArchiveContext.check_configured_cb != NULL &&
+					 !ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
 						(errmsg("archive_mode enabled, yet archiving is not configured")));
@@ -794,6 +801,9 @@ HandlePgArchInterrupts(void)
 	{
 		char	   *archiveLib = pstrdup(XLogArchiveLibrary);
 		bool		archiveLibChanged;
+		bool		misconfiguredBeforeReload = (XLogArchiveCommand[0] != '\0' &&
+												 XLogArchiveLibrary[0] != '\0');
+		bool		misconfiguredAfterReload;
 
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
@@ -801,7 +811,11 @@ HandlePgArchInterrupts(void)
 		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
 		pfree(archiveLib);
 
-		if (archiveLibChanged)
+		misconfiguredAfterReload = (XLogArchiveCommand[0] != '\0' &&
+									XLogArchiveLibrary[0] != '\0');
+
+		if ((archiveLibChanged && !misconfiguredAfterReload) ||
+			misconfiguredBeforeReload != misconfiguredAfterReload)
 		{
 			/*
 			 * Call the currently loaded archive module's shutdown callback,
@@ -816,10 +830,17 @@ HandlePgArchInterrupts(void)
 			 * internal_load_library()).  To deal with this, we simply restart
 			 * the archiver.  The new archive module will be loaded when the
 			 * new archiver process starts up.
+			 *
+			 * Similarly, we restart the archiver if our misconfiguration status
+			 * changes.  If the parameters were misconfigured but are no longer,
+			 * we must restart to load the correct callbacks.  If the parameters
+			 * weren't misconfigured but now are, we must restart to unload the
+			 * current callbacks.
 			 */
 			ereport(LOG,
 					(errmsg("restarting archiver process because value of "
-							"\"archive_library\" was changed")));
+							"\"archive_library\" or \"archive_command\" was "
+							"changed")));
 
 			proc_exit(0);
 		}
@@ -838,6 +859,14 @@ LoadArchiveLibrary(void)
 
 	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
 
+	/*
+	 * If both a shell command and an archive library are specified, it is not
+	 * clear what we should do, so do nothing.  The archiver will emit WARNINGs
+	 * about the misconfiguration.
+	 */
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		return;
+
 	/*
 	 * If shell archiving is enabled, use our special initialization function.
 	 * Otherwise, load the library and call its _PG_archive_module_init().
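
The restart condition in the HandlePgArchInterrupts() hunk above is easier to check when written as a standalone predicate. The following is a minimal sketch of that logic only: the helper names archive_misconfigured and archiver_needs_restart are hypothetical, and plain strings stand in for the XLogArchiveLibrary and XLogArchiveCommand values before and after a reload.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when both settings are non-empty, which the sketch treats as misconfigured. */
bool
archive_misconfigured(const char *library, const char *command)
{
	assert(library != NULL && command != NULL);
	return library[0] != '\0' && command[0] != '\0';
}

/*
 * Hypothetical helper mirroring the condition in the patch: restart if the
 * library changed and the new settings are usable, or if the
 * misconfiguration status flipped in either direction.
 */
bool
archiver_needs_restart(const char *old_lib, const char *old_cmd,
					   const char *new_lib, const char *new_cmd)
{
	bool		lib_changed = strcmp(old_lib, new_lib) != 0;
	bool		before = archive_misconfigured(old_lib, old_cmd);
	bool		after = archive_misconfigured(new_lib, new_cmd);

	return (lib_changed && !after) || before != after;
}
```

Under this reading, changing only archive_command while archive_library stays empty does not restart the archiver, and neither does swapping one library for another while both parameters remain set, since the settings are still misconfigured after the reload.
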
#74Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#70)
Re: archive modules

On 14.09.22 23:09, Nathan Bossart wrote:

On Wed, Sep 14, 2022 at 09:31:04PM +0200, Peter Eisentraut wrote:

Here is a patch that addresses this.

My intent was to present archive_command as the built-in archive library,
but I can see how this might cause confusion, so this change seems
reasonable to me.

While working on this, I noticed that in master this conflicts with
commit 3cabe45a819f8a2a282d9d57e45f259c84e97c3f. I have posted a
message in that thread looking for a resolution.

#75Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Peter Eisentraut (#74)
Re: archive modules

On 17.09.22 11:49, Peter Eisentraut wrote:

On 14.09.22 23:09, Nathan Bossart wrote:

On Wed, Sep 14, 2022 at 09:31:04PM +0200, Peter Eisentraut wrote:

Here is a patch that addresses this.

My intent was to present archive_command as the built-in archive library,
but I can see how this might cause confusion, so this change seems
reasonable to me.

While working on this, I noticed that in master this conflicts with
commit 3cabe45a819f8a2a282d9d57e45f259c84e97c3f.  I have posted a
message in that thread looking for a resolution.

I have received clarification there, so I went ahead with this patch
here after some adjustments in master around that other patch.

#76Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#66)
Re: archive modules

On Wed, Sep 14, 2022 at 09:33:46PM +0200, Peter Eisentraut wrote:

Another question on this feature: Currently, if archive_library is set,
archive_command is ignored. I think if both are set, it should be an error.
Compare for example what happens if you set multiple recovery_target_xxx
settings. I don't think silently turning off one setting by setting another
is a good behavior.

Peter, would you like to proceed with something like [0] to resolve this?
If so, I will work on cleaning the patch up.

[0]: /messages/by-id/20220914222736.GA3042279@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#77Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#73)
Re: archive modules

On 15.09.22 00:27, Nathan Bossart wrote:

Both archive_command and archive_library are PGC_SIGHUP, so IIUC that
wouldn't be sufficient. I attached a quick sketch that seems to provide
the desired behavior. It's nowhere near committable yet, but it
demonstrates what I'm thinking.

What is the effect of issuing a warning like in this patch? Would it
just not archive anything until the configuration is fixed? I'm not
sure what behavior you are going for; it's a bit hard to imagine from
just reading the patch.

#78Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#77)
Re: archive modules

On Fri, Sep 23, 2022 at 05:58:42AM -0400, Peter Eisentraut wrote:

On 15.09.22 00:27, Nathan Bossart wrote:

Both archive_command and archive_library are PGC_SIGHUP, so IIUC that
wouldn't be sufficient. I attached a quick sketch that seems to provide
the desired behavior. It's nowhere near committable yet, but it
demonstrates what I'm thinking.

What is the effect of issuing a warning like in this patch? Would it just
not archive anything until the configuration is fixed? I'm not sure what
behavior you are going for; it's a bit hard to imagine from just reading the
patch.

Yes, it will halt archiving and emit a WARNING, just like what happens on
released versions when you leave archive_command empty.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#79Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#78)
Re: archive modules

On 23.09.22 18:14, Nathan Bossart wrote:

On Fri, Sep 23, 2022 at 05:58:42AM -0400, Peter Eisentraut wrote:

On 15.09.22 00:27, Nathan Bossart wrote:

Both archive_command and archive_library are PGC_SIGHUP, so IIUC that
wouldn't be sufficient. I attached a quick sketch that seems to provide
the desired behavior. It's nowhere near committable yet, but it
demonstrates what I'm thinking.

What is the effect of issuing a warning like in this patch? Would it just
not archive anything until the configuration is fixed? I'm not sure what
behavior you are going for; it's a bit hard to imagine from just reading the
patch.

Yes, it will halt archiving and emit a WARNING, just like what happens on
released versions when you leave archive_command empty.

Leaving archive_command empty is an intentional configuration choice.

What we are talking about here is, arguably, a misconfiguration, so it
should result in an error.
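
The three cases being distinguished here can be sketched in standalone C, outside the server. This is only an illustration: the enum and function names are hypothetical and are not part of PostgreSQL. Only the "non-empty string means set" test follows what the thread describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification; these names are illustrative only. */
typedef enum ArchiveConfigState
{
	ARCHIVE_CONFIG_UNSET,		/* neither set: intentional, archiving waits */
	ARCHIVE_CONFIG_OK,			/* exactly one set: archiving can proceed */
	ARCHIVE_CONFIG_CONFLICT		/* both set: misconfiguration, report an error */
} ArchiveConfigState;

/* A setting counts as "set" when it is a non-empty string. */
static bool
setting_is_set(const char *value)
{
	return value != NULL && value[0] != '\0';
}

/* Map the pair of settings onto one of the three cases above. */
static ArchiveConfigState
classify_archive_config(const char *archive_command,
						const char *archive_library)
{
	bool		have_command = setting_is_set(archive_command);
	bool		have_library = setting_is_set(archive_library);

	if (have_command && have_library)
		return ARCHIVE_CONFIG_CONFLICT;
	if (have_command || have_library)
		return ARCHIVE_CONFIG_OK;
	return ARCHIVE_CONFIG_UNSET;
}
```

Under this model, an empty archive_command with an empty archive_library is a deliberate "not configured yet" state, while two non-empty values are the only combination treated as an error.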

#80Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#79)
1 attachment(s)
Re: archive modules

On Wed, Oct 05, 2022 at 07:55:58PM +0200, Peter Eisentraut wrote:

Leaving archive_command empty is an intentional configuration choice.

What we are talking about here is, arguably, a misconfiguration, so it
should result in an error.

Okay. What do you think about something like the attached?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

fail_arch.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d750290f13..bb4d985f35 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,11 @@ include_dir 'conf.d'
        </para>
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.  It is ignored unless
+        file or on the server command line.  It is only used if
         <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> is set to an empty string.
+        <varname>archive_library</varname> is set to an empty string.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.
         If <varname>archive_command</varname> is an empty string (the default) while
         <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
         is set to an empty string), WAL archiving is temporarily
@@ -3624,7 +3626,9 @@ include_dir 'conf.d'
        <para>
         The library to use for archiving completed WAL file segments.  If set to
         an empty string (the default), archiving via shell is enabled, and
-        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        <xref linkend="guc-archive-command"/> is used.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.  Otherwise, the specified
         shared library is used for archiving.  For more information, see
         <xref linkend="backup-archiving-wal"/> and
         <xref linkend="archive-modules"/>.
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..56dcc0dce5 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -838,6 +838,12 @@ LoadArchiveLibrary(void)
 
 	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
 
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("both archive_command and archive_library specified"),
+				 errdetail("Only one of archive_command, archive_library may be set.")));
+
 	/*
 	 * If shell archiving is enabled, use our special initialization function.
 	 * Otherwise, load the library and call its _PG_archive_module_init().
#81Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#80)
Re: archive modules

On 05.10.22 20:57, Nathan Bossart wrote:

On Wed, Oct 05, 2022 at 07:55:58PM +0200, Peter Eisentraut wrote:

Leaving archive_command empty is an intentional configuration choice.

What we are talking about here is, arguably, a misconfiguration, so it
should result in an error.

Okay. What do you think about something like the attached?

That looks like the right solution to me.

Let's put that into PG 16, and maybe we can consider backpatching it.

#82Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Peter Eisentraut (#81)
1 attachment(s)
Re: archive modules

On Mon, Oct 10, 2022 at 1:17 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 05.10.22 20:57, Nathan Bossart wrote:

What we are talking about here is, arguably, a misconfiguration, so it
should result in an error.

Okay. What do you think about something like the attached?

The intent here looks reasonable to me. However, why should the user
be able to set both archive_command and archive_library in the first
place only to later fail in LoadArchiveLibrary() per the patch? IMO,
the check_hook() is the right way to disallow any sorts of GUC
misconfigurations, no?

FWIW, I'm attaching a small patch that uses check_hook().

That looks like the right solution to me.

Let's put that into PG 16, and maybe we can consider backpatching it.

+1 to backpatch to PG 15 where the archive modules feature was introduced.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Handle-misconfigurations-of-archive_command-and-a.patch (application/x-patch)
From c5969071dfb9064bbf9e22513bf2cef57bcfd84f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 13 Oct 2022 09:37:40 +0000
Subject: [PATCH v1] Handle misconfigurations of archive_command and
 archive_library

The parameters archive_command and archive_library are mutually
exclusive. This patch errors out if the user is trying to set both
of them at a time.
---
 doc/src/sgml/config.sgml            | 11 +++++++----
 src/backend/access/transam/xlog.c   | 16 ++++++++++++++++
 src/backend/postmaster/pgarch.c     | 18 +++++++++++++++++-
 src/backend/utils/misc/guc_tables.c |  4 ++--
 src/include/utils/guc_hooks.h       |  4 ++++
 5 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..62e7d61da7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,10 @@ include_dir 'conf.d'
        </para>
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> is set to an empty string.
+        file or on the server command line. It is not allowed to set both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        at the same time, doing so will cause an error. This parameter is ignored
+        unless <varname>archive_mode</varname> was enabled at server start.
         If <varname>archive_command</varname> is an empty string (the default) while
         <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
         is set to an empty string), WAL archiving is temporarily
@@ -3625,7 +3626,9 @@ include_dir 'conf.d'
         The library to use for archiving completed WAL file segments.  If set to
         an empty string (the default), archiving via shell is enabled, and
         <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
-        shared library is used for archiving.  For more information, see
+        shared library is used for archiving. It is not allowed to set both
+        <varname>archive_library</varname> and <varname>archive_command</varname>
+        at the same time, doing so will cause an error. For more information, see
         <xref linkend="backup-archiving-wal"/> and
         <xref linkend="archive-modules"/>.
        </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 27085b15a8..64714c6940 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4462,6 +4462,22 @@ show_archive_command(void)
 		return "(disabled)";
 }
 
+/*
+ * GUC check_hook for archive_command
+ */
+bool
+check_archive_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && strcmp(*newval, "") != 0 && XLogArchiveLibrary[0] != '\0')
+	{
+		GUC_check_errmsg("cannot set \"archive_command\" when \"archive_library\" is specified");
+		GUC_check_errdetail("Only one of \"archive_command\" or \"archive_library\" can be specified.");
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * GUC show_hook for in_hot_standby
  */
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..f8c05754a1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -44,7 +44,7 @@
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
-#include "utils/guc.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
@@ -146,6 +146,22 @@ static int	ready_file_comparator(Datum a, Datum b, void *arg);
 static void LoadArchiveLibrary(void);
 static void call_archive_module_shutdown_callback(int code, Datum arg);
 
+/*
+ * GUC check_hook for check_archive_library
+ */
+bool
+check_archive_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && strcmp(*newval, "") != 0 && XLogArchiveCommand[0] != '\0')
+	{
+		GUC_check_errmsg("cannot set \"archive_library\" when \"archive_command\" is specified");
+		GUC_check_errdetail("Only one of \"archive_library\" or \"archive_command\" can be specified.");
+		return false;
+	}
+
+	return true;
+}
+
 /* Report shared memory space needed by PgArchShmemInit */
 Size
 PgArchShmemSize(void)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 05ab087934..2ec327f41e 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3702,7 +3702,7 @@ struct config_string ConfigureNamesString[] =
 		},
 		&XLogArchiveCommand,
 		"",
-		NULL, NULL, show_archive_command
+		check_archive_command, NULL, show_archive_command
 	},
 
 	{
@@ -3712,7 +3712,7 @@ struct config_string ConfigureNamesString[] =
 		},
 		&XLogArchiveLibrary,
 		"",
-		NULL, NULL, NULL
+		check_archive_library, NULL, NULL
 	},
 
 	{
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index f1a9a183b4..daab4d0a0d 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -29,6 +29,10 @@ extern bool check_application_name(char **newval, void **extra,
 								   GucSource source);
 extern void assign_application_name(const char *newval, void *extra);
 extern const char *show_archive_command(void);
+extern bool check_archive_command(char **newval, void **extra,
+								  GucSource source);
+extern bool check_archive_library(char **newval, void **extra,
+								  GucSource source);
 extern bool check_autovacuum_max_workers(int *newval, void **extra,
 										 GucSource source);
 extern bool check_autovacuum_work_mem(int *newval, void **extra,
-- 
2.34.1

#83Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#82)
Re: archive modules

On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote:

The intent here looks reasonable to me. However, why should the user
be able to set both archive_command and archive_library in the first
place only to later fail in LoadArchiveLibrary() per the patch? IMO,
the check_hook() is the right way to disallow any sorts of GUC
misconfigurations, no?

There was some discussion upthread about using the GUC hooks to enforce
this [0]. In general, it doesn't seem to be a recommended practice. One
basic example of the problems with this approach is the following:

1. Set archive_command and leave archive_library unset and restart
the server.
2. Unset archive_command and set archive_library and call 'pg_ctl
reload'.

After these steps, you'll see the following log messages:

2022-10-13 10:58:42.112 PDT [1562524] LOG: received SIGHUP, reloading configuration files
2022-10-13 10:58:42.114 PDT [1562524] LOG: cannot set "archive_library" when "archive_command" is specified
2022-10-13 10:58:42.114 PDT [1562524] DETAIL: Only one of "archive_library" or "archive_command" can be specified.
2022-10-13 10:58:42.114 PDT [1562524] LOG: parameter "archive_command" changed to ""
2022-10-13 10:58:42.114 PDT [1562524] LOG: configuration file "/home/nathan/pgdata/postgresql.conf" contains errors; unaffected changes were applied
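
The sequence above can be reproduced with a toy model in standalone C. This is a sketch under stated assumptions, not guc.c: the struct and function names are hypothetical, and the model assumes that archive_library is processed before archive_command and that each check can only see the other setting's currently applied value.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the two settings; this is not guc.c. */
typedef struct ToyConfig
{
	char		archive_command[128];
	char		archive_library[128];
} ToyConfig;

/* Per-variable check: may only look at the currently applied other value. */
static bool
toy_check_library(const ToyConfig *live, const char *newval)
{
	return newval[0] == '\0' || live->archive_command[0] == '\0';
}

static bool
toy_check_command(const ToyConfig *live, const char *newval)
{
	return newval[0] == '\0' || live->archive_library[0] == '\0';
}

/*
 * Apply a new file one variable at a time, archive_library first
 * (the ordering is an assumption of this model).
 * Returns the number of rejected settings.
 */
static int
toy_reload(ToyConfig *live, const ToyConfig *wanted)
{
	int			rejected = 0;

	if (toy_check_library(live, wanted->archive_library))
		snprintf(live->archive_library, sizeof(live->archive_library),
				 "%s", wanted->archive_library);
	else
		rejected++;

	if (toy_check_command(live, wanted->archive_command))
		snprintf(live->archive_command, sizeof(live->archive_command),
				 "%s", wanted->archive_command);
	else
		rejected++;

	return rejected;
}
```

Starting from a live state with only archive_command set and reloading a file with only archive_library set, the model rejects the new archive_library (the old archive_command is still applied at that moment) and then clears archive_command, so a perfectly valid target configuration ends up with both settings empty.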

[0]: /messages/by-id/20220914200305.GA2984249@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#83)
Re: archive modules

Nathan Bossart <nathandbossart@gmail.com> writes:

On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote:

The intent here looks reasonable to me. However, why should the user
be able to set both archive_command and archive_library in the first
place only to later fail in LoadArchiveLibrary() per the patch? IMO,
the check_hook() is the right way to disallow any sorts of GUC
misconfigurations, no?

There was some discussion upthread about using the GUC hooks to enforce
this [0]. In general, it doesn't seem to be a recommended practice.

Yeah, it really does not work to use GUC hooks to enforce multi-variable
constraints. We've learned that the hard way (more than once, if memory
serves).

regards, tom lane

#85Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#84)
Re: archive modules

On Thu, Oct 13, 2022 at 02:53:38PM -0400, Tom Lane wrote:

Yeah, it really does not work to use GUC hooks to enforce multi-variable
constraints. We've learned that the hard way (more than once, if memory
serves).

414c2fd is one of the most recent ones. Its thread is about the same
thing.
--
Michael

#86Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#85)
1 attachment(s)
Re: archive modules

On Fri, Oct 14, 2022 at 6:00 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Oct 13, 2022 at 02:53:38PM -0400, Tom Lane wrote:

Yeah, it really does not work to use GUC hooks to enforce multi-variable
constraints. We've learned that the hard way (more than once, if memory
serves).

414c2fd is one of the most recent ones. Its thread is about the same
thing.

Got it. Thanks. Just wondering whether we should move the comment below
somewhere into the GUC-related files?

* XXX this code is broken by design. Throwing an error from a GUC assign
* hook breaks fundamental assumptions of guc.c. So long as all the variables
* for which this can happen are PGC_POSTMASTER, the consequences are limited,
* since we'd just abort postmaster startup anyway. Nonetheless it's likely
* that we have odd behaviors such as unexpected GUC ordering dependencies.
*/

FWIW, I see check_stage_log_stats() and check_log_stats() that set
errdetail and return false, causing a similar error:

postgres=# alter system set log_statement_stats = true;
postgres=# select pg_reload_conf();
postgres=# alter system set log_statement_stats = false;
postgres=# alter system set log_parser_stats = true;
ERROR: invalid value for parameter "log_parser_stats": 1
DETAIL: Cannot enable parameter when "log_statement_stats" is true.

On Thu, Oct 13, 2022 at 11:54 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote:

The intent here looks reasonable to me. However, why should the user
be able to set both archive_command and archive_library in the first
place only to later fail in LoadArchiveLibrary() per the patch? IMO,
the check_hook() is the right way to disallow any sorts of GUC
misconfigurations, no?

There was some discussion upthread about using the GUC hooks to enforce
this [0]. In general, it doesn't seem to be a recommended practice. One
basic example of the problems with this approach is the following:

1. Set archive_command and leave archive_library unset and restart
the server.
2. Unset archive_command and set archive_library and call 'pg_ctl
reload'.

Thanks. And yes, if GUC 'foo' is reset but not yet reloaded, the
check_hook() of GUC 'bar' uses the old value of 'foo' while 'bar' is
being set, and fails.

I'm re-attaching Nathan's patch as-is from [1] here again, just to
make CF bot test the correct patch. A few comments on that patch:

1)
+    if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("both archive_command and archive_library specified"),
+                 errdetail("Only one of archive_command, archive_library may be set.")));

The above errmsg looks informational. Can we just say something like
below? It doesn't require errdetail as the errmsg says it all. See
the other instances elsewhere [2].

ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("cannot specify both \"archive_command\" and \"archive_library\"")));

2) I think we have a problem - set archive_mode and archive_library
and start the server, then set archive_command, reload the conf, see
[3] - the archiver needs to error out right? The archiver gets
restarted whenever archive_library changes but not when
archive_command changes. I think the right place for the error is
after or at the end of HandlePgArchInterrupts().

[1]: /messages/by-id/20221005185716.GB201192@nathanxps13
[2]:
errmsg("cannot specify both PARSER and COPY options")));
errmsg("cannot specify both %s and %s",
errmsg("cannot specify both %s and %s",
[3]:
./psql -c "alter system set archive_mode='on'" postgres
./psql -c "alter system set archive_library='/home/ubuntu/postgres/contrib/basic_archive/basic_archive.so'" postgres
./pg_ctl -D data -l logfile restart
./psql -c "alter system set archive_command='cp %p /home/ubuntu/archived_wal/%f'" postgres
./psql -c "select pg_reload_conf();" postgres
postgres=# show archive_mode;
 archive_mode
--------------
 on
(1 row)
postgres=# show archive_command ;
          archive_command
------------------------------------
 cp %p /home/ubuntu/archived_wal/%f
(1 row)
postgres=# show archive_library ;
                       archive_library
--------------------------------------------------------------
 /home/ubuntu/postgres/contrib/basic_archive/basic_archive.so
(1 row)
postgres=# select pid, wait_event_type, backend_type from
pg_stat_activity where backend_type = 'archiver';
   pid   | wait_event_type | backend_type
---------+-----------------+--------------
 2116760 | Activity        | archiver
(1 row)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

fail_arch.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d750290f13..bb4d985f35 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,11 @@ include_dir 'conf.d'
        </para>
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.  It is ignored unless
+        file or on the server command line.  It is only used if
         <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> is set to an empty string.
+        <varname>archive_library</varname> is set to an empty string.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.
         If <varname>archive_command</varname> is an empty string (the default) while
         <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
         is set to an empty string), WAL archiving is temporarily
@@ -3624,7 +3626,9 @@ include_dir 'conf.d'
        <para>
         The library to use for archiving completed WAL file segments.  If set to
         an empty string (the default), archiving via shell is enabled, and
-        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        <xref linkend="guc-archive-command"/> is used.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.  Otherwise, the specified
         shared library is used for archiving.  For more information, see
         <xref linkend="backup-archiving-wal"/> and
         <xref linkend="archive-modules"/>.
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..56dcc0dce5 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -838,6 +838,12 @@ LoadArchiveLibrary(void)
 
 	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
 
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("both archive_command and archive_library specified"),
+				 errdetail("Only one of archive_command, archive_library may be set.")));
+
 	/*
 	 * If shell archiving is enabled, use our special initialization function.
 	 * Otherwise, load the library and call its _PG_archive_module_init().
#87Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#86)
Re: archive modules

On Fri, Oct 14, 2022 at 12:10:18PM +0530, Bharath Rupireddy wrote:

+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("both archive_command and archive_library specified"),
+                 errdetail("Only one of archive_command, archive_library may be set.")));

The above errmsg looks informational. Can we just say something like
below? It doesn't require errdetail as the errmsg says it all. See
the other instances elsewhere [2].

ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("cannot specify both \"archive_command\" and \"archive_library\"")));

I modeled this after the ERROR that error_multiple_recovery_targets()
emits. I don't think there's really any material difference between your
proposal and mine, but I don't have a strong opinion.

2) I think we have a problem - set archive_mode and archive_library
and start the server, then set archive_command, reload the conf, see
[3] - the archiver needs to error out right? The archiver gets
restarted whenever archive_library changes but not when
archive_command changes. I think the right place for the error is
after or at the end of HandlePgArchInterrupts().

Good catch. You are right, this is broken. I believe that we need to
check for the misconfiguration in HandlePgArchInterrupts() in addition to
LoadArchiveLibrary(). I will work on fixing this.
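
A rough standalone model of checking at both points, assuming the archiver remembers which library it started with. The enum and function names are hypothetical; this is a sketch of the idea, not the eventual patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcomes of a configuration reload in the archiver. */
typedef enum ReloadAction
{
	RELOAD_CONTINUE,			/* keep archiving with the loaded module */
	RELOAD_RESTART,				/* archive_library changed: restart archiver */
	RELOAD_ERROR				/* both settings non-empty: report an error */
} ReloadAction;

/* The misconfiguration test, shared by the startup and reload paths. */
static bool
both_archive_settings_set(const char *command, const char *library)
{
	return command[0] != '\0' && library[0] != '\0';
}

/*
 * Decide what to do after a reload.  loaded_library is the value the
 * archiver started with; command and library are the reloaded values.
 */
static ReloadAction
action_after_reload(const char *loaded_library,
					const char *command, const char *library)
{
	if (strcmp(loaded_library, library) != 0)
		return RELOAD_RESTART;	/* the startup check runs in the new process */
	if (both_archive_settings_set(command, library))
		return RELOAD_ERROR;
	return RELOAD_CONTINUE;
}
```

The point of the second check is the case where only archive_command changes: no restart is triggered, so without it the conflict would never be noticed by a running archiver.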

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#88Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#87)
1 attachment(s)
Re: archive modules

On Fri, Oct 14, 2022 at 11:51:30AM -0700, Nathan Bossart wrote:

On Fri, Oct 14, 2022 at 12:10:18PM +0530, Bharath Rupireddy wrote:

2) I think we have a problem - set archive_mode and archive_library
and start the server, then set archive_command, reload the conf, see
[3] - the archiver needs to error out right? The archiver gets
restarted whenever archive_library changes but not when
archive_command changes. I think the right place for the error is
after or at the end of HandlePgArchInterrupts().

Good catch. You are right, this is broken. I believe that we need to
check for the misconfiguration in HandlePgArchInterrupts() in addition to
LoadArchiveLibrary(). I will work on fixing this.

As promised...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

fail_arch_v2.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..9d0f3608c4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,11 @@ include_dir 'conf.d'
        </para>
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.  It is ignored unless
+        file or on the server command line.  It is only used if
         <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> is set to an empty string.
+        <varname>archive_library</varname> is set to an empty string.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.
         If <varname>archive_command</varname> is an empty string (the default) while
         <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
         is set to an empty string), WAL archiving is temporarily
@@ -3624,7 +3626,9 @@ include_dir 'conf.d'
        <para>
         The library to use for archiving completed WAL file segments.  If set to
         an empty string (the default), archiving via shell is enabled, and
-        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        <xref linkend="guc-archive-command"/> is used.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.  Otherwise, the specified
         shared library is used for archiving.  For more information, see
         <xref linkend="backup-archiving-wal"/> and
         <xref linkend="archive-modules"/>.
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..39c2115943 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -801,7 +801,8 @@ HandlePgArchInterrupts(void)
 		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
 		pfree(archiveLib);
 
-		if (archiveLibChanged)
+		if (archiveLibChanged ||
+			(XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0'))
 		{
 			/*
 			 * Call the currently loaded archive module's shutdown callback,
@@ -809,17 +810,25 @@ HandlePgArchInterrupts(void)
 			 */
 			call_archive_module_shutdown_callback(0, 0);
 
-			/*
-			 * Ideally, we would simply unload the previous archive module and
-			 * load the new one, but there is presently no mechanism for
-			 * unloading a library (see the comment above
-			 * internal_load_library()).  To deal with this, we simply restart
-			 * the archiver.  The new archive module will be loaded when the
-			 * new archiver process starts up.
-			 */
-			ereport(LOG,
-					(errmsg("restarting archiver process because value of "
-							"\"archive_library\" was changed")));
+			if (archiveLibChanged)
+			{
+				/*
+				 * Ideally, we would simply unload the previous archive module
+				 * and load the new one, but there is presently no mechanism
+				 * for unloading a library (see the comment above
+				 * internal_load_library()).  To deal with this, we simply
+				 * restart the archiver.  The new archive module will be loaded
+				 * when the new archiver process starts up.
+				 */
+				ereport(LOG,
+						(errmsg("restarting archiver process because value of "
+								"\"archive_library\" was changed")));
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("both archive_command and archive_library specified"),
+						 errdetail("Only one of archive_command, archive_library may be set.")));
 
 			proc_exit(0);
 		}
@@ -836,6 +845,12 @@ LoadArchiveLibrary(void)
 {
 	ArchiveModuleInit archive_init;
 
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("both archive_command and archive_library specified"),
+				 errdetail("Only one of archive_command, archive_library may be set.")));
+
 	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
 
 	/*
#89Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#88)
1 attachment(s)
Re: archive modules

On Sat, Oct 15, 2022 at 3:13 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Oct 14, 2022 at 11:51:30AM -0700, Nathan Bossart wrote:

On Fri, Oct 14, 2022 at 12:10:18PM +0530, Bharath Rupireddy wrote:

2) I think we have a problem - set archive_mode and archive_library
and start the server, then set archive_command, reload the conf, see
[3] - the archiver needs to error out right? The archiver gets
restarted whenever archive_library changes but not when
archive_command changes. I think the right place for the error is
after or at the end of HandlePgArchInterrupts().

Good catch. You are right, this is broken. I believe that we need to
check for the misconfiguration in HandlePgArchInterrupts() in addition to
LoadArchiveLibrary(). I will work on fixing this.

As promised...

Thanks. I think the condition can be simplified to something like
the attached. It's okay to call the shutdown callback in two places and get
rid of the comment [1], as it doesn't add any extra value or
information; it just says that we're calling the shutdown callback
function. With the attached, the code is more readable and the
footprint of the changes is reduced.

[1]
/*
* Call the currently loaded archive module's shutdown callback,
* if one is defined.
*/
call_archive_module_shutdown_callback(0, 0);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Disallow-specifiying-archive_library-and-archive_.patch (application/octet-stream)
From e046136ef860ba06186388962a1e4096f9833a8d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 16 Oct 2022 07:21:50 +0000
Subject: [PATCH v3] Disallow specifiying archive_library and archive_command
 GUCs at once

The archive_library and archive_command GUCs are meant to be mutually
exclusive because, by design, users are expected to choose only one
archiving approach. With the patch, the server emits an error if
both are set at once.

Note that we've not chosen to emit the error from check_hook or
assign_hook as it can have odd behaviors such as unexpected GUC
ordering dependencies.

Backpatch to 15.

Author: Nathan Bossart
Reviewed-by: Peter Eisentraut
Reviewed-by: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/20220914222736.GA3042279%40nathanxps13
---
 doc/src/sgml/config.sgml        | 10 +++++++---
 src/backend/postmaster/pgarch.c | 19 +++++++++++++++----
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..9d0f3608c4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,11 @@ include_dir 'conf.d'
        </para>
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.  It is ignored unless
+        file or on the server command line.  It is only used if
         <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> is set to an empty string.
+        <varname>archive_library</varname> is set to an empty string.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.
         If <varname>archive_command</varname> is an empty string (the default) while
         <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
         is set to an empty string), WAL archiving is temporarily
@@ -3624,7 +3626,9 @@ include_dir 'conf.d'
        <para>
         The library to use for archiving completed WAL file segments.  If set to
         an empty string (the default), archiving via shell is enabled, and
-        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        <xref linkend="guc-archive-command"/> is used.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.  Otherwise, the specified
         shared library is used for archiving.  For more information, see
         <xref linkend="backup-archiving-wal"/> and
         <xref linkend="archive-modules"/>.
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..bba2bc07a9 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -803,10 +803,6 @@ HandlePgArchInterrupts(void)
 
 		if (archiveLibChanged)
 		{
-			/*
-			 * Call the currently loaded archive module's shutdown callback,
-			 * if one is defined.
-			 */
 			call_archive_module_shutdown_callback(0, 0);
 
 			/*
@@ -823,6 +819,15 @@ HandlePgArchInterrupts(void)
 
 			proc_exit(0);
 		}
+		else if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		{
+			call_archive_module_shutdown_callback(0, 0);
+
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("both archive_command and archive_library specified"),
+					 errdetail("Only one of archive_command, archive_library may be set.")));
+		}
 	}
 }
 
@@ -836,6 +841,12 @@ LoadArchiveLibrary(void)
 {
 	ArchiveModuleInit archive_init;
 
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("both archive_command and archive_library specified"),
+				 errdetail("Only one of archive_command, archive_library may be set.")));
+
 	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
 
 	/*
-- 
2.34.1

#90Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#88)
Re: archive modules

At Fri, 14 Oct 2022 14:42:56 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in

As promised...

As the code is written, when archive_library is added while
archive_command is already set, the archiver first emits the seemingly
positive message "restarting archiver process because ...", then errors
out after the restart and keeps restarting while complaining about the
wrong setting. I think we don't need the first message.

The ERROR always turns into FATAL, so FATAL would be less confusing here,
maybe.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#91Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#90)
Re: archive modules

On Mon, Oct 17, 2022 at 01:46:39PM +0900, Kyotaro Horiguchi wrote:

As the code is written, when archive_library is added while
archive_command is already set, the archiver first emits the seemingly
positive message "restarting archiver process because ...", then errors
out after the restart and keeps restarting while complaining about the
wrong setting. I think we don't need the first message.

The ERROR always turns into FATAL, so FATAL would be less confusing here,
maybe.

You mean the second message in HandlePgArchInterrupts() when
archiveLibChanged is false? An ERROR or a FATAL would not change much
as there is a proc_exit() anyway down the road.

+   if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("both archive_command and archive_library specified"),
+                errdetail("Only one of archive_command, archive_library may be set.")));

So, backpedalling from upthread where Peter mentions that we should
complain if both archive_command and archive_library are set (creating
a parallel with recovery parameters), I'd like to think that pgarch.c
should have zero knowledge of what an archive_command is and should
just handle the library part. This makes the whole reasoning around
what pgarch.c should be much simpler, aka it just needs to know about
archive *libraries*, not *commands*. That's the kind of business that
check_configured_cb() is designed for, actually, as far as I
understand, or this callback could just be removed entirely for the
same effect, as there would be no point in having pgarch.c do its
thing without archive_library or archive_command where a WARNING is
issued in the default case (shell_archive with no archive_command).

And, by the way, this patch would prevent the existence of archive
modules that need to be loaded but *want* an archive_command with
what they want to achieve. That does not strike me as a good idea if
we want to have a maximum of flexibility with this facility. I think
that for all that, we should put the responsibility of what should be
set or not set directly to the modules, aka basic_archive could
complain if archive_command is set, but that does not strike me as a
mandatory requirement, either. It is true that archive_library has
been introduced as a way to avoid using archive_command, but the point
of creating a stronger dependency between both would be IMO annoying
in the long-term.
--
Michael

#92Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#91)
Re: archive modules

On Mon, Oct 17, 2022 at 11:20 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Oct 17, 2022 at 01:46:39PM +0900, Kyotaro Horiguchi wrote:

As the code is written, when archive_library is added while
archive_command is already set, the archiver first emits the seemingly
positive message "restarting archiver process because ...", then errors
out after the restart and keeps restarting while complaining about the
wrong setting. I think we don't need the first message.

The ERROR always turns into FATAL, so FATAL would be less confusing here,
maybe.

You mean the second message in HandlePgArchInterrupts() when
archiveLibChanged is false? An ERROR or a FATAL would not change much
as there is a proc_exit() anyway down the road.

Yes, ERROR or FATAL, it really doesn't matter; the process exits either
way. See pg_re_throw(): for the archiver, PG_exception_stack is NULL, so
the ERROR is reported as FATAL:
2022-10-18 09:57:41.869 UTC [2479104] FATAL: both archive_command and archive_library specified
2022-10-18 09:57:41.869 UTC [2479104] DETAIL: Only one of archive_command, archive_library may be set.
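
The promotion described above can be modeled in a few lines. This is only a minimal sketch, not the server's implementation: the SketchLevel enum, effective_level(), and the have_handler flag are hypothetical stand-ins for the real error levels and for whether PG_exception_stack is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the server's error levels. */
typedef enum
{
	SKETCH_LOG,
	SKETCH_ERROR,
	SKETCH_FATAL
} SketchLevel;

/*
 * Return the level a report effectively has.  An ERROR raised in a process
 * with no handler to jump back to (modeled by have_handler == false)
 * cannot be recovered from, so it is treated as FATAL.
 */
static SketchLevel
effective_level(SketchLevel requested, bool have_handler)
{
	assert(requested >= SKETCH_LOG && requested <= SKETCH_FATAL);

	if (requested == SKETCH_ERROR && !have_handler)
		return SKETCH_FATAL;
	return requested;
}
```

Under this model, an ERROR reported by a process without a handler and an explicit FATAL end up at the same level, which matches the FATAL line in the log above.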

I think Kyotaro-san's concern is about ordering: errmsg("both archive_command
and archive_library specified") should be emitted before errmsg("restarting
archiver process because value of \"archive_library\" was changed"), something
like the attached v4 patch.
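
The ordering under discussion can be sketched as a small decision function. This is an illustration under stated assumptions, not the patch itself: ReloadAction and decide_after_reload() are hypothetical names, and the three string arguments stand in for the previously loaded archive_library and the freshly reloaded archive_library and archive_command values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcome of a configuration reload in the archiver. */
typedef enum
{
	RELOAD_CONTINUE,			/* keep archiving with the loaded module */
	RELOAD_RESTART,				/* archive_library changed: restart archiver */
	RELOAD_FAIL					/* both settings non-empty: report the error */
} ReloadAction;

static ReloadAction
decide_after_reload(const char *old_library,
					const char *new_library,
					const char *new_command)
{
	assert(old_library != NULL && new_library != NULL && new_command != NULL);

	/* Check the misconfiguration first, so no "restarting" message precedes it. */
	if (new_library[0] != '\0' && new_command[0] != '\0')
		return RELOAD_FAIL;

	if (strcmp(old_library, new_library) != 0)
		return RELOAD_RESTART;

	return RELOAD_CONTINUE;
}
```

With the misconfiguration check first, adding archive_library while archive_command is already set yields the error directly, without a preceding restart message.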

+   if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("both archive_command and archive_library specified"),
+                errdetail("Only one of archive_command, archive_library may be set.")));

So, backpedalling from upthread where Peter mentions that we should
complain if both archive_command and archive_library are set (creating
a parallel with recovery parameters), I'd like to think that pgarch.c
should have zero knowledge of what an archive_command is and should
just handle the library part. This makes the whole reasoning around
what pgarch.c should be much simpler, aka it just needs to know about
archive *libraries*, not *commands*.

Are you saying that we make/treat/build shell_archive.c as a separate
shared library/module (instead of just an object file) and load it in
pgarch.c? If yes, this can make pgarch.c simpler.

That's the kind of business that
check_configured_cb() is designed for, actually, as far as I
understand, or this callback could just be removed entirely for the
same effect, as there would be no point in having pgarch.c do its
thing without archive_library or archive_command where a WARNING is
issued in the default case (shell_archive with no archive_command).

If it's done as said above, the corresponding check_configured_cb()
can deal with allowing/disallowing/misconfiguring various parameters.

And, by the way, this patch would prevent the existence of archive
modules that need to be loaded but *want* an archive_command with
what they want to achieve. That does not strike me as a good idea if
we want to have a maximum of flexibility with this facility. I think
that for all that, we should put the responsibility of what should be
set or not set directly to the modules, aka basic_archive could
complain if archive_command is set, but that does not strike me as a
mandatory requirement, either. It is true that archive_library has
been introduced as a way to avoid using archive_command, but the point
of creating a stronger dependency between both would be IMO annoying
in the long-term.

Great thought! If the responsibility of
allowing/disallowing/misconfiguring various parameters is given to
check_configured_cb(), the modules can decide whether to error out or
deal with it or use it.
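
A rough sketch of that division of responsibility follows. Everything here is a simplified stand-in, not the real archive module API: SketchCallbacks, sketch_archive_command, and the three functions are hypothetical, and treating a missing callback as "configured" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the server's archive_command setting. */
static const char *sketch_archive_command = "";

/* Hypothetical, simplified callback table; not the real ArchiveModuleCallbacks. */
typedef struct
{
	bool		(*check_configured_cb) (void);
} SketchCallbacks;

/* A strict module: refuses to archive while archive_command is also set. */
static bool
strict_check_configured(void)
{
	assert(sketch_archive_command != NULL);
	return sketch_archive_command[0] == '\0';
}

/* A permissive module: ignores archive_command entirely. */
static bool
permissive_check_configured(void)
{
	return true;
}

/* Archiver side: only asks the module and knows nothing about commands. */
static bool
archiving_allowed(const SketchCallbacks *cb)
{
	assert(cb != NULL);

	/* Assumption: a module without the callback counts as configured. */
	return cb->check_configured_cb == NULL || cb->check_configured_cb();
}
```

In this arrangement the archiver never inspects archive_command itself; each module decides whether a non-empty command is an error, irrelevant, or something it wants to use.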

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#93Ian Lawrence Barwick
barwick@gmail.com
In reply to: Bharath Rupireddy (#89)
Re: archive modules

On Sun, Oct 16, 2022 at 16:36, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

On Sat, Oct 15, 2022 at 3:13 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Oct 14, 2022 at 11:51:30AM -0700, Nathan Bossart wrote:

On Fri, Oct 14, 2022 at 12:10:18PM +0530, Bharath Rupireddy wrote:

2) I think we have a problem - set archive_mode and archive_library
and start the server, then set archive_command, reload the conf, see
[3] - the archiver needs to error out right? The archiver gets
restarted whenever archive_library changes but not when
archive_command changes. I think the right place for the error is
after or at the end of HandlePgArchInterrupts().

Good catch. You are right, this is broken. I believe that we need to
check for the misconfiguration in HandlePgArchInterrupts() in addition to
LoadArchiveLibrary(). I will work on fixing this.

As promised...

Thanks. I think the condition can be simplified to something like
the attached. It's okay to call the shutdown callback in two places and get
rid of the comment [1], as it doesn't add any extra value or
information; it just says that we're calling the shutdown callback
function. With the attached, the code is more readable and the
footprint of the changes is reduced.

[1]
/*
* Call the currently loaded archive module's shutdown callback,
* if one is defined.
*/
call_archive_module_shutdown_callback(0, 0);

Hi

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1]: http://cfbot.cputube.org/patch_40_3933.log

Thanks

Ian Barwick

#94Nathan Bossart
nathandbossart@gmail.com
In reply to: Ian Lawrence Barwick (#93)
1 attachment(s)
Re: archive modules

On Fri, Nov 04, 2022 at 12:05:26PM +0900, Ian Lawrence Barwick wrote:

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Indeed. Here is a new version of the patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

fail_arch_v3.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 559eb898a9..2ffd82ab66 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,11 @@ include_dir 'conf.d'
        </para>
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
-        file or on the server command line.  It is ignored unless
+        file or on the server command line.  It is only used if
         <varname>archive_mode</varname> was enabled at server start and
-        <varname>archive_library</varname> is set to an empty string.
+        <varname>archive_library</varname> is set to an empty string.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.
         If <varname>archive_command</varname> is an empty string (the default) while
         <varname>archive_mode</varname> is enabled (and <varname>archive_library</varname>
         is set to an empty string), WAL archiving is temporarily
@@ -3624,7 +3626,9 @@ include_dir 'conf.d'
        <para>
         The library to use for archiving completed WAL file segments.  If set to
         an empty string (the default), archiving via shell is enabled, and
-        <xref linkend="guc-archive-command"/> is used.  Otherwise, the specified
+        <xref linkend="guc-archive-command"/> is used.  If both
+        <varname>archive_command</varname> and <varname>archive_library</varname>
+        are set, archiving will fail.  Otherwise, the specified
         shared library is used for archiving. The WAL archiver process is
         restarted by the postmaster when this parameter changes. For more
         information, see <xref linkend="backup-archiving-wal"/> and
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 2670e41666..3e11a4ce12 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -792,6 +792,12 @@ HandlePgArchInterrupts(void)
 		ConfigReloadPending = false;
 		ProcessConfigFile(PGC_SIGHUP);
 
+		if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("both archive_command and archive_library specified"),
+					 errdetail("Only one of archive_command, archive_library may be set.")));
+
 		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
 		pfree(archiveLib);
 
@@ -825,6 +831,12 @@ LoadArchiveLibrary(void)
 {
 	ArchiveModuleInit archive_init;
 
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("both archive_command and archive_library specified"),
+				 errdetail("Only one of archive_command, archive_library may be set.")));
+
 	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
 
 	/*
#95Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#91)
Re: archive modules

On Mon, Oct 17, 2022 at 02:49:51PM +0900, Michael Paquier wrote:

And, by the way, this patch would prevent the existence of archive
modules that need to be loaded but *want* an archive_command with
what they want to achieve. That does not strike me as a good idea if
we want to have a maximum of flexibility with this facility.

Such a module could define a custom GUC that accepts a shell command. I
don't think we should overload the meaning of archive_command based on the
whims of whatever archive module is loaded. Besides the potential end-user
confusion, your archive_command might be unexpectedly used incorrectly if
you forget to set archive_library.

Perhaps we could eventually move the archive_command functionality to a
contrib module (i.e., "shell_archive") so that users must always set
archive_library. But until then, I suspect it's better to treat modules
and commands as two separate interfaces to ease migration from older major
versions (even though archive_command is now essentially a built-in archive
module).

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#96Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#95)
Re: archive modules

On Sat, Nov 05, 2022 at 02:08:58PM -0700, Nathan Bossart wrote:

Such a module could define a custom GUC that accepts a shell command. I
don't think we should overload the meaning of archive_command based on the
whims of whatever archive module is loaded. Besides the potential end-user
confusion, your archive_command might be unexpectedly used incorrectly if
you forget to set archive_library.

That would mean mostly copying the logic from shell_archive.c that builds
the command to execute (aka shell_archive_file), which is not great
either. But well, perhaps my whole line of argument is just moot..
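
To give a sense of the logic a module would have to duplicate, here is a minimal sketch of placeholder expansion for a command template. It is not the code in shell_archive.c: sketch_build_command() is a hypothetical helper, and it only handles the documented archive_command placeholders %p (path of the file to archive), %f (file name only), and %% (a literal percent sign).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand %p, %f and %% from tmpl into out.  Returns 0 on success, or -1 if
 * the result (including the terminating NUL) does not fit in outlen bytes.
 * Any other %x sequence is copied through unchanged.
 */
static int
sketch_build_command(const char *tmpl, const char *path, const char *file,
					 char *out, size_t outlen)
{
	size_t		used = 0;

	assert(tmpl != NULL && path != NULL && file != NULL && out != NULL);

	if (outlen == 0)
		return -1;

	for (const char *p = tmpl; *p != '\0'; p++)
	{
		const char *piece = p;	/* default: copy the current character */
		size_t		len = 1;

		if (p[0] == '%' && p[1] == 'p')
		{
			piece = path;
			len = strlen(path);
			p++;
		}
		else if (p[0] == '%' && p[1] == 'f')
		{
			piece = file;
			len = strlen(file);
			p++;
		}
		else if (p[0] == '%' && p[1] == '%')
			p++;				/* piece already points at a single '%' */

		/* Always leave room for the terminating NUL. */
		if (used + len + 1 > outlen)
			return -1;
		memcpy(out + used, piece, len);
		used += len;
	}

	out[used] = '\0';
	return 0;
}
```

A module with its own command setting would need something along these lines plus the code to run the command and interpret its exit status, which is the duplication being pointed out.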

Perhaps we could eventually move the archive_command functionality to a
contrib module (i.e., "shell_archive") so that users must always set
archive_library. But until then, I suspect it's better to treat modules
and commands as two separate interfaces to ease migration from older major
versions (even though archive_command is now essentially a built-in archive
module).

I agree that this is a fine long-term goal, removing all traces of the
archive_command from the backend core code. This is actually an
argument in favor of having no traces of XLogArchiveCommand in
pgarch.c, no? ;p

I am not sure how long we should wait before being able to do that,
perhaps a couple of years at least? I'd like to think the sooner the
better (like v17?) but we are usually conservative, and the removal of
the exclusive backup mode took 5~6 years if I recall correctly..
--
Michael

#97Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#96)
Re: archive modules

On Mon, Nov 07, 2022 at 03:20:31PM +0900, Michael Paquier wrote:

On Sat, Nov 05, 2022 at 02:08:58PM -0700, Nathan Bossart wrote:

Perhaps we could eventually move the archive_command functionality to a
contrib module (i.e., "shell_archive") so that users must always set
archive_library. But until then, I suspect it's better to treat modules
and commands as two separate interfaces to ease migration from older major
versions (even though archive_command is now essentially a built-in archive
module).

I agree that this is a fine long-term goal, removing all traces of the
archive_command from the backend core code. This is actually an
argument in favor of having no traces of XLogArchiveCommand in
pgarch.c, no? ;p

Indeed.

I am not sure how long we should wait before being able to do that,
perhaps a couple of years at least? I'd like to think the sooner the
better (like v17?) but we are usually conservative, and the removal of
the exclusive backup mode took 5~6 years if I recall correctly..

Yeah, I imagine we'd need to mark it as deprecated-and-to-be-removed for
several years first.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#98Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Nathan Bossart (#94)
Re: archive modules

On 05.11.22 21:51, Nathan Bossart wrote:

On Fri, Nov 04, 2022 at 12:05:26PM +0900, Ian Lawrence Barwick wrote:

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Indeed. Here is a new version of the patch.

I have committed this to master.

The surrounding code has changed a bit between PG15 and master, so if we
wanted to backpatch this, we'd need another patch from you. However, at
this point, I'm content to just leave it be in PG15.

#99Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#98)
Re: archive modules

On Tue, Nov 15, 2022 at 10:31:44AM +0100, Peter Eisentraut wrote:

I have committed this to master.

Thanks!

The surrounding code has changed a bit between PG15 and master, so if we
wanted to backpatch this, we'd need another patch from you. However, at
this point, I'm content to just leave it be in PG15.

Sounds good to me.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#100Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Nathan Bossart (#99)
Re: archive modules

On 2022-Nov-15, Nathan Bossart wrote:

On Tue, Nov 15, 2022 at 10:31:44AM +0100, Peter Eisentraut wrote:

The surrounding code has changed a bit between PG15 and master, so if we
wanted to backpatch this, we'd need another patch from you. However, at
this point, I'm content to just leave it be in PG15.

Sounds good to me.

Hmm, really? It seems to me that we will have two slightly different
behaviors in 15 and master, which may be confusing later on. I think
it'd be better to make them both work identically.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees." (E. Dijkstra)

#101Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#100)
Re: archive modules

On Tue, Nov 15, 2022 at 06:14:25PM +0100, Alvaro Herrera wrote:

On 2022-Nov-15, Nathan Bossart wrote:

On Tue, Nov 15, 2022 at 10:31:44AM +0100, Peter Eisentraut wrote:

The surrounding code has changed a bit between PG15 and master, so if we
wanted to backpatch this, we'd need another patch from you. However, at
this point, I'm content to just leave it be in PG15.

Sounds good to me.

Hmm, really? It seems to me that we will have two slightly different
behaviors in 15 and master, which may be confusing later on. I think
it'd be better to make them both work identically.

I don't have a strong opinion either way. While consistency between v15
and master seems nice, the behavior change might not be appropriate for a
minor release. BTW I was able to cherry-pick the committed patch to v15
without any changes. Peter, could you clarify what changes you'd like to
see in a back-patched version?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#102Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#101)
Re: archive modules

On Tue, Nov 15, 2022 at 12:57:49PM -0800, Nathan Bossart wrote:

On Tue, Nov 15, 2022 at 06:14:25PM +0100, Alvaro Herrera wrote:

Hmm, really? It seems to me that we will have two slightly different
behaviors in 15 and master, which may be confusing later on. I think
it'd be better to make them both work identically.

I don't have a strong opinion either way. While consistency between v15
and master seems nice, the behavior change might not be appropriate for a
minor release. BTW I was able to cherry-pick the committed patch to v15
without any changes. Peter, could you clarify what changes you'd like to
see in a back-patched version?

FWIW, I am not sure that I would have done d627ce3, as I already
mentioned upthread, since the library loading should not be related to
archive_command. If there is more support for doing that, I am
fine to withdraw, but the behavior between HEAD and REL_15_STABLE
ought to be consistent.
--
Michael