parallelizing the archiver

Started by Bossart, Nathan · over 4 years ago · 59 messages
#1 Bossart, Nathan
bossartn@amazon.com

Hi hackers,

I'd like to gauge interest in parallelizing the archiver process.
From a quick scan, I was only able to find one recent thread [0] that
brought up this topic, and ISTM the conventional wisdom is to use a
backup utility like pgBackRest that does things in parallel behind-
the-scenes. My experience is that the generating-more-WAL-than-we-
can-archive problem is pretty common, and parallelization seems to
help quite a bit, so perhaps it's a good time to consider directly
supporting parallel archiving in PostgreSQL.

Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools. I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing consumers model.
Another approach I'm looking at is to use background worker processes,
although I'm not sure if linking such a critical piece of
functionality to max_worker_processes is a good idea. However, I do
see that logical replication uses background workers.
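For what it's worth, the competing-consumers shape described above can be sketched like this (Python for brevity; `archive_one` and every other name here is illustrative, not a proposed API): a fixed pool of workers is created once, and all of them pull ready WAL segment names from a single shared queue.

```python
# Illustrative sketch of a competing-consumers pool: N workers created once,
# all pulling ready WAL segment names from one shared queue.
import queue
import threading

def worker(q, archive_one, archived):
    while True:
        seg = q.get()
        if seg is None:            # shutdown sentinel
            q.task_done()
            return
        archived.append(archive_one(seg))  # workers compete for queued items
        q.task_done()

def archive_all(segments, archive_one, n_workers=3):
    q = queue.Queue()
    archived = []
    threads = [threading.Thread(target=worker, args=(q, archive_one, archived))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for seg in segments:
        q.put(seg)
    for _ in threads:
        q.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return archived
```

The point of the shape is that no segment is assigned to a particular worker up front; whichever worker is free takes the next one.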

Anyway, I'm curious what folks think about this. I think it'd help
simplify server administration for many users.

Nathan

[0]: /messages/by-id/20180828060221.x33gokifqi3csjj4@depesz.com

#2 Julien Rouhaud
rjuju123@gmail.com
In reply to: Bossart, Nathan (#1)
Re: parallelizing the archiver

On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan <bossartn@amazon.com> wrote:

I'd like to gauge interest in parallelizing the archiver process.
[...]
Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools.

Having a new implementation that would remove the archive_command is
probably a better long-term solution, but I don't know of anyone
working on that and it's probably gonna take some time. Right now we
have a lot of users that face archiving bottlenecks, so I think it would
be a good thing to implement parallel archiving, fully compatible with
the current archive_command, as a short-term solution.

#3 Bossart, Nathan
bossartn@amazon.com
In reply to: Julien Rouhaud (#2)
Re: parallelizing the archiver

On 9/7/21, 11:38 PM, "Julien Rouhaud" <rjuju123@gmail.com> wrote:

On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan <bossartn@amazon.com> wrote:

I'd like to gauge interest in parallelizing the archiver process.
[...]
Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools.

Having a new implementation that would remove the archive_command is
probably a better long-term solution, but I don't know of anyone
working on that and it's probably gonna take some time. Right now we
have a lot of users that face archiving bottlenecks, so I think it would
be a good thing to implement parallel archiving, fully compatible with
the current archive_command, as a short-term solution.

Thanks for chiming in. I'm planning to work on a patch next week.

Nathan

#4 Julien Rouhaud
rjuju123@gmail.com
In reply to: Bossart, Nathan (#3)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 6:30 AM Bossart, Nathan <bossartn@amazon.com> wrote:

Thanks for chiming in. I'm planning to work on a patch next week.

Great news!

About the technical concerns:

I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing consumers model.

In this approach, would the launched archiver workers be kept as long
as the instance is up, or should they be stopped if they're not
required anymore, e.g. after a temporary write-activity spike?
I think we should make sure that at least one worker is always up.

Another approach I'm looking at is to use background worker processes,
although I'm not sure if linking such a critical piece of
functionality to max_worker_processes is a good idea. However, I do
see that logical replication uses background workers.

I think that using background workers is a good approach, and the
various GUCs in that area should allow users to properly configure
archiving too. If that's not the case, it might be an opportunity to
add some new infrastructure that could benefit all bgworker users.

#5 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Bossart, Nathan (#1)
Re: parallelizing the archiver

On 8 Sept 2021, at 03:36, Bossart, Nathan <bossartn@amazon.com> wrote:

Anyway, I'm curious what folks think about this. I think it'd help
simplify server administration for many users.

BTW this thread is also related [0].

My 2 cents.
It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
Maybe just add a parallelism API for external tools?
It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining a parallel worker is tremendously harder than spawning a goroutine, thread, task or whatever.
The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
For example, the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!
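A minimal sketch of the tool side of this protocol (in Python rather than Go, and purely illustrative — this is not wal-g's actual interface): read ready segment names from stdin, archive each one, and report it back on stdout.

```python
# Sketch of an external archiver speaking the proposed stdin/stdout protocol.
import os
import shutil
import sys

def archive_one(seg_path, dest_dir):
    # A production tool would copy to a temporary name, fsync the file and
    # the directory, then rename; a plain copy keeps the sketch short.
    dest = os.path.join(dest_dir, os.path.basename(seg_path))
    shutil.copy(seg_path, dest)
    return os.path.basename(seg_path)

def archive_loop(wal_dir, dest_dir, instream=sys.stdin, outstream=sys.stdout):
    for line in instream:              # postgres pushes ready segment names
        seg = line.strip()
        if seg:
            name = archive_one(os.path.join(wal_dir, seg), dest_dir)
            outstream.write(name + "\n")   # report the segment as archived
            outstream.flush()
```

A real tool would of course parallelize inside `archive_loop`; the point here is only the line-oriented protocol.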

Thoughts?

Best regards, Andrey Borodin.

[0]: /messages/by-id/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com

#6 Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrey Borodin (#5)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
Maybe just add a parallelism API for external tools?
It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining a parallel worker is tremendously harder than spawning a goroutine, thread, task or whatever.

Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it. If this problem is solved in
postgres core without API change, then all existing tools will
automatically benefit from it (maybe not the ones that used to have
hacks to make it parallel, but it seems easier to disable a hack
rather than implement one).

The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
For example, the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!

Yes, but that requires fundamental design changes for the archive
commands, right? So while I agree it could be a better approach
overall, it seems like a longer-term option. As far as I understand,
what Nathan suggested seems more likely to be achieved in pg15 and
could benefit a larger set of backup solutions. This would give us
enough time to properly design a new archiving approach.

#7 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Julien Rouhaud (#6)
Re: parallelizing the archiver

On 10 Sept 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
Maybe just add a parallelism API for external tools?
It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining a parallel worker is tremendously harder than spawning a goroutine, thread, task or whatever.

Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.

I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.

If this problem is solved in
postgres core without API change, then all existing tools will
automatically benefit from it (maybe not the ones that used to have
hacks to make it parallel, but it seems easier to disable a hack
rather than implement one).

True, hacky tools can already coordinate a swarm of their processes and are prepared to be called multiple times concurrently :)

The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
For example, the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!

Yes, but that requires fundamental design changes for the archive
commands, right? So while I agree it could be a better approach
overall, it seems like a longer-term option. As far as I understand,
what Nathan suggested seems more likely to be achieved in pg15 and
could benefit a larger set of backup solutions. This would give us
enough time to properly design a new archiving approach.

It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive command. What fundamental design changes do we need?

Best regards, Andrey Borodin.

#8 Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrey Borodin (#7)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 10 Sept 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:

Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.

I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.

Which is a big API break.

It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive command. What fundamental design changes do we need?

I'm talking about the commands themselves. Your suggestion is to
change archive_command to be able to spawn a daemon, and it looks like
a totally different approach. I'm not saying that having a daemon
based approach to take care of archiving is a bad idea, I'm saying
that trying to fit that with the current archive_command + some new
GUC looks like a bad idea.

#9 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Julien Rouhaud (#8)
Re: parallelizing the archiver

On 10 Sept 2021, at 11:11, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 10 Sept 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:

Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.

I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.

Which is a big API break.

Huge extension, not a break.

It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive command. What fundamental design changes do we need?

I'm talking about the commands themselves. Your suggestion is to
change archive_command to be able to spawn a daemon, and it looks like
a totally different approach. I'm not saying that having a daemon
based approach to take care of archiving is a bad idea, I'm saying
that trying to fit that with the current archive_command + some new
GUC looks like a bad idea.

It fits nicely, even in corner cases. E.g. the restore_command run from pg_rewind seems compatible with this approach.
One more example: after a failover, the DBA can just ```ls | wal-g wal-push``` to archive all WALs left unarchived before the network partition.

This is a simple yet powerful approach, without any contradiction to the existing archive_command API.
Why is it a bad idea?

Best regards, Andrey Borodin.

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#1)
Re: parallelizing the archiver

On Tue, Sep 7, 2021 at 6:36 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools. I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing consumers model.

To me, it seems way more beneficial to think about being able to
invoke archive_command with many files at a time instead of just one.
I think for most plausible archive commands that would be way more
efficient than what you propose here. It's *possible* that if we had
that, we'd still want this, but I'm not even convinced.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11 Julien Rouhaud
rjuju123@gmail.com
In reply to: Robert Haas (#10)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 9:13 PM Robert Haas <robertmhaas@gmail.com> wrote:

To me, it seems way more beneficial to think about being able to
invoke archive_command with many files at a time instead of just one.
I think for most plausible archive commands that would be way more
efficient than what you propose here. It's *possible* that if we had
that, we'd still want this, but I'm not even convinced.

Those approaches don't really seem mutually exclusive? In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel commands
you only need the additional knowledge that some command is already
working on a file. Wouldn't it be even better to eventually be able to
launch multiple batches of multiple files rather than a single batch?

If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is. To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.

But as I said, I'm not convinced that using the archive_command
approach for that is the best one. If I understand correctly,
most of the backup solutions would prefer to have a daemon being
launched and use it as a queuing system. Wouldn't it be better to
have a new archive_mode, e.g. "daemon", and have postgres responsible
for (re)starting it, and pass information through the daemon's
stdin/stdout or something like that?

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Julien Rouhaud (#11)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Those approaches don't really seem mutually exclusive? In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel commands
you only need the additional knowledge that some command is already
working on a file. Wouldn't it be even better to eventually be able to
launch multiple batches of multiple files rather than a single batch?

Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching.

[rhaas pgsql]$ touch x y z
[rhaas pgsql]$ time sh -c 'scp x cthulhu: && scp y cthulhu: && scp z cthulhu:'
x 100% 207KB 78.8KB/s 00:02
y 100% 0 0.0KB/s 00:00
z 100% 0 0.0KB/s 00:00

real 0m9.418s
user 0m0.045s
sys 0m0.071s
[rhaas pgsql]$ time sh -c 'scp x y z cthulhu:'
x 100% 207KB 273.1KB/s 00:00
y 100% 0 0.0KB/s 00:00
z 100% 0 0.0KB/s 00:00

real 0m3.216s
user 0m0.017s
sys 0m0.020s

If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is. To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.

That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.

But as I said, I'm not convinced that using the archive_command
approach for that is the best one. If I understand correctly,
most of the backup solutions would prefer to have a daemon being
launched and use it as a queuing system. Wouldn't it be better to
have a new archive_mode, e.g. "daemon", and have postgres responsible
for (re)starting it, and pass information through the daemon's
stdin/stdout or something like that?

Sure. Actually, I think a background worker would be better than a
separate daemon. Then it could just talk to shared memory directly.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13 Julien Rouhaud
rjuju123@gmail.com
In reply to: Robert Haas (#12)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 11:22 PM Robert Haas <robertmhaas@gmail.com> wrote:

Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching.

I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance. But if
the archiver only gets an answer from the archive_command once it has
tried to process all of the files, it also means that postgres won't be
able to remove any WAL file until all of them have been processed. It
means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like. In the case of
archiving to a server with high latency / connection overhead it may be
better to be able to run multiple commands in parallel. I may be
overthinking here, and definitely having feedback from people with more
experience around that would be welcome.

That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.

Yes, it could be. I think that we need more feedback for that too.

Sure. Actually, I think a background worker would be better than a
separate daemon. Then it could just talk to shared memory directly.

I thought about it too, but I was under the impression that most
people would want to implement a custom daemon (or already have one) in
some more parallel/thread-friendly language.

#14 Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Julien Rouhaud (#11)
Re: parallelizing the archiver

On 10 Sept 2021, at 19:19, Julien Rouhaud <rjuju123@gmail.com> wrote:
Wouldn't it be better to
have a new archive_mode, e.g. "daemon", and have postgres responsible
for (re)starting it, and pass information through the daemon's
stdin/stdout or something like that?

We don't even need to introduce a new archive_mode.
Currently archive_command has no expectations regarding stdin/stdout.
Let's just say that we will push new WAL names to stdin until the archive_command exits.
And if the archive_command prints something to stdout, we will interpret it as archived WAL names.
That's it.

Existing archive_commands will continue as is.

Currently, information about what is archived is stored on the filesystem in the archive_status dir. We do not need to change anything.
If the archive_command exits (with any exit code), we will restart it if there are WAL files that still have not been archived.

Best regards, Andrey Borodin.

#15 Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#12)
Re: parallelizing the archiver

On 9/10/21, 8:22 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Those approaches don't really seem mutually exclusive? In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel commands
you only need the additional knowledge that some command is already
working on a file. Wouldn't it be even better to eventually be able to
launch multiple batches of multiple files rather than a single batch?

Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching.

[...]

If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is. To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.

That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.

My initial thinking was similar to Julien's. Assuming I have an
archive_command that handles one file, I can just set
archive_max_workers to 3 and reap the benefits. If I'm using an
existing utility that implements its own parallelism, I can keep
archive_max_workers at 1 and continue using it. This would be a
simple incremental improvement.

That being said, I think the discussion about batching is a good one
to have. If the overhead described in your SCP example is
representative of a typical archive_command, then parallelism does
seem a bit silly. We'd essentially be using a ton more resources when
there's obvious room for improvement by reducing the amount of overhead
per archive. I think we could easily make the batch size configurable
so that existing archive commands would work (e.g.,
archive_batch_size=1). However, unlike the simple parallel approach,
you'd likely have to adjust your archive_command if you wanted to make
use of batching. That doesn't seem terrible to me, though. As
discussed above, there are some implementation details to work out for
archive failures, but nothing about that seems intractable to me.
Plus, if you still wanted to parallelize things, feeding your
archive_command several files at a time could still be helpful.

I'm currently leaning toward exploring the batching approach first. I
suppose we could always make a prototype of both solutions for
comparison with some "typical" archive commands if that would help
with the discussion.
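For illustration, the two knobs being discussed might look like this in postgresql.conf (both new GUC names are hypothetical at this point — neither exists):

```
# hypothetical settings for illustration only
archive_mode = on
archive_command = '/usr/local/bin/archive_wals %p %f'
archive_max_workers = 3   # parallel approach: competing-consumer workers
archive_batch_size = 1    # batching approach: 1 keeps existing commands working
```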

Nathan

#16 Jacob Champion
pchampion@vmware.com
In reply to: Julien Rouhaud (#13)
Re: parallelizing the archiver

On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote:

I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance. But if
the archiver only gets an answer from the archive_command once it has
tried to process all of the files, it also means that postgres won't be
able to remove any WAL file until all of them have been processed. It
means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like. In the case of
archiving to a server with high latency / connection overhead it may be
better to be able to run multiple commands in parallel.

Well, users would also have to limit the parallelism, right? If
connections are high-overhead, I wouldn't imagine that running hundreds
of them simultaneously would work very well in practice. (The proof
would be in an actual benchmark, obviously, but usually I would rather
have one process handling a hundred items than a hundred processes
handling one item each.)

For a batching scheme, would it be that big a deal to wait for all of
them to be archived before removal?

That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.

Yes, it could be. I think that we need more feedback for that too.

Seems like this is the sticking point. What would be the smartest thing
for the command to do? If there's a destination file already, checksum
it and make sure it matches the source before continuing?
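One plausible answer, sketched here under the stated assumption that "sensible" means treating an identical pre-existing file as success and a different one as a hard failure (Python, all names illustrative):

```python
# Sketch of an idempotent archive step that checksums before trusting
# a pre-existing destination file.
import hashlib
import os
import shutil

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def archive_idempotent(src, dest):
    # Identical file already in the archive: succeed as a no-op.
    # Different file under the same name: fail loudly, never overwrite.
    if os.path.exists(dest):
        if sha256_of(dest) == sha256_of(src):
            return "already-archived"
        raise RuntimeError(f"{dest} exists with different contents")
    shutil.copy(src, dest)  # a real command would also fsync file + directory
    return "archived"
```

This makes a retry after a non-zero exit safe: re-archiving a segment that already made it into the archive becomes a no-op.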

--Jacob

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Julien Rouhaud (#13)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 11:49 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance. But if
the archiver only gets an answer from the archive_command once it has
tried to process all of the files, it also means that postgres won't be
able to remove any WAL file until all of them have been processed. It
means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like. In the case of
archiving to a server with high latency / connection overhead it may be
better to be able to run multiple commands in parallel. I may be
overthinking here, and definitely having feedback from people with more
experience around that would be welcome.

That's a fair point. I'm not sure how much it matters, though. I think
you want to imagine a system where there are, let's say, 10 WAL files
being archived per second. Using fork() + exec() to spawn a shell
command 10 times per second is a bit expensive, whether you do it
serially or in parallel, and even if the command is something with a
less-insane startup overhead than scp. If we start a shell command say
every 3 seconds and give it 30 files each time, we can reduce the
startup costs we're paying by ~97% at the price of having to wait up
to 3 additional seconds to know that archiving succeeded for any
particular file. That sounds like a pretty good trade-off, because the
main benefit of removing old files is that it keeps us from running
out of disk space, and you should not be running a busy system in such
a way that it is ever within 3 seconds of running out of disk space,
so whatever.
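The arithmetic above, spelled out (these are the illustrative numbers from this paragraph, not measurements):

```python
# Startup-overhead saving from batching, using the example's numbers.
wal_files_per_sec = 10           # busy system from the example
batch_interval_sec = 3
batch_size = wal_files_per_sec * batch_interval_sec   # 30 files per invocation

startups_unbatched = batch_size  # one shell fork+exec per file in the window
startups_batched = 1             # one fork+exec per 3-second window
savings = 1 - startups_batched / startups_unbatched
print(f"{savings:.0%} fewer command startups")
```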

If on the other hand you imagine a system that's not very busy, say 1
WAL file being archived every 10 seconds, then using a batch size of
30 would very significantly delay removal of old files. However, on
this system, batching probably isn't really needed. The rate of WAL
file generation is low enough that if you pay the startup cost of your
archive_command for every file, you're probably still doing just fine.

Probably, any kind of parallelism or batching needs to take this kind
of time-based thinking into account. For batching, the rate at which
files are generated should affect the batch size. For parallelism, it
should affect the number of processes used.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18 Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#17)
Re: parallelizing the archiver

On 9/10/21, 10:12 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

If on the other hand you imagine a system that's not very busy, say 1
WAL file being archived every 10 seconds, then using a batch size of
30 would very significantly delay removal of old files. However, on
this system, batching probably isn't really needed. The rate of WAL
file generation is low enough that if you pay the startup cost of your
archive_command for every file, you're probably still doing just fine.

Probably, any kind of parallelism or batching needs to take this kind
of time-based thinking into account. For batching, the rate at which
files are generated should affect the batch size. For parallelism, it
should affect the number of processes used.

I was thinking that archive_batch_size would be the maximum batch
size. If the archiver only finds a single file to archive, that's all
it'd send to the archive command. If it finds more, it'd send up to
archive_batch_size to the command.
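That semantics (a maximum, not a quota the archiver waits to fill) is simple to state in code; the GUC name here is still hypothetical:

```python
def next_batch(ready_segments, archive_batch_size):
    # Send whatever is ready, capped at the configured maximum; a single
    # ready file is dispatched immediately rather than waiting for a full batch.
    return ready_segments[:archive_batch_size]
```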

Nathan

#19 Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#15)
Re: parallelizing the archiver

On Fri, Sep 10, 2021 at 1:07 PM Bossart, Nathan <bossartn@amazon.com> wrote:

That being said, I think the discussion about batching is a good one
to have. If the overhead described in your SCP example is
representative of a typical archive_command, then parallelism does
seem a bit silly.

I think that's pretty realistic, because a lot of people's archive
commands are going to actually be, or need to use, scp specifically.
However, there are also cases where people are using commands that
just put the file in some local directory (maybe on a remote mount
point) and I would expect the startup overhead to be much less in
those cases. Maybe people are archiving via HTTPS or similar as well,
and then you again have some connection overhead though, I suspect,
not as much as scp, since web pages do not take 3 seconds to get an
https connection going. I don't know why scp is so crazy slow.

Even in the relatively low-overhead cases, though, I think we would
want to do some real testing to see if the benefits are as we expect.
See /messages/by-id/20200420211018.w2qphw4yybcbxksl@alap3.anarazel.de
and downthread for context. I was *convinced* that parallel backup was
a win. Benchmarking was a tad underwhelming, but there was a clear if
modest benefit by running a synthetic test of copying a lot of files
serially or in parallel, with the files spread across multiple
filesystems on the same physical box. However, when Andres modified my
test program to use posix_fadvise(), posix_fallocate(), and
sync_file_range() while doing the copies, the benefits of parallelism
largely evaporated, and in fact in some cases enabling parallelism
caused major regressions. In other words, the apparent benefits of
parallelism were really due to suboptimal behaviors in the Linux page
cache and some NUMA effects that were in fact avoidable.

So I'm suspicious that the same things might end up being true here.
It's not exactly the same, because the goal of WAL archiving is to
keep up with the rate of WAL generation, and the goal of a backup is
(unless max-rate is used) to finish as fast as possible, and that
difference in goals might end up being significant. Also, you can make
an argument that some people will benefit from a parallelism feature
even if a perfectly-implemented archive_command doesn't, because many
people use really terrible archive_commands. But all that said, I
think the parallel backup discussion is still a cautionary tale to
which some attention ought to be paid.

We'd essentially be using a ton more resources when
there's obvious room for improvement by reducing the amount of
overhead per archive. I think we could easily make the batch size configurable
so that existing archive commands would work (e.g.,
archive_batch_size=1). However, unlike the simple parallel approach,
you'd likely have to adjust your archive_command if you wanted to make
use of batching. That doesn't seem terrible to me, though. As
discussed above, there are some implementation details to work out for
archive failures, but nothing about that seems intractable to me.
Plus, if you still wanted to parallelize things, feeding your
archive_command several files at a time could still be helpful.
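A minimal sketch of what a batching-aware archive_command wrapper could look like, assuming a hypothetical server-side substitution that passes several ready WAL file paths per invocation (no such substitution exists today; the function name and calling convention here are made up for illustration):

```shell
# Hypothetical batched archiver: one invocation handles many WAL files.
# Usage: archive_batch DESTDIR SRCFILE...
archive_batch() {
    dest=$1; shift
    for src in "$@"; do
        f=$(basename "$src")
        # Refuse to overwrite a file that is already archived.
        test ! -f "$dest/$f" || return 1
        # Copy to a temporary name first, then rename into place, so a
        # partially written file is never visible under its final name.
        cp "$src" "$dest/.$f.tmp" && mv "$dest/.$f.tmp" "$dest/$f" || return 1
    done
}
```

Note that this still doesn't fsync anything, which is one of the shell-command drawbacks discussed elsewhere in this thread.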

Yep.

I'm currently leaning toward exploring the batching approach first. I
suppose we could always make a prototype of both solutions for
comparison with some "typical" archive commands if that would help
with the discussion.

Yeah, I think the concerns here are more pragmatic than philosophical,
at least for me.

I had kind of been thinking that the way to attack this problem is to
go straight to allowing for a background worker, because the other
problem with archive_command is that running a shell command like cp,
scp, or rsync is not really safe. It won't fsync your data, it might
not fail if the file is in the archive already, and it definitely
won't succeed without doing anything if there's a byte for byte
identical file in the archive and fail if there's a file with
different contents already in the archive. Fixing that stuff by
running different shell commands is hard, but it wouldn't be that hard
to do it in C code, and you could then also extend whatever code you
wrote to do batching and parallelism; starting more workers isn't
hard.

However, I can't see the idea of running a shell command going away
any time soon, in spite of its numerous and severe drawbacks. Such an
interface provides a huge degree of flexibility and allows system
admins to whack around behavior easily, which you don't get if you
have to code every change in C. So I think command-based enhancements
are fine to pursue also, even though I don't think it's the ideal
place for most users to end up.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Bossart, Nathan (#18)
Re: parallelizing the archiver

On 10 Sept 2021, at 22:18, Bossart, Nathan <bossartn@amazon.com> wrote:

I was thinking that archive_batch_size would be the maximum batch
size. If the archiver only finds a single file to archive, that's all
it'd send to the archive command. If it finds more, it'd send up to
archive_batch_size to the command.

I think that the concept of a "batch" is misleading.
If you pass filenames via stdin you don't need to know all the names upfront.
Just send more names to the pipe while the archive_command is still running and more segments become available.
This way the level of parallelism will adapt to the workload.

Best regards, Andrey Borodin.
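A rough sketch of the streaming consumer this proposal implies, assuming a hypothetical one-filename-per-line stdin protocol with per-file OK/FAIL replies on stdout (none of this is an existing PostgreSQL interface):

```shell
# Hypothetical long-running archiver: read WAL file names from stdin as
# the server produces them, archive each one, report per-file status.
# Usage: ... | archive_from_stdin DESTDIR
archive_from_stdin() {
    dest=$1
    status=0
    while IFS= read -r src; do
        f=$(basename "$src")
        if cp "$src" "$dest/.$f.tmp" && mv "$dest/.$f.tmp" "$dest/$f"; then
            echo "OK $f"
        else
            echo "FAIL $f"
            status=1
        fi
    done
    return $status
}
```

Because names arrive as segments become ready, a consumer like this (or several of them) can keep working continuously instead of being re-invoked per file.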

#21Stephen Frost
sfrost@snowman.net
In reply to: Julien Rouhaud (#8)
Re: parallelizing the archiver

Greetings,

* Julien Rouhaud (rjuju123@gmail.com) wrote:

On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 10 Sept 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:
Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.

We've got too many archiving tools as it is, if you want my 2c on that.

I'm not proposing to remove existing archive_command. Just deprecate it one-WAL-per-call form.

Which is a big API break.

We definitely need to stop being afraid of this. We completely changed
around how restores work and made pretty much all of the backup/restore
tools have to make serious changes when we released v12.

I definitely don't think that we should be making assumptions that
changing archive command to start running things in parallel isn't
*also* an API break too, in any case. It is also a change and there's
definitely a good chance that it'd break some of the archivers out
there. If we're going to make a change here, let's make a sensible one.

It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive_command. What fundamental design changes do we need?

Haven't really thought about this proposal but it does sound
interesting.

Thanks,

Stephen

#22Julien Rouhaud
rjuju123@gmail.com
In reply to: Stephen Frost (#21)
Re: parallelizing the archiver

On Wed, Sep 15, 2021 at 4:14 AM Stephen Frost <sfrost@snowman.net> wrote:

I'm not proposing to remove existing archive_command. Just deprecate it one-WAL-per-call form.

Which is a big API break.

We definitely need to stop being afraid of this. We completely changed
around how restores work and made pretty much all of the backup/restore
tools have to make serious changes when we released v12.

I never said that we should avoid API break at all cost, I said that
if we break the API we should introduce something better. The
proposal to pass multiple file names to the archive command said
nothing about how to tell which ones were successfully archived and
which ones weren't, which is a big problem in my opinion. But I think
we should also consider different approach, such as maintaining some
kind of daemon and asynchronously passing all the WAL file names,
waiting for answers. Or maybe something else. It's just that simply
"passing multiple file names" doesn't seem like a big enough win to
justify an API break to me.

I definitely don't think that we should be making assumptions that
changing archive command to start running things in parallel isn't
*also* an API break too, in any case. It is also a change and there's
definitely a good chance that it'd break some of the archivers out
there. If we're going to make a change here, let's make a sensible one.

But doing parallel archiving can and should be controlled with a GUC,
so if your archive_command isn't compatible you can simply just not
use it (on top of having a default of not using parallel archiving, at
least for some times). It doesn't seem like a big problem.

#23Bossart, Nathan
bossartn@amazon.com
In reply to: Andrey Borodin (#20)
1 attachment(s)
Re: parallelizing the archiver

On 9/10/21, 10:42 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

I had kind of been thinking that the way to attack this problem is to
go straight to allowing for a background worker, because the other
problem with archive_command is that running a shell command like cp,
scp, or rsync is not really safe. It won't fsync your data, it might
not fail if the file is in the archive already, and it definitely
won't succeed without doing anything if there's a byte for byte
identical file in the archive and fail if there's a file with
different contents already in the archive. Fixing that stuff by
running different shell commands is hard, but it wouldn't be that hard
to do it in C code, and you could then also extend whatever code you
wrote to do batching and parallelism; starting more workers isn't
hard.

However, I can't see the idea of running a shell command going away
any time soon, in spite of its numerous and severe drawbacks. Such an
interface provides a huge degree of flexibility and allows system
admins to whack around behavior easily, which you don't get if you
have to code every change in C. So I think command-based enhancements
are fine to pursue also, even though I don't think it's the ideal
place for most users to end up.

I've given this quite a bit of thought. I hacked together a batching
approach for benchmarking, and it seemed to be a decent improvement,
but you're still shelling out every N files, and all the stuff about
shell commands not being ideal that you mentioned still applies.
Perhaps it's still a good improvement, and maybe we should still do
it, but I get the idea that many believe we can still do better. So,
I looked into adding support for setting up archiving via an
extension.

The attached patch is a first try at adding alternatives for
archive_command, restore_command, archive_cleanup_command, and
recovery_end_command. It adds the GUCs archive_library,
restore_library, archive_cleanup_library, and recovery_end_library.
Each of these accepts a library name that is loaded at startup,
similar to shared_preload_libraries. _PG_init() is still used for
initialization, and you can use the same library for multiple purposes
by checking the new exported variables (e.g.,
process_archive_library_in_progress). The library is then responsible
for implementing the relevant function, such as _PG_archive() or
_PG_restore(). The attached patch also demonstrates a simple
implementation for an archive_library that is similar to the sample
archive_command in the documentation.

I tested the sample archive_command in the docs against the sample
archive_library implementation in the patch, and I saw about a 50%
speedup. (The archive_library actually syncs the files to disk, too.)
This is similar to the improvement from batching.
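For reference, wiring up the proof of concept would presumably look something like the following, based on the GUC names in the attached patch (the directory path is just an example):

```
# postgresql.conf
archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '/mnt/server/archivedir'
```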

Of course, there are drawbacks to using an extension. Besides the
obvious added complexity of building an extension in C versus writing
a shell command, the patch disallows changing the libraries without
restarting the server. Also, the patch makes no effort to simplify
error handling, memory management, etc. This is left as an exercise
for the extension author.

I'm sure there are other ways to approach this, but I thought I'd give
it a try to see what was possible and to get the conversation started.

Nathan

Attachments:

v1-0001-backup-module-proof-of-concept.patch (application/octet-stream)
From 23e0872805c29947ffc334a7b35c2988285a8130 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 30 Sep 2021 04:13:45 +0000
Subject: [PATCH v1 1/1] backup module proof of concept

---
 contrib/Makefile                         |   1 +
 contrib/basic_archive/Makefile           |  15 ++
 contrib/basic_archive/basic_archive.c    | 226 ++++++++++++++++++++++
 src/backend/access/transam/xlog.c        |  29 ++-
 src/backend/access/transam/xlogarchive.c | 318 +++++++++++++++++++++++++------
 src/backend/postmaster/pgarch.c          |  95 ++++++++-
 src/backend/postmaster/postmaster.c      |  10 +
 src/backend/utils/fmgr/dfmgr.c           | 100 +++++++++-
 src/backend/utils/init/miscinit.c        |  62 ++++++
 src/backend/utils/misc/guc.c             | 155 ++++++++++++++-
 src/include/access/xlog.h                |   6 +-
 src/include/access/xlog_internal.h       |   1 +
 src/include/access/xlogarchive.h         |   1 +
 src/include/fmgr.h                       |  10 +
 src/include/miscadmin.h                  |   6 +
 15 files changed, 950 insertions(+), 85 deletions(-)
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c

diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..aff834ebaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..ea6b460889
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,15 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..832a7a4d69
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive_library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void	_PG_init(void);
+bool	_PG_archive(const char *path, const char *file);
+
+static char *archive_directory = NULL;
+
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static bool copy_file(const char *src, const char *dst, char *buf);
+
+void
+_PG_init(void)
+{
+	if (!process_archive_library_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via "
+						"\"archive_library\"")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_POSTMASTER,
+							   GUC_NOT_IN_SAMPLE,
+							   check_archive_directory, NULL, NULL);
+}
+
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+bool
+_PG_archive(const char *path, const char *file)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+	char *buf;
+
+	if (archive_directory == NULL || archive_directory[0] == '\0')
+	{
+		ereport(WARNING,
+				(errmsg("\"basic_archive.archive_directory\" not specified")));
+		return false;
+	}
+
+#define TEMP_FILE_NAME ("archtemp")
+
+	if (strlen(archive_directory) + Max(strlen(file), strlen(TEMP_FILE_NAME)) + 2 >= MAXPGPATH)
+	{
+		ereport(WARNING,
+				(errmsg("archive destination path too long")));
+		return false;
+	}
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, TEMP_FILE_NAME);
+
+	/*
+	 * First, check if the file has already been archived.  If it has,
+	 * just fail, because something is wrong.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists",
+						destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+	{
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m",
+						destination)));
+		return false;
+	}
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+	{
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m",
+						temp)));
+		return false;
+	}
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+#define COPY_BUF_SIZE (64 * 1024)
+
+	buf = palloc(COPY_BUF_SIZE);
+
+	if (!copy_file(path, temp, buf))
+	{
+		pfree(buf);
+		return false;
+	}
+
+	pfree(buf);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final
+	 * destination.
+	 */
+	return (durable_rename(temp, destination, WARNING) == 0);
+}
+
+static bool
+copy_file(const char *src, const char *dst, char *buf)
+{
+	int srcfd;
+	int dstfd;
+	int nbytes;
+
+	srcfd = OpenTransientFile(src, O_RDONLY | PG_BINARY);
+	if (srcfd < 0)
+	{
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", src)));
+		return false;
+	}
+
+	dstfd = OpenTransientFile(dst, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	if (dstfd < 0)
+	{
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", dst)));
+		return false;
+	}
+
+	for (;;)
+	{
+		nbytes = read(srcfd, buf, COPY_BUF_SIZE);
+		if (nbytes < 0)
+		{
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", src)));
+			return false;
+		}
+
+		if (nbytes == 0)
+			break;
+
+		errno = 0;
+		if ((int) write(dstfd, buf, nbytes) != nbytes)
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m", dst)));
+			return false;
+		}
+	}
+
+	if (CloseTransientFile(dstfd) != 0)
+	{
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", dst)));
+		return false;
+	}
+
+	if (CloseTransientFile(srcfd) != 0)
+	{
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", src)));
+		return false;
+	}
+
+	return true;
+}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1388afdfb0..74566217bd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -95,6 +95,7 @@ int			XLOGbuffers = -1;
 int			XLogArchiveTimeout = 0;
 int			XLogArchiveMode = ARCHIVE_MODE_OFF;
 char	   *XLogArchiveCommand = NULL;
+char	   *XLogArchiveLibrary = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
@@ -270,8 +271,11 @@ static char *primary_image_masked = NULL;
 
 /* options formerly taken from recovery.conf for archive recovery */
 char	   *recoveryRestoreCommand = NULL;
+char	   *recoveryRestoreLibrary = NULL;
 char	   *recoveryEndCommand = NULL;
+char	   *recoveryEndLibrary = NULL;
 char	   *archiveCleanupCommand = NULL;
+char	   *archiveCleanupLibrary = NULL;
 RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 bool		recoveryTargetInclusive = true;
 int			recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
@@ -5558,18 +5562,21 @@ validateRecoveryParameters(void)
 	if (StandbyModeRequested)
 	{
 		if ((PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) &&
-			(recoveryRestoreCommand == NULL || strcmp(recoveryRestoreCommand, "") == 0))
+			(recoveryRestoreCommand == NULL || strcmp(recoveryRestoreCommand, "") == 0) &&
+			(recoveryRestoreLibrary == NULL || strcmp(recoveryRestoreLibrary, "") == 0))
 			ereport(WARNING,
-					(errmsg("specified neither primary_conninfo nor restore_command"),
+					(errmsg("specified neither primary_conninfo nor restore_command nor restore_library"),
 					 errhint("The database server will regularly poll the pg_wal subdirectory to check for files placed there.")));
 	}
 	else
 	{
-		if (recoveryRestoreCommand == NULL ||
-			strcmp(recoveryRestoreCommand, "") == 0)
+		if ((recoveryRestoreCommand == NULL ||
+			 strcmp(recoveryRestoreCommand, "") == 0) &&
+			(recoveryRestoreLibrary == NULL ||
+			 strcmp(recoveryRestoreLibrary, "") == 0))
 			ereport(FATAL,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("must specify restore_command when standby mode is not enabled")));
+					 errmsg("must specify restore_command or restore_library when standby mode is not enabled")));
 	}
 
 	/*
@@ -7997,12 +8004,15 @@ StartupXLOG(void)
 	if (ArchiveRecoveryRequested)
 	{
 		/*
-		 * And finally, execute the recovery_end_command, if any.
+		 * And finally, execute the recovery_end_command or
+		 * recovery_end_library, if any.
 		 */
 		if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
 			ExecuteRecoveryCommand(recoveryEndCommand,
 								   "recovery_end_command",
 								   true);
+		else if (recoveryEndLibrary && strcmp(recoveryEndLibrary, "") != 0)
+			ExecuteRecoveryLibrary("recovery_end_library");
 
 		/*
 		 * We switched to a new timeline. Clean up segments on the old
@@ -8739,7 +8749,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive() && XLogArchiveCommandOrLibrarySet())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
@@ -9805,12 +9815,15 @@ CreateRestartPoint(int flags)
 							   timestamptz_to_str(xtime)) : 0));
 
 	/*
-	 * Finally, execute archive_cleanup_command, if any.
+	 * Finally, execute archive_cleanup_command or archive_cleanup_library,
+	 * if any.
 	 */
 	if (archiveCleanupCommand && strcmp(archiveCleanupCommand, "") != 0)
 		ExecuteRecoveryCommand(archiveCleanupCommand,
 							   "archive_cleanup_command",
 							   false);
+	else if (archiveCleanupLibrary && strcmp(archiveCleanupLibrary, "") != 0)
+		ExecuteRecoveryLibrary("archive_cleanup_library");
 
 	return true;
 }
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 26b023e754..d78e14fba2 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -23,6 +23,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogarchive.h"
 #include "common/archive.h"
+#include "fmgr.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "postmaster/pgarch.h"
@@ -31,6 +32,16 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 
+static bool execute_restore_library(const char *xlogpath, const char *xlogfname,
+									const char *lastRestartPointFname, char *path,
+									off_t expectedSize);
+static bool execute_restore_command(const char *xlogpath, const char *xlogfname,
+									const char *lastRestartPointFname, char *path,
+									off_t expectedSize);
+static void check_restored_file(const char *xlogpath, const char *xlogfname,
+								off_t expectedSize, bool *success,
+								bool *stat_failed);
+
 /*
  * Attempt to retrieve the specified file from off-line archival storage.
  * If successful, fill "path" with its complete path (note that this will be
@@ -55,9 +66,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 					bool cleanupEnabled)
 {
 	char		xlogpath[MAXPGPATH];
-	char	   *xlogRestoreCmd;
 	char		lastRestartPointFname[MAXPGPATH];
-	int			rc;
 	struct stat stat_buf;
 	XLogSegNo	restartSegNo;
 	XLogRecPtr	restartRedoPtr;
@@ -65,14 +74,23 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 
 	/*
 	 * Ignore restore_command when not in archive recovery (meaning we are in
-	 * crash recovery).
+	 * crash recovery).  In standby mode, restore_command and restore_library
+	 * might not be supplied.
 	 */
-	if (!ArchiveRecoveryRequested)
-		goto not_available;
-
-	/* In standby mode, restore_command might not be supplied */
-	if (recoveryRestoreCommand == NULL || strcmp(recoveryRestoreCommand, "") == 0)
-		goto not_available;
+	if (!ArchiveRecoveryRequested ||
+		((recoveryRestoreCommand == NULL || strcmp(recoveryRestoreCommand, "") == 0) &&
+		 (recoveryRestoreLibrary == NULL || strcmp(recoveryRestoreLibrary, "") == 0)))
+	{
+		/*
+		 * if an archived file is not available, there might still be a version
+		 * of this file in XLOGDIR, so return that as the filename to open.
+		 *
+		 * In many recovery scenarios we expect this to fail also, but if so
+		 * that just means we've reached the end of WAL.
+		 */
+		snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlogfname);
+		return false;
+	}
 
 	/*
 	 * When doing archive recovery, we always prefer an archived log file even
@@ -148,6 +166,95 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	else
 		XLogFileName(lastRestartPointFname, 0, 0L, wal_segment_size);
 
+	if (PG_restore != NULL)
+		return execute_restore_library(xlogpath, xlogfname,
+									   lastRestartPointFname, path,
+									   expectedSize);
+	else
+		return execute_restore_command(xlogpath, xlogfname,
+									   lastRestartPointFname, path,
+									   expectedSize);
+}
+
+static bool
+execute_restore_library(const char *xlogpath, const char *xlogfname,
+						const char *lastRestartPointFname, char *path,
+						off_t expectedSize)
+{
+	bool ret;
+
+	Assert(PG_restore != NULL);
+	Assert(xlogpath != NULL);
+	Assert(xlogfname != NULL);
+	Assert(lastRestartPointFname != NULL);
+	Assert(path != NULL);
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing restore library \"%s\" for file \"%s\"",
+							 recoveryRestoreLibrary, xlogfname)));
+
+	/*
+	 * Check signals before restore command and reset afterwards.
+	 */
+	PreRestoreCommand();
+
+	/*
+	 * Copy xlog from archival storage to XLOGDIR
+	 *
+	 * Note that we do not catch any ERRORs (or worse) that the restore
+	 * library emits.  It is the responsibility of the library to do the
+	 * necessary error handling, memory management, etc.  If a library does
+	 * ERROR (or worse), it will bubble up and may cause unexpected
+	 * behavior.
+	 */
+	ret = (*PG_restore) (xlogpath, xlogfname, lastRestartPointFname);
+
+	PostRestoreCommand();
+
+	if (ret)
+	{
+		bool success;
+		bool stat_failed;
+
+		/*
+		 * command apparently succeeded, but let's make sure the file is
+		 * really there now and has the correct size.
+		 */
+		check_restored_file(xlogpath, xlogfname, expectedSize, &success,
+							&stat_failed);
+
+		if (success)
+			strcpy(path, xlogpath);
+
+		if (!stat_failed)
+			return success;
+	}
+
+	/*
+	 * if an archived file is not available, there might still be a version of
+	 * this file in XLOGDIR, so return that as the filename to open.
+	 *
+	 * In many recovery scenarios we expect this to fail also, but if so that
+	 * just means we've reached the end of WAL.
+	 */
+	snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlogfname);
+	return false;
+}
+
+static bool
+execute_restore_command(const char *xlogpath, const char *xlogfname,
+						const char *lastRestartPointFname, char *path,
+						off_t expectedSize)
+{
+	char	   *xlogRestoreCmd;
+	int			rc;
+
+	Assert(recoveryRestoreCommand != NULL);
+	Assert(xlogpath != NULL);
+	Assert(xlogfname != NULL);
+	Assert(lastRestartPointFname != NULL);
+	Assert(path != NULL);
+
 	/* Build the restore command to execute */
 	xlogRestoreCmd = BuildRestoreCommand(recoveryRestoreCommand,
 										 xlogpath, xlogfname,
@@ -175,58 +282,21 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 
 	if (rc == 0)
 	{
+		bool success;
+		bool stat_failed;
+
 		/*
 		 * command apparently succeeded, but let's make sure the file is
 		 * really there now and has the correct size.
 		 */
-		if (stat(xlogpath, &stat_buf) == 0)
-		{
-			if (expectedSize > 0 && stat_buf.st_size != expectedSize)
-			{
-				int			elevel;
-
-				/*
-				 * If we find a partial file in standby mode, we assume it's
-				 * because it's just being copied to the archive, and keep
-				 * trying.
-				 *
-				 * Otherwise treat a wrong-sized file as FATAL to ensure the
-				 * DBA would notice it, but is that too strong? We could try
-				 * to plow ahead with a local copy of the file ... but the
-				 * problem is that there probably isn't one, and we'd
-				 * incorrectly conclude we've reached the end of WAL and we're
-				 * done recovering ...
-				 */
-				if (StandbyMode && stat_buf.st_size < expectedSize)
-					elevel = DEBUG1;
-				else
-					elevel = FATAL;
-				ereport(elevel,
-						(errmsg("archive file \"%s\" has wrong size: %lld instead of %lld",
-								xlogfname,
-								(long long int) stat_buf.st_size,
-								(long long int) expectedSize)));
-				return false;
-			}
-			else
-			{
-				ereport(LOG,
-						(errmsg("restored log file \"%s\" from archive",
-								xlogfname)));
-				strcpy(path, xlogpath);
-				return true;
-			}
-		}
-		else
-		{
-			/* stat failed */
-			int			elevel = (errno == ENOENT) ? LOG : FATAL;
+		check_restored_file(xlogpath, xlogfname, expectedSize, &success,
+							&stat_failed);
 
-			ereport(elevel,
-					(errcode_for_file_access(),
-					 errmsg("could not stat file \"%s\": %m", xlogpath),
-					 errdetail("restore_command returned a zero exit status, but stat() failed.")));
-		}
+		if (success)
+			strcpy(path, xlogpath);
+
+		if (!stat_failed)
+			return success;
 	}
 
 	/*
@@ -260,8 +330,6 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg("could not restore file \"%s\" from archive: %s",
 					xlogfname, wait_result_to_str(rc))));
 
-not_available:
-
 	/*
 	 * if an archived file is not available, there might still be a version of
 	 * this file in XLOGDIR, so return that as the filename to open.
@@ -273,6 +341,79 @@ not_available:
 	return false;
 }
 
+/*
+ * check_restored_file
+ *
+ * Check that the file specified by xlogpath and xlogfname exists and has
+ * the expected size.  If everything checks out, *success is set to true
+ * and *stat_failed is set to false.  If the call to stat() fails, *success
+ * is set to false and *stat_failed is set to true.  Otherwise, both
+ * *success and *stat_failed are set to false.
+ */
+static void
+check_restored_file(const char *xlogpath, const char *xlogfname,
+					off_t expectedSize, bool *success, bool *stat_failed)
+{
+	struct stat stat_buf;
+
+	Assert(xlogpath != NULL);
+	Assert(xlogfname != NULL);
+	Assert(success != NULL);
+	Assert(stat_failed != NULL);
+
+	*success = false;
+	*stat_failed = false;
+
+	if (stat(xlogpath, &stat_buf) == 0)
+	{
+		if (expectedSize > 0 && stat_buf.st_size != expectedSize)
+		{
+			int			elevel;
+
+			/*
+			 * If we find a partial file in standby mode, we assume it's
+			 * because it's just being copied to the archive, and keep
+			 * trying.
+			 *
+			 * Otherwise treat a wrong-sized file as FATAL to ensure the
+			 * DBA would notice it, but is that too strong? We could try
+			 * to plow ahead with a local copy of the file ... but the
+			 * problem is that there probably isn't one, and we'd
+			 * incorrectly conclude we've reached the end of WAL and we're
+			 * done recovering ...
+			 */
+			if (StandbyMode && stat_buf.st_size < expectedSize)
+				elevel = DEBUG1;
+			else
+				elevel = FATAL;
+			ereport(elevel,
+					(errmsg("archive file \"%s\" has wrong size: %lld instead of %lld",
+							xlogfname,
+							(long long int) stat_buf.st_size,
+							(long long int) expectedSize)));
+		}
+		else
+		{
+			ereport(LOG,
+					(errmsg("restored log file \"%s\" from archive",
+							xlogfname)));
+			*success = true;
+		}
+	}
+	else
+	{
+		/* stat failed */
+		int			elevel = (errno == ENOENT) ? LOG : FATAL;
+
+		ereport(elevel,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", xlogpath),
+				 errdetail("restore_command returned a zero exit status, but stat() failed.")));
+
+		*stat_failed = true;
+	}
+}
+
 /*
  * Attempt to execute an external shell command during recovery.
  *
@@ -371,6 +512,67 @@ ExecuteRecoveryCommand(const char *command, const char *commandName, bool failOn
 	}
 }
 
+/*
+ * Attempt to execute an external library during recovery.
+ *
+ * 'libraryType' is the library to be executed.
+ *
+ * This is currently used for recovery_end_library and
+ * archive_cleanup_library.
+ */
+void
+ExecuteRecoveryLibrary(const char *libraryType)
+{
+	char		lastRestartPointFname[MAXPGPATH];
+	XLogSegNo	restartSegNo;
+	XLogRecPtr	restartRedoPtr;
+	TimeLineID	restartTli;
+	bool		ret;
+	char	   *libraryName;
+
+	Assert(libraryType);
+
+	/*
+	 * Calculate the archive file cutoff point for use during log shipping
+	 * replication. All files earlier than this point can be deleted from the
+	 * archive, though there is no requirement to do so.
+	 */
+	GetOldestRestartPoint(&restartRedoPtr, &restartTli);
+	XLByteToSeg(restartRedoPtr, restartSegNo, wal_segment_size);
+	XLogFileName(lastRestartPointFname, restartTli, restartSegNo,
+				 wal_segment_size);
+
+	if (strcmp(libraryType, "archive_cleanup_library") == 0)
+		libraryName = archiveCleanupLibrary;
+	else if (strcmp(libraryType, "recovery_end_library") == 0)
+		libraryName = recoveryEndLibrary;
+	else
+		elog(ERROR, "unknown library type");
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing %s library \"%s\" for file \"%s\"",
+							 libraryType, libraryName, lastRestartPointFname)));
+
+	/*
+	 * Call the library.
+	 *
+	 * Note that we do not catch any ERRORs (or worse) that the library
+	 * emits.  It is the responsibility of the library to do the necessary
+	 * error handling, memory management, etc.  If a library does ERROR (or
+	 * worse), it will bubble up and may cause unexpected behavior.
+	 */
+	if (strcmp(libraryType, "archive_cleanup_library") == 0)
+		ret = (*PG_archive_cleanup) (lastRestartPointFname);
+	else if (strcmp(libraryType, "recovery_end_library") == 0)
+		ret = (*PG_recovery_end) (lastRestartPointFname);
+	else
+		elog(ERROR, "unknown library type");
+
+	if (!ret)
+		ereport(WARNING,
+				(errmsg("executing %s library \"%s\" for file \"%s\" failed",
+						libraryType, libraryName, lastRestartPointFname)));
+}
 
 /*
  * A file was restored from the archive under a temporary filename (path),
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..ac6c35c4be 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -103,6 +103,8 @@ static bool pgarch_readyXlog(char *xlog);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static bool execute_archive_library(const char *pathname, const char *xlog);
+static bool execute_archive_command(const char *pathname, const char *xlog);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -358,11 +360,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if no command or library... */
+			if (!XLogArchiveCommandOrLibrarySet())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archive_command "
+								"and archive_library are not set")));
 				return;
 			}
 
@@ -443,22 +446,102 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes system(3) or PG_archive() to copy one archive file to wherever
+ * it should go.
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
+
+	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+
+	if (PG_archive != NULL)
+		return execute_archive_library(pathname, xlog);
+	else
+		return execute_archive_command(pathname, xlog);
+}
+
+/*
+ * execute_archive_library
+ *
+ * Invokes PG_archive() to copy one archive file to wherever it should go.
+ *
+ * Returns true if successful
+ */
+static bool
+execute_archive_library(const char *pathname, const char *xlog)
+{
+	char		activitymsg[MAXFNAMELEN + 16];
+	bool		ret;
+
+	Assert(PG_archive != NULL);
+	Assert(pathname != NULL);
+	Assert(xlog != NULL);
+
+	/*
+	 * Report that we are archiving a file.
+	 */
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive library \"%s\" for file \"%s\"",
+							 XLogArchiveLibrary, xlog)));
+	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
+	set_ps_display(activitymsg);
+
+	/*
+	 * Call the archive library.
+	 *
+	 * Note that we do not catch any ERRORs (or worse) that the archive
+	 * library emits.  It is the responsibility of the library to do the
+	 * necessary error handling, memory management, etc.  If a library does
+	 * ERROR (or worse), it will bubble up and cause the archiver to
+	 * restart.
+	 */
+	ret = (*PG_archive) (pathname, xlog);
+
+	/*
+	 * Report the success or failure of the archival attempt.
+	 */
+	if (ret)
+	{
+		ereport(DEBUG1,
+				(errmsg("archived write-ahead log file \"%s\"", xlog)));
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	}
+	else
+	{
+		ereport(LOG,
+				(errmsg("archive library \"%s\" failed for file \"%s\"",
+						XLogArchiveLibrary, xlog)));
+		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
+	}
+
+	set_ps_display(activitymsg);
+	return ret;
+}
+
+/*
+ * execute_archive_command
+ *
+ * Invokes system(3) to copy one archive file to wherever it should go.
+ *
+ * Returns true if successful
+ */
+static bool
+execute_archive_command(const char *pathname, const char *xlog)
+{
+	char		xlogarchcmd[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
 	char	   *dp;
 	char	   *endp;
 	const char *sp;
 	int			rc;
 
-	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+	Assert(XLogArchiveCommand != NULL && XLogArchiveCommand[0] != '\0');
+	Assert(pathname != NULL);
+	Assert(xlog != NULL);
 
 	/*
 	 * construct the command to be executed
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e2a76ba055..b975ca362b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1025,6 +1025,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	process_shared_preload_libraries();
 
+	/*
+	 * Process any backup libraries.
+	 */
+	process_backup_libraries();
+
 	/*
 	 * Initialize SSL library, if specified.
 	 */
@@ -5012,6 +5017,11 @@ SubPostmasterMain(int argc, char *argv[])
 	 */
 	process_shared_preload_libraries();
 
+	/*
+	 * Process any backup libraries.
+	 */
+	process_backup_libraries();
+
 	/* Run backend or appropriate child */
 	if (strcmp(argv[1], "--forkbackend") == 0)
 	{
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 96fd9d2268..0910e3b1b9 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -61,6 +61,7 @@ typedef struct df_files
 	ino_t		inode;			/* Inode number of file */
 #endif
 	void	   *handle;			/* a handle for pg_dl* functions */
+	bool		non_backup;		/* loaded as a non-backup library */
 	char		filename[FLEXIBLE_ARRAY_MEMBER];	/* Full pathname of file */
 } DynamicFileList;
 
@@ -76,6 +77,11 @@ static DynamicFileList *file_tail = NULL;
 
 char	   *Dynamic_library_path;
 
+PG_archive_t PG_archive = NULL;
+PG_restore_t PG_restore = NULL;
+PG_archive_cleanup_t PG_archive_cleanup = NULL;
+PG_recovery_end_t PG_recovery_end = NULL;
+
 static void *internal_load_library(const char *libname);
 static void incompatible_module_error(const char *libname,
 									  const Pg_magic_struct *module_magic_data) pg_attribute_noreturn();
@@ -85,6 +91,7 @@ static char *expand_dynamic_library_name(const char *name);
 static void check_restricted_library_name(const char *name);
 static char *substitute_libpath_macro(const char *name);
 static char *find_in_dynamic_libpath(const char *basename);
+static void init_library(const char *libname, void *handle);
 
 /* Magic structure that module needs to match to be accepted */
 static const Pg_magic_struct magic_data = PG_MODULE_MAGIC_DATA;
@@ -187,7 +194,6 @@ internal_load_library(const char *libname)
 	PGModuleMagicFunction magic_func;
 	char	   *load_error;
 	struct stat stat_buf;
-	PG_init_t	PG_init;
 
 	/*
 	 * Scan the list of loaded FILES to see if the file has been loaded.
@@ -281,12 +287,7 @@ internal_load_library(const char *libname)
 					 errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro.")));
 		}
 
-		/*
-		 * If the library has a _PG_init() function, call it.
-		 */
-		PG_init = (PG_init_t) dlsym(file_scanner->handle, "_PG_init");
-		if (PG_init)
-			(*PG_init) ();
+		init_library(libname, file_scanner->handle);
 
 		/* OK to link it into list */
 		if (file_list == NULL)
@@ -295,10 +296,95 @@ internal_load_library(const char *libname)
 			file_tail->next = file_scanner;
 		file_tail = file_scanner;
 	}
+	else if (!file_scanner->non_backup || process_backup_libraries_in_progress)
+	{
+		/*
+		 * If we are loading a backup library, we initialize the library
+		 * even if we previously loaded it.  This allows users to use
+		 * backup libraries for multiple reasons (e.g., the same library
+		 * can be specified in shared_preload_libraries, archive_library,
+		 * and restore_library).  Similarly, if we are loading a previously
+		 * loaded backup library as a "non-backup" library (e.g.,
+		 * session_preload_libraries), we want to initialize it again then,
+		 * too.
+		 */
+		init_library(libname, file_scanner->handle);
+	}
+
+	/*
+	 * Record whether this library is being loaded as a "non-backup"
+	 * library (e.g., session_preload_libraries).  If the same library is
+	 * used as a backup library, we want to call its _PG_init()
+	 * function (if one exists) again when initializing the backup
+	 * tooling.  Similarly, if a backup library is loaded as a
+	 * "non-backup" library, we want to call it's _PG_init() again when
+	 * reinitializing it.
+	 */
+	file_scanner->non_backup |= !process_backup_libraries_in_progress;
 
 	return file_scanner->handle;
 }
 
+/*
+ * init_library
+ *
+ * This function calls the library's _PG_init() function if it exists.  If
+ * we are loading a backup library, we look up the relevant functions and
+ * save them for later.
+ */
+static void
+init_library(const char *libname, void *handle)
+{
+	PG_init_t PG_init;
+
+	/*
+	 * If the library has a _PG_init() function, call it.
+	 */
+	PG_init = (PG_init_t) dlsym(handle, "_PG_init");
+	if (PG_init)
+		(*PG_init) ();
+
+	if (process_archive_library_in_progress)
+	{
+		PG_archive = (PG_archive_t) dlsym(handle, "_PG_archive");
+		if (PG_archive == NULL)
+			ereport(ERROR,
+					(errmsg("incompatible archive library \"%s\": missing "
+							"function \"_PG_archive()\"",
+							libname)));
+	}
+
+	if (process_restore_library_in_progress)
+	{
+		PG_restore = (PG_restore_t) dlsym(handle, "_PG_restore");
+		if (PG_restore == NULL)
+			ereport(ERROR,
+					(errmsg("incompatible restore library \"%s\": missing "
+							"function \"_PG_restore()\"",
+							libname)));
+	}
+
+	if (process_archive_cleanup_library_in_progress)
+	{
+		PG_archive_cleanup = (PG_archive_cleanup_t) dlsym(handle, "_PG_archive_cleanup");
+		if (PG_archive_cleanup == NULL)
+			ereport(ERROR,
+					(errmsg("incompatible archive cleanup library \"%s\": "
+							"missing function \"_PG_archive_cleanup()\"",
+							libname)));
+	}
+
+	if (process_recovery_end_library_in_progress)
+	{
+		PG_recovery_end = (PG_recovery_end_t) dlsym(handle, "_PG_recovery_end");
+		if (PG_recovery_end == NULL)
+			ereport(ERROR,
+					(errmsg("incompatible recovery end library \"%s\": "
+							"missing function \"_PG_recovery_end()\"",
+							libname)));
+	}
+}
+
 /*
  * Report a suitable error for an incompatible magic block.
  */
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..304cce00f7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -29,6 +29,7 @@
 #include <utime.h>
 
 #include "access/htup_details.h"
+#include "access/xlog.h"
 #include "catalog/pg_authid.h"
 #include "common/file_perm.h"
 #include "libpq/libpq.h"
@@ -1614,6 +1615,16 @@ char	   *local_preload_libraries_string = NULL;
 /* Flag telling that we are loading shared_preload_libraries */
 bool		process_shared_preload_libraries_in_progress = false;
 
+/*
+ * Flags telling that we are loading archive_library, restore_library,
+ * archive_cleanup_library, and recovery_end_library.
+ */
+bool		process_backup_libraries_in_progress = false;
+bool		process_archive_library_in_progress = false;
+bool		process_restore_library_in_progress = false;
+bool		process_archive_cleanup_library_in_progress = false;
+bool		process_recovery_end_library_in_progress = false;
+
 /*
  * load the shared libraries listed in 'libraries'
  *
@@ -1696,6 +1707,57 @@ process_session_preload_libraries(void)
 				   true);
 }
 
+/*
+ * process backup libraries
+ */
+void
+process_backup_libraries(void)
+{
+	process_backup_libraries_in_progress = true;
+
+	if (XLogArchiveLibrary && XLogArchiveLibrary[0] != '\0')
+	{
+		process_archive_library_in_progress = true;
+		load_file(XLogArchiveLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded archive library \"%s\"",
+								 XLogArchiveLibrary)));
+		process_archive_library_in_progress = false;
+	}
+
+	if (recoveryRestoreLibrary && recoveryRestoreLibrary[0] != '\0')
+	{
+		process_restore_library_in_progress = true;
+		load_file(recoveryRestoreLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded restore library \"%s\"",
+								 recoveryRestoreLibrary)));
+		process_restore_library_in_progress = false;
+	}
+
+	if (archiveCleanupLibrary && archiveCleanupLibrary[0] != '\0')
+	{
+		process_archive_cleanup_library_in_progress = true;
+		load_file(archiveCleanupLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded archive cleanup library \"%s\"",
+								 archiveCleanupLibrary)));
+		process_archive_cleanup_library_in_progress = false;
+	}
+
+	if (recoveryEndLibrary && recoveryEndLibrary[0] != '\0')
+	{
+		process_recovery_end_library_in_progress = true;
+		load_file(recoveryEndLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded recovery end library \"%s\"",
+								 recoveryEndLibrary)));
+		process_recovery_end_library_in_progress = false;
+	}
+
+	process_backup_libraries_in_progress = false;
+}
+
 void
 pg_bindtextdomain(const char *domain)
 {
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d2ce4a8450..1fd9ca5cbc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -234,6 +234,14 @@ static bool check_recovery_target_lsn(char **newval, void **extra, GucSource sou
 static void assign_recovery_target_lsn(const char *newval, void *extra);
 static bool check_primary_slot_name(char **newval, void **extra, GucSource source);
 static bool check_default_with_oids(bool *newval, void **extra, GucSource source);
+static bool check_archive_command(char **newval, void **extra, GucSource source);
+static bool check_archive_library(char **newval, void **extra, GucSource source);
+static bool check_restore_command(char **newval, void **extra, GucSource source);
+static bool check_restore_library(char **newval, void **extra, GucSource source);
+static bool check_archive_cleanup_command(char **newval, void **extra, GucSource source);
+static bool check_archive_cleanup_library(char **newval, void **extra, GucSource source);
+static bool check_recovery_end_command(char **newval, void **extra, GucSource source);
+static bool check_recovery_end_library(char **newval, void **extra, GucSource source);
 
 /* Private functions in guc-file.l that need to be called from guc.c */
 static ConfigVariable *ProcessConfigFileInternal(GucContext context,
@@ -3855,7 +3863,17 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&XLogArchiveCommand,
 		"",
-		NULL, NULL, show_archive_command
+		check_archive_command, NULL, show_archive_command
+	},
+
+	{
+		{"archive_library", PGC_POSTMASTER, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			NULL
+		},
+		&XLogArchiveLibrary,
+		"",
+		check_archive_library, NULL, NULL
 	},
 
 	{
@@ -3865,7 +3883,17 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&recoveryRestoreCommand,
 		"",
-		NULL, NULL, NULL
+		check_restore_command, NULL, NULL
+	},
+
+	{
+		{"restore_library", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+			gettext_noop("Sets the library that will be called to retrieve an archived WAL file."),
+			NULL
+		},
+		&recoveryRestoreLibrary,
+		"",
+		check_restore_library, NULL, NULL
 	},
 
 	{
@@ -3875,7 +3903,17 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&archiveCleanupCommand,
 		"",
-		NULL, NULL, NULL
+		check_archive_cleanup_command, NULL, NULL
+	},
+
+	{
+		{"archive_cleanup_library", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+			gettext_noop("Sets the library that will be executed at every restart point."),
+			NULL
+		},
+		&archiveCleanupLibrary,
+		"",
+		check_archive_cleanup_library, NULL, NULL
 	},
 
 	{
@@ -3885,7 +3923,17 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&recoveryEndCommand,
 		"",
-		NULL, NULL, NULL
+		check_recovery_end_command, NULL, NULL
+	},
+
+	{
+		{"recovery_end_library", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+			gettext_noop("Sets the library that will be executed once at the end of recovery."),
+			NULL
+		},
+		&recoveryEndLibrary,
+		"",
+		check_recovery_end_library, NULL, NULL
 	},
 
 	{
@@ -8948,7 +8996,8 @@ init_custom_variable(const char *name,
 	 * module might already have hooked into.
 	 */
 	if (context == PGC_POSTMASTER &&
-		!process_shared_preload_libraries_in_progress)
+		!process_shared_preload_libraries_in_progress &&
+		!process_backup_libraries_in_progress)
 		elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");
 
 	/*
@@ -12559,4 +12608,100 @@ check_default_with_oids(bool *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_archive_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		XLogArchiveLibrary && XLogArchiveLibrary[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"archive_library\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_archive_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		XLogArchiveCommand && XLogArchiveCommand[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"archive_command\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_restore_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		recoveryRestoreLibrary && recoveryRestoreLibrary[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"restore_library\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_restore_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		recoveryRestoreCommand && recoveryRestoreCommand[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"restore_command\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_archive_cleanup_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		archiveCleanupLibrary && archiveCleanupLibrary[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"archive_cleanup_library\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_archive_cleanup_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		archiveCleanupCommand && archiveCleanupCommand[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"archive_cleanup_command\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_recovery_end_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		recoveryEndLibrary && recoveryEndLibrary[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"recovery_end_library\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_recovery_end_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		recoveryEndCommand && recoveryEndCommand[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"recovery_end_command\" is set.");
+		return false;
+	}
+	return true;
+}
+
 #include "guc-file.c"
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..094b74050c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -71,6 +71,7 @@ extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
 extern int	wal_retrieve_retry_interval;
 extern char *XLogArchiveCommand;
+extern char *XLogArchiveLibrary;
 extern bool EnableHotStandby;
 extern bool fullPageWrites;
 extern bool wal_log_hints;
@@ -81,8 +82,11 @@ extern bool *wal_consistency_checking;
 extern char *wal_consistency_checking_string;
 extern bool log_checkpoints;
 extern char *recoveryRestoreCommand;
+extern char *recoveryRestoreLibrary;
 extern char *recoveryEndCommand;
+extern char *recoveryEndLibrary;
 extern char *archiveCleanupCommand;
+extern char *archiveCleanupLibrary;
 extern bool recoveryTargetInclusive;
 extern int	recoveryTargetAction;
 extern int	recovery_min_apply_delay;
@@ -157,7 +161,7 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
+#define XLogArchiveCommandOrLibrarySet() (XLogArchiveCommand[0] != '\0' || XLogArchiveLibrary[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c0da76cab4..9c6c3c5f06 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -332,5 +332,6 @@ extern bool ArchiveRecoveryRequested;
 extern bool InArchiveRecovery;
 extern bool StandbyMode;
 extern char *recoveryRestoreCommand;
+extern char *recoveryRestoreLibrary;
 
 #endif							/* XLOG_INTERNAL_H */
diff --git a/src/include/access/xlogarchive.h b/src/include/access/xlogarchive.h
index 3edd1a976c..4c5e63697c 100644
--- a/src/include/access/xlogarchive.h
+++ b/src/include/access/xlogarchive.h
@@ -22,6 +22,7 @@ extern bool RestoreArchivedFile(char *path, const char *xlogfname,
 								bool cleanupEnabled);
 extern void ExecuteRecoveryCommand(const char *command, const char *commandName,
 								   bool failOnSignal);
+extern void ExecuteRecoveryLibrary(const char *libraryType);
 extern void KeepFileRestoredFromArchive(const char *path, const char *xlogfname);
 extern void XLogArchiveNotify(const char *xlog);
 extern void XLogArchiveNotifySeg(XLogSegNo segno);
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ab7b85c86e..d2f28d084a 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -718,6 +718,16 @@ extern bool CheckFunctionValidatorAccess(Oid validatorOid, Oid functionOid);
  */
 extern char *Dynamic_library_path;
 
+typedef bool (*PG_archive_t) (const char *path, const char *file);
+extern PG_archive_t PG_archive;
+typedef bool (*PG_restore_t) (const char *path, const char *file,
+							  const char *last_restartpoint_file);
+extern PG_restore_t PG_restore;
+typedef bool (*PG_archive_cleanup_t) (const char *last_restartpoint_file);
+extern PG_archive_cleanup_t PG_archive_cleanup;
+typedef bool (*PG_recovery_end_t) (const char *last_restartpoint_file);
+extern PG_recovery_end_t PG_recovery_end;
+
 extern void *load_external_function(const char *filename, const char *funcname,
 									bool signalNotFound, void **filehandle);
 extern void *lookup_external_function(void *filehandle, const char *funcname);
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90a3016065..2f68d7c1dc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -464,6 +464,11 @@ extern void BaseInit(void);
 /* in utils/init/miscinit.c */
 extern bool IgnoreSystemIndexes;
 extern PGDLLIMPORT bool process_shared_preload_libraries_in_progress;
+extern PGDLLIMPORT bool process_backup_libraries_in_progress;
+extern PGDLLIMPORT bool process_archive_library_in_progress;
+extern PGDLLIMPORT bool process_restore_library_in_progress;
+extern PGDLLIMPORT bool process_archive_cleanup_library_in_progress;
+extern PGDLLIMPORT bool process_recovery_end_library_in_progress;
 extern char *session_preload_libraries_string;
 extern char *shared_preload_libraries_string;
 extern char *local_preload_libraries_string;
@@ -477,6 +482,7 @@ extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
+extern void process_backup_libraries(void);
 extern void pg_bindtextdomain(const char *domain);
 extern bool has_rolreplication(Oid roleid);
 
-- 
2.16.6
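The hook contract above (`PG_archive_t`, taking the full path and the file name and returning success) can be exercised outside the server, too. Below is a minimal POSIX C sketch of the copy-and-sync core that a `_PG_archive` implementation might perform. This is illustrative only: the helper name `archive_file` and the destination-directory handling are hypothetical, and all of the server-side glue (PG_MODULE_MAGIC, ereport error reporting, memory contexts) is deliberately omitted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the copy-and-sync step a _PG_archive
 * callback might perform: copy the segment into a destination
 * directory and fsync it before reporting success, as the sample
 * archive_library in the patch is said to do.
 */
static bool
archive_file(const char *path, const char *destdir, const char *file)
{
	char		dest[1024];
	char		buf[8192];
	ssize_t		nread;
	int			srcfd, dstfd;

	snprintf(dest, sizeof(dest), "%s/%s", destdir, file);

	srcfd = open(path, O_RDONLY);
	if (srcfd < 0)
		return false;

	/* O_EXCL guards against silently overwriting an archived segment */
	dstfd = open(dest, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (dstfd < 0)
	{
		close(srcfd);
		return false;
	}

	while ((nread = read(srcfd, buf, sizeof(buf))) > 0)
	{
		if (write(dstfd, buf, nread) != nread)
		{
			close(srcfd);
			close(dstfd);
			return false;
		}
	}

	/* make the copy durable before telling the archiver we succeeded */
	if (nread < 0 || fsync(dstfd) != 0)
	{
		close(srcfd);
		close(dstfd);
		return false;
	}

	close(srcfd);
	close(dstfd);
	return true;
}
```

Returning false here maps to the archiver's retry path; anything stronger (ERROR and up) would, per the patch's comments, bubble up and restart the archiver.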

#24Bossart, Nathan
bossartn@amazon.com
In reply to: Andrey Borodin (#20)
Re: parallelizing the archiver

On 9/29/21, 9:49 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

I'm sure there are other ways to approach this, but I thought I'd give
it a try to see what was possible and to get the conversation started.

BTW I am also considering the background worker approach that was
mentioned upthread. My current thinking is that the backup extension
would define a special background worker that communicates with the
archiver via shared memory. As noted upthread, this would enable
extension authors to do whatever batching, parallelism, etc. that they
want, and it should also prevent failures from taking down the
archiver process. However, this approach might not make sense for
things like recovery_end_command that are only executed once. Maybe
it's okay to leave that one alone for now.

Nathan

#25Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Bossart, Nathan (#23)
Re: parallelizing the archiver

On 30 Sept 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote:

The attached patch is a first try at adding alternatives for
archive_command

Looks like an interesting alternative design.

I tested the sample archive_command in the docs against the sample
archive_library implementation in the patch, and I saw about a 50%
speedup. (The archive_library actually syncs the files to disk, too.)
This is similar to the improvement from batching.

Why test the sample against the sample? I think that if one tests this against a real archive tool that does the archive_status lookup and ready->done renaming, the results will be much different.

Of course, there are drawbacks to using an extension. Besides the
obvious added complexity of building an extension in C versus writing
a shell command, the patch disallows changing the libraries without
restarting the server. Also, the patch makes no effort to simplify
error handling, memory management, etc. This is left as an exercise
for the extension author.

I think the real problem with an extension is quite different from what's mentioned above.
There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh the tools that don't look into archive_status to parallelize archival.
And we are going to ask them to also add a C extension, without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library.

I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better - one that enables these tools to be more efficient, more reliable, etc. Either way we will create legacy code from scratch.

Thanks!

Best regards, Andrey Borodin.

#26Bossart, Nathan
bossartn@amazon.com
In reply to: Andrey Borodin (#25)
Re: parallelizing the archiver

On 10/1/21, 12:08 PM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote:

On 30 Sept 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote:

I tested the sample archive_command in the docs against the sample
archive_library implementation in the patch, and I saw about a 50%
speedup. (The archive_library actually syncs the files to disk, too.)
This is similar to the improvement from batching.

Why test the sample against the sample? I think that if one tests this against a real archive tool that does the archive_status lookup and ready->done renaming, the results will be much different.

My intent was to demonstrate the impact of reducing the amount of
overhead when archiving. I don't doubt that third party archive tools
can show improvements by doing batching/parallelism behind the scenes.

Of course, there are drawbacks to using an extension. Besides the
obvious added complexity of building an extension in C versus writing
a shell command, the patch disallows changing the libraries without
restarting the server. Also, the patch makes no effort to simplify
error handling, memory management, etc. This is left as an exercise
for the extension author.

I think the real problem with an extension is quite different from what's mentioned above.
There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh the tools that don't look into archive_status to parallelize archival.
And we are going to ask them to also add a C extension, without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library.

I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better - one that enables these tools to be more efficient, more reliable, etc. Either way we will create legacy code from scratch.

My proposal wouldn't require any changes to any of these utilities.
This design just adds a new mechanism that would allow end users to
set up archiving a different way with less overhead in hopes that it
will help them keep up. I suspect a lot of work has been put into the
archive tools you mentioned to make sure they can keep up with high
rates of WAL generation, so I'm skeptical that anything we do here
will really benefit them all that much. Ideally, we'd do something
that improves matters for everyone, though. I'm open to suggestions.

Nathan

#27Stephen Frost
sfrost@snowman.net
In reply to: Bossart, Nathan (#26)
Re: parallelizing the archiver

Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:

On 10/1/21, 12:08 PM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote:

On 30 Sept 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote:

Of course, there are drawbacks to using an extension. Besides the
obvious added complexity of building an extension in C versus writing
a shell command, the patch disallows changing the libraries without
restarting the server. Also, the patch makes no effort to simplify
error handling, memory management, etc. This is left as an exercise
for the extension author.

I think the real problem with an extension is quite different from what's mentioned above.
There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh the tools that don't look into archive_status to parallelize archival.
And we are going to ask them to also add a C extension, without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library.

I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better - one that enables these tools to be more efficient, more reliable, etc. Either way we will create legacy code from scratch.

My proposal wouldn't require any changes to any of these utilities.
This design just adds a new mechanism that would allow end users to
set up archiving a different way with less overhead in hopes that it
will help them keep up. I suspect a lot of work has been put into the
archive tools you mentioned to make sure they can keep up with high
rates of WAL generation, so I'm skeptical that anything we do here
will really benefit them all that much. Ideally, we'd do something
that improves matters for everyone, though. I'm open to suggestions.

This is something we've contemplated quite a bit, and the last thing
I'd want is a requirement to configure a whole bunch of additional
parameters to enable this. Why do we need so many new GUCs? I would
have thought we'd probably be able to get away with just having the
appropriate hooks and then telling folks to load the extension in
shared_preload_libraries..

As for the hooks themselves, I'd certainly hope that they'd be designed
to handle batches of WAL rather than individual ones as that's long been
one of the main issues with the existing archive command approach. I
appreciate that maybe that's less of an issue with a shared library but
it's still something to consider.

Admittedly, I haven't looked in depth at this patch set and am just
going off of the description of it provided in the thread, so perhaps
I missed something.

Thanks,

Stephen

#28Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#27)
Re: parallelizing the archiver

On 10/4/21, 7:21 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

This is something we've contemplated quite a bit, and the last thing
I'd want is a requirement to configure a whole bunch of additional
parameters to enable this. Why do we need so many new GUCs? I would
have thought we'd probably be able to get away with just having the
appropriate hooks and then telling folks to load the extension in
shared_preload_libraries..

That would certainly simplify my patch quite a bit. I'll do it this
way in the next revision.

As for the hooks themselves, I'd certainly hope that they'd be designed
to handle batches of WAL rather than individual ones as that's long been
one of the main issues with the existing archive command approach. I
appreciate that maybe that's less of an issue with a shared library but
it's still something to consider.

Will do. This seems like it should be easier with the hook because we
can provide a way to return which files were successfully archived.

Nathan

#29Stephen Frost
sfrost@snowman.net
In reply to: Bossart, Nathan (#28)
Re: parallelizing the archiver

Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:

On 10/4/21, 7:21 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

This is something we've contemplated quite a bit, and the last thing
I'd want is a requirement to configure a whole bunch of additional
parameters to enable this. Why do we need so many new GUCs? I would
have thought we'd probably be able to get away with just having the
appropriate hooks and then telling folks to load the extension in
shared_preload_libraries..

That would certainly simplify my patch quite a bit. I'll do it this
way in the next revision.

As for the hooks themselves, I'd certainly hope that they'd be designed
to handle batches of WAL rather than individual ones as that's long been
one of the main issues with the existing archive command approach. I
appreciate that maybe that's less of an issue with a shared library but
it's still something to consider.

Will do. This seems like it should be easier with the hook because we
can provide a way to return which files were successfully archived.

It's also been discussed, at least around the water cooler (as it were
in pandemic times, aka our internal Slack channels), that the existing
archive command might be reimplemented as an extension using these. Not
sure if that's really necessary but it was a thought. In any case,
thanks for working on this!

Stephen

#30Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#29)
Re: parallelizing the archiver

On 10/4/21, 8:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

It's also been discussed, at least around the water cooler (as it were
in pandemic times, aka our internal Slack channels), that the existing
archive command might be reimplemented as an extension using these. Not
sure if that's really necessary but it was a thought. In any case,
thanks for working on this!

Interesting. I like the idea of having one code path for everything
instead of branching for the hook and non-hook paths. Thanks for
sharing your thoughts.

Nathan

#31Magnus Hagander
magnus@hagander.net
In reply to: Bossart, Nathan (#30)
Re: parallelizing the archiver

On Tue, Oct 5, 2021 at 5:32 AM Bossart, Nathan <bossartn@amazon.com> wrote:

On 10/4/21, 8:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

It's also been discussed, at least around the water cooler (as it were
in pandemic times, aka our internal Slack channels), that the existing
archive command might be reimplemented as an extension using these. Not
sure if that's really necessary but it was a thought. In any case,
thanks for working on this!

Interesting. I like the idea of having one code path for everything
instead of branching for the hook and non-hook paths. Thanks for
sharing your thoughts.

I remember having had this discussion a few times, mainly with Stephen
and David as well (but not on their internal Slack channels :P).

I definitely think that's the way to go. It gives a single path for
everything, which makes things simpler in the most critical parts. And
once you've picked a different implementation, you're completely rid of
the old one. And of course there's the good old idea that having an
extension already using the API is a good way to show that the API is in
a good place.

As much as I dislike our current interface in archive_command, and would
like to see it go away completely, I do believe we need to ship something
that has it - if nothing else then for backwards compatibility. But an
extension like this would also make it easier to eventually, down the road,
deprecate this solution.

Oh, and please put said implementation in a better place than contrib :)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#32Bossart, Nathan
bossartn@amazon.com
In reply to: Magnus Hagander (#31)
1 attachment(s)
Re: parallelizing the archiver

On 10/6/21, 1:34 PM, "Magnus Hagander" <magnus@hagander.net> wrote:

I definitely think that's the way to go. It gives a single path for everything, which makes things simpler in the most critical parts. And once you've picked a different implementation, you're completely rid of the old one. And of course there's the good old idea that having an extension already using the API is a good way to show that the API is in a good place.

As much as I dislike our current interface in archive_command, and would like to see it go away completely, I do believe we need to ship something that has it - if nothing else then for backwards compatibility. But an extension like this would also make it easier to eventually, down the road, deprecate this solution.

Oh, and please put said implementation in a better place than contrib :)

I've attached an attempt at moving the archive_command logic to its
own module and replacing it with a hook. This was actually pretty
straightforward.

I think the biggest question is where to put the archive_command
module, which I've called shell_archive. The only existing directory
that looked to me like it might work is src/test/modules. It might be
rather bold to relegate this functionality to a test module so
quickly, but on the other hand, perhaps it's the right thing to do
given we intend to deprecate it in the future. I'm curious what
others think about this.

I'm still working on the documentation updates, which are quite
extensive. I haven't included any of those in the patch yet.

Nathan

Attachments:

v2-0001-Replace-archive_command-with-a-hook.patch (application/octet-stream)
From 63c0567b47fb59617c76c7aa989c89668bdd46be Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 18 Oct 2021 22:56:58 +0000
Subject: [PATCH v2 1/1] Replace archive_command with a hook.

---
 src/backend/access/transam/xlog.c                |   9 +-
 src/backend/postmaster/pgarch.c                  | 136 +++------------
 src/backend/utils/misc/guc.c                     |  22 +--
 src/backend/utils/misc/postgresql.conf.sample    |   4 -
 src/include/access/xlog.h                        |   2 -
 src/include/postmaster/pgarch.h                  |   3 +
 src/test/modules/Makefile                        |   1 +
 src/test/modules/shell_archive/Makefile          |  15 ++
 src/test/modules/shell_archive/shell_archive.c   | 207 +++++++++++++++++++++++
 src/test/perl/PostgresNode.pm                    |   3 +-
 src/test/recovery/t/020_archive_status.pl        |   6 +-
 src/test/recovery/t/025_stuck_on_old_timeline.pl |   2 +-
 12 files changed, 261 insertions(+), 149 deletions(-)
 create mode 100644 src/test/modules/shell_archive/Makefile
 create mode 100644 src/test/modules/shell_archive/shell_archive.c

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 62862255fc..7a8b8eff20 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -94,7 +94,6 @@ int			wal_keep_size_mb = 0;
 int			XLOGbuffers = -1;
 int			XLogArchiveTimeout = 0;
 int			XLogArchiveMode = ARCHIVE_MODE_OFF;
-char	   *XLogArchiveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
@@ -7898,7 +7897,7 @@ StartupXLOG(void)
 	 * assign it a unique new ID.  Even if we ran to the end, modifying the
 	 * current last segment is problematic because it may result in trying to
 	 * overwrite an already-archived copy of that segment, and we encourage
-	 * DBAs to make their archive_commands reject that.  We can dodge the
+	 * DBAs to make their archive_hooks reject that.  We can dodge the
 	 * problem by making the new active segment have a new timeline ID.
 	 *
 	 * In a normal crash recovery, we can just extend the timeline we were in.
@@ -8777,7 +8776,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive() && archive_hook != NULL)
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
@@ -11791,7 +11790,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	 * property of the WAL files ensures any earlier WAL files are safely
 	 * archived as well.
 	 *
-	 * We wait forever, since archive_command is supposed to work and we
+	 * We wait forever, since archive_hook is supposed to work and we
 	 * assume the admin wanted his backup to work completely. If you don't
 	 * wish to wait, then either waitforarchive should be passed in as false,
 	 * or you can set statement_timeout.  Also, some notices are issued to
@@ -11836,7 +11835,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 				ereport(WARNING,
 						(errmsg("still waiting for all required WAL segments to be archived (%d seconds elapsed)",
 								waits),
-						 errhint("Check that your archive_command is executing properly.  "
+						 errhint("Check that your archive_hook is executing properly.  "
 								 "You can safely cancel this backup, "
 								 "but the database backup will not be usable without all the WAL segments.")));
 			}
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..4041bcf9d5 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -36,7 +36,6 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +77,7 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+archive_hook_type archive_hook = NULL;
 
 /* ----------
  * Local data
@@ -358,11 +358,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if no hook ... */
+			if (archive_hook == NULL)
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archive_hook is not set")));
 				return;
 			}
 
@@ -443,136 +443,48 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_hook to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		success = false;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
 	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
+			(errmsg_internal("executing archive_hook for \"%s\"",
+							 xlog)));
 
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
+	/* Call the archiving hook */
+	(*archive_hook) (xlog, pathname, &success);
 
+	/* Report whether the hook succeeded and update activity in PS display */
+	if (success)
+	{
+		ereport(DEBUG1,
+				(errmsg("archived write-ahead log file \"%s\"",
+						xlog)));
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	}
+	else
+	{
+		ereport(LOG,
+				(errmsg("failed to archive write-ahead log file \"%s\"",
+						xlog)));
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
 	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return success;
 }
 
 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d2ce4a8450..a1744a94bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -192,7 +192,6 @@ static bool check_canonical_path(char **newval, void **extra, GucSource source);
 static bool check_timezone_abbreviations(char **newval, void **extra, GucSource source);
 static void assign_timezone_abbreviations(const char *newval, void *extra);
 static void pg_timezone_abbrev_initialize(void);
-static const char *show_archive_command(void);
 static void assign_tcp_keepalives_idle(int newval, void *extra);
 static void assign_tcp_keepalives_interval(int newval, void *extra);
 static void assign_tcp_keepalives_count(int newval, void *extra);
@@ -3848,16 +3847,6 @@ static struct config_real ConfigureNamesReal[] =
 
 static struct config_string ConfigureNamesString[] =
 {
-	{
-		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
-			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
-		},
-		&XLogArchiveCommand,
-		"",
-		NULL, NULL, show_archive_command
-	},
-
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
@@ -4802,7 +4791,7 @@ static struct config_enum ConfigureNamesEnum[] =
 
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
-			gettext_noop("Allows archiving of WAL files using archive_command."),
+			gettext_noop("Allows archiving of WAL files."),
 			NULL
 		},
 		&XLogArchiveMode,
@@ -11914,15 +11903,6 @@ pg_timezone_abbrev_initialize(void)
 					PGC_POSTMASTER, PGC_S_DYNAMIC_DEFAULT);
 }
 
-static const char *
-show_archive_command(void)
-{
-	if (XLogArchivingActive())
-		return XLogArchiveCommand;
-	else
-		return "(disabled)";
-}
-
 static void
 assign_tcp_keepalives_idle(int newval, void *extra)
 {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3fe9a53cb3..a466eedf9b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,10 +245,6 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
-#archive_command = ''		# command to use to archive a logfile segment
-				# placeholders: %p = path of file to archive
-				#               %f = file name only
-				# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
 #archive_timeout = 0		# force a logfile segment switch after this
 				# number of seconds; 0 disables
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..3724d8cd83 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -70,7 +70,6 @@ extern int	max_slot_wal_keep_size_mb;
 extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
 extern int	wal_retrieve_retry_interval;
-extern char *XLogArchiveCommand;
 extern bool EnableHotStandby;
 extern bool fullPageWrites;
 extern bool wal_log_hints;
@@ -157,7 +156,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..753628ed7f 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,7 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+typedef void (*archive_hook_type) (const char *file, const char *path, bool *success);
+extern PGDLLIMPORT archive_hook_type archive_hook;
+
 #endif							/* _PGARCH_H */
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..e924c9c7ac 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		  dummy_seclabel \
 		  libpq_pipeline \
 		  plsample \
+		  shell_archive \
 		  snapshot_too_old \
 		  spgist_name_ops \
 		  test_bloomfilter \
diff --git a/src/test/modules/shell_archive/Makefile b/src/test/modules/shell_archive/Makefile
new file mode 100644
index 0000000000..a48d227c9f
--- /dev/null
+++ b/src/test/modules/shell_archive/Makefile
@@ -0,0 +1,15 @@
+# src/test/modules/shell_archive/Makefile
+
+MODULES = shell_archive
+PGFILEDESC = "shell_archive - archive module example"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/shell_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/shell_archive/shell_archive.c b/src/test/modules/shell_archive/shell_archive.c
new file mode 100644
index 0000000000..ba8c63b5f4
--- /dev/null
+++ b/src/test/modules/shell_archive/shell_archive.c
@@ -0,0 +1,207 @@
+/* -------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *		Sample archive module code that uses a user-specified shell command
+ *		to copy write-ahead log files.  This strategy has many shortcomings
+ *		and should ideally not be used in production settings.  However,
+ *		prior to v15, the functionality provided in this module was the only
+ *		archiving mechanism available in core, so it is provided here for
+ *		backward compatibility.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/shell_archive/shell_archive.c
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xlog.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void		_PG_init(void);
+void		_PG_fini(void);
+
+static const char *show_archive_command(void);
+static void shell_archive_hook(const char *file, const char *path,
+							   bool *success);
+
+static char *ArchiveCommand = NULL;
+
+void
+_PG_init(void)
+{
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	DefineCustomStringVariable("shell_archive.archive_command",
+							   "Sets the shell command that will be called to archive a WAL file.",
+							   NULL,
+							   &ArchiveCommand,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   NULL, NULL, show_archive_command);
+
+	EmitWarningsOnPlaceholders("shell_archive");
+
+	if (archive_hook != NULL)
+		ereport(ERROR,
+				(errmsg("archive_hook already set"),
+				 errdetail("Only one archive_hook can be loaded via "
+						   "shared_preload_libraries.")));
+
+	archive_hook = shell_archive_hook;
+}
+
+void
+_PG_fini(void)
+{
+	archive_hook = NULL;
+}
+
+static const char *
+show_archive_command(void)
+{
+	if (XLogArchivingActive())
+		return ArchiveCommand;
+	else
+		return "(disabled)";
+}
+
+/*
+ * shell_archive_hook
+ *
+ * Invokes system(3) to copy one archive file to wherever it should go
+ */
+static void
+shell_archive_hook(const char *file, const char *path, bool *success)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+	Assert(success != NULL);
+
+	if (ArchiveCommand == NULL || ArchiveCommand[0] == '\0')
+	{
+		ereport(WARNING,
+				(errmsg("\"shell_archive.archive_command\" is not set")));
+		*success = false;
+		return;
+	}
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = ArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		*success = false;
+		return;
+	}
+
+	*success = true;
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index ba80baf091..7d3ed412d0 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1107,7 +1107,8 @@ sub enable_archiving
 	$self->append_conf(
 		'postgresql.conf', qq(
 archive_mode = on
-archive_command = '$copy_command'
+shared_preload_libraries = 'shell_archive'
+shell_archive.archive_command = '$copy_command'
 ));
 	return;
 }
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index cea65735a3..4dfeae3a66 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -32,7 +32,7 @@ my $incorrect_command =
   : qq{cp "%p_does_not_exist" "%f_does_not_exist"};
 $primary->safe_psql(
 	'postgres', qq{
-    ALTER SYSTEM SET archive_command TO '$incorrect_command';
+    ALTER SYSTEM SET shell_archive.archive_command TO '$incorrect_command';
     SELECT pg_reload_conf();
 });
 
@@ -90,7 +90,7 @@ ok( -f "$primary_data/$segment_path_1_ready",
 # Allow WAL archiving again and wait for a success.
 $primary->safe_psql(
 	'postgres', q{
-	ALTER SYSTEM RESET archive_command;
+	ALTER SYSTEM RESET shell_archive.archive_command;
 	SELECT pg_reload_conf();
 });
 
@@ -212,7 +212,7 @@ ok( -f "$standby2_data/$segment_path_1_ready",
 # Allow WAL archiving again, and wait for the segments to be archived.
 $standby2->safe_psql(
 	'postgres', q{
-	ALTER SYSTEM RESET archive_command;
+	ALTER SYSTEM RESET shell_archive.archive_command;
 	SELECT pg_reload_conf();
 });
 $standby2->poll_query_until('postgres',
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
index 00ee9fcaed..819995bcac 100644
--- a/src/test/recovery/t/025_stuck_on_old_timeline.pl
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -34,7 +34,7 @@ my $archivedir_primary = $node_primary->archive_dir;
 $archivedir_primary =~ s!\\!/!g if $TestLib::windows_os;
 $node_primary->append_conf(
 	'postgresql.conf', qq(
-archive_command = '"$perlbin" "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+shell_archive.archive_command = '"$perlbin" "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
 wal_keep_size=128MB
 ));
 # Make sure that Msys perl doesn't complain about difficulty in setting locale
-- 
2.16.6

#33Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#32)
Re: parallelizing the archiver

On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan <bossartn@amazon.com> wrote:

I think the biggest question is where to put the archive_command
module, which I've called shell_archive. The only existing directory
that looked to me like it might work is src/test/modules. It might be
rather bold to relegate this functionality to a test module so
quickly, but on the other hand, perhaps it's the right thing to do
given we intend to deprecate it in the future. I'm curious what
others think about this.

I don't see that as being a viable path forward based on my customer
interactions working here at EDB.

I am not quite sure why we wouldn't just compile the functions into
the server. Function pointers can point to core functions as surely
as loadable modules. The present design isn't too congenial to that
because it's relying on the shared library loading mechanism to wire
the thing in place - but there's no reason it has to be that way.
Logical decoding plugins don't work that way, for example. We could
still have a GUC, say call it archive_method, that selects the module
-- with 'shell' being a builtin method, and others being loadable as
modules. If you set archive_method='shell' then you enable this
module, and it has its own GUC, say call it archive_command, to
configure the behavior.

An advantage of this approach is that it's perfectly
backward-compatible. I understand that archive_command is a hateful
thing to many people here, but software has to serve the user base,
not just the developers. Lots of people use archive_command and rely
on it -- and are not interested in installing yet another piece of
out-of-core software to do what $OTHERDB has built in.

--
Robert Haas
EDB: http://www.enterprisedb.com

#34David Steele
david@pgmasters.net
In reply to: Robert Haas (#33)
Re: parallelizing the archiver

On 10/19/21 8:50 AM, Robert Haas wrote:

On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan <bossartn@amazon.com> wrote:

I think the biggest question is where to put the archive_command
module, which I've called shell_archive. The only existing directory
that looked to me like it might work is src/test/modules. It might be
rather bold to relegate this functionality to a test module so
quickly, but on the other hand, perhaps it's the right thing to do
given we intend to deprecate it in the future. I'm curious what
others think about this.

I don't see that as being a viable path forward based on my customer
interactions working here at EDB.

I am not quite sure why we wouldn't just compile the functions into
the server. Function pointers can point to core functions as surely
as loadable modules. The present design isn't too congenial to that
because it's relying on the shared library loading mechanism to wire
the thing in place - but there's no reason it has to be that way.
Logical decoding plugins don't work that way, for example. We could
still have a GUC, say call it archive_method, that selects the module
-- with 'shell' being a builtin method, and others being loadable as
modules. If you set archive_method='shell' then you enable this
module, and it has its own GUC, say call it archive_command, to
configure the behavior.

An advantage of this approach is that it's perfectly
backward-compatible. I understand that archive_command is a hateful
thing to many people here, but software has to serve the user base,
not just the developers. Lots of people use archive_command and rely
on it -- and are not interested in installing yet another piece of
out-of-core software to do what $OTHERDB has built in.

+1 to all of this, certainly for the time being. The archive_command
mechanism is not great, but it is simple, and this part is not really
what makes writing a good archive command hard.

I had also originally envisioned this as a default extension in core, but
having the default 'shell' method built-in is certainly simpler.

Regards,
--
-David
david@pgmasters.net

#35Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#33)
Re: parallelizing the archiver

On Tue, Oct 19, 2021 at 2:50 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan <bossartn@amazon.com>
wrote:

I think the biggest question is where to put the archive_command
module, which I've called shell_archive. The only existing directory
that looked to me like it might work is src/test/modules. It might be
rather bold to relegate this functionality to a test module so
quickly, but on the other hand, perhaps it's the right thing to do
given we intend to deprecate it in the future. I'm curious what
others think about this.

I don't see that as being a viable path forward based on my customer
interactions working here at EDB.

I am not quite sure why we wouldn't just compile the functions into
the server. Functions pointers can point to core functions as surely
as loadable modules. The present design isn't too congenial to that
because it's relying on the shared library loading mechanism to wire
the thing in place - but there's no reason it has to be that way.
Logical decoding plugins don't work that way, for example. We could
still have a GUC, say call it archive_method, that selects the module
-- with 'shell' being a builtin method, and others being loadable as
modules. If you set archive_method='shell' then you enable this
module, and it has its own GUC, say call it archive_command, to
configure the behavior.

Yeah, seems reasonable. It wouldn't serve as well as an example to
developers, but then it's probably not the "loadable module" part of
building it that people need examples of. So as long as it's using the same
internal APIs and just happens to be compiled in by default, I see no
problem with that.

But, is logical decoding really that great an example? I mean, we build
pgoutput.so as a library, we don't provide it compiled-in. So we could
build the "shell archiver" based on that pattern, in which case we should
create a postmaster/shell_archiver directory or something like that?

It should definitely not go under "test".

An advantage of this approach is that it's perfectly

backward-compatible. I understand that archive_command is a hateful
thing to many people here, but software has to serve the user base,
not just the developers. Lots of people use archive_command and rely
on it -- and are not interested in installing yet another piece of
out-of-core software to do what $OTHERDB has built in.

Backwards compatibility is definitely a must, I'd say. Regardless of
exactly how the backwards-compatible plugin is shipped, it should be what's
turned on by default.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#36Bossart, Nathan
bossartn@amazon.com
In reply to: Magnus Hagander (#35)
Re: parallelizing the archiver

On 10/19/21, 6:39 AM, "David Steele" <david@pgmasters.net> wrote:

On 10/19/21 8:50 AM, Robert Haas wrote:

I am not quite sure why we wouldn't just compile the functions into
the server. Functions pointers can point to core functions as surely
as loadable modules. The present design isn't too congenial to that
because it's relying on the shared library loading mechanism to wire
the thing in place - but there's no reason it has to be that way.
Logical decoding plugins don't work that way, for example. We could
still have a GUC, say call it archive_method, that selects the module
-- with 'shell' being a builtin method, and others being loadable as
modules. If you set archive_method='shell' then you enable this
module, and it has its own GUC, say call it archive_command, to
configure the behavior.

An advantage of this approach is that it's perfectly
backward-compatible. I understand that archive_command is a hateful
thing to many people here, but software has to serve the user base,
not just the developers. Lots of people use archive_command and rely
on it -- and are not interested in installing yet another piece of
out-of-core software to do what $OTHERDB has built in.

+1 to all of this, certainly for the time being. The archive_command
mechanism is not great, but it is simple, and this part is not really
what makes writing a good archive command hard.

I had also originally envisioned this as a default extension in core, but
having the default 'shell' method built-in is certainly simpler.

I have no problem building it this way. It's certainly better for
backward compatibility, which I think everyone here feels is
important.

Robert's proposed design is a bit more like my original proof-of-
concept [0]. There, I added an archive_library GUC which was
basically an extension of shared_preload_libraries (which creates some
interesting problems in the library loading logic). You could only
set one of archive_command or archive_library at any given time. When
the archive_library was set, we ran that library's _PG_init() just
like we do for any other library, and then we set the archiver
function pointer to the library's _PG_archive() function.

IIUC the main difference between this design and what Robert proposes
is that we'd also move the existing archive_command stuff somewhere
else and then access it via the archiver function pointer. I think
that is clearly better than branching based on whether archive_command
or archive_library is set. (BTW I'm not wedded to these GUCs. If
folks would rather create something like the archive_method GUC, I
think that would work just as well.)

My original proof-of-concept also attempted to handle a bunch of other
shell command GUCs, but perhaps I'd better keep this focused on
archive_command for now. What we do here could serve as an example of
how to adjust the other shell command GUCs later on. I'll go ahead
and rework my patch to look more like what is being discussed here,
although I expect the exact design for the interface will continue to
evolve based on the feedback in this thread.

Nathan

[0]: /messages/by-id/E9035E94-EC76-436E-B6C9-1C03FBD8EF54@amazon.com

#37Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#35)
Re: parallelizing the archiver

On Tue, Oct 19, 2021 at 10:19 AM Magnus Hagander <magnus@hagander.net> wrote:

But, is logical decoding really that great an example? I mean, we build pgoutput.so as a library, we don't provide it compiled-in. So we could build the "shell archiver" based on that pattern, in which case we should create a postmaster/shell_archiver directory or something like that?

Well, I guess you could also use parallel contexts as an example.
There, the core facilities that most people will use are baked into
the server, but you can provide your own in an extension and the
parallel context stuff will happily call it for you if you so request.

I don't think the details here are too important. I'm just saying that
not everything needs to depend on _PG_init() as a way of bootstrapping
itself. TBH, if I ran the zoo and also had infinite time to tinker
with stuff like this, I'd probably make a pass through the hooks we
already have and try to refactor as many of them as possible to use
some mechanism other than _PG_init() to bootstrap themselves. That
mechanism actually sucks. When we use other mechanisms -- like a
language "C" function that knows the shared object name and function
name -- then load is triggered when it's needed, and the user gets the
behavior they want. Similarly with logical decoding and FDWs -- you,
as the user, say that you want this or that kind of logical decoding
or FDW or C function or whatever -- and then the system either notices
that it's already loaded and does what you want, or notices that it's
not loaded and loads it, and then does what you want.

But when the bootstrapping mechanism is _PG_init(), then the user has
got to make sure the library is loaded at the correct time. They have
to know whether it should go into shared_preload_libraries or whether
it should be put into one of the other various GUCs or if it can be
loaded on the fly with LOAD. If they don't load it in the right way,
or if it doesn't get loaded at all, well then probably it just
silently doesn't work. Plus there can be weird cases if it gets loaded
into some backends but not others and things like that.

And here we seem to have an opportunity to improve the interface by
not depending on it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#38Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#35)
Re: parallelizing the archiver

Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:

Backwards compatibility is definitely a must, I'd say. Regardless of
exactly how the backwards-compatible plugin is shipped, it should be what's
turned on by default.

I keep seeing this thrown around and I don't quite get why we feel this
is the case. I'm not completely against trying to maintain backwards
compatibility, but at the same time, we just went through changing quite
a bit around in v12 with the restore command and that's the other half
of this. Why are we so concerned about backwards compatibility here
when there was hardly any complaint raised about breaking it in the
restore case?

If maintaining compatibility makes this a lot more difficult or ugly,
then I'm against doing so. I don't know that to be the case, none of
the proposed approaches really sound all that bad to me, but I certainly
don't think we should be entirely avoiding the idea of breaking
backwards compatibility here. We literally just did that and while
there's been some noise about it, it's hardly risen to the level of
being "something we should never, ever, even consider doing again" as
seems to be implied on this thread.

For those who might argue that maintaining compatibility for archive
command is somehow more important than for restore command- allow me to
save you the trouble and just let you know that I don't buy off on such
an argument. If anything, it should be the opposite. You back up your
database all the time and you're likely to see much more quickly if that
stops working. Database restores, on the other hand, are nearly always
done in times of great stress and when you want things to be very clear
and easy to follow and for everything to 'just work'.

Thanks,

Stephen

#39Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#38)
2 attachment(s)
Re: parallelizing the archiver

On 10/19/21, 9:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

My original proof-of-concept also attempted to handle a bunch of other
shell command GUCs, but perhaps I'd better keep this focused on
archive_command for now. What we do here could serve as an example of
how to adjust the other shell command GUCs later on. I'll go ahead
and rework my patch to look more like what is being discussed here,
although I expect the exact design for the interface will continue to
evolve based on the feedback in this thread.

Alright, I reworked the patch a bit to maintain backward
compatibility. My initial intent for 0001 was to just do a clean
refactor to move the shell archiving stuff to its own file. However,
after I did that, I realized that adding the hook wouldn't be too much
more work, so I did that as well. This seems to be enough to support
custom archiving modules. I included a basic example of such a module
in 0002. 0002 is included primarily for demonstration purposes.

I do wonder if there are some further enhancements we should make to
the archiving module interface. With 0001 applied, archive_command is
silently ignored if you've preloaded a library that uses the hook.
There's no way to indicate that you actually want to use
archive_command or that you want to use a specific library as the
archive library. On the other hand, just adding the hook keeps things
simple, and it doesn't preclude future improvements in this area.

Nathan

Attachments:

v3-0001-Move-logic-for-archiving-via-shell-to-its-own-fil.patch (application/octet-stream)
From c6b448bed60ada06b9a341e59dac501b43ceafe1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 20 Oct 2021 21:42:54 +0000
Subject: [PATCH v3 1/2] Move logic for archiving via shell to its own file.

---
 src/backend/access/transam/xlog.c      |   2 +-
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 138 ++++-----------------------------
 src/backend/postmaster/shell_archive.c | 137 ++++++++++++++++++++++++++++++++
 src/include/access/xlog.h              |  11 ++-
 src/include/postmaster/pgarch.h        |   6 ++
 6 files changed, 172 insertions(+), 123 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 62862255fc..b0bb3d633e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8777,7 +8777,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive() && XLogArchivingConfigured())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..69e23e286f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,12 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+/*
+ * PG_archive can be overwritten by modules to define custom archiving logic.
+ * By default, we use archive_command.
+ */
+PG_archive_t PG_archive = shell_archive;
+
 
 /* ----------
  * Local data
@@ -358,11 +358,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!XLogArchivingConfigured())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +443,32 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes PG_archive() to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
-
-	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+	bool		ret;
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
+	Assert(PG_archive != NULL);
 
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
+	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	if ((ret = (*PG_archive) (xlog, pathname)))
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..1ced9d11dc
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,137 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default for PG_archive, but other modules may define their own custom
+ * archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+bool
+shell_archive(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..f38e383600 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -18,6 +18,7 @@
 #include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
 #include "nodes/pg_list.h"
+#include "postmaster/pgarch.h"
 #include "storage/fd.h"
 
 
@@ -157,7 +158,15 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
+
+/*
+ * Is WAL archiving configured?  For consistency with previous releases, this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly configured.
+ */
+#define XLogArchivingConfigured() \
+	(PG_archive && (PG_archive != shell_archive || XLogArchiveCommand[0] != '\0'))
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..59fefb3458 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,10 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+typedef bool (*PG_archive_t) (const char *file, const char *path);
+extern PGDLLIMPORT PG_archive_t PG_archive;
+
+/* in shell_archive.c */
+extern bool shell_archive(const char *file, const char *path);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6

v3-0002-Add-an-example-of-a-basic-archiving-module.patch (application/octet-stream)
From 02868284e06782d5dd2c51961727fe70f09f9b09 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 20 Oct 2021 21:51:08 +0000
Subject: [PATCH v3 2/2] Add an example of a basic archiving module.

---
 contrib/Makefile                      |   1 +
 contrib/basic_archive/Makefile        |  15 +++
 contrib/basic_archive/basic_archive.c | 218 ++++++++++++++++++++++++++++++++++
 3 files changed, 234 insertions(+)
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c

diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..aff834ebaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..ea6b460889
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,15 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..ed78c96e16
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,218 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void	_PG_init(void);
+bool	basic_archive(const char *file, const char *path);
+
+static char *archive_directory = NULL;
+
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static void copy_file(const char *src, const char *dst);
+
+void
+_PG_init(void)
+{
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via "
+						"\"shared_preload_libraries\"")));
+
+	if (PG_archive != shell_archive)
+		ereport(ERROR,
+				(errmsg("custom archive function already loaded by another module")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_POSTMASTER,
+							   GUC_NOT_IN_SAMPLE,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+
+	PG_archive = basic_archive;
+}
+
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+bool
+basic_archive(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	Assert(file);
+	Assert(path);
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	if (archive_directory == NULL || archive_directory[0] == '\0')
+	{
+		ereport(WARNING,
+				(errmsg("\"basic_archive.archive_directory\" not specified")));
+		return false;
+	}
+
+#define TEMP_FILE_NAME ("archtemp")
+
+	if (strlen(archive_directory) + Max(strlen(file), strlen(TEMP_FILE_NAME)) + 2 >= MAXPGPATH)
+	{
+		ereport(WARNING,
+				(errmsg("archive destination path too long")));
+		return false;
+	}
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, TEMP_FILE_NAME);
+
+	/*
+	 * First, check if the file has already been archived.  If it has,
+	 * just fail because something might be wrong.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists",
+						destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m",
+						destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m",
+						temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(path, temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final
+	 * destination.
+	 */
+	(void) durable_rename(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+static void
+copy_file(const char *src, const char *dst)
+{
+	int srcfd;
+	int dstfd;
+	int nbytes;
+	char *buf;
+
+#define COPY_BUF_SIZE (64 * 1024)
+
+	buf = palloc(COPY_BUF_SIZE);
+
+	srcfd = OpenTransientFile(src, O_RDONLY | PG_BINARY);
+	if (srcfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", src)));
+
+	dstfd = OpenTransientFile(dst, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	if (dstfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", dst)));
+
+	for (;;)
+	{
+		nbytes = read(srcfd, buf, COPY_BUF_SIZE);
+		if (nbytes < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", src)));
+
+		if (nbytes == 0)
+			break;
+
+		errno = 0;
+		if ((int) write(dstfd, buf, nbytes) != nbytes)
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m", dst)));
+		}
+	}
+
+	if (CloseTransientFile(dstfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", dst)));
+
+	if (CloseTransientFile(srcfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", src)));
+
+	pfree(buf);
+}
-- 
2.16.6

#40Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#38)
2 attachment(s)
Re: parallelizing the archiver

On 10/20/21, 3:23 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

Alright, I reworked the patch a bit to maintain backward
compatibility. My initial intent for 0001 was to just do a clean
refactor to move the shell archiving stuff to its own file. However,
after I did that, I realized that adding the hook wouldn't be too much
more work, so I did that as well. This seems to be enough to support
custom archiving modules. I included a basic example of such a module
in 0002. 0002 is included primarily for demonstration purposes.

It looks like the FreeBSD build is failing because sys/wait.h is
missing. Here is an attempt at fixing that.

Nathan

Attachments:

v4-0002-Add-an-example-of-a-basic-archiving-module.patch (application/octet-stream)
From 853c54d86dcaeca9ad6496fa84323b0982741227 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 20 Oct 2021 21:51:08 +0000
Subject: [PATCH v4 2/2] Add an example of a basic archiving module.

---
 contrib/Makefile                      |   1 +
 contrib/basic_archive/Makefile        |  15 +++
 contrib/basic_archive/basic_archive.c | 218 ++++++++++++++++++++++++++++++++++
 3 files changed, 234 insertions(+)
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c

diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..aff834ebaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..ea6b460889
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,15 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..ed78c96e16
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,218 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void	_PG_init(void);
+bool	basic_archive(const char *file, const char *path);
+
+static char *archive_directory = NULL;
+
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static void copy_file(const char *src, const char *dst);
+
+void
+_PG_init(void)
+{
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via "
+						"\"shared_preload_libraries\"")));
+
+	if (PG_archive != shell_archive)
+		ereport(ERROR,
+				(errmsg("custom archive function already loaded by another module")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_POSTMASTER,
+							   GUC_NOT_IN_SAMPLE,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+
+	PG_archive = basic_archive;
+}
+
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+bool
+basic_archive(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	Assert(file);
+	Assert(path);
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	if (archive_directory == NULL || archive_directory[0] == '\0')
+	{
+		ereport(WARNING,
+				(errmsg("\"basic_archive.archive_directory\" not specified")));
+		return false;
+	}
+
+#define TEMP_FILE_NAME ("archtemp")
+
+	if (strlen(archive_directory) + Max(strlen(file), strlen(TEMP_FILE_NAME)) + 2 >= MAXPGPATH)
+	{
+		ereport(WARNING,
+				(errmsg("archive destination path too long")));
+		return false;
+	}
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, TEMP_FILE_NAME);
+
+	/*
+	 * First, check if the file has already been archived.  If it has,
+	 * just fail because something might be wrong.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists",
+						destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m",
+						destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m",
+						temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(path, temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final
+	 * destination.
+	 */
+	(void) durable_rename(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+static void
+copy_file(const char *src, const char *dst)
+{
+	int srcfd;
+	int dstfd;
+	int nbytes;
+	char *buf;
+
+#define COPY_BUF_SIZE (64 * 1024)
+
+	buf = palloc(COPY_BUF_SIZE);
+
+	srcfd = OpenTransientFile(src, O_RDONLY | PG_BINARY);
+	if (srcfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", src)));
+
+	dstfd = OpenTransientFile(dst, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	if (dstfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", dst)));
+
+	for (;;)
+	{
+		nbytes = read(srcfd, buf, COPY_BUF_SIZE);
+		if (nbytes < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", src)));
+
+		if (nbytes == 0)
+			break;
+
+		errno = 0;
+		if ((int) write(dstfd, buf, nbytes) != nbytes)
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m", dst)));
+		}
+	}
+
+	if (CloseTransientFile(dstfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", dst)));
+
+	if (CloseTransientFile(srcfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", src)));
+
+	pfree(buf);
+}
-- 
2.16.6

v4-0001-Move-logic-for-archiving-via-shell-to-its-own-fil.patch (application/octet-stream)
From 513842e402841251c3a1d2dc0fac5fc51a0f2a9b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 20 Oct 2021 21:42:54 +0000
Subject: [PATCH v4 1/2] Move logic for archiving via shell to its own file.

---
 src/backend/access/transam/xlog.c      |   2 +-
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 138 ++++----------------------------
 src/backend/postmaster/shell_archive.c | 139 +++++++++++++++++++++++++++++++++
 src/include/access/xlog.h              |  11 ++-
 src/include/postmaster/pgarch.h        |   6 ++
 6 files changed, 174 insertions(+), 123 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 62862255fc..b0bb3d633e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8777,7 +8777,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive() && XLogArchivingConfigured())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..69e23e286f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,12 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+/*
+ * PG_archive can be overwritten by modules to define custom archiving logic.
+ * By default, we use archive_command.
+ */
+PG_archive_t PG_archive = shell_archive;
+
 
 /* ----------
  * Local data
@@ -358,11 +358,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!XLogArchivingConfigured())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +443,32 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes PG_archive() to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
-
-	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+	bool		ret;
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
+	Assert(PG_archive != NULL);
 
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
+	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	if ((ret = (*PG_archive) (xlog, pathname)))
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..9bbe1cbe0f
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,139 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default for PG_archive, but other modules may define their own custom
+ * archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+bool
+shell_archive(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..f38e383600 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -18,6 +18,7 @@
 #include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
 #include "nodes/pg_list.h"
+#include "postmaster/pgarch.h"
 #include "storage/fd.h"
 
 
@@ -157,7 +158,15 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
+
+/*
+ * Is WAL archiving configured?  For consistency with previous releases, this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly configured.
+ */
+#define XLogArchivingConfigured() \
+	(PG_archive && (PG_archive != shell_archive || XLogArchiveCommand[0] != '\0'))
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..59fefb3458 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,10 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+typedef bool (*PG_archive_t) (const char *file, const char *path);
+extern PGDLLIMPORT PG_archive_t PG_archive;
+
+/* in shell_archive.c */
+extern bool shell_archive(const char *file, const char *path);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6

#41Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#38)
Re: parallelizing the archiver

On Tue, Oct 19, 2021 at 2:50 PM Stephen Frost <sfrost@snowman.net> wrote:

I keep seeing this thrown around and I don't quite get why we feel this
is the case. I'm not completely against trying to maintain backwards
compatibility, but at the same time, we just went through changing quite
a bit around in v12 with the restore command and that's the other half
of this. Why are we so concerned about backwards compatibility here
when there was hardly any complaint raised about breaking it in the
restore case?

There are 0 references to restore_command in the v12 release notes.
Just in case you had the version number wrong in this email, I
compared the documentation for restore_command in v10 to the
documentation in v14. The differences seem to be only cosmetic. So I'm
not sure what functional change you think we made. It was probably
less significant than what was being discussed here in regards to
archive_command.

Also, more to the point, when there's a need to break backward
compatibility in order to get some improvement, it's worth
considering, but here there just isn't.

--
Robert Haas
EDB: http://www.enterprisedb.com

#42Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#41)
Re: parallelizing the archiver

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Tue, Oct 19, 2021 at 2:50 PM Stephen Frost <sfrost@snowman.net> wrote:

I keep seeing this thrown around and I don't quite get why we feel this
is the case. I'm not completely against trying to maintain backwards
compatibility, but at the same time, we just went through changing quite
a bit around in v12 with the restore command and that's the other half
of this. Why are we so concerned about backwards compatibility here
when there was hardly any complaint raised about breaking it in the
restore case?

There are 0 references to restore_command in the v12 release notes.
Just in case you had the version number wrong in this email, I
compared the documentation for restore_command in v10 to the
documentation in v14. The differences seem to be only cosmetic. So I'm
not sure what functional change you think we made. It was probably
less significant than what was being discussed here in regards to
archive_command.

restore_command used to be in recovery.conf, which disappeared with v12
and it now has to go into postgresql.auto.conf or postgresql.conf.

That's a huge breaking change.

Also, more to the point, when there's a need to break backward
compatibility in order to get some improvement, it's worth
considering, but here there just isn't.

There won't be any thought towards a backwards-incompatible capability
if everyone is saying that we can't possibly break it. That's why I was
commenting on it.

Thanks,

Stephen

#43Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#42)
Re: parallelizing the archiver

On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost <sfrost@snowman.net> wrote:

restore_command used to be in recovery.conf, which disappeared with v12
and it now has to go into postgresql.auto.conf or postgresql.conf.

That's a huge breaking change.

Not in the same sense. Moving the functionality to a different
configuration file can and probably did cause a lot of problems for
people, but the same basic functionality was still available.

(Also, I'm pretty sure that the recovery.conf changes would have
happened years earlier if there hadn't been backward compatibility
concerns, from Simon in particular. So saying that there was "hardly
any complaint raised" in that case doesn't seem to me to be entirely
accurate.)

Also, more to the point, when there's a need to break backward
compatibility in order to get some improvement, it's worth
considering, but here there just isn't.

There won't be any thought towards a backwards-incompatible capability
if everyone is saying that we can't possibly break it. That's why I was
commenting on it.

I can't speak for anyone else, but that is not what I am saying. I am
open to the idea of breaking it if we thereby get some valuable
benefit which cannot be obtained otherwise. But Nathan has now
implemented something which, from the sound of it, will allow us to
obtain all of the available benefits with no incompatibilities. If we
think of additional benefits that we cannot obtain without
incompatibilities, then we can consider that situation when it arises.
In the meantime, there's no need to go looking for reasons to break
stuff that works in existing releases.

--
Robert Haas
EDB: http://www.enterprisedb.com

#44Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#43)
Re: parallelizing the archiver

On Thu, Oct 21, 2021 at 11:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost <sfrost@snowman.net> wrote:

restore_command used to be in recovery.conf, which disappeared with v12
and it now has to go into postgresql.auto.conf or postgresql.conf.

That's a huge breaking change.

Not in the same sense. Moving the functionality to a different
configuration file can and probably did cause a lot of problems for
people, but the same basic functionality was still available.

Yeah.

And as a bonus it got a bunch of people to upgrade their backup software
that suddenly stopped working. Or in some cases, to install backup software
instead of using the hand-rolled scripts. So there were some good
side-effects specifically to breaking it as well.

(Also, I'm pretty sure that the recovery.conf changes would have

happened years earlier if there hadn't been backward compatibility
concerns, from Simon in particular. So saying that there was "hardly
any complaint raised" in that case doesn't seem to me to be entirely
accurate.)

Also, more to the point, when there's a need to break backward
compatibility in order to get some improvement, it's worth
considering, but here there just isn't.

There won't be any thought towards a backwards-incompatible capability
if everyone is saying that we can't possibly break it. That's why I was
commenting on it.

I can't speak for anyone else, but that is not what I am saying. I am
open to the idea of breaking it if we thereby get some valuable
benefit which cannot be obtained otherwise. But Nathan has now
implemented something which, from the sound of it, will allow us to
obtain all of the available benefits with no incompatibilities. If we
think of additional benefits that we cannot obtain without
incompatibilities, then we can consider that situation when it arises.
In the meantime, there's no need to go looking for reasons to break
stuff that works in existing releases.

Agreed.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#45Magnus Hagander
magnus@hagander.net
In reply to: Bossart, Nathan (#40)
Re: parallelizing the archiver

On Thu, Oct 21, 2021 at 9:51 PM Bossart, Nathan <bossartn@amazon.com> wrote:

On 10/20/21, 3:23 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

Alright, I reworked the patch a bit to maintain backward
compatibility. My initial intent for 0001 was to just do a clean
refactor to move the shell archiving stuff to its own file. However,
after I did that, I realized that adding the hook wouldn't be too much
more work, so I did that as well. This seems to be enough to support
custom archiving modules. I included a basic example of such a module
in 0002. 0002 is included primarily for demonstration purposes.

It looks like the FreeBSD build is failing because sys/wait.h is
missing. Here is an attempt at fixing that.

I still like the idea of loading the library via a special parameter,
archive_library or such.

One reason for that is that adding/removing modules in
shared_preload_libraries has a terrible UX in that you have to replace the
whole thing. This makes it much more complex to deal with when different
modules just want to add to it.

E.g. my awesome backup program could set
archive_library='my_awesome_backups', and know it didn't break anything
else. But it couldn't set shared_preload_libraries='my_awesome_backups',
because then it might break a bunch of other modules that used to be there.
So it has to go try to parse the whole config and figure out where to make
such modifications.

Now, this could *also* be solved by allowing shared_preload_libraries to be a
"list" instead of a string, and allowing postgresql.conf to accept syntax like
shared_preload_libraries+='my_awesome_backups'.

But without that level of functionality available, I think a separate
parameter for the archive library would be a good thing.
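For illustration, the two configuration styles being compared might look like this (the append syntax is hypothetical and does not exist today; the library name my_awesome_backups is just an example):

```ini
# today: adding one module means restating the whole list
shared_preload_libraries = 'pg_stat_statements, auto_explain, my_awesome_backups'

# with a dedicated parameter, the backup tool owns exactly one setting
archive_library = 'my_awesome_backups'

# hypothetical append syntax, if shared_preload_libraries became a true list
shared_preload_libraries += 'my_awesome_backups'
```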

Other than that:
+
+/*
+ * Is WAL archiving configured?  For consistency with previous releases, this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly configured.
+ */
+#define XLogArchivingConfigured() \
+       (PG_archive && (PG_archive != shell_archive || XLogArchiveCommand[0] != '\0'))

Wouldn't that be better as a callback into the module? So that
shell_archive would implement the check for XLogArchiveCommand. Then
another third-party module can make its own decision on what to check. And
PG_archive would then be a struct that holds a function pointer to the
archive command and another function pointer to the is-enabled check? (I
think having a struct for it would be useful regardless -- for possible
future extensions with more API points.)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#46Bossart, Nathan
bossartn@amazon.com
In reply to: Magnus Hagander (#45)
Re: parallelizing the archiver

On 10/22/21, 7:43 AM, "Magnus Hagander" <magnus@hagander.net> wrote:

I still like the idea of loading the library via a special
parameter, archive_library or such.

My first attempt [0] (/messages/by-id/E9035E94-EC76-436E-B6C9-1C03FBD8EF54@amazon.com) added a GUC like this, so I can speak to some of
the interesting design decisions that follow.

The simplest thing we could do would be to add the archive_library GUC
and to load that library just as if it were listed at the end of
shared_preload_libraries. This would mean that the archive library
could be specified in either GUC, and there would effectively be no
difference between the two.

The next thing we could consider doing is adding a new boolean called
process_archive_library_in_progress, which would be analogous to
process_shared_preload_libraries_in_progress. If a library is loaded
from the archive_library GUC, its _PG_init() will be called with
process_archive_library_in_progress set. This also means that if a
library is specified in both shared_preload_libraries and
archive_library, we'd call its _PG_init() twice. The library could
then branch based on whether
process_shared_preload_libraries_in_progress or
process_archive_library_in_progress was set.

Another approach would be to add a new initialization function (e.g.,
PG_archive_init()) that would be used if the library is being loaded
from archive_library. Like before, you can use the library for both
shared_preload_libraries and archive_library, but your initialization
logic would be expected to go in separate functions. However, there
still wouldn't be anything forcing that. A library could still break
the rules and do everything in _PG_init() and be loaded via
shared_preload_libraries.

One more thing we could do is to discover the relevant symbols for
archiving in the library loading function. Rather than expecting the
initialization function to set the hook correctly, we'd just look up
the _PG_archive() function during loading. Again, a library could
probably still break the rules and do everything in
_PG_init()/shared_preload_libraries, but there would at least be a
nicer interface available.

I believe the main drawbacks of going down this path are the
additional complexity in the backend and the slippery slope of adding
all kinds of new GUCs in the future. My original patch also tried to
do something similar for some other shell command GUCs
(archive_cleanup_command, restore_command, and recovery_end_command).
While I'm going to try to keep this focused on archive_command for
now, presumably we'd eventually want the ability to use hooks for all
of these things. I don't know if we really want to incur a new GUC
for every single one of these. To be clear, I'm not against adding a
GUC if it seems like the right thing to do. I just want to make sure
we are aware of the tradeoffs compared to a simple
shared_preload_libraries approach with its terrible UX.

Wouldn't that be better as a callback into the module? So that
shell_archive would implement the check for XLogArchiveCommand. Then
another third party module can make its own decision on what to
check. And PGarchive would then be a struct that holds a function
pointer to the archive command and another function pointer to the
isenabled command? (I think having a struct for it would be useful
regardless -- for possible future extensions with more API points).

+1. This crossed my mind, too. I'll add this in the next revision.

Nathan

[0]: /messages/by-id/E9035E94-EC76-436E-B6C9-1C03FBD8EF54@amazon.com

#47Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#46)
Re: parallelizing the archiver

On Fri, Oct 22, 2021 at 1:42 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Another approach would be to add a new initialization function (e.g.,
PG_archive_init()) that would be used if the library is being loaded
from archive_library. Like before, you can use the library for both
shared_preload_libraries and archive_library, but your initialization
logic would be expected to go in separate functions. However, there
still wouldn't be anything forcing that. A library could still break
the rules and do everything in _PG_init() and be loaded via
shared_preload_libraries.

I was imagining something like what logical decoding does. In that
case, you make a _PG_output_plugin_init function and it returns a
table of callbacks. Then the core code invokes those callbacks at the
appropriate times.

--
Robert Haas
EDB: http://www.enterprisedb.com

#48Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#47)
2 attachment(s)
Re: parallelizing the archiver

On 10/22/21, 4:35 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:

I was imagining something like what logical decoding does. In that
case, you make a _PG_output_plugin_init function and it returns a
table of callbacks. Then the core code invokes those callbacks at the
appropriate times.

Here is an attempt at doing this. Archive modules are expected to
declare _PG_archive_module_init(), which can define GUCs, register
background workers, etc. This function must at least define the
archive callbacks. For now, I've introduced two callbacks. The first
is for checking that the archive module is configured, and the second
contains the actual archiving logic.

I've written this so that the same library can be used for multiple
purposes (e.g., it could be in shared_preload_libraries and
archive_library). I don't know if that's really necessary, but it
seemed to me like a reasonable way to handle the changes to the
library loading logic that we need anyway.

0002 is still a sample backup module, but I also added some handling
for preexisting archives. If the preexisting archive file has the
same contents as the current file to archive, archiving is allowed to
continue. If the contents don't match, archiving fails. This sample
module could still produce unexpected results if two servers were
sending archives to the same directory. I stopped short of adding
handling for that case, but that might be a good thing to tackle next.

Nathan

Attachments:

v5-0001-Introduce-archive-module-framework.patch (application/octet-stream)
From 0d6cb5561aedd2c355cb3a5f04804824b5ca7e55 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Sat, 23 Oct 2021 21:23:22 +0000
Subject: [PATCH v5 1/2] Introduce archive module framework.

---
 src/backend/access/transam/xlog.c      |   2 +-
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 135 +++--------------------------
 src/backend/postmaster/postmaster.c    |  10 +++
 src/backend/postmaster/shell_archive.c | 154 +++++++++++++++++++++++++++++++++
 src/backend/utils/fmgr/dfmgr.c         |  70 +++++++++++++--
 src/backend/utils/init/miscinit.c      |  32 +++++++
 src/backend/utils/misc/guc.c           |  41 ++++++++-
 src/include/access/xlog.h              |   6 +-
 src/include/miscadmin.h                |   2 +
 src/include/postmaster/pgarch.h        |  38 ++++++++
 11 files changed, 359 insertions(+), 132 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 62862255fc..b0bb3d633e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8777,7 +8777,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive() && XLogArchivingConfigured())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..27deedcd60 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,9 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+ArchiveModuleCallbacks ArchiveContext;
+char *XLogArchiveLibrary = NULL;
+
 
 /* ----------
  * Local data
@@ -358,11 +355,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!XLogArchivingConfigured())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +440,32 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
-
-	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+	bool		ret;
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
+	Assert(ArchiveContext.archive_file_cb != NULL);
 
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
+	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	if ((ret = (*ArchiveContext.archive_file_cb) (xlog, pathname)))
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e2a76ba055..8452545946 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1022,8 +1022,13 @@ PostmasterMain(int argc, char *argv[])
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
+	 *
+	 * NB: It is important to process shared_preload_libraries before
+	 * archive_library because of assumptions made by the library loading
+	 * code.
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -5009,8 +5014,13 @@ SubPostmasterMain(int argc, char *argv[])
 	 * exec'd this process, those libraries didn't come along with us; but we
 	 * should load them into all child processes to be consistent with the
 	 * non-EXEC_BACKEND behavior.
+	 *
+	 * NB: It is important to process shared_preload_libraries before
+	 * archive_library because of assumptions made by the library loading
+	 * code.
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/* Run backend or appropriate child */
 	if (strcmp(argv[1], "--forkbackend") == 0)
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..c0f3f1dca2
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,154 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 96fd9d2268..a755c215aa 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -33,6 +33,7 @@
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
+#include "postmaster/pgarch.h"
 #include "storage/shmem.h"
 #include "utils/hsearch.h"
 
@@ -85,6 +86,7 @@ static char *expand_dynamic_library_name(const char *name);
 static void check_restricted_library_name(const char *name);
 static char *substitute_libpath_macro(const char *name);
 static char *find_in_dynamic_libpath(const char *basename);
+static void init_library(const char *libname, void *handle);
 
 /* Magic structure that module needs to match to be accepted */
 static const Pg_magic_struct magic_data = PG_MODULE_MAGIC_DATA;
@@ -187,7 +189,6 @@ internal_load_library(const char *libname)
 	PGModuleMagicFunction magic_func;
 	char	   *load_error;
 	struct stat stat_buf;
-	PG_init_t	PG_init;
 
 	/*
 	 * Scan the list of loaded FILES to see if the file has been loaded.
@@ -281,12 +282,7 @@ internal_load_library(const char *libname)
 					 errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro.")));
 		}
 
-		/*
-		 * If the library has a _PG_init() function, call it.
-		 */
-		PG_init = (PG_init_t) dlsym(file_scanner->handle, "_PG_init");
-		if (PG_init)
-			(*PG_init) ();
+		init_library(libname, file_scanner->handle);
 
 		/* OK to link it into list */
 		if (file_list == NULL)
@@ -295,10 +291,70 @@ internal_load_library(const char *libname)
 			file_tail->next = file_scanner;
 		file_tail = file_scanner;
 	}
+	else if (process_archive_library_in_progress)
+	{
+		/*
+		 * If we are loading an archive library, we initialize the library
+		 * even if we previously loaded it.  This allows users to use
+		 * archive libraries for multiple reasons (e.g., the same library
+		 * can be specified in shared_preload_libraries and
+		 * archive_library).
+		 *
+		 * NB: This assumes that we load archive_library after loading
+		 * shared_preload_libraries.
+		 */
+		init_library(libname, file_scanner->handle);
+	}
 
 	return file_scanner->handle;
 }
 
+/*
+ * init_library
+ *
+ * If we are loading an archive library, this function calls the library's
+ * _PG_archive_module_init() function and ensures the necessary callbacks are
+ * registered.  Otherwise, this function calls the library's _PG_init() function
+ * if it exists.
+ */
+static void
+init_library(const char *libname, void *handle)
+{
+	if (process_archive_library_in_progress)
+	{
+		ArchiveModuleInit archive_init;
+
+		ArchiveContext.check_configured_cb = NULL;
+		ArchiveContext.archive_file_cb = NULL;
+
+		archive_init = (ArchiveModuleInit) dlsym(handle, "_PG_archive_module_init");
+		if (archive_init)
+			(*archive_init) (&ArchiveContext);
+		else
+			ereport(ERROR,
+					(errmsg("incompatible archive library \"%s\"", libname),
+					 errhint("Archive modules have to declare the _PG_archive_module_init symbol.")));
+
+		if (ArchiveContext.check_configured_cb == NULL)
+			ereport(ERROR,
+					(errmsg("archive modules have to register a check callback")));
+		if (ArchiveContext.archive_file_cb == NULL)
+			ereport(ERROR,
+					(errmsg("archive modules have to register an archive callback")));
+	}
+	else
+	{
+		PG_init_t PG_init;
+
+		/*
+		 * If the library has a _PG_init() function, call it.
+		 */
+		PG_init = (PG_init_t) dlsym(handle, "_PG_init");
+		if (PG_init)
+			(*PG_init) ();
+	}
+}
+
 /*
  * Report a suitable error for an incompatible magic block.
  */
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..4b66094ce9 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -29,6 +29,7 @@
 #include <utime.h>
 
 #include "access/htup_details.h"
+#include "access/xlog.h"
 #include "catalog/pg_authid.h"
 #include "common/file_perm.h"
 #include "libpq/libpq.h"
@@ -38,6 +39,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -1614,6 +1616,9 @@ char	   *local_preload_libraries_string = NULL;
 /* Flag telling that we are loading shared_preload_libraries */
 bool		process_shared_preload_libraries_in_progress = false;
 
+/* Flag telling that we are loading archive_library */
+bool		process_archive_library_in_progress = false;
+
 /*
  * load the shared libraries listed in 'libraries'
  *
@@ -1696,6 +1701,33 @@ process_session_preload_libraries(void)
 				   true);
 }
 
+/*
+ * process the archive library
+ */
+void
+process_archive_library(void)
+{
+	process_archive_library_in_progress = true;
+
+	if (XLogArchiveLibrary && XLogArchiveLibrary[0] != '\0')
+	{
+		load_file(XLogArchiveLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded archive library \"%s\"",
+								 XLogArchiveLibrary)));
+	}
+	else
+	{
+		/*
+		 * If no archive library was specified, fall back to archiving via
+		 * shell (i.e., archive_command).
+		 */
+		shell_archive_init(&ArchiveContext);
+	}
+
+	process_archive_library_in_progress = false;
+}
+
 void
 pg_bindtextdomain(const char *domain)
 {
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d2ce4a8450..b5179011b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -234,6 +234,8 @@ static bool check_recovery_target_lsn(char **newval, void **extra, GucSource sou
 static void assign_recovery_target_lsn(const char *newval, void *extra);
 static bool check_primary_slot_name(char **newval, void **extra, GucSource source);
 static bool check_default_with_oids(bool *newval, void **extra, GucSource source);
+static bool check_archive_command(char **newval, void **extra, GucSource source);
+static bool check_archive_library(char **newval, void **extra, GucSource source);
 
 /* Private functions in guc-file.l that need to be called from guc.c */
 static ConfigVariable *ProcessConfigFileInternal(GucContext context,
@@ -3855,7 +3857,17 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&XLogArchiveCommand,
 		"",
-		NULL, NULL, show_archive_command
+		check_archive_command, NULL, show_archive_command
+	},
+
+	{
+		{"archive_library", PGC_POSTMASTER, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			NULL
+		},
+		&XLogArchiveLibrary,
+		"",
+		check_archive_library, NULL, NULL
 	},
 
 	{
@@ -8948,7 +8960,8 @@ init_custom_variable(const char *name,
 	 * module might already have hooked into.
 	 */
 	if (context == PGC_POSTMASTER &&
-		!process_shared_preload_libraries_in_progress)
+		!process_shared_preload_libraries_in_progress &&
+		!process_archive_library_in_progress)
 		elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");
 
 	/*
@@ -12559,4 +12572,28 @@ check_default_with_oids(bool *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_archive_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		XLogArchiveLibrary && XLogArchiveLibrary[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"archive_library\" is set.");
+		return false;
+	}
+	return true;
+}
+
+static bool
+check_archive_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && *newval[0] != '\0' &&
+		XLogArchiveCommand && XLogArchiveCommand[0] != '\0')
+	{
+		GUC_check_errdetail("Cannot set parameter when \"archive_command\" is set.");
+		return false;
+	}
+	return true;
+}
+
 #include "guc-file.c"
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..09e163c0ac 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -18,6 +18,7 @@
 #include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
 #include "nodes/pg_list.h"
+#include "postmaster/pgarch.h"
 #include "storage/fd.h"
 
 
@@ -71,6 +72,7 @@ extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
 extern int	wal_retrieve_retry_interval;
 extern char *XLogArchiveCommand;
+extern char *XLogArchiveLibrary;
 extern bool EnableHotStandby;
 extern bool fullPageWrites;
 extern bool wal_log_hints;
@@ -157,7 +159,9 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
+/* Is WAL archiving configured? */
+#define XLogArchivingConfigured() \
+	(AssertMacro(ArchiveContext.check_configured_cb != NULL), ArchiveContext.check_configured_cb())
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90a3016065..8717fed0dc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -464,6 +464,7 @@ extern void BaseInit(void);
 /* in utils/init/miscinit.c */
 extern bool IgnoreSystemIndexes;
 extern PGDLLIMPORT bool process_shared_preload_libraries_in_progress;
+extern PGDLLIMPORT bool process_archive_library_in_progress;
 extern char *session_preload_libraries_string;
 extern char *shared_preload_libraries_string;
 extern char *local_preload_libraries_string;
@@ -477,6 +478,7 @@ extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
+extern void process_archive_library(void);
 extern void pg_bindtextdomain(const char *domain);
 extern bool has_rolreplication(Oid roleid);
 
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..38a5828ffa 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,42 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * The registered callbacks for the archiver to use.
+ */
+extern ArchiveModuleCallbacks ArchiveContext;
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
 #endif							/* _PGARCH_H */
-- 
2.16.6

v5-0002-Add-an-example-of-a-basic-archiving-module.patch (application/octet-stream)
From 21bf196a5b36c66432d86d84c2f1027ace2c1838 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Sun, 24 Oct 2021 04:40:15 +0000
Subject: [PATCH v5 2/2] Add an example of a basic archiving module.

---
 contrib/Makefile                      |   1 +
 contrib/basic_archive/Makefile        |  15 ++
 contrib/basic_archive/basic_archive.c | 385 ++++++++++++++++++++++++++++++++++
 3 files changed, 401 insertions(+)
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c

diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..aff834ebaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..ea6b460889
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,15 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..6c8ffa0520
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,385 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ * While this module is designed to gracefully recover from inconvenient
+ * server crashes (e.g., a crash after we've archived the file but before
+ * we've renamed its .ready file to .done), it does not have any special
+ * handling for multiple servers archiving to the same location.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "common/checksum_helper.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static void copy_file(const char *src, const char *dst);
+static void fsync_file_and_parent_dir(const char *path);
+static bool file_contents_match(const char *file1, const char *file2);
+static void get_file_checksum(const char *file, int *checksumlen, uint8 *checksumbuf);
+
+/*
+ * _PG_archive_module_init
+ *
+ * Defines the module's GUC and returns its callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	if (!process_archive_library_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via \"archive_library\"")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   GUC_NOT_IN_SAMPLE,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+#define TEMP_FILE_NAME ("archtemp")
+
+	Assert(MAX_XFN_CHARS >= strlen(TEMP_FILE_NAME));
+	if (strlen(archive_directory) + MAX_XFN_CHARS + 2 >= MAXPGPATH)
+	{
+		ereport(WARNING,
+				(errmsg("archive destination path too long")));
+		return false;
+	}
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, TEMP_FILE_NAME);
+
+	/*
+	 * First, check if the file has already been archived.
+	 *
+	 * If the archive file already exists, check if its content matches the to-
+	 * be-archived file.  If the files match, we assume that the archiver
+	 * previously crashed at an unfortunate time and that we can safely proceed.
+	 * If the files do not match, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (file_contents_match(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with matching contents",
+							destination)));
+			fsync_file_and_parent_dir(destination);
+			return true;
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("archive file \"%s\" already exists with different contents",
+							destination)));
+			return false;
+		}
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(path, temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+/*
+ * copy_file
+ *
+ * Copies the contents of src to dst.
+ */
+static void
+copy_file(const char *src, const char *dst)
+{
+	int srcfd;
+	int dstfd;
+	char *buf;
+
+	Assert(src != NULL);
+	Assert(dst != NULL);
+
+#define COPY_BUF_SIZE (64 * 1024)
+
+	buf = palloc(COPY_BUF_SIZE);
+
+	srcfd = OpenTransientFile(src, O_RDONLY | PG_BINARY);
+	if (srcfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", src)));
+
+	dstfd = OpenTransientFile(dst, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	if (dstfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", dst)));
+
+	for (;;)
+	{
+		int nbytes = read(srcfd, buf, COPY_BUF_SIZE);
+		if (nbytes < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", src)));
+
+		if (nbytes == 0)
+			break;
+
+		errno = 0;
+		if ((int) write(dstfd, buf, nbytes) != nbytes)
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m", dst)));
+		}
+	}
+
+	if (CloseTransientFile(dstfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", dst)));
+
+	if (CloseTransientFile(srcfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", src)));
+
+	pfree(buf);
+}
+
+/*
+ * fsync_file_and_parent_dir
+ *
+ * Flushes the given file and its parent directory to disk.
+ */
+static void
+fsync_file_and_parent_dir(const char *path)
+{
+	char parentpath[MAXPGPATH];
+
+	fsync_fname_ext(path, false, false, ERROR);
+
+	strlcpy(parentpath, path, MAXPGPATH);
+	get_parent_directory(parentpath);
+
+	/*
+	 * get_parent_directory() returns an empty string if the input argument is
+	 * just a file name (see comments in path.c), so handle that as being the
+	 * current directory.
+	 */
+	if (strlen(parentpath) == 0)
+		strlcpy(parentpath, ".", MAXPGPATH);
+
+	(void) fsync_fname_ext(parentpath, true, false, ERROR);
+}
+
+/*
+ * get_file_checksum
+ *
+ * Calculates a basic checksum for the given file.  The checksums returned by
+ * this function are not cryptographically strong.
+ */
+static void
+get_file_checksum(const char *file, int *checksumlen, uint8 *checksumbuf)
+{
+	int fd;
+	uint8 *buf;
+	pg_checksum_context checksum_ctx;
+
+	Assert(file != NULL);
+	Assert(checksumlen != NULL);
+	Assert(checksumbuf != NULL);
+
+#define CHECKSUM_BUF_SIZE (64 * 1024)
+
+	buf = palloc(CHECKSUM_BUF_SIZE);
+
+	fd = OpenTransientFile(file, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file)));
+
+	if (pg_checksum_init(&checksum_ctx, CHECKSUM_TYPE_CRC32C) < 0)
+		ereport(ERROR,
+				(errmsg("could not initialize checksum of file \"%s\"", file)));
+
+	for (;;)
+	{
+		int nbytes = read(fd, buf, CHECKSUM_BUF_SIZE);
+		if (nbytes < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", file)));
+
+		if (nbytes == 0)
+			break;
+
+		if (pg_checksum_update(&checksum_ctx, buf, nbytes) < 0)
+			ereport(ERROR,
+					(errmsg("could not update checksum of file \"%s\"", file)));
+	}
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file)));
+
+	*checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+	if (*checksumlen < 0)
+		ereport(ERROR,
+				(errmsg("could not finalize checksum of file \"%s\"", file)));
+
+	pfree(buf);
+}
+
+/*
+ * file_contents_match
+ *
+ * This determines whether the two given files have the same contents by
+ * calculating and comparing their checksums.  If the files have the same
+ * contents, this function returns true.  If the files have different contents,
+ * this function will typically return false, but may in rare circumstances
+ * return true due to a checksum collision.  The risk of this function
+ * returning true for files with different contents is considered negligible.
+ */
+static bool
+file_contents_match(const char *file1, const char *file2)
+{
+	uint8 checksumbuf1[PG_CHECKSUM_MAX_LENGTH];
+	uint8 checksumbuf2[PG_CHECKSUM_MAX_LENGTH];
+	int checksumlen1 = 0;
+	int checksumlen2 = 0;
+
+	get_file_checksum(file1, &checksumlen1, checksumbuf1);
+	get_file_checksum(file2, &checksumlen2, checksumbuf2);
+
+	if (checksumlen1 != checksumlen2)
+		return false;
+
+	return memcmp(checksumbuf1, checksumbuf2, checksumlen1) == 0;
+}
-- 
2.16.6

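Stepping outside the patch for a moment: the crash-safe pattern that basic_archive's header comment describes (copy to a temporary name, fsync the data, then rename into place) can be sketched in plain POSIX C with no PostgreSQL headers. The function name durable_copy() and the ".tmp" suffix here are illustrative, not part of the patch, and a production version would also fsync the parent directory after the rename, as the module's fsync_file_and_parent_dir() does:

```c
/*
 * Standalone sketch of basic_archive's crash-safe copy pattern:
 * copy to a temporary name, fsync the data, then rename() into
 * place, so a crash never leaves a partially written file visible
 * under the final name.  Error handling is abbreviated.
 */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
durable_copy(const char *src, const char *dst)
{
	char		tmp[4096];
	char		buf[64 * 1024];
	ssize_t		n;
	int			in;
	int			out;

	snprintf(tmp, sizeof(tmp), "%s.tmp", dst);
	(void) unlink(tmp);			/* remove any leftover temporary file */

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
	{
		close(in);
		return -1;
	}

	while ((n = read(in, buf, sizeof(buf))) > 0)
	{
		if (write(out, buf, n) != n)
		{
			close(in);
			close(out);
			return -1;
		}
	}

	if (n < 0 || fsync(out) != 0)	/* flush file data before renaming */
	{
		close(in);
		close(out);
		return -1;
	}
	close(in);
	close(out);

	return rename(tmp, dst);	/* atomic within a single filesystem */
}
```

The O_EXCL open mirrors the module's insistence on creating the temporary file fresh, and because rename() is atomic within a filesystem, a concurrent reader of the archive directory never observes a half-copied file under its final name.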
#49 Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#44)
Re: parallelizing the archiver

Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:

On Thu, Oct 21, 2021 at 11:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost <sfrost@snowman.net> wrote:

restore_command used to be in recovery.conf, which disappeared with v12
and it now has to go into postgresql.auto.conf or postgresql.conf.

That's a huge breaking change.

Not in the same sense. Moving the functionality to a different
configuration file can and probably did cause a lot of problems for
people, but the same basic functionality was still available.

Yeah.

And as a bonus it got a bunch of people to upgrade their backup software
that suddenly stopped working. Or in some case, to install backup software
instead of using the hand-rolled scripts. So there were some good
side-effects specifically to breaking it as well.

I feel like there's some confusion here; just to clear things up, I
wasn't suggesting that we wouldn't include the capability, just that we
should be open to changing the interface/configuration based on what
makes sense and not, necessarily, insist on perfect backwards
compatibility. Seems everyone else has come out in support of that as
well at this point and so I don't think there's much more to say here.

The original complaint I had made was that it felt like folks were
pushing hard on backwards compatibility for the sake of it and I was
just trying to make sure it's clear that we can, and do, break backwards
compatibility sometimes and the bar to clear isn't necessarily all that
high, though of course we should be gaining something if we do decide to
make such a change.

Thanks,

Stephen

#50 Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#48)
Re: parallelizing the archiver

On Sun, Oct 24, 2021 at 2:15 AM Bossart, Nathan <bossartn@amazon.com> wrote:

Here is an attempt at doing this. Archive modules are expected to
declare _PG_archive_module_init(), which can define GUCs, register
background workers, etc. This function must at least define the
archive callbacks. For now, I've introduced two callbacks. The first
is for checking that the archive module is configured, and the second
contains the actual archiving logic.

I don't see why this patch should need to make any changes to
internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any
other central point of control, and I don't think it should.
pgarch_archiveXlog() can just load the library at the time it's
needed. That way it only gets loaded in the archiver process, and the
required changes are much more localized. Like instead of asserting
that the functions are initialized, just
load_external_function(libname, "_PG_archive_module_init") and call it
if they aren't.

I think the attempt in check_archive_command()/check_archive_library()
to force exactly one of those two to be set is not going to work well
and should be removed. In general, GUCs whose legal values depend on
the values of other GUCs don't end up working out well. I think what
should happen instead is that if archive_library=shell then
archive_command does whatever it does; otherwise archive_command is
without effect.

--
Robert Haas
EDB: http://www.enterprisedb.com

#51 Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#50)
Re: parallelizing the archiver

On 10/25/21, 10:02 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

I don't see why this patch should need to make any changes to
internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any
other central point of control, and I don't think it should.
pgarch_archiveXlog() can just load the library at the time it's
needed. That way it only gets loaded in the archiver process, and the
required changes are much more localized. Like instead of asserting
that the functions are initialized, just
load_external_function(libname, "_PG_archive_module_init") and call it
if they aren't.

IIUC this would mean that archive modules that need to define GUCs or
register background workers would have to separately define a
_PG_init() and be loaded via shared_preload_libraries in addition to
archive_library. That doesn't seem too terrible to me, but it was
something I was trying to avoid.

I think the attempt in check_archive_command()/check_archive_library()
to force exactly one of those two to be set is not going to work well
and should be removed. In general, GUCs whose legal values depend on
the values of other GUCs don't end up working out well. I think what
should happen instead is that if archive_library=shell then
archive_command does whatever it does; otherwise archive_command is
without effect.

I'm fine with this approach. I'll go this route in the next revision.

Nathan

#52 Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#51)
Re: parallelizing the archiver

On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan <bossartn@amazon.com> wrote:

IIUC this would mean that archive modules that need to define GUCs or
register background workers would have to separately define a
_PG_init() and be loaded via shared_preload_libraries in addition to
archive_library. That doesn't seem too terrible to me, but it was
something I was trying to avoid.

Hmm. That doesn't seem like a terrible goal, but I think we should try
to find some way of achieving it that looks tidier than this does.

--
Robert Haas
EDB: http://www.enterprisedb.com

#53 Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#52)
Re: parallelizing the archiver

On 10/25/21, 10:18 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan <bossartn@amazon.com> wrote:

IIUC this would mean that archive modules that need to define GUCs or
register background workers would have to separately define a
_PG_init() and be loaded via shared_preload_libraries in addition to
archive_library. That doesn't seem too terrible to me, but it was
something I was trying to avoid.

Hmm. That doesn't seem like a terrible goal, but I think we should try
to find some way of achieving it that looks tidier than this does.

We could just treat archive_library as if it is tacked onto the
shared_preload_libraries list. I think I can make that look
relatively tidy.

Nathan

#54 Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#52)
2 attachment(s)
Re: parallelizing the archiver

On 10/25/21, 10:50 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

On 10/25/21, 10:18 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan <bossartn@amazon.com> wrote:

IIUC this would mean that archive modules that need to define GUCs or
register background workers would have to separately define a
_PG_init() and be loaded via shared_preload_libraries in addition to
archive_library. That doesn't seem too terrible to me, but it was
something I was trying to avoid.

Hmm. That doesn't seem like a terrible goal, but I think we should try
to find some way of achieving it that looks tidier than this does.

We could just treat archive_library as if it is tacked onto the
shared_preload_libraries list. I think I can make that look
relatively tidy.

Alright, here is an attempt at that. With this revision, archive
libraries are preloaded (and _PG_init() is called), and the archiver
is responsible for calling _PG_archive_module_init() to get the
callbacks. I've also removed the GUC check hooks as previously
discussed.

Nathan
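
For reference, the user-facing configuration under this revision would look roughly like the following; the GUC names come from the patches in this thread, while the directory path is purely illustrative:

```
archive_mode = on
archive_library = 'basic_archive'      # default is 'shell', which uses archive_command
basic_archive.archive_directory = '/path/to/archive'
```

With the default archive_library = 'shell', archive_command keeps behaving as it always has.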

Attachments:

v6-0001-Introduce-archive-module-framework.patch
From 337a444e888920e6942a427132501e95485a8186 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 25 Oct 2021 19:17:59 +0000
Subject: [PATCH v6 1/2] Introduce archive module framework.

---
 src/backend/access/transam/xlog.c      |   2 +-
 src/backend/postmaster/Makefile        |   1 +
 src/backend/postmaster/pgarch.c        | 182 +++++++++++----------------------
 src/backend/postmaster/postmaster.c    |   2 +
 src/backend/postmaster/shell_archive.c | 156 ++++++++++++++++++++++++++++
 src/backend/utils/init/miscinit.c      |  27 +++++
 src/backend/utils/misc/guc.c           |  15 ++-
 src/include/access/xlog.h              |   1 -
 src/include/miscadmin.h                |   2 +
 src/include/postmaster/pgarch.h        |  45 ++++++++
 10 files changed, 308 insertions(+), 125 deletions(-)
 create mode 100644 src/backend/postmaster/shell_archive.c

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cd553d6e12..70e87af284 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8795,7 +8795,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..f0e437f820 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,8 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -85,6 +81,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Flags set by interrupt handlers for later service in the main loop.
@@ -103,6 +101,7 @@ static bool pgarch_readyXlog(char *xlog);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -198,6 +197,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	pgarch_MainLoop();
 
 	proc_exit(0);
@@ -358,11 +362,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +447,31 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
@@ -716,3 +615,44 @@ HandlePgArchInterrupts(void)
 		ProcessConfigFile(PGC_SIGHUP);
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveContext = palloc0(sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		shell_archive_init(ArchiveContext);
+	else
+	{
+		ArchiveModuleInit archive_init;
+
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+		if (archive_init == NULL)
+			ereport(ERROR,
+					(errmsg("archive modules have to declare the "
+							"_PG_archive_module_init symbol")));
+
+		archive_init(ArchiveContext);
+	}
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e2a76ba055..f43c6b4cdc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1024,6 +1024,7 @@ PostmasterMain(int argc, char *argv[])
 	 * process any libraries that should be preloaded at postmaster start
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -5011,6 +5012,7 @@ SubPostmasterMain(int argc, char *argv[])
 	 * non-EXEC_BACKEND behavior.
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/* Run backend or appropriate child */
 	if (strcmp(argv[1], "--forkbackend") == 0)
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..7298dda6ee
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,156 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..9f2766ed04 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -1614,6 +1615,9 @@ char	   *local_preload_libraries_string = NULL;
 /* Flag telling that we are loading shared_preload_libraries */
 bool		process_shared_preload_libraries_in_progress = false;
 
+/* Flag telling that we are loading archive_library */
+bool		process_archive_library_in_progress = false;
+
 /*
  * load the shared libraries listed in 'libraries'
  *
@@ -1696,6 +1700,29 @@ process_session_preload_libraries(void)
 				   true);
 }
 
+/*
+ * process the archive library
+ */
+void
+process_archive_library(void)
+{
+	process_archive_library_in_progress = true;
+
+	/*
+	 * The shell archiving code is in the core server, so there's nothing
+	 * to load for that.
+	 */
+	if (!ShellArchivingEnabled())
+	{
+		load_file(XLogArchiveLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded archive library \"%s\"",
+								 XLogArchiveLibrary)));
+	}
+
+	process_archive_library_in_progress = false;
+}
+
 void
 pg_bindtextdomain(const char *domain)
 {
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..9204f608fc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" specifies archiving via shell.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_POSTMASTER, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
@@ -8961,7 +8971,8 @@ init_custom_variable(const char *name,
 	 * module might already have hooked into.
 	 */
 	if (context == PGC_POSTMASTER &&
-		!process_shared_preload_libraries_in_progress)
+		!process_shared_preload_libraries_in_progress &&
+		!process_archive_library_in_progress)
 		elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");
 
 	/*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..7093e3390f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -157,7 +157,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90a3016065..8717fed0dc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -464,6 +464,7 @@ extern void BaseInit(void);
 /* in utils/init/miscinit.c */
 extern bool IgnoreSystemIndexes;
 extern PGDLLIMPORT bool process_shared_preload_libraries_in_progress;
+extern PGDLLIMPORT bool process_archive_library_in_progress;
 extern char *session_preload_libraries_string;
 extern char *shared_preload_libraries_string;
 extern char *local_preload_libraries_string;
@@ -477,6 +478,7 @@ extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
+extern void process_archive_library(void);
 extern void pg_bindtextdomain(const char *domain);
 extern bool has_rolreplication(Oid roleid);
 
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..7d09d2665e 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,49 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
+
 #endif							/* _PGARCH_H */
-- 
2.16.6

v6-0002-Add-an-example-of-a-basic-archiving-module.patch (application/octet-stream)
From cf45aa481afcc1f474c61972299d1b4b0b924fd2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Mon, 25 Oct 2021 19:30:31 +0000
Subject: [PATCH v6 2/2] Add an example of a basic archiving module.

---
 contrib/Makefile                      |   1 +
 contrib/basic_archive/Makefile        |  15 ++
 contrib/basic_archive/basic_archive.c | 397 ++++++++++++++++++++++++++++++++++
 3 files changed, 413 insertions(+)
 create mode 100644 contrib/basic_archive/Makefile
 create mode 100644 contrib/basic_archive/basic_archive.c

diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..aff834ebaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 		amcheck		\
 		auth_delay	\
 		auto_explain	\
+		basic_archive	\
 		bloom		\
 		btree_gin	\
 		btree_gist	\
diff --git a/contrib/basic_archive/Makefile b/contrib/basic_archive/Makefile
new file mode 100644
index 0000000000..ea6b460889
--- /dev/null
+++ b/contrib/basic_archive/Makefile
@@ -0,0 +1,15 @@
+# contrib/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basic_archive
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..513428de16
--- /dev/null
+++ b/contrib/basic_archive/basic_archive.c
@@ -0,0 +1,397 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ * While this module is designed to gracefully recover from inconvenient
+ * server crashes (e.g., a crash after we've archived the file but before
+ * we've renamed its .ready file to .done), it does not have any special
+ * handling for multiple servers archiving to the same location.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "common/checksum_helper.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+static void copy_file(const char *src, const char *dst);
+static void fsync_file_and_parent_dir(const char *path);
+static bool file_contents_match(const char *file1, const char *file2);
+static void get_file_checksum(const char *file, int *checksumlen, uint8 *checksumbuf);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	if (!process_archive_library_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via \"archive_library\"")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || *newval[0] == '\0')
+		return true;
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("Specified archive directory does not exist.");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+#define TEMP_FILE_NAME ("archtemp")
+
+	Assert(MAX_XFN_CHARS >= strlen(TEMP_FILE_NAME));
+	if (strlen(archive_directory) + MAX_XFN_CHARS + 2 >= MAXPGPATH)
+	{
+		ereport(WARNING,
+				(errmsg("archive destination path too long")));
+		return false;
+	}
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, TEMP_FILE_NAME);
+
+	/*
+	 * First, check if the file has already been archived.
+	 *
+	 * If the archive file already exists, check if its content matches the to-
+	 * be-archived file.  If the files match, we assume that the archiver
+	 * previously crashed at an unfortunate time and that we can safely proceed.
+	 * If the files do not match, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		if (file_contents_match(path, destination))
+		{
+			ereport(DEBUG3,
+					(errmsg("archive file \"%s\" already exists with matching contents",
+							destination)));
+			fsync_file_and_parent_dir(destination);
+			return true;
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("archive file \"%s\" already exists with different contents",
+							destination)));
+			return false;
+		}
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(path, temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
+
+/*
+ * copy_file
+ *
+ * Copies the contents of src to dst.
+ */
+static void
+copy_file(const char *src, const char *dst)
+{
+	int srcfd;
+	int dstfd;
+	char *buf;
+
+	Assert(src != NULL);
+	Assert(dst != NULL);
+
+#define COPY_BUF_SIZE (64 * 1024)
+
+	buf = palloc(COPY_BUF_SIZE);
+
+	srcfd = OpenTransientFile(src, O_RDONLY | PG_BINARY);
+	if (srcfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", src)));
+
+	dstfd = OpenTransientFile(dst, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	if (dstfd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", dst)));
+
+	for (;;)
+	{
+		int nbytes = read(srcfd, buf, COPY_BUF_SIZE);
+		if (nbytes < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", src)));
+
+		if (nbytes == 0)
+			break;
+
+		errno = 0;
+		if ((int) write(dstfd, buf, nbytes) != nbytes)
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m", dst)));
+		}
+	}
+
+	if (CloseTransientFile(dstfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", dst)));
+
+	if (CloseTransientFile(srcfd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", src)));
+
+	pfree(buf);
+}
+
+/*
+ * fsync_file_and_parent_dir
+ *
+ * Flushes the given file and its parent directory to disk.
+ */
+static void
+fsync_file_and_parent_dir(const char *path)
+{
+	char parentpath[MAXPGPATH];
+
+	fsync_fname_ext(path, false, false, ERROR);
+
+	strlcpy(parentpath, path, MAXPGPATH);
+	get_parent_directory(parentpath);
+
+	/*
+	 * get_parent_directory() returns an empty string if the input argument is
+	 * just a file name (see comments in path.c), so handle that as being the
+	 * current directory.
+	 */
+	if (strlen(parentpath) == 0)
+		strlcpy(parentpath, ".", MAXPGPATH);
+
+	(void) fsync_fname_ext(parentpath, true, false, ERROR);
+}
+
+/*
+ * get_file_checksum
+ *
+ * Calculates a basic checksum for the given file.  The checksums returned by
+ * this function are not cryptographically strong.
+ */
+static void
+get_file_checksum(const char *file, int *checksumlen, uint8 *checksumbuf)
+{
+	int fd;
+	uint8 *buf;
+	pg_checksum_context checksum_ctx;
+
+	Assert(file != NULL);
+	Assert(checksumlen != NULL);
+	Assert(checksumbuf != NULL);
+
+#define CHECKSUM_BUF_SIZE (64 * 1024)
+
+	buf = palloc(CHECKSUM_BUF_SIZE);
+
+	fd = OpenTransientFile(file, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m", file)));
+
+	if (pg_checksum_init(&checksum_ctx, CHECKSUM_TYPE_CRC32C) < 0)
+		ereport(ERROR,
+				(errmsg("could not initialize checksum of file \"%s\"", file)));
+
+	for (;;)
+	{
+		int nbytes = read(fd, buf, CHECKSUM_BUF_SIZE);
+		if (nbytes < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", file)));
+
+		if (nbytes == 0)
+			break;
+
+		if (pg_checksum_update(&checksum_ctx, buf, nbytes) < 0)
+			ereport(ERROR,
+					(errmsg("could not update checksum of file \"%s\"", file)));
+	}
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", file)));
+
+	*checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+	if (*checksumlen < 0)
+		ereport(ERROR,
+				(errmsg("could not finalize checksum of file \"%s\"", file)));
+
+	pfree(buf);
+}
+
+/*
+ * file_contents_match
+ *
+ * This determines whether the two given files have the same contents by
+ * calculating and comparing their checksums.  If the files have the same
+ * contents, this function returns true.  If the files have different contents,
+ * this function will typically return false, but may in rare circumstances
+ * return true due to a checksum collision.  The risk of this function
+ * returning true for files with different contents is considered negligible.
+ */
+static bool
+file_contents_match(const char *file1, const char *file2)
+{
+	uint8 checksumbuf1[PG_CHECKSUM_MAX_LENGTH];
+	uint8 checksumbuf2[PG_CHECKSUM_MAX_LENGTH];
+	int checksumlen1 = 0;
+	int checksumlen2 = 0;
+
+	get_file_checksum(file1, &checksumlen1, checksumbuf1);
+	get_file_checksum(file2, &checksumlen2, checksumbuf2);
+
+	if (checksumlen1 != checksumlen2)
+		return false;
+
+	return memcmp(checksumbuf1, checksumbuf2, checksumlen1) == 0;
+}
-- 
2.16.6
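For reference, wiring up the sample module from the patch above would look something like the following in postgresql.conf. This is a sketch based on the GUCs the two patches define (`archive_library` in 0001, `basic_archive.archive_directory` in 0002); the directory path is illustrative:

```
archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '/mnt/server/archivedir'
```

With `archive_library` left at its default of 'shell', the existing archive_command behavior is unchanged.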

#55Robert Haas
robertmhaas@gmail.com
In reply to: Bossart, Nathan (#54)
Re: parallelizing the archiver

On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Alright, here is an attempt at that. With this revision, archive
libraries are preloaded (and _PG_init() is called), and the archiver
is responsible for calling _PG_archive_module_init() to get the
callbacks. I've also removed the GUC check hooks as previously
discussed.

I would need to spend more time on this to have a detailed opinion on
all of it, but I agree that part looks better this way.

--
Robert Haas
EDB: http://www.enterprisedb.com

#56Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#55)
Re: parallelizing the archiver

On 10/25/21, 1:29 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:

On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan <bossartn@amazon.com> wrote:

Alright, here is an attempt at that. With this revision, archive
libraries are preloaded (and _PG_init() is called), and the archiver
is responsible for calling _PG_archive_module_init() to get the
callbacks. I've also removed the GUC check hooks as previously
discussed.

I would need to spend more time on this to have a detailed opinion on
all of it, but I agree that part looks better this way.

Great. Unless I see additional feedback on the basic design shortly,
I'll give the documentation updates a try.

Nathan

#57Bossart, Nathan
bossartn@amazon.com
In reply to: Robert Haas (#55)
1 attachment(s)
Re: parallelizing the archiver

On 10/25/21, 1:41 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

Great. Unless I see additional feedback on the basic design shortly,
I'll give the documentation updates a try.

Okay, here is a more complete patch with a first attempt at the
documentation changes. I tried to keep the changes to the existing
docs as minimal as possible, and then I added a new chapter that
describes what goes into creating an archive module. Separately, I
simplified the basic_archive module, moved it to src/test/modules,
and added a simple test. My goal is for this to serve as a basic
example and to provide some test coverage on the new infrastructure.

Nathan

Attachments:

v7-0001-Introduce-archive-module-infrastructure.patch (application/octet-stream)
From 72e606ca7ab3b411de2971600b3ed0a64e2644ec Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 27 Oct 2021 03:22:04 +0000
Subject: [PATCH v7 1/1] Introduce archive module infrastructure.

This feature allows custom archive libraries to be used in place of
archive_command.  A new GUC called archive_library specifies the
archive module that should be used.  The library is preloaded, so
its _PG_init() can do anything that libraries loaded via
shared_preload_libraries can do.  Like logical decoding output
plugins, archive modules must define an initialization function and
some callbacks.  If archive_library is set to "shell" (which is the
default for backward compatibility), archive_command is used.
---
 doc/src/sgml/archive-modules.sgml                  | 133 +++++++++++++++
 doc/src/sgml/backup.sgml                           |  83 +++++----
 doc/src/sgml/config.sgml                           |  37 +++-
 doc/src/sgml/filelist.sgml                         |   1 +
 doc/src/sgml/high-availability.sgml                |   6 +-
 doc/src/sgml/postgres.sgml                         |   1 +
 doc/src/sgml/ref/pg_basebackup.sgml                |   4 +-
 doc/src/sgml/ref/pg_receivewal.sgml                |   6 +-
 doc/src/sgml/wal.sgml                              |   2 +-
 src/backend/access/transam/xlog.c                  |   2 +-
 src/backend/postmaster/Makefile                    |   1 +
 src/backend/postmaster/pgarch.c                    | 182 +++++++-------------
 src/backend/postmaster/postmaster.c                |   2 +
 src/backend/postmaster/shell_archive.c             | 156 +++++++++++++++++
 src/backend/utils/init/miscinit.c                  |  27 +++
 src/backend/utils/misc/guc.c                       |  15 +-
 src/backend/utils/misc/postgresql.conf.sample      |   1 +
 src/include/access/xlog.h                          |   1 -
 src/include/miscadmin.h                            |   2 +
 src/include/postmaster/pgarch.h                    |  45 +++++
 src/test/modules/Makefile                          |   1 +
 src/test/modules/basic_archive/.gitignore          |   4 +
 src/test/modules/basic_archive/Makefile            |  20 +++
 src/test/modules/basic_archive/basic_archive.c     | 189 +++++++++++++++++++++
 src/test/modules/basic_archive/basic_archive.conf  |   3 +
 .../basic_archive/expected/basic_archive.out       |  29 ++++
 .../modules/basic_archive/sql/basic_archive.sql    |  22 +++
 27 files changed, 802 insertions(+), 173 deletions(-)
 create mode 100644 doc/src/sgml/archive-modules.sgml
 create mode 100644 src/backend/postmaster/shell_archive.c
 create mode 100644 src/test/modules/basic_archive/.gitignore
 create mode 100644 src/test/modules/basic_archive/Makefile
 create mode 100644 src/test/modules/basic_archive/basic_archive.c
 create mode 100644 src/test/modules/basic_archive/basic_archive.conf
 create mode 100644 src/test/modules/basic_archive/expected/basic_archive.out
 create mode 100644 src/test/modules/basic_archive/sql/basic_archive.sql

diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
new file mode 100644
index 0000000000..d69b462578
--- /dev/null
+++ b/doc/src/sgml/archive-modules.sgml
@@ -0,0 +1,133 @@
+<!-- doc/src/sgml/archive-modules.sgml -->
+
+<chapter id="archive-modules">
+ <title>Archive Modules</title>
+ <indexterm zone="archive-modules">
+  <primary>Archive Modules</primary>
+ </indexterm>
+
+ <para>
+  PostgreSQL provides infrastructure to create custom modules for continuous
+  archiving (see <xref linkend="continuous-archiving"/>).  While archiving via
+  a shell command (i.e., <xref linkend="guc-archive-command"/>) is much
+  simpler, a custom archive module will often be considerably more robust and
+  performant.
+ </para>
+
+ <para>
+  When a custom <xref linkend="guc-archive-library"/> is configured, PostgreSQL
+  will submit completed WAL files to the module, and the server will avoid
+  recycling or removing these WAL files until the module indicates that the files
+  were successfully archived.  It is ultimately up to the module to decide what
+  to do with each WAL file, but many recommendations are listed at
+  <xref linkend="backup-archiving-wal"/>.
+ </para>
+
+ <para>
+  Archive modules must at least consist of an initialization function (see
+  <xref linkend="archive-module-init"/>) and the required callbacks (see
+  <xref linkend="archive-module-callbacks"/>).  However, archive modules are
+  also permitted to do much more (e.g., declare GUCs, register background
+  workers, and implement SQL functions).
+ </para>
+
+ <para>
+  The <filename>src/test/modules/basic_archive</filename> module contains a
+  working example, which demonstrates some useful techniques.
+ </para>
+
+ <warning>
+  <para>
+   There are considerable robustness and security risks in using archive modules
+   because, being written in the <literal>C</literal> language, they have access
+   to many server resources.  Administrators wishing to enable archive
+   modules should exercise extreme caution.  Only carefully audited modules
+   should be loaded.
+  </para>
+ </warning>
+
+ <sect1 id="archive-module-init">
+  <title>Initialization Functions</title>
+  <indexterm zone="archive-module-init">
+   <primary>_PG_archive_module_init</primary>
+  </indexterm>
+  <para>
+   An archive library is loaded by dynamically loading a shared library with the
+   <xref linkend="guc-archive-library"/>'s name as the library base name.  The
+   normal library search path is used to locate the library.  To provide the
+   required archive module callbacks and to indicate that the library is
+   actually an archive module, it needs to provide a function named
+   <function>_PG_archive_module_init</function>.  This function is passed a
+   struct that needs to be filled with the callback function pointers for
+   individual actions.
+
+<programlisting>
+typedef struct ArchiveModuleCallbacks
+{
+    ArchiveCheckConfiguredCB check_configured_cb;
+    ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
+</programlisting>
+
+   Both callbacks are required.
+  </para>
+
+  <para>
+   Archive libraries are preloaded in a fashion similar to
+   <xref linkend="guc-shared-preload-libraries"/>.  This means that it is
+   possible to do things in the module's <function>_PG_init</function> function
+   that can only be done at server start.  The
+   <varname>process_archive_library_in_progress</varname> variable will be set
+   to <literal>true</literal> while the archive library is being preloaded
+   during server startup.
+  </para>
+ </sect1>
+
+ <sect1 id="archive-module-callbacks">
+  <title>Archive Module Callbacks</title>
+  <para>
+   The archive callbacks define the actual archiving behavior of the module.
+   The server will call them as required to process each individual WAL file.
+  </para>
+
+  <sect2 id="archive-module-check">
+   <title>Check Callback</title>
+   <para>
+    The <function>check_configured_cb</function> callback is called to determine
+    whether the module is fully configured and ready to accept WAL files.
+
+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server will proceed with
+    archiving the file by calling the <function>archive_file_cb</function>
+    callback.  If <literal>false</literal> is returned, archiving will not
+    proceed.  In the latter case, the server will periodically call this
+    function, and archiving will proceed if it eventually returns
+    <literal>true</literal>.
+   </para>
+  </sect2>
+
+  <sect2 id="archive-module-archive">
+   <title>Archive Callback</title>
+   <para>
+    The <function>archive_file_cb</function> callback is called to archive a
+    single WAL file.
+
+<programlisting>
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+</programlisting>
+
+    If <literal>true</literal> is returned, the server proceeds as if the file
+    was successfully archived, which may include recycling or removing the
+    original WAL file.  If <literal>false</literal> is returned, the server will
+    keep the original WAL file and retry archiving later.
+    <literal>file</literal> will contain just the file name of the WAL file to
+    archive, while <literal>path</literal> contains the full path of the WAL
+    file (including the file name).
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cba32b6eb3..b42f1b3ca7 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -593,20 +593,23 @@ tar -cf backup.tar /usr/local/pgsql/data
     provide the database administrator with flexibility,
     <productname>PostgreSQL</productname> tries not to make any assumptions about how
     the archiving will be done.  Instead, <productname>PostgreSQL</productname> lets
-    the administrator specify a shell command to be executed to copy a
-    completed segment file to wherever it needs to go.  The command could be
-    as simple as a <literal>cp</literal>, or it could invoke a complex shell
-    script &mdash; it's all up to you.
+    the administrator specify an archive library to be executed to copy a
+    completed segment file to wherever it needs to go.  This could be as simple
+    as a shell command that uses <literal>cp</literal>, or it could invoke a
+    complex C function &mdash; it's all up to you.
    </para>
 
    <para>
     To enable WAL archiving, set the <xref linkend="guc-wal-level"/>
     configuration parameter to <literal>replica</literal> or higher,
     <xref linkend="guc-archive-mode"/> to <literal>on</literal>,
-    and specify the shell command to use in the <xref
-    linkend="guc-archive-command"/> configuration parameter.  In practice
+    and specify the library to use in the <xref
+    linkend="guc-archive-library"/> configuration parameter.  In practice
     these settings will always be placed in the
     <filename>postgresql.conf</filename> file.
+    One simple way to archive is to set <varname>archive_library</varname> to
+    <literal>shell</literal> and to specify a shell command in
+    <xref linkend="guc-archive-command"/>.
     In <varname>archive_command</varname>,
     <literal>%p</literal> is replaced by the path name of the file to
     archive, while <literal>%f</literal> is replaced by only the file name.
@@ -631,7 +634,17 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command will be executed under the ownership of the same
+    Another way to archive is to use a custom archive module as the
+    <varname>archive_library</varname>.  Since such modules are written in
+    <literal>C</literal>, creating your own may require considerably more effort
+    than writing a shell command.  However, archive modules can be more
+    performant than archiving via shell, and they will have access to many
+    useful server resources.  For more information about archive modules, see
+    <xref linkend="archive-modules"/>.
+   </para>
+
+   <para>
+    The archive library will be executed under the ownership of the same
     user that the <productname>PostgreSQL</productname> server is running as.  Since
     the series of WAL files being archived contains effectively everything
     in your database, you will want to be sure that the archived data is
@@ -640,25 +653,31 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is important that the archive command return zero exit status if and
-    only if it succeeds.  Upon getting a zero result,
+    It is important that the archive function return <literal>true</literal> if
+    and only if it succeeds.  If <literal>true</literal> is returned,
     <productname>PostgreSQL</productname> will assume that the file has been
-    successfully archived, and will remove or recycle it.  However, a nonzero
-    status tells <productname>PostgreSQL</productname> that the file was not archived;
-    it will try again periodically until it succeeds.
+    successfully archived, and will remove or recycle it.  However, a return
+    value of <literal>false</literal> tells
+    <productname>PostgreSQL</productname> that the file was not archived; it
+    will try again periodically until it succeeds.  If you are archiving via a
+    shell command, the appropriate return values can be achieved by returning
+    <literal>0</literal> if the command succeeds and a nonzero value if it
+    fails.
    </para>
 
    <para>
-    When the archive command is terminated by a signal (other than
-    <systemitem>SIGTERM</systemitem> that is used as part of a server
-    shutdown) or an error by the shell with an exit status greater than
-    125 (such as command not found), the archiver process aborts and gets
-    restarted by the postmaster. In such cases, the failure is
-    not reported in <xref linkend="pg-stat-archiver-view"/>.
+    If the archive function emits an <literal>ERROR</literal> or
+    <literal>FATAL</literal>, the archiver process aborts and gets restarted by
+    the postmaster.  If you are archiving via shell command, FATAL is emitted if
+    the command is terminated by a signal (other than
+    <systemitem>SIGTERM</systemitem> that is used as part of a server shutdown)
+    or an error by the shell with an exit status greater than 125 (such as
+    command not found).  In such cases, the failure is not reported in
+    <xref linkend="pg-stat-archiver-view"/>.
    </para>
 
    <para>
-    The archive command should generally be designed to refuse to overwrite
+    The archive library should generally be designed to refuse to overwrite
     any pre-existing archive file.  This is an important safety feature to
     preserve the integrity of your archive in case of administrator error
     (such as sending the output of two different servers to the same archive
@@ -666,9 +685,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    It is advisable to test your proposed archive command to ensure that it
+    It is advisable to test your proposed archive library to ensure that it
     indeed does not overwrite an existing file, <emphasis>and that it returns
-    nonzero status in this case</emphasis>.
+    <literal>false</literal> in this case</emphasis>.
     The example command above for Unix ensures this by including a separate
     <command>test</command> step.  On some Unix platforms, <command>cp</command> has
     switches such as <option>-i</option> that can be used to do the same thing
@@ -680,7 +699,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
 
    <para>
     While designing your archiving setup, consider what will happen if
-    the archive command fails repeatedly because some aspect requires
+    the archive library fails repeatedly because some aspect requires
     operator intervention or the archive runs out of space. For example, this
     could occur if you write to tape without an autochanger; when the tape
     fills, nothing further can be archived until the tape is swapped.
@@ -695,7 +714,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The speed of the archiving command is unimportant as long as it can keep up
+    The speed of the archive library is unimportant as long as it can keep up
     with the average rate at which your server generates WAL data.  Normal
     operation continues even if the archiving process falls a little behind.
     If archiving falls significantly behind, this will increase the amount of
@@ -707,11 +726,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    In writing your archive command, you should assume that the file names to
+    In writing your archive library, you should assume that the file names to
     be archived can be up to 64 characters long and can contain any
     combination of ASCII letters, digits, and dots.  It is not necessary to
-    preserve the original relative path (<literal>%p</literal>) but it is necessary to
-    preserve the file name (<literal>%f</literal>).
+    preserve the original relative path but it is necessary to preserve the file
+    name.
    </para>
 
    <para>
@@ -728,7 +747,7 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
    </para>
 
    <para>
-    The archive command is only invoked on completed WAL segments.  Hence,
+    The archive function is only invoked on completed WAL segments.  Hence,
     if your server generates only little WAL traffic (or has slack periods
     where it does so), there could be a long delay between the completion
     of a transaction and its safe recording in archive storage.  To put
@@ -758,7 +777,8 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 &amp;&amp; cp pg_wal/0
     contain enough information for archive recovery.  (Crash recovery is
     unaffected.)  For this reason, <varname>wal_level</varname> can only be changed at
     server start.  However, <varname>archive_command</varname> can be changed with a
-    configuration file reload.  If you wish to temporarily stop archiving,
+    configuration file reload.  If you are archiving via shell and wish to
+    temporarily stop archiving,
     one way to do it is to set <varname>archive_command</varname> to the empty
     string (<literal>''</literal>).
     This will cause WAL files to accumulate in <filename>pg_wal/</filename> until a
@@ -938,11 +958,11 @@ SELECT * FROM pg_stop_backup(false, true);
      On a standby, <varname>archive_mode</varname> must be <literal>always</literal> in order
      for <function>pg_stop_backup</function> to wait.
      Archiving of these files happens automatically since you have
-     already configured <varname>archive_command</varname>. In most cases this
+     already configured <varname>archive_library</varname>. In most cases this
      happens quickly, but you are advised to monitor your archive
      system to ensure there are no delays.
      If the archive process has fallen behind
-     because of failures of the archive command, it will keep retrying
+     because of failures of the archive library, it will keep retrying
      until the archive succeeds and the backup is complete.
      If you wish to place a time limit on the execution of
      <function>pg_stop_backup</function>, set an appropriate
@@ -1500,9 +1520,10 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
       To prepare for low level standalone hot backups, make sure
       <varname>wal_level</varname> is set to
       <literal>replica</literal> or higher, <varname>archive_mode</varname> to
-      <literal>on</literal>, and set up an <varname>archive_command</varname> that performs
+      <literal>on</literal>, and set up an <varname>archive_library</varname> that performs
       archiving only when a <emphasis>switch file</emphasis> exists.  For example:
 <programlisting>
+archive_library = 'shell'
 archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || (test ! -f /var/lib/pgsql/archive/%f &amp;&amp; cp %p /var/lib/pgsql/archive/%f)'
 </programlisting>
       This command will perform archiving when
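For what it's worth, the switch-file command above can be sanity-checked outside the server. Here is a shell sketch with example paths; note that when the switch file is absent the command still reports success, so the WAL segment is recycled without being archived — that is the intended behavior of this setup, but worth keeping in mind:

```shell
# Standalone sketch of the switch-file archive_command above, using
# example paths so its behavior can be exercised outside the server.
BACKUP_FLAG=/tmp/demo_backup_in_progress
ARCHIVE_DIR=/tmp/demo_archive

archive_one() {
    # $1 = %p (relative path of source file), $2 = %f (file name only).
    # Succeeds without copying when no backup is in progress; refuses to
    # overwrite a pre-existing archive file (and returns nonzero then).
    test ! -f "$BACKUP_FLAG" ||
        (test ! -f "$ARCHIVE_DIR/$2" && cp "$1" "$ARCHIVE_DIR/$2")
}
```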
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index de77f14573..1e6ab34913 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3479,7 +3479,7 @@ include_dir 'conf.d'
         Maximum size to let the WAL grow during automatic
         checkpoints. This is a soft limit; WAL size can exceed
         <varname>max_wal_size</varname> under special circumstances, such as
-        heavy load, a failing <varname>archive_command</varname>, or a high
+        heavy load, a failing <varname>archive_library</varname>, or a high
         <varname>wal_keep_size</varname> setting.
         If this value is specified without units, it is taken as megabytes.
         The default is 1 GB.
@@ -3528,7 +3528,7 @@ include_dir 'conf.d'
        <para>
         When <varname>archive_mode</varname> is enabled, completed WAL segments
         are sent to archive storage by setting
-        <xref linkend="guc-archive-command"/>. In addition to <literal>off</literal>,
+        <xref linkend="guc-archive-library"/>. In addition to <literal>off</literal>,
         to disable, there are two modes: <literal>on</literal>, and
         <literal>always</literal>. During normal operation, there is no
         difference between the two modes, but when set to <literal>always</literal>
@@ -3538,9 +3538,6 @@ include_dir 'conf.d'
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
-        <varname>archive_mode</varname> and <varname>archive_command</varname> are
-        separate variables so that <varname>archive_command</varname> can be
-        changed without leaving archiving mode.
         This parameter can only be set at server start.
         <varname>archive_mode</varname> cannot be enabled when
         <varname>wal_level</varname> is set to <literal>minimal</literal>.
@@ -3548,6 +3545,28 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-archive-library" xreflabel="archive_library">
+      <term><varname>archive_library</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>archive_library</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The library to use for archiving completed WAL file segments.  If set to
+        <literal>shell</literal> (the default) or an empty string, archiving via
+        shell is enabled, and <xref linkend="guc-archive-command"/> is used.
+        Otherwise, the specified shared library is preloaded and is used for
+        archiving.  For more information, see
+        <xref linkend="backup-archiving-wal"/> and
+        <xref linkend="archive-modules"/>.
+       </para>
+       <para>
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-archive-command" xreflabel="archive_command">
       <term><varname>archive_command</varname> (<type>string</type>)
       <indexterm>
@@ -3570,9 +3589,11 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.  It is ignored unless
-        <varname>archive_mode</varname> was enabled at server start.
+        <varname>archive_mode</varname> was enabled at server start and
+        <varname>archive_library</varname> specifies to archive via shell command.
         If <varname>archive_command</varname> is an empty string (the default) while
-        <varname>archive_mode</varname> is enabled, WAL archiving is temporarily
+        <varname>archive_mode</varname> is enabled and <varname>archive_library</varname>
+        specifies archiving via shell, WAL archiving is temporarily
         disabled, but the server continues to accumulate WAL segment files in
         the expectation that a command will soon be provided.  Setting
         <varname>archive_command</varname> to a command that does nothing but
@@ -3592,7 +3613,7 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        The <xref linkend="guc-archive-command"/> is only invoked for
+        The <xref linkend="guc-archive-library"/> is only invoked for
         completed WAL segments. Hence, if your server generates little WAL
         traffic (or has slack periods where it does so), there could be a
         long delay between the completion of a transaction and its safe
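To summarize how the two parameters documented above interact, here is a postgresql.conf sketch of the three possible setups (the paths and the module name are examples only, not part of the patch):

```
# 1. Shell archiving (the default, equivalent to current behavior):
archive_library = 'shell'
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

# 2. Archiving via a loadable module; archive_command is ignored:
archive_library = 'my_archive_module'

# 3. Shell archiving selected but no command set: archiving is
#    temporarily disabled and WAL accumulates in pg_wal/:
archive_library = 'shell'
archive_command = ''
```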
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89454e99b9..e6b472ec32 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -99,6 +99,7 @@
 <!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY replication-origins SYSTEM "replication-origins.sgml">
+<!ENTITY archive-modules SYSTEM "archive-modules.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index c43f214020..f4e5e9420b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -935,7 +935,7 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     In lieu of using replication slots, it is possible to prevent the removal
     of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
     storing the segments in an archive using
-    <xref linkend="guc-archive-command"/>.
+    <xref linkend="guc-archive-library"/>.
     However, these methods often result in retaining more WAL segments than
     required, whereas replication slots retain only the number of segments
     known to be needed.  On the other hand, replication slots can retain so
@@ -1386,10 +1386,10 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      to <literal>always</literal>, and the standby will call the archive
      command for every WAL segment it receives, whether it's by restoring
      from the archive or by streaming replication. The shared archive can
-     be handled similarly, but the <varname>archive_command</varname> must
+     be handled similarly, but the <varname>archive_library</varname> must
      test if the file being archived exists already, and if the existing file
      has identical contents. This requires more care in the
-     <varname>archive_command</varname>, as it must
+     <varname>archive_library</varname>, as it must
      be careful to not overwrite an existing file with different contents,
      but return success if the exactly same file is archived twice. And
      all that must be done free of race conditions, if two servers attempt
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index dba9cf413f..3db6d2160b 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -233,6 +233,7 @@ break is not needed in a wider output rendering.
   &bgworker;
   &logicaldecoding;
   &replication-origins;
+  &archive-modules;
 
  </part>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..2aaeaca766 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -102,8 +102,8 @@ PostgreSQL documentation
      <para>
       All WAL records required for the backup must contain sufficient full-page writes,
       which requires you to enable <varname>full_page_writes</varname> on the primary and
-      not to use a tool like <application>pg_compresslog</application> as
-      <varname>archive_command</varname> to remove full-page writes from WAL files.
+      not to use a tool in your <varname>archive_library</varname> to remove
+      full-page writes from WAL files.
      </para>
     </listitem>
    </itemizedlist>
diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 9fde2fd2ef..10ee107000 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -40,7 +40,7 @@ PostgreSQL documentation
   <para>
    <application>pg_receivewal</application> streams the write-ahead
    log in real time as it's being generated on the server, and does not wait
-   for segments to complete like <xref linkend="guc-archive-command"/> does.
+   for segments to complete like <xref linkend="guc-archive-library"/> does.
    For this reason, it is not necessary to set
    <xref linkend="guc-archive-timeout"/> when using
     <application>pg_receivewal</application>.
@@ -465,11 +465,11 @@ PostgreSQL documentation
 
   <para>
    When using <application>pg_receivewal</application> instead of
-   <xref linkend="guc-archive-command"/> as the main WAL backup method, it is
+   <xref linkend="guc-archive-library"/> as the main WAL backup method, it is
    strongly recommended to use replication slots.  Otherwise, the server is
    free to recycle or remove write-ahead log files before they are backed up,
    because it does not have any information, either
-   from <xref linkend="guc-archive-command"/> or the replication slots, about
+   from <xref linkend="guc-archive-library"/> or the replication slots, about
    how far the WAL stream has been archived.  Note, however, that a
    replication slot will fill up the server's disk space if the receiver does
    not keep up with fetching the WAL data.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24e1c89503..2bb27a8468 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -636,7 +636,7 @@
    WAL files plus one additional WAL file are
    kept at all times. Also, if WAL archiving is used, old segments cannot be
    removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
+   with the pace that WAL is generated, or if <varname>archive_library</varname>
    fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
    until the situation is resolved. A slow or failed standby server that
    uses a replication slot will have the same effect (see
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f547efd294..6350656a8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8795,7 +8795,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 787c6a2c3b..dbbeac5a82 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	pgarch.o \
 	pgstat.o \
 	postmaster.o \
+	shell_archive.o \
 	startup.o \
 	syslogger.o \
 	walwriter.o
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c4d0..f0e437f820 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,18 +25,12 @@
  */
 #include "postgres.h"
 
-#include <fcntl.h>
-#include <signal.h>
-#include <time.h>
 #include <sys/stat.h>
-#include <sys/time.h>
-#include <sys/wait.h>
 #include <unistd.h>
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
-#include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
@@ -78,6 +72,8 @@ typedef struct PgArchData
 	int			pgprocno;		/* pgprocno of archiver process */
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* ----------
  * Local data
@@ -85,6 +81,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks *ArchiveContext = NULL;
+
 
 /*
  * Flags set by interrupt handlers for later service in the main loop.
@@ -103,6 +101,7 @@ static bool pgarch_readyXlog(char *xlog);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static void LoadArchiveLibrary(void);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -198,6 +197,11 @@ PgArchiverMain(void)
 	 */
 	PgArch->pgprocno = MyProc->pgprocno;
 
+	/*
+	 * Load the archive_library.
+	 */
+	LoadArchiveLibrary();
+
 	pgarch_MainLoop();
 
 	proc_exit(0);
@@ -358,11 +362,11 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (!ArchiveContext->check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -443,136 +447,31 @@ pgarch_ArchiverCopyLoop(void)
 /*
  * pgarch_archiveXlog
  *
- * Invokes system(3) to copy one archive file to wherever it should go
+ * Invokes archive_file_cb to copy one archive file to wherever it should go
  *
  * Returns true if successful
  */
 static bool
 pgarch_archiveXlog(char *xlog)
 {
-	char		xlogarchcmd[MAXPGPATH];
 	char		pathname[MAXPGPATH];
 	char		activitymsg[MAXFNAMELEN + 16];
-	char	   *dp;
-	char	   *endp;
-	const char *sp;
-	int			rc;
+	bool		ret;
 
 	snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
 
-	/*
-	 * construct the command to be executed
-	 */
-	dp = xlogarchcmd;
-	endp = xlogarchcmd + MAXPGPATH - 1;
-	*endp = '\0';
-
-	for (sp = XLogArchiveCommand; *sp; sp++)
-	{
-		if (*sp == '%')
-		{
-			switch (sp[1])
-			{
-				case 'p':
-					/* %p: relative path of source file */
-					sp++;
-					strlcpy(dp, pathname, endp - dp);
-					make_native_path(dp);
-					dp += strlen(dp);
-					break;
-				case 'f':
-					/* %f: filename of source file */
-					sp++;
-					strlcpy(dp, xlog, endp - dp);
-					dp += strlen(dp);
-					break;
-				case '%':
-					/* convert %% to a single % */
-					sp++;
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-				default:
-					/* otherwise treat the % as not special */
-					if (dp < endp)
-						*dp++ = *sp;
-					break;
-			}
-		}
-		else
-		{
-			if (dp < endp)
-				*dp++ = *sp;
-		}
-	}
-	*dp = '\0';
-
-	ereport(DEBUG3,
-			(errmsg_internal("executing archive command \"%s\"",
-							 xlogarchcmd)));
-
 	/* Report archive activity in PS display */
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	rc = system(xlogarchcmd);
-	if (rc != 0)
-	{
-		/*
-		 * If either the shell itself, or a called command, died on a signal,
-		 * abort the archiver.  We do this because system() ignores SIGINT and
-		 * SIGQUIT while waiting; so a signal is very likely something that
-		 * should have interrupted us too.  Also die if the shell got a hard
-		 * "command not found" type of error.  If we overreact it's no big
-		 * deal, the postmaster will just start the archiver again.
-		 */
-		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
-
-		if (WIFEXITED(rc))
-		{
-			ereport(lev,
-					(errmsg("archive command failed with exit code %d",
-							WEXITSTATUS(rc)),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-		else if (WIFSIGNALED(rc))
-		{
-#if defined(WIN32)
-			ereport(lev,
-					(errmsg("archive command was terminated by exception 0x%X",
-							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#else
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-#endif
-		}
-		else
-		{
-			ereport(lev,
-					(errmsg("archive command exited with unrecognized status %d",
-							rc),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
-		}
-
+	ret = ArchiveContext->archive_file_cb(xlog, pathname);
+	if (ret)
+		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
+	else
 		snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
-		set_ps_display(activitymsg);
-
-		return false;
-	}
-	elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
-
-	snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	set_ps_display(activitymsg);
 
-	return true;
+	return ret;
 }
 
 /*
@@ -716,3 +615,44 @@ HandlePgArchInterrupts(void)
 		ProcessConfigFile(PGC_SIGHUP);
 	}
 }
+
+/*
+ * LoadArchiveLibrary
+ *
+ * Loads the archiving callbacks into our local ArchiveContext.
+ */
+static void
+LoadArchiveLibrary(void)
+{
+	ArchiveContext = palloc0(sizeof(ArchiveModuleCallbacks));
+
+	/*
+	 * If shell archiving is enabled, use our special initialization
+	 * function.  Otherwise, load the library and call its
+	 * _PG_archive_module_init().
+	 */
+	if (ShellArchivingEnabled())
+		shell_archive_init(ArchiveContext);
+	else
+	{
+		ArchiveModuleInit archive_init;
+
+		archive_init = (ArchiveModuleInit)
+			load_external_function(XLogArchiveLibrary,
+								   "_PG_archive_module_init", false, NULL);
+
+		if (archive_init == NULL)
+			ereport(ERROR,
+					(errmsg("archive modules must declare the "
+							"_PG_archive_module_init symbol")));
+
+		archive_init(ArchiveContext);
+	}
+
+	if (ArchiveContext->check_configured_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register a check callback")));
+	if (ArchiveContext->archive_file_cb == NULL)
+		ereport(ERROR,
+				(errmsg("archive modules must register an archive callback")));
+}
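The new LoadArchiveLibrary() above gives third-party code a small surface to implement: an init function plus two callbacks. As a standalone sketch of what a minimal module might look like — the typedefs are repeated here from the patch's pgarch.h only so the file compiles outside the server tree, and the destination directory is a hypothetical stand-in for what a real module would expose as a GUC (a production module would also fsync before reporting success):

```c
/*
 * Standalone sketch of a minimal archive module against the callback API
 * introduced by the patch.  The typedefs mirror those in
 * postmaster/pgarch.h; they are repeated here (an assumption) so the
 * sketch compiles outside the server tree.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct ArchiveModuleCallbacks
{
	bool		(*check_configured_cb) (void);
	bool		(*archive_file_cb) (const char *file, const char *path);
} ArchiveModuleCallbacks;

typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

/* Hypothetical destination; a real module would make this a GUC. */
static const char *demo_archive_dir = "/tmp/demo_module_archive";

static bool
demo_check_configured(void)
{
	return demo_archive_dir != NULL && demo_archive_dir[0] != '\0';
}

static bool
demo_archive_file(const char *file, const char *path)
{
	char		dest[1024];
	char		buf[8192];
	size_t		nread;
	FILE	   *src;
	FILE	   *dst;

	snprintf(dest, sizeof(dest), "%s/%s", demo_archive_dir, file);

	/* Refuse to overwrite a pre-existing archive file. */
	if ((dst = fopen(dest, "r")) != NULL)
	{
		fclose(dst);
		return false;
	}

	if ((src = fopen(path, "r")) == NULL)
		return false;
	if ((dst = fopen(dest, "w")) == NULL)
	{
		fclose(src);
		return false;
	}

	while ((nread = fread(buf, 1, sizeof(buf), src)) > 0)
	{
		if (fwrite(buf, 1, nread, dst) != nread)
		{
			fclose(src);
			fclose(dst);
			return false;
		}
	}

	fclose(src);
	return fclose(dst) == 0;
}

/* Entry point LoadArchiveLibrary() looks up via load_external_function(). */
void
_PG_archive_module_init(ArchiveModuleCallbacks *cb)
{
	cb->check_configured_cb = demo_check_configured;
	cb->archive_file_cb = demo_archive_file;
}
```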
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e2a76ba055..f43c6b4cdc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1024,6 +1024,7 @@ PostmasterMain(int argc, char *argv[])
 	 * process any libraries that should be preloaded at postmaster start
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -5011,6 +5012,7 @@ SubPostmasterMain(int argc, char *argv[])
 	 * non-EXEC_BACKEND behavior.
 	 */
 	process_shared_preload_libraries();
+	process_archive_library();
 
 	/* Run backend or appropriate child */
 	if (strcmp(argv[1], "--forkbackend") == 0)
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
new file mode 100644
index 0000000000..7298dda6ee
--- /dev/null
+++ b/src/backend/postmaster/shell_archive.c
@@ -0,0 +1,156 @@
+/*-------------------------------------------------------------------------
+ *
+ * shell_archive.c
+ *
+ * This archiving function uses a user-specified shell command (the
+ * archive_command GUC) to copy write-ahead log files.  It is used as the
+ * default, but other modules may define their own custom archiving logic.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/shell_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/wait.h>
+
+#include "access/xlog.h"
+#include "postmaster/pgarch.h"
+
+static bool shell_archive_configured(void);
+static bool shell_archive_file(const char *file, const char *path);
+
+void
+shell_archive_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = shell_archive_configured;
+	cb->archive_file_cb = shell_archive_file;
+}
+
+static bool
+shell_archive_configured(void)
+{
+	return XLogArchiveCommand[0] != '\0';
+}
+
+static bool
+shell_archive_file(const char *file, const char *path)
+{
+	char		xlogarchcmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+
+	Assert(file != NULL);
+	Assert(path != NULL);
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = xlogarchcmd;
+	endp = xlogarchcmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = XLogArchiveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of source file */
+					sp++;
+					strlcpy(dp, path, endp - dp);
+					make_native_path(dp);
+					dp += strlen(dp);
+					break;
+				case 'f':
+					/* %f: filename of source file */
+					sp++;
+					strlcpy(dp, file, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the % as not special */
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+			}
+		}
+		else
+		{
+			if (dp < endp)
+				*dp++ = *sp;
+		}
+	}
+	*dp = '\0';
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing archive command \"%s\"",
+							 xlogarchcmd)));
+
+	rc = system(xlogarchcmd);
+	if (rc != 0)
+	{
+		/*
+		 * If either the shell itself, or a called command, died on a signal,
+		 * abort the archiver.  We do this because system() ignores SIGINT and
+		 * SIGQUIT while waiting; so a signal is very likely something that
+		 * should have interrupted us too.  Also die if the shell got a hard
+		 * "command not found" type of error.  If we overreact it's no big
+		 * deal, the postmaster will just start the archiver again.
+		 */
+		int			lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+
+		if (WIFEXITED(rc))
+		{
+			ereport(lev,
+					(errmsg("archive command failed with exit code %d",
+							WEXITSTATUS(rc)),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+		else if (WIFSIGNALED(rc))
+		{
+#if defined(WIN32)
+			ereport(lev,
+					(errmsg("archive command was terminated by exception 0x%X",
+							WTERMSIG(rc)),
+					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#else
+			ereport(lev,
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+#endif
+		}
+		else
+		{
+			ereport(lev,
+					(errmsg("archive command exited with unrecognized status %d",
+							rc),
+					 errdetail("The failed archive command was: %s",
+							   xlogarchcmd)));
+		}
+
+		return false;
+	}
+
+	elog(DEBUG1, "archived write-ahead log file \"%s\"", file);
+	return true;
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..9f2766ed04 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/interrupt.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -1614,6 +1615,9 @@ char	   *local_preload_libraries_string = NULL;
 /* Flag telling that we are loading shared_preload_libraries */
 bool		process_shared_preload_libraries_in_progress = false;
 
+/* Flag telling that we are loading archive_library */
+bool		process_archive_library_in_progress = false;
+
 /*
  * load the shared libraries listed in 'libraries'
  *
@@ -1696,6 +1700,29 @@ process_session_preload_libraries(void)
 				   true);
 }
 
+/*
+ * process the archive library
+ */
+void
+process_archive_library(void)
+{
+	process_archive_library_in_progress = true;
+
+	/*
+	 * The shell archiving code is in the core server, so there's nothing
+	 * to load for that.
+	 */
+	if (!ShellArchivingEnabled())
+	{
+		load_file(XLogArchiveLibrary, false);
+		ereport(DEBUG1,
+				(errmsg_internal("loaded archive library \"%s\"",
+								 XLogArchiveLibrary)));
+	}
+
+	process_archive_library_in_progress = false;
+}
+
 void
 pg_bindtextdomain(const char *domain)
 {
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..9204f608fc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3864,13 +3864,23 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
 			gettext_noop("Sets the shell command that will be called to archive a WAL file."),
-			NULL
+			gettext_noop("This is used only if \"archive_library\" specifies archiving via shell.")
 		},
 		&XLogArchiveCommand,
 		"",
 		NULL, NULL, show_archive_command
 	},
 
+	{
+		{"archive_library", PGC_POSTMASTER, WAL_ARCHIVING,
+			gettext_noop("Sets the library that will be called to archive a WAL file."),
+			gettext_noop("A value of \"shell\" or an empty string indicates that \"archive_command\" should be used.")
+		},
+		&XLogArchiveLibrary,
+		"shell",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
@@ -8961,7 +8971,8 @@ init_custom_variable(const char *name,
 	 * module might already have hooked into.
 	 */
 	if (context == PGC_POSTMASTER &&
-		!process_shared_preload_libraries_in_progress)
+		!process_shared_preload_libraries_in_progress &&
+		!process_archive_library_in_progress)
 		elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");
 
 	/*
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1cbc9feeb6..dc4a20b014 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -245,6 +245,7 @@
 
 #archive_mode = off		# enables archiving; off, on, or always
 				# (change requires restart)
+#archive_library = 'shell'	# library to use to archive a logfile segment
 #archive_command = ''		# command to use to archive a logfile segment
 				# placeholders: %p = path of file to archive
 				#               %f = file name only
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05f..7093e3390f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -157,7 +157,6 @@ extern PGDLLIMPORT int wal_level;
 /* Is WAL archiving enabled always (even during recovery)? */
 #define XLogArchivingAlways() \
 	(AssertMacro(XLogArchiveMode == ARCHIVE_MODE_OFF || wal_level >= WAL_LEVEL_REPLICA), XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
-#define XLogArchiveCommandSet() (XLogArchiveCommand[0] != '\0')
 
 /*
  * Is WAL-logging necessary for archival or log-shipping, or can we skip
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90a3016065..8717fed0dc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -464,6 +464,7 @@ extern void BaseInit(void);
 /* in utils/init/miscinit.c */
 extern bool IgnoreSystemIndexes;
 extern PGDLLIMPORT bool process_shared_preload_libraries_in_progress;
+extern PGDLLIMPORT bool process_archive_library_in_progress;
 extern char *session_preload_libraries_string;
 extern char *shared_preload_libraries_string;
 extern char *local_preload_libraries_string;
@@ -477,6 +478,7 @@ extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
+extern void process_archive_library(void);
 extern void pg_bindtextdomain(const char *domain);
 extern bool has_rolreplication(Oid roleid);
 
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a143e1..7d09d2665e 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -32,4 +32,49 @@ extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
 
+/*
+ * The value of the archive_library GUC.
+ */
+extern char *XLogArchiveLibrary;
+
+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Archive module callbacks
+ */
+typedef struct ArchiveModuleCallbacks
+{
+	ArchiveCheckConfiguredCB check_configured_cb;
+	ArchiveFileCB archive_file_cb;
+} ArchiveModuleCallbacks;
+
+/*
+ * Type of the shared library symbol _PG_archive_module_init that is looked
+ * up when loading an archive library.
+ */
+typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);
+
+/*
+ * Since the logic for archiving via a shell command is in the core server
+ * and does not need to be loaded via a shared library, it has a special
+ * initialization function.
+ */
+extern void shell_archive_init(ArchiveModuleCallbacks *cb);
+
+/*
+ * We consider archiving via shell to be enabled if archive_library is
+ * empty or if archive_library is set to "shell".
+ */
+#define ShellArchivingEnabled() \
+	(XLogArchiveLibrary[0] == '\0' || strcmp(XLogArchiveLibrary, "shell") == 0)
+
 #endif							/* _PGARCH_H */
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..b49e508a2c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -5,6 +5,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
+		  basic_archive \
 		  brin \
 		  commit_ts \
 		  delay_execution \
diff --git a/src/test/modules/basic_archive/.gitignore b/src/test/modules/basic_archive/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/basic_archive/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/basic_archive/Makefile b/src/test/modules/basic_archive/Makefile
new file mode 100644
index 0000000000..ffbf846b68
--- /dev/null
+++ b/src/test/modules/basic_archive/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/basic_archive/Makefile
+
+MODULES = basic_archive
+PGFILEDESC = "basic_archive - basic archive module"
+
+REGRESS = basic_archive
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/basic_archive/basic_archive.conf
+
+NO_INSTALLCHECK = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/basic_archive
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/basic_archive/basic_archive.c b/src/test/modules/basic_archive/basic_archive.c
new file mode 100644
index 0000000000..322049d45f
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.c
@@ -0,0 +1,189 @@
+/*-------------------------------------------------------------------------
+ *
+ * basic_archive.c
+ *
+ * This file demonstrates a basic archive library implementation that is
+ * roughly equivalent to the following shell command:
+ *
+ * 		test ! -f /path/to/dest && cp /path/to/src /path/to/dest
+ *
+ * One notable difference between this module and the shell command above
+ * is that this module first copies the file to a temporary destination,
+ * syncs it to disk, and then durably moves it to the final destination.
+ *
+ * Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/basic_archive/basic_archive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "miscadmin.h"
+#include "postmaster/pgarch.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+void _PG_archive_module_init(ArchiveModuleCallbacks *cb);
+
+static char *archive_directory = NULL;
+
+static bool basic_archive_configured(void);
+static bool basic_archive_file(const char *file, const char *path);
+static bool check_archive_directory(char **newval, void **extra, GucSource source);
+
+/*
+ * _PG_init
+ *
+ * Defines the module's GUC.
+ */
+void
+_PG_init(void)
+{
+	if (!process_archive_library_in_progress)
+		ereport(ERROR,
+				(errmsg("\"basic_archive\" can only be loaded via \"archive_library\"")));
+
+	DefineCustomStringVariable("basic_archive.archive_directory",
+							   gettext_noop("Archive file destination directory."),
+							   NULL,
+							   &archive_directory,
+							   "",
+							   PGC_SIGHUP,
+							   0,
+							   check_archive_directory, NULL, NULL);
+
+	EmitWarningsOnPlaceholders("basic_archive");
+}
+
+/*
+ * _PG_archive_module_init
+ *
+ * Returns the module's archiving callbacks.
+ */
+void
+_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+{
+	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
+
+	cb->check_configured_cb = basic_archive_configured;
+	cb->archive_file_cb = basic_archive_file;
+}
+
+/*
+ * check_archive_directory
+ *
+ * Checks that the provided archive directory exists.
+ */
+static bool
+check_archive_directory(char **newval, void **extra, GucSource source)
+{
+	struct stat st;
+
+	/*
+	 * The default value is an empty string, so we have to accept that value.
+	 * Our check_configured callback also checks for this and prevents archiving
+	 * from proceeding if it is still empty.
+	 */
+	if (*newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * Make sure the file paths won't be too long.  The docs indicate that the
+	 * file names to be archived can be up to 64 characters long.
+	 */
+	if (strlen(*newval) + 64 + 2 >= MAXPGPATH)
+	{
+		GUC_check_errdetail("archive directory too long");
+		return false;
+	}
+
+	/*
+	 * Do a basic sanity check that the specified archive directory exists.  It
+	 * could be removed at some point in the future, so we still need to be
+	 * prepared for it not to exist in the actual archiving logic.
+	 */
+	if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode))
+	{
+		GUC_check_errdetail("specified archive directory does not exist");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * basic_archive_configured
+ *
+ * Checks that archive_directory is not blank.
+ */
+static bool
+basic_archive_configured(void)
+{
+	return archive_directory != NULL && archive_directory[0] != '\0';
+}
+
+/*
+ * basic_archive_file
+ *
+ * Archives one file.
+ */
+static bool
+basic_archive_file(const char *file, const char *path)
+{
+	char destination[MAXPGPATH];
+	char temp[MAXPGPATH];
+	struct stat st;
+
+	ereport(DEBUG3,
+			(errmsg("archiving \"%s\" via basic_archive", file)));
+
+	snprintf(destination, MAXPGPATH, "%s/%s", archive_directory, file);
+	snprintf(temp, MAXPGPATH, "%s/%s", archive_directory, "archtemp");
+
+	/*
+	 * First, check if the file has already been archived.  If the archive file
+	 * already exists, something might be wrong, so we just fail.
+	 */
+	if (stat(destination, &st) == 0)
+	{
+		ereport(WARNING,
+				(errmsg("archive file \"%s\" already exists", destination)));
+		return false;
+	}
+	else if (errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat file \"%s\": %m", destination)));
+
+	/*
+	 * Remove pre-existing temporary file, if one exists.
+	 */
+	if (unlink(temp) != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not unlink file \"%s\": %m", temp)));
+
+	/*
+	 * Copy the file to its temporary destination.
+	 */
+	copy_file(unconstify(char *, path), temp);
+
+	/*
+	 * Sync the temporary file to disk and move it to its final destination.
+	 */
+	(void) durable_rename_excl(temp, destination, ERROR);
+
+	ereport(DEBUG1,
+			(errmsg("archived \"%s\" via basic_archive", file)));
+
+	return true;
+}
diff --git a/src/test/modules/basic_archive/basic_archive.conf b/src/test/modules/basic_archive/basic_archive.conf
new file mode 100644
index 0000000000..b26b2d4144
--- /dev/null
+++ b/src/test/modules/basic_archive/basic_archive.conf
@@ -0,0 +1,3 @@
+archive_mode = 'on'
+archive_library = 'basic_archive'
+basic_archive.archive_directory = '.'
diff --git a/src/test/modules/basic_archive/expected/basic_archive.out b/src/test/modules/basic_archive/expected/basic_archive.out
new file mode 100644
index 0000000000..0015053e0f
--- /dev/null
+++ b/src/test/modules/basic_archive/expected/basic_archive.out
@@ -0,0 +1,29 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+ ?column? 
+----------
+        1
+(1 row)
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test;
diff --git a/src/test/modules/basic_archive/sql/basic_archive.sql b/src/test/modules/basic_archive/sql/basic_archive.sql
new file mode 100644
index 0000000000..14e236d57a
--- /dev/null
+++ b/src/test/modules/basic_archive/sql/basic_archive.sql
@@ -0,0 +1,22 @@
+CREATE TABLE test (a INT);
+SELECT 1 FROM pg_switch_wal();
+
+DO $$
+DECLARE
+	archived bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		archived := count(*) > 0 FROM pg_ls_dir('.', false, false) a
+			WHERE a ~ '^[0-9A-F]{24}$';
+		IF archived OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+
+SELECT count(*) > 0 FROM pg_ls_dir('.', false, false) a
+	WHERE a ~ '^[0-9A-F]{24}$';
+
+DROP TABLE test;
-- 
2.16.6

#58Stephen Frost
sfrost@snowman.net
In reply to: Bossart, Nathan (#57)
Re: parallelizing the archiver

Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:

On 10/25/21, 1:41 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

Great. Unless I see additional feedback on the basic design shortly,
I'll give the documentation updates a try.

Okay, here is a more complete patch with a first attempt at the
documentation changes. I tried to keep the changes to the existing
docs as minimal as possible, and then I added a new chapter that
describes what goes into creating an archive module. Separately, I
simplified the basic_archive module, moved it to src/test/modules,
and added a simple test. My goal is for this to serve as a basic
example and to provide some test coverage on the new infrastructure.

Definitely interested and plan to look at this more shortly, and
generally this all sounds good, but maybe we should have it be posted
under a new thread as it's moved pretty far from the subject and folks
might not appreciate what this is about at this point..?

Thanks,

Stephen

#59Bossart, Nathan
bossartn@amazon.com
In reply to: Stephen Frost (#58)
Re: parallelizing the archiver

On 11/1/21, 10:57 AM, "Stephen Frost" <sfrost@snowman.net> wrote:

Definitely interested and plan to look at this more shortly, and
generally this all sounds good, but maybe we should have it be posted
under a new thread as it's moved pretty far from the subject and folks
might not appreciate what this is about at this point..?

Done: /messages/by-id/668D2428-F73B-475E-87AE-F89D67942270@amazon.com

Looking forward to your feedback.

Nathan