parallelizing the archiver
Hi hackers,
I'd like to gauge interest in parallelizing the archiver process.
From a quick scan, I was only able to find one recent thread [0] that
brought up this topic, and ISTM the conventional wisdom is to use a
backup utility like pgBackRest that does things in parallel behind-
the-scenes. My experience is that the generating-more-WAL-than-we-
can-archive problem is pretty common, and parallelization seems to
help quite a bit, so perhaps it's a good time to consider directly
supporting parallel archiving in PostgreSQL.
Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools. I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing consumers model.
Another approach I'm looking at is to use background worker processes,
although I'm not sure if linking such a critical piece of
functionality to max_worker_processes is a good idea. However, I do
see that logical replication uses background workers.
Anyway, I'm curious what folks think about this. I think it'd help
simplify server administration for many users.
Nathan
[0]: /messages/by-id/20180828060221.x33gokifqi3csjj4@depesz.com
On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan <bossartn@amazon.com> wrote:
I'd like to gauge interest in parallelizing the archiver process.
[...]
Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools.
Having a new implementation that would remove the archive_command is
probably a better long term solution, but I don't know of anyone
working on that and it's probably gonna take some time. Right now we
have a lot of users that face an archiving bottleneck, so I think it
would be a good thing to implement parallel archiving, fully
compatible with the current archive_command, as a short term solution.
On 9/7/21, 11:38 PM, "Julien Rouhaud" <rjuju123@gmail.com> wrote:
On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan <bossartn@amazon.com> wrote:
I'd like to gauge interest in parallelizing the archiver process.
[...]
Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools.
Having a new implementation that would remove the archive_command is
probably a better long term solution, but I don't know of anyone
working on that and it's probably gonna take some time. Right now we
have a lot of users that face an archiving bottleneck, so I think it
would be a good thing to implement parallel archiving, fully
compatible with the current archive_command, as a short term solution.
Thanks for chiming in. I'm planning to work on a patch next week.
Nathan
On Fri, Sep 10, 2021 at 6:30 AM Bossart, Nathan <bossartn@amazon.com> wrote:
Thanks for chiming in. I'm planning to work on a patch next week.
Great news!
About the technical concerns:
I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing consumers model.
In this approach, would the launched archiver workers be kept around
as long as the instance is up, or should they be stopped if they're
not required anymore, e.g. after a temporary spike in write activity?
I think we should make sure that at least one worker is always up.
Another approach I'm looking at is to use background worker processes,
although I'm not sure if linking such a critical piece of
functionality to max_worker_processes is a good idea. However, I do
see that logical replication uses background workers.
I think that using background workers is a good approach, and the
various GUCs in that area should allow users to properly configure
archiving too. If that's not the case, it might be an opportunity to
add some new infrastructure that could benefit all bgworker users.
On Sep 8, 2021, at 03:36, Bossart, Nathan <bossartn@amazon.com> wrote:
Anyway, I'm curious what folks think about this. I think it'd help
simplify server administration for many users.
BTW this other thread [0] is also related.
My 2 cents.
It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
Maybe just add a parallelism API for external tools?
It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining parallel workers is tremendously harder than spawning a goroutine, thread, task or whatever.
The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
For example the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!
Thoughts?
Best regards, Andrey Borodin.
[0]: /messages/by-id/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com
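The stdin/stdout protocol sketched above could be prototyped roughly as follows (a sketch in Python rather than Go; the local archive directory and plain file copy are placeholder assumptions, not wal-g's actual interface):

```python
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/archive"  # hypothetical destination

def archive_loop(stdin=sys.stdin, stdout=sys.stdout, archive_dir=ARCHIVE_DIR):
    """Read WAL file paths from stdin, one per line; copy each into the
    archive, fsync the copy, and report the finished name on stdout."""
    for line in stdin:
        src = line.strip()
        if not src:
            continue
        dst = os.path.join(archive_dir, os.path.basename(src))
        shutil.copy(src, dst)
        with open(dst, "rb") as f:   # make the copy durable before reporting
            os.fsync(f.fileno())
        print(os.path.basename(src), file=stdout, flush=True)

if __name__ == "__main__":
    archive_loop()
```

A real tool would archive several files concurrently with a worker pool and fsync the archive directory too; the point here is only the protocol: names in on stdin, finished names out on stdout.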
On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
Maybe just add a parallelism API for external tools?
It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining parallel workers is tremendously harder than spawning a goroutine, thread, task or whatever.
Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it. If this problem is solved in
postgres core without any API change, then all existing tools will
automatically benefit from it (maybe not the ones that already have
hacks to make it parallel, but it seems easier to disable those hacks
than to implement them).
The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
For example the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!
Yes, but that requires fundamental design changes for the archive
commands, right? So while I agree it could be a better approach
overall, it seems like a longer term option. As far as I understand,
what Nathan suggested seems more likely to be achieved in pg15, and a
larger set of backup solutions could benefit from it. This can give
us enough time to properly design a new archiving approach.
On Sep 10, 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
Maybe just add a parallelism API for external tools?
It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining parallel workers is tremendously harder than spawning a goroutine, thread, task or whatever.
Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.
I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.
If this problem is solved in
postgres core without any API change, then all existing tools will
automatically benefit from it (maybe not the ones that already have
hacks to make it parallel, but it seems easier to disable those hacks
than to implement them).
True, hacky tools can already coordinate a swarm of their processes and are prepared to be called multiple times concurrently :)
The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
For example the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!
Yes, but that requires fundamental design changes for the archive
commands, right? So while I agree it could be a better approach
overall, it seems like a longer term option. As far as I understand,
what Nathan suggested seems more likely to be achieved in pg15, and a
larger set of backup solutions could benefit from it. This can give
us enough time to properly design a new archiving approach.
It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive command. What fundamental design changes do we need?
Best regards, Andrey Borodin.
On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
On Sep 10, 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:
Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.
I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.
Which is a big API break.
It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive command. What fundamental design changes do we need?
I'm talking about the commands themselves. Your suggestion is to
change archive_command to be able to spawn a daemon, and that looks
like a totally different approach. I'm not saying that having a
daemon-based approach to take care of archiving is a bad idea; I'm
saying that trying to fit that into the current archive_command plus
some new GUC looks like a bad idea.
On Sep 10, 2021, at 11:11, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
On Sep 10, 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote:
Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it.
I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.
Which is a big API break.
Huge extension, not a break.
It's a very simplistic approach. If some GUC is set, the archiver will just feed ready files to the stdin of the archive command. What fundamental design changes do we need?
I'm talking about the commands themselves. Your suggestion is to
change archive_command to be able to spawn a daemon, and that looks
like a totally different approach. I'm not saying that having a
daemon-based approach to take care of archiving is a bad idea; I'm
saying that trying to fit that into the current archive_command plus
some new GUC looks like a bad idea.
It fits nicely, even in corner cases. E.g. the restore_command run from pg_rewind seems compatible with this approach.
One more example: after a failover the DBA can just run ```ls | wal-g wal-push``` to archive all the WALs left unarchived before the network partition.
This is a simple yet powerful approach, with no contradiction to the existing archive_command API.
Why is it a bad idea?
Best regards, Andrey Borodin.
On Tue, Sep 7, 2021 at 6:36 PM Bossart, Nathan <bossartn@amazon.com> wrote:
Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools. I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing consumers model.
To me, it seems way more beneficial to think about being able to
invoke archive_command with many files at a time instead of just one.
I think for most plausible archive commands that would be way more
efficient than what you propose here. It's *possible* that if we had
that, we'd still want this, but I'm not even convinced.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Sep 10, 2021 at 9:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
To me, it seems way more beneficial to think about being able to
invoke archive_command with many files at a time instead of just one.
I think for most plausible archive commands that would be way more
efficient than what you propose here. It's *possible* that if we had
that, we'd still want this, but I'm not even convinced.
Those approaches don't really seem mutually exclusive? In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel
commands you only need the additional knowledge that some command is
already working on a file. Wouldn't it be even better to eventually
be able to launch multiple batches of multiple files rather than a
single batch?
If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is. To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.
But as I said, I'm not convinced that using the archive_command
approach for that is the best one. If I understand correctly, most of
the backup solutions would prefer to have a daemon being launched and
use it as a queuing system. Wouldn't it be better to have a new
archive_mode, e.g. "daemon", have postgres be responsible for
(re)starting it, and pass information through the daemon's
stdin/stdout or something like that?
On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Those approaches don't really seem mutually exclusive? In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel
commands you only need the additional knowledge that some command is
already working on a file. Wouldn't it be even better to eventually
be able to launch multiple batches of multiple files rather than a
single batch?
Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching.
[rhaas pgsql]$ touch x y z
[rhaas pgsql]$ time sh -c 'scp x cthulhu: && scp y cthulhu: && scp z cthulhu:'
x 100% 207KB 78.8KB/s 00:02
y 100% 0 0.0KB/s 00:00
z 100% 0 0.0KB/s 00:00
real 0m9.418s
user 0m0.045s
sys 0m0.071s
[rhaas pgsql]$ time sh -c 'scp x y z cthulhu:'
x 100% 207KB 273.1KB/s 00:00
y 100% 0 0.0KB/s 00:00
z 100% 0 0.0KB/s 00:00
real 0m3.216s
user 0m0.017s
sys 0m0.020s
If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is. To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.
That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.
But as I said, I'm not convinced that using the archive_command
approach for that is the best approach If I understand correctly,
most of the backup solutions would prefer to have a daemon being
launched and use it at a queuing system. Wouldn't it be better to
have a new archive_mode, e.g. "daemon", and have postgres responsible
to (re)start it, and pass information through the daemon's
stdin/stdout or something like that?
Sure. Actually, I think a background worker would be better than a
separate daemon. Then it could just talk to shared memory directly.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Sep 10, 2021 at 11:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching.
I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance. But if
the archiver only gets an answer from the archive_command once it has
tried to process all of the files, it also means that postgres won't
be able to remove any WAL file until all of them have been processed.
It means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like. In the case
of archiving to a server with high latency / connection overhead it
may be better to be able to run multiple commands in parallel. I may
be overthinking here, and feedback from people with more experience
around that would definitely be welcome.
That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.
Yes, it could be. I think that we need more feedback for that too.
Sure. Actually, I think a background worker would be better than a
separate daemon. Then it could just talk to shared memory directly.
I thought about it too, but I was under the impression that most
people would want to implement a custom daemon (or already have one)
in some more parallel/thread-friendly language.
On Sep 10, 2021, at 19:19, Julien Rouhaud <rjuju123@gmail.com> wrote:
Wouldn't it be better to have a new
archive_mode, e.g. "daemon", have postgres be responsible for
(re)starting it, and pass information through the daemon's
stdin/stdout or something like that?
We don't even need to introduce a new archive_mode.
Currently archive_command has no expectations regarding stdin/stdout.
Let's just say that we will push new WAL names to stdin until the archive_command exits.
And if the archive_command prints something to stdout, we will interpret it as archived WAL names.
That's it.
Existing archive_commands will continue as is.
Currently, information about what is archived is stored on the filesystem in the archive_status dir. We do not need to change anything.
If the archive_command exits (with any exit code) we will restart it if there are WAL files that still haven't been archived.
Best regards, Andrey Borodin.
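On the server side, the restart loop described above could be sketched like this (a simplified stand-in for what the archiver process would do; a real implementation would also back off between restarts and track state via archive_status rather than an in-memory list):

```python
import subprocess

def run_archiver(command, pending):
    """Supervise an archive_command: feed ready WAL names on stdin, treat
    every name the command prints on stdout as durably archived, and
    restart the command (whatever its exit code) while work remains."""
    archived = []
    while pending:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, text=True)
        try:
            for name in list(pending):
                proc.stdin.write(name + "\n")
            proc.stdin.close()             # no more work for this run
        except BrokenPipeError:
            pass                           # command died; loop restarts it
        for line in proc.stdout:           # names the command reports done
            done = line.strip()
            if done in pending:
                pending.remove(done)       # i.e. rename .ready -> .done
                archived.append(done)
        proc.wait()
    return archived
```

Note that if the command keeps failing on the same file, this naive loop spins forever; a real archiver would rate-limit the restarts, just as the current archiver rate-limits retries of a failing archive_command.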
On 9/10/21, 8:22 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Those approaches don't really seem mutually exclusive? In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel
commands you only need the additional knowledge that some command is
already working on a file. Wouldn't it be even better to eventually
be able to launch multiple batches of multiple files rather than a
single batch?
Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching. [...]
If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is. To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.
That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.
My initial thinking was similar to Julien's. Assuming I have an
archive_command that handles one file, I can just set
archive_max_workers to 3 and reap the benefits. If I'm using an
existing utility that implements its own parallelism, I can keep
archive_max_workers at 1 and continue using it. This would be a
simple incremental improvement.
That being said, I think the discussion about batching is a good one
to have. If the overhead described in your SCP example is
representative of a typical archive_command, then parallelism does
seem a bit silly. We'd essentially be using a ton more resources when
there's obvious room for improvement via reducing the amount of overhead
per archive. I think we could easily make the batch size configurable
so that existing archive commands would work (e.g.,
archive_batch_size=1). However, unlike the simple parallel approach,
you'd likely have to adjust your archive_command if you wanted to make
use of batching. That doesn't seem terrible to me, though. As
discussed above, there are some implementation details to work out for
archive failures, but nothing about that seems intractable to me.
Plus, if you still wanted to parallelize things, feeding your
archive_command several files at a time could still be helpful.
I'm currently leaning toward exploring the batching approach first. I
suppose we could always make a prototype of both solutions for
comparison with some "typical" archive commands if that would help
with the discussion.
Nathan
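To make the batching idea concrete, here is a sketch of what a batch-aware archive_command might do (the archive_batch_size GUC and the convention of passing the files as extra arguments are proposals from this thread, not an existing interface):

```python
import os
import shutil
import sys

def archive_batch(files, archive_dir):
    """Copy a batch of WAL files into the archive, fsyncing each copy.
    Returns the files archived successfully; on a failure, stop so the
    server can retry the remainder (it maps the partial result back to
    per-file .done status)."""
    ok = []
    for src in files:
        dst = os.path.join(archive_dir, os.path.basename(src))
        try:
            shutil.copy(src, dst)
            with open(dst, "rb") as f:
                os.fsync(f.fileno())
            ok.append(src)
        except OSError:
            break              # leave the rest for the next invocation
    return ok

if __name__ == "__main__":
    # Hypothetical calling convention: batch-archive <dir> <file>...
    done = archive_batch(sys.argv[2:], sys.argv[1])
    sys.exit(0 if len(done) == len(sys.argv) - 2 else 1)
```

With archive_batch_size=1 this degenerates to today's one-file contract, which is what would keep existing commands working unchanged.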
On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote:
I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance. But if
the archiver only gets an answer from the archive_command once it has
tried to process all of the files, it also means that postgres won't
be able to remove any WAL file until all of them have been processed.
It means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like. In the case
of archiving to a server with high latency / connection overhead it
may be better to be able to run multiple commands in parallel.
Well, users would also have to limit the parallelism, right? If
connections are high-overhead, I wouldn't imagine that running hundreds
of them simultaneously would work very well in practice. (The proof
would be in an actual benchmark, obviously, but usually I would rather
have one process handling a hundred items than a hundred processes
handling one item each.)
For a batching scheme, would it be that big a deal to wait for all of
them to be archived before removal?
That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.
Yes, it could be. I think that we need more feedback for that too.
Seems like this is the sticking point. What would be the smartest thing
for the command to do? If there's a destination file already, checksum
it and make sure it matches the source before continuing?
--Jacob
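For illustration, that check might look something like this in an archive command (a sketch; whether checksum equality is the right success criterion is exactly the open question above):

```python
import hashlib
import os
import shutil

def archive_idempotent(src, dst):
    """Archive src to dst. Succeed without copying if an identical file
    is already there; fail loudly if a *different* file holds the name."""
    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    if os.path.exists(dst):
        if sha256(dst) == sha256(src):
            return "already-archived"   # safe to report success, skip copy
        raise FileExistsError(f"{dst} exists with different contents")
    shutil.copy(src, dst)
    with open(dst, "rb") as f:          # make the copy durable
        os.fsync(f.fileno())
    return "archived"
```

This makes a retry-everything-on-failure policy safe: re-archiving a file that already made it is a cheap no-op, while a name collision with different contents surfaces as an error instead of silent corruption.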
On Fri, Sep 10, 2021 at 11:49 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance. But if
the archiver only gets an answer from the archive_command once it has
tried to process all of the files, it also means that postgres won't
be able to remove any WAL file until all of them have been processed.
It means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like. In the case
of archiving to a server with high latency / connection overhead it
may be better to be able to run multiple commands in parallel. I may
be overthinking here, and feedback from people with more experience
around that would definitely be welcome.
That's a fair point. I'm not sure how much it matters, though. I think
you want to imagine a system where there are, let's say, 10 WAL files
being archived per second. Using fork() + exec() to spawn a shell
command 10 times per second is a bit expensive, whether you do it
serially or in parallel, and even if the command is something with a
less-insane startup overhead than scp. If we start a shell command say
every 3 seconds and give it 30 files each time, we can reduce the
startup costs we're paying by ~97% at the price of having to wait up
to 3 additional seconds to know that archiving succeeded for any
particular file. That sounds like a pretty good trade-off, because the
main benefit of removing old files is that it keeps us from running
out of disk space, and you should not be running a busy system in such
a way that it is ever within 3 seconds of running out of disk space,
so whatever.
If on the other hand you imagine a system that's not very busy, say 1
WAL file being archived every 10 seconds, then using a batch size of
30 would very significantly delay removal of old files. However, on
this system, batching probably isn't really needed. The rate of WAL
file generation is low enough that if you pay the startup cost of your
archive_command for every file, you're probably still doing just fine.
Probably, any kind of parallelism or batching needs to take this kind
of time-based thinking into account. For batching, the rate at which
files are generated should affect the batch size. For parallelism, it
should affect the number of processes used.
--
Robert Haas
EDB: http://www.enterprisedb.com
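The ~97% figure above can be double-checked with some quick arithmetic (one command spawn per batch instead of one per file):

```python
def spawn_reduction(files_per_sec, batch_interval_sec):
    """Fraction of fork()+exec() startups saved by batching everything
    that accumulates over batch_interval_sec into one command call."""
    batch_size = files_per_sec * batch_interval_sec
    per_file_spawns = batch_size   # one spawn per file, per interval
    batched_spawns = 1             # one spawn per batch, per interval
    return 1 - batched_spawns / per_file_spawns

# 10 WAL files/sec with one batch every 3 seconds -> 30-file batches
print(f"{spawn_reduction(10, 3):.0%}")  # -> 97%
```

The saving is 1 - 1/batch_size, so it flattens out quickly: batches of 30 already capture nearly all of it, which supports sizing the batch from the WAL generation rate rather than making it very large.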
On 9/10/21, 10:12 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
If on the other hand you imagine a system that's not very busy, say 1
WAL file being archived every 10 seconds, then using a batch size of
30 would very significantly delay removal of old files. However, on
this system, batching probably isn't really needed. The rate of WAL
file generation is low enough that if you pay the startup cost of your
archive_command for every file, you're probably still doing just fine.
Probably, any kind of parallelism or batching needs to take this kind
of time-based thinking into account. For batching, the rate at which
files are generated should affect the batch size. For parallelism, it
should affect the number of processes used.
I was thinking that archive_batch_size would be the maximum batch
size. If the archiver only finds a single file to archive, that's all
it'd send to the archive command. If it finds more, it'd send up to
archive_batch_size to the command.
Nathan
On Fri, Sep 10, 2021 at 1:07 PM Bossart, Nathan <bossartn@amazon.com> wrote:
That being said, I think the discussion about batching is a good one
to have. If the overhead described in your SCP example is
representative of a typical archive_command, then parallelism does
seem a bit silly.
I think that's pretty realistic, because a lot of people's archive
commands are going to actually be, or need to use, scp specifically.
However, there are also cases where people are using commands that
just put the file in some local directory (maybe on a remote mount
point) and I would expect the startup overhead to be much less in
those cases. Maybe people are archiving via HTTPS or similar as well,
and then you again have some connection overhead though, I suspect,
not as much as scp, since web pages do not take 3 seconds to get an
https connection going. I don't know why scp is so crazy slow.
Even in the relatively low-overhead cases, though, I think we would
want to do some real testing to see if the benefits are as we expect.
See /messages/by-id/20200420211018.w2qphw4yybcbxksl@alap3.anarazel.de
and downthread for context. I was *convinced* that parallel backup was
a win. Benchmarking was a tad underwhelming, but there was a clear if
modest benefit by running a synthetic test of copying a lot of files
serially or in parallel, with the files spread across multiple
filesystems on the same physical box. However, when Andres modified my
test program to use posix_fadvise(), posix_fallocate(), and
sync_file_range() while doing the copies, the benefits of parallelism
largely evaporated, and in fact in some cases enabling parallelism
caused major regressions. In other words, the apparent benefits of
parallelism were really due to suboptimal behaviors in the Linux page
cache and some NUMA effects that were in fact avoidable.
So I'm suspicious that the same things might end up being true here.
It's not exactly the same, because the goal of WAL archiving is to
keep up with the rate of WAL generation, and the goal of a backup is
(unless max-rate is used) to finish as fast as possible, and that
difference in goals might end up being significant. Also, you can make
an argument that some people will benefit from a parallelism feature
even if a perfectly-implemented archive_command doesn't, because many
people use really terrible archive_commands. But all that said, I
think the parallel backup discussion is still a cautionary tale to
which some attention ought to be paid.
We'd essentially be using a ton more resources when
there's obvious room for improvement via reducing the amount of overhead
per archive. I think we could easily make the batch size configurable
so that existing archive commands would work (e.g.,
archive_batch_size=1). However, unlike the simple parallel approach,
you'd likely have to adjust your archive_command if you wanted to make
use of batching. That doesn't seem terrible to me, though. As
discussed above, there are some implementation details to work out for
archive failures, but nothing about that seems intractable to me.
Plus, if you still wanted to parallelize things, feeding your
archive_command several files at a time could still be helpful.
Yep.
I'm currently leaning toward exploring the batching approach first. I
suppose we could always make a prototype of both solutions for
comparison with some "typical" archive commands if that would help
with the discussion.
Yeah, I think the concerns here are more pragmatic than philosophical,
at least for me.
I had kind of been thinking that the way to attack this problem is to
go straight to allowing for a background worker, because the other
problem with archive_command is that running a shell command like cp,
scp, or rsync is not really safe. It won't fsync your data, it might
not fail if the file is in the archive already, and it definitely
won't succeed without doing anything if there's a byte for byte
identical file in the archive and fail if there's a file with
different contents already in the archive. Fixing that stuff by
running different shell commands is hard, but it wouldn't be that hard
to do it in C code, and you could then also extend whatever code you
wrote to do batching and parallelism; starting more workers isn't
hard.
However, I can't see the idea of running a shell command going away
any time soon, in spite of its numerous and severe drawbacks. Such an
interface provides a huge degree of flexibility and allows system
admins to whack around behavior easily, which you don't get if you
have to code every change in C. So I think command-based enhancements
are fine to pursue also, even though I don't think it's the ideal
place for most users to end up.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sep 10, 2021, at 22:18, Bossart, Nathan <bossartn@amazon.com> wrote:
I was thinking that archive_batch_size would be the maximum batch
size. If the archiver only finds a single file to archive, that's all
it'd send to the archive command. If it finds more, it'd send up to
archive_batch_size to the command.
I think that the concept of a "batch" is misleading.
If you pass filenames via stdin you don't need to know all the names upfront.
Just send more names to the pipe if the archive_command is still running and one more segment has just become available.
This way the level of parallelism will adapt to the workload.
Best regards, Andrey Borodin.