where should I stick that backup?
There are a couple of things that pg_basebackup can't do that might be
an issue for some users. One of them is that you might want to do
something like encrypt your backup. Another is that you might want to
store it someplace other than in the filesystem, like maybe S3. We could
certainly teach pg_basebackup how to do specifically those things, and
maybe that is worthwhile. However, I wonder if it would be useful to
provide a more general capability, either instead of doing those more
specific things or in addition to doing those more specific things.
What I'm thinking about is: suppose we add an option to pg_basebackup
with a name like --pipe-output. This would be mutually exclusive with
-D, but would work at least with -Ft and maybe also with -Fp. The
argument to --pipe-output would be a shell command to be executed once
per output file. Any instance of %f in the shell command would be
replaced with the name of the file that would have been written (and
%% would turn into a single %). The shell command itself would be
executed via system(). So if you want to compress, but using some
other compression program instead of gzip, you could do something
like:
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
And if you want to encrypt, you could do something like:
pg_basebackup -Ft --pipe-output 'gpg -e -o %f.gpg'
And if you want to ship it off to be stored in a concrete bunker deep
underground, you can just do something like:
pg_basebackup -Ft --pipe-output 'send-to-underground-storage.sh
backup-2020-04-03 %f'
You still have to write send-to-underground-storage.sh, of course, and
that may involve some work, and maybe also some expensive
construction. But what you don't have to do is first copy the entire
backup to your local filesystem and then as a second step figure out
how to put it through whatever post-processing it needs. Instead, you
can simply take your backup and stick it anywhere you like.
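Purely as a hedged illustration of what a wrapper like the one named above might contain (every name here is invented, and GNU coreutils' dd conv=fsync is assumed), the basic shape could be:

```shell
# Hypothetical sketch of a send-to-underground-storage.sh-style consumer
# for --pipe-output: stream stdin into a staging file, force it to disk,
# then publish it under its final name. VAULT and the label are invented.
set -eu

store_file() {
    label=$1; name=$2
    vault="${VAULT:-/tmp/vault}/$label"
    mkdir -p "$vault"
    # conv=fsync makes dd fsync the output before exiting, so the bytes
    # are durable before the file appears under its final name.
    dd of="$vault/$name.tmp" conv=fsync status=none
    mv "$vault/$name.tmp" "$vault/$name"
    sync    # blunt stand-in for fsyncing the directory entry itself
}

# e.g. roughly what pg_basebackup would run for base.tar:
printf 'pretend tar contents' | store_file backup-2020-04-03 base.tar
```

A real script would transfer the staged file somewhere remote instead of leaving it on the local filesystem; this only shows the write-stage-publish discipline.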
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote:
What I'm thinking about is: suppose we add an option to pg_basebackup
with a name like --pipe-output. This would be mutually exclusive with
-D, but would work at least with -Ft and maybe also with -Fp. The
argument to --pipe-output would be a shell command to be executed once
per output file. Any instance of %f in the shell command would be
replaced with the name of the file that would have been written (and
%% would turn into a single %). The shell command itself would be
executed via system(). So if you want to compress, but using some
other compression program instead of gzip, you could do something
like:
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
Seems good to me. I agree -Fp is a "maybe" since the overhead will be high
for small files.
Greetings,
* Noah Misch (noah@leadboat.com) wrote:
On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote:
What I'm thinking about is: suppose we add an option to pg_basebackup
with a name like --pipe-output. This would be mutually exclusive with
-D, but would work at least with -Ft and maybe also with -Fp. The
argument to --pipe-output would be a shell command to be executed once
per output file. Any instance of %f in the shell command would be
replaced with the name of the file that would have been written (and
%% would turn into a single %). The shell command itself would be
executed via system(). So if you want to compress, but using some
other compression program instead of gzip, you could do something
like:
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
Seems good to me. I agree -Fp is a "maybe" since the overhead will be high
for small files.
For my 2c, at least, introducing more shell commands into critical parts
of the system is absolutely the wrong direction to go in.
archive_command continues to be a mess that we refuse to clean up or
even properly document and the project would be much better off by
trying to eliminate it rather than add in new ways for users to end up
with bad or invalid backups.
Further, having a generic shell script approach like this would result
in things like "well, we don't need to actually add support for X, Y or
Z, because we have this wonderful generic shell script thing and you can
write your own, and therefore we won't accept patches which do add those
capabilities because then we'd have to actually maintain that support."
In short, -1 from me.
Thanks,
Stephen
On Mon, Apr 6, 2020 at 10:45 AM Stephen Frost <sfrost@snowman.net> wrote:
For my 2c, at least, introducing more shell commands into critical parts
of the system is absolutely the wrong direction to go in.
archive_command continues to be a mess that we refuse to clean up or
even properly document and the project would be much better off by
trying to eliminate it rather than add in new ways for users to end up
with bad or invalid backups.
Further, having a generic shell script approach like this would result
in things like "well, we don't need to actually add support for X, Y or
Z, because we have this wonderful generic shell script thing and you can
write your own, and therefore we won't accept patches which do add those
capabilities because then we'd have to actually maintain that support."
In short, -1 from me.
I'm not sure that there's any point in responding to this because I
believe that the wording of this email suggests that you've made up
your mind that it's bad and that position is not subject to change no
matter what anyone else may say. However, I'm going to try to reply
anyway, on the theory that (1) I might be wrong and (2) even if I'm
right, it might influence the opinions of others who have not spoken
yet, and whose opinions may be less settled.
First of all, while I agree that archive_command has some problems, I
don't think that means that every case where we use a shell command
for anything is a hopeless mess. The only problem I really see in this
case is that if you route to a local file via an intermediate program
you wouldn't get an fsync() any more. But we could probably figure out
some clever things to work around that problem, if that's the issue.
If there's some other problem, what is it?
Second, PostgreSQL is not realistically going to link pg_basebackup
against every compression, encryption, and remote storage library out
there. One, yeah, we don't want to maintain that. Two, we don't want
PostgreSQL to have build-time dependencies on a dozen or more
libraries that people might want to use for stuff like this. We might
well want to incorporate support for a few of the more popular things
in this area, but people will always want support for newer things
than what existing server releases feature, and for more of them.
Third, I am getting pretty tired of being told every time I try to do
something that is related in any way to backup that it's wrong. If
your experience with pgbackrest motivated you to propose ways of
improving backup and restore functionality in the community, that
would be great. But in my experience so far, it seems to mostly
involve making a lot of negative comments that make it hard to get
anything done. I would appreciate it if you would adopt a more
constructive tone.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Noah Misch (noah@leadboat.com) wrote:
On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote:
What I'm thinking about is: suppose we add an option to pg_basebackup
with a name like --pipe-output. This would be mutually exclusive with
-D, but would work at least with -Ft and maybe also with -Fp. The
argument to --pipe-output would be a shell command to be executed once
per output file. Any instance of %f in the shell command would be
replaced with the name of the file that would have been written (and
%% would turn into a single %). The shell command itself would be
executed via system(). So if you want to compress, but using some
other compression program instead of gzip, you could do something
like:
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
Seems good to me. I agree -Fp is a "maybe" since the overhead will be high
for small files.
For my 2c, at least, introducing more shell commands into critical parts
of the system is absolutely the wrong direction to go in.
archive_command continues to be a mess that we refuse to clean up or
even properly document and the project would be much better off by
trying to eliminate it rather than add in new ways for users to end up
with bad or invalid backups.
I think the bigger problem with archive_command comes more from how
it's defined to work, tbh, which leaves a lot of things open.
This sounds to me like a much narrower use-case, which makes it a lot
more OK. But I agree we have to be careful not to get back into that
whole mess. One thing would be to clearly document such things *from
the beginning*, and not try to retrofit it years later like we ended
up doing with archive_command.
And as Robert mentions downthread, the fsync() issue is definitely a
real one, but if that is documented clearly ahead of time, that's a
reasonable level of foot-gun, I'd say.
Further, having a generic shell script approach like this would result
in things like "well, we don't need to actually add support for X, Y or
Z, because we have this wonderful generic shell script thing and you can
write your own, and therefore we won't accept patches which do add those
capabilities because then we'd have to actually maintain that support."
In principle, I agree with "shellscripts suck".
Now, if we were just talking about compression, it would actually be
interesting to implement some sort of "postgres compression API" if
you will, that is implemented by a shared library. This library could
then be used from pg_basebackup or from anything else that needs
compression. And anybody who wants could then do a "<compression X>
for PostgreSQL" module, removing the need for us to carry such code
upstream.
There's been discussions of that for the backend before IIRC, but I
don't recall the conclusions. And in particular, I don't recall if it
included the idea of being able to use it in situations like this as
well, and with *run-time loading*.
And that said, then we'd limit ourselves to compression. We'd still
need a way to deal with encryption...
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Apr 6, 2020 at 10:45 AM Stephen Frost <sfrost@snowman.net> wrote:
For my 2c, at least, introducing more shell commands into critical parts
of the system is absolutely the wrong direction to go in.
archive_command continues to be a mess that we refuse to clean up or
even properly document and the project would be much better off by
trying to eliminate it rather than add in new ways for users to end up
with bad or invalid backups.
Further, having a generic shell script approach like this would result
in things like "well, we don't need to actually add support for X, Y or
Z, because we have this wonderful generic shell script thing and you can
write your own, and therefore we won't accept patches which do add those
capabilities because then we'd have to actually maintain that support."
In short, -1 from me.
I'm not sure that there's any point in responding to this because I
believe that the wording of this email suggests that you've made up
your mind that it's bad and that position is not subject to change no
matter what anyone else may say. However, I'm going to try to reply
anyway, on the theory that (1) I might be wrong and (2) even if I'm
right, it might influence the opinions of others who have not spoken
yet, and whose opinions may be less settled.
Chances certainly aren't good that you'll convince me that putting more
absolutely critical-to-get-perfect shell scripts into the backup path is
a good idea.
First of all, while I agree that archive_command has some problems, I
don't think that means that every case where we use a shell command
for anything is a hopeless mess. The only problem I really see in this
case is that if you route to a local file via an intermediate program
you wouldn't get an fsync() any more. But we could probably figure out
some clever things to work around that problem, if that's the issue.
If there's some other problem, what is it?
We certainly haven't solved the issues with archive_command (at least,
not in core), so this "well, maybe we could fix all the issues" claim
really doesn't hold any water. Having commands like this ends up just
punting on the whole problem and saying "here user, you deal with it."
*Maybe* if we *also* wrote dedicated tools to be used with these
commands (as has been proposed multiple times with archive_command, but
hasn't actually happened, at least, not in core), we could build
something where this would work reasonably well and it'd be alright, but
that wasn't what seemed to be suggested here, and if we're going to
write all that code anyway, it doesn't really seem like a shell
interface is the best one to go with.
There's also been something of an expectation that if we're going to
provide an interface then we should have an example of something that
uses it- but when it comes to something like archive_command, the
example we came up with was terrible and yet it's still in our
documentation and is commonly used, much to the disservice of our users.
Sure, we can point to our users and say "well, that's not how you should
actually use that feature, you should do all this other stuff in that
command" and punt on this and push it back on our users and tell them
that they're using the interface we provide wrong but the only folks who
can possibly actually like that answer are ourselves- our users aren't
happy with it because they're left with a broken backup that they can't
restore from when they needed to.
That your initial email had more-or-less the exact same kind of
"example" certainly doesn't inspire confidence that this would end up
being used sensibly by our users.
Yes, fsync() is part of the issue but it's not the only one- retry
logic, and making sure the results are correct, is pretty darn important
too, especially with things like s3 (even dedicated tools have issues in
this area- I just saw a report about wal-g failing to archive a WAL file
properly because there was an error which resulted in a 0-byte WAL file
being stored; wal-g did properly retry, but then it saw the file was
there and figured "all is well" and returned success even though the
file was 0-byte in s3). I don't doubt that David could point out a few
other issues- he routinely does whenever I chat with him about various
ideas I've got.
So, instead of talking about 'bzip2 > %f.bz2', and then writing into our
documentation that that's how this feature can be used, what about
proposing something that would actually work reliably with this
interface? Something which properly fsync's everything, has good retry logic for
when failures happen, is able to actually detect when a failure
happened, and shows how to restore from a backup taken this way; it'd probably
be good to show how pg_verifybackup could be used to make sure the
backup is actually correct and valid too.
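For what it's worth, the retry-and-verify shape being asked for here can at least be sketched in a few lines of shell; the upload and remote_size helpers below are invented stand-ins (a local copy and a local wc -c) for a real S3/GCS client and a HEAD request, not any real tool's interface:

```shell
# Sketch of retry logic plus a size check of the kind that would have
# caught the 0-byte-object failure described above. All names invented.
set -u

STORE=$(mktemp -d)
upload()      { cp "$1" "$STORE/$(basename "$1")"; }   # stand-in for the transfer
remote_size() { wc -c < "$STORE/$(basename "$1")"; }   # stand-in for a HEAD request

push_with_retry() {
    file=$1 attempts=0
    while [ "$attempts" -lt 3 ]; do
        attempts=$((attempts + 1))
        # Success means stored *and* the stored size matches the local size.
        if upload "$file" && [ "$(remote_size "$file")" -eq "$(wc -c < "$file")" ]; then
            return 0
        fi
        sleep "$attempts"   # real code would likely back off exponentially
    done
    echo "giving up on $file after $attempts attempts" >&2
    return 1
}

printf 'backup bytes' > "$STORE.local"
push_with_retry "$STORE.local" && echo "stored $(basename "$STORE.local")"
```

Whether that belongs in user scripts, in documentation, or in the tool itself is exactly the question under debate.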
Second, PostgreSQL is not realistically going to link pg_basebackup
against every compression, encryption, and remote storage library out
there. One, yeah, we don't want to maintain that. Two, we don't want
PostgreSQL to have build-time dependencies on a dozen or more
libraries that people might want to use for stuff like this. We might
well want to incorporate support for a few of the more popular things
in this area, but people will always want support for newer things
than what existing server releases feature, and for more of them.
We don't need to link to 'every compression, encryption and remote
storage library out there'. In some cases, yes, it makes sense to use
an existing library (OpenSSL, zlib, lz4), but in many other cases it
makes more sense to build support directly into the system (s3, gcs,
probably others) because a good library doesn't exist. It'd also be
good to build a nicely extensible system which people can add to, to
support other storage or compression options but I don't think that's
reasonable to do with a shell-script based interface- maybe with
shared libraries, as Magnus suggests elsewhere, but even there I
have some doubts.
Third, I am getting pretty tired of being told every time I try to do
something that is related in any way to backup that it's wrong. If
your experience with pgbackrest motivated you to propose ways of
improving backup and restore functionality in the community, that
would be great. But in my experience so far, it seems to mostly
involve making a lot of negative comments that make it hard to get
anything done. I would appreciate it if you would adopt a more
constructive tone.
pgbackrest is how we're working to improve backup and restore
functionality in the community, and we've come a long way and gone
through a great deal of fire getting there. I appreciate that it's not
in core and I'd love to discuss how we can change that, but it's
absolutely a part of the PG community and ecosystem- with changes being
made in core routinely which improve the in-core tools as well as
pgbackrest by the authors contributing back.
As far as my tone, I'm afraid that's simply coming from having dealt
with and discussed many of these, well, shortcuts, to trying to improve
backup and recovery. Did David and I discuss using s3cmd? Of course.
Did we research various s3 libraries? http libraries? SSL libraries?
compression libraries? Absolutely, which is why we ended up using
OpenSSL (PG links to it already, so if you're happy enough with PG's SSL
then you'll probably accept pgbackrest using the same one- and yes,
we've talked about supporting others as PG is moving in that direction
too), and zlib (same reasons), we've now added lz4 (after researching it
and deciding it was pretty reasonable to include), but when it came to
dealing with s3, we wrote our own HTTP and s3 code- none of the existing
libraries were a great answer and trying to make it work with s3cmd was,
well, about like saying that you should just use CSV files and forget
about this whole database thing. We're very likely to write our own
code for gcs too, but we already have the HTTP code, which means it's
not actually all that heavy of a lift to do.
I'm not against trying to improve the situation in core, and I've even
talked about and tried to give feedback about what would make the most
sense for that to look like, but I feel like every time I do that
there's a bunch of push-back that I want it to look like pgbackrest or
that I'm being negative about things that don't look like pgbackrest.
Guess what? Yes, I do think it should look like pgbackrest, but that's
not because I have some not invented here syndrome issue, it's because
we've been through this and have learned a great deal and have taken
what we've learned and worked to build the best tool we can, much the
way the PG community works to make the best database we can.
Yes, we were able to argue and make it clear that a manifest really did
make sense and even that it should be in json format, and then argue
that checking WAL was a pretty important part of verifying any backup,
but each and every one of these ends up being a long and drawn out
argument and it's draining. The thing is, this stuff isn't new to us.
Thanks,
Stephen
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost <sfrost@snowman.net> wrote:
* Noah Misch (noah@leadboat.com) wrote:
On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote:
What I'm thinking about is: suppose we add an option to pg_basebackup
with a name like --pipe-output. This would be mutually exclusive with
-D, but would work at least with -Ft and maybe also with -Fp. The
argument to --pipe-output would be a shell command to be executed once
per output file. Any instance of %f in the shell command would be
replaced with the name of the file that would have been written (and
%% would turn into a single %). The shell command itself would be
executed via system(). So if you want to compress, but using some
other compression program instead of gzip, you could do something
like:
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
Seems good to me. I agree -Fp is a "maybe" since the overhead will be high
for small files.
For my 2c, at least, introducing more shell commands into critical parts
of the system is absolutely the wrong direction to go in.
archive_command continues to be a mess that we refuse to clean up or
even properly document and the project would be much better off by
trying to eliminate it rather than add in new ways for users to end up
with bad or invalid backups.
I think the bigger problem with archive_command comes more from how
it's defined to work, tbh, which leaves a lot of things open.
This sounds to me like a much narrower use-case, which makes it a lot
more OK. But I agree we have to be careful not to get back into that
whole mess. One thing would be to clearly document such things *from
the beginning*, and not try to retrofit it years later like we ended
up doing with archive_command.
This sounds like a much broader use-case to me, not a narrower one. I
agree that we don't want to try and retrofit things years later.
And as Robert mentions downthread, the fsync() issue is definitely a
real one, but if that is documented clearly ahead of time, that's a
reasonable level foot-gun I'd say.
Documented how..?
Further, having a generic shell script approach like this would result
in things like "well, we don't need to actually add support for X, Y or
Z, because we have this wonderful generic shell script thing and you can
write your own, and therefore we won't accept patches which do add those
capabilities because then we'd have to actually maintain that support."
In principle, I agree with "shellscripts suck".
Now, if we were just talking about compression, it would actually be
interesting to implement some sort of "postgres compression API" if
you will, that is implemented by a shared library. This library could
then be used from pg_basebackup or from anything else that needs
compression. And anybody who wants could then do a "<compression X>
for PostgreSQL" module, removing the need for us to carry such code
upstream.
Getting a bit off-track here, but I actually think we should absolutely
figure out a way to support custom compression options in PG. I had
been thinking of something along the lines of per-datatype actually,
where each data type could define its own compression method, since we
know that different data has different characteristics and therefore
might benefit from different ways of compressing it. Though it's also
true that generically there are tradeoffs between cpu time, memory size,
resulting size on disk, etc, and having ways to pick between those could
also be interesting.
There's been discussions of that for the backend before IIRC, but I
don't recall the conclusions. And in particular, I don't recall if it
included the idea of being able to use it in situations like this as
well, and with *run-time loading*.
Run-time loading brings in the fun that maybe we aren't able to load the
library when we need to too, and what then? :)
And that said, then we'd limit ourselves to compression. We'd still
need a way to deal with encryption...
And shipping stuff off to some remote server too, at least if we are
going to tell users that they can use this approach to send their
backups to s3... (and that reminds me- there's other things to think
about there too, like maybe you don't want to ship off 0-byte files to
s3, or maybe you don't want to ship tiny files, because there's costs
associated with these things...).
Thanks,
Stephen
On Mon, Apr 6, 2020 at 2:23 PM Stephen Frost <sfrost@snowman.net> wrote:
So, instead of talking about 'bzip2 > %f.bz2', and then writing into our
documentation that that's how this feature can be used, what about
proposing something that would actually work reliably with this
interface? Which properly fsync's everything, has good retry logic for
when failures happen, is able to actually detect when a failure
happened, how to restore from a backup taken this way, and it'd probably
be good to show how pg_verifybackup could be used to make sure the
backup is actually correct and valid too.
I don't really understand the problem here. Suppose I do:
mkdir ~/my-brand-new-empty-directory
cd ~/my-brand-new-empty-directory
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
initdb -S --dont-expect-that-this-is-a-data-directory . # because
right now it would complain about pg_wal and pg_tblspc being missing
I think if all that works, my backup should be good and durably on
disk. If it's not, then either pg_basebackup or bzip2 or initdb didn't
report errors that they should have reported. If you're worried about
that, say because you suspect those programs are buggy or because you
think the kernel may not be reporting errors properly, you can use tar
-jxvf + pg_verifybackup to check.
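One failure mode worth spelling out, since "report errors" is doing a lot of work in that sentence: a plain shell pipeline reports only the last command's exit status, so a failing producer can go unnoticed unless something like bash's pipefail (assumed available here) is in effect. Presumably --pipe-output would sidestep this by checking the status that system() returns for the user's command, but any user script that itself contains pipelines needs the same discipline:

```shell
# Demonstration: the pipeline's status is cat's (0) even though the
# left-hand side failed; pipefail surfaces the failure instead.
naive=$(bash -c 'false | cat; echo $?')
careful=$(bash -c 'set -o pipefail; false | cat; echo $?')
echo "without pipefail: $naive, with pipefail: $careful"   # 0 vs 1
```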
What *exactly* do you think can go wrong here?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 6, 2020 at 1:32 PM Magnus Hagander <magnus@hagander.net> wrote:
Now, if we were just talking about compression, it would actually be
interesting to implement some sort of "postgres compression API" if
you will, that is implemented by a shared library. This library could
then be used from pg_basebackup or from anything else that needs
compression. And anybody who wants could then do a "<compression X>
for PostgreSQL" module, removing the need for us to carry such code
upstream.
I think it could be more general than a compression library. It could
be a store-my-stuff-and-give-it-back-to-me library, which might do
compression or encryption or cloud storage or any combination of the
three, and probably other stuff too. Imagine that you first call an
init function with a namespace that is basically a string provided by
the user. Then you open a file either for read or for write (but not
both). Then you read or write a series of chunks (depending on the
file mode). Then you close the file. Then you can do the same with
more files. Finally at the end you close the namespace. You don't
really need to care where or how the functions you are calling store
the data. You just need them to return proper error indicators if by
chance they fail.
As compared with my previous proposal, this would work much better for
pg_basebackup -Fp, because you wouldn't launch a new bzip2 process for
every file. You'd just bzopen(), which is presumably quite lightweight
by comparison. The reasons I didn't propose it are:
1. Running bzip2 on every file in a plain-format backup seems a lot
sillier than running it on every tar file in a tar-format backup.
2. I'm not confident that the command specified here actually needs to
be anything very complicated (unlike archive_command).
3. The barrier to entry for a loadable module is a lot higher than for
a shell command.
4. I think that all of our existing infrastructure for loadable
modules is backend-only.
Now all of these are up for discussion. I am sure we can make the
loadable module stuff work in frontend code; it would just take some
work. A C interface for extensibility is very significantly harder to
use than a shell interface, but it's still way better than no
interface. The idea that this shell command can be something simple is
my current belief, but it may turn out to be wrong. And I'm sure
somebody can propose a good reason to do something with every file in
a plain-format backup rather than using tar format.
All that being said, I still find it hard to believe that we will want
to add dependencies for libraries that we'd need to do encryption or
S3 cloud storage to PostgreSQL itself. So if we go with this more
integrated approach we should consider the possibility that, when the
dust settles, PostgreSQL will only have pg_basebackup
--output-plugin=lz4 and Aurora will also have pg_basebackup
--output-plugin=s3. From my point of view, that would be less than
ideal.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 6, 2020 at 07:32:45PM +0200, Magnus Hagander wrote:
On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost <sfrost@snowman.net> wrote:
For my 2c, at least, introducing more shell commands into critical parts
of the system is absolutely the wrong direction to go in.
archive_command continues to be a mess that we refuse to clean up or
even properly document and the project would be much better off by
trying to eliminate it rather than add in new ways for users to end up
with bad or invalid backups.
I think the bigger problem with archive_command comes more from how
it's defined to work, tbh, which leaves a lot of things open.
This sounds to me like a much narrower use-case, which makes it a lot
more OK. But I agree we have to be careful not to get back into that
whole mess. One thing would be to clearly document such things *from
the beginning*, and not try to retrofit it years later like we ended
up doing with archive_command.
And as Robert mentions downthread, the fsync() issue is definitely a
real one, but if that is documented clearly ahead of time, that's a
reasonable level of foot-gun, I'd say.
I think we need to step back and look at the larger issue. The real
argument goes back to the Unix command-line API vs the VMS/Windows API.
The former has discrete parts that can be stitched together, while the
VMS/Windows API presents a more duplicative but more holistic API for
every piece. We have discussed using shell commands for
archive_command, and even more recently, for the server pass phrase.
To get more specific, I think we have to understand how the
_requirements_ of the job match the shell script API, with stdin,
stdout, stderr, return code, and command-line arguments. Looking at
archive_command, the command-line arguments allow specification of file
names, but quoting can be complex. The error return code and stderr
output seem to work fine. There is no clean API for fsync or for
testing whether the file exists, so all of that has to be done by hand
in one command line. This is why many users use pre-written archive_command
shell scripts.
This brings up a few questions:
* Should we have split apart archive_command into file-exists, copy,
fsync-file? Should we add that now?
* How well does this backup requirement match with the shell command
API?
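On the first of those questions, the decomposition can at least be sketched. This is a hedged illustration (paths invented, GNU dd's conv=fsync idiom assumed) of the three steps, file-exists test, copy, and fsync, made explicit rather than crammed into a single command line:

```shell
# Hypothetical decomposition of archive_command into the three steps
# named above: exists-check, copy, fsync. Not a proposal for core.
set -eu
ARCHIVE=${ARCHIVE:-/tmp/wal-archive}
mkdir -p "$ARCHIVE"

archive_wal() {
    dst="$ARCHIVE/$(basename "$1")"
    # Step 1: file-exists test; never silently overwrite an archived segment.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    # Steps 2 and 3: copy and fsync, publishing under the final name only
    # after the bytes are durable.
    dd if="$1" of="$dst.tmp" conv=fsync status=none
    mv "$dst.tmp" "$dst"
}

printf 'WAL segment bytes' > /tmp/000000010000000000000001
archive_wal /tmp/000000010000000000000001
archive_wal /tmp/000000010000000000000001 || echo "duplicate rejected"
```

Even this small sketch shows why the one-line-of-shell interface is under strain: the retry, directory-fsync, and remote-storage concerns discussed elsewhere in the thread are still missing from it.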
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
I think we need to step back and look at the larger issue. The real
argument goes back to the Unix command-line API vs the VMS/Windows API.
The former has discrete parts that can be stitched together, while the
VMS/Windows API presents a more duplicative but more holistic API for
every piece. We have discussed using shell commands for
archive_command, and even more recently, for the server pass phrase.
When it comes to something like the server pass phrase, it seems much
more reasonable to consider using a shell script (though still perhaps
not ideal) because it's not involved directly in ensuring that the data
is reliably stored and it's pretty clear that if it doesn't work the
worst thing that happens is that the database doesn't start up, but it
won't corrupt any data or destroy it or do other bad things.
To get more specific, I think we have to understand how the
_requirements_ of the job match the shell script API, with stdin,
stdout, stderr, return code, and command-line arguments. Looking at
archive_command, the command-line arguments allow specification of file
names, but quoting can be complex. The error return code and stderr
output seem to work fine. There is no clean API for fsync and testing
if the file exists, so that all that has to be hand done in one
command-line. This is why many users use pre-written archive_command
shell scripts.
We aren't considering all of the use-cases really though, in specific,
things like pushing to s3 or gcs require, at least, good retry logic,
and that's without starting to think about things like high-rate systems
(spawning lots of new processes isn't free, particularly if they're
written in shell script, though any interpreted language is expensive) and
wanting to parallelize.
This brings up a few questions:
* Should we have split apart archive_command into file-exists, copy,
fsync-file? Should we add that now?
No.. The right approach to improving on archive command is to add a way
for an extension to take over that job, maybe with a complete background
worker of its own, or perhaps a shared library that can be loaded by the
archiver process, at least if we're talking about how to allow people to
extend it.
Potentially a better answer is to just build this stuff into PG- things
like "archive WAL to s3/GCS with these credentials" are what an awful
lot of users want. There's then some who want "archive first to this
other server, and then archive to s3/GCS", or more complex options.
I'll also point out that there's not one "s3".. there's quite a few
alternatives, including some which are open source, which talk the s3
protocol (sadly, they don't all do it perfectly, which is why we are
talking about building a GCS-specific driver for gcs rather than using
their s3 gateway, but still, s3 isn't just 'one thing').
* How well does this backup requirement match with the shell command
API?
For my part, it's not just a question of an API, but it's a question of
who is going to implement a good and reliable solution- PG developers,
or some admin who is just trying to get PG up and running in their
environment..? One aspect of that is being knowledgable about where all
the land mines are- like the whole fsync thing. Sure, if you're a PG
developer or you've been around long enough, you're going to realize
that 'cp' isn't going to fsync() the file and therefore it's a pretty
high risk choice for archive_command, and you'll understand just how
important WAL is, but there's certainly an awful lot of folks out there
who don't realize that or at least don't think about it when they're
standing up a new system and instead they just are following our docs
with the expectation that those docs are providing good advice.
Thanks,
Stephen
On Thu, Apr 9, 2020 at 04:15:07PM -0400, Stephen Frost wrote:
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
I think we need to step back and look at the larger issue. The real
argument goes back to the Unix command-line API vs the VMS/Windows API.
The former has discrete parts that can be stitched together, while the
VMS/Windows API presents a more duplicative but more holistic API for
every piece. We have discussed using shell commands for
archive_command, and even more recently, for the server pass phrase.
When it comes to something like the server pass phrase, it seems much
more reasonable to consider using a shell script (though still perhaps
not ideal) because it's not involved directly in ensuring that the data
is reliably stored and it's pretty clear that if it doesn't work the
worst thing that happens is that the database doesn't start up, but it
won't corrupt any data or destroy it or do other bad things.
Well, the pass phrase relates to security, so it is important too. I
don't think the _importance_ of the action is the most determining
issue. Rather, I think it is how well the action fits the shell script
API.
To get more specific, I think we have to understand how the
_requirements_ of the job match the shell script API, with stdin,
stdout, stderr, return code, and command-line arguments. Looking at
archive_command, the command-line arguments allow specification of file
names, but quoting can be complex. The error return code and stderr
output seem to work fine. There is no clean API for fsync and testing
if the file exists, so that all that has to be hand done in one
command-line. This is why many users use pre-written archive_command
shell scripts.
We aren't considering all of the use-cases really though, in specific,
things like pushing to s3 or gcs require, at least, good retry logic,
and that's without starting to think about things like high-rate systems
(spawning lots of new processes isn't free, particularly if they're
written in shell script but any interpreted language is expensive) and
wanting to parallelize.
Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
This brings up a few questions:
* Should we have split apart archive_command into file-exists, copy,
fsync-file? Should we add that now?
No.. The right approach to improving on archive command is to add a way
for an extension to take over that job, maybe with a complete background
worker of its own, or perhaps a shared library that can be loaded by the
archiver process, at least if we're talking about how to allow people to
extend it.
That seems quite vague, which is the issue we had years ago when
considering doing archive_command as a link to a C library.
Potentially a better answer is to just build this stuff into PG- things
like "archive WAL to s3/GCS with these credentials" are what an awful
lot of users want. There's then some who want "archive first to this
other server, and then archive to s3/GCS", or more complex options.
Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3? I don't know how we would do
that and allow users to change things like file paths or URLs.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian <bruce@momjian.us> wrote:
Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
This is really the key point for me. There are so many existing tools
that store a file someplace that we really can't ever hope to support
them all in core, or even to have well-written extensions that support
them all available on PGXN or wherever. We need to integrate with the
tools that other people have created, not try to reinvent them all in
PostgreSQL.
Now what I understand Stephen to be saying is that a lot of those
tools actually suck, and I think that's a completely valid point. But
I also think that it's unwise to decide that such problems are our
problems rather than problems with those tools. That's a hole with no
bottom.
One thing I do think would be realistic would be to invent a set of
tools that perform certain local filesystem operations in a
"hardened" way. Maybe a single tool with subcommands and options. So
you could say, e.g. 'pgfile cp SOURCE TARGET' and it would create a
temporary file in the target directory, write the contents of the
source into that file, fsync the file, rename it into place, and do
more fsyncs to make sure it's all durable in case of a crash. You
could have a variant of this that instead of using the temporary file
and rename in place approach, does the thing where you open the target
file with O_CREAT|O_EXCL, writes the bytes, and then closes and fsyncs
it. And you could have other things too, like 'pgfile mkdir DIR' to
create a directory and fsync it for durability. A toolset like this
would probably help people write better archive commands - it would
certainly be an improvement over what we have now, anyway, and it
could also be used with the feature that I proposed upthread.
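For illustration, the write path of such a 'cp' subcommand might look like this sketch. The 'pgfile' name is from the proposal above, but everything else is assumed: a real implementation would be C using open/write/fsync/rename, and 'dd conv=fsync' plus 'sync DIR' are GNU-coreutils stand-ins.

```shell
# Sketch of 'pgfile cp SOURCE TARGET': write a temp file, fsync it,
# rename into place, then fsync the directory so the rename itself
# survives a crash.
pgfile_cp() {
    src=$1 dst=$2
    tmp=$dst.pgfile-tmp.$$
    dd if="$src" of="$tmp" conv=fsync 2>/dev/null || return 1  # data + fsync
    mv "$tmp" "$dst" || return 1                               # atomic rename
    sync "$(dirname "$dst")" 2>/dev/null || sync               # durable rename
}
```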
For example, if you're concerned that bzip2 might overwrite an existing
file and that it might not fsync, then instead of saying:
pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
You could instead write:
pg_basebackup -Ft --pipe-output 'bzip2 | pgfile create-exclusive - %f.bz2'
or whatever we pick for actual syntax. And that provides a kind of
hardening that can be used with any other command line tool that can
be used as a filter.
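A sketch of what that hardening amounts to, with shell's noclobber standing in for O_CREAT|O_EXCL (the subcommand and its exact semantics are hypothetical, and 'dd conv=fsync' assumes GNU dd):

```shell
# Sketch of 'pgfile create-exclusive - TARGET': refuse to overwrite an
# existing file, copy stdin into it, fsync before reporting success.
create_exclusive() {
    target=$1
    (set -C && : > "$target") 2>/dev/null || return 1  # fails if it exists
    dd of="$target" conv=fsync 2>/dev/null             # stdin -> file + fsync
}
```

Failing when the target already exists is what makes it safe to re-run a backup command without silently clobbering an earlier backup.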
If you want to compress with bzip2, encrypt, and then copy the file to
a remote system, you could do:
pg_basebackup -Ft --pipe-output 'bzip2 | gpg -e | ssh someuser@somehost
pgfile create-exclusive - /backups/tuesday/%f.bz2'
It is of course not impossible to teach pg_basebackup to do all of
that stuff internally, but I have a really difficult time imagining us
ever getting it done. There are just too many possibilities, and new
ones arise all the time.
A 'pgfile' utility wouldn't help at all for people who are storing to
S3 or whatever. They could use 'aws s3' as a target for --pipe-output,
but if it turns out that said tool is insufficiently robust in terms
of overwriting files or doing fsyncs or whatever, then they might have
problems. Now, Stephen or anyone else could choose to provide
alternative tools with more robust behavior, and that would be great.
But even if he didn't, people could take their chances with what's
already out there. To me, that's a good thing. Yeah, maybe they'll do
dumb things that don't work, but realistically, they can do dumb stuff
without the proposed option too.
Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3? I don't know how we would do
that and allow users to change things like file paths or URLs.
Right. I think it's key that we provide people with tools that are
highly flexible and, ideally, also highly composable.
(Incidentally, pg_basebackup already has an option to output the
entire backup as a tarfile on standard output, and a user can already
pipe that into any tool they like. However, it doesn't work with
tablespaces. So you could think of this proposal as extending the
existing functionality to cover that case.)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
On Thu, Apr 9, 2020 at 04:15:07PM -0400, Stephen Frost wrote:
* Bruce Momjian (bruce@momjian.us) wrote:
I think we need to step back and look at the larger issue. The real
argument goes back to the Unix command-line API vs the VMS/Windows API.
The former has discrete parts that can be stitched together, while the
VMS/Windows API presents a more duplicative but more holistic API for
every piece. We have discussed using shell commands for
archive_command, and even more recently, for the server pass phrase.
When it comes to something like the server pass phrase, it seems much
more reasonable to consider using a shell script (though still perhaps
not ideal) because it's not involved directly in ensuring that the data
is reliably stored and it's pretty clear that if it doesn't work the
worst thing that happens is that the database doesn't start up, but it
won't corrupt any data or destroy it or do other bad things.
Well, the pass phrase relates to security, so it is important too. I
don't think the _importance_ of the action is the most determining
issue. Rather, I think it is how well the action fits the shell script
API.
There isn't a single 'shell script API' though, and it's possible to
craft a 'shell script API' to fit nearly any use-case, but that doesn't
make it a good solution. The amount we depend on the external code for
the correct operation of the system is relevant, and important to
consider.
To get more specific, I think we have to understand how the
_requirements_ of the job match the shell script API, with stdin,
stdout, stderr, return code, and command-line arguments. Looking at
archive_command, the command-line arguments allow specification of file
names, but quoting can be complex. The error return code and stderr
output seem to work fine. There is no clean API for fsync and testing
if the file exists, so that all that has to be hand done in one
command-line. This is why many users use pre-written archive_command
shell scripts.
We aren't considering all of the use-cases really though, in specific,
things like pushing to s3 or gcs require, at least, good retry logic,
and that's without starting to think about things like high-rate systems
(spawning lots of new processes isn't free, particularly if they're
written in shell script but any interpreted language is expensive) and
wanting to parallelize.
Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
This doesn't seem to answer the concerns that I brought up.
Trying to understand it did make me think of another relevant question
that was brought up in this discussion- can we really expect users to
actually implement a C library for this, if we provided a way for them
to? For that, I'd point to FDWs, where we certainly don't have any
shortage of external, written in C, solutions. Another would be logical
decoding.
This brings up a few questions:
* Should we have split apart archive_command into file-exists, copy,
fsync-file? Should we add that now?
No.. The right approach to improving on archive command is to add a way
for an extension to take over that job, maybe with a complete background
worker of its own, or perhaps a shared library that can be loaded by the
archiver process, at least if we're talking about how to allow people to
extend it.
That seems quite vague, which is the issue we had years ago when
considering doing archive_command as a link to a C library.
That prior discussion isn't really relevant though, as it was before we
had extensions, and before we had background workers that can run as part
of an extension.
Potentially a better answer is to just build this stuff into PG- things
like "archive WAL to s3/GCS with these credentials" are what an awful
lot of users want. There's then some who want "archive first to this
other server, and then archive to s3/GCS", or more complex options.
Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3? I don't know how we would do
that and allow users to change things like file paths or URLs.
There's a few different ways we could go about this. The simple answer
would be to use GUCs, which would simplify things like dealing with the
restore side too. Another option would be to have a concept of
'repository' objects in the system, not unlike tablespaces, but they'd
have more options. To deal with that during recovery though, we'd need
a way to get the relevant information from the catalogs (maybe we write
the catalog out to a flat file on update, not unlike what we used to do
with pg_shadow), perhaps even in a format that users could modify if
they needed to. The nice thing about having actual objects in the
system is that it'd be a bit cleaner to be able to define multiple ones
and then have SQL-level functions/commands that work with them.
A good deal of this does involve the question about how to deal with
recovery though, since you might want to, or need to, use different
options when it comes to recovery. Back to the use-case that I was
mentioning, you could certainly want something like "try to get the WAL
from the local archive, and if that doesn't work, try to get it from the
s3 repo". What that implies then is that you'd really like a way to
configure multiple repos, which is where we start to see the fragility
of our GUC system. Pushing that out to something external doesn't
strike me as the right answer though, but rather, we should think about
how to resolve these issues with the GUC system, or come up with
something better. This isn't the only area where the GUC system isn't
really helping us- synchronous standby names is getting to be a pretty
complicated GUC, for example.
Of course, we could start out with just supporting a single repo with
just a few new GUCs to configure it, that wouldn't be hard and there's
good examples out there about what's needed to configure an s3 repo.
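As a sketch of that simple GUC route (every setting name below is invented purely for illustration; nothing like this exists today):

```
# Hypothetical postgresql.conf fragment for a single built-in archive repo
archive_repo_type = 's3'                    # or 'gcs', 'posix', ...
archive_repo_s3_bucket = 'pg-wal-archive'
archive_repo_s3_region = 'us-east-1'
archive_repo_s3_endpoint = ''               # set for s3-compatible stores
archive_repo_credential_file = '/etc/postgresql/repo.creds'
archive_repo_retry_count = 5
```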
Thanks,
Stephen
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian <bruce@momjian.us> wrote:
Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
This is really the key point for me. There are so many existing tools
that store a file someplace that we really can't ever hope to support
them all in core, or even to have well-written extensions that support
them all available on PGXN or wherever. We need to integrate with the
tools that other people have created, not try to reinvent them all in
PostgreSQL.
So, this goes to what I was just mentioning to Bruce independently- you
could have made the same argument about FDWs, but it just doesn't
actually hold any water. Sure, some of the FDWs aren't great, but
there's certainly no shortage of them, and the ones that are
particularly important (like postgres_fdw) are well written and in core.
Now what I understand Stephen to be saying is that a lot of those
tools actually suck, and I think that's a completely valid point. But
I also think that it's unwise to decide that such problems are our
problems rather than problems with those tools. That's a hole with no
bottom.
I don't really think 'bzip2' sucks as a tool, or that bash does. They
weren't designed or intended to meet the expectations that we have for
data durability though, which is why relying on them for exactly that
ends up being a bad recipe.
One thing I do think would be realistic would be to invent a set of
tools that perform certain local filesystem operations in a
"hardened" way. Maybe a single tool with subcommands and options. So
you could say, e.g. 'pgfile cp SOURCE TARGET' and it would create a
temporary file in the target directory, write the contents of the
source into that file, fsync the file, rename it into place, and do
more fsyncs to make sure it's all durable in case of a crash. You
could have a variant of this that instead of using the temporary file
and rename in place approach, does the thing where you open the target
file with O_CREAT|O_EXCL, writes the bytes, and then closes and fsyncs
it. And you could have other things too, like 'pgfile mkdir DIR' to
create a directory and fsync it for durability. A toolset like this
would probably help people write better archive commands - it would
certainly be an improvement over what we have now, anyway, and it
could also be used with the feature that I proposed upthread.
This argument leads in a direction to justify anything as being sensible
to implement using shell scripts. If we're open to writing the shell
level tools that would be needed, we could reimplement all of our
indexes that way, or FDWs, or TDE, or just about anything else.
What we would end up with though is that we'd have more complications
changing those interfaces because people will be using those tools, and
maybe those tools don't get updated at the same time as PG does, and
maybe there's critical changes that need to be made in back branches and
we can't really do that with these interfaces.
It is of course not impossible to teach pg_basebackup to do all of
that stuff internally, but I have a really difficult time imagining us
ever getting it done. There are just too many possibilities, and new
ones arise all the time.
I agree that it's certainly a fair bit of work, but it can be
accomplished incrementally and, with a good design, allow for adding in
new options in the future with relative ease. Now is the time to
discuss what that design looks like, think about how we can implement it
in a way that all of the tools we have are able to work together, and
have them all support and be tested together with these different
options.
The concerns about there being too many possibilities and new ones
coming up all the time could be applied equally to FDWs, but rather than
ending up with a dearth of options and external solutions there, what
we've actually seen is an explosion of options and externally written
libraries for a large variety of options.
A 'pgfile' utility wouldn't help at all for people who are storing to
S3 or whatever. They could use 'aws s3' as a target for --pipe-output,
but if it turns out that said tool is insufficiently robust in terms
of overwriting files or doing fsyncs or whatever, then they might have
problems. Now, Stephen or anyone else could choose to provide
alternative tools with more robust behavior, and that would be great.
But even if he didn't, people could take their chances with what's
already out there. To me, that's a good thing. Yeah, maybe they'll do
dumb things that don't work, but realistically, they can do dumb stuff
without the proposed option too.
How does this solution give them a good way to do the right thing
though? In a way that will work with large databases and complex
requirements? The answer seems to be "well, everyone will have to write
their own tool to do that" and that basically means that, at best, we're
only providing half of a solution and expecting all of our users to
provide the other half, and to always do it correctly and in a well
written way. Acknowledging that most users aren't going to actually do
that and instead they'll implement half measures that aren't reliable
shouldn't be seen as an endorsement of this approach.
Thanks,
Stephen
On Fri, Apr 10, 2020 at 10:54:10AM -0400, Stephen Frost wrote:
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian <bruce@momjian.us> wrote:
Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
This is really the key point for me. There are so many existing tools
that store a file someplace that we really can't ever hope to support
them all in core, or even to have well-written extensions that support
them all available on PGXN or wherever. We need to integrate with the
tools that other people have created, not try to reinvent them all in
PostgreSQL.
So, this goes to what I was just mentioning to Bruce independently- you
could have made the same argument about FDWs, but it just doesn't
actually hold any water. Sure, some of the FDWs aren't great, but
there's certainly no shortage of them, and the ones that are
particularly important (like postgres_fdw) are well written and in core.
No, no one made that argument. It isn't clear how a shell script API
would map to relational database queries. The point is how well the
APIs match, and then if they are close, does it give us the flexibility
we need. You can't just look at flexibility without an API match.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Fri, Apr 10, 2020 at 10:54 AM Stephen Frost <sfrost@snowman.net> wrote:
So, this goes to what I was just mentioning to Bruce independently- you
could have made the same argument about FDWs, but it just doesn't
actually hold any water. Sure, some of the FDWs aren't great, but
there's certainly no shortage of them, and the ones that are
particularly important (like postgres_fdw) are well written and in core.
That's a fairly different use case. In the case of the FDW interface:
- The number of interface method calls is very high, at least one per
tuple and a bunch of extra ones for each query.
- There is a significant amount of complex state that needs to be
maintained across API calls.
- The return values are often tuples, which are themselves an
in-memory data structure.
But here:
- We're only talking about writing a handful of tar files, and that's
in the context of a full-database backup, which is a much
heavier-weight operation than a query.
- There is not really any state that needs to be maintained across calls.
- The expected result is that a file gets written someplace, which is
not an in-memory data structure but something that gets written to a
place outside of PostgreSQL.
The concerns about there being too many possibilities and new ones
coming up all the time could be applied equally to FDWs, but rather than
ending up with a dearth of options and external solutions there, what
we've actually seen is an explosion of options and externally written
libraries for a large variety of options.
Sure, but a lot of those FDWs are relatively low-quality, and it's
often hard to find one that does what you want. And even if you do,
you don't really know how good it is. Unfortunately, in that case
there's no real alternative, because implementing something based on
shell commands couldn't ever have reasonable performance or a halfway
decent feature set. That's not the case here.
How does this solution give them a good way to do the right thing
though? In a way that will work with large databases and complex
requirements? The answer seems to be "well, everyone will have to write
their own tool to do that" and that basically means that, at best, we're
only providing half of a solution and expecting all of our users to
provide the other half, and to always do it correctly and in a well
written way. Acknowledging that most users aren't going to actually do
that and instead they'll implement half measures that aren't reliable
shouldn't be seen as an endorsement of this approach.
I don't acknowledge that. I think it's possible to use tools like the
proposed option in a perfectly reliable way, and I've already given a
bunch of examples of how it could be done. Writing a file is not such
a complex operation that every bit of code that writes one reliably
has to be written by someone associated with the PostgreSQL project. I
strongly suspect that people who use a cloud provider's tools to
upload their backup files will be quite happy with the results, and if
they aren't, I hope they will blame the cloud provider's tool for
eating the data rather than this option for making it easy to give the
data to the thing that ate it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-04-10 12:20:01 -0400, Robert Haas wrote:
- We're only talking about writing a handful of tar files, and that's
in the context of a full-database backup, which is a much
heavier-weight operation than a query.
- There is not really any state that needs to be maintained across calls.
- The expected result is that a file gets written someplace, which is
not an in-memory data structure but something that gets written to a
place outside of PostgreSQL.
Wouldn't there be state like a S3/ssh/https/... connection? And perhaps
a 'backup_id' in the backup metadata DB that one would want to update
at the end?
Greetings,
Andres Freund
On 10/4/20 15:49, Robert Haas wrote:
On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian <bruce@momjian.us> wrote:
Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
[snip]
One thing I do think would be realistic would be to invent a set of
tools that perform certain local filesystem operations in a
"hardened" way.
+10
Maybe a single tool with subcommands and options. So
you could say, e.g. 'pgfile cp SOURCE TARGET' and it would create a
temporary file in the target directory, write the contents of the
source into that file, fsync the file, rename it into place, and do
more fsyncs to make sure it's all durable in case of a crash. You
could have a variant of this that instead of using the temporary file
and rename in place approach, does the thing where you open the target
file with O_CREAT|O_EXCL, writes the bytes, and then closes and fsyncs
it.
Behaviour might be decided in the same way as the default for
'wal_sync_method' gets chosen, as the most appropriate for a particular
system.
And you could have other things too, like 'pgfile mkdir DIR' to
create a directory and fsync it for durability. A toolset like this
would probably help people write better archive commands
Definitely, "mkdir" and "create-exclusive" (along with cp) would be a
great addition and simplify the kind of tasks properly (i.e. with
risking data loss every time)
[excerpted]
pg_basebackup -Ft --pipe-output 'bzip2 | pgfile create-exclusive - %f.bz2'
[....]
pg_basebackup -Ft --pipe-output 'bzip2 | gpg -e | ssh someuser@somehost
pgfile create-exclusive - /backups/tuesday/%f.bz2'
Yep. Would also fit the case for non-synchronous NFS mounts for backups...
It is of course not impossible to teach pg_basebackup to do all of
that stuff internally, but I have a really difficult time imagining us
ever getting it done. There are just too many possibilities, and new
ones arise all the time.
Indeed. The beauty of Unix-like OSs is precisely this.
A 'pgfile' utility wouldn't help at all for people who are storing to
S3 or whatever. They could use 'aws s3' as a target for --pipe-output,
[snip]
(Incidentally, pg_basebackup already has an option to output the
entire backup as a tarfile on standard output, and a user can already
pipe that into any tool they like. However, it doesn't work with
tablespaces. So you could think of this proposal as extending the
existing functionality to cover that case.)
Been there already :S I'd love to have pg_basebackup output multiple
tarballs (one per tablespace), ideally separated via some delimiter so
that splitting can be trivially done on the receiving end.
...but that's probably matter for another thread.
Thanks,
/ J.L.
On Fri, Apr 10, 2020 at 3:38 PM Andres Freund <andres@anarazel.de> wrote:
Wouldn't there be state like a S3/ssh/https/... connection? And perhaps
a 'backup_id' in the backup metadata DB that one would want to update
at the end?
Good question. I don't know that there would be but, uh, maybe? It's
not obvious to me why all of that would need to be done using the same
connection, but if it is, the idea I proposed isn't going to work very
nicely.
More generally, can you think of any ideas for how to structure an API
here that are easier to use than "write some C code"? Or do you think
we should tell people to write some C code if they want to
compress/encrypt/relocate their backup in some non-standard way?
For the record, I'm not against eventually having more than one way to
do this, maybe a shell-script interface for simpler things and some
kind of API for more complex needs (e.g. NetBackup integration,
perhaps). And I did wonder if there was some other way we could do
this. For instance, we could add an option --tar-everything that
sticks all the things that would have been returned by the backup
inside another level of tar file and sends the result to stdout. Then
you can pipe it into a single command that gets invoked only once for
all the data, rather than once per tablespace. That might be better,
but I'm not sure it's better. It's better if you want to do
complicated things that involve steps that happen before and after and
persistent connections and so on, but it seems worse for simple things
like piping through a non-default compressor.
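To sketch that shape (--tar-everything is only a proposal here, and both helper names are invented): the sender wraps base.tar plus the per-tablespace tars into one outer tar on stdout, and the receiving side splits them back apart:

```shell
# Sketch of the --tar-everything idea: one outer tar stream carrying all
# member tar files, sent through a single pipe, unpacked on the receiver.
tar_everything() {
    backup_dir=$1                  # holds base.tar, 16385.tar, ...
    tar -cf - -C "$backup_dir" .   # single stream on stdout
}
receive_backup() {
    dest=$1
    mkdir -p "$dest" && tar -xf - -C "$dest"   # split into member tars
}
```

So one command, e.g. 'tar_everything backupdir | ssh someuser@somehost receive_backup /backups/tuesday', sees the whole backup, which is what suits tools that want a persistent connection or a single pre/post step.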
Larry Wall somewhat famously commented that a good programming
language should (and I paraphrase) make simple things simple and
complex things possible. My hesitation in going straight to a C API is
that it does not make simple things simple; and I'd like to be really
sure that there is no way of achieving that valuable goal before we
give up on it. However, there is no doubt that a C API is potentially
more powerful.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company