Adding pipe support to pg_dump and pg_restore
Hello,
I recently wanted a way to encrypt/decrypt backups while still
utilizing the parallel dump/restore functionality. I couldn't see a
way to do this so I experimented a bit with the directory backup
format. If there's in fact already a way to do this, please tell me
now :-)
The idea is to add a --pipe option to pg_dump / pg_restore where you
can specify a custom shell command that is used to write / read each
.dat-file. Usage examples include encryption with pgp and/or custom
compression pipelines. %p in the command is expanded to the path to
write to / read from. The pipe command is not applied to the toc.
The current version is attached. Could something like this be
acceptable for inclusion?
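To make the intended workflow concrete, a hypothetical invocation could look like the following (this assumes the patch is applied; the gpg recipient key is made up). On dump, the command receives each file's contents on stdin and must write to %p; on restore, it must read %p and write the contents to stdout:

```shell
# Parallel directory-format dump, each .dat file encrypted with a public key
# (hypothetical: requires this patch; the recipient is only an example)
pg_dump --format=directory --jobs=4 \
        --pipe 'gpg --encrypt --recipient backup@example.com --output %p' \
        --file=/backups/mydb mydb

# Parallel restore; the command must decrypt %p to stdout
pg_restore --format=directory --jobs=4 \
           --pipe 'gpg --decrypt --quiet %p' \
           --dbname=mydb /backups/mydb
```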
Attachments:
0001-Add-support-for-pipe-to-pg_dump-and-pg_restore.patch (text/x-patch)
From 27f6c541be6546edfef62646f514fe1a92042705 Mon Sep 17 00:00:00 2001
From: David Hedberg <david.hedberg@gmail.com>
Date: Sat, 29 Sep 2018 12:55:52 +0200
Subject: [PATCH] Add support for --pipe to pg_dump and pg_restore
---
src/bin/pg_dump/compress_io.c | 97 ++++++++++++++++++++++++---
src/bin/pg_dump/compress_io.h | 6 +-
src/bin/pg_dump/pg_backup.h | 2 +
src/bin/pg_dump/pg_backup_directory.c | 14 ++--
src/bin/pg_dump/pg_dump.c | 17 ++++-
src/bin/pg_dump/pg_restore.c | 7 ++
6 files changed, 121 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..64c06d7eae 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -443,6 +443,9 @@ struct cfp
static int hasSuffix(const char *filename, const char *suffix);
#endif
+static void
+expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath);
+
/* free() without changing errno; useful in several places below */
static void
free_keep_errno(void *p)
@@ -464,24 +467,26 @@ free_keep_errno(void *p)
* On failure, return NULL with an error code in errno.
*/
cfp *
-cfopen_read(const char *path, const char *mode)
+cfopen_read(const char *path, const char *mode, const char *pipecmd)
{
cfp *fp;
+ if (pipecmd)
+ fp = cfopen(path, mode, 0, pipecmd);
#ifdef HAVE_LIBZ
- if (hasSuffix(path, ".gz"))
- fp = cfopen(path, mode, 1);
+ else if (hasSuffix(path, ".gz"))
+ fp = cfopen(path, mode, 1, NULL);
else
#endif
{
- fp = cfopen(path, mode, 0);
+ fp = cfopen(path, mode, 0, NULL);
#ifdef HAVE_LIBZ
if (fp == NULL)
{
char *fname;
fname = psprintf("%s.gz", path);
- fp = cfopen(fname, mode, 1);
+ fp = cfopen(fname, mode, 1, NULL);
free_keep_errno(fname);
}
#endif
@@ -501,19 +506,19 @@ cfopen_read(const char *path, const char *mode)
* On failure, return NULL with an error code in errno.
*/
cfp *
-cfopen_write(const char *path, const char *mode, int compression)
+cfopen_write(const char *path, const char *mode, int compression, const char *pipecmd)
{
cfp *fp;
if (compression == 0)
- fp = cfopen(path, mode, 0);
+ fp = cfopen(path, mode, 0, pipecmd);
else
{
#ifdef HAVE_LIBZ
char *fname;
fname = psprintf("%s.gz", path);
- fp = cfopen(fname, mode, compression);
+ fp = cfopen(fname, mode, compression, pipecmd);
free_keep_errno(fname);
#else
exit_horribly(modulename, "not built with zlib support\n");
@@ -530,11 +535,32 @@ cfopen_write(const char *path, const char *mode, int compression)
* On failure, return NULL with an error code in errno.
*/
cfp *
-cfopen(const char *path, const char *mode, int compression)
+cfopen(const char *path, const char *mode, int compression, const char *pipecmd)
{
cfp *fp = pg_malloc(sizeof(cfp));
- if (compression != 0)
+ if (pipecmd)
+ {
+ char cmd[MAXPGPATH];
+ char pmode[2];
+
+ if (mode[0] != 'r' && mode[0] != 'w')
+ exit_horribly(modulename,
+ "pipe does not support mode \"%s\"\n", mode);
+ pmode[0] = mode[0];
+ pmode[1] = '\0';
+
+ expand_shell_command(cmd, MAXPGPATH, pipecmd, path);
+
+ fp->compressedfp = NULL;
+ fp->uncompressedfp = popen(cmd, pmode);
+ if (fp->uncompressedfp == NULL)
+ {
+ free_keep_errno(fp);
+ fp = NULL;
+ }
+ }
+ else if (compression != 0)
{
#ifdef HAVE_LIBZ
if (compression != Z_DEFAULT_COMPRESSION)
@@ -731,5 +757,54 @@ hasSuffix(const char *filename, const char *suffix)
suffix,
suffixlen) == 0;
}
-
#endif
+
+/*
+ * Expand a shell command
+ *
+ * Replaces %p in cmd with the path in filepath and writes the result to buf.
+ */
+static void
+expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath)
+{
+ char *dp;
+ char *endp;
+ const char *sp;
+
+ dp = buf;
+ endp = buf + bufsize - 1;
+ *endp = '\0';
+
+ for (sp = cmd; *sp; sp++)
+ {
+ if (*sp == '%')
+ {
+ switch (sp[1])
+ {
+ case 'p':
+ /* %p: absolute path of file */
+ sp++;
+ strlcpy(dp, filepath, endp - dp);
+ dp += strlen(dp);
+ break;
+ case '%':
+ /* convert %% to a single % */
+ sp++;
+ if (dp < endp)
+ *dp++ = *sp;
+ break;
+ default:
+ /* otherwise treat the % as not special */
+ if (dp < endp)
+ *dp++ = *sp;
+ break;
+ }
+ }
+ else
+ {
+ if (dp < endp)
+ *dp++ = *sp;
+ }
+ }
+ *dp = '\0';
+}
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index 10fde8bdef..8a09086f96 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -56,9 +56,9 @@ extern void EndCompressor(ArchiveHandle *AH, CompressorState *cs);
typedef struct cfp cfp;
-extern cfp *cfopen(const char *path, const char *mode, int compression);
-extern cfp *cfopen_read(const char *path, const char *mode);
-extern cfp *cfopen_write(const char *path, const char *mode, int compression);
+extern cfp *cfopen(const char *path, const char *mode, int compression, const char *pipecmd);
+extern cfp *cfopen_read(const char *path, const char *mode, const char *pipecmd);
+extern cfp *cfopen_write(const char *path, const char *mode, int compression, const char *pipecmd);
extern int cfread(void *ptr, int size, cfp *fp);
extern int cfwrite(const void *ptr, int size, cfp *fp);
extern int cfgetc(cfp *fp);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index ba798213be..9aba93f923 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -87,6 +87,7 @@ typedef struct _restoreOptions
int verbose;
int aclsSkip;
const char *lockWaitTimeout;
+ const char *pipeCommand;
int include_everything;
int tocSummary;
@@ -141,6 +142,7 @@ typedef struct _dumpOptions
int dumpSections; /* bitmask of chosen sections */
bool aclsSkip;
const char *lockWaitTimeout;
+ const char *pipeCommand;
/* flags for various command-line long options */
int disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index cda90b9a2a..c9a3f5db52 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -205,7 +205,7 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
setFilePath(AH, fname, "toc.dat");
- tocFH = cfopen_read(fname, PG_BINARY_R);
+ tocFH = cfopen_read(fname, PG_BINARY_R, NULL);
if (tocFH == NULL)
exit_horribly(modulename,
"could not open input file \"%s\": %s\n",
@@ -333,7 +333,7 @@ _StartData(ArchiveHandle *AH, TocEntry *te)
setFilePath(AH, fname, tctx->filename);
- ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression);
+ ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression, AH->public.dopt->pipeCommand);
if (ctx->dataFH == NULL)
exit_horribly(modulename, "could not open output file \"%s\": %s\n",
fname, strerror(errno));
@@ -392,7 +392,7 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
if (!filename)
return;
- cfp = cfopen_read(filename, PG_BINARY_R);
+ cfp = cfopen_read(filename, PG_BINARY_R, AH->public.ropt->pipeCommand);
if (!cfp)
exit_horribly(modulename, "could not open input file \"%s\": %s\n",
@@ -446,7 +446,7 @@ _LoadBlobs(ArchiveHandle *AH)
setFilePath(AH, fname, "blobs.toc");
- ctx->blobsTocFH = cfopen_read(fname, PG_BINARY_R);
+ ctx->blobsTocFH = cfopen_read(fname, PG_BINARY_R, NULL);
if (ctx->blobsTocFH == NULL)
exit_horribly(modulename, "could not open large object TOC file \"%s\" for input: %s\n",
@@ -579,7 +579,7 @@ _CloseArchive(ArchiveHandle *AH)
ctx->pstate = ParallelBackupStart(AH);
/* The TOC is always created uncompressed */
- tocFH = cfopen_write(fname, PG_BINARY_W, 0);
+ tocFH = cfopen_write(fname, PG_BINARY_W, 0, NULL);
if (tocFH == NULL)
exit_horribly(modulename, "could not open output file \"%s\": %s\n",
fname, strerror(errno));
@@ -644,7 +644,7 @@ _StartBlobs(ArchiveHandle *AH, TocEntry *te)
setFilePath(AH, fname, "blobs.toc");
/* The blob TOC file is never compressed */
- ctx->blobsTocFH = cfopen_write(fname, "ab", 0);
+ ctx->blobsTocFH = cfopen_write(fname, "ab", 0, NULL);
if (ctx->blobsTocFH == NULL)
exit_horribly(modulename, "could not open output file \"%s\": %s\n",
fname, strerror(errno));
@@ -663,7 +663,7 @@ _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
- ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression);
+ ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression, AH->public.dopt->pipeCommand);
if (ctx->dataFH == NULL)
exit_horribly(modulename, "could not open output file \"%s\": %s\n",
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c8d01ed4a4..5fe9619055 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -378,6 +378,7 @@ main(int argc, char **argv)
{"no-subscriptions", no_argument, &dopt.no_subscriptions, 1},
{"no-sync", no_argument, NULL, 7},
{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
+ {"pipe", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
@@ -562,6 +563,10 @@ main(int argc, char **argv)
dosync = false;
break;
+ case 8: /* pipe */
+ dopt.pipeCommand = pg_strdup(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit_nicely(1);
@@ -629,11 +634,15 @@ main(int argc, char **argv)
if (archiveFormat == archNull)
plainText = 1;
+ if (dopt.pipeCommand && compressLevel != -1)
+ exit_horribly(NULL,
+ "options --pipe and --compress cannot be used together\n");
+
/* Custom and directory formats are compressed by default, others not */
if (compressLevel == -1)
{
#ifdef HAVE_LIBZ
- if (archiveFormat == archCustom || archiveFormat == archDirectory)
+ if (!dopt.pipeCommand && (archiveFormat == archCustom || archiveFormat == archDirectory))
compressLevel = Z_DEFAULT_COMPRESSION;
else
#endif
@@ -670,6 +679,10 @@ main(int argc, char **argv)
if (archiveFormat != archDirectory && numWorkers > 1)
exit_horribly(NULL, "parallel backup only supported by the directory format\n");
+ /* Pipe only in the directory archive format so far */
+ if (archiveFormat != archDirectory && dopt.pipeCommand)
+ exit_horribly(NULL, "pipe only supported by the directory format\n");
+
/* Open the output file */
fout = CreateArchive(filename, archiveFormat, compressLevel, dosync,
archiveMode, setupDumpWorker);
@@ -998,6 +1011,8 @@ help(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --pipe=COMMAND create files by piping data through the given command\n"
+ " (only usable with the directory format)\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=DBNAME database to dump\n"));
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 501d7cea72..2f0e7b03d8 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -124,6 +124,7 @@ main(int argc, char **argv)
{"no-publications", no_argument, &no_publications, 1},
{"no-security-labels", no_argument, &no_security_labels, 1},
{"no-subscriptions", no_argument, &no_subscriptions, 1},
+ {"pipe", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
@@ -281,6 +282,10 @@ main(int argc, char **argv)
set_dump_section(optarg, &(opts->dumpSections));
break;
+ case 4: /* pipe */
+ opts->pipeCommand = pg_strdup(optarg);
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit_nicely(1);
@@ -496,6 +501,8 @@ usage(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
+ printf(_(" --pipe=COMMAND read files from the output of the given command\n"
+ " (only usable with the directory format)\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
--
2.17.1
Greetings,
* David Hedberg (david.hedberg@gmail.com) wrote:
I recently wanted a way to encrypt/decrypt backups while still
utilizing the parallel dump/restore functionality. I couldn't see a
way to do this so I experimented a bit with the directory backup
format. If there's in fact already a way to do this, please tell me
now :-)
Supporting encryption/decryption is certainly a good idea but I'm not
sure that we want to punt like this and expect the user to provide a
shell script or similar to do it. I would have thought we'd build in
encryption leveraging openssl (and, ideally, other providers, similar to
what we're working to do with SSL) directly.
The idea is to add a --pipe option to pg_dump / pg_restore where you
can specify a custom shell command that is used to write / read each
.dat-file. Usage examples include encryption with pgp and/or custom
compression pipelines. %p in the command is expanded to the path to
write to / read from. The pipe command is not applied to the toc.
I would certainly think that we'd want to have support for custom format
dumps too..
The current version is attached. Could something like this be
acceptable for inclusion?
At least for my 2c, I'm not completely against it, but I'd much rather
see us providing encryption directly and for all of the formats we
support, doing intelligent things like encrypting the TOC for a custom
format dump independently so we can still support fast restore of
individual objects and such. I'm also not entirely sure about how well
this proposed approach would work on Windows..
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* David Hedberg (david.hedberg@gmail.com) wrote:
The idea is to add a --pipe option to pg_dump / pg_restore where you
can specify a custom shell command that is used to write / read each
.dat-file. Usage examples include encryption with pgp and/or custom
compression pipelines. %p in the command is expanded to the path to
write to / read from. The pipe command is not applied to the toc.
I would certainly think that we'd want to have support for custom format
dumps too..
This seems like rather a kluge :-(. In the context of encrypted dumps
in particular, I see no really safe way to pass an encryption key down
to the custom command --- either you put it in the command line to be
exec'd, or you put it in the process environment, and neither of those
are secure on all platforms.
The assumption that the TOC doesn't need encryption seems pretty
shaky as well.
So I think we'd be better off proceeding as Stephen envisions.
Maybe there are use-cases for the sort of thing David is proposing,
but I don't think encrypted dumps present a good argument for it.
regards, tom lane
On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
Stephen Frost <sfrost@snowman.net> writes:
* David Hedberg (david.hedberg@gmail.com) wrote:
The idea is to add a --pipe option to pg_dump / pg_restore where
you can specify a custom shell command that is used to write /
read each .dat-file. Usage examples include encryption with pgp
and/or custom compression pipelines. %p in the command is
expanded to the path to write to / read from. The pipe command is
not applied to the toc.
I would certainly think that we'd want to have support for custom
format dumps too..
This seems like rather a kluge :-(. In the context of encrypted
dumps in particular, I see no really safe way to pass an encryption
key down to the custom command --- either you put it in the command
line to be exec'd, or you put it in the process environment, and
neither of those are secure on all platforms.
As I understand it, those are the options for providing secrets in
general. At least in the case of encryption, one good solution would
be to use an asymmetric encryption scheme, i.e. one where encrypting
doesn't expose a secret in any way.
As to decryption, that's generally done with more caution in
environments where things are being routinely encrypted in the first
place.
The assumption that the TOC doesn't need encryption seems pretty
shaky as well.
That it does.
So I think we'd be better off proceeding as Stephen envisions.
Maybe there are use-cases for the sort of thing David is proposing,
but I don't think encrypted dumps present a good argument for it.
Dumping over a network seems like a reasonable use case for this. I
know that we have remote ways to do this, but in some
environments--think FedRAMP, or similar compliance regime--setting up
a remote access to do the dump can cause extra headaches. Being able
to encrypt them in the process would be helpful in situations I've
seen in the past week.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sat, Sep 29, 2018 at 5:56 PM, David Fetter <david@fetter.org> wrote:
On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
Stephen Frost <sfrost@snowman.net> writes:
* David Hedberg (david.hedberg@gmail.com) wrote:
The idea is to add a --pipe option to pg_dump / pg_restore where
you can specify a custom shell command that is used to write /
read each .dat-file. Usage examples include encryption with pgp
and/or custom compression pipelines. %p in the command is
expanded to the path to write to / read from. The pipe command is
not applied to the toc.
I would certainly think that we'd want to have support for custom
format dumps too..
This seems like rather a kluge :-(. In the context of encrypted
dumps in particular, I see no really safe way to pass an encryption
key down to the custom command --- either you put it in the command
line to be exec'd, or you put it in the process environment, and
neither of those are secure on all platforms.
As I understand it, those are the options for providing secrets in
general. At least in the case of encryption, one good solution would
be to use an asymmetric encryption scheme, i.e. one where encrypting
doesn't expose a secret in any way.
As to decryption, that's generally done with more caution in
environments where things are being routinely encrypted in the first
place.
Yes; in my specific case the idea is to use public key encryption with
gpg. In that scenario the secret does not need to be on the server at
all.
The assumption that the TOC doesn't need encryption seems pretty
shaky as well.
That it does.
I don't think there's any inherent reason it can't be applied to the
TOC as well. It's mostly an accident of me following the existing
compression code.
On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
At least for my 2c, I'm not completely against it, but I'd much rather
see us providing encryption directly and for all of the formats we
support, doing intelligent things like encrypting the TOC for a custom
format dump independently so we can still support fast restore of
individual objects and such. I'm also not entirely sure about how well
this proposed approach would work on Windows..
I haven't tested it on Windows, but I did see that there's already a
popen function in src/port/system.c, so my guess was that
it can work..
Generally, my thinking is that this can be pretty useful in general
besides encryption. For other formats the dumps can already be written
to standard output and piped through for example gpg or a custom
compression application of the administrator's choice, so in a sense
this functionality would merely add the same feature to the directory
format.
My main wish here is to be able to combine a parallel dump/restore with
encryption without having to first write the dump unencrypted and then
loop over and re-encrypt the files in an extra step. This can
surely be quite a large win as the dumps grow larger..
/ David
Greetings,
* David Hedberg (david.hedberg@gmail.com) wrote:
On Sat, Sep 29, 2018 at 5:56 PM, David Fetter <david@fetter.org> wrote:
On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
As I understand it, those are the options for providing secrets in
general. At least in the case of encryption, one good solution would
be to use an asymmetric encryption scheme, i.e. one where encrypting
doesn't expose a secret in any way.
As to decryption, that's generally done with more caution in
environments where things are being routinely encrypted in the first
place.
Yes; in my specific case the idea is to use public key encryption with
gpg. In that scenario the secret does not need to be on the server at
all.
Using public key encryption doesn't mean you get to entirely avoid the
question around how to handle secrets- you'll presumably want to
actually restore the dump at some point.
On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
At least for my 2c, I'm not completely against it, but I'd much rather
see us providing encryption directly and for all of the formats we
support, doing intelligent things like encrypting the TOC for a custom
format dump independently so we can still support fast restore of
individual objects and such. I'm also not entirely sure about how well
this proposed approach would work on Windows..
I haven't tested it on Windows, but I did see that there's already a
popen function in src/port/system.c, so my guess was that
it can work..
Perhaps, though these things tend to be trickier on Windows, at least
from what I've seen (I'm no Windows dev myself tho, to be clear).
Generally, my thinking is that this can be pretty useful in general
besides encryption. For other formats the dumps can already be written
to standard output and piped through for example gpg or a custom
compression application of the administrator's choice, so in a sense
this functionality would merely add the same feature to the directory
format.
That's certainly not the same though. One of the great advantages of
custom and directory format dumps is the TOC and the ability to
selectively extract data from them without having to read the entire
dump file. You end up losing that if you have to pass the entire dump
through something else because you're using the pipe.
My main wish here is to be able to combine a parallel dump/restore with
encryption without having to first write the dump unencrypted and then
loop over and re-encrypt the files in an extra step. This can
surely be quite a large win as the dumps grow larger..
That's great, and I think we agree that it'd be a very nice feature for
pg_dump/restore to support encryption, but done intelligently, across
the formats that pg_dump supports, with a secure way to pass the
secrets.
Thanks!
Stephen
Hi,
On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* David Hedberg (david.hedberg@gmail.com) wrote:
On Sat, Sep 29, 2018 at 5:56 PM, David Fetter <david@fetter.org> wrote:
On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
As I understand it, those are the options for providing secrets in
general. At least in the case of encryption, one good solution would
be to use an asymmetric encryption scheme, i.e. one where encrypting
doesn't expose a secret in any way.
As to decryption, that's generally done with more caution in
environments where things are being routinely encrypted in the first
place.
Yes; in my specific case the idea is to use public key encryption with
gpg. In that scenario the secret does not need to be on the server at
all.
Using public key encryption doesn't mean you get to entirely avoid the
question around how to handle secrets- you'll presumably want to
actually restore the dump at some point.
You are right of course. But I don't see how it's more difficult to
pass the secret to the piped commands than it is to pass it to
postgres.
You wouldn't want to pass the secrets as options to the commands of
course. In the case of gpg you would probably let gpg store and handle
them, which seems to me about the same as letting postgres store them.
On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
Generally, my thinking is that this can be pretty useful in general
besides encryption. For other formats the dumps can already be written
to standard output and piped through for example gpg or a custom
compression application of the administrator's choice, so in a sense
this functionality would merely add the same feature to the directory
format.
That's certainly not the same though. One of the great advantages of
custom and directory format dumps is the TOC and the ability to
selectively extract data from them without having to read the entire
dump file. You end up losing that if you have to pass the entire dump
through something else because you're using the pipe.
I can maybe see the problem here, but I apologize if I'm missing the point.
Since all the files are individually passed through separate instances
of the pipe, they can also be individually restored. I guess the
--list option could be (adapted to be) used to produce a clear text
TOC to further use in selective decryption of the rest of the archive?
Possibly combined with an option to not apply the pipeline commands to
the TOC during dump and/or restore, if there's any need for that.
I do think that I understand the advantages of having a TOC that
describes the exact format of the dump and how to restore it, and I am
in no way arguing against having encryption included natively in the
format as a default option.
But I think the pipe option, or one like it, could be used to easily
extend the format: supporting a different compression
algorithm, a different encryption method or even a different storage
method like uploading the files directly to a bucket in S3. In this
way I think that it's similar to being able to write the other formats to
stdout; there are probably many different usages of it out there,
including custom compression or encryption.
If this is simply outside the scope of the directory or the custom
format, that is certainly understandable (and, to me, somewhat
regrettable :-) ).
Thank you for the answers,
David
Hi,
On 2018-09-29 14:51:33 +0200, David Hedberg wrote:
I recently wanted a way to encrypt/decrypt backups while still
utilizing the parallel dump/restore functionality. I couldn't see a
way to do this so I experimented a bit with the directory backup
format. If there's in fact already a way to do this, please tell me
now :-)
The idea is to add a --pipe option to pg_dump / pg_restore where you
can specify a custom shell command that is used to write / read each
.dat-file. Usage examples include encryption with pgp and/or custom
compression pipelines. %p in the command is expanded to the path to
write to / read from. The pipe command is not applied to the toc.
The current version is attached. Could something like this be
acceptable for inclusion?
Isn't that a bit unsatisfying because information about the tables and
their sizes leaks?
My suspicion is, and was, that at some point we're probably going to
want a format that supports the features the directory format does,
without requiring writes to multiple files...
- Andres
Greetings,
* David Hedberg (david.hedberg@gmail.com) wrote:
On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
* David Hedberg (david.hedberg@gmail.com) wrote:
On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
Generally, my thinking is that this can be pretty useful in general
besides encryption. For other formats the dumps can already be written
to standard output and piped through for example gpg or a custom
compression application of the administrator's choice, so in a sense
this functionality would merely add the same feature to the directory
format.
That's certainly not the same though. One of the great advantages of
custom and directory format dumps is the TOC and the ability to
selectively extract data from them without having to read the entire
dump file. You end up losing that if you have to pass the entire dump
through something else because you're using the pipe.
I can maybe see the problem here, but I apologize if I'm missing the point.
Since all the files are individually passed through separate instances
of the pipe, they can also be individually restored. I guess the
--list option could be (adapted to be) used to produce a clear text
TOC to further use in selective decryption of the rest of the archive?
This can work for directory format, but it wouldn't work for custom
format. For a custom format dump, we'd need a way to encrypt the TOC
independently of the rest, and we might even want to have the TOC
include individual keys for the different objects or similar.
Possibly combined with an option to not apply the pipeline commands to
the TOC during dump and/or restore, if there's any need for that.
That certainly doesn't seem to make things simpler or to be a very good
interface.
But I think the pipe option, or one like it, could be used to easily
extend the format: supporting a different compression
algorithm, a different encryption method or even a different storage
method like uploading the files directly to a bucket in S3. In this
way I think that it's similar to being able to write the other formats to
stdout; there are probably many different usages of it out there,
including custom compression or encryption.
Considering the difficulty in doing selective restores (one of the
primary reasons for doing a logical dump at all, imv) from a dump file
that has to be completely decrypted or decompressed (due to using a
custom compression method), I don't know that I really buy off on this
argument that it's very commonly done or that it's a particularly good
interface to use.
If this is simply outside the scope of the directory or the custom
format, that is certainly understandable (and, to me, somewhat
regrettable :-) ).
What I think isn't getting through is that while this is an interesting
approach, it really isn't a terribly good one, regardless of how
flexible you view it to be. The way to move this forward seems pretty
clearly to work on adding generalized encryption support to
pg_dump/restore that doesn't depend on calling external programs
underneath of the directory format with a pipe.
Thanks!
Stephen
Hi,
On Sat, Sep 29, 2018 at 8:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* David Hedberg (david.hedberg@gmail.com) wrote:
On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
* David Hedberg (david.hedberg@gmail.com) wrote:
On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
Generally, my thinking is that this can be pretty useful in general
besides encryption. For other formats the dumps can already be written
to standard output and piped through for example gpg or a custom
compression application of the administrator's choice, so in a sense
this functionality would merely add the same feature to the directory
format.
That's certainly not the same though. One of the great advantages of
custom and directory format dumps is the TOC and the ability to
selectively extract data from them without having to read the entire
dump file. You end up losing that if you have to pass the entire dump
through something else because you're using the pipe.
I can maybe see the problem here, but I apologize if I'm missing the point.
Since all the files are individually passed through separate instances
of the pipe, they can also be individually restored. I guess the
--list option could be (adapted to be) used to produce a clear text
TOC to further use in selective decryption of the rest of the archive?
I admit that my understanding of the custom format was naive (I have
never actually used it).
If this is simply outside the scope of the directory or the custom
format, that is certainly understandable (and, to me, somewhat
regrettable :-) ).
What I think isn't getting through is that while this is an interesting
approach, it really isn't a terribly good one, regardless of how
flexible you view it to be. The way to move this forward seems pretty
clearly to work on adding generalized encryption support to
pg_dump/restore that doesn't depend on calling external programs
underneath of the directory format with a pipe.
I did get the message that it wasn't the optimal way of doing it, and
I have now also gotten the message that it's probably not really
wanted at all.
Thank you for your insights,
David