How to fork pg_dump or psql w/o leaking secrets?

Started by Dominique Devienneover 2 years ago14 messagesgeneral
Jump to latest
#1Dominique Devienne
ddevienne@gmail.com

Hi. To administer our PostgreSQL-based system, we have custom tools
to manage the schemas, load data, etc... Including a versatile CLI tool.

But that tool is special purpose, while sometimes we want/need the general
purpose PSQL. But because our ROLE and SCHEMA names are rather long
and unwieldly, it's cumbersome to SET ROLE and SET SEARCH_PATH manually
to setup the session for easy access to the data. Thus I'd like to fork
PSQL from
our tool, which requires reconnecting to the server (thus possibly
"secrets"),
and I also need to run some PSQL \commands and SQL to config PSQL correctly
for the context our tool was run with (i.e. our tool's own CLI options).

So how would I:
1) "pass secrets" somehow, so they don't leak to the PS command for
example. stdin?
2) configure the forked PSQL in terms of SET ROLE, SET SEARCH_PATH, and
possible other \commands?

In a very similar vein, I do my own backups using COPY, similar to
[pg_dumpbinary][1]https://github.com/lzlabs/pg_dumpbinary,
for the same reasons, but using a different "container" (not several
files), for several reasons,
but I'd like to reuse `pg_dump --pre/post-data` by also forking it, with
the save issue #1 above,
to take care of the DDLs for me, while I handle the DMLs myself.

Does anybody have experience doing something similar?
Any public example anywhere?
Otherwise, any advice on how best to do it?

Thanks, --DD

[1]: https://github.com/lzlabs/pg_dumpbinary

#2Luca Ferrari
fluca1978@gmail.com
In reply to: Dominique Devienne (#1)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Thu, Sep 21, 2023 at 7:46 PM Dominique Devienne <ddevienne@gmail.com> wrote:

Hi. To administer our PostgreSQL-based system, we have custom tools
to manage the schemas, load data, etc... Including a versatile CLI tool.

But that tool is special purpose, while sometimes we want/need the general
purpose PSQL. But because our ROLE and SCHEMA names are rather long
and unwieldly, it's cumbersome to SET ROLE and SET SEARCH_PATH manually
to setup the session for easy access to the data. Thus I'd like to fork PSQL from
our tool, which requires reconnecting to the server (thus possibly "secrets"),
and I also need to run some PSQL \commands and SQL to config PSQL correctly
for the context our tool was run with (i.e. our tool's own CLI options).

Isn't .psqlrc a possibile approach?

% cat .psqlrc
\echo "Loading configuration"
set search_path to 'luca';

So how would I:
1) "pass secrets" somehow, so they don't leak to the PS command for example. stdin?

I don't get what you mean by secrets, but I guess you have to play
with a wrapper and pass or tools like that.

2) configure the forked PSQL in terms of SET ROLE, SET SEARCH_PATH, and possible other \commands?

I would use .psqlrc for every user.

In a very similar vein, I do my own backups using COPY, similar to [pg_dumpbinary][1],
for the same reasons, but using a different "container" (not several files), for several reasons,
but I'd like to reuse `pg_dump --pre/post-data` by also forking it, with the save issue #1 above,
to take care of the DDLs for me, while I handle the DMLs myself.

Again, I don't understand the aim, seems you would obfuscate what you
are doing to others for, ehm, commercial reasons?
However, pg_dump can dump the only schema (--schema-only and friends),
as well as data only. I'm not sure pg_dumpbinary can (is it a
purpose?).
I would not mix and macth the two tools however.

Luca

#3Dominique Devienne
ddevienne@gmail.com
In reply to: Luca Ferrari (#2)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, Sep 22, 2023 at 10:56 AM Luca Ferrari <fluca1978@gmail.com> wrote:

On Thu, Sep 21, 2023 at 7:46 PM Dominique Devienne <ddevienne@gmail.com>
wrote:

and I also need to run some PSQL \commands and SQL to config PSQL

correctly

for the context our tool was run with (i.e. our tool's own CLI options).

Isn't .psqlrc a possibile approach?

% cat .psqlrc
\echo "Loading configuration"
set search_path to 'luca';

I see. I don't want to mess with user's .pqsqlrc, but from
https://www.postgresql.org/docs/current/app-psql.html
I can use PSQLRC to override the default location.

Or I can use `-f filename` or `-f -` to provide them independently of the
.psqlrc,
with or without `--no-psqlrc`.

So that takes care of configuring PSQL, thanks.

So how would I:

1) "pass secrets" somehow, so they don't leak to the PS command for

example. stdin?

I don't get what you mean by secrets, but I guess you have to play
with a wrapper and pass or tools like that.

Well, simply the password for example. The user connects to PostgreSQL
using our tool,
possibly prompting for the password, if PQconnectionNeedsPassword() returns
true.
We translate project names and "persona" into actual ROLE and SCHEMA names,
check
permissions, check schema versions, that sort of things. Then "fork" PSQL,
which must
reconnect to the same DB for the same USER, but ideally w/o having PSQL
prompting
for the password again. It should be seamless. Especially since we have
alternate ways
to get the password from, beside prompting and PGSERVICE and PGPASSWORD.

So my question is how I pass the password my tool already own, to the
forked PSQL,
w/o that password leaking. I could pass it on the command-line, but that
would be leaking
it to the `ps` command (and in various other places).

I guess I could create a temp file with the connection info, and set a
PGSERVICE env-var
for PSQL, but then the password in in-clear in a file. Again, that's
leaking the password, in a way.

Another alternative would be to read what PSQL outputs, and look for it
asking for a password,
and provide it, but that seems brittle, requires `pipe` (not
cross-platform).

That's why I'm asking the community how best to the forked PSQL can connect
w/o password prompting.

In a very similar vein, I do my own backups using COPY, similar to

[pg_dumpbinary][1],

for the same reasons, but using a different "container" (not several

files), for several reasons,

but I'd like to reuse `pg_dump --pre/post-data` by also forking it, with

the save issue #1 above,

to take care of the DDLs for me, while I handle the DMLs myself.

Again, I don't understand the aim, seems you would obfuscate what you
are doing to others for, ehm, commercial reasons?

Not really no. It's actually to make the backup more introspectable in fact.
But that's beside the point, no?

However, pg_dump can dump the only schema (--schema-only and friends),
as well as data only. I'm not sure pg_dumpbinary can (is it a
purpose?). I would not mix and macth the two tools however.

pg_dumpbinary is a Perl script that uses pg_dump and pg_restore.

#4Luca Ferrari
fluca1978@gmail.com
In reply to: Dominique Devienne (#3)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, Sep 22, 2023 at 12:13 PM Dominique Devienne <ddevienne@gmail.com> wrote:

So my question is how I pass the password my tool already own, to the forked PSQL,
w/o that password leaking. I could pass it on the command-line, but that would be leaking
it to the `ps` command (and in various other places).

I think that forking a beast like psql will make you incurring into a
lot of security problems that are worst your "password leak".
One solution I could see, could be something like "a poor man captive
psql": write a per-user .pgpass file with the password for the
database (from your tool), write a .psqlrc configuration for the user
and let him to connect to the database. Once done, remove both files
and/or restore previous ones.

Besides, as far as I understand, you are approaching the problem with
a possible wrong design: schemas in PostgreSQL do not provide the
amount of isolation it seems you are trying to achieve with your
persona-to-role-to-schema wrapper.

That's why I'm asking the community how best to the forked PSQL can connect w/o password prompting.

psql and lipq can exploit .pgpass for exactly that aim: not messing
around with passwords.
Again, I would discourage you to fork psql. Would you be able to
maintain the new upcoming versions in the future?

However, pg_dump can dump the only schema (--schema-only and friends),
as well as data only. I'm not sure pg_dumpbinary can (is it a
purpose?). I would not mix and macth the two tools however.

pg_dumpbinary is a Perl script that uses pg_dump and pg_restore.

Yes, and in fact it does the pre-data and post-data sections, e.g.,
<https://github.com/lzlabs/pg_dumpbinary/blob/master/pg_dumpbinary#L305&gt;.
However, I'm not sure you will be ble to mix and match a pg_dump
manually obtined schema (in plain text) with pg_restorebinary.
And it is not clear to me, still, the aim of this approach.

Luca

#5Dominique Devienne
ddevienne@gmail.com
In reply to: Luca Ferrari (#4)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, Sep 22, 2023 at 12:45 PM Luca Ferrari <fluca1978@gmail.com> wrote:

On Fri, Sep 22, 2023 at 12:13 PM Dominique Devienne <ddevienne@gmail.com>
wrote:

So my question is how I pass the password my tool already own, to the

forked PSQL,

w/o that password leaking. I could pass it on the command-line, but that

would be leaking

it to the `ps` command (and in various other places).

I think that forking a beast like psql will make you incurring into a
lot of security problems that are worst your "password leak".

I'm sorry, but this doesn't make sense to me.
I'm talking of replacing using my tool then psql, with using my tool that
forks psql.
In both cases these are clients apps that connect to a particular DB, for a
given user.
The point if about the best way to not expose the password, if a password
is necessary.

One solution I could see, could be something like "a poor man captive
psql": write a per-user .pgpass file with the password for the
database (from your tool), write a .psqlrc configuration for the user
and let him to connect to the database. Once done, remove both files
and/or restore previous ones.

I already told you my app is LIBPQ aware, including about PGPASSWORD and
PGSERVICE.
It's users who decide to use these mechanisms (which have plain-text
passwords BTW...), not my tool.
The same way PSQL prompts for a password when not using these mechanism, so
does my tool.

Besides, as far as I understand, you are approaching the problem with

a possible wrong design: schemas in PostgreSQL do not provide the
amount of isolation it seems you are trying to achieve with your
persona-to-role-to-schema wrapper.

I don't know where you are inferring that...
And it seems beside the point anyway, if it was true (and it isn't).

That's why I'm asking the community how best to the forked PSQL can

connect w/o password prompting.

psql and lipq can exploit .pgpass for exactly that aim: not messing
around with passwords.

You are confusing things. I use PGSERVICE and PGPASSWORD myself.
But other users don't. The tool must work either way. And besides, I already
mentioned we have other custom (encrypted store) ways to get passwords too.
So in that case, the password must be communicated to PSQL anyways, somehow.
And the somehow is exactly the point of my questions here.

Again, I would discourage you to fork psql. Would you be able to
maintain the new upcoming versions in the future?

PSQL is not willy-nilly changing its CLI options or the way it operates.
Plus I bundled PSQL in my package, which is standalone and cross-platforms.
And again, I really don't see what's the harm is forking PSQL versus
running it directly,
albeit having to provide credentials again, exactly what I want to avoid.

However, pg_dump can dump the only schema (--schema-only and friends),

as well as data only. I'm not sure pg_dumpbinary can (is it a
purpose?). I would not mix and macth the two tools however.

pg_dumpbinary is a Perl script that uses pg_dump and pg_restore.

Yes, and in fact it does the pre-data and post-data sections, e.g.,
<https://github.com/lzlabs/pg_dumpbinary/blob/master/pg_dumpbinary#L305&gt;.
However, I'm not sure you will be ble to mix and match a pg_dump
manually obtined schema (in plain text) with pg_restorebinary.
And it is not clear to me, still, the aim of this approach.

Who says I'd use pg_restorebinary? My tool does custom backup / restore
because:
1) it's a mix of a schema (a "project"), and pieces (subset) of another
schema (meta-data about the project)
2) it's full of bytea values, some quite large.
3) the backup is not opaque, it's an SQLite DB with a 1-to-1 match in terms
of tables and rows with the original schema.
(but not in terms of columns, the COPY BINARY bytes of the row as dumped
as-is; I also use the same integer PK if any)
(and in some cases, I even [expose virtual columns][1]https://sqlite.org/forum/forumpost/c819a1e0fbca1f07dac18cacccb7676b15968a29c0dbc5f9c50a26406e243db6 in SQLite for
"interesting" (i.e. NKs) columns of the COPY BINARY bytes)

The DML parts using COPY BINARY are already working just fine, thank you.

And it yields a 1-file backup per "project" schema (+ meta-data) that you
can easily introspect in any SQLite tool.
Most of the rows are opaque, but tables, rows, integer PKs, and even some
text NKs and FKs (parent-child)
are visible from SQLite, making it possible to partially restore some
subset, or get row stats about the "project", etc...
I even have views which know about the schema structure, to project even
more insights about the backup content.
It's so convenient, I wonder why it doesn't exist already. I'm not the only
SQLite + PostgreSQL dev, after all, right.
OTOH, I do tend to think out-of-the-box...

Now I want to add capturing the DDLs for the "project" schema at backup
time,
so I don't have to pre-create the "project" schema using my own tool,
before restoring it.
I want to capture the schema exactly as it was, recreate the schema using
the captured DDLs,
and the use my tool as usual to check and possibly upgrade that schema, as
necessary.
And instead of re-inventing the wheel (in my tool) to capture DDLs, reusing
pg_dump
makes complete sense to me.

My users don't know SQL or PostgreSQL. They ask to backup a project "by
name" to my tool,
they won't run pg_dump manually. They wouldn't even know what the schema
name is. Or even
how to properly quote it (requires triple dquotes in CMD on Windows, how
many people know that!),
much less what pg_dump options to use.

Anyways, I'm getting carried away I'm afraid...
That's not the kind of answers (or questions / comments) I was expecting,
from this ML.
Hopefully all is not lost, and someone might answer the
as-secure-as-possible password passing question.

Thanks, --DD

[1]: https://sqlite.org/forum/forumpost/c819a1e0fbca1f07dac18cacccb7676b15968a29c0dbc5f9c50a26406e243db6
https://sqlite.org/forum/forumpost/c819a1e0fbca1f07dac18cacccb7676b15968a29c0dbc5f9c50a26406e243db6

#6Francisco Olarte
folarte@peoplecall.com
In reply to: Dominique Devienne (#5)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, 22 Sept 2023 at 15:25, Dominique Devienne <ddevienne@gmail.com> wrote:

On Fri, Sep 22, 2023 at 12:45 PM Luca Ferrari <fluca1978@gmail.com> wrote:

...

I think that forking a beast like psql will make you incurring into a
lot of security problems that are worst your "password leak".

I'm sorry, but this doesn't make sense to me.
I'm talking of replacing using my tool then psql, with using my tool that forks psql.
In both cases these are clients apps that connect to a particular DB, for a given user.

It appears you use fork to mean "fork & exec", the classic way to
start a slave ( controlled? helper ? I do not know the modern PC word
for that ) process on *ix, while Luca is thinking on a source fork,
the thing which is normally done in git and friends to have a similar
but derived software built.

Francisco Olarte.

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Luca Ferrari (#4)

On Friday, September 22, 2023, Luca Ferrari <fluca1978@gmail.com> wrote:

That's why I'm asking the community how best to the forked PSQL can

connect w/o password prompting.

psql and lipq can exploit .pgpass for exactly that aim: not messing
around with passwords.
Again, I would discourage you to fork psql. Would you be able to
maintain the new upcoming versions in the future?

The OP seems to be used the term fork in a process sense, not forking the
source code. Process execution from within another program. “Launch” or
“execute” psql would be a better choice of wording here.

David J.

#8Luca Ferrari
fluca1978@gmail.com
In reply to: Dominique Devienne (#5)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, Sep 22, 2023 at 3:24 PM Dominique Devienne <ddevienne@gmail.com> wrote:

I'm talking of replacing using my tool then psql, with using my tool that forks psql.

I thought you were talking about forking the source code to write your
own psql, sorry about that.
It is clear to me that you mean Unix fork(2) and exec(2).

The point if about the best way to not expose the password, if a password is necessary.

As I understand the point, which is apparently really hard for me, one
way is using .pgpass (for example adding/removing lines from your
tools, that is effectively a psql wrapper in this sense).

I already told you my app is LIBPQ aware, including about PGPASSWORD and PGSERVICE.
It's users who decide to use these mechanisms (which have plain-text passwords BTW...), not my tool.

It cannot be your tool, it is the database that asks for a password if
needed (i.e., pg_hba.conf).
Therefore, if your tool is PGPASSWORD aware, it can handle dynamic
lines in .pgpass, which apparently is the simplest and most secure way
to avoid other users spoofing on passwords using ps(1) and friends.
Or pass the environment variable as it is: you are already risking at
the entry point of your tool.

So in that case, the password must be communicated to PSQL anyways, somehow.
And the somehow is exactly the point of my questions here.

The options are what you already have mentioned: environment
variables, stdin, and .pgpass.
I'm not aware of any other method, so I thjnk the decision is up to you.

PSQL is not willy-nilly changing its CLI options or the way it operates.
Plus I bundled PSQL in my package, which is standalone and cross-platforms.
And again, I really don't see what's the harm is forking PSQL versus running it directly,

I was referring to source code forking, not fork(2).

And instead of re-inventing the wheel (in my tool) to capture DDLs, reusing pg_dump
makes complete sense to me.

You added information I was not aware before: you are dumping
PostgreSQL to restore it into SQLite, while I was thinking you wanted
to do some stuff with a PostgreSQL-to-PostgreSQL backup and restore.
That is why I was suggesting to stick with a tool and not to mix and
match.
Apparently you have already what you need, or quite frankly I'm not
getting your actual need.

That's not the kind of answers (or questions / comments) I was expecting, from this ML.

Sorry about that, hope you will find the correct answer.

Luca

#9Dominique Devienne
ddevienne@gmail.com
In reply to: Luca Ferrari (#8)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, Sep 22, 2023 at 5:19 PM Luca Ferrari <fluca1978@gmail.com> wrote:

On Fri, Sep 22, 2023 at 3:24 PM Dominique Devienne <ddevienne@gmail.com>
wrote:
You added information I was not aware before: you are dumping
PostgreSQL to restore it into SQLite, while I was thinking you wanted
to do some stuff with a PostgreSQL-to-PostgreSQL backup and restore.

I must be really bad at explaining things today, sorry...

Because of course I'm doing PostgreSQL-to-PostgreSQL backup/restore.
The fact I use a custom SQLite DB file as the backup's "format" instead of
a "pile-of-files" (or a TAR of that pile), is the only difference.

That, and the fact it's a backup that mixes one full schema with part of
another.
And that my SQLite backup format is much more convenient and "expressive"
IMHO,
since fully introspectable ("semi-opaque", since rows are still COPY BINARY
encoded).

Writing the backup as an SQLite DB incurs a little overhead, of course,
compared to just appending to per-table files, but not that much when
in non-transactional mode. Acceptable tradeoff compared to the fact one
can open the backup easily and see the tables and rows almost as-is,
which also opens the door to partial restores (there are separates "roots",
independent entity trees, in these schemas).

#10Brad White
b55white@gmail.com
In reply to: Dominique Devienne (#9)
Re: How to fork pg_dump or psql w/o leaking secrets?

Seems to me that your tool could set the env var that you want.
If you don't export it, I think it shouldn't "leak" but your child process
should get it as part of their environment.

On Fri, Sep 22, 2023 at 12:43 PM Dominique Devienne <ddevienne@gmail.com>
wrote:

Show quoted text

On Fri, Sep 22, 2023 at 5:19 PM Luca Ferrari <fluca1978@gmail.com> wrote:

On Fri, Sep 22, 2023 at 3:24 PM Dominique Devienne <ddevienne@gmail.com>
wrote:
You added information I was not aware before: you are dumping
PostgreSQL to restore it into SQLite, while I was thinking you wanted
to do some stuff with a PostgreSQL-to-PostgreSQL backup and restore.

I must be really bad at explaining things today, sorry...

Because of course I'm doing PostgreSQL-to-PostgreSQL backup/restore.
The fact I use a custom SQLite DB file as the backup's "format" instead of
a "pile-of-files" (or a TAR of that pile), is the only difference.

That, and the fact it's a backup that mixes one full schema with part of
another.
And that my SQLite backup format is much more convenient and "expressive"
IMHO,
since fully introspectable ("semi-opaque", since rows are still COPY
BINARY encoded).

Writing the backup as an SQLite DB incurs a little overhead, of course,
compared to just appending to per-table files, but not that much when
in non-transactional mode. Acceptable tradeoff compared to the fact one
can open the backup easily and see the tables and rows almost as-is,
which also opens the door to partial restores (there are separates "roots",
independent entity trees, in these schemas).

#11David G. Johnston
david.g.johnston@gmail.com
In reply to: Dominique Devienne (#5)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Friday, September 22, 2023, Dominique Devienne <ddevienne@gmail.com>
wrote:

I already told you my app is LIBPQ aware, including about PGPASSWORD and
PGSERVICE.
It's users who decide to use these mechanisms (which have plain-text
passwords BTW...), not my tool.
The same way PSQL prompts for a password when not using these mechanism,
so does my tool.

Once you have the password you should utilize the PGPASSWORD environment
variable to get it passed to psql. It doesn’t matter in the least how you
obtained that password in the first place.

David J.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#11)
Re: How to fork pg_dump or psql w/o leaking secrets?

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Once you have the password you should utilize the PGPASSWORD environment
variable to get it passed to psql. It doesn’t matter in the least how you
obtained that password in the first place.

Keep in mind that on many flavors of Unix, a process's environment
variables can readily be inspected by other processes. You should
check your platform carefully before assuming that PGPASSWORD is
a safe way to pass down a secret.

regards, tom lane

#13Dominique Devienne
ddevienne@gmail.com
In reply to: Tom Lane (#12)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Fri, Sep 22, 2023 at 8:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Once you have the password you should utilize the PGPASSWORD environment
variable to get it passed to psql. It doesn’t matter in the least how

you

obtained that password in the first place.

Keep in mind that on many flavors of Unix, a process's environment
variables can readily be inspected by other processes. You should
check your platform carefully before assuming that PGPASSWORD is
a safe way to pass down a secret.

Yep. From https://www.postgresql.org/docs/current/libpq-envars.html :

PGPASSWORD behaves the same as the password connection parameter.
Use of this environment variable is not recommended for security reasons,
as some operating systems allow non-root users to see process environment
variables via ps; instead consider using a password file (see Section

34.16).

but I'm not a fan of creating a temporary file either, with the password in
plain text...

Remember that I'm already connected in the "parent" process, to the DB.
There aught to be a way to obtain a token from the DB via a connection,
with a short duration, to supply to the exec'd PostgreSQL tools like psql
or pg_dump,
to completely bypass passwords. The server would maintain per-DB secrets,
and sign a JWT token for example, valid for a few seconds, for that user/DB
pair,
that the parent "process" could then utilize / pass to the "fork/exec"d
tool.

Much safer than plain-text passwords floating around env-vars or
temp-files. --DD

#14David G. Johnston
david.g.johnston@gmail.com
In reply to: Dominique Devienne (#13)
Re: How to fork pg_dump or psql w/o leaking secrets?

On Friday, September 22, 2023, Dominique Devienne <ddevienne@gmail.com>
wrote:

Remember that I'm already connected in the "parent" process, to the DB.
There aught to be a way to obtain a token from the DB via a connection,
with a short duration, to supply to the exec'd PostgreSQL tools like psql
or pg_dump,
to completely bypass passwords. The server would maintain per-DB secrets,
and sign a JWT token for example, valid for a few seconds, for that
user/DB pair,
that the parent "process" could then utilize / pass to the "fork/exec"d
tool.

Much safer than plain-text passwords floating around env-vars or
temp-files. --DD

Sure, though maybe just some kind of “—password-on-stdin” option and then
the next input read from stdin is interpreted as the password, would be
more readily accomplished. Scripts should be sent via “—file” in that
usage but that seems desirable anyway.

David J.