BUG #16604: pg_dump with --jobs breaks SSL connections

Started by PG Bug reporting form · 9 messages · pgsql-bugs
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16604
Logged by: Zsolt Ero
Email address: zsolt.ero@gmail.com
PostgreSQL version: 12.4
Operating system: Ubuntu 20.04
Description:

I'm using pg_dump in the following syntax:

pg_dump --dbname="sslmode=verify-ca sslrootcert=server-ca.pem \
sslcert=client-cert.pem sslkey=client-key.pem \
hostaddr=1.2.3.4 \
user=postgres dbname=app" \
--format=directory \
--file=dump_app \
--jobs=3

As long as the --jobs parameter is present, the process breaks after
"pg_dump: saving database definition".
It breaks with "FATAL:  connection requires a valid client certificate".

Without --jobs it completes without errors. I also think it's happening with
pg_restore as well.

Client is postgresql-client-12 under Ubuntu 20.04 from latest official
packages (http://apt.postgresql.org/pub/repos/apt).

#2Daniel Gustafsson
daniel@yesql.se
In reply to: PG Bug reporting form (#1)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

On 1 Sep 2020, at 22:08, PG Bug reporting form <noreply@postgresql.org> wrote:

As long as the --jobs parameter is present, the process breaks after
"pg_dump: saving database definition".
It breaks with "FATAL: connection requires a valid client certificate".

I am unable to reproduce this with 12.4 as well as 14devel. If this error is
reproducible for you, can you please share more details on the setup? (ideally
a recipe for setting up a repro environment)

cheers ./daniel

#3Magnus Hagander
magnus@hagander.net
In reply to: Daniel Gustafsson (#2)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

On Tue, Sep 15, 2020 at 1:36 PM Daniel Gustafsson <daniel@yesql.se> wrote:

I am unable to reproduce this with 12.4 as well as 14devel. If this error
is reproducible for you, can you please share more details on the setup?
(ideally a recipe for setting up a repro environment)

Grasping at a long straw here, but I wonder if it could be related to the
debian/ubuntu wrapper.

Zsolt, can you try running it with hardcoding the pg_dump path
to /usr/lib/postgresql/12/bin/pg_dump instead of just pg_dump, to see if
that might be it?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#4Zsolt Ero
zsolt.ero@gmail.com
In reply to: Magnus Hagander (#3)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

I've created a minimal Dockerfile that reproduces the issue 100% of the
time. The PG server is configured to require client certificates.

Dockerfile (repro with: "docker build .")

FROM ubuntu:18.04

RUN apt-get update && apt-get install -y wget gnupg

RUN echo "deb http://apt.postgresql.org/pub/repos/apt bionic-pgdg main" \
    > /etc/apt/sources.list.d/pgdg.list
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc \
    | apt-key add -
RUN apt-get update && apt-get install -y postgresql-client-12

WORKDIR /tmp
COPY *.pem /tmp/
RUN chmod 400 /tmp/*.pem

ENV PGPASSWORD=xxx

RUN pg_dump --dbname="sslmode=verify-ca sslrootcert=server-ca.pem \
sslcert=client-cert.pem sslkey=client-key.pem \
hostaddr=1.2.3.4 \
port=5432 \
user=postgres dbname=postgres" \
--format=directory \
--file=dump_app \
--jobs=3

Zsolt

On 15 Sep 2020 at 14:04:09, Magnus Hagander <magnus@hagander.net> wrote:


#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zsolt Ero (#4)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

Zsolt Ero <zsolt.ero@gmail.com> writes:

I've created a minimal reproducible Dockerfile, it reproduces 100%. The PG
server is configured to require client certificates.

I reproduced this locally, and the problem seems to be that
CloneArchive() is doing a far-less-than-half-baked job of
reconstructing the original connection parameters.
The data that PQconnectdbParams gets is just

(gdb) p keywords
$1 = {0x44ef22 "host", 0x44d52d "port", 0x44e4f0 "user", 0x4532b6 "password",
0x44ef14 "dbname", 0x45325f "fallback_application_name", 0x0}
(gdb) p values
$2 = {0x25459e0 "127.0.0.1", 0x25459a0 "5432", 0x2545180 "postgres", 0x0,
0x2a61d60 "dbname=regression", 0x2538410 "pg_dump", 0x0}

so parallel pg_dump is basically guaranteed to fail in any
case with even slightly unusual connection parameters.

Not sure why we should be trying to do it like that at
all; it'd be better if the original command-line parameters
got passed down in all cases. Looking at that now.
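A rough sketch of the failure mode (illustrative Python, not the actual
pg_dump C code; the connstring parsing here is deliberately simplified):
rebuilding a connection from a fixed keyword list, as in the gdb output
above, silently drops every parameter not on that list, including all the
SSL options.

```python
# The fixed keyword list seen in the gdb output above. Any other
# parameter from the original connstring is lost on reconnection.
FIXED_KEYWORDS = ["host", "port", "user", "password",
                  "dbname", "fallback_application_name"]

def parse_connstring(s):
    """Parse a 'key=value key=value' connstring into a dict
    (simplified: no quoting or escaping rules)."""
    return dict(item.split("=", 1) for item in s.split())

def clone_params_buggy(original):
    """Mimic the buggy clone: keep only the fixed keywords."""
    return {k: v for k, v in original.items() if k in FIXED_KEYWORDS}

original = parse_connstring(
    "sslmode=verify-ca sslrootcert=server-ca.pem sslcert=client-cert.pem "
    "sslkey=client-key.pem hostaddr=1.2.3.4 user=postgres dbname=app")

cloned = clone_params_buggy(original)
print(sorted(cloned))       # only dbname and user survive
print("sslcert" in cloned)  # False: the worker cannot authenticate
```

With the reporter's connstring, the cloned worker connection has no
client certificate (and not even the hostaddr), which is exactly the
"connection requires a valid client certificate" failure.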

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

I wrote:

I reproduced this locally, and the problem seems to be that
CloneArchive() is doing a far-less-than-half-baked job of
reconstructing the original connection parameters. ...
so parallel pg_dump is basically guaranteed to fail in any
case with even slightly unusual connection parameters.
Not sure why we should be trying to do it like that at
all; it'd be better if the original command-line parameters
got passed down in all cases. Looking at that now.

The attached patch seems to fix it, and also takes care of a nasty
habit that parallel pg_restore had of issuing repeated password
prompts if you're fool enough to specify -W. IMO it's only
sensible to apply that option on the first connection attempt.

Note, however, that AFAICS parallel pg_restore was okay on duplicating
the original connection parameters otherwise (the prompting issue
was because it went too far with that...). So I fail to confirm
the OP's claim that pg_restore had the same bug as parallel pg_dump.
If there really is an issue there, we'll need a test case that
demonstrates it.

regards, tom lane

Attachments:

fix-parallel-pg-dump-with-dbname-connstring.patch (text/x-diff; +21, -47)
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

I wrote:

Note, however, that AFAICS parallel pg_restore was okay on duplicating
the original connection parameters otherwise (the prompting issue
was because it went too far with that...). So I fail to confirm
the OP's claim that pg_restore had the same bug as parallel pg_dump.
If there really is an issue there, we'll need a test case that
demonstrates it.

Hmm ... so a bit later, I have a test case that may or may not be
what Zsolt saw:

pg_restore -C --dbname="hostaddr=127.0.0.1 sslmode=verify-ca dbname=postgres user=postgres sslcert=client-cert.crt sslkey=client-cert.key" -Fd dumpdir
pg_restore: error: could not reconnect to database: FATAL: connection requires a valid client certificate

What is happening here is the exact same issue, but in
ReconnectToServer/_connectDB: those functions suppose that they
can blow away all the original connection parameters and insert
just a database name instead. I'm inclined to think that
we should get rid of _connectDB, because it seems unnecessarily
duplicative with ConnectDatabase. But we'll need a bit more
API refactoring than what I had. New patch coming ...

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

I wrote:

I'm inclined to think that
we should get rid of _connectDB, because it seems unnecessarily
duplicative with ConnectDatabase. But we'll need a bit more
API refactoring than what I had. New patch coming ...

A bit of refactoring later, I have the attached, which seems to
clean up all cases. I got rid of some bits of dead code along
the way.

regards, tom lane

Attachments:

fix-parallel-pg-dump-with-dbname-connstring-2.patch (text/x-diff; +130, -281)
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: BUG #16604: pg_dump with --jobs breaks SSL connections

I wrote:

A bit of refactoring later, I have the attached, which seems to
clean up all cases. I got rid of some bits of dead code along
the way.

I pushed that at commit a45bc8a4f. However...

In some off-list discussion, Peter Eisentraut pointed out that
pg_dump/pg_restore are not the only programs with this problem.

Several of the src/bin/scripts/ programs, such as vacuumdb
and reindexdb, perform reconnections when told to use parallel
processing. These reconnections use methods similar to parallel
pg_restore and have the same bug. We can fix this straightforwardly
using much the same methods as in a45bc8a4f, as per 0001 below.
(0001 also fixes the documentation oversight that we didn't say that
these programs' --maintenance-db option could be a connstring.)

pg_dumpall, as well as "pg_dump -C" with text output, have a
variant of the issue: they emit psql scripts containing commands
such as "\connect databasename", and psql's \connect processing
isn't very good about re-using all connection options either.
In fact, it'll *only* re-use username, host name (and hostaddr if
given), and port. So if you do "psql -d '...some connection options'"
and then try to run a dump script, the reconnections will fail
if there were other critical options in the -d connstring, much
like what was reported here. 0002 below is a draft attempt at
addressing this, but it's probably more controversial than 0001,
because there's a larger behavioral change involved.

An important thing to understand is that except for the \connect
case, all these programs perform a reconnection by re-using the
textual parameters given originally. Thus for example, if you do
"vacuumdb -j 4 -h host1,host2 ...", it's unclear which of host1 and
host2 will be connected to, either by the initial connection or any
of the parallel worker processes. I don't regard that as a bug
particularly. Perhaps some users would wish that once vacuumdb
made an initial connection, the workers would guarantee to reconnect
to that same host; but ISTM the real answer is "don't use that kind
of host spec for this purpose". (Note that there are other ways
to shoot yourself in the foot here, for example if your DNS can
resolve several different IPs for the same server host name. All
these cases seem more like pilot error than things for us to fix.)

\connect, however, is way out in left field. In the first place,
while these other programs are just interested in re-using all the
connection parameters except possibly the database name, \connect
intends to allow substitution of new values for some but not
necessarily all of the parameters. In the second place, \connect
makes some effort to force reconnection to occur to the very same
server if you don't specify a new host. Rather than re-using the
original parameter strings, it pulls out PQhost() and PQport()
values from the previous connection, and also PQhostaddr() if you
specified hostaddr previously. These values will reflect just the
server actually connected to, not the multi-host values you might
have given originally.

I'm inclined to think that this second behavior is simply wrong,
or at least a failure to adhere to the principle of least surprise.
An example of why it might be thought buggy is

postgres=# \connect "dbname=postgres host=sss1,sss1 hostaddr=127.0.0.1,::1"
You are now connected to database "postgres" as user "postgres".
postgres=# \connect -reuse-previous=on "dbname=postgres host=sss1,sss1"
could not match 2 host names to 1 hostaddr values
Previous connection kept

This happens because what -reuse-previous injects is not the previous
hostaddr connection parameter value ("127.0.0.1,::1"), but only the
actual connection address ("127.0.0.1"), resulting in a surprising
mismatch.
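The mismatch follows from libpq's rule that multi-value host and hostaddr
lists must have the same number of comma-separated entries. A quick sketch
of that check (illustrative Python with simplified splitting, not libpq's
C implementation):

```python
def check_host_hostaddr(host, hostaddr):
    """Mimic libpq's rule that multi-value 'host' and 'hostaddr'
    must contain the same number of comma-separated entries."""
    nhosts = len(host.split(","))
    naddrs = len(hostaddr.split(","))
    if nhosts != naddrs:
        return f"could not match {nhosts} host names to {naddrs} hostaddr values"
    return "ok"

# Original connection: two hosts, two addresses, counts match.
print(check_host_hostaddr("sss1,sss1", "127.0.0.1,::1"))  # ok
# -reuse-previous=on injects only the single address actually used:
print(check_host_hostaddr("sss1,sss1", "127.0.0.1"))
```

The second call reproduces the "could not match 2 host names to
1 hostaddr values" complaint shown above.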

Moreover, psql isn't even being self-consistent by doing this.
If you get a connection loss, it will invoke PQreset() which just
re-uses the connection parameters previously given, and therefore
is perfectly content to connect to any of the hosts you named before.
It's quite unclear why \connect should be pickier than that case.

Therefore, my proposal in 0002 below is to drop all that logic.
The patch's method for re-using connection parameters is to fetch
the parameters of the old connection with PQconninfo(), which will
provide the textual values for each parameter slot, and then just
overwrite the specific slots that you provide new values for.
This seems to me to have considerably less surprise factor than
what happens now. But there's no denying that it changes behavior
in multi-host cases to a greater degree than is minimally necessary
to fix the bug complained of here.
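The merge semantics of that proposal can be sketched in a few lines
(illustrative Python; the actual patch does this in C against the
PQconninfo() result): every parameter of the old connection survives
unless a new value explicitly replaces it.

```python
def reuse_connection_params(previous, overrides):
    """Merge: start from the full parameter set of the old connection
    (what PQconninfo() would report) and overwrite only the slots
    the user supplied new values for."""
    merged = dict(previous)
    merged.update(overrides)
    return merged

previous = {
    "host": "sss1,sss1", "hostaddr": "127.0.0.1,::1",
    "dbname": "postgres", "user": "postgres",
    "sslmode": "verify-ca", "sslcert": "client-cert.crt",
}
# \connect newdb -- only the database name changes:
merged = reuse_connection_params(previous, {"dbname": "newdb"})
print(merged["sslcert"])   # SSL options are preserved
print(merged["hostaddr"])  # full multi-host value, not a single address
```

Note that the textual "host=sss1,sss1 hostaddr=127.0.0.1,::1" values are
carried over intact, so the count-mismatch failure above cannot occur.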

There are some things adjacent to this that I'd kind of like to fix.
For example, I noted that createuser and dropuser lack any way to use
a connstring, as they have neither -d nor --maintenance-db options.
That seems pretty bogus, but it also seems like a missing feature not
a bug, so I didn't do anything about it here. (To be clear, I'm
proposing these changes for back-patch, as a45bc8a4f was.)

Comments?

regards, tom lane

Attachments:

0001-fix-reconnection-in-scripts-1.patch (text/x-diff; +260, -206)
0002-fix-reconnection-in-psql-1.patch (text/x-diff; +211, -119)