BUG #16508: using multi-host connection string when the first host is starting fails

Started by PG Bug reporting formalmost 6 years ago5 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16508
Logged by: Cyril Jouve
Email address: jv.cyril@gmail.com
PostgreSQL version: 10.11
Operating system: rhel7
Description:

Hello,

I'm connection to pg10 using psql (tried with clients psql 10.11 & psql
12.2) using a connection string such as:
psql 'dbname=xxxxx1,xxxxx2,xxxxx3,xxxxx4 target_session_attrs=read-write'

the connection to first database (xxxxx1) fail with the error:
psql.bin: FATAL: the database system is starting up

which is correct according to postgres state on that machine,
but then I would expect the psql tries the next server (xxxxx2) with is in
the one acceptiong the connection params (target_session_attrs=read-write)

instead of the error.

If I swap the connection parameters like this: psql
'dbname=xxxxx2,xxxxx1,xxxxx3,xxxxx4 target_session_attrs=read-write'
(swapped xxxx1/xxxx2), psql is able to connect to xxxxx2.

#2Noah Misch
noah@leadboat.com
In reply to: PG Bug reporting form (#1)
Re: BUG #16508: using multi-host connection string when the first host is starting fails

On Wed, Jun 24, 2020 at 08:17:44AM +0000, PG Bug reporting form wrote:

I'm connection to pg10 using psql (tried with clients psql 10.11 & psql
12.2) using a connection string such as:
psql 'dbname=xxxxx1,xxxxx2,xxxxx3,xxxxx4 target_session_attrs=read-write'

the connection to first database (xxxxx1) fail with the error:
psql.bin: FATAL: the database system is starting up

which is correct according to postgres state on that machine,
but then I would expect the psql tries the next server (xxxxx2) with is in
the one acceptiong the connection params (target_session_attrs=read-write)

instead of the error.

I agree.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#2)
Re: BUG #16508: using multi-host connection string when the first host is starting fails

Noah Misch <noah@leadboat.com> writes:

On Wed, Jun 24, 2020 at 08:17:44AM +0000, PG Bug reporting form wrote:

I'm connection to pg10 using psql (tried with clients psql 10.11 & psql
12.2) using a connection string such as:
psql 'dbname=xxxxx1,xxxxx2,xxxxx3,xxxxx4 target_session_attrs=read-write'

the connection to first database (xxxxx1) fail with the error:
psql.bin: FATAL: the database system is starting up

which is correct according to postgres state on that machine,
but then I would expect the psql tries the next server (xxxxx2) with is in
the one acceptiong the connection params (target_session_attrs=read-write)
instead of the error.

I agree.

I think the OP needs to be less opaque about what he actually did,
because the given example could not possibly have worked in any variant.
There is no provision in libpq for interpreting dbname as a
comma-separated list; therefore, what you actually get with the above
is a single attempt to connect to a database named
"xxxxx1,xxxxx2,xxxxx3,xxxxx4" (or "xxxxx2,xxxxx1,xxxxx3,xxxxx4" in
the allegedly-working case).

I assume that the actual test case involved a comma-separated *host*
(or hostaddr) list, which is what drives multiple connection attempts.
It is true that if we manage to make a connection to a host, but it
then rejects us for some reason, we just give up rather than trying
the next host. The problem with trying to improve that is that it's
very unclear which cases it's actually appropriate to do that for.
As an example, if you fat-finger the password to host 1, it's unlikely
that silently switching our attention to host 2 would be advisable.
At best, what you'd get is several confusing duplicate messages.

I experimented with letting PQconnectPoll retry after getting a server
error message, as per attached, but I thought the results were more
confusing than helpful.

regards, tom lane

Attachments:

retry-after-server-error-0.1.patchtext/x-diff; charset=us-ascii; name=retry-after-server-error-0.1.patchDownload+17-2
#4Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#3)
Re: BUG #16508: using multi-host connection string when the first host is starting fails

On Thu, Aug 13, 2020 at 06:08:14PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Wed, Jun 24, 2020 at 08:17:44AM +0000, PG Bug reporting form wrote:

I'm connection to pg10 using psql (tried with clients psql 10.11 & psql
12.2) using a connection string such as:
psql 'dbname=xxxxx1,xxxxx2,xxxxx3,xxxxx4 target_session_attrs=read-write'

the connection to first database (xxxxx1) fail with the error:
psql.bin: FATAL: the database system is starting up

which is correct according to postgres state on that machine,
but then I would expect the psql tries the next server (xxxxx2) with is in
the one acceptiong the connection params (target_session_attrs=read-write)
instead of the error.

I agree.

I assume that the actual test case involved a comma-separated *host*
(or hostaddr) list, which is what drives multiple connection attempts.

Like you, I assumed that.

It is true that if we manage to make a connection to a host, but it
then rejects us for some reason, we just give up rather than trying
the next host. The problem with trying to improve that is that it's
very unclear which cases it's actually appropriate to do that for.

That is an obstacle, yes. Assume the connection string has N>=2 entries
where, typically, one is read-write and N-1 are read-only. Here's the
principle that made me agree with the bug report. It should be possible to
"pg_ctl restart" any one host without interrupting clients' ability to form a
read-only connection using the multi-host connection string. Currently, the
outcome depends on the timing within the restart sequence:

1. fails during shutdown checkpoint
2. succeeds after port closes
3. fails after port opens, during early recovery
4. succeeds after recovery permits read-only connections

That's a bad user experience. I didn't form a proposal for what to do
instead, but I doubt we already have the optimum.

As an example, if you fat-finger the password to host 1, it's unlikely
that silently switching our attention to host 2 would be advisable.
At best, what you'd get is several confusing duplicate messages.

I'd be fine with the duplicate messages. Yeah, if we could divine that the
connection attempt failed due to a client typo, it would be nice to stop
there. A client can't divine that. If the PostgreSQL servers use pam or ldap
authentication, server-side trouble can cause transient authentication
failures.

I do seem to recall discussion that rejected retrying on all errors, but I
looked and didn't locate it.

#5Cyril Jouve
jv.cyril@gmail.com
In reply to: Noah Misch (#4)
Re: BUG #16508: using multi-host connection string when the first host is starting fails

you assumed right that it was host instead of dbname :)
I failed at "anonymising" my connection string...

Note that at the same time, I had java services connected to the same
cluster using the jdbc driver without issues.

Regards,
Cyril

Le ven. 14 août 2020 à 09:51, Noah Misch <noah@leadboat.com> a écrit :

Show quoted text

On Thu, Aug 13, 2020 at 06:08:14PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Wed, Jun 24, 2020 at 08:17:44AM +0000, PG Bug reporting form wrote:

I'm connection to pg10 using psql (tried with clients psql 10.11 &

psql

12.2) using a connection string such as:
psql 'dbname=xxxxx1,xxxxx2,xxxxx3,xxxxx4

target_session_attrs=read-write'

the connection to first database (xxxxx1) fail with the error:
psql.bin: FATAL: the database system is starting up

which is correct according to postgres state on that machine,
but then I would expect the psql tries the next server (xxxxx2) with

is in

the one acceptiong the connection params

(target_session_attrs=read-write)

instead of the error.

I agree.

I assume that the actual test case involved a comma-separated *host*
(or hostaddr) list, which is what drives multiple connection attempts.

Like you, I assumed that.

It is true that if we manage to make a connection to a host, but it
then rejects us for some reason, we just give up rather than trying
the next host. The problem with trying to improve that is that it's
very unclear which cases it's actually appropriate to do that for.

That is an obstacle, yes. Assume the connection string has N>=2 entries
where, typically, one is read-write and N-1 are read-only. Here's the
principle that made me agree with the bug report. It should be possible to
"pg_ctl restart" any one host without interrupting clients' ability to
form a
read-only connection using the multi-host connection string. Currently,
the
outcome depends on the timing within the restart sequence:

1. fails during shutdown checkpoint
2. succeeds after port closes
3. fails after port opens, during early recovery
4. succeeds after recovery permits read-only connections

That's a bad user experience. I didn't form a proposal for what to do
instead, but I doubt we already have the optimum.

As an example, if you fat-finger the password to host 1, it's unlikely
that silently switching our attention to host 2 would be advisable.
At best, what you'd get is several confusing duplicate messages.

I'd be fine with the duplicate messages. Yeah, if we could divine that the
connection attempt failed due to a client typo, it would be nice to stop
there. A client can't divine that. If the PostgreSQL servers use pam or
ldap
authentication, server-side trouble can cause transient authentication
failures.

I do seem to recall discussion that rejected retrying on all errors, but I
looked and didn't locate it.