[PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

Started by Evgeny Kuzin, 5 days ago, 7 messages
#1Evgeny Kuzin
evgeny.kuzin@outlook.com

Hi hackers,
This is my first time submitting a patch to PostgreSQL, so please bear with me if I've missed anything in the process.
We've been running into an issue with "target_session_attrs" when using DNS-based service discovery. Currently, when libpq connects to a host with multiple A-records and the connection succeeds but is rejected due to target_session_attrs mismatch (e.g., connecting to a read-only server with target_session_attrs=read-write), it skips all remaining addresses for that hostname and moves directly to the next host in the connection string.

Looking at git history, I found this was a deliberate choice by Robert Haas in commit 721f7bd3cbc (2016), where he noted "I changed Mithun's patch to skip all remaining IPs for a host if we reject a connection based on this new parameter." The original mailing list discussion is at [1], though I wasn't able to find a clear explanation of why this approach was preferred over trying all addresses.

This makes it impractical to use a single multi-A-record DNS name pointing to all cluster members with target_session_attrs=read-write to find the primary - only the first responding IP is tried before giving up on that hostname.
The attached patch changes the behavior to try all addresses for a hostname before moving to the next host, matching the existing behavior for connection failures. This would enable simpler DNS-based service discovery without requiring external tools like Consul or explicit multi-host connection strings.
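To make the difference concrete, here is a minimal simulation of the address loop. It is a sketch of the behaviour described above, not libpq's actual code: `pick_server`, `probe`, the `try_all_addrs` flag, and the sample addresses are all hypothetical names used for illustration only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: each host name maps to the addresses DNS returned for it.
HostMap = Dict[str, List[str]]


def pick_server(
    hosts: List[str],
    dns: HostMap,
    probe: Callable[[str], Optional[str]],
    wanted: str,
    try_all_addrs: bool,
) -> Optional[str]:
    """Return the first address whose session attribute equals `wanted`.

    probe(addr) returns None when the connection itself fails, otherwise
    the server's role, e.g. "read-write" or "read-only".
    """
    for host in hosts:
        for addr in dns.get(host, []):
            role = probe(addr)
            if role is None:
                # Connection failure: the next address of the same host is tried.
                continue
            if role == wanted:
                return addr
            # Connected, but target_session_attrs does not match.
            if not try_all_addrs:
                # Behaviour described in this thread: skip the host's remaining addresses.
                break
            # Proposed behaviour: carry on with the next address.
    return None


# db.internal resolves to three cluster members; only 10.0.0.3 is the primary.
dns = {"db.internal": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
roles = {"10.0.0.1": "read-only", "10.0.0.2": None, "10.0.0.3": "read-write"}

current = pick_server(["db.internal"], dns, roles.get, "read-write", try_all_addrs=False)
proposed = pick_server(["db.internal"], dns, roles.get, "read-write", try_all_addrs=True)
print(current, proposed)
```

With the first address answering as a standby, the current rule gives up on `db.internal` immediately, while the proposed rule reaches the primary at the third address.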
If there was a specific reason for the original design that I'm missing, I'd be happy to learn more.
Happy to address any feedback or rework the patch as needed.

[1]: /messages/by-id/CAD__OuhqPRGpcsfwPHz_PDqAGkoqS1UvnUnOnAB-LBWBW=wu4A@mail.gmail.com

Thanks,
Evgeny

Attachments:

0001-libpq-try-all-addresses-before-moving-to-next-host-o.patch (application/octet-stream, +6 -7)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Evgeny Kuzin (#1)
Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

Evgeny Kuzin <evgeny.kuzin@outlook.com> writes:

We've been running into an issue with "target_session_attrs" when using DNS-based service discovery. Currently, when libpq connects to a host with multiple A-records and the connection succeeds but is rejected due to target_session_attrs mismatch (e.g., connecting to a read-only server with target_session_attrs=read-write), it skips all remaining addresses for that hostname and moves directly to the next host in the connection string.

Looking at git history, I found this was a deliberate choice by Robert Haas in commit 721f7bd3cbc (2016), where he noted "I changed Mithun's patch to skip all remaining IPs for a host if we reject a connection based on this new parameter." The original mailing list discussion is at [1], though I wasn't able to find a clear explanation of why this approach was preferred over trying all addresses.

This makes it impractical to use a single multi-A-record DNS name pointing to all cluster members with target_session_attrs=read-write to find the primary - only the first responding IP is tried before giving up on that hostname.

The attached patch changes the behavior to try all addresses for a hostname before moving to the next host, matching the existing behavior for connection failures. This would enable simpler DNS-based service discovery without requiring external tools like Consul or explicit multi-host connection strings.

TBH, I'd say that your DNS setup is broken and you should fix it.
It makes no sense to have the same DNS entry pointing to both
read-write and read-only hosts. The proposed patch will mainly
result in useless connection attempts in more-sanely-constructed
setups.

regards, tom lane

#3Evgeny Kuzin
evgeny.kuzin@outlook.com
In reply to: Tom Lane (#2)
Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

Hi Tom,
Thanks for the feedback. I should clarify the use case - we're not mixing read-write and read-only hosts under one DNS name by accident. This is intentional for HA failover.
We run PostgreSQL clusters with streaming replication. After a failover, the old primary becomes a standby and vice versa. The challenge is: how do clients find the new primary?
Current options:

1. Update DNS on every failover - operationally complex, TTL delays, requires automation
2. Consul/etcd - adds operational complexity and another failure domain
3. Multiple hosts in connection string - requires application changes when cluster topology changes (e.g., adding a new standby)

The proposed approach:

* Single DNS name (db.internal) with one A record per cluster member IP
* Clients connect with host=db.internal target_session_attrs=read-write
* libpq tries each IP until it finds the primary

IIUC this is how JDBC's targetServerType=primary works - it iterates through all resolved addresses. The "useless connection attempts" are actually the feature: it's probing to find the right server, same as when you specify multiple hosts explicitly.
The only difference from host=pg1,pg2,pg3 is that DNS provides the list instead of the connection string. From libpq's perspective, why should it matter where the address list came from?
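Until libpq behaves this way, one possible client-side workaround is to do the DNS expansion yourself and hand libpq an explicit multi-host string. The sketch below assumes that approach; `resolve_all` and `expand_conninfo` are hypothetical helper names, and passing IP addresses in `host` sidesteps hostname-based TLS verification, which a real deployment would need to think about.

```python
import socket
from typing import Callable, List


def resolve_all(hostname: str, port: int = 5432) -> List[str]:
    """Return every distinct IP address the resolver reports, in resolver order."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    ips: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of the sockaddr tuple is the address
        if ip not in ips:
            ips.append(ip)
    return ips


def expand_conninfo(
    hostname: str,
    resolver: Callable[[str], List[str]] = resolve_all,
    attrs: str = "read-write",
) -> str:
    """Build a keyword/value conninfo listing each resolved address as its own host."""
    ips = resolver(hostname)
    if not ips:
        raise ValueError(f"no addresses found for {hostname}")
    return f"host={','.join(ips)} target_session_attrs={attrs}"


# Demonstration with a stub resolver so the example does not depend on real DNS.
def fake_resolver(name: str) -> List[str]:
    return ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


conninfo = expand_conninfo("db.internal", resolver=fake_resolver)
print(conninfo)
```

Because each address is now a separate entry in the host list, a target_session_attrs mismatch on one entry moves on to the next entry rather than abandoning the whole name.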


#4Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#2)
Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

On 5 Mar 2026, at 19:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

TBH, I'd say that your DNS setup is broken and you should fix it.
It makes no sense to have the same DNS entry pointing to both
read-write and read-only hosts. The proposed patch will mainly
result in useless connection attempts in more-sanely-constructed
setups.

This is a feature much desired by cloud providers.

We sell PGaaS clusters which are just a bunch of hosts. Each of
these hosts can become primary at any time.
Currently, when a user adds more hosts they have to redeploy/reconfigure
their app.

The exception is users of pgx, which already works this way: we can just
give them one FQDN for the whole cluster and update DNS records.

This was proposed before [0] and I think Andrew and Evgeny could join
efforts. Certainly, this can be implemented without affecting those
who do not need it.

Best regards, Andrey Borodin.

[0]: https://commitfest.postgresql.org/patch/5396/

#5Evgeny Kuzin
evgeny.kuzin@outlook.com
In reply to: Andrey Borodin (#4)
Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

Thanks for the pointer to patch 5396 - I wasn't aware of Andrew Jackson's prior work on this.
I'd also like to add another argument from that thread. Artem Navrotskiy pointed out [1] that the current behavior actually contradicts the documentation. The libpq docs [2] state:
"When multiple hosts are specified, or when a single host name is translated to multiple addresses, all the hosts and addresses will be tried in order, until one succeeds."
The current behavior where target_session_attrs mismatch skips remaining addresses doesn't match this. A standby successfully responding but not matching target_session_attrs isn't a "connection failure" per se, but it does prevent finding a "successful" connection according to the user's requirements.
This suggests the simpler fix might actually be correcting a deviation from documented behavior, rather than introducing new behavior requiring a new parameter (as in 5396).
Happy to coordinate with Andrew on this - perhaps the question is whether this should be:

1. A behavioral fix which matches documentation
2. An opt-in feature (5396's check_all_addrs parameter) - preserves backward compatibility

Given the documentation wording, I'd lean toward (1), but curious what others think.
[1]: /messages/by-id/235381750793454@mail.yandex.ru
[2]: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-MULTIPLE-HOSTS
Thanks,
Evgeny


#6Nick Cleaton
nick@cleaton.net
In reply to: Andrey Borodin (#4)
Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

On Sat, 7 Mar 2026 at 14:08, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Mar 2026, at 19:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

TBH, I'd say that your DNS setup is broken and you should fix it.
It makes no sense to have the same DNS entry pointing to both
read-write and read-only hosts. The proposed patch will mainly
result in useless connection attempts in more-sanely-constructed
setups.

This is a feature much desired by cloud providers.

We sell PGaaS clusters which are just a bunch of hosts. Each of
these hosts can become primary at any time.
Currently, when a user adds more hosts they have to redeploy/reconfigure
their app.

Somewhat related, we're using dynamic DNS to track the primary, but we
want a backup in case the dynamic DNS fails. We're using multi-host
connection strings for this, with a hostname like
"foo,foo1,foo2,foo3,foo4", where "foo" is the dynamic hostname and
"foo1"..."foo4" are CNAMEs to individual hosts. By updating the
CNAMEs, we can bring hosts in and out without reconfiguring clients.

Managing that is more complex than using a single fallback hostname
with an IP address for each host. It's annoying that we need an upper
bound on the number of potential primaries when configuring the
client. We could do better if libpq tried each IP address of a host
until it got an acceptable connection.
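As a rough illustration of the setup described above, the client configuration could be generated like this. The helper name `fallback_conninfo` is hypothetical, and the use of target_session_attrs=read-write is an assumption drawn from the thread's topic rather than something stated in this message.

```python
def fallback_conninfo(dynamic_name: str, fallback_prefix: str, max_hosts: int) -> str:
    """List the dynamic name first, then a fixed number of per-host fallback names."""
    names = [dynamic_name] + [f"{fallback_prefix}{i}" for i in range(1, max_hosts + 1)]
    # Assumption: the client wants the primary, hence read-write.
    return f"host={','.join(names)} target_session_attrs=read-write"


conninfo = fallback_conninfo("foo", "foo", 4)
print(conninfo)
```

The `max_hosts` argument is the upper bound mentioned above: it is fixed in the client configuration, so growing the cluster beyond it means touching every client.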

#7Greg Sabino Mullane
greg@turnstep.com
In reply to: Evgeny Kuzin (#5)
Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

On Thu, Mar 5, 2026 at 10:31 AM Evgeny Kuzin <evgeny.kuzin@outlook.com>
wrote:

This suggests the simpler fix might actually be correcting a deviation
from documented behavior, rather than introducing new behavior requiring a
new parameter (as in 5396).

+1, I think the docs have the right idea here which is "try until we get
exactly what we want"

Cheers,
Greg