Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Started by Daniel Westermann (DWE)over 1 year ago20 messagesgeneral
Jump to latest
#1Daniel Westermann (DWE)
daniel.westermann@dbi-services.com

Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Hi,

we're facing a strange issue with delays between "connection received" and "connection authenticated".

# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)

I know, this is not the latest minor version.

# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"

What we see in the log is this (around 4 seconds delay):

2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)

The matching line is this ( I know md5 ):
host all xxxxx xx.xx.xx.0/24 md5

What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5

We don't see it if we do it like this:
host all xxxxx xx.xx.xx.0/24 md5
host replication xxxxx localhost md5

Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't reproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happening from time to time, and not constantly.

What we're basically looking for is a way to prove the assumption without involving PostgreSQL at all, if that does make sense?

Many thanks in advance
Daniel

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Daniel Westermann (DWE) (#1)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On 10/29/24 09:30, Daniel Westermann (DWE) wrote:

Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Hi,

we're facing a strange issue with delays between "connection received" and "connection authenticated".

# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)

I know, this is not the latest minor version.

# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"

What we see in the log is this (around 4 seconds delay):

2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)

1) host=xx.xx.xx.100 = localhost?

2) In postgresql.conf what is listen_addresses set to?

3) What are the settings in /etc/hosts?

The matching line is this ( I know md5 ):
host all xxxxx xx.xx.xx.0/24 md5

What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5

We don't see it if we do it like this:
host all xxxxx xx.xx.xx.0/24 md5
host replication xxxxx localhost md5

Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't reproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happening from time to time, and not constantly.

What we're basically looking for is a way to prove the assumption without involving PostgreSQL at all, if that does make sense?

Many thanks in advance
Daniel

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Westermann (DWE) (#1)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:

we're facing a strange issue with delays between "connection received" and "connection authenticated".

Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't reproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happening from time to time, and not constantly.

FWIW, I do have vague memories of other complaints like this, although
not of whether solutions were found. It was awhile ago though.
It might be worth searching our mailing list archives.

regards, tom lane

#4Daniel Westermann (DWE)
daniel.westermann@dbi-services.com
In reply to: Adrian Klaver (#2)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

1) host=xx.xx.xx.100 = localhost?

No, it is not

2) In postgresql.conf what is listen_addresses set to?

*

3) What are the settings in /etc/hosts?

Standard Ubuntu settings:

127.0.0.1 localhost
127.0.1.1 abc cde

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Regards
Daniel

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Daniel Westermann (DWE) (#4)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On 10/29/24 10:42 AM, Daniel Westermann (DWE) wrote:

1) host=xx.xx.xx.100 = localhost?

No, it is not

Is the localhost machine on the same network?

Is the client connection string using IPV4 address or host name?

2) In postgresql.conf what is listen_addresses set to?

*

3) What are the settings in /etc/hosts?

Standard Ubuntu settings:

127.0.0.1 localhost
127.0.1.1 abc cde

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Contents of resolv.conf?

Would it be possible to crank the logging detail up?

Regards
Daniel

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Vijaykumar Jain
vijaykumarjain.github@gmail.com
In reply to: Daniel Westermann (DWE) (#1)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

missed the mailing list.

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Vijaykumar Jain (#6)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On 10/29/24 11:20 AM, Vijaykumar Jain wrote:

missed the mailing list.

Can you provide a more complete explanation for above?

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Vijaykumar Jain
vijaykumarjain.github@gmail.com
In reply to: Daniel Westermann (DWE) (#1)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On Tue, 29 Oct 2024 at 23:43, Vijaykumar Jain <
vijaykumarjain.github@gmail.com> wrote:

On Tue, 29 Oct 2024 at 22:00, Daniel Westermann (DWE) <
daniel.westermann@dbi-services.com> wrote:

Delays between "connection received" and "connection authenticated"
because of localhost entries in hba

Hi,

we're facing a strange issue with delays between "connection received"
and "connection authenticated".

# select version();
version

-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)

I know, this is not the latest minor version.

# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"

What we see in the log is this (around 4 seconds delay):

2024-05-07 15:29:50.244 CEST [369909] LOG: connection received:
host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated:
identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)

Just before we get into os and name resolution stuff..

1) if the database usage is really heavy with schema churn and or heavy
temp usage. can the catalog bloat result in slowless of pg_auth* objects
resulting in slow authentication?
a general check used to be checking the size of the catalog objects and
if needed vacuum full in single user mode .... to speed up auth lookup.

i tried with pg latest on ubuntu 22, but did not see any issues.

So to simulate a similar issue, i understand once connected, it will look
up pg_auth* objects cascading to other catalogs for resolution ...
in one session:
PGPASSWORD=1234 psql -h localhost -p 5432 -d postgres -U postgres -c
'select 1'

in another session

postgres@ubuntu:~$ cat /tmp/db1/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
trace_locks = 'on'
trace_lock_table = 1260

postgres=# select 'pg_authid'::regclass::oid;
oid
------
1260
(1 row)
postgres=*# lock table pg_authid in access exclusive mode;
LOCK TABLE
postgres=*# rollback;
ROLLBACK

i tried to trace which locks are held when a connection is established
after auth.

for example pg_authid , so i enable tracing on this object, and to mock
the delay i try to lock the table in access exclusive mode ...
note, this may not be the scenario, but i am trying to rule out pg related
delay due to bloat or locking

postgres@ubuntu:/tmp$ time PGPASSWORD=1234 psql -h localhost -p 5432 -d
postgres -U postgres -c 'select 1'
?column?
----------
1
(1 row)

real 0m7.503s - -delay
user 0m0.001s
sys 0m0.005s

postgres@ubuntu:/tmp$ #after rollback of pg_authid

from the logs
2024-10-29 18:04:46.411 UTC [localhost(33424)] [62661] LOG: connection
received: host=localhost port=33424
2024-10-29 18:04:53.906 UTC [localhost(33424)] [62661] LOG: connection
authenticated: identity="postgres" method=md5 (/tmp/db1/pg_hba.conf:120)

what i tried here was, delay auth but after connection established till
lock held, which can be due to bloated catalog objects or long locks due to
temp objects churn?

still it does not explain how the order of pg_hba deals with the delay ?

anyways, fyi to rule out :)

--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada&gt;

sorry, sharing again.

#9Alan Hodgson
ahodgson@lists.simkin.ca
In reply to: Daniel Westermann (DWE) (#1)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:

Delays between "connection received" and "connection authenticated"
because of localhost entries in hba

Maybe check that "files" is the first mechanism for hosts lookups in
nsswitch.conf.

afaik dig doesn't follow the same name resolution path as glibc.

#10Vijaykumar Jain
vijaykumarjain.github@gmail.com
In reply to: Alan Hodgson (#9)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On Wed, 30 Oct 2024 at 00:04, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:

On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:

Delays between "connection received" and "connection authenticated"
because of localhost entries in hba

Maybe check that "files" is the first mechanism for hosts lookups in
nsswitch.conf.

afaik dig doesn't follow the same name resolution path as glibc.

a connection received should mean resolution was done, right ? am i missing
something?
I could try adding delay with traffic control, but then if a connection
established is logged in the log, then name resolution is ok ?

or the syscall to md5 is slow , does it work faster for trust ? just saying
:)

--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada&gt;

#11Alan Hodgson
ahodgson@lists.simkin.ca
In reply to: Vijaykumar Jain (#10)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On Wed, 2024-10-30 at 00:15 +0530, Vijaykumar Jain wrote:

On Wed, 30 Oct 2024 at 00:04, Alan Hodgson
<ahodgson@lists.simkin.ca> wrote:

On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:

Delays between "connection received" and "connection
authenticated" because of localhost entries in hba

Maybe check that "files" is the first mechanism for hosts lookups
in nsswitch.conf.

afaik dig doesn't follow the same name resolution path as glibc.

a connection received should mean resolution was done, right ? am
i missing something?

Well, I'm just throwing out something to check, but I meant on the
server, not the client. It's trying to resolve "localhost" to check
the auth. If there's a delay, that's probably where it is.

I guess I just assumed that's where you were testing dig.

#12Vijaykumar Jain
vijaykumarjain.github@gmail.com
In reply to: Alan Hodgson (#11)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On Wed, 30 Oct 2024 at 00:40, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:

Well, I'm just throwing out something to check, but I meant on the server,
not the client. It's trying to resolve "localhost" to check the auth. If
there's a delay, that's probably where it is.

I guess I just assumed that's where you were testing dig.

Yep. my mistake. i did not read it correctly. in order to discard the rule,
it first has to find out by expanding the hosts in the rule, so yep, dns
resolution might be the problem.

also, the 4-5s delay is eerie similar to issues in the past.

https://unix.stackexchange.com/questions/290987/resolving-hostname-takes-5-seconds
maybe disabling ipv6 completely resolves this for the time being.

--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada&gt;

#13Daniel Westermann (DWE)
daniel.westermann@dbi-services.com
In reply to: Alan Hodgson (#9)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.

It is

#14Daniel Westermann (DWE)
daniel.westermann@dbi-services.com
In reply to: Alan Hodgson (#11)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.

afaik dig doesn't follow the same name resolution path as glibc.

I don't know. Is there a way to test the glibc way easily?

a connection received should mean resolution was done, right ? am i missing something?

Well, I'm just throwing out something to check, but I meant on the server, not the client. It's trying to resolve >"localhost" to check the auth. If there's a delay, that's probably where it is.

I guess I just assumed that's where you were testing dig.

Yes

Regards
Daniel

#15Daniel Westermann (DWE)
daniel.westermann@dbi-services.com
In reply to: Tom Lane (#3)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

It might be worth searching our mailing list archives.

I did that, but was not able to find anything useful related to this.

Regards
Daniel

#16Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Daniel Westermann (DWE) (#1)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:

we're facing a strange issue with delays between "connection received" and "connection authenticated".

# select version();
                                                              version
-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)

I know, this is not the latest minor version.

# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"

What we see in the log is this (around 4 seconds delay):

2024-05-07 15:29:50.244 CEST [369909] LOG:  connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG:  connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)

The matching line is this ( I know md5 ):
host    all             xxxxx     xx.xx.xx.0/24            md5

What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host    replication     xxxxx     localhost                md5
host    all             xxxxx     xx.xx.xx.0/24            md5

I'd "strace" such a database connection to see where the time is spent.

Yours,
Laurenz Albe

#17Daniel Westermann (DWE)
daniel.westermann@dbi-services.com
In reply to: Laurenz Albe (#16)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host    replication     xxxxx     localhost                md5
host    all             xxxxx     xx.xx.xx.0/24            md5

... which is quite hard to do if you don't know which one it will be

Regards
Daniel

#18Greg Sabino Mullane
greg@turnstep.com
In reply to: Daniel Westermann (DWE) (#17)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

I'd echo the suggestion to strace this. You can use the pre_auth_delay
setting to help facilitate that. See:

https://www.postgresql.org/docs/current/runtime-config-developer.html

Cheers,
Greg

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Sabino Mullane (#18)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Greg Sabino Mullane <htamfids@gmail.com> writes:

I'd echo the suggestion to strace this. You can use the pre_auth_delay
setting to help facilitate that. See:

IIUC, the delays are rare and unpredictable, so that strace'ing seems
unlikely to be practical.

If rebuilding from source is feasible, you could insert monitoring
elog(LOG) calls around the name lookup (that is, around the
pg_getnameinfo_all call in src/backend/libpq/hba.c). This would
confirm or disprove the theory that the name lookup is the source
of the issue, which is about as much as we'd get out of strace.

regards, tom lane

#20Daniel Westermann (DWE)
daniel.westermann@dbi-services.com
In reply to: Tom Lane (#19)
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba

Greg Sabino Mullane <htamfids@gmail.com> writes:
I'd echo the suggestion to strace this. You can use the pre_auth_delay
setting to help facilitate that. See:

IIUC, the delays are rare and unpredictable, so that strace'ing seems
unlikely to be practical.

If rebuilding from source is feasible, you could insert monitoring
elog(LOG) calls around the name lookup (that is, around the
pg_getnameinfo_all call in src/backend/libpq/hba.c).  This would
confirm or disprove the theory that the name lookup is the source
of the issue, which is about as much as we'd get out of strace.

Thanks for all your tips/hints

Regards
Daniel