Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Hi,
we're facing a strange issue with delays between "connection received" and "connection authenticated".
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)
I know, this is not the latest minor version.
# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"
What we see in the log is this (around 4 seconds delay):
2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)
The matching line is this ( I know md5 ):
host all xxxxx xx.xx.xx.0/24 md5
What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5
We don't see it if we do it like this:
host all xxxxx xx.xx.xx.0/24 md5
host replication xxxxx localhost md5
Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't reproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happening from time to time, and not constantly.
What we're basically looking for is a way to prove the assumption without involving PostgreSQL at all, if that does make sense?
Many thanks in advance
Daniel
On 10/29/24 09:30, Daniel Westermann (DWE) wrote:
Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Hi,
we're facing a strange issue with delays between "connection received" and "connection authenticated".
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)I know, this is not the latest minor version.
# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"What we see in the log is this (around 4 seconds delay):
2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)
1) host=xx.xx.xx.100 = localhost?
2) In postgresql.conf what is listen_addresses set to?
3) What are the settings in /etc/hosts?
The matching line is this ( I know md5 ):
host all xxxxx xx.xx.xx.0/24 md5What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5We don't see it if we do it like this:
host all xxxxx xx.xx.xx.0/24 md5
host replication xxxxx localhost md5Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't reproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happening from time to time, and not constantly.
What we're basically looking for is a way to prove the assumption without involving PostgreSQL at all, if that does make sense?
Many thanks in advance
Daniel
--
Adrian Klaver
adrian.klaver@aklaver.com
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
we're facing a strange issue with delays between "connection received" and "connection authenticated".
Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't reproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happening from time to time, and not constantly.
FWIW, I do have vague memories of other complaints like this, although
not of whether solutions were found. It was awhile ago though.
It might be worth searching our mailing list archives.
regards, tom lane
1) host=xx.xx.xx.100 = localhost?
No, it is not
2) In postgresql.conf what is listen_addresses set to?
*
3) What are the settings in /etc/hosts?
Standard Ubuntu settings:
127.0.0.1 localhost
127.0.1.1 abc cde
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Regards
Daniel
On 10/29/24 10:42 AM, Daniel Westermann (DWE) wrote:
1) host=xx.xx.xx.100 = localhost?
No, it is not
Is the localhost machine on the same network?
Is the client connection string using IPV4 address or host name?
2) In postgresql.conf what is listen_addresses set to?
*
3) What are the settings in /etc/hosts?
Standard Ubuntu settings:
127.0.0.1 localhost
127.0.1.1 abc cde# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Contents of resolv.conf?
Would it be possible to crank the logging detail up?
Regards
Daniel
--
Adrian Klaver
adrian.klaver@aklaver.com
missed the mailing list.
Import Notes
Reply to msg id not found: CAM+6J94jKdQpBHEwPivSQtZ888Seo-WfobwEm0ZZimc3BxvV6w@mail.gmail.com
On 10/29/24 11:20 AM, Vijaykumar Jain wrote:
missed the mailing list.
Can you provide a more complete explanation for above?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, 29 Oct 2024 at 23:43, Vijaykumar Jain <
vijaykumarjain.github@gmail.com> wrote:
On Tue, 29 Oct 2024 at 22:00, Daniel Westermann (DWE) <
daniel.westermann@dbi-services.com> wrote:Delays between "connection received" and "connection authenticated"
because of localhost entries in hbaHi,
we're facing a strange issue with delays between "connection received"
and "connection authenticated".# select version();
version-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)I know, this is not the latest minor version.
# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"What we see in the log is this (around 4 seconds delay):
2024-05-07 15:29:50.244 CEST [369909] LOG: connection received:
host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated:
identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)Just before we get into os and name resolution stuff..
1) if the database usage is really heavy with schema churn and or heavy
temp usage. can the catalog bloat result in slowless of pg_auth* objects
resulting in slow authentication?
a general check used to be checking the size of the catalog objects and
if needed vacuum full in single user mode .... to speed up auth lookup.i tried with pg latest on ubuntu 22, but did not see any issues.
So to simulate a similar issue, i understand once connected, it will look
up pg_auth* objects cascading to other catalogs for resolution ...
in one session:
PGPASSWORD=1234 psql -h localhost -p 5432 -d postgres -U postgres -c
'select 1'in another session
postgres@ubuntu:~$ cat /tmp/db1/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
trace_locks = 'on'
trace_lock_table = 1260postgres=# select 'pg_authid'::regclass::oid;
oid
------
1260
(1 row)
postgres=*# lock table pg_authid in access exclusive mode;
LOCK TABLE
postgres=*# rollback;
ROLLBACKi tried to trace which locks are held when a connection is established
after auth.for example pg_authid , so i enable tracing on this object, and to mock
the delay i try to lock the table in access exclusive mode ...
note, this may not be the scenario, but i am trying to rule out pg related
delay due to bloat or lockingpostgres@ubuntu:/tmp$ time PGPASSWORD=1234 psql -h localhost -p 5432 -d
postgres -U postgres -c 'select 1'
?column?
----------
1
(1 row)real 0m7.503s - -delay
user 0m0.001s
sys 0m0.005spostgres@ubuntu:/tmp$ #after rollback of pg_authid
from the logs
2024-10-29 18:04:46.411 UTC [localhost(33424)] [62661] LOG: connection
received: host=localhost port=33424
2024-10-29 18:04:53.906 UTC [localhost(33424)] [62661] LOG: connection
authenticated: identity="postgres" method=md5 (/tmp/db1/pg_hba.conf:120)what i tried here was, delay auth but after connection established till
lock held, which can be due to bloated catalog objects or long locks due to
temp objects churn?still it does not explain how the order of pg_hba deals with the delay ?
anyways, fyi to rule out :)
--
Thanks,
VijayOpen to work
Resume - Vijaykumar Jain <https://github.com/cabecada>
sorry, sharing again.
Import Notes
Reply to msg id not found: CAM+6J94jKdQpBHEwPivSQtZ888Seo-WfobwEm0ZZimc3BxvV6w@mail.gmail.com
On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:
Delays between "connection received" and "connection authenticated"
because of localhost entries in hba
Maybe check that "files" is the first mechanism for hosts lookups in
nsswitch.conf.
afaik dig doesn't follow the same name resolution path as glibc.
On Wed, 30 Oct 2024 at 00:04, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:
On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:
Delays between "connection received" and "connection authenticated"
because of localhost entries in hbaMaybe check that "files" is the first mechanism for hosts lookups in
nsswitch.conf.afaik dig doesn't follow the same name resolution path as glibc.
a connection received should mean resolution was done, right ? am i missing
something?
I could try adding delay with traffic control, but then if a connection
established is logged in the log, then name resolution is ok ?
or the syscall to md5 is slow , does it work faster for trust ? just saying
:)
--
Thanks,
Vijay
Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada>
On Wed, 2024-10-30 at 00:15 +0530, Vijaykumar Jain wrote:
On Wed, 30 Oct 2024 at 00:04, Alan Hodgson
<ahodgson@lists.simkin.ca> wrote:On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:
Delays between "connection received" and "connection
authenticated" because of localhost entries in hbaMaybe check that "files" is the first mechanism for hosts lookups
in nsswitch.conf.afaik dig doesn't follow the same name resolution path as glibc.
a connection received should mean resolution was done, right ? am
i missing something?
Well, I'm just throwing out something to check, but I meant on the
server, not the client. It's trying to resolve "localhost" to check
the auth. If there's a delay, that's probably where it is.
I guess I just assumed that's where you were testing dig.
On Wed, 30 Oct 2024 at 00:40, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:
Well, I'm just throwing out something to check, but I meant on the server,
not the client. It's trying to resolve "localhost" to check the auth. If
there's a delay, that's probably where it is.I guess I just assumed that's where you were testing dig.
Yep. my mistake. i did not read it correctly. in order to discard the rule,
it first has to find out by expanding the hosts in the rule, so yep, dns
resolution might be the problem.
also, the 4-5s delay is eerie similar to issues in the past.
https://unix.stackexchange.com/questions/290987/resolving-hostname-takes-5-seconds
maybe disabling ipv6 completely resolves this for the time being.
--
Thanks,
Vijay
Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada>
Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.
It is
Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.
afaik dig doesn't follow the same name resolution path as glibc.
I don't know. Is there a way to test the glibc way easily?
a connection received should mean resolution was done, right ? am i missing something?
Well, I'm just throwing out something to check, but I meant on the server, not the client. It's trying to resolve >"localhost" to check the auth. If there's a delay, that's probably where it is.
I guess I just assumed that's where you were testing dig.
Yes
Regards
Daniel
It might be worth searching our mailing list archives.
I did that, but was not able to find anything useful related to this.
Regards
Daniel
On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:
we're facing a strange issue with delays between "connection received" and "connection authenticated".
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)I know, this is not the latest minor version.
# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"What we see in the log is this (around 4 seconds delay):
2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)The matching line is this ( I know md5 ):
host all xxxxx xx.xx.xx.0/24 md5What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5
I'd "strace" such a database connection to see where the time is spent.
Yours,
Laurenz Albe
What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the line matching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5
... which is quite hard to do if you don't know which one it will be
Regards
Daniel
I'd echo the suggestion to strace this. You can use the pre_auth_delay
setting to help facilitate that. See:
https://www.postgresql.org/docs/current/runtime-config-developer.html
Cheers,
Greg
Greg Sabino Mullane <htamfids@gmail.com> writes:
I'd echo the suggestion to strace this. You can use the pre_auth_delay
setting to help facilitate that. See:
IIUC, the delays are rare and unpredictable, so that strace'ing seems
unlikely to be practical.
If rebuilding from source is feasible, you could insert monitoring
elog(LOG) calls around the name lookup (that is, around the
pg_getnameinfo_all call in src/backend/libpq/hba.c). This would
confirm or disprove the theory that the name lookup is the source
of the issue, which is about as much as we'd get out of strace.
regards, tom lane
Greg Sabino Mullane <htamfids@gmail.com> writes:
I'd echo the suggestion to strace this. You can use the pre_auth_delay
setting to help facilitate that. See:
IIUC, the delays are rare and unpredictable, so that strace'ing seems
unlikely to be practical.
If rebuilding from source is feasible, you could insert monitoring
elog(LOG) calls around the name lookup (that is, around the
pg_getnameinfo_all call in src/backend/libpq/hba.c). This would
confirm or disprove the theory that the name lookup is the source
of the issue, which is about as much as we'd get out of strace.
Thanks for all your tips/hints
Regards
Daniel