Establishing a local connection to PostgreSQL 9.6 takes seconds

Started by Marcin Barczynskialmost 5 years ago4 messagesgeneral
Jump to latest
#1Marcin Barczynski
mbarczynski@starfishstorage.com

In a heavily used production database prod, running a `psql prod` takes
seconds. Once the connection is established, performance of queries is
fine. There are ~2 new usually short-lived connections / second, and a
couple of long-running analytical queries. Connecting to other databases on
the same PostgreSQL server is fast.

Here is what I tried:

-bash-4.2$ date; psql prod -c "SELECT backend_start, xact_start,
query_start FROM pg_stat_activity WHERE pid = (SELECT pg_backend_pid());";
date
Wed Jun 16 09:01:51 EDT 2021
backend_start | xact_start |
query_start
-------------------------------+-------------------------------+-------------------------------
2021-06-16 09:01:51.596197-04 | 2021-06-16 09:01:57.979979-04 | 2021-06-16
09:01:57.979979-04
(1 row)

Wed Jun 16 09:01:57 EDT 2021

So the backend started quickly, but it took 6 seconds to start running the
query.
I used

ps --sort=start_time -elfy | grep "postgres prod .*startup" | head -n1 |
awk '{print $3}'

to identify the PID the most recent pg backend started by myself in startup
mode.
I straced PG backend after running psql:

$ sudo strace -tt -T -v -p $(ps --sort=start_time -elfy | grep "postgres
prod .*startup" | head -n1 | awk '{print $3}')
(...)
08:47:28.870917 read(6, "|\25\3\0\0|\v\335$!\1\0\364\0\20\1\0 \4
\374%\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.870951 read(6, "|\25\3\0@\274\202\262\254\230\1\0\364\0\20\1\0 \4
\0&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.870986 read(6, "|\25\3\0\220\f\270\266\24\t\1\0\364\0\20\1\0 \4
\5&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.871021 read(6, "|\25\3\0008%\200\262r%\1\0\364\0\20\1\0 \4
\4&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000011>
08:47:28.871056 semop(41779221, [{8, 1, 0}], 1) = 0 <0.000010>
08:47:28.871088 read(6, "|\25\3\0\220[\202\262\241\230\1\0\364\0\20\1\0 \4
\6&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.871124 read(6, "|\25\3\0\350\247\200\262\32^\1\0\364\0\20\1\0 \4
\1&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.871158 read(6, "|\25\3\0P\373\262\266D\262\1\0\364\0\20\1\0 \4
\373%\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.871192 read(6, "|\25\3\0\360\240\201\262\345\206\1\0\364\0\20\1\0
\4 \3&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.871227 read(6, "|\25\3\0p\372\262\266\0h\1\0\364\0\20\1\0 \4
\2&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000010>
08:47:28.871262 read(6, "|\25\3\0@\370\262\266\313\267\1\0\364\0\20\1\0 \4
\7&\4\6p\237\30\1\340\236\30\1"..., 8192) = 8192 <0.000011>
(...)

then stopped strace and ran:

$ sudo lsof -p $(ps --sort=start_time -elfy | grep "postgres prod
.*startup" | head -n1 | awk '{print $3}')
(...)
postmaste 4722 postgres 6u REG 8,1 1073741824
369860117 /opt/prod/pg/9.6/base/18370/1108887031
(...)

It's always /opt/prod/pg/9.6/base/18370/1108887031. Unfortunately, no trace
of it in pg_class:

prod=# SELECT relname FROM pg_class WHERE oid = 1108887031;
relname
---------
(0 rows)
prod=# SELECT relname FROM pg_class WHERE relfilenode = 1108887031;
relname
---------
(0 rows)

What does pg backend do on startup?
I would be thankful for suggestions on how to troubleshoot it.

Versions:

PostgreSQL 9.6.21 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-44), 64-bit
Linux 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64
x86_64 x86_64 GNU/Linux

--
Thanks,
Marcin

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marcin Barczynski (#1)
Re: Establishing a local connection to PostgreSQL 9.6 takes seconds

Marcin Barczynski <mbarczynski@starfishstorage.com> writes:

It's always /opt/prod/pg/9.6/base/18370/1108887031. Unfortunately, no trace
of it in pg_class:

It's probably a mapped system catalog, which will have relfilenode = 0.
Try

SELECT relname FROM pg_class WHERE pg_relation_filenode(oid) = 1108887031;

My guess is that whichever catalog it is is badly bloated.

regards, tom lane

#3Marcin Barczynski
mbarczynski@starfishstorage.com
In reply to: Tom Lane (#2)
Re: Establishing a local connection to PostgreSQL 9.6 takes seconds

Thanks for the immediate reply.

It turned out to be pg_attribute. It was bloated probably due to the large
number of temp tables created.
Are there any recommendations on how to prevent such a bloat from happening?

On Wed, Jun 16, 2021 at 4:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marcin Barczynski <mbarczynski@starfishstorage.com> writes:

It's always /opt/prod/pg/9.6/base/18370/1108887031. Unfortunately, no

trace

of it in pg_class:

It's probably a mapped system catalog, which will have relfilenode = 0.
Try

SELECT relname FROM pg_class WHERE pg_relation_filenode(oid) = 1108887031;

My guess is that whichever catalog it is is badly bloated.

regards, tom lane

--

*Marcin Barczyński* | *Senior Software Engineer * |

mbarczynski@starfishstorage.com | http://www.starfishstorage.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marcin Barczynski (#3)
Re: Establishing a local connection to PostgreSQL 9.6 takes seconds

Marcin Barczynski <mbarczynski@starfishstorage.com> writes:

It turned out to be pg_attribute. It was bloated probably due to the large
number of temp tables created.
Are there any recommendations on how to prevent such a bloat from happening?

You could perhaps apply more aggressive autovacuum settings to that
catalog. That won't be enough to get rid of the existing problem;
you'll likely need to do a VACUUM FULL on it to remove the bloat.
But autovac ought to be able to keep up with things in future,
if tuned right.

My guess is that pg_class has also got a bloat problem, though
perhaps not as severe.

regards, tom lane