PostgreSQL lock file

Started by Олег Самойловover 6 years ago3 messagesgeneral
Jump to latest

Hi all.

I have a task to create HA PostgreSQL cross datacenter cluster. I created a test bed on VirtualBox and I simulate different failures and disastrous in loops. And got one of funny bug of the Postgresql (11.4). One of the clusters has two instances of PostgreSQL on port 5433 and 5434. After simulating unexpected power off of the node the lock files of PostgreSQL still exist in /tmp directory. And on recovery the first instance on port 5433 starts without problem, but the second instance sometimes not. The reason is that the PID in /tmp/.s.PGSQL.5434.lock sometimes point to the one of process of PostgereSQL on port 5433, for instance the logger.

1408 ? S 0:00 /usr/pgsql-11/bin/postgres -D /var/lib/pgsql/krogan0a
1415 ? Ss 0:00 \_ postgres: logger
1416 ? Ss 0:02 \_ postgres: startup recovering 000000030000000000000007
1417 ? Ss 0:00 \_ postgres: checkpointer
1418 ? Ss 0:00 \_ postgres: background writer
1419 ? Ss 0:00 \_ postgres: stats collector
1420 ? Ss 0:10 \_ postgres: walreceiver streaming 0/7269290

# cat /tmp/.s.PGSQL.5434.lock
1415
/var/lib/pgsql/krogan0b
1565355860
5434
/tmp

# sudo -u postgres /usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/krogan0b start
waiting for server to start....2019-08-09 16:52:25.685 MSK [6673] LOG: listening on IPv4 address "0.0.0.0", port 5434
2019-08-09 16:52:25.685 MSK [6673] LOG: listening on IPv6 address "::", port 5434
2019-08-09 16:52:25.686 MSK [6673] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5434"
2019-08-09 16:52:25.686 MSK [6673] FATAL: lock file "/tmp/.s.PGSQL.5434.lock" already exists
2019-08-09 16:52:25.686 MSK [6673] HINT: Is another postmaster (PID 1415) using socket file "/tmp/.s.PGSQL.5434"?
2019-08-09 16:52:25.686 MSK [6673] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

So it will be good if postgres will check not only PID, but also that the different postgres is running on different or same port. And is there some workaround?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Олег Самойлов (#1)
Re: PostgreSQL lock file

=?utf-8?B?0J7Qu9C10LMg0KHQsNC80L7QudC70L7Qsg==?= <splarv@ya.ru> writes:

I have a task to create HA PostgreSQL cross datacenter cluster. I created a test bed on VirtualBox and I simulate different failures and disastrous in loops. And got one of funny bug of the Postgresql (11.4). One of the clusters has two instances of PostgreSQL on port 5433 and 5434. After simulating unexpected power off of the node the lock files of PostgreSQL still exist in /tmp directory. And on recovery the first instance on port 5433 starts without problem, but the second instance sometimes not. The reason is that the PID in /tmp/.s.PGSQL.5434.lock sometimes point to the one of process of PostgereSQL on port 5433, for instance the logger.

Best practice for this sort of thing is considered to be to run the two
Postgres instances under different user IDs. That avoids them thinking
that the other one's lock files are relevant, and it provides an extra
layer of security against possible conflicts between the two instances.

regards, tom lane

In reply to: Tom Lane (#2)
Re: PostgreSQL lock file

Best practice for this sort of thing is considered to be to run the two
Postgres instances under different user IDs. That avoids them thinking
that the other one's lock files are relevant, and it provides an extra
layer of security against possible conflicts between the two instances.

regards, tom lane

Thanks, an interesting notice. I already did changes in start script to check instance by pg_ctl status and delete lock file if status<>0. But your recommendation is looked better.