postgresql systemd service fails to start only on boot but not manually
Hello everyone,
I'm having trouble with my PostgreSQL Systemd service and I wonder
whether this is a problem with the software or some problem with the
packaging or build.
My server fails to start PostgreSQL only on boot, if I restart it
manually afterwards it doesn't have any problem starting. Here is the
log extracted from the journal:
```
2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST
2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled
2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled
2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connections
2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown request
2018-09-21 20:49:10.497 CEST [306] LOG: aborting any active transactions
2018-09-21 20:49:10.498 CEST [637] FATAL: terminating autovacuum process due to administrator command
2018-09-21 20:49:10.500 CEST [306] LOG: worker process: logical replication launcher (PID 636) exited with code 1
2018-09-21 20:49:10.501 CEST [349] LOG: shutting down
2018-09-21 20:49:10.518 CEST [306] LOG: database system is shut down
```
Any Ideas? I'd really like to fix this since now after every reboot I have to
remember to manually restart it.
Thanks.
On 9/22/18 3:44 AM, Doron Behar wrote:
Hello everyone,
I'm having trouble with my PostgreSQL Systemd service and I wonder
whether this is a problem with the software or some problem with the
packaging or build.My server fails to start PostgreSQL only on boot, if I restart it
manually afterwards it doesn't have any problem starting. Here is the
log extracted from the journal:```
2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST
2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled
2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled
2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connection > 2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown request
2018-09-21 20:49:10.497 CEST [306] LOG: aborting any active transactions
2018-09-21 20:49:10.498 CEST [637] FATAL: terminating autovacuum process due to administrator command
2018-09-21 20:49:10.500 CEST [306] LOG: worker process: logical replication launcher (PID 636) exited with code 1
2018-09-21 20:49:10.501 CEST [349] LOG: shutting down
2018-09-21 20:49:10.518 CEST [306] LOG: database system is shut down
```Any Ideas? I'd really like to fix this since now after every reboot I have to
remember to manually restart it.
Linux distro and version?
Assuming Postgres version 10+ given logical replication warning. Still
actual version would be nice.
How did you install Postgres?
Where did systemd script come from?
What is in the systemd script?
What does the log show when you do a successful manual start?
What does the system log show when the Postgres reboot startup fails?
Thanks.
--
Adrian Klaver
adrian.klaver@aklaver.com
## Doron Behar (doron.behar@gmail.com):
My server fails to start PostgreSQL only on boot, if I restart it
manually afterwards it doesn't have any problem starting. Here is the
log extracted from the journal:```
2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST
2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled
2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceled
This would indicate that your machine is overloaded during start -
perhaps there's just too much being started at the same time?
ObRant: that's what happens if people take "system startup duration"
as a benchmark and optimize for that - sure, running one clumsy shell
script after another isn't effective usage of today's systems,
but starting eight dozens programs all at once may have other
side effects. Really, with the hardware taking small ages to find
it's own arse before even loading the boot loader, those few seconds
weren't worth optimizing - and if people reboot their computers so
often that startup time takes a measurable toll on their productive
day, perhaps they should rather spend their time thinking about their
usage pattern than "optimizing" the startup process.
So, now that I've got that off my chest... your machine propably tries to
do too much at the same time when booting: the worker processes take
longer than 90 seconds to start. Slow CPU or storage maybe?
2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connections
2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown request
And in the mean time, systemd has lost it's patience, declares the
start as failed and terminates the process group. (The default systemd
timeout is 90 seconds, at least in some releases of systemd, so
this fits quite nicely).
You could try to work around this by increasing TimeoutStartSec
in postgresql's systemd unit (or even globally), which perhaps
only hides the problem until the next service suddenly doesn't
start anymore.
You could move postgresql to the end of the boot order by
adding "After=..." to the Unit section of the systemd service
file, the value behind "After=" being all the other services in
the same target, which should reduce parallelism and improve
PostgreSQL's startup behaviour.
A more advanced variant of that would be to create a new
systemd target, make that start "After" multiuser.target
or even graphical.target (depending on your setup), make sure
it "Requires" the current default systemd target and make
postgresql the only additional service in that target.
(This would be the cleanest solution, but you should get some
grasp of systemd and how your specific distribution uses it
before meddling with the default targets; I don't know every
distribution/version variant of systemd integration, so I
can't give that specific instructions here).
Or you figure out what the heck your machine is running
during startup any why it is that slow, and try to fix that.
Regards,
Christoph
--
Spare Space
On Sat, Sep 22, 2018 at 07:14:33AM -0700, Adrian Klaver wrote:
Linux distro and version?
Arch Linux
$ uname -a
Linux vps 4.18.9-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 19 21:19:17 UTC 2018 x86_64 GNU/Linux
It's a VPS with one CPU core on it and 2G RAM - not very much I know,
hosted on OVH
Assuming Postgres version 10+ given logical replication warning. Still
actual version would be nice.How did you install Postgres?
With the package manager and the files used in it's build are viewable
from here:
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/postgresql
Where did systemd script come from?
What is in the systemd script?
The systemd service was installed with the package
[Unit]
Description=PostgreSQL database server
After=network.target
[Service]
Type=notify
TimeoutSec=120
User=postgres
Group=postgres
Environment=PGROOT=/var/lib/postgres
SyslogIdentifier=postgres
PIDFile=/var/lib/postgres/data/postmaster.pid
RuntimeDirectory=postgresql
RuntimeDirectoryMode=755
ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGROOT}/data
ExecStart=/usr/bin/postgres -D ${PGROOT}/data
ExecReload=/bin/kill -HUP ${MAINPID}
KillMode=mixed
KillSignal=SIGINT
# Due to PostgreSQL's use of shared memory, OOM killer is often overzealous in
# killing Postgres, so adjust it downward
OOMScoreAdjust=-200
# Additional security-related features
PrivateTmp=true
ProtectHome=true
ProtectSystem=full
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
What does the log show when you do a successful manual start?
2018-09-22 09:38:44.470 CEST [15251] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-09-22 09:38:44.472 CEST [15251] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2018-09-22 09:38:44.485 CEST [15253] LOG: database system was shut down at 2018-09-21 20:49:10 CEST
2018-09-22 09:38:44.490 CEST [15251] LOG: database system is ready to accept connections
What does the system log show when the Postgres reboot startup fails?
This was posted in my 1st message.
On Sat, Sep 22, 2018 at 04:58:18PM +0200, Christoph Moench-Tegeder wrote:
## Doron Behar (doron.behar@gmail.com):
My server fails to start PostgreSQL only on boot, if I restart it
manually afterwards it doesn't have any problem starting. Here is the
log extracted from the journal:```
2018-09-21 20:46:40.028 CEST [306] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-09-21 20:46:40.036 CEST [306] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2018-09-21 20:46:40.233 CEST [337] LOG: database system was shut down at 2018-09-21 20:46:21 CEST
2018-09-21 20:48:10.441 CEST [352] WARNING: worker took too long to start; canceled
2018-09-21 20:49:10.469 CEST [352] WARNING: worker took too long to start; canceledThis would indicate that your machine is overloaded during start -
perhaps there's just too much being started at the same time?
ObRant: that's what happens if people take "system startup duration"
as a benchmark and optimize for that - sure, running one clumsy shell
script after another isn't effective usage of today's systems,
but starting eight dozens programs all at once may have other
side effects. Really, with the hardware taking small ages to find
it's own arse before even loading the boot loader, those few seconds
weren't worth optimizing - and if people reboot their computers so
often that startup time takes a measurable toll on their productive
day, perhaps they should rather spend their time thinking about their
usage pattern than "optimizing" the startup process.So, now that I've got that off my chest... your machine propably tries to
do too much at the same time when booting: the worker processes take
longer than 90 seconds to start. Slow CPU or storage maybe?2018-09-21 20:49:10.478 CEST [306] LOG: database system is ready to accept connections
2018-09-21 20:49:10.486 CEST [306] LOG: received fast shutdown requestAnd in the mean time, systemd has lost it's patience, declares the
start as failed and terminates the process group. (The default systemd
timeout is 90 seconds, at least in some releases of systemd, so
this fits quite nicely).You could try to work around this by increasing TimeoutStartSec
in postgresql's systemd unit (or even globally), which perhaps
only hides the problem until the next service suddenly doesn't
start anymore.
You could move postgresql to the end of the boot order by
adding "After=..." to the Unit section of the systemd service
file, the value behind "After=" being all the other services in
the same target, which should reduce parallelism and improve
PostgreSQL's startup behaviour.
A more advanced variant of that would be to create a new
systemd target, make that start "After" multiuser.target
or even graphical.target (depending on your setup), make sure
it "Requires" the current default systemd target and make
postgresql the only additional service in that target.
(This would be the cleanest solution, but you should get some
grasp of systemd and how your specific distribution uses it
before meddling with the default targets; I don't know every
distribution/version variant of systemd integration, so I
can't give that specific instructions here).
Or you figure out what the heck your machine is running
during startup any why it is that slow, and try to fix that.Regards,
Christoph
Thanks for your very detailed answer, that helped me a lot. I've
increased `TimeoutSec=` to infinity in the systemd service since it was
set initially to 120 seconds which apparently wasn't enough for my poor
VPS with 2G RAM and 1 CPU core. That worked great, I still feel like I
have slow startups but at least PostgreSQL doesn't totally fail to start
on boot.
I'll try to debug the slow startups on my own, thanks again for
everything!
Doron.