pg_ctlcluster is not stopping cluster

Started by Telium Technical Support about 3 years ago · 10 messages · general
I am trying to stop my PostgreSQL server (on Debian 11) using the following
command:

root@d11:/# sudo -u postgres /usr/bin/pg_ctlcluster 15 main stop -- -m fast -D /var/lib/postgresql/13/main
Notice: extra pg_ctl/postgres options given, bypassing systemctl for stop operation
pg_ctl: PID file "/var/lib/postgresql/13/main/postmaster.pid" does not exist
Is server running?

The notice is correct: there is no such postmaster.pid file in that directory.
But yes, the service is running. I can confirm it's running with:

root@d11:/# sudo -u postgres /usr/bin/pg_ctlcluster 15 main status -- -D /var/lib/postgresql/13/main
pg_ctl: server is running (PID: 2701882)
/usr/lib/postgresql/15/bin/postgres "-D" "/var/lib/postgresql/13/main" "-c" "config_file=/etc/postgresql/15/main/postgresql.conf"

So why is my stop command being ignored? (I tried without the -m fast option,
with no change, and I'd like to keep it for other reasons.) I confirmed
with 'ps ax' that PostgreSQL 15 is running, despite the directory suggesting
it might be 13. Coincidentally, there is a postmaster.pid file in a
directory OTHER than the data directory:

/var/lib/postgresql/15/main/postmaster.pid

(and notice the 15). Is this a clue?

#2 Boris Epstein
borepstein@gmail.com
In reply to: Telium Technical Support (#1)
Re: pg_ctlcluster is not stopping cluster

I wonder if the best way to proceed would be to go on to individual nodes
in the cluster and use OS level commands (such as ps) to track individual
processes and stop them individually.

#3 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Telium Technical Support (#1)
Re: pg_ctlcluster is not stopping cluster

On 4/7/23 15:27, Telium Technical Support wrote:

Coincidentally, there is a postmaster.pid file in a
directory OTHER than the data directory:

/var/lib/postgresql/15/main/postmaster.pid

(and notice the 15). Is this a clue?

Yes, it suggests that this:

sudo -u postgres /usr/bin/pg_ctlcluster 15 main stop -- -m fast -D /var/lib/postgresql/13/main

is not correct.

First do:

pg_lsclusters

to determine what is actually running.

Then do:

sudo -u postgres /usr/bin/pg_ctlcluster <version> main stop -- -m fast

for whatever version is running.

--
Adrian Klaver
adrian.klaver@aklaver.com
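
A minimal sketch of the check-then-stop sequence suggested above. It assumes the usual pg_lsclusters column layout (version, cluster name, port, status, ...) with a header row; the helper name online_clusters and the sample data are hypothetical, not taken from the poster's host.

```shell
#!/bin/sh
# online_clusters (hypothetical helper): read pg_lsclusters-style output on
# stdin and print "<version> <name>" for every cluster whose status starts
# with "online". Assumes a header row and the status in column 4.
online_clusters() {
    awk 'NR > 1 && $4 ~ /^online/ { print $1, $2 }'
}

# Illustrative input in the pg_lsclusters layout, with one cluster online.
sample='Ver Cluster Port Status Owner    Data directory              Log file
13  main    5432 down   postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
15  main    5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log'

# Prints the version and name of each online cluster in the sample.
printf '%s\n' "$sample" | online_clusters
```

On a real host the input would come from `pg_lsclusters | online_clusters`, and each printed version/name pair is what would be passed to `pg_ctlcluster <version> <name> stop -- -m fast`, without an extra -D pointing at another version's directory.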

#4 Telium Technical Support
support@telium.io
In reply to: Boris Epstein (#2)
RE: pg_ctlcluster is not stopping cluster

These commands are actually run from a C++ program (which does a lot of other things), so they are not easy to change. I’m assuming something is misconfigured on the host that’s causing this unusual behavior, and that’s what I need to understand.

#5 Thorsten Glaser
tg@evolvis.org
In reply to: Telium Technical Support (#1)
Re: pg_ctlcluster is not stopping cluster

On Fri, 7 Apr 2023, Telium Technical Support wrote:

Notice: extra pg_ctl/postgres options given, bypassing systemctl for stop

it might be 13. Coincidentally, there is a postmaster.pid file in a
directory OTHER than the data directory:

/var/lib/postgresql/15/main/postmaster.pid

(and notice the 15). Is this a clue?

Maybe the pidfile is written into the cluster version-based directory
instead of the data directory. Best figure out what exactly writes the
pidfile, whether systemd is used for starting (which would of course be
a prime suspect as it says it’s explicitly not used for stopping), etc.

Maybe this is indeed a bug in whatever determines the pidfile path,
perhaps not; is it supposed to live within the data directory?

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#6 Thorsten Glaser
tg@evolvis.org
In reply to: Telium Technical Support (#4)
RE: pg_ctlcluster is not stopping cluster

On Fri, 7 Apr 2023, Telium Technical Support wrote:

I’m assuming something is misconfigured on the host that’s causing this
unusual behavior….and that’s what I need to understand

The mix between 13 and 15 here is what I’d consider a misconfiguration.

Also, please don’t top-post and full-quote.

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#7 Telium Technical Support
support@telium.io
In reply to: Adrian Klaver (#3)
RE: pg_ctlcluster is not stopping cluster

I tried the command you suggested, and it shows the data directory status as "DOWN". Yet when I ask pg_ctlcluster for its status, it says the server is running.

What does this mean?

root@d11:/var/tmp/myapp# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
13  main    5432 down   postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
15  main    5433 down   postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log
root@d11:/var/tmp/myapp# sudo -u postgres /usr/bin/pg_ctlcluster 15 main status -- -D /var/lib/postgresql/13/main
pg_ctl: server is running (PID: 2701882)
/usr/lib/postgresql/15/bin/postgres "-D" "/var/lib/postgresql/13/main" "-c" "config_file=/etc/postgresql/15/main/postgresql.conf"
root@d11:/var/tmp/myapp#

#8 Telium Technical Support
support@telium.io
In reply to: Thorsten Glaser (#6)
RE: pg_ctlcluster is not stopping cluster

The mix between 13 and 15 here is what I’d consider a misconfiguration.

As I inherited this (and I'm somewhat new to pgsql), I'm trying to understand this. From the docs I read online, the postmaster.pid file is supposed to reside in the data directory. Which it does. So that's ok.

Does the fact that the database resides in /var/lib/postgresql/13 mean I have multiple pgsql servers running? (Does the directory name make a big difference?) Can I just kill all pgsql processes, move the directory into the /15 directory, and consider the problem solved?

It feels like I'm missing something obvious... why would the directory matter so much (since pgsql is clearly tracking it in the right dir)?

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Telium Technical Support (#7)
Re: pg_ctlcluster is not stopping cluster

On 4/7/23 15:51, Telium Technical Support wrote:

I tried the command you suggested, and it shows the data directory status as "DOWN". Yet when I ask pg_ctlcluster for its status, it says the server is running.

What does this mean?

root@d11:/var/tmp/myapp# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
13  main    5432 down   postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
15  main    5433 down   postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

Neither cluster is running.

To confirm do:

ps ax | grep postgres

root@d11:/var/tmp/myapp# sudo -u postgres /usr/bin/pg_ctlcluster 15 main status -- -D /var/lib/postgresql/13/main
pg_ctl: server is running (PID: 2701882)

Best guess is that, because of this:

sudo -u postgres /usr/bin/pg_ctlcluster 15 main stop -- -m fast -D /var/lib/postgresql/13/main

the Postgres PID file did not get removed on shutdown.

Bottom line: you should not use

pg_ctlcluster 15 main stop

to shut down a 13 cluster located at:

-D /var/lib/postgresql/13/main

Just do:

sudo -u postgres /usr/bin/pg_ctlcluster 15 main stop -m fast

or

sudo -u postgres /usr/bin/pg_ctlcluster 13 main stop -- -m fast

depending on which cluster you want to shut down.

/usr/lib/postgresql/15/bin/postgres "-D" "/var/lib/postgresql/13/main" "-c" "config_file=/etc/postgresql/15/main/postgresql.conf"
root@d11:/var/tmp/myapp#

--
Adrian Klaver
adrian.klaver@aklaver.com
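
One way to test the leftover-PID-file guess above is to compare the PID recorded in postmaster.pid against the process table. This is a minimal sketch: the helper name pidfile_state is hypothetical, and it relies only on the first line of postmaster.pid holding the postmaster's PID.

```shell
#!/bin/sh
# pidfile_state (hypothetical helper): report whether the postmaster.pid in
# a data directory points at a live process.
# Prints "missing" (no PID file), "alive", or "stale".
pidfile_state() {
    pidfile="$1/postmaster.pid"
    [ -f "$pidfile" ] || { echo missing; return; }
    # The first line of postmaster.pid holds the postmaster's PID.
    pid=$(head -n 1 "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Run this as root or as the owning user, otherwise a live process owned
    # by someone else is reported as stale.
    if kill -0 "$pid" 2>/dev/null; then echo alive; else echo stale; fi
}
```

For the host in this thread it could be run once per directory, for example `pidfile_state /var/lib/postgresql/13/main` and `pidfile_state /var/lib/postgresql/15/main`. A "stale" result for a cluster that pg_lsclusters lists as down would be consistent with a PID file left behind by an unclean stop.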

#10 Jerry Sievers
gsievers19@comcast.net
In reply to: Thorsten Glaser (#5)
Re: pg_ctlcluster is not stopping cluster

Thorsten Glaser <tg@evolvis.org> writes:

On Fri, 7 Apr 2023, Telium Technical Support wrote:

Notice: extra pg_ctl/postgres options given, bypassing systemctl for stop

it might be 13. Coincidentally, there is a postmaster.pid file in a
directory OTHER than the data directory:

/var/lib/postgresql/15/main/postmaster.pid

(and notice the 15). Is this a clue?

Maybe the pidfile is written into the cluster version-based directory
instead of the data directory. Best figure out what exactly writes the
pidfile, whether systemd is used for starting (which would of course be

Some distros, in their wrapper scripts (I believe including Debian), use this
setting to put a PID file under /var/run/postgresql, IIRC, but YMMV...

external_pid_file (string)

Specifies the name of an additional process-ID (PID) file that the
server should create for use by server administration programs. This
parameter can only be set at server start.

a prime suspect as it says it’s explicitly not used for stopping), etc.

Maybe this is indeed a bug in whatever determines the pidfile path,
perhaps not; is it supposed to live within the data directory?

bye,
//mirabilos
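
To see whether the external_pid_file setting quoted above is in play on a given cluster, its value can be read from that cluster's configuration file. This is a rough sketch: the helper name external_pid_file_of is hypothetical, and it assumes the setting is written on one line as `external_pid_file = '...'` with a single-quoted value.

```shell
#!/bin/sh
# external_pid_file_of (hypothetical helper): print the value of an
# uncommented external_pid_file setting in a postgresql.conf-style file.
# Assumes the form: external_pid_file = '/some/path'   # optional comment
# Lines that start with "#" (commented-out settings) do not match.
external_pid_file_of() {
    sed -n "s/^[[:space:]]*external_pid_file[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1"
}
```

With the configuration path shown earlier in the thread, the call would be `external_pid_file_of /etc/postgresql/15/main/postgresql.conf`. On a running server the same information is available through `SHOW external_pid_file;` and `SHOW data_directory;` in psql, which also shows which data directory the server is really using.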