pg_ctl status with nonexistent data directory

Started by Peter Eisentrautabout 12 years ago17 messages
#1Peter Eisentraut
peter_e@gmx.net

This doesn't seem right:

$ pg_ctl -D /nowhere status
pg_ctl: no server running

It does exit with status 3, so it's not all that broken, but I think the
error message could be more accurate.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#1)
Re: pg_ctl status with nonexistent data directory

On Sat, Nov 2, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

This doesn't seem right:

$ pg_ctl -D /nowhere status
pg_ctl: no server running

It does exit with status 3, so it's not all that broken, but I think the
error message could be more accurate.

I doubt anyone will object if you feel the urge to fix it. I won't, anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#2)
1 attachment(s)
Re: pg_ctl status with nonexistent data directory

On Mon, Nov 4, 2013 at 11:31:30AM -0500, Robert Haas wrote:

On Sat, Nov 2, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

This doesn't seem right:

$ pg_ctl -D /nowhere status
pg_ctl: no server running

It does exit with status 3, so it's not all that broken, but I think the
error message could be more accurate.

I doubt anyone will object if you feel the urge to fix it. I won't, anyway.

I have addressed this issue with the attached patch:

$ pg_ctl -D /lkjasdf status
pg_ctl: directory "/lkjasdf" does not exist
$ pg_ctl -D / status
pg_ctl: directory "/" is not a database cluster directory

One odd question is that pg_ctl status has this comment for reporting
the exit code for non-running servers:

printf(_("%s: no server running\n"), progname);

/*
* The Linux Standard Base Core Specification 3.1 says this should return
* '3'
* https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
*/
exit(3);

If they haven't passed us a data directory, we don't really know if the
server is running or not, so the patch just returns '1'.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

pg_ctl.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index 0dbaa6e..90c98ee
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*************** static bool allow_core_files = false;
*** 97,102 ****
--- 97,103 ----
  static time_t start_time;
  
  static char postopts_file[MAXPGPATH];
+ static char version_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char backup_file[MAXPGPATH];
  static char recovery_file[MAXPGPATH];
*************** get_pgpid(void)
*** 250,255 ****
--- 251,275 ----
  {
  	FILE	   *pidf;
  	long		pid;
+ 	struct stat statbuf;
+ 
+ 	if (stat(pg_data, &statbuf) != 0)
+ 	{
+ 		if (errno == ENOENT)
+ 			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+ 					 pg_data);
+ 		else
+ 			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+ 					 pg_data);
+ 		exit(1);
+ 	}
+ 
+ 	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+ 	{
+ 		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+ 				 progname, pg_data);
+ 		exit(1);
+ 	}
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
*************** main(int argc, char **argv)
*** 2285,2290 ****
--- 2305,2311 ----
  	if (pg_data)
  	{
  		snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
+ 		snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
  		snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
  		snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
  		snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#3)
Re: pg_ctl status with nonexistent data directory

On Thu, Mar 6, 2014 at 4:38 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Nov 4, 2013 at 11:31:30AM -0500, Robert Haas wrote:

On Sat, Nov 2, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

This doesn't seem right:

$ pg_ctl -D /nowhere status
pg_ctl: no server running

It does exit with status 3, so it's not all that broken, but I think the
error message could be more accurate.

I doubt anyone will object if you feel the urge to fix it. I won't, anyway.

I have addressed this issue with the attached patch:

$ pg_ctl -D /lkjasdf status
pg_ctl: directory "/lkjasdf" does not exist
$ pg_ctl -D / status
pg_ctl: directory "/" is not a database cluster directory

One odd question is that pg_ctl status has this comment for reporting
the exit code for non-running servers:

printf(_("%s: no server running\n"), progname);

/*
* The Linux Standard Base Core Specification 3.1 says this should return
* '3'
* https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
*/
exit(3);

If they haven't passed us a data directory, we don't really know if the
server is running or not, so the patch just returns '1'.

But for such cases, isn't the status 4 more appropriate?
As per above link "4 program or service status is unknown"

status 1 - "1 program is dead and /var/run pid file exists"
Going by this definition, it seems status 1 means, someone
has forcefully killed the server and pid file still remains.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#4)
Re: pg_ctl status with nonexistent data directory

On Thu, Mar 6, 2014 at 09:54:57AM +0530, Amit Kapila wrote:

If they haven't passed us a data directory, we don't really know if the
server is running or not, so the patch just returns '1'.

But for such cases, isn't the status 4 more appropriate?
As per above link "4 program or service status is unknown"

status 1 - "1 program is dead and /var/run pid file exists"
Going by this definition, it seems status 1 means, someone
has forcefully killed the server and pid file still remains.

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that. I don't see how specifying non-existant or
non-cluster directory would somehow be a case that would be special.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Florian Pflug
fgp@phlo.org
In reply to: Bruce Momjian (#3)
Re: pg_ctl status with nonexistent data directory

On Mar6, 2014, at 00:08 , Bruce Momjian <bruce@momjian.us> wrote:

I have addressed this issue with the attached patch:

$ pg_ctl -D /lkjasdf status
pg_ctl: directory "/lkjasdf" does not exist
$ pg_ctl -D / status
pg_ctl: directory "/" is not a database cluster directory

One odd question is that pg_ctl status has this comment for reporting
the exit code for non-running servers:

printf(_("%s: no server running\n"), progname);

/*
* The Linux Standard Base Core Specification 3.1 says this should return
* '3'
* https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
*/
exit(3);

If they haven't passed us a data directory, we don't really know if the
server is running or not, so the patch just returns '1'.

Why change the exit code at all in the ENOENT-case? If the directory
does not exist, it's fairly certain that the server is not running, so
"3" seems fine. Technically, changing the return value is an API change
and might break things, so why do it if there's no clear benefit?

In the EPERM case (or, rather the non-ENOENT case), I agree with Amit
that "4" (meaning "program or service status is unknown") fits much better
than "1" (meaning "program is dead and /var/run pid file exists"). So *if*
we change it at all, we should change it to 4, not to some other, equally
arbitrary value.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#5)
Re: pg_ctl status with nonexistent data directory

Bruce Momjian escribi�:

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that.

That sounds odd. Do you have a link?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: pg_ctl status with nonexistent data directory

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Bruce Momjian escribi�:

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that.

That sounds odd. Do you have a link?

FWIW, I recall that in my former life at Red Hat, I had to deal several
times with patching pg_ctl to make its exit codes more LSB-compliant.
And I think Debian does the same. So Linux packagers, at least, would
like to see us pay more attention to that standard. On the other hand,
there is no market for changes that make the exit codes "more
meaningful" unless the changes can be justified by chapter and verse
in LSB.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#7)
Re: pg_ctl status with nonexistent data directory

On Thu, Mar 6, 2014 at 12:17:55PM -0300, Alvaro Herrera wrote:

Bruce Momjian escribi�:

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that.

That sounds odd. Do you have a link?

Sure, the patch is here:

/messages/by-id/20130629025033.GI13790@momjian.us

and the idea of keeping what we have is stated here:

/messages/by-id/51D1E482.5090602@gmx.net

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#5)
Re: pg_ctl status with nonexistent data directory

On Thu, Mar 6, 2014 at 7:46 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Mar 6, 2014 at 09:54:57AM +0530, Amit Kapila wrote:

If they haven't passed us a data directory, we don't really know if the
server is running or not, so the patch just returns '1'.

But for such cases, isn't the status 4 more appropriate?
As per above link "4 program or service status is unknown"

status 1 - "1 program is dead and /var/run pid file exists"
Going by this definition, it seems status 1 means, someone
has forcefully killed the server and pid file still remains.

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that. I don't see how specifying non-existant or
non-cluster directory would somehow be a case that would be special.

One reason could be that as we are already returning special exit code
for 'status' option of pg_ctl (exit(3)), this seems to be inline with it as this
also get called during status command.

Also in the link sent by you in another mail, I could see some statement
which indicates it is more important to return correct codes in case of
status which sounds a good reason to me.

Link and statement
/messages/by-id/51D1E482.5090602@gmx.net

"The "status" case is different, because there the exit code
can be passed out by the init script directly."

If we just want to handle correct exit codes for "status" option, then
may be we need to refactor the code a bit to ensure the same.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: pg_ctl status with nonexistent data directory

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Mar 6, 2014 at 12:17:55PM -0300, Alvaro Herrera wrote:

Bruce Momjian escribi�:

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that.

That sounds odd. Do you have a link?

Sure, the patch is here:
/messages/by-id/20130629025033.GI13790@momjian.us
and the idea of keeping what we have is stated here:
/messages/by-id/51D1E482.5090602@gmx.net

Perhaps I shouldn't be putting words in Peter's mouth, but my reading of
his complaint was that he didn't think you'd mapped the pg_ctl failure
conditions to LSB status codes very well. That's not necessarily a vote
against the abstract idea of making it more LSB-compliant.

But it seems like we might have to go through it case-by-case to argue out
what's the right error code for each case ... and I'm not sure anybody
thinks it's worth that much effort.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
Re: pg_ctl status with nonexistent data directory

On Thu, Mar 6, 2014 at 10:43:01PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Mar 6, 2014 at 12:17:55PM -0300, Alvaro Herrera wrote:

Bruce Momjian escribi�:

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that.

That sounds odd. Do you have a link?

Sure, the patch is here:
/messages/by-id/20130629025033.GI13790@momjian.us
and the idea of keeping what we have is stated here:
/messages/by-id/51D1E482.5090602@gmx.net

Perhaps I shouldn't be putting words in Peter's mouth, but my reading of
his complaint was that he didn't think you'd mapped the pg_ctl failure
conditions to LSB status codes very well. That's not necessarily a vote
against the abstract idea of making it more LSB-compliant.

But it seems like we might have to go through it case-by-case to argue out
what's the right error code for each case ... and I'm not sure anybody
thinks it's worth that much effort.

Yes, I think the question was whether the effort was worth it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#10)
1 attachment(s)
Re: pg_ctl status with nonexistent data directory

On Fri, Mar 7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:

One reason could be that as we are already returning special exit code
for 'status' option of pg_ctl (exit(3)), this seems to be inline with it as this
also get called during status command.

Also in the link sent by you in another mail, I could see some statement
which indicates it is more important to return correct codes in case of
status which sounds a good reason to me.

Link and statement
/messages/by-id/51D1E482.5090602@gmx.net

"The "status" case is different, because there the exit code
can be passed out by the init script directly."

If we just want to handle correct exit codes for "status" option, then
may be we need to refactor the code a bit to ensure the same.

OK, done with the attached patch Three is returned for status, but one
for other cases.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

pg_ctl.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index 0dbaa6e..8c9970d
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*************** static bool allow_core_files = false;
*** 97,102 ****
--- 97,103 ----
  static time_t start_time;
  
  static char postopts_file[MAXPGPATH];
+ static char version_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char backup_file[MAXPGPATH];
  static char recovery_file[MAXPGPATH];
*************** static void pgwin32_doRunAsService(void)
*** 152,158 ****
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
  
! static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
--- 153,159 ----
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
  
! static pgpid_t get_pgpid(bool is_status_request);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
*************** print_msg(const char *msg)
*** 246,255 ****
  }
  
  static pgpid_t
! get_pgpid(void)
  {
  	FILE	   *pidf;
  	long		pid;
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
--- 247,275 ----
  }
  
  static pgpid_t
! get_pgpid(bool is_status_request)
  {
  	FILE	   *pidf;
  	long		pid;
+ 	struct stat statbuf;
+ 
+ 	if (stat(pg_data, &statbuf) != 0)
+ 	{
+ 		if (errno == ENOENT)
+ 			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+ 					 pg_data);
+ 		else
+ 			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+ 					 pg_data);
+ 		exit(is_status_request ? 3 : 1);
+ 	}
+ 
+ 	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+ 	{
+ 		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+ 				 progname, pg_data);
+ 		exit(is_status_request ? 3 : 1);
+ 	}
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
*************** do_start(void)
*** 810,816 ****
  
  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid();
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
--- 830,836 ----
  
  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid(false);
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
*************** do_stop(void)
*** 894,900 ****
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)				/* no pid file */
  	{
--- 914,920 ----
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)				/* no pid file */
  	{
*************** do_stop(void)
*** 943,949 ****
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid()) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
--- 963,969 ----
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid(false)) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
*************** do_restart(void)
*** 980,986 ****
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)				/* no pid file */
  	{
--- 1000,1006 ----
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)				/* no pid file */
  	{
*************** do_restart(void)
*** 1033,1039 ****
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid()) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
--- 1053,1059 ----
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid(false)) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
*************** do_reload(void)
*** 1071,1077 ****
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid();
  	if (pid == 0)				/* no pid file */
  	{
  		write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file);
--- 1091,1097 ----
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid(false);
  	if (pid == 0)				/* no pid file */
  	{
  		write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file);
*************** do_promote(void)
*** 1110,1116 ****
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)				/* no pid file */
  	{
--- 1130,1136 ----
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)				/* no pid file */
  	{
*************** do_status(void)
*** 1204,1210 ****
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid();
  	/* Is there a pid file? */
  	if (pid != 0)
  	{
--- 1224,1230 ----
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid(true);
  	/* Is there a pid file? */
  	if (pid != 0)
  	{
*************** main(int argc, char **argv)
*** 2285,2290 ****
--- 2305,2311 ----
  	if (pg_data)
  	{
  		snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
+ 		snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
  		snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
  		snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
  		snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#13)
Re: pg_ctl status with nonexistent data directory

On Fri, Mar 7, 2014 at 9:41 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Mar 7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:

One reason could be that as we are already returning special exit code
for 'status' option of pg_ctl (exit(3)), this seems to be inline with it as this
also get called during status command.

Also in the link sent by you in another mail, I could see some statement
which indicates it is more important to return correct codes in case of
status which sounds a good reason to me.

Link and statement
/messages/by-id/51D1E482.5090602@gmx.net

"The "status" case is different, because there the exit code
can be passed out by the init script directly."

If we just want to handle correct exit codes for "status" option, then
may be we need to refactor the code a bit to ensure the same.

OK, done with the attached patch Three is returned for status, but one
for other cases.

I think it would have been better if it return status as 4 for cases where
directory/file is not accessible (current new cases added by this patch).

I think status 3 should be only return only when the data directory is valid
and pid file is missing, because in script after getting this code, the next
thing they will probably do is to start the server which doesn't seem a
good fit for newly added cases.

Other than that patch seems fine to me.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#14)
1 attachment(s)
Re: pg_ctl status with nonexistent data directory

On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:

OK, done with the attached patch Three is returned for status, but one
for other cases.

I think it would have been better if it return status as 4 for cases where
directory/file is not accessible (current new cases added by this patch).

I think status 3 should be only return only when the data directory is valid
and pid file is missing, because in script after getting this code, the next
thing they will probably do is to start the server which doesn't seem a
good fit for newly added cases.

OK, I have updated the attached patch to reflect this, and added a C
comment.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

pg_ctl.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index 0dbaa6e..56d238f
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*************** static bool allow_core_files = false;
*** 97,102 ****
--- 97,103 ----
  static time_t start_time;
  
  static char postopts_file[MAXPGPATH];
+ static char version_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char backup_file[MAXPGPATH];
  static char recovery_file[MAXPGPATH];
*************** static void pgwin32_doRunAsService(void)
*** 152,158 ****
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
  
! static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
--- 153,159 ----
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
  
! static pgpid_t get_pgpid(bool is_status_request);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
*************** print_msg(const char *msg)
*** 246,255 ****
  }
  
  static pgpid_t
! get_pgpid(void)
  {
  	FILE	   *pidf;
  	long		pid;
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
--- 247,280 ----
  }
  
  static pgpid_t
! get_pgpid(bool is_status_request)
  {
  	FILE	   *pidf;
  	long		pid;
+ 	struct stat statbuf;
+ 
+ 	if (stat(pg_data, &statbuf) != 0)
+ 	{
+ 		if (errno == ENOENT)
+ 			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+ 					 pg_data);
+ 		else
+ 			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+ 					 pg_data);
+ 		/*
+ 		 * The Linux Standard Base Core Specification 3.1 says this should return
+ 		 * '4, program or service status is unknown'
+ 		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+ 		 */
+ 		exit(is_status_request ? 4 : 1);
+ 	}
+ 
+ 	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+ 	{
+ 		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+ 				 progname, pg_data);
+ 		exit(is_status_request ? 4 : 1);
+ 	}
  
  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
*************** do_start(void)
*** 810,816 ****
  
  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid();
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
--- 835,841 ----
  
  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid(false);
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
*************** do_stop(void)
*** 894,900 ****
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)				/* no pid file */
  	{
--- 919,925 ----
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)				/* no pid file */
  	{
*************** do_stop(void)
*** 943,949 ****
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid()) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
--- 968,974 ----
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid(false)) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
*************** do_restart(void)
*** 980,986 ****
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)				/* no pid file */
  	{
--- 1005,1011 ----
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)				/* no pid file */
  	{
*************** do_restart(void)
*** 1033,1039 ****
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid()) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
--- 1058,1064 ----
  
  		for (cnt = 0; cnt < wait_seconds; cnt++)
  		{
! 			if ((pid = get_pgpid(false)) != 0)
  			{
  				print_msg(".");
  				pg_usleep(1000000);		/* 1 sec */
*************** do_reload(void)
*** 1071,1077 ****
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid();
  	if (pid == 0)				/* no pid file */
  	{
  		write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file);
--- 1096,1102 ----
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid(false);
  	if (pid == 0)				/* no pid file */
  	{
  		write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file);
*************** do_promote(void)
*** 1110,1116 ****
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid();
  
  	if (pid == 0)				/* no pid file */
  	{
--- 1135,1141 ----
  	pgpid_t		pid;
  	struct stat statbuf;
  
! 	pid = get_pgpid(false);
  
  	if (pid == 0)				/* no pid file */
  	{
*************** do_status(void)
*** 1204,1210 ****
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid();
  	/* Is there a pid file? */
  	if (pid != 0)
  	{
--- 1229,1235 ----
  {
  	pgpid_t		pid;
  
! 	pid = get_pgpid(true);
  	/* Is there a pid file? */
  	if (pid != 0)
  	{
*************** do_status(void)
*** 1247,1253 ****
  
  	/*
  	 * The Linux Standard Base Core Specification 3.1 says this should return
! 	 * '3'
  	 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
  	 */
  	exit(3);
--- 1272,1278 ----
  
  	/*
  	 * The Linux Standard Base Core Specification 3.1 says this should return
! 	 * '3, program is not running'
  	 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
  	 */
  	exit(3);
*************** main(int argc, char **argv)
*** 2285,2290 ****
--- 2310,2316 ----
  	if (pg_data)
  	{
  		snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
+ 		snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
  		snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
  		snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
  		snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#15)
1 attachment(s)
Re: pg_ctl status with nonexistent data directory

On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:

OK, done with the attached patch Three is returned for status, but one
for other cases.

I think it would have been better if it return status as 4 for cases where
directory/file is not accessible (current new cases added by this patch).

I think status 3 should be only return only when the data directory is valid
and pid file is missing, because in script after getting this code, the next
thing they will probably do is to start the server which doesn't seem a
good fit for newly added cases.

OK, I have updated the attached patch to reflect this, and added a C
comment.

This is fine, do you think we should mention about this in docs?
I have added one line in docs (updated patch attached), if you think
it makes sense then add it otherwise ignore it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_ctl-v2.difftext/plain; charset=US-ASCII; name=pg_ctl-v2.diffDownload
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 45b53ce..99cb176 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -206,9 +206,10 @@ PostgreSQL documentation
   <para>
    <option>status</option> mode checks whether a server is running in
    the specified data directory. If it is, the <acronym>PID</acronym>
-   and the command line options that were used to invoke it are
-   displayed.  If the server is not running, the process returns an
-   exit status of 3.
+    and the command line options that were used to invoke it are
+    displayed.  If the server is not running, the process returns an
+    exit status of 3.  If the specified data directory is not accessible,
+    the process returns an exit status of 4.
   </para>
 
   <para>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 0dbaa6e..56d238f 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -97,6 +97,7 @@ static bool allow_core_files = false;
 static time_t start_time;
 
 static char postopts_file[MAXPGPATH];
+static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char backup_file[MAXPGPATH];
 static char recovery_file[MAXPGPATH];
@@ -152,7 +153,7 @@ static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 #endif
 
-static pgpid_t get_pgpid(void);
+static pgpid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path);
 static void free_readfile(char **optlines);
 static int	start_postmaster(void);
@@ -246,10 +247,34 @@ print_msg(const char *msg)
 }
 
 static pgpid_t
-get_pgpid(void)
+get_pgpid(bool is_status_request)
 {
 	FILE	   *pidf;
 	long		pid;
+	struct stat statbuf;
+
+	if (stat(pg_data, &statbuf) != 0)
+	{
+		if (errno == ENOENT)
+			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+					 pg_data);
+		else
+			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+					 pg_data);
+		/*
+		 * The Linux Standard Base Core Specification 3.1 says this should return
+		 * '4, program or service status is unknown'
+		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+		 */
+		exit(is_status_request ? 4 : 1);
+	}
+
+	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+	{
+		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+				 progname, pg_data);
+		exit(is_status_request ? 4 : 1);
+	}
 
 	pidf = fopen(pid_file, "r");
 	if (pidf == NULL)
@@ -810,7 +835,7 @@ do_start(void)
 
 	if (ctl_command != RESTART_COMMAND)
 	{
-		old_pid = get_pgpid();
+		old_pid = get_pgpid(false);
 		if (old_pid != 0)
 			write_stderr(_("%s: another server might be running; "
 						   "trying to start server anyway\n"),
@@ -894,7 +919,7 @@ do_stop(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
 
 	if (pid == 0)				/* no pid file */
 	{
@@ -943,7 +968,7 @@ do_stop(void)
 
 		for (cnt = 0; cnt < wait_seconds; cnt++)
 		{
-			if ((pid = get_pgpid()) != 0)
+			if ((pid = get_pgpid(false)) != 0)
 			{
 				print_msg(".");
 				pg_usleep(1000000);		/* 1 sec */
@@ -980,7 +1005,7 @@ do_restart(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
 
 	if (pid == 0)				/* no pid file */
 	{
@@ -1033,7 +1058,7 @@ do_restart(void)
 
 		for (cnt = 0; cnt < wait_seconds; cnt++)
 		{
-			if ((pid = get_pgpid()) != 0)
+			if ((pid = get_pgpid(false)) != 0)
 			{
 				print_msg(".");
 				pg_usleep(1000000);		/* 1 sec */
@@ -1071,7 +1096,7 @@ do_reload(void)
 {
 	pgpid_t		pid;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
 	if (pid == 0)				/* no pid file */
 	{
 		write_stderr(_("%s: PID file \"%s\" does not exist\n"), progname, pid_file);
@@ -1110,7 +1135,7 @@ do_promote(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
 
 	if (pid == 0)				/* no pid file */
 	{
@@ -1204,7 +1229,7 @@ do_status(void)
 {
 	pgpid_t		pid;
 
-	pid = get_pgpid();
+	pid = get_pgpid(true);
 	/* Is there a pid file? */
 	if (pid != 0)
 	{
@@ -1247,7 +1272,7 @@ do_status(void)
 
 	/*
 	 * The Linux Standard Base Core Specification 3.1 says this should return
-	 * '3'
+	 * '3, program is not running'
 	 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
 	 */
 	exit(3);
@@ -2285,6 +2310,7 @@ main(int argc, char **argv)
 	if (pg_data)
 	{
 		snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
+		snprintf(version_file, MAXPGPATH, "%s/PG_VERSION", pg_data);
 		snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", pg_data);
 		snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
 		snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
#17Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#16)
Re: pg_ctl status with nonexistent data directory

On Sat, Mar 8, 2014 at 09:08:31AM +0530, Amit Kapila wrote:

On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:

OK, done with the attached patch Three is returned for status, but one
for other cases.

I think it would have been better if it return status as 4 for cases where
directory/file is not accessible (current new cases added by this patch).

I think status 3 should be only return only when the data directory is valid
and pid file is missing, because in script after getting this code, the next
thing they will probably do is to start the server which doesn't seem a
good fit for newly added cases.

OK, I have updated the attached patch to reflect this, and added a C
comment.

This is fine, do you think we should mention about this in docs?
I have added one line in docs (updated patch attached), if you think
it makes sense then add it otherwise ignore it.

I have applied your version of the patch with slightly different text:

If an accessible data directory is not specified, the process returns an
exit status of 4.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers