BUG #5650: Postgres service showing as stopped when in fact it is running

Started by Mark Llewellynover 15 years ago43 messages
#1Mark Llewellyn
mark_llewellyn@adp.com

The following bug has been logged online:

Bug reference: 5650
Logged by: Mark Llewellyn
Email address: mark_llewellyn@adp.com
PostgreSQL version: 9.0 RC1
Operating system: Windows XP
Description: Postgres service showing as stopped when in fact it is
running
Details:

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12 RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.

#2Robert Haas
robertmhaas@gmail.com
In reply to: Mark Llewellyn (#1)
Re: BUG #5650: Postgres service showing as stopped when in fact it is running

On Thu, Sep 9, 2010 at 12:22 PM, Mark Llewellyn <mark_llewellyn@adp.com> wrote:

The following bug has been logged online:

Bug reference:      5650
Logged by:          Mark Llewellyn
Email address:      mark_llewellyn@adp.com
PostgreSQL version: 9.0 RC1
Operating system:   Windows XP
Description:        Postgres service showing as stopped when in fact it is
running
Details:

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12 RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.

Maybe this is a dumb question, but how did you start the service?

And applying the good old Windows troubleshooting meme... if you
reboot, does that fix it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3Ashesh Vashi
ashesh.vashi@enterprisedb.com
In reply to: Mark Llewellyn (#1)
Re: BUG #5650: Postgres service showing as stopped when in fact it is running

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
shows stopped
5. However i could connect to the psql shell and get the prompt which means
the server is running.

So, please check if pgpass.conf file exists in service account (postgres)
user's %APPDATA%\postgresql directory with wrong password.

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise Postgres Company<http://www.enterprisedb.com&gt;

On Thu, Sep 9, 2010 at 9:52 PM, Mark Llewellyn <mark_llewellyn@adp.com>wrote:

Show quoted text

The following bug has been logged online:

Bug reference: 5650
Logged by: Mark Llewellyn
Email address: mark_llewellyn@adp.com
PostgreSQL version: 9.0 RC1
Operating system: Windows XP
Description: Postgres service showing as stopped when in fact it is
running
Details:

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12
RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Magnus Hagander
magnus@hagander.net
In reply to: Ashesh Vashi (#3)
Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
<ashesh.vashi@enterprisedb.com> wrote:

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
   shows stopped
5. However i could connect to the psql shell and get the prompt which means
    the server is running.

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Does anybody recall the specific reason for this? Do we need a way for
pg_ctl to figure this out, or do we need to change it in
PQconnecitonNeedsPassword()?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#4)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Magnus Hagander <magnus@hagander.net> writes:

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Why should it? That code is complicated enough, I don't think it needs
to have a behavior of pretending that a wrong entry isn't there.

regards, tom lane

#6Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Sep 24, 2010 at 16:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Why should it?  That code is complicated enough, I don't think it needs
to have a behavior of pretending that a wrong entry isn't there.

In that case, we should probably teach pg_ctl about this case, no?
Since it clearly gives an incorrect message to the user now...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#6)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On 09/24/2010 10:15 AM, Magnus Hagander wrote:

On Fri, Sep 24, 2010 at 16:04, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Magnus Hagander<magnus@hagander.net> writes:

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Why should it? That code is complicated enough, I don't think it needs
to have a behavior of pretending that a wrong entry isn't there.

In that case, we should probably teach pg_ctl about this case, no?
Since it clearly gives an incorrect message to the user now...

pg_ctl decides that the server is running iff it can connect to it. Do
you intend to provide for a different test? Setting an incorrect
password for the service account sounds like pilot error to me.

cheers

andrew

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Andrew Dunstan <andrew@dunslane.net> writes:

On 09/24/2010 10:15 AM, Magnus Hagander wrote:

In that case, we should probably teach pg_ctl about this case, no?
Since it clearly gives an incorrect message to the user now...

pg_ctl decides that the server is running iff it can connect to it. Do
you intend to provide for a different test?

Seems like getting a password challenge from the server is sufficient
evidence that the server is running, whether we are able to meet the
challenge or not. Perhaps we could just twiddle pg_ctl's "is it up"
test a bit to notice whether the connect failure was of this sort.

(Of course, a "pg_ping" utility would be a better answer, but nobody's
gotten around to that in more than ten years, so I'm not holding my
breath.)

regards, tom lane

#9Dave Page
dpage@pgadmin.org
In reply to: Tom Lane (#8)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Sep 24, 2010 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(Of course, a "pg_ping" utility would be a better answer, but nobody's
gotten around to that in more than ten years, so I'm not holding my
breath.)

Hmm, that sounded like it could be my 9.1 mini project - then Google
showed me that SeanC wrote something already.

http://archives.postgresql.org/pgsql-patches/2003-07/msg00053.php

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#9)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Dave Page <dpage@pgadmin.org> writes:

On Fri, Sep 24, 2010 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(Of course, a "pg_ping" utility would be a better answer, but nobody's
gotten around to that in more than ten years, so I'm not holding my
breath.)

Hmm, that sounded like it could be my 9.1 mini project - then Google
showed me that SeanC wrote something already.
http://archives.postgresql.org/pgsql-patches/2003-07/msg00053.php

Huh, I wonder why we never adopted that? Although I'd be inclined to
do most of the heavy lifting inside libpq, myself, and this is way
more verbose than what pg_ctl wants.

regards, tom lane

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On 09/24/2010 11:11 AM, Tom Lane wrote:

Andrew Dunstan<andrew@dunslane.net> writes:

On 09/24/2010 10:15 AM, Magnus Hagander wrote:

In that case, we should probably teach pg_ctl about this case, no?
Since it clearly gives an incorrect message to the user now...

pg_ctl decides that the server is running iff it can connect to it. Do
you intend to provide for a different test?

Seems like getting a password challenge from the server is sufficient
evidence that the server is running, whether we are able to meet the
challenge or not. Perhaps we could just twiddle pg_ctl's "is it up"
test a bit to notice whether the connect failure was of this sort.

pg_ctl does in fact use that sort of logic:

if ((conn = PQconnectdb(connstr)) != NULL &&
(PQstatus(conn) == CONNECTION_OK ||
PQconnectionNeedsPassword(conn)))

But of course, libpq won't set that last condition if there is a bad
password in the pgpass file, which seems a rather perverse thing to do.

cheers

andrew

#12Llewellyn, Mark (ADP DSI)
mark_llewellyn@adp.com
In reply to: Ashesh Vashi (#3)
Re: BUG #5650: Postgres service showing as stopped when in fact it is running

Hi Ashesh

Yes, this appears to be the issue that the password in the pgpass.conf file for the postgres user had not been changed when the windows account password was reset.

Problem now resolved.

Many thanks
Mark Llewellyn

________________________________
From: Ashesh Vashi [mailto:ashesh.vashi@enterprisedb.com]
Sent: 17 September 2010 04:51
To: Llewellyn, Mark (ADP DSI)
Cc: pgsql-bugs@postgresql.org; Sujeet Rajguru
Subject: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the status
shows stopped
5. However i could connect to the psql shell and get the prompt which means
the server is running.

So, please check if pgpass.conf file exists in service account (postgres) user's %APPDATA%\postgresql directory with wrong password.

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise Postgres Company<http://www.enterprisedb.com&gt;

On Thu, Sep 9, 2010 at 9:52 PM, Mark Llewellyn <mark_llewellyn@adp.com<mailto:mark_llewellyn@adp.com>> wrote:

The following bug has been logged online:

Bug reference: 5650
Logged by: Mark Llewellyn
Email address: mark_llewellyn@adp.com<mailto:mark_llewellyn@adp.com>
PostgreSQL version: 9.0 RC1
Operating system: Windows XP
Description: Postgres service showing as stopped when in fact it is
running
Details:

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12 RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org<mailto:pgsql-bugs@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.

#13Ashesh Vashi
ashesh.vashi@enterprisedb.com
In reply to: Llewellyn, Mark (ADP DSI) (#12)
Re: BUG #5650: Postgres service showing as stopped when in fact it is running

We're happy to see the problem resolved on your end. :-)

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise Postgres Company<http://www.enterprisedb.com&gt;

*http://www.linkedin.com/in/asheshvashi*&lt;http://www.linkedin.com/in/asheshvashi&gt;

On Mon, Sep 27, 2010 at 8:57 PM, Llewellyn, Mark (ADP DSI) <
mark_llewellyn@adp.com> wrote:

Show quoted text

Hi Ashesh

Yes, this appears to be the issue that the password in the pgpass.conf file
for the postgres user had not been changed when the windows account password
was reset.

Problem now resolved.

Many thanks

Mark Llewellyn

------------------------------

*From:* Ashesh Vashi [mailto:ashesh.vashi@enterprisedb.com]
*Sent:* 17 September 2010 04:51
*To:* Llewellyn, Mark (ADP DSI)
*Cc:* pgsql-bugs@postgresql.org; Sujeet Rajguru
*Subject:* Re: [BUGS] BUG #5650: Postgres service showing as stopped when
in fact it is running

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
shows stopped
5. However i could connect to the psql shell and get the prompt which means
the server is running.

So, please check if pgpass.conf file exists in service account (postgres)
user's %APPDATA%\postgresql directory with wrong password.

--*
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: *Enterprise Postgres Company<http://www.enterprisedb.com&gt;

On Thu, Sep 9, 2010 at 9:52 PM, Mark Llewellyn <mark_llewellyn@adp.com>
wrote:

The following bug has been logged online:

Bug reference: 5650
Logged by: Mark Llewellyn
Email address: mark_llewellyn@adp.com
PostgreSQL version: 9.0 RC1
Operating system: Windows XP
Description: Postgres service showing as stopped when in fact it is
running
Details:

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12
RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

------------------------------
This message and any attachments are intended only for the use of the
addressee and may contain information that is privileged and confidential.
If the reader of the message is not the intended recipient or an authorized
representative of the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please notify us immediately by e-mail
and delete the message and any attachments from your system.

#14Robert Haas
robertmhaas@gmail.com
In reply to: Ashesh Vashi (#13)
Re: BUG #5650: Postgres service showing as stopped when in fact it is running

On Mon, Sep 27, 2010 at 12:44 PM, Ashesh Vashi <
ashesh.vashi@enterprisedb.com> wrote:

We're happy to see the problem resolved on your end. :-)--

However, it doesn't seem that we've actually done anything about the
underlying issue with pg_ctl.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#15Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#14)
Re: BUG #5650: Postgres service showing as stopped when in fact it is running

Robert Haas wrote:

On Mon, Sep 27, 2010 at 12:44 PM, Ashesh Vashi <
ashesh.vashi@enterprisedb.com> wrote:

We're happy to see the problem resolved on your end. :-)--

However, it doesn't seem that we've actually done anything about the
underlying issue with pg_ctl.

I will look at that in the next few weeks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#16Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#4)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Magnus Hagander wrote:

On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
<ashesh.vashi@enterprisedb.com> wrote:

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
?? shows stopped
5. However i could connect to the psql shell and get the prompt which means
??? the server is running.

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Does anybody recall the specific reason for this? Do we need a way for
pg_ctl to figure this out, or do we need to change it in
PQconnecitonNeedsPassword()?

I was not able to reproduce this failure on my BSD system using GIT
head:

$ psql test
psql: FATAL: password authentication failed for user "postgres"
password retrieved from file "/u/postgres/.pgpass"

$ pg_ctl status
pg_ctl: server is running (PID: 710)
/usr/var/local/pgsql/bin/postgres "-i"

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#17Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#16)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Nov 12, 2010 at 03:49, Bruce Momjian <bruce@momjian.us> wrote:

Magnus Hagander wrote:

On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
<ashesh.vashi@enterprisedb.com> wrote:

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
?? shows stopped
5. However i could connect to the psql shell and get the prompt which means
??? the server is running.

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Does anybody recall the specific reason for this? Do we need a way for
pg_ctl to figure this out, or do we need to change it in
PQconnecitonNeedsPassword()?

I was not able to reproduce this failure on my BSD system using GIT
head:

       $ psql test
       psql: FATAL:  password authentication failed for user "postgres"
       password retrieved from file "/u/postgres/.pgpass"

       $ pg_ctl status
       pg_ctl: server is running (PID: 710)
       /usr/var/local/pgsql/bin/postgres "-i"

The problem is not in pg_ctl status, it's in pg_ctl start. They're
different codepaths - status never tries to actually connect, it just
checks if the process is alive.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#18Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#17)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Magnus Hagander wrote:

On Fri, Nov 12, 2010 at 03:49, Bruce Momjian <bruce@momjian.us> wrote:

Magnus Hagander wrote:

On Fri, Sep 17, 2010 at 05:51, Ashesh Vashi
<ashesh.vashi@enterprisedb.com> wrote:

Hi Mark,

On of my college (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service, it will return an error and the
status
?? shows stopped
5. However i could connect to the psql shell and get the prompt which means
??? the server is running.

I took a quick look at the code, and from what I can tell this is
because PQconnectionNeedsPassword() always returns false if a
pgpass.conf has been used. There is no handling the case where pgpass
is used, but has an incorrect password.

Does anybody recall the specific reason for this? Do we need a way for
pg_ctl to figure this out, or do we need to change it in
PQconnecitonNeedsPassword()?

I was not able to reproduce this failure on my BSD system using GIT
head:

? ? ? ?$ psql test
? ? ? ?psql: FATAL: ?password authentication failed for user "postgres"
? ? ? ?password retrieved from file "/u/postgres/.pgpass"

? ? ? ?$ pg_ctl status
? ? ? ?pg_ctl: server is running (PID: 710)
? ? ? ?/usr/var/local/pgsql/bin/postgres "-i"

The problem is not in pg_ctl status, it's in pg_ctl start. They're
different codepaths - status never tries to actually connect, it just
checks if the process is alive.

Uh, I still cannot reproduce the failure:

$ psql postgres
psql: FATAL: password authentication failed for user "postgres"
password retrieved from file "/u/postgres/.pgpass"

$ pg_ctl stop
waiting for server to shut down.... done
server stopped

$ pg_ctl -l /dev/null start
server starting

(Got to love that new 9.0 pgpass error message.)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#18)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian <bruce@momjian.us> writes:

Uh, I still cannot reproduce the failure:

I would imagine you need -w option on the start. The whole issue
here is whether start's wait-for-server-start code works.

regards, tom lane

#20Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#19)
1 attachment(s)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Uh, I still cannot reproduce the failure:

I would imagine you need -w option on the start. The whole issue
here is whether start's wait-for-server-start code works.

Thanks, I am now able to reproduce this. I was able to get this to
report the .pgpass problem:

$ psql postgres
psql: FATAL: password authentication failed for user "postgres"
password retrieved from file "/u/postgres/.pgpass"

$ pg_ctl stop
waiting for server to shut down.... done
server stopped

$ pg_ctl -w -l /dev/null start
waiting for server to start....FATAL: password authentication failed
for user "postgres"
password retrieved from file "/u/postgres/.pgpass"
.FATAL: password authentication failed for user "postgres"
password retrieved from file "/u/postgres/.pgpass"
.FATAL: password authentication failed for user "postgres"
password retrieved from file "/u/postgres/.pgpass"
.^C

I basically report the connection error string if it starts with "FATAL:".

I originally tried to check for an ERRCODE_INVALID_PASSWORD error field
(see // comments), but it seems there is no way to access this, i.e.
PQgetResult(conn) on a connection failure is always NULL.

Anyway, perhaps FATAL is a better test because it will report any major
failure, not just a .pgpass one.

Patch attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

/pgpatches/pg_ctltext/x-diffDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 14d36b5..df71c16 100644
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*************** typedef enum
*** 70,75 ****
--- 70,78 ----
  } CtlCommand;
  
  #define DEFAULT_WAIT	60
+ //
+ ///* This is part of the protocol so just define it */
+ //#define ERRCODE_INVALID_PASSWORD "28P01"
  
  static bool do_wait = false;
  static bool wait_set = false;
*************** test_postmaster_connection(bool do_check
*** 511,516 ****
--- 514,523 ----
  		if ((conn = PQconnectdb(connstr)) != NULL &&
  			(PQstatus(conn) == CONNECTION_OK ||
  			 PQconnectionNeedsPassword(conn)))
+ //			/* only works with >= 9.0 servers */
+ //			(PQgetResult(conn) &&
+ //			strcmp(PQresultErrorField(PQgetResult(conn), PG_DIAG_SQLSTATE),
+ //			   ERRCODE_INVALID_PASSWORD) == 0)))
  		{
  			PQfinish(conn);
  			success = true;
*************** test_postmaster_connection(bool do_check
*** 518,523 ****
--- 525,533 ----
  		}
  		else
  		{
+ 			/* report fatal errors like invalid .pgpass passwords */
+ 			if (strncmp(PQerrorMessage(conn), "FATAL:", strlen("FATAL:")) == 0)
+ 				fputs(PQerrorMessage(conn), stderr);
  			PQfinish(conn);
  
  #if defined(WIN32)
#21Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#20)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Nov 12, 2010 at 17:47, Bruce Momjian <bruce@momjian.us> wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Uh, I still cannot reproduce the failure:

I would imagine you need -w option on the start.  The whole issue
here is whether start's wait-for-server-start code works.

Thanks, I am now able to reproduce this.  I was able to get this to
report the .pgpass problem:

       $ psql postgres
       psql: FATAL:  password authentication failed for user "postgres"
       password retrieved from file "/u/postgres/.pgpass"

       $ pg_ctl stop
       waiting for server to shut down.... done
       server stopped

       $ pg_ctl -w -l /dev/null start
       waiting for server to start....FATAL:  password authentication failed
       for user "postgres"
       password retrieved from file "/u/postgres/.pgpass"
       .FATAL:  password authentication failed for user "postgres"
       password retrieved from file "/u/postgres/.pgpass"
       .FATAL:  password authentication failed for user "postgres"
       password retrieved from file "/u/postgres/.pgpass"
       .^C

I basically report the connection error string if it starts with "FATAL:".

I originally tried to check for an ERRCODE_INVALID_PASSWORD error field
(see // comments), but it seems there is no way to access this, i.e.
PQgetResult(conn) on a connection failure is always NULL.

Anyway, perhaps FATAL is a better test because it will report any major
failure, not just a .pgpass one.

Patch attached.

Bad Bruce, using C++ comments like that :P And non-context diff ;)

Does this actually solve the *problem*, though? The problem is not
what is reported on stdout/stderr, the problem is that the net result
is that the server is reported as not started (by the service control
manager) when it actually *is* started. In this case, stderr doesn't
even go anywhere. What happens if you *don't* Ctrl-C it?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#22Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#21)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Magnus Hagander wrote:

I basically report the connection error string if it starts with "FATAL:".

I originally tried to check for an ERRCODE_INVALID_PASSWORD error field
(see // comments), but it seems there is no way to access this, i.e.
PQgetResult(conn) on a connection failure is always NULL.

Anyway, perhaps FATAL is a better test because it will report any major
failure, not just a .pgpass one.

Patch attached.

Bad Bruce, using C++ comments like that :P And non-context diff ;)

That comment use was to highlight that those are not for commit, but
there if people want to test.

As far as the diff, it seems git-external-diff isn't portable to
non-Linux systems; I will post a separate email on that.

Does this actually solve the *problem*, though? The problem is not
what is reported on stdout/stderr, the problem is that the net result
is that the server is reported as not started (by the service control
manager) when it actually *is* started. In this case, stderr doesn't
even go anywhere. What happens if you *don't* Ctrl-C it?

I was just going to post on that. :-) Right now, it prints the FATAL
and keeps printing 60 times, then says not running. Should we just exit
on FATAL and output a special exit string, or say running?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#23Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#22)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Wed, Nov 17, 2010 at 19:50, Bruce Momjian <bruce@momjian.us> wrote:

Magnus Hagander wrote:

Does this actually solve the *problem*, though? The problem is not
what is reported  on stdout/stderr, the problem is that the net result
is that the server is reported as not started (by the service control
manager) when it actually *is* started. In this case, stderr doesn't
even go anywhere. What happens if you *don't* Ctrl-C it?

I was just going to post on that.  :-)  Right now, it prints the FATAL
and keeps printing 60 times, then says not running.  Should we just exit
on FATAL and output a special exit string, or say running?

From the perspective of the service control manager, it should say
running. That might break other scenarios though, but i'm not sure - I
think we can safely say the server is running when we try to log in
and get a password failure.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#24Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#23)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Magnus Hagander wrote:

On Wed, Nov 17, 2010 at 19:50, Bruce Momjian <bruce@momjian.us> wrote:

Magnus Hagander wrote:

Does this actually solve the *problem*, though? The problem is not
what is reported ?on stdout/stderr, the problem is that the net result
is that the server is reported as not started (by the service control
manager) when it actually *is* started. In this case, stderr doesn't
even go anywhere. What happens if you *don't* Ctrl-C it?

I was just going to post on that. ?:-) ?Right now, it prints the FATAL
and keeps printing 60 times, then says not running. ?Should we just exit
on FATAL and output a special exit string, or say running?

From the perspective of the service control manager, it should say

running. That might break other scenarios though, but i'm not sure - I
think we can safely say the server is running when we try to log in
and get a password failure.

That was another part of the discussion. Right now we report any FATAL,
so it might be a password problem, or something else, and it seems doing
all FATALs is the best idea because it will catch any other cases like
this.

Is FATAL, in general, enough to conclude the server is running?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#25Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#24)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Wed, Nov 17, 2010 at 19:57, Bruce Momjian <bruce@momjian.us> wrote:

Magnus Hagander wrote:

On Wed, Nov 17, 2010 at 19:50, Bruce Momjian <bruce@momjian.us> wrote:

Magnus Hagander wrote:

Does this actually solve the *problem*, though? The problem is not
what is reported ?on stdout/stderr, the problem is that the net result
is that the server is reported as not started (by the service control
manager) when it actually *is* started. In this case, stderr doesn't
even go anywhere. What happens if you *don't* Ctrl-C it?

I was just going to post on that. ?:-) ?Right now, it prints the FATAL
and keeps printing 60 times, then says not running. ?Should we just exit
on FATAL and output a special exit string, or say running?

From the perspective of the service control manager, it should say

running. That might break other scenarios though, but i'm not sure - I
think we can safely say the server is running when we try to log in
and get a password failure.

That was another part of the discussion.  Right now we report any FATAL,
so it might be a password problem, or something else, and it seems doing
all FATALs is the best idea because it will catch any other cases like
this.

Is FATAL, in general, enough to conclude the server is running?

No - specifically, we will send FATAL when "the database system is
starting up", which is exactly the one we want to *avoid*.

I think we should only exclude the password case. I guess we could
also do all fatal *except* <list>, but that seems more fragile.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#25)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Nov 17, 2010 at 19:57, Bruce Momjian <bruce@momjian.us> wrote:

Is FATAL, in general, enough to conclude the server is running?

No - specifically, we will send FATAL when "the database system is
starting up", which is exactly the one we want to *avoid*.

I think we should only exclude the password case. I guess we could
also do all fatal *except* <list>, but that seems more fragile.

I believe that the above argument is exactly backwards. What we want
here is to check the result of postmaster.c's canAcceptConnections(),
and there are only a finite number of error codes that can result from
rejections there. If we get past that, there are a large number of
possible failures, but all of them indicate that the postmaster is in
principle willing to accept connections. Checking for password errors
only is utterly wrong: any other type of auth failure would be the same
for this purpose, as would "no such database", "no such user", "too many
connections", etc etc etc.

What we actually want here, and don't have, is the fabled pg_ping
protocol...

regards, tom lane

#27Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#26)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Nov 17, 2010 at 19:57, Bruce Momjian <bruce@momjian.us> wrote:

Is FATAL, in general, enough to conclude the server is running?

No - specifically, we will send FATAL when "the database system is
starting up", which is exactly the one we want to *avoid*.

I think we should only exclude the password case. I guess we could
also do all fatal *except* <list>, but that seems more fragile.

I believe that the above argument is exactly backwards. What we want
here is to check the result of postmaster.c's canAcceptConnections(),
and there are only a finite number of error codes that can result from
rejections there. If we get past that, there are a large number of
possible failures, but all of them indicate that the postmaster is in
principle willing to accept connections. Checking for password errors
only is utterly wrong: any other type of auth failure would be the same
for this purpose, as would "no such database", "no such user", "too many
connections", etc etc etc.

Agreed. So how do we pass that info to libpq without exceeding the
value of fixing this problem? Should we parse pg_controldata output?
pg_upgrade could use machine-readable output from that too.

What we actually want here, and don't have, is the fabled pg_ping
protocol...

Well, we are basically figuring how to implement that with this fix,
whether it is part of pg_ctl or a separate binary.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#27)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian <bruce@momjian.us> writes:

Agreed. So how do we pass that info to libpq without exceeding the
value of fixing this problem? Should we parse pg_controldata output?
pg_upgrade could use machine-readable output from that too.

pg_controldata seems 100% unrelated to this problem. You cannot even
tell if the postmaster is alive just by inspecting pg_control.

What we actually want here, and don't have, is the fabled pg_ping
protocol...

Well, we are basically figuring how to implement that with this fix,
whether it is part of pg_ctl or a separate binary.

Possibly the cleanest fix is to implement pg_ping as a libpq function.
You do have to distinguish connection failures (ie connection refused)
from errors that came back from the postmaster, and the easiest place to
be doing that is inside libpq.

regards, tom lane

#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#28)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Agreed. So how do we pass that info to libpq without exceeding the
value of fixing this problem? Should we parse pg_controldata output?
pg_upgrade could use machine-readable output from that too.

pg_controldata seems 100% unrelated to this problem. You cannot even
tell if the postmaster is alive just by inspecting pg_control.

I was thinking of this:

$ pg_controldata /u/pg/data
...
Database cluster state: shut down

What we actually want here, and don't have, is the fabled pg_ping
protocol...

Well, we are basically figuring how to implement that with this fix,
whether it is part of pg_ctl or a separate binary.

Possibly the cleanest fix is to implement pg_ping as a libpq function.
You do have to distinguish connection failures (ie connection refused)
from errors that came back from the postmaster, and the easiest place to
be doing that is inside libpq.

OK, so a new libpq function --- got it. Would we just pass the status
from the backend or can it be done without backend modifications?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#29)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Possibly the cleanest fix is to implement pg_ping as a libpq function.
You do have to distinguish connection failures (ie connection refused)
from errors that came back from the postmaster, and the easiest place to
be doing that is inside libpq.

OK, so a new libpq function --- got it. Would we just pass the status
from the backend or can it be done without backend modifications?

It would definitely be better to do it without backend mods, so that
the functionality would work against back-branch postmasters.

To my mind, the entire purpose of such a function is to classify the
possible errors so that the caller doesn't have to. So I wouldn't
consider that it ought to "pass back the status from the backend".
I think what we basically want is a function that takes a conninfo
string (or one of the variants of that) and returns an enum defined
more or less like this:

* failed to connect to postmaster
* connected, but postmaster is not accepting sessions
* postmaster is up and accepting sessions

I'm not sure those are exactly the categories we want, but something
close to that. In particular, I don't know if there's any value in
subdividing the "not accepting sessions" status --- pg_ctl doesn't
really care, but other use-cases might want to tell the difference
between the various canAcceptConnections failure states.

BTW, it is annoying that we can't definitively distinguish "postmaster
is not running" from a connectivity problem, but I can't see a way
around that.

regards, tom lane

#31Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#30)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Possibly the cleanest fix is to implement pg_ping as a libpq function.
You do have to distinguish connection failures (ie connection refused)
from errors that came back from the postmaster, and the easiest place to
be doing that is inside libpq.

OK, so a new libpq function --- got it. Would we just pass the status
from the backend or can it be done without backend modifications?

It would definitely be better to do it without backend mods, so that
the functionality would work against back-branch postmasters.

To my mind, the entire purpose of such a function is to classify the
possible errors so that the caller doesn't have to. So I wouldn't
consider that it ought to "pass back the status from the backend".
I think what we basically want is a function that takes a conninfo
string (or one of the variants of that) and returns an enum defined
more or less like this:

* failed to connect to postmaster
* connected, but postmaster is not accepting sessions
* postmaster is up and accepting sessions

I'm not sure those are exactly the categories we want, but something
close to that. In particular, I don't know if there's any value in
subdividing the "not accepting sessions" status --- pg_ctl doesn't
really care, but other use-cases might want to tell the difference
between the various canAcceptConnections failure states.

BTW, it is annoying that we can't definitively distinguish "postmaster
is not running" from a connectivity problem, but I can't see a way
around that.

Agreed. I will research this.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#32Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#31)
1 attachment(s)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Possibly the cleanest fix is to implement pg_ping as a libpq function.
You do have to distinguish connection failures (ie connection refused)
from errors that came back from the postmaster, and the easiest place to
be doing that is inside libpq.

OK, so a new libpq function --- got it. Would we just pass the status
from the backend or can it be done without backend modifications?

It would definitely be better to do it without backend mods, so that
the functionality would work against back-branch postmasters.

To my mind, the entire purpose of such a function is to classify the
possible errors so that the caller doesn't have to. So I wouldn't
consider that it ought to "pass back the status from the backend".
I think what we basically want is a function that takes a conninfo
string (or one of the variants of that) and returns an enum defined
more or less like this:

* failed to connect to postmaster
* connected, but postmaster is not accepting sessions
* postmaster is up and accepting sessions

I'm not sure those are exactly the categories we want, but something
close to that. In particular, I don't know if there's any value in
subdividing the "not accepting sessions" status --- pg_ctl doesn't
really care, but other use-cases might want to tell the difference
between the various canAcceptConnections failure states.

BTW, it is annoying that we can't definitively distinguish "postmaster
is not running" from a connectivity problem, but I can't see a way
around that.

Agreed. I will research this.

I have researched this and developed the attached patch. It implements
PGping() and PGpingParams() in libpq, and has pg_ctl use it for pg_ctl
-w server status detection.

The new output for cases where .pgpass is not allowing for a connection
is:

$ pg_ctl -w -l /dev/null start
waiting for server to start.... done
server started
However, could not connect, perhaps due to invalid authentication or
misconfiguration.

The code basically checks the connection status between PQconnectStart()
and connectDBComplete() to see if the server is running but we failed to
connect for some reason.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

/pgpatches/pg_ctl_v2text/x-diffDownload
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index a911c50..32c58a5 100644
*** /tmp/b2EvXa_libpq.sgml	Tue Nov 23 17:41:50 2010
--- doc/src/sgml/libpq.sgml	Tue Nov 23 17:36:32 2010
*************** int PQbackendPID(const PGconn *conn);
*** 1511,1516 ****
--- 1511,1584 ----
       </listitem>
      </varlistentry>
  
+     <varlistentry id="libpq-pqpingparams">
+      <term><function>PQpingParams</function><indexterm><primary>PQpingParams</></></term>
+      <listitem>
+       <para>
+        <function>PQpingParams</function> indicates the status of the
+        server.  The currently recognized parameter key words are the
+        same as <function>PQconnectParams</>.
+ 
+ <synopsis>
+ PGPing PQpingParams(const char **keywords, const char **values, int expand_dbname);
+ </synopsis>
+ 
+        It returns one of the following values:
+ 
+        <variablelist>
+         <varlistentry id="libpq-pqpingparams-pqaccess">
+          <term><literal>PQACCESS</literal></term>
+          <listitem>
+           <para>
+            The server is running and allows access.
+           </para>
+          </listitem>
+         </varlistentry>
+ 
+         <varlistentry id="libpq-pqpingparams-pqreject">
+          <term><literal>PQREJECT</literal></term>
+          <listitem>
+           <para>
+            The server is running but rejected a connection request.
+           </para>
+          </listitem>
+         </varlistentry>
+ 
+         <varlistentry id="libpq-pqpingparams-pqnoresponse">
+          <term><literal>PQNORESPONSE</literal></term>
+          <listitem>
+           <para>
+            The server did not respond.
+           </para>
+          </listitem>
+         </varlistentry>
+        </variablelist>
+ 
+       </para>
+ 
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="libpq-pqping">
+      <term><function>PQping</function><indexterm><primary>PQping</></></term>
+      <listitem>
+       <para>
+        Returns the status of the server.
+ 
+ <synopsis>
+ PGPing PQping(const char *conninfo);
+ </synopsis>
+       </para>
+ 
+       <para>
+        This function uses the same <literal>conninfo</literal> parameter
+        key words as <function>PQconnectdb</>.  It returns the same
+        values as <function>PQpingParams</> above.
+       </para>
+ 
+      </listitem>
+     </varlistentry>
+ 
      <varlistentry id="libpq-pqconnectionneedspassword">
       <term><function>PQconnectionNeedsPassword</function><indexterm><primary>PQconnectionNeedsPassword</></></term>
       <listitem>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 14d36b5..7a5bb7a 100644
*** /tmp/3BxVxb_pg_ctl.c	Tue Nov 23 17:41:50 2010
--- src/bin/pg_ctl/pg_ctl.c	Tue Nov 23 17:25:19 2010
*************** static char **readfile(const char *path)
*** 136,142 ****
  static int	start_postmaster(void);
  static void read_post_opts(void);
  
! static bool test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);
  
  static char postopts_file[MAXPGPATH];
--- 136,142 ----
  static int	start_postmaster(void);
  static void read_post_opts(void);
  
! static PGPing test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);
  
  static char postopts_file[MAXPGPATH];
*************** start_postmaster(void)
*** 400,410 ****
   * Note that the checkpoint parameter enables a Windows service control
   * manager checkpoint, it's got nothing to do with database checkpoints!!
   */
! static bool
  test_postmaster_connection(bool do_checkpoint)
  {
! 	PGconn	   *conn;
! 	bool		success = false;
  	int			i;
  	char		portstr[32];
  	char	   *p;
--- 400,409 ----
   * Note that the checkpoint parameter enables a Windows service control
   * manager checkpoint, it's got nothing to do with database checkpoints!!
   */
! static PGPing
  test_postmaster_connection(bool do_checkpoint)
  {
! 	PGPing		ret = PQACCESS;	/* assume success for zero wait */
  	int			i;
  	char		portstr[32];
  	char	   *p;
*************** test_postmaster_connection(bool do_check
*** 508,525 ****
  
  	for (i = 0; i < wait_seconds; i++)
  	{
! 		if ((conn = PQconnectdb(connstr)) != NULL &&
! 			(PQstatus(conn) == CONNECTION_OK ||
! 			 PQconnectionNeedsPassword(conn)))
! 		{
! 			PQfinish(conn);
! 			success = true;
! 			break;
! 		}
  		else
  		{
- 			PQfinish(conn);
- 
  #if defined(WIN32)
  			if (do_checkpoint)
  			{
--- 507,516 ----
  
  	for (i = 0; i < wait_seconds; i++)
  	{
! 		if ((ret = PQping(connstr)) != PQNORESPONSE)
! 			return ret;
  		else
  		{
  #if defined(WIN32)
  			if (do_checkpoint)
  			{
*************** test_postmaster_connection(bool do_check
*** 543,549 ****
  		}
  	}
  
! 	return success;
  }
  
  
--- 534,541 ----
  		}
  	}
  
! 	/* value of last call to PQping */
! 	return ret;
  }
  
  
*************** do_start(void)
*** 746,754 ****
  
  	if (do_wait)
  	{
  		print_msg(_("waiting for server to start..."));
  
! 		if (test_postmaster_connection(false) == false)
  		{
  			write_stderr(_("%s: could not start server\n"
  						   "Examine the log output.\n"),
--- 738,748 ----
  
  	if (do_wait)
  	{
+ 		int status;
+ 		
  		print_msg(_("waiting for server to start..."));
  
! 		if ((status = test_postmaster_connection(false)) == PQNORESPONSE)
  		{
  			write_stderr(_("%s: could not start server\n"
  						   "Examine the log output.\n"),
*************** do_start(void)
*** 759,764 ****
--- 753,761 ----
  		{
  			print_msg(_(" done\n"));
  			print_msg(_("server started\n"));
+ 			if (status == PQREJECT)
+ 				write_stderr(_("However, could not connect, perhaps due to invalid authentication or\n"
+ 								"misconfiguration.\n"));
  		}
  	}
  	else
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index ecbd54c..a6c73af 100644
*** /tmp/TyACxa_exports.txt	Tue Nov 23 17:41:50 2010
--- src/interfaces/libpq/exports.txt	Tue Nov 23 17:26:29 2010
*************** PQescapeLiteral           154
*** 157,159 ****
--- 157,161 ----
  PQescapeIdentifier        155
  PQconnectdbParams         156
  PQconnectStartParams      157
+ PQping                    158
+ PQpingParams              159
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8f318a1..2149f96 100644
*** /tmp/fGWite_fe-connect.c	Tue Nov 23 17:41:50 2010
--- src/interfaces/libpq/fe-connect.c	Tue Nov 23 17:27:28 2010
*************** static bool connectOptions1(PGconn *conn
*** 285,290 ****
--- 285,291 ----
  static bool connectOptions2(PGconn *conn);
  static int	connectDBStart(PGconn *conn);
  static int	connectDBComplete(PGconn *conn);
+ static PGPing internal_ping(PGconn *conn);
  static PGconn *makeEmptyPGconn(void);
  static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
  static void freePGconn(PGconn *conn);
*************** PQconnectdbParams(const char **keywords,
*** 375,380 ****
--- 376,395 ----
  
  }
  
+ PGPing
+ PQpingParams(const char **keywords,
+ 				  const char **values,
+ 				  int expand_dbname)
+ {
+ 	PGconn	   *conn = PQconnectStartParams(keywords, values, expand_dbname);
+ 	PGPing		ret;
+ 
+ 	ret = internal_ping(conn);
+ 	PQfinish(conn);
+ 
+ 	return ret;
+ }
+ 
  /*
   *		PQconnectdb
   *
*************** PQconnectdb(const char *conninfo)
*** 408,413 ****
--- 423,440 ----
  	return conn;
  }
  
+ PGPing
+ PQping(const char *conninfo)
+ {
+ 	PGconn	   *conn = PQconnectStart(conninfo);
+ 	PGPing		ret;
+ 
+ 	ret = internal_ping(conn);
+ 	PQfinish(conn);
+ 
+ 	return ret;
+ }
+ 
  /*
   *		PQconnectStartParams
   *
*************** error_return:
*** 2491,2496 ****
--- 2518,2549 ----
  
  
  /*
+  * internal_ping
+  *	Determine if a server is running and if we can connect to it.
+  */
+ PGPing
+ internal_ping(PGconn *conn)
+ {
+ 	if (conn && conn->status != CONNECTION_BAD)
+ 	{
+ 		(void) connectDBComplete(conn);
+ 
+ 		/*
+ 		 *	If the connection needs a password, we can consider the
+ 		 *	server as accepting connections.
+ 		 */
+ 	    if (conn && (conn->status != CONNECTION_BAD ||
+ 		    PQconnectionNeedsPassword(conn)))
+ 			return PQACCESS;
+ 		else
+ 			return PQREJECT;
+ 	}
+ 	else
+ 		return PQNORESPONSE;
+ }
+ 
+ 
+ /*
   * makeEmptyPGconn
   *	 - create a PGconn data structure with (as yet) no interesting data
   */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 659d82d..d1a6dd4 100644
*** /tmp/VTULxb_libpq-fe.h	Tue Nov 23 17:41:50 2010
--- src/interfaces/libpq/libpq-fe.h	Tue Nov 23 17:31:25 2010
*************** typedef enum
*** 107,112 ****
--- 107,119 ----
  	PQERRORS_VERBOSE			/* all the facts, ma'am */
  } PGVerbosity;
  
+ typedef enum
+ {
+ 	PQACCESS,					/* connected to server */
+ 	PQREJECT,					/* server rejected access */
+ 	PQNORESPONSE				/* server did not respond */
+ } PGPing;
+ 
  /* PGconn encapsulates a connection to the backend.
   * The contents of this struct are not supposed to be known to applications.
   */
*************** extern int	PQendcopy(PGconn *conn);
*** 403,408 ****
--- 410,418 ----
  extern int	PQsetnonblocking(PGconn *conn, int arg);
  extern int	PQisnonblocking(const PGconn *conn);
  extern int	PQisthreadsafe(void);
+ extern PGPing PQping(const char *conninfo);
+ extern PGPing PQpingParams(const char **keywords,
+ 				  const char **values, int expand_dbname);
  
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
#33Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#32)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian wrote:

BTW, it is annoying that we can't definitively distinguish "postmaster
is not running" from a connectivity problem, but I can't see a way
around that.

Agreed. I will research this.

I have researched this and developed the attached patch. It implements
PGping() and PGpingParams() in libpq, and has pg_ctl use it for pg_ctl
-w server status detection.

The new output for cases where .pgpass is not allowing for a connection
is:

$ pg_ctl -w -l /dev/null start
waiting for server to start.... done
server started
However, could not connect, perhaps due to invalid authentication or
misconfiguration.

The code basically checks the connection status between PQconnectStart()
and connectDBComplete() to see if the server is running but we failed to
connect for some reason.

I have applied this patch, with modified wording of the "cannot connect"
case:

$ pg_ctl -w -l /dev/null start
waiting for server to start.... done
server started
warning: could not connect, perhaps due to invalid authentication or
misconfiguration.

I assume having the warning as the last printed things is appropriate.
This is my second patch this week that got little feedback --- I am
getting a little spooked. ;-)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#34Fujii Masao
masao.fujii@gmail.com
In reply to: Bruce Momjian (#33)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Nov 26, 2010 at 3:11 AM, Bruce Momjian <bruce@momjian.us> wrote:

I have applied this patch, with modified wording of the "cannot connect"
case:

       $ pg_ctl -w -l /dev/null start
       waiting for server to start.... done
       server started
       warning:  could not connect, perhaps due to invalid authentication or
       misconfiguration.

This patch breaks the behavior that "pg_ctl -w start" waits until the standby
has been ready to accept read-only queries. IOW, pg_ctl without this patch
continues to check the connection even if the connection is rejected because
the database has not been consistent yet. But pg_ctl with this patch treats
that rejection as success of the standby starting and prints the above
messages.

I agree to treat the receipt of password request from the server as success
of the server starting. But I don't think that we should treat other rejection
cases that way and change the existing behavior.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#35Bruce Momjian
bruce@momjian.us
In reply to: Fujii Masao (#34)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Fujii Masao wrote:

On Fri, Nov 26, 2010 at 3:11 AM, Bruce Momjian <bruce@momjian.us> wrote:

I have applied this patch, with modified wording of the "cannot connect"
case:

? ? ? ?$ pg_ctl -w -l /dev/null start
? ? ? ?waiting for server to start.... done
? ? ? ?server started
? ? ? ?warning: ?could not connect, perhaps due to invalid authentication or
? ? ? ?misconfiguration.

This patch breaks the behavior that "pg_ctl -w start" waits until the standby
has been ready to accept read-only queries. IOW, pg_ctl without this patch
continues to check the connection even if the connection is rejected because
the database has not been consistent yet. But pg_ctl with this patch treats
that rejection as success of the standby starting and prints the above
messages.

I agree to treat the receipt of password request from the server as success
of the server starting. But I don't think that we should treat other rejection
cases that way and change the existing behavior.

OK, that is easy to fix. The only downside is that if you misconfigured
.pgpass (which is what I used for testing), you have to wait 60 seconds
to get the "cannot connect" error message. Is that OK?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#36Dmitriy Igrishin
dmitigr@gmail.com
In reply to: Bruce Momjian (#35)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Hey hackers,

I am sorry, but is it possible to implement BTW ability to
check exactly status of authentication from libpq ? As for now,
the only way to check failed authentication is parsing the error
message, that is sadly.

2010/11/26 Bruce Momjian <bruce@momjian.us>

Fujii Masao wrote:

On Fri, Nov 26, 2010 at 3:11 AM, Bruce Momjian <bruce@momjian.us> wrote:

I have applied this patch, with modified wording of the "cannot

connect"

case:

? ? ? ?$ pg_ctl -w -l /dev/null start
? ? ? ?waiting for server to start.... done
? ? ? ?server started
? ? ? ?warning: ?could not connect, perhaps due to invalid

authentication or

? ? ? ?misconfiguration.

This patch breaks the behavior that "pg_ctl -w start" waits until the

standby

has been ready to accept read-only queries. IOW, pg_ctl without this

patch

continues to check the connection even if the connection is rejected

because

the database has not been consistent yet. But pg_ctl with this patch

treats

that rejection as success of the standby starting and prints the above
messages.

I agree to treat the receipt of password request from the server as

success

of the server starting. But I don't think that we should treat other

rejection

cases that way and change the existing behavior.

OK, that is easy to fix. The only downside is that if you misconfigured
.pgpass (which is what I used for testing), you have to wait 60 seconds
to get the "cannot connect" error message. Is that OK?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
// Dmitriy.

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#35)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian <bruce@momjian.us> writes:

Fujii Masao wrote:

I agree to treat the receipt of password request from the server as success
of the server starting. But I don't think that we should treat other rejection
cases that way and change the existing behavior.

OK, that is easy to fix.

It's wrong though. If you get back a "password rejected" error, or most
other types of errors, it still indicates that the server started.
We just went over this a few days ago.

The only downside is that if you misconfigured
.pgpass (which is what I used for testing), you have to wait 60 seconds
to get the "cannot connect" error message. Is that OK?

No; it's useless and unnecessary behavior.

regards, tom lane

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#35)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian <bruce@momjian.us> writes:

Fujii Masao wrote:

This patch breaks the behavior that "pg_ctl -w start" waits until the standby
has been ready to accept read-only queries. IOW, pg_ctl without this patch
continues to check the connection even if the connection is rejected because
the database has not been consistent yet. But pg_ctl with this patch treats
that rejection as success of the standby starting and prints the above
messages.

The reason this is a problem is that somebody, in a fit of inappropriate
optimization, took out the code that allowed canAcceptConnections to
distinguish the "not consistent yet" state. We need to put that back,
not try to kluge around the problem from the client side.

regards, tom lane

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#38)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

I wrote:

The reason this is a problem is that somebody, in a fit of inappropriate
optimization, took out the code that allowed canAcceptConnections to
distinguish the "not consistent yet" state.

Oh, no, that's not the case --- the PM_RECOVERY postmaster state does
still distinguish not-ready from ready. The real problem is that what
Bruce implemented has practically nothing to do with what was discussed
last week. PQping is supposed to be smarter about classifying errors
than this.

Speaking of classifying errors, should we have a fourth result value to
cover "obviously bogus parameters"? Right now you'll get PQNORESPONSE
for cases like incorrect syntax in the conninfo string. I'm not sure
how tense we ought to try to be about distinguishing, but if libpq
failed before even attempting a connection, PQNORESPONSE seems a bit
misleading.

regards, tom lane

#40Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#39)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

On Fri, Nov 26, 2010 at 6:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Speaking of classifying errors, should we have a fourth result value to
cover "obviously bogus parameters"?

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#41Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#39)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Tom Lane wrote:

I wrote:

The reason this is a problem is that somebody, in a fit of inappropriate
optimization, took out the code that allowed canAcceptConnections to
distinguish the "not consistent yet" state.

Oh, no, that's not the case --- the PM_RECOVERY postmaster state does
still distinguish not-ready from ready. The real problem is that what
Bruce implemented has practically nothing to do with what was discussed
last week. PQping is supposed to be smarter about classifying errors
than this.

I was not aware this was discussed last week because I am behind on
email. I was fixing a report from a month ago. I did explain how I was
doing the tests.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#41)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

PQping is supposed to be smarter about classifying errors
than this.

I was not aware this was discussed last week because I am behind on
email. I was fixing a report from a month ago. I did explain how I was
doing the tests.

Um, you did respond in that thread, several times even:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01102.php
so I kind of assumed that the patch you presented this week did
what was agreed to last week.

I have committed a patch to make PQping do what was agreed to.

regards, tom lane

#43Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#42)
Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

PQping is supposed to be smarter about classifying errors
than this.

I was not aware this was discussed last week because I am behind on
email. I was fixing a report from a month ago. I did explain how I was
doing the tests.

Um, you did respond in that thread, several times even:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01102.php
so I kind of assumed that the patch you presented this week did
what was agreed to last week.

Yes, I do remember that, but I remember this:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01095.php

What we want here is to check the result of postmaster.c's
canAcceptConnections(),

and this:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01106.php

You do have to distinguish connection failures (ie connection refused)
from errors that came back from the postmaster, and the easiest place to
be doing that is inside libpq.

which I thought meant it had to be done in libpq and we didn't have
access to the postmaster return codes in libpq.

Your changes look very good, and not something I would have been able to
code.

I have committed a patch to make PQping do what was agreed to.

Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +