Re: Problem during Windows service start

Started by Kyotaro Horiguchi over 6 years ago, 4 messages
#1 Kyotaro Horiguchi
horikyota.ntt@gmail.com

Sorry in advance for the thread-breaking message, but the original mail
was too old and Gmail doesn't allow me to craft the headers required to
link to it.

/messages/by-id/CAKm4Xs71Ma8bV1fY6Gfz9Mg3AKmiHuoJNpxeDVF_KTVOKoy1WQ@mail.gmail.com

> Please find the proposed patch for review. I will attach it to
> commitfest as well

Pacemaker suffers from the same problem. We advise our customers to
"start the server alone to perform recovery, then start Pacemaker, if
recovery is expected to take long enough to reach the timeout".

I don't think it is a good thing to report SERVICE_RUNNING when the
server actually is not running (yet). I think the right direction here
is that, if pg_ctl returns on timeout, pgwin32_ServiceMain kills the
starting server and then reports something like "timed out and server
was stopped, please make sure the server does not take a long time to
perform recovery.".
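
To make the idea concrete, here is a minimal sketch in C. It is not the
proposed patch and not the real pgwin32_ServiceMain: StartOutcome,
StartHooks, start_or_give_up and the fake_* stubs are all hypothetical
names, and the Windows service calls are replaced by plain callbacks so
that the control flow can be read (and compiled) anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes; a real service would report
 * SERVICE_RUNNING or SERVICE_STOPPED to the service manager. */
typedef enum { OUTCOME_RUNNING, OUTCOME_STOPPED } StartOutcome;

/* Hypothetical hooks standing in for "wait for the server to come up"
 * and "kill the starting server". */
typedef struct
{
    bool  (*wait_for_server)(void *arg);   /* false means timeout */
    void  (*stop_server)(void *arg);
    void   *arg;
} StartHooks;

/* Report "running" only if the server really came up in time;
 * otherwise stop it and hand back a message for the event log. */
static StartOutcome
start_or_give_up(const StartHooks *hooks, const char **message)
{
    assert(hooks != NULL && message != NULL);

    if (hooks->wait_for_server(hooks->arg))
    {
        *message = "server started";
        return OUTCOME_RUNNING;
    }

    /* Timed out: do not claim "running" for a server still in recovery. */
    hooks->stop_server(hooks->arg);
    *message = "timed out and server was stopped";
    return OUTCOME_STOPPED;
}

/* Example stubs simulating a server (assumptions for illustration). */
typedef struct
{
    bool ready;     /* would the wait succeed? */
    bool stopped;   /* was stop_server called? */
} FakeServer;

static bool
fake_wait(void *arg)
{
    return ((FakeServer *) arg)->ready;
}

static void
fake_stop(void *arg)
{
    ((FakeServer *) arg)->stopped = true;
}
```

With a FakeServer whose ready flag is false, start_or_give_up calls the
stop hook and returns OUTCOME_STOPPED; with ready set to true it returns
OUTCOME_RUNNING and leaves the server alone.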

Thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#2 Alvaro Herrera from 2ndQuadrant
alvherre@alvh.no-ip.org
In reply to: Kyotaro Horiguchi (#1)

On 2019-Jul-24, Kyotaro Horiguchi wrote:

> > Please find the proposed patch for review. I will attach it to
> > commitfest as well
>
> Pacemaker suffers from the same problem. We advise our customers to
> "start the server alone to perform recovery, then start Pacemaker, if
> recovery is expected to take long enough to reach the timeout".
>
> I don't think it is a good thing to report SERVICE_RUNNING when the
> server actually is not running (yet). I think the right direction here
> is that, if pg_ctl returns on timeout, pgwin32_ServiceMain kills the
> starting server and then reports something like "timed out and server
> was stopped, please make sure the server does not take a long time to
> perform recovery.".

I'm not sure that's a great reaction; it makes total recovery time
even longer. How would the user ensure that recovery takes a shorter
time? We'd be forcing them to start the service over and over, until
recovery completes.

Can't we have pg_ctl just continue to wait indefinitely? So we'd set
SERVICE_START_PENDING when wait_for_postmaster is out of patience, then
loop again -- until recovery completes. Exiting pg_ctl on timeout seems
reasonable for interactive use, but maybe for service use it's not
reasonable.
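
A rough sketch of that loop in C, under stated assumptions: this is not
pg_ctl code. SketchState, SketchStatus, sketch_report,
sketch_wait_until_started and sketch_fake_wait are hypothetical names,
the callback stands in for wait_for_postmaster, and sketch_report only
records what a real service would pass to SetServiceStatus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states standing in for the Windows constants
 * SERVICE_START_PENDING and SERVICE_RUNNING. */
typedef enum { SKETCH_START_PENDING, SKETCH_RUNNING } SketchState;

/* Hypothetical record of what was last reported to the service manager. */
typedef struct
{
    SketchState state;        /* last state reported */
    int         checkpoint;   /* bumped on every START_PENDING report,
                               * like dwCheckPoint in SERVICE_STATUS */
} SketchStatus;

/* Callback standing in for wait_for_postmaster(): true once the server
 * accepts connections, false if this wait round timed out. */
typedef bool (*sketch_wait_fn)(void *arg);

/* Stand-in for SetServiceStatus(): only records the report. */
static void
sketch_report(SketchStatus *st, SketchState state)
{
    st->state = state;
    if (state == SKETCH_START_PENDING)
        st->checkpoint++;
}

/* Keep waiting until the server is up, re-reporting START_PENDING after
 * each timed-out round. Returns the number of timed-out rounds. */
static int
sketch_wait_until_started(SketchStatus *st, sketch_wait_fn wait_once,
                          void *arg)
{
    int timeouts = 0;

    assert(st != NULL && wait_once != NULL);

    sketch_report(st, SKETCH_START_PENDING);
    while (!wait_once(arg))
    {
        timeouts++;
        sketch_report(st, SKETCH_START_PENDING);
    }
    sketch_report(st, SKETCH_RUNNING);
    return timeouts;
}

/* Example stub: "recovery" finishes after *arg more timed-out rounds. */
static bool
sketch_fake_wait(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0)
    {
        (*remaining)--;
        return false;
    }
    return true;
}
```

The point of the sketch is that SKETCH_RUNNING is reported only after
the wait actually succeeds, however many rounds that takes; the
interactive case would keep its single bounded wait.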

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera from 2ndQuadrant (#2)

On Thu, Sep 05, 2019 at 07:09:45PM -0400, Alvaro Herrera from 2ndQuadrant wrote:

> Can't we have pg_ctl just continue to wait indefinitely? So we'd set
> SERVICE_START_PENDING when wait_for_postmaster is out of patience, then
> loop again -- until recovery completes. Exiting pg_ctl on timeout seems
> reasonable for interactive use, but maybe for service use it's not
> reasonable.

The root of the problem here is that the time recovery takes is not
something that can be guessed, and that service registration happens in
the background. Recovery time depends on when the last checkpoint
occurred, the load on the machine involved and the WAL operations done.
So it seems to me that Alvaro's idea is something which we could work
on for at least HEAD. There is also the option of providing a longer
timeout, but that's just a workaround.

My understanding is that this could be qualified as a bug because we
require using pg_ctl again after starting the service from the Windows
service control center.

So, are there plans to move on with this patch? It has been waiting on
author for some time now.
--
Michael

#4 Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#3)

On Thu, Nov 07, 2019 at 12:55:13PM +0900, Michael Paquier wrote:

> So, are there plans to move on with this patch? It has been waiting on
> author for some time now.

Seeing no activity from the author or even the reviewer, I have marked
the patch as returned with feedback for now. I am not actually fully
convinced that this should be backpatched either, so it could be done
as a future improvement.
--
Michael