Why has postmaster shutdown gotten so slow?
Shutdown of an idle postmaster used to take about two or three seconds
(mostly due to the sync/sleep(2)/sync in md_sync). For the last couple
of days it's taking more like a dozen seconds. I presume somebody broke
something, but I'm unsure whether to pin the blame on bgwriter or
Windows changes. Anyone care to fess up?
regards, tom lane
AFAICS, Win32 changes for the past few days have been minimal, and pretty
much isolated to Win32. Happy to stand corrected, but I'd start by looking
elsewhere...
Cheers,
Claudio
I guess it could well be the bgwriter, which when having nothing to do
at all is sleeping for 10 seconds. Not sure, will check.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
I checked the background writer for this and I can not reproduce the
behaviour. If the bgwriter had zero blocks to write it does PG_USLEEP
for 10 seconds, which on Unix is done by select() and that is correctly
interrupted when the postmaster sends it the term signal on shutdown.
Jan
This appears to be a platform-dependent behavior. The HPUX select(2) man
page says
[EINTR] The select() function was interrupted before any
of the selected events occurred and before the
timeout interval expired. If SA_RESTART has been
set for the interrupting signal, it is
implementation-dependent whether select() restarts
or returns with EINTR.
which text also appears verbatim in the Single Unix Spec. Since we set
SA_RESTART for every signal except SIGALRM (see pqsignal.c), we are
subject to the implementation dependency for SIGTERM.
Tracing the bgwriter process on my machine makes it real obvious that in
fact the select delay is allowed to finish out when SIGTERM is received.
In fact worse than that: it's restarted from the beginning. If 5
seconds have already elapsed, another 10 still elapse before the select
exits.
This won't do :-(. We cannot afford to fritter away 10 seconds in the
SIGTERM shutdown cycle --- on typical systems init isn't going to give
us more than 20 seconds before a hard kill.
I'd suggest reducing the delay to a second or two, or perhaps breaking
it into several 1-second waits with interrupt flag checks between.
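The second option (several short waits with a flag check between them) might look something like the sketch below. It is only an illustration under assumptions: `shutdown_requested` stands in for whatever interrupt flag the real code would check, and `request_shutdown` and `sleep_in_slices` are hypothetical names, not existing backend functions.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Set from a signal handler; stands in for the real interrupt flag. */
static volatile sig_atomic_t shutdown_requested = 0;

static void
request_shutdown(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/*
 * Sleep for up to total_usec microseconds in slices of at most
 * slice_usec, checking the flag before each slice. Returns 1 if the
 * flag was seen set, 0 if all slices ran without it being set.
 *
 * An early EINTR return from select() simply ends the current slice,
 * so the total may come up short; that is harmless for an idle delay.
 */
static int
sleep_in_slices(long total_usec, long slice_usec)
{
    long        remaining = total_usec;

    if (slice_usec <= 0)
        slice_usec = total_usec;    /* degenerate case: one single wait */

    while (remaining > 0)
    {
        struct timeval tv;
        long        this_slice = remaining < slice_usec ? remaining : slice_usec;

        if (shutdown_requested)
            return 1;

        tv.tv_sec = this_slice / 1000000L;
        tv.tv_usec = this_slice % 1000000L;
        (void) select(0, NULL, NULL, NULL, &tv);
        remaining -= this_slice;
    }
    return shutdown_requested ? 1 : 0;
}
```

A 10-second idle delay would then be `sleep_in_slices(10000000L, 1000000L)`. Even on a platform that restarts select() after the signal, the wait that follows SIGTERM is limited to roughly one slice (plus one restarted slice) instead of a fresh 10 seconds.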
In the longer run we might want to rethink what we are doing with
SA_RESTART, but I am not sure about the implications of fooling with
that.
regards, tom lane
That explains it.
I think at this point we should have some maximum value for PG_xSLEEP,
above which it falls back to a function call that either breaks the wait
up into a loop that checks InterruptPending, or removes the SA_RESTART
flag while waiting for the timeout.
Jan
If you look at my win32 signals patch nr 3 (posted Feb 4th), I have code
in it to do this for win32. It breaks up select() timeouts into pieces
of 1 second and polls for win32 signals in between.
Turns out it wasn't necessary, since win32 *does* deliver our signals
while in select. So for once it's win32 that does what we want - I think
that's a first. But it might help on another platform.
//Magnus