8.0.4 and RedHat/CentOS init script issue
Hi,
I installed 8.0.4 via RPM on CentOS 4.2, which is the same as RedHat 4.2,
and while booting the init script reports that the daemon [FAILED], but
after I log on it shows the postmaster running and I am able to connect
from any client remotely.
I made no modifications to the script and there is nothing out of the
ordinary in the log.
Thanks,
Tony
Hi,
On Tue, 18 Oct 2005, Tony Caduto wrote:
I installed 8.0.4 via RPM on CentOS 4.2, which is the same as RedHat 4.2, and
while booting the init script reports that the daemon [FAILED], but after I
log on it shows the postmaster running and I am able to connect from any
client remotely.
I made no modifications to the script and there is nothing out of the
ordinary in the log.
Hmm. In the 8.0.4 RPM init scripts, we were using a sleep time of 1 second
(see the sleep 1 line in the init script). In some cases where the system
is slow, the script reports a startup failure even though the postmaster
has actually started.
In the 8.1 RPMs, the sleep time was increased to 2 seconds, which we
believe avoids the problem you've reported.
So please increase this sleep time and give it another try.
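A fixed sleep is only a guess at how long startup takes. As a rough sketch, assuming the script judges success by the presence of the postmaster's pid file, it could poll for that file with an upper bound instead. The function name wait_for_pidfile and the 10-second limit in the usage line are illustrative assumptions and are not taken from the shipped init script:

```shell
#!/bin/sh
# Hypothetical helper (not from the RPM init script): poll for a pid file
# instead of sleeping for a fixed time.
# Arguments: path to the pid file, maximum number of seconds to wait.
wait_for_pidfile() {
    pidfile=$1
    timeout=$2
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # Succeed as soon as the file exists and is non-empty.
        if [ -s "$pidfile" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # One last check after the final sleep; its status is the return value.
    [ -s "$pidfile" ]
}
```

It could be called as wait_for_pidfile "$PGDATA/postmaster.pid" 10 in place of the sleep line. A fast start returns almost immediately, and only a slow start pays for the longer wait.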
Regards,
--
Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
From: "Kevin Grittner" <Kevin.Grittner@wicourts.gov>
Date: Wed, 19 Oct 2005 12:43:17 -0500
Subject: Re: A costing analysis tool
Summary of schema I'm considering. Comments welcome.
When it gets down to the detail, it may make sense to combine
or split some of these. For example, runtime_options should
probably not have a column for each currently known option,
but a child table which maps to all non-default option values.
submitter
    Identifies who submitted test results.
    Name, optional info such as email address and organization.
runtime_environment
    Identifies the overall test environment.
    OS with distribution & version, CPU number, type, speed, RAM,
    background load, static configuration, etc.
    This provides context for a series of tests, to see how the numbers
    look in a given environment.
dataset_characteristics
    Identifies data metrics which may affect costing accuracy.
    Table counts, row counts, column counts, disk space used,
    level of fragmentation.
    Maybe some of the "standard" tests will share common dataset
    characteristics across multiple environments.
cache_state
    Identifies the level of initial caching for a test, and the degree to
    which the referenced data can be cached during execution.
runtime_options
    Identifies the runtime choices in effect for a query run.
    The state of EXPLAIN, ANALYZE, enable_xxx, and dynamic configuration
    settings, such as random_page_cost.
query
    Identifies a test query, possibly run by many people in many
    environments against various datasets with different cache states
    and runtime options.
test_result_summary
    Ties a query to details about a run, with a summary of results.
    Run time from the client perspective, rows returned.
test_result_step_detail
    Shows EXPLAIN ANALYZE information (if any) for each step.
Hi all,
I tried changing the sleep command in the script to 2, but at boot it
still says [FAILED].
Even though the script reports it failed, the db is up and running.
System is a Compaq DL380 (2.5 GB RAM, dual 2.4 GHz Xeon) running CentOS 4.2.
I am going to install 8.1beta3 on another box that has exactly the same
hardware and OS version; I will report back what happens.
Not sure what is going on. Has anyone else had this problem with CentOS
4.2 or Red Hat EL 4.2?
Thanks,
Tony Caduto
http://www.amsoftwaredesign.com
Home of PG Lightning Admin for Postgresql 8.x
Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
I tried changing the sleep command in the script to 2, but at boot it
still says [FAILED].
Even though the script reports it failed, the db is up and running.
This seems to happen for some people and not others. I've been wanting
to find out how the heck it can take multiple seconds for the postmaster
to start and create its pid-file ... that shouldn't take long at all.
Are you willing to try strace'ing the postmaster? Modify the script
like
$SU -l postgres -c "strace -tt -o /tmp/strace.out $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
                    ^^^^^^^^^^ add this ^^^^^^^^^
and reboot. (After you've gotten a trace of a failing case, change it
back and reboot again.)
This is kind of invasive and may change the behavior enough that we
don't see the problem :-( --- but if you're willing to reboot a few
times in hopes of capturing a trace of a failed case, it'd be worth
trying.
regards, tom lane
Tom Lane wrote:
Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
I tried changing the sleep command in the script to 2, but at boot it
still says [FAILED].
Even though the script reports it failed, the db is up and running.
This seems to happen for some people and not others. I've been wanting
to find out how the heck it can take multiple seconds for the postmaster
to start and create its pid-file ... that shouldn't take long at all.
Are you willing to try strace'ing the postmaster? Modify the script
like
$SU -l postgres -c "strace -tt -o /tmp/strace.out $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
                    ^^^^^^^^^^ add this ^^^^^^^^^
and reboot. (After you've gotten a trace of a failing case, change it
back and reboot again.)
This is kind of invasive and may change the behavior enough that we
don't see the problem :-( --- but if you're willing to reboot a few
times in hopes of capturing a trace of a failed case, it'd be worth
trying.
regards, tom lane
Hi Tom,
I added the strace line like you said and rebooted, it did display the
[FAILED] after the reboot.
I put the resulting strace.out file on my web server; here is the
link (warning: it's pretty big):
http://www.amsoftwaredesign.com/downloads/strace.out
After the second reboot I changed the sleep from 2 to 5 and then it
worked correctly. Of course, this really slowed the boot process.
Thanks,
Tony
Tony Caduto <tony_caduto@amsoftwaredesign.com> writes:
Tom Lane wrote:
Are you willing to try strace'ing the postmaster?
I added the strace line like you said and rebooted, it did display the
[FAILED] after the reboot.
Thanks for collecting the raw data. The salient events seem to be these:
12:57:52.400888 exec() call
12:57:52.619268 completion(?) of opening shared libraries
12:57:52.657465 first call coming from our own code instead of libraries
12:57:52.902476 begin reading postgresql.conf
12:57:52.915949 done reading postgresql.conf
12:57:52.916191 begin trying to identify system timezone
12:58:01.117869 done identifying system timezone
12:58:01.131798 postmaster.pid created
In short: pg_timezone_initialize() took about 8.2 seconds out of the
total time of 8.73 seconds.
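The two figures can be re-derived from the timestamps listed above. The following is only an illustrative sketch: the helper name elapsed is made up for this example, and it assumes both timestamps fall on the same day:

```shell
#!/bin/sh
# Print the seconds elapsed between two HH:MM:SS.ffffff timestamps,
# in the format written by "strace -tt". Same day assumed.
# "elapsed" is a hypothetical helper, not part of strace or PostgreSQL.
elapsed() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        t0 = s[1] * 3600 + s[2] * 60 + s[3]
        t1 = e[1] * 3600 + e[2] * 60 + e[3]
        printf "%.2f\n", t1 - t0
    }'
}

elapsed 12:57:52.916191 12:58:01.117869   # timezone identification: 8.20
elapsed 12:57:52.400888 12:58:01.131798   # exec() to postmaster.pid: 8.73
```

Everything outside the timezone probe therefore accounts for only about half a second of the start.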
Since pg_timezone_initialize() needs to scan all of the 500-odd files
under postgresql/share/timezone/, it isn't so surprising that it would
take a little bit of time. But 8 seconds seems like a lot. The trace
makes it look like localtime() performs stat("/etc/localtime") on each
call, which is pretty ugly --- I wonder if there isn't some way around
that?
Anyway, the short answer is that pg_timezone_initialize ought to wait
till after we've created postmaster.pid. There's no urgent reason to
do it earlier AFAICS. This also explains why we didn't see a startup
problem in earlier releases --- pg_timezone_initialize didn't exist
before 8.0.
regards, tom lane
Tom Lane wrote:
In short: pg_timezone_initialize() took about 8.2 seconds out of the
total time of 8.73 seconds.
Since pg_timezone_initialize() needs to scan all of the 500-odd files
under postgresql/share/timezone/, it isn't so surprising that it would
take a little bit of time. But 8 seconds seems like a lot. The trace
makes it look like localtime() performs stat("/etc/localtime") on each
call, which is pretty ugly --- I wonder if there isn't some way around
that?
Further data points:
I just observed this taking over 20 seconds on my clunky old pII 266.
That's really horrible. But pg_ctl -w start was able to complete in
about 2 seconds.
Even on my much faster laptop the timezone lib startup took 3 or 4
seconds (and pg_ctl -w start came back in about 1 second).
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
In short: pg_timezone_initialize() took about 8.2 seconds out of the
total time of 8.73 seconds.
Further data points:
I just observed this taking over 20 seconds on my clunky old pII 266.
That's really horrible. But pg_ctl -w start was able to complete in
about 2 seconds.
Yeah. I've been experimenting here, and it's clear that strace itself
adds huge overhead --- on my machine, postmaster start is normally well
under a second, but strace'ing it brings it to about 8 seconds. No
doubt that's because of all the stat("/etc/localtime") calls it has to
trace.
So there's some Heisenberg effect here. However, I don't think there
can be much doubt that on a machine that is just booting (and has
surely got none of these files in cache) the search through
share/postgresql/timezone could take a few seconds. Hindsight is
always 20/20 ;-)
regards, tom lane
Tom Lane wrote:
So there's some Heisenberg effect here. However, I don't think there
can be much doubt that on a machine that is just booting (and has
surely got none of these files in cache) the search through
share/postgresql/timezone could take a few seconds. Hindsight is
always 20/20 ;-)
Something is surely wrong in the timezone lib, though:
[andrew@alphonso inst]$ grep /etc/localtime strace.out | wc -l
38073
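The same trace can be summarized per system call to see what dominates. This is a minimal sketch which assumes every line has the "strace -tt" shape of a timestamp followed by name(args...) = result. The function name count_syscalls is hypothetical, and signal or exit lines in the trace would show up under odd names such as "---":

```shell
#!/bin/sh
# Tally the system calls in an "strace -tt" log, most frequent first.
# Each line is assumed to look like: HH:MM:SS.ffffff name(args...) = result
# "count_syscalls" is an illustrative helper name.
count_syscalls() {
    awk '{
        name = $2
        sub(/\(.*/, "", name)    # keep only the text before the first "("
        n[name]++
    } END {
        for (k in n) print n[k], k
    }' "$1" | sort -rn
}
```

For example, count_syscalls strace.out | head lists the most frequent calls, which makes a pattern like the repeated /etc/localtime checks easy to spot.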
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Something is surely wrong in the timezone lib, though:
[ digs in glibc sources for awhile... ]
The test loop in score_timezone() calls both localtime() and strftime()
for each probe point, and in glibc strftime() calls tzset(), which the
source code claims is required by POSIX. The explicit tzset() call is
what's forcing the recheck of /etc/localtime.
Possibly the glibc boys would listen to a suggestion that strftime()
need not force the file recheck, but my experience with them is that
they're relatively impervious to suggestions :-(
I'm not actually particularly worried about the startup time. What's
bothering me right at the moment, given the new-found knowledge that
strftime() is slow on Linux, is that we're using it in elog(). At the
time that code was written, we did it deliberately to ensure that all
the backends would write log timestamps in the same timezone regardless
of local SET TimeZone commands. That's still an important
consideration, but I wonder whether we don't now have enough timezone
infrastructure that we could get the same results using pg_strftime.
regards, tom lane
I wrote:
Possibly the glibc boys would listen to a suggestion that strftime()
need not force the file recheck, but my experience with them is that
they're relatively impervious to suggestions :-(
I've filed a bug for this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171351
so no need for everyone else to do it too ...
I'm not actually particularly worried about the startup time. What's
bothering me right at the moment, given the new-found knowledge that
strftime() is slow on Linux, is that we're using it in elog(). At the
time that code was written, we did it deliberately to ensure that all
the backends would write log timestamps in the same timezone regardless
of local SET TimeZone commands. That's still an important
consideration, but I wonder whether we don't now have enough timezone
infrastructure that we could get the same results using pg_strftime.
If glibc fixes the problem upstream then we can leave well enough alone,
but if they indicate they won't then we should think about doing this
someday. The major problem with it probably is "what do you do when
messages need to be emitted before pgtz has been initialized?"
regards, tom lane
> I'm not actually particularly worried about the startup time. What's
> bothering me right at the moment, given the new-found knowledge that
> strftime() is slow on Linux, is that we're using it in elog(). At the
> time that code was written, we did it deliberately to ensure that all
> the backends would write log timestamps in the same timezone
> regardless of local SET TimeZone commands. That's still an important
> consideration, but I wonder whether we don't now have enough timezone
> infrastructure that we could get the same results using pg_strftime.
> If glibc fixes the problem upstream then we can leave well
> enough alone, but if they indicate they won't then we should
That'll take quite a while to trickle down into the distributions even
if it's fixed, won't it? If the fix is simple, we should perhaps
consider it anyway.
> think about doing this someday. The major problem with it
> probably is "what do you do when messages need to be emitted
> before pgtz has been initialized?"
Shouldn't be too hard, I think. We could declare a "pg_tz *system_timezone"
or so and initialize it to NULL. Once pgtz is initialized, we assign a
valid timezone to it, namely the startup timezone. Then in elog, we
simply check whether system_timezone is NULL and, if so, fall back on the
glibc version of strftime.
It shouldn't be a performance issue if it fails that often, because we
won't call elog a whole lot of times there, right?
//Magnus