sick DB - ??

Started by Pete Leonard, over 24 years ago, 6 messages, general
#1Pete Leonard
pete@hero.com

Postgres 7.1.2, FreeBSD 3.4

Box got sick, had to bounce it. Postgres wasn't brought down in a
graceful fashion..

restart didn't bring the DB back properly, so as the postgres user, did
the following:

/usr/local/pgsql/bin/postmaster -d5 start

it dumps the initial environment variables, and then returns nothing. CPU
is pegged at 100%. No reporting, no information as to what's happening.

Solutions? Is the DB corrupted badly? Where do I go from here?

thanks,

--pete

#2Pete Leonard
pete@hero.com
In reply to: Pete Leonard (#1)
Re: sick DB - ??

As a followup - the line from top:

1641 postgres 105 0 2684K 1384K CPU1 0 8:26 99.02% 99.02% postgres

As you can see, it's barely taking up any RAM - the process is going nuts
right off the bat..


#3Pete Leonard
pete@hero.com
In reply to: Pete Leonard (#2)
Re: sick DB - ??

Followup ^2 -

The reason this happened was that for whatever reason (we're still
investigating), /tmp was writeable only by root.

I only noticed this when using initdb to create a new data directory.

postmaster offered no suggestion that there was a problem here, even when
running at -d5.

chmod 777 /tmp fixed everything.

my best guess (I don't know how postmaster is operating, I didn't run any
of the system-level diagnostic tools to check) is that if postmaster fails
on opening a pipe/tmpfile, rather than check the error properly, it
changes the filename and tries again ad infinitum? Perhaps printing some
error code (especially at debug level 5) would help?

thanks,

--pete


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pete Leonard (#2)
Re: Re: sick DB - ??

Pete Leonard <pete@hero.com> writes:

> restart didn't bring the DB back properly, so as the postgres user, did
> the following:
> /usr/local/pgsql/bin/postmaster -d5 start
> it dumps the initial environment variables, and then returns nothing. CPU
> is pegged at 100%. No reporting, no information as to what's happening.

This is kind of a random guess, but we recently noticed that 7.1 has a
bug whereby the postmaster can go into an infinite loop at startup if
the $PGDATA directory is not writable. Check permissions. It might
also be a good idea to remove the old postmaster.pid file by hand.

regards, tom lane
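
The two checks suggested above (is $PGDATA writable, and is postmaster.pid left over from the crash) can be sketched as a small pre-start script. This is only an illustration: the function name check_pgdata is made up here, and it assumes the first line of postmaster.pid holds the postmaster's PID. It reports what it finds and never deletes anything itself.

```sh
#!/bin/sh
# Pre-start sanity check for a data directory (illustrative sketch only).
# Assumption: the first line of postmaster.pid is the postmaster's PID.
# Run it as the postgres user: kill -0 also fails for a live process that
# belongs to someone else, which would make the lock look stale.

check_pgdata() {
    pgdata=$1

    if [ ! -d "$pgdata" ]; then
        echo "not a directory: $pgdata"
        return 1
    fi
    if [ ! -w "$pgdata" ]; then
        echo "not writable by $(id -un): $pgdata"
        return 1
    fi

    pidfile=$pgdata/postmaster.pid
    if [ -f "$pidfile" ]; then
        pid=$(head -n 1 "$pidfile")
        case $pid in
            ''|*[!0-9]*)
                echo "postmaster.pid has unexpected contents; inspect by hand"
                return 2 ;;
        esac
        if kill -0 "$pid" 2>/dev/null; then
            echo "postmaster.pid names live process $pid; leave it alone"
            return 1
        fi
        echo "stale postmaster.pid (pid $pid not running); remove by hand"
        return 2
    fi

    echo "ok: $pgdata"
    return 0
}
```

Typical use would be something like check_pgdata "$PGDATA" before starting the postmaster. A return of 0 means nothing suspicious was found, 1 means do not start, and 2 means a leftover postmaster.pid should be looked at and removed manually.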

#5Mike Castle
dalgoda@ix.netcom.com
In reply to: Pete Leonard (#3)
Re: Re: sick DB - ??

On Wed, Jul 18, 2001 at 09:36:38AM -0700, Pete Leonard wrote:

> chmod 777 /tmp fixed everything.

That should be 1777.

mrc
--
Mike Castle dalgoda@ix.netcom.com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc
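
For context on the 777 versus 1777 point: the leading 1 is the sticky bit. On a world-writable directory it means only a file's owner, the directory's owner, or root can delete or rename an entry, so users cannot remove each other's temp files. A rough way to check a directory, assuming nothing beyond ls and cut (the function names mode_string and check_tmp_mode are invented for this sketch):

```sh
#!/bin/sh
# Report whether a directory has the conventional /tmp mode 1777.
# ls shows that mode as "drwxrwxrwt"; plain 777 shows as "drwxrwxrwx".

mode_string() {
    # First 10 characters of `ls -ld`: file type plus permission bits.
    ls -ld -- "$1" | cut -c1-10
}

check_tmp_mode() {
    dir=$1
    mode=$(mode_string "$dir")
    case $mode in
        drwxrwxrwt)
            echo "$dir: $mode (1777, ok)"
            return 0 ;;
        drwxrwxrwx)
            echo "$dir: $mode (777, sticky bit missing; use chmod 1777)"
            return 1 ;;
        *)
            echo "$dir: $mode (unexpected for a shared temp dir; use chmod 1777)"
            return 1 ;;
    esac
}
```

Running check_tmp_mode /tmp after the repair should report the first case if the directory was set to 1777 rather than 777.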

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pete Leonard (#3)
Re: Re: sick DB - ??

Pete Leonard <pete@hero.com> writes:

> The reason this happened was that for whatever reason (we're still
> investigating), /tmp was writeable only by root.

Ah. Hadn't thought about it before, but the infinite-loop-on-nonwritable-$PGDATA
bug would also trigger for nonwritable /tmp.
(The bug was actually in CreateLockFile, which is used both to
create a lockfile in $PGDATA and one in /tmp. Sigh.)

This is fixed in current sources. If we were going to do a 7.1.3
then I'd backpatch the fix into the REL7_1 branch, but at this point
I suspect there won't be a 7.1.3 --- we'll probably go into 7.2 beta
in another five or six weeks, so there's not much point.

regards, tom lane
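
To illustrate the kind of failure discussed in this thread: a lock-file routine that retries whenever creation fails will spin forever if the failure is "permission denied" rather than "file already exists". The sketch below is not PostgreSQL's CreateLockFile and makes no claim about how the real fix was written. It is a hypothetical shell function (create_lock_file is a made-up name) showing one way to separate the two cases and to bound the retries.

```sh
#!/bin/sh
# Illustrative lock-file creation with a bounded retry loop.
# NOT PostgreSQL's CreateLockFile; it only shows the error-handling shape.

create_lock_file() {
    lockfile=$1
    attempts=0

    while [ "$attempts" -lt 3 ]; do
        attempts=$((attempts + 1))

        # With noclobber (set -C), ">" refuses to overwrite an existing
        # file, which gives an exclusive create.
        if ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null; then
            return 0
        fi

        # Creation failed. Only "file already exists" is worth a retry;
        # anything else (e.g. an unwritable directory) is reported at once.
        if [ ! -e "$lockfile" ]; then
            echo "cannot create $lockfile: check directory permissions" >&2
            return 1
        fi

        owner=$(head -n 1 "$lockfile" 2>/dev/null)
        case $owner in
            ''|*[!0-9]*) ;;   # empty or garbage: treat as stale
            *)
                if kill -0 "$owner" 2>/dev/null; then
                    echo "$lockfile is held by live process $owner" >&2
                    return 1
                fi ;;
        esac

        # Stale lock: remove it and loop; give up if it cannot be removed.
        if ! rm -f "$lockfile" 2>/dev/null; then
            echo "cannot remove stale $lockfile" >&2
            return 1
        fi
    done

    echo "giving up on $lockfile after $attempts attempts" >&2
    return 1
}
```

The point of the design is that every path out of the loop either succeeds or prints a reason, so a nonwritable directory produces an error message instead of a process pegged at 100% CPU.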