Moved postgres, now won't start

Started by Madison Kelly | almost 19 years ago | 8 messages | general

linux@alteeve.com

almost 19 years ago

Hi all,

I've created a small 2-node (Debian Etch, PgSQL8.1) cluster using a
(shared) DRBD8 partition formatted as ext3 running in Primary/Secondary
mode.

I shut down postgresql-8.1, moved '/etc/postgresql' and
'/etc/postgresql-common' to '/ha/etc' (where '/ha' is the DRBD partition's
mount point). Then I created symlinks to the directories under '/ha' and
restarted PostgreSQL. Everything *seemed* okay until I tried to
connect to a database (e.g. 'template1' as 'postgres'). Then I got the error:

$ psql template1
psql: FATAL: could not open file "global/pg_database": No such file or directory

When I try connecting to another DB as a user with an (md5) password,
it correctly recognizes whether the password is right. Also, the file:

# cat /var/lib/postgresql/8.1/main/global/pg_database
"postgres" 10793 1663 499 499
"template1" 1 1663 499 499
"template0" 10792 1663 499 499

Exists, and is readable as you can see.

Any idea what's wrong? Does it not like that '/var/lib/postgres' ->
'/ha/var/lib/postgres'?

Thanks!

Madison

Tom Lane

tgl@sss.pgh.pa.us

almost 19 years ago

In reply to: Madison Kelly (#1)

Re: Moved postgres, now won't start

Madison Kelly <linux@alteeve.com> writes:

I've created a small 2-node (Debian Etch, PgSQL8.1) cluster using a
(shared) DRBD8 partition formatted as ext3 running in Primary/Secondary
mode.

I shut down postgresql-8.1, moved '/etc/postgresql' and
'/etc/postgresql-common' to '/ha/etc' (where '/ha' is the DRBD partition's
mount point). Then I created symlinks to the directories under '/ha' and
restarted PostgreSQL. Everything *seemed* okay until I tried to
connect to a database (e.g. 'template1' as 'postgres'). Then I got the error:

$ psql template1
psql: FATAL: could not open file "global/pg_database": No such file or directory

I think that's the first actual file access that happens during the
connect sequence (everything before that is done with in-memory caches
in the postmaster). So what I'm wondering is whether you *really* shut
down and restarted the postmaster, or whether you are trying to connect
to the same old postmaster process that has now had all its files
deleted out from under it.

regards, tom lane
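
On Linux, one way to check for the situation described above is to look for processes whose working directory no longer exists. The sketch below is only an illustration. It assumes a Linux-style /proc and that the server runs with its data directory as its working directory, which the relative path "global/pg_database" in the error suggests. The function name is made up for this example.

```python
import os

# Linux appends this marker to a /proc/<pid>/cwd link whose target was removed.
DELETED_SUFFIX = " (deleted)"


def find_processes_with_deleted_cwd(proc_root="/proc"):
    """Return {pid: original_cwd} for processes whose working directory is gone.

    proc_root is a parameter only so the function can be tried against a
    fake directory tree; on a real system the default is what you want.
    """
    orphans = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            cwd = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            # The process exited in the meantime, or we may not inspect it.
            continue
        if cwd.endswith(DELETED_SUFFIX):
            orphans[int(entry)] = cwd[: -len(DELETED_SUFFIX)]
    return orphans
```

Any PID it reports can be compared against the output of ps before deciding whether to stop that process. Reading another user's cwd link usually requires root.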

Boszormenyi Zoltan

zb@cybertec.at

almost 19 years ago

In reply to: Madison Kelly (#1)

Re: Moved postgres, now won't start

Hi,

Do you use SELinux?
Look for "avc denied" messages in the logs to see if it's the case.

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/

Madison Kelly

linux@alteeve.com

almost 19 years ago

In reply to: Tom Lane (#2)

Re: Moved postgres, now won't start

Tom Lane wrote:

I think that's the first actual file access that happens during the
connect sequence (everything before that is done with in-memory caches
in the postmaster). So what I'm wondering is whether you *really* shut
down and restarted the postmaster, or whether you are trying to connect
to the same old postmaster process that has now had all its files
deleted out from under it.

regards, tom lane

Thank you for your reply!

Before the move:

# /etc/init.d/postgresql-8.1 status
Version Cluster Port Status Owner    Data directory               Log file
8.1     main    5432 online postgres /var/lib/postgresql/8.1/main /var/log/postgresql/postgresql-8.1-main.log
# /etc/init.d/postgresql-8.1 stop
Stopping PostgreSQL 8.1 database server: main.
nicole:/etc/postgresql/8.1/main# /etc/init.d/postgresql-8.1 status
Version Cluster Port Status Owner    Data directory               Log file
8.1     main    5432 down   postgres /var/lib/postgresql/8.1/main /var/log/postgresql/postgresql-8.1-main.log

I hope that doesn't get too mangled. Unless I am misunderstanding
"stop", I think it was stopped. I made the move/symlinks mentioned
in my first post, then restarted.

For double certainty, I switched to the slave node after shutting down
postgres on the master node and double-checked that it was still 'down'
there as well.

Madison

Madison Kelly

linux@alteeve.com

almost 19 years ago

In reply to: Boszormenyi Zoltan (#3)

Re: Moved postgres, now won't start

Zoltan Boszormenyi wrote:

Do you use SELinux?
Look for "avc denied" messages in the logs to see if it's the case.

No, I don't (unless I missed it and Debian Etch uses it by default
now). To be sure, I checked the log files and only saw this:

2007-07-16 13:58:03 EDT LOG: incomplete startup packet
2007-07-16 13:58:04 EDT LOG: could not open temporary statistics file "global/pgstat.tmp": No such file or directory
2007-07-16 13:59:03 EDT FATAL: could not open file "global/pg_database": No such file or directory
2007-07-16 13:59:04 EDT LOG: could not open temporary statistics file "global/pgstat.tmp": No such file or directory
2007-07-16 14:00:03 EDT FATAL: could not open file "global/pg_database": No such file or directory

Over and over again. I tried shutting down postgresql again and got
this at the shell:

# /etc/init.d/postgresql-8.1 stop
Stopping PostgreSQL 8.1 database server: main* pg_ctl: postmaster does not shut down
(does not shutdown gracefully, now stopping immediately)pg_ctl: could not send stop signal (PID: 19958): No such process
Insecure dependency in kill while running with -T switch at /usr/bin/pg_ctlcluster line 370.
(does not shutdown, killing the process)
failed!

And this in the logs:

2007-07-16 14:28:00 EDT LOG: received fast shutdown request
2007-07-16 14:28:00 EDT LOG: shutting down
2007-07-16 14:28:00 EDT PANIC: could not open control file "global/pg_control": No such file or directory
2007-07-16 14:28:00 EDT LOG: background writer process (PID 19960) was terminated by signal 6
2007-07-16 14:28:00 EDT LOG: terminating any other active server processes
2007-07-16 14:28:00 EDT LOG: all server processes terminated; reinitializing
2007-07-16 14:28:00 EDT LOG: could not open file "postmaster.pid": No such file or directory
2007-07-16 14:28:00 EDT PANIC: could not open control file "global/pg_control": No such file or directory
2007-07-16 14:28:00 EDT LOG: could not open temporary statistics file "global/pgstat.tmp": No such file or directory

Lastly, to be very sure, I tried grep'ing for that string with no
results:

nicole:/var/log# grep "avc denied" * -Rni
nicole:/var/log#

Thanks for the reply!

Madison

Madison Kelly

linux@alteeve.com

almost 19 years ago

In reply to: Tom Lane (#2)

Re: Moved postgres, now won't start

Tom Lane wrote:

I think that's the first actual file access that happens during the
connect sequence (everything before that is done with in-memory caches
in the postmaster). So what I'm wondering is whether you *really* shut
down and restarted the postmaster, or whether you are trying to connect
to the same old postmaster process that has now had all its files
deleted out from under it.

To test your idea, I rebooted both cluster nodes and it works now.

How could I have done this without requiring a reboot? Is there a way to
tell postgres to create an entirely new connection?

Thanks!!

Madison

Tom Lane

tgl@sss.pgh.pa.us

almost 19 years ago

In reply to: Madison Kelly (#5)

Re: Moved postgres, now won't start

Madison Kelly <linux@alteeve.com> writes:

Over and over again. I tried shutting down postgresql again and got
this at the shell:

# /etc/init.d/postgresql-8.1 stop
Stopping PostgreSQL 8.1 database server: main* pg_ctl: postmaster does not shut down
(does not shutdown gracefully, now stopping immediately)pg_ctl: could not send stop signal (PID: 19958): No such process
Insecure dependency in kill while running with -T switch at /usr/bin/pg_ctlcluster line 370.
(does not shutdown, killing the process)
failed!

And this in the logs:

2007-07-16 14:28:00 EDT LOG: received fast shutdown request
2007-07-16 14:28:00 EDT LOG: shutting down
2007-07-16 14:28:00 EDT PANIC: could not open control file "global/pg_control": No such file or directory
2007-07-16 14:28:00 EDT LOG: background writer process (PID 19960) was terminated by signal 6
2007-07-16 14:28:00 EDT LOG: terminating any other active server processes
2007-07-16 14:28:00 EDT LOG: all server processes terminated; reinitializing
2007-07-16 14:28:00 EDT LOG: could not open file "postmaster.pid": No such file or directory
2007-07-16 14:28:00 EDT PANIC: could not open control file "global/pg_control": No such file or directory
2007-07-16 14:28:00 EDT LOG: could not open temporary statistics file "global/pgstat.tmp": No such file or directory

I think this proves my theory --- that all looks like leftover processes
trying to work in an installation that isn't there anymore. (Except I
have no idea what the "insecure dependency" bit is about.)

What I suspect happened is that you moved the directories before you
actually shut down the old postmaster, and then the initscript's "stop"
command would have failed because it couldn't find the postmaster.pid file.

You could get rid of the old postmaster by doing "ps auxww | grep post"
to determine its PID and then "kill -QUIT postmaster_pid". The real
problem you're likely to have is that if you moved the directories while
anything was happening, you'll have an inconsistent snapshot of the
database files, probably meaning database corruption. There isn't
anything much you can do about that at this stage (although REINDEXing
your more active tables might not be a bad idea, once you've got the
thing talking to you again). I hope you have a reasonably recent backup
to resort to, in case it emerges that things are hopelessly messed up.

regards, tom lane
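
Before moving a data directory, it may help to confirm that no server is still running against it. The following is a minimal sketch, not a definitive procedure. It assumes postmaster.pid holds the server's PID on its first line, and the function name is invented for this example.

```python
import os


def live_postmaster_pid(data_dir):
    """Return the PID recorded in data_dir/postmaster.pid if that process
    still exists, otherwise None."""
    pid_file = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(pid_file) as fh:
            first_line = fh.readline().strip()
    except FileNotFoundError:
        return None  # no PID file at all
    try:
        pid = int(first_line)
    except ValueError:
        return None  # file present but unreadable as a PID
    if pid <= 0:
        return None  # not a usable PID for this check
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, nothing is sent
    except ProcessLookupError:
        return None  # stale PID file, the process is gone
    except PermissionError:
        return pid  # process exists but belongs to another user
    return pid
```

For example, live_postmaster_pid("/var/lib/postgresql/8.1/main") returning a PID would mean the move should wait. A result of None is not proof that nothing is running: as in this thread, the PID file can be gone while the old process lives on, so checking ps as well is still worthwhile.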

Alvaro Herrera

alvherre@2ndquadrant.com

almost 19 years ago

In reply to: Tom Lane (#7)

Re: Moved postgres, now won't start

Tom Lane wrote:

I think this proves my theory --- that all looks like leftover processes
trying to work in an installation that isn't there anymore. (Except I
have no idea what the "insecure dependency" bit is about.)

"Insecure dependency" comes from Perl's taint mode: pg_ctlcluster is
written in Perl and runs with the -T switch.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
Tulio: oh, what might this button be for, Juan Carlos?
Policarpo: No, get away, don't touch the console!
Juan Carlos: I'll press it over and over again.