BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

Started by Nacho Mezzadra | over 15 years ago | 5 messages | bugs
#1 Nacho Mezzadra
nachomezzadra@gmail.com

The following bug has been logged online:

Bug reference: 5603
Logged by: Nacho Mezzadra
Email address: nachomezzadra@gmail.com
PostgreSQL version: 8.3.11
Operating system: Red Hat Enterprise 5.3
Description: pg_tblspc and pg_twoface directories get deleted when
starting up service
Details:

This issue does not happen often, but it has happened to me 3 times, on 3
different Red Hat servers.
When I stop the PostgreSQL service with the
"/sbin/service postgresql-8.3 stop" command and then start it with
the "/sbin/service postgresql-8.3 start" command (I haven't tried the
restart command), occasionally both the pg_tblspc and pg_twoface directories
(inside the data directory) somehow get deleted, and so the start
command fails. Looking in the log files, I find the following error:

2010-07-19 16:54:55 ISTFATAL: could not open directory "pg_tblspc": No such
file or directory

So I manually create the "pg_tblspc" directory and try to start the
service again. That fails too, this time with a similar error saying
that the pg_twoface directory doesn't exist.

After I create the pg_twoface directory, the service starts
successfully.

Please note that in every case the service command was run as
root.
All 3 Linux boxes run on a VMware host.

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Nacho Mezzadra (#1)
Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

On Thu, Aug 5, 2010 at 2:46 PM, Nacho Mezzadra <nachomezzadra@gmail.com> wrote:

PostgreSQL version: 8.3.11
Operating system:   Red Hat Enterprise 5.3
Description:        pg_tblspc and pg_twoface directories get deleted when
starting up service

This is pretty scary, but it's a little hard to believe that Red Hat
would ship a script which had even the faintest chance of obliterating
two critical directories. Especially since the guy who does the
packaging of PostgreSQL over thereabouts is our most knowledgeable,
experienced, and prolific committer. So I suspect you've a (broken)
custom script, or a cron job that's doing something evil, or some
other weirdness that is specific to your installations, but you
haven't provided enough details to speculate in detail (for example,
perhaps you could reply to the list and post a copy of the script you
think is doing this).

Also, I'm pretty sure that we don't have a directory called
pg_twoface, though it would be pretty funny if we did. It's fairly
obvious what this is meant to say, but it doesn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 5, 2010 at 2:46 PM, Nacho Mezzadra <nachomezzadra@gmail.com> wrote:

PostgreSQL version: 8.3.11
Operating system:   Red Hat Enterprise 5.3
Description:        pg_tblspc and pg_twoface directories get deleted when
starting up service

This is pretty scary, but it's a little hard to believe that Red Hat
would ship a script which had even the faintest chance of obliterating
two critical directories. Especially since the guy who does the
packaging of PostgreSQL over thereabouts is our most knowledgeable,
experienced, and prolific committer. So I suspect you've a (broken)
custom script, or a cron job that's doing something evil, or some
other weirdness that is specific to your installations, but you
haven't provided enough details to speculate in detail (for example,
perhaps you could reply to the list and post a copy of the script you
think is doing this).

Well, I have to disclaim credit/blame for this, because Red Hat has
never shipped PG 8.3.anything for RHEL-5. Possibly the OP is running
Devrim's or Command Prompt's RPMs. That said, the initscript Devrim
uses looks just about like mine, and there's no chance whatever that it
would selectively delete portions of what's under $PGDATA. I have to
think that there's a loose cannon somewhere else in the OP's system.
We have for example seen some very unfortunate behavior in the past
when the data directory was located on a slow-to-mount NFS server.
(I have no reason to think that that's exactly what this problem is;
I just cite it to illustrate the kind of thing to be looking for.)

regards, tom lane
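
The slow-to-mount case mentioned above can be guarded against by checking, before the server is started, that the data directory is really there. The following is only a minimal sketch of such a check, not part of any shipped initscript. The function name and the optional mount_point parameter are hypothetical; the only PostgreSQL-specific detail used is the PG_VERSION file that every initialized data directory contains.

```python
import os


def data_dir_looks_ready(data_dir, mount_point=None):
    """Return True if data_dir looks like an initialized, mounted data directory.

    mount_point is a hypothetical optional argument: pass the path where the
    remote filesystem is expected to be mounted, if there is one.
    """
    # If a mount point was given, refuse to continue until it is mounted.
    if mount_point is not None and not os.path.ismount(mount_point):
        return False
    # An initialized data directory contains a PG_VERSION file at its top level.
    return os.path.isfile(os.path.join(data_dir, "PG_VERSION"))
```

A wrapper around the start command could call this and bail out early instead of letting the server start against an empty mount point.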

#4 Nacho Mezzadra
nachomezzadra@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

Tom, Robert, sorry for coming back to you after such a long time, but we still
have the same issue. It has been happening in our own environments, and
now it is also happening in customers' environments, which we do not
set up. All environments are Red Hat Enterprise 5.3.
As reported in the bug, when starting the service using /sbin/service
postgresql-8.3 start, sometimes the directories data/pg_tblspc and
data/pg_twophase get deleted and the PostgreSQL engine won't start up. As
a workaround, we recreate both directories and PostgreSQL can be
started again, but we need to know why this is happening and whether it
could harm our data in any way.
Please let me know if you need any more information.
Thanks a lot in advance,
Nacho.-
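
The workaround described above (recreating the two directories by hand) could be scripted roughly as follows. This is a sketch under assumptions, not a recommended fix: the function names are made up for illustration, mode 0700 is assumed because that is how the server creates its own subdirectories, and ownership is deliberately not handled. If this runs as root, the new directories would still have to be chowned to the OS user that runs the server.

```python
import os

# The two subdirectories named in this thread.
REQUIRED_SUBDIRS = ("pg_tblspc", "pg_twophase")


def missing_subdirs(data_dir, required=REQUIRED_SUBDIRS):
    """Return the names in `required` that are not directories under data_dir."""
    return [name for name in required
            if not os.path.isdir(os.path.join(data_dir, name))]


def recreate_missing(data_dir, required=REQUIRED_SUBDIRS):
    """Create any missing required subdirectories and return their names.

    Ownership is NOT changed here (assumption: the caller handles chown).
    """
    created = []
    for name in missing_subdirs(data_dir, required):
        path = os.path.join(data_dir, name)
        os.mkdir(path, 0o700)
        # mkdir's mode is filtered by the umask, so set the mode explicitly.
        os.chmod(path, 0o700)
        created.append(name)
    return created
```

Calling missing_subdirs() on its own before each start would at least show whether the directories vanished before the restart rather than during it.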

On Tue, Aug 10, 2010 at 01:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, I have to disclaim credit/blame for this, because Red Hat has
never shipped PG 8.3.anything for RHEL-5.  Possibly the OP is running
Devrim's or Command Prompt's RPMs.  That said, the initscript Devrim
uses looks just about like mine, and there's no chance whatever that it
would selectively delete portions of what's under $PGDATA.  I have to
think that there's a loose cannon somewhere else in the OP's system.
We have for example seen some very unfortunate behavior in the past
when the data directory was located on a slow-to-mount NFS server.
(I have no reason to think that that's exactly what this problem is;
I just cite it to illustrate the kind of thing to be looking for.)

                       regards, tom lane

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nacho Mezzadra (#4)
Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

Nacho Mezzadra <nachomezzadra@gmail.com> writes:

Tom, Robert, sorry for coming back to you after such a long time, but we still
have the same issue. It has been happening in our own environments, and
now it is also happening in customers' environments, which we do not
set up. All environments are Red Hat Enterprise 5.3.

You still haven't given any reason to think this is a Postgres bug,
nor indeed any information beyond what you said originally.

One thing that strikes me is that both pg_tblspc and pg_twophase are
empty and unused during normal operation (if you're not using the
relevant features). They are scanned during postmaster startup though,
which is why you're getting failures then. I suspect that these
subdirectories are not in fact getting removed during PG shutdown or
restart, but were deleted some time before that. In particular I wonder
if somebody's loosed an overaggressive tmp-file-cleaning script on your
whole filesystem. Something that was removing empty directories that
hadn't been accessed in a while could explain this.

regards, tom lane
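
To see which directories such a cleaner would be likely to pick, one could list the empty directories under the data directory that have not been accessed recently. The sketch below is only an illustration of that idea: the function name and the age threshold are assumptions, and it does not reproduce the rules of any particular cleanup tool. The access time is read before the directory is listed, because listing a directory can itself update its access time.

```python
import os
import time


def stale_empty_dirs(root, max_age_days, now=None):
    """Return empty directories below root not accessed for max_age_days days.

    `now` (seconds since the epoch) can be passed in for testing; it defaults
    to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    hits = []
    stack = [root]
    while stack:
        path = stack.pop()
        # Read the access time first: scanning the directory may update it.
        atime = os.stat(path).st_atime
        with os.scandir(path) as it:
            entries = list(it)
        if not entries:
            if path != root and atime < cutoff:
                hits.append(path)
            continue
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
    return sorted(hits)
```

On a server that uses neither tablespaces nor prepared transactions, pg_tblspc and pg_twophase would be expected to show up in this list, which fits the theory above.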