could not create lock file postmaster.pid: No such file or directory, but file does exist
Hi,
This is my first post to this list, so I hope I am posting it to the correct lists. But I am really stuck and getting pretty desperate at the moment.
This weekend my database crashed while importing some OpenStreetMap data and I can't get it back to work again. It happened before and normally I would reset the WAL-dir with the pg_resetxlog command. I would lose some data but that would be all.
This time it is somehow different because it doesn't recognize any of the important files anymore. For example when I try to start PostgreSQL again with the command:
/usr/lib/postgresql/9.1/bin/pg_ctl -D OSM/ start
I get the following error:
FATAL: could not create lock file "postmaster.pid": No such file or directory
But when I do an ls -l on the directory I can see the file exists.
drwx------ 0 postgres postgres 0 Jan 24 10:07 backup
drwx------ 0 postgres postgres 0 Feb 14 11:10 base
drwx------ 0 postgres postgres 0 Feb 17 09:46 global
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_clog
-rwxr-xr-x 0 postgres postgres 4476 Oct 11 10:49 pg_hba.conf
-rwxr-xr-x 0 postgres postgres 1636 Oct 11 10:49 pg_ident.conf
drwx------ 0 postgres postgres 0 Feb 17 11:29 pg_log
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_multixact
drwx------ 0 postgres postgres 0 Feb 17 08:58 pg_notify
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_serial
drwx------ 0 postgres postgres 0 Feb 12 09:58 pg_stat_tmp
drwx------ 0 postgres postgres 0 Feb 14 09:01 pg_subtrans
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_tblspc
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_twophase
-rwxr-xr-x 0 postgres postgres 4 Oct 11 10:49 PG_VERSION
drwx------ 0 postgres postgres 0 Feb 14 13:37 pg_xlog
-rwxr-xr-x 0 postgres postgres 19168 Oct 11 11:41 postgresql.conf
-rwxr-xr-x 0 postgres postgres 121 Feb 17 08:57 postmaster.opts
-rwxr-xr-x 0 postgres postgres 88 Feb 17 08:58 postmaster.pid
I cannot perform any action on the postmaster.pid file. I tried cp, mv and rm, but nothing works. Is there anything I can do to make the system recognize this file again? And get my database up and running? Or is all hopelessly lost?
I have PostgreSQL 9.1 installed on Ubuntu 12.04.
Kind regards,
Rob.
Rob Goethals wrote:
This is my first post to this list, so I hope I am posting it to the correct lists. But I am really
stuck and getting pretty desperate at the moment.
You should not post to more than one list.
This weekend my database crashed while importing some OpenStreetMap data and I can't get it back to
work again. It happened before and normally I would reset the WAL-dir with the pg_resetxlog command. I
would lose some data but that would be all.
That is not a good idea. PostgreSQL should recover from a crash automatically.
If you run pg_resetxlog your database cluster is damaged, and all you should
do is pg_dump all the data you can, run initdb and import the data.
This time it is somehow different because it doesn't recognize any of the important files anymore. For
example when I try to start PostgreSQL again with the command:
/usr/lib/postgresql/9.1/bin/pg_ctl -D OSM/ start
I get the following error:
FATAL: could not create lock file "postmaster.pid": No such file or directory
But when I do an ls -l on the directory I can see the file exists.
[...]
-rwxr-xr-x 0 postgres postgres 88 Feb 17 08:58 postmaster.pid
I cannot perform any action on the postmaster.pid file. I tried cp, mv and rm, but nothing works. Is
there anything I can do to make the system recognize this file again? And get my database up and
running? Or is all hopelessly lost?
I have PostgreSQL 9.1 installed on Ubuntu 12.04.
What is the error message you get for cp, mv or rm?
Can you describe the crash of your machine in greater detail?
What was the cause?
One wild guess: could it be that the OS automatically remounted the file system
read-only because it encountered a problem? Check your /var/log/messages (I hope
the location is the same on Ubuntu and on RHEL).
In that case unmount, fsck and remount should solve the problem.
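A minimal sketch of how the read-only guess could be checked on Linux, assuming the mounts are listed in /proc/mounts in the usual "device mountpoint fstype options ..." layout. The function names and the sample paths (/mnt/osm, //nas/share) are hypothetical and not taken from this thread; mount points containing spaces are escaped in /proc/mounts and are not handled here.

```python
import os

def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        entries.append((mount_point, fstype, options.split(",")))
    return entries

def mount_for_path(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = os.path.abspath(path)
    best = None
    for entry in entries:
        mount_point = entry[0]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" prefers later entries when mount points are equally long
            if best is None or len(mount_point) >= len(best[0]):
                best = entry
    return best

def is_read_only(path, mounts_text):
    """True if the file system holding path is mounted with the 'ro' option."""
    entry = mount_for_path(path, parse_mounts(mounts_text))
    return entry is not None and "ro" in entry[2]

# Hypothetical sample; on a real system use open("/proc/mounts").read()
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
//nas/share /mnt/osm cifs ro,relatime 0 0
"""

if __name__ == "__main__":
    print(is_read_only("/mnt/osm/OSM", SAMPLE))
```

The same lookup also reports the file system type (the second element of the entry), which shows whether the data directory sits on a network mount.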
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-----Original message-----
From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
Sent: Monday 17 February 2014 14:22
To: Rob Goethals
Subject: RE: could not create lock file postmaster.pid: No such file or directory, but file does exist
Dear Rob,
you should send your reply to the list.
This way
a) people know that your problem is solved and won't spend their time trying
to help you.
b) others can benefit from the information.
OK, clear. I hereby send this reply also to the list.
This weekend my database crashed while importing some
OpenStreetMap data and I can't get it back to work again. It happened
before and normally I would reset the WAL-dir with the pg_resetxlog
command. I would lose some data but that would be all.
That is not a good idea. PostgreSQL should recover from a crash
automatically.
If you run pg_resetxlog your database cluster is damaged, and all you
should do is pg_dump all the data you can, run initdb and import the data.
But what if PostgreSQL doesn't recover automatically? When my database
crashes and I try to restart it, I usually get a message like:
LOG: could not open file "pg_xlog/0000000100000114000000D2" (log file
276, segment 210): No such file or directory
LOG: invalid primary checkpoint record
LOG: invalid secondary checkpoint link in control file
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 3604) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
Interesting.
How did you get PostgreSQL into this state? Did you set fsync=off or similar?
Which storage did you put pg_xlog on?
I am adding OSM-changefiles to my database with the command:
osm2pgsql --append --database $database --username $user --slim --cache 3000 --number-processes 6 --style /usr/share/osm2pgsql/default.style --extra-attributes changes.osc.gz
At the moment of the crash the postgresql-log says:
2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was terminated by signal 6: Aborted
2014-02-15 00:49:04 CET LOG: terminating any other active server processes
2014-02-15 00:49:04 CET [unknown] WARNING: terminating connection because of crash of another server process
2014-02-15 00:49:04 CET [unknown] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
So what exactly is happening, I don't know.
When it is trying to startup again this is the logfile output:
2014-02-15 00:49:08 CET LOG: could not open temporary statistics file "global/pgstat.tmp": Input/output error
2014-02-15 00:49:14 CET LOG: all server processes terminated; reinitializing
2014-02-15 00:49:17 CET LOG: database system was interrupted; last known up at 2014-02-15 00:32:01 CET
2014-02-15 00:49:33 CET [unknown] [unknown]LOG: connection received: host=[local]
2014-02-15 00:49:33 CET [unknown] FATAL: the database system is in recovery mode
2014-02-15 00:49:56 CET LOG: database system was not properly shut down; automatic recovery in progress
2014-02-15 00:49:57 CET [unknown] [unknown]LOG: connection received: host=[local]
2014-02-15 00:49:57 CET [unknown] FATAL: the database system is in recovery mode
2014-02-15 00:50:01 CET LOG: redo starts at 114/C8B27330
2014-02-15 00:50:02 CET LOG: could not open file "pg_xlog/0000000100000114000000CB" (log file 276, segment 203): No such file or directory
2014-02-15 00:50:02 CET LOG: redo done at 114/CAFFFF80
2014-02-15 00:50:02 CET LOG: checkpoint starting: end-of-recovery immediate
2014-02-15 00:50:05 CET PANIC: could not create file "pg_xlog/xlogtemp.5390": Input/output error
2014-02-15 00:50:22 CET [unknown] [unknown]LOG: connection received: host=[local]
2014-02-15 00:50:22 CET [unknown] FATAL: the database system is in recovery mode
2014-02-15 00:50:23 CET LOG: startup process (PID 5390) was terminated by signal 6: Aborted
2014-02-15 00:50:23 CET LOG: aborting startup due to startup process failure
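The file names and log locations in the output above can be decoded by hand. The sketch below assumes the 9.1 naming scheme of three 8-digit hexadecimal fields (timeline, log file, segment) and the default 16 MB WAL segment size; the function names are made up for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments

def parse_wal_file_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))

def segment_for_location(location):
    """Map a log location such as '114/C8B27330' to (log, segment)."""
    log_hex, offset_hex = location.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // WAL_SEGMENT_SIZE

if __name__ == "__main__":
    print(parse_wal_file_name("0000000100000114000000CB"))  # (1, 276, 203)
    print(segment_for_location("114/C8B27330"))             # (276, 200)
    print(segment_for_location("114/CAFFFF80"))             # (276, 202)
```

Read this way, redo ran from segment 200 through segment 202 of log file 276, and segment 203 (CB) is the file the server then reports as missing.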
Furthermore I checked my conf-file and my fsync is indeed set to off.
I mounted a directory on an NTFS network disk (because of the available size and considering the amount of OSM data is pretty big). This is where I put all my database data, so also the pg_xlog.
Is there a better procedure to follow when something like this
happens? I am fairly new at the whole PostgreSQL thing so I am very
willing to learn all about it any way I can from experienced users. I
am googling my way round the internet to try and solve all the
questions I have, but as with many things there is usually more
than one answer to a problem and for me it is very hard to figure out what is the
best solution.
No, in that case I would restore from a backup.
One wild guess: could it be that the OS automatically remounted the
file system read-only because it encountered a problem? Check your
/var/log/messages (I hope the location is the same on Ubuntu and on RHEL).
In that case unmount, fsck and remount should solve the problem.
I am impressed. Your wild guess did exactly the trick. Manually
unmounting, checking and remounting was all it needed. Thank you very much!!
That would suggest that you have a hardware problem with your storage.
It may be that your file system is corrupted. Did you fsck it?
The fsck didn't work as it was mounted as cifs. So I guess I should let Windows do the checking.
Yours,
Laurenz Albe
On 17 February 2014 14:42, Rob Goethals / SNP <Rob.Goethals@snp.nl> wrote:
2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was terminated by signal 6: Aborted
Signal 6 is usually caused by hardware issues.
Then again, you also say:
I mounted a directory on an NTFS network disk (because of the available size and considering the
amount of OSM data is pretty big). This is where I put all my database data, so also the pg_xlog.
That will cause problems as well. SMBFS does not support all the
necessary file flags, locks and such that the database needs to
operate on those files in a safe way. That's probably worse than
running with sciss... ehr... fsync=off
Alban Hertroys.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Rob Goethals / SNP <Rob.Goethals@snp.nl> writes:
When it is trying to startup again this is the logfile output:
...
2014-02-15 00:50:05 CET PANIC: could not create file "pg_xlog/xlogtemp.5390": Input/output error
The above PANIC is the reason for the abort that happens immediately
thereafter.
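As a side note on the "terminated by signal 6: Aborted" lines: on Linux, signal 6 is SIGABRT, the signal a process receives when it calls abort(). The sketch below only illustrates that mapping with a throwaway child process; it is not PostgreSQL code, and the helper names are invented for this example.

```python
import signal
import subprocess
import sys

def run_and_abort():
    """Start a child Python process that calls os.abort(); return its exit status."""
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return child.returncode

def exit_signal(returncode):
    """Return the signal that killed a child, or None if it exited normally.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    return signal.Signals(-returncode) if returncode < 0 else None

if __name__ == "__main__":
    killed_by = exit_signal(run_and_abort())
    print(killed_by, int(killed_by))
```

So a "signal 6" entry in the log says how the process ended, and the PANIC line before it says why.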
On local storage I'd think this meant disk hardware problems, but since
you say you've got the database on an NTFS volume, what it more likely
means is that there's a bug in the kernel's NTFS support. Anyway, it's
fruitless to try to get Postgres going again until you have a stable
filesystem underneath it.
Generally speaking, longtime Postgres users are very suspicious of running
Postgres atop any kind of networked filesystem. We find that network
filesystems are invariably less stable than local ones. NTFS seems likely
to be a particularly unfortunate choice from this standpoint, as you get
to benefit from Windows' bugs along with Linux's.
regards, tom lane
Rob Goethals wrote:
OK, clear. I hereby send this reply also to the list.
Cool.
Interesting.
How did you get PostgreSQL into this state? Did you set fsync=off or similar?
Which storage did you put pg_xlog on?
2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was terminated by signal 6: Aborted
Ouch.
Furthermore I checked my conf-file and my fsync is indeed set to off.
Well, that is one reason why crash recovery is not working.
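A small sketch of how the fsync setting could be read out of postgresql.conf, assuming the plain "name = value" layout with "#" comments and that the last assignment in the file wins. It ignores include directives and quoted values containing "#", and the sample text is a hypothetical excerpt, not the poster's file.

```python
import re

# name, then "=" or whitespace, then the value
_SETTING = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(?:=\s*|\s+)(.*?)\s*$")

def read_setting(conf_text, name, default=None):
    """Return the last value assigned to `name`, or `default` if it is not set."""
    value = default
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        match = _SETTING.match(line)
        if match and match.group(1).lower() == name.lower():
            value = match.group(2).strip("'")
    return value

def is_enabled(value):
    """Interpret the common spellings of a boolean setting."""
    return str(value).lower() in ("on", "true", "yes", "1")

# Hypothetical excerpt for illustration
CONF = """\
fsync = off            # turns forced synchronization on or off
#fsync = on
"""

if __name__ == "__main__":
    print(read_setting(CONF, "fsync"), is_enabled(read_setting(CONF, "fsync")))
```

On a running server, SHOW fsync; in psql gives the effective value without any parsing.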
I mounted a directory on an NTFS network disk (because of the available size and considering the amount
of OSM data is pretty big). This is where I put all my database data, so also the pg_xlog.
Double ouch.
CIFS is not a supported file system.
At least that explains your problems.
Try with a local file system or NFS with hard foreground mount.
Yours,
Laurenz Albe
OK, it is clear to me that I didn't make the best choices setting up this database. :(
I am happy I found this list because I am learning a lot in a very short period of time. :) Thank you all for your tips and comments.
I will definitely move the database to a Linux system and set fsync to on. I hope this will give me a more stable environment. Furthermore I'll dive into the whole database backup subject so that next time I'll have something to restore if things go wrong.
Rob Goethals.
-----Original message-----
From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
Sent: Monday 17 February 2014 16:20
To: Rob Goethals
CC: 'pgsql-general@postgresql.org'
Subject: RE: could not create lock file postmaster.pid: No such file or directory, but file does exist
Rob Goethals wrote:
[...]
You don't give a lot of information, but try "sudo rm postmaster.pid" or
"sudo -u postgres rm postmaster.pid" if you are sure that postgres is not
running.
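One way to check the "not running" condition before removing the file is sketched below. It relies on the first line of postmaster.pid holding the postmaster's process ID and on POSIX signal 0 semantics; the function names are invented for this example. A True result only means that some process has that PID, since PIDs can be reused by unrelated processes.

```python
import os

def read_postmaster_pid(pidfile_text):
    """Return the process ID stored on the first line of postmaster.pid."""
    lines = pidfile_text.splitlines()
    if not lines:
        raise ValueError("empty pid file")
    return int(lines[0].strip())

def process_exists(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

if __name__ == "__main__":
    # Hypothetical path; adjust to the actual data directory
    with open("OSM/postmaster.pid") as handle:
        pid = read_postmaster_pid(handle.read())
    print(pid, process_exists(pid))
```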
Cheers,
Cliff
On Tue, Feb 18, 2014 at 12:07 AM, Rob Goethals / SNP <Rob.Goethals@snp.nl> wrote:
[...]