HINT: Perhaps out of disk space?
I'm investigating a problem that happened last night and I would
appreciate any recommendations. The logs indicate that the disks were
full, but I truly doubt that since we only use about 14GB out of the
available 65GB.
I found entries like this in the logs:
ERROR: could not write block 2354 of temporary file: No space left on device
HINT: Perhaps out of disk space?
....
ERROR: could not extend relation "parent_table": No space left on device
HINT: Check free disk space.
....
LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device
According to the logs, the problem went away after a reboot. I wonder
if the kernel or the RAID device got confused and postgres was simply
echoing what it was told. We run a couple hundred postgres servers and
we have not seen this before (except when the disks truly were full).
Everything is in the root filesystem, which has plenty of room.
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       67756724 14344392  49970408  23% /
tmpfs            1034768        0   1034768   0% /dev/shm
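As a sanity check on the df figures above, the arithmetic can be redone by hand. This is only a sketch: the function name `summarize_df` is made up for illustration, it assumes the percentage is rounded up the way GNU df does, and it assumes the gap between the total and used + available is space reserved by the filesystem (typically for root), which the output itself does not confirm.

```python
import math

def summarize_df(total_kb, used_kb, avail_kb):
    """Derive df-style figures from 1K-block counts (hypothetical helper)."""
    # Blocks that are neither used nor available to ordinary users.
    # Assumption: this is the filesystem's reserved space.
    reserved_kb = total_kb - used_kb - avail_kb
    # Assumption: Use% is used / (used + available), rounded up.
    use_pct = math.ceil(100 * used_kb / (used_kb + avail_kb))
    return {
        "reserved_kb": reserved_kb,
        "use_pct": use_pct,
        "avail_gib": avail_kb / 1024 ** 2,
    }

# Figures copied from the /dev/sda1 line of the df output.
root = summarize_df(67756724, 14344392, 49970408)
print(root)
```

On these numbers the root filesystem has a little under 48 GiB available to a non-root process, which is the headroom a runaway temporary file would have had to consume.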
PostgreSQL 7.4.7 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.5 (Debian 1:3.3.5-12)
Debian Sarge with Linux kernel 2.4.27-2-686-smp
Dell PowerEdge 1800
Dell MegaRAID PERC 4/DC RAID Controller, 128MB cache w/BBU
2x SEAGATE Cheetah 10K.7 ST373207LC in RAID 1 (mirroring)
Folks are a little jittery because our customers do very heavy
business this month and we don't want frantic support calls when we
should be drinking eggnog.
-Mike
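On the idea that postgres was "simply echoing what it was told": the text "No space left on device" is the operating system's standard message for the ENOSPC error code, so any failing call that sets ENOSPC produces the same wording, whatever the underlying cause. The sketch below only illustrates that mapping. The helper `describe` is hypothetical, and the exact message string depends on the platform's C library.

```python
import errno
import os

def describe(exc):
    """Format an OSError roughly the way a server log might (hypothetical helper)."""
    name = errno.errorcode.get(exc.errno, "?")
    return "%s (errno %d, %s)" % (os.strerror(exc.errno), exc.errno, name)

# Build the same error a failed write() would raise on a full filesystem.
err = OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
print(describe(err))
```

Calls unrelated to file writes can set the same code. For example, the shmget(2) manual pages document ENOSPC for exhausted shared memory limits, so the message alone does not prove the disk was full.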
Michael Adler <adler@pobox.com> writes:
I'm investigating a problem that happened last night and I would
appreciate any recommendations. The logs indicate that the disks were
full, but I truly doubt that since we only use about 14GB out of the
available 65GB.
I found entries like this in the logs:
ERROR: could not write block 2354 of temporary file: No space left on device
HINT: Perhaps out of disk space?
....
ERROR: could not extend relation "parent_table": No space left on device
HINT: Check free disk space.
....
LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device
According to the logs, the problem went away after a reboot. I wonder
if the kernel or the RAID device got confused and postgres was simply
echoing what it was told. We run a couple hundred postgres servers and
we have not seen this before (except when the disks truly were full).
I'm inclined to think that a query created a 50GB temporary file ...
the postmaster cleans out temp files when restarted, so that would
have destroyed the evidence.
regards, tom lane
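Since a restart removes the temp files, one way to keep the evidence next time is to record temp-file usage periodically, for example from cron. The following is a minimal sketch, not a tested tool. It assumes temporary files live in directories named pgsql_tmp somewhere under the data directory (base/<database oid>/pgsql_tmp in releases of this era), and `temp_file_usage` is a made-up name.

```python
import os

def temp_file_usage(data_dir):
    """Return {pgsql_tmp directory: total bytes of files in it} under data_dir."""
    usage = {}
    for dirpath, dirnames, filenames in os.walk(data_dir):
        # Assumption: temp files are kept in directories named "pgsql_tmp".
        if os.path.basename(dirpath) != "pgsql_tmp":
            continue
        total = 0
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # A temp file can be deleted between listing and stat.
                pass
        usage[dirpath] = total
    return usage
```

Calling `temp_file_usage("/var/lib/postgres/data")` every minute or so and appending the result to a log with a timestamp would show whether a single query really grew its temp files to tens of gigabytes before the errors started.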
On Fri, Dec 23, 2005 at 11:36:54AM -0500, Tom Lane wrote:
Michael Adler <adler@pobox.com> writes:
I'm investigating a problem that happened last night and I would
appreciate any recommendations. The logs indicate that the disks were
full, but I truly doubt that since we only use about 14GB out of the
available 65GB.
I found entries like this in the logs:
ERROR: could not write block 2354 of temporary file: No space left on device
HINT: Perhaps out of disk space?
....
ERROR: could not extend relation "parent_table": No space left on device
HINT: Check free disk space.
....
LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device
According to the logs, the problem went away after a reboot. I wonder
if the kernel or the RAID device got confused and postgres was simply
echoing what it was told. We run a couple hundred postgres servers and
we have not seen this before (except when the disks truly were full).
I'm inclined to think that a query created a 50GB temporary file ...
the postmaster cleans out temp files when restarted, so that would
have destroyed the evidence.
I'm curious about what could have resulted in so much temporary
storage for a database that fits entirely in 2.5GB of space. I can
imagine taking the largest table and joining it against itself many
times without a WHERE clause. What else would use a lot of temp
storage?
How long would it take to clean out 50GB of temp files? It looks like
the postmaster was able to start up instantly after the reboot (ready
less than 1 second after "LOG: database system was shut down at...")
I really appreciate any guidance you could offer.
-Mike
On Fri, 23 Dec 2005 13:42:13 -0500, Michael Adler wrote:
On Fri, Dec 23, 2005 at 11:36:54AM -0500, Tom Lane wrote:
Michael Adler <adler@pobox.com> writes:
I'm investigating a problem that happened last night and I would
appreciate any recommendations. The logs indicate that the disks were
full, but I truly doubt that since we only use about 14GB out of the
available 65GB.
I found entries like this in the logs:
ERROR: could not write block 2354 of temporary file: No space left on device
HINT: Perhaps out of disk space?
....
ERROR: could not extend relation "parent_table": No space left on device
HINT: Check free disk space.
....
LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device
According to the logs, the problem went away after a reboot. I wonder
if the kernel or the RAID device got confused and postgres was simply
echoing what it was told. We run a couple hundred postgres servers and
we have not seen this before (except when the disks truly were full).
I really appreciate any guidance you could offer.
Are there any errors about running out of shared memory? I have seen the
"No space left on device" error for that on FreeBSD before.