Could not read directory "pg_xlog": Invalid argument (on SSD Raid)
I'm frequently getting these errors in my console:
4/11/09 2:25:04 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:25:56 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:36:03 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
and rarely:
3/11/09 10:32:31 PM org.postgresql.postgres[217] ERROR: could not
read directory "pg_clog": Invalid argument
It is clearly not failing all the time, as the pg_xlog directory is full of files
that keep being touched and updated. I have not experienced data loss
(yet), but large queries are taking orders of magnitude longer than I would
like.
System:
Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in
64-bit mode
Postgres 8.4.1 (Intel 64 bit) from
http://www.kyngchaos.com/software:postgres
( I have also tried compiling from source - I have the same problems
plus a few extra installation issues. The "official" postgresql binary from
http://www.enterprisedb.com/ is not 64 bit)
The postgres data directory is on an SSD RAID 0 array. It can sustain
around 10K random read I/Os per second, or 5K random write I/Os per second,
in other applications. pg_xlog and pg_clog are on the same SSD RAID array as
the postgres DB.
Under postgres it does several thousand I/Os per second for about 1-2
seconds, then drops back to only about 50 I/Os per second for about 10
seconds, before repeating the cycle. CPU is usually only a couple %
occupied. The console often records an error message "pg_xlog": Invalid
argument during those infrequent activity bursts.
I've looked at the source code in src/port/dirmod.c:
    pgfnames(const char *path)
    {
        ....
        while ((file = readdir(dir)) != NULL)
        {
            ....
            errno = 0;
        }
        ....
        if (errno)
        {
            ....
            fprintf(stderr, _("could not read directory \"%s\": %s\n"),
                    path, strerror(errno));
            ....
        }
So it seems that readdir() is occasionally failing with errno set to EINVAL
("Invalid argument"). But I do not understand how this error could possibly
occur in this location.
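For reference, here is a minimal, self-contained sketch of the errno idiom that the excerpt above relies on. readdir() returns NULL both at end-of-directory and on failure, so the only way to tell the two apart is to clear errno before each call and check it afterwards. The function name count_dir_entries is hypothetical and is not part of PostgreSQL; this only illustrates the pattern, it is not the code from dirmod.c.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from PostgreSQL): count the entries in "path".
 * Returns the number of entries seen (including "." and ".." where the
 * filesystem reports them), or -1 if the directory could not be opened
 * or readdir() reported an error part-way through.
 */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *file;
    int count = 0;
    int saved_errno;

    if (dir == NULL)
    {
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }

    /* Clear errno so a NULL return can be classified afterwards. */
    errno = 0;
    while ((file = readdir(dir)) != NULL)
    {
        count++;
        /* Reset before the next readdir() call. */
        errno = 0;
    }

    /* NULL with errno still 0 means end-of-directory; otherwise an error. */
    saved_errno = errno;
    closedir(dir);
    if (saved_errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(saved_errno));
        return -1;
    }
    return count;
}
```

With this pattern, an "Invalid argument" message means readdir() itself returned NULL and left errno at EINVAL; the loop body cannot be the source, since errno is reset at the end of every iteration.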
I've searched for "pg_xlog": Invalid argument, and the only other mention I
have found was on Linux running on a ram disk.
Could this be a race condition? Suggestions?
Stephen
Data Growth Pty Ltd <datagrowth@gmail.com> writes:
> I'm frequently getting these errors in my console:
> 4/11/09 2:25:04 PM org.postgresql.postgres[192] ERROR: could not read
> directory "pg_xlog": Invalid argument
> Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in
> 64-bit mode
This is a known bug in Snow Leopard --- readdir() calls fail after
having deleted a file in the directory. We are hoping that Apple fixes
it in 10.6.2, because trying to kluge around it seems like a mess.
Your example is actually in a different place from the known case in
DROP TABLESPACE, which just reinforces that trying to avoid the bug at
the application level would be difficult.
I'm running 10.6 myself on my laptop, but I think it ought to be
regarded as not quite stable enough for production servers yet :-(
regards, tom lane
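
The failure mode described in the reply (readdir() failing after a file in the directory has been deleted) can be exercised with a sketch along the following lines. This is an assumption-laden illustration, not the test case from any bug report and not PostgreSQL code: the function name scan_while_deleting, the file naming scheme, and the choice to delete every entry during the scan are all hypothetical. It uses only standard POSIX calls (mkdir, opendir, readdir, unlink, rmdir).

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical test helper (not from PostgreSQL): create "nfiles" empty
 * files in a new directory "dirpath", then unlink each file as readdir()
 * returns it. Returns 0 if the scan reached end-of-directory cleanly,
 * the errno value if readdir() failed, or -1 if setup failed.
 */
int scan_while_deleting(const char *dirpath, int nfiles)
{
    char name[1024];
    DIR *dir;
    struct dirent *ent;
    int i;
    int result;

    if (mkdir(dirpath, 0700) != 0)
        return -1;

    for (i = 0; i < nfiles; i++)
    {
        FILE *f;

        snprintf(name, sizeof(name), "%s/file%d", dirpath, i);
        f = fopen(name, "w");
        if (f == NULL)
            return -1;
        fclose(f);
    }

    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    errno = 0;
    while ((ent = readdir(dir)) != NULL)
    {
        /* Delete every real entry while the scan is still in progress. */
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
        {
            snprintf(name, sizeof(name), "%s/%s", dirpath, ent->d_name);
            unlink(name);
        }
        errno = 0;
    }
    result = errno;     /* 0 at a clean end-of-directory */

    closedir(dir);
    rmdir(dirpath);     /* best effort; fails if any file was skipped */
    return result;
}
```

For example, scan_while_deleting("/tmp/readdir_test", 200) should return 0 on a system where readdir() copes with concurrent deletions. A nonzero return only shows that readdir() reported an error on that system; whether it matches the Snow Leopard bug discussed here would still need to be confirmed separately.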