Could not read directory "pg_xlog": Invalid argument (on SSD Raid)

Started by Data Growth Pty Ltd, over 16 years ago, 2 messages, general
#1Data Growth Pty Ltd
datagrowth@gmail.com

I'm frequently getting these errors in my console:

4/11/09 2:25:04 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:25:56 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:36:03 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument

and rarely:

3/11/09 10:32:31 PM org.postgresql.postgres[217] ERROR: could not
read directory "pg_clog": Invalid argument

It is clearly not failing all the time, as the pg_xlog directory is full of files
that keep being touched and updated. I have not experienced data loss
(yet), but large queries are taking orders of magnitude longer than I would
like.

System:

Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in
64-bit mode

Postgres 8.4.1 (Intel 64 bit) from
http://www.kyngchaos.com/software:postgres
(I have also tried compiling from source - I have the same problems
plus a few extra installation issues. The "official" postgresql binary from
http://www.enterprisedb.com/ is not 64 bit)

The postgres data directory is on an SSD RAID 0 array. In other applications
it can sustain around 10K random read I/Os per second, or 5K random write
I/Os per second. pg_xlog and pg_clog are on the same SSD RAID array as
the postgres DB.

Under postgres it does several thousand I/Os per second for about 1-2
seconds, then drops back to only about 50 I/Os per second for about 10
seconds, before repeating the cycle. CPU is usually only a couple of percent
occupied. The console often records an error message "pg_xlog": Invalid
argument during those infrequent activity bursts.

I've looked at the source code in src/port/dirmod.c:

pgfnames(const char *path)
{
    ....
    while ((file = readdir(dir)) != NULL)
    {
        ....
        errno = 0;
    }
    ....
    if (errno)
    {
        ....
        fprintf(stderr, _("could not read directory \"%s\": %s\n"),
                path, strerror(errno));
        ....
    }
    ....
}

So it seems that readdir is returning "Invalid argument" occasionally. But
I do not understand how this error could possibly occur in this location.
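
For comparison, the usual POSIX idiom for telling a readdir() failure apart
from a normal end of directory can be sketched as below. This is only a
minimal illustration, not the PostgreSQL source: count_dir_entries and its
err_out parameter are hypothetical names made up for this example. The point
is that readdir() returns NULL both at end of directory and on error, and
only the error case sets errno, so errno has to be cleared before each call.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): count the entries in "path",
 * skipping "." and "..".  Returns the count, or -1 on failure with the
 * errno value of the failing call stored in *err_out.
 */
int
count_dir_entries(const char *path, int *err_out)
{
    DIR           *dir;
    struct dirent *file;
    int            count = 0;

    assert(path != NULL && err_out != NULL);
    *err_out = 0;

    dir = opendir(path);
    if (dir == NULL)
    {
        *err_out = errno;
        return -1;
    }

    for (;;)
    {
        /* readdir() does not touch errno at end of directory, so reset it */
        errno = 0;
        file = readdir(dir);
        if (file == NULL)
            break;
        if (strcmp(file->d_name, ".") != 0 && strcmp(file->d_name, "..") != 0)
            count++;
    }

    if (errno != 0)
    {
        /* save errno first: fprintf() and closedir() may overwrite it */
        *err_out = errno;
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(*err_out));
        closedir(dir);
        return -1;
    }

    closedir(dir);
    return count;
}
```

Called as count_dir_entries("pg_xlog", &err) from inside the data directory,
a helper like this would show whether readdir() itself reports EINVAL outside
the server, assuming the failure can be reproduced that way.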

I've searched for "pg_xlog": Invalid argument, and the only other mention I
have found was on Linux running on a ram disk.

Could this be a race condition? Suggestions?

Stephen

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Data Growth Pty Ltd (#1)
Re: Could not read directory "pg_xlog": Invalid argument (on SSD Raid)

Data Growth Pty Ltd <datagrowth@gmail.com> writes:

I'm frequently getting these errors in my console:
4/11/09 2:25:04 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in
64-bit mode

This is a known bug in Snow Leopard --- readdir() calls fail after
having deleted a file in the directory. We are hoping that Apple fixes
it in 10.6.2, because trying to kluge around it seems like a mess.
Your example is actually in a different place from the known case in
DROP TABLESPACE, which just reinforces that trying to avoid the bug at
the application level would be difficult.
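
The access pattern described above (deleting files from a directory while a
readdir() scan of it is still in progress) can be exercised with a small
standalone program along the following lines. This is a rough sketch under
stated assumptions, not the test case used by the PostgreSQL developers:
scan_while_deleting, the /tmp location, and the file naming are all
hypothetical, and the thread does not confirm that this exact sequence
triggers the Snow Leopard failure. On a system without the bug the function
is expected to return 0.

```c
#define _XOPEN_SOURCE 700       /* for mkdtemp() under strict compilers */

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical test helper: create "nfiles" empty files in a fresh
 * temporary directory, then unlink each one as readdir() returns it.
 * Returns 0 if the scan ended cleanly, otherwise the errno value
 * reported by the failing call.
 */
int
scan_while_deleting(int nfiles)
{
    char           dirpath[] = "/tmp/readdir_test_XXXXXX";
    char           filepath[PATH_MAX];
    DIR           *dir;
    struct dirent *file;
    int            i, fd, result = 0;

    assert(nfiles >= 0);

    if (mkdtemp(dirpath) == NULL)
        return errno;

    for (i = 0; i < nfiles; i++)
    {
        snprintf(filepath, sizeof(filepath), "%s/f%06d", dirpath, i);
        fd = open(filepath, O_CREAT | O_WRONLY, 0600);
        if (fd < 0)
            return errno;       /* sketch only: no cleanup on this path */
        close(fd);
    }

    dir = opendir(dirpath);
    if (dir == NULL)
        return errno;

    for (;;)
    {
        errno = 0;              /* distinguish error from end of directory */
        file = readdir(dir);
        if (file == NULL)
        {
            result = errno;     /* stays 0 at a clean end of directory */
            break;
        }
        if (strcmp(file->d_name, ".") == 0 || strcmp(file->d_name, "..") == 0)
            continue;

        /* delete the entry while the directory scan is still open */
        snprintf(filepath, sizeof(filepath), "%s/%s", dirpath, file->d_name);
        if (unlink(filepath) != 0)
        {
            result = errno;
            break;
        }
    }

    closedir(dir);
    rmdir(dirpath);             /* best effort; fails if entries were skipped */
    return result;
}
```

A nonzero return value such as EINVAL from a call like
scan_while_deleting(100) would point at the operating system rather than at
PostgreSQL, which is consistent with the view that working around the problem
in the application would be difficult.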

I'm running 10.6 myself on my laptop, but I think it ought to be
regarded as not quite stable enough for production servers yet :-(

regards, tom lane