open_sync fails
Basic system setup:
Linux 2.4 kernel (heavily modified)
Dual core Athlon Opteron
4GB ECC RAM
SW RAID 10 configuration with eight 750 GB disks (using only 500 GB of
each disk), connected via an LSISAS1068-based card
While working on tuning my database, I was experimenting with changing
the wal_sync_method to try to find the optimal value. The really odd
thing is when I switch to open_sync (O_SYNC), Postgres immediately fails
and gives me an error message of:
2008-07-22 11:22:37 UTC 19411 akamai [local] PANIC: could not write to
log file 101, segment 40 at offset 12558336, length 2097152: No space left on device
Even running the test_fsync tool on this system gives me an error
message indicating O_SYNC isn't supported, and it promptly bails.
So I'm wondering what the heck is going on. I've found a bunch of posts
that indicate O_SYNC may provide some extra throughput, but nothing
indicating that O_SYNC doesn't work.
Can anybody provide me any pointers on this?
Thanks
--Rick
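One way to narrow this down outside of Postgres is a small standalone probe. The sketch below is only an approximation of what an open_sync WAL write looks like, not PostgreSQL's actual code: it preallocates a file of the assumed 16 MB segment size, reopens it with O_SYNC, and overwrites a region at the offset and length taken from the PANIC message. The function name `probe_o_sync` and the choice of directory are hypothetical; it assumes a Unix-like system where `os.O_SYNC` is defined.

```python
import errno
import os
import tempfile

SEGMENT_SIZE = 16 * 1024 * 1024  # assumed WAL segment size (16 MB)
OFFSET = 12558336                # offset from the PANIC message
WRITE_LEN = 2097152              # length from the PANIC message


def probe_o_sync(directory):
    """Preallocate a segment-sized file, then overwrite part of it via O_SYNC.

    Returns None if the write succeeds, otherwise a string naming the
    errno the kernel reported (or "short write").
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Fill the file first so the later write needs no new blocks.
        chunk = b"\0" * (1024 * 1024)
        for _ in range(SEGMENT_SIZE // len(chunk)):
            os.write(fd, chunk)
        os.fsync(fd)
        os.close(fd)
        fd = -1

        # Reopen with O_SYNC and rewrite existing contents only.
        fd = os.open(path, os.O_RDWR | os.O_SYNC)
        os.lseek(fd, OFFSET, os.SEEK_SET)
        written = os.write(fd, b"x" * WRITE_LEN)
        if written != WRITE_LEN:
            return "short write"
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        if fd >= 0:
            os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Point this at the filesystem that holds the WAL directory.
    print(probe_o_sync(tempfile.gettempdir()))
```

If this reports ENOSPC on a filesystem with plenty of free space, the problem is reproducible without Postgres, which points at the kernel or filesystem rather than the database.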
Rick Weber <riweber@akamai.com> writes:
Basic system setup:
Linux 2.4 kernel (heavily modified)
"Heavily modified" meaning what exactly?
Given that no one else has reported such a thing, and the obvious
bogosity of the errno code, I'd certainly first cast suspicion on the
kernel.
regards, tom lane
Rick Weber wrote:
While working on tuning my database, I was experimenting with changing
the wal_sync_method to try to find the optimal value. The really odd
thing is when I switch to open_sync (O_SYNC), Postgres immediately fails
and gives me an error message of:
2008-07-22 11:22:37 UTC 19411 akamai [local] PANIC: could not write to
log file 101, segment 40 at offset 12558336, length 2097152: No space left on device
Sounds like a kernel bug to me, particularly because the segment is most
likely already 16 MB in length; we're only rewriting the contents, not
enlarging it. Perhaps the kernel wanted to report a problem and chose
the wrong errno.
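The point about rewriting rather than enlarging can be checked with arithmetic on the numbers in the PANIC message. The sketch below assumes the segment is the 16 MB size mentioned above; the helper name `write_stays_within_segment` is made up for illustration.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed WAL segment size: 16777216 bytes


def write_stays_within_segment(offset, length, segment_size=SEGMENT_SIZE):
    """Return True if a write of `length` bytes at `offset` ends inside the segment."""
    return offset + length <= segment_size


offset = 12558336   # from the PANIC message
length = 2097152    # from the PANIC message (2 MB)

end = offset + length
print(end)                                         # 14655488
print(SEGMENT_SIZE - end)                          # 2121728 bytes to spare
print(write_stays_within_segment(offset, length))  # True
```

Under that assumption the write ends about 2 MB short of the end of the segment, so it should not require any new disk space if the file was already fully allocated.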
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Definitely believable. It gives me an internal avenue to chase down.
Thanks
--Rick