open_sync fails

Started by Rick Weber over 17 years ago, 4 messages, general
#1Rick Weber
riweber@akamai.com

Basic system setup:

Linux 2.4 kernel (heavily modified)
Dual core Athlon Opteron
4GB ECC RAM
SW RAID 10 configuration with eight 750 GB disks (using only 500 GB of each
disk) connected via an LSISAS1068-based card

While working on tuning my database, I was experimenting with changing
the wal_sync_method to try to find the optimal value. The really odd
thing is when I switch to open_sync (O_SYNC), Postgres immediately fails
and gives me an error message of:

2008-07-22 11:22:37 UTC 19411 akamai [local] PANIC: could not write to
log file 101, segment 40 at offset 12558336, length 2097152: No space left on device

Even running the test_fsync tool on this system gives me an error
message indicating O_SYNC isn't supported, and it promptly bails.
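
For anyone who wants to reproduce this outside of Postgres, here is a rough sketch of a standalone probe. It is not what test_fsync does internally. It only preallocates a scratch file, reopens it with O_SYNC, and overwrites a chunk in the middle without extending the file, roughly the access pattern of a WAL rewrite. The function name probe_o_sync and the sizes are made up for illustration, and the directory passed in is assumed to be on the filesystem under suspicion.

```python
import errno
import os
import tempfile


def probe_o_sync(directory, size=1024 * 1024, chunk=8192):
    """Overwrite part of a preallocated file through an O_SYNC descriptor.

    Returns None if the open and write succeed, otherwise a string
    naming the problem (for example 'ENOSPC' or 'EINVAL').
    Hypothetical helper, not part of PostgreSQL.
    """
    o_sync = getattr(os, "O_SYNC", None)
    if o_sync is None:
        return "O_SYNC not defined on this platform"

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Preallocate the file so the later write only rewrites existing bytes.
        os.write(fd, b"\0" * size)
        os.fsync(fd)
        os.close(fd)
        fd = -1

        try:
            sync_fd = os.open(path, os.O_WRONLY | o_sync)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))

        try:
            # Seek into the middle and overwrite; the file does not grow.
            os.lseek(sync_fd, size // 2, os.SEEK_SET)
            os.write(sync_fd, b"x" * chunk)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        finally:
            os.close(sync_fd)
        return None
    finally:
        if fd >= 0:
            os.close(fd)
        os.unlink(path)
```

If this returns an errno name such as ENOSPC on a filesystem with plenty of free space, that would point away from Postgres and toward the kernel or the storage driver.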

So I'm wondering what the heck is going on. I've found a bunch of posts
that indicate O_SYNC may provide some extra throughput, but nothing
indicating that O_SYNC doesn't work.

Can anybody provide me any pointers on this?

Thanks

--Rick

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rick Weber (#1)
Re: open_sync fails

Rick Weber <riweber@akamai.com> writes:

Basic system setup:
Linux 2.4 kernel (heavily modified)

"Heavily modified" meaning what exactly?

Given that no one else has reported such a thing, and the obvious
bogosity of the errno code, I'd certainly first cast suspicion on the
kernel.

regards, tom lane

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rick Weber (#1)
Re: open_sync fails

Rick Weber wrote:

While working on tuning my database, I was experimenting with changing
the wal_sync_method to try to find the optimal value. The really odd
thing is when I switch to open_sync (O_SYNC), Postgres immediately fails
and gives me an error message of:

2008-07-22 11:22:37 UTC 19411 akamai [local] PANIC: could not write to
log file 101, segment 40 at offset 12558336, length 2097152: No space left on device

Sounds like a kernel bug to me, particularly because the segment is most
likely already 16 MB in length; we're only rewriting the contents, not
enlarging it. Perhaps the kernel wanted to report a problem and chose
the wrong errno.
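
The numbers in the PANIC message can be checked against that reasoning. A small sketch, assuming the default 16 MB WAL segment size (the function name is only for illustration):

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def write_fits_in_segment(offset, length, segment_size=WAL_SEGMENT_SIZE):
    """Return True if a write of `length` bytes at `offset` ends inside the segment."""
    return offset + length <= segment_size


offset = 12558336   # from the PANIC message
length = 2097152    # from the PANIC message (exactly 2 MB)

print(offset + length)                        # 14655488
print(WAL_SEGMENT_SIZE)                       # 16777216
print(write_fits_in_segment(offset, length))  # True
```

The write ends at byte 14655488, well short of 16777216, so if the segment was already at full size the write would not need any new disk space.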

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#4Rick Weber
riweber@akamai.com
In reply to: Alvaro Herrera (#3)
Re: open_sync fails

Definitely believable. It gives me an internal avenue to chase down.

Thanks

--Rick
