O_DIRECT use
I have added this item to TODO:
* Consider use of open/fcntl(O_DIRECT) to minimize OS caching
A web search suggests it minimizes file system caching, which might
help for sequential scans:
http://archives2.us.postgresql.org/pgsql-hackers/2001-09/msg00713.php
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I have added this item to TODO:
* Consider use of open/fcntl(O_DIRECT) to minimize OS caching
Why exactly would we wish to minimize OS caching?
In my mind, Postgres has always relied heavily on the existence of a
layer of kernel caching. Disabling that will hurt far more than help.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I have added this item to TODO:
* Consider use of open/fcntl(O_DIRECT) to minimize OS caching
Why exactly would we wish to minimize OS caching?
In my mind, Postgres has always relied heavily on the existence of a
layer of kernel caching. Disabling that will hurt far more than help.
Not sure. Someone on IRC brought it up. If we are sequential scanning a
large table, caching may be bad because we are pushing out stuff already
in the cache that may be useful. It is related to this TODO item:
* Add free-behind capability for large sequential scans (Bruce)
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom Lane wrote:
Why exactly would we wish to minimize OS caching?
Not sure. Someone on IRC brought it up. If we are sequential scanning a
large table, caching may be bad because we are pushing out stuff already
in the cache that may be useful.
Yeah, but people normally try to set things up to avoid doing large
sequential scans, at least in all the contexts where they need high
performance. For index searches you definitely want all the caching
you can get.
For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom Lane wrote:
Why exactly would we wish to minimize OS caching?
Not sure. Someone on IRC brought it up. If we are sequential scanning a
large table, caching may be bad because we are pushing out stuff already
in the cache that may be useful.
Yeah, but people normally try to set things up to avoid doing large
sequential scans, at least in all the contexts where they need high
performance. For index searches you definitely want all the caching
you can get.
For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.
I am told on FreeBSD it does not disable read-ahead, just caching;
something that needs more research.
[2002-01-04 16:31] Bruce Momjian said:
| Not sure. Someone on IRC brought it up.
Is there a pg IRC channel? What is the server?
cheers.
brent
--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman
Bruce Momjian <pgman@candle.pha.pa.us> writes:
For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.
I am told on FreeBSD it does not disable read-ahead, just caching;
something that needs more research.
Hmm. I always thought of read-ahead as preloading buffer cache entries.
It'd be interesting to get a description of *exactly* what this flag
does, rather than handwavy approximations. Time to start reading the
kernel code, I suppose.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.
I am told on FreeBSD it does not disable read-ahead, just caching;
something that needs more research.
Hmm. I always thought of read-ahead as preloading buffer cache entries.
It'd be interesting to get a description of *exactly* what this flag
does, rather than handwavy approximations. Time to start reading the
kernel code, I suppose.
I found this before adding the item:
http://www.pairlist.net/pipermail/flow-tools/2001-October/000058.html
And this for FreeBSD 4.4:
2.1 Kernel Changes
The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this
flag for open files will attempt to minimize the cache effects of reading
and writing.
I also found:
http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
These latter ones seem to indicate there isn't read-ahead, meaning we
would have to do our own prefetches. Eck. I am unclear if that is true
on all OS's.
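For reference, the release note above says the flag is accepted by both
open(2) and fcntl(2). A minimal sketch of the fcntl(2) route follows. The
helper name set_direct_io is hypothetical, and whether the request is
honored depends on the kernel and the file system, so treat this as an
illustration under those assumptions rather than as tested backend code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc hides O_DIRECT without this */
#endif
#include <fcntl.h>

/*
 * Hypothetical helper: turn O_DIRECT on or off for an open descriptor.
 * Returns 0 on success, -1 if the flag is unavailable or the kernel or
 * file system refuses the change.
 */
int
set_direct_io(int fd, int enable)
{
#ifdef O_DIRECT
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    if (enable)
        flags |= O_DIRECT;
    else
        flags &= ~O_DIRECT;
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
#else
    (void) fd;
    (void) enable;
    return -1;                  /* this platform has no O_DIRECT */
#endif
}
```

Toggling the flag on an existing descriptor would let a caller enable it
only for the duration of a large scan and clear it afterwards, instead of
reopening the file.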
Brent Verner wrote:
[2002-01-04 16:31] Bruce Momjian said:
| Not sure. Someone on IRC brought it up.
Is there a pg IRC channel? What is the server?
See FAQ item 1.6.
Brent Verner wrote:
[2002-01-04 16:31] Bruce Momjian said:
| Not sure. Someone on IRC brought it up.
Is there a pg IRC channel? What is the server?
FAQ item text is:
<P>There is also an IRC channel on EFNet, channel
<I>#PostgreSQL.</I> I use the unix command <CODE>irc -c
'#PostgreSQL' "$USER" irc.phoenix.net.</CODE></P>
On Fri, 4 Jan 2002, Bruce Momjian wrote:
For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.
And this for FreeBSD 4.4:
The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this
flag for open files will attempt to minimize the cache effects of reading
and writing.
This seems rather vague. Can any FreeBSD person here say
whether the semantics are any stronger?
http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
These latter ones seem to indicate there isn't read-ahead, meaning we
would have to do our own prefetches. Eck. I am unclear if that is
true on all OS's.
The Linux O_DIRECT semantics are intended to be harder.
In essence, the kernel _will not cache_ data read from
or written to such a file or device.
The point of this, incidentally, was to be able to run
things like Oracle Parallel Server and other shared-
disk setups. Its use as an "I don't need this cached"
mechanism is secondary, and rather sub-optimal, as seen
here; you disable software read-ahead and introduce
coherence issues with non-O_DIRECT openers of the file.
(I'm not sure of the precise Linux semantics of this,
but it's probably fair to say that you may as well
consider them undefined.)
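To make the cost concrete, here is a rough sketch of what a sequential
read through an O_DIRECT descriptor might look like. On Linux, direct
I/O generally requires the buffer, offset, and length to be suitably
aligned, so the buffer comes from posix_memalign() rather than malloc().
The function name scan_file_uncached and the 4096-byte alignment and
64 kB chunk size are assumptions for illustration; the real alignment
requirement depends on the device and file system. With no kernel
read-ahead, each read() simply waits for the disk.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc hides O_DIRECT without this */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define SCAN_ALIGN 4096         /* assumed alignment; the real value varies */
#define SCAN_CHUNK (64 * 1024)  /* assumed read size, a multiple of SCAN_ALIGN */

/*
 * Hypothetical helper: read a whole file sequentially, bypassing the OS
 * cache where possible.  Returns the number of bytes read, or -1 on error.
 */
long
scan_file_uncached(const char *path)
{
    int     fd = -1;
    void   *buf = NULL;
    long    total = 0;
    ssize_t n;

#ifdef O_DIRECT
    fd = open(path, O_RDONLY | O_DIRECT);
#endif
    if (fd == -1)
        fd = open(path, O_RDONLY);  /* fall back to ordinary cached reads */
    if (fd == -1)
        return -1;

    /* Direct I/O generally needs an aligned buffer. */
    if (posix_memalign(&buf, SCAN_ALIGN, SCAN_CHUNK) != 0)
    {
        close(fd);
        return -1;
    }

    while ((n = read(fd, buf, SCAN_CHUNK)) > 0)
        total += n;             /* a real scan would process the block here */

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```

The fallback to a plain open() is a design choice for the sketch: some
file systems reject O_DIRECT outright, and a scan that runs cached is
presumably better than one that fails.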
Linux 2.4 has "madvise", but unfortunately no matching
"fadvise". A quick Google implied that FreeBSD is in
the same boat.
Matthew.