expanding our usage of POSIX_FADVISE
Hello,
I wonder whether POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL are still
considered unsuitable for PostgreSQL.
I found these earlier comments:
«A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress. Think about a complex query
that is doing both a seqscan and an indexscan on the same relation (a self-
join could easily do this). You'd really need to change this if you want
POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to get set usefully.
» (Tom Lane, 2003)
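Tom's point can be made concrete with a minimal sketch. It assumes Linux, where the RANDOM and SEQUENTIAL hints are recorded on the open file description rather than on the file itself. If each scan opened its own descriptor on the relation file, a seqscan and an indexscan in the same self-join could each keep their own hint instead of overwriting a shared one. `ScanFile`, `scan_file_open` and `scan_file_close` are hypothetical names for illustration, not part of PostgreSQL's smgr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical per-scan handle: one private descriptor per scan,
 * instead of one descriptor shared by every scan of the relation. */
typedef struct ScanFile
{
    int fd;      /* descriptor owned by this scan, -1 when closed */
    int advice;  /* POSIX_FADV_* hint applied to this descriptor */
} ScanFile;

/* Open the relation file for one scan and apply that scan's hint.
 * Returns 0 on success, -1 on failure (fd is left at -1). */
static int
scan_file_open(ScanFile *sf, const char *path, int advice)
{
    sf->advice = advice;
    sf->fd = open(path, O_RDONLY);
    if (sf->fd < 0)
        return -1;

    /* posix_fadvise returns an error number instead of setting errno;
     * offset 0 with len 0 means "to the end of the file". */
    if (posix_fadvise(sf->fd, 0, 0, advice) != 0)
    {
        close(sf->fd);
        sf->fd = -1;
        return -1;
    }
    return 0;
}

static void
scan_file_close(ScanFile *sf)
{
    assert(sf->fd >= 0);   /* closing a scan that is not open is a bug */
    close(sf->fd);
    sf->fd = -1;
}
```

The trade-off is one extra open descriptor per concurrent scan of the same relation.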
And also:
«
Surely POSIX_FADV_SEQUENTIAL is the one intended to hint seq scans, and
POSIX_FADV_RANDOM to hint random access. No?
ISTM, _WILLNEED seems just right for small random-access blocks.
Anyway, for those who want to see what they do in Linux,
http://www.gelato.unsw.edu.au/lxr/source/mm/fadvise.c Pretty scary that Bruce
said it could make older linuxes dump core - there isn't a lot of code there.
» (Ron Mayer, 2006)
But these comments are a few years old now.
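Ron's suggestion that POSIX_FADV_WILLNEED suits small random-access reads could look something like the following minimal sketch. It keeps a small window of WILLNEED hints ahead of the block currently being read, so the kernel can start fetching upcoming blocks while earlier ones are processed. `read_with_prefetch` and its `distance` parameter are hypothetical, and the 8 kB block size is assumed from PostgreSQL's default BLCKSZ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192   /* assumed: PostgreSQL's default block size */

/*
 * Read the blocks listed in blknos[] in that order, keeping WILLNEED
 * hints issued for up to `distance` blocks starting at the one about
 * to be read.  Hints are advisory, so a failed hint is ignored, but a
 * short read stops the loop.  Returns the number of blocks read in full.
 */
static size_t
read_with_prefetch(int fd, const off_t *blknos, size_t n,
                   size_t distance, char *buf)
{
    size_t next_hint = 0;   /* index of the first block not yet hinted */
    size_t done = 0;

    assert(fd >= 0);
    for (size_t i = 0; i < n; i++)
    {
        /* Top up the hint window so it covers blknos[i .. i+distance-1]. */
        while (next_hint < n && next_hint < i + distance)
        {
            (void) posix_fadvise(fd, blknos[next_hint] * BLCKSZ, BLCKSZ,
                                 POSIX_FADV_WILLNEED);
            next_hint++;
        }

        if (pread(fd, buf, BLCKSZ, blknos[i] * BLCKSZ) != BLCKSZ)
            break;
        done++;
    }
    return done;
}
```

A `distance` of 0 degrades to plain synchronous reads with no hints at all.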
----
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Wed, Aug 12, 2009 at 3:07 PM, Cédric Villemain <cedric.villemain@dalibo.com> wrote:
> I wonder whether POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL are still
> considered unsuitable for PostgreSQL. I found:
> «A related problem is that the smgr uses the same FD to access the same
> relation no matter how many scans are in progress. Think about a complex
> query that is doing both a seqscan and an indexscan on the same relation (a
> self-join could easily do this). You'd really need to change this if you
> want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to get set usefully.
> » (Tom Lane, 2003)
I had a version of the POSIX_FADV_SEQUENTIAL patch going which set the
appropriate mode before every block read (skipping the call if it was the
same mode as last set, just like we handle lseek). I couldn't
measure any consistent improvement on sequential scans, though; at
least on Linux, they already saturate any I/O system I tested. Mileage
on other operating systems or I/O systems may vary, of course.
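Greg's approach, setting the mode before every block read but skipping the call when it matches the last mode set, amounts to caching the hint per descriptor, the same way a cached file position avoids redundant lseek() calls. The following is a minimal sketch assuming a single shared descriptor per relation; `AdvisedFile`, `set_access_pattern` and `read_block_advised` are hypothetical names, not the actual patch, and the 8 kB block size is assumed from PostgreSQL's default BLCKSZ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192   /* assumed: PostgreSQL's default block size */

/* Hypothetical wrapper around a relation's shared descriptor that
 * remembers which access-pattern hint is currently in effect. */
typedef struct AdvisedFile
{
    int      fd;
    int      last_advice;     /* -1 until a hint has been issued */
    unsigned fadvise_calls;   /* hints actually sent to the kernel */
} AdvisedFile;

/* Issue the hint only when it differs from the one last set. */
static int
set_access_pattern(AdvisedFile *af, int advice)
{
    if (af->last_advice == advice)
        return 0;               /* same mode as last time: skip the call */

    int rc = posix_fadvise(af->fd, 0, 0, advice);
    if (rc == 0)
    {
        af->last_advice = advice;
        af->fadvise_calls++;
    }
    return rc;
}

/* Read one block after making sure the caller's hint is in effect.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t
read_block_advised(AdvisedFile *af, off_t blkno, int advice, char *buf)
{
    assert(af->fd >= 0);
    if (set_access_pattern(af, advice) != 0)
        return -1;
    return pread(af->fd, buf, BLCKSZ, blkno * BLCKSZ);
}
```

On Linux the hint lives on the open file description, so a seqscan and an indexscan interleaving reads on the same descriptor would make this flip back and forth on every switch, which is the situation Tom's 2003 comment describes.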
I think the real benefit of this would be avoiding polluting the
filesystem cache with blocks which we have no intention of reading.
That will be a hard benefit to measure, though, especially since just
because we're doing random I/O doesn't mean we won't read
nearby blocks eventually. If we're scanning an index range and the
table is actually mostly clustered, then our random I/O won't be so
random after all...
On Wednesday, 12 August 2009, Greg Stark wrote:
> I had a version of the POSIX_FADV_SEQUENTIAL patch going which set the
> appropriate mode before every block read (skipping the call if it was the
> same mode as last set, just like we handle lseek). I couldn't
> measure any consistent improvement on sequential scans, though; at
> least on Linux, they already saturate any I/O system I tested. Mileage
> on other operating systems or I/O systems may vary, of course.
Yes. As Greg Smith stated earlier, operating systems honor the POSIX_FADV_*
hints to different degrees, depending on their default behavior. Linux
already reads ahead aggressively, so POSIX_FADV_SEQUENTIAL probably brings
little benefit there. I wonder what happens with POSIX_FADV_RANDOM.
> I think the real benefit of this would be avoiding polluting the
> filesystem cache with blocks which we have no intention of reading.
And making sure we read ahead when needed, overriding the system default.
> That will be a hard benefit to measure, though, especially since just
> because we're doing random I/O doesn't mean we won't read
> nearby blocks eventually. If we're scanning an index range and the
> table is actually mostly clustered, then our random I/O won't be so
> random after all...
Probably, yes... :/
----
Cédric Villemain