avoiding file system caching of a table
Is there a way of asking PostgreSQL to read the files of a table
directly off the disk, asking the OS not to use the file cache? I am
running PostgreSQL 9.1 on Ubuntu Server 64-bit. The server in question
has the maximum amount of RAM it supports, but the database has grown
much larger. Most of the time it doesn't matter, because only specific
tables or parts of indexed tables are queried, and all of that fits in
the file cache. But we have a new requirement of queries to a table
several times larger than the total RAM, and the database has slowed
down considerably for the other queries.
I am assuming that with every query to the large table, the OS caches
the files containing the table's data, and since the table is larger
than total RAM, all the old caches are cleared. The caches that were
useful for other smaller tables are lost, and the new caches of the
large table are useless because on the next query caching will start
again from the first files of the table. Please point out if there is a
problem with this assumption. Note that I am referring to OS file
caching and not PostgreSQL caching.
Is there a way around this? I have read that there is a way of asking
the OS not to cache a file when the file is opened. Is there a way of
telling PostgreSQL to use this option when reading files that belong to a
specific table?
What about putting the table on a tablespace that is on a different
device partition with the sync mount option? Would that help?
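To be concrete, this is the kind of setup I have in mind (just a rough sketch; the tablespace name, mount point, and table name below are made up for illustration):

    -- Put the huge table in a tablespace on its own device/partition,
    -- which could then be mounted with different options than the rest
    -- of the data directory.  All names here are placeholders.
    CREATE TABLESPACE coldstore LOCATION '/mnt/otherdisk/pg_tblspc';
    ALTER TABLE big_history SET TABLESPACE coldstore;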
All suggestions will be appreciated.
Thanks,
Gabriel
On 02/16/2014 10:33 PM, Gabriel Sánchez Martínez wrote:
Is there a way of asking PostgreSQL to read the files of a table
directly off the disk, asking the OS not to use the file cache? I am
running PostgreSQL 9.1 on Ubuntu Server 64-bit. The server in
question has the maximum amount of RAM it supports, but the database
has grown much larger. Most of the time it doesn't matter, because
only specific tables or parts of indexed tables are queried, and all
of that fits in the file cache. But we have a new requirement of
queries to a table several times larger than the total RAM, and the
database has slowed down considerably for the other queries.
I am assuming that with every query to the large table, the OS caches
the files containing the table's data, and since the table is larger
than total RAM, all the old caches are cleared. The caches that were
useful for other smaller tables are lost, and the new caches of the
large table are useless because on the next query caching will start
again from the first files of the table. Please point out if there is
a problem with this assumption. Note that I am referring to OS file
caching and not PostgreSQL caching.
Is there a way around this? I have read that there is a way of asking
the OS not to cache a file when the file is opened. Is there a way of
telling PostgreSQL to use this option when reading files that belong to a
specific table?
What about putting the table on a tablespace that is on a different
device partition with the sync mount option? Would that help?
I have read forum postings saying that the sync option affects writes,
and will not prevent reads from caching. At some forum posting I came
across nocache, a utility for linux. It is used by typing "nocache
<command>" in a shell. But I can't do that with a postgres process when
a connection opens because postgres is the one opening the process.
Does someone know a work-around, or a different solution to the
problem? Shouldn't PostgreSQL be smart about this and based on the
statistics collected for a table and on the query plan know the harm
that will be done if all of a very large table's pages are read and
flush the cache?
All suggestions will be appreciated.
Thanks,
Gabriel
On Mon, Feb 17, 2014 at 5:09 PM, "Gabriel E. Sánchez Martínez" <
gabrielesanchez@gmail.com> wrote:
On 02/16/2014 10:33 PM, Gabriel Sánchez Martínez wrote:
Is there a way of asking PostgreSQL to read the files of a table directly
off the disk, asking the OS not to use the file cache? I am running
PostgreSQL 9.1 on Ubuntu Server 64-bit. The server in question has the
maximum amount of RAM it supports, but the database has grown much larger.
Most of the time it doesn't matter, because only specific tables or parts
of indexed tables are queried, and all of that fits in the file cache. But
we have a new requirement of queries to a table several times larger than
the total RAM, and the database has slowed down considerably for the other
queries.
I am assuming that with every query to the large table, the OS caches the
files containing the table's data, and since the table is larger than total
RAM, all the old caches are cleared. The caches that were useful for other
smaller tables are lost, and the new caches of the large table are useless
because on the next query caching will start again from the first files of
the table. Please point out if there is a problem with this assumption.
Note that I am referring to OS file caching and not PostgreSQL caching.
If you told postgresql to tell the kernel not to cache the data it reads,
how would this help? The data you want in cache would no longer be pushed
out of the cache, but that is because it would no longer be there in the
first place. You would have to make this instruction to the kernel be
selective. It would only tell it not to cache when it is doing a very
large query. It might be theoretically possible to do this, but it it
would probably cause more harm than good to most people most of the time.
Is there a way around this? I have read that there is a way of asking
the OS not to cache a file when the file is opened. Is there a way of
telling PostgreSQL to use this option when reading files that belong to a
specific table?
What about putting the table on a tablespace that is on a different
device partition with the sync mount option? Would that help?
I have read forum postings saying that the sync option affects writes, and
will not prevent reads from caching. At some forum posting I came across
nocache, a utility for linux. It is used by typing "nocache <command>" in
a shell. But I can't do that with a postgres process when a connection
opens because postgres is the one opening the process.
You would have to start the entire service with that utility, then. Which
again would defeat the purpose.
Does someone know a work-around, or a different solution to the problem?
Shouldn't PostgreSQL be smart about this and based on the statistics
collected for a table and on the query plan know the harm that will be done
if all of a very large table's pages are read and flush the cache?
PostgreSQL does know this. It has a special ring "buffer access strategy"
that it uses to prevent a large sequential scan from pushing all of the
other data out of its shared_buffers. It sounds like it is the kernel
which is failing to employ similar logic on the file cache which the
*kernel* manages.
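If you want to see that for yourself, a rough sketch along these lines (assuming the pg_buffercache contrib module is installed; 'big_table' is only a placeholder) should show that a sequential scan of the huge table occupies just a small ring of shared_buffers rather than anything close to the table's size:

    -- Requires the pg_buffercache contrib module.
    CREATE EXTENSION pg_buffercache;

    -- Count the shared_buffers pages currently holding blocks of one table.
    -- Right after a big sequential scan this should stay small (the ring).
    SELECT count(*) AS buffers_used
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
      AND c.relname = 'big_table';  -- placeholder table name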
The kernel does also have some logic to prevent this, but it may or may not
be very effective in your case (you haven't told us what version of the kernel
you are using). In fact one effort of the kernel to fix this problem for
cases like yours ended up making it worse for other conditions, i.e. when
the file being read sequentially was less than available RAM but greater
than 1/2 available RAM.
You could try increasing shared_buffers (you haven't told us what it is set
to now) until it takes up a big chunk of RAM, so that PostgreSQL manages
more of the cache and the kernel manages less of it. Setting it like that
has been reported to cause problems on write-heavy workloads, but I
haven't heard of problems on read-mostly workloads.
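Something like this (ordinary catalog queries, nothing specific to your setup) would show where the relevant settings currently stand:

    -- Report the current memory-related settings.
    SHOW shared_buffers;

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'effective_cache_size',
                   'work_mem', 'maintenance_work_mem');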
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
If you told postgresql to tell the kernel not to cache the data it reads,
how would this help?
I seem to recall also that O_DIRECT disables optimizations we definitely
want, like read-ahead. So it could come out a loser independently of
any what's-in-cache concerns.
regards, tom lane
On 02/17/2014 08:45 PM, Jeff Janes wrote:
On Mon, Feb 17, 2014 at 5:09 PM, "Gabriel E. Sánchez Martínez"
<gabrielesanchez@gmail.com> wrote:
On 02/16/2014 10:33 PM, Gabriel Sánchez Martínez wrote:
Is there a way of asking PostgreSQL to read the files of a
table directly off the disk, asking the OS not to use the file
cache? I am running PostgreSQL 9.1 on Ubuntu Server 64-bit.
The server in question has the maximum amount of RAM it
supports, but the database has grown much larger. Most of the
time it doesn't matter, because only specific tables or parts
of indexed tables are queried, and all of that fits in the
file cache. But we have a new requirement of queries to a
table several times larger than the total RAM, and the
database has slowed down considerably for the other queries.
I am assuming that with every query to the large table, the OS
caches the files containing the table's data, and since the
table is larger than total RAM, all the old caches are
cleared. The caches that were useful for other smaller tables
are lost, and the new caches of the large table are useless
because on the next query caching will start again from the
first files of the table. Please point out if there is a
problem with this assumption. Note that I am referring to OS
file caching and not PostgreSQL caching.
If you told postgresql to tell the kernel not to cache the data it
reads, how would this help? The data you want in cache would no
longer be pushed out of the cache, but that is because it would no
longer be there in the first place. You would have to make this
instruction to the kernel be selective. It would only tell it not to
cache when it is doing a very large query. It might be theoretically
possible to do this, but it would probably cause more harm than
good to most people most of the time.
I read that the suggestion not to cache a file when reading it is given
by programs at the time the file is opened. That prompted me to think
that there might be a way of telling PostgreSQL to apply that to the
pages of a specific relation. I did not mean to suggest it should be a
process-wide or database-wide setting.
Is there a way around this? I have read that there is a way
of asking the OS not to cache a file when the file is opened.
Is there a way of telling PostgreSQL to use this option when
reading files that belong to a specific table?
What about putting the table on a
different device partition with the sync mount option? Would
that help?
I have read forum postings saying that the sync option affects
writes, and will not prevent reads from caching. At some forum
posting I came across nocache, a utility for linux. It is used by
typing "nocache <command>" in a shell. But I can't do that with a
postgres process when a connection opens because postgres is the
one opening the process.
You would have to start the entire service with that utility, then.
Which again would defeat the purpose.
Since a process is launched every time a session opens, e.g. a query
window in pgAdmin, I thought it would be possible to do it per session
rather than for the whole service. Either way, I agree this wouldn't
solve the problem.
Does someone know a work-around, or a different solution to the
problem? Shouldn't PostgreSQL be smart about this and based on
the statistics collected for a table and on the query plan know
the harm that will be done if all of a very large table's pages
are read and flush the cache?
PostgreSQL does know this. It has a special ring "buffer access
strategy" that it uses to prevent a large sequential scan from pushing
all of the other data out of its shared_buffers. It sounds like it
is the kernel which is failing to employ similar logic on the file
cache which the *kernel* manages.
I have no idea how the kernel manages its cache, but I think that since
individual pages that store the data of the table are small, the kernel
has no way of knowing that a process will read a very large number of
relatively small files that collectively will cause harm. Maybe if it
were a single file larger than total physical RAM it would act
differently. But I am just speculating.
The kernel does also have some logic to prevent this, but it may or
may not be very effective in your case (you haven't told us what version of
the kernel you are using).
Thanks for asking. 3.8.0-35-generic. I'm curious. What does it do?
Or do you know where I can read about this (just out of curiosity).
In fact one effort of the kernel to fix this problem for cases like
yours ended up making it worse for other conditions, i.e. when the
file being read sequentially was less than available RAM but greater
than 1/2 available RAM.
You could try increasing shared_buffers (you haven't told us what it
is set to now) until it takes up a big chunk of RAM, so that
PostgreSQL manages more of the cache and the kernel manages less of
it. Setting it like that has been reported to cause problems on
write-heavy workloads, but I haven't heard of problems on read-mostly
workloads.
This server is read-mostly. It has 64 GB of RAM, a single 6-core i7
processor, and four SATA hard drives on software RAID 10. I get about
400 MB/s of sequential reads on simple benchmarks. Shared buffers is
set to 16 GB, temp buffers to 8 MB, work mem to 100 MB, and maintenance
work mem to 100 MB. I could probably tweak this but I know very little
about it. Do you think I should set any of these higher? Other things
run on the server, but most of the usage is PostgreSQL queries on tables
of several hundred GB... some of which need to process whole tables.
So other than the possibility of tweaking shared_buffers, the only other
solution is getting a server with TBs of RAM?
Cheers,
Jeff
I really appreciate your help. Thank you (and Tom).
On Mon, Feb 17, 2014 at 2:33 PM, Gabriel Sánchez Martínez <
gabrielesanchez@gmail.com> wrote:
Is there a way of asking PostgreSQL to read the files of a table directly
off the disk, asking the OS not to use the file cache? I am running
PostgreSQL 9.1 on Ubuntu Server 64-bit. The server in question has the
maximum amount of RAM it supports, but the database has grown much larger.
Most of the time it doesn't matter, because only specific tables or parts
of indexed tables are queried, and all of that fits in the file cache. But
we have a new requirement of queries to a table several times larger than
the total RAM, and the database has slowed down considerably for the other
queries.
I am assuming that with every query to the large table, the OS caches the
files containing the table's data, and since the table is larger than total
RAM, all the old caches are cleared. The caches that were useful for other
smaller tables are lost, and the new caches of the large table are useless
because on the next query caching will start again from the first files of
the table. Please point out if there is a problem with this assumption.
Note that I am referring to OS file caching and not PostgreSQL caching.
Is there a way around this? I have read that there is a way of asking the
OS not to cache a file when the file is opened. Is there a way of telling
PostgreSQL to use this option when reading files that belong to a specific
table?
What about putting the table on a tablespace that is on a different device
partition with the sync mount option? Would that help?
All suggestions will be appreciated.
Can you please check the following extension? It may be useful to you.
https://github.com/klando/pgfincore
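For example, going by the function names in that project's documentation (the table name below is only a placeholder), it exposes the OS page cache and posix_fadvise from SQL:

    -- pgfincore has to be built and installed separately; see the page above.
    CREATE EXTENSION pgfincore;

    -- Show how much of the relation currently sits in the OS page cache.
    SELECT * FROM pgfincore('big_table');

    -- Ask the kernel to drop the cached pages of that relation
    -- (POSIX_FADV_DONTNEED), e.g. after a one-off scan of the huge table.
    SELECT * FROM pgfadvise_dontneed('big_table');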
Regards,
Hari Babu
Fujitsu Australia
On Mon, Feb 17, 2014 at 7:30 PM, Gabriel Sánchez Martínez <
gabrielesanchez@gmail.com> wrote:
On 02/17/2014 08:45 PM, Jeff Janes wrote:
On Mon, Feb 17, 2014 at 5:09 PM, "Gabriel E. Sánchez Martínez" <
gabrielesanchez@gmail.com> wrote:
I have read forum postings saying that the sync option affects writes,
and will not prevent reads from caching. At some forum posting I came
across nocache, a utility for linux. It is used by typing "nocache
<command>" in a shell. But I can't do that with a postgres process when a
connection opens because postgres is the one opening the process.
You would have to start the entire service with that utility, then.
Which again would defeat the purpose.
Since a process is launched every time a session opens, e.g. a query
window in pgAdmin, I thought it would be possible to do it per session
rather than for the whole service. Either way, I agree this wouldn't solve
the problem.
Perhaps there is a way to do that, but I haven't been able to find any
information on the "nocache" utility myself. Anyway, probably kernel
hackers rather than PostgreSQL hackers would be a better source of info on
that.
Does someone know a work-around, or a different solution to the problem?
Shouldn't PostgreSQL be smart about this and based on the statistics
collected for a table and on the query plan know the harm that will be done
if all of a very large table's pages are read and flush the cache?
PostgreSQL does know this. It has a special ring "buffer access
strategy" that it uses to prevent a large sequential scan from pushing all
of the other data out of its shared_buffers. It sounds like it is the
kernel which is failing to employ similar logic on the file cache which the
*kernel* manages.
I have no idea how the kernel manages its cache, but I think that since
individual pages that store the data of the table are small, the kernel has
no way of knowing that a process will read a very large number of
relatively small files that collectively will cause harm. Maybe if it were
a single file larger than total physical RAM it would act differently. But
I am just speculating.
I don't think the kernel cares much about file boundaries. The way it
usually works is one part of the cache is set aside for data pages that
were accessed once recently, and another part for data that was accessed
multiple times.
The kernel does also have some logic to prevent this, but it may or may
not be very effective in your case (you haven't told us what version of the
kernel you are using).
Thanks for asking. 3.8.0-35-generic. I'm curious. What does it do? Or
do you know where I can read about this (just out of curiosity).
I was hoping someone with better knowledge of the details would respond,
but since they haven't...I think 3.8.0 is in the range that uses half of
the cache memory for recently first-read pages, and half for
frequently-read pages, but allows the frequently-read half to steal from the
other half if it would otherwise go unused.
Unfortunately I don't know of a good way to figure out when the various
features were added to which version of the kernel. It is very frustrating.
Also, I think various distributions might backport certain patches out of
sequence, so that the version number of the kernel itself is not even
deterministic for each feature. But I could be spreading misinformation.
In fact one effort of the kernel to fix this problem for cases like
yours ended up making it worse for other conditions, i.e. when the file
being read sequentially was less than available RAM but greater than 1/2
available RAM.
You could try increasing shared_buffers (you haven't told us what it is
set to now) until it takes up a big chunk of RAM, so that PostgreSQL
manages more of the cache and the kernel manages less of it. Setting it
like that has been reported to cause problems on write-heavy workloads,
but I haven't heard of problems on read-mostly workloads.
This server is read-mostly. It has 64 GB of RAM, a single 6-core i7
processor, and four SATA hard drives on software RAID 10. I get about 400
MB/s of sequential reads on simple benchmarks. Shared buffers is set to 16
GB,
If the problem is that the kernel is mishandling its cache, you could
increase shared_buffers to 52GB and see if that helps. Hopefully you have a
testing environment that you can use to evaluate it before changing it on
production.
temp buffers to 8 MB, work mem to 100 MB, and maintenance work mem to 100
MB. I could probably tweak this but I know very little about it. Do you
think I should set any of these higher? Other things run on the server,
but most of the usage is PostgreSQL queries on tables of several hundred
GB... some of which need to process whole tables.
So other than the possibility of tweaking shared_buffers, the only other
solution is getting a server with TBs of RAM?
Just doubling the RAM might do the job. For example, say the
frequently-used data takes up 3/4 of the kernel's available page cache.
Normally this is fine, because it is held in the cache. But then the
large single-pass query kicks in, and it starts evicting the
frequently-used data down to its "fair share" of 1/2 of the page cache.
Now only about 2/3 of the frequently-used data still fits in that 1/2 share
(1/2 divided by 3/4), so roughly 1/3 of the accesses to it need to go to disk
and your performance drops. So you don't need to add enough RAM to store the
single-pass data, you just need enough to store ~2 times the
frequently-used data.
The first thing I'd try to figure out is whether your frequently-used data,
in the absence of the large queries, is residing all in shared_buffers, or
if it is split between shared_buffers and the kernel cache. If you can,
disable the large queries for a while, reset the statistics, let the small
queries run for a while, and then look in pg_stat_database to figure that
out, particularly blks_read and blks_hit.
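Roughly like this, as a sketch (run it in the database in question; note that blks_read counts blocks requested from the kernel, so some of those may still have been served from the OS cache rather than disk):

    -- Reset the statistics counters for the current database.
    SELECT pg_stat_reset();

    -- ...let the small queries run for a while, then check the hit ratio:
    SELECT blks_read,
           blks_hit,
           round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
               AS shared_buffers_hit_pct
    FROM pg_stat_database
    WHERE datname = current_database();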
Cheers,
Jeff