How to keep a table in memory?

Started by adrobj · over 18 years ago · 30 messages · hackers
#1 adrobj
adrobj@yahoo.com

I have a pretty small table (~20MB) that is accessed very frequently and
randomly, so I want to make sure it's 100% in memory all the time. There is
a lot of other stuff that also gets accessed frequently, so I don't want
to just hope that the Linux file cache will do the right thing for me.

Is there any way to do that?

One idea I have in my head is to start a process that does mmap() and
mlock() with the table file. Will it work? If so, are there any potential
problems?

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: adrobj (#1)
Re: How to keep a table in memory?

adrobj wrote:

I have a pretty small table (~20MB) that is accessed very frequently and
randomly, so I want to make sure it's 100% in memory all the time. There is
a lot of other stuff that also gets accessed frequently, so I don't want
to just hope that the Linux file cache will do the right thing for me.

Is there any way to do that?

One idea I have in my head is to start a process that does mmap() and
mlock() with the table file. Will it work? If so, are there any potential
problems?

Just leave it to the cache management algorithms in Postgres and Linux.
If it really is frequently accessed, it should stay in Postgres shared
buffers.

You can use the pg_buffercache contrib module to see what's in cache.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 adrobj
adrobj@yahoo.com
In reply to: Heikki Linnakangas (#2)
Re: How to keep a table in memory?

Hi Heikki,

Thanks for the response!

I understand that relying on cache management would be the easiest solution. However, I had a similar issue with another RDBMS (MSSQL, to be specific) in the past and observed a lot of disk activity until the table was pinned in memory (fortunately MSSQL has 'dbcc pintable' for that).

Basically, this is all about a high-traffic website, where virtually _all_ data in the DB gets accessed frequently - so it's not obvious which DB pages are going to win the eviction war. However, the overall cost of access differs between tables - for the table in question it may well be ~20 disk seeks per webpage view, so a very high cache hit rate (ideally 100%) has to be assured.

So - will the 'mlock' hack work? Or any other ideas for "pinning" a table in memory?

- Alex

Heikki Linnakangas <heikki@enterprisedb.com> wrote:
adrobj wrote:

I have a pretty small table (~20MB) that is accessed very frequently and
randomly, so I want to make sure it's 100% in memory all the time. There is
a lot of other stuff that also gets accessed frequently, so I don't want
to just hope that the Linux file cache will do the right thing for me.

Is there any way to do that?

One idea I have in my head is to start a process that does mmap() and
mlock() with the table file. Will it work? If so, are there any potential
problems?

Just leave it to the cache management algorithms in Postgres and Linux.
If it really is frequently accessed, it should stay in Postgres shared
buffers.

You can use the pg_buffercache contrib module to see what's in cache.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


#4 Andrew Dunstan
andrew@dunslane.net
In reply to: adrobj (#3)
Re: How to keep a table in memory?

Alex Drobychev wrote:

Hi Heikki,

Thanks for the response!

I understand that relying on cache management would be the easiest
solution. However, I had a similar issue with another RDBMS (MSSQL, to
be specific) in the past and observed a lot of disk activity until the
table was pinned in memory (fortunately MSSQL has 'dbcc pintable' for
that).

Basically, this is all about a high-traffic website, where virtually
_all_ data in the DB gets accessed frequently - so it's not obvious
which DB pages are going to win the eviction war. However, the overall
cost of access differs between tables - for the table in question it
may well be ~20 disk seeks per webpage view, so a very high cache hit
rate (ideally 100%) has to be assured.

So - will the 'mlock' hack work? Or any other ideas for "pinning" a
table in memory?

- Alex

Heikki Linnakangas <heikki@enterprisedb.com> wrote:

adrobj wrote:

I have a pretty small table (~20MB) that is accessed very frequently and
randomly, so I want to make sure it's 100% in memory all the time. There is
a lot of other stuff that also gets accessed frequently, so I don't want
to just hope that the Linux file cache will do the right thing for me.

Is there any way to do that?

One idea I have in my head is to start a process that does mmap() and
mlock() with the table file. Will it work? If so, are there any potential
problems?

Just leave it to the cache management algorithms in Postgres and Linux.
If it really is frequently accessed, it should stay in Postgres shared
buffers.

You can use the pg_buffercache contrib module to see what's in cache.

1. when someone replies to your post at the bottom, please don't put
your reply at the top. It makes everything totally unreadable.

2. you should investigate one or more of: pg_memcache, solid state disk.

FYI, Postgres is known to be used successfully on some *extremely* heavy
websites, without using tables pinned in memory.

cheers

andrew

#5 Robert Treat
xzilla@users.sourceforge.net
In reply to: Andrew Dunstan (#4)
Re: How to keep a table in memory?

On Monday 12 November 2007 18:31, Andrew Dunstan wrote:

1. when someone replies to your post at the bottom, please don't put
your reply at the top. It makes everything totally unreadable.

+1

2. you should investigate one or more of: pg_memcache, solid state disk.

you might also consider creating a tablespace on tmpfs or ramfs or something
like pramfs

FYI, Postgres is known to be used successfully on some *extremely* heavy
websites, without using tables pinned in memory.

+1

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#6 Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Treat (#5)
Re: How to keep a table in memory?

Robert Treat wrote:

On Monday 12 November 2007 18:31, Andrew Dunstan wrote:

1. when someone replies to your post at the bottom, please don't put
your reply at the top. It makes everything totally unreadable.

+1

2. you should investigate one or more of: pg_memcache, solid state disk.

you might also consider creating a tablespace on tmpfs or ramfs or something
like pramfs

FYI, Postgres is known to be used successfully on some *extremely* heavy
websites, without using tables pinned in memory.

+1

I give this a +/- 1. Yes extremely heavy websites can do this *but* they
require extremely expensive hardware to do so.

Joshua D. Drake

#7 Devrim GÜNDÜZ
devrim@gunduz.org
In reply to: Robert Treat (#5)
Re: How to keep a table in memory?

Hi,

On Mon, 2007-11-12 at 21:15 -0500, Robert Treat wrote:

2. you should investigate one or more of: pg_memcache, solid state
disk.

you might also consider creating a tablespace on tmpfs or ramfs or
something like pramfs

IIRC, ramfs is not that good for database use: if you want to extend
its size, you have to stop the database instance -- which is not
considered good.

Regards,
--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

#8 Andrew Dunstan
andrew@dunslane.net
In reply to: Joshua D. Drake (#6)
Re: How to keep a table in memory?

Joshua D. Drake wrote:

FYI, Postgres is known to be used successfully on some *extremely* heavy
websites, without using tables pinned in memory.

+1

I give this a +/- 1. Yes extremely heavy websites can do this *but*
they require extremely expensive hardware to do so.

I expect extremely heavy websites to require extremely expensive
equipment regardless of the software they use. Cost was not the issue
raised by the OP.

cheers

andrew

#9 Joshua D. Drake
jd@commandprompt.com
In reply to: Devrim GÜNDÜZ (#7)
Re: How to keep a table in memory?

Devrim GÜNDÜZ wrote:

Hi,

On Mon, 2007-11-12 at 21:15 -0500, Robert Treat wrote:

2. you should investigate one or more of: pg_memcache, solid state
disk.

you might also consider creating a tablespace on tmpfs or ramfs or
something like pramfs

IIRC, ramfs are not that good for database use: If you want to extend
its size, you have to stop the database instance -- which is not
considered good.

Well, depending on the size you could push the table to another table
space, drop the old table space, resize the ramfs, and reverse the
previous :)

Joshua D. Drake


#10 Joshua D. Drake
jd@commandprompt.com
In reply to: Andrew Dunstan (#8)
Re: How to keep a table in memory?

Andrew Dunstan wrote:

I give this a +/- 1. Yes extremely heavy websites can do this *but*
they require extremely expensive hardware to do so.

I expect extremely heavy websites to require extremely expensive
equipment regardless of the software they use. Cost was not the issue
raised by the OP.

Cost is always an issue, even if implicit. If the person is so hung up
on the idea of pushing things into ram there is a pretty good
possibility they have priced out the 50 and 100 spindle devices needed
to get the same type of performance.

Sincerely,

Joshua D. Drake



#11 Devrim GÜNDÜZ
devrim@gunduz.org
In reply to: Heikki Linnakangas (#2)
Re: How to keep a table in memory?

Hi,

On Mon, 2007-11-12 at 09:12 +0000, Heikki Linnakangas wrote:

Just leave it to the cache management algorithms in Postgres and
Linux. If it really is frequently accessed, it should stay in
Postgres shared buffers.

How is "frequently accessed" determined by PostgreSQL?

I mean... you know, the OS caches either inodes or pages. Page caching is
pretty much ignorable here, since it means the data is already in virtual
memory. So we have inode caching, and IIRC it results in i/o requests to
the disk -- and sure, it uses the i/o scheduler of the kernel (like all of
the applications running on that machine -- including a basic login
session). *If* the data hasn't been evicted, it returns from the i/o
scheduler.

So there is no 100% guarantee that the table is in memory. If we could use
the RAM directly (some (or a :) ) database(s) can do that IIRC), we would
avoid the i/o scheduler, which would really speed up the process. (Ok,
AFAIK, you can "pin" your objects in memory with Oracle.)

... and one more thing with ramfs: since there is a fs on ramfs, it
passes through the VFS -- and goes through the kernel schedulers again.

So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
approach for the people who are asking to keep their objects in RAM,
even though I know that there is nothing we can say right now.

Regards,
--
Devrim GÜNDÜZ , RHCE
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Devrim GÜNDÜZ (#11)
Re: How to keep a table in memory?

Devrim GÜNDÜZ <devrim@CommandPrompt.com> writes:

So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
approach for the people who are asking to keep their objects on RAM,
even though I know that there is nothing we can say right now.

Well, nothing is a 100% solution. But my opinion is that people who
think they are smarter than an LRU caching algorithm are typically
mistaken. If the table is all that heavily used, it will stay in memory
just fine. If it's not sufficiently heavily used to stay in memory
according to an LRU algorithm, maybe the memory space really should be
spent on something else.

Now there are certainly cases where a standard caching algorithm falls
down --- the main one I can think of offhand is where you would like to
give one class of queries higher priority than another, and so memory
space should preferentially go to tables that are needed by the first
class. But if that's your problem, "pin these tables in memory" is
still an awfully crude solution to the problem. I'd be inclined to
think instead about a scheme that lets references made by
higher-priority queries bump buffers' use-counts by more than 1,
or some other way of making the priority considerations visible to an
automatic cache management algorithm.

regards, tom lane

#13 Greg Smith
gsmith@gregsmith.com
In reply to: adrobj (#3)
Re: How to keep a table in memory?

On Mon, 12 Nov 2007, Alex Drobychev wrote:

Or any other ideas for "pinning" a table in memory?

If the table you're worried about is only 20MB, have you considered just
running something regularly that touches the whole thing? This may be the
only time I've ever considered running "select count(*) from x" as a
productive move. That would waste some CPU, but it would help those pages
"win the eviction war" as you say.

You definitely should follow up on the suggestion given to look at the
pg_buffercache contrib module to get a better idea what's going on under
the LRU hood. In fact, you may want to install a tweak that's standard in
8.3 to show the usage counts, in order to get a better feel for what's
going on; the appendix of my article at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm goes
into this a bit, with the documentation for pg_buffercache having the rest
of what you'd need.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#14 Luke Lonergan
llonergan@greenplum.com
In reply to: Greg Smith (#13)
Re: How to keep a table in memory?

Vacuum is a better thing to run, much less CPU usage.

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From: Greg Smith [mailto:gsmith@gregsmith.com]
Sent: Monday, November 12, 2007 11:59 PM Eastern Standard Time
To: Alex Drobychev
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] How to keep a table in memory?

On Mon, 12 Nov 2007, Alex Drobychev wrote:

Or any other ideas for "pinning" a table in memory?

If the table you're worried about is only 20MB, have you considered just
running something regularly that touches the whole thing? This may be the
only time I've ever considered running "select count(*) from x" as a
productive move. That would waste some CPU, but it would help those pages
"win the eviction war" as you say.

You definitely should follow up on the suggestion given to look at the
pg_buffercache contrib module to get a better idea what's going on under
the LRU hood. In fact, you may want to install a tweak that's standard in
8.3 to show the usage counts, in order to get a better feel for what's
going on; the appendix of my article at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm goes
into this a bit, with the documentation for pg_buffercache having the rest
of what you'd need.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


#15 Chris Browne
cbbrowne@acm.org
In reply to: adrobj (#3)
Re: How to keep a table in memory?

In an attempt to throw the authorities off his trail, jd@commandprompt.com ("Joshua D. Drake") transmitted:

Andrew Dunstan wrote:

I give this a +/- 1. Yes extremely heavy websites can do this
*but* they require extremely expensive hardware to do so.

I expect extremely heavy websites to require extremely expensive
equipment regardless of the software they use. Cost was not the
issue raised by the OP.

Cost is always an issue, even if implicit. If the person is so hung
up on the idea of pushing things into ram there is a pretty good
possibility they have priced out the 50 and 100 spindle devices
needed to get the same type of performance.

I dunno; I had a chat about cacheing strategies today where it became
clear to me that when we migrate to 8.3, we'll need to re-examine
things because there has been *so* much change since some of our
present policy was created back in the 7.2 days.

(Pointedly, one of the reasons to want a separate cache DB was to cut
down on XID consumption by read-only processes, and that reason
evaporates in 8.3.)

I have seen enough naive analyses done that I wouldn't be inclined to
assume much of anything.

People can get mighty self-assured about things that they have heard,
whether those things have validity or not. Few things can get as
badly wrong as bad assumptions made about performance...
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxdatabases.info/info/x.html
"When we write programs that "learn", it turns out that we do and they
don't." -- Alan Perlis

#16 Chris Browne
cbbrowne@acm.org
In reply to: adrobj (#1)
Re: How to keep a table in memory?

Quoth tgl@sss.pgh.pa.us (Tom Lane):

Devrim GÜNDÜZ <devrim@CommandPrompt.com> writes:

So, IMHO, saying "trust your OS + PostgreSQL" is not a 100% perfect
approach for the people who are asking to keep their objects on RAM,
even though I know that there is nothing we can say right now.

Well, nothing is a 100% solution. But my opinion is that people who
think they are smarter than an LRU caching algorithm are typically
mistaken. If the table is all that heavily used, it will stay in memory
just fine. If it's not sufficiently heavily used to stay in memory
according to an LRU algorithm, maybe the memory space really should be
spent on something else.

Now there are certainly cases where a standard caching algorithm falls
down --- the main one I can think of offhand is where you would like to
give one class of queries higher priority than another, and so memory
space should preferentially go to tables that are needed by the first
class. But if that's your problem, "pin these tables in memory" is
still an awfully crude solution to the problem. I'd be inclined to
think instead about a scheme that lets references made by
higher-priority queries bump buffers' use-counts by more than 1,
or some other way of making the priority considerations visible to an
automatic cache management algorithm.

Something I found *really* interesting was that whenever we pushed any
"high traffic" systems onto PostgreSQL 8.1, I kept seeing measurable
performance improvements taking place every day for a week.

Evidently, it took that long for cache to *truly* settle down.

Given that, and given that we've gotten a couple of good steps *more*
sophisticated than mere LRU, I'm fairly willing to go pretty far down
the "trust the shared memory cache" road.

The scenario described certainly warrants doing some benchmarking; it
warrants analyzing the state of the internal buffers over a period of
time to see what is actually in them.

If, after a reasonable period of time (that includes some variations
in system load), a reasonable portion (or perhaps the entirety) of the
Essential Table has consistently resided in buffers, then that should
be pretty decent evidence that cacheing is working the way it should.
--
output = ("cbbrowne" "@" "gmail.com")
http://linuxdatabases.info/info/slony.html
A Plateau is the highest form of flattery.

#17 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Luke Lonergan (#14)
Re: How to keep a table in memory?

Luke Lonergan wrote:

Vacuum is a better thing to run, much less CPU usage.

Vacuum is actually not good for this purpose, because it's been
special-cased to not bump the usage count.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)
Re: How to keep a table in memory?

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

I'd be inclined to think instead about a scheme that lets references made by
higher-priority queries bump buffers' use-counts by more than 1, or some
other way of making the priority considerations visible to an automatic
cache management algorithm.

I don't think that really solves the problem.

Consider a case where you have a few dozen queries all of which use indexes to
access only a few pages per call (but spread across a large enough table),
except for just one query which uses a sequential scan of a moderate sized
table.

In such a circumstance the best average performance might be to keep the pages
used by the index scans in memory and force most of the sequential scan to go
to disk. Especially if the sequential scan is fairly rare and especially if
random_page_cost is fairly high.

However if your concern is response time, not average performance, then that
would be disastrous. In exchange for a slight improvement of already fast
queries you would be obtaining an unsatisfactory response time for the
sequential scan.

I'm not sure what the solution is. This scenario is going to be a problem for
any system which tries to judge future usage based on past usage. If the
infrequent query with a strict response time requirement is infrequent enough,
any automatic algorithm will evict its pages.

Some brainstorming ideas: what if a prepared query which previously ran under
some specified response time guarantee didn't bump the usage counts at all?
That way frequently run queries which are fast enough even with disk accesses
wouldn't evict pages needed for slower queries.

Or better yet, if we tag a prepared query with the average (or 90th percentile
or something like that) response time from the past, and tag every buffer it
touches with that response time if it's greater than what the buffer is
already tagged with. When scanning for a page to evict, we ignore any buffer
with a response time larger than ours. Ie, queries which respond quickly are
not allowed to evict buffers needed by queries which respond more slowly than
they do. Only a slower or ad-hoc non-prepared query is allowed to evict those
pages.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

#19 Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Joshua D. Drake (#10)
Re: How to keep a table in memory?

On Mon, Nov 12, 2007 at 06:55:09PM -0800, Joshua D. Drake wrote:

Cost is always an issue, even if implicit. If the person is so hung up
on the idea of pushing things into ram there is a pretty good
possibility they have priced out the 50 and 100 spindle devices needed
to get the same type of performance.

I'm not sure I agree with that. The OP was claiming that this approach was
what worked for him with MS SQL Server, which makes me think that this is
the usual human habit of generalizing widely from a particular. That is,
"X was a solution that worked once with another product, so I want to know
how to do X with your product." We get these questions all the time, partly
because one has to re-learn all sorts of things when moving to PostgreSQL.
For instance, most of the traditional real database systems don't
collaborate with the OS in memory and cache management.

A

--
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke

#20 Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Tom Lane (#12)
Re: How to keep a table in memory?

On Mon, Nov 12, 2007 at 10:54:34PM -0500, Tom Lane wrote:

class. But if that's your problem, "pin these tables in memory" is
still an awfully crude solution to the problem. I'd be inclined to
think instead about a scheme that lets references made by
higher-priority queries bump buffers' use-counts by more than 1,
or some other way of making the priority considerations visible to an
automatic cache management algorithm.

While this is true, nobody seems to have those other ways available today.
If there was a quick and easy way to pin certain tables in memory, I think
that administrators might be well-advised to use that mechanism until such
time as the weighted-priority cacheing or whatever shows up. (Of course,
AFAICT, there's no easy way to do the pinning, either, so this all seems a
little academic.)

I have to agree with what Tom says, however, about people thinking they're
smarter than the system. Much of the time, this sort of thumb on the scale
optimisation just moves the cost to some other place, and the admin's
analysis isn't comprehensive enough to turn that up until it's all turned on
in production.

A

--
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke

#21 Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#18)
#22 Greg Smith
gsmith@gregsmith.com
In reply to: Andrew Sullivan (#20)
#23 Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Greg Smith (#22)
#24 Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Heikki Linnakangas (#17)
#25 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andrew Sullivan (#23)
#26 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Ron Mayer (#24)
#27 Merlin Moncure
mmoncure@gmail.com
In reply to: Chris Browne (#16)
#28 Simon Riggs
simon@2ndQuadrant.com
In reply to: Greg Smith (#22)
#29 Zeugswetter Andreas ADI SD
Andreas.Zeugswetter@s-itsolutions.at
In reply to: Kevin Grittner (#25)
#30 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)