Larger volumes of chronologically ordered data and the planner
Hello,
What is PostgreSQL's likely behaviour when it encounters a large
volume of data that is chronologically ordered (there's a btree index
on a date column)? Is PostgreSQL intelligent enough to discern that
since the most frequently accessed data is invariably recent data,
that it should store only that in memory, and efficiently store less
relevant, older data on disk (the volume of data in production at the
moment is still small enough to fit entirely in memory)? The
application I maintain is not really a data warehousing app, but this
is likely to be where I first encounter performance issues, if I ever
do.
Where can I learn more about this subject in general?
Regards,
John Moran
John Moran <johnfrederickmoran@gmail.com> writes:
What is PostgreSQL's likely behaviour when it encounters a large
volume of data that is chronologically ordered (there's a btree index
on a date column)? Is PostgreSQL intelligent enough to discern that
since the most frequently accessed data is invariably recent data,
that it should store only that in memory, and efficiently store less
relevant, older data on disk (the volume of data in production at the
moment is still small enough to fit entirely in memory)?
There's no dedicated intelligence about such a case, but I don't see why
the ordinary cache management algorithms won't handle it perfectly well.
regards, tom lane
John Moran wrote:
Is PostgreSQL intelligent enough to discern that
since the most frequently accessed data is invariably recent data,
that it should store only that in memory, and efficiently store less
relevant, older data on disk
When you ask for a database block, the database increments a usage
count for that block when it's read from disk into memory, and again
each time a later request finds it already there. Requests to allocate
new blocks constantly decrease those usage counts as they "clock sweep"
over the cache looking for space that hasn't been used recently. This
automatically keeps blocks you've used recently in RAM, while evicting
ones you haven't.
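To make the mechanism above concrete, here is a minimal toy sketch of a clock-sweep cache in Python. It is not PostgreSQL's actual implementation: the class name, the method names, and the cap of 5 on the usage count are assumptions chosen for illustration only.

```python
class ClockSweepCache:
    """Toy buffer cache with clock-sweep eviction (illustrative only)."""

    MAX_USAGE = 5  # assumed cap on the usage count for this sketch

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = [None] * capacity  # block id held in each slot
        self.usage = [0] * capacity      # usage count per slot
        self.slot_of = {}                # block id -> slot index
        self.hand = 0                    # current position of the clock hand

    def access(self, block_id):
        """Request a block; return True on a cache hit, False on a miss."""
        slot = self.slot_of.get(block_id)
        if slot is not None:
            # Already cached: bump its usage count, up to the cap.
            self.usage[slot] = min(self.usage[slot] + 1, self.MAX_USAGE)
            return True
        # Miss: sweep for a victim slot, then load the block into it.
        slot = self._find_victim()
        old = self.blocks[slot]
        if old is not None:
            del self.slot_of[old]
        self.blocks[slot] = block_id
        self.usage[slot] = 1
        self.slot_of[block_id] = slot
        return False

    def _find_victim(self):
        """Advance the hand, decrementing usage counts, until a slot is free."""
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.blocks[slot] is None or self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1


# Two "recent" blocks are read repeatedly, then four old blocks are scanned once.
cache = ClockSweepCache(capacity=4)
for _ in range(3):
    for block in ("new1", "new2"):
        cache.access(block)
for block in ("old1", "old2", "old3", "old4"):
    cache.access(block)
still_cached = sorted(cache.slot_of)
```

In this run the frequently read blocks new1 and new2 survive the scan, while old1 and old2 are evicted to make room for old3 and old4. If the recent blocks were never touched again, further sweeps would eventually evict them too.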
The database doesn't have any intelligence for determining what data to
keep in memory beyond that. Its sole notion of "relevant" is whether
someone has accessed that block recently. The operating system cache
sits as a second layer on top of this, typically with its own LRU
scheme for deciding what gets cached.
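As a rough illustration of the two-layer arrangement, the Python sketch below sends each read through a small "database" cache and then a larger "OS" cache before falling back to disk. The names LRUCache and read_block are hypothetical, and both layers are modelled with the same simple LRU class purely to keep the sketch short; it does not reflect how either PostgreSQL or any particular operating system is implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, newest last

    def access(self, page_id):
        """Return True on a hit; on a miss, cache the page and return False."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used
        self.pages[page_id] = True
        return False


def read_block(block_id, db_cache, os_cache):
    """Report which layer satisfied the read (hypothetical helper)."""
    if db_cache.access(block_id):
        return "database cache"
    if os_cache.access(block_id):
        return "OS cache"
    return "disk"


db_cache = LRUCache(capacity=2)   # assumed sizes, chosen for the demo
os_cache = LRUCache(capacity=4)
served_from = [read_block(b, db_cache, os_cache) for b in (1, 2, 3, 1, 1)]
```

Here block 1 falls out of the small first layer after blocks 2 and 3 are read, so the fourth read is served by the second layer, and the fifth read finds it back in the first layer.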
I've written a long paper covering the internals here named "Inside the
PostgreSQL Buffer Cache" at
http://www.westnet.com/~gsmith/content/postgresql/ if you want to know
exactly how this is all implemented.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
I've written a long paper covering the internals here named "Inside the
PostgreSQL Buffer Cache" at
http://www.westnet.com/~gsmith/content/postgresql/ if you want to know
exactly how this is all implemented.
Greg,
That's exactly what I was looking for.
Regards,
John