patch for new feature: Buffer Cache Hibernation

Started by Mitsuru IWASAKIalmost 15 years ago52 messageshackers
Jump to latest
#1Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org

Hi,

I am working on new feature `Buffer Cache Hibernation' which enables
postgres to keep higher cache hit ratio even just started.

Postgres usually starts with ZERO buffer cache. By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, buffer cache data structure (such as BufferDescriptors,
BufferBlocks and StrategyControl) is saved into hibernation files.
- At startup, buffer cache data structure is loaded from hibernation
files and buffer lookup hashtable is setup based on buffer descriptors.
- Above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
in postgresql.conf.

Any comments are welcome and I would very much appreciate merging the
patch in source tree.

Have fun and thanks!

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Mitsuru IWASAKI (#1)
Re: patch for new feature: Buffer Cache Hibernation

On 05/04/2011 10:10 AM, Mitsuru IWASAKI wrote:

Hi,

I am working on new feature `Buffer Cache Hibernation' which enables
postgres to keep higher cache hit ratio even just started.

Postgres usually starts with ZERO buffer cache. By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, buffer cache data structure (such as BufferDescriptors,
BufferBlocks and StrategyControl) is saved into hibernation files.
- At startup, buffer cache data structure is loaded from hibernation
files and buffer lookup hashtable is setup based on buffer descriptors.
- Above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
in postgresql.conf.

Any comments are welcome and I would very much appreciate merging the
patch in source tree.

That sounds cool.

Please a) make sure your patch is up to data against the latest source
in git and b) submit it to the next commitfest at
<https://commitfest.postgresql.org/action/commitfest_view?id=10&gt;

We don't backport features, and 9.1 is closed for features now, so the
earliest release this could be used in is 9.2.

cheers

andrew

#3Bruce Momjian
bruce@momjian.us
In reply to: Mitsuru IWASAKI (#1)
Re: patch for new feature: Buffer Cache Hibernation

On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI <iwasaki@jp.freebsd.org> wrote:

Postgres usually starts with ZERO buffer cache.  By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

Offhand this seems pretty handy for benchmarks where it would help get
reproducible results.

--
greg

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mitsuru IWASAKI (#1)
Re: patch for new feature: Buffer Cache Hibernation

Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org> writes:

Postgres usually starts with ZERO buffer cache. By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

This seems like a lot of complication for rather dubious gain. What
happens when the DBA changes the shared_buffers setting, for instance?
How do you protect against the cached buffers getting out-of-sync with
the actual disk files (especially during recovery scenarios)? What
about crash-induced corruption in the cache file itself (consider the
not-unlikely possibility that init will kill the database before it's
had time to dump all the buffers during a system shutdown)? Do you have
any proof that writing out a few GB of buffers and then reading them
back in is actually much cheaper than letting the database re-read the
data from the disk files?

regards, tom lane

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: patch for new feature: Buffer Cache Hibernation

Excerpts from Tom Lane's message of mié may 04 12:44:36 -0300 2011:

This seems like a lot of complication for rather dubious gain. What
happens when the DBA changes the shared_buffers setting, for instance?
How do you protect against the cached buffers getting out-of-sync with
the actual disk files (especially during recovery scenarios)? What
about crash-induced corruption in the cache file itself (consider the
not-unlikely possibility that init will kill the database before it's
had time to dump all the buffers during a system shutdown)? Do you have
any proof that writing out a few GB of buffers and then reading them
back in is actually much cheaper than letting the database re-read the
data from the disk files?

I thought the idea wasn't to copy the entire buffer but only a
descriptor, so that the buffer would be loaded from the original page.

If shared_buffers changes, there's no problem. If the new setting is
smaller, then the last paages would just not be copied, and would have
to be read from disk the first time they are accessed. If the new
setting is larger, then the last few buffers would remain unused until
requested.

As for gain, I have heard of test setups requiring hours of runtime in
order to prime the buffer cache.

Crash safety would have to be researched, sure. Maybe only do it in
clean shutdown.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6Dickson S. Guedes
listas@guedesoft.net
In reply to: Bruce Momjian (#3)
Re: patch for new feature: Buffer Cache Hibernation

2011/5/4 Greg Stark <gsstark@mit.edu>:

On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI <iwasaki@jp.freebsd.org> wrote:

Postgres usually starts with ZERO buffer cache.  By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

Offhand this seems pretty handy for benchmarks where it would help get
reproducible results.

It could have an option to force it or not at start of postgres. This
could helps on benchmarks scenarios.

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: patch for new feature: Buffer Cache Hibernation

On Wed, May 4, 2011 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have
any proof that writing out a few GB of buffers and then reading them
back in is actually much cheaper than letting the database re-read the
data from the disk files?

I believe he's just writing out the meta data. Ie, which blocks to
re-reread from the disk files.

--
greg

#8Greg Smith
gsmith@gregsmith.com
In reply to: Alvaro Herrera (#5)
Re: patch for new feature: Buffer Cache Hibernation

Alvaro Herrera wrote:

As for gain, I have heard of test setups requiring hours of runtime in
order to prime the buffer cache.

And production ones too. I have multiple customers where a server
restart is almost a planned multi-hour downtime. The system may be back
up, but for a couple of hours performance is so terrible it's barely
usable. You can watch the MB/s ramp up as the more random data fills in
over time; getting that taken care of in a larger block more amenable to
elevator sorting would be a huge help.

I never bothered with this particular idea though because shared_buffers
is only a portion of the important data. Cedric's pgfincore code digs
into the OS cache, too, which can then save enough to be really useful
here. And that's already got a snapshot/restore feature. The slides at
http://www.pgcon.org/2010/schedule/events/261.en.html have a useful into
to that, pages 30 through 34 are the neat ones. That provides some
other neat APIs for preloading popular data into cache too. I'd rather
work on getting something like that into core, rather than adding
something that only is targeting just shared_buffers.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#9Jeff Janes
jeff.janes@gmail.com
In reply to: Mitsuru IWASAKI (#1)
Re: patch for new feature: Buffer Cache Hibernation

On Wed, May 4, 2011 at 7:10 AM, Mitsuru IWASAKI <iwasaki@jp.freebsd.org> wrote:

Hi,

I am working on new feature `Buffer Cache Hibernation' which enables
postgres to keep higher cache hit ratio even just started.

Postgres usually starts with ZERO buffer cache.  By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, buffer cache data structure (such as BufferDescriptors,
 BufferBlocks and StrategyControl) is saved into hibernation files.
- At startup, buffer cache data structure is loaded from hibernation
 files and buffer lookup hashtable is setup based on buffer descriptors.
- Above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
 in postgresql.conf.

Any comments are welcome and I would very much appreciate merging the
patch in source tree.

Have fun and thanks!

It applies and builds against head with offsets and some fuzz. It
fails make check, but apparently only because
src/test/regress/expected/rangefuncs.out needs to be updated to
include the new setting. (Although all the other "enable%" settings
are for the planner, so making a new setting with that prefix that
does something else might be undesirable)

I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
provides similar functionality. Are you familiar with that? If so,
could you contrast your approach with that one?

Cheers,

Jeff

#10Josh Berkus
josh@agliodbs.com
In reply to: Greg Smith (#8)
Re: patch for new feature: Buffer Cache Hibernation

All,

I thought that Dimitri had already implemented this using Fincore. It's
linux-only, but that should work well enough to test the general concept.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#11Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Josh Berkus (#10)
Re: patch for new feature: Buffer Cache Hibernation

Josh Berkus <josh@agliodbs.com> writes:

I thought that Dimitri had already implemented this using Fincore. It's
linux-only, but that should work well enough to test the general concept.

Actually, Cédric did, and I have a clone of his repository where I did
some debian packaging of it.

http://villemain.org/projects/pgfincore
http://git.postgresql.org/gitweb?p=pgfincore.git;a=summary
http://git.postgresql.org/gitweb?p=pgfincore.git;a=tree

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#12Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Josh Berkus (#10)
Re: patch for new feature: Buffer Cache Hibernation

2011/5/4 Josh Berkus <josh@agliodbs.com>:

All,

I thought that Dimitri had already implemented this using Fincore.  It's
linux-only, but that should work well enough to test the general concept.

Harald provided me some pointers at pgday in Stuttgart to make it work
with windows but ... hum I have not windows and wasn't enought
motivated to make it work on it if no one need it.

I didn't search recently on the different kernels, but any kernel
supporting mincore and posix_fadvise should work. (so probably the
same set of kernel that support our 'effective_io_concurrency').

Still waiting for (free)BSD support .....

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#13Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Jeff Janes (#9)
Re: patch for new feature: Buffer Cache Hibernation

Hi,

I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
provides similar functionality. Are you familiar with that? If so,
could you contrast your approach with that one?

I'm not familiar with PgFincore at all sorry, but I got source code
and documents and read through them just now.
# and I'm a novice on postgres actually...
The target both is to reduce physical I/O, but their approaches and
gains are different.
My understanding is like this;

+---------------------+     +---------------------+
| Postgres(backend)   |     | Postgres            |
| +-----------------+ |     |                     |
| | DB Buffer Cache | |     |                     |
| | (shared buffers)| |     |                     |
| |*my target       | |     |                     |
| +-----------------+ |     |                     |
|   ^      ^          |     |                     |
|   |      |          |     |                     |
|   v      v          |     |                     |
| +-----------------+ |     | +-----------------+ | 
| |  buffer manager | |     | |    pgfincore    | |
| +-----------------+ |     | +-----------------+ |
+---^------^----------+     +----------^----------+
    |      |smgrread()                 |posix_fadvise()
    |read()|                           |                 userland
==================================================================
    |      |                           |                 kernel
    |      +-------------+-------------+
    |                    |
    |                    v
    |       +------------------------+
    |       | File System            |
    |       |   +-----------------+  |
    +------>|   | FS Buffer Cache |  |
            |   |*PgFincore target|  |
            |   +-----------------+  |
            |    ^       ^           |
            +----|-------|-----------+
                 |       |
==================================================================
                 |       |                               hardware
       +---------|-------|----------------+
       |         |       v  Physical Disk |
       |         |   +------------------+ |
       |         |   | base/16384/24598 | |
       |         v   +------------------+ |
       | +------------------------------+ |
       | |Buffer Cache Hibernation Files| |
       | +------------------------------+ |
       +----------------------------------+

In summary, PgFincore's target is File System Buffer Cache, Buffer
Cache Hibernation's target is DB Buffer Cache(shared buffers).

PgFincore is trying to preload database file by posix_fadvise() into
File System Buffer Cache, not into DB Buffer Cache(shared buffers).
On query execution, buffer manager will get DB buffer blocks by
smgrread() from file system unless necessary blocks exist in DB Buffer
Cache. At this point, physical reads may not happen because part of
(or entire) database file is already loaded into FS Buffer Cache.

The gain depends on the file system, especially size of File System
Buffer Cache.
Preloading database file is equivalent to following command in short.
$ cat base/16384/24598 > /dev/null

I think PgFincore is good for data warehouse in applications.

Buffer Cache Hibernation, my approach, is more simple and straight forward.
It try to save/load the contents of DB Buffer Cache(shared buffers) using
regular files(called Buffer Cache Hibernation Files).
At startup, buffer manager will load DB buffer blocks into DB Buffer
Cache from Buffer Cache Hibernation Files which was saved at the last
shutdown. Note that database file will not be read, so it is not
cached in File System Buffer Cache at all. Only contents of DB Buffer
Cache are filled. Therefore, the DB buffer cache miss penalty would
be larger than PgFincore's.

The gain depends on the size of shared buffers, and how often the
similar queries are executed before and after restarting.

Buffer Cache Hibernation is good for OLTP in applications.

I think that PgFincore and Buffer Cache Hibernation is not exclusive,
they can co-work together in different caching levels.

Sorry for my poor english skill, but I'm doing my best :)

Thanks

#14Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Tom Lane (#4)
Re: patch for new feature: Buffer Cache Hibernation

Hi, thanks for good suggestions.

Postgres usually starts with ZERO buffer cache. By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache as the same condition as just before the last shutdown.

This seems like a lot of complication for rather dubious gain. What
happens when the DBA changes the shared_buffers setting, for instance?

It was my first concern actually. Current implementation is stopping
reading hibernation file when detecting the size mismatch among
shared_buffers and hibernation file. I think it is a safety way.
As Alvaro Herrera mentioned, it would be possible to adjust copying
buffer bloks, but changing shared_buffers setting is not so often I
think.

How do you protect against the cached buffers getting out-of-sync with
the actual disk files (especially during recovery scenarios)? What

Saving DB buffer cahce is called at shutdown after finishing
bgwriter's final checkpoint process, so dirty-buffers should not exist
I believe.
For recovery scenarios, I need to research it though...
Could you describe what is need to be consider?

about crash-induced corruption in the cache file itself (consider the
not-unlikely possibility that init will kill the database before it's
had time to dump all the buffers during a system shutdown)? Do you have

I think this is important point. I'll implement validation function for
hibernation file.

any proof that writing out a few GB of buffers and then reading them
back in is actually much cheaper than letting the database re-read the
data from the disk files?

I think this means sequential-read vs scattered-read.
The largest hibernation file is for buffer blocks, and sequential-read
from it would be much faster than scattered-read from database file
via smgrread() block by block.
As Greg Stark suggested, re-reading from database file based on buffer
descriptors was one of implementation candidates (it can reduce
storage consumption for hibernation), but I chose creating buffer
blocks raw image file and reading it for the performance.

Thanks

#15Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mitsuru IWASAKI (#13)
Re: patch for new feature: Buffer Cache Hibernation

2011/5/5 Mitsuru IWASAKI <iwasaki@jp.freebsd.org>:

Hi,

I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
provides similar functionality.  Are you familiar with that?  If so,
could you contrast your approach with that one?

I'm not familiar with PgFincore at all sorry, but I got source code
and documents and read through them just now.
# and I'm a novice on postgres actually...
The target both is to reduce physical I/O, but their approaches and
gains are different.
My understanding is like this;

+---------------------+     +---------------------+
| Postgres(backend)   |     | Postgres            |
| +-----------------+ |     |                     |
| | DB Buffer Cache | |     |                     |
| | (shared buffers)| |     |                     |
| |*my target       | |     |                     |
| +-----------------+ |     |                     |
|   ^      ^          |     |                     |
|   |      |          |     |                     |
|   v      v          |     |                     |
| +-----------------+ |     | +-----------------+ |
| |  buffer manager | |     | |    pgfincore    | |
| +-----------------+ |     | +-----------------+ |
+---^------^----------+     +----------^----------+
   |      |smgrread()                 |posix_fadvise()
   |read()|                           |                 userland
==================================================================
   |      |                           |                 kernel
   |      +-------------+-------------+
   |                    |
   |                    v
   |       +------------------------+
   |       | File System            |
   |       |   +-----------------+  |
   +------>|   | FS Buffer Cache |  |
           |   |*PgFincore target|  |
           |   +-----------------+  |
           |    ^       ^           |
           +----|-------|-----------+
                |       |
==================================================================
                |       |                               hardware
      +---------|-------|----------------+
      |         |       v  Physical Disk |
      |         |   +------------------+ |
      |         |   | base/16384/24598 | |
      |         v   +------------------+ |
      | +------------------------------+ |
      | |Buffer Cache Hibernation Files| |
      | +------------------------------+ |
      +----------------------------------+

littel detail, pgfincore store its data per relation in a file, like you do.
I rewrote a bit that, and it will store its data directly in
postgresql tables, as well as it will be able to restore the cache
from raw bitstring.

In summary, PgFincore's target is File System Buffer Cache, Buffer
Cache Hibernation's target is DB Buffer Cache(shared buffers).

Correct. (btw I am very happy of your idea and that you get time to do it)

PgFincore is trying to preload database file by posix_fadvise() into
File System Buffer Cache, not into DB Buffer Cache(shared buffers).
On query execution, buffer manager will get DB buffer blocks by
smgrread() from file system unless necessary blocks exist in DB Buffer
Cache.  At this point, physical reads may not happen because part of
(or entire) database file is already loaded into FS Buffer Cache.

The gain depends on the file system, especially size of File System
Buffer Cache.
Preloading database file is equivalent to following command in short.
$ cat base/16384/24598 > /dev/null

Not exactly.

it exists 2 calls :

* pgfadv_WILLNEED
* pgfadv_WILLNEED_snapshot

The former ask to load each segment of a relation *but* the kernel can
decide to not do that or load only part of each segment. (so it is not
as brutal as cat file > /dev/null )
The later read *exactly* each blocks required in each segment, not all
blocks except if all were in cache while doing the snapshot. (this one
is the part of the snapshot/restore combo)

I think PgFincore is good for data warehouse in applications.

Pgfincore with bitstring storage in a table allow streaming to
HotStandbys and get better response in case of switch-over/fail-over
by doing some house-keeping on the HotStandby and keep it really hot
;)

Even web applications have large database today ....

(they is more, but it is no the subject)

Buffer Cache Hibernation, my approach, is more simple and straight forward.
It try to save/load the contents of DB Buffer Cache(shared buffers) using
regular files(called Buffer Cache Hibernation Files).
At startup, buffer manager will load DB buffer blocks into DB Buffer
Cache from Buffer Cache Hibernation Files which was saved at the last
shutdown.  Note that database file will not be read, so it is not
cached in File System Buffer Cache at all.  Only contents of DB Buffer
Cache are filled.  Therefore, the DB buffer cache miss penalty would
be larger than PgFincore's.

The gain depends on the size of shared buffers, and how often the
similar queries are executed before and after restarting.

Buffer Cache Hibernation is good for OLTP in applications.

It is very helpfull for debugging and analysis purpose, also, IIUC.
I may prefer the per relation approach (so you can snapshot and
restore only the interesting tables/index). Given what I read in your
patch it looks easy to do, isn't it ?

I also prefer the idea to keep a map of the Buffer Cache (yes, like
what I do with pgfincore) than storing the data directly and reading
it directly. This later part semmes a bit dangerous to me, even if it
looks sane from a normal postgresql stop/start process.

I think that PgFincore and Buffer Cache Hibernation is not exclusive,
they can co-work together in different caching levels.

Yes.

Sorry for my poor english skill, but I'm doing my best :)

better than me, and anyway your patch remain very easy to read in all case.

Thanks

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#16Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Mitsuru IWASAKI (#14)
Re: patch for new feature: Buffer Cache Hibernation

Hi,

I revised the patch against HEAD, it's available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110506.patch

Implemented hibernation file validations:
- comparison with pg_control
At shutdown:
pg_control state should be DB_SHUTDOWNED.
At startup:
pg_control state should be DB_SHUTDOWNED.
hibernation files should be newer than pg_control.

- CRC check
At shutdown:
compute CRC values for hibernation files and store them into a file.
At startup:
CRC values for hibernation files should be the same with read from the
file created at shutdown.

- file size
At startup:
The size of hibernation file should be the same with calculated file
size based on shared_buffers.

- buffer descriptors validation
At startup:
The descriptor flags should not include BM_DIRTY, BM_IO_IN_PROGRESS,
BM_IO_ERROR, BM_JUST_DIRTIED and BM_PIN_COUNT_WAITER.
Sanity checks for usage_count and usage_count should be done.
(wait_backend_pid is zero-cleared because the process was terminated already)

- system call error checking
At shutdown and startup:
Evaluation for return value system call (eg. open(), read(), write()
and etc) should be done.

How do you protect against the cached buffers getting out-of-sync with
the actual disk files (especially during recovery scenarios)? What

Saving DB buffer cahce is called at shutdown after finishing
bgwriter's final checkpoint process, so dirty-buffers should not exist
I believe.
For recovery scenarios, I need to research it though...
Could you describe what is need to be consider?

I think hibernation should be allowed only when the system is shutdown
normaly by checking pg_control state.
And once the abnormal shutdown was detected, the hibernation files
should be ignored.
The latest patch includes this.
# modifications for xlog.c:ReadControlFile() was required though...

about crash-induced corruption in the cache file itself (consider the
not-unlikely possibility that init will kill the database before it's
had time to dump all the buffers during a system shutdown)? Do you have

I think this is important point. I'll implement validation function for
hibernation file.

Added validations seem enough for me.
# because my understanding on postgres is not enough ;)
If any other considerations are required, please point them out.

Thanks

#17Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Cédric Villemain (#15)
Re: patch for new feature: Buffer Cache Hibernation

Hi, thanks for your comments!
I'm glad to discuss about this topic.

* pgfadv_WILLNEED
* pgfadv_WILLNEED_snapshot

The former ask to load each segment of a relation *but* the kernel can
decide to not do that or load only part of each segment. (so it is not
as brutal as cat file > /dev/null )
The later read *exactly* each blocks required in each segment, not all
blocks except if all were in cache while doing the snapshot. (this one
is the part of the snapshot/restore combo)

Sorry about that, I'm not so familiar with posix_fadvise().
I'll check posix_fadvise() later.
Actually I used to execute 'cat database_file > /dev/null' script on
other DBSM before starting.
# or 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load
# index blocks

I may prefer the per relation approach (so you can snapshot and
restore only the interesting tables/index). Given what I read in your
patch it looks easy to do, isn't it ?

I would like to keep my patch as simple as possible, because
it is just a hibernation function, not complicated buffer management.
But I want to try improving buffer management on next vacation.
# currently I'm in 11-days vacation until Sunday.

My rough idea on improving buffer management like this;
SQL> alter table table_name buffer pin priority 7;
SQL> alter index index_name buffer pin priority 10;

This DDL set 'buffer pin priority' property to table/index and
also buffer descriptors related with table/index.
Optionally preloading database files in FS cache and relation blocks
in DB cache would be possible.

When new buffer is required, buffer manager refer to the priority in
each buffers and select a victim buffer.

I think it helps batch job runs in better buffer cache condition
by giving hints for buffer management.
For example, job-A reads table_A, index_A and writes only table_B;
SQL> alter table table_A buffer pin priority 7;
SQL> alter index index_A buffer pin priority 10;
SQL> alter table table_B buffer pin priority 1;
keeps buffers of index_A, table_A (table_B will be victims soon).

Buffer pin priority can be reset like this;
SQL> alter system buffer pin priority 5;

Next job-B reads and writes table_C, reads index_C with preloading;
SQL> alter table table_C buffer pin priority 5;
SQL> alter index index_C buffer pin priority 10 with preloading 50%;
something like this.

I also prefer the idea to keep a map of the Buffer Cache (yes, like
what I do with pgfincore) than storing the data directly and reading
it directly. This later part semmes a bit dangerous to me, even if it
looks sane from a normal postgresql stop/start process.

Never mind :)
I added enough validations and will add more.

better than me, and anyway your patch remain very easy to read in all case.

Thanks a lot! My policy on experimental implementation is easy-to-read
so that people understand my idea quickly.
That's why my first patch doesn't have enough error checkings ;)

Thanks

#18Greg Smith
gsmith@gregsmith.com
In reply to: Mitsuru IWASAKI (#13)
Re: patch for new feature: Buffer Cache Hibernation

On 05/05/2011 05:06 AM, Mitsuru IWASAKI wrote:

In summary, PgFincore's target is File System Buffer Cache, Buffer
Cache Hibernation's target is DB Buffer Cache(shared buffers).

Right. The thing to realize is that shared_buffers is becoming a
smaller fraction of the total RAM used by the database every year. On
Windows it's been stuck at useful settings being less than 512MB for a
while now. And on UNIX systems, around 8GB seems to be effective upper
limit. Best case, shared_buffers is only going to be around 25% of
total RAM; worst-case, approximately, you might have Windows server with
64GB of RAM where shared_buffers is less than 1% of total RAM.

There's nothing wrong with the general idea you're suggesting. It's
just only targeting a small (and shrinking) subset of the real problem
here. Rebuilding cache state starts with shared_buffers, but that's not
enough of the problem to be an effective tweak on many systems.

I think that all the complexity with CRCs etc. is unlikely to lead
anywhere too, and those two issues are not completely unrelated. The
simplest, safest thing here is the right way to approach this, not the
most complicated one, and a simpler format might add some flexibility
here to reload more cache state too. The bottleneck on reloading the
cache state is reading everything from disk. Trying to micro-optimize
any other part of that is moving in the wrong direction to me. I doubt
you'll ever measure a useful benefit that overcomes the expense of
maintaining the code. And you seem to be moving to where someone can't
restore cache state when they change shared_buffers. A simpler
implementation might still work in that situation; reload until you run
out of buffers if shared_buffers shrinks, reload until you're done with
the original size.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#19Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#18)
Re: patch for new feature: Buffer Cache Hibernation

On Fri, May 6, 2011 at 5:31 PM, Greg Smith <greg@2ndquadrant.com> wrote:

On 05/05/2011 05:06 AM, Mitsuru IWASAKI wrote:

In summary, PgFincore's target is File System Buffer Cache, Buffer
Cache Hibernation's target is DB Buffer Cache(shared buffers).

Right.  The thing to realize is that shared_buffers is becoming a smaller
fraction of the total RAM used by the database every year.  On Windows it's
been stuck at useful settings being less than 512MB for a while now.  And on
UNIX systems, around 8GB seems to be effective upper limit.  Best case,
shared_buffers is only going to be around 25% of total RAM; worst-case,
approximately, you might have Windows server with 64GB of RAM where
shared_buffers is less than 1% of total RAM.

There's nothing wrong with the general idea you're suggesting.  It's just
only targeting a small (and shrinking) subset of the real problem here.
 Rebuilding cache state starts with shared_buffers, but that's not enough of
the problem to be an effective tweak on many systems.

I think that all the complexity with CRCs etc. is unlikely to lead anywhere
too, and those two issues are not completely unrelated.  The simplest,
safest thing here is the right way to approach this, not the most
complicated one, and a simpler format might add some flexibility here to
reload more cache state too.  The bottleneck on reloading the cache state is
reading everything from disk.  Trying to micro-optimize any other part of
that is moving in the wrong direction to me.  I doubt you'll ever measure a
useful benefit that overcomes the expense of maintaining the code.  And you
seem to be moving to where someone can't restore cache state when they
change shared_buffers.  A simpler implementation might still work in that
situation; reload until you run out of buffers if shared_buffers shrinks,
reload until you're done with the original size.

Yeah, I'm pretty well convinced this whole approach is a dead end.
Priming the OS buffer cache seems way more useful. I also think
saving the blocks to be read rather than the actual blocks makes a lot
more sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Robert Haas (#19)
Re: patch for new feature: Buffer Cache Hibernation

Hi,
Sorry, I missed these messages because I didn't subscribe to this list.
# I've just subscribed temporary

I think that all the complexity with CRCs etc. is unlikely to lead anywhere
too, and those two issues are not completely unrelated.  The simplest,
safest thing here is the right way to approach this, not the most
complicated one, and a simpler format might add some flexibility here to
reload more cache state too.  The bottleneck on reloading the cache state is
reading everything from disk.  Trying to micro-optimize any other part of
that is moving in the wrong direction to me.  I doubt you'll ever measure a
useful benefit that overcomes the expense of maintaining the code.  And you
seem to be moving to where someone can't restore cache state when they
change shared_buffers.  A simpler implementation might still work in that
situation; reload until you run out of buffers if shared_buffers shrinks,
reload until you're done with the original size.

Yeah, I'm pretty well convinced this whole approach is a dead end.
Priming the OS buffer cache seems way more useful. I also think
saving the blocks to be read rather than the actual blocks makes a lot
more sense.

OK, there are two your suggestions here IIUC.
# if not, please correct me.
1. restore buffer blocks based on buffer descriptors, not from the saved file.
2. support restoring cache state even if shared_buffers had changed.

For 1, I've just finish my work. The latest patch is available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch

On my box, shared_buffers can be set up to only 200MB.
Elapsed time for starting up is almost the same, about 3 sec (w/o
hibernation takes about 1 sec).
For shutdown, writing buffer blocks takes about 10 sec, otherwise
about 1 sec.

Well, it seems you were right :)
By restoring buffer blocks based on buffer descriptors, the OS buffer
cache will be filled too. This can help buffer updating performance
I believe.

I think saving buffer blocks is still useful for debugging or portability,
so I would like to remain the support code in my patch.

For 2, I'm not sure how to implement this.
The problem is that freelist.c:StrategyControl is also restored at
startup, but I have no idea currently how to adjust StrategyControl
when shared_buffer had changed.
StrategyControl has important data on buffer allocation, so this should be
matched with shared_buffer, I belive.

Changing shared_buffer is not so often on production environment.
Current implementation like this;
If shared_buffer had changed, restoring is aborted only on that time
and saving is executed with new shared_buffer at shutdown, restoring
is executed at startup on next time.

I have one more day for working on this, but I may give up...

Thanks

#21Robert Haas
robertmhaas@gmail.com
In reply to: Mitsuru IWASAKI (#20)
#22Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Mitsuru IWASAKI (#20)
#23Greg Smith
gsmith@gregsmith.com
In reply to: Mitsuru IWASAKI (#22)
#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Mitsuru IWASAKI (#22)
#25Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Greg Smith (#23)
#26Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Heikki Linnakangas (#24)
#27Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Mitsuru IWASAKI (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#18)
#29Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#28)
#30Greg Smith
gsmith@gregsmith.com
In reply to: Mitsuru IWASAKI (#20)
#31Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Robert Haas (#19)
#32Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tatsuo Ishii (#31)
#33Greg Smith
gsmith@gregsmith.com
In reply to: Tatsuo Ishii (#31)
#34Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#28)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Jeff Janes (#34)
#37Mitsuru IWASAKI
iwasaki@jp.FreeBSD.org
In reply to: Greg Smith (#30)
#38Greg Smith
gsmith@gregsmith.com
In reply to: Mitsuru IWASAKI (#37)
#39Bruce Momjian
bruce@momjian.us
In reply to: Mitsuru IWASAKI (#37)
#40Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Bruce Momjian (#39)
#41Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Cédric Villemain (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cédric Villemain (#40)
#43Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#42)
#44Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#43)
#45Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#44)
#46Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#45)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#44)
#48Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#45)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#46)
#50Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#48)
#51Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#49)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#51)