Proposing pg_hibernate
Please find attached the pg_hibernate extension. It is a
set-it-and-forget-it solution to enable hibernation of Postgres
shared-buffers. It can be thought of as an amalgam of pg_buffercache and
pg_prewarm.
It uses the background worker infrastructure. It registers one worker
process (BufferSaver) to save the shared-buffer metadata when the server is
shutting down, and one worker per database (BlockReader) to restore the
shared buffers at startup.
It stores the buffer metadata under $PGDATA/pg_database/, one file per
database, and one separate file for global objects. It sorts the list of
buffers before storage, so that when it encounters a range of consecutive
blocks of a relation's fork, it stores that range as just one entry, hence
reducing the storage and I/O overhead.
The on-disk binary format, used to create the per-database save-files, is
defined as:
1. One or more relation filenodes; stored as r<relfilenode>.
2. Each relation is followed by one or more fork numbers; stored as
f<forknumber>.
3. Each fork number is followed by one or more block numbers; stored as
b<blocknumber>.
4. Each block number is followed by zero or more range numbers; stored as
N<number>.
In grammar form: {r {f {b N* }+ }+ }+
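To make the sorting and range-collapsing concrete, here is a small standalone
sketch (plain C; not the extension's code, and the one-byte-tag-plus-uint32
encoding is only my assumption) that collapses a sorted block list into ranges
and emits the r/f/b/N records described above:

    /*
     * Illustrative sketch only: collapse a sorted list of block numbers into
     * "first block + extra count" runs and write them as tagged records
     * (r<relfilenode>, f<forknumber>, b<blocknumber>, N<number>).  The exact
     * encoding used by pg_hibernate may differ.
     */
    #include <stdio.h>
    #include <stdint.h>

    static void
    write_record(FILE *fp, char tag, uint32_t number)
    {
        fputc(tag, fp);
        fwrite(&number, sizeof(number), 1, fp);
    }

    static void
    save_fork(FILE *fp, uint32_t relfilenode, uint32_t forknum,
              const uint32_t *blocks, size_t nblocks)
    {
        size_t i = 0;

        write_record(fp, 'r', relfilenode);
        write_record(fp, 'f', forknum);

        while (i < nblocks)
        {
            size_t j = i;

            /* extend the run while block numbers stay consecutive */
            while (j + 1 < nblocks && blocks[j + 1] == blocks[j] + 1)
                j++;

            write_record(fp, 'b', blocks[i]);               /* first block of run */
            if (j > i)
                write_record(fp, 'N', (uint32_t) (j - i));  /* extra blocks in run */
            i = j + 1;
        }
    }

    int main(void)
    {
        /* sorted block list, as the BufferSaver would produce it */
        uint32_t blocks[] = {0, 1, 2, 3, 7, 10, 11, 12};
        FILE    *fp = fopen("demo.save", "wb");

        if (fp == NULL)
            return 1;
        save_fork(fp, 16384, 0, blocks, sizeof(blocks) / sizeof(blocks[0]));
        fclose(fp);
        return 0;
    }

With this sample input, eight cached blocks collapse into three b records (two
of them carrying an N range), which is where the storage and I/O savings come
from.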
Possible enhancements:
- Ability to save/restore only specific databases.
- Control how many BlockReaders are active at a time; to avoid I/O storms.
- Be smart about lowered shared_buffers across the restart.
- Different modes of reading like pg_prewarm does.
- Include PgFincore functionality, at least for Linux platforms.
The extension currently works with PG 9.3, and may work on 9.4 without any
changes; I haven't tested, though. If not, I think it'd be easy to port to
HEAD/PG 9.4. I see that 9.4 has put a cap on maximum background workers via
a GUC, and since my aim is to provide a non-intrusive no-tuning-required
extension, I'd like to use the new dynamic-background-worker infrastructure
in 9.4, which doesn't seem to have any preset limits (I think it's limited
by max_connections, but I may be wrong). I can work on a 9.4 port, if there's
interest in including this as a contrib/ module.
To see the extension in action:
.) Compile it.
.) Install it.
.) Add it to shared_preload_libraries.
.) Start/restart Postgres.
.) Install the pg_buffercache extension, to inspect the shared buffers.
.) Note the result of the pg_buffercache view.
.) Work on your database to fill up the shared buffers.
.) Note the result of the pg_buffercache view again; there should be more
blocks than the last time we checked.
.) Stop and start the Postgres server.
.) Note the output of the pg_buffercache view; it should contain the blocks
seen just before the shutdown.
.) Future server restarts will automatically save and restore the blocks in
shared-buffers.
The code is also available as Git repository at
https://github.com/gurjeet/pg_hibernate/
Demo:
$ make -C contrib/pg_hibernate/
$ make -C contrib/pg_hibernate/ install
$ vi $B/db/data/postgresql.conf
$ grep shared_preload_libraries $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_hibernate' # (change requires restart)
$ pgstart
waiting for server to start.... done
server started
$ pgsql -c 'create extension pg_buffercache;'
CREATE EXTENSION
$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not
null group by reldatabase;'
count
-------
163
14
(2 rows)
$ pgsql -c 'create table test_hibernate as select s as a, s::char(1000) as
b from generate_series(1, 100000) as s;'
SELECT 100000
$ pgsql -c 'create index on test_hibernate(a);'
CREATE INDEX
$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not
null group by reldatabase;'
count
-------
2254
14
(2 rows)
$ pgstop
waiting for server to shut down....... done
server stopped
$ pgstart
waiting for server to start.... done
server started
$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not
null group by reldatabase;'
count
-------
2264
17
(2 rows)
There are a few more blocks than there were at the time they were saved, but
all the blocks from before the restart are present in shared buffers after the
restart.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Attachments:
Please find attached the updated code of Postgres Hibernator. Notable
changes since the first proposal are:
.) The name has been changed to pg_hibernator (from pg_hibernate), to
avoid confusion with the ORM Hibernate.
.) Works with Postgres 9.4
.) Uses DynamicBackgroundWorker infrastructure.
.) Ability to restore one database at a time, to avoid random-read
storms. Can be disabled by parameter.
.) A couple of bug-fixes.
.) Detailed documentation.
I am pasting the README here (also included in the attachment).
Best regards,
Postgres Hibernator
===================
This Postgres extension is a set-it-and-forget-it solution to save and restore
the Postgres shared-buffers contents, across Postgres server restarts.
It performs the automatic save and restore of database buffers, integrated with
database shutdown and startup, hence reducing the duration of database
maintenance windows and, in effect, increasing the uptime of your applications.
Postgres Hibernator automatically saves the list of shared buffers to the disk
on database shutdown, and automatically restores the buffers on
database startup.
This acts pretty much like your Operating System's hibernate feature, except
that, instead of saving the contents of memory to disk, Postgres Hibernator
saves just a list of block identifiers. It then uses that list after startup to
restore the blocks from the data directory into Postgres' shared buffers.
Why
--------------
DBAs are often faced with the task of performing some maintenance on their
database server(s) which requires shutting down the database. The maintenance
may involve anything from a database patch application, to a hardware upgrade.
One ugly side-effect of restarting the database server/service is that all the
data currently in the database server's memory is lost; data that was
painstakingly fetched from disk and put there in response to application queries
over time. This data will have to be rebuilt as applications start querying the
database again, and query response times will be very high until all the "hot"
data is fetched from disk and put back in memory.
People employ a few tricks to get around this ugly truth, which range from
running a `select * from app_table;`, to `dd if=table_file ...`, to using
specialized utilities like pgfincore to prefetch data files into OS cache.
Wouldn't it be ideal if the database itself could save and restore its memory
contents across restarts!
The duration for which the server is building up caches, and trying to reach its
optimal cache performance is called ramp-up time. Postgres Hibernator is aimed
at reducing the ramp-up time of Postgres servers.
How
--------------
Compile and install the extension (you'll need a Postgres installation and its
`pg_config` in `$PATH`):
$ cd pg_hibernator
$ make install
Then:
1. Add `pg_hibernator` to the `shared_preload_libraries` variable in
`postgresql.conf` file.
2. Restart the Postgres server.
3. You are done.
How it works
--------------
This extension uses the `Background Worker` infrastructure of Postgres, which was
introduced in Postgres 9.3. When the server starts, this extension registers
background workers: one for saving the buffers (called the `Buffer Saver`) when
the server shuts down, and one for each database in the cluster (called a
`Block Reader`) for restoring the buffers saved during the previous shutdown.
When the Postgres server is being stopped/shut down, the `Buffer Saver` scans the
shared buffers of Postgres, and stores the unique block identifiers of each
cached block to disk. This information is saved under the `$PGDATA/pg_hibernator/`
directory. For each database whose blocks are resident in shared buffers,
one file is created; for example: `$PGDATA/pg_hibernator/2.postgres.save`.
During the next startup sequence, the `Block Reader` workers are registered, one
for each file present under the `$PGDATA/pg_hibernator/` directory. When the
Postgres server has reached a stable state (that is, it's ready for database
connections), these `Block Reader` processes are launched. Each `Block Reader`
reads its save-file looking for block IDs to restore. It then connects to the
respective database, and requests Postgres to fetch the blocks into shared buffers.
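For readers unfamiliar with that infrastructure, here is a rough sketch of how a
Buffer Saver-style worker might be registered from `_PG_init()` with the
9.3/9.4-era background-worker API. It is illustrative only (start time, flags,
and the elided worker body are my guesses), not pg_hibernator's actual code:

    /* Hedged sketch: registering a "Buffer Saver"-style background worker.
     * Field names follow the 9.3/9.4 bgworker API; the worker body is elided. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);
    static void buffer_saver_main(Datum main_arg);

    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        /* only meaningful when loaded via shared_preload_libraries */
        if (!process_shared_preload_libraries_in_progress)
            return;

        MemSet(&worker, 0, sizeof(worker));
        snprintf(worker.bgw_name, BGW_MAXLEN, "Buffer Saver");
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_ConsistentState;  /* illustrative choice */
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        worker.bgw_main = buffer_saver_main;                    /* 9.3/9.4-style entry point */
        worker.bgw_main_arg = (Datum) 0;

        RegisterBackgroundWorker(&worker);
    }

    static void
    buffer_saver_main(Datum main_arg)
    {
        /* ... set up signal handling, wait for shutdown, scan shared buffers,
         * and write the save-files described above ... */
    }

The Block Readers would be registered similarly; on 9.4, presumably through
RegisterDynamicBackgroundWorker(), one per save-file found under
`$PGDATA/pg_hibernator/`.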
Configuration
--------------
This extension can be controlled via the following parameters. These parameters
can be set in postgresql.conf or on postmaster's command-line.
- `pg_hibernator.enabled`
Setting this parameter to false disables the hibernator features. That is,
on server startup the BlockReader processes will not be launched, and on
server shutdown the list of blocks in shared buffers will not be saved.
Note that the BufferSaver process exists at all times, even when this
parameter is set to `false`. This is to allow the DBA to enable/disable the
extension without having to restart the server. The BufferSaver process
checks this parameter during server startup and right before shutdown, and
honors this parameter's value at that time.
To enable/disable Postgres Hibernator at runtime, change the value in
`postgresql.conf` and use `pg_ctl reload` to make Postgres re-read the new
parameter values from `postgresql.conf`.
Default value: `true`.
- `pg_hibernator.parallel`
This parameter controls whether Postgres Hibernator launches the BlockReader
processes in parallel or sequentially, waiting for the current BlockReader to
exit before launching the next one.
When enabled, all the BlockReaders, one for each database, will be launched
simultaneously, and this may cause a huge random-read flood on the disks if there
are many databases in the cluster. This may also cause some BlockReaders to fail
to launch because of the `max_worker_processes` limit.
Default value: `false`.
- `pg_hibernator.default_database`
The BufferSaver process needs to connect to a database in order to perform
the database-name lookups etc. This parameter controls which database the
BufferSaver process connects to for performing these operations.
Default value: `postgres`.
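For illustration, the three parameters above could be declared with the standard
custom-GUC API roughly as below. This is a hedged sketch, not pg_hibernator's
actual code; in particular, which parameter is PGC_SIGHUP versus PGC_POSTMASTER
is my guess based on the reload behaviour described here:

    /* Hypothetical GUC declarations; contexts and descriptions are guesses. */
    #include "postgres.h"
    #include "utils/guc.h"

    static bool  hibernator_enabled = true;
    static bool  hibernator_parallel = false;
    static char *hibernator_default_database = NULL;

    static void
    define_hibernator_gucs(void)
    {
        DefineCustomBoolVariable("pg_hibernator.enabled",
                                 "Enable saving and restoring of shared buffers.",
                                 NULL, &hibernator_enabled, true,
                                 PGC_SIGHUP,        /* honours pg_ctl reload */
                                 0, NULL, NULL, NULL);

        DefineCustomBoolVariable("pg_hibernator.parallel",
                                 "Launch all BlockReaders simultaneously.",
                                 NULL, &hibernator_parallel, false,
                                 PGC_SIGHUP,
                                 0, NULL, NULL, NULL);

        DefineCustomStringVariable("pg_hibernator.default_database",
                                   "Database the BufferSaver connects to.",
                                   NULL, &hibernator_default_database, "postgres",
                                   PGC_POSTMASTER,  /* guess: one GUC is PGC_POSTMASTER,
                                                     * per a later message in this thread */
                                   0, NULL, NULL, NULL);
    }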
Caveats
--------------
- The buffer list is saved only when Postgres is shut down in "smart" and
  "fast" modes.
  That is, the buffer list is not saved when the database crashes, or on an
  "immediate" shutdown.
- A reduction in `shared_buffers` is not detected.
  If `shared_buffers` is reduced across a restart, and if the combined
  saved buffer list is larger than the new shared_buffers, Postgres
  Hibernator continues to read and restore blocks even after `shared_buffers`
  worth of buffers have been restored.
FAQ
--------------
- What is the relationship between `pg_buffercache`, `pg_prewarm`, and
`pg_hibernator`?
They all allow you to do different things with Postgres' shared buffers.
+ pg_buffercache:
  Inspect and show the contents of shared buffers.
+ pg_prewarm:
  Load some table/index/fork blocks into shared buffers. The user needs
  to tell it which blocks to load.
+ pg_hibernator:
  Upon shutdown, saves the list of blocks stored in shared buffers. Upon
  startup, loads those blocks back into shared buffers.
The goal of Postgres Hibernator is to be invisible to the user/DBA,
whereas with `pg_prewarm` the user needs to know a lot about
what they really want to do, most likely using information gathered via
`pg_buffercache`.
- Does `pg_hibernator` use either `pg_buffercache` or `pg_prewarm`?
No, Postgres Hibernator works all on its own.
If the concern is, "Do I have to install pg_buffercache and pg_prewarm
to use pg_hibernator", the answer is no. pg_hibernator is a stand-alone
extension, although influenced by pg_buffercache and pg_prewarm.
With `pg_prewarm` you can load blocks of **only** the database
you're connected
to. So if you have `N` databases in your cluster, to restore blocks of all
databases, the DBA will have to connect to each database and invoke
`pg_prewarm` functions.
With `pg_hibernator`, the DBA isn't required to do anything, let alone
connect to the database!
- Where can I learn more about it?
There are a couple of blog posts and the initial proposal to the Postgres
hackers' mailing list. They may provide a better understanding of
Postgres Hibernator.
[Proposal](/messages/by-id/CABwTF4Ui_anAG+ybseFunAH5Z6DE9aw2NPdy4HryK+M5OdXCCA@mail.gmail.com)
[Introducing Postgres Hibernator](http://gurjeet.singh.im/blog/2014/02/03/introducing-postgres-hibernator/)
[Demonstrating Performance Benefits](http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/)
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Attachments:
On Wed, May 28, 2014 at 7:31 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
Caveats
--------------
- Buffer list is saved only when Postgres is shutdown in "smart" and
  "fast" modes.
  That is, buffer list is not saved when database crashes, or on "immediate"
  shutdown.
shutdown.
- A reduction in `shared_buffers` is not detected.
If the `shared_buffers` is reduced across a restart, and if the
combined
saved buffer list is larger than the new shared_buffers, Postgres
Hibernator continues to read and restore blocks even after
`shared_buffers`
worth of buffers have been restored.
How about the cases when shared buffers already contain some
data:
a. Before Readers start filling shared buffers, if this cluster wishes
to join replication as a slave and receive the data from the master, then
this utility might need to evict some buffers filled during the startup
phase.
b. As soon as the server completes startup (reaches the consistent
point), it allows new connections, which can also use some shared
buffers before the Reader process could use them; or are you
planning to change the time when users can connect to the database?
I am not sure if replacing shared buffer contents in such cases can
always be considered useful.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, May 28, 2014 at 2:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, May 28, 2014 at 7:31 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
Caveats
--------------
- Buffer list is saved only when Postgres is shutdown in "smart" and
  "fast" modes.
  That is, buffer list is not saved when database crashes, or on
  "immediate" shutdown.
- A reduction in `shared_buffers` is not detected.
  If the `shared_buffers` is reduced across a restart, and if the combined
  saved buffer list is larger than the new shared_buffers, Postgres
  Hibernator continues to read and restore blocks even after `shared_buffers`
  worth of buffers have been restored.

How about the cases when shared buffers already contain some
data:
a. Before Readers start filling shared buffers, if this cluster wishes
to join replication as a slave and receive the data from master, then
this utility might need to evict some buffers filled during startup
phase.
A cluster that wishes to be a replication standby would do so
while it's in the startup phase. The BlockReaders are launched immediately
upon the cluster reaching a consistent state, at which point, I presume, in
most cases most of the buffers would be unoccupied. Hence
BlockReaders might evict the occupied buffers, which may be a small
fraction of the total shared_buffers.
b. As soon as the server completes startup (reached consistent
point), it allows new connections which can also use some shared
buffers before Reader process could use shared buffers or are you
planing to change the time when users can connect to database.
The BlockReaders are launched immediately after the cluster reaches
consistent state, that is, just about when it is ready to accept
connections. So yes, there is a possibility that the I/O caused by the
BlockReaders may affect the performance of queries executed right at
cluster startup. But given that the performance of those queries was
anyway going to be low (because of empty shared buffers), and that
BlockReaders tend to cause sequential reads, and that by default
there's only one BlockReader active at a time, I think this won't be a
concern in most of the cases. By the time the shared buffers start
getting filled up, the buffer replacement strategy will evict any
buffers populated by BlockReaders if they are not used by the normal
queries.
In the 'Sample Runs' section of my blog [1], I compared the cases
'Hibernator w/ App' and 'Hibernator then App', which demonstrate that
launching application load while the BlockReaders are active does
cause performance of both to be impacted by each other. But overall
it's a net win for application performance.
I am not sure if replacing shared buffer contents in such cases can
always be considered useful.
IMHO, all of these caveats would affect a very small fraction of
use-cases and are eclipsed by the benefits this extension provides in
normal cases.
[1]: http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Wed, May 28, 2014 at 5:30 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Wed, May 28, 2014 at 2:15 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
How about the cases when shared buffers already contain some
data:
a. Before Readers start filling shared buffers, if this cluster wishes
to join replication as a slave and receive the data from master, then
this utility might need to evict some buffers filled during startup
phase.

A cluster that wishes to be a replication standby, it would do so
while it's in startup phase. The BlockReaders are launched immediately
on cluster reaching consistent state, at which point, I presume, in
most of the cases, most of the buffers would be unoccupied.
Even to reach a consistent state, it might need to get the records
from the master (for example, to get to the STANDBY_SNAPSHOT_READY state).
Hence
BlockReaders might evict the occupied buffers, which may be a small
fraction of total shared_buffers.
Yes, but I think it still depends on how much redo replay happens
on different pages.
b. As soon as the server completes startup (reached consistent
point), it allows new connections which can also use some shared
buffers before Reader process could use shared buffers or are you
planing to change the time when users can connect to database.

The BlockReaders are launched immediately after the cluster reaches
consistent state, that is, just about when it is ready to accept
connections. So yes, there is a possibility that the I/O caused by the
BlockReaders may affect the performance of queries executed right at
cluster startup. But given that the performance of those queries was
anyway going to be low (because of empty shared buffers), and that
BlockReaders tend to cause sequential reads, and that by default
there's only one BlockReader active at a time, I think this won't be a
concern in most of the cases. By the time the shared buffers start
getting filled up, the buffer replacement strategy will evict any
buffers populated by BlockReaders if they are not used by the normal
queries.
Even Block Readers might need to evict buffers filled by user
queries or by themselves, in which case there is a chance of contention, but
again all these are quite rare scenarios.
I am not sure if replacing shared buffer contents in such cases can
always be considered useful.IMHO, all of these caveats, would affect a very small fraction of
use-cases and are eclipsed by the benefits this extension provides in
normal cases.
I agree with you that there are only a few corner cases where evicting
shared buffers by this utility would harm, but I was wondering if we could
even save those, say if it would only use available free buffers. I think
currently there is no such interface, and inventing a new interface for this
case doesn't seem reasonable unless we see any other use case for
such an interface.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On May 29, 2014 12:12 AM, "Amit Kapila" <amit.kapila16@gmail.com> wrote:
I agree with you that there are only few corner cases where evicting
shared buffers by this utility would harm, but was wondering if we could
even save those, say if it would only use available free buffers. I think
currently there is no such interface and inventing a new interface for
this
case doesn't seem to reasonable unless we see any other use case of
such a interface.
+1
On Tue, May 27, 2014 at 10:01 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
When the Postgres server is being stopped/shut down, the `Buffer
Saver` scans the
shared-buffers of Postgres, and stores the unique block identifiers of
each cached
block to the disk. This information is saved under the `$PGDATA/pg_hibernator/`
directory. For each of the database whose blocks are resident in shared buffers,
one file is created; for eg.: `$PGDATA/pg_hibernator/2.postgres.save`.
This file-naming convention seems a bit fragile. For example, on my
filesystem (HFS) if I create a database named "foo / bar", I'll get a
complaint like:
ERROR: could not open "pg_hibernator/5.foo / bar.save": No such file
or directory
during shutdown.
Josh
On Fri, May 30, 2014 at 5:33 PM, Josh Kupershmidt <schmiddy@gmail.com> wrote:
On Tue, May 27, 2014 at 10:01 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
When the Postgres server is being stopped/shut down, the `Buffer
Saver` scans the
shared-buffers of Postgres, and stores the unique block identifiers of
each cached
block to the disk. This information is saved under the `$PGDATA/pg_hibernator/`
directory. For each of the database whose blocks are resident in shared buffers,
one file is created; for eg.: `$PGDATA/pg_hibernator/2.postgres.save`.

This file-naming convention seems a bit fragile. For example, on my
filesystem (HFS) if I create a database named "foo / bar", I'll get a
complaint like:

ERROR: could not open "pg_hibernator/5.foo / bar.save": No such file
or directory

during shutdown.
Thanks for the report. I have reworked the file naming, and now the
save-file name is simply '<integer>.save', so the name of a database
does not affect the file name on disk. Instead, the null-terminated
database name is now written in the file, and read back for use when
restoring the buffers.
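A tiny standalone illustration of that header scheme follows (my own sketch; it
assumes only the NUL-terminated database name is prepended, while the real
save-file of course carries the block records after it):

    /* Sketch of writing and reading back a NUL-terminated database name
     * at the start of a "<integer>.save" file. */
    #include <stdio.h>
    #include <string.h>

    static void
    write_header(FILE *fp, const char *dbname)
    {
        /* database name, including its terminating NUL byte */
        fwrite(dbname, 1, strlen(dbname) + 1, fp);
    }

    static int
    read_header(FILE *fp, char *dbname, size_t len)
    {
        size_t i = 0;
        int    c;

        while (i < len - 1 && (c = fgetc(fp)) != EOF && c != '\0')
            dbname[i++] = (char) c;
        dbname[i] = '\0';
        return i > 0;
    }

    int main(void)
    {
        char  name[128];
        FILE *fp = fopen("5.save", "wb");

        if (fp == NULL)
            return 1;
        write_header(fp, "foo / bar");   /* awkward names no longer affect the file name */
        fclose(fp);

        fp = fopen("5.save", "rb");
        if (fp != NULL && read_header(fp, name, sizeof(name)))
            printf("restoring buffers for database \"%s\"\n", name);
        if (fp != NULL)
            fclose(fp);
        return 0;
    }

Only the file name has to be an integer; any database name, spaces and slashes
included, round-trips through the header unchanged.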
Attached is the new version of pg_hibernator, with updated code and README.
Just a heads up for anyone who might have read/reviewed previous
version's code, there's some unrelated trivial code and Makefile
changes as well in this version, which can be easily spotted by a
`diff -r`.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Attachments:
pg_hibernator.v2.tgz (application/x-gzip)
On Thu, May 29, 2014 at 12:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
IMHO, all of these caveats, would affect a very small fraction of
use-cases and are eclipsed by the benefits this extension provides in
normal cases.

I agree with you that there are only few corner cases where evicting
shared buffers by this utility would harm, but was wondering if we could
even save those, say if it would only use available free buffers. I think
currently there is no such interface and inventing a new interface for this
case doesn't seem to reasonable unless we see any other use case of
such a interface.
It seems like it would be best to try to do this at cluster startup
time, rather than once recovery has reached consistency. Of course,
that might mean doing it with a single process, which could have its
own share of problems. But I'm somewhat inclined to think that if
recovery has already run for a significant period of time, the blocks
that recovery has brought into shared_buffers are more likely to be
useful than whatever pg_hibernate would load.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 29, 2014 at 12:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
IMHO, all of these caveats, would affect a very small fraction of
use-cases and are eclipsed by the benefits this extension provides in
normal cases.

I agree with you that there are only few corner cases where evicting
shared buffers by this utility would harm, but was wondering if we could
even save those, say if it would only use available free buffers. I think
currently there is no such interface and inventing a new interface for this
case doesn't seem to reasonable unless we see any other use case of
such a interface.

It seems like it would be best to try to do this at cluster startup
time, rather than once recovery has reached consistency. Of course,
that might mean doing it with a single process, which could have its
own share of problems. But I'm somewhat inclined to think that if
recovery has already run for a significant period of time, the blocks
that recovery has brought into shared_buffers are more likely to be
useful than whatever pg_hibernate would load.
I am not absolutely sure of the order of execution between recovery
process and the BGWorker, but ...
For a sizeable shared_buffers setting, the restoration of the shared
buffers can take several seconds. I have a feeling users wouldn't
like their master database taking up to a few minutes to start accepting
connections. From my tests [1]: "In the 'App after Hibernator' [case]
... This took 70 seconds for reading the ~4 GB database."
[1]: http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Tue, Jun 3, 2014 at 8:13 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 29, 2014 at 12:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
IMHO, all of these caveats, would affect a very small fraction of
use-cases and are eclipsed by the benefits this extension provides in
normal cases.

I agree with you that there are only few corner cases where evicting
shared buffers by this utility would harm, but was wondering if we could
even save those, say if it would only use available free buffers. I think
currently there is no such interface and inventing a new interface for this
case doesn't seem to reasonable unless we see any other use case of
such a interface.

It seems like it would be best to try to do this at cluster startup
time, rather than once recovery has reached consistency. Of course,
that might mean doing it with a single process, which could have its
own share of problems. But I'm somewhat inclined to think that if
Currently pg_hibernator uses the ReadBufferExtended() API, and AIUI, that
API requires a database connection/shared-memory attachment, and that
a backend process cannot switch between databases after the initial
connection.
own share of problems. But I'm somewhat inclined to think that if
recovery has already run for a significant period of time, the blocks
that recovery has brought into shared_buffers are more likely to be
useful than whatever pg_hibernate would load.
The applications that connect to a standby may have a different access
pattern than the applications that are operating on the master
database. So the buffers that are being restored by the startup process
may not be relevant to the workload on the standby.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems like it would be best to try to do this at cluster startup
time, rather than once recovery has reached consistency. Of course,
that might mean doing it with a single process, which could have its
own share of problems. But I'm somewhat inclined to think that if
recovery has already run for a significant period of time, the blocks
that recovery has brought into shared_buffers are more likely to be
useful than whatever pg_hibernate would load.

I am not absolutely sure of the order of execution between recovery
process and the BGWorker, but ...

For sizeable shared_buffers size, the restoration of the shared
buffers can take several seconds.
In case of recovery, the shared buffers saved by this utility are
from the previous shutdown, which doesn't seem to be of more use
than the buffers loaded by recovery.
I have a feeling the users wouldn't
like their master database take up to a few minutes to start accepting
connections.
I think this is a fair point, and to address it we can have an option to
decide when to load the buffers, with the default being to load before
recovery.
Currently pg_hibernator uses ReadBufferExtended() API, and AIUI, that
API requires a database connection//shared-memory attachment, and that
a backend process cannot switch between databases after the initial
connection.
If recovery can load the buffers to apply WAL, why can't it be done with
pg_hibernator? Can't we use ReadBufferWithoutRelcache() to achieve
the purpose of pg_hibernator?
One other point:
Note that the BuffersSaver process exists at all times, even when this
parameter is set to `false`. This is to allow the DBA to enable/disable
the
extension without having to restart the server. The BufferSaver process
checks this parameter during server startup and right before shutdown, and
honors this parameter's value at that time.
Why can't it be done when the user registers the extension, using the dynamic
background worker facility "RegisterDynamicBackgroundWorker"?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:
On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems like it would be best to try to do this at cluster startup
time, rather than once recovery has reached consistency. Of course,
that might mean doing it with a single process, which could have its
own share of problems. But I'm somewhat inclined to think that if
recovery has already run for a significant period of time, the blocks
that recovery has brought into shared_buffers are more likely to be
useful than whatever pg_hibernate would load.

I am not absolutely sure of the order of execution between recovery
process and the BGWorker, but ...

For sizeable shared_buffers size, the restoration of the shared
buffers can take several seconds.

Incase of recovery, the shared buffers saved by this utility are
from previous shutdown which doesn't seem to be of more use
than buffers loaded by recovery.
Why? The server might have been queried if it's a hot standby one?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 4, 2014 at 2:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:
On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems like it would be best to try to do this at cluster startup
time, rather than once recovery has reached consistency. Of course,
that might mean doing it with a single process, which could have its
own share of problems. But I'm somewhat inclined to think that if
recovery has already run for a significant period of time, the blocks
that recovery has brought into shared_buffers are more likely to be
useful than whatever pg_hibernate would load.

I am not absolutely sure of the order of execution between recovery
process and the BGWorker, but ...

For sizeable shared_buffers size, the restoration of the shared
buffers can take several seconds.

Incase of recovery, the shared buffers saved by this utility are
from previous shutdown which doesn't seem to be of more use
than buffers loaded by recovery.

Why? The server might have been queried if it's a hot standby one?
I think that's essentially the same point Amit is making. Gurjeet is
arguing for reloading the buffers from the previous shutdown at end of
recovery; IIUC, Amit, you, and I all think this isn't a good idea.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-06-04 09:51:36 -0400, Robert Haas wrote:
On Wed, Jun 4, 2014 at 2:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:
Incase of recovery, the shared buffers saved by this utility are
from previous shutdown which doesn't seem to be of more use
than buffers loaded by recovery.

Why? The server might have been queried if it's a hot standby one?
I think that's essentially the same point Amit is making. Gurjeet is
arguing for reloading the buffers from the previous shutdown at end of
recovery; IIUC, Amit, you, and I all think this isn't a good idea.
I think I am actually arguing for Gurjeet's position. If the server is
actively being queried (i.e. hot_standby=on and actually used for
queries) it's quite reasonable to expect that shared_buffers has lots of
content that is *not* determined by WAL replay.
There's not that much read IO going on during WAL replay anyway - after
a crash/start from a restartpoint most of it is loaded via full page
anyway. So it's only disadvantageous to fault in pages via pg_hibernate
if that causes pages that already have been read in via FPIs to be
thrown out.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 4, 2014 at 7:26 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
On 2014-06-04 09:51:36 -0400, Robert Haas wrote:
On Wed, Jun 4, 2014 at 2:08 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:
Incase of recovery, the shared buffers saved by this utility are
from previous shutdown which doesn't seem to be of more use
than buffers loaded by recovery.

Why? The server might have been queried if it's a hot standby one?
Consider the case where a crash (force kill or some other cause) occurs while
the BGSaver is saving the buffers. I think it is possible that it has
saved partial information (information about some buffers is correct
and about others is missing), and it is also possible that by that time the
checkpoint record has not been written (which means recovery will start from the
previous restart point). So what's going to happen is that pg_hibernate might
load some less-used buffers/blocks (which have a lower usage count),
and the WAL-replayed blocks will be sacrificed. So the WAL data from the
previous restart point, and some more due to the delay in the start of the
standby (changes that occurred on the master during that time), will be
sacrificed.
Another case is a standalone server, in which case there is always a
high chance that the blocks recovered by recovery are the active ones.
Now I agree that the case of standalone servers is less common, but still some
small applications might be using it. Also, I think the same is true if
the crashed server is the master.
I think that's essentially the same point Amit is making. Gurjeet is
arguing for reloading the buffers from the previous shutdown at end of
recovery; IIUC, Amit, you, and I all think this isn't a good idea.

I think I am actually arguing for Gurjeet's position. If the server is
actively being queried (i.e. hot_standby=on and actually used for
queries) it's quite reasonable to expect that shared_buffers has lots of
content that is *not* determined by WAL replay.
Yes, that's quite possible; however, there can be situations where it
is not true, as explained above.
There's not that much read IO going on during WAL replay anyway - after
a crash/start from a restartpoint most of it is loaded via full page
anyway.
So it's only disadvantageous to fault in pages via pg_hibernate
if that causes pages that already have been read in via FPIs to be
thrown out.
So for such cases, the pages loaded by pg_hibernate turn out to be a loss.
Overall I think there can be both kinds of cases, where it is beneficial
to load buffers after recovery and before recovery; that's why I
mentioned above that either it can be a parameter for the user to
decide the same, or maybe we can have a new API which will
load buffers via the BGworker without evicting any existing buffer
(using buffers from the free list only).
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 4, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-06-04 09:51:36 -0400, Robert Haas wrote:
On Wed, Jun 4, 2014 at 2:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:
Incase of recovery, the shared buffers saved by this utility are
from previous shutdown which doesn't seem to be of more use
than buffers loaded by recovery.

Why? The server might have been queried if it's a hot standby one?
I think that's essentially the same point Amit is making. Gurjeet is
arguing for reloading the buffers from the previous shutdown at end of
recovery; IIUC, Amit, you, and I all think this isn't a good idea.

I think I am actually arguing for Gurjeet's position. If the server is
actively being queried (i.e. hot_standby=on and actually used for
queries) it's quite reasonable to expect that shared_buffers has lots of
content that is *not* determined by WAL replay.

There's not that much read IO going on during WAL replay anyway - after
a crash/start from a restartpoint most of it is loaded via full page
anyway. So it's only disadvantageous to fault in pages via pg_hibernate
if that causes pages that already have been read in via FPIs to be
thrown out.
The thing I was concerned about is that the system might have been in
recovery for months. What was hot at the time the base backup was
taken seems like a poor guide to what will be hot at the time of
promotion. Consider a history table, for example: the pages at the
end, which have just been written, are much more likely to be useful
than anything earlier.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-06-04 14:50:39 -0400, Robert Haas wrote:
The thing I was concerned about is that the system might have been in
recovery for months. What was hot at the time the base backup was
taken seems like a poor guide to what will be hot at the time of
promotion. Consider a history table, for example: the pages at the
end, which have just been written, are much more likely to be useful
than anything earlier.
I'd assumed that the hibernation files would simply be excluded from the
basebackup...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 4, 2014 at 12:54 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
For sizeable shared_buffers size, the restoration of the shared
buffers can take several seconds.

Incase of recovery, the shared buffers saved by this utility are
from previous shutdown which doesn't seem to be of more use
than buffers loaded by recovery.
I feel the need to enumerate the recovery scenarios we're talking
about so that we're all on the same page.
1) Hot backup (cp/rsync/pg_basebackup/.. while the master was running)
followed by
1a) recovery using archives or streaming replication.
1a.i) database in hot-standby mode
1a.ii) database not in hot-standby mode, i.e. it's in warm-standby mode.
1b) minimal recovery, that is, recover only the WAL available in
pg_xlog, then come online.
2) Cold backup of a crashed master, followed by startup of the copy
(causing crash recovery; IMHO same as case 1b above).
3) Cold backup of clean-shutdown master, followed by startup of the
copy (no recovery).
In cases 1.x there won't be any save-files (*), because the
BlockReader processes remove their respective save-file when they are
done restoring the buffers. So the hot/warm-standby created thus will
not inherit the save-files, and hence post-recovery will not cause any
buffer restores.
Case 2 also won't cause any buffer restores, because the save-files are
created only on clean shutdowns; not on a crash or immediate
shutdown.
Case 3 is the sweet spot of pg_hibernator. It will save the buffer-list
on shutdown, and restore it when the backup-copy is started
(provided pg_hibernator is installed there).
(*) If a hot-backup is taken immediately after the database comes online,
since the BlockReaders may still be running and may not have deleted the
save-files, the save-files may end up in the backup, and hence cause the
recovery-time conflicts we're talking about. This should be rare in
practice, and even when it does happen, at worst it will affect the
initial performance of the cluster.
I have a feeling the users wouldn't
like their master database taking up to a few minutes to start accepting
connections.
I think this is a fair point, and to address it we can have an option to
decide when to load the buffers, with the default being to load them
before recovery.
Given the above description, I don't think crash/archive recovery is a
concern anymore. But if that corner case is still a concern, I
wouldn't favour making recovery slow by default and making users of
pg_hibernator pay for choosing to use the extension. I'd prefer the
user explicitly ask for a behaviour that makes startups slow.
One other point:
Note that the BufferSaver process exists at all times, even when this
parameter is set to `false`. This is to allow the DBA to enable/disable the
extension without having to restart the server. The BufferSaver process
checks this parameter during server startup and right before shutdown, and
honors this parameter's value at that time.
Why can't it be done when the user registers the extension, by using the
dynamic background worker facility "RegisterDynamicBackgroundWorker"?
There's no user interface to this extension except for the 3 GUC
parameters; not even CREATE EXTENSION. The DBA is expected to append
this extension's name to shared_preload_libraries.
Since this extension declares one of its parameters PGC_POSTMASTER, it
can't be loaded via the SQL 'LOAD' command.
postgres=# load 'pg_hibernator';
FATAL: cannot create PGC_POSTMASTER variables after startup
FATAL: cannot create PGC_POSTMASTER variables after startup
The connection to the server was lost. Attempting reset: Succeeded.
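For context, this is roughly what declaring such a PGC_POSTMASTER parameter looks like in an extension's _PG_init(); a minimal sketch only, with an illustrative GUC name rather than pg_hibernator's actual code:
```
#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static bool hibernate_enabled = true;	/* illustrative; not the extension's real GUC */

void
_PG_init(void)
{
	/*
	 * PGC_POSTMASTER variables can only be created while the postmaster is
	 * starting up, which is why LOAD'ing such a library into a running
	 * backend fails with "cannot create PGC_POSTMASTER variables after
	 * startup".
	 */
	DefineCustomBoolVariable("pg_hibernator.enabled",
							 "Enable saving and restoring of shared buffers.",
							 NULL,				/* long description */
							 &hibernate_enabled,
							 true,				/* boot value */
							 PGC_POSTMASTER,	/* changeable only at server start */
							 0,					/* flags */
							 NULL, NULL, NULL);	/* check/assign/show hooks */
}
```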
Best regards,
PS: I was out sick yesterday, so couldn't respond promptly.
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Wed, Jun 4, 2014 at 2:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-06-04 14:50:39 -0400, Robert Haas wrote:
The thing I was concerned about is that the system might have been in
recovery for months. What was hot at the time the base backup was
taken seems like a poor guide to what will be hot at the time of
promotion. Consider a history table, for example: the pages at the
end, which have just been written, are much more likely to be useful
than anything earlier.
I'd assumed that the hibernation files would simply be excluded from the
basebackup...
Yes, they will be excluded, provided the BlockReader processes have
finished, because each BlockReader unlinks its save-file after it is
done restoring buffers listed in it.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Wed, Jun 4, 2014 at 2:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
The thing I was concerned about is that the system might have been in
recovery for months. What was hot at the time the base backup was
taken seems like a poor guide to what will be hot at the time of
promotion. Consider a history table, for example: the pages at the
end, which have just been written, are much more likely to be useful
than anything earlier.
I think you are specifically talking about a warm-standby that runs
recovery for months before being brought online. As described in my
response to Amit, if the base backup used to create that standby was
taken after the BlockReaders had restored the buffers (which should
complete within a few minutes of startup, even for large databases),
then there's no concern since the base backup wouldn't contain the
save-files.
If it's a hot-standby, the restore process would start as soon as the
database starts accepting connections, finish soon after, and get
completely out of the way of the normal recovery process. At which
point the buffers populated by the recovery would compete only with
the buffers being requested by backends, which is the normal
behaviour.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Thu, Jun 5, 2014 at 5:39 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Wed, Jun 4, 2014 at 12:54 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
For a sizeable shared_buffers setting, the restoration of the shared
buffers can take several seconds.
In case of recovery, the shared buffers saved by this utility are
from the previous shutdown, which doesn't seem to be of more use
than the buffers loaded by recovery.
I feel the need to enumerate the recovery scenarios we're talking
about so that we're all on the same page.
1) Hot backup (cp/rsync/pg_basebackup/.. while the master was running)
followed by
1a) recovery using archives or streaming replication.
1a.i) database in hot-standby mode
1a.ii) database not in hot-standby mode, i.e. it's in warm-standby mode.
1b) minimal recovery, that is, recover only the WAL available in
pg_xlog, then come online.
2) Cold backup of a crashed master, followed by startup of the copy
(causing crash recovery; IMHO same as case 1b above).
3) Cold backup of clean-shutdown master, followed by startup of the
copy (no recovery).
In cases 1.x there won't be any save-files (*), because the
BlockReader processes remove their respective save-file when they are
done restoring the buffers. So the hot/warm-standby created thus will
not inherit the save-files, and hence post-recovery will not cause any
buffer restores.
Case 2 also won't cause any buffer restores, because the save-files are
created only on clean shutdowns; not on a crash or immediate
shutdown.
How do you ensure that buffers are saved only on a clean shutdown?
The buffer saver process itself can crash while saving or restoring
buffers.
IIUC, on a shutdown request the postmaster will send a signal to the BG Saver,
and the BG Saver will save the buffers; then the postmaster will send a
signal to the checkpointer to shut down. So before writing the checkpoint
record, the BG Saver can crash (it might have saved half the buffers),
or maybe the BG Saver saves the buffers but the checkpointer crashes
(due to a power outage or some such thing).
Another thing: don't you want to handle the SIGQUIT signal in the BG Saver?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Another thing: don't you want to handle the SIGQUIT signal in the BG Saver?
I think bgworker_quickdie registered in StartBackgroundWorker() serves
the purpose just fine.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 5, 2014 at 5:39 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
Case 2 also won't cause any buffer restores, because the save-files are
created only on clean shutdowns; not on a crash or immediate
shutdown.
How do you ensure that buffers are saved only on a clean shutdown?
The postmaster sends SIGTERM only for "smart" or "fast" shutdown requests.
The buffer saver process itself can crash while saving or restoring
buffers.
True. That may lead to a partial list of buffers being saved. The
code in the Reader process tries hard to read only valid data, and punts
at the first sight of data that doesn't make sense, or on an ERROR raised
from a Postgres API call.
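To illustrate the kind of defensive parsing being described, here is a minimal sketch; it is not the actual pg_hibernator code, and the record layout (a one-byte tag followed by a 32-bit value, with hypothetical tags 'R' and 'B') is a stand-in. The point is the shape of the loop: stop at the first byte that isn't a recognized tag, or at the first truncated record.
```
#include <stdio.h>
#include <stdint.h>

/* Read tagged records from a save-file, punting at the first invalid data. */
static void
read_savefile(const char *path)
{
	FILE    *fp = fopen(path, "rb");
	int      tag;
	uint32_t value;

	if (fp == NULL)
		return;

	while ((tag = fgetc(fp)) != EOF)
	{
		if (tag != 'R' && tag != 'B')	/* hypothetical record tags */
			break;						/* punt on anything unexpected */

		if (fread(&value, sizeof(value), 1, fp) != 1)
			break;						/* truncated record: stop reading */

		printf("%c %u\n", (char) tag, (unsigned int) value);
	}

	fclose(fp);
}

int
main(int argc, char **argv)
{
	if (argc > 1)
		read_savefile(argv[1]);
	return 0;
}
```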
IIUC, on a shutdown request the postmaster will send a signal to the BG Saver,
and the BG Saver will save the buffers; then the postmaster will send a
signal to the checkpointer to shut down. So before writing the checkpoint
record, the BG Saver can crash (it might have saved half the buffers)
Case handled as described above.
or maybe the BG Saver saves the buffers but the checkpointer crashes
(due to a power outage or some such thing).
The checkpointer process's crash seems to be irrelevant to Postgres
Hibernator's workings.
I think you are trying to argue against the wording of my claim "save-files
are created only on clean shutdowns; not on a crash or immediate
shutdown", by implying that a crash may occur at any time during and
after the BufferSaver processing. I agree the wording can be improved.
How about
... save-files are created only when Postgres is requested to shut down
in normal (smart or fast) modes.
Note that I am leaving out the mention of crash.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On 6/4/14, 8:56 AM, Andres Freund wrote:
On 2014-06-04 09:51:36 -0400, Robert Haas wrote:
On Wed, Jun 4, 2014 at 2:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:
In case of recovery, the shared buffers saved by this utility are
from the previous shutdown, which doesn't seem to be of more use
than the buffers loaded by recovery.
Why? The server might have been queried if it's a hot standby one?
I think that's essentially the same point Amit is making. Gurjeet is
arguing for reloading the buffers from the previous shutdown at end of
recovery; IIUC, Amit, you, and I all think this isn't a good idea.
I think I am actually arguing for Gurjeet's position. If the server is
actively being queried (i.e. hot_standby=on and actually used for
queries) it's quite reasonable to expect that shared_buffers has lots of
content that is *not* determined by WAL replay.
Perhaps instead of trying to get data actually into shared buffers it would be better to just advise the kernel that we think we're going to need it? ISTM it's reasonably fast to pull data from disk cache into shared buffers.
On a related note, what I really wish for is the ability to restore the disk cache after a restart/unmount...
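For reference, a minimal sketch of the kind of kernel advice being suggested here, using posix_fadvise(POSIX_FADV_WILLNEED) to ask the OS to prefetch a file into its page cache. The file path is illustrative, and note this only primes the OS cache, not PostgreSQL's shared_buffers:
```
#define _XOPEN_SOURCE 600		/* for posix_fadvise() */

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	/* Path of a relation segment file; purely illustrative. */
	const char *path = (argc > 1) ? argv[1] : "base/16384/16385";
	int         fd = open(path, O_RDONLY);
	int         rc;

	if (fd < 0)
	{
		perror("open");
		return 1;
	}

	/* Hint the kernel to read the whole file (offset 0, length 0) into its cache. */
	rc = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
	if (rc != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

	close(fd);
	return 0;
}
```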
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Le lundi 3 février 2014 19:18:54 Gurjeet Singh a écrit :
Possible enhancements:
- Ability to save/restore only specific databases.
- Control how many BlockReaders are active at a time; to avoid I/O storms.
- Be smart about lowered shared_buffers across the restart.
- Different modes of reading like pg_prewarm does.
- Include PgFincore functionality, at least for Linux platforms.
Please note that pgfincore works on any system where PostgreSQL
prefetch works, exactly like pg_prewarm. This includes Linux, BSD and
many Unix-likes. It *is not* limited to Linux.
I have never had a single request for Windows, but Windows does provide an
API for that too (however, I have no Windows machine on hand to test).
Another side note is that currently BSD (at least FreeBSD) has a more
advanced mincore() syscall than Linux; it offers a better analysis (the dirty
status is known), and they have implemented posix_fadvise...
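For readers unfamiliar with the mincore()-based inspection mentioned here, a minimal Linux-oriented sketch (the file path is illustrative) that counts how many pages of a file are currently resident in the OS page cache:
```
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	/* Path of a relation segment file; purely illustrative. */
	const char    *path = (argc > 1) ? argv[1] : "base/16384/16385";
	int            fd = open(path, O_RDONLY);
	struct stat    st;
	long           pagesize = sysconf(_SC_PAGESIZE);
	size_t         npages;
	size_t         resident = 0;
	size_t         i;
	void          *map;
	unsigned char *vec;

	if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0)
		return 1;

	npages = (st.st_size + pagesize - 1) / pagesize;
	map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	vec = malloc(npages);

	if (map == MAP_FAILED || vec == NULL || mincore(map, st.st_size, vec) != 0)
		return 1;

	/* The low bit of each vector entry says whether that page is in the OS cache. */
	for (i = 0; i < npages; i++)
		if (vec[i] & 1)
			resident++;

	printf("%zu of %zu pages resident in OS cache\n", resident, npages);

	munmap(map, st.st_size);
	free(vec);
	close(fd);
	return 0;
}
```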
PS:
There is a previous thread about that hibernation feature. Mitsuru IWASAKI
did a patch, and it triggered some interesting discussions.
Some notes in this thread are outdated now, but it's worth having a look
at it:
/messages/by-id/20110504.231048.113741617.iwasaki@jp.FreeBSD.org
https://commitfest.postgresql.org/action/patch_view?id=549
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On Fri, Jun 6, 2014 at 5:31 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
The buffer saver process itself can crash while saving or restoring
buffers.
True. That may lead to a partial list of buffers being saved. The
code in the Reader process tries hard to read only valid data, and punts
at the first sight of data that doesn't make sense, or on an ERROR raised
from a Postgres API call.
In spite of the Reader process trying hard, I think we should ensure by
some other means that the file saved by the buffer saver is valid (maybe
first write to a tmp file and then rename it, or something else).
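A minimal sketch of the write-to-temp-then-rename approach being suggested (file names and the helper are illustrative, not pg_hibernator code). Since rename() is atomic on POSIX filesystems, a crash mid-write leaves at worst a stale ".tmp" file behind, never a truncated save-file:
```
#include <stdio.h>
#include <unistd.h>

/* Write the save-file contents to a temporary name, then rename it into place. */
static int
write_savefile_atomically(const char *final_path, const void *data, size_t len)
{
	char  tmp_path[1024];
	FILE *fp;

	snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", final_path);

	fp = fopen(tmp_path, "wb");
	if (fp == NULL)
		return -1;

	if (fwrite(data, 1, len, fp) != len ||
		fflush(fp) != 0 ||
		fsync(fileno(fp)) != 0)		/* make sure the data reaches disk */
	{
		fclose(fp);
		remove(tmp_path);
		return -1;
	}
	fclose(fp);

	/* Atomic: readers see either the complete old file or the complete new one. */
	if (rename(tmp_path, final_path) != 0)
	{
		remove(tmp_path);
		return -1;
	}
	return 0;
}

int
main(void)
{
	const char data[] = "example contents";	/* illustrative payload */

	return write_savefile_atomically("example.save", data, sizeof(data) - 1) == 0 ? 0 : 1;
}
```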
IIUC, on a shutdown request the postmaster will send a signal to the BG Saver,
and the BG Saver will save the buffers; then the postmaster will send a
signal to the checkpointer to shut down. So before writing the checkpoint
record, the BG Saver can crash (it might have saved half the buffers)
Case handled as described above.
or maybe the BG Saver saves the buffers but the checkpointer crashes
(due to a power outage or some such thing).
The checkpointer process's crash seems to be irrelevant to Postgres
Hibernator's workings.
Yep, but if it crashes before writing the checkpoint record, it will lead to
recovery, which is what we are considering.
I think you are trying to argue against the wording of my claim "save-files
are created only on clean shutdowns; not on a crash or immediate
shutdown", by implying that a crash may occur at any time during and
after the BufferSaver processing. I agree the wording can be improved.
Not only the wording; in your mail above, cases 2 and 1b would need to
load buffers and perform recovery as well, so we need to decide which
one to give preference to.
So if you agree that we should take the recovery data into consideration
along with the saved-file data, then I think we have the following options
to consider:
1. Have a provision for the user to specify which data (recovery or
previously cached blocks) should be considered more important,
and then load buffers before or after recovery based on that
input.
2. Always load before recovery and mention in the docs that users
can expect the server to take more time to start if they enable this
extension, along with the advantages of the same.
3. Always load after recovery and mention in the docs that enabling
this extension might discard data cached by recovery or by the first
few operations done by the user.
4. Have the BufMgr module expose an API such that the buffer loader
will only consider buffers in the freelist when loading buffers.
Based on the opinions of others, I think we can decide on one of these,
or on any other better way.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 5, 2014 at 8:32 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Wed, Jun 4, 2014 at 2:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
The thing I was concerned about is that the system might have been in
recovery for months. What was hot at the time the base backup was
taken seems like a poor guide to what will be hot at the time of
promotion. Consider a history table, for example: the pages at the
end, which have just been written, are much more likely to be useful
than anything earlier.
I think you are specifically talking about a warm-standby that runs
recovery for months before being brought online. As described in my
response to Amit, if the base backup used to create that standby was
taken after the BlockReaders had restored the buffers (which should
complete within a few minutes of startup, even for large databases),
then there's no concern since the base backup wouldn't contain the
save-files.
If it's a hot-standby, the restore process would start as soon as the
database starts accepting connections, finish soon after, and get
completely out of the way of the normal recovery process. At which
point the buffers populated by the recovery would compete only with
the buffers being requested by backends, which is the normal
behaviour.
I guess I don't see what warm-standby vs. hot-standby has to do with
it. If recovery has been running for a long time, then restoring
buffers from some save file created before that is probably a bad
idea, regardless of whether the buffers already loaded were read in by
recovery itself or by queries running on the system. But if you're
saying that doesn't happen, then there's no problem there.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jun 10, 2014 at 12:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
If recovery has been running for a long time, then restoring
buffers from some save file created before that is probably a bad
idea, regardless of whether the buffers already loaded were read in by
recovery itself or by queries running on the system. But if you're
saying that doesn't happen, then there's no problem there.
Normally, it won't happen. There's one case I can think of, which has
to coincide with a small window of time for such a thing to happen.
Consider this:
.) A database is shutdown, which creates the save-files in
$PGDATA/pg_hibernator/.
.) The database is restarted.
.) BlockReaders begin to read and restore the disk blocks into buffers.
.) Before the BlockReaders could finish*, a copy of the database is
taken (rsync/cp/FS-snapshot/etc.)
This causes the save-files to be present in the copy, because
the BlockReaders haven't deleted them yet.
* (The BlockReaders ideally finish their task in the first few minutes
after the first of them is started.)
.) The copy of the database is used to restore and erect a warm-standby.
.) The warm-standby starts replaying logs from WAL archive/stream.
.) Some time (hours/weeks/months) later, the warm-standby is promoted
to be a master.
.) It starts the Postgres Hibernator, which sees save-files in
$PGDATA/pg_hibernator/ and launches BlockReaders.
At this point, the BlockReaders will restore the blocks that were
present in original DB's shared-buffers at the time of shutdown. So,
this would fetch blocks into shared-buffers that may be completely
unrelated to the blocks recently operated on by the recovery process.
And it's probably accepted by now that such a behaviour is not
catastrophic, merely inconvenient.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Sun, Jun 8, 2014 at 3:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 6, 2014 at 5:31 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
The buffer saver process itself can crash while saving or restoring
buffers.
True. That may lead to a partial list of buffers being saved. The
code in the Reader process tries hard to read only valid data, and punts
at the first sight of data that doesn't make sense, or on an ERROR raised
from a Postgres API call.
In spite of the Reader process trying hard, I think we should ensure by
some other means that the file saved by the buffer saver is valid (maybe
first write to a tmp file and then rename it, or something else).
I see no harm in the current approach, since even if the file is partially
written on shutdown, or if it is corrupted due to hardware corruption,
the worst that can happen is that the BlockReaders will try to restore,
and possibly succeed in restoring, a wrong block into shared-buffers.
I am okay with your approach of first writing to a temp file, if
others see an advantage in doing this and insist on it.
IIUC, on a shutdown request the postmaster will send a signal to the BG Saver,
and the BG Saver will save the buffers; then the postmaster will send a
signal to the checkpointer to shut down. So before writing the checkpoint
record, the BG Saver can crash (it might have saved half the buffers)
Case handled as described above.
or maybe the BG Saver saves the buffers but the checkpointer crashes
(due to a power outage or some such thing).
The checkpointer process's crash seems to be irrelevant to Postgres
Hibernator's workings.
Yep, but if it crashes before writing the checkpoint record, it will lead to
recovery, which is what we are considering.
Good point.
In case of such recovery, the recovery process will read in the blocks
that were recently modified, and were possibly still in shared-buffers
when Checkpointer crashed. So after recovery finishes, the
BlockReaders will be invoked (because save-files were successfully
written before the crash), and they would request the same blocks to
be restored. Most likely, those blocks would already be in
shared-buffers, hence no cause of concern regarding BlockReaders
evicting buffers populated by recovery.
I think you are trying to argue against the wording of my claim "save-files
are created only on clean shutdowns; not on a crash or immediate
shutdown", by implying that a crash may occur at any time during and
after the BufferSaver processing. I agree the wording can be improved.
Not only the wording; in your mail above, cases 2 and 1b would need to
load buffers and perform recovery as well, so we need to decide which
one to give preference to.
In the cases you mention, 1b and 2, ideally there will be no
save-files because the server either (1b) was still running, or (2)
crashed.
If there were any save-files present during the previous startup (the
one that happened before (1b) hot-backup or (2) crash) of the server,
they would have been removed by the BlockReaders soon after the
startup.
So if you agree that we should take the recovery data into consideration
along with the saved-file data, then I think we have the following options
to consider:
I don't think any of the options you mention need any consideration,
because the recovery and buffer-restore processes don't seem to be in
conflict with each other; not enough to be a concern, IMHO.
Thanks and best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Wed, Jun 11, 2014 at 7:59 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Sun, Jun 8, 2014 at 3:24 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
IIUC, on a shutdown request the postmaster will send a signal to the BG Saver,
and the BG Saver will save the buffers; then the postmaster will send a
signal to the checkpointer to shut down. So before writing the checkpoint
record, the BG Saver can crash (it might have saved half the buffers)
Case handled as described above.
or maybe the BG Saver saves the buffers but the checkpointer crashes
(due to a power outage or some such thing).
The checkpointer process's crash seems to be irrelevant to Postgres
Hibernator's workings.
Yep, but if it crashes before writing the checkpoint record, it will lead to
recovery, which is what we are considering.
Good point.
In case of such recovery, the recovery process will read in the blocks
that were recently modified, and were possibly still in shared-buffers
when Checkpointer crashed. So after recovery finishes, the
BlockReaders will be invoked (because save-files were successfully
written before the crash), and they would request the same blocks to
be restored. Most likely, those blocks would already be in
shared-buffers, hence no cause of concern regarding BlockReaders
evicting buffers populated by recovery.
Not necessarily, because after a crash, recovery has to start from the
previous checkpoint, so it might not perform operations on the same
pages as were saved by the buffer saver. Also, as the file saved by the
buffer saver can be a file which contains only a partial list of the
buffers which were in shared buffers, it becomes more likely that
in such cases it can override the buffers populated by recovery.
Now, as pg_hibernator doesn't give any preference to usage_count while
saving buffers, it can also evict the buffers populated by recovery
in favour of some less-used pages from the previous run.
I think you are trying to argue against the wording of my claim "save-files
are created only on clean shutdowns; not on a crash or immediate
shutdown", by implying that a crash may occur at any time during and
after the BufferSaver processing. I agree the wording can be improved.
Not only the wording; in your mail above, cases 2 and 1b would need to
load buffers and perform recovery as well, so we need to decide which
one to give preference to.
In the cases you mention, 1b and 2, ideally there will be no
save-files because the server either (1b) was still running, or (2)
crashed.
If there were any save-files present during the previous startup (the
one that happened before (1b) hot-backup or (2) crash) of the server,
they would have been removed by the BlockReaders soon after the
startup.
I think the BlockReaders will remove the file only after reading and
populating buffers from it, and that's the reason I mentioned that it can
lead to doing both recovery as well as loading buffers based on the file
saved by the buffer saver.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 10, 2014 at 10:03 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
And it's probably accepted by now that such a behaviour is not
catastrophic, merely inconvenient.
I think the whole argument for having pg_hibernator is that getting
the block cache properly initialized is important. If it's not
important, then we don't need pg_hibernator in the first place. But
if it is important, then I think not loading unrelated blocks into
shared_buffers is also important.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jun 11, 2014 at 12:25 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 11, 2014 at 7:59 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Sun, Jun 8, 2014 at 3:24 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Yep, but if it crashes before writing the checkpoint record, it will lead to
recovery, which is what we are considering.
Good point.
In case of such recovery, the recovery process will read in the blocks
that were recently modified, and were possibly still in shared-buffers
when Checkpointer crashed. So after recovery finishes, the
BlockReaders will be invoked (because save-files were successfully
written before the crash), and they would request the same blocks to
be restored. Most likely, those blocks would already be in
shared-buffers, hence no cause of concern regarding BlockReaders
evicting buffers populated by recovery.
Not necessarily, because after a crash, recovery has to start from the
previous checkpoint, so it might not perform operations on the same
pages as were saved by the buffer saver.
Granted, the recovery may not start that way (that is, reading in
blocks that were in shared-buffers when shutdown was initiated), but
it sure would end that way. Towards the end of recovery, the blocks
it'd read back in are highly likely to be the ones that were present
in shared-buffers at the time of shutdown. By the end of recovery,
either (a) blocks read in at the beginning of recovery are evicted by
later operations of recovery, or (b) they are still present in
shared-buffers. So the blocks requested by the BlockReaders are highly
likely to be already in shared-buffers at the end of recovery, because
these are the same blocks that were dirty (and hence recorded in WAL)
just before shutdown time.
I guess what I am trying to say is that the blocks read in by the
BlockReaders will be a superset of those read in by the recovery
process. At the time of shutdown/saving-buffers, the shared-buffers
may have contained dirty and clean buffers. WAL contains the info of
which blocks were dirtied. Recovery will read back the blocks that
were dirty, to replay the WAL, and since the BlockReaders are started
_after_ recovery finishes, the BlockReaders will effectively read in
only those blocks that are not already read-in by the recovery.
I am not yet convinced, at least in this case, that Postgres
Hibernator would restore blocks that can cause eviction of buffers
restored by recovery.
I don't have intimate knowledge of recovery, but I think the above
assessment of recovery's operations holds true. If you still think
this is a concern, can you please provide a firmer example that would
help me visualize the problem you're talking about.
Also, as the file saved by the
buffer saver can be a file which contains only a partial list of the
buffers which were in shared buffers, it becomes more likely that
in such cases it can override the buffers populated by recovery.
I beg to differ. As described above, the blocks read-in by the
BlockReader will not evict the recovery-restored blocks. The
save-files being written partially does not change that.
Now, as pg_hibernator doesn't give any preference to usage_count while
saving buffers, it can also evict the buffers populated by recovery
in favour of some less-used pages from the previous run.
The case we're discussing (checkpointer/BufferSaver/some-other-process
crash during a smart/fast shutdown) should occur rarely in practice.
Although Postgres Hibernator is not yet proven to do the wrong thing
in this case, I hope you'd agree that the BlockReaders evicting buffers
populated by the recovery process is not catastrophic at all, merely
inconvenient from a performance perspective. Also, the impact is only on
the initial performance immediately after startup, since application
queries will re-prime the shared-buffers with whatever buffers they
need.
I think the BlockReaders will remove the file only after reading and
populating buffers from it
Correct.
and that's the reason I mentioned that it can lead to doing
both recovery as well as loading buffers based on the file saved by the
buffer saver.
I am not sure I completely understand the implication here, but I
think the above description of the case where recovery followed by
BlockReaders does not cause a concern may cover it.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Wed, Jun 11, 2014 at 10:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 10, 2014 at 10:03 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
And it's probably accepted by now that such a behaviour is not
catastrophic, merely inconvenient.
I think the whole argument for having pg_hibernator is that getting
the block cache properly initialized is important. If it's not
important, then we don't need pg_hibernator in the first place. But
if it is important, then I think not loading unrelated blocks into
shared_buffers is also important.
I was constructing a contrived scenario, something that would rarely
happen in reality. I feel that the benefits of this feature greatly
outweigh the minor performance loss caused in such an unlikely scenario.
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
On Thu, Jun 12, 2014 at 12:17 AM, Gurjeet Singh <gurjeet@singh.im> wrote:
On Wed, Jun 11, 2014 at 10:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 10, 2014 at 10:03 PM, Gurjeet Singh <gurjeet@singh.im> wrote:
And it's probably accepted by now that such a behaviour is not
catastrophic, merely inconvenient.
I think the whole argument for having pg_hibernator is that getting
the block cache properly initialized is important. If it's not
important, then we don't need pg_hibernator in the first place. But
if it is important, then I think not loading unrelated blocks into
shared_buffers is also important.
I was constructing a contrived scenario, something that would rarely
happen in reality. I feel that the benefits of this feature greatly
outweigh the minor performance loss caused in such an unlikely scenario.
So, are you proposing this for inclusion in PostgreSQL core?
If not, I don't think there's much to discuss here - people can use it
or not as they see fit, and we'll see what happens and perhaps design
improvements will result, or not.
If so, that's different: you'll need to demonstrate the benefits via
convincing proof points, and you'll also need to show that the
disadvantages are in fact minor and that the scenario is in fact
unlikely. So far there are zero performance numbers on this thread, a
situation that doesn't meet community standards for a performance
patch.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jun 12, 2014 at 12:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
So, are you proposing this for inclusion in PostgreSQL core?
Yes, as a contrib module.
If so, that's different: you'll need to demonstrate the benefits via
convincing proof points
Please see attached charts, and the spreadsheet that these charts were
generated from.
Quoting from my blog, where I first published these charts:
<quote>
As can be seen in the chart below, the database ramp-up time drops
dramatically when Postgres Hibernator is enabled. The sooner the
database TPS can reach the steady state, the faster your applications
can start performing at full throttle.
The ramp-up time is even shorter if you wait for the Postgres
Hibernator processes to end, before starting your applications.
As is quite evident, waiting for Postgres Hibernator to finish loading
the data blocks before starting the application yields a 97%
improvement in database ramp-up time (2300 seconds to get to 122k TPS
without Postgres Hibernator vs. 70 seconds).
### Details
Please note that this is not a real benchmark, just something I
developed to showcase this extension at its sweet spot.
The full source of this mini benchmark is available with the source
code of the Postgres Hibernator, at its [Git repo][pg_hibernator_git].
```
Hardware: MacBook Pro 9,1
OS Distribution: Ubuntu 12.04 Desktop
OS Kernel: Linux 3.11.0-19-generic
RAM: 8 GB
Physical CPU: 1
CPU Count: 4
Core Count: 8
pgbench scale: 260 (~ 4 GB database)
```
Before every test run, except the last ('DB-only restart; No
Hibernator'), the Linux OS caches are dropped to simulate an OS
restart.
In 'First Run', the Postgres Hibernator is enabled, but since this is
the first ever run of the database, Postgres Hibernator doesn't kick
in until shutdown, to save the buffer list.
In 'Hibernator w/ App', the application (pgbench) is started right
after database restart. The Postgres Hibernator is restoring the data
blocks to shared buffers while the application is also querying the
database.
In the 'App after Hibernator' case, the application is started _after_
the Postgres Hibernator has finished reading database blocks. This
took 70 seconds for reading the ~4 GB database.
In the 'DB-only restart; No Hibernator' run, the OS caches are not
dropped, but just the database service is restarted. This simulates
database minor version upgrades, etc.
</quote>
and you'll also need to show that the
disadvantages are in fact minor and that the scenario is in fact
unlikely.
Attached is the new patch that addresses this concern. Right at
startup, Postgres Hibernator renames all .save files to
.save.restoring. Later, the BlockReaders restore the blocks listed in the
.save.restoring files. If, for any reason, the database crashes and
restarts, the next startup of the Hibernator will first remove all
.save.restoring files.
So in the case of my contrived example,
<scenario>
1) A database is shutdown, which creates the save-files in
$PGDATA/pg_hibernator/.
2) The database is restarted.
3) BlockReaders begin to read and restore the disk blocks into buffers.
4) Before the BlockReaders could finish*, a copy of the database is
taken (rsync/cp/FS-snapshot/etc.)
This causes the save-files to be present in the copy, because
the BlockReaders haven't deleted them yet.
* (The BlockReaders ideally finish their task in the first few minutes
after the first of them is started.)
5) The copy of the database is used to restore and erect a warm-standby.
6) The warm-standby starts replaying logs from WAL archive/stream.
7) Some time (hours/weeks/months) later, the warm-standby is promoted
to be a master.
8) It starts the Postgres Hibernator, which sees save-files in
$PGDATA/pg_hibernator/ and launches BlockReaders.
</scenario>
Right at step 2 the .save files will be renamed to .save.restoring,
and later at step 8 Hibernator removes all .save.restoring files
before proceeding further. So the BlockReaders will not restore stale
save-files.
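To make the mechanism concrete, here is a minimal sketch of that startup pass over the save-file directory; the directory and suffix names follow the description above, but everything else is illustrative rather than the actual patch: first discard leftover .save.restoring files from an interrupted previous run, then rename the fresh .save files so the BlockReaders only ever work on .save.restoring files.
```
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Startup pass over the save-file directory (illustrative sketch only). */
static void
prepare_savefiles(const char *dir)
{
	DIR           *d = opendir(dir);
	struct dirent *de;
	char           oldpath[1024];
	char           newpath[1024];

	if (d == NULL)
		return;

	while ((de = readdir(d)) != NULL)
	{
		size_t len = strlen(de->d_name);

		if (len >= 15 && strcmp(de->d_name + len - 15, ".save.restoring") == 0)
		{
			/* Stale file from an interrupted previous restore: discard it. */
			snprintf(oldpath, sizeof(oldpath), "%s/%s", dir, de->d_name);
			remove(oldpath);
		}
		else if (len >= 5 && strcmp(de->d_name + len - 5, ".save") == 0)
		{
			/* Fresh save-file: hand it over to the BlockReaders. */
			snprintf(oldpath, sizeof(oldpath), "%s/%s", dir, de->d_name);
			snprintf(newpath, sizeof(newpath), "%s/%s.restoring", dir, de->d_name);
			rename(oldpath, newpath);
		}
	}
	closedir(d);
}

int
main(void)
{
	prepare_savefiles("pg_hibernator");	/* relative to $PGDATA */
	return 0;
}
```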
Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Attachments:
pg_hibernator_comparison.png (image/png)