pg_prewarm
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
2. Sometimes when I'm benchmarking stuff, I want to get all the data
cached in shared_buffers. This is surprisingly hard to do if the size
of any relation involved is >=1/4 of shared buffers, because the
BAS_BULKREAD stuff kicks in. You can do it by repeatedly seq-scanning
the relation - eventually all the blocks trickle in - but it takes a
long time, and that's annoying.
So I wrote a prewarming utility. Patch is attached. You can prewarm
either the OS cache or PostgreSQL's cache, and there are two options
for prewarming the OS cache to meet different needs. By passing the
correct arguments to the function, you can prewarm an entire relation
or just the blocks you choose; prewarming of blocks from alternate
relation forks is also supported, for completeness.
Hope you like it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
pg_prewarm_v1.patch (application/octet-stream, +308 -0)
On Thu, Mar 8, 2012 at 11:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
well, you can't deny that it is funny to see people making faces ;)
So I wrote a prewarming utility. Patch is attached.
cool!
Just a suggestion: can we relax this check? Just emit a WARNING or a
NOTICE and set "last_block = nblocks - 1" instead.
Just an opinion.
+ if (PG_ARGISNULL(4))
+     last_block = nblocks - 1;
+ else
+ {
+     last_block = PG_GETARG_INT64(4);
+     if (last_block > nblocks)
+         ereport(ERROR,
+                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                  errmsg("ending block number " INT64_FORMAT " exceeds number of blocks in relation " INT64_FORMAT,
+                         last_block, nblocks)));
+ }
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
So I wrote a prewarming utility. Patch is attached. You can prewarm
either the OS cache or PostgreSQL's cache, and there are two options for
prewarming the OS cache to meet different needs. By passing the correct
arguments to the function, you can prewarm an entire relation or just
the blocks you choose; prewarming of blocks from alternate relation
forks is also supported, for completeness. Hope you like it.
+1
Hi,
On Thu, 2012-03-08 at 23:13 -0500, Robert Haas wrote:
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
2. Sometimes when I'm benchmarking stuff, I want to get all the data
cached in shared_buffers. This is surprisingly hard to do if the size
of any relation involved is >=1/4 of shared buffers, because the
BAS_BULKREAD stuff kicks in. You can do it by repeatedly seq-scanning
the relation - eventually all the blocks trickle in - but it takes a
long time, and that's annoying.
So I wrote a prewarming utility.
I was talking to an Oracle DBA about this just yesterday. We also have
pgfincore, but pg_prewarm is pretty much what we need, actually, I think. I
did not test the patch, but the feature should be in core/contrib/whatever.
This will also increase performance for static tables that need to
be in the buffers all the time. I'm also seeing some use cases for BI
databases.
Thanks!
Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
On Fri, Mar 9, 2012 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
2. Sometimes when I'm benchmarking stuff, I want to get all the data
cached in shared_buffers. This is surprisingly hard to do if the size
of any relation involved is >=1/4 of shared buffers, because the
BAS_BULKREAD stuff kicks in. You can do it by repeatedly seq-scanning
the relation - eventually all the blocks trickle in - but it takes a
long time, and that's annoying.
So I wrote a prewarming utility. Patch is attached. You can prewarm
either the OS cache or PostgreSQL's cache, and there are two options
for prewarming the OS cache to meet different needs. By passing the
correct arguments to the function, you can prewarm an entire relation
or just the blocks you choose; prewarming of blocks from alternate
relation forks is also supported, for completeness.
Hope you like it.
+1
When a relation is loaded into cache, are corresponding indexes also loaded
at the same time? Can this load only the specified index into cache?
When the relation is too huge to fit into the cache and most access patterns
in the system are index scans, a DBA might want to load only the index rather
than the table.
For such systems, so far I've been suggesting pgstatindex, but it would be
good if pg_prewarm could do that.
This utility might also be helpful to accelerate recovery of WAL records not
containing FPWs. IOW, before starting recovery, list the relations to recover
from the WAL files by using an xlogdump tool, load them into cache by using
this utility, and then start recovery.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
we had a somewhat different idea here in the past: what if we had a procedure / method that allows people to save the list of current buffers / cached blocks to disk (sorted)? we could then reload this "cache profile" on startup in the background, or people could load a certain cache content at runtime (maybe to test or whatever).
writing those block ids out in sorted order would help us to avoid some random I/O on reload.
regards,
hans
On Mar 9, 2012, at 5:13 AM, Robert Haas wrote:
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
2. Sometimes when I'm benchmarking stuff, I want to get all the data
cached in shared_buffers. This is surprisingly hard to do if the size
of any relation involved is >=1/4 of shared buffers, because the
BAS_BULKREAD stuff kicks in. You can do it by repeatedly seq-scanning
the relation - eventually all the blocks trickle in - but it takes a
long time, and that's annoying.
So I wrote a prewarming utility. Patch is attached. You can prewarm
either the OS cache or PostgreSQL's cache, and there are two options
for prewarming the OS cache to meet different needs. By passing the
correct arguments to the function, you can prewarm an entire relation
or just the blocks you choose; prewarming of blocks from alternate
relation forks is also supported, for completeness.
Hope you like it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
<pg_prewarm_v1.patch>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On Fri, Mar 9, 2012 at 5:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
When a relation is loaded into cache, are corresponding indexes also loaded
at the same time?
No, although if you wanted to do that you could easily do so, using a
query like this:
select pg_prewarm(indexrelid, 'main', 'read', NULL, NULL) from
pg_index where indrelid = 'your_table_name'::regclass;
Can this load only the specified index into cache?
Yes. The relation can be anything that has storage, so you can
prewarm either a table or an index (or even a sequence or TOAST table,
if you're so inclined).
When the relation is too huge to fit into the cache and most access patterns
in the system are index scans, a DBA might want to load only the index rather
than the table.
For such systems, so far I've been suggesting pgstatindex, but it would be
good if pg_prewarm could do that.
pgstatindex is an interesting idea; I hadn't thought of that. Actually,
though, pgstatindex probably ought to be using a BufferAccessStrategy
to avoid trashing the cache. I've had reports of pgstatindex
torpedoing performance on production systems.
This utility might also be helpful to accelerate recovery of WAL records not
containing FPWs. IOW, before starting recovery, list the relations to recover
from the WAL files by using an xlogdump tool, load them into cache by using
this utility, and then start recovery.
Interesting idea.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
Try telling them about pgfincore maybe.
https://github.com/klando/pgfincore
2. Sometimes when I'm benchmarking stuff, I want to get all the data
cached in shared_buffers. This is surprisingly hard to do if the size
of any relation involved is >=1/4 of shared buffers, because the
BAS_BULKREAD stuff kicks in. You can do it by repeatedly seq-scanning
the relation - eventually all the blocks trickle in - but it takes a
long time, and that's annoying.
That reminds me of something…
cedric=# select * from pgfadvise_willneed('pgbench_accounts');
relpath | os_page_size | rel_os_pages | os_pages_free
--------------------+--------------+--------------+---------------
base/11874/16447 | 4096 | 262144 | 169138
base/11874/16447.1 | 4096 | 65726 | 103352
(2 rows)
Time: 4462,936 ms
With pgfincore you can also get at how many pages are already in memory,
OS cache or shared buffers, per file segment of a relation. So you can
force warming up a whole relation or just parts of it, and check the
current state of things.
So I wrote a prewarming utility. Patch is attached. You can prewarm
either the OS cache or PostgreSQL's cache, and there are two options
for prewarming the OS cache to meet different needs. By passing the
correct arguments to the function, you can prewarm an entire relation
or just the blocks you choose; prewarming of blocks from alternate
relation forks is also supported, for completeness.
Is it possible with your tool to snapshot the OS and PostgreSQL cache in
order to warm a Hot Standby server?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Mar 9, 2012 at 5:42 AM, Hans-Jürgen Schönig
<postgres@cybertec.at> wrote:
we had a somewhat different idea here in the past: what if we had a procedure / method that allows people to save the list of current buffers / cached blocks to disk (sorted)? we could then reload this "cache profile" on startup in the background, or people could load a certain cache content at runtime (maybe to test or whatever).
writing those block ids out in sorted order would help us to avoid some random I/O on reload.
I don't think that's a bad idea at all, and someone actually did write
a patch for it at one point, though it didn't get committed, partly I
believe because of technical issues and partly because Greg Smith was
uncertain how much good it did to restore shared_buffers without
thinking about the OS cache. Personally, I don't buy into the latter
objection: a lot of people are running with data sets that fit inside
shared_buffers, and those people would benefit tremendously.
However, this just provides mechanism, not policy, and is therefore
more general. You could use pg_buffercache to save the cache contents
at shutdown and pg_prewarm to load those blocks back in at startup, if
you were so inclined. Or if you just want to load up your main
relation, and its indexes, you can do that, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 9, 2012 at 8:25 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
It's been bugging me for a while now that we don't have a prewarming
utility, for a couple of reasons, including:
1. Our customers look at me funny when I suggest that they use
pg_relation_filepath() and /bin/dd for this purpose.
Try telling them about pgfincore maybe.
Oh, huh. I had no idea that pgfincore could do that. I thought that
was just for introspection; I didn't realize it could actually move
data around for you.
2. Sometimes when I'm benchmarking stuff, I want to get all the data
cached in shared_buffers. This is surprisingly hard to do if the size
of any relation involved is >=1/4 of shared buffers, because the
BAS_BULKREAD stuff kicks in. You can do it by repeatedly seq-scanning
the relation - eventually all the blocks trickle in - but it takes a
long time, and that's annoying.
That reminds me of something…
cedric=# select * from pgfadvise_willneed('pgbench_accounts');
relpath | os_page_size | rel_os_pages | os_pages_free
--------------------+--------------+--------------+---------------
base/11874/16447 | 4096 | 262144 | 169138
base/11874/16447.1 | 4096 | 65726 | 103352
(2 rows)
Time: 4462,936 ms
That's not the same thing. That's pulling them into the OS cache, not
shared_buffers.
Is it possible with your tool to snapshot the OS and PostgreSQL cache in
order to warm a Hot Standby server?
Nope. It doesn't have any capabilities to probe for information,
because I knew those things already existed in pg_buffercache and
pgfincore, and also because they weren't what I needed to solve my
immediate problem, which was a way to get the entirety of a relation
into shared_buffers.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Oh, huh. I had no idea that pgfincore could do that. I thought that
was just for introspection; I didn't realize it could actually move
data around for you.
Well, I thought Cédric had already included shared-buffers-related
facilities, so that makes us square, it seems…
Is it possible with your tool to snapshot the OS and PostgreSQL cache in
order to warm a Hot Standby server?
Nope. It doesn't have any capabilities to probe for information,
because I knew those things already existed in pg_buffercache and
pgfincore, and also because they weren't what I needed to solve my
immediate problem, which was a way to get the entirety of a relation
into shared_buffers.
So that's complementary with pgfincore, ok. I still wish we could
maintain the RAM content HOT on the standby in the same way we are able
to maintain its data set on disk, though.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Mar 9, 2012 at 10:33 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
So that's complementary with pgfincore, ok. I still wish we could
maintain the RAM content HOT on the standby in the same way we are able
to maintain its data set on disk, though.
That's an interesting idea. It seems tricky, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 9, 2012 at 5:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 5:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
When a relation is loaded into cache, are corresponding indexes also loaded
at the same time?
No, although if you wanted to do that you could easily do so, using a
query like this:
select pg_prewarm(indexrelid, 'main', 'read', NULL, NULL) from
pg_index where indrelid = 'your_table_name'::regclass;
Could that be included in an example? Maybe admins are expected to
know how to construct such queries off the cuff, but I always need to
look it up each time, which is rather tedious.
In the patch:
s/no special projection/no special protection/
Thanks for putting this together.
Cheers,
Jeff
On Fri, Mar 9, 2012 at 10:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Mar 9, 2012 at 5:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 5:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
When a relation is loaded into cache, are corresponding indexes also loaded
at the same time?
No, although if you wanted to do that you could easily do so, using a
query like this:
select pg_prewarm(indexrelid, 'main', 'read', NULL, NULL) from
pg_index where indrelid = 'your_table_name'::regclass;
Could that be included in an example? Maybe admins are expected to
know how to construct such queries off the cuff, but I always need to
look it up each time, which is rather tedious.
Not a bad idea. I thought of including an "Examples" section, but it
didn't seem quite worth it for the simple case of prewarming a heap.
Might be worth it to also include this.
In the patch:
s/no special projection/no special protection/
OK, will fix.
Thanks for putting this together.
I will confess that it was 0% altruistic. Not having it was ruining
my day. :-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mar 9, 2012, at 2:34 PM, Robert Haas wrote:
On Fri, Mar 9, 2012 at 5:42 AM, Hans-Jürgen Schönig
<postgres@cybertec.at> wrote:
we had a somewhat different idea here in the past: what if we had a procedure / method that allows people to save the list of current buffers / cached blocks to disk (sorted)? we could then reload this "cache profile" on startup in the background, or people could load a certain cache content at runtime (maybe to test or whatever).
writing those block ids out in sorted order would help us to avoid some random I/O on reload.
I don't think that's a bad idea at all, and someone actually did write
a patch for it at one point, though it didn't get committed, partly I
believe because of technical issues and partly because Greg Smith was
uncertain how much good it did to restore shared_buffers without
thinking about the OS cache. Personally, I don't buy into the latter
objection: a lot of people are running with data sets that fit inside
shared_buffers, and those people would benefit tremendously.
However, this just provides mechanism, not policy, and is therefore
more general. You could use pg_buffercache to save the cache contents
at shutdown and pg_prewarm to load those blocks back in at startup, if
you were so inclined. Or if you just want to load up your main
relation, and its indexes, you can do that, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
i also think that it can be beneficial.
once in a while people ask how to "bring a database up to speed" after a restart. i have seen more than one case where a DB was close to death after a restart because random I/O was simply killing it during cache warmup. it seems the problem is getting worse as we see machines with more and more RAM in the field.
technically i would take a rather brute-force approach: if we just spill out the list of blocks we currently have in shared buffers (not the content, of course, just the physical location, sorted by file / position in file), it would be good enough. if a block no longer physically exists on reload, it would not even be an issue, and it would basically allow people to "snapshot" their cache status. we could allow named cache profiles or so, and add a GUC to indicate whether one of them should be preloaded on startup (in the background or beforehand - i see use cases for both approaches).
yes, somehow linking this to pg_buffercache makes a lot of sense. maybe just extending it with some extra functions is already enough for most cases.
hans
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Hi Robert,
Just recently I asked on pgsql-performance: "PG as in-memory db? How
to warm up and re-populate buffers? How to read in all tuples into
memory?"
What remained somewhat open was the best practice for configuration and
the relationship between the disk/OS cache and the Postgres cache.
The main conclusion was:
* Do a "tar cf /dev/zero $PG_DATA/base" either shortly before or
shortly after the database is created.
* Do a seq scan: "SELECT * FROM osm_point".
Is your tool a replacement for those above?
-Stefan
2012/3/9 Robert Haas <robertmhaas@gmail.com>:
On Fri, Mar 9, 2012 at 10:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Mar 9, 2012 at 5:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 5:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
When a relation is loaded into cache, are corresponding indexes also loaded
at the same time?
No, although if you wanted to do that you could easily do so, using a
query like this:
select pg_prewarm(indexrelid, 'main', 'read', NULL, NULL) from
pg_index where indrelid = 'your_table_name'::regclass;
Could that be included in an example? Maybe admins are expected to
know how to construct such queries off the cuff, but I always need to
look it up each time, which is rather tedious.
Not a bad idea. I thought of including an "Examples" section, but it
didn't seem quite worth it for the simple case of prewarming a heap.
Might be worth it to also include this.
In the patch:
s/no special projection/no special protection/
OK, will fix.
Thanks for putting this together.
I will confess that it was 0% altruistic. Not having it was ruining
my day. :-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Mar 10, 2012 at 4:35 PM, Stefan Keller <sfkeller@gmail.com> wrote:
The main conclusion was:
* Do a "tar cf /dev/zero $PG_DATA/base" either shortly before or
shortly after the database is created.
* Do a seq scan: "SELECT * FROM osm_point".
Is your tool a replacement for those above?
It can be used that way, although it is more general.
(The patch does include documentation...)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Robert
2012/3/11 Robert Haas <robertmhaas@gmail.com>:
On Sat, Mar 10, 2012 at 4:35 PM, Stefan Keller <sfkeller@gmail.com> wrote:
The main conclusion was:
* Do a "tar cf /dev/zero $PG_DATA/base" either shortly before or
shortly after the database is created.
* Do a seq scan: "SELECT * FROM osm_point".
Is your tool a replacement for those above?
It can be used that way, although it is more general.
(The patch does include documentation...)
Thanks for the hint. That function is cool, and it seems to be the
solution to the concluding question in my talk about read-only
databases at pgconf.de 2011!
I'm new to the contrib best practices of Postgres, so I did not expect
that a file 'pg_prewarm_v1.patch' would contain a brand-new stand-alone
extension.
Does pg_prewarm already have a website entry somewhere? I did not find
anything (like here:
http://www.postgresql.org/search/?q=pg_prewarm&a=1&submit=Search )
except at Commitfest open patches (https://commitfest.postgresql.org/
).
-Stefan
On Friday, 9 March 2012 at 16:50:05, Robert Haas wrote:
On Fri, Mar 9, 2012 at 10:33 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
So that's complementary with pgfincore, ok. I still wish we could
maintain the RAM content HOT on the standby in the same way we are able
to maintain its data set on disk, though.
That's an interesting idea. It seems tricky, though.
It is the purpose of the latest pgfincore version.
I use a varbit as the output of introspection on the master; then you are
able to store it in a table, stream it to the slaves, and replay it locally.
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Cédric and Robert,
Thanks, Cédric, for the reminder.
It would be nice to sort out the features of the two Postgres extensions,
pgfincore (https://github.com/klando/pgfincore ) and pg_prewarm: what
do they have in common, and what is complementary?
I would be happy to test both. But when reading the current
documentation I'm missing installation requirements (PG version,
replication? memory/hardware requirements), specifics of Linux (and
Windows, if supported), and some configuration hints (e.g.
relationships/dependencies between the OS cache and the PG cache and
postgresql.conf).
-Stefan
2012/3/11 Cédric Villemain <cedric@2ndquadrant.com>:
On Friday, 9 March 2012 at 16:50:05, Robert Haas wrote:
On Fri, Mar 9, 2012 at 10:33 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
So that's complementary with pgfincore, ok. I still wish we could
maintain the RAM content HOT on the standby in the same way we are able
to maintain its data set on disk, though.
That's an interesting idea. It seems tricky, though.
It is the purpose of the latest pgfincore version.
I use a varbit as the output of introspection on the master; then you are
able to store it in a table, stream it to the slaves, and replay it locally.
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation