Buffers: shared hit/read to shared_buffers dependence

Started by Pavel Suderevsky · over 10 years ago · 5 messages · general
#1 Pavel Suderevsky
psuderevsky@gmail.com

Hi,

While working through the "Understanding explain" manual
(http://www.dalibo.org/_media/understanding_explain.pdf) I ran into a
strange situation: a table of 65MB is completely cached with
shared_buffers=320MB, but not with shared_buffers <= 256MB. In fact, the
caching behaviour in my case is the same with either 256MB or 32MB. In my
mind, shared_buffers of 256MB should be enough to cache a 65MB table, but
it isn't. Could you please explain this behaviour?

Steps:

understanding_explain=# select pg_size_pretty(pg_relation_size('foo'));
pg_size_pretty
----------------
65 MB
(1 row)
=============================================================================================================================

postgres=# show shared_buffers ;
shared_buffers
----------------
320MB
(1 row)

postgres=# \c understanding_explain
You are now connected to database "understanding_explain" as user
"postgres".
understanding_explain=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..17500.60 rows=1000000 width=37) (actual time=0.786..143.686 rows=1000000 loops=1)
Buffers: shared read=8334
Planning time: 3.796 ms
Execution time: 195.557 ms
(4 rows)

understanding_explain=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..17500.60 rows=1000000 width=37) (actual time=0.009..83.546 rows=1000000 loops=1)
Buffers: shared hit=8334
Planning time: 0.029 ms
Execution time: 129.499 ms
(4 rows)

=============================================================================================================================
[root@dbtest3 ~]# systemctl stop postgres
[root@dbtest3 ~]# sync                                  # flush dirty pages to disk
[root@dbtest3 ~]# echo 3 > /proc/sys/vm/drop_caches     # drop the OS page cache
[root@dbtest3 ~]# systemctl start postgres
=============================================================================================================================

understanding_explain=# show shared_buffers;
shared_buffers
----------------
256MB
(1 row)

understanding_explain=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..17500.60 rows=1000000 width=37) (actual time=0.772..126.242 rows=1000000 loops=1)
Buffers: shared read=8334
Planning time: 5.164 ms
Execution time: 181.306 ms
(4 rows)

understanding_explain=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..17500.60 rows=1000000 width=37) (actual time=0.029..91.686 rows=1000000 loops=1)
Buffers: shared hit=32 read=8302
Planning time: 0.025 ms
Execution time: 136.584 ms
(4 rows)

With every new query execution 32 hits adding to shared hit value.
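
For reference, the figure of 8334 buffers matches the table size: at the
default 8 kB block size, 65 MB is 8334 pages. A quick check (a sketch,
assuming the default block size):

understanding_explain=# SELECT pg_relation_size('foo') / current_setting('block_size')::int AS pages;
 pages
-------
  8334
(1 row)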

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Pavel Suderevsky (#1)
Re: Buffers: shared hit/read to shared_buffers dependence

Pavel Suderevsky wrote:

While working through the "Understanding explain" manual (http://www.dalibo.org/_media/understanding_explain.pdf)
I ran into a strange situation: a table of 65MB is completely cached with shared_buffers=320MB, but not with shared_buffers <= 256MB.
In fact, the caching behaviour in my case is the same with either 256MB or 32MB. In my mind, shared_buffers
of 256MB should be enough to cache a 65MB table, but it isn't. Could you please explain this behaviour?

[...]

With every new query execution, 32 hits are added to the shared hit value.

This must be due to this commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d526575f893c1a4e05ebd307e80203536b213a6d

See also src/backend/storage/buffer/README, section
"Buffer Ring Replacement Strategy", and the functions initscan() and GetAccessStrategy()
in the source.

Basically, if in a sequential table scan shared_buffers is less than four times the estimated table size,
PostgreSQL will allocate a "ring buffer" of size 256 KB to cache the table data, so that a large sequential scan
does not "blow out" significant parts of the shared cache.
The rationale is that data from a sequential scan will probably not be needed again right away, while
other data in the cache might be hot.

That's what you see in your second example: 32 buffers equals 256 KB, and the ring buffer is chosen from
free buffer pages, so the amount of table data cached increases by 32 buffers every time.
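
To make the threshold concrete with the numbers from this thread (a worked
sketch, assuming the default 8 kB block size):

256MB shared_buffers = 32768 buffers; 32768 / 4 = 8192 buffers (64 MB):
foo at 8334 buffers is over the limit, so the 32-buffer (256 KB) ring is used.

320MB shared_buffers = 40960 buffers; 40960 / 4 = 10240 buffers (80 MB):
foo at 8334 buffers fits under the limit, so it is cached normally.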

Yours,
Laurenz Albe


#3 Merlin Moncure
mmoncure@gmail.com
In reply to: Laurenz Albe (#2)
Re: Buffers: shared hit/read to shared_buffers dependence

On Sat, Sep 5, 2015 at 3:28 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

[...]

That's what you see in your second example: 32 buffers equals 256 KB, and the ring buffer is chosen from
free buffer pages, so the amount of table data cached increases by 32 buffers every time.

Yeah. A couple more points:
*) If your table has an index on it, you can try disabling sequential
scans temporarily (via SET enable_seqscan) in order to get a
bitmap scan, which IIRC does not use ring buffers.

*) For a more robust approach to that, check out the pg_prewarm utility:
http://www.postgresql.org/docs/9.4/static/pgprewarm.html

*) Even if Postgres does not cache the table, the OS will probably
still cache it, assuming it has the memory to do so. Reading from shared
buffers is faster than reading from memory cached by the kernel, but the
kernel cache is still much faster than reading from storage unless your
storage is very, very fast.
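
A minimal sketch of the first two points (assuming foo has an indexed
column, here a hypothetical "id", and that the contrib extension
pg_prewarm is installed):

understanding_explain=# SET enable_seqscan = off;  -- penalizes, rather than forbids, seq scans
understanding_explain=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id > 0;
understanding_explain=# RESET enable_seqscan;
understanding_explain=# CREATE EXTENSION pg_prewarm;
understanding_explain=# SELECT pg_prewarm('foo');  -- returns the number of blocks loaded into shared buffers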

merlin


#4 Pavel Suderevsky
psuderevsky@gmail.com
In reply to: Merlin Moncure (#3)
Re: Buffers: shared hit/read to shared_buffers dependence

Laurenz, Merlin,

Thanks a lot for your explanations.

Even if Postgres does not cache the table, the OS will probably
still cache it, assuming it has the memory to do so.

Could you please clarify: do I understand correctly that there is no way
to tell from EXPLAIN whether Postgres is reading from the hard drive or
from the OS cache?


#5 Guillaume Lelarge
guillaume@lelarge.info
In reply to: Pavel Suderevsky (#4)
Re: Buffers: shared hit/read to shared_buffers dependence

2015-09-09 17:06 GMT+02:00 Pavel Suderevsky <psuderevsky@gmail.com>:

Laurenz, Merlin,

Thanks a lot for your explanations.

Even if Postgres does not cache the table, the OS will probably
still cache it, assuming it has the memory to do so.

Could you please clarify: do I understand correctly that there is no way
to tell from EXPLAIN whether Postgres is reading from the hard drive or
from the OS cache?

You're right.
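
One indirect hint, though: with track_io_timing enabled, EXPLAIN (ANALYZE,
BUFFERS) also reports an "I/O Timings" line for block reads. A sketch
(the timing shown is illustrative):

understanding_explain=# SET track_io_timing = on;
understanding_explain=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo;
...
   Buffers: shared read=8334
   I/O Timings: read=9.153
...

At roughly a microsecond per block, a read like that almost certainly came
from the OS cache rather than from disk.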


--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com