Low cache hit ratio

Started by Lucasover 4 years ago2 messagesgeneral

root@sud0.nz

over 4 years ago

Hello,

I have recently deployed a new Slave (streaming replication) and have been monitoring its cache hit ratio.

At the moment, read-only queries are being sent to this slave but only 10% of the traffic.
The cache hit ratio is now at 82%. This database is around 1.4TB and is running on EC2 instances in AWS. PostgreSQL version is 9.2.24.

The instance is a r4.2xlarge, so it's got 8 vCPUs and 61GB of RAM. The postgresql.conf file can be seen below (will show only relevant parameters for the cache/performance):

hot_standby = on
random_page_cost = 1.1
max_connections = 500
shared_buffers = 15GB
statement_timeout = 0
work_mem = 31457kB
maintenance_work_mem = 2GB
wal_level = hot_standby
fsync = on
synchronous_commit = on
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
max_wal_senders = 20
wal_keep_segments = 1024
effective_cache_size = 45GB
logging_collector = on
autovacuum = on
log_autovacuum_min_duration = 1000
autovacuum_max_workers = 5
autovacuum_naptime = 40s
autovacuum_vacuum_threshold = 200
autovacuum_analyze_threshold = 150
autovacuum_vacuum_scale_factor = 0.02
autovacuum_analyze_scale_factor = 0.005
deadlock_timeout = 2s
max_files_per_process = 4096
effective_io_concurrency = 200
hot_standby_feedback = on
max_standby_streaming_delay = 120s
default_statistics_target = 100

I have tried reducing the shared_buffers parameter from 15GB to 8GB, but the cache hit ratio went down to 60%.

Do you guys have any suggestions, on what I could try to get this cache more hits?

Thanks in advance!

---
Regards,

Lucas

Show quoted text

This message is encrypted. Both the Public Key and the GPG encrypted message are included in this email so that you can verify its origin.

Philip Semanchuk

philip@americanefficient.com

over 4 years ago

In reply to: Lucas (#1)

Re: Low cache hit ratio

On Jul 29, 2021, at 3:09 AM, Lucas <root@sud0.nz> wrote:

Hello,

I have recently deployed a new Slave (streaming replication) and have been monitoring its cache hit ratio.

At the moment, read-only queries are being sent to this slave but only 10% of the traffic.
The cache hit ratio is now at 82%. This database is around 1.4TB and is running on EC2 instances in AWS. PostgreSQL version is 9.2.24.

The instance is a r4.2xlarge, so it's got 8 vCPUs and 61GB of RAM.

Hi Lucas,
Have you tried the pg_buffercache extension? It gives you some visibility into what’s in the cache so you can understand what’s staying resident and not leaving room for other things. I wrote a view atop pg_buffercache that I use for this purpose. It’s pasted below; I hope you find it helpful. My only caveat is that I run this under Postgres 11. I *think* I’ve used it under Postgres 9.6 but I’m not sure. It definitely hasn’t been tested on 9.2.

Hope this helps,
Philip

/* A view of pg_buffercache which shows what's in the Postgres cache.
Access to pg_buffercache requires membership in the group pg_monitor.

It's OK to query this ad hoc, but don't query it aggressively (e.g. in a polling loop). The
Postgres doc says --

When the pg_buffercache view is accessed, internal buffer manager locks are taken for
long enough to copy all the buffer state data that the view will display. This ensures
that the view produces a consistent set of results, while not blocking normal buffer
activity longer than necessary. Nonetheless there could be some impact on database
performance if this view is read often.

https://www.postgresql.org/docs/11/pgbuffercache.html

CREATE OR REPLACE VIEW
vw_postgres_cache
AS
SELECT
c.relname,
sum(usagecount) AS usage_count,
/* count(*) is the number of buffers in use. Each buffer is 8Kb. (8Kb is the page size
a.k.a. block size configured at compile time, confirm in psql with the command
`show block_size`).

I cast the count to float to introduce a float into calculations that are otherwise all int
which would produce a result rounded to the nearest int.
*/
-- cache_% shows the portion of the cache that this entity occupies
((count(*)::float / pg_settings.setting::int) * 100)::numeric(3, 1) AS "cache_%",
-- entity_% shows the portion of this entity that's in cache
-- pg_table_size() is in bytes, multiply buffer count by (8 * 1024) so units match
(((count(*)::float * 8192) / pg_table_size(c.relname::text)) * 100)::numeric(4,1)
AS "entity_%",
(count(*)::float * 8 / (1024 * 1024))::numeric(20, 1) AS Gb,
(count(*)::float * 8 / 1024 )::numeric(20, 1) AS Mb
FROM
pg_buffercache b
CROSS JOIN pg_settings
INNER JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid)
AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
WHERE
pg_settings.name = 'shared_buffers'
-- If this is run on a system where shared_buffers is expressed in something other than 8kB
-- blocks, it will calculate incorrect kB/Mb/Gb values. Asserting the value of the unit here
-- ensures no results at all will be returned in that case.
AND pg_settings.unit = '8kB'
GROUP BY
c.relname, pg_settings.setting::int
HAVING
-- Only include entries that occupy at least 0.1% of the cache
((count(*)::float / pg_settings.setting::int) * 100) >= 0.1
ORDER BY 6 DESC
;

Low cache hit ratio

Attachments: