Core system becoming unresponsive due to CPU load over 300
Hi to all!
We've got a problem with a very serious, repetitive incident on our core
system. Namely, CPU load spikes to 300-400 and the whole db becomes
unresponsive. From the db point of view nothing special is happening: memory
looks fine, disk I/O is OK, and the only problem is the huge CPU load. The
kernel parameters that increase with load are always the same:
* page tables size
* Committed_AS
* Active anon
<http://www.postgresql-archive.org/file/t342733/pagetables.png>
and the total number of connections is increasing very fast (but I suppose
that's a symptom, not the root cause, of the CPU load) and exceeds
max_connections (1000).
System:
* CentOS Linux release 7.2.1511 (Core)
* Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
* postgresql95-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
* 4 sockets/80 cores
* vm.dirty_background_bytes = 0
* vm.dirty_background_ratio = 2
* vm.dirty_bytes = 0
* vm.dirty_expire_centisecs = 3000
* vm.dirty_ratio = 20
* vm.dirty_writeback_centisecs = 500
After the first incident we changed:
* increased shared_buffers to 16GB (completely on huge pages; previously
2GB)
* adjusted vm.nr_hugepages to 8000 (we've got 2MB pages)
* changed vm.overcommit_memory = 2 and vm.overcommit_ratio = 99
* disabled transparent huge pages (they were unfortunately set to
'always' before)
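As a quick sanity check on that hugepage sizing (a back-of-the-envelope sketch; the actual shared memory request also includes some overhead beyond shared_buffers, so check the exact value PostgreSQL asks for at startup):

```shell
# 16GB of shared_buffers backed by 2MB huge pages (numbers from the list above)
shared_buffers_mb=$((16 * 1024))
hugepage_mb=2
pages_needed=$((shared_buffers_mb / hugepage_mb))
echo "huge pages needed for shared_buffers alone: $pages_needed"  # 8192
```

Note that 16GB / 2MB works out to 8192 pages, slightly above the 8000 configured, so it may be worth double-checking that the whole segment actually fits.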
It's a highly transactional db. Today I've run:
select now(), txid_current();
and the results:
3 339 351 transactions between 2017-10-10 14:42 and 2017-10-10 16:24
* db size 1.1TB
* RAM over 500GB
* biggest tables (the rest isn't big):
369 GB
48 GB
48 GB
34 GB
23 GB
19 GB
19 GB
17 GB
16 GB
12 GB
9910 MB
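For reference, a size listing like the one above can be produced with a query along these lines (a sketch using standard catalog functions; the LIMIT is arbitrary):

```sql
-- Largest tables by total size (heap + indexes + TOAST)
SELECT oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 11;
```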
We have captured some of db statistics, for instance bgwriter and
buffercache.
Today the load spikes happened at:
1). 10:44
2). 11:04
(and then several times during a day)
The first occurrence was yesterday around 6 PM.
What we observed back then was, for instance, an autovacuum process running
to prevent wraparound on the biggest table (369GB). We ran VACUUM FREEZE
manually after this happened, but before that we gathered statistics with
the query:
SELECT
oid::regclass::text AS table,
age(relfrozenxid) AS xid_age,
mxid_age(relminmxid) AS mxid_age,
least(
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid),
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_multixact_freeze_max_age') -
mxid_age(relminmxid)
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM pg_class
ORDER BY tx_before_wraparound_vacuum;
and the biggest table which was vacuumed looked like:
xid_age=217310511, mxid_age=8156548, tx_before_wraparound_vacuum=-17310511,
size=369 GB, last_autovacuum=2017-09-30 01:57:33.972068+02
So, from the kernel stats we know that the failure happens when the db is
trying to allocate some huge number of pages (page tables size, anons,
Committed_AS). But what is triggering this situation?
I suppose it could be lazy autovacuum (just standard settings). So
autovacuum had to read the whole 369GB yesterday to freeze xids; today it
did the same on some other tables.
Another idea is a too-small shared_buffers setting.
Today it looked like:
<http://www.postgresql-archive.org/file/t342733/buffercache1040.png>
'c' means count; the number after 'c' is the usage count, so 'c5dirty' here
means the count of dirty pages with usagecount=5.
That is the snapshot before and after the failure at 10:44.
before and after the spike at 11:04:
<http://www.postgresql-archive.org/file/t342733/buffercache1104.png>
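Snapshots like these can be taken with the pg_buffercache extension, roughly as follows (a sketch; it needs CREATE EXTENSION pg_buffercache, and scanning the view takes buffer header locks, so it is not free on a busy system):

```sql
-- Buffer counts grouped by usage count and dirty flag;
-- e.g. the row (5, true, N) corresponds to "c5dirty" above.
SELECT usagecount, isdirty, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount, isdirty
ORDER BY usagecount, isdirty;
```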
My interpretation is the following:
the count of clean buffers with a high usagecount is decreasing, while the
count of buffers with usagecount 0 and 1 is very unstable -> so the buffers
have no time to age in shared buffers and are thrown out?
bgwriter stats:
<http://www.postgresql-archive.org/file/t342733/bgwriter.png>
the biggest number of buffers is cleaned by backends - so there are no free
buffers with usagecount 0, and LWLock waits happen?
So would increasing shared_buffers be a solution?
Please help, it's happening quite often and I'm not sure which way is the
right one...
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Oct 10, 2017 at 2:40 PM, pinker <pinker@onet.eu> wrote:
Hi to all!
We've got a problem with a very serious, repetitive incident on our core
system. Namely, CPU load spikes to 300-400 and the whole db becomes
unresponsive. From the db point of view nothing special is happening: memory
looks fine, disk I/O is OK, and the only problem is the huge CPU load. The
kernel parameters that increase with load are always the same:
The solution here is to reduce the number of connections, usually via
some kind of connection pooling. Any db server will have its max
throughput at around the number of CPU cores == connections (give or
take a factor of 2). Outside that, performance falls off, and the curve has
a very sharp knee on the far side as the number of connections goes up.
Reduce connections and the db runs faster; increase them and it slows until
it eventually falls over.
pgbouncer and pgpool II are useful on the db end, look at pooling
options on the app side as well.
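For example, a minimal pgbouncer setup in transaction pooling mode might look like this (a config sketch; the database name, ports and pool sizes here are hypothetical and need tuning to the workload):

```ini
; pgbouncer.ini (sketch)
[databases]
coredb = host=127.0.0.1 port=5432 dbname=coredb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a server connection is held only for one transaction
pool_mode = transaction
; keep server-side connections near the core count, not at max_connections
default_pool_size = 80
max_client_conn = 1000
```

The app can still open up to 1000 client connections, but only ~80 sessions hit PostgreSQL at once. Transaction pooling breaks session-level features (prepared statements, advisory locks, temp tables), so session mode may be required instead.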
Thank you Scott,
we are planning to do it today. But are you sure it will help in this case?
2017-10-10 23:40 GMT+03:00 pinker <pinker@onet.eu>:
We've got a problem with a very serious, repetitive incident on our core
system. Namely, CPU load spikes to 300-400 and the whole db becomes
unresponsive. From the db point of view nothing special is happening: memory
looks fine, disk I/O is OK, and the only problem is the huge CPU load. The
kernel parameters that increase with load are always the same:
Can you provide output of `iostat -myx 10` at the “peak” moments, please?
Also, it'd be good to look at more detailed bgwriter/checkpointer stats.
You can find more details in this post:
http://blog.postgresql-consulting.com/2017/03/deep-dive-into-postgres-stats_27.html
(You might want to reset 'shared' stats here.)
--
Victor Yegorov
On 10/10/2017 10:40 PM, pinker wrote:
Hi to all!
We've got a problem with a very serious, repetitive incident on our core
system. Namely, CPU load spikes to 300-400 and the whole db becomes
What is "CPU load"? Perhaps you mean "load average"?
Also, what are the basic system parameters (number of cores, RAM), it's
difficult to help without knowing that.
unresponsive. From the db point of view nothing special is happening: memory
looks fine, disk I/O is OK, and the only problem is the huge CPU load. The
kernel parameters that increase with load are always the same:
* page tables size
* Committed_AS
* Active anon
<http://www.postgresql-archive.org/file/t342733/pagetables.png>
and the total number of connections is increasing very fast (but I suppose
that's a symptom, not the root cause, of the CPU load) and exceeds
max_connections (1000).
I doubt you have 1000 cores in your system, so 1000 connections active
at the same time is guaranteed to cause issues. What we see quite often
is a minor hiccup (occasional slow query) snowballing into much more
serious trouble exactly because of this.
Queries get slower for some reason, application starts opening more
connections (through a built-in connection pool) to run more queries,
that further increases pressure, slows the queries even more, ...
As Scott suggested, you should consider using a connection pool.
System:
* CentOS Linux release 7.2.1511 (Core)
* Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
* postgresql95-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
* 4 sockets/80 cores
* vm.dirty_background_bytes = 0
* vm.dirty_background_ratio = 2
* vm.dirty_bytes = 0
* vm.dirty_expire_centisecs = 3000
* vm.dirty_ratio = 20
* vm.dirty_writeback_centisecs = 500
after the first incident we have changed:
* increased shared_buffers to 16GB (completely on huge pages. previously
2GB)
* adjusted vm.nr_hugepages to 8000 (we've got 2mb pages)
* changed vm.overcommit_memory = 2 and vm.overcommit_ratio = 99
* disabled transparent huge pages (they were set before unfortunately to
'always')
It's a highly transactional db. Today I've run:
select now(), txid_current();
and the results:
3 339 351 transactions between 2017-10-10 14:42 and 2017-10-10 16:24
Well, 3M transactions over ~2h period is just ~450tps, so nothing
extreme. Not sure how large the transactions are, of course.
... snip ...
So, from the kernel stats we know that the failure happens when the db is
trying to allocate some huge number of pages (page tables size, anons,
Committed_AS). But what is triggering this situation?
Something gets executed on the database. We have no idea what it is, but
it should be in the system logs. And you should see the process in 'top'
with large amounts of virtual memory ...
I suppose it could be lazy autovacuum (just standard settings). So
autovacuum had to read the whole 369GB yesterday to freeze xids; today it
did the same on some other tables.
Possible, but it shouldn't allocate more than maintenance_work_mem, so I
don't see why it would allocate so much virtual memory.
Another possibility is a run-away query that consumes a lot of work_mem.
Another idea is too small shared buffers setting.
... snip ...
bgwriter stats:
<http://www.postgresql-archive.org/file/t342733/bgwriter.png>
Yes, this suggests you probably have shared_buffers set too low, but
it's impossible to say if increasing the size will help - perhaps your
active set (part of DB you regularly access) is way too big.
Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
and then you can decide.
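The ratio can be computed roughly like this (a sketch; pg_stat_database counters are cumulative since the last stats reset, so diff them over an interval for a current number):

```sql
-- Shared-buffers cache hit ratio per database
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname NOT LIKE 'template%';
```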
You may also make the bgwriter more aggressive - that won't really
improve the hit ratio, it will only make enough room for the backends.
But I don't quite see how this could cause the severe problems you have,
as I assume this is kinda regular behavior on that system. Hard to say
without more data.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Victor Yegorov wrote
Can you provide output of `iostat -myx 10` at the “peak” moments, please?
sure, please find it here:
https://pastebin.com/f2Pv6hDL
Victor Yegorov wrote
Also, it'd be good to look in more detailed bgwriter/checkpointer stats.
You can find more details in this post:
http://blog.postgresql-consulting.com/2017/03/deep-dive-into-postgres-stats_27.html
(You might want to reset 'shared' stats here.)
thank you for the link, it's a really nice explanation. Here you'll find the
full bgwriter stats: https://pastebin.com/VA8pyfXj
On Tue, Oct 10, 2017 at 3:53 PM, pinker <pinker@onet.eu> wrote:
Victor Yegorov wrote
Can you provide output of `iostat -myx 10` at the “peak” moments, please?
sure, please find it here:
https://pastebin.com/f2Pv6hDL
Ouch, unless I'm reading that wrong, your IO subsystem seems to be REALLY slow.
Scott Marlowe-2 wrote
Ouch, unless I'm reading that wrong, your IO subsystem seems to be REALLY
slow.
it's a huge array where a lot is happening, for instance data snapshots :/
The LUN this db is on is dm-7.
I'm a DBA with zero knowledge about storage arrays, so any advice will be
much appreciated :)
2017-10-11 0:53 GMT+03:00 pinker <pinker@onet.eu>:
Can you provide output of `iostat -myx 10` at the “peak” moments, please?
sure, please find it here:
https://pastebin.com/f2Pv6hDL
Looks like `sdg` and `sdm` are the ones used most.
Can you describe what's on those devices? Do you have WAL and DB sitting
together?
Where are the DB log files stored?
Here you'll find the
full bgwriter stats: https://pastebin.com/VA8pyfXj
Can you, please, provide the output of this query (linked from the article
mentioned):
https://gist.github.com/lesovsky/4587d70f169739c01d4525027c087d14
And also this query:
SELECT name,version,source FROM pg_settings WHERE source NOT IN
('default','override');
--
Victor Yegorov
Tomas Vondra-4 wrote
What is "CPU load"? Perhaps you mean "load average"?
Yes, I wasn't exact: I meant system CPU usage. It can be seen here - this is
the graph from yesterday's failure (after 6 p.m.):
<http://www.postgresql-archive.org/file/t342733/cpu.png>
So as one can see, connection spikes follow CPU spikes...
Tomas Vondra-4 wrote
Also, what are the basic system parameters (number of cores, RAM), it's
difficult to help without knowing that.
I have actually written everything in the first post:
80 cores across 4 sockets
over 500GB RAM
Tomas Vondra-4 wrote
Well, 3M transactions over ~2h period is just ~450tps, so nothing
extreme. Not sure how large the transactions are, of course.
There's quite a lot going on. Most of them are complicated stored procedures.
Tomas Vondra-4 wrote
Something gets executed on the database. We have no idea what it is, but
it should be in the system logs. And you should see the process in 'top'
with large amounts of virtual memory ...
Yes, it would be much easier if it were just a single query at the top, but
most of the CPU is eaten by the system itself and I'm not sure why. I
suppose, because of the page tables size and anon pages, that it is NUMA
related.
Tomas Vondra-4 wrote
Another possibility is a run-away query that consumes a lot of work_mem.
It was exactly my first guess. work_mem is set to ~350MB and I see a lot of
stored procedures with unnecessary WITH clauses (i.e. materialization) and
right after them an IN query over those results (hash).
Tomas Vondra-4 wrote
Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
and then you can decide.
Thank you for the tip. I always do it but hadn't here; the result is
0.992969610990056 - so increasing it is rather pointless.
Tomas Vondra-4 wrote
You may also make the bgwriter more aggressive - that won't really
improve the hit ratio, it will only make enough room for the backends.
yes, I probably will
Tomas Vondra-4 wrote
But I don't quite see how this could cause the severe problems you have,
as I assume this is kinda regular behavior on that system. Hard to say
without more data.
I can provide you with any data you need :)
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/10/2017 3:28 PM, pinker wrote:
It was exactly my first guess. work_mem is set to ~ 350MB and I see a lot of
stored procedures with unnecessary WITH clauses (i.e. materialization) and
right after it IN query with results of that (hash).
1000 connections all running queries that each need one work_mem allocation
will consume 1000*350MB == 350GB of your RAM, and many queries use several
work_mem's.
if the vast majority of your operations are OLTP and only access a few
rows, then a large work_mem is NOT a good idea. If you're doing large
aggregate operations like OLAP for reporting or whatever, then that's
another story, but generally that sort of thing does NOT use 1000
connections.
--
john r pierce, recycling bits in santa cruz
Victor Yegorov wrote
Looks like `sdg` and `sdm` are the ones used most.
Can you describe what's on those devices? Do you have WAL and DB sitting
together?
Where DB log files are stored?
it's multipath with the same LUN for PGDATA and pg_log, but a separate one
for xlogs and archives.
mpatha dm-4 IBM ,2145
size=2.0T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 7:0:1:2 sdg 8:96 active undef running
| `- 8:0:1:2 sdm 8:192 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 7:0:0:2 sdd 8:48 active undef running
`- 8:0:0:2 sdj 8:144 active undef running
Victor Yegorov wrote
Can you, please, provide the output of this query (linked from the article
mentioned):
https://gist.github.com/lesovsky/4587d70f169739c01d4525027c087d14
00:26:51.226024|120 days
03:05:37.987175|0.6|7.99|300.63|0.46|12673500.4|162.00|0.34|0.51|0.37|1.22|26.721|27.7|41.8|30.6|4.47|34.27|--------------------------------------|21532|124|6510377185|9920323|449049896|677360078|2321057|495798075|0
Victor Yegorov wrote
And also this query:
SELECT name,version,source FROM pg_settings WHERE source NOT IN
('default','override');
application_name | client | psql
archive_command | configuration file | <deleted>
archive_mode | configuration file | on
autovacuum | configuration file | on
autovacuum_max_workers | configuration file | 10
checkpoint_completion_target | configuration file | 0.9
checkpoint_timeout | configuration file | 480
client_encoding | client | UTF8
DateStyle | configuration file | ISO, MDY
default_statistics_target | configuration file | 350
default_text_search_config | configuration file | pg_catalog.english
effective_cache_size | configuration file | 52428800
enable_indexscan | configuration file | on
huge_pages | configuration file | on
lc_messages | configuration file | en_US.UTF-8
lc_monetary | configuration file | en_US.UTF-8
lc_numeric | configuration file | en_US.UTF-8
lc_time | configuration file | en_US.UTF-8
listen_addresses | configuration file | *
log_autovacuum_min_duration | configuration file | 0
log_checkpoints | configuration file | on
log_connections | configuration file | on
log_destination | configuration file | stderr
log_directory | configuration file | pg_log
log_disconnections | configuration file | on
log_duration | configuration file | off
log_filename | configuration file | postgresql-%a.log
log_line_prefix | configuration file | %t [%p]: [%l-1] user=%u,db=%d
log_lock_waits | configuration file | on
log_min_duration_statement | configuration file | 0
log_rotation_age | configuration file | 1440
log_rotation_size | configuration file | 0
log_temp_files | configuration file | 0
log_timezone | configuration file | Poland
log_truncate_on_rotation | configuration file | on
logging_collector | configuration file | on
maintenance_work_mem | configuration file | 2097152
max_connections | configuration file | 1000
max_stack_depth | environment variable | 2048
max_wal_senders | configuration file | 10
max_wal_size | configuration file | 640
random_page_cost | configuration file | 1
shared_buffers | configuration file | 2097152
temp_buffers | configuration file | 16384
TimeZone | configuration file | Poland
track_functions | configuration file | all
track_io_timing | configuration file | off
wal_buffers | configuration file | 2048
wal_keep_segments | configuration file | 150
wal_level | configuration file | hot_standby
work_mem | configuration file | 393216+
On 10/11/2017 12:28 AM, pinker wrote:
Tomas Vondra-4 wrote
What is "CPU load"? Perhaps you mean "load average"?
Yes, I wasn't exact: I mean system cpu usage, it can be seen here - it's the
graph from yesterday's failure (after 6p.m.):
<http://www.postgresql-archive.org/file/t342733/cpu.png>
So as one can see connections spikes follow cpu spikes...
I'm probably a bit dumb (after all, it's 1AM over here), but can you
explain the CPU chart? I'd understand percentages (say, 75% CPU used)
but what do the seconds / fractions mean? E.g. when the system time
reaches 5 seconds, what does that mean?
Tomas Vondra-4 wrote
Also, what are the basic system parameters (number of cores, RAM), it's
difficult to help without knowing that.
I have actually written everything in the first post:
80 CPU and 4 sockets
over 500GB RAM
Apologies, missed that bit.
Tomas Vondra-4 wrote
Well, 3M transactions over ~2h period is just ~450tps, so nothing
extreme. Not sure how large the transactions are, of course.
There's quite a lot going on. Most of them are complicated stored procedures.
OK.
Tomas Vondra-4 wrote
Something gets executed on the database. We have no idea what it is, but
it should be in the system logs. And you should see the process in 'top'
with large amounts of virtual memory ...
Yes, it would be much easier if it were just a single query at the top, but
most of the CPU is eaten by the system itself and I'm not sure why. I
suppose, because of the page tables size and anon pages, that it is NUMA
related.
Have you tried profiling using perf? That usually identifies hot spots
pretty quickly - either in PostgreSQL code or in the kernel.
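A minimal perf session could look like this (a sketch; it needs the perf package and usually root, and at a modest sampling rate the overhead is small):

```shell
# System-wide CPU profile for 30 seconds at ~99 samples/s per CPU,
# with call graphs so kernel hot spots show their callers
perf record -a -g -F 99 -- sleep 30
# Summarize the hottest functions (kernel and user space) from perf.data
perf report --sort symbol
```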
Tomas Vondra-4 wrote
Another possibility is a run-away query that consumes a lot of work_mem.
It was exactly my first guess. work_mem is set to ~ 350MB and I see a lot of
stored procedures with unnecessary WITH clauses (i.e. materialization) and
right after it IN query with results of that (hash).
Depends on how much data is in the CTEs. We don't really allocate all of
work_mem at once, but bit by bit.
Tomas Vondra-4 wrote
Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
and then you can decide.
Thank you for the tip. I always do it but hadn't here; the result is
0.992969610990056 - so increasing it is rather pointless.
Yeah.
Tomas Vondra-4 wrote
You may also make the bgwriter more aggressive - that won't really
improve the hit ratio, it will only make enough room for the backends.
yes, I probably will
On the other hand, the numbers are rather low. I mean, the backends
seem to be evicting ~15k buffers over a 5-minute period, which is pretty
much nothing (~400kB/s). I wouldn't bother tuning this.
Tomas Vondra-4 wrote
But I don't quite see how this could cause the severe problems you have,
as I assume this is kinda regular behavior on that system. Hard to say
without more data.
I can provide you with any data you need :)
What I meant is that if the system evicts this amount of buffers all the
time (i.e. there doesn't seem to be any sudden spike), then it's
unlikely to be the cause (or related to it).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2017-10-10 13:40:07 -0700, pinker wrote:
and the total number of connections is increasing very fast (but I suppose
that's a symptom, not the root cause, of the CPU load) and exceeds
max_connections (1000).
Others mentioned already that that's worth improving.
System:
* CentOS Linux release 7.2.1511 (Core)
* Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
Some versions of this kernel have had serious problems with transparent
hugepages. I'd try turning that off. I think it defaults to off even in
that version, but also make sure zone_reclaim_mode is disabled.
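Both can be checked quickly like this (a sketch; the paths are the standard ones on RHEL/CentOS 7, and the runtime changes shown in the comments need root and should also be made persistent, e.g. via tuned or sysctl.conf):

```shell
# Current THP mode; the bracketed value is active, e.g. "always madvise [never]"
cat /sys/kernel/mm/transparent_hugepage/enabled
# zone_reclaim_mode should report 0 (disabled)
cat /proc/sys/vm/zone_reclaim_mode
# To disable both at runtime (as root):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
#   sysctl -w vm.zone_reclaim_mode=0
```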
* postgresql95-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
* 4 sockets/80 cores
9.6 has quite a few scalability improvements over 9.5. I don't know
whether it's feasible for you to update, but if so, it's worth trying.
How about taking a perf profile to investigate?
* vm.dirty_background_bytes = 0
* vm.dirty_background_ratio = 2
* vm.dirty_bytes = 0
* vm.dirty_expire_centisecs = 3000
* vm.dirty_ratio = 20
* vm.dirty_writeback_centisecs = 500
I'd suggest monitoring /proc/meminfo for the amount of Dirty and
Writeback memory, and see whether rapid changes therein coincide with
periods of slowdown.
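A simple way to watch those two counters (a sketch; the interval and iteration count are arbitrary - in practice you'd leave it looping during an incident):

```shell
# Sample Dirty and Writeback from /proc/meminfo a few times, 5 seconds apart
for i in 1 2 3; do
    date '+%H:%M:%S'
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 5
done
```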
Greetings,
Andres Freund
Tomas Vondra-4 wrote
I'm probably a bit dumb (after all, it's 1AM over here), but can you
explain the CPU chart? I'd understand percentages (say, 75% CPU used)
but what do the seconds / fractions mean? E.g. when the system time
reaches 5 seconds, what does that mean?
hehe, no, you've just spotted a mistake - it's supposed to be 50 cores :)
out of 80 in total
Tomas Vondra-4 wrote
Have you tried profiling using perf? That usually identifies hot spots
pretty quickly - either in PostgreSQL code or in the kernel.
I was always afraid of the overhead, but maybe it's time to start ...
Tomas Vondra-4 wrote
What I meant is that if the system evicts this amount of buffers all the
time (i.e. there doesn't seem to be any sudden spike), then it's
unlikely to be the cause (or related to it).
I was actually thinking about a scenario where different sessions want to
read/write from or to many different relfilenodes at the same time - what
could cause page swapping between shared buffers and the OS cache? We see
that context switches on the CPU are increasing as well. The kernel
documentation says that using page tables instead of the Translation
Lookaside Buffer (TLB) is very costly, and on some blogs I have seen
recommendations that using huge pages (so more addresses can fit in the
TLB) will help here, but PostgreSQL, unlike Oracle, cannot use them for
anything other than page buffering (so 16GB) ... so process memory still
needs to use 4k pages.
Or memory fragmentation?
Andres Freund wrote
Others mentioned already that that's worth improving.
Yes, we are just setting up pgbouncer
Andres Freund wrote
Some versions of this kernel have had serious problems with transparent
hugepages. I'd try turning that off. I think it defaults to off even in
that version, but also make sure zone_reclaim_mode is disabled.
Yes, I'm aware of that, so I always set it to 'never'.
But thank you for the zone_reclaim_mode tip.
Andres Freund wrote
9.6 has quite some scalability improvements over 9.5. I don't know
whether it's feasible for you to update, but if so, it's worth trying.
How about taking a perf profile to investigate?
Both are on my to do list :)
Andres Freund wrote
I'd suggest monitoring /proc/meminfo for the amount of Dirty and
Writeback memory, and see whether rapid changes therein coincide with
periods of slowdown.
yes, I was monitoring it the whole day, and that's the reason why I changed
dirty_background_ratio, but both of them were flat - without any bigger
spikes.
On Tue, Oct 10, 2017 at 01:40:07PM -0700, pinker wrote:
Hi to all!
We've got a problem with a very serious, repetitive incident on our core
system. Namely, CPU load spikes to 300-400 and the whole db becomes
unresponsive. From the db point of view nothing special is happening: memory
looks fine, disk I/O is OK, and the only problem is the huge CPU load. The
kernel parameters that increase with load are always the same:
* disabled transparent huge pages (they were set before unfortunately to
'always')
Did you also try disabling KSM?
echo 2 |sudo tee /sys/kernel/mm/ksm/run
I believe for us that was affecting a postgres VM (QEMU/KVM) and maybe not
postgres itself. Worth a try?
/messages/by-id/20170718180152.GE17566@telsasoft.com
Justin
On 10/11/2017 02:26 AM, pinker wrote:
Tomas Vondra-4 wrote
I'm probably a bit dumb (after all, it's 1AM over here), but can you
explain the CPU chart? I'd understand percentages (say, 75% CPU used)
but what do the seconds / fractions mean? E.g. when the system time
reaches 5 seconds, what does that mean?
hehe, no, you've just spotted a mistake - it's supposed to be 50 cores :)
out of 80 in total
Ah, so it should say '50 cores' instead of '5s'? Well, that's a busy
system I guess.
Tomas Vondra-4 wrote
Have you tried profiling using perf? That usually identifies hot spots
pretty quickly - either in PostgreSQL code or in the kernel.
I was always afraid of the overhead, but maybe it's time to start ...
I don't follow. If you're not in trouble, a little bit of additional
overhead is not an issue (but you generally don't need profiling at that
moment). If you're already in trouble, then spending a bit of CPU time
on basic CPU profile is certainly worth it.
Tomas Vondra-4 wrote
What I meant is that if the system evicts this amount of buffers all the
time (i.e. there doesn't seem to be any sudden spike), then it's
unlikely to be the cause (or related to it).
I was actually thinking about a scenario where different sessions want to
read/write from or to many different relfilenodes at the same time - what
could cause page swapping between shared buffers and the OS cache?
Perhaps. If the sessions only do reads, that would not be visible in
buffers_backend, I believe (not sure ATM, would have to check the source).
But it'd be visible in buffers_alloc and certainly in blks_read.
We see that context switches on the CPU are increasing as well. The kernel
documentation says that using page tables instead of the Translation
Lookaside Buffer (TLB) is very costly, and on some blogs I have seen
recommendations that using huge pages (so more addresses can fit in the
TLB) will help here, but PostgreSQL, unlike Oracle, cannot use them for
anything other than page buffering (so 16GB) ... so process memory still
needs to use 4k pages.
The context switches are likely due to large number of runnable
processes competing for the CPU.
Also, memory bandwidth is increasingly an issue on big boxes ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 10, 2017 at 4:28 PM, pinker <pinker@onet.eu> wrote:
Yes, it would be much easier if it were just a single query at the top,
but most of the CPU is eaten by the system itself and I'm not sure why.
You are experiencing a context switch storm. The OS is spending so
much time switching between 1,000+ processes that it doesn't have
any time left to do much else.
Any chance this situation has something to do with the latest reveal of the
Meltdown vulnerability?
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html