PostgreSQL: CPU utilization creeping to 100%
I'm running into a strange issue whereby my postgres processes are
slowly creeping to 100% CPU utilization. I'm running
postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
postgresql-9.2-1002.jdbc4 driver.
I'm not sure what information here is relevant, so I'll give
everything I can as concisely as I can.
The hardware is a Dell PowerEdge 1420, dual Xeon Noconas, 3.2GHz,
16GB RAM. The disks are 4 Kingston HyperX SATA3 SSDs attached to a
HighPoint RocketRAID 2721 controller, ZFS, RAID10.
There are 2 databases, one a queue, and one containing tables storing
crawled data.
The application is a webcrawler. The application pulls URLs from the
queue and marks them active in a single transaction. It then feeds the
URLs to the crawler threads, which crawl the URL, populate the tables
with data, and signal the main thread to update the queue database,
marking the item as inactive and rescheduling it with a new
"NextCrawlDate".
The processes that hang are the postgres processes that interact with
the queue database.
'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.
Essentially, the query in question is responsible for returning 1 URL
from the union of the list of URLs whose crawl session has timed out
and the list of URLs next in line to be crawled according to the
schedule (NextCrawlDate). The query is:

select * from ((select * from "crawlq"
                where "Active" = 'true' AND "TimeoutDate" <= now())
               UNION
               (select * from "crawlq" where "Active" = 'false')) as RS
order by "NextCrawlDate" asc limit 1
Beyond this I don't know what useful debugging information to include.
I'll take a guess and start with some vmstat output.
Under normal conditions (with the crawler running) vmstat shows the following:
procs    memory      page                       disks     faults        cpu
r b w   avm    fre    flt re pi po    fr  sr da0 da1   in     sy    cs us sy id
2 0 0 119G 8450M 1143 0 1 0 900 0 0 0 128 6700 8632 32 4 65
2 0 0 119G 8444M 1937 0 0 0 100 0 4 4 280 112206 7683 36 5 59
1 0 0 119G 8443M 427 0 0 0 1377 0 90 90 222 115889 9020 35 7 58
1 0 0 119G 8442M 1798 0 0 0 18 0 1 1 153 104954 7071 30 4 66
3 0 0 119G 8443M 528 0 0 0 681 0 10 10 293 125170 14523 40 15 46
1 0 0 119G 8432M 15227 0 0 4 2850 0 3 3 205 83830 6663 55 12 33
8 0 0 119G 8433M 3306 0 0 0 445 0 2 2 249 113792 7068 34 5 61
10 0 0 118G 8461M 1190 0 0 0 9909 0 72 73 451 120263 10036 56 9 35
14 0 0 118G 8456M 5887 0 0 0 1202 0 2 2 272 130795 9506 44 12 44
9 0 0 118G 8444M 7750 0 0 0 1070 0 9 9 298 87643 9584 80 13 7
3 0 0 118G 8442M 1335 0 0 0 648 0 5 5 189 143691 9234 36 6 58
1 0 0 118G 8442M 689 0 1 1 472 0 2 2 206 153868 8635 32 7 61
1 0 0 118G 8441M 203 0 0 0 1124 0 75 75 191 142598 8909 31 10 60
2 0 0 118G 8440M 9508 0 0 0 684 0 8 8 231 132785 10247 47 13 41
4 0 0 118G 8456M 4046 0 0 0 5469 0 11 11 299 143119 12475 54 22 24
4 0 0 117G 8490M 1076 0 0 0 9858 0 16 16 291 140701 14849 58 25 17
1 0 0 116G 8524M 344 0 0 0 8936 0 4 4 234 149103 12137 45 15 40
2 0 0 114G 8586M 715 0 0 5 17719 0 73 75 322 151002 11430 34 10 56
5 0 0 112G 8648M 2773 0 0 0 16997 0 6 6 225 118339 8700 30 10 61
1 0 0 110G 8705M 4429 0 0 0 15763 0 7 7 423 139590 10354 40 11 49
1 0 0 108G 8760M 1443 0 0 0 14519 0 7 7 405 139806 10214 37 5 58
1 0 0 104G 8863M 333 0 0 0 26537 0 5 5 284 107770 9947 34 6 60
1 0 0 104G 8859M 1331 0 0 0 1700 0 114 114 464 103248 12113 40 9 51
1 0 0 104G 8854M 1708 0 0 0 272 0 6 6 279 99817 9470 40 5 55
9 0 0 104G 8850M 3653 0 0 0 4809 0 28 28 346 160041 54071 42 32 26
12 3 0 105G 8845M 20576 0 0 0 18344 0 7 7 383 95019 32533 46 53 1
20 0 0 114G 8721M 46913 0 0 0 2941 0 11 11 461 77480 9794 72 28 0
12 1 0 110G 8759M 25109 0 0 0 35881 0 70 70 413 72631 10161 76 24 0
2 0 0 110G 8716M 12993 0 1 1 265 0 8 8 292 83085 10073 61 30 9
3 0 0 110G 8716M 2144 0 0 0 45 0 3 3 183 100994 7410 39 20 41
...and when postgres goes bonkers:
procs    memory      page                       disks     faults        cpu
r b w   avm    fre    flt re pi po    fr  sr da0 da1   in     sy    cs us sy id
2 0 0 98G 10G 1091 0 1 0 849 0 0 0 114 2641 8582 30 4 66
2 0 0 98G 10G 20 0 0 0 0 0 0 0 197 20500 10454 46 2 53
2 0 0 98G 10G 59 0 0 0 0 0 0 0 284 23715 11180 46 3 51
3 0 0 98G 10G 17 0 0 0 652 0 69 70 288 21968 11571 46 4 50
2 0 0 98G 10G 56 0 0 0 0 0 0 0 242 22120 10251 47 2 50
2 0 0 98G 10G 16 0 0 0 0 0 0 0 207 20750 9982 48 1 51
4 0 0 98G 10G 61 0 0 0 97 0 0 0 205 21024 10395 49 2 50
2 0 0 98G 10G 24 0 0 0 0 0 0 0 215 21167 10378 48 1 50
4 0 0 98G 10G 57 0 0 0 346 0 46 46 272 22766 11314 47 3 50
2 0 0 98G 10G 19 0 0 0 0 0 0 0 205 20594 10340 48 1 50
2 0 0 98G 10G 53 0 0 0 0 0 0 0 250 22282 10526 48 2 50
2 0 0 98G 10G 27 0 0 0 0 0 0 0 260 27824 12090 51 4 44
2 0 0 98G 10G 543 0 0 0 429 0 0 0 267 28757 11817 46 4 50
2 0 0 98G 10G 23 0 0 0 260 0 44 44 278 33147 12624 51 5 43
2 0 0 98G 10G 55 0 0 0 0 0 0 0 243 26885 11081 47 3 50
2 0 0 98G 10G 16 0 0 0 0 0 0 0 207 20854 10348 48 1 51
2 0 0 98G 10G 55 0 0 0 9 0 1 1 222 22714 10766 48 1 51
2 0 0 98G 10G 23 0 0 0 0 0 0 0 200 20919 10392 47 2 51
2 0 0 98G 10G 55 0 0 0 517 0 88 87 247 20874 11032 46 3 51
4 0 0 98G 10G 20 0 0 0 0 0 0 0 253 30263 11705 50 3 47
2 0 0 98G 10G 1133 0 0 0 4 0 0 0 212 26791 11186 51 2 47
2 0 0 98G 10G 19 0 0 0 4 0 0 0 202 22043 10752 48 1 51
2 0 0 98G 10G 59 0 0 0 0 0 0 0 208 21416 10506 48 1 51
4 0 0 98G 10G 18 0 0 0 4 0 0 0 214 22207 10598 48 2 50
1 0 0 98G 10G 57 0 0 0 9 0 0 0 203 21102 10245 50 1 49
2 0 0 98G 10G 25 0 0 0 2 0 0 0 206 21531 10225 47 3 50
2 0 0 98G 10G 53 0 0 0 0 0 0 0 217 21083 10519 46 1 53
2 0 0 98G 10G 24 0 0 0 15 0 1 1 199 22009 10620 49 3 48
2 0 0 98G 10G 55 0 0 0 138 0 44 44 233 21647 10862 48 2 50
1 0 0 98G 10G 55 0 0 0 6 0 0 0 203 23002 10653 49 2 49
/etc/rc.conf:
postgresql_enable="YES"
postgresql_data="/zdb/pgsql/data"
postgresql_flags="-s -m smart"
/boot/loader.conf:
kern.ipc.semmns="1024"
kern.ipc.semmni="128"
kern.ipc.shmall="1048576"
kern.ipc.shmseg="2048"
kern.ipc.shmmax="2147483647"
kern.ipc.shmmni="2048"
kern.maxusers="1024"
kern.maxswzone="335544320"
postgresql.conf, all standard/default except for:
max_connections = 256
Any thoughts? What other information can I provide?
Regards,
-David
2013/4/3 David Noel <david.i.noel@gmail.com>:
I'm running into a strange issue whereby my postgres processes are
slowly creeping to 100% CPU utilization. I'm running
postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
postgresql-9.2-1002.jdbc4 driver.
(...)
postgresql.conf, all standard/default except for:
max_connections = 256
It's very likely the default settings are woefully inadequate for your
server; some basic tuning (especially the value of shared_buffers and
other memory-related parameters) should help.
Any thoughts? What other information can I provide?
Sample EXPLAIN/EXPLAIN ANALYZE output from the query in question, and
if possible relevant table definitions etc. would certainly be useful.
Regards
Ian Barwick
What's strange is that the crawler will run just fine for up to
several hours. At some point though the CPU utilization slowly begins
to creep higher. Eventually everything locks and the program hangs.
'top' shows the processes connected to the queue database at or near
100%, and the program ceases output (I have debugging messages built
in to show current activity "[timestamp] : crawling [URL]"). At some
point--anywhere from 30 minutes to several hours later--CPU
utilization drops to normal and the program resumes operation as if
everything were fine. This goes on for up to several hours, then the
utilization issue repeats. So it's a very odd issue I've run into.
On 4/2/13, David Noel <david.i.noel@gmail.com> wrote:
I'm running into a strange issue whereby my postgres processes are
slowly creeping to 100% CPU utilization. I'm running
postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
postgresql-9.2-1002.jdbc4 driver.
(...)
David Noel <david.i.noel@gmail.com> wrote:
'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.
The process is idle or the process is running the query? If the
latter, what do you mean when you say "the queries ... are in the
idle state"?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/2/2013 3:35 PM, David Noel wrote:
The hardware is a Dell PowerEdge 1420, dual Xeon Nocona's, 3.2ghz,
16gb ram. The disks are 4 Kingston HyperX SATA3's attached to a
HighPoint RocketRAID 2721 controller, ZFS, RAID10.
.....
postgresql.conf, all standard/default except for:
max_connections = 256
A) use a connection pool so you don't NEED 256 active database connections.

B) shared_buffers, work_mem, and maintenance_work_mem all need to be
tuned. I'd suggest 4GB, 16MB, and 1GB respectively as a starting point
on a 16GB RAM system. If you can, shrink your max_connections by using
a connection pooler (my target is generally no more than 2-4 active
queries per CPU core or hardware thread).

Ouch, the Xeon Nocona was a single-core, dual-thread CPU with rather
poor performance, essentially just a Pentium 4... 3GHz on a P4 is like
2GHz on other CPUs.
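In postgresql.conf terms, that starting point is a sketch along these
lines (the values are the ones suggested above, not settings taken
from the server):

shared_buffers = 4GB            # 9.2 default is only 32MB; needs a restart
work_mem = 16MB                 # per sort/hash operation, per query
maintenance_work_mem = 1GB      # VACUUM, CREATE INDEX, etc.

One caveat: 9.2 allocates shared_buffers as a System V shared memory
segment, so the kern.ipc.shmmax="2147483647" (~2GB) shown earlier in
the thread would have to be raised before a 4GB setting will even
start.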
When you said RAID10, do you mean ZFS mirrored, or are you doing
hardware RAID10 in the HighPoint? I would have configured the RAID
card for JBOD and done ZFS mirroring in the OS, so you can take
advantage of ZFS's data integrity features. Those are consumer-grade
SSDs; are they even qualified for use with that HighPoint controller?
--
john r pierce 37N 122W
somewhere on the middle of the left coast
On 4/2/13, Ian Lawrence Barwick <barwick@gmail.com> wrote:
2013/4/3 David Noel <david.i.noel@gmail.com>:
I'm running into a strange issue whereby my postgres processes are
slowly creeping to 100% CPU utilization. I'm running
postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
postgresql-9.2-1002.jdbc4 driver.
(...)
postgresql.conf, all standard/default except for:
max_connections = 256

It's very likely the default settings are woefully inadequate for your
server; some basic tuning (especially the value of shared_buffers and
other memory-related parameters) should help.

Any thoughts? What other information can I provide?

Sample EXPLAIN/EXPLAIN ANALYZE output from the query in question, and
if possible relevant table definitions etc. would certainly be useful.

Regards
Ian Barwick
Thanks for the feedback. I'll look into pg tunings. Hopefully the
problem's there somewhere.
explain analyze
select * from ((select * from "crawlq"
                where "Active" = 'true' AND "TimeoutDate" <= now())
               UNION
               (select * from "crawlq" where "Active" = 'false')) as RS
order by "NextCrawlDate" asc limit 1
"Limit (cost=4092.39..4092.39 rows=1 width=203) (actual
time=23.447..23.450 rows=1 loops=1)"
" -> Sort (cost=4092.39..4096.34 rows=1583 width=203) (actual
time=23.442..23.442 rows=1 loops=1)"
" Sort Key: public.crawlq."NextCrawlDate""
" Sort Method: top-N heapsort Memory: 25kB"
" -> HashAggregate (cost=4052.81..4068.64 rows=1583
width=236) (actual time=18.195..20.486 rows=877 loops=1)"
" -> Append (cost=0.00..3997.41 rows=1583 width=236)
(actual time=0.015..13.423 rows=877 loops=1)"
" -> Seq Scan on crawlq (cost=0.00..1995.14
rows=18 width=236) (actual time=0.011..3.397 rows=49 loops=1)"
" Filter: ("Active" AND ("TimeoutDate" <= now()))"
" Rows Removed by Filter: 828"
" -> Seq Scan on crawlq (cost=0.00..1986.43
rows=1565 width=236) (actual time=0.013..7.152 rows=828 loops=1)"
" Filter: (NOT "Active")"
" Rows Removed by Filter: 49"
"Total runtime: 23.633 ms"
Relevant columns from table crawlq:
CREATE TABLE crawlq
(
"URL" text NOT NULL,
"LastCrawlDate" timestamp with time zone DEFAULT now(),
"NextCrawlDate" timestamp with time zone,
"Active" boolean DEFAULT false,
"TimeoutDate" timestamp with time zone,
CONSTRAINT crawlq_pkey PRIMARY KEY ("URL")
)
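Given that plan, a hypothetical pair of partial indexes matching the
two UNION branches; probably marginal at the current ~900 rows, but
worth considering if crawlq grows:

-- lets the timed-out branch find candidates without a full scan
create index crawlq_active_timeout_idx
    on crawlq ("TimeoutDate") where "Active";

-- lets the scheduled branch pick the next URL by "NextCrawlDate"
create index crawlq_inactive_next_idx
    on crawlq ("NextCrawlDate") where not "Active";

The index names are invented, and the planner will still prefer seq
scans while the table stays this small.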
On 4/2/13, Kevin Grittner <kgrittn@ymail.com> wrote:
David Noel <david.i.noel@gmail.com> wrote:
'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.

The process is idle or the process is running the query? If the
latter, what do you mean when you say "the queries ... are in the
idle state"?
select * from pg_stat_activity returns a table containing a column
labeled "state". When the postgres process is at 100% utilization and
the application has hung, this query returns the value "idle" in that
field. When things are running properly, as they are for the moment
now, the value is "active".
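For reference, a query that captures what Kevin is asking about, using
9.2's pg_stat_activity columns:

select pid, state, waiting,
       now() - xact_start  as xact_age,
       now() - query_start as query_age,
       query
  from pg_stat_activity
 order by query_age desc nulls last;

Sampling this a few times while a backend is stuck at 100% CPU shows
whether query_start is advancing, i.e. whether new statements are
actually being issued.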
On 4/2/13, John R Pierce <pierce@hogranch.com> wrote:
On 4/2/2013 3:35 PM, David Noel wrote:
The hardware is a Dell PowerEdge 1420, dual Xeon Noconas, 3.2GHz,
16GB RAM. The disks are 4 Kingston HyperX SATA3 SSDs attached to a
HighPoint RocketRAID 2721 controller, ZFS, RAID10.
.....
postgresql.conf, all standard/default except for:
max_connections = 256

A) use a connection pool so you don't NEED 256 active database connections.

B) shared_buffers, work_mem, and maintenance_work_mem all need to be
tuned. I'd suggest 4GB, 16MB, and 1GB respectively as a starting point
on a 16GB RAM system. If you can, shrink your max_connections by using
a connection pooler (my target is generally no more than 2-4 active
queries per CPU core or hardware thread).
Great, thanks. I'll get those tunables modified and see if that
smooths things out.
Ouch, the Xeon Nocona was a single-core, dual-thread CPU with rather
poor performance, essentially just a Pentium 4... 3GHz on a P4 is like
2GHz on other CPUs.
I won't tell them you said that. Feelings might get hurt.
When you said RAID10, do you mean ZFS mirrored, or are you doing
hardware RAID10 in the HighPoint? I would have configured the RAID
card for JBOD and done ZFS mirroring in the OS, so you can take
advantage of ZFS's data integrity features.
RAID10 under ZFS. Yes, JBOD. ZFS is neat!
Those are consumer-grade SSDs; are they even qualified for use
with that HighPoint controller?
Consumer-grade SSDs, indeed. They've held together so far though.
Fingers crossed.
Thanks again,
-David
David Noel <david.i.noel@gmail.com> wrote:
On 4/2/13, Kevin Grittner <kgrittn@ymail.com> wrote:
David Noel <david.i.noel@gmail.com> wrote:
'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.

The process is idle or the process is running the query? If the
latter, what do you mean when you say "the queries ... are in the
idle state"?

select * from pg_stat_activity returns a table containing a column
labeled "state". When the postgres process is at 100% utilization and
the application has hung, this query returns the value "idle" in that
field. When things are running properly, as they are for the moment
now, the value is "active".
When a connection shows "idle", the related process should be
showing zero CPU usage. Are you seeing something different? If
so, is the transaction or query start time changing from one sample
to the next?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 2 Apr 2013 18:08:36 -0500
David Noel <david.i.noel@gmail.com> wrote:
On 4/2/13, Kevin Grittner <kgrittn@ymail.com> wrote:
David Noel <david.i.noel@gmail.com> wrote:
'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.

The process is idle or the process is running the query? If the
latter, what do you mean when you say "the queries ... are in the
idle state"?

select * from pg_stat_activity returns a table containing a column
labeled "state". When the postgres process is at 100% utilization and
the application has hung, this query returns the value "idle" in that
field. When things are running properly, as they are for the moment
now, the value is "active".
Take care, because "idle" is OK but "idle in transaction" is not.

Some checks for FreeBSD 8.3 (I use the same, but with geom):

a) Perhaps the processes are waiting on I/O. Do you take zfs
snapshots? How often? They can limit your I/O performance. Check the
output of #zpool iostat 5

b) Is the zpool OK? If one of the disks lags behind the others
(because of hardware errors), reconstructing the raidz should show
what you describe. Check the output of #zpool status when the "cpu
storm" happens.

c) If you do a simple #top -U postgres (or whatever user runs your
postgres server), what does the STATE column show? Check that to know
the kernel state of the process.

d) Do you use the standard values for zfs? Especially the arc values.
--- ---
Eduardo Morras <emorrasg@yahoo.es>
On 4/3/13, Kevin Grittner <kgrittn@ymail.com> wrote:
David Noel <david.i.noel@gmail.com> wrote:
On 4/2/13, Kevin Grittner <kgrittn@ymail.com> wrote:
David Noel <david.i.noel@gmail.com> wrote:
'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.

The process is idle or the process is running the query? If the
latter, what do you mean when you say "the queries ... are in the
idle state"?

select * from pg_stat_activity returns a table containing a column
labeled "state". When the postgres process is at 100% utilization and
the application has hung, this query returns the value "idle" in that
field. When things are running properly, as they are for the moment
now, the value is "active".

When a connection shows "idle", the related process should be
showing zero CPU usage. Are you seeing something different? If
so, is the transaction or query start time changing from one sample
to the next?
The related process was maxed out at or near 100%. I've scanned
chapter 18.4 of the documentation and, along with some suggested
values from an earlier reply, have brought things to the point where
it *seems* to be stable. I'm continuing to test though, so if the
issue pops up again I'll check whether the query start time is
changing or not.
Thanks for the help,
-David
On 4/2/13, John R Pierce <pierce@hogranch.com> wrote:
On 4/2/2013 3:35 PM, David Noel wrote:
The hardware is a Dell PowerEdge 1420, dual Xeon Noconas, 3.2GHz,
16GB RAM. The disks are 4 Kingston HyperX SATA3 SSDs attached to a
HighPoint RocketRAID 2721 controller, ZFS, RAID10.
.....
postgresql.conf, all standard/default except for:
max_connections = 256

A) use a connection pool so you don't NEED 256 active database connections.

B) shared_buffers, work_mem, and maintenance_work_mem all need to be
tuned. I'd suggest 4GB, 16MB, and 1GB respectively as a starting point
on a 16GB RAM system. If you can, shrink your max_connections by using
a connection pooler (my target is generally no more than 2-4 active
queries per CPU core or hardware thread).

Ouch, the Xeon Nocona was a single-core, dual-thread CPU with rather
poor performance, essentially just a Pentium 4... 3GHz on a P4 is like
2GHz on other CPUs.

When you said RAID10, do you mean ZFS mirrored, or are you doing
hardware RAID10 in the HighPoint? I would have configured the RAID
card for JBOD and done ZFS mirroring in the OS, so you can take
advantage of ZFS's data integrity features. Those are consumer-grade
SSDs; are they even qualified for use with that HighPoint controller?

john r pierce                                      37N 122W
somewhere on the middle of the left coast
It looks like you guys were spot on, thanks. I've incorporated some of
the suggested values, done a little RTFM'ing (chapter 18.4), made a
few additional tweaks, and have brought things to a seemingly stable
state. Still testing, but so far so good. Glad it was such a simple
"fix".
Many thanks,
-David
On 4/3/13, Eduardo Morras <emorrasg@yahoo.es> wrote:
a) Perhaps the processes are waiting on I/O. Do you take zfs
snapshots? How often? They can limit your I/O performance. Check the
output of #zpool iostat 5

b) Is the zpool OK? If one of the disks lags behind the others
(because of hardware errors), reconstructing the raidz should show
what you describe. Check the output of #zpool status when the "cpu
storm" happens.

c) If you do a simple #top -U postgres (or whatever user runs your
postgres server), what does the STATE column show? Check that to know
the kernel state of the process.

d) Do you use the standard values for zfs? Especially the arc values.
Hmm, your points do make sense. Tuning postgresql.conf seems to have
done the trick for now, but if this issue pops up again I'll
definitely run through the diagnostics you've suggested.
Standard ZFS ARC values, yes.
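For what it's worth, a common FreeBSD tweak on a dedicated database
box is to cap the ARC so it doesn't compete with shared_buffers for
the 16GB of RAM; a hypothetical /boot/loader.conf value, not one from
this thread:

vfs.zfs.arc_max="4294967296"   # cap the ZFS ARC at 4GB; value is in bytes

The right cap depends on how much memory the tuned shared_buffers and
the crawler itself need.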
Thanks,
-David