high load on server

Started by Gerd Koenig · about 17 years ago · 7 messages · general
#1Gerd Koenig
koenig@transporeon.com

Hello,

For the past two days we've been facing increased load on our database server
(openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
through the whole working day.
==================
current situation:
==================
#>top
top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

I assume the problem is that heavy writing slows down the
server... but why? =>

1.) there are no long running queries:
SELECT current_query, COUNT(current_query)
FROM pg_stat_activity
WHERE query_start < now() - interval '1 min'
AND current_query != '<IDLE>'
GROUP BY current_query;
current_query | count
---------------+-------
(0 rows)

2.) WAL archives get written every 2-3 minutes.
3.) we have no high-performance hardware layout; data and log are on the same disk =>
#>iostat 2 5
Linux 2.6.22.5-31-default 03.04.2009
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 13,42 38,57 391,86 134436221 1365849137
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 36,21 0,00 994,02 0 2992
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 67,67 0,00 1621,33 0 4864
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 40,00 0,00 989,33 0 2968
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 26,91 18,60 948,84 56 2856

#>vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
5 0 204 2449652 185692 5046168 0 0 2 24 1 1 2 0 95 2
3 0 204 2448496 185692 5046184 0 0 0 518 2984 18683 24 5 65 6
3 0 204 2430864 185692 5046192 0 0 0 344 2083 10004 34 3 58 5
2 0 204 2434600 185700 5046200 0 0 0 386 2084 23592 33 3 57 7
3 0 204 2425612 185700 5046220 0 0 0 372 2352 2905 36 2 57 5
5 0 204 2424828 185700 5046256 0 0 0 600 2372 33094 36 12 48 4
4 0 204 2405516 185700 5046256 0 0 4 992 1747 29035 33 8 52 6
3 0 204 2419368 185708 5046272 0 0 4 660 2735 24732 36 7 51 6
2 0 204 2419244 185712 5046296 0 0 0 360 2251 3193 9 1 84 5
3 0 204 2407096 185712 5046296 0 0 0 332 2319 3269 20 3 72 5

Are there further system/database details I can check?
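[Editor's note: one further check worth adding here, sketched against the 8.3 catalog (these column names changed in later releases). Sessions that sit idle inside an open transaction never show up as long-running queries, but they can hold locks and keep vacuum from reclaiming space:]

```sql
-- 8.3 syntax: list idle-in-transaction sessions, oldest transaction first.
-- (In 9.2+ these columns became pid / state / query.)
SELECT procpid, usename, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_age DESC;
```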

Can we lower the load by reducing the number of WAL archives written? Is that
somehow possible?
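[Editor's note: the number of archived segments mostly tracks how much WAL is generated, and on 8.3 a large share of that can be full-page images written right after each checkpoint. Spreading checkpoints out reduces that volume. A hedged sketch of the relevant postgresql.conf knobs, with illustrative values, not a recommendation for this particular box:]

```
# Fewer, more spread-out checkpoints => fewer full-page writes => less WAL.
checkpoint_segments = 16            # default is 3
checkpoint_timeout = 15min          # default is 5min
checkpoint_completion_target = 0.7  # default is 0.5 (new in 8.3)
# archive_timeout forces a segment switch; make sure it isn't set very low.
#archive_timeout = 0
```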

Since buying and installing new hardware is a huge effort, any other solutions
are highly welcome :-))

thanks in advance...GERD...

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Gerd Koenig (#1)
Re: high load on server

2009/4/3 Gerd König <koenig@transporeon.com>:

Hello,

For the past two days we've been facing increased load on our database server
(openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
through the whole working day.

How many cores?

==================
current situation:
==================
#>top
top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

Next time hit c first to see what the postmasters are up to.
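[Editor's note: there is also a non-interactive equivalent of pressing "c" (a sketch; the ps flags assume procps on Linux). PostgreSQL updates each backend's process title, so the full command line shows the user, database, and current activity:]

```shell
# Batch-mode top with full command lines, one snapshot:
top -b -n 1 -c | grep postgres

# The same information via ps; backend titles look like
# "postgres: user db host SELECT":
ps auxww | grep '[p]ostgres:'
```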

I assume the problem is that heavy writing slows down the
server... but why? =>

The problem might be that you're assuming there's a problem. Looking
at the rest of your diags, your data set fits in memory, I/O wait is
< 10%, and there are no processes waiting for a CPU to free up; they're
all running.

Looks healthy to me.

#3Gerd Koenig
koenig@transporeon.com
In reply to: Scott Marlowe (#2)
Re: high load on server

Hello Scott,

thanks for answering.

Scott Marlowe wrote:

2009/4/3 Gerd König <koenig@transporeon.com>:

Hello,

For the past two days we've been facing increased load on our database server
(openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
through the whole working day.

How many cores?

The server has two
"model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz"
CPUs, so 8 cores in total...

==================
current situation:
==================
#>top
top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

Next time hit c first to see what the postmasters are up to.

good hint, I'll do that the next time the server runs under higher
load (probably on Monday...)

I assume the problem is that heavy writing slows down the
server... but why? =>

The problem might be that you're assuming there's a problem. Looking
at the rest of your diags, your data set fits in memory, I/O wait is
< 10%, and there are no processes waiting for a CPU to free up; they're
all running.

Looks healthy to me.

Perfect, probably our customers didn't work that much in the past, but
now they do ;-)

kind regards...:GERD:...

#4Scott Marlowe
scott.marlowe@gmail.com
In reply to: Gerd Koenig (#3)
Re: high load on server

On Fri, Apr 3, 2009 at 12:35 PM, Gerd Koenig <koenig@transporeon.com> wrote:

The problem might be that you're assuming there's a problem. Looking
at the rest of your diags, your data set fits in memory, I/O wait is
< 10%, and there are no processes waiting for a CPU to free up; they're
all running.

Looks healthy to me.

Perfect, probably our customers didn't work that much in the past, but now
they do ;-)

Well, it looks like you're about halfway to the point where you'll have to
start improving your hardware, using Slony read slaves, using memcached, or
something like that to handle the extra load. Keep an eye on your wait%. If
that starts climbing and vmstat shows more and more bo going to your drives,
you'll need to improve your I/O subsystem to keep up with the load.
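[Editor's note: "keep an eye on your wait%" can be scripted cheaply. A sketch: the 20% threshold is arbitrary, and the field position assumes the vmstat layout shown above, where "wa" is field 16:]

```shell
# Sample vmstat every 2s, 30 samples; print any sample whose iowait ("wa",
# field 16) exceeds 20%, as an early warning that the disk is saturating.
# NR > 2 skips the two header lines.
vmstat 2 30 | awk 'NR > 2 && $16 + 0 > 20 { print "high iowait:", $16 "%" }'
```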

#5Erik Jones
ejones@engineyard.com
In reply to: Scott Marlowe (#2)
Re: high load on server

On Apr 3, 2009, at 7:32 AM, Scott Marlowe wrote:

2009/4/3 Gerd König <koenig@transporeon.com>:

Hello,

For the past two days we've been facing increased load on our database
server (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load
persists through the whole working day.

How many cores?

==================
current situation:
==================
#>top
top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

Next time hit c first to see what the postmasters are up to.

I assume the problem is that heavy writing slows down the
server... but why? =>

The problem might be that you're assuming there's a problem. Looking
at the rest of your diags, your data set fits in memory, I/O wait is
< 10%, and there are no processes waiting for a CPU to free up; they're
all running.

Looks healthy to me.

Eh? His run queue constantly has procs waiting for run time, although
I've seen higher. That, combined with a distinct lack of heavy IO, says
CPU-bound to me...

#>vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
5 0 204 2449652 185692 5046168 0 0 2 24 1 1 2 0 95 2
3 0 204 2448496 185692 5046184 0 0 0 518 2984 18683 24 5 65 6
3 0 204 2430864 185692 5046192 0 0 0 344 2083 10004 34 3 58 5
2 0 204 2434600 185700 5046200 0 0 0 386 2084 23592 33 3 57 7
3 0 204 2425612 185700 5046220 0 0 0 372 2352 2905 36 2 57 5
5 0 204 2424828 185700 5046256 0 0 0 600 2372 33094 36 12 48 4
4 0 204 2405516 185700 5046256 0 0 4 992 1747 29035 33 8 52 6
3 0 204 2419368 185708 5046272 0 0 4 660 2735 24732 36 7 51 6
2 0 204 2419244 185712 5046296 0 0 0 360 2251 3193 9 1 84 5
3 0 204 2407096 185712 5046296 0 0 0 332 2319 3269 20 3 72 5

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

#6Scott Marlowe
scott.marlowe@gmail.com
In reply to: Erik Jones (#5)
Re: high load on server

On Fri, Apr 3, 2009 at 4:13 PM, Erik Jones <ejones@engineyard.com> wrote:

On Apr 3, 2009, at 7:32 AM, Scott Marlowe wrote:

2009/4/3 Gerd König <koenig@transporeon.com>:

Hello,

For the past two days we've been facing increased load on our database
server (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load
persists through the whole working day.

How many cores?

==================
current situation:
==================
#>top
top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46,
7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

Next time hit c first to see what the postmasters are up to.

I assume the problem is that heavy writing slows down the
server... but why? =>

The problem might be that you're assuming there's a problem. Looking
at the rest of your diags, your data set fits in memory, I/O wait is
< 10%, and there are no processes waiting for a CPU to free up; they're
all running.

Looks healthy to me.

Eh?  His run queue constantly has procs waiting for run time, although I've
seen higher.  That with a distinct lack of heavy IO says cpu bound to me...

How do you see that? He's got 50% or so idle, and is running fewer
processes than he has cores.

#7Scott Marlowe
scott.marlowe@gmail.com
In reply to: Erik Jones (#5)
Re: high load on server

On Fri, Apr 3, 2009 at 4:13 PM, Erik Jones <ejones@engineyard.com> wrote:

Eh?  His run queue constantly has procs waiting for run time, although I've
seen higher.  That with a distinct lack of heavy IO says cpu bound to me...

I just pulled up the Linux man page, and it says that r is the number of
processes waiting to run. This isn't entirely correct. A BSD or Solaris
man page more accurately identifies it as the number of processes running
OR waiting to run; if this number exceeds the number of cores, the amount
by which it exceeds them is the size of the run queue.
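[Editor's note: the distinction can be checked directly. A sketch for Linux, where the /proc/cpuinfo parsing below is assumed: since "r" counts processes running or runnable, only r values above the core count indicate actual queueing:]

```shell
# Compare vmstat's run queue ("r", field 1) against the core count;
# anything above the core count is genuinely waiting for a CPU.
cores=$(grep -c '^processor' /proc/cpuinfo)
vmstat 1 5 | awk -v c="$cores" 'NR > 2 { q = ($1 > c) ? $1 - c : 0; print "r =", $1, " queued =", q }'
```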