Amazon EC2 CPU Utilization

Started by Mike Bresnahan, about 16 years ago. 19 messages. Tags: bugs, general.
#1 Mike Bresnahan
mike.bresnahan@bestbuy.com

I have deployed PostgreSQL 8.4.1 on a Fedora 9 c1.xlarge (8x1 cores) instance
in the Amazon EC2 cloud. When I run pgbench in read-only mode (-S) on a small
database, I am unable to peg the CPUs no matter how many clients I throw at it.
In fact, CPU idle never drops below 60%. I also tried this on Fedora 12
(kernel 2.6.31) and got the same basic result. What's going on here? Am I
really only utilizing 40% of the CPUs? Is this to be expected on virtual
(Xen) instances?

[root@domU-12-31-39-0C-88-C1 ~]# uname -a
Linux domU-12-31-39-0C-88-C1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20
17:48:28 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

-bash-4.0# pgbench -S -c 16 -T 30 -h domU-12-31-39-0C-88-C1 -U postgres
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 590508
tps = 19663.841772 (including connections establishing)
tps = 19710.041020 (excluding connections establishing)

top - 15:55:05 up 1:33, 2 users, load average: 2.44, 0.98, 0.44
Tasks: 123 total, 11 running, 112 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.9%us, 8.8%sy, 0.0%ni, 70.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 7348132k total, 1886912k used, 5461220k free, 34432k buffers
Swap: 0k total, 0k used, 0k free, 1456472k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2834 postgres 15 0 191m 72m 70m S 16 1.0 0:00.66 postmaster
2838 postgres 15 0 191m 66m 64m R 15 0.9 0:00.62 postmaster
2847 postgres 15 0 191m 70m 68m S 15 1.0 0:00.59 postmaster
2837 postgres 15 0 191m 72m 70m S 14 1.0 0:00.47 postmaster
2842 postgres 15 0 191m 66m 64m R 14 0.9 0:00.48 postmaster
2835 postgres 15 0 191m 69m 67m S 14 1.0 0:00.54 postmaster
2839 postgres 15 0 191m 69m 67m R 14 1.0 0:00.60 postmaster
2840 postgres 15 0 191m 68m 67m R 14 1.0 0:00.58 postmaster
2833 postgres 15 0 191m 68m 66m R 14 1.0 0:00.50 postmaster
2845 postgres 15 0 191m 70m 68m R 14 1.0 0:00.50 postmaster
2846 postgres 15 0 191m 67m 65m R 14 0.9 0:00.51 postmaster
2836 postgres 15 0 191m 66m 64m S 12 0.9 0:00.43 postmaster
2844 postgres 15 0 191m 68m 66m R 11 1.0 0:00.40 postmaster
2841 postgres 15 0 191m 65m 64m R 11 0.9 0:00.43 postmaster
2832 postgres 15 0 191m 67m 65m S 10 0.9 0:00.38 postmaster
2843 postgres 15 0 191m 67m 66m S 10 0.9 0:00.43 postmaster

[root@domU-12-31-39-0C-88-C1 ~]# iostat -d 2 -x
Linux 2.6.21.7-2.ec2.v1.2.fc8xen (domU-12-31-39-0C-88-C1) 01/27/10

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda1       0.57   15.01  1.32   3.56   34.39  148.57     37.52      0.28  57.35   3.05   1.49
sdb1       0.03  112.38  5.50  12.11   87.98  995.91     61.57      1.88 106.61   2.23   3.93

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda1       0.00    0.00  0.00   1.79    0.00   28.57     16.00      0.00   2.00   1.50   0.27
sdb1       0.00    4.46  0.00  14.29    0.00  150.00     10.50      0.37  26.00   2.56   3.66

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda1       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdb1       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda1       0.00    3.57  0.00   0.79    0.00   34.92     44.00      0.00   3.00   3.00   0.24
sdb1       0.00    0.00  0.00   0.00    0.00    0.00      0.00      0.00   0.00   0.00   0.00
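For anyone trying to reproduce this, the scale factor and flags above imply an initialization/run sequence along the following lines. This is a dry-run sketch (it only prints the commands, since they need a running PostgreSQL server); the host name and user are the ones from the session above, and the `-i -s 64` initialization is inferred from "scaling factor: 64", not shown in the thread:

```shell
# Dry-run: print the pgbench initialization and read-only run implied by
# the session above (scaling factor 64, 16 clients, 30 s, SELECT-only).
run() { echo "+ $*"; }   # swap the echo for "$@" to actually execute
run pgbench -i -s 64 -U postgres
run pgbench -S -c 16 -T 30 -h domU-12-31-39-0C-88-C1 -U postgres
```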

#2 Jim Mlodgenski
jimmy76@gmail.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

On Wed, Jan 27, 2010 at 3:59 PM, Mike Bresnahan
<mike.bresnahan@bestbuy.com> wrote:

I have deployed PostgreSQL 8.4.1 on a Fedora 9 c1.xlarge (8x1 cores) instance
in the Amazon EC2 cloud. When I run pgbench in read-only mode (-S) on a small
database, I am unable to peg the CPUs no matter how many clients I throw at it.
In fact, CPU idle never drops below 60%. I also tried this on Fedora 12
(kernel 2.6.31) and got the same basic result. What's going on here? Am I
really only utilizing 40% of the CPUs? Is this to be expected on virtual
(Xen) instances?

I have seen behavior like this in the past on EC2. I believe your bottleneck
may be pulling the data out of cache. I benchmarked this a while back and found
that memory speeds are not much faster than disk speeds on EC2. I am not sure
if that is true of Xen in general or if it's just limited to the cloud.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Jim Mlodgenski
EnterpriseDB (http://www.enterprisedb.com)

#3 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Jim Mlodgenski <jimmy76 <at> gmail.com> writes:

I have seen behavior like this in the past on EC2. I believe your bottleneck
may be pulling the data out of cache. I benchmarked this a while back and found
that memory speeds are not much faster than disk speeds on EC2. I am not sure
if that is true of Xen in general or if it's just limited to the cloud.

When the CPU is waiting for a memory read, are the CPU cycles not charged to the
currently running process?

#4 Greg Smith
gsmith@gregsmith.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Mike Bresnahan wrote:

top - 15:55:05 up 1:33, 2 users, load average: 2.44, 0.98, 0.44
Tasks: 123 total, 11 running, 112 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.9%us, 8.8%sy, 0.0%ni, 70.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 7348132k total, 1886912k used, 5461220k free, 34432k buffers
Swap: 0k total, 0k used, 0k free, 1456472k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2834 postgres 15 0 191m 72m 70m S 16 1.0 0:00.66 postmaster
2838 postgres 15 0 191m 66m 64m R 15 0.9 0:00.62 postmaster

Could you try this again with "top -c", which will label these
postmaster processes usefully, and include the pgbench client itself in
what you post? It's hard to sort out what's going on in these
situations without that style of breakdown.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com

#5 John R Pierce
pierce@hogranch.com
In reply to: Jim Mlodgenski (#2)
Re: Amazon EC2 CPU Utilization

I have seen behavior like this in the past on EC2. I believe your
bottleneck may be pulling the data out of cache. I benchmarked this a
while back and found that memory speeds are not much faster than disk
speeds on EC2. I am not sure if that is true of Xen in general or if
its just limited to the cloud.

that doesn't make much sense.

more likely, he's disk I/O bound, but it's hard to say, as that iostat output
only shows a couple of 2-second slices of work. the first output, which shows
the average since system startup, seems to show relatively high average wait
times of around 100 ms, yet the samples below only show awaits of 0, 2, and
3 ms.

#6 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

John R Pierce <pierce <at> hogranch.com> writes:

more likely, he's disk I/O bound, but it's hard to say, as that iostat output
only shows a couple of 2-second slices of work. the first output, which shows
the average since system startup, seems to show relatively high average wait
times of around 100 ms, yet the samples below only show awaits of 0, 2, and
3 ms.

I don't think the problem is disk I/O. The database easily fits in the available
RAM (in fact there is a ton of RAM free) and iostat does not show a heavy load.

#7 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Could you try this again with "top -c", which will label these
postmaster processes usefully, and include the pgbench client itself in
what you post? It's hard to sort out what's going on in these
situations without that style of breakdown.

I had run pgbench on a separate instance last time, but this time I ran it on
the same machine. With the -c option, top(1) reports that many of the postgres
processes are idle.

top - 18:25:23 up 8 min, 2 users, load average: 1.52, 1.32, 0.55
Tasks: 218 total, 15 running, 203 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.3%us, 17.5%sy, 0.0%ni, 49.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.6%st
Mem: 7358492k total, 1620500k used, 5737992k free, 11144k buffers
Swap: 0k total, 0k used, 0k free, 1248388k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1323 postgres 20 0 50364 2192 1544 R 56.7 0.0 0:03.19 pgbench -S -c 16 -T 30
1337 postgres 20 0 197m 114m 112m R 25.4 1.6 0:01.35 postgres: postgres postgres [local] SELECT
1331 postgres 20 0 197m 113m 111m R 24.4 1.6 0:01.16 postgres: postgres postgres [local] idle
1335 postgres 20 0 197m 114m 112m R 24.1 1.6 0:01.30 postgres: postgres postgres [local] SELECT
1340 postgres 20 0 197m 113m 112m R 22.7 1.6 0:01.28 postgres: postgres postgres [local] idle
1327 postgres 20 0 197m 114m 113m R 22.1 1.6 0:01.26 postgres: postgres postgres [local] idle
1328 postgres 20 0 197m 114m 113m R 21.8 1.6 0:01.32 postgres: postgres postgres [local] SELECT
1332 postgres 20 0 197m 114m 112m R 21.8 1.6 0:01.11 postgres: postgres postgres [local] SELECT
1326 postgres 20 0 197m 112m 110m R 21.4 1.6 0:01.10 postgres: postgres postgres [local] idle
1325 postgres 20 0 197m 112m 110m R 20.8 1.6 0:01.28 postgres: postgres postgres [local] SELECT
1330 postgres 20 0 197m 113m 111m R 20.4 1.6 0:01.21 postgres: postgres postgres [local] idle
1339 postgres 20 0 197m 113m 111m R 20.4 1.6 0:01.10 postgres: postgres postgres [local] idle
1333 postgres 20 0 197m 114m 112m S 20.1 1.6 0:01.08 postgres: postgres postgres [local] SELECT
1336 postgres 20 0 197m 113m 111m S 19.8 1.6 0:01.10 postgres: postgres postgres [local] SELECT
1329 postgres 20 0 197m 113m 111m S 19.1 1.6 0:01.21 postgres: postgres postgres [local] idle
1338 postgres 20 0 197m 114m 112m R 19.1 1.6 0:01.28 postgres: postgres postgres [local] SELECT
1334 postgres 20 0 197m 114m 112m R 18.8 1.6 0:01.00 postgres: postgres postgres [local] idle
1214 root 20 0 14900 1348 944 R 0.3 0.0 0:00.41 top -c

#8 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Greg Smith <greg <at> 2ndquadrant.com> writes:

Could you try this again with "top -c", which will label these
postmaster processes usefully, and include the pgbench client itself in
what you post? It's hard to sort out what's going on in these
situations without that style of breakdown.

As a further experiment, I ran 8 pgbench processes in parallel. The result is
about the same.

top - 18:34:15 up 17 min, 2 users, load average: 0.39, 0.40, 0.36
Tasks: 217 total, 8 running, 209 sleeping, 0 stopped, 0 zombie
Cpu(s): 22.2%us, 8.9%sy, 0.0%ni, 68.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 7358492k total, 1611148k used, 5747344k free, 11416k buffers
Swap: 0k total, 0k used, 0k free, 1248408k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1506 postgres 20 0 197m 134m 132m S 29.4 1.9 0:09.27 postgres: postgres postgres [local] idle
1524 postgres 20 0 197m 134m 132m R 29.4 1.9 0:05.13 postgres: postgres postgres [local] idle
1509 postgres 20 0 197m 134m 132m R 27.1 1.9 0:08.58 postgres: postgres postgres [local] SELECT
1521 postgres 20 0 197m 134m 132m R 26.4 1.9 0:05.77 postgres: postgres postgres [local] SELECT
1512 postgres 20 0 197m 134m 132m S 26.1 1.9 0:07.62 postgres: postgres postgres [local] idle
1520 postgres 20 0 197m 134m 132m R 25.8 1.9 0:05.31 postgres: postgres postgres [local] idle
1515 postgres 20 0 197m 134m 132m S 23.8 1.9 0:06.94 postgres: postgres postgres [local] SELECT
1527 postgres 20 0 197m 134m 132m S 21.8 1.9 0:04.46 postgres: postgres postgres [local] SELECT
1517 postgres 20 0 49808 2012 1544 R 5.3 0.0 0:01.02 pgbench -S -c 1 -T 30
1507 postgres 20 0 49808 2012 1544 R 4.6 0.0 0:01.70 pgbench -S -c 1 -T 30
1510 postgres 20 0 49808 2008 1544 S 4.3 0.0 0:01.32 pgbench -S -c 1 -T 30
1525 postgres 20 0 49808 2012 1544 S 4.3 0.0 0:00.79 pgbench -S -c 1 -T 30
1516 postgres 20 0 49808 2016 1544 S 4.0 0.0 0:01.00 pgbench -S -c 1 -T 30
1504 postgres 20 0 49808 2012 1544 R 3.3 0.0 0:01.81 pgbench -S -c 1 -T 30
1513 postgres 20 0 49808 2016 1544 S 3.0 0.0 0:01.07 pgbench -S -c 1 -T 30
1522 postgres 20 0 49808 2012 1544 S 3.0 0.0 0:00.86 pgbench -S -c 1 -T 30
1209 postgres 20 0 63148 1476 476 S 0.3 0.0 0:00.11 postgres: stats collector process
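The "8 pgbench processes in parallel" experiment above can be driven from the shell with a background loop. A minimal sketch follows; the `workload` function here is a harmless stand-in so the launcher itself can be seen in isolation (in the thread the real command was `pgbench -S -c 1 -T 30`, visible in the process listing):

```shell
# Launch N copies of a workload in the background and wait for all of them.
# In the thread the workload would be: pgbench -S -c 1 -T 30 -h localhost
N=8
workload() {
  # stand-in for the real benchmark command
  echo "worker $1 finished"
}
for i in $(seq 1 "$N"); do
  workload "$i" &
done
wait   # block until every background worker has exited
```

With the real pgbench command substituted in, each worker prints its own tps figures, which then have to be summed by hand to compare against a single multi-client run.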

#9 Jim Mlodgenski
jimmy76@gmail.com
In reply to: Mike Bresnahan (#8)
Re: Amazon EC2 CPU Utilization

On Wed, Jan 27, 2010 at 6:37 PM, Mike Bresnahan
<mike.bresnahan@bestbuy.com> wrote:

Greg Smith <greg <at> 2ndquadrant.com> writes:

Could you try this again with "top -c", which will label these
postmaster processes usefully, and include the pgbench client itself in
what you post? It's hard to sort out what's going on in these
situations without that style of breakdown.

As a further experiment, I ran 8 pgbench processes in parallel. The result
is about the same.

Let's start from the beginning. Have you tuned your postgresql.conf file?

What do you have shared_buffers set to? That would have the biggest effect
on a test like this.



--
Jim Mlodgenski
EnterpriseDB (http://www.enterprisedb.com)

#10 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Jim Mlodgenski <jimmy76 <at> gmail.com> writes:

Let's start from the beginning. Have you tuned your postgresql.conf file?
What do you have shared_buffers set to? That would have the biggest effect
on a test like this.

shared_buffers = 128MB
maintenance_work_mem = 256MB
checkpoint_segments = 20
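A quick sanity check of that setting against the instance's memory, using the "Mem: 7348132k total" figure from the top output earlier in the thread. (The arithmetic is mine, not from the thread; common tuning guidance of this era suggested shared_buffers on the order of a quarter of RAM for a dedicated database server, so 128 MB here is very small.)

```shell
# How big is shared_buffers relative to the instance's RAM?
# Total RAM in kB is taken from the "Mem: 7348132k total" line in top above.
total_kb=7348132
shared_buffers_mb=128
awk -v t="$total_kb" -v s="$shared_buffers_mb" 'BEGIN {
  pct = s * 1024 * 100 / t
  printf "shared_buffers = %d MB = %.1f%% of RAM\n", s, pct
}'
# prints: shared_buffers = 128 MB = 1.8% of RAM
```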

#11 Hardwick, Joe
Joe.Hardwick@fnis.com
In reply to: Mike Bresnahan (#10)
SET statement_timeout problem

I have a problem with fetching from cursors sometimes taking an
extremely long time to run. I am attempting to use the
statement_timeout parameter to limit the runtime on these.

PostgreSQL 8.2.4
Linux 2.6.22.14-72.fc6 #1 SMP Wed Nov 21 13:44:07 EST 2007 i686 i686
i386 GNU/Linux

begin;
set search_path = testdb;
declare cur_rep cursor for select * from accounts, individual;

set statement_timeout = 1000;

fetch forward 1000000 from cur_rep;

The open join, 1000ms, and 1000000 count are all intentional. Normally
those values would be 300000 and 10000. The accounts and individual
tables have around 100 fields and 500k records each.

Nested Loop (cost=21992.28..8137785497.71 rows=347496704100 width=8)
  -> Seq Scan on accounts (cost=0.00..30447.44 rows=623844 width=8)
  -> Materialize (cost=21992.28..29466.53 rows=557025 width=0)
       -> Seq Scan on individual (cost=0.00..19531.25 rows=557025 width=0)

I tried moving the SET statement before the cursor declaration and outside
the transaction, with the same results. I thought possibly it was getting
bogged down in I/O, but the timeout seems to work fine when not using a
cursor.

What am I missing here?

Thanks,
Joe


#12 Greg Smith
gsmith@gregsmith.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Mike Bresnahan wrote:

I have deployed PostgreSQL 8.4.1 on a Fedora 9 c1.xlarge (8x1 cores) instance
in the Amazon EC2 cloud. When I run pgbench in read-only mode (-S) on a small
database, I am unable to peg the CPUs no matter how many clients I throw at it.
In fact, CPU idle never drops below 60%. I also tried this on Fedora 12
(kernel 2.6.31) and got the same basic result. What's going on here? Am I
really only utilizing 40% of the CPUs? Is this to be expected on virtual
(Xen) instances?

tps = 19663.841772 (including connections establishing)

Looks to me like you're running into a general memory bandwidth issue
here, possibly one that's made a bit worse by how pgbench works. It's a
somewhat funky workload Linux systems aren't always happy with, although
one of your tests had the right configuration to sidestep the worst of
the problems there. I don't see any evidence that pgbench itself is a
likely suspect for the issue, but it does shuffle a lot of things around
in memory relative to transaction time when running this small
select-only test, and clients can get stuck waiting for it when that
happens.

To put your results in perspective, I would expect to get around 25K TPS
running the pgbench setup/test you're doing on a recent 4-core/single
processor system, and around 50K TPS is normal for an 8-core server
doing this type of test. And those numbers are extremely sensitive to
the speed of the underlying RAM even with the CPU staying the same.

I would characterize your results as "getting about 1/2 of the
CPU+memory performance of an install on a dedicated 8-core system".
That's not horrible, as long as you have reasonable expectations here,
which is really the case for any virtualized install I think. I'd
actually like to launch a more thorough investigation into this
particular area, exactly how the PostgreSQL bottlenecks shift around on
EC2 compared to similar dedicated hardware, if I found a sponsor for it
one day. A bit too much work to do it right just for fun.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com

#13 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

Greg Smith <greg <at> 2ndquadrant.com> writes:

Looks to me like you're running into a general memory bandwidth issue
here, possibly one that's made a bit worse by how pgbench works. It's a
somewhat funky workload Linux systems aren't always happy with, although
one of your tests had the right configuration to sidestep the worst of
the problems there. I don't see any evidence that pgbench itself is a
likely suspect for the issue, but it does shuffle a lot of things around
in memory relative to transaction time when running this small
select-only test, and clients can get stuck waiting for it when that
happens.

[...]

I can understand that I will not get as much performance out of an EC2 instance
as a dedicated server, but I don't understand why top(1) is showing 50% CPU
utilization. If it were a memory speed problem, wouldn't top(1) report 100% CPU
utilization? Does the kernel really do a context switch when waiting for a
response from RAM? That would surprise me, because to do a context switch it
might need to read from RAM, which would then also block. I still worry it is a
lock contention or scheduling problem, but I am not sure how to diagnose it.
I've seen some references to using DTrace to analyze PostgreSQL locks, but it
looks like it might take a lot of ramp-up time for me to learn how to use
DTrace.

Note that I can peg the CPU by running 8 infinite loops inside or outside the
database. I have only seen the utilization problem when running queries (with
pgbench and my application) against PostgreSQL.

In any case, assuming this is an EC2 memory speed thing, it is going to be
difficult to diagnose application bottlenecks when I cannot rely on top(1)
reporting meaningful CPU stats.

Thank you for your help.

#14 Jeff Davis
pgsql@j-davis.com
In reply to: Mike Bresnahan (#13)
Re: Amazon EC2 CPU Utilization

On Thu, 2010-01-28 at 22:45 +0000, Mike Bresnahan wrote:

I can understand that I will not get as much performance out of an EC2 instance
as a dedicated server, but I don't understand why top(1) is showing 50% CPU
utilization.

One possible cause is lock contention, but I don't know if that explains
your problem. Perhaps there's something about the handling of shared
memory or semaphores on EC2 that makes it slow enough that it's causing
lock contention.

You could try testing on a Xen instance and see if you have the same
problem.

Regards,
Jeff Davis

#15 Rodger Donaldson
rodger@diaspora.gen.nz
In reply to: Mike Bresnahan (#13)
Re: Amazon EC2 CPU Utilization

Mike Bresnahan wrote:

I can understand that I will not get as much performance out of a EC2 instance
as a dedicated server, but I don't understand why top(1) is showing 50% CPU
utilization. If it were a memory speed problem wouldn't top(1) report 100% CPU
utilization?

A couple of points:

top is not the be-all and end-all of analysis tools. I'm sure you know
that, but it bears repeating.

More importantly, in a virtualised environment the tools on the inside
of the guest don't have a full picture of what's really going on. I've
not done any real work with Xen; most of my experience is with zVM and
KVM.

It's pretty normal on a heavily loaded server to see tools like top (and
vmstat, sar, et al) reporting less than 100% use while the box is
running flat-out, leaving nothing left for the guest to get. I had this
last night doing a load on a guest - 60-70% CPU at peak, with no more
available. You *should* see steal and 0% idle time in this case, but I
*have* seen zVM Linux guests reporting ample idle time while the zVM
level monitoring tools reported the LPAR as a whole running at 90-95%
utilisation (which is when an LPAR will usually run out of steam).

A secondary effect is that sometimes the scheduling of guests on and off
the hypervisor will cause skewing in the timekeeping of the guest; it's
not uncommon in our loaded-up zVM environment to see discrepancies of
5-20% between the guest's view of how much CPU time it thinks it's
getting and how much time the hypervisor knows it's getting (this is why
companies like Velocity make money selling hypervisor-aware tools that
auto-correct those stats).

In any case, assuming this is a EC2 memory speed thing, it is going to be
difficult to diagnose application bottlenecks when I cannot rely on top(1)
reporting meaningful CPU stats.

It's going to be even harder from inside the guests, since you're
getting an incomplete view of the system as a whole.

You could try the c2cbench (http://sourceforge.net/projects/c2cbench/)
which is designed to benchmark memory cache performance, but it'll still
be subject to the caveats I outlined above: it may give you something
indicative if you think it's a cache problem, but it may also simply
tell you that the virtual CPUs are fine while the real processors are
pegged for cache from running a bunch of workloads with high memory
pressure.

If you were running a newer kernel you could look at perf_counters or
something similar to get more detail from what the guest thinks it's
doing, but, again, there are going to be inaccuracies.
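One cheap in-guest check along the lines Rodger describes is the "steal" counter the kernel exposes (the same number behind top's %st column). A minimal sketch, assuming a Linux guest with a kernel new enough to report the field (the guard is mine; very old kernels such as the 2.6.21 one in this thread may report it as zero or omit it):

```shell
# Print the aggregate "steal" time (field 9 of the "cpu" line in /proc/stat),
# i.e. jiffies the hypervisor handed to other guests instead of this one.
# Guarded so it degrades gracefully on systems without /proc/stat.
if [ -r /proc/stat ]; then
  awk '/^cpu /{ print "steal jiffies:", ($9 == "" ? 0 : $9) }' /proc/stat
else
  echo "steal jiffies: unavailable (no /proc/stat)"
fi
```

A steadily climbing steal figure under load is direct evidence that the physical cores are being shared, even while the guest's own idle percentage looks healthy.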

#16 Mike Bresnahan
mike.bresnahan@bestbuy.com
In reply to: Mike Bresnahan (#1)
Re: Amazon EC2 CPU Utilization

In an attempt to determine whether top(1) is lying about the CPU utilization, I
did an experiment. I fired up an EC2 c1.xlarge instance and ran pgbench and a
tight loop in parallel.

-bash-4.0$ uname -a
Linux domu-12-31-39-00-8d-71.compute-1.internal 2.6.31-302-ec2 #7-Ubuntu SMP Tue
Oct 13 19:55:22 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux

-bash-4.0$ pgbench -S -T 30 -c 16 -h localhost
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 804719
tps = 26787.949376 (including connections establishing)
tps = 26842.193411 (excluding connections establishing)

While pgbench was running I ran a tight loop at the bash prompt.

-bash-4.0# time for i in {1..10000000}; do true; done

real 0m36.660s
user 0m33.100s
sys 0m2.040s

Then I ran each alone.

-bash-4.0$ pgbench -S -T 30 -c 16 -h localhost
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 964639
tps = 32143.595223 (including connections establishing)
tps = 32208.347194 (excluding connections establishing)

-bash-4.0# time for i in {1..10000000}; do true; done

real 0m32.811s
user 0m31.330s
sys 0m1.470s

Running the loop caused pgbench to lose about 12.5% (1/8), which is exactly what
I would expect on an 8-core machine. So it seems that top(1) is lying.
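The arithmetic behind that comparison can be checked from the tps figures reported by the two runs above (numbers copied verbatim from the pgbench output):

```shell
# Slowdown of pgbench when the shell busy-loop ran alongside it,
# versus the naive "one core of eight consumed" expectation.
awk 'BEGIN {
  shared = 26787.949376   # tps while the tight loop was running
  alone  = 32143.595223   # tps with pgbench running by itself
  printf "observed slowdown: %.1f%%\n", (alone - shared) * 100 / alone
  printf "one core of eight: %.1f%%\n", 100 / 8
}'
# prints: observed slowdown: 16.7%
#         one core of eight: 12.5%
```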

#17 John R Pierce
pierce@hogranch.com
In reply to: Rodger Donaldson (#15)
Re: [GENERAL] Amazon EC2 CPU Utilization

top is not the be-all and end-all of analysis tools. I'm sure you
know that, but it bears repeating.

More importantly, in a virtualised environment the tools on the inside
of the guest don't have a full picture of what's really going on.

Indeed, you have hit the nail on the head.

does anyone know what hardware EC2 is actually using? and does anyone
know how much over-subscribing they do? e.g., if you're paying for 8
cores, do you actually have 8 dedicated cores, or will they put several
"8 virtual core" domUs on the same physical cores?

OOOOH.... I'm reading http://aws.amazon.com/ec2/instance-types/

As I'm interpreting that, an "XL" instance is FOUR /virtual/ cores, each
allocated the horsepower equivalent of two 1.0 GHz Core 2 Duo-style
cores, or 1.7 GHz P4-style processors.

So we've been WAY off base here; the XL is *FOUR*, not EIGHT cores.
This XL is nominally equivalent to a dual-socket, dual-core 2 GHz Xeon
3050 "Conroe".

Does this better fit the observations?

#18 John R Pierce
pierce@hogranch.com
In reply to: John R Pierce (#17)
Re: [GENERAL] Amazon EC2 CPU Utilization

John R Pierce wrote:

top is not the be-all and end-all of analysis tools. I'm sure you
know that, but it bears repeating.

More importantly, in a virtualised environment the tools on the
inside of the guest don't have a full picture of what's really going on.

Indeed, you have hit the nail on the head.

....

ooops, somehow sent this to the wrong list?!? ignore.

#19 John R Pierce
pierce@hogranch.com
In reply to: Rodger Donaldson (#15)
Re: Amazon EC2 CPU Utilization

top is not the be-all and end-all of analysis tools. I'm sure you
know that, but it bears repeating.
More importantly, in a virtualised environment the tools on the inside
of the guest don't have a full picture of what's really going on.

Indeed, you have hit the nail on the head.

does anyone know what hardware EC2 is actually using? and does anyone
know how much over-subscribing they do? e.g., if you're paying for 8
cores, do you actually have 8 dedicated cores, or will they put several
"8 virtual core" domUs on the same physical cores?

OOOOH.... I'm reading http://aws.amazon.com/ec2/instance-types/

As I'm interpreting that, an "XL" instance is FOUR /virtual/ cores, each
allocated the horsepower equivalent of two 1.0 GHz Core 2 Duo-style
cores, or 1.7 GHz P4-style processors.

So we've been WAY off base here; the XL is *FOUR*, not EIGHT cores.
This XL is nominally equivalent to a dual-socket, dual-core 2 GHz Xeon
3050 "Conroe".

Does this better fit the observations?