Question on Opteron performance

Started by Steve Wolfe | about 22 years ago | 20 messages | general

nw@codon.com

about 22 years ago

Right now, our production DB server is getting a bit more heavily loaded
than we'd like, and we expect its usage to double in the next few months,
so we're looking at where to put our money for a better machine.

Right now, we're using a dual 2.8GHz Xeon with 3 gigs of memory, and run
without fsync() enabled. Between disk cache and shared buffers, the disk
system isn't an issue - vmstat shows that the disk I/O is nearly always at
zero, with the occasional blips of activity rarely being more than a few
hundred kilobytes.

The main question in my mind is whether a 4-way Opteron is going to
give me enough of a performance benefit over a 2-way Opteron to make the
extra $10k worth it. My first guess was that it would, as going from 2
Opterons to 4 will give you twice the potential memory bandwidth.
However, as PostgreSQL pulls heavily from the global buffers, I may not be
able to utilize all of that potential bandwidth.

If anyone has done tests with PostgreSQL on 2- vs. 4-way machines under
heavy load (many simultaneous connections), I would greatly appreciate
hearing about the results.

Steve Wolfe

Stephen Robert Norris

srn@commsecure.com.au

about 22 years ago

In reply to: Steve Wolfe (#1)

Re: Question on Opteron performance

On Tue, 2004-03-09 at 11:06, Steve Wolfe wrote:

Right now, our production DB server is getting a bit more heavily loaded
than we'd like, and we expect its usage to double in the next few months,
so we're looking at where to put our money for a better machine.

Right now, we're using a dual 2.8GHz Xeon with 3 gigs of memory, and run
without fsync() enabled. Between disk cache and shared buffers, the disk
system isn't an issue - vmstat shows that the disk I/O is nearly always at
zero, with the occasional blips of activity rarely being more than a few
hundred kilobytes.

You do know that turning off fsync() means your data will all get
trashed if you get an OS crash or power problem or H/W crash or ...

The main question in my mind is whether a 4-way Opteron is going to
give me enough of a performance benefit over a 2-way Opteron to make the
extra $10k worth it. My first guess was that it would, as going from 2
Opterons to 4 will give you twice the potential memory bandwidth.
However, as PostgreSQL pulls heavily from the global buffers, I may not be
able to utilize all of that potential bandwidth.

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

If anyone has done tests with PostgreSQL on 2- vs. 4-way machines under
heavy load (many simultaneous connections), I would greatly appreciate
hearing about the results.

What sort of load is "heavy load" to you?

Stephen

Joseph Shraibman

jks@selectacast.net

about 22 years ago

In reply to: Stephen Robert Norris (#2)

Re: Question on Opteron performance

Stephen Robert Norris wrote:

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

Opterons have built-in memory controllers, so they scale better than
Intel chips. See http://www.anandtech.com/IT/showdoc.html?i=1982

Stephen Robert Norris

srn@commsecure.com.au

about 22 years ago

In reply to: Joseph Shraibman (#3)

Re: Question on Opteron performance

On Tue, 2004-03-09 at 14:50, Joseph Shraibman wrote:

Stephen Robert Norris wrote:

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

Opterons have built-in memory controllers, so they scale better than
Intel chips. See http://www.anandtech.com/IT/showdoc.html?i=1982

I didn't realise that. Thanks for the pointer. We're going to be buying
some more servers soon, I should look at Opterons I guess...

Stephen

Paul Ramsey

pramsey@cleverelephant.ca

about 22 years ago

In reply to: Stephen Robert Norris (#4)

Re: Question on Opteron performance

Sun is selling Opterons nowadays, I actually got a call from a Sun
salesman. Could be good. (not 4-way though...)

http://www.sun.com/servers/entry/v20z/specs.html

Stephen Robert Norris wrote:

On Tue, 2004-03-09 at 14:50, Joseph Shraibman wrote:

Stephen Robert Norris wrote:

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

Opterons have built-in memory controllers, so they scale better than
Intel chips. See http://www.anandtech.com/IT/showdoc.html?i=1982

I didn't realise that. Thanks for the pointer. We're going to be buying
some more servers soon, I should look at Opterons I guess...

Stephen

--
__
/
| Paul Ramsey
| Refractions Research
| Email: pramsey@refractions.net
| Phone: (250) 885-0632
\_

Steve Wolfe

nw@codon.com

about 22 years ago

In reply to: Stephen Robert Norris (#2)

Re: Question on Opteron performance

Right now, we're using a dual 2.8GHz Xeon with 3 gigs of memory, and run
without fsync() enabled. Between disk cache and shared buffers, the disk
system isn't an issue - vmstat shows that the disk I/O is nearly always at
zero, with the occasional blips of activity rarely being more than a few
hundred kilobytes.

You do know that turning off fsync() means your data will all get
trashed if you get an OS crash or power problem or H/W crash or ...

But of course. : )

I've been running production servers with fsync() disabled for about
four years now, without a problem. On the semi-production machine where
that sort of thing is allowed to happen, even abnormal power outages
haven't produced any data corruption in the few times they've occurred.
Of course, I do realize that sooner or later, it may catch up to me and
bite me in the butt. Because of that, I do have recovery/contingency
plans in place!

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

As another person pointed out, the Opterons are NUMA-style machines.
Each CPU has its own memory controller, so each time you add another CPU,
you're also adding more memory bandwidth. This is how some of the "bigger"
machines (like Suns) have been doing it for some time.

As I've taken our real-world data and benchmarked various systems
(4-way P3 Xeon, dual Athlons, dual P4 Xeons), adding CPU cycles tends to
increase performance linearly, and in small increments. Increasing the
memory bandwidth, however, seems to produce the large performance
improvements.

In fact, while the dual Athlon smoked the 4xP3 Xeon machine, it was still
very limited by the "measly" 266 MHz, 64-bit memory subsystem. When we'd
max out the throughput, the CPUs usually weren't doing a whole lot, but
rather waiting for memory. With double the memory bandwidth, the Xeons seem
to be able to keep the CPUs doing a bit more than the Athlons could. If
I'm wrong about the shared-buffer limitation, and PostgreSQL's design will
lend itself well to the Opteron's memory architecture, then a 4-way Opteron
having more than 4 times the memory bandwidth should definitely be good
for what ails us.
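
A rough back-of-the-envelope for the bandwidth figures in this message, as a sketch rather than a benchmark. Only the 266 MHz, 64-bit bus and the "double" for the Xeons come from the thread; the per-socket Opteron figure (a 128-bit controller at 333 MT/s) is an assumption chosen for illustration, and peak numbers say nothing about how much of that bandwidth PostgreSQL can actually use.

```python
# Peak theoretical bandwidth = transfers per second * bytes per transfer.
# Only the 266 MHz / 64-bit figure comes from the thread; everything else
# here is an illustrative assumption, not a measurement.

def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (decimal) for a single memory bus."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def numa_aggregate(per_socket_gb_s: float, sockets: int) -> float:
    """Aggregate peak bandwidth when every socket has its own controller."""
    return per_socket_gb_s * sockets


# Dual Athlon: one bus shared by both CPUs (figure from the thread).
athlon_shared_bus = peak_bandwidth_gb_s(266, 64)

# Dual P4 Xeon: "double the memory bandwidth", per the thread; still shared.
xeon_shared_bus = 2 * athlon_shared_bus

# ASSUMPTION: one 128-bit memory controller per Opteron socket at 333 MT/s.
opteron_per_socket = peak_bandwidth_gb_s(333, 128)

if __name__ == "__main__":
    print(f"dual Athlon (shared bus): {athlon_shared_bus:.2f} GB/s")
    print(f"dual Xeon (shared bus):   {xeon_shared_bus:.2f} GB/s")
    for sockets in (2, 4):
        total = numa_aggregate(opteron_per_socket, sockets)
        print(f"{sockets}-way Opteron (aggregate): {total:.2f} GB/s")
```

The aggregate figure only helps if each backend mostly touches memory attached to its own CPU, which is exactly the open question about the shared buffers.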

If anyone has done tests with PostgreSQL on 2- vs. 4-way machines under
heavy load (many simultaneous connections), I would greatly appreciate
hearing about the results.

What sort of load is "heavy load" to you?

If I recall from today's loads, we were getting about 50 queries per
second from the pool of front-end servers. Obviously, whether 50 queries
per second is "heavy" depends on the type of queries; these were enough
to push the 5-minute system loads up into the 0.8 range. In our application,
once we exceed a system load of about 0.9, we start seeing enough slowdown
that it does become noticeable. Not always very significant, but noticeable.
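
To put those numbers next to the expected doubling in usage, here is a minimal headroom estimate. The 50 queries per second, the 0.8 load and the 0.9 threshold are from the message; the linear relationship between query rate and load is an assumption, and a loaded system will usually degrade faster than a straight line suggests.

```python
# Figures from the message: about 50 queries/s produced a 5-minute load of
# about 0.8, and slowdown became noticeable above about 0.9.
OBSERVED_QPS = 50.0
OBSERVED_LOAD = 0.8
SLOWDOWN_LOAD = 0.9


def projected_load(qps: float) -> float:
    """Projected load at a given query rate.

    ASSUMPTION: load scales linearly with query rate, which is optimistic
    near saturation.
    """
    return OBSERVED_LOAD * qps / OBSERVED_QPS


def max_qps_before_slowdown() -> float:
    """Query rate at which the projected load reaches the slowdown threshold."""
    return OBSERVED_QPS * SLOWDOWN_LOAD / OBSERVED_LOAD


if __name__ == "__main__":
    print(f"headroom ends near {max_qps_before_slowdown():.2f} queries/s")
    print(f"projected load at double the traffic: {projected_load(2 * OBSERVED_QPS):.2f}")
```

Under that assumption the current machine runs out of headroom at roughly 56 queries per second, well short of a doubling.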

steve

Nick Barr

nicky@chuckie.co.uk

about 22 years ago

In reply to: Stephen Robert Norris (#4)

Re: Question on Opteron performance

Stephen Robert Norris wrote:

On Tue, 2004-03-09 at 14:50, Joseph Shraibman wrote:

Stephen Robert Norris wrote:

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

Opterons have built-in memory controllers, so they scale better than
Intel chips. See http://www.anandtech.com/IT/showdoc.html?i=1982

I didn't realise that. Thanks for the pointer. We're going to be buying
some more servers soon, I should look at Opterons I guess...

Stephen

And here is a nice pretty picture of the architecture of a 4 way Opteron.

http://www.amd.com/us-en/assets/content_type/DownloadableAssets/AM190_briefv3.pdf

I know which architecture I prefer, the one without a central hub.
Whether the Opteron will perform well for PG I have no idea but we also
are planning some 4 way Opteron boxes to beef up our data center. We
will probably do some benchmarking on those bad boys when we get them
and post it to pgsql-performance if anyone is interested.

Nick

Stephen Robert Norris

srn@commsecure.com.au

about 22 years ago

In reply to: Nick Barr (#7)

Re: Question on Opteron performance

On Tue, 2004-03-09 at 21:01, Nick Barr wrote:

Stephen Robert Norris wrote:

On Tue, 2004-03-09 at 14:50, Joseph Shraibman wrote:

Stephen Robert Norris wrote:

Is this true? Did they really double the size of the memory bus, or is
it a case of 4 CPUs fighting for the same memory bandwidth that 2 had
before?

Opterons have built-in memory controllers, so they scale better than
Intel chips. See http://www.anandtech.com/IT/showdoc.html?i=1982

I didn't realise that. Thanks for the pointer. We're going to be buying
some more servers soon, I should look at Opterons I guess...

Stephen

And here is a nice pretty picture of the architecture of a 4 way Opteron.

http://www.amd.com/us-en/assets/content_type/DownloadableAssets/AM190_briefv3.pdf

I know which architecture I prefer, the one without a central hub.
Whether the Opteron will perform well for PG I have no idea but we also
are planning some 4 way Opteron boxes to beef up our data center. We
will probably do some benchmarking on those bad boys when we get them
and post it to pgsql-performance if anyone is interested.

Nick

I'd be fascinated.

--
Stephen Norris srn@fn.com.au
Farrow Norris Pty Ltd +61 417 243 239

Stephen Frost

sfrost@snowman.net

about 22 years ago

In reply to: Steve Wolfe (#1)

Re: Question on Opteron performance

* Steve Wolfe (nw@codon.com) wrote:

The main question in my mind is whether a 4-way Opteron is going to
give me enough of a performance benefit over a 2-way Opteron to make the
extra $10k worth it. My first guess was that it would, as going from 2
Opterons to 4 will give you twice the potential memory bandwidth.
However, as PostgreSQL pulls heavily from the global buffers, I may not be
able to utilize all of that potential bandwidth.

Well, one question to ask, of course, is how much overlap is there in
the queries? I think that'd make some difference... Besides that, make
sure you get enough DIMMS (2 per CPU) to get interleaved memory access
going on the Opterons. I'm guessing you realize this already, but
figured I'd mention it anyway. :) When I got my opterons in I hadn't
realized it'd do interleaved till I was flipping through the MB manual.
:)

Stephen

#10

Jeff

threshar@torgo.978.org

about 22 years ago

In reply to: Stephen Frost (#9)

Re: Question on Opteron performance

On Mar 9, 2004, at 8:00 AM, Stephen Frost wrote:

* Steve Wolfe (nw@codon.com) wrote:

The main question in my mind is whether a 4-way Opteron is going to
give me enough of a performance benefit over a 2-way Opteron to make the
extra $10k worth it. My first guess was that it would, as going from 2
Opterons to 4 will give you twice the potential memory bandwidth.
However, as PostgreSQL pulls heavily from the global buffers, I may not be
able to utilize all of that potential bandwidth.

Well, one question to ask, of course, is how much overlap is there in
the queries? I think that'd make some difference... Besides that, make
sure you get enough DIMMS (2 per CPU) to get interleaved memory access
going on the Opterons. I'm guessing you realize this already, but
figured I'd mention it anyway. :) When I got my opterons in I hadn't
realized it'd do interleaved till I was flipping through the MB manual.
:)

And let's not forget one of the best things to do: optimize the queries
themselves!

Nothing can beat good old-fashioned query optimization,
whether it be adding an index or trying something out like a
materialized view.

I have a PG machine here doing over 50 queries (both read/write) and it
has plenty of idle cpu.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

#11

Steve Wolfe

nw@codon.com

about 22 years ago

In reply to: Steve Wolfe (#1)

Re: Question on Opteron performance

And let's not forget one of the best things to do: optimize the queries
themselves!

Nothing can beat good old-fashioned query optimization,
whether it be adding an index or trying something out like a
materialized view.

I have a PG machine here doing over 50 queries (both read/write) and it
has plenty of idle cpu.

Like I said, whether 50 queries per second is any substantial load
depends on the queries. With some of our queries, we could probably issue
a thousand per second. However, much of our data model is very complex,
integrating data from a lot of different places, and some of the queries
can get rather intensive. Believe me, we've spent quite a bit of time
looking for ways to optimize our queries, but if you want to integrate
this much data, there's only so much optimization you can do.

(I suppose we could create lookup tables of every possible query result,
if we only had a couple of petabytes of storage to use....)

steve

#12

Bruce Momjian

bruce@momjian.us

about 22 years ago

In reply to: Steve Wolfe (#6)

Re: Question on Opteron performance

nw@codon.com writes:

What sort of load is "heavy load" to you?

If I recall from today's loads, we were getting about 50 queries per
second from the pool of front-end servers. Obviously, whether 50 queries
per second is "heavy" depends on the type of queries; these were enough
to push the 5-minute system loads up into the 0.8 range. In our application,
once we exceed a system load of about 0.9, we start seeing enough slowdown
that it does become noticeable. Not always very significant, but noticeable.

The only time I've seen high cpu and memory bandwidth load with near-zero i/o
load like you describe was on Oracle and it turned out to be an sql
optimization problem.

What caused it was a moderate but not very large table on which a very
frequent query was doing a full table scan (= sequential scan). The entire
table was easily kept in cache, but it was large enough that merely scanning
every block of it in the cache consumed a lot of cpu and memory bandwidth. I
don't remember how large, but something on the order of a few thousand records.

The query still ran reasonably fast, but much slower than it ought to have
been. I don't remember numbers, it was probably something like 200ms instead
of 20ms. Plenty of other queries were in the 200ms range but due to normal i/o
delays. 200ms is a lot more cpu/memory usage than it is i/o usage, enough to
hog those resources and slow down the entire system but not show up on our
lists of top slow queries.

I don't know if your problem is anything similar, and I'm not even sure where
I would start to find a problem like this in postgres. In Oracle I could sort
the query cache by "total logical buffer gets" which basically translated into
memory bandwidth consumed for all executions of the query. That produces very
different results than looking at the queries sorted by the time they take to
execute.
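
A small sketch of why the two orderings disagree. The query names and every number below are made up for illustration; `buffer_gets` stands in for the per-execution "logical buffer gets" counter described above and is not a real PostgreSQL or Oracle API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStats:
    name: str          # hypothetical label for a query
    executions: int    # times it ran during the sample window
    avg_ms: float      # average elapsed time per execution
    buffer_gets: int   # in-cache block reads per execution

    @property
    def total_buffer_gets(self) -> int:
        """Memory traffic proxy: per-execution gets times executions."""
        return self.executions * self.buffer_gets


# Made-up numbers, chosen only to show the effect.
SAMPLE = [
    QueryStats("monthly_report", executions=2, avg_ms=9000.0, buffer_gets=400_000),
    QueryStats("cached_full_scan", executions=50_000, avg_ms=200.0, buffer_gets=3_000),
    QueryStats("indexed_lookup", executions=200_000, avg_ms=20.0, buffer_gets=6),
]


def rank_by_elapsed(stats):
    """Slowest single execution first: the usual 'top slow queries' list."""
    return sorted(stats, key=lambda q: q.avg_ms, reverse=True)


def rank_by_total_gets(stats):
    """Largest total in-cache block traffic first."""
    return sorted(stats, key=lambda q: q.total_buffer_gets, reverse=True)


if __name__ == "__main__":
    print("by elapsed time:    ", [q.name for q in rank_by_elapsed(SAMPLE)])
    print("by total buffer gets:", [q.name for q in rank_by_total_gets(SAMPLE)])
```

With these sample figures the frequent cached scan is only second on the slow-query list but far ahead on total block traffic, which is the pattern described in this message.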

--
greg

#13

Steve Wolfe

nw@codon.com

about 22 years ago

In reply to: Steve Wolfe (#6)

Re: Question on Opteron performance

The only time I've seen high cpu and memory bandwidth load with near-zero i/o
load like you describe was on Oracle and it turned out to be an sql
optimization problem.

What caused it was a moderate but not very large table on which a very
frequent query was doing a full table scan (= sequential scan). The entire
table was easily kept in cache, but it was large enough that merely scanning
every block of it in the cache consumed a lot of cpu and memory bandwidth. I
don't remember how large, but something on the order of a few thousand records.

Every so often, I log all queries that are issued, and on a separate
machine, I EXPLAIN them and store the results in a database, so I can do
analysis on them. Each time, we look at what's using the greatest amount
of resources, and attack that. Believe me, the "low-hanging fruit" like
using indexes instead of sequential scans were eliminated years ago. : )
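
One possible first step for that kind of analysis, sketched under assumptions: it is not the tooling described above, the function names are hypothetical, and the EXPLAIN step is left out because it needs a live database connection. It groups logged statements by shape (literals replaced with `?`) so the most frequent shapes can be examined first.

```python
import re
from collections import Counter

# Simplistic patterns: good enough for a sketch, not a real SQL parser.
_STRING = re.compile(r"'(?:[^']|'')*'")      # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone numeric literals
_SPACE = re.compile(r"\s+")


def normalize(sql: str) -> str:
    """Replace literals with '?' and collapse whitespace and case."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip().lower()


def most_frequent_shapes(statements, top=10):
    """Return (shape, count) pairs, most frequent first."""
    return Counter(normalize(s) for s in statements).most_common(top)


if __name__ == "__main__":
    logged = [
        "SELECT * FROM t WHERE id = 1",
        "select *  from t where id = 42",
        "SELECT name FROM u WHERE name = 'bob'",
    ]
    for shape, count in most_frequent_shapes(logged):
        print(count, shape)
```

Frequency alone is not cost; each shape would still need its plan and resource use checked before deciding what to attack.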

Over the past four years, our traffic has increased, on average, about
90% per year. We've also incorporated far more sources of data into our
model, and come up with far more ways to use the data. When you're
talking about exponential traffic growth combined with exponential data
complexity, it doesn't take long before you start hitting limits!
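
The compounding behind that remark, worked through as a sketch. The 90% annual rate and the four-year span are from this message; treating the growth as a constant rate every year is a simplifying assumption.

```python
import math

ANNUAL_GROWTH = 0.90  # "about 90% per year", from the message
YEARS = 4             # "over the past four years"


def cumulative_factor(rate: float, years: float) -> float:
    """Total traffic multiplier after compounding `rate` for `years`."""
    return (1.0 + rate) ** years


def doubling_time_years(rate: float) -> float:
    """Years needed for traffic to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    print(f"traffic multiplier over {YEARS} years: {cumulative_factor(ANNUAL_GROWTH, YEARS):.1f}x")
    print(f"doubling time: {doubling_time_years(ANNUAL_GROWTH) * 12:.0f} months")
```

At that rate traffic is about 13 times what it was four years earlier and doubles roughly every 13 months, before counting any growth in query complexity.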

Before I shell out the $15k on the 4-way Opteron, I'm going to spend
some long, hard time looking for ways to make the system more efficient.
However, after all that's already been done, I'm not optimistic that it's
going to preclude needing the new server. I'm just surprised that nobody
seems to have used PostgreSQL on a quad-Opteron before!

steve

#14

Christopher Petrilli

petrilli@amber.org

about 22 years ago

In reply to: Steve Wolfe (#13)

Re: Question on Opteron performance

On Mar 10, 2004, at 3:14 PM, Steve Wolfe wrote:

Before I shell out the $15k on the 4-way Opteron, I'm going to spend
some long, hard time looking for ways to make the system more efficient.
However, after all that's already been done, I'm not optimistic that it's
going to preclude needing the new server. I'm just surprised that nobody
seems to have used PostgreSQL on a quad-Opteron before!

Well, I haven't had a chance to run PostgreSQL on a quad-Opteron box,
but in discussing this with someone building a cluster out of them,
their experience is that they are seeing better performance out of a
quad-Opteron than a 3GHz Xeon box (quad as well), which they believe
reflects superior memory architecture. So, if someone has run on a
quad-Xeon of similar "specs", then I would imagine you should see
similar, if not better, numbers.

Chris
--
| Christopher Petrilli
| petrilli (at) amber.org

#15

William Yu

wyu@talisys.com

about 22 years ago

In reply to: Steve Wolfe (#13)

Re: Question on Opteron performance

Steve Wolfe wrote:

Before I shell out the $15k on the 4-way Opteron, I'm going to spend
some long, hard time looking for ways to make the system more efficient.
However, after all that's already been done, I'm not optimistic that it's
going to preclude needing the new server. I'm just surprised that nobody
seems to have used PostgreSQL on a quad-Opteron before!

It's semi-logical why nobody has done it yet. Those who've gone with
Opteron servers have had to go with smaller vendors and usually the
profile of those types of buyers would be classed as price-conscious. At
this time, only Newisys offers a Quad Opteron box and it carries a hefty
premium. (Sun's upcoming 4X machine is a rebadged Newisys machine and
it's possible HP's will be also.)

The picture should change with a few more vendors getting into the
4xOpteron arena. Once Tyan's S4880/4882 motherboards are available,
smaller vendors will be able to offer 4X servers.

#16

Vivek Khera

khera@kcilink.com

about 22 years ago

In reply to: Steve Wolfe (#6)

Re: Question on Opteron performance

"SW" == Steve Wolfe <nw@codon.com> writes:

SW> However, after all that's already been done, I'm not optimistic that it's
SW> going to preclude needing the new server. I'm just surprised that nobody
SW> seems to have used PostgreSQL on a quad-Opteron before!

I think people saturate the disks before the CPUs. I know I certainly
do, even with 4GB RAM and a fair number of shared buffers. Dual CPUs
are more than plenty for our usage patterns.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: khera@kciLink.com Rockville, MD +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

#17

Lincoln Yeoh

lyeoh@pop.jaring.my

about 22 years ago

In reply to: Steve Wolfe (#1)

Re: Question on Opteron performance

With mem per CPU, Opterons scale very well in most cases (as long as you
have many processes). The lower memory latency helps too.

Is it likely for you to be network bandwidth limited - e.g. maxing out your
NICs or NIC I/O capacity? I doubt it, but if you are actually getting close
then things get a bit harder...

BTW, in most Opteron configs, a lot of the major I/O goes through one CPU.

Check this out:
http://www.aceshardware.com/read.jsp?id=60000277

The benchmarks could be interesting too.

At 05:06 PM 3/8/2004 -0700, Steve Wolfe wrote:

The main question in my mind is whether a 4-way Opteron is going to
give me enough of a performance benefit over a 2-way Opteron to make the
extra $10k worth it. My first guess was that it would, as going from 2
Opterons to 4 will give you twice the potential memory bandwidth.
However, as PostgreSQL pulls heavily from the global buffers, I may not be
able to utilize all of that potential bandwidth.

If anyone has done tests with PostgreSQL on 2- vs. 4-way machines under
heavy load (many simultaneous connections), I would greatly appreciate
hearing about the results.

Steve Wolfe

#18

scott.marlowe

scott.marlowe@ihs.com

about 22 years ago

In reply to: Christopher Petrilli (#14)

Re: Question on Opteron performance

On Wed, 10 Mar 2004, Christopher Petrilli wrote:

On Mar 10, 2004, at 3:14 PM, Steve Wolfe wrote:

Before I shell out the $15k on the 4-way Opteron, I'm going to spend
some long, hard time looking for ways to make the system more efficient.
However, after all that's already been done, I'm not optimistic that it's
going to preclude needing the new server. I'm just surprised that nobody
seems to have used PostgreSQL on a quad-Opteron before!

Well, I haven't had a chance to run PostgreSQL on a quad-Opteron box,
but in discussing this with someone building a cluster out of them,
their experience is that they are seeing better performance out of a
quad-Opteron than a 3GHz Xeon box (quad as well), which they believe
reflects superior memory architecture. So, if someone has run on a
quad-Xeon of similar "specs", then I would imagine you should see
similar, if not better, numbers.

This article:

http://www.anandtech.com/IT/showdoc.html?i=1982

seems to support the view that Opterons currently scale better than
Xeons.

#19

Reece Hart

reece@in-machina.com

about 22 years ago

In reply to: William Yu (#15)

Re: Question on Opteron performance

On Wed, 2004-03-10 at 18:23, William Yu wrote:

At this time, only Newisys offers a Quad Opteron box and it carries a hefty
premium. (Sun's upcoming 4X machine is a rebadged Newisys machine and
it's possible HP's will be also.)

There are several vendors with quad opterons out there. Off the top of
my head, I know that Aspen, Penguin Computing, Appro, and Polywell all
have them. I just googled quad opteron and see that there are bunches of
others too.

-Reece

--
Reece Hart, http://www.in-machina.com/~reece/, GPG:0x25EC91A0 0xD178AAF9

#20

William Yu

wyu@talisys.com

about 22 years ago

In reply to: Reece Hart (#19)

Re: Question on Opteron performance

Reece Hart wrote:

On Wed, 2004-03-10 at 18:23, William Yu wrote:

At this time, only Newisys offers a Quad Opteron box and it carries a hefty
premium. (Sun's upcoming 4X machine is a rebadged Newisys machine and
it's possible HP's will be also.)

There are several vendors with quad opterons out there. Off the top of
my head, I know that Aspen, Penguin Computing, Appro, and Polywell all
have them. I just googled quad opteron and see that there are bunches of
others too.

I'm pretty sure most of these guys just rebadge the Newisys box (at this
time).