Dream Server?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I'm currently running a 4 GB, 11-million-row database and am looking
to build a "dream server" for it. I am currently running a dual P3
1 GHz system with 10k rpm SCSI drives and 4 GB of RAM (which I have
configured pgsql to use all of), and I'm concerned about performance
once the db doubles in size, which should be in 6 months to a year at
the latest. First off, if money were no concern, what would you buy
as the ultimate PostgreSQL server running Linux? Second, on a more
technical note, does pgsql take advantage of multiple processors?
If I had an 8-way 800 MHz Xeon, would the machine blow away a 2 GHz
P4? How much of a factor is CPU compared to memory? Disk speed? I
want to be able to do large-volume selects on tables with more than
5 million rows and not have the server blink at other requests put
in at the same time.
Any hints or suggestions/experience here would be appreciated.
Thanks,
Gavin
-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>
iQA/AwUBPGGlgn9xeBXAlKqsEQJPywCfenV2bHDKJ0czKLy2qbaoj+hiow8AoNaT
9gYOTvmzFR9+YIjA5MQwjSMN
=B6Ak
-----END PGP SIGNATURE-----
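A note on the memory configuration mentioned above: dedicating all of the RAM to PostgreSQL's own buffers is usually counterproductive, since the OS filesystem cache does much of the caching work anyway. A sketch of the relevant postgresql.conf knobs (7.x-era parameter names; the values are illustrative, not recommendations):

```
# postgresql.conf (7.x names) -- illustrative values only
shared_buffers = 65536          # in 8 kB pages, so 512 MB; leave the rest to the OS cache
sort_mem = 8192                 # per-sort working memory, in kB
effective_cache_size = 262144   # in 8 kB pages (~2 GB); hints the planner about the OS cache
```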
Here's a link to a SQL Server performance tuning guide; the hardware
advice carries over:
http://www.sql-server-performance.com/hardware_tuning.asp
Some of the nuggets from it are:
"When selecting your CPU for your server, select one
with a large L2 cache. This is especially important if
you have multiple-processor servers. Select at least a
1MB L2 cache if you have one or two CPUs. If you have
4 or more CPUs, get at least a 2MB L2 cache in each
CPU. The greater the size of the L2 cache, the greater
the server's CPU performance because it reduces the
amount of wait time experienced by the CPU when reading
and writing data to main memory."
"From a performance perspective, it is better to have
more smaller SCSI disk drives in an array than having
fewer larger SCSI disk drives. Let's say that you need
about 100GB of hard disk space in a RAID 5 array. There
are several ways you can configure such an array, some
of which offer more performance than others. Some
configurations include:
13 x 9 GB drives
7 x 18 GB drives
4 x 36 GB drives
Each of the above configurations will provide about the
same amount of storage space, but the more drives there
are in the array, the faster the I/O will be (assuming
that the controllers can handle all of the I/O traffic).
This is because more drives offer more read/write heads
that all can be working simultaneously, which speeds
disk reads and writes."
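The spindle-count argument above is easy to sanity-check with arithmetic: RAID 5 usable capacity is (n - 1) times the drive size, while aggregate random I/O grows with the number of spindles. A small Python sketch (the 100-IOPS-per-drive figure is an assumption typical of 10k rpm drives of that era, not a measurement):

```python
# RAID 5: one drive's worth of capacity goes to parity, but every
# spindle contributes read/write heads to aggregate random I/O.

def raid5_usable_gb(n_drives, drive_gb):
    """Usable capacity of an n-drive RAID 5 array, in GB."""
    return (n_drives - 1) * drive_gb

def aggregate_iops(n_drives, iops_per_drive=100):
    """Rough aggregate random IOPS, assuming ~100 IOPS per spindle."""
    return n_drives * iops_per_drive

for n, size in [(13, 9), (7, 18), (4, 36)]:
    print(f"{n} x {size} GB -> {raid5_usable_gb(n, size)} GB usable, "
          f"~{aggregate_iops(n)} IOPS")
```

All three configurations land at 108 GB usable, but the 13-drive array offers roughly three times the random I/O of the 4-drive one.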
*****
"Select the best I/O controller you can get. Top-notch
controllers offload much of the I/O work onto their own
local CPUs, freeing up CPU time on the server to do other
tasks. For the ultimate in I/O controllers, consider a
Fibre Channel connection instead of a SCSI connection."
rjsjr
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Gavin M. Roy
Sent: Wednesday, February 06, 2002 3:52 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Dream Server?
I'm currently running a 4 Gig 11 million row database and am looking
to build a "dream server" for it. I am currently running a dual p3
1GHz system with 10k rpm scsi drives and 4 gigs of ram (which I have
configured pgsql to use all of) and I'm concerned about performance
once the db doubles in size, which should be in 6 mos to a year at
the latest. First off, If money was no concern, what would you buy
as the ultimate postgresql server running linux?
I don't know if Linux runs on the new version of the Sun Starfires, does
it?
Second off, on a
more technical note, does pgsql take advantage of multiple
processors.
If you are running multiple connections, then yes, it does. But having
an extra CPU around to handle kernel code, disk activity, network
activity, etc. does help even for a single connection.
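To expand on the point above: each PostgreSQL connection gets its own backend process, so the kernel can schedule concurrent queries onto different CPUs with no help from PostgreSQL itself. The same effect in miniature, with plain worker processes standing in for backends (a sketch, nothing PostgreSQL-specific about it):

```python
# Independent processes (like PG backends, one per connection) let the
# OS use every CPU; a single process/connection cannot.
from multiprocessing import Pool

def fake_query(n):
    # Stand-in for a CPU-bound query: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:       # like 4 concurrent backends
        results = pool.map(fake_query, [100_000] * 4)
    print(len(results), "queries done")
```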
If I had a 8 way 800 MHz Xeon would the machine blow
away a 2GHz P4? How much is CPU a factor compared to memory?
Both CPU and memory have to be taken into account. It does you no good
to have a 10 GHz chip if your memory subsystem can't provide the data to
feed the chip, and likewise, having 10 GB/sec throughput from your memory
system does no good if you are running a 386! If you're using extremely
large tables (which it sounds like you will be), then memory throughput is
a very critical factor.
As for the 8-way Xeon versus the 2 GHz P4: in total throughput, yes, the
Xeon box should blow away the P4, assuming that you're talking about
overall throughput with multiple connections, and that the 8-way machine
has a memory subsystem that is up to the task. If you're talking about a
single query on a single connection, then the P4 may very well beat the
Xeon. Ideally, your SMP machine should require you to add DIMMs in
groups equal to the number of your processors - in this case, 8 DIMMs at
a time. Otherwise, you simply can't give each chip the full bandwidth.
One interesting solution would be to interleave 4 banks of DDR RAM to
feed 8 Xeons, but I doubt we'll ever see that in production.
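The DIMM arithmetic above is simple division: with every CPU streaming at once, each sees only its share of the interleaved banks' peak bandwidth. A sketch using the PC2100 (DDR-266) peak figure of roughly 2.1 GB/s per bank (illustrative numbers, not measurements):

```python
# Peak per-CPU bandwidth when several CPUs stream from interleaved banks.
PC2100_MB_S = 2100  # one DDR-266 bank, theoretical peak in MB/s

def per_cpu_bandwidth(banks, cpus, bank_mb_s=PC2100_MB_S):
    """Peak MB/s each CPU sees if all CPUs read main memory at once."""
    return banks * bank_mb_s / cpus

print(per_cpu_bandwidth(4, 8))  # 4 banks feeding 8 Xeons
```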
Disk
speed? I want to be able to do large volume selects on tables with
more than 5 million rows and not have the server blink at other
requests put in at the same time.
If you have enough RAM to keep the entire database in disk cache, then
disk speed becomes much less of a factor, if you turn off fsync. Like
I've said plenty of times before (and probably bored everyone to tears),
the lights on our DB machine only blink *occasionally*, even when the
machine is literally being slammed with database activity. A hardware
RAID card with cache on the board and some moderately-decent drives can
give you a VERY fast, responsive disk subsystem.
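Steve's fsync point above comes down to a single postgresql.conf line, though it's worth stressing the trade: with fsync off, an OS crash or power failure can leave the database corrupt, so the battery-backed RAID cache he mentions is the safer way to get a similar latency win.

```
# postgresql.conf -- trades crash safety for write speed
fsync = off     # skip fsync() on writes; unsafe on power loss
```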
In a day or two, I'll be getting the parts to build a dual Athlon MP
1800+ machine, and I'm planning on putting PG on it and testing it against
our 4x700 MHz Xeon machine, to see how it fares. Once I do, I'm planning
on making a full report to the list. If anyone has a certain PG benchmark
that they'd like me to run, let me know.
steve
----- Original Message -----
Perhaps we should think wide, not tall. As the Pontiac commercial says,
wider is better. Build a distributed database. Increasing the height of
a box does not scale; Amdahl proved it. Perhaps someone can help with
some links - I have seen references to it on ha-linux groups.
That's something that I've dreamed about for some time. My rack of
load-balanced web servers scales efficiently, easily, and cheaply. I
need double the capacity? I buy more machines, and plug them in. If
there were a way of replicating PG data from one master to many slaves in
near-real-time, I could have a rack full of load-balanced database servers
right next to it - cheap, easy, and effective.
Even though I've kept my mouth shut, I've wondered why more effort isn't
devoted to that. There are a LOT of companies out there that fork money
over hand-over-fist trying to buy a single machine that can handle all of
their database usage, and as the size of the machine increases, the cost
per transaction seems to increase exponentially. Sure, a million dollars
would get you an entry-level Starfire, with 16 processors, upgradeable to
64, with a couple of gigabytes/second throughput, or for a lot less money,
you could buy a number of smaller systems that, through copious amounts of
RAM, CPU cycles, and combined bandwidth, could spin circles around the
Starfire.
Don't get the impression that I'm bad-mouthing the developers - I'm
sure that they're taking care of priorities as best they can.
steve
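The "increasing height of a box does not scale - Amdahl proved it" claim quoted above refers to Amdahl's law: if a fraction p of the workload parallelizes, n processors can speed it up by at most 1 / ((1 - p) + p/n). A quick sketch:

```python
# Amdahl's law: the serial fraction caps the speedup of a taller box.
def amdahl_speedup(p, n):
    """Max speedup on n processors when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

print(round(amdahl_speedup(0.90, 8), 2))   # -> 4.71
print(round(amdahl_speedup(0.90, 64), 2))  # -> 8.77
```

With even 10% serial work, going from 8 to 64 CPUs doesn't come close to doubling throughput, which is the argument for going wide with many cheap boxes instead.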
On Wed, Feb 06, 2002 at 06:00:33PM -0700, Steve Wolfe wrote:
need double the capacity? I buy more machines, and plug them in. If
there were a way of replicating PG data from one master to many slaves in
near-real-time, I could have a rack full of load-balanced database servers
right next to it - cheap, easy, and effective.
If the idea here is to have only one master (where data goes in) but
many slaves (whence data comes when you read), then check out
eRserver. PostgreSQL Inc's version is better than the one included
in contrib/. We're using it to provide both redundancy and a
secondary source for reads, and it is performing without a problem.
A
--
----
Andrew Sullivan 87 Mowat Avenue
Liberty RMS Toronto, Ontario Canada
<andrew@libertyrms.info> M6K 3E3
+1 416 646 3304 x110
That's something that I've dreamed about for some time. My rack of
load-balanced web servers scales efficiently, easily, and cheaply. I
need double the capacity? I buy more machines, and plug them in. If
there were a way of replicating PG data from one master to many slaves in
near-real-time, I could have a rack full of load-balanced database servers
right next to it - cheap, easy, and effective.
Even though I've kept my mouth shut, I've wondered why more effort isn't
devoted to that.
I'll second this. Replication is my/our number one (actually, aside from
schemas, the only) needed new feature for PostgreSQL. If we could do updates
to a master server and push all those changes to a bunch of slaves, my job
would be so much easier. As it stands, we've had to restructure our data
model so that we can spread information across a number of different
machines.
Don't get the impression that I'm bad-mouthing the developers - I'm
sure that they're taking care of priorities as best they can.
Hear, hear. PostgreSQL seems to have come a long way, even in the relatively
short time I've been using it. I can't heap enough praise on those who give
their time to improving it. But as a user, it's important to let the
developers know what we want and/or need.
Greg
The two-tier architecture proposed here, consisting of a master and
slaves, seems to be the quickest way to get there. However, writes
handled by a slave need to be reported to the master for rebroadcast to
all slaves. This would create a wave in the system, or poor integrity in
a rapid-write environment.
Of course, an "application aware" scenario, where the application
connects to the master for all writes, could ease the situation. It
effectively removes one hop from the data path, i.e. instead of
client-slave-master-slaves, it does client-master-slaves.
And for a do-it-yourselfer, one can have a trigger on a system table for
auto-updates.
What I was thinking is perhaps to modify the "Planner/Optimizer" so that
this layer would, in addition to its current functionality, assume the
role of a distributor, dispatcher, scheduler (or whatever other name),
setting up a "Modified Plan Tree" for the Executor. This "Plan Tree"
would include Node information as well (a Remote Executor, think RPC).
There would also need to be an Intra-Node Advisory Communication Channel
(perhaps more than one) for load balancing and lock management. I call
this an Asymmetric architecture.
Another approach would be to have the "Planner" only do the lock
management (in addition to its normal things) - I call this "Almost
Shared Nothing" - and let DNS determine who is next.
Another approach would be to have a RAID (Redundant Array of Inexpensive
Databases - I know I'm overloading RAID) Virtual Database (think Virtual
File System), where an abstraction layer would determine where to put
the tables: on node-1, on node-2, or on both. Then, either through a
discovery phase (the client asks, "where do I connect for this query?")
or in a transparent mode, the server would pipe up with the owner of the
data for the given query and simply facilitate the data movement.
The more I write, the more I see how complicated this is....
Sorry about my random thoughts.... it reminds me of "Some write to say
what they think and some write to think what they want to say...."
Aaaahhh.... I better get back to work.... day dreaming again...
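One concrete way to read the "RAID of databases" idea above is a thin routing layer that decides which node owns each table. Everything here (the node names, the replicated-table set, the hashing choice) is hypothetical, just to show the shape of such an abstraction layer:

```python
# Map tables to nodes: replicate small hot tables everywhere, hash the
# rest to a single owner so queries can be routed without a lookup table.
import hashlib

NODES = ["node-1", "node-2"]
REPLICATED = {"users"}  # hypothetical: small tables kept on every node

def nodes_for_table(table):
    """Return the node(s) that hold a given table."""
    if table in REPLICATED:
        return list(NODES)
    h = int(hashlib.md5(table.encode()).hexdigest(), 16)
    return [NODES[h % len(NODES)]]

print(nodes_for_table("users"))   # replicated to every node
print(nodes_for_table("orders"))  # hashed to exactly one owner
```

A client (or Medi's modified planner) would consult something like this during the "discovery phase" to learn where to send each query.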
--
-------------------------------------------------------------------------
Medi Montaseri medi@CyberShell.com
Unix Distributed Systems Engineer HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------