Hardware optimising

Started by M Simmsover 26 years ago12 messagesgeneral

Jump to latest

M Simms

grim@argh.demon.co.uk

over 26 years ago

I am about to upgrade a server that is about to be running a large and busy
postgresql database
currently it has

128 MB 100 MHz SDRAM
AMD K6-2/300 CPU
10 GB 7200RPM 9.0ms IBM IDE HDD

It will, over the next few months, as money becomes available, be upgraded to:

256 MB 100 MHz SDRAM
Dual Athlon 500 CPUs
10 GB UltraII Wide SCSI drive

The database will contain several million records and needs to be able to do
very fast selects from tables with a lot of rows, and do small updates and
inserts onto these tables at a good speed also.

Now, as the funds for this upgrade are trickling through slowly, I need to know
where I would see the most performance increase initially. Which of the three
peices would I be best to upgrade first to see the best performance increase. I
am leaning towards the fast scsi, as there is a LOT of data transfer.

Any insights from people that already run big databases, do we need memory, CPU
or fast disc most?

Thanx
M Simms
--
#define z(x,y) for (x=0;x<y;x++){
main(){long int i[3]={1214606444,1864390511,1919706122};int x,y;
z(x,3)z(y,4)putchar((i[x]>>((3-y)<<3))&(255));
}}}

Roderick A. Anderson

raanders@altoplanos.net

over 26 years ago

In reply to: M Simms (#1)

Re: [GENERAL] Hardware optimising

On Thu, 26 Aug 1999, Michael wrote:

Hi

I am about to upgrade a server that is about to be running a large and busy
postgresql database
currently it has

128 MB 100 MHz SDRAM
AMD K6-2/300 CPU
10 GB 7200RPM 9.0ms IBM IDE HDD

From experience I'd say the biggest performance increase would be to get

away from the IDE HD. It is also the first or second most expensive
portion of your plan.

RAM will help but not that much when it comes to disk accesses. It should
provide a certain amount of cache for the disk subsystem but teh bottle
neck will still be the the IDE interface. (Though I've seen some IDE
controllers with onboard cache.)

It will, over the next few months, as money becomes available, be upgraded to:

Depending on how fast the money trickles in you'd get some help cheapest
with the RAM, then the SCSI, then the MB and processors.

Rod
--
Roderick A. Anderson
raanders@altoplanos.net Altoplanos Information Systems, Inc.
Voice: 208.765.6149 212 S. 11th Street, Suite 5
FAX: 208.664.5299 Coeur d'Alene, ID 83814

Bruce Momjian

bruce@momjian.us

over 26 years ago

In reply to: M Simms (#1)

Re: [GENERAL] Hardware optimising

Hi

I am about to upgrade a server that is about to be running a large and busy
postgresql database
currently it has

128 MB 100 MHz SDRAM
AMD K6-2/300 CPU
10 GB 7200RPM 9.0ms IBM IDE HDD

It will, over the next few months, as money becomes available, be upgraded to:

256 MB 100 MHz SDRAM
Dual Athlon 500 CPUs
10 GB UltraII Wide SCSI drive

SCSI harddrive, clearly, should be the first change. Then cpu, then
memory, though it is tough to say. If you are doing any swapping,
memory first.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Aaron Seigo

aaron@gtv.ca

over 26 years ago

In reply to: M Simms (#1)

Re: [GENERAL] Hardware optimising

hi...

128 MB 100 MHz SDRAM
AMD K6-2/300 CPU
10 GB 7200RPM 9.0ms IBM IDE HDD

It will, over the next few months, as money becomes available, be upgraded to:

256 MB 100 MHz SDRAM
Dual Athlon 500 CPUs
10 GB UltraII Wide SCSI drive

The database will contain several million records and needs to be able to do
very fast selects from tables with a lot of rows, and do small updates and
inserts onto these tables at a good speed also.

seems there are a lot of opinions on this one floating around, and not
alot of explanations to go along with them. =)

in my experience, it really depends on what you are doing. if you are
going to be doing the same selects alot, then extra RAM will help a lot.
just be sure to set the buffers when starting postmaster to be
relatively high (i.e. a few thousand). also, since you are doing few
inserts, i'd turn off f-sync for greater speed and less disk access
(which with your current IDE will be expensive (time wise) at best).
this will run the risk of losing inserts if the machine crashes, but if
you are using a stable OS (i.e. not NT) then you'll probably be just
fine if the inserts are few compared to selects.

also, if you are going to be doing a lot of pre- and post-processing of
the data (i.e. grabbing bits of data based on a algorythm (sp) or
getting bits of data and massaging them about a lot (i.e. creating
graphs, lay-out, doing analysis, etc)) then RAM will also see a boost as
you will be able to do these in memory, allowing the database the disk
more to itself...

the SCSI drive will see an increase in speed to be sure! in fact, i'd
suggest giving the database the drive all to itself for data... leave
everything else on the IDE drive (OS, database engine, etc) and format
the SCSI drive with large i-node blocks (i.e. 1MB) and just let the data
reside on the SCSI disk. besides gaining the speed of the disk you'll
also allow the rest of the system to stay the hell out of the way of
that disk intensive database! =) you'll probably see a tremendous
increase in speed doing it this way (large inodes, only database data)
than if you just simply replace the IDE with the SCSI drive...

of course, as time gos on, if you use a mirroring raid array by adding
another disk, you'll see even more speed increase. other RAIDs, while
preserving your data, will result in slow downs.. though it will still
be faster than an IDE drive with everything on it. RAID 5 is cheaper
(more out of your disk space) but will be a bit slower than a single
disk system or a RAID 5... but RAID 5 is a nice way to go...

however, if you do go RAID, DO NOT use software RAID. why? well... it
negates some of the fail-safe power of the RAID (although if well set up
you can render this moot) but more importantly it will drag
significantly on your processors (~10% or so is common)

as for the processor, this will see an increase, of course. note,
however, that since PostgreSQL is _not_ multithreaded, that it will run
only on one of the processors. (i'm about to assume you are using linux
here... 'scuse me if i'm wrong) however, the good news is that you can
encourage linux (through the scheduler) to run postgres on one of the
processors and everything else on the other one. this should give the
database its own processor more oft than not. things may still drift,
etc... but it will be better this way....

the processor boost will be important, again, if you are doing lots of
pre/post-processing of data. it will also see an improvement if you are
offering other services (i.e. WWW) on the machine (which i'm guessing
you are). this will require a kernel recompile and some muckin' about to
get it all running as quickly/smoothly/efficiently as possible.

this is another side of things to look at:

RAM is quick and instant. power down, slap in some more ram, power up.
fast down time.

DISK upgrade will take more time. i.e. formatting; setting it in your
FSTAB, etc; changing your start up scripts to tell postgres where the
data is now; copying things to the new disk, etc... this will result in
some fairly good down time. the installation can be done quickly (if
well thought out, i.e. pre-format the drive, etc...) and the rest can be
done while online. although you'll want to shut the database down while
copying data files. the longer you wait on this one, the longer your
down time will be (more data to copy, etc...)

CPU upgrade will require downtime to install (not nearly as fast or easy
as RAM).. then kernel recompiling.. then testing of the new kernel...
then tweaking the system. probably resulting in even more down time than
with the disk upgrade.

i'm guessing that while the system is new, you'll probably be more
agreeable to longer downtimes. so perhaps the disk upgrade would be
better earlier on in that it will probably give you the best improvement
while absorbing down time impact early in on the venture when it might
not be noticed so much (i'm guessing here again as to the nature of your
usage... assuming the demands on the system will start out smallish and
grow as time gos on...)

second, i'd do the RAM upgrade. cheap, fast, good improvements.

third, i'd tackle the CPUs (tricky, not as cheap, fair amount of
downtime...)

but that's just me. and those who know me know that i'm often off to one
side of the field. usually looking at the clouds, in fact. =)

as a bit of last advice, SysAdmin magazine, numerous O'Reilly books, and
many online FAQs and HOW-TOs give some really good advice on these sorts
of issues. these aren't really database related as much as they are
systems administration questions and are applicable to most high-demand
services. get the books/mags, read incessently, keep up to date on
what's what, etc, etc and your new service can only stand to benefit.

Aaron Seigo
somebody who types a lot during the course of a day.

Bruce Momjian

bruce@momjian.us

over 26 years ago

In reply to: Aaron Seigo (#4)

Re: [GENERAL] Hardware optimising

as for the processor, this will see an increase, of course. note,
however, that since PostgreSQL is _not_ multithreaded, that it will run
only on one of the processors. (i'm about to assume you are using linux
here... 'scuse me if i'm wrong) however, the good news is that you can
encourage linux (through the scheduler) to run postgres on one of the
processors and everything else on the other one. this should give the
database its own processor more oft than not. things may still drift,
etc... but it will be better this way....

Different backends can use different CPU's, no problem.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Andy Lewis

alewis@roundnoon.com

over 26 years ago

In reply to: Bruce Momjian (#5)

Re: [GENERAL] Hardware optimising

What scheduler are we speaking of here?

Andy

On Thu, 26 Aug 1999, Bruce Momjian wrote:

Show quoted text

as for the processor, this will see an increase, of course. note,
however, that since PostgreSQL is _not_ multithreaded, that it will run
only on one of the processors. (i'm about to assume you are using linux
here... 'scuse me if i'm wrong) however, the good news is that you can
encourage linux (through the scheduler) to run postgres on one of the
processors and everything else on the other one. this should give the
database its own processor more oft than not. things may still drift,
etc... but it will be better this way....

Different backends can use different CPU's, no problem.
-- 
Bruce Momjian                        |  http://www.op.net/~candle
maillist@candle.pha.pa.us            |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
************

Bruce Momjian

bruce@momjian.us

over 26 years ago

In reply to: Andy Lewis (#6)

Re: [GENERAL] Hardware optimising

What scheduler are we speaking of here?

Andy

On Thu, 26 Aug 1999, Bruce Momjian wrote:

as for the processor, this will see an increase, of course. note,
however, that since PostgreSQL is _not_ multithreaded, that it will run
only on one of the processors. (i'm about to assume you are using linux
here... 'scuse me if i'm wrong) however, the good news is that you can
encourage linux (through the scheduler) to run postgres on one of the
processors and everything else on the other one. this should give the
database its own processor more oft than not. things may still drift,
etc... but it will be better this way....

Different backends can use different CPU's, no problem.

Each backend is a different process, so they can run at the same time on
multiple cpu's. Any OS that can handle multiple cpu's can handle
PostgreSQL running multiple backends at the same time.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Marcus Mascari

mascarim@yahoo.com

over 26 years ago

In reply to: Bruce Momjian (#7)

Re: [GENERAL] Hardware optimising

--- Andy Lewis <alewis@roundnoon.com> wrote:

What scheduler are we speaking of here?

Andy

On Thu, 26 Aug 1999, Bruce Momjian wrote:

as for the processor, this will see an increase,

of course. note,

however, that since PostgreSQL is _not_

multithreaded, that it will run

only on one of the processors. (i'm about to

assume you are using linux

here... 'scuse me if i'm wrong) however, the

good news is that you can

encourage linux (through the scheduler) to run

postgres on one of the

processors and everything else on the other one.

this should give the

database its own processor more oft than not.

things may still drift,

etc... but it will be better this way....

Different backends can use different CPU's, no

problem.

--
Bruce Momjian

Bruce, of course, is, as always, absolutely correct.
Each connection to the backend starts a postgres
process which will be assigned to either CPU by
Linux. I have read (either somewhere in the SMP FAQ
or some mailing list) that there are utilities
forthcoming (this was awhile ago) to assign a process
to a specific CPU. There are several advantages to
having a multithreaded backend instead of a
multitasking backend since connections would be
faster, no need for shared memory segments, etc.,
but use of multiple processors is not exclusive to
multithreading applications. Any application which
forks() or execs() another can take advantage of
multiple processors. And there are disadvantages to
multithreading too as pointed out in
previous threads (no pun intended), such as stability
of the running process if one of its threads dies
abnormally.

With regard to the original post, I again, agree
fully with Bruce - SCSI first. And spend an extra
couple hundred to get the 80MBs variety, dual channel
controllers; its worth it. Hopefully one would also
be able to optimize the disk configuration as well.
We run RedHat Linux 2.0.36 on a Dual 450Mhz deskside
server with 256M of RAM. The only regret I have is
we didn't get the 80MBs (we got 40MBs) controller
and (6) 4 Gig hard drives. Instead we got (2) 9 Gig
drives. This forces us to only run RAID 1.
For only a few hundred more, we could have run
RAID 0+1 on dual channels (with each mirror on the
other channel). We also put the database on the second
innermost partition, with the outer being swap.

Finally, if you are using Linux and choose to go
the SMP route, I highly recommend the newer 2.2
kernels. We saw dramatic improvement in speed over
2.0.36 vs. 2.2.x in our testing environment. In fact,
to enable SMP on a 2.0.36 kernel, you must modify the
top-level Makefile for the kernel and rebuild.

Anyways,

Hope that helps,

Mike Mascari
(mascarim@yahoo.com)

P.S. From previous posts, I'm starting to think that
there is a VAST misconception that a single-threaded
database engine (which is what Oracle was until some
version 7 releases, I believe, called Oracle MTS
appeared) can only handle ONE query at a time, and
does
not exec() a child process for each connection.
Someone ought to start the propoganda of claiming
multi-threaded DBMS as "single process" servers.

__________________________________________________
Do You Yahoo!?
Bid and sell for free at http://auctions.yahoo.com

Import Notes

Resolved by subject fallback

Andy Lewis

alewis@roundnoon.com

over 26 years ago

In reply to: Marcus Mascari (#8)

Re: [GENERAL] Hardware optimising

Thanks for the info!

Much appreciated!

Andy

On Thu, 26 Aug 1999, Mike Mascari wrote:

Show quoted text

--- Andy Lewis <alewis@roundnoon.com> wrote:
What scheduler are we speaking of here?

Andy

On Thu, 26 Aug 1999, Bruce Momjian wrote:

as for the processor, this will see an increase,

of course. note,

however, that since PostgreSQL is _not_

multithreaded, that it will run

only on one of the processors. (i'm about to

assume you are using linux

here... 'scuse me if i'm wrong) however, the

good news is that you can

encourage linux (through the scheduler) to run

postgres on one of the

processors and everything else on the other one.

this should give the

database its own processor more oft than not.

things may still drift,

etc... but it will be better this way....

Different backends can use different CPU's, no

problem.

--
Bruce Momjian

Bruce, of course, is, as always, absolutely correct.
Each connection to the backend starts a postgres
process which will be assigned to either CPU by
Linux. I have read (either somewhere in the SMP FAQ
or some mailing list) that there are utilities
forthcoming (this was awhile ago) to assign a process
to a specific CPU. There are several advantages to
having a multithreaded backend instead of a
multitasking backend since connections would be
faster, no need for shared memory segments, etc.,
but use of multiple processors is not exclusive to
multithreading applications. Any application which
forks() or execs() another can take advantage of
multiple processors. And there are disadvantages to
multithreading too as pointed out in
previous threads (no pun intended), such as stability
of the running process if one of its threads dies
abnormally.

With regard to the original post, I again, agree
fully with Bruce - SCSI first. And spend an extra
couple hundred to get the 80MBs variety, dual channel
controllers; its worth it. Hopefully one would also
be able to optimize the disk configuration as well.
We run RedHat Linux 2.0.36 on a Dual 450Mhz deskside
server with 256M of RAM. The only regret I have is
we didn't get the 80MBs (we got 40MBs) controller
and (6) 4 Gig hard drives. Instead we got (2) 9 Gig
drives. This forces us to only run RAID 1.
For only a few hundred more, we could have run
RAID 0+1 on dual channels (with each mirror on the
other channel). We also put the database on the second
innermost partition, with the outer being swap.

Finally, if you are using Linux and choose to go
the SMP route, I highly recommend the newer 2.2
kernels. We saw dramatic improvement in speed over
2.0.36 vs. 2.2.x in our testing environment. In fact,
to enable SMP on a 2.0.36 kernel, you must modify the
top-level Makefile for the kernel and rebuild.

Anyways,

Hope that helps,

Mike Mascari
(mascarim@yahoo.com)

P.S. From previous posts, I'm starting to think that
there is a VAST misconception that a single-threaded
database engine (which is what Oracle was until some
version 7 releases, I believe, called Oracle MTS
appeared) can only handle ONE query at a time, and
does
not exec() a child process for each connection.
Someone ought to start the propoganda of claiming
multi-threaded DBMS as "single process" servers.

__________________________________________________
Do You Yahoo!?
Bid and sell for free at http://auctions.yahoo.com

************

#10

Bruce Momjian

bruce@momjian.us

over 26 years ago

In reply to: Marcus Mascari (#8)

Re: [GENERAL] Hardware optimising

P.S. From previous posts, I'm starting to think that
there is a VAST misconception that a single-threaded
database engine (which is what Oracle was until some
version 7 releases, I believe, called Oracle MTS
appeared) can only handle ONE query at a time, and
does
not exec() a child process for each connection.
Someone ought to start the propoganda of claiming
multi-threaded DBMS as "single process" servers.

Yes, I am totally unsure how this gets confused by people. I am going
to put it int the FAQ.

Yes, and I agree that most multi-threaded DBMS are "single process",
which can't make use if multiple cpus, except on some very special OS's
that allow threads to move between cpus, sometimes called kernel
threads, I think, but I am not sure on that.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#11

Bruce Momjian

bruce@momjian.us

over 26 years ago

In reply to: Bruce Momjian (#10)

Re: [GENERAL] Hardware optimising

P.S. From previous posts, I'm starting to think that
there is a VAST misconception that a single-threaded
database engine (which is what Oracle was until some
version 7 releases, I believe, called Oracle MTS
appeared) can only handle ONE query at a time, and
does
not exec() a child process for each connection.
Someone ought to start the propoganda of claiming
multi-threaded DBMS as "single process" servers.

Yes, I am totally unsure how this gets confused by people. I am going
to put it int the FAQ.

Yes, and I agree that most multi-threaded DBMS are "single process",
which can't make use if multiple cpus, except on some very special OS's
that allow threads to move between cpus, sometimes called kernel
threads, I think, but I am not sure on that.

I have added this to the FAQ:

We handle each user connection by creating a Unix process. Backend
processes share data buffers and locking information. With multiple
CPU's, multiple backends can easily run on different CPU's. Unlike
PostgreSQL, databases that use process threads have multiple users
connections running in the same Unix process. This makes it difficult
for them to utlilize multiple CPU's.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Import Notes

Reply to msg id not found: FromenvmaillistatAug261999100612pm | Resolved by subject fallback

#12

Dimitri

dimitri@France.Sun.COM

over 26 years ago

In reply to: Bruce Momjian (#11)

Re: [GENERAL] Hardware optimising

Hi Bruce!

I think you are not right with remark about threads...
At least with Sun Solaris you will be much faster with multithreaded
process vs multi processes.
Why? Because Solaris is real multithreaded kernel, so it sees only
threads in the kernel level.
"Process" is a user view for running task, and 1 process = at least 1
thread.
Why multithreaded process wil be faster than multi processes? - Because
switching between threads will cost much less than between processes, so
you wil have more CPU time to continue your job.
And, of course, all threads can use any CPU, so you win more with more
CPU(s), but even with ONE CPU on your machine you will be faster with 10
threads than with 10 processes...

Just my 0.02c about Solaris :)

Best regards!
Dimitri

Bruce Momjian wrote:

P.S. From previous posts, I'm starting to think that
there is a VAST misconception that a single-threaded
database engine (which is what Oracle was until some
version 7 releases, I believe, called Oracle MTS
appeared) can only handle ONE query at a time, and
does
not exec() a child process for each connection.
Someone ought to start the propoganda of claiming
multi-threaded DBMS as "single process" servers.

Yes, I am totally unsure how this gets confused by people. I am going
to put it int the FAQ.

Yes, and I agree that most multi-threaded DBMS are "single process",
which can't make use if multiple cpus, except on some very special OS's
that allow threads to move between cpus, sometimes called kernel
threads, I think, but I am not sure on that.

I have added this to the FAQ:

We handle each user connection by creating a Unix process. Backend
processes share data buffers and locking information. With multiple
CPU's, multiple backends can easily run on different CPU's. Unlike
PostgreSQL, databases that use process threads have multiple users
connections running in the same Unix process. This makes it difficult
for them to utlilize multiple CPU's.
--
Bruce Momjian                        |  http://www.op.net/~candle
maillist@candle.pha.pa.us            |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
************

--
=====================================================
Dimitri KRAVTCHUK (dim) Sun Microsystems
Benchmark Engineer France
dimitri@France.Sun.COM
=====================================================