Scalability

Started by Jan Ploski · about 25 years ago · 4 messages · general
#1 Jan Ploski
jpljpl@gmx.de

Hello,

When large databases, complex queries and frequent access are combined,
the database backend may become a performance bottleneck. This is quite
obvious.

Do I assume correctly that CPU will be the earliest limiting factor
for a database (if we have a reasonable amount of memory in the box)?

If yes, one could deal with it by switching to a more powerful machine,
or one could distribute the database across several equivalent machines,
as I imagine. The latter makes sense if the data can be partitioned so that
consistency and dependencies can be worked around; that is, I can move
two subsets of data (say A and B) to two different boxes if rows in these
subsets are very unlikely to be combined in a single query. Then I can have
my application code access the correct database instance based on some
simple rule, for example, users with IDs 0-1000000 have their messages
stored on box A and all others on box B. Or: articles for newsgroups
with hash code == 0 -> box A, others -> box B. Do we get linear scalability
that way? I guess so. Does it sound cool? Yes. But does it cost less than
upgrading a single server? I'm not sure.
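
In application code, that routing rule could be as simple as the following
rough sketch (the psycopg2 driver, host names, and table layout here are
only placeholders for illustration):

import psycopg2

BOXES = {
    "box_a": "host=box_a dbname=messages user=app",
    "box_b": "host=box_b dbname=messages user=app",
}

def box_for_user(user_id):
    # The simple rule from above: users 0-1000000 on box A, the rest on box B.
    return "box_a" if user_id <= 1000000 else "box_b"

def fetch_messages(user_id):
    # Connect to whichever box holds this user's rows and query it there.
    conn = psycopg2.connect(BOXES[box_for_user(user_id)])
    try:
        cur = conn.cursor()
        cur.execute("SELECT id, subject FROM messages WHERE user_id = %s",
                    (user_id,))
        return cur.fetchall()
    finally:
        conn.close()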

Is anyone out there using such a load balancing scheme with PostgreSQL?
Are there any foreseeable problems which would make it impractical?

Thanks -
JPL

#2 Gordan Bobic
gordan@freeuk.com
In reply to: Jan Ploski (#1)
Re: Scalability

(Frequent Access)
If you just have lots of queries in parallel, try replication, and
pick a random server for each connection.
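
Something along these lines, for example (the psycopg2 driver and the host
names are just placeholders; any client library would work the same way):

import random
import psycopg2

REPLICAS = ["pg1.example.com", "pg2.example.com", "pg3.example.com"]

def connect_to_random_replica():
    # Each new client connection lands on one of the read replicas at random;
    # writes would still go to the single master and be replicated out.
    host = random.choice(REPLICAS)
    return psycopg2.connect(host=host, dbname="appdb", user="app")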

(Complex Queries)
If you absolutely, positively need one query to be executed across all
nodes in the cluster because one machine would just take too long no
matter how big, then you may have to shell out for Oracle...

(Splitting the Data)
Alternatively, you could put separate tables on separate machines, or
wait until the "schema" gets advanced enough to let you do full
cross-database SQL queries... No idea how far off this is in
PostgreSQL...
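
Until then, the per-table split has to happen on the application side; a
rough sketch (host names and the table assignments are purely illustrative):

import psycopg2

TABLE_LOCATION = {
    "users":    "host=box_a dbname=appdb user=app",
    "messages": "host=box_a dbname=appdb user=app",
    "articles": "host=box_b dbname=appdb user=app",
}

def connection_for(table):
    # Open a connection to the machine that holds the given table.
    return psycopg2.connect(TABLE_LOCATION[table])

# Note that a query joining tables on different boxes cannot be written as a
# single SQL statement this way; the application has to fetch from each box
# and combine the rows itself.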

Personally, I find that CPU is rarely the bottleneck. It is usually
either lack of memory or the speed of the disks...

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Ploski (#1)
Re: Scalability

Jan Ploski <jpljpl@gmx.de> writes:

> Do I assume correctly that CPU will be the earliest limiting factor
> for a database (if we have a reasonable amount of memory in the box)?

Depends on your query mix. If you have a lot of updates then I'd
recommend spending plenty of money on fast disk drives and good disk
controller boards. If it's mostly SELECTs then you don't need such fast
drives, as long as you have enough RAM to cache a good chunk of the
active database. In that case the CPU probably would be the limiting
factor.

> Is anyone out there using such a load balancing scheme with PostgreSQL?

Hardware is so cheap these days that I'd think it makes more sense just
to throw more hardware at the problem... I think client-driven load
balancing will be a pain in the neck to maintain. (There has been some
talk of actual server support for load balancing, but it's still a long
way off.)

regards, tom lane

#4 Gordon A. Runkle
gar@no-spam-integrated-dynamics.com
In reply to: Gordan Bobic (#2)
Re: Scalability

In article <006001c0ac91$4ebe3ea0$800010ac@localdomain>, "Gordan Bobic"
<gordan@freeuk.com> wrote:

> (Complex Queries)
> If you absolutely, positively need one query to be executed across all
> nodes in the cluster because one machine would just take too long no
> matter how big, then you may have to shell out for Oracle...

or DB2 EEE (Enterprise Extended Edition). It's cheaper
than Oracle, too.

Gordon.
--
It doesn't get any easier, you just go faster.
-- Greg LeMond