Redundant file server for postgres

Started by Bob Powell · March 2008 · 5 messages · general
#1 Bob Powell
Bob@hotchkiss.org

To whom it may concern,

I'm looking for a file server that will give me a high level of
redundancy and high performance for a postgres database. The server
will be running only postgres as a backend service, connected to a front
end server with the application on it.

I was thinking along the lines of a RAID 10 setup with a very fast
processor.

Any suggestions would be greatly appreciated.

Bob Powell

The Hotchkiss School

#2 Karl Denninger
karl@denninger.net
In reply to: Bob Powell (#1)
Re: Redundant file server for postgres

What's the expected transaction split (read/write)?

If mostly READs (e.g. SELECTs), then it's very, very hard to do better
from a performance perspective than RAID 1 with the transaction log on a
separate array (physically separate spindles).
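
In case it helps, the usual way to get the transaction log onto its own
spindles on an 8.x-era cluster is to relocate the pg_xlog directory. A
rough sketch (the mount points are invented, and the server must be
stopped first):

    # move the WAL onto its own array and symlink it back
    pg_ctl -D /raid10/pgdata stop
    mv /raid10/pgdata/pg_xlog /raid1-wal/pg_xlog
    ln -s /raid1-wal/pg_xlog /raid10/pgdata/pg_xlog
    pg_ctl -D /raid10/pgdata start

    # or, for a fresh cluster on 8.3+, initdb can place it directly
    initdb -D /raid10/pgdata --xlogdir=/raid1-wal/pg_xlog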

I run a VERY busy web forum on a quad-core Intel box with this setup and
it is very fast. Really quite amazing when you get right down to it.

The latest release of the PostgreSQL code markedly improved query
optimization, by the way. The performance improvement when I migrated
over was quite stunning.

Karl Denninger (karl@denninger.net)
http://www.denninger.net

#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Bob Powell (#1)
Re: Redundant file server for postgres

Robert Powell wrote:

> To whom it may concern,
>
> I'm looking for a file server that will give me a high level of
> redundancy and high performance for a postgres database.

For strong redundancy and availability you may need a secondary server
and some sort of replication setup (be it a WAL-following warm spare,
Slony-I, or whatever). It depends on what you mean by "high".
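
For the WAL-following warm spare flavor, the 8.3-era mechanics look
roughly like this (a sketch only; the hostname and paths are invented):

    # postgresql.conf on the primary: ship each completed WAL segment
    archive_mode = on
    archive_command = 'rsync %p standby:/wal-archive/%f'

    # recovery.conf on the standby: replay segments as they arrive
    restore_command = 'pg_standby /wal-archive %f %p'

pg_standby ships in contrib and simply waits for the next segment to
appear, keeping the standby in continuous recovery until you trigger
failover.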

As for performance - I'm still learning about this myself, so treat the
following as being of questionable accuracy.

As far as I know the general rule for databases is "if in doubt, add
more fast disks". A fast CPU (or depending on type of workload several
almost-as-fast CPUs) will be nice, but if your database is big enough
not to fit mostly in RAM you'll mostly be limited by disk I/O. To
increase disk I/O performance, in general you want more disks. Faster
disks will help, but probably not as much as just having more of them.

More RAM is of course also nice, but might make a huge difference for
some workloads and database types and relatively little for others. If
doubling your RAM lets the server cache most of the database in RAM
it'll probably speed things up a lot. If doubling the RAM is the
difference between 2% and 4% of the DB in RAM ... it might not make such
a difference (unless, of course, your queries mostly operate on a subset
of your data that's roughly comparable in size to your RAM, you do lots
of big joins, etc.).
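
As a rough illustration of letting the server actually use that RAM, the
two knobs that matter most are below (the box size and numbers are
invented; they need tuning per workload):

    # postgresql.conf on a hypothetical 8 GB machine
    shared_buffers = 2GB         # a common starting point is ~25% of RAM
    effective_cache_size = 6GB   # planner's estimate of OS cache plus
                                 # shared_buffers; it affects plans only

effective_cache_size doesn't allocate anything; it just tells the
planner how much caching to assume when costing index scans.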

Various RAID types also have implications for disk I/O. For example,
RAID-5 tends to have miserable write performance.

In the end, though, it depends a huge amount on your workload. Will you
have huge numbers of simpler concurrent transactions, or relatively few
heavy and complex ones? Will the database be read-mostly, or will it be
written to very heavily? Roughly how large is your expected dataset? Is
all the data likely to be accessed with equal frequency or are most
queries likely to concentrate on a small subset of the data? And so on...

--
Craig Ringer

#4 Karl Denninger
karl@denninger.net
In reply to: Craig Ringer (#3)
Re: Redundant file server for postgres

Craig Ringer wrote:

> More RAM is of course also nice, but might make a huge difference for
> some workloads and database types and relatively little for others.

The key issue on RAM is not whether the database will fit into RAM (for
all but the most trivial applications, it will not).

It is whether the key INDICES will fit into RAM. If they will, then you
get a HUGE win in performance.
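
A quick way to sanity-check this is to total the on-disk size of the
indexes and compare it against RAM; a sketch using the standard size
functions:

    -- total size of all user indexes in the current database
    SELECT pg_size_pretty(sum(pg_relation_size(indexrelid))::bigint)
      FROM pg_stat_user_indexes;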

If not, then it is all about disk I/O performance: the better you can
spread that load across multiple spindles and get the data to the CPU at
a high rate, the faster the system will perform.

In terms of redundancy you have to know your workload before designing a
strategy. For a database that is almost all queries (few
inserts/updates) the job is considerably simpler than a database that
sees very frequent inserts and/or updates.

Karl Denninger (karl@denninger.net)
http://www.denninger.net

#5 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Karl Denninger (#4)
Re: Redundant file server for postgres

On Sun, Mar 16, 2008 at 1:02 PM, Karl Denninger <karl@denninger.net> wrote:

> The key issue on RAM is not whether the database will fit into RAM (for
> all but the most trivial applications, it will not).

I would argue that many applications where the data fits into memory
are not trivial, especially if we're talking about the working set. If
you operate on 1 GB working sets out of a terabyte-range reporting
database, then your data fits (or damned well should :) ) into memory.

Also, many applications with small datasets can be quite complex, like
control systems. The actual amount of data might be 100 MB, but the
throughput might be very high and require a battery-backed cache because
of all the writes going in.

So there are plenty of times your data will fit in memory.

> It is whether the key INDICES will fit into RAM. If they will, then you
> get a HUGE win in performance.

When they don't, you often need to start looking at some form of
partitioning if you want to keep good performance. By partitioning I
don't just mean using inherited tables; it could include things like
horizontal partitioning of data across different pg servers.
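
To make the inherited-tables flavor concrete, here's a minimal sketch
(the table and column names are made up); with constraint_exclusion = on
the planner can skip child tables whose CHECK constraints rule them out:

    -- parent table plus one child holding a single month of data
    CREATE TABLE measurements (ts timestamptz NOT NULL, val numeric);
    CREATE TABLE measurements_2008_03 (
        CHECK (ts >= '2008-03-01' AND ts < '2008-04-01')
    ) INHERITS (measurements);

A query against measurements with a WHERE clause on ts then only touches
the children whose constraints can match.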

Note that I'm not disagreeing with anything you said; this is just a
slight clarification on data sets that do / don't fit into memory.