Re: Postgresql and Software RAID/LVM

Started by Marty Scholes almost 21 years ago, 5 messages, general
#1 Marty Scholes
marty@outputservices.com

Has anyone run Postgres with software RAID or LVM on a production box?
What has been your experience?

Yes, we have run Pg for a couple of years with software LVM (mirroring)
against two hardware RAID5 arrays. We host a production Sun box that
runs 24/7.

My experience:
* Software RAID (other than mirroring) is a disaster waiting to happen.
If the metadata for the RAID set gives out for any reason (CMOS
scrambles, card dies, power spike, etc.) then you are hosed beyond
belief. In most cases it is almost impossible to recover. With
mirroring, however, you can always boot and operate on a single mirror,
pretending that no LVM/RAID is underway. In other words, each mirror is
a fully functional copy of the data which will operate your server.

* Hardware RAID5 is a terrific way to boost performance via write
caching and spreading I/O across multiple spindles. Each of our
external arrays operates 14 drives (12 data, 1 parity and 1 hot spare).
While RAID5 protects against single spindle failure, it will not hedge
against multiple failures in a short time period, SCSI controller
failure, SCSI cable problems or even wholesale failure of the RAID
controller. All of these things happen in a 24/7 operation. Using
software RAID1 against the hardware RAID5 arrays hedges against any
single failure.

* Software mirroring gives you tremendous ability to change the system
while it is running, by taking offline the mirror you wish to change and
then synchronizing it after the change.
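
The single-failure limit of RAID5 mentioned above can be illustrated with a toy XOR-parity model. This is only a sketch of the general parity idea, not how any particular controller works; the block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per spindle, plus one parity block.
data = [b"pgdata01", b"pgdata02", b"pgdata03"]
parity = xor_blocks(data)

# Single failure: the block on spindle 1 is lost. XOR of the survivors
# and the parity block rebuilds it exactly.
rebuilt = xor_blocks([data[0], data[2], parity])

# Double failure: spindles 1 and 2 are lost. All that can be computed is
# the XOR of the two missing blocks, not either block on its own.
leftover = xor_blocks([data[0], parity])
```

Here `rebuilt` equals the lost block, while `leftover` is only the XOR of the two missing blocks, which is why a second failure before the rebuild completes loses data.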

On a fully operational production server, we have:
* restriped the RAID5 array
* replaced all RAID5 media with higher capacity drives
* upgraded RAID5 controller
* moved all data from an old RAID5 array to a newer one
* replaced host SCSI controller
* uncabled and physically moved storage to a different part of data center

Again, all of this has taken place (over the years) while our machine
was fully operational.

#2 John Arbash Meinel
john@arbash-meinel.com
In reply to: Marty Scholes (#1)

Marty Scholes wrote:

Has anyone run Postgres with software RAID or LVM on a production box?
What has been your experience?

Yes, we have run Pg for a couple of years with software LVM (mirroring)
against two hardware RAID5 arrays. We host a production Sun box that
runs 24/7.

My experience:
* Software RAID (other than mirroring) is a disaster waiting to happen.
If the metadata for the RAID set gives out for any reason (CMOS
scrambles, card dies, power spike, etc.) then you are hosed beyond
belief. In most cases it is almost impossible to recover. With
mirroring, however, you can always boot and operate on a single mirror,
pretending that no LVM/RAID is underway. In other words, each mirror is
a fully functional copy of the data which will operate your server.

Isn't this actually more of a problem for the meta-data to give out in a
hardware situation? I mean, if the card you are using dies, you can't
just get another one.
With software raid, because the meta-data is on the drives, you can pull
it out of that machine, and put it into any machine that has a
controller which can read the drives, and a similar kernel, and you are
back up and running.

* Hardware RAID5 is a terrific way to boost performance via write
caching and spreading I/O across multiple spindles. Each of our
external arrays operates 14 drives (12 data, 1 parity and 1 hot spare).
While RAID5 protects against single spindle failure, it will not hedge
against multiple failures in a short time period, SCSI controller
failure, SCSI cable problems or even wholesale failure of the RAID
controller. All of these things happen in a 24/7 operation. Using
software RAID1 against the hardware RAID5 arrays hedges against any
single failure.

No, it hedges against *more* than one failure. But you can also do a
RAID1 over a RAID5 in software. But if you are honestly willing to
create a full RAID1, just create a RAID1 over RAID0. The performance is
much better. And since you have a full RAID1, as long as both drives of
a pairing don't give out, you can lose half of your drives.
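
How many failures a mirrored stripe survives depends on how the mirroring is arranged. The sketch below enumerates failure combinations for two hypothetical layouts: a stripe built from mirrored pairs, and a mirror built from two whole stripes where one dead drive is assumed to take its entire stripe offline. Drive numbering and both layout rules are assumptions for illustration, not a description of any specific product.

```python
from itertools import combinations

def survives_stripe_of_mirrors(failed, pairs):
    """Array lives unless both drives of some mirrored pair have failed."""
    return all(not (a in failed and b in failed) for a, b in pairs)

def survives_mirror_of_stripes(failed, side_a, side_b):
    """Array lives if at least one whole stripe has no failed drive."""
    return not (failed & side_a) or not (failed & side_b)

def count_survivable(n_per_side, k):
    """Return (total, stripe_of_mirrors_ok, mirror_of_stripes_ok) for k failures."""
    side_a = frozenset(range(n_per_side))
    side_b = frozenset(range(n_per_side, 2 * n_per_side))
    pairs = [(i, i + n_per_side) for i in range(n_per_side)]
    total = som = mos = 0
    for combo in combinations(range(2 * n_per_side), k):
        failed = frozenset(combo)
        total += 1
        som += survives_stripe_of_mirrors(failed, pairs)
        mos += survives_mirror_of_stripes(failed, side_a, side_b)
    return total, som, mos

# Six drives (three per side), two simultaneous failures.
two_failures = count_survivable(3, 2)
```

Under these assumptions, `two_failures` is `(15, 12, 6)`: of the 15 possible two-drive failures, the paired layout survives 12 and the whole-stripe layout survives 6. The "lose half of your drives" case only holds when the failures land in the right places.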

If you want the space, but you feel that RAID5 isn't redundant enough,
go to RAID6, which uses 2 parity locations, each with a different method
of storing parity, so not only is it more redundant, you have a better
chance of finding problems.

* Software mirroring gives you tremendous ability to change the system
while it is running, by taking offline the mirror you wish to change and
then synchronizing it after the change.

That certainly is a nice ability. But remember that LVM also has the
idea of "snapshot"ing a running system. I don't know the exact details,
just that there is a way to have some processes see the filesystem as it
existed at an exact point in time. Which is also a great way to handle
backups.
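
The point-in-time idea can be sketched with a toy copy-on-write model: before a block on the live volume is overwritten, its old contents are saved for the snapshot. This is only a conceptual illustration; the `Volume` and `Snapshot` classes are invented here and do not reflect LVM's actual on-disk format or commands.

```python
class Snapshot:
    """Frozen view of a volume as of the moment the snapshot was taken."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> contents at snapshot time

    def preserve(self, index, old_value):
        # Keep only the first saved copy: that is the snapshot-time value.
        self.saved.setdefault(index, old_value)

    def read(self, index):
        # Unchanged blocks are read straight from the live volume.
        return self.saved.get(index, self.origin.blocks[index])


class Volume:
    """Live volume that copies old blocks aside before overwriting them."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, value):
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = value


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")    # live volume changes...
vol.write(1, "BB")   # ...twice
backup_view = [snap.read(i) for i in range(3)]
```

After the two writes, `backup_view` is still `["a", "b", "c"]` while the live volume holds `["a", "BB", "c"]`, so a backup process reading through the snapshot sees a consistent image while the database keeps writing.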

On a fully operational production server, we have:
* restriped the RAID5 array
* replaced all RAID5 media with higher capacity drives
* upgraded RAID5 controller
* moved all data from an old RAID5 array to a newer one
* replaced host SCSI controller
* uncabled and physically moved storage to a different part of data center

Again, all of this has taken place (over the years) while our machine
was fully operational.

So you are saying that you were able to replace the RAID controller
without turning off the machine? I realize there do exist
hot-swappable PCI cards, but I think you are overstating what you mean
by "fully operational". For instance, it's not like you can access your
data while it is being physically moved.

I do think you had some nice hardware. But I know you can do all of this
in software as well. It is usually a price/performance tradeoff. You
spend quite a bit to get a hardware RAID card that can keep up with a
modern CPU. I know we have an FC raid box at work which has a full 512MB
of cache on it, but it wasn't that much cheaper than buying a dedicated
server.

John
=:->

#3 Marty Scholes
marty@outputservices.com
In reply to: Marty Scholes (#1)

John A Meinel wrote:

Isn't this actually more of a problem for the meta-data to give out in a
hardware situation? I mean, if the card you are using dies, you can't
just get another one.
With software raid, because the meta-data is on the drives, you can pull
it out of that machine, and put it into any machine that has a
controller which can read the drives, and a similar kernel, and you are
back up and running.

Probably true. If you have a similar kernel and hardware and if you can
recover the state information, knowing where the state information is
stored. Those are some very big "ifs" during a hectic disaster.

No, it hedges against *more* than one failure. But you can also do a
RAID1 over a RAID5 in software. But if you are honestly willing to
create a full RAID1, just create a RAID1 over RAID0. The performance is
much better. And since you have a full RAID1, as long as both drives of
a pairing don't give out, you can lose half of your drives.

True as well. The problem with RAID1 over RAID0 is that, during a drive
failure, you are one bad sector from disaster. Further, RAID5 does
automatic rebuild, whereas most RAID1 setups do not. RAID5 reduces the
amount of time that things are degraded, reducing the time that your
data is in danger.

If you want the space, but you feel that RAID5 isn't redundant enough,
go to RAID6, which uses 2 parity locations, each with a different method
of storing parity, so not only is it more redundant, you have a better
chance of finding problems.

Agreed, RAID6 is the future, but still won't keep the server running
when the RAID controller dies, or the SCSI/FC host adapter goes, or you
want to upgrade controller firmware, or you want to replace the media, or...

So you are saying that you were able to replace the RAID controller
without turning off the machine? I realize there do exist
hot-swappable PCI cards, but I think you are overstating what you mean
by "fully operational". For instance, it's not like you can access your
data while it is being physically moved.

Detach mirror 1, uncable and move, recable and resync. Detach mirror 2,
uncable and move, recable and resync.
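
The rolling procedure above can be sketched as a small state machine whose one rule is never to detach a mirror unless another one is online. The mirror names, state strings, and the `rolling_maintenance` function are hypothetical; a real volume manager has its own commands for detach and resync.

```python
def rolling_maintenance(mirrors, do_work):
    """Service each mirror in turn while the others keep serving I/O.

    mirrors: dict mapping mirror name -> state ("online", "detached", ...)
    do_work: callable run on each mirror while it is detached
    """
    log = []
    for name in list(mirrors):
        others = [n for n, s in mirrors.items() if n != name and s == "online"]
        if not others:
            raise RuntimeError(f"refusing to detach {name}: no other online mirror")
        mirrors[name] = "detached"
        log.append(f"detach {name} (serving from {', '.join(others)})")
        do_work(name)  # e.g. uncable, move, recable
        log.append(f"work done on {name}")
        mirrors[name] = "resyncing"
        log.append(f"resync {name}")
        mirrors[name] = "online"
    return log


mirrors = {"mirror1": "online", "mirror2": "online"}
steps = rolling_maintenance(mirrors, lambda name: None)
```

The guard at the top of the loop is the whole point: with only one online mirror the function refuses to proceed, which is the software equivalent of not pulling the last good copy.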

I do think you had some nice hardware. But I know you can do all of this
in software as well. It is usually a price/performance tradeoff. You
spend quite a bit to get a hardware RAID card that can keep up with a
modern CPU. I know we have an FC raid box at work which has a full 512MB
of cache on it, but it wasn't that much cheaper than buying a dedicated
server.

We run two Nexsan ATABoy2 arrays. These can be found in 1 TB
configurations for about $3,000 each, putting mirrored RAID5 storage at
$6 per GB. Is that a lot of money for storage? Maybe. In our case,
that's dirt cheap protection against storage-related downtime.
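
The $6 per GB figure follows from buying every gigabyte twice. A quick check, assuming 1 TB is counted as 1,000 GB and that the quoted capacity is the usable capacity of each array (the function name and parameters are illustrative only):

```python
def mirrored_cost_per_gb(array_price_usd, array_capacity_gb, copies=2):
    """Cost per usable GB when the same data is kept on `copies` arrays."""
    return array_price_usd * copies / array_capacity_gb

cost = mirrored_cost_per_gb(3000, 1000)
```

With two $3,000 arrays holding one mirrored terabyte, `cost` works out to 6.0 dollars per GB.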

Marty

#4 Noname
brew@theMode.com
In reply to: Marty Scholes (#3)
Debian Stable goes from Woody to Sarge!!

Debian Stable has gone from Woody to Sarge.

Hooray!

That means the normal package installed goes from 7.2.1 to 7.4.7.

Thanks to the folks who told me about backports.org, though I didn't
follow through and load it. Maybe when backports has 8.x I'll go that
route.

Thanks Oliver for the work you did (I'm assuming) on getting the Sarge
PostgreSQL package ready over the various incarnations of Testing.

Now I'm off to upgrade the rest of my machines, prudently saving my
production server for last.

brew

==========================================================================
Strange Brew (brew@theMode.com)
Check out my Stock Option Covered Call website http://www.callpix.com
and my Musician's Online Database Exchange http://www.TheMode.com
==========================================================================

#5 Peter Eisentraut
peter_e@gmx.net
In reply to: Noname (#4)
Re: Debian Stable goes from Woody to Sarge!!

brew@theMode.com wrote:

Thanks Oliver for the work you did (I'm assuming) on getting the
Sarge PostgreSQL package ready over the various incarnations of
Testing.

Martin Pitt maintains the Debian packages of PostgreSQL these days.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/