Postgresql data integrity during RAID10 drive rebuild

Started by Steve Poe over 19 years ago, 8 messages, general
#1 Steve Poe
steve.poe@gmail.com

I need some input from the Postgresql community.

Our animal hospital runs Postgresql 7.4 on a 6-disc RAID10. The database
logs are on a separate RAID1.

We're using an LSI MegaRAID 320-2X controller. The controller reports one
146GB SCSI disc has failed in the RAID10, and performance is in "DEGRADED" mode.
The database seems to be running fine but slower.

I've never had to replace a disc in an array with Postgresql running on it.
LSI says I can replace the disc and do a rebuild while everything is
running. I am of course concerned about data integrity/corruption.

Has anyone had to rebuild one of their discs in an array of their database?

Thanks for your help.

Steve Poe

#2 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Steve Poe (#1)
Re: Postgresql data integrity during RAID10 drive rebuild

On Wed, 2006-11-29 at 10:56, Steve Poe wrote:

I need some input from the Postgresql community.

Our animal hospital runs Postgresql 7.4 on a 6-disc RAID10. The
database logs are on a separate RAID1.

We're using an LSI MegaRAID 320-2X controller. The controller reports
one 146GB SCSI disc has failed in the RAID10, and performance is in
"DEGRADED" mode. The database seems to be running fine but slower.

I've never had to replace a disc in an array with Postgresql running
on it. LSI says I can replace the disc and do a rebuild while
everything is running. I am of course concerned about data
integrity/corruption.

Has anyone had to rebuild one of their discs in an array of their
database?

Yep, I've done it a few times. A few tips:

backup your database with pg_dump. confirm you can restore on a test
machine.

replace the drive during the lowest traffic period for your database.

The LSI cards are very stable. I've replaced drives under them back in
the Ultra-320 days with removable caddies. I would imagine that as long
as you have the proper drive caddies you're set. If the drives are not
in removable caddies, you'll need to power down to safely unplug them.
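
The backup tip above could be scripted roughly as follows. This is only a sketch: the database name, dump path, and scratch database name are hypothetical placeholders, and the restore steps are meant to be run on the test machine, not the production server. It only builds and prints the commands so they can be reviewed before anything is run.

```python
import shlex


def backup_and_verify_commands(db_name, dump_path, scratch_db):
    """Return the commands for a plain-text dump plus a test restore, in order.

    All three arguments are placeholders chosen by the caller; nothing here
    is specific to the setup described in this thread.
    """
    return [
        # 1. Dump the live database to a plain SQL file.
        ["pg_dump", "-f", dump_path, db_name],
        # 2. On the test machine, create an empty scratch database.
        ["createdb", scratch_db],
        # 3. Load the dump into the scratch database with psql.
        ["psql", "-d", scratch_db, "-f", dump_path],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in backup_and_verify_commands(
        "clinic", "/backup/clinic.sql", "clinic_verify"
    ):
        print(" ".join(shlex.quote(part) for part in cmd))
```

After the restore, it is worth reading through the psql output for errors and spot-checking a few tables before touching the array.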

#3 Joshua D. Drake
jd@commandprompt.com
In reply to: Steve Poe (#1)
Re: Postgresql data integrity during RAID10 drive rebuild

On Wed, 2006-11-29 at 08:56 -0800, Steve Poe wrote:

I need some input from the Postgresql community.

Our animal hospital runs Postgresql 7.4 on a 6-disc RAID10. The
database logs are on a separate RAID1.

We're using an LSI MegaRAID 320-2X controller. The controller reports
one 146GB SCSI disc has failed in the RAID10, and performance is in
"DEGRADED" mode. The database seems to be running fine but slower.
I've never had to replace a disc in an array with Postgresql running
on it. LSI says I can replace the disc and do a rebuild while
everything is running. I am of course concerned about data
integrity/corruption.

Well, of course do a backup, but LSI is correct: it will rebuild
correctly.

Has anyone had to rebuild one of their discs in an array of their
database?

Of course :) and you should be fine. However, make sure you grab a backup
just in case, and check the firmware version on your LSI. There was a
parity bug that caused data corruption a couple of revs ago.

Joshua D. Drake

Thanks for your help.

Steve Poe


#4 Vick Khera
vivek@khera.org
In reply to: Steve Poe (#1)
Re: Postgresql data integrity during RAID10 drive rebuild

On Nov 29, 2006, at 11:56 AM, Steve Poe wrote:

I've never had to replace a disc in an array with Postgresql
running on it. LSI says I can replace the disc and do a rebuild
while everything is running. I am of course concerned about data
integrity/corruption.

This is the whole, entire, complete purpose of having a RAID card and
hot-swap drives: to make it transparent to the layers above the disk
interface.

Has anyone had to rebuild one of their discs in an array of their
database?

Yes. The OS (let alone an application such as the DB) has no clue
other than possibly slower response from the mirrored pair being
rebuilt.

#5 Steve Poe
steve.poe@gmail.com
In reply to: Scott Marlowe (#2)
Re: Postgresql data integrity during RAID10 drive rebuild

Yep, I've done it a few times. A few tips:

backup your database with pg_dump. confirm you can restore on a test
machine.

Thanks. We do a nightly dump and restore to a second server for
testing/backup purposes.
The data is intact. Are you recommending a dump before we begin the drive
rebuild?

replace the drive during the lowest traffic period for your database.

Sounds good. According to LSI, the drive will take 8 hrs to rebuild a 146GB
disc (at a 30% rebuild rate), so doing this in the middle of the day is not
ideal.

The LSI cards are very stable. I've replaced drives under them back in
the Ultra-320 days with removable caddies. I would imagine that as long
as you have the proper drive caddies you're set. If the drives are not
in removable caddies, you'll need to power down to safely unplug them.

The drive is in a removable caddy in a 4-disc array backplane.

Scott, Joshua, and Vivek...thanks for your feedback.

Steve Poe
Adobe Animal Hospital

#6 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Steve Poe (#5)
Re: Postgresql data integrity during RAID10 drive rebuild

On Wed, 2006-11-29 at 13:21, Steve Poe wrote:

Yep, I've done it a few times. A few tips:

backup your database with pg_dump. confirm you can restore on a test
machine.

Thanks. We do a nightly dump and restore to a second server for
testing/backup purposes.
The data is intact. Are you recommending a dump before we begin the
drive rebuild?

Either that, or wait until the nightly dump / restore has run and start
then.

replace the drive during the lowest traffic period for your
database.

Sounds good. According to LSI, the drive will take 8 hrs to rebuild a
146GB disc (at a 30% rebuild rate), so doing this in the middle of the
day is not ideal.

The rebuild time also tends to depend on how full the array is. If
you're only using 5% or so, it won't take the full 8 hours they're
projecting.

#7 Vick Khera
vivek@khera.org
In reply to: Scott Marlowe (#6)
Re: Postgresql data integrity during RAID10 drive rebuild

On Nov 29, 2006, at 2:39 PM, Scott Marlowe wrote:

Sounds good. According to LSI, the drive will take 8 hrs to rebuild a
146GB disc (at a 30% rebuild rate), so doing this in the middle of the
day is not ideal.

The rebuild time also tends to depend on how full the array is. If
you're only using 5% or so, it won't take the full 8 hours they're
projecting.

But how does the RAID card know what is and what is not "full" in the
unix file system stored on it? It has to rebuild the entire drive.
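
If the controller does copy the whole drive, the quoted figures (146GB in 8 hours) imply a fixed average rate, and the finish time depends only on when the rebuild starts. A rough sketch of that arithmetic, assuming decimal gigabytes and a constant rate over the whole rebuild; the 22:00 start time is a made-up example, not something from this thread:

```python
from datetime import datetime, timedelta


def implied_rebuild_rate_mb_s(capacity_gb, hours):
    """Average MB/s needed to copy capacity_gb (decimal GB) in the given hours."""
    return capacity_gb * 1000 / (hours * 3600)


def rebuild_finish(start, hours):
    """Estimated finish time for a full-drive rebuild that starts at `start`."""
    return start + timedelta(hours=hours)


if __name__ == "__main__":
    # Figures quoted above: a 146GB disc rebuilt in 8 hours.
    rate = implied_rebuild_rate_mb_s(146, 8)
    print(f"implied average rate: {rate:.2f} MB/s")

    # Hypothetical start time, chosen only for illustration.
    start = datetime(2006, 11, 29, 22, 0)
    print("estimated finish:", rebuild_finish(start, 8))
```

With those inputs the implied rate is about 5.07 MB/s, and a rebuild started at 22:00 would run until 06:00 the next morning.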

#8 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Vick Khera (#7)
Re: Postgresql data integrity during RAID10 drive rebuild

On Wed, 2006-11-29 at 14:16, Vivek Khera wrote:

On Nov 29, 2006, at 2:39 PM, Scott Marlowe wrote:

Sounds good. According to LSI, the drive will take 8 hrs to rebuild a
146GB disc (at a 30% rebuild rate), so doing this in the middle of the
day is not ideal.

The rebuild time also tends to depend on how full the array is. If
you're only using 5% or so, it won't take the full 8 hours they're
projecting.

But how does the RAID card know what is and what is not "full" in the
unix file system stored on it? It has to rebuild the entire drive.

Not sure how they do it exactly, but it seems a lot of RAID controllers
know which parts of a drive have been written to and which haven't. I
recall seeing it happen on rebuilding a RAID 5 on an old LSI card.

It could just be that it's a lot faster if it's got zeros on a block and
can short circuit the parity there.

As for RAID 1+0, not sure if it will be faster or not.