RAID 1 - drive failed - very slow queries even after drive replaced
Hi,
I am looking for some advice on where to troubleshoot after one drive in
a RAID 1 array failed. Thank you.
I am running PostgreSQL 7.4.1. I am currently importing the data to another
physical server running 8.4 and will test with that once I can. In the
meantime, here is the relevant info:

Backups used to take 25 minutes and now take 110 minutes. Before
replacing the drive it became clear the backup was not going to finish
at all: in 120 minutes it had completed only 200 MB of 2.8 GB.
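For scale, that is a drop of roughly 65x in backup throughput (a quick back-of-the-envelope sketch, treating the MB/GB figures above as decimal):

```shell
# Backup throughput before the failure vs. during it:
#   healthy:  2.8 GB (~2800 MB) in 25 minutes
#   degraded: 200 MB in 120 minutes (and still unfinished)
echo "healthy : $(( 2800 / 25 )) MB/min"
awk 'BEGIN { printf "degraded: %.1f MB/min\n", 200 / 120 }'
```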
Before replacing the drive:
-----------------------------------
We noticed all of the queries were slow, many taking over 100 seconds.

After we replaced the drive, queries are still running 40 seconds or
more in the worst cases, and most take 8 seconds or more where the same
query used to take only 1 second. We have replaced a drive in this
RAID 1 before and nothing like this happened. The schema was not touched
for at least a week prior to this.
Since replacing the drive I have:
-------------------------------------------
Restored from a backup taken a few hours before the queries became very
slow.
Reindexed all tables.
Vacuumed all tables.
Analyzed all tables.
Here is what I get with iostat:
iostat -k /dev/sda2
Linux 2.6.26-2-686-bigmem (db1)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.61    0.00    8.34    1.60    0.00   70.45
What kind of RAID is this? I know it's RAID 1, but is it built using Linux
software RAID (md), or is it some other solution (like a hardware card)?
Can you post some diagnostic info, like

mdadm --detail /dev/md1

or something like that? I guess this would be reflected in the iostat
output (at least for the md device), but I'm not sure. Maybe there's
something still in progress (a resync of the new drive?).
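For reference, a sketch of checking whether a resync is still running (the array name /dev/md1 and the sample output below are just illustrations, not from this system):

```shell
# Array detail: look for "State : clean" vs. "resyncing"/"degraded":
#   mdadm --detail /dev/md1
# /proc/mdstat shows per-array resync progress; a hypothetical sample:
sample='md1 : active raid1 sdb2[1] sda2[0]
      488254400 blocks [2/2] [UU]
      [=>...................]  recovery =  7.4% (36283968/488254400) finish=123.4min speed=56789K/sec'

# Pull out the recovery percentage, if a resync is in progress:
echo "$sample" | grep -o 'recovery = *[0-9.]*%'
```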
Tomas
On Wed, Mar 23, 2011 at 3:33 AM, Merrick <merrick@gmail.com> wrote:
> [original report snipped]
probably the replacement drive is bunk, or some esoteric hw problem is
tripping you up. some iostat numbers while you are having the problem
would be more telling. the solution is obvious -- in terms of this
server, it's time to ramble on...
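something along these lines would do it (the 5-second interval and sample count are just examples; iostat comes from the sysstat package):

```shell
# Three 5-second extended samples, taken while queries are slow;
# consistently high await/%util on the rebuilt disk would implicate it.
if command -v iostat >/dev/null 2>&1; then
    iostat -x -k 5 3
else
    echo "iostat not found (install the sysstat package)"
fi
```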
merlin
Thank you, Merlin. I had my suspicions about the hardware as well.
The backup server is blazing fast; it is definitely
"time to ramble on..."
On Mar 23, 7:11 am, mmonc...@gmail.com (Merlin Moncure) wrote:
> [quoted text snipped]
--
Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On 23 Mar 2011, at 9:33, Merrick wrote:
> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.
A few obvious questions:
1. Are you sure you replaced the correct drive?
2. Did the mirror finish resyncing before you took the above measurements?
> Before replacing the drive:
> -----------------------------------
> We noticed all of the queries were slow, many taking over 100 seconds.
> After we replaced the drives we noticed the queries are running 40
> seconds or more and most are 8 seconds or more where the same query
> used to take only 1 second. We have replaced a drive in this RAID 1
> before and nothing like this happened. The schema was not touched for
> at least 1 week prior to this.
>
> Since replacing the drive I have:
> -------------------------------------------
> Restored from a backup a few hours before the queries became very
> slow.
> Reindex all tables
> Vacuum all tables
> Analyze all tables
Are you sure it was the drive that broke? Or do you maybe have some collateral hardware damage to, for example, your RAID controller, cables, disk controller, or motherboard? Maybe the new drive has different requirements than the old one had (more power, for example)?
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On Wed, Mar 23, 2011 at 08:08:47PM +0100, Alban Hertroys wrote:
> ...your raid controller, cables, disk controller or motherboard? Maybe the new drive has different requirements than the old one had (more power, for example)?
Or a newer but backward-plug-compatible interface? Often, the new
drive in the old plug only uses the Right Features once its
compatibility mode has been turned off. (This is at least true in my
experience. Not saying it's the cause of the present issue, though.)
A
--
Andrew Sullivan
ajs@crankycanuck.ca