READ ONLY & I/O ERROR

Started by Sam Jas over 16 years ago, 12 messages, general
#1 Sam Jas
samjas33@yahoo.com

Hi Folks,

I am frequently getting a "read-only file system" error on my server.

We are using PostgreSQL with GridSQL. The database is very large.
Architecture details:
CentOS 5.3 64-bit, Areca high point rocket raid 3520 8 port
32 GB RAM
Self-assembled hardware

We process millions of rows every day and load them into the database. We have noticed that when we create a new database it works fine for 20 to 25 days. After that we get errors like "read only file system" and the data is corrupted. We then run fsck to remove bad blocks from the disk. However, even after running fsck we get the same error.

I would appreciate it if somebody could help me get rid of this issue.
 

--
Thanks
Sam Jas


#2 Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Sam Jas (#1)
Re: READ ONLY & I/O ERROR


This looks more like filesystem corruption.
What FS is the database running on? Presumably ext3 (since it is CentOS 5).

If possible, consider checking the root cause of the FS corruption, and possibly
test on another FS (xfs?).
Maybe you should also try to enable journaling, if you run in ext2/3 mode.

--
GJ

#3 Sam Jas
samjas33@yahoo.com
In reply to: Grzegorz Jaśkiewicz (#2)
Re: READ ONLY & I/O ERROR

How can I enable journaling? I am not very good at the OS and hardware level. Can you give me a detailed description?

Thanks
Sam Jas


#4 Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Sam Jas (#3)
Re: READ ONLY & I/O ERROR

2009/11/26 Sam Jas <samjas33@yahoo.com>

> How can I enable journaling, as I am not so good at the OS & H/W level? Can you
> give me a detailed description?

a) Don't top-post.
b) Don't send emails in HTML.
c) man e2fsck. I am sure it is described all around the net a million times. It
is something I haven't done in a while, so please search for instructions,
for instance on Red Hat's website.

--
GJ

#5 Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Grzegorz Jaśkiewicz (#4)
Re: READ ONLY & I/O ERROR

Oh, and fourth: if you get filesystem errors, I would inspect the drives, RAID
card, etc., because those usually mean that something's fishy.

#6 Alan Hodgson
ahodgson@simkin.ca
In reply to: Sam Jas (#1)
Re: READ ONLY & I/O ERROR

On Thursday 26 November 2009, Sam Jas <samjas33@yahoo.com> wrote:

> We are daily processing millions of rows and loading into database. We
> have marked that when we create a new database it worked fine up to 20 or
> 25 days. After that we are getting errors like "read only file system" ,
> data is corrupted. Therefore we are running fsck to remove bad blocks
> from the disk. However, after running fsck also we are getting the same
> error.

You have a hardware problem. Get your system administrator to isolate and
repair the bad hardware.

--
A hybrid Escalade is missing the point much in the same way that having a
diet soda with your extra large pepperoni pizza is missing the point.

#7 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Sam Jas (#1)
Re: READ ONLY & I/O ERROR

On Thu, Nov 26, 2009 at 6:40 AM, Sam Jas <samjas33@yahoo.com> wrote:

> Hi Folks,
>
> I am frequently getting read-only file system error on my server.
>
> We are using postgreSQL, GridSQL database. The size of database is very huge.
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port

Areca doesn't make the high point rocket raid cards (which are medium
quality RAID cards).

> 32 GB RAM
> assemble hardware

Did you follow proper ESD precautions when building this machine?

> We are daily processing millions of rows and loading into database. We have marked that when we create a new database it worked fine up to 20 or 25 days. After that we
> are getting errors like "read only file system" , data is corrupted. Therefore we are running fsck to remove bad blocks from the disk. However, after running fsck also we are getting the same error.
>
> I will appreciate you if somebody help me to get rid out of this issue.

Sounds like your hardware is bad. Could be mobo / cpu / memory or
RAID card. Does this machine "hang" every so often or anything?

I'd run memtest86+ on it first to confirm good cpu / memory / mobo.

Quick factoid from my days as an electronics instructor in the USAF,
95% of all ESD induced failures are latent in nature, either resulting
in catastrophic failure or thermal degradation some months or years
down the road.

#8 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Scott Marlowe (#7)
Re: READ ONLY & I/O ERROR

On Fri, Nov 27, 2009 at 4:53 AM, Sam Jas <samjas33@yahoo.com> wrote:

> I will check that one. Also, I have read on a forum that whenever you face a disk I/O error, running the "dmesg" command gives you detailed information. Today I hit a disk I/O error again and ran "dmesg"; it gave me the output below. Can somebody help explain what it is telling me?
>
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 16
> Buffer I/O error on device sdd, logical block 2
> Buffer I/O error on device sdd, logical block 3
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 0

Looks like you've got a bad drive.
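
To see at a glance which device the kernel is complaining about, the dmesg lines quoted above can be tallied per device. The following is only a minimal sketch in Python: the function name `io_errors_by_device` and the regular expression are illustrative assumptions that fit the two message shapes shown in this thread, not a general kernel log parser.

```python
import re
from collections import Counter

# Sample lines copied from the dmesg output quoted above.
SAMPLE = """\
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 16
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 0
"""

# Matches both "I/O error, dev sdd, ..." and "I/O error on device sdd, ...".
IO_ERROR = re.compile(r"I/O error(?:, dev| on device) (\w+)")


def io_errors_by_device(log_text):
    """Count kernel I/O error lines per block device name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, count in io_errors_by_device(SAMPLE).most_common():
        print(f"{device}: {count} I/O error lines")
```

On a real machine the same function could be fed saved `dmesg` output instead of `SAMPLE`. If every error points at one device, as it does here with sdd, that supports the bad-drive reading.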

#9 Greg Smith
gsmith@gregsmith.com
In reply to: Scott Marlowe (#7)
Re: READ ONLY & I/O ERROR

Scott Marlowe wrote:

> Areca doesn't make the high point rocket raid cards (which are medium
> quality RAID cards).

On a good day maybe. HighPoint is a pretty miserable RAID vendor--in
the same league as Promise from what I've seen as far as their Linux
driver support goes. In general, and for reasons I'm not completely
sure of, everyone selling "fake RAID" cards seems to be completely
incompetent. The page at http://linuxmafia.com/faq/Hardware/sata.html
hasn't been updated in a while, but as of 2007 all the current HighPoint
cards were still based on closed-source drivers only. Completely
worthless hardware IMHO.

> Sounds like your hardware is bad. Could be mobo / cpu / memory or
> RAID card. Does this machine "hang" every so often or anything?

It's not out of the question for this sort of problem to be caused by a
bad driver too. In this case it seems more likely it's a drive failure
though.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com

#10 Sam Jas
samjas33@yahoo.com
In reply to: Greg Smith (#9)
Re: READ ONLY & I/O ERROR

We are getting the errors below 20 to 25 days after database creation.

ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

If we create a new database, the problem recurs after 20 to 25 days. Until then we have no issues with the new database.

The database is very large. We load millions of records every day, and the read load is also high. The disks are not full. We are not dropping the old database.

What is the reason for this issue?

How can we ensure that it is not a database issue?

We are using
GridSQL: 1.1.0.9
PostgreSQL 8.3
Architecture Details:
CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
32 GB RAM

--
Thanks
Sam Jas


#11 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Sam Jas (#10)
Re: READ ONLY & I/O ERROR

(please use text only email to the list)

On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samjas33@yahoo.com> wrote:

> We are getting the below errors after 20 or 25 days of database creation.
>
> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

PostgreSQL cannot make a file system read only. The OS does that.
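
One way to see this from the messages themselves: the text after the last colon in each error is the operating system's own error string, which PostgreSQL only passes along. The sketch below, in Python, maps the two strings from this thread to errno names. The `OS_ERROR_TEXTS` table and the function name `os_errno_name` are assumptions made for illustration (the usual Linux texts for EROFS and EIO), not part of any PostgreSQL interface.

```python
# Usual Linux strerror() texts for the two errno values seen in this thread.
# This table is an illustrative assumption, not a PostgreSQL API.
OS_ERROR_TEXTS = {
    "Read-only file system": "EROFS",
    "Input/output error": "EIO",
}

# The two error lines reported in the thread.
ERRORS = [
    "ERROR: could not open relation 1919829/1152694/1921473: Read-only file system",
    "ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error",
]


def os_errno_name(pg_error_line):
    """Return the errno name if the message ends with a known OS error text."""
    # The OS error text follows the last ": " in the message.
    _, _, suffix = pg_error_line.rpartition(": ")
    return OS_ERROR_TEXTS.get(suffix.strip())


if __name__ == "__main__":
    for line in ERRORS:
        print(os_errno_name(line), "<-", line)
```

Both errors resolve to errno values raised by the kernel, which is consistent with the problem sitting below the database.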

What do your system logs in /var/log have to say when this happens?
There's got to be more context in there than we're getting evidence of
here on the list.

> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.

My guess is that it's not a fixed number, just what you've seen so
far, could happen in a day or a month or a year.

> The size of database is very huge. We are loading millions of records every day and also fetching from the database is also high. Even the disks are not full. We are not dropping the old database.
>
> What is the reason for this issue?

Looks like bad hardware to me.

> How can we ensure that it is not a database issue?

It can't be a database issue, as the database isn't capable of
actually locking a file system. It can trigger an OS bug maybe that
causes this problem, but given that no one else is having this issue
with CentOS 5.3, I'm gonna bet on bad hardware.

> We are using
> GridSQL: 1.1.0.9
> PostgreSQL 8.3
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
> 32 GB RAM

I will repeat, Areca does NOT MAKE the high point rocket raid. I will
also add that a Rocket Raid is not, IMHO, suitable for a production
environment. If it's an actual Areca, then the model will be
something like 11xx, 12xx, or 16xx numbers, not 3520.

#12 Craig Ringer
craig@2ndquadrant.com
In reply to: Scott Marlowe (#11)
Re: READ ONLY & I/O ERROR

On 2/12/2009 11:35 PM, Scott Marlowe wrote:

> (please use text only email to the list)
>
> On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
>> We are getting the below errors after 20 or 25 days of database creation.
>>
>> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
>> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error
>
> PostgreSQL cannot make a file system read only. The OS does that.
>
> What do your system logs in /var/log have to say when this happens?
> There's got to be more context in there than we're getting evidence of
> here on the list.

In particular, if you're on a Linux system check the output of the
"dmesg" command. I expect to see warnings about file system errors and
about the file system being re-mounted read-only. I won't be surprised
to see disk/raid errors either.
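
Alongside dmesg, the current mount state can be checked directly: on Linux, /proc/mounts lists each mounted file system with its options, and a file system the kernel has remounted read-only shows `ro` there. A minimal Python sketch follows; the function name `read_only_mounts` and the sample device names and mount points are hypothetical, made up for illustration.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point, fs_type) for entries mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; "ro" marks a read-only mount.
        if "ro" in options.split(","):
            found.append((device, mount_point, fs_type))
    return found


# Hypothetical sample; the device names and mount points are made up.
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sdd1 /data ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when not on Linux
    for device, mount_point, fs_type in read_only_mounts(text):
        print(f"{device} on {mount_point} ({fs_type}) is mounted read-only")
```

If the file system holding the PostgreSQL data directory shows up in this list, the kernel log from around the time it was remounted is the place to look for the underlying disk or controller error.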

>> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.
>
> My guess is that it's not a fixed number, just what you've seen so
> far, could happen in a day or a month or a year.

Do you do any RAID scrubbing? On what schedule? Do you test the disks
that are part of your RAID array using their internal SMART diagnostics?

Is your server ever hard-reset or rebooted due to loss of power?
(PostgreSQL is fine with this on a proper setup, but if you have a buggy
RAID controller or one that caches writes without a battery backup, it's
going to have issues).

--
Craig Ringer