Should mdxxx functions(e.g. mdread, mdwrite, mdsync etc) PANIC instead of ERROR when I/O failed?

Started by Jacky Leng over 16 years ago, 4 messages
#1Jacky Leng
lengjianquan@163.com

Recently, while I was running my application on 8.3.7, my data got
corrupted. The symptom was this error: "invalid memory alloc request size ...."

I investigated the bad data and found that one sector of a db-block had
become all-zero (I confirmed the cause later: my disk had gone
bad).

I also checked the postmaster log and found 453
ERROR messages saying "could not read block XXX of relation XXX: ??",
where XXX was the db-block containing the bad sector. After these 453
failed read operations, a read finally succeeded, but it returned an all-zero
sector! (I don't know why the operating system allows this to happen, but it
did.)
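A minimal sketch of how such a block could be checked for zeroed sectors, written in Python rather than PostgreSQL's C. It assumes the default 8192-byte block size and the 512-byte sectors that dmesg reports for this disk; `zeroed_sectors` is a hypothetical helper, not a PostgreSQL function. An all-zero sector is only a hint, since free space inside a valid page can also be zero.

```python
BLOCK_SIZE = 8192   # PostgreSQL's default block size (assumed here)
SECTOR_SIZE = 512   # sector size assumed for the disk in question


def zeroed_sectors(block, sector_size=SECTOR_SIZE):
    """Return the indexes of the sectors in `block` that contain only zero bytes."""
    if len(block) % sector_size != 0:
        raise ValueError("block length must be a multiple of the sector size")
    zero = bytes(sector_size)
    return [
        offset // sector_size
        for offset in range(0, len(block), sector_size)
        if block[offset:offset + sector_size] == zero
    ]


# Build a block of non-zero bytes and wipe its third sector (index 2).
sample = bytearray(b"\x01" * BLOCK_SIZE)
sample[2 * SECTOR_SIZE:3 * SECTOR_SIZE] = bytes(SECTOR_SIZE)
print(zeroed_sectors(bytes(sample)))
```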

My question is: shouldn't the mdxxx functions (e.g. mdread, mdwrite, mdsync)
just report PANIC instead of ERROR when I/O fails? IMO, since the data is
already corrupted, reporting ERROR will just leave us with a very strange situation
later -- which does more harm than good.

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Jacky Leng (#1)
Re: Should mdxxx functions(e.g. mdread, mdwrite, mdsync etc) PANIC instead of ERROR when I/O failed?

On Mon, Jun 15, 2009 at 04:41:42PM +0800, Jacky Leng wrote:

My question is: shouldn't the mdxxx functions (e.g. mdread, mdwrite, mdsync)
just report PANIC instead of ERROR when I/O fails? IMO, since the data is
already corrupted, reporting ERROR will just leave us with a very strange situation
later -- which does more harm than good.

I think the reasoning is that if those functions reported a PANIC the
chance you could recover your data is zero, because you need the
database system to read the other (good) data.

With an ERROR you can investigate the problem and save what can be
saved...
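The salvage idea can be sketched as a loop that keeps going after a failed read, which is only possible if the failure is an ordinary error rather than a crash. This is a Python illustration under stated assumptions: `salvage_blocks` and `demo_reader` are hypothetical names, and the reader merely simulates one bad block.

```python
import errno


def salvage_blocks(read_block, nblocks):
    """Copy out every readable block; record the failing ones instead of aborting."""
    good, bad = {}, []
    for blkno in range(nblocks):
        try:
            good[blkno] = read_block(blkno)
        except OSError as exc:
            # An ERROR-style failure: note it and move on to the next block.
            bad.append((blkno, str(exc)))
    return good, bad


def demo_reader(blkno):
    """Hypothetical reader: block 3 sits on a bad sector, the rest are fine."""
    if blkno == 3:
        raise OSError(errno.EIO, "Input/output error")
    return bytes([blkno]) * 8192


good, bad = salvage_blocks(demo_reader, 6)
print(sorted(good))                   # block numbers that were saved
print([blkno for blkno, _ in bad])    # block numbers that failed
```

If the reader aborted the whole process on the first failure, blocks 4 and 5 would never be copied.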

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#2)
Re: Should mdxxx functions(e.g. mdread, mdwrite, mdsync etc) PANIC instead of ERROR when I/O failed?

Martijn van Oosterhout <kleptog@svana.org> writes:

On Mon, Jun 15, 2009 at 04:41:42PM +0800, Jacky Leng wrote:

My question is: shouldn't the mdxxx functions (e.g. mdread, mdwrite, mdsync)
just report PANIC instead of ERROR when I/O fails? IMO, since the data is
already corrupted, reporting ERROR will just leave us with a very strange situation
later -- which does more harm than good.

I think the reasoning is that if those functions reported a PANIC the
chance you could recover your data is zero, because you need the
database system to read the other (good) data.

Also, in the case you're complaining about, the problem was that there
wasn't any O/S error report that we could have PANIC'd about anyhow.

But Martijn is correct that a PANIC here would reduce the system's
overall stability without any clear benefit. We already do refuse
to read a page into shared buffers if there's a read error on it,
so it's not clear to me how you think that an ERROR leaves things
in an unstable state.
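The behaviour described here, where a page whose read fails never enters the cache, can be pictured with a toy model. This is a Python sketch, not PostgreSQL's buffer manager; `ToyBufferCache` and `demo_reader` are hypothetical names used only for illustration.

```python
import errno


class ToyBufferCache:
    """Toy page cache: a page is stored only after its read has succeeded."""

    def __init__(self, read_block):
        self._read_block = read_block
        self._pages = {}

    def get_page(self, blkno):
        if blkno in self._pages:
            return self._pages[blkno]
        # If the read raises, we never reach the assignment below,
        # so nothing is cached for this block.
        page = self._read_block(blkno)
        self._pages[blkno] = page
        return page

    def is_cached(self, blkno):
        return blkno in self._pages


def demo_reader(blkno):
    """Hypothetical reader in which block 1 always fails with EIO."""
    if blkno == 1:
        raise OSError(errno.EIO, "Input/output error")
    return b"\x01" * 8192


cache = ToyBufferCache(demo_reader)
cache.get_page(0)
try:
    cache.get_page(1)
except OSError:
    pass  # the caller sees an error; the cache is left unchanged
print(cache.is_cached(0), cache.is_cached(1))
```

The model only covers reads that fail. It says nothing about a read that succeeds while returning damaged data.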

regards, tom lane

#4Jacky Leng
lengjianquan@163.com
In reply to: Jacky Leng (#1)
Re: Should mdxxx functions(e.g. mdread, mdwrite, mdsync etc) PANIC instead of ERROR when I/O failed?

I think the reasoning is that if those functions reported a PANIC the
chance you could recover your data is zero, because you need the
database system to read the other (good) data.

I do not see why a PANIC would reduce the chance of recovering my data. AFAICS,
my data is already corrupted (because of the bad block here); whether we
PANIC or not, the read operation on the bad block should get the same result.

Also, in the case you're complaining about, the problem was that there
wasn't any O/S error report that we could have PANIC'd about anyhow.

No, the O/S did report the error, which led to the 453 ERROR messages from
postgres. The O/S error messages (obtained with dmesg) look like this:
end_request: I/O error, dev sda, sector 504342711
ata1: EH complete
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: (irq_stat 0x40000008)
ata1.00: cmd 60/08:00:b0:a8:0f/00:00:1e:00:00/40 tag 0 cdb 0x0 data 4096 in
         res 41/40:08:b7:a8:0f/06:00:1e:00:00/00 Emask 0x9 (media error)
ata1.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
ata1.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168

We already do refuse
to read a page into shared buffers if there's a read error on it,
so it's not clear to me how you think that an ERROR leaves things
in an unstable state.

In my case, it seems that the O/S does not guarantee that if an I/O operation
(read, write, sync, etc) on a block fails, then all later I/O operations
on that block will also fail. For example:
1. As I noted before, although reads of the bad db-block in my data failed
453 times, the 454th read operation succeeded (but
some data (the bad sector) had been set to all-zero). So, even though the 453
failed I/Os reported ERROR, there is still a chance that the bad
db-block can be read into shared buffers.
2. Besides, I have noticed a case like this:
1) An mdsync operation failed
with the message "ERROR: could not fsync segment XXX of relation XXX:
??";

The O/S error message (obtained with the dmesg command) is like this:
Buffer I/O error on device ^A&#63733;XX205503, logical block 43837786
lost page write due to I/O error on ^A&#63733;XX205503

2) This left a half-written db-block in my data. But the page could still
be read into shared buffers successfully later, which led to a strange
error that says "ERROR: could not access status of transaction XXXXX"
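The first example above can be simulated in a few lines of Python. This is only a sketch of the reported behaviour, assuming 8192-byte blocks and 512-byte sectors; `FlakyBlock` and `read_until_success` are hypothetical names, and the position of the bad sector within the block is made up for the demonstration.

```python
import errno

BLOCK_SIZE = 8192
SECTOR_SIZE = 512


class FlakyBlock:
    """Hypothetical block that fails a fixed number of reads, then 'succeeds'."""

    def __init__(self, failures_before_success, bad_sector):
        self.remaining_failures = failures_before_success
        self.bad_sector = bad_sector

    def read(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise OSError(errno.EIO, "could not read block")
        # The read now succeeds, but one sector comes back as all zeros.
        block = bytearray(b"\xAA" * BLOCK_SIZE)
        start = self.bad_sector * SECTOR_SIZE
        block[start:start + SECTOR_SIZE] = bytes(SECTOR_SIZE)
        return bytes(block)


def read_until_success(block, max_attempts):
    """Retry the read, counting errors; return (data, error_count) on success."""
    errors = 0
    for _ in range(max_attempts):
        try:
            return block.read(), errors
        except OSError:
            errors += 1
    raise OSError(errno.EIO, "gave up after %d attempts" % max_attempts)


data, errors = read_until_success(FlakyBlock(453, bad_sector=3), 1000)
damaged = data[3 * SECTOR_SIZE:4 * SECTOR_SIZE] == bytes(SECTOR_SIZE)
print(errors, damaged)
```

Every failed attempt is reported individually, yet nothing stops the later successful read from delivering the zeroed sector to the caller.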