Why database is corrupted after re-booting

Started by Andrus | over 20 years ago | 50 messages | general
#1Andrus
eetasoft@online.ee

Yesterday the computer running Postgres rebooted suddenly. After that,

select * from firma1.klient

returns

ERROR: invalid page header in block 739 of relation "klient"

I have a Quantum Fireball IDE drive; write caching is turned OFF.
I have Windows XP with a FAT32 file system.
I'm using PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2 (mingw-special)

Why does the corruption occur? How can I avoid data corruption?
Will the NTFS file system prevent all corruption? If yes, how can I convert FAT32
to NTFS without losing the data on the drive?

Andrus.
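
For orientation, the block number in the error above can be translated into a position inside the table's data file. The sketch below assumes PostgreSQL's default 8 kB page size and 1 GB segment files; a server built with other settings would need different constants, and `locate_block` is a hypothetical helper name, not part of PostgreSQL.

```python
BLOCK_SIZE = 8192          # assumed: PostgreSQL's default page size
SEGMENT_SIZE = 1024 ** 3   # assumed: default 1 GB segment files


def locate_block(block_no, block_size=BLOCK_SIZE, segment_size=SEGMENT_SIZE):
    """Return (segment_index, byte_offset) of a block within a relation's files."""
    blocks_per_segment = segment_size // block_size
    segment = block_no // blocks_per_segment
    offset = (block_no % blocks_per_segment) * block_size
    return segment, offset


segment, offset = locate_block(739)
print(segment, offset)  # 0 6053888
```

Under those assumptions, block 739 sits about 6 MB into the first file of the relation, which is the region of the disk that came back damaged after the reboot.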

#2Troy
troy@hendrix.biz
In reply to: Andrus (#1)
Re: Why database is corrupted after re-booting

I couldn't load it on a FAT32 partition on an XP Home PC, so I loaded
it on the NTFS partition of the same drive.

I don't know why it worked before and now doesn't, but it could be that
you need to defrag and clear some space.

To change partition types you need to re-format (resetting partitions
will lose data structure - reformat required).

You could just pop in an additional hard drive (slave), have it
formatted NTFS, and then install Postgres on that drive, e.g. D:/postgres/

Not the answer you'd want but good luck.
Troy

#3Gregory Youngblood
pgcluster@netio.org
In reply to: Andrus (#1)
Re: Why database is corrupted after re-booting

Talking with various people who ran postgres at different times, one
thing they always come back with as the reason mysql is so much better:
postgresql corrupts too easily and you lose your data.

Personally, I've not seen corruption in postgres since 5.x or 6.x
versions from several years ago. And, I've seen corruption on mysql
(though I could not isolate between a reiserfs or mysql problem - both
with supposedly stable releases installed as part of a distro).

Is corruption a problem? I don't think so - but I want to make sure I
haven't had my head in the sand for a while. :) I realize this instance
appears to be on Windows, where Postgres is relatively new as a native
program. I'm really after the answer on more mature platforms (including
Linux).

Thanks,
Greg

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#4Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
Re: Why database is corrupted after re-booting

To change partition types you need to re-format (resetting partitions
will lose data structure - reformat required).

Troy,

My whole IDE drive is a single 20 GB FAT32 C: drive that boots XP.
I have a lot of data on this drive, so it is not possible to re-format. Also,
I don't want to create two logical disks on a single drive.

If this prevents data corruption for Postgres, is there some utility which
can convert the C: drive to NTFS?
Can Partition Magic help?

Andrus

#5Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Andrus (#1)
Re: Why database is corrupted after re-booting

On Wed, 2005-10-26 at 10:27, Andrus wrote:

Yesterday computer running Postgres re-boots suddenly. After that,

select * from firma1.klient

returns

ERROR: invalid page header in block 739 of relation "klient"

I have Quantum Fireball IDE drive, write caching is turned OFF.
I have Windows XP with FAT32 file system.
I'm using PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2 (mingw-special)

Why the corruption occurs ? How to avoid data corruption?
Will NTFS file system prevent all corruptions ? If yes, how to convert FAT32
to NTFS without losing data in drive ?

If your machine crashes, FAT makes no promises that it will come back
up, uncorrupted or otherwise.

NTFS has journaling, and should provide more safety.

Turning off the write cache is the right thing to do. Putting your db
on FAT is the (very very) wrong thing to do.

I would run the ntfs converter if I were you, but you'll likely need a
backup to get your database back on its feet again. Don't forget the
backups.
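
As a rough illustration of the backup advice above, here is a minimal sketch that builds a timestamped `pg_dump` command line. The database name `mydb`, the output directory, and the helper name `build_pg_dump_command` are all hypothetical; only the `pg_dump` options `-F c` (custom archive format) and `-f` (output file) are taken from the tool itself.

```python
from datetime import datetime


def build_pg_dump_command(dbname, out_dir=".", now=None):
    """Return an argv list that dumps `dbname` to a timestamped archive file."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    out_file = f"{out_dir}/{dbname}_{stamp}.dump"
    # -F c selects pg_dump's custom archive format; -f names the output file.
    return ["pg_dump", "-F", "c", "-f", out_file, dbname]


# "mydb" and "/backups" are placeholder values for illustration only.
print(build_pg_dump_command("mydb", "/backups"))
```

The list can be handed to `subprocess.run(cmd, check=True)` from a scheduled job. A dump in custom format is restored with `pg_restore`, and the file should be copied to a different physical disk than the one holding the database.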

#6Joshua D. Drake
jd@commandprompt.com
In reply to: Andrus (#1)
Re: Why database is corrupted after re-booting

On Wed, 2005-10-26 at 18:27 +0300, Andrus wrote:

Yesterday computer running Postgres re-boots suddenly. After that,

select * from firma1.klient

returns

ERROR: invalid page header in block 739 of relation "klient"

I have Quantum Fireball IDE drive, write caching is turned OFF.
I have Windows XP with FAT32 file system.
I'm using PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2 (mingw-special)

Why the corruption occurs ?

Most likely because the IDE was caching the information. IDE drives
sometimes lie about having caching turned on or off.
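
One rough way to check whether a drive honors flush requests is to count how many write-plus-fsync cycles it completes per second. This is a heuristic sketch, not a definitive test: a 7200 rpm disk spins 120 times per second, so rewriting the same spot with real flushes should top out somewhere near that figure, and a much higher rate suggests a cache is absorbing the writes. The function names, the 512-byte payload, and the `slack` factor are assumptions made for illustration.

```python
import os
import tempfile
import time


def measure_fsync_rate(path, duration=1.0, payload=b"x" * 512):
    """Count how many write+fsync cycles per second complete on `path`."""
    count = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        start = time.perf_counter()
        while time.perf_counter() - start < duration:
            os.lseek(fd, 0, os.SEEK_SET)   # rewrite the same location each time
            os.write(fd, payload)
            os.fsync(fd)                   # ask the OS to push the data to the device
            count += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


def looks_cached(rate, rpm=7200, slack=2.0):
    """Heuristic: far more flushes per second than platter rotations is suspicious."""
    return rate > slack * (rpm / 60.0)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rate = measure_fsync_rate(os.path.join(d, "probe.bin"))
    print(f"{rate:.0f} fsyncs/sec, cache suspected: {looks_cached(rate)}")
```

The probe file has to live on the drive being tested, and the heuristic does not apply to SSDs or to controllers with battery-backed cache, where a high rate is expected and safe.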

How to avoid data corruption?

You could also have a bad drive.

Will NTFS file system prevent all corruptions ?

No.

Sincerely,

Joshua D. Drake


--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

#7Troy
troy@hendrix.biz
In reply to: Andrus (#4)
Re: Why database is corrupted after re-booting

A cheaper solution is to get a second hard drive and put it in your
computer as a slave....

Yes, you could xcopy your drive to some backup device, then repartition
and plop it back. That would take a lot of work, involves
DiskCopy/Ghost-like software, and carries great risk. (Run Defrag first.
Plus, you may still need to dual-partition the drive to put your boot files
back in place.) Back up everything first!

I don't know how much access you have, but another hard drive (100GB
from bestbuy.com is about $50) is cheaper than software. You could install
a used, smaller hard drive and you'd never know the difference. Put
just Postgres on the second hard drive (FORMAT IT NTFS FIRST).

hope it helps
Troy H

#8Welty, Richard
richard.welty@bankofamerica.com
In reply to: Troy (#7)
Re: Why database is corrupted after re-booting

Gregory Youngblood wrote:

Is corruption a problem? I don't think so - but I want to make sure I haven't had my
head in the sand for a while. :) I realize this instance appears to be on Windows,
which is relatively new as a native Windows program. I'm really after the answer on
more mature platforms (including Linux).

crappy disk drives and bad windows file systems, nothing more. postgresql is
rather corruption free when the surrounding hardware and software environments
are well chosen.

if you do have to use cheap disk drives/controllers, then a battery backup
unit that shuts the server down automagically is a really really good idea.
getting that IDE cache flushed is pretty high on the priority list.

richard

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Youngblood (#3)
Re: Why database is corrupted after re-booting

Gregory Youngblood <pgcluster@netio.org> writes:

Is corruption a problem? I don't think so - but I want to make sure I
haven't had my head in the sand for a while. :) I realize this instance
appears to be on Windows, which is relatively new as a native Windows
program. I'm really after the answer on more mature platforms (including
Linux).

It's been quite some time since I've seen an instance of data corruption
that appeared to be due to a Postgres bug. (At least, not corruption in
tables ... we've had some index bugs, but those you can always fix with
REINDEX.) I have seen lots of cases that seemed to be due to hardware
or OS misfeasance, eg, disk sectors filled with data that didn't come
from Postgres at all.
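
To make the "sectors filled with foreign data" case concrete, here is a simplified sketch of the kind of sanity check that rejects such a page. It is loosely modeled on PostgreSQL's page-header validation, not a copy of it, and it assumes the 24-byte header layout used by PostgreSQL 8.3 and later on a little-endian machine with 8 kB pages and 8-byte alignment; older releases such as 8.0 may lay the header out differently.

```python
import struct

BLCKSZ = 8192  # assumed default page size
# Assumed layout: pd_lsn (2 x uint32), pd_checksum, pd_flags, pd_lower,
# pd_upper, pd_special, pd_pagesize_version (uint16 each), pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")  # 24 bytes, little-endian, no padding


def page_header_looks_valid(page):
    """Return True if the page is all zeros or its header fields are consistent."""
    if len(page) != BLCKSZ:
        return False
    if page == bytes(BLCKSZ):
        return True  # an all-zero page is treated as an empty, valid page
    (_lsn_hi, _lsn_lo, _checksum, _flags,
     pd_lower, pd_upper, pd_special, size_version, _prune_xid) = HEADER.unpack_from(page)
    page_size = size_version & 0xFF00  # high byte of the field carries the page size
    return (page_size == BLCKSZ
            and HEADER.size <= pd_lower <= pd_upper <= pd_special <= BLCKSZ
            and pd_special % 8 == 0)


# A well-formed empty page versus a block of text that landed in the data file.
good = HEADER.pack(0, 0, 0, 0, 24, BLCKSZ, BLCKSZ, 0x2004, 0) + bytes(BLCKSZ - 24)
garbage = (b"This is not a database page. " * 300)[:BLCKSZ]
print(page_header_looks_valid(good), page_header_looks_valid(garbage))  # True False
```

A check like this catches pages overwritten with unrelated data, which is the situation described above, but it cannot notice damage confined to the row data further down the page.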

You can reduce your exposure by making sure things are correctly
configured (eg, disable write caching, or better yet don't use
consumer-grade drives at all). In the end there's no substitute
for a good backup policy ;-)

AFAICS mysql will have exactly the same problems. So will oracle or
any other DB. Oracle may have a better looking track record, but
that's probably because people don't try to run it on cheap junk PCs.

regards, tom lane

#10Joshua D. Drake
jd@commandprompt.com
In reply to: Andrus (#4)
Re: Why database is corrupted after re-booting

On Wed, 2005-10-26 at 19:14 +0300, Andrus wrote:

To change partition types you need to re-format (resetting partitions
will lose data structure - reformat required).

Troy,

Whole my IDE drive is 20 GB FAT32 C: drive booting XP
I have a lot of data in this drive so it is not possible to re-format. Also
I do'nt want to create two logical disks in single drive.

Is this prevents data corruption for Postgres, is there some utility which
can convert C: drive to NTFS ?
Can Partition Magic help ?

XP at least on install I believe has the ability to convert to NTFS.

Have you tried just right clicking on your C: selecting properties
and then seeing if there is a convert option?


#11Troy
troy@hendrix.biz
In reply to: Joshua D. Drake (#10)
Re: Why database is corrupted after re-booting

Unless I missed something, I think you can select on a fresh install
but not after. I doubt even an image could be switched but I could be
wrong, I am too often.

Troy

#12Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#9)
Re: Why database is corrupted after re-booting

AFAICS mysql will have exactly the same problems. So will oracle or
any other DB. Oracle may have a better looking track record, but
that's probably because people don't try to run it on cheap junk PCs.

Can I quote this?


#13Wes Williams
wes_williams@fcbonline.net
In reply to: Joshua D. Drake (#10)
Re: Why database is corrupted after re-booting

Type the following at the Windows command prompt (start, run, "cmd"):

convert c: /fs:ntfs /v

It will complain about locked files and perform the conversion at the next
reboot, which you should do immediately.


#14Shelby Cain
alyandon@yahoo.com
In reply to: Wes Williams (#13)
Re: Why database is corrupted after re-booting

Additionally, you should also take the opportunity to defrag the
filesystem after the conversion as the change in cluster size (I'm
guessing from 64k to 4k) will leave your shiny new NTFS file system
highly fragmented.


#15Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Gregory Youngblood (#3)
Re: Why database is corrupted after re-booting

On Wed, 2005-10-26 at 11:14, Gregory Youngblood wrote:

Talking with various people that ran postgres at different times, one
thing they always come back with in why mysql is so much better:
postgresql corrupts too easily and you lose your data.

Personally, I've not seen corruption in postgres since 5.x or 6.x
versions from several years ago. And, I've seen corruption on mysql
(though I could not isolate between a reiserfs or mysql problem - both
with supposedly stable releases installed as part of a distro).

Is corruption a problem? I don't think so - but I want to make sure I
haven't had my head in the sand for a while. :) I realize this
instance appears to be on Windows, which is relatively new as a native
Windows program. I'm really after the answer on more mature platforms
(including Linux).

I have been using PostgreSQL since version 6.5.2. There are many people
on this list that have been using it longer than that. In all that
time, I've had exactly zero problems with data corruption. Of course,
every server I've run PostgreSQL on has been burnt in for at least a
week of heavy testing, and they've all had SCSI drives, and if they had
RAID controllers they all had battery backed cache.

Every machine was tested by running pgbench for many days, about 100
clients wide, while doing other, more general work at the same time. A
part of the testing was to switch the machine off many times while it
was committing to the database, often forcing a flush before pulling the
plug.
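
The pull-the-plug test described above can be approximated with a small script. This is a sketch under stated assumptions, not the procedure actually used in that testing: a writer appends numbered records and reports each one only after `fsync` returns, and after the power is cut and restored a verifier checks that every reported record survived. The function names and the record format are hypothetical.

```python
import os
import tempfile


def append_durably(path, record_no):
    """Append one numbered record and fsync before reporting success."""
    with open(path, "ab") as f:
        f.write(f"{record_no}\n".encode("ascii"))
        f.flush()
        os.fsync(f.fileno())
    return record_no  # only now does the record count as acknowledged


def verify_after_restart(path, last_acknowledged):
    """Check that records 1..last_acknowledged are all present, in order."""
    with open(path, "rb") as f:
        found = [int(line) for line in f.read().split(b"\n") if line.strip().isdigit()]
    expected = list(range(1, last_acknowledged + 1))
    return found[:len(expected)] == expected


# Dry run without an actual power cut, just to show the flow.
with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "durability.log")
    last = 0
    for n in range(1, 6):
        last = append_durably(log, n)
    print(verify_after_restart(log, last))  # True
```

In a real test the last acknowledged number has to be recorded somewhere that survives the crash, for example on a second machine, because the point is to compare what the disk promised with what it kept.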

I found quickly that IDE drives are not reliable with the cache turned
on, and are too slow for most production purposes without the cache.
So, SCSI was (and apparently still is) the only way to go.

Now, I'm willing to bet that PostgreSQL is more likely to notice
corruption and report it than MySQL. I wonder if MySQL can detect most
simple single bit errors or not? I'd have to do some testing on it to
see if it can detect such errors easily.

I'd much rather have a database that simply stops and reports a data
corruption error than one that doesn't notice, wouldn't you?
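
The single-bit errors mentioned above are the kind of damage a per-block checksum catches. The sketch below uses CRC-32 from Python's `zlib` to show the idea; it is an illustration of the general technique and not a description of how PostgreSQL or MySQL actually validate their pages. The helper names are hypothetical.

```python
import zlib


def flip_bit(data, bit_index):
    """Return a copy of `data` with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def is_intact(block, stored_crc):
    """Compare a block's current CRC-32 against the value stored at write time."""
    return zlib.crc32(block) == stored_crc


block = bytes(range(256)) * 32   # 8192 bytes of sample data
stored_crc = zlib.crc32(block)   # would be saved alongside the block
damaged = flip_bit(block, 12345)

print(is_intact(block, stored_crc))    # True
print(is_intact(damaged, stored_crc))  # False
```

A CRC always changes when a single bit flips, so a database that stores and rechecks one per block can stop and report the error instead of returning wrong data silently.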

#16Wes Williams
wes_williams@fcbonline.net
In reply to: Scott Marlowe (#15)
Re: Why database is corrupted after re-booting

Even with a primary UPS on the *entire PostgreSQL server*, does one still
need, or would you still recommend, a battery-backed cache on the RAID
controller card? [ref SCSI 320, of course]

If so, I'd be interested in knowing briefly why.

Thanks.

-----Original Message-----
===snip===

...
every server I've run PostgreSQL on has been burnt in for at least a
week of heavy testing, and they've all had SCSI drives, and if they had
RAID controllers they all had battery backed cache.

#17snacktime
snacktime@gmail.com
In reply to: Scott Marlowe (#15)
Re: Why database is corrupted after re-booting

I remember a few months back when someone hit the emergency power switch to
the whole floor where we host at Internap. Subsequently the backup power
system had a cascading failure. Livejournal, who also hosts there, was up
all night and into the next day restoring their mysql databases after a
bunch of them were corrupted. I believe they had write cache turned on.

Of course our postgresql servers on scsi drives came right back up. If it
wasn't for a couple of servers that won't reboot automatically if the power
goes out I wouldn't have even had to go down to the data center.

Chris

#18Doug McNaught
doug@mcnaught.org
In reply to: Wes Williams (#16)
Re: Why database is corrupted after re-booting

"Wes Williams" <wes_williams@fcbonline.net> writes:

Even with a primary UPS on the *entire PostgreSQL server* does one still
need, or even still recommend, a battery-backed cache on the RAID controller
card? [ref SCSI 320, of course]

If so, I'd be interest in knowing briefly why.

UPSs can fail just like any other piece of hardware.

-Doug

#19Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Wes Williams (#16)
Re: Why database is corrupted after re-booting

On Wed, 2005-10-26 at 13:38, Wes Williams wrote:

Even with a primary UPS on the *entire PostgreSQL server* does one still
need, or even still recommend, a battery-backed cache on the RAID controller
card? [ref SCSI 320, of course]

If so, I'd be interest in knowing briefly why.

I'll tell you a quick little story.

Got a new server, aged out the old one. The new server was a dual P-IV 2800
with 2 gigs ram and a pair of 36 gig U320 drives in a RAID-1 mirror
under a battery backed cache. This machine also had four 120 gig IDE
drives for file storage. But the database was on the dual SCSIs under
the RAID controller.

I tested it with the power off test, etc... And it passed with flying
colors. Put it into production. Many other servers, including our
Oracle servers, were not tested in this way.

This machine had dual redundant power supplies with separate power
cables running into two separate rails, each running off of a different
UPS. The UPSes were fed by power conditioners, and there was a switch
on the other side of that to switch us over to diesel generators should
the power go out. The UPSes were quite large, and even with a hundred
or so computers in the hosting center, there was about 3 hours of
battery time before the diesel generator HAD to be up or we'd lose
power.

Seems pretty solid, right? We're talking a multi million dollar hosting
center, the kind with an ops center that looks like the deck of the
Enterprise. Raised floors, everything.

Fast forward six months. An electrician working on the wiring in the
ceiling above one of the power conditioners clips off a tiny piece of
wire. Said tiny piece of wire drops into the power conditioner. Said
power conditioner overloads, and trips the other two power conditioners
in the hosting center. This also blew out the master controller on the
UPS setup, so it didn't come up. The switch for the Diesel generator
would have switched over, but it was fried too. The UPSes, luckily,
were the constant on variety, so they took the hit for the computers on
the other side of them, about half the UPSes were destroyed.

After about 3 hours, we had enough of the power jury rigged to bring the
systems back up. In a company with dozens and dozens of database servers,
ranging from MySQL to Oracle to PostgreSQL to Ingres to MSSQL to Interbase
to FoxPro, exactly one of them came up without any errors. You
already know which one it was, or I wouldn't be writing this letter.

Power supplies fail, UPSes fail, hard drives fail, and RAID controllers
and battery backed caches fail. You can't remove every possibility of
failure, but you can limit the number of things that can harm you should
they fail.

I do know that after that outage, I never once got shit for using
postgresql ever again from anybody. The sad thing is, if any of those
other machines had had battery backed RAID controllers with local
storage (many were running on NFS or SMB mounts) they would have been
fine too. But many of the DBAs for those other databases had the same
"who needs to worry about sudden power off when we have UPSes and power
conditioners" attitude. You can guess what optional feature suddenly
seemed like a good idea for every new database server after that.

#20Bricklen Anderson
banderson@presinet.com
In reply to: snacktime (#17)
Re: Why database is corrupted after re-booting

snacktime wrote:

I remember a few months back when someone hit the emergency power switch
to the whole floor where we host at Internap. Subsequently the backup
power system had a cascading failure. Livejournal, who also hosts
there, was up all night and into the next day restoring their mysql
databases after a bunch of them were corrupted. I believe they had
write cache turned on.

Of course our postgresql servers on scsi drives came right back up. If
it wasn't for a couple of servers that won't reboot automatically if the
power goes out I wouldn't have even had to go down to the data center.

Chris

I remember reading a detailed account on Livejournal about the hoops they had to
jump through to get up and running again after that incident. Bit of a nightmare
for them.


#21Welty, Richard
richard.welty@bankofamerica.com
In reply to: Bricklen Anderson (#20)
#22Keith C. Perry
netadmin@vcsn.com
In reply to: Welty, Richard (#21)
#23Rick Ellis
ellis@spinics.net
In reply to: Welty, Richard (#8)
#24David Garamond
lists@zara.6.isreserved.com
In reply to: Andrus (#1)
#25Alex Stapleton
alexs@advfn.com
In reply to: snacktime (#17)
#26Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
#27Richard Huxton
dev@archonet.com
In reply to: Andrus (#26)
#28Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
#29Richard Huxton
dev@archonet.com
In reply to: Andrus (#28)
#30Martijn van Oosterhout
kleptog@svana.org
In reply to: Andrus (#28)
#31Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alex Stapleton (#25)
#33Alex Stapleton
alexs@advfn.com
In reply to: Tom Lane (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alex Stapleton (#33)
#35Alex Stapleton
alexs@advfn.com
In reply to: Tom Lane (#34)
#36Richard Huxton
dev@archonet.com
In reply to: Andrus (#31)
#37Keith C. Perry
netadmin@vcsn.com
In reply to: Tom Lane (#32)
#38Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Keith C. Perry (#37)
#39w_tom
w_tom1@usa.net
In reply to: Keith C. Perry (#37)
#40w_tom
w_tom1@usa.net
In reply to: Andrus (#1)
#41Troy
troy@hendrix.biz
In reply to: Wes Williams (#13)
#42Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: w_tom (#39)
#43Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
#44Bruce Momjian
bruce@momjian.us
In reply to: Andrus (#43)
#45Alex Turner
armtuk@gmail.com
In reply to: Bruce Momjian (#44)
#46Bruce Momjian
bruce@momjian.us
In reply to: Alex Turner (#45)
#47Alex Turner
armtuk@gmail.com
In reply to: Bruce Momjian (#46)
#48w_tom
w_tom1@usa.net
In reply to: Ron Mayer (#42)
#49Magnus Hagander
magnus@hagander.net
In reply to: w_tom (#48)
#50Andrus
eetasoft@online.ee
In reply to: Magnus Hagander (#49)