Losing records when server hang

Started by lecover 21 years ago25 messagesgeneral
Jump to latest
#1lec
limec@streamyx.com

Hi,

I'm observing the following:
If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Thanks,
thomas.

#2Scott Marlowe
smarlowe@qwest.net
In reply to: lec (#1)
Re: Losing records when server hang

On Sun, 2004-08-08 at 19:43, lec wrote:

Hi,

I'm observing the following:
If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

Are you running on top of IDE drives with the write cache enabled?
Most, if not all, IDE drives lie about FSYNC and can lose data if power
is disconnected abrubtly.

Or if you are running with fsync=false in postgresql.conf the same thing
can happen. Otherwise, not, it should not happen. More info perhaps?

#3Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Scott Marlowe (#2)
Re: Losing records when server hang

On Sun, Aug 08, 2004 at 08:36:36PM -0600, Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

It is actually possible if he uses several backends to do the job, and
transaction inserting record 10 commits before the hang, and 5,6,7,8,9
don't.

If this is only one backend, then I'd love to see how did he do that.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The ability to monopolize a planet is insignificant
next to the power of the source"

#4Scott Marlowe
smarlowe@qwest.net
In reply to: Alvaro Herrera (#3)
Re: Losing records when server hang

On Sun, 2004-08-08 at 21:26, Alvaro Herrera Munoz wrote:

On Sun, Aug 08, 2004 at 08:36:36PM -0600, Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

It is actually possible if he uses several backends to do the job, and
transaction inserting record 10 commits before the hang, and 5,6,7,8,9
don't.

Yeah, but he explicitly said he'd committed 1 through 10. Unless he
didn't understand what is meant by commit. I.e. committing AND
receiving the ack for that commit. Until the database says it
committed, nothing's been committed, so he might have thought just
firing the insert query was committing. I hadn't really thought of that
angle.

Is that the case, lec?

If this is only one backend, then I'd love to see how did he do that.

Me too :-)

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Marlowe (#4)
Re: Losing records when server hang

"Scott Marlowe" <smarlowe@qwest.net> writes:

On Sun, 2004-08-08 at 21:26, Alvaro Herrera Munoz wrote:

If this is only one backend, then I'd love to see how did he do that.

Me too :-)

Given a disk drive that lies about write complete (which is to to say,
most consumer-grade IDE gear), it's at least theoretically possible.
But it doesn't sound real likely, especially if the machine didn't
actually lose power.

I was wondering what he meant by "hang", as well as what he meant by
"commit". The overall lack of detail in the report is notable.

regards, tom lane

#6lec
limec@streamyx.com
In reply to: Scott Marlowe (#4)
Re: Losing records when server hang

Scott Marlowe wrote:

On Sun, 2004-08-08 at 21:26, Alvaro Herrera Munoz wrote:

On Sun, Aug 08, 2004 at 08:36:36PM -0600, Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

I'm using Redhat 7.3, kernel 2.4.18

It is actually possible if he uses several backends to do the job, and
transaction inserting record 10 commits before the hang, and 5,6,7,8,9
don't.

Just 1 backend.

Yeah, but he explicitly said he'd committed 1 through 10. Unless he
didn't understand what is meant by commit. I.e. committing AND
receiving the ack for that commit. Until the database says it
committed, nothing's been committed, so he might have thought just
firing the insert query was committing. I hadn't really thought of that
angle.

Is that the case, lec?

I explicitly 'COMMIT'

If this is only one backend, then I'd love to see how did he do that.

Me too :-)

That's exactly leaving me puzzled. I don't know if it has anything to
do with the SCSI controller or hardware related stuff. The controller
is a RAID, configured are RAID-5.

--lec

#7lec
limec@streamyx.com
In reply to: lec (#1)
Re: Losing records when server hang

Tom Lane wrote:

"Scott Marlowe" <smarlowe@qwest.net> writes:

On Sun, 2004-08-08 at 21:26, Alvaro Herrera Munoz wrote:

If this is only one backend, then I'd love to see how did he do that.

Me too :-)

Given a disk drive that lies about write complete (which is to to say,
most consumer-grade IDE gear), it's at least theoretically possible.
But it doesn't sound real likely, especially if the machine didn't
actually lose power.

It's a SCSI, RAID-5 on a Dell server.

I was wondering what he meant by "hang", as well as what he meant by
"commit". The overall lack of detail in the report is notable.

The hardware actually "hang". The Dell engineers came and replaced the
motherboard but couldn't tell what the actual fault was.

Commit as in 'COMMIT'. 'Records' 1,2,3,4,5,6,7,8,9,10 are actually
transactions. I'm as puzzled as to why I lost the transactions in the
middle but got the last transaction.

Show quoted text

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

#8lec
limec@streamyx.com
In reply to: Scott Marlowe (#2)
Re: Losing records when server hang

Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

Hi,

I'm observing the following:
If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

Are you running on top of IDE drives with the write cache enabled?
Most, if not all, IDE drives lie about FSYNC and can lose data if power
is disconnected abrubtly.

SCSI

Or if you are running with fsync=false in postgresql.conf the same thing
can happen. Otherwise, not, it should not happen. More info perhaps?

I didn't change the default fsync, which I believe is fsync=true.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: lec (#7)
Re: Losing records when server hang

lec <limec@streamyx.com> writes:

It's a SCSI, RAID-5 on a Dell server.

The hardware actually "hang". The Dell engineers came and replaced the
motherboard but couldn't tell what the actual fault was.

Commit as in 'COMMIT'. 'Records' 1,2,3,4,5,6,7,8,9,10 are actually
transactions. I'm as puzzled as to why I lost the transactions in the
middle but got the last transaction.

I'm puzzled too. I don't suppose you have the postmaster log from when
it was trying to recover from the crash? Or even better, copies of the
WAL files?

A possible theory has to do with corruption of the WAL log. For
instance, transactions 1-10 are all down to disk in WAL (or at least the
kernel told postgres the writes were done) and for one reason or another
the buffer manager chances to flush the page containing record 10 out
to its data file before the other records' pages. Now the system hangs.
After reboot, if the WAL log is unreadable beyond transaction 1 then the
database would come up with transaction 1 replayed, 2-10 not replayed,
but 10's data is out there anyway.

However this would seem to imply disk drive misfeasance above and beyond
your motherboard problem.

regards, tom lane

#10lec
limec@streamyx.com
In reply to: Tom Lane (#9)
Re: Losing records when server hang

Tom Lane wrote:

I'm puzzled too. I don't suppose you have the postmaster log from when
it was trying to recover from the crash? Or even better, copies of the
WAL files?

I never knew where the log files are :(

#11Marco Colombo
marco@esi.it
In reply to: Tom Lane (#9)
Re: Losing records when server hang

Tom Lane wrote:

lec <limec@streamyx.com> writes:

It's a SCSI, RAID-5 on a Dell server.

The hardware actually "hang". The Dell engineers came and replaced the
motherboard but couldn't tell what the actual fault was.

Commit as in 'COMMIT'. 'Records' 1,2,3,4,5,6,7,8,9,10 are actually
transactions. I'm as puzzled as to why I lost the transactions in the
middle but got the last transaction.

I'm puzzled too. I don't suppose you have the postmaster log from when
it was trying to recover from the crash? Or even better, copies of the
WAL files?

A possible theory has to do with corruption of the WAL log. For
instance, transactions 1-10 are all down to disk in WAL (or at least the
kernel told postgres the writes were done) and for one reason or another
the buffer manager chances to flush the page containing record 10 out
to its data file before the other records' pages. Now the system hangs.
After reboot, if the WAL log is unreadable beyond transaction 1 then the
database would come up with transaction 1 replayed, 2-10 not replayed,
but 10's data is out there anyway.

However this would seem to imply disk drive misfeasance above and beyond
your motherboard problem.

Well, no. How about this theory:

1) everything is ok:
the backend executes write()/fsync() for transactions 1-5

2) hardware fails some how at MB level (imagine CPU/RAM overheating):
RAM gets corrupted - kernel starts oopsing (but goes on)
meanwhile, the backend executes write()/fsync() for transactions 6-10,
but randomly corrupted data gets written to disk.

3) unrecoverable kernel error occurs, the show stops.

On recover, transactions 6-9 don't even look like valid log entries, while
10, for some reason, does (maybe only data is corrupted).

I'm not familiar with the details of WAL files and post-crash recovery,
but is that possible? Or does the process stop at the first failure?

Anyway, if your CPU/RAM is failing, no DB technology can save you. You
need redundant CPU/RAM units to perform the same operations concurrenly,
and the hardware to validate the results on a 2vs1 basis at least.
Ask NASA, I think they know what "mission critical" actually means. :)

Really, when the hardware starts flipping random bits in RAM, you can't
even know how long it's being going on, can be hours w/o the kernel panic
or hang at all. No one knows how good is your data. There's no point in
recovering a transaction if the data inside is corrupted.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo@ESI.it

#12Scott Marlowe
smarlowe@qwest.net
In reply to: lec (#6)
Re: Losing records when server hang

On Mon, 2004-08-09 at 09:07, lec wrote:

Scott Marlowe wrote:

On Sun, 2004-08-08 at 21:26, Alvaro Herrera Munoz wrote:

On Sun, Aug 08, 2004 at 08:36:36PM -0600, Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

I'm using Redhat 7.3, kernel 2.4.18

It is actually possible if he uses several backends to do the job, and
transaction inserting record 10 commits before the hang, and 5,6,7,8,9
don't.

Just 1 backend.

Yeah, but he explicitly said he'd committed 1 through 10. Unless he
didn't understand what is meant by commit. I.e. committing AND
receiving the ack for that commit. Until the database says it
committed, nothing's been committed, so he might have thought just
firing the insert query was committing. I hadn't really thought of that
angle.

Is that the case, lec?

I explicitly 'COMMIT'

If this is only one backend, then I'd love to see how did he do that.

Me too :-)

That's exactly leaving me puzzled. I don't know if it has anything to
do with the SCSI controller or hardware related stuff. The controller
is a RAID, configured are RAID-5.

Does that RAID controller have NON battery backed cache?

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Colombo (#11)
Re: Losing records when server hang

Marco Colombo <marco@esi.it> writes:

Tom Lane wrote:

However this would seem to imply disk drive misfeasance above and beyond
your motherboard problem.

Well, no. How about this theory:

1) everything is ok:
the backend executes write()/fsync() for transactions 1-5

2) hardware fails some how at MB level (imagine CPU/RAM overheating):
RAM gets corrupted - kernel starts oopsing (but goes on)
meanwhile, the backend executes write()/fsync() for transactions 6-10,
but randomly corrupted data gets written to disk.

3) unrecoverable kernel error occurs, the show stops.

On recover, transactions 6-9 don't even look like valid log entries, while
10, for some reason, does (maybe only data is corrupted).

I'm not familiar with the details of WAL files and post-crash recovery,
but is that possible? Or does the process stop at the first failure?

Recovery will stop at the first corrupted record, so it would not happen
like that. But you are right, the MB failure alone might have been
enough to corrupt the outgoing WAL log data and thus produce the
scenario I described. Once Postgres *thinks* transactions 1-10 are
safely down to disk in the WAL log, it will feel free to update the data
files in any random order that seems convenient. So the write of record
10 could have occurred before the rest, and if that happened not to get
corrupted by the MB problem, we could see the result lec describes.

Of course this is all guesswork since we have no direct evidence to look
at, but it seems fairly plausible.

Anyway, if your CPU/RAM is failing, no DB technology can save you.

Agreed. Software certainly cannot make any guarantees if it can't even
execute correctly ...

regards, tom lane

#14lec
limec@streamyx.com
In reply to: Scott Marlowe (#12)
Re: Losing records when server hang

Scott Marlowe wrote:

On Mon, 2004-08-09 at 09:07, lec wrote:

Scott Marlowe wrote:

On Sun, 2004-08-08 at 21:26, Alvaro Herrera Munoz wrote:

On Sun, Aug 08, 2004 at 08:36:36PM -0600, Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

I'm using Redhat 7.3, kernel 2.4.18

It is actually possible if he uses several backends to do the job, and
transaction inserting record 10 commits before the hang, and 5,6,7,8,9
don't.

Just 1 backend.

Yeah, but he explicitly said he'd committed 1 through 10. Unless he
didn't understand what is meant by commit. I.e. committing AND
receiving the ack for that commit. Until the database says it
committed, nothing's been committed, so he might have thought just
firing the insert query was committing. I hadn't really thought of that
angle.

Is that the case, lec?

I explicitly 'COMMIT'

If this is only one backend, then I'd love to see how did he do that.

Me too :-)

That's exactly leaving me puzzled. I don't know if it has anything to
do with the SCSI controller or hardware related stuff. The controller
is a RAID, configured are RAID-5.

Does that RAID controller have NON battery backed cache?

I'm not sure.

#15lec
limec@streamyx.com
In reply to: lec (#1)
Re: Losing records when server hang

Tom Lane wrote:

Marco Colombo <marco@esi.it> writes:

Tom Lane wrote:

However this would seem to imply disk drive misfeasance above and beyond
your motherboard problem.

Well, no. How about this theory:

1) everything is ok:
the backend executes write()/fsync() for transactions 1-5

2) hardware fails some how at MB level (imagine CPU/RAM overheating):
RAM gets corrupted - kernel starts oopsing (but goes on)
meanwhile, the backend executes write()/fsync() for transactions 6-10,
but randomly corrupted data gets written to disk.

3) unrecoverable kernel error occurs, the show stops.

On recover, transactions 6-9 don't even look like valid log entries, while
10, for some reason, does (maybe only data is corrupted).

I'm not familiar with the details of WAL files and post-crash recovery,
but is that possible? Or does the process stop at the first failure?

Recovery will stop at the first corrupted record, so it would not happen
like that. But you are right, the MB failure alone might have been
enough to corrupt the outgoing WAL log data and thus produce the
scenario I described. Once Postgres *thinks* transactions 1-10 are
safely down to disk in the WAL log, it will feel free to update the data
files in any random order that seems convenient. So the write of record
10 could have occurred before the rest, and if that happened not to get
corrupted by the MB problem, we could see the result lec describes.

Of course this is all guesswork since we have no direct evidence to look
at, but it seems fairly plausible.

Anyway, if your CPU/RAM is failing, no DB technology can save you.

Agreed. Software certainly cannot make any guarantees if it can't even
execute correctly ...

Same here. I don't even want to have to prove anything if the hardware
isn't reliable but the "management" queries about the lost transactions,
blaming on system/software/database. I could prove to them that the lost
transactions were due to the system hang, but transaction #10 being
there makes my reasoning doubtful.

Thanks for all your feedbacks and reasoning.

--lec

#16Scott Marlowe
smarlowe@qwest.net
In reply to: lec (#14)
Re: Losing records when server hang

On Mon, 2004-08-09 at 19:29, lec wrote:

Scott Marlowe wrote:

Does that RAID controller have NON battery backed cache?

I'm not sure.

Find out. If it does, it's an accident looking for a place to happen.
Unless it's set to write thru not write back. If you're gonna have a
caching controller set to write back it MUST have battery backed cache.

#17Chris Travers
chris@metatrontech.com
In reply to: lec (#8)
Re: Losing records when server hang

lec wrote:

Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

Hi,

I'm observing the following:
If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the database
and the server hangs, I could lose records 5,6,7,8,9 but record 10 is
there. How is this possible and do anyone know how Postgresql
physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

Are you running on top of IDE drives with the write cache enabled?
Most, if not all, IDE drives lie about FSYNC and can lose data if power
is disconnected abrubtly.

SCSI

Or if you are running with fsync=false in postgresql.conf the same thing
can happen. Otherwise, not, it should not happen. More info perhaps?

I didn't change the default fsync, which I believe is fsync=true.

Could your SCSI controller be misbehaving? Could this be leading to a
system hang (say when swap space is being called back into memory)?

Best Wishes,
Chris Travers
Metatron Technology Consulting

#18lec
limec@streamyx.com
In reply to: Chris Travers (#17)
Re: Losing records when server hang

Chris Travers wrote:

lec wrote:

Scott Marlowe wrote:

On Sun, 2004-08-08 at 19:43, lec wrote:

Hi,

I'm observing the following:
If I commit the following records 1,2,3,4,5,6,7,8,9,10 to the
database and the server hangs, I could lose records 5,6,7,8,9 but
record 10 is there. How is this possible and do anyone know how
Postgresql physically writes the records?

Assuming a properly function storage subsystem and a kernel that does
not lie about fsync, this is not possible.

Are you running on top of IDE drives with the write cache enabled?
Most, if not all, IDE drives lie about FSYNC and can lose data if power
is disconnected abrubtly.

SCSI

Or if you are running with fsync=false in postgresql.conf the same
thing
can happen. Otherwise, not, it should not happen. More info perhaps?

I didn't change the default fsync, which I believe is fsync=true.

Could your SCSI controller be misbehaving? Could this be leading to a
system hang (say when swap space is being called back into memory)?

The first hardware that got replaced was the SCSI controller. After
that there were still hardware "hanging" and eventually the system
engineers replaced the whole server and they weren't sure what the fault
was. The lost transactions happened not on the first server hang. I'm
more curious to know why the transactions in the middle got lost but the
last transaction was there.

Thanks,
lec

#19Scott Marlowe
smarlowe@qwest.net
In reply to: lec (#18)
Re: Losing records when server hang

On Tue, 2004-08-10 at 19:30, lec wrote:

Chris Travers wrote:

The first hardware that got replaced was the SCSI controller. After
that there were still hardware "hanging" and eventually the system
engineers replaced the whole server and they weren't sure what the fault
was. The lost transactions happened not on the first server hang. I'm
more curious to know why the transactions in the middle got lost but the
last transaction was there.

With questionable hardware (failing memory, scsi / raid controller,
etc.) all failure modes are pretty much possible. PostgreSQL is good,
but it can't make up for bad hardware.

#20Renê Salomão
rene@ibiz.com.br
In reply to: Chris Travers (#17)
Problem when installing dbsize of PG 8.0-beta1

I installed PG 8.0 Beta 1 in my machine. Everything seemed ok... The
problem occurred when I tried to install the programs in contrib
directory...
I commented out dbsize from the Makefile... And everything installed
successfully... Following is compiling output of dbsize... Am I doing
anything wrong or GETDATABASEPATH is broken?

gcc -O2
-fno-strict-aliasing
-Wall
-Wmissing-prototypes
-Wmissing-declarations
-fpic
-I.
-I../../src/include
-D_GNU_SOURCE
-c
-o dbsize.o dbsize.c

dbsize.c: In function`database_size': dbsize.c:58: error: too few
arguments to function`GetDatabasePath' gmake: *** [dbsize.o] Error 1

Current configuration of PG:

pg_config --configure
--prefix=/usr/local/postgresql-8.0
--with-java
--with-perl
--enable-nls
--with-openssl
--with-tcl
--with-pgport=5435
--enable-thread-safety

-----------------------------

Cordialmente,

Renê Salomão
rene@ibiz.com.br
IBIZ Tecnologia - www.ibiz.com.br
(011) 5579-3178

--------------------------------

#21Bruce Momjian
bruce@momjian.us
In reply to: Renê Salomão (#20)
In reply to: Bruce Momjian (#21)
#23Bruno Wolff III
bruno@wolff.to
In reply to: Geoffrey KRETZ (#22)
In reply to: Bruno Wolff III (#23)
#25Mike Mascari
mascarm@mascari.com
In reply to: Geoffrey KRETZ (#24)