RE: WAL does not recover gracefully from out-of-disk-space

Started by Mikheev, Vadim, almost 25 years ago, 4 messages
#1 Mikheev, Vadim
vmikheev@SECTORBASE.COM

> > Was the following bug already fixed?
>
> Dunno. I've changed the WAL ReadRecord code so that it fails soft (no
> Asserts or elog(STOP)s) for all failure cases, so the particular crash
> mode exhibited here should be gone. But I'm not sure why the code
> appears to be trying to open the wrong log segment, as Vadim comments.
> That bug might still be there. Need to try to reproduce the problem
> with new code.

Did you try to start up with wal-debug?

Vadim

#2 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#1)

> > I see that seek+write was changed to write-s in XLogFileInit
> > (that was induced by subj, right?), but what about problem
> > itself?
> >
> > BTW, were performance tests run after seek+write --> write-s
> > change?
>
> That change was for safety, not for performance. It might be a
> performance win on systems that support fdatasync properly (because it
> lets us use fdatasync), otherwise it's probably not a performance win.

Even with true fdatasync it's not obviously good for performance - it takes
too long to write 16MB files and fills the OS buffer cache with trash :-(
Probably, we need a separate process like LGWR (log writer) in Oracle.
I also like Andreas' idea about re-using log files.

> But we need it regardless --- if you didn't want a fully-allocated WAL
> file, why'd you bother with the original seek-and-write-1-byte code?

I considered this mostly as a hint to the OS about how the log file should
be allocated (to decrease fragmentation). Not sure how OSes use such hints
but seek+write costs nothing.

Vadim

#3 Ian Lance Taylor
ian@airs.com
In reply to: Mikheev, Vadim (#2)
Re: WAL does not recover gracefully from out-of-disk-space

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:

> > But we need it regardless --- if you didn't want a fully-allocated WAL
> > file, why'd you bother with the original seek-and-write-1-byte code?
>
> I considered this mostly as a hint to the OS about how the log file should
> be allocated (to decrease fragmentation). Not sure how OSes use such hints
> but seek+write costs nothing.

Doing a seek to a large value and doing a write is not a hint to a
Unix system that you are going to write a large sequential file. If
anything, it's a hint that you are going to write a sparse file. A
Unix kernel will optimize by not allocating blocks you aren't going to
write to.
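
To make the difference concrete, here is a minimal sketch of the two
allocation strategies under discussion. It is not the XLogFileInit code:
the function names, the 8192-byte chunk size, and the use of
st_blocks * 512 as an estimate of allocated space are all assumptions made
for illustration (the unit of st_blocks is 512 bytes on common systems, but
that is not guaranteed everywhere).

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define FILL_CHUNK 8192         /* illustrative write size, an assumption */

/* Seek to the last byte and write a single zero byte. This sets the file
 * length, but a filesystem is free to leave the skipped range unallocated. */
int create_by_seek(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (lseek(fd, size - 1, SEEK_SET) < 0 || write(fd, "", 1) != 1)
    {
        close(fd);
        unlink(path);
        return -1;
    }
    return close(fd);
}

/* Write zeros over the whole length, so every block is actually written. */
int create_by_fill(const char *path, off_t size)
{
    char    zeros[FILL_CHUNK];
    off_t   done = 0;
    int     fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(zeros, 0, sizeof zeros);
    while (done < size)
    {
        off_t   left = size - done;
        size_t  chunk = left < (off_t) sizeof zeros ? (size_t) left : sizeof zeros;
        ssize_t n = write(fd, zeros, chunk);

        if (n <= 0)             /* e.g. ENOSPC: give up and clean up */
        {
            close(fd);
            unlink(path);
            return -1;
        }
        done += n;
    }
    return close(fd);
}

/* Logical length of the file in bytes, or -1 on error. */
long long file_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (long long) st.st_size : -1;
}

/* Rough estimate of the space actually allocated, or -1 on error. */
long long allocated_bytes(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (long long) st.st_blocks * 512 : -1;
}
```

Creating two 16 MB files, one with each function, and comparing
file_size() with allocated_bytes() shows the effect: both files report the
same length, but on a filesystem that supports sparse files the
seek-created one will typically have only a block or so allocated. The
exact figures depend on the filesystem.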

Ian


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#2)
Re: WAL does not recover gracefully from out-of-disk-space

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:

> Even with true fdatasync it's not obviously good for performance - it takes
> too long to write 16MB files and fills the OS buffer cache with trash :-(

True. But at least the write is (hopefully) being done at a
non-performance-critical time.

> Probably, we need a separate process like LGWR (log writer) in Oracle.

I think the create-ahead feature in the checkpoint maker should be on
by default.

> > But we need it regardless --- if you didn't want a fully-allocated WAL
> > file, why'd you bother with the original seek-and-write-1-byte code?
>
> I considered this mostly as a hint to the OS about how the log file should
> be allocated (to decrease fragmentation). Not sure how OSes use such hints
> but seek+write costs nothing.

AFAIK, extant Unixes will not regard this as a hint at all; they'll
think it is a great opportunity to not store zeroes :-(.

One reason that I like logfile fill to be done separately is that it's
easier to convince ourselves that failure (due to out of disk space)
need not require elog(STOP) than if we have the same failure during
XLogWrite. You are right that we don't have time to consider each STOP
in the WAL code, but I think we should at least look at that case...
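
As a rough sketch of why a separate fill step is easier to make fail-soft:
the segment can be zero-filled under a temporary name, and on any error
(out of disk space included) the partial file is simply removed and an
error code returned, so no WAL state has been touched. Everything below is
hypothetical, not the actual WAL code: prealloc_segment(), fail_cleanup(),
the temporary-name-plus-rename scheme, and the chunk size are assumptions
for illustration, and fsync stands in for fdatasync for portability.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FILL_CHUNK 8192         /* illustrative write size, an assumption */

/* Remove the partial file and report failure, preserving the errno value. */
static int fail_cleanup(int fd, const char *path, int err)
{
    if (fd >= 0)
        close(fd);
    unlink(path);
    errno = err;
    return -1;
}

/* Zero-fill a new segment under tmp_path, flush it, then rename it to
 * final_path. Returns 0 on success; on failure returns -1 with errno set
 * and leaves nothing behind, so the caller can log the error and retry
 * later instead of halting. */
int prealloc_segment(const char *tmp_path, const char *final_path, off_t seg_size)
{
    char    zeros[FILL_CHUNK];
    off_t   done = 0;
    int     fd = open(tmp_path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(zeros, 0, sizeof zeros);

    while (done < seg_size)
    {
        off_t   left = seg_size - done;
        size_t  chunk = left < (off_t) sizeof zeros ? (size_t) left : sizeof zeros;
        ssize_t n = write(fd, zeros, chunk);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted before writing: retry */
        if (n <= 0)             /* treat a zero-length write as no space */
            return fail_cleanup(fd, tmp_path, n == 0 ? ENOSPC : errno);
        done += n;
    }

    if (fsync(fd) != 0)
        return fail_cleanup(fd, tmp_path, errno);
    if (close(fd) != 0)
        return fail_cleanup(-1, tmp_path, errno);
    if (rename(tmp_path, final_path) != 0)
        return fail_cleanup(-1, tmp_path, errno);
    return 0;
}
```

A caller running at a non-performance-critical time (a checkpoint, say)
could treat a -1 return as a warning and try again later, which is much
harder to arrange when the same failure surfaces in the middle of writing
log records.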

regards, tom lane