asynchronous commit risk window is overly optimistic

Started by Jeff Janes about 7 years ago | 2 messages | docs
#1 Jeff Janes
jeff.janes@gmail.com

https://www.postgresql.org/docs/current/wal-async-commit.html:

"If the database crashes during the risk window between an asynchronous
commit and the writing of the transaction's WAL records, then changes made
during that transaction will be lost. The duration of the risk window is
limited because a background process (the “WAL writer”) flushes unwritten
WAL records to disk every wal_writer_delay milliseconds. The actual maximum
duration of the risk window is three times wal_writer_delay because the WAL
writer is designed to favor writing whole pages at a time during busy
periods."

I think the phrase "actual maximum duration" here is far too reassuring.
There is no guarantee that the kernel will wake the WAL writer three times in
a row at the times it requested, or even within any other smallish multiple
of that time. Even if the WAL writer does repeatedly wake on schedule and
requests an fsync, that fsync itself can take a very large multiple of
wal_writer_delay milliseconds before it takes effect.
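
As a rough illustration of the arithmetic, here is a minimal Python sketch.
It is a toy model, not PostgreSQL code: the function names, the per-wakeup
scheduling lag, and the fsync latency are all hypothetical inputs chosen for
the example. The only value taken from PostgreSQL is the default
wal_writer_delay of 200 ms.

```python
def nominal_risk_window_ms(wal_writer_delay_ms: int) -> int:
    """Bound stated in the docs: three WAL writer cycles."""
    return 3 * wal_writer_delay_ms


def effective_risk_window_ms(
    wal_writer_delay_ms: int,
    wakeup_lag_ms: int,
    fsync_latency_ms: int,
) -> int:
    """Toy model: three cycles, each woken late by wakeup_lag_ms,
    followed by one fsync that takes fsync_latency_ms to complete.

    wakeup_lag_ms and fsync_latency_ms are assumed inputs, not
    values PostgreSQL reports or guarantees.
    """
    return 3 * (wal_writer_delay_ms + wakeup_lag_ms) + fsync_latency_ms


if __name__ == "__main__":
    delay = 200  # default wal_writer_delay, in milliseconds

    # Healthy system: the model collapses to the documented bound.
    print(nominal_risk_window_ms(delay))
    print(effective_risk_window_ms(delay, wakeup_lag_ms=0, fsync_latency_ms=0))

    # Struggling system (made-up numbers): late wakeups and a slow fsync.
    print(effective_risk_window_ms(delay, wakeup_lag_ms=500, fsync_latency_ms=5000))
```

With no lag and an instant fsync the model reproduces the documented figure.
Any nonzero lag or fsync latency pushes the window past it, and nothing
bounds either of those terms.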

If your server experiences a sudden power failure during normal operations
with uncongested IO, then it is very likely that anything asynchronously
committed more than three times wal_writer_delay (plus two disk rotations)
ago has made it to disk. But if it crashes for some reason other than a
sudden power failure, that data is less likely to be on disk. A stricken
server can go wobbly for a long time before finally falling over.

Maybe it should be replaced with something less confident, like "Under
normal conditions, the flush will be initiated within three times
wal_writer_delay because the WAL writer is designed to favor writing whole
pages at a time during busy periods."

Although the whole "because" clause seems to be more inside baseball than
is warranted here.

Cheers,

Jeff

#2 Bruce Momjian
bruce@momjian.us
In reply to: Jeff Janes (#1)
Re: asynchronous commit risk window is overly optimistic

On Wed, Mar 20, 2019 at 02:50:21PM -0400, Jeff Janes wrote:

> https://www.postgresql.org/docs/current/wal-async-commit.html:
>
> "If the database crashes during the risk window between an asynchronous commit
> and the writing of the transaction's WAL records, then changes made during that
> transaction will be lost. The duration of the risk window is limited because a
> background process (the “WAL writer”) flushes unwritten WAL records to disk
> every wal_writer_delay milliseconds. The actual maximum duration of the risk
> window is three times wal_writer_delay because the WAL writer is designed to
> favor writing whole pages at a time during busy periods."
>
> I think the phrase "actual maximum duration" here is far too reassuring. There
> is no guarantee that the kernel will wake the WAL writer three times in a row
> at the times it requested, or even within any other smallish multiple of that
> time. Even if the WAL writer does repeatedly wake on schedule and requests an
> fsync, that fsync itself can take a very large multiple of wal_writer_delay
> milliseconds before it takes effect.
>
> If your server experiences a sudden power failure during normal operations with
> uncongested IO, then it is very likely that anything asynchronously committed
> more than three times wal_writer_delay (plus two disk rotations) ago has made
> it to disk. But if it crashes for some reason other than a sudden power
> failure, that data is less likely to be on disk. A stricken server can go
> wobbly for a long time before finally falling over.
>
> Maybe it should be replaced with something less confident, like "Under normal
> conditions, the flush will be initiated within three times wal_writer_delay
> because the WAL writer is designed to favor writing whole pages at a time
> during busy periods."
>
> Although the whole "because" clause seems to be more inside baseball than is
> warranted here.

I think we can go with:

"Under normal conditions, the flush will be initiated within
roughly three times wal_writer_delay".

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +