MarkBufferDirtyHint() and LSN update

Started by Antonin Houskaabout 6 years ago15 messages
#1Antonin Houska
Antonin Houska
ah@cybertec.at

Please consider this scenario (race conditions):

1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).

2. Another backend modified a hint bit and called MarkBufferDirtyHint().

3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
(e.g. due to checksums enabled), new LSN is computed, however it's not
assigned to the page because the buffer is still dirty:

if (!(buf_state & BM_DIRTY))
{
...

if (!XLogRecPtrIsInvalid(lsn))
PageSetLSN(page, lsn);
}

4. MarkBufferDirtyHint() completes.

5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
BM_DIRTY because MarkBufferDirtyHint() has eventually set
BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
call of FlushBuffer(). However page LSN is hasn't been updated so the
requirement that WAL must be flushed first is not met.

I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any
subtle detail?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#2Tomas Vondra
Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Antonin Houska (#1)
Re: MarkBufferDirtyHint() and LSN update

On Wed, Oct 30, 2019 at 02:44:18PM +0100, Antonin Houska wrote:

Please consider this scenario (race conditions):

1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).

2. Another backend modified a hint bit and called MarkBufferDirtyHint().

3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
(e.g. due to checksums enabled), new LSN is computed, however it's not
assigned to the page because the buffer is still dirty:

if (!(buf_state & BM_DIRTY))
{
...

if (!XLogRecPtrIsInvalid(lsn))
PageSetLSN(page, lsn);
}

4. MarkBufferDirtyHint() completes.

5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
BM_DIRTY because MarkBufferDirtyHint() has eventually set
BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
call of FlushBuffer(). However page LSN is hasn't been updated so the
requirement that WAL must be flushed first is not met.

I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any
subtle detail?

Isn't this prevented by locking of the buffer header? Both FlushBuffer
and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
does a bit of work before, but that's related to BM_PERMANENT.

If there really is a race condition, it shouldn't be that difficult to
trigger it by adding a sleep or a breakpoint, I guess.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Antonin Houska
Antonin Houska
ah@cybertec.at
In reply to: Tomas Vondra (#2)
Re: MarkBufferDirtyHint() and LSN update

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Wed, Oct 30, 2019 at 02:44:18PM +0100, Antonin Houska wrote:

Please consider this scenario (race conditions):

1. FlushBuffer() has written the buffer but hasn't yet managed to clear the
BM_DIRTY flag (however BM_JUST_DIRTIED could be cleared by now).

2. Another backend modified a hint bit and called MarkBufferDirtyHint().

3. In MarkBufferDirtyHint(), if XLogHintBitIsNeeded() evaluates to true
(e.g. due to checksums enabled), new LSN is computed, however it's not
assigned to the page because the buffer is still dirty:

if (!(buf_state & BM_DIRTY))
{
...

if (!XLogRecPtrIsInvalid(lsn))
PageSetLSN(page, lsn);
}

4. MarkBufferDirtyHint() completes.

5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
BM_DIRTY because MarkBufferDirtyHint() has eventually set
BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
call of FlushBuffer(). However page LSN is hasn't been updated so the
requirement that WAL must be flushed first is not met.

I think that PageSetLSN() should be called regardless BM_DIRTY. Do I miss any
subtle detail?

Isn't this prevented by locking of the buffer header? Both FlushBuffer
and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
does a bit of work before, but that's related to BM_PERMANENT.

If there really is a race condition, it shouldn't be that difficult to
trigger it by adding a sleep or a breakpoint, I guess.

Yes, I had verified it using gdb: inserted a row into a table, then attached
gdb to checkpointer and stopped it when FlushBuffer() was about to call
TerminateBufferIO(). Then, when scanning the table, I saw that
MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
3553624066 ~ 0b11010011110100000000000000000010.

With BM_DIRTY defined as

#define BM_DIRTY (1U << 23)

the flag appeared to be set.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#4Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Antonin Houska (#1)
Re: MarkBufferDirtyHint() and LSN update

On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska <ah@cybertec.at> wrote:

5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
BM_DIRTY because MarkBufferDirtyHint() has eventually set
BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
call of FlushBuffer(). However page LSN is hasn't been updated so the
requirement that WAL must be flushed first is not met.

This part confuses me. Are you saying that MarkBufferDirtyHint() can
set BM_JUST_DIRTIED without setting BM_DIRTY?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Antonin Houska
Antonin Houska
ah@cybertec.at
In reply to: Robert Haas (#4)
Re: MarkBufferDirtyHint() and LSN update

Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska <ah@cybertec.at> wrote:

5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
BM_DIRTY because MarkBufferDirtyHint() has eventually set
BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
call of FlushBuffer(). However page LSN is hasn't been updated so the
requirement that WAL must be flushed first is not met.

This part confuses me. Are you saying that MarkBufferDirtyHint() can
set BM_JUST_DIRTIED without setting BM_DIRTY?

No, I'm saying that MarkBufferDirtyHint() leaves BM_DIRTY set, as
expected. However, if things happen in the order I described, then LSN
returned by XLogSaveBufferForHint() is not assigned to the page.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#6Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Antonin Houska (#3)
1 attachment(s)
Re: MarkBufferDirtyHint() and LSN update

On Thu, Oct 31, 2019 at 09:43:47AM +0100, Antonin Houska wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Isn't this prevented by locking of the buffer header? Both FlushBuffer
and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
does a bit of work before, but that's related to BM_PERMANENT.

In FlushBuffer() you have a window after fetching the buffer state and
the header is once unlocked until TerminateBufferIO() gets called
(this would take again a lock on the buffer header), so it is
logically possible for the buffer to be marked as dirty once again
causing the update of the LSN on the page to be missed even if a
backup block has been written in WAL.

Yes, I had verified it using gdb: inserted a row into a table, then attached
gdb to checkpointer and stopped it when FlushBuffer() was about to call
TerminateBufferIO(). Then, when scanning the table, I saw that
MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
3553624066 ~ 0b11010011110100000000000000000010.

Small typo here: 11010011110100000000000000000010...

With BM_DIRTY defined as

#define BM_DIRTY (1U << 23)

the flag appeared to be set.

... Still, the bit is set here.

Does something like the attached patch make sense? Reviews are
welcome.
--
Michael

Attachments:

hintbit-lsn.patchtext/x-diff; charset=us-ascii
#7Antonin Houska
Antonin Houska
ah@cybertec.at
In reply to: Michael Paquier (#6)
Re: MarkBufferDirtyHint() and LSN update

Michael Paquier <michael@paquier.xyz> wrote:

Does something like the attached patch make sense? Reviews are
welcome.

This looks good to me.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#8Kyotaro Horiguchi
Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Antonin Houska (#7)
Re: MarkBufferDirtyHint() and LSN update

At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska <ah@cybertec.at> wrote in

Michael Paquier <michael@paquier.xyz> wrote:

Does something like the attached patch make sense? Reviews are
welcome.

This looks good to me.

I have a qustion.

The current code assumes that !BM_DIRTY means that the function is
dirtying the page. But if !BM_JUST_DIRTIED, the function actually is
going to re-dirty the page even if BM_DIRTY.

If this is correct, the trigger for stats update is not !BM_DIRTY but
!BM_JUST_DIRTIED, or the fact that we passed the line of
XLogSaveBufferForHint() could be the trigger, regardless whether the
LSN is valid or not.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9Antonin Houska
Antonin Houska
ah@cybertec.at
In reply to: Kyotaro Horiguchi (#8)
Re: MarkBufferDirtyHint() and LSN update

Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska <ah@cybertec.at> wrote in

Michael Paquier <michael@paquier.xyz> wrote:

Does something like the attached patch make sense? Reviews are
welcome.

This looks good to me.

I have a qustion.

The current code assumes that !BM_DIRTY means that the function is
dirtying the page. But if !BM_JUST_DIRTIED, the function actually is
going to re-dirty the page even if BM_DIRTY.

It makes sense to me. I can imagine the following:

1. FlushBuffer() cleared BM_JUST_DIRTIED, wrote the page to disk but hasn't
yet cleared BM_DIRTY.

2. Another backend changed a hint bit in shared memory and called
MarkBufferDirtyHint().

Thus FlushBuffer() missed the current hint bit change, so we need to dirty the
page again.

If this is correct, the trigger for stats update is not !BM_DIRTY but
!BM_JUST_DIRTIED, or the fact that we passed the line of
XLogSaveBufferForHint() could be the trigger, regardless whether the
LSN is valid or not.

I agree.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#10Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Antonin Houska (#7)
Re: MarkBufferDirtyHint() and LSN update

On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:

This looks good to me.

Actually, no, this is not good. I have been studying more the patch,
and after stressing more this code path with a cluster having
checksums enabled and shared_buffers at 1MB, I have been able to make
a couple of page's LSNs go backwards with pgbench -s 100. The cause
was simply that the page got flushed with a newer LSN than what was
returned by XLogSaveBufferForHint() before taking the buffer header
lock, so updating only the LSN for a non-dirty page was simply
guarding against that.
--
Michael

#11Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#10)
1 attachment(s)
Re: MarkBufferDirtyHint() and LSN update

On Wed, Nov 13, 2019 at 09:17:03PM +0900, Michael Paquier wrote:

Actually, no, this is not good. I have been studying more the patch,
and after stressing more this code path with a cluster having
checksums enabled and shared_buffers at 1MB, I have been able to make
a couple of page's LSNs go backwards with pgbench -s 100. The cause
was simply that the page got flushed with a newer LSN than what was
returned by XLogSaveBufferForHint() before taking the buffer header
lock, so updating only the LSN for a non-dirty page was simply
guarding against that.

for the reference attached is the trick I have used, adding an extra
assertion check in PageSetLSN().
--
Michael

Attachments:

page-setlsn-trick.patchtext/x-diff; charset=us-ascii
#12Kyotaro Horiguchi
Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#11)
Re: MarkBufferDirtyHint() and LSN update

At Thu, 14 Nov 2019 12:01:29 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Wed, Nov 13, 2019 at 09:17:03PM +0900, Michael Paquier wrote:

Actually, no, this is not good. I have been studying more the patch,
and after stressing more this code path with a cluster having
checksums enabled and shared_buffers at 1MB, I have been able to make
a couple of page's LSNs go backwards with pgbench -s 100. The cause
was simply that the page got flushed with a newer LSN than what was
returned by XLogSaveBufferForHint() before taking the buffer header
lock, so updating only the LSN for a non-dirty page was simply
guarding against that.

I thought of something like that but forgot to mention. But after
thinking of that, couldn't the current code can do the same think even
though with a far small probability? Still a session with smaller hint
LSN but didn't entered the header lock section yet can be cut-in by
another session with larger hint LSN.

for the reference attached is the trick I have used, adding an extra
assertion check in PageSetLSN().

I believe that all locations where Page-LSN is set is in the same
buffer-ex-lock section with XLogInsert.. but not sure.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13Antonin Houska
Antonin Houska
ah@cybertec.at
In reply to: Michael Paquier (#10)
1 attachment(s)
Re: MarkBufferDirtyHint() and LSN update

Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:

This looks good to me.

Actually, no, this is not good. I have been studying more the patch,
and after stressing more this code path with a cluster having
checksums enabled and shared_buffers at 1MB, I have been able to make
a couple of page's LSNs go backwards with pgbench -s 100. The cause
was simply that the page got flushed with a newer LSN than what was
returned by XLogSaveBufferForHint() before taking the buffer header
lock, so updating only the LSN for a non-dirty page was simply
guarding against that.

Interesting. Now that I know about the problem, I could have reproduced it
using gdb: MarkBufferDirtyHint() was called by 2 backends concurrently in such
a way that the first backend generates the LSN, but before it manages to
assign it to the page, another backend generates another LSN and sets it.

Can't we just apply the attached diff on the top of your patch?

Also I wonder how checksums helped you to discover the problem? Although I
could simulate the setting of lower LSN, I could not see any broken
checksum. And I wouldn't even expect that since FlushBuffer() copies the
buffer into backend local memory before it calculates the checksum.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

hintbit-lsn-compare.patchtext/x-diff
#14Antonin Houska
Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#13)
Re: MarkBufferDirtyHint() and LSN update

Antonin Houska <ah@cybertec.at> wrote:

Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:

This looks good to me.

Actually, no, this is not good. I have been studying more the patch,
and after stressing more this code path with a cluster having
checksums enabled and shared_buffers at 1MB, I have been able to make
a couple of page's LSNs go backwards with pgbench -s 100. The cause
was simply that the page got flushed with a newer LSN than what was
returned by XLogSaveBufferForHint() before taking the buffer header
lock, so updating only the LSN for a non-dirty page was simply
guarding against that.

Interesting. Now that I know about the problem, I could have reproduced it
using gdb: MarkBufferDirtyHint() was called by 2 backends concurrently in such
a way that the first backend generates the LSN, but before it manages to
assign it to the page, another backend generates another LSN and sets it.

Can't we just apply the attached diff on the top of your patch?

I wanted to register the patch for the next CF so it's not forgotten, but see
it's already there. Why have you set the status to "withdrawn"?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#15Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Antonin Houska (#14)
Re: MarkBufferDirtyHint() and LSN update

On Fri, Dec 20, 2019 at 04:30:38PM +0100, Antonin Houska wrote:

I wanted to register the patch for the next CF so it's not forgotten, but see
it's already there. Why have you set the status to "withdrawn"?

Because my patch was incorrect, and I did not make enough bandwidth to
think more on the matter. I am actually not sure that what you are
proposing is better.. If you wish to get that reviewed, could you add
a new CF entry?
--
Michael