hio.c does visibilitymap_pin()/IO while holding buffer lock

Started by Andres Freundabout 3 years ago12 messageshackers
Jump to latest
#1Andres Freund
andres@anarazel.de

Hi,

Starting with

commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: 2021-01-17 22:11:39 +0100

Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

RelationGetBufferForTuple does

/*
* The page is empty, pin vmbuffer to set all_frozen bit.
*/
if (options & HEAP_INSERT_FROZEN)
{
Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
}

while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
doesn't already point to the right block.

The lock ordering rules are to lock VM pages *before* locking heap pages.

I think the reason this hasn't yet bitten us badly, is that INSERT_FROZEN
effectively requires that the relation is access exclusive locked. There
shouldn't be other backends locking multiple buffers in the relation (bgwriter
/ checkpointer can lock a single buffer at a time, but that's it).

I see roughly two ways forward:

1) We add a comment explaining why it's safe to violate lock ordering rules in
this one situation

2) Change relevant code so that we only return a valid vmbuffer if we could do
so without blocking / IO and, obviously, skip updating the VM if we
couldn't get the buffer.

Greetings,

Andres Freund

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#1)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

On 3/25/23 03:57, Andres Freund wrote:

Hi,

Starting with

commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: 2021-01-17 22:11:39 +0100

Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

That's a bummer :-(

RelationGetBufferForTuple does

/*
* The page is empty, pin vmbuffer to set all_frozen bit.
*/
if (options & HEAP_INSERT_FROZEN)
{
Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
}

while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
doesn't already point to the right block.

The lock ordering rules are to lock VM pages *before* locking heap pages.

I think the reason this hasn't yet bitten us badly, is that INSERT_FROZEN
effectively requires that the relation is access exclusive locked. There
shouldn't be other backends locking multiple buffers in the relation (bgwriter
/ checkpointer can lock a single buffer at a time, but that's it).

Right. Still, it seems a bit fragile ...

I see roughly two ways forward:

1) We add a comment explaining why it's safe to violate lock ordering rules in
this one situation

Possible, although I feel uneasy about just documenting a broken rule.
Would be better to maintain the locking order.

2) Change relevant code so that we only return a valid vmbuffer if we could do
so without blocking / IO and, obviously, skip updating the VM if we
couldn't get the buffer.

I don't recall the exact details about the vm locking/pinning, but can't
we just ensure we actually follow the proper locking order? I mean, this
only deals with new pages, requested at line ~624:

buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#2)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:

On 3/25/23 03:57, Andres Freund wrote:

2) Change relevant code so that we only return a valid vmbuffer if we could do
so without blocking / IO and, obviously, skip updating the VM if we
couldn't get the buffer.

I don't recall the exact details about the vm locking/pinning, but can't
we just ensure we actually follow the proper locking order? I mean, this
only deals with new pages, requested at line ~624:

buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?

Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Greetings,

Andres Freund

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Andres Freund <andres@anarazel.de> writes:

On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?

Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Maybe the relation-extension logic needs to include the ability to get
the relevant vm page?

regards, tom lane

#5Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-25 12:57:17 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?

Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Maybe the relation-extension logic needs to include the ability to get
the relevant vm page?

I don't see how that's easily possible with the current lock ordering
rules. At least without giving up using RBM_ZERO_AND_LOCK for extending or
stuffing even more things to happen with the the extension lock held, which I
don't think we want to. I don't think INSERT_FROZEN is worth that price.

Perhaps we should just try to heuristically pin the right VM buffer before
trying to extend?

Thinking more about this, I think there's no inherent deadlock danger with
reading the VM while holding a buffer lock, "just" an efficiency issue. If we
avoid needing to do IO nearly all the time, by trying to pin the right page
before extending, it's probably good enough.

Greetings,

Andres Freund

In reply to: Andres Freund (#5)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

On Sat, Mar 25, 2023 at 11:17 AM Andres Freund <andres@anarazel.de> wrote:

Thinking more about this, I think there's no inherent deadlock danger with
reading the VM while holding a buffer lock, "just" an efficiency issue. If we
avoid needing to do IO nearly all the time, by trying to pin the right page
before extending, it's probably good enough.

Uh, it was quite possible for lazy_vacuum_heap_page() to do that up
until very recently (it was fixed by my commit 980ae17310). Since it
would call visibilitymap_get_status() with an exclusive buffer lock on
the heap page, which sometimes had to change the VM page. It
potentially did an IO at that point, to read in a later VM page to the
caller's initially-pinned one.

In other words, up until recently there was a strange idiom used by
lazy_vacuum_heap_page/lazy_vacuum_heap_rel, where we'd abuse
visibilitymap_get_status() as a replacement for calling
visibilitymap_pin() right before acquire a heap page buffer lock. But
now the second heap pass does it the same way as the first heap pass.
(Even still, I have no reason to believe that the previous approach
was all that bad; it was just a bit ugly.)

There are still a few visibilitymap_get_status()-with-buffer-lock
calls in vacuumlazy.c, FWIW, but they don't run the risk of needing to
change the vmbuffer we have pinned with the heap page buffer lock
held.

--
Peter Geoghegan

#7Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-25 11:17:07 -0700, Andres Freund wrote:

I don't see how that's easily possible with the current lock ordering
rules. At least without giving up using RBM_ZERO_AND_LOCK for extending or
stuffing even more things to happen with the the extension lock held, which I
don't think we want to. I don't think INSERT_FROZEN is worth that price.

I think I might have been thinking of this too narrowly. It's extremely
unlikely that another backend would discover the page. And we can use
visibilitymap_pin_ok() to amortize the cost to almost nothing - there's a lot
of bits in an 8k block...

Here's a draft patch.

The bulk relation patch I am polishing has a similar issue, except that there
the problem is inserting into the FSM, instead of pinning a VM pageabout the
FSM. Hence the patch above makes the infrastructure a bit more general than
required for the HEAP_INSERT_FROZEN case alone (where we currently shouldn't
ever have a valid otherBuffer).

The way the parameter ordering for GetVisibilityMapPins() works make it
somewhat unwieldy - see e.g the existing
if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
GetVisibilityMapPins(relation, buffer, otherBuffer,
targetBlock, otherBlock, vmbuffer,
vmbuffer_other);
else
GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer);

Which I now duplicated in yet another place.

Perhaps we just ought to switch buffer1/block1 with buffer2/block2 inside
GetVisibilityMapPins(), to avoid duplicating that code elsewhere?

Because we now track whether the *targetBuffer* was ever unlocked, we can be a
bit more narrow about the possibility of there not being sufficient space.

The patch could be narrowed for backpatching. But as there's likely no
practical problem at this point, I wouldn't want to backpatch anyway?

Greetings,

Andres Freund

Attachments:

v1-0001-hio-Release-extension-lock-before-initializing-pa.patchtext/x-diff; charset=us-asciiDownload+7-8
v1-0002-WIP-relation-extension-Don-t-pin-the-VM-while-hol.patchtext/x-diff; charset=us-asciiDownload+78-38
#8Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#7)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:

Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

Greetings,

Andres Freund

Attachments:

v2-0002-WIP-relation-extension-Don-t-pin-the-VM-while-hol.patchtext/x-diff; charset=us-asciiDownload+81-40
v2-0001-hio-Release-extension-lock-before-initializing-pa.patchtext/x-diff; charset=us-asciiDownload+7-8
#9Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#8)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:

Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I'm still debating with myself whether this commit (or a prerequisite commit)
should move logic dealing with the buffer ordering into
GetVisibilityMapPins(), so we don't need two blocks like this:

if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
GetVisibilityMapPins(relation, buffer, otherBuffer,
targetBlock, otherBlock, vmbuffer,
vmbuffer_other);
else
GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer);
...

if (otherBuffer != InvalidBuffer)
{
if (GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer))
unlockedTargetBuffer = true;
}
else
{
if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
targetBlock, InvalidBlockNumber,
vmbuffer, InvalidBuffer))
unlockedTargetBuffer = true;
}
}

Greetings,

Andres Freund

#10Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#9)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

On 4/3/23 00:40, Andres Freund wrote:

Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:

Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I guess the 0001 part was already pushed, so I should be looking only at
0002, correct?

I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
not saying it's incorrect, but I find it hard to reason about the new
combinations of conditions :-(

I mean, it only had this condition:

if (otherBuffer != InvalidBuffer)
{
...
}

but now it have

if (unlockedTargetBuffer)
{
...
}
else if (otherBuffer != InvalidBuffer)
{
...
}

if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
{
...
}

Not sure how to improve that :-/ but not exactly trivial to figure out
what's going to happen.

Maybe this

* If we unlocked the target buffer above, it's unlikely, but possible,
* that another backend used space on this page.

might say what we're going to do in this case. I mean - I understand
some backend may use space in unlocked page, but what does that mean for
this code? What is it going to do? (The same comment talks about the
next condition in much more detail, for example.)

Also, isn't it a bit strange the free space check now happens outside
any if condition? It used to happen in the

if (otherBuffer != InvalidBuffer)
{
...
}

block, but now it happens outside.

I'm still debating with myself whether this commit (or a prerequisite commit)
should move logic dealing with the buffer ordering into
GetVisibilityMapPins(), so we don't need two blocks like this:

if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
GetVisibilityMapPins(relation, buffer, otherBuffer,
targetBlock, otherBlock, vmbuffer,
vmbuffer_other);
else
GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer);
...

if (otherBuffer != InvalidBuffer)
{
if (GetVisibilityMapPins(relation, otherBuffer, buffer,
otherBlock, targetBlock, vmbuffer_other,
vmbuffer))
unlockedTargetBuffer = true;
}
else
{
if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
targetBlock, InvalidBlockNumber,
vmbuffer, InvalidBuffer))
unlockedTargetBuffer = true;
}
}

Yeah. I haven't tried, but I imagine it'd make RelationGetBufferForTuple
a little bit.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#10)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:

On 4/3/23 00:40, Andres Freund wrote:

Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:

Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I guess the 0001 part was already pushed, so I should be looking only at
0002, correct?

Yes.

I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
not saying it's incorrect, but I find it hard to reason about the new
combinations of conditions :-(

I mean, it only had this condition:

if (otherBuffer != InvalidBuffer)
{
...
}

but now it have

if (unlockedTargetBuffer)
{
...
}
else if (otherBuffer != InvalidBuffer)
{
...
}

if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
{
...
}

Not sure how to improve that :-/ but not exactly trivial to figure out
what's going to happen.

It's not great, I agree. I tried to make it easier to read in this version by
a) changing GetVisibilityMapPins() as I proposed
b) added a new variable "recheckVmPins", that gets set in
if (unlockedTargetBuffer)
and
if (otherBuffer != InvalidBuffer)
c) reformulated comments

Maybe this

* If we unlocked the target buffer above, it's unlikely, but possible,
* that another backend used space on this page.

might say what we're going to do in this case. I mean - I understand
some backend may use space in unlocked page, but what does that mean for
this code? What is it going to do? (The same comment talks about the
next condition in much more detail, for example.)

There's a comment about that detail further below. But you're right, it wasn't
clear as-is. How about now?

Also, isn't it a bit strange the free space check now happens outside
any if condition? It used to happen in the

if (otherBuffer != InvalidBuffer)
{
...
}

block, but now it happens outside.

Well, the alternative is to repeat it in the two branches, which doesn't seem
great either. Particularly because there'll be a third branch after the bulk
extension patch.

Greetings,

Andres Freund

Attachments:

v3-0001-hio-Relax-rules-for-calling-GetVisibilityMapPins.patchtext/x-diff; charset=us-asciiDownload+26-12
v3-0002-hio-Don-t-pin-the-VM-while-holding-buffer-lock-wh.patchtext/x-diff; charset=us-asciiDownload+85-42
#12Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#11)
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

Hi,

On 2023-04-03 12:00:30 -0700, Andres Freund wrote:

It's not great, I agree. I tried to make it easier to read in this version by
a) changing GetVisibilityMapPins() as I proposed
b) added a new variable "recheckVmPins", that gets set in
if (unlockedTargetBuffer)
and
if (otherBuffer != InvalidBuffer)
c) reformulated comments

I pushed this version a couple hours ago, after a bit more polishing.

Greetings,

Andres Freund