freespace buffer locking

Started by Andres Freund4 months ago2 messageshackers

andres@anarazel.de

4 months ago

Hi,

In the context of [1]/messages/by-id/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff I started looking at the freespace code, in particular
fsm_vacuum_page(), which has this comment:
/*
* Reset the next slot pointer. This encourages the use of low-numbered
* pages, increasing the chances that a later vacuum can truncate the
* relation. We don't bother with a lock here, nor with marking the page
* dirty if it wasn't already, since this is just a hint.
*/

I.e. we modify the buffer without even holding a share lock on the page. That
seems ... not ok.

What if, e.g., the page were included in a WAL record? Then this would corrupt
the record checksum. Now, this normally won't happen, was the FSM isn't WAL
logged, but still. And I think there may be special circumstances where the
page is included in a WAL record, e.g. as part of an CREATE DATABASE. And
there's FreeSpaceMapPrepareTruncateRel() - which hopefully can't run
concurrently with fsm_vacuum_page(), but would seem to court WAL corruption,
if it ever did.

Besides modifying the page while not even share locked, there are a few other
oddities:

There seem to be some other oddities:
- GetRecordedFreeSpace() does fsm_get_avail() without locking
- fsm_vacuum_page() does fsm_get_avail(), fsm_get_max_avail() without locking

ISTM we clearly should take a lock in fsm_vacuum_page() to reset fp_next_slot,
that just seems like a nasty hard to find bug waiting to happen. Changing it
to not look at the page without a lock seems a bit more challenging.

I suspect the omission of the lock in GetRecordedFreeSpace() in 15c121b3ed7e
wasn't intentional? Heikki, you probably don't remember? :). I think we
should fix that - none of the callers look like they'd be anywhere near
frequent enough in real workloads to make that a problem?

Greetings,

Andres Freund

[1]: /messages/by-id/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff

Kirill Reshke

reshkekirill@gmail.com

4 months ago

In reply to: Andres Freund (#1)

Re: freespace buffer locking

On Tue, 2 Dec 2025 at 03:41, Andres Freund <andres@anarazel.de> wrote:

Hi,

Hi!

I.e. we modify the buffer without even holding a share lock on the page. That

seems ... not ok.

I was recently confused by these lines too doing some related hacking
on my work.

Now, this normally won't happen, was the FSM isn't WAL
logged, but still. And I think there may be special circumstances where the
page is included in a WAL record, e.g. as part of an CREATE DATABASE. And
there's FreeSpaceMapPrepareTruncateRel() - which hopefully can't run
concurrently with fsm_vacuum_page(), but would seem to court WAL corruption,
if it ever did.

Yep, also extension may want to run log_newpage_range for FSM fork for
whatever reason.
Extension then should rely on that when running log_newpage_range,
other activity will not
cause any hazards or corruptions. Is it?

Besides modifying the page while not even share locked, there are a few other
oddities:

There seem to be some other oddities:
- GetRecordedFreeSpace() does fsm_get_avail() without locking
- fsm_vacuum_page() does fsm_get_avail(), fsm_get_max_avail() without locking

IIRC, Given these all are single-byte reads will there be no actual
troubles? Like, we already can
read corrupted info from FSM fork, because this is just a "hint", and
fsm callers are ready to not-trust the results. Just my 2c

ISTM we clearly should take a lock in fsm_vacuum_page() to reset fp_next_slot,
that just seems like a nasty hard to find bug waiting to happen. Changing it
to not look at the page without a lock seems a bit more challenging.

I agree on the grounds of enforcing good examples of buf lock/pin
management along the internals.
Like, maybe all of this actually works on HEAD as expected, but I'm
not sure it is worth its additional complexity.

--
Best regards,
Kirill Reshke