Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

Started by Matthias van de Meentover 1 year ago6 messageshackers
Jump to latest
#1Matthias van de Meent
boekewurm+postgres@gmail.com

Hi,

I noticed that the MD smgr internals misbehave when unlink requests
for specific forks or specific segments are sent through SyncOps, as
it currently always unlinks segment 0 of the main fork, even if only a
different fork and/or segment was requested.

While probably not extremely critical, it seems bad to not unlink the
right segment while in recovery, so here's a patch that unlinks the
exact requested files.

The unlinking of forks in the FileTag infrastructure has been broken
since b0a55e43 in PG16, while a segment number other than 0 has never
been unlinked (at least not since the introduction of the system with
3eb77eba in PG12). However, extensions may still make use of this and
incorrectly assume that only the requested file of the requested fork
's segment will be unlinked, when it actually unlinks data from the
main fork.

The attached fixes that for PG16+. PG13-15 will take a little bit more
effort due to code changes in PG16; though it'd probably still require
a relatively minor change.

Kind regards,

Matthias van de Meent.
Neon (https://neon.tech)

Attachments:

v1-0001-MD-smgr-Unlink-the-requested-file-segment-not-mai.patchapplication/octet-stream; name=v1-0001-MD-smgr-Unlink-the-requested-file-segment-not-mai.patchDownload+16-3
#2Thomas Munro
thomas.munro@gmail.com
In reply to: Matthias van de Meent (#1)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

On Sat, Dec 21, 2024 at 11:41 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

The unlinking of forks in the FileTag infrastructure has been broken
since b0a55e43 in PG16,

Well spotted.

-    p = relpathperm(ftag->rlocator, MAIN_FORKNUM);
+    p = _mdfd_segpath_rflb(rlfb, ftag->forknum, ftag->segno);

As you say, a harmless thinko as far as core is concerned, as we only
ever use it for the "tombstone" file preventing relfilenode recycling.
The tombstone is the bare relfilenode (main segment 0), since every
other segment is unlinked on commit.

while a segment number other than 0 has never
been unlinked (at least not since the introduction of the system with
3eb77eba in PG12)

Right, and that predates the FileTag refactoring, it's just that the
earlier coding didn't explicitly mention the segment number so it
looks slightly different. In fact it was hard-coded to unlink
relpathperm(entry->rnode, MAIN_FORKNUM) before that work, so both fork
and segment number were always fixed, it's just that the FileTag
mechanism was made slightly more general, really for the SYNC stuff,
not so much for the UNLINK stuff which uses the same tags.

However, extensions may still make use of this and
incorrectly assume that only the requested file of the requested fork
's segment will be unlinked, when it actually unlinks data from the
main fork.

It seems unlikely to be useful for any purpose other than tombstones.
And it seems like if someone is already using it, they would have been
in touch to say that it doesn't work. Or perhaps you tried to use it
and noticed this flaw, or know of someone who would like to use it?
Or more likely I guess you're working on smgr extension support. It
is not a reliable mechanism (pull the power after the checkpoint
record is written and before it processes that list and you've leaked
a file), and it's dealing with an edge case we should close in a
better way, and then get rid of it.

The attached fixes that for PG16+. PG13-15 will take a little bit more
effort due to code changes in PG16; though it'd probably still require
a relatively minor change.

The patch does not seem unreasonable and I'd like to help tidy this
up, but ... hmm, could we also consider going the other way?
register_unlink_segment(), mdunlinkfiletag() and the macro that
populates md.c's FileTag are internal to md.c, and we don't expect
external code to be pushing md.c SYNC_UNLINK_REQUEST requests into the
request queue (who would do that and what could the motivation
possibly be?) Doesn't feel like a supported usage to me... So my
question is: what bad thing would happen if we just renamed
register_unlink_segment() to register_unlink_tombstone() without
fork/seg arguments, to make it clear that it's not really a general
purpose unreliable segment unlink mechanism that we want anyone to
build more stuff on top of?

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Thomas Munro (#2)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

On Sat, 21 Dec 2024 at 01:05, Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Dec 21, 2024 at 11:41 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

The unlinking of forks in the FileTag infrastructure has been broken
since b0a55e43 in PG16,
while a segment number other than 0 has never
been unlinked (at least not since the introduction of the system with
3eb77eba in PG12)

Right, and that predates the FileTag refactoring, it's just that the
earlier coding didn't explicitly mention the segment number so it
looks slightly different. In fact it was hard-coded to unlink
relpathperm(entry->rnode, MAIN_FORKNUM) before that work, so both fork
and segment number were always fixed, it's just that the FileTag
mechanism was made slightly more general, really for the SYNC stuff,
not so much for the UNLINK stuff which uses the same tags.

I see.

However, extensions may still make use of this and
incorrectly assume that only the requested file of the requested fork
's segment will be unlinked, when it actually unlinks data from the
main fork.

It seems unlikely to be useful for any purpose other than tombstones.
And it seems like if someone is already using it, they would have been
in touch to say that it doesn't work. Or perhaps you tried to use it
and noticed this flaw, or know of someone who would like to use it?
Or more likely I guess you're working on smgr extension support.

I noticed it when I was browsing NBuffers-sized allocations, which got
me looking into the FileTag infrastructure, which got me trying to
figure out what FileTag.segno is used for that would require it to be
a uint64 in addition to the RelFileNode, which got me looking through
this code.

So, not exactly for SMGR extension support here, but my experience in
that did make it easier for me to figure out that the code doesn't
behave as I'd expected it to.

The attached fixes that for PG16+. PG13-15 will take a little bit more
effort due to code changes in PG16; though it'd probably still require
a relatively minor change.

The patch does not seem unreasonable and I'd like to help tidy this
up, but ... hmm, could we also consider going the other way?
register_unlink_segment(), mdunlinkfiletag() and the macro that
populates md.c's FileTag are internal to md.c, and we don't expect
external code to be pushing md.c SYNC_UNLINK_REQUEST requests into the
request queue (who would do that and what could the motivation
possibly be?) Doesn't feel like a supported usage to me... So my
question is: what bad thing would happen if we just renamed
register_unlink_segment() to register_unlink_tombstone() without
fork/seg arguments, to make it clear that it's not really a general
purpose unreliable segment unlink mechanism that we want anyone to
build more stuff on top of?

I just noticed I misinterpreted the conditions in mdunlinkfork, so
that I thought it allowed a user to pass their own forknum into
register_unlink_segment (as that is called with the user-provided
forknum). Instead, that branch is only taken when forknum ==
MAIN_FORKNUM, so I think you might be right that going in the other
direction is more desirable.

In that case, something along the lines of the attached would then be
better - it removes the fork and segno from register_unlink_segment's
arguments (renamed to register_unlink_tombstone), and Asserts() that
mdunlinkfiletag only receives a FileTag that contains expected values.

Kind regards,

Matthias van de Meent.

Attachments:

v2-0001-MD-smgr-Clarify-FileTag-based-unlinking.patchapplication/octet-stream; name=v2-0001-MD-smgr-Clarify-FileTag-based-unlinking.patchDownload+10-8
#4surya poondla
suryapoondla4@gmail.com
In reply to: Matthias van de Meent (#1)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

Hi Matthias,

Thank you for reporting this issue and working on it.

I reviewed the whole email chain and v2 patch.

I really liked Thomas's suggestion on renaming register_unlink_segment()
to register_unlink_tombstone().

The old signature with forknum and segno parameters genuinely looked like
it could unlink arbitrary fork segments but mdunlinkfiletag() was silently
ignoring both fields and always going to main fork seg 0 regardless. This
could be very confusing.
The assert is a good belt-and-suspenders addition on top of that. If anyone
ever manages to push a non-tombstone tag through this path, a cassert build
will catch it immediately instead of silently deleting the wrong file.

One thing I'd flag for future work, Thomas mentioned in the thread that
there's a reliability gap here where a crash between the checkpoint record
being written and the unlink queue being processed will leak the tombstone.
This patch doesn't fix that and it doesn't need to, but it'd be good to see
someone pick that up eventually.

Overall the patch is in a good shape.

Regards,
Surya Poondla

#5solai v
solai.cdac@gmail.com
In reply to: surya poondla (#4)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

Hi Matthias,

I reviewed the v2 patch and tested it on my local build.

I verified that before the patch, mdunlinkfiletag() ignored forknum
and segno and always operated on MAIN_FORKNUM segment 0.
The new direction with register_unlink_tombstone() makes the intent
much clearer, and the added Assert() looks like a good safeguard
against accidental misuse.

I also had a small comment suggestion near the Assert() to make the
tombstone-only behavior slightly more explicit.
The patch applied successfully with git apply --3way, and PostgreSQL
built and tested successfully on my setup.

Overall the patch looks good to me.

Regards,
Solai

#6surya poondla
suryapoondla4@gmail.com
In reply to: solai v (#5)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

Hi Solai,

Thanks for testing and confirming.

Sounds good regarding your comment suggestion.

Regards,
Surya Poondla