Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

Started by Matthias van de Meentabout 1 year ago3 messages
#1Matthias van de Meent
boekewurm+postgres@gmail.com
1 attachment(s)

Hi,

I noticed that the MD smgr internals misbehave when unlink requests
for specific forks or specific segments are sent through SyncOps, as
it currently always unlinks segment 0 of the main fork, even if only a
different fork and/or segment was requested.

While probably not extremely critical, it seems bad to not unlink the
right segment while in recovery, so here's a patch that unlinks the
exact requested files.

The unlinking of forks in the FileTag infrastructure has been broken
since b0a55e43 in PG16, while a segment number other than 0 has never
been unlinked (at least not since the introduction of the system with
3eb77eba in PG12). However, extensions may still make use of this and
incorrectly assume that only the requested file of the requested fork
's segment will be unlinked, when it actually unlinks data from the
main fork.

The attached fixes that for PG16+. PG13-15 will take a little bit more
effort due to code changes in PG16; though it'd probably still require
a relatively minor change.

Kind regards,

Matthias van de Meent.
Neon (https://neon.tech)

Attachments:

v1-0001-MD-smgr-Unlink-the-requested-file-segment-not-mai.patchapplication/octet-stream; name=v1-0001-MD-smgr-Unlink-the-requested-file-segment-not-mai.patchDownload
From e7818a6579d644eb1caeedd6edb3740ae796f718 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 20 Dec 2024 19:25:16 +0100
Subject: [PATCH v1] MD smgr: Unlink the requested file segment, not main fork
 segment 0

While it seems like we only unlink segment 0 of any fork, it's
better to just comply with the request, instead of ignoring the
provided information.

Backpatch: 13, the oldest supported release.
---
 src/backend/storage/smgr/md.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index cc8a80ee96..3175ec252a 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -133,6 +133,8 @@ static void _fdvec_resize(SMgrRelation reln,
 						  int nseg);
 static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
 						   BlockNumber segno);
+static char *_mdfd_segpath_rflb(RelFileLocatorBackend reln, ForkNumber forknum,
+								BlockNumber segno);
 static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forknum,
 							  BlockNumber segno, int oflags);
 static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forknum,
@@ -1535,11 +1537,18 @@ _fdvec_resize(SMgrRelation reln,
  */
 static char *
 _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
+{
+	return _mdfd_segpath_rflb(reln->smgr_rlocator, forknum, segno);
+}
+
+static char *
+_mdfd_segpath_rflb(RelFileLocatorBackend reln, ForkNumber forknum,
+				   BlockNumber segno)
 {
 	char	   *path,
 			   *fullpath;
 
-	path = relpath(reln->smgr_rlocator, forknum);
+	path = relpath(reln, forknum);
 
 	if (segno > 0)
 	{
@@ -1810,9 +1819,14 @@ int
 mdunlinkfiletag(const FileTag *ftag, char *path)
 {
 	char	   *p;
+	RelFileLocatorBackend rlfb = {
+		.locator = ftag->rlocator,
+		.backend = INVALID_PROC_NUMBER,
+	};
 
+	Assert(ftag->segno == 0);
 	/* Compute the path. */
-	p = relpathperm(ftag->rlocator, MAIN_FORKNUM);
+	p = _mdfd_segpath_rflb(rlfb, ftag->forknum, ftag->segno);
 	strlcpy(path, p, MAXPGPATH);
 	pfree(p);
 
-- 
2.45.2

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Matthias van de Meent (#1)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

On Sat, Dec 21, 2024 at 11:41 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

The unlinking of forks in the FileTag infrastructure has been broken
since b0a55e43 in PG16,

Well spotted.

-    p = relpathperm(ftag->rlocator, MAIN_FORKNUM);
+    p = _mdfd_segpath_rflb(rlfb, ftag->forknum, ftag->segno);

As you say, a harmless thinko as far as core is concerned, as we only
ever use it for the "tombstone" file preventing relfilenode recycling.
The tombstone is the bare relfilenode (main segment 0), since every
other segment is unlinked on commit.

while a segment number other than 0 has never
been unlinked (at least not since the introduction of the system with
3eb77eba in PG12)

Right, and that predates the FileTag refactoring, it's just that the
earlier coding didn't explicitly mention the segment number so it
looks slightly different. In fact it was hard-coded to unlink
relpathperm(entry->rnode, MAIN_FORKNUM) before that work, so both fork
and segment number were always fixed, it's just that the FileTag
mechanism was made slightly more general, really for the SYNC stuff,
not so much for the UNLINK stuff which uses the same tags.

However, extensions may still make use of this and
incorrectly assume that only the requested file of the requested fork
's segment will be unlinked, when it actually unlinks data from the
main fork.

It seems unlikely to be useful for any purpose other than tombstones.
And it seems like if someone is already using it, they would have been
in touch to say that it doesn't work. Or perhaps you tried to use it
and noticed this flaw, or know of someone who would like to use it?
Or more likely I guess you're working on smgr extension support. It
is not a reliable mechanism (pull the power after the checkpoint
record is written and before it processes that list and you've leaked
a file), and it's dealing with an edge case we should close in a
better way, and then get rid of it.

The attached fixes that for PG16+. PG13-15 will take a little bit more
effort due to code changes in PG16; though it'd probably still require
a relatively minor change.

The patch does not seem unreasonable and I'd like to help tidy this
up, but ... hmm, could we also consider going the other way?
register_unlink_segment(), mdunlinkfiletag() and the macro that
populates md.c's FileTag are internal to md.c, and we don't expect
external code to be pushing md.c SYNC_UNLINK_REQUEST requests into the
request queue (who would do that and what could the motivation
possibly be?) Doesn't feel like a supported usage to me... So my
question is: what bad thing would happen if we just renamed
register_unlink_segment() to register_unlink_tombstone() without
fork/seg arguments, to make it clear that it's not really a general
purpose unreliable segment unlink mechanism that we want anyone to
build more stuff on top of?

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Thomas Munro (#2)
1 attachment(s)
Re: Bug: mdunlinkfiletag unlinks mainfork seg.0 instead of indicated fork+segment

On Sat, 21 Dec 2024 at 01:05, Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Dec 21, 2024 at 11:41 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

The unlinking of forks in the FileTag infrastructure has been broken
since b0a55e43 in PG16,
while a segment number other than 0 has never
been unlinked (at least not since the introduction of the system with
3eb77eba in PG12)

Right, and that predates the FileTag refactoring, it's just that the
earlier coding didn't explicitly mention the segment number so it
looks slightly different. In fact it was hard-coded to unlink
relpathperm(entry->rnode, MAIN_FORKNUM) before that work, so both fork
and segment number were always fixed, it's just that the FileTag
mechanism was made slightly more general, really for the SYNC stuff,
not so much for the UNLINK stuff which uses the same tags.

I see.

However, extensions may still make use of this and
incorrectly assume that only the requested file of the requested fork
's segment will be unlinked, when it actually unlinks data from the
main fork.

It seems unlikely to be useful for any purpose other than tombstones.
And it seems like if someone is already using it, they would have been
in touch to say that it doesn't work. Or perhaps you tried to use it
and noticed this flaw, or know of someone who would like to use it?
Or more likely I guess you're working on smgr extension support.

I noticed it when I was browsing NBuffers-sized allocations, which got
me looking into the FileTag infrastructure, which got me trying to
figure out what FileTag.segno is used for that would require it to be
a uint64 in addition to the RelFileNode, which got me looking through
this code.

So, not exactly for SMGR extension support here, but my experience in
that did make it easier for me to figure out that the code doesn't
behave as I'd expected it to.

The attached fixes that for PG16+. PG13-15 will take a little bit more
effort due to code changes in PG16; though it'd probably still require
a relatively minor change.

The patch does not seem unreasonable and I'd like to help tidy this
up, but ... hmm, could we also consider going the other way?
register_unlink_segment(), mdunlinkfiletag() and the macro that
populates md.c's FileTag are internal to md.c, and we don't expect
external code to be pushing md.c SYNC_UNLINK_REQUEST requests into the
request queue (who would do that and what could the motivation
possibly be?) Doesn't feel like a supported usage to me... So my
question is: what bad thing would happen if we just renamed
register_unlink_segment() to register_unlink_tombstone() without
fork/seg arguments, to make it clear that it's not really a general
purpose unreliable segment unlink mechanism that we want anyone to
build more stuff on top of?

I just noticed I misinterpreted the conditions in mdunlinkfork, so
that I thought it allowed a user to pass their own forknum into
register_unlink_segment (as that is called with the user-provided
forknum). Instead, that branch is only taken when forknum ==
MAIN_FORKNUM, so I think you might be right that going in the other
direction is more desirable.

In that case, something along the lines of the attached would then be
better - it removes the fork and segno from register_unlink_segment's
arguments (renamed to register_unlink_tombstone), and Asserts() that
mdunlinkfiletag only receives a FileTag that contains expected values.

Kind regards,

Matthias van de Meent.

Attachments:

v2-0001-MD-smgr-Clarify-FileTag-based-unlinking.patchapplication/octet-stream; name=v2-0001-MD-smgr-Clarify-FileTag-based-unlinking.patchDownload
From 5b0b02de3eaacce089ade50850782da6d58d6e82 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Sat, 21 Dec 2024 02:15:32 +0100
Subject: [PATCH v2] MD smgr: Clarify FileTag-based unlinking

Only "tombstone" files (first segment of main fork) are unlinked after
checkpoints, so make sure that the register function is clear about this.

Additionally, add an assertion in mdunlinkfiletag that the FileTag only
contains expected values.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
---
 src/backend/storage/smgr/md.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 11fccda475f..e94ea9d2c9f 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -116,8 +116,7 @@ static void mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum,
 static MdfdVec *mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior);
 static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
 								   MdfdVec *seg);
-static void register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
-									BlockNumber segno);
+static void register_unlink_tombstone(RelFileLocatorBackend rlocator);
 static void register_forget_request(RelFileLocatorBackend rlocator, ForkNumber forknum,
 									BlockNumber segno);
 static void _fdvec_resize(SMgrRelation reln,
@@ -382,7 +381,7 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
 
 		/* Register request to unlink first segment later */
 		save_errno = errno;
-		register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
+		register_unlink_tombstone(rlocator);
 		errno = save_errno;
 	}
 
@@ -1406,15 +1405,16 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 }
 
 /*
- * register_unlink_segment() -- Schedule a file to be deleted after next checkpoint
+ * register_unlink_tombstone()
+ * 
+ * Schedule a file to be deleted after next checkpoint
  */
 static void
-register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
-						BlockNumber segno)
+register_unlink_tombstone(RelFileLocatorBackend rlocator)
 {
 	FileTag		tag;
 
-	INIT_MD_FILETAG(tag, rlocator.locator, forknum, segno);
+	INIT_MD_FILETAG(tag, rlocator.locator, MAIN_FORKNUM, 0);
 
 	/* Should never be used with temp relations */
 	Assert(!RelFileLocatorBackendIsTemp(rlocator));
@@ -1813,6 +1813,9 @@ mdunlinkfiletag(const FileTag *ftag, char *path)
 {
 	char	   *p;
 
+	/* We only unlink tombstone files through this mechanism */
+	Assert(ftag->forknum == MAIN_FORKNUM && ftag->segno == 0);
+
 	/* Compute the path. */
 	p = relpathperm(ftag->rlocator, MAIN_FORKNUM);
 	strlcpy(path, p, MAXPGPATH);
-- 
2.45.2