Is this a problem in GenericXLogFinish()?

Started by Jeff Davisover 2 years ago64 messageshackers
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

src/backend/transam/README says:

...
4. Mark the shared buffer(s) as dirty with MarkBufferDirty(). (This 
must happen before the WAL record is inserted; see notes in 
SyncOneBuffer().)
...

But GenericXLogFinish() does this:

...
/* Insert xlog record */
lsn = XLogInsert(RM_GENERIC_ID, 0);

/* Set LSN and mark buffers dirty */
for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
{
PageData *pageData = &state->pages[i];

if (BufferIsInvalid(pageData->buffer))
continue;
PageSetLSN(BufferGetPage(pageData->buffer), lsn);
MarkBufferDirty(pageData->buffer);
}
END_CRIT_SECTION();

Am I missing something or is that a problem?

Regards,
Jeff Davis

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#1)
Re: Is this a problem in GenericXLogFinish()?

On 22/09/2023 23:52, Jeff Davis wrote:

src/backend/transam/README says:

...
4. Mark the shared buffer(s) as dirty with MarkBufferDirty(). (This
must happen before the WAL record is inserted; see notes in
SyncOneBuffer().)
...

But GenericXLogFinish() does this:

...
/* Insert xlog record */
lsn = XLogInsert(RM_GENERIC_ID, 0);

/* Set LSN and mark buffers dirty */
for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
{
PageData *pageData = &state->pages[i];

if (BufferIsInvalid(pageData->buffer))
continue;
PageSetLSN(BufferGetPage(pageData->buffer), lsn);
MarkBufferDirty(pageData->buffer);
}
END_CRIT_SECTION();

Am I missing something or is that a problem?

Yes, that's a problem.

I wish we had an assertion for that. XLogInsert() could assert that the
page is already marked dirty, for example.

--
Heikki Linnakangas
Neon (https://neon.tech)

#3Jeff Davis
pgsql@j-davis.com
In reply to: Heikki Linnakangas (#2)
Re: Is this a problem in GenericXLogFinish()?

On Mon, 2023-09-25 at 13:04 +0300, Heikki Linnakangas wrote:

Yes, that's a problem.

Patch attached. I rearranged the code a bit to follow the expected
pattern of: write, mark dirty, WAL, set LSN. I think computing the
deltas could also be moved earlier, outside of the critical section,
but I'm not sure that would be useful.

Do you have a suggestion for any kind of test addition, or should we
just review carefully?

I wish we had an assertion for that. XLogInsert() could assert that
the
page is already marked dirty, for example.

Unfortunately that's not always the case, e.g. log_newpage_range().

Regards,
Jeff Davis

Attachments:

v1-0001-Fix-GenericXLogFinish.patchtext/x-patch; charset=UTF-8; name=v1-0001-Fix-GenericXLogFinish.patchDownload+34-33
#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#3)
Re: Is this a problem in GenericXLogFinish()?

On 26/09/2023 22:32, Jeff Davis wrote:

On Mon, 2023-09-25 at 13:04 +0300, Heikki Linnakangas wrote:

Yes, that's a problem.

Patch attached. I rearranged the code a bit to follow the expected
pattern of: write, mark dirty, WAL, set LSN. I think computing the
deltas could also be moved earlier, outside of the critical section,
but I'm not sure that would be useful.

Looks correct. You now loop through all the block IDs three times,
however. I wonder if that is measurably slower, but even if it's not,
was there a reason you wanted to move the XLogRegisterBuffer() calls to
a separate loop?

Do you have a suggestion for any kind of test addition, or should we
just review carefully?

I wish we had an assertion for that. XLogInsert() could assert that
the page is already marked dirty, for example.

Unfortunately that's not always the case, e.g. log_newpage_range().

Hmm, I'm sure there are exceptions but log_newpage_range() actually
seems to be doing the right thing; it calls MarkBufferDirty() before
XLogInsert(). It only calls it after XLogRegisterBuffer() though, and I
concur that XLogRegisterBuffer() would be the logical place for the
assertion. We could tighten this up, require that you call
MarkBufferDirty() before XLogRegisterBuffer(), and fix all the callers.

--
Heikki Linnakangas
Neon (https://neon.tech)

#5Jeff Davis
pgsql@j-davis.com
In reply to: Heikki Linnakangas (#4)
Re: Is this a problem in GenericXLogFinish()?

On Wed, 2023-09-27 at 00:14 +0300, Heikki Linnakangas wrote:

Looks correct. You now loop through all the block IDs three times,
however. I wonder if that is measurably slower, but even if it's not,
was there a reason you wanted to move the XLogRegisterBuffer() calls
to
a separate loop?

I did so to correspond more closely to what's outlined in the README
and in other places in the code, where marking the buffers dirty
happens before XLogBeginInsert(). It didn't occur to me that one extra
loop would matter, but I can combine them again.

It would be a bit more concise to do the XLogBeginInsert() first (like
before) and then register the buffers in the same loop that does the
writes and marks the buffers dirty. Updated patch attached.

Hmm, I'm sure there are exceptions but log_newpage_range() actually
seems to be doing the right thing; it calls MarkBufferDirty() before
XLogInsert(). It only calls it after XLogRegisterBuffer() though, and
I
concur that XLogRegisterBuffer() would be the logical place for the
assertion. We could tighten this up, require that you call
MarkBufferDirty() before XLogRegisterBuffer(), and fix all the
callers.

That site is pretty trivial to fix, but there are also a couple places
in hash.c and hashovfl.c that are registering a clean page and it's not
clear to me exactly what's going on.

Regards,
Jeff Davis

Attachments:

v2-0001-Fix-GenericXLogFinish.patchtext/x-patch; charset=UTF-8; name=v2-0001-Fix-GenericXLogFinish.patchDownload+23-31
#6Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#5)
Re: Is this a problem in GenericXLogFinish()?

On Tue, Sep 26, 2023 at 9:36 PM Jeff Davis <pgsql@j-davis.com> wrote:

That site is pretty trivial to fix, but there are also a couple places
in hash.c and hashovfl.c that are registering a clean page and it's not
clear to me exactly what's going on.

Huh, I wonder if that's just a bug. Do you know where exactly?

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#6)
Re: Is this a problem in GenericXLogFinish()?

On Wed, 2023-09-27 at 10:36 -0400, Robert Haas wrote:

On Tue, Sep 26, 2023 at 9:36 PM Jeff Davis <pgsql@j-davis.com> wrote:

That site is pretty trivial to fix, but there are also a couple
places
in hash.c and hashovfl.c that are registering a clean page and it's
not
clear to me exactly what's going on.

Huh, I wonder if that's just a bug. Do you know where exactly?

hashovfl.c:665 and hash.c:831

In both cases it's registering the bucket_buf, and has a comment like:

/*
* bucket buffer needs to be registered to ensure that we can
* acquire a cleanup lock on it during replay.
*/

And various redo routines have comments like:

/*
* Ensure we have a cleanup lock on primary bucket page before we
start
* with the actual replay operation. This is to ensure that neither a
* scan can start nor a scan can be already-in-progress during the
replay
* of this operation. If we allow scans during this operation, then
they
* can miss some records or show the same record multiple times.
*/

So it looks like it's intentionally registering a clean buffer so that
it can take a cleanup lock for reasons other than cleaning (or even
modiying) the page. I would think that there's a better way of
accomplishing that goal, so perhaps we can fix that?

Regards,
Jeff Davis

#8Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#7)
Re: Is this a problem in GenericXLogFinish()?

On Wed, Sep 27, 2023 at 11:03 AM Jeff Davis <pgsql@j-davis.com> wrote:

So it looks like it's intentionally registering a clean buffer so that
it can take a cleanup lock for reasons other than cleaning (or even
modiying) the page. I would think that there's a better way of
accomplishing that goal, so perhaps we can fix that?

I had forgotten some of the details of how this works until you
reminded me, but now that you've jarred my memory, I remember some of
it.

When Amit Kaplla and I were working on the project to add WAL-logging
to hash indexes, we ran into some problems with concurrency control
for individual buckets within the hash index. Before that project,
this was handled using heavyweight locks, one per bucket. That got
changed in 6d46f4783efe457f74816a75173eb23ed8930020 for the reasons
explained in that commit message. Basically, instead of taking
heavyweight locks, we started taking cleanup locks on the primary
bucket pages. I always thought that was a little awkward, but I didn't
quite see how to do better. I don't think that I gave much thought at
the time to the consequence you've uncovered here, namely that it
means we're sometimes locking one page (the primary bucket page)
because we want to do something to another bucket page (some page in
the linked list of pages that are part of that bucket) and for that to
be safe, we need to lock out concurrent scans of that bucket.

I guess I don't know of any easy fix here. :-(

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#8)
Re: Is this a problem in GenericXLogFinish()?

On 27/09/2023 18:47, Robert Haas wrote:

On Wed, Sep 27, 2023 at 11:03 AM Jeff Davis <pgsql@j-davis.com> wrote:

So it looks like it's intentionally registering a clean buffer so that
it can take a cleanup lock for reasons other than cleaning (or even
modiying) the page. I would think that there's a better way of
accomplishing that goal, so perhaps we can fix that?

I had forgotten some of the details of how this works until you
reminded me, but now that you've jarred my memory, I remember some of
it.

When Amit Kaplla and I were working on the project to add WAL-logging
to hash indexes, we ran into some problems with concurrency control
for individual buckets within the hash index. Before that project,
this was handled using heavyweight locks, one per bucket. That got
changed in 6d46f4783efe457f74816a75173eb23ed8930020 for the reasons
explained in that commit message. Basically, instead of taking
heavyweight locks, we started taking cleanup locks on the primary
bucket pages. I always thought that was a little awkward, but I didn't
quite see how to do better. I don't think that I gave much thought at
the time to the consequence you've uncovered here, namely that it
means we're sometimes locking one page (the primary bucket page)
because we want to do something to another bucket page (some page in
the linked list of pages that are part of that bucket) and for that to
be safe, we need to lock out concurrent scans of that bucket.

I guess I don't know of any easy fix here. :-(

That seems OK. Maybe there would be a better way to do that, but there's
nothing wrong with that approach per se.

We could define a new REGBUF_NO_CHANGE flag to signal that the buffer is
registered just for locking purposes, and not actually modified by the
WAL record.

--
Heikki Linnakangas
Neon (https://neon.tech)

#10Jeff Davis
pgsql@j-davis.com
In reply to: Heikki Linnakangas (#9)
Re: Is this a problem in GenericXLogFinish()?

On Wed, 2023-09-27 at 18:57 +0300, Heikki Linnakangas wrote:

We could define a new REGBUF_NO_CHANGE flag to signal that the buffer
is
registered just for locking purposes, and not actually modified by
the
WAL record.

Out of curiosity I also added a trial assert (not intending to actually
add this):

Assert(!(flags & REGBUF_NO_CHANGE) || !BufferIsExclusiveLocked());

I expected that to fail, but it didn't -- perhaps that buffer is only
locked during replay or something? I don't think we want that specific
Assert; I was just experimenting with some cross-checks we could do to
verify that REGBUF_NO_CHANGE is used properly.

Also, I ran into some problems with GIN that might require a bit more
refactoring in ginPlaceToPage(). Perhaps we could consider
REGBUF_NO_CHANGE a general bypass of the Assert(BufferIsDirty()), and
use it temporarily until we have a chance to analyze/refactor each of
the callers that need it.

Regards,
        Jeff Davis

#11Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#10)
Re: Is this a problem in GenericXLogFinish()?

I committed and backported 0001 (the GenericXLogFinish() fix but not
the Assert).

Strangely I didn't see the -committers email come through yet. If
anyone notices anything wrong, please let me know before the final v11
release.

On Thu, 2023-09-28 at 12:05 -0700, Jeff Davis wrote:

Also, I ran into some problems with GIN that might require a bit more
refactoring in ginPlaceToPage(). Perhaps we could consider
REGBUF_NO_CHANGE a general bypass of the Assert(BufferIsDirty()), and
use it temporarily until we have a chance to analyze/refactor each of
the callers that need it.

For the Assert, I have attached a new patch for v17.

I changed the name of the flag to REGBUF_CLEAN_OK, because for some of
the callsites it was not clear to me whether the page is always
unchanged or sometimes unchanged. In other words, REGBUF_CLEAN_OK is a
way to bypass the Assert for callsites where we either know that we
actually want to register an unchanged page, or for callsites where we
don't know if the page is changed or not.

If we commit this, ideally someone would look into the places where
REGBUF_CLEAN_OK is used and make sure that it's not a bug, and perhaps
refactor so that it would benefit from the Assert. But refactoring
those places is outside of the scope of this patch.

It sounds like we have no intention to change the hash index code, so
are we OK if this flag just lasts forever? Do you still think it offers
a useful check?

Regards,
Jeff Davis

Attachments:

v3-0002-Assert-that-buffers-are-marked-dirty-before-XLogR.patchtext/x-patch; charset=UTF-8; name=v3-0002-Assert-that-buffers-are-marked-dirty-before-XLogR.patchDownload+51-10
#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#11)
Re: Is this a problem in GenericXLogFinish()?

On 10/10/2023 22:57, Jeff Davis wrote:

On Thu, 2023-09-28 at 12:05 -0700, Jeff Davis wrote:

Also, I ran into some problems with GIN that might require a bit more
refactoring in ginPlaceToPage(). Perhaps we could consider
REGBUF_NO_CHANGE a general bypass of the Assert(BufferIsDirty()), and
use it temporarily until we have a chance to analyze/refactor each of
the callers that need it.

For the Assert, I have attached a new patch for v17.

Thanks for working on this!

+/*
+ * BufferIsDirty
+ *
+ *		Checks if buffer is already dirty.
+ *
+ * Buffer must be pinned and exclusive-locked.  (If caller does not hold
+ * exclusive lock, then the result may be stale before it's returned.)
+ */
+bool
+BufferIsDirty(Buffer buffer)
+{
+	BufferDesc *bufHdr;
+
+	if (BufferIsLocal(buffer))
+	{
+		int bufid = -buffer - 1;
+		bufHdr = GetLocalBufferDescriptor(bufid);
+	}
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+	}
+
+	Assert(BufferIsPinned(buffer));
+	Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
+								LW_EXCLUSIVE));
+
+	return pg_atomic_read_u32(&bufHdr->state) & BM_DIRTY;
+}

The comment suggests that you don't need to hold an exclusive lock when
you call this, but there's an assertion that you do.

Is it a new requirement that you must hold an exclusive lock on the
buffer when you call XLogRegisterBuffer()? I think that's reasonable.

I thought if that might be a problem when you WAL-log a page when you
set hint bits on it and checksums are enabled. Hint bits can be set
holding just a share lock. But it uses XLogSaveBufferForHint(), which
calls XLogRegisterBlock() instead of XLogRegisterBuffer()

There is a comment above XLogSaveBufferForHint() that implies that
XLogRegisterBuffer() requires you to hold an exclusive-lock but I don't
see that spelled out explicitly in XLogRegisterBuffer() itself. Maybe
add an assertion for that in XLogRegisterBuffer() to make it more explicit.

--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -246,6 +246,7 @@ XLogRegisterBuffer(uint8 block_id, Buffer buffer, uint8 flags)

/* NO_IMAGE doesn't make sense with FORCE_IMAGE */
Assert(!((flags & REGBUF_FORCE_IMAGE) && (flags & (REGBUF_NO_IMAGE))));
+ Assert((flags & REGBUF_CLEAN_OK) || BufferIsDirty(buffer));
Assert(begininsert_called);

if (block_id >= max_registered_block_id)

I'd suggest a comment here to explain what's wrong if someone hits this
assertion. Something like "The buffer must be marked as dirty before
registering, unless REGBUF_CLEAN_OK is set to indicate that the buffer
is not being modified".

I changed the name of the flag to REGBUF_CLEAN_OK, because for some of
the callsites it was not clear to me whether the page is always
unchanged or sometimes unchanged. In other words, REGBUF_CLEAN_OK is a
way to bypass the Assert for callsites where we either know that we
actually want to register an unchanged page, or for callsites where we
don't know if the page is changed or not.

Ok. A flag to explicitly mark that the page is not modified would
actually be nice for pg_rewind. It could ignore such page references.
Not that it matters much in practice, since those records are so rare.
And for that, we'd need to include the flag in the WAL record too. So I
think this is fine.

If we commit this, ideally someone would look into the places where
REGBUF_CLEAN_OK is used and make sure that it's not a bug, and perhaps
refactor so that it would benefit from the Assert. But refactoring
those places is outside of the scope of this patch.

About that: you added the flag to a couple of XLogRegisterBuffer() calls
in GIN, because they call MarkBufferDirty() only after
XLogRegisterBuffer(). Those would be easy to refactor so I'd suggest
doing that now.

It sounds like we have no intention to change the hash index code, so
are we OK if this flag just lasts forever? Do you still think it offers
a useful check?

Yeah, I think this is a useful assertion. It might catch some bugs in
extensions too.

--
Heikki Linnakangas
Neon (https://neon.tech)

#13Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#12)
Re: Is this a problem in GenericXLogFinish()?

On Wed, Oct 11, 2023 at 7:53 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

+ * Buffer must be pinned and exclusive-locked.  (If caller does not hold
+ * exclusive lock, then the result may be stale before it's returned.)

The comment suggests that you don't need to hold an exclusive lock when
you call this, but there's an assertion that you do.

I don't think the comment suggests that. It would if you only read the
sentence in parentheses. But if you read both of them it seems clear
enough. I guess the parenthetical sentence cloud say "If the caller
did not hold an exclusive lock, then the result might become stale
even before it was returned," basically putting the whole thing in the
subjunctive.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Jeff Davis
pgsql@j-davis.com
In reply to: Heikki Linnakangas (#12)
Re: Is this a problem in GenericXLogFinish()?

On Wed, 2023-10-11 at 14:53 +0300, Heikki Linnakangas wrote:

The comment suggests that you don't need to hold an exclusive lock
when
you call this, but there's an assertion that you do.

Reworded.

Is it a new requirement that you must hold an exclusive lock on the
buffer when you call XLogRegisterBuffer()? I think that's reasonable.

Agreed.

I'd suggest a comment here to explain what's wrong if someone hits
this
assertion. Something like "The buffer must be marked as dirty before
registering, unless REGBUF_CLEAN_OK is set to indicate that the
buffer
is not being modified".

Done, with different wording.

If we commit this, ideally someone would look into the places where
REGBUF_CLEAN_OK is used and make sure that it's not a bug, and
perhaps
refactor so that it would benefit from the Assert. But refactoring
those places is outside of the scope of this patch.

I don't think it makes sense to register a buffer that is perhaps clean
and perhaps not. After registering a buffer and writing WAL, you need
to PageSetLSN(), and that should only be done after MarkBufferDirty(),
right?

So if you need to condition PageSetLSN() on whether MarkBufferDirty()
has happened, it would be trivial to use the same condition for
XLogRegisterBuffer(). Therefore I changed the name back to
REGBUF_NO_CHANGE, as you suggested originally.

The hash indexing code knows it's not modifying the bucket buffer,
doesn't mark it dirty, and doesn't set the LSN. I presume that's safe.

However, there are quite a few places where
XLogRegisterBuffer()+WAL+PageSetLSN() all happen, but it's not
immediately obvious that all code paths to get there also
MarkBufferDirty(). For instanace:

lazy_scan_heap() -- conditionally marks heapbuf as dirty
visibilitymap_set() -- conditionally calls log_heap_visible
log_heap_visible()
XLogRegisterBuffer(1, heap_buffer, flags)

if those conditions don't match up exactly, it seems we could get into
a situation where we update the LSN of a clean page, which seems bad.

There are a few other places in the hash code and spgist code where I
have similar concerns. Not many, though, I looked at all of the call
sites (at least briefly) and most of them are fine.

About that: you added the flag to a couple of XLogRegisterBuffer()
calls
in GIN, because they call MarkBufferDirty() only after
XLogRegisterBuffer(). Those would be easy to refactor so I'd suggest
doing that now.

Done.

It sounds like we have no intention to change the hash index code,
so
are we OK if this flag just lasts forever? Do you still think it
offers
a useful check?

Yeah, I think this is a useful assertion. It might catch some bugs in
extensions too.

I was asking more about the REGBUF_NO_CHANGE flag. It feels very
specific to the hash indexes, and I'm not sure we want to encourage
extensions to use it.

Though maybe the locking protocol used by hash indexes is more
generally applicable, and other indexing strategies might want to use
it, too?

Another option might be to just change the hash indexing code to follow
the correct protocol, locking and calling MarkBufferDirty() in those 3
call sites. Marking the buffer dirty is easy, but making sure that it's
locked might require some refactoring. Robert, would following the
right protocol here affect performance?

Regards,
Jeff Davis

Attachments:

v4-0002-Assert-that-buffers-are-marked-dirty-before-XLogR.patchtext/x-patch; charset=UTF-8; name=v4-0002-Assert-that-buffers-are-marked-dirty-before-XLogR.patchDownload+118-21
#15Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#14)
Re: Is this a problem in GenericXLogFinish()?

On Mon, Oct 16, 2023 at 7:31 PM Jeff Davis <pgsql@j-davis.com> wrote:

Another option might be to just change the hash indexing code to follow
the correct protocol, locking and calling MarkBufferDirty() in those 3
call sites. Marking the buffer dirty is easy, but making sure that it's
locked might require some refactoring. Robert, would following the
right protocol here affect performance?

Sorry, I'm not sure I understand the question. Are you asking whether
dirtying buffers unnecessarily might be slower than not doing that?

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#15)
Re: Is this a problem in GenericXLogFinish()?

On Tue, 2023-10-17 at 08:41 -0400, Robert Haas wrote:

Sorry, I'm not sure I understand the question. Are you asking whether
dirtying buffers unnecessarily might be slower than not doing that?

I meant: are those cleanup operations frequent enough that dirtying
those buffers in that case would matter?

Regards,
Jeff Davis

#17Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#16)
Re: Is this a problem in GenericXLogFinish()?

On Tue, Oct 17, 2023 at 12:38 PM Jeff Davis <pgsql@j-davis.com> wrote:

I meant: are those cleanup operations frequent enough that dirtying
those buffers in that case would matter?

Honestly, I'm not sure. Probably not? I mean, hashbucketcleanup()
seems to only be called during vacuum or a bucket split, and I don't
think you can have super-frequent calls to _hash_freeovflpage()
either. For what it's worth, though, I think it would be better to
just make these cases exceptions to your Assert, as you did in the
patch, rather than changing them to dirty the buffer. There doesn't
seem to be enough upside to making the assert unconditional to justify
changing stuff that might have a real-world performance cost ... even
if we don't think it would amount to much.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#17)
Re: Is this a problem in GenericXLogFinish()?

On Thu, 2023-10-19 at 16:12 -0400, Robert Haas wrote:

For what it's worth, though, I think it would be better to
just make these cases exceptions to your Assert

OK, I'll probably commit something like v4 then.

I still have a question though: if a buffer is exclusive-locked,
unmodified and clean, and then the caller registers it and later does
PageSetLSN (just as if it were dirty), is that a definite bug?

There are a couple callsites where the control flow is complex enough
that it's hard to be sure the buffer is always marked dirty before
being registered (like in log_heap_visible(), as I mentioned upthread).
But those callsites are all doing PageSetLSN, unlike the hash index
case.

Regards,
Jeff Davis

#19Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#18)
Re: Is this a problem in GenericXLogFinish()?

On Fri, Oct 20, 2023 at 12:28 PM Jeff Davis <pgsql@j-davis.com> wrote:

I still have a question though: if a buffer is exclusive-locked,
unmodified and clean, and then the caller registers it and later does
PageSetLSN (just as if it were dirty), is that a definite bug?

There are a couple callsites where the control flow is complex enough
that it's hard to be sure the buffer is always marked dirty before
being registered (like in log_heap_visible(), as I mentioned upthread).
But those callsites are all doing PageSetLSN, unlike the hash index
case.

I think that would be a bug. I think it would be OK to just change the
page LSN and nothing else, but that requires calling MarkBufferDirty()
at some point. If you only call PageSetLSN() and never
MarkBufferDirty(), that sounds messed up: either the former should be
skipped, or the latter should be added. We shouldn't modify a shared
buffer without marking it dirty.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#19)
Re: Is this a problem in GenericXLogFinish()?

On Mon, 2023-10-23 at 10:14 -0400, Robert Haas wrote:

I think that would be a bug. I think it would be OK to just change
the
page LSN and nothing else, but that requires calling
MarkBufferDirty()
at some point. If you only call PageSetLSN() and never
MarkBufferDirty(), that sounds messed up: either the former should be
skipped, or the latter should be added. We shouldn't modify a shared
buffer without marking it dirty.

In that case, I think REGBUF_NO_CHANGE is the right name for the flag.

Committed.

Regards,
Jeff Davis

#21Alexander Lakhin
exclusion@gmail.com
In reply to: Jeff Davis (#20)
#22Jeff Davis
pgsql@j-davis.com
In reply to: Alexander Lakhin (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Jeff Davis (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#24)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#26)
#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#30)
#32Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#28)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#32)
#34Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#33)
#35Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#34)
#36Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#35)
#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#36)
#38Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#37)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#38)
#40Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#39)
#41Alexander Lakhin
exclusion@gmail.com
In reply to: Robert Haas (#39)
#42Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#41)
#43Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#42)
#44Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#43)
#45Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#43)
#46Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#44)
#47Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Lakhin (#45)
#48Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#47)
#49Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#48)
#50Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#48)
#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#50)
#52Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#51)
#53Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#52)
#54Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#53)
#55Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#54)
#56Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#55)
#57Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#56)
#58Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#56)
#59Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#56)
#60Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#58)
#61Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#60)
#62Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#61)
#63Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#62)
#64Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#63)