Confine vacuum skip logic to lazy_scan_skip
Hi,
I've written a patch set for vacuum to use the streaming read interface
proposed in [1]. Making lazy_scan_heap() async-friendly required a bit
of refactoring of lazy_scan_heap() and lazy_scan_skip(). I needed to
confine all of the skipping logic -- previously spread across
lazy_scan_heap() and lazy_scan_skip() -- to lazy_scan_skip(). The attached
patch set does this, along with other preparation for vacuum to use the
streaming read API, and can be applied on top of master.
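To give an idea of the shape the later patches move toward, here is a
small standalone model of the control flow: one "next block" iterator
owns all of the skipping state, and the scan loop just consumes block
numbers until the iterator is exhausted. Everything here (SkipState,
next_block_to_scan(), the fake visibility map array) is simplified and
made up for illustration -- it is not the vacuumlazy.c code, just the
idea.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define SKIP_PAGES_THRESHOLD 32

typedef struct SkipState
{
    BlockNumber rel_pages;              /* blocks in the "relation" */
    BlockNumber next_unskippable_block; /* cached across calls */
    const bool *all_visible;            /* stand-in for the visibility map */
} SkipState;

/* Return the next block the caller must scan, or InvalidBlockNumber. */
static BlockNumber
next_block_to_scan(SkipState *skip, BlockNumber next_block)
{
    if (next_block >= skip->rel_pages)
        return InvalidBlockNumber;

    if (skip->next_unskippable_block == InvalidBlockNumber ||
        next_block > skip->next_unskippable_block)
    {
        /*
         * Advance to the next block that cannot be skipped.  The last block
         * is always treated as unskippable, as in vacuumlazy.c.
         */
        while (++skip->next_unskippable_block < skip->rel_pages - 1 &&
               skip->all_visible[skip->next_unskippable_block])
            ;

        /* Only skip ranges of at least SKIP_PAGES_THRESHOLD blocks. */
        if (skip->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
            next_block = skip->next_unskippable_block;
    }

    return next_block;
}

int
main(void)
{
    bool        vm[256] = {false};
    SkipState   skip = {
        .rel_pages = 256,
        .next_unskippable_block = InvalidBlockNumber,
        .all_visible = vm,
    };
    /* relies on InvalidBlockNumber + 1 wrapping to 0 */
    BlockNumber blkno = InvalidBlockNumber;

    /* pretend blocks 10..199 are all-visible */
    for (BlockNumber b = 10; b < 200; b++)
        vm[b] = true;

    /* The scan loop: no skip bookkeeping left in the caller. */
    while ((blkno = next_block_to_scan(&skip, blkno + 1)) != InvalidBlockNumber)
        printf("scan block %u\n", (unsigned) blkno);
    return 0;
}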
There are a few comments that still need to be updated. I also noticed I
needed to reorder and combine a couple of the commits. I wanted to
register this for the January commitfest, so I didn't quite have time
for the finishing touches.
- Melanie
[1]: /messages/by-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
Attachments:
v1-0001-lazy_scan_skip-remove-unnecessary-local-var-rel_p.patch
From 9cd579d6a20aef2aeeab6ef50d72e779d75bf7cd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:18:40 -0500
Subject: [PATCH v1 1/7] lazy_scan_skip remove unnecessary local var rel_pages
lazy_scan_skip() only uses vacrel->rel_pages twice, so there seems to be
no reason to save it in a local variable, rel_pages.
---
src/backend/access/heap/vacuumlazy.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3b9299b8924..c4e0c077694 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1302,13 +1302,12 @@ static BlockNumber
lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
+ BlockNumber next_unskippable_block = next_block,
nskippable_blocks = 0;
bool skipsallvis = false;
*next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ while (next_unskippable_block < vacrel->rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
@@ -1331,7 +1330,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*
* Implement this by always treating the last block as unsafe to skip.
*/
- if (next_unskippable_block == rel_pages - 1)
+ if (next_unskippable_block == vacrel->rel_pages - 1)
break;
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
--
2.37.2
v1-0002-lazy_scan_skip-remove-unneeded-local-var-nskippab.patch
From 314dd9038593610583e4fe60ab62e0d46ea3be86 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v1 2/7] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed-in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c4e0c077694..3b28ea2cdb5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1302,8 +1302,7 @@ static BlockNumber
lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
- BlockNumber next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ BlockNumber next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1360,7 +1359,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1373,7 +1371,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.37.2
v1-0003-Add-lazy_scan_skip-unskippable-state.patch
From 67043818003faa9cf3cdf10e6fdc6cbf6f8eee4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v1 3/7] Add lazy_scan_skip unskippable state
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce the struct, VacSkipState, which will maintain the variables
needed to skip ranges less than SKIP_PAGES_THRESHOLD.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 105 ++++++++++++++++-----------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 64 insertions(+), 42 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3b28ea2cdb5..6f9c2446c56 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -238,13 +238,24 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+/*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of pages
+ * greater than SKIP_PAGES_THRESHOLD.
+ */
+typedef struct VacSkipState
+{
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+} VacSkipState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+ BlockNumber next_block, Buffer *vmbuffer);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -826,12 +837,10 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
+ VacSkipState vacskip;
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -846,9 +855,7 @@ lazy_scan_heap(LVRelState *vacrel)
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, &vacskip, 0, &vmbuffer);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -856,26 +863,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == next_unskippable_block)
+ if (blkno == vacskip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacskip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, &vacskip, blkno + 1, &vmbuffer);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacskip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacskip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -1280,15 +1284,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in vacskip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block are set
+ * in vacskip->next_unskippable_block and next_unskippable_allvis.
+ * vacskip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1298,24 +1321,24 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+ BlockNumber next_block, Buffer *vmbuffer)
{
- BlockNumber next_unskippable_block = next_block;
bool skipsallvis = false;
- *next_unskippable_allvis = true;
- while (next_unskippable_block < vacrel->rel_pages)
+ vacskip->next_unskippable_block = next_block;
+ vacskip->next_unskippable_allvis = true;
+ while (vacskip->next_unskippable_block < vacrel->rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
+ vacskip->next_unskippable_block,
vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacskip->next_unskippable_allvis = false;
break;
}
@@ -1329,14 +1352,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*
* Implement this by always treating the last block as unsafe to skip.
*/
- if (next_unskippable_block == vacrel->rel_pages - 1)
+ if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
break;
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacskip->next_unskippable_allvis = false;
break;
}
@@ -1358,7 +1381,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
}
vacuum_delay_point();
- next_unskippable_block++;
+ vacskip->next_unskippable_block++;
}
/*
@@ -1371,16 +1394,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacskip->next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacskip->skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacskip->skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e37ef9aa76d..bd008e1699b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2955,6 +2955,7 @@ VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
+VacSkipState
ValidIOData
ValidateIndexState
ValuesScan
--
2.37.2
v1-0004-Confine-vacuum-skip-logic-to-lazy_scan_skip.patch
From 387db30b7e06f450dc7f60494751f78e00d43272 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v1 4/7] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and eventually
AIO), refactor vacuum's logic for skipping blocks such that it is entirely
confined to lazy_scan_skip(). This turns lazy_scan_skip() and the VacSkipState
it uses into an iterator which yields blocks to lazy_scan_heap(). Such a
structure is conducive to an async interface.
By always calling lazy_scan_skip() -- instead of only when we have reached the
next unskippable block -- we no longer need the skipping_current_range variable.
lazy_scan_heap() no longer needs to manage the skipped range -- checking if we
reached the end in order to then call lazy_scan_skip(). And lazy_scan_skip()
can derive the visibility status of a block from whether or not we are in a
skippable range -- that is, whether or not the next_block is equal to the next
unskippable block.
ci-os-only:
---
src/backend/access/heap/vacuumlazy.c | 230 ++++++++++++++-------------
1 file changed, 117 insertions(+), 113 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6f9c2446c56..5070c3fe744 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,14 +248,13 @@ typedef struct VacSkipState
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
- /* Whether or not skippable blocks should be skipped */
- bool skipping_current_range;
} VacSkipState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
- BlockNumber next_block, Buffer *vmbuffer);
+static BlockNumber lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+ BlockNumber blkno, Buffer *vmbuffer,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -836,9 +835,12 @@ static void
lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
- VacSkipState vacskip;
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
+ VacSkipState vacskip = {.next_unskippable_block = InvalidBlockNumber};
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
@@ -854,37 +856,17 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, &vacskip, 0, &vmbuffer);
- for (blkno = 0; blkno < rel_pages; blkno++)
+ while (true)
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == vacskip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacskip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, &vacskip, blkno + 1, &vmbuffer);
-
- Assert(vacskip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacskip.skipping_current_range)
- continue;
+ blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+ &vmbuffer, &all_visible_according_to_vm);
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
+ if (blkno == InvalidBlockNumber)
+ break;
vacrel->scanned_pages++;
@@ -1281,20 +1263,13 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in vacskip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * lazy_scan_skip() -- get next block for vacuum to process
*
- * vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needing a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * lazy_scan_skip() returns the next block that needs to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1304,14 +1279,26 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in vacskip->next_unskippable_block and next_unskippable_allvis.
- * vacskip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacskip->next_unskippable_block and next_unskippable_allvis. The caller
+ * should not concern itself with anything in vacskip. This is only used by
+ * lazy_scan_skip() to keep track of this state across invocations.
+ *
+ * lazy_scan_skip() returns the next block for vacuum to process and sets its
+ * visibility status in the output parameter, all_visible_according_to_vm.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1321,87 +1308,104 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
+static BlockNumber
lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
- BlockNumber next_block, Buffer *vmbuffer)
+ BlockNumber next_block, Buffer *vmbuffer,
+ bool *all_visible_according_to_vm)
{
bool skipsallvis = false;
- vacskip->next_unskippable_block = next_block;
- vacskip->next_unskippable_allvis = true;
- while (vacskip->next_unskippable_block < vacrel->rel_pages)
- {
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- vacskip->next_unskippable_block,
- vmbuffer);
+ if (next_block >= vacrel->rel_pages)
+ return InvalidBlockNumber;
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if (vacskip->next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacskip->next_unskippable_block)
+ {
+ while (++vacskip->next_unskippable_block < vacrel->rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacskip->next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ vacskip->next_unskippable_block,
+ vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
- break;
+ vacskip->next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacskip->next_unskippable_allvis = false;
- break;
- }
+ if (!vacskip->next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacskip->next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- vacskip->next_unskippable_block++;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacskip->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacskip->next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
}
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacskip->next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacskip->skipping_current_range = false;
+ if (next_block == vacskip->next_unskippable_block)
+ *all_visible_according_to_vm = vacskip->next_unskippable_allvis;
else
- {
- vacskip->skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ return next_block;
}
/*
--
2.37.2
v1-0005-VacSkipState-saves-reference-to-LVRelState.patch
From 2fdb5fc93e8db3f885fc270a6742b8bd4c399aab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 09:47:18 -0500
Subject: [PATCH v1 5/7] VacSkipState saves reference to LVRelState
The streaming read interface can only give pgsr_next callbacks access to
two pieces of private data. As such, move a reference to the LVRelState
into the VacSkipState.
---
src/backend/access/heap/vacuumlazy.c | 36 ++++++++++++++++------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5070c3fe744..67020c2a807 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,11 +248,13 @@ typedef struct VacSkipState
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
+ /* reference to whole relation vac state */
+ LVRelState *vacrel;
} VacSkipState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+static BlockNumber lazy_scan_skip(VacSkipState *vacskip,
BlockNumber blkno, Buffer *vmbuffer,
bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
@@ -840,7 +842,10 @@ lazy_scan_heap(LVRelState *vacrel)
/* relies on InvalidBlockNumber overflowing to 0 */
BlockNumber blkno = InvalidBlockNumber;
- VacSkipState vacskip = {.next_unskippable_block = InvalidBlockNumber};
+ VacSkipState vacskip = {
+ .next_unskippable_block = InvalidBlockNumber,
+ .vacrel = vacrel
+ };
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
@@ -862,8 +867,8 @@ lazy_scan_heap(LVRelState *vacrel)
Page page;
LVPagePruneState prunestate;
- blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
- &vmbuffer, &all_visible_according_to_vm);
+ blkno = lazy_scan_skip(&vacskip, blkno + 1,
+ &vmbuffer, &all_visible_according_to_vm);
if (blkno == InvalidBlockNumber)
break;
@@ -1290,9 +1295,10 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() returns the next block for vacuum to process and sets its
* visibility status in the output parameter, all_visible_according_to_vm.
*
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * vacskip->vacrel is an in/out parameter here; vacuum options and information
+ * about the relation are read and vacrel->skippedallvis is set to ensure we
+ * don't advance relfrozenxid when we have skipped vacuuming all visible
+ * blocks.
*
* vmbuffer will contain the block from the VM containing visibility
 * information for the next unskippable heap block. We may end up needing a
@@ -1309,21 +1315,21 @@ lazy_scan_heap(LVRelState *vacrel)
* choice to skip such a range is actually made, making everything safe.)
*/
static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+lazy_scan_skip(VacSkipState *vacskip,
BlockNumber next_block, Buffer *vmbuffer,
bool *all_visible_according_to_vm)
{
bool skipsallvis = false;
- if (next_block >= vacrel->rel_pages)
+ if (next_block >= vacskip->vacrel->rel_pages)
return InvalidBlockNumber;
if (vacskip->next_unskippable_block == InvalidBlockNumber ||
next_block > vacskip->next_unskippable_block)
{
- while (++vacskip->next_unskippable_block < vacrel->rel_pages)
+ while (++vacskip->next_unskippable_block < vacskip->vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ uint8 mapbits = visibilitymap_get_status(vacskip->vacrel->rel,
vacskip->next_unskippable_block,
vmbuffer);
@@ -1347,11 +1353,11 @@ lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
* Implement this by always treating the last block as unsafe to
* skip.
*/
- if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
+ if (vacskip->next_unskippable_block == vacskip->vacrel->rel_pages - 1)
break;
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
+ if (!vacskip->vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
vacskip->next_unskippable_allvis = false;
@@ -1366,7 +1372,7 @@ lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
*/
if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
{
- if (vacrel->aggressive)
+ if (vacskip->vacrel->aggressive)
break;
/*
@@ -1396,7 +1402,7 @@ lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
{
next_block = vacskip->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacskip->vacrel->skippedallvis = true;
}
}
--
2.37.2
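A small aside to illustrate the rationale in 0005 above: a block-number
callback generally only gets an opaque pointer or two from the read-stream
machinery, so everything the callback needs has to be reachable from the
state we hand it -- hence stashing the LVRelState pointer in VacSkipState.
The sketch below is made up for illustration; the callback type and every
name in it are hypothetical, not the proposed streaming read API.

#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* stand-in for LVRelState */
typedef struct RelScanState
{
    BlockNumber rel_pages;
} RelScanState;

/* stand-in for VacSkipState */
typedef struct SkipState
{
    BlockNumber  next_unskippable_block;
    RelScanState *relstate;     /* back-reference: everything hangs off this */
} SkipState;

/* The "stream" hands the callback nothing but one opaque pointer. */
typedef BlockNumber (*next_block_cb) (void *cb_private);

static BlockNumber
skip_next_block(void *cb_private)
{
    SkipState  *skip = (SkipState *) cb_private;

    /* Real logic would consult the visibility map; here we just walk forward. */
    if (++skip->next_unskippable_block >= skip->relstate->rel_pages)
        return InvalidBlockNumber;
    return skip->next_unskippable_block;
}

int
main(void)
{
    RelScanState rel = {.rel_pages = 4};
    SkipState   skip = {
        .next_unskippable_block = InvalidBlockNumber,
        .relstate = &rel,
    };
    next_block_cb cb = skip_next_block;
    BlockNumber blkno;

    while ((blkno = cb(&skip)) != InvalidBlockNumber)
        printf("callback yielded block %u\n", (unsigned) blkno);
    return 0;
}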
v1-0006-VacSkipState-store-next-unskippable-block-vmbuffe.patch
From a9d0592b7a54686968c4872323f5e45b809379c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 11:20:07 -0500
Subject: [PATCH v1 6/7] VacSkipState store next unskippable block vmbuffer
This fits nicely with the other state in VacSkipState.
---
src/backend/access/heap/vacuumlazy.c | 47 ++++++++++++++++------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 67020c2a807..49d16fcf039 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,6 +248,8 @@ typedef struct VacSkipState
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
+ /* Next unskippable block's vmbuffer */
+ Buffer vmbuffer;
/* reference to whole relation vac state */
LVRelState *vacrel;
} VacSkipState;
@@ -255,7 +257,7 @@ typedef struct VacSkipState
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
static BlockNumber lazy_scan_skip(VacSkipState *vacskip,
- BlockNumber blkno, Buffer *vmbuffer,
+ BlockNumber blkno,
bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -844,10 +846,10 @@ lazy_scan_heap(LVRelState *vacrel)
BlockNumber blkno = InvalidBlockNumber;
VacSkipState vacskip = {
.next_unskippable_block = InvalidBlockNumber,
+ .vmbuffer = InvalidBuffer,
.vacrel = vacrel
};
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -868,7 +870,7 @@ lazy_scan_heap(LVRelState *vacrel)
LVPagePruneState prunestate;
blkno = lazy_scan_skip(&vacskip, blkno + 1,
- &vmbuffer, &all_visible_according_to_vm);
+ &all_visible_according_to_vm);
if (blkno == InvalidBlockNumber)
break;
@@ -909,10 +911,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacskip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacskip.vmbuffer);
+ vacskip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -937,7 +939,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacskip.vmbuffer);
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
@@ -957,7 +959,7 @@ lazy_scan_heap(LVRelState *vacrel)
/* Check for new or empty pages before lazy_scan_noprune call */
if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
- vmbuffer))
+ vacskip.vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -995,7 +997,8 @@ lazy_scan_heap(LVRelState *vacrel)
}
/* Check for new or empty pages before lazy_scan_prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false,
+ vacskip.vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -1032,7 +1035,7 @@ lazy_scan_heap(LVRelState *vacrel)
{
Size freespace;
- lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vacskip.vmbuffer);
/* Forget the LP_DEAD items that we just vacuumed */
dead_items->num_items = 0;
@@ -1111,7 +1114,7 @@ lazy_scan_heap(LVRelState *vacrel)
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, prunestate.visibility_cutoff_xid,
+ vacskip.vmbuffer, prunestate.visibility_cutoff_xid,
flags);
}
@@ -1122,11 +1125,12 @@ lazy_scan_heap(LVRelState *vacrel)
* with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel,
+ blkno, &vacskip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacskip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1151,7 +1155,7 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacskip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1162,7 +1166,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
else if (all_visible_according_to_vm && prunestate.all_visible &&
prunestate.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vacskip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1184,7 +1188,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacskip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
@@ -1226,8 +1230,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacskip.vmbuffer))
+ {
+ ReleaseBuffer(vacskip.vmbuffer);
+ vacskip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1316,7 +1323,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
static BlockNumber
lazy_scan_skip(VacSkipState *vacskip,
- BlockNumber next_block, Buffer *vmbuffer,
+ BlockNumber next_block,
bool *all_visible_according_to_vm)
{
bool skipsallvis = false;
@@ -1331,7 +1338,7 @@ lazy_scan_skip(VacSkipState *vacskip,
{
uint8 mapbits = visibilitymap_get_status(vacskip->vacrel->rel,
vacskip->next_unskippable_block,
- vmbuffer);
+ &vacskip->vmbuffer);
vacskip->next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
--
2.37.2
v1-0007-Remove-unneeded-vacuum_delay_point-from-lazy_scan.patch
From b1fe24867172da232456b7e452178bf8adbf0538 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v1 7/7] Remove unneeded vacuum_delay_point from lazy_scan_skip
lazy_scan_skip() does relatively little work, so there is no need to
call vacuum_delay_point(). A future commit will call lazy_scan_skip()
from a callback, and we would like to avoid calling vacuum_delay_point()
in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 49d16fcf039..fc32610397b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1389,8 +1389,6 @@ lazy_scan_skip(VacSkipState *vacskip,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
/*
--
2.37.2
On Sun, Dec 31, 2023 at 1:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> There are a few comments that still need to be updated. I also noticed I
> needed to reorder and combine a couple of the commits. I wanted to
> register this for the January commitfest, so I didn't quite have time
> for the finishing touches.
I've updated this patch set to remove a commit that didn't make sense
on its own and do various other cleanup.
- Melanie
Attachments:
v2-0004-Confine-vacuum-skip-logic-to-lazy_scan_skip.patch
From 335faad5948b2bec3b83c2db809bb9161d373dcb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v2 4/6] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and eventually
AIO), refactor vacuum's logic for skipping blocks such that it is entirely
confined to lazy_scan_skip(). This turns lazy_scan_skip() and the VacSkipState
it uses into an iterator which yields blocks to lazy_scan_heap(). Such a
structure is conducive to an async interface.
By always calling lazy_scan_skip() -- instead of only when we have reached the
next unskippable block -- we no longer need the skipping_current_range variable.
lazy_scan_heap() no longer needs to manage the skipped range -- checking if we
reached the end in order to then call lazy_scan_skip(). And lazy_scan_skip()
can derive the visibility status of a block from whether or not we are in a
skippable range -- that is, whether or not the next_block is equal to the next
unskippable block.
---
src/backend/access/heap/vacuumlazy.c | 233 ++++++++++++++-------------
1 file changed, 120 insertions(+), 113 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e3827a5e4d3..42da4ac64f8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,14 +250,13 @@ typedef struct VacSkipState
Buffer vmbuffer;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
- /* Whether or not skippable blocks should be skipped */
- bool skipping_current_range;
} VacSkipState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
- BlockNumber next_block);
+static BlockNumber lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+ BlockNumber blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -838,9 +837,15 @@ static void
lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
- VacSkipState vacskip = {.vmbuffer = InvalidBuffer};
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
+ VacSkipState vacskip = {
+ .next_unskippable_block = InvalidBlockNumber,
+ .vmbuffer = InvalidBuffer
+ };
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -855,37 +860,17 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, &vacskip, 0);
- for (blkno = 0; blkno < rel_pages; blkno++)
+ while (true)
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == vacskip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacskip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, &vacskip, blkno + 1);
-
- Assert(vacskip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacskip.skipping_current_range)
- continue;
+ blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+ &all_visible_according_to_vm);
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
+ if (blkno == InvalidBlockNumber)
+ break;
vacrel->scanned_pages++;
@@ -1287,20 +1272,13 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in vacskip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * lazy_scan_skip() -- get next block for vacuum to process
*
- * vacskip->vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needing a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * lazy_scan_skip() returns the next block that needs to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1310,14 +1288,26 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in vacskip->next_unskippable_block and next_unskippable_allvis.
- * vacskip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacskip->next_unskippable_block and next_unskippable_allvis. The caller
+ * should not concern itself with anything in vacskip. This is only used by
+ * lazy_scan_skip() to keep track of this state across invocations.
+ *
+ * lazy_scan_skip() returns the next block for vacuum to process and sets its
+ * visibility status in the output parameter, all_visible_according_to_vm.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * vacskip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1327,87 +1317,104 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
+static BlockNumber
lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
- BlockNumber next_block)
+ BlockNumber next_block,
+ bool *all_visible_according_to_vm)
{
bool skipsallvis = false;
- vacskip->next_unskippable_block = next_block;
- vacskip->next_unskippable_allvis = true;
- while (vacskip->next_unskippable_block < vacrel->rel_pages)
- {
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- vacskip->next_unskippable_block,
- &vacskip->vmbuffer);
+ if (next_block >= vacrel->rel_pages)
+ return InvalidBlockNumber;
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if (vacskip->next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacskip->next_unskippable_block)
+ {
+ while (++vacskip->next_unskippable_block < vacrel->rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacskip->next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ vacskip->next_unskippable_block,
+ &vacskip->vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
- break;
+ vacskip->next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacskip->next_unskippable_allvis = false;
- break;
- }
+ if (!vacskip->next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacskip->next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- vacskip->next_unskippable_block++;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacskip->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacskip->next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
}
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacskip->next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacskip->skipping_current_range = false;
+ if (next_block == vacskip->next_unskippable_block)
+ *all_visible_according_to_vm = vacskip->next_unskippable_allvis;
else
- {
- vacskip->skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ return next_block;
}
/*
--
2.37.2
v2-0003-Add-lazy_scan_skip-unskippable-state.patch
From eea3c207eeaf7c390096dcb48fc062d81d6d7cc3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v2 3/6] Add lazy_scan_skip unskippable state
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce the struct, VacSkipState, which will maintain the variables
needed to skip ranges less than SKIP_PAGES_THRESHOLD.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 145 ++++++++++++++++-----------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 87 insertions(+), 59 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3b28ea2cdb5..e3827a5e4d3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -238,13 +238,26 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+/*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of pages
+ * greater than SKIP_PAGES_THRESHOLD.
+ */
+typedef struct VacSkipState
+{
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Buffer containing next unskippable block's visibility info */
+ Buffer vmbuffer;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+} VacSkipState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+ BlockNumber next_block);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -826,12 +839,9 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
+ VacSkipState vacskip = {.vmbuffer = InvalidBuffer};
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -846,9 +856,7 @@ lazy_scan_heap(LVRelState *vacrel)
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, &vacskip, 0);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -856,26 +864,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == next_unskippable_block)
+ if (blkno == vacskip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacskip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, &vacskip, blkno + 1);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacskip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacskip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -918,10 +923,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacskip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacskip.vmbuffer);
+ vacskip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -946,7 +951,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacskip.vmbuffer);
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
@@ -966,7 +971,7 @@ lazy_scan_heap(LVRelState *vacrel)
/* Check for new or empty pages before lazy_scan_noprune call */
if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
- vmbuffer))
+ vacskip.vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -1004,7 +1009,8 @@ lazy_scan_heap(LVRelState *vacrel)
}
/* Check for new or empty pages before lazy_scan_prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false,
+ vacskip.vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -1041,7 +1047,7 @@ lazy_scan_heap(LVRelState *vacrel)
{
Size freespace;
- lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vacskip.vmbuffer);
/* Forget the LP_DEAD items that we just vacuumed */
dead_items->num_items = 0;
@@ -1120,7 +1126,7 @@ lazy_scan_heap(LVRelState *vacrel)
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, prunestate.visibility_cutoff_xid,
+ vacskip.vmbuffer, prunestate.visibility_cutoff_xid,
flags);
}
@@ -1131,11 +1137,12 @@ lazy_scan_heap(LVRelState *vacrel)
* with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno,
+ &vacskip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacskip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1160,7 +1167,7 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacskip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1171,7 +1178,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
else if (all_visible_according_to_vm && prunestate.all_visible &&
prunestate.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vacskip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1193,7 +1200,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacskip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
@@ -1235,8 +1242,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacskip.vmbuffer))
+ {
+ ReleaseBuffer(vacskip.vmbuffer);
+ vacskip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1280,15 +1290,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in vacskip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * vacskip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needed a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block are set
+ * in vacskip->next_unskippable_block and next_unskippable_allvis.
+ * vacskip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1298,24 +1327,24 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+ BlockNumber next_block)
{
- BlockNumber next_unskippable_block = next_block;
bool skipsallvis = false;
- *next_unskippable_allvis = true;
- while (next_unskippable_block < vacrel->rel_pages)
+ vacskip->next_unskippable_block = next_block;
+ vacskip->next_unskippable_allvis = true;
+ while (vacskip->next_unskippable_block < vacrel->rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- vmbuffer);
+ vacskip->next_unskippable_block,
+ &vacskip->vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacskip->next_unskippable_allvis = false;
break;
}
@@ -1329,14 +1358,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*
* Implement this by always treating the last block as unsafe to skip.
*/
- if (next_unskippable_block == vacrel->rel_pages - 1)
+ if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
break;
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacskip->next_unskippable_allvis = false;
break;
}
@@ -1358,7 +1387,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
}
vacuum_delay_point();
- next_unskippable_block++;
+ vacskip->next_unskippable_block++;
}
/*
@@ -1371,16 +1400,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacskip->next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacskip->skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacskip->skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e37ef9aa76d..bd008e1699b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2955,6 +2955,7 @@ VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
+VacSkipState
ValidIOData
ValidateIndexState
ValuesScan
--
2.37.2
v2-0001-lazy_scan_skip-remove-unnecessary-local-var-rel_p.patchtext/x-patch; charset=US-ASCII; name=v2-0001-lazy_scan_skip-remove-unnecessary-local-var-rel_p.patchDownload
From 9cd579d6a20aef2aeeab6ef50d72e779d75bf7cd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:18:40 -0500
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
lazy_scan_skip() only uses vacrel->rel_pages twice, so there seems to be
no reason to save it in a local variable, rel_pages.
---
src/backend/access/heap/vacuumlazy.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3b9299b8924..c4e0c077694 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1302,13 +1302,12 @@ static BlockNumber
lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
+ BlockNumber next_unskippable_block = next_block,
nskippable_blocks = 0;
bool skipsallvis = false;
*next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ while (next_unskippable_block < vacrel->rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
@@ -1331,7 +1330,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*
* Implement this by always treating the last block as unsafe to skip.
*/
- if (next_unskippable_block == rel_pages - 1)
+ if (next_unskippable_block == vacrel->rel_pages - 1)
break;
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
--
2.37.2
v2-0002-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchtext/x-patch; charset=US-ASCII; name=v2-0002-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchDownload
From 314dd9038593610583e4fe60ab62e0d46ea3be86 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c4e0c077694..3b28ea2cdb5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1302,8 +1302,7 @@ static BlockNumber
lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
- BlockNumber next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ BlockNumber next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1360,7 +1359,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1373,7 +1371,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.37.2
v2-0005-VacSkipState-saves-reference-to-LVRelState.patchtext/x-patch; charset=US-ASCII; name=v2-0005-VacSkipState-saves-reference-to-LVRelState.patchDownload
From b6603e35147c4bbe3337280222e6243524b0110e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 09:47:18 -0500
Subject: [PATCH v2 5/6] VacSkipState saves reference to LVRelState
The streaming read interface can only give pgsr_next callbacks access to
two pieces of private data. As such, move a reference to the LVRelState
into the VacSkipState.
This is a separate commit (as opposed to as part of the commit
introducing VacSkipState) because it is required for using the streaming
read interface but not a natural change on its own. VacSkipState is per
block and the LVRelState is referenced for the whole relation vacuum.
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++-------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 42da4ac64f8..1b64b9988de 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,11 +250,13 @@ typedef struct VacSkipState
Buffer vmbuffer;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
+ /* reference to whole relation vac state */
+ LVRelState *vacrel;
} VacSkipState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+static BlockNumber lazy_scan_skip(VacSkipState *vacskip,
BlockNumber blkno,
bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
@@ -844,7 +846,8 @@ lazy_scan_heap(LVRelState *vacrel)
BlockNumber blkno = InvalidBlockNumber;
VacSkipState vacskip = {
.next_unskippable_block = InvalidBlockNumber,
- .vmbuffer = InvalidBuffer
+ .vmbuffer = InvalidBuffer,
+ .vacrel = vacrel
};
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
@@ -866,7 +869,7 @@ lazy_scan_heap(LVRelState *vacrel)
Page page;
LVPagePruneState prunestate;
- blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+ blkno = lazy_scan_skip(&vacskip, blkno + 1,
&all_visible_according_to_vm);
if (blkno == InvalidBlockNumber)
@@ -1299,9 +1302,10 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() returns the next block for vacuum to process and sets its
* visibility status in the output parameter, all_visible_according_to_vm.
*
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * vacskip->vacrel is an in/out parameter here; vacuum options and information
+ * about the relation are read and vacrel->skippedallvis is set to ensure we
+ * don't advance relfrozenxid when we have skipped vacuuming all visible
+ * blocks.
*
* vacskip->vmbuffer will contain the block from the VM containing visibility
* information for the next unskippable heap block. We may end up needed a
@@ -1318,21 +1322,20 @@ lazy_scan_heap(LVRelState *vacrel)
* choice to skip such a range is actually made, making everything safe.)
*/
static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
- BlockNumber next_block,
- bool *all_visible_according_to_vm)
+lazy_scan_skip(VacSkipState *vacskip,
+ BlockNumber next_block, bool *all_visible_according_to_vm)
{
bool skipsallvis = false;
- if (next_block >= vacrel->rel_pages)
+ if (next_block >= vacskip->vacrel->rel_pages)
return InvalidBlockNumber;
if (vacskip->next_unskippable_block == InvalidBlockNumber ||
next_block > vacskip->next_unskippable_block)
{
- while (++vacskip->next_unskippable_block < vacrel->rel_pages)
+ while (++vacskip->next_unskippable_block < vacskip->vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ uint8 mapbits = visibilitymap_get_status(vacskip->vacrel->rel,
vacskip->next_unskippable_block,
&vacskip->vmbuffer);
@@ -1356,11 +1359,11 @@ lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
* Implement this by always treating the last block as unsafe to
* skip.
*/
- if (vacskip->next_unskippable_block == vacrel->rel_pages - 1)
+ if (vacskip->next_unskippable_block == vacskip->vacrel->rel_pages - 1)
break;
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
+ if (!vacskip->vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
vacskip->next_unskippable_allvis = false;
@@ -1375,7 +1378,7 @@ lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
*/
if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
{
- if (vacrel->aggressive)
+ if (vacskip->vacrel->aggressive)
break;
/*
@@ -1405,7 +1408,7 @@ lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
{
next_block = vacskip->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacskip->vacrel->skippedallvis = true;
}
}
--
2.37.2
v2-0006-Remove-unneeded-vacuum_delay_point-from-lazy_scan.patchtext/x-patch; charset=US-ASCII; name=v2-0006-Remove-unneeded-vacuum_delay_point-from-lazy_scan.patchDownload
From aa948b99c09f2fdf9a10deac78bcb880e09366ec Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v2 6/6] Remove unneeded vacuum_delay_point from lazy_scan_skip
lazy_scan_skip() does relatively little work, so there is no need to
call vacuum_delay_point(). A future commit will call lazy_scan_skip()
from a callback, and we would like to avoid calling vacuum_delay_point()
in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b64b9988de..1329da95254 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1388,8 +1388,6 @@ lazy_scan_skip(VacSkipState *vacskip,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
/*
--
2.37.2
Hi,
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
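To make that concrete, here is a minimal sketch (illustrative, not lifted verbatim from the patches) of the two loop shapes. In the first, the bound is read through a pointer, so the compiler generally has to reload it after every call in the loop body; in the second, the bound is a local whose address is never taken, so it can stay in a register:

    /* bound read through vacrel on every iteration */
    while (next_unskippable_block < vacrel->rel_pages)
    {
        uint8   mapbits = visibilitymap_get_status(vacrel->rel,
                                                   next_unskippable_block,
                                                   vmbuffer);
        /* ... calls the compiler can't see into ... */
    }

    /* bound cached in a local */
    BlockNumber rel_pages = vacrel->rel_pages;

    while (next_unskippable_block < rel_pages)
    {
        /* ... */
    }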
Subject: [PATCH v2 3/6] Add lazy_scan_skip unskippable state
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce the struct, VacSkipState, which will maintain the variables
needed to skip ranges less than SKIP_PAGES_THRESHOLD.
Why not add this to LVRelState, possibly as a struct embedded within it?
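For concreteness, a minimal sketch of that shape (illustrative; the v3 patches later in this thread end up doing roughly this):

    typedef struct LVRelState
    {
        /* ... existing fields ... */

        /* state maintained by lazy_scan_skip() for managing skippable ranges */
        struct
        {
            BlockNumber next_unskippable_block;
            Buffer      vmbuffer;
            bool        next_unskippable_allvis;
            bool        skipping_current_range;
        }           skip;
    } LVRelState;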
From 335faad5948b2bec3b83c2db809bb9161d373dcb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v2 4/6] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and eventually
AIO), refactor vacuum's logic for skipping blocks such that it is entirely
confined to lazy_scan_skip(). This turns lazy_scan_skip() and the VacSkipState
it uses into an iterator which yields blocks to lazy_scan_heap(). Such a
structure is conducive to an async interface.
And it's cleaner - I find the current code extremely hard to reason about.
By always calling lazy_scan_skip() -- instead of only when we have reached the
next unskippable block, we no longer need the skipping_current_range variable.
lazy_scan_heap() no longer needs to manage the skipped range -- checking if we
reached the end in order to then call lazy_scan_skip(). And lazy_scan_skip()
can derive the visibility status of a block from whether or not we are in a
skippable range -- that is, whether or not the next_block is equal to the next
unskippable block.
I wonder if it should be renamed as part of this - the name is somewhat
confusing now (and perhaps before)? lazy_scan_get_next_block() or such?
+ while (true)
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == vacskip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacskip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, &vacskip, blkno + 1);
-
- Assert(vacskip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacskip.skipping_current_range)
- continue;
+ blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+ &all_visible_according_to_vm);
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
+ if (blkno == InvalidBlockNumber)
+ break;
vacrel->scanned_pages++;
I don't like that we still do determination about the next block outside of
lazy_scan_skip() and have duplicated exit conditions between lazy_scan_skip()
and lazy_scan_heap().
I'd probably change the interface to something like
while (lazy_scan_get_next_block(vacrel, &blkno))
{
...
}
From b6603e35147c4bbe3337280222e6243524b0110e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 09:47:18 -0500
Subject: [PATCH v2 5/6] VacSkipState saves reference to LVRelState
The streaming read interface can only give pgsr_next callbacks access to
two pieces of private data. As such, move a reference to the LVRelState
into the VacSkipState.
This is a separate commit (as opposed to as part of the commit
introducing VacSkipState) because it is required for using the streaming
read interface but not a natural change on its own. VacSkipState is per
block and the LVRelState is referenced for the whole relation vacuum.
I'd do it the other way round, i.e. either embed VacSkipState into LVRelState
or point to it from VacSkipState.
LVRelState is already tied to the iteration state, so I don't think there's a
reason not to do so.
Greetings,
Andres Freund
On 1/4/24 2:23 PM, Andres Freund wrote:
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
Admittedly I'm not up to speed on recent vacuum changes, but I have to
wonder if the concept of skipping should go away in the context of vector
IO? Instead of thinking about "we can skip this range of blocks", why not
maintain a list of "here's the next X number of blocks that we need to
vacuum"?
--
Jim Nasby, Data Architect, Austin TX
Hi,
On Fri, 5 Jan 2024 at 02:25, Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/4/24 2:23 PM, Andres Freund wrote:
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
Admittedly I'm not up to speed on recent vacuum changes, but I have to
wonder if the concept of skipping should go away in the context of vector
IO? Instead of thinking about "we can skip this range of blocks", why not
maintain a list of "here's the next X number of blocks that we need to
vacuum"?
Sorry if I misunderstood. AFAIU, with the help of the vectored IO;
"the next X number of blocks that need to be vacuumed" will be
prefetched by calculating the unskippable blocks ( using the
lazy_scan_skip() function ) and the X will be determined by Postgres
itself. Do you have something different in your mind?
--
Regards,
Nazir Bilal Yavuz
Microsoft
v3 attached
On Thu, Jan 4, 2024 at 3:23 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
I buy that for 0001 but 0002 is still using local variables.
nskippable_blocks was just another variable to keep track of even
though we could already get that info from local variables
next_unskippable_block and next_block.
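Concretely, at the point of the threshold check the two are interchangeable, which is all 0002 relies on:

    /* before 0002 */
    if (nskippable_blocks < SKIP_PAGES_THRESHOLD)

    /* after 0002: nskippable_blocks was always next_unskippable_block - next_block here */
    if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)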
In light of this comment, I've refactored 0003/0004 (0002 and 0003 in
this version [v3]) to use local variables in the loop as well. I had
started using the members of the VacSkipState which I introduced.
Subject: [PATCH v2 3/6] Add lazy_scan_skip unskippable state
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce the struct, VacSkipState, which will maintain the variables
needed to skip ranges less than SKIP_PAGES_THRESHOLD.
Why not add this to LVRelState, possibly as a struct embedded within it?
Done in attached.
From 335faad5948b2bec3b83c2db809bb9161d373dcb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v2 4/6] Confine vacuum skip logic to lazy_scan_skip
By always calling lazy_scan_skip() -- instead of only when we have reached the
next unskippable block, we no longer need the skipping_current_range variable.
lazy_scan_heap() no longer needs to manage the skipped range -- checking if we
reached the end in order to then call lazy_scan_skip(). And lazy_scan_skip()
can derive the visibility status of a block from whether or not we are in a
skippable range -- that is, whether or not the next_block is equal to the next
unskippable block.
I wonder if it should be renamed as part of this - the name is somewhat
confusing now (and perhaps before)? lazy_scan_get_next_block() or such?
Why stop there! I've removed lazy and called it
heap_vac_scan_get_next_block() -- a little long, but...
+ while (true)
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == vacskip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacskip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, &vacskip, blkno + 1);
-
- Assert(vacskip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacskip.skipping_current_range)
- continue;
+ blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+ &all_visible_according_to_vm);
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
+ if (blkno == InvalidBlockNumber)
+ break;
vacrel->scanned_pages++;
I don't like that we still do determination about the next block outside of
lazy_scan_skip() and have duplicated exit conditions between lazy_scan_skip()
and lazy_scan_heap().
I'd probably change the interface to something like
while (lazy_scan_get_next_block(vacrel, &blkno))
{
...
}
I've done this. I do now find the parameter names a bit confusing.
There is next_block (which is the "next block in line" and is an input
parameter) and blkno, which is an output parameter with the next block
that should actually be processed. Maybe it's okay?
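For reference, the shape this ends up with in v3-0003 (reproduced here only to show the two parameters side by side):

    static bool
    heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
                                 BlockNumber *blkno,
                                 bool *all_visible_according_to_vm);

    /* caller, in lazy_scan_heap() */
    while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
                                        &blkno, &all_visible_according_to_vm))
    {
        /* prune and vacuum block blkno */
    }

next_block is the first candidate block the caller is willing to process; blkno comes back set to the block that should actually be processed, which may be later than next_block when a range is skipped.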
From b6603e35147c4bbe3337280222e6243524b0110e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 09:47:18 -0500
Subject: [PATCH v2 5/6] VacSkipState saves reference to LVRelStateThe streaming read interface can only give pgsr_next callbacks access to
two pieces of private data. As such, move a reference to the LVRelState
into the VacSkipState.
This is a separate commit (as opposed to as part of the commit
introducing VacSkipState) because it is required for using the streaming
read interface but not a natural change on its own. VacSkipState is per
block and the LVRelState is referenced for the whole relation vacuum.
I'd do it the other way round, i.e. either embed VacSkipState into LVRelState
or point to it from VacSkipState.
LVRelState is already tied to the iteration state, so I don't think there's a
reason not to do so.
Done, and, as such, this patch is dropped from the set.
- Melanie
Attachments:
v3-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchtext/x-patch; charset=US-ASCII; name=v3-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchDownload
From 2ac56028b7379a5f40680801eb0e188ca512bd7f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v3 4/4] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 77d35f1974d..a6cb96409b9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1384,8 +1384,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
vacrel->skip.next_unskippable_block = next_unskippable_block;
--
2.37.2
v3-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchDownload
From 57b18a83ea998dd27ecaac9c04e503089813cf6c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v3 2/4] Add lazy_scan_skip unskippable state to LVRelState
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
add a struct to LVRelState containing variables needed to skip
ranges less than SKIP_PAGES_THRESHOLD.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 139 ++++++++++++++++-----------
1 file changed, 84 insertions(+), 55 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 22ba16b6718..cab2bbb9629 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -210,6 +210,22 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
+ * pages greater than SKIP_PAGES_THRESHOLD.
+ */
+ struct
+ {
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Buffer containing next unskippable block's visibility info */
+ Buffer vmbuffer;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+ } skip;
} LVRelState;
/*
@@ -237,13 +253,9 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
-
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -825,12 +837,8 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -844,10 +852,9 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.vmbuffer = InvalidBuffer;
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, 0);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -855,26 +862,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == next_unskippable_block)
+ if (blkno == vacrel->skip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, blkno + 1);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacrel->skip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -917,10 +921,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacrel->skip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -945,7 +949,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
@@ -964,7 +968,7 @@ lazy_scan_heap(LVRelState *vacrel)
/* Check for new or empty pages before lazy_scan_noprune call */
if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
- vmbuffer))
+ vacrel->skip.vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -1003,7 +1007,8 @@ lazy_scan_heap(LVRelState *vacrel)
}
/* Check for new or empty pages before lazy_scan_prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false,
+ vacrel->skip.vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -1037,7 +1042,7 @@ lazy_scan_heap(LVRelState *vacrel)
{
Size freespace;
- lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vacrel->skip.vmbuffer);
/* Forget the LP_DEAD items that we just vacuumed */
dead_items->num_items = 0;
@@ -1116,7 +1121,7 @@ lazy_scan_heap(LVRelState *vacrel)
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, prunestate.visibility_cutoff_xid,
+ vacrel->skip.vmbuffer, prunestate.visibility_cutoff_xid,
flags);
}
@@ -1127,11 +1132,12 @@ lazy_scan_heap(LVRelState *vacrel)
* with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno,
+ &vacrel->skip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1156,7 +1162,7 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1167,7 +1173,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
else if (all_visible_according_to_vm && prunestate.all_visible &&
prunestate.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vacrel->skip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1189,7 +1195,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
@@ -1231,8 +1237,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacrel->skip.vmbuffer))
+ {
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1276,15 +1285,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in skip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needed a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
+ *
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * The block number and visibility status of the next unskippable block are set
+ * in skip->next_unskippable_block and next_unskippable_allvis.
+ * skip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1294,25 +1322,26 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
{
+ /* Use local variables for better optimized loop code */
BlockNumber rel_pages = vacrel->rel_pages,
next_unskippable_block = next_block;
+
bool skipsallvis = false;
- *next_unskippable_allvis = true;
+ vacrel->skip.next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &vacrel->skip.vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1333,7 +1362,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1358,6 +1387,8 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
next_unskippable_block++;
}
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
+
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
* pages. Since we're reading sequentially, the OS should be doing
@@ -1368,16 +1399,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacrel->skip.skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacrel->skip.skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
--
2.37.2
v3-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchDownload
From 5d4f6b74e35491be52a5cc32e4e56f8584ce127e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v3 3/4] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and
eventually AIO), refactor vacuum's logic for skipping blocks such that
it is entirely confined to lazy_scan_skip(). This turns lazy_scan_skip()
and the skip state in LVRelState it uses into an iterator which yields
blocks to lazy_scan_heap(). Such a structure is conducive to an async
interface. While we are at it, rename lazy_scan_skip() to
heap_vac_scan_get_next_block(), which now more accurately describes it.
By always calling heap_vac_scan_get_next_block() -- instead of only when
we have reached the next unskippable block, we no longer need the
skipping_current_range variable. lazy_scan_heap() no longer needs to
manage the skipped range -- checking if we reached the end in order to
then call heap_vac_scan_get_next_block(). And
heap_vac_scan_get_next_block() can derive the visibility status of a
block from whether or not we are in a skippable range -- that is,
whether or not the next_block is equal to the next unskippable block.
---
src/backend/access/heap/vacuumlazy.c | 258 ++++++++++++++-------------
1 file changed, 134 insertions(+), 124 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cab2bbb9629..77d35f1974d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,8 +212,8 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
- * pages greater than SKIP_PAGES_THRESHOLD.
+ * Parameters maintained by heap_vac_scan_get_next_block() to manage
+ * skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
*/
struct
{
@@ -255,7 +255,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
+static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -835,9 +837,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
+ Page page;
+ LVPagePruneState prunestate;
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -852,39 +860,12 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, 0);
- for (blkno = 0; blkno < rel_pages; blkno++)
- {
- Buffer buf;
- Page page;
- bool all_visible_according_to_vm;
- LVPagePruneState prunestate;
-
- if (blkno == vacrel->skip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, blkno + 1);
-
- Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacrel->skip.skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
+ while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
+ &blkno, &all_visible_according_to_vm))
+ {
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1093,7 +1074,8 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* Handle setting visibility map bit based on information from the VM
- * (as of last lazy_scan_skip() call), and from prunestate
+ * (as of last heap_vac_scan_get_next_block() call), and from
+ * prunestate
*/
if (!all_visible_according_to_vm && prunestate.all_visible)
{
@@ -1128,8 +1110,9 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if
* the page-level bit is clear. However, it's possible that the bit
- * got cleared after lazy_scan_skip() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
+ * got cleared after heap_vac_scan_get_next_block() was called, so we
+ * must recheck with buffer lock before concluding that the VM is
+ * corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno,
@@ -1282,20 +1265,14 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_get_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in skip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
- *
- * skip->vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needed a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * heap_vac_scan_get_next_block() sets blkno to next block that actually needs
+ * to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1305,14 +1282,25 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in skip->next_unskippable_block and next_unskippable_allvis.
- * skip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacrel->skip->next_unskippable_block and next_unskippable_allvis.
+ *
+ * The block number and visibility status of the next block to process are set
+ * in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
+ * returns false if there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needed a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1322,91 +1310,113 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
-lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
+static bool
+heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno, bool *all_visible_according_to_vm)
{
- /* Use local variables for better optimized loop code */
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block;
-
bool skipsallvis = false;
- vacrel->skip.next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ if (next_block >= vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- &vacrel->skip.vmbuffer);
+ *blkno = InvalidBlockNumber;
+ return false;
+ }
+
+ if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->skip.next_unskippable_block)
+ {
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->skip.next_unskippable_block;
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ while (++next_unskippable_block < rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &vacrel->skip.vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (next_unskippable_block == rel_pages - 1)
- break;
+ vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ if (!vacrel->skip.next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacrel->skip.next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- next_unskippable_block++;
- }
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
- vacrel->skip.next_unskippable_block = next_unskippable_block;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->skip.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->skip.next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacrel->skip.skipping_current_range = false;
+ if (next_block == vacrel->skip.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
else
- {
- vacrel->skip.skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ *blkno = next_block;
+ return true;
}
/*
--
2.37.2
v3-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchtext/x-patch; charset=US-ASCII; name=v3-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchDownload
From e0bc1eb3b5c1a2dfda0f54eae98278ad21a05be8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v3 1/4] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b63cad1335f..22ba16b6718 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1299,8 +1299,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1357,7 +1356,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1370,7 +1368,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.37.2
On Fri, Jan 5, 2024 at 5:51 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
On Fri, 5 Jan 2024 at 02:25, Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/4/24 2:23 PM, Andres Freund wrote:
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
Admittedly I'm not up to speed on recent vacuum changes, but I have to wonder if the concept of skipping should go away in the context of vector IO? Instead of thinking about "we can skip this range of blocks", why not maintain a list of "here's the next X number of blocks that we need to vacuum"?
Sorry if I misunderstood. AFAIU, with the help of the vectored IO;
"the next X number of blocks that need to be vacuumed" will be
prefetched by calculating the unskippable blocks ( using the
lazy_scan_skip() function ) and the X will be determined by Postgres
itself. Do you have something different in your mind?
I think you are both right. As we gain more control of readahead from
within Postgres, we will likely want to revisit this heuristic as it
may not serve us anymore. But the streaming read interface/vectored
I/O is also not a drop-in replacement for it. To change anything and
ensure there is no regression, we will probably have to do
cross-platform benchmarking, though.
That being said, I would absolutely love to get rid of the skippable
ranges because I find them very error-prone and confusing. Hopefully
now that the skipping logic is isolated to a single function, it will
be easier not to trip over it when working on lazy_scan_heap().
- Melanie
On 1/11/24 5:50 PM, Melanie Plageman wrote:
On Fri, Jan 5, 2024 at 5:51 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
On Fri, 5 Jan 2024 at 02:25, Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/4/24 2:23 PM, Andres Freund wrote:
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
Admittedly I'm not up to speed on recent vacuum changes, but I have to wonder if the concept of skipping should go away in the context of vector IO? Instead of thinking about "we can skip this range of blocks", why not maintain a list of "here's the next X number of blocks that we need to vacuum"?
Sorry if I misunderstood. AFAIU, with the help of the vectored IO;
"the next X number of blocks that need to be vacuumed" will be
prefetched by calculating the unskippable blocks ( using the
lazy_scan_skip() function ) and the X will be determined by Postgres
itself. Do you have something different in your mind?
I think you are both right. As we gain more control of readahead from
within Postgres, we will likely want to revisit this heuristic as it
may not serve us anymore. But the streaming read interface/vectored
I/O is also not a drop-in replacement for it. To change anything and
ensure there is no regression, we will probably have to do
cross-platform benchmarking, though.
That being said, I would absolutely love to get rid of the skippable
ranges because I find them very error-prone and confusing. Hopefully
now that the skipping logic is isolated to a single function, it will
be easier not to trip over it when working on lazy_scan_heap().
Yeah, arguably it's just a matter of semantics, but IMO it's a lot
clearer to simply think in terms of "here's the next blocks we know we
want to vacuum" instead of "we vacuum everything, but sometimes we skip
some blocks".
--
Jim Nasby, Data Architect, Austin TX
On Fri, Jan 12, 2024 at 2:02 PM Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/11/24 5:50 PM, Melanie Plageman wrote:
On Fri, Jan 5, 2024 at 5:51 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
On Fri, 5 Jan 2024 at 02:25, Jim Nasby <jim.nasby@gmail.com> wrote:
On 1/4/24 2:23 PM, Andres Freund wrote:
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
Admittedly I'm not up to speed on recent vacuum changes, but I have to wonder if the concept of skipping should go away in the context of vector IO? Instead of thinking about "we can skip this range of blocks", why not maintain a list of "here's the next X number of blocks that we need to vacuum"?
Sorry if I misunderstood. AFAIU, with the help of the vectored IO;
"the next X number of blocks that need to be vacuumed" will be
prefetched by calculating the unskippable blocks ( using the
lazy_scan_skip() function ) and the X will be determined by Postgres
itself. Do you have something different in your mind?
I think you are both right. As we gain more control of readahead from
within Postgres, we will likely want to revisit this heuristic as it
may not serve us anymore. But the streaming read interface/vectored
I/O is also not a drop-in replacement for it. To change anything and
ensure there is no regression, we will probably have to do
cross-platform benchmarking, though.
That being said, I would absolutely love to get rid of the skippable
ranges because I find them very error-prone and confusing. Hopefully
now that the skipping logic is isolated to a single function, it will
be easier not to trip over it when working on lazy_scan_heap().
Yeah, arguably it's just a matter of semantics, but IMO it's a lot
clearer to simply think in terms of "here's the next blocks we know we
want to vacuum" instead of "we vacuum everything, but sometimes we skip
some blocks".
Even "we vacuum some stuff, but sometimes we skip some blocks" would
be okay. What we have now is "we vacuum some stuff, but sometimes we
skip some blocks, but only if we would skip enough blocks, and, when
we decide to do that we can't go back and actually get visibility
information for those blocks we skipped because we are too cheap"
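To make that concrete, the heuristic boils down to something like the toy
program below (a simplified, standalone sketch of the behavior, not the
actual vacuumlazy.c code; SKIP_PAGES_THRESHOLD is 32 in master):

#include <stdbool.h>
#include <stdio.h>

#define SKIP_PAGES_THRESHOLD 32		/* same value as vacuumlazy.c */
#define REL_PAGES 100

int
main(void)
{
	bool	all_visible[REL_PAGES];
	int		blkno = 0;

	/* pretend blocks 10..80 are all-visible and the rest are not */
	for (int b = 0; b < REL_PAGES; b++)
		all_visible[b] = (b >= 10 && b <= 80);

	while (blkno < REL_PAGES)
	{
		int		next_unskippable = blkno;

		/* find the next block that may not be skipped (last block never is) */
		while (next_unskippable < REL_PAGES - 1 && all_visible[next_unskippable])
			next_unskippable++;

		/* only jump over the range if it is long enough; never look back */
		if (next_unskippable - blkno >= SKIP_PAGES_THRESHOLD)
			blkno = next_unskippable;

		/* otherwise scan the too-small range anyway, then the unskippable block */
		for (; blkno <= next_unskippable; blkno++)
			printf("scanning block %d\n", blkno);
	}
	return 0;
}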
- Melanie
On Fri, 12 Jan 2024 at 05:12, Melanie Plageman
<melanieplageman@gmail.com> wrote:
v3 attached
On Thu, Jan 4, 2024 at 3:23 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote:
Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages
Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var
nskippable_blocks
I think these may lead to worse code - the compiler has to reload
vacrel->rel_pages/next_unskippable_block for every loop iteration, because it
can't guarantee that they're not changed within one of the external functions
called in the loop body.
I buy that for 0001 but 0002 is still using local variables.
nskippable_blocks was just another variable to keep track of even
though we could already get that info from local variables
next_unskippable_block and next_block.
In light of this comment, I've refactored 0003/0004 (0002 and 0003 in
this version [v3]) to use local variables in the loop as well. I had
started using the members of the VacSkipState which I introduced.
Subject: [PATCH v2 3/6] Add lazy_scan_skip unskippable state
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce the struct, VacSkipState, which will maintain the variables
needed to skip ranges less than SKIP_PAGES_THRESHOLD.
Why not add this to LVRelState, possibly as a struct embedded within it?
Done in attached.
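Concretely, the skip state now lives in a struct embedded directly in
LVRelState as the member "skip" (abridged from the attached 0002):

typedef struct LVRelState
{
	...

	/* state maintained to manage skipping ranges of pages */
	struct
	{
		BlockNumber next_unskippable_block;
		Buffer		vmbuffer;
		bool		next_unskippable_allvis;
		bool		skipping_current_range;
	} skip;
} LVRelState;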
From 335faad5948b2bec3b83c2db809bb9161d373dcb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v2 4/6] Confine vacuum skip logic to lazy_scan_skip
By always calling lazy_scan_skip() -- instead of only when we have reached the
next unskippable block, we no longer need the skipping_current_range variable.
lazy_scan_heap() no longer needs to manage the skipped range -- checking if we
reached the end in order to then call lazy_scan_skip(). And lazy_scan_skip()
can derive the visibility status of a block from whether or not we are in a
skippable range -- that is, whether or not the next_block is equal to the next
unskippable block.
I wonder if it should be renamed as part of this - the name is somewhat
confusing now (and perhaps before)? lazy_scan_get_next_block() or such?
Why stop there! I've removed lazy and called it
heap_vac_scan_get_next_block() -- a little long, but...
+ while (true)
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
LVPagePruneState prunestate;
- if (blkno == vacskip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacskip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, &vacskip, blkno + 1);
-
- Assert(vacskip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacskip.skipping_current_range)
- continue;
+ blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+ &all_visible_according_to_vm);
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
+ if (blkno == InvalidBlockNumber)
+ break;
vacrel->scanned_pages++;
I don't like that we still do determination about the next block outside of
lazy_scan_skip() and have duplicated exit conditions between lazy_scan_skip()
and lazy_scan_heap().
I'd probably change the interface to something like
while (lazy_scan_get_next_block(vacrel, &blkno))
{
...
}
I've done this. I do now find the parameter names a bit confusing.
There is next_block (which is the "next block in line" and is an input
parameter) and blkno, which is an output parameter with the next block
that should actually be processed. Maybe it's okay?
From b6603e35147c4bbe3337280222e6243524b0110e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 09:47:18 -0500
Subject: [PATCH v2 5/6] VacSkipState saves reference to LVRelState
The streaming read interface can only give pgsr_next callbacks access to
two pieces of private data. As such, move a reference to the LVRelState
into the VacSkipState.
This is a separate commit (as opposed to as part of the commit
introducing VacSkipState) because it is required for using the streaming
read interface but not a natural change on its own. VacSkipState is per
block and the LVRelState is referenced for the whole relation vacuum.
I'd do it the other way round, i.e. either embed VacSkipState into LVRelState
or point to it from VacSkipState.
LVRelState is already tied to the iteration state, so I don't think there's a
reason not to do so.
Done, and, as such, this patch is dropped from the set.
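With that change, the loop in lazy_scan_heap() ends up shaped roughly like
this (abridged from the attached patches; the per-block work and progress
reporting are omitted):

	BlockNumber blkno = InvalidBlockNumber;
	bool		all_visible_according_to_vm;

	while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
										&blkno, &all_visible_according_to_vm))
	{
		/* prune and vacuum block blkno here */
	}

heap_vac_scan_get_next_block() returns false once there are no blocks left,
so lazy_scan_heap() no longer tracks next_unskippable_block or
skipping_current_range itself.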
CFBot shows that the patch does not apply anymore as in [1]:
=== applying patch
./v3-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patch
patching file src/backend/access/heap/vacuumlazy.c
...
Hunk #10 FAILED at 1042.
Hunk #11 FAILED at 1121.
Hunk #12 FAILED at 1132.
Hunk #13 FAILED at 1161.
Hunk #14 FAILED at 1172.
Hunk #15 FAILED at 1194.
...
6 out of 21 hunks FAILED -- saving rejects to file
src/backend/access/heap/vacuumlazy.c.rej
Please post an updated version for the same.
[1]: http://cfbot.cputube.org/patch_46_4755.log
Regards,
Vignesh
On Fri, Jan 26, 2024 at 8:28 AM vignesh C <vignesh21@gmail.com> wrote:
CFBot shows that the patch does not apply anymore as in [1]:
=== applying patch
./v3-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patch
patching file src/backend/access/heap/vacuumlazy.c
...
Hunk #10 FAILED at 1042.
Hunk #11 FAILED at 1121.
Hunk #12 FAILED at 1132.
Hunk #13 FAILED at 1161.
Hunk #14 FAILED at 1172.
Hunk #15 FAILED at 1194.
...
6 out of 21 hunks FAILED -- saving rejects to file
src/backend/access/heap/vacuumlazy.c.rej
Please post an updated version for the same.
Fixed in attached rebased v4
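One detail worth calling out in 0003: lazy_scan_heap() now starts with blkno
set to InvalidBlockNumber and passes blkno + 1 on the first call, relying on
unsigned wraparound to ask for block 0. A minimal illustration (BlockNumber
and InvalidBlockNumber as defined in src/include/storage/block.h):

	BlockNumber blkno = InvalidBlockNumber;	/* (BlockNumber) 0xFFFFFFFF */

	/* BlockNumber is uint32, so the first request wraps around to block 0 */
	Assert((BlockNumber) (blkno + 1) == 0);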
- Melanie
Attachments:
v4-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchtext/x-patch; charset=US-ASCII; name=v4-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchDownload
From b71fbf1e9abea4689b57d9439ecdcc4387e91195 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v4 4/4] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e5988262611..ea270941379 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1190,8 +1190,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
vacrel->skip.next_unskippable_block = next_unskippable_block;
--
2.37.2
v4-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchDownload
From 5b39165dde6e60ed214bc988eeb58fb9d357030c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v4 3/4] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and
eventually AIO), refactor vacuum's logic for skipping blocks such that
it is entirely confined to lazy_scan_skip(). This turns lazy_scan_skip()
and the skip state in LVRelState it uses into an iterator which yields
blocks to lazy_scan_heap(). Such a structure is conducive to an async
interface. While we are at it, rename lazy_scan_skip() to
heap_vac_scan_get_next_block(), which now more accurately describes it.
By always calling heap_vac_scan_get_next_block() -- instead of only when
we have reached the next unskippable block, we no longer need the
skipping_current_range variable. lazy_scan_heap() no longer needs to
manage the skipped range -- checking if we reached the end in order to
then call heap_vac_scan_get_next_block(). And
heap_vac_scan_get_next_block() can derive the visibility status of a
block from whether or not we are in a skippable range -- that is,
whether or not the next_block is equal to the next unskippable block.
---
src/backend/access/heap/vacuumlazy.c | 243 ++++++++++++++-------------
1 file changed, 126 insertions(+), 117 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 077164896fb..e5988262611 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,8 +212,8 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
- * pages greater than SKIP_PAGES_THRESHOLD.
+ * Parameters maintained by heap_vac_scan_get_next_block() to manage
+ * skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
*/
struct
{
@@ -238,7 +238,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
+static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock);
@@ -820,8 +822,11 @@ static void
lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -836,40 +841,17 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, 0);
- for (blkno = 0; blkno < rel_pages; blkno++)
+
+ while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
+ &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == vacrel->skip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, blkno + 1);
-
- Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacrel->skip.skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1089,20 +1071,14 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in skip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * heap_vac_scan_get_next_block() -- get next block for vacuum to process
*
- * skip->vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needed a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * heap_vac_scan_get_next_block() sets blkno to next block that actually needs
+ * to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1112,14 +1088,25 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in skip->next_unskippable_block and next_unskippable_allvis.
- * skip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacrel->skip.next_unskippable_block and next_unskippable_allvis.
+ *
+ * The block number and visibility status of the next block to process are set
+ * in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
+ * returns false if there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1129,91 +1116,113 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
-lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
+static bool
+heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno, bool *all_visible_according_to_vm)
{
- /* Use local variables for better optimized loop code */
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block;
-
bool skipsallvis = false;
- vacrel->skip.next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ if (next_block >= vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- &vacrel->skip.vmbuffer);
+ *blkno = InvalidBlockNumber;
+ return false;
+ }
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->skip.next_unskippable_block)
+ {
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->skip.next_unskippable_block;
+
+ while (++next_unskippable_block < rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &vacrel->skip.vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (next_unskippable_block == rel_pages - 1)
- break;
+ vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ if (!vacrel->skip.next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacrel->skip.next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- next_unskippable_block++;
- }
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
- vacrel->skip.next_unskippable_block = next_unskippable_block;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->skip.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->skip.next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacrel->skip.skipping_current_range = false;
+ if (next_block == vacrel->skip.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
else
- {
- vacrel->skip.skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ *blkno = next_block;
+ return true;
}
/*
--
2.37.2
v4-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchtext/x-patch; charset=US-ASCII; name=v4-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchDownload
From ab6d37e74daba17ff845f7399ec734f751127156 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v4 1/4] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa56480808b..e21c1124f5c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1109,8 +1109,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1167,7 +1166,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1180,7 +1178,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.37.2
v4-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchDownload
From 9e949d6e6f6e6d63b246c70fc88ba1d79ea5eeb6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v4 2/4] Add lazy_scan_skip unskippable state to LVRelState
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce add a struct to LVRelState containing variables needed to skip
ranges less than SKIP_PAGES_THRESHOLD.
lazy_scan_prune() and lazy_scan_new_or_empty() can now access the
buffer containing the relevant block of the visibility map through the
LVRelState.skip, so it no longer needs to be a separate function
parameter.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 154 ++++++++++++++++-----------
1 file changed, 90 insertions(+), 64 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e21c1124f5c..077164896fb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -210,6 +210,22 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
+ * pages greater than SKIP_PAGES_THRESHOLD.
+ */
+ struct
+ {
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Buffer containing next unskippable block's visibility info */
+ Buffer vmbuffer;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+ } skip;
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -220,19 +236,15 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
-
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- bool sharelock, Buffer vmbuffer);
+ bool sharelock);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ bool all_visible_according_to_vm,
bool *has_lpdead_items);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -809,12 +821,8 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -828,10 +836,9 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.vmbuffer = InvalidBuffer;
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, 0);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -840,26 +847,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
+ if (blkno == vacrel->skip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, blkno + 1);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacrel->skip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -902,10 +906,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacrel->skip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -930,7 +934,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
@@ -948,8 +952,7 @@ lazy_scan_heap(LVRelState *vacrel)
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* Check for new or empty pages before lazy_scan_[no]prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
- vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -991,7 +994,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1041,8 +1044,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacrel->skip.vmbuffer))
+ {
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1086,15 +1092,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in skip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needed a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block are set
+ * in skip->next_unskippable_block and next_unskippable_allvis.
+ * skip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1104,25 +1129,26 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
{
+ /* Use local variables for better optimized loop code */
BlockNumber rel_pages = vacrel->rel_pages,
next_unskippable_block = next_block;
+
bool skipsallvis = false;
- *next_unskippable_allvis = true;
+ vacrel->skip.next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &vacrel->skip.vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1143,7 +1169,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1168,6 +1194,8 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
next_unskippable_block++;
}
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
+
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
* pages. Since we're reading sequentially, the OS should be doing
@@ -1178,16 +1206,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacrel->skip.skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacrel->skip.skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
@@ -1220,7 +1246,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*/
static bool
lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
- Page page, bool sharelock, Buffer vmbuffer)
+ Page page, bool sharelock)
{
Size freespace;
@@ -1306,7 +1332,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
@@ -1342,10 +1368,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* any tuple that becomes dead after the call to heap_page_prune() can't need to
* be frozen, because it was visible to another session when vacuum started.
*
- * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * vacrel->skip.vmbuffer is the buffer containing the VM block with
+ * visibility information for the heap block, blkno.
+ * all_visible_according_to_vm is the saved visibility status of the heap block
+ * looked up earlier by the caller. We won't rely entirely on this status, as
+ * it may be out of date.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1355,7 +1382,6 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
- Buffer vmbuffer,
bool all_visible_according_to_vm,
bool *has_lpdead_items)
{
@@ -1789,7 +1815,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vacrel->skip.vmbuffer, visibility_cutoff_xid,
flags);
}
@@ -1800,11 +1826,11 @@ lazy_scan_prune(LVRelState *vacrel,
* buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno, &vacrel->skip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1828,7 +1854,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1838,7 +1864,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vacrel->skip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1860,7 +1886,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
--
2.37.2
On Mon, Jan 29, 2024 at 8:18 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Fri, Jan 26, 2024 at 8:28 AM vignesh C <vignesh21@gmail.com> wrote:
CFBot shows that the patch does not apply anymore as in [1]:
=== applying patch
./v3-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patch
patching file src/backend/access/heap/vacuumlazy.c
...
Hunk #10 FAILED at 1042.
Hunk #11 FAILED at 1121.
Hunk #12 FAILED at 1132.
Hunk #13 FAILED at 1161.
Hunk #14 FAILED at 1172.
Hunk #15 FAILED at 1194.
...
6 out of 21 hunks FAILED -- saving rejects to file
src/backend/access/heap/vacuumlazy.c.rej
Please post an updated version for the same.
Fixed in attached rebased v4
In light of Thomas' update to the streaming read API [1], I have
rebased and updated this patch set.
The attached v5 has some simplifications when compared to v4 but takes
largely the same approach.
0001-0004 are refactoring
0005 is the streaming read code not yet in master
0006 is the vacuum streaming read user for vacuum's first pass
0007 is the vacuum streaming read user for vacuum's second pass
- Melanie
[1]: /messages/by-id/CA+hUKGJtLyxcAEvLhVUhgD4fMQkOu3PDaj8Qb9SR_UsmzgsBpQ@mail.gmail.com
Attachments:
v5-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchDownload
From a153a50da9bcdeff408a1fe183ee2226570aadc9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v5 4/7] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e5988262611..ea270941379 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1190,8 +1190,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
vacrel->skip.next_unskippable_block = next_unskippable_block;
--
2.37.2
v5-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchtext/x-patch; charset=US-ASCII; name=v5-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchDownload
From 958e254aa12779c3a9f457a5751ba85931e23201 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v5 1/7] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa56480808b..e21c1124f5c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1109,8 +1109,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1167,7 +1166,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1180,7 +1178,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.37.2
v5-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchDownload
From e9a3b596fab396d3e5b4ac5a66bc80762667c885 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v5 3/7] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and
eventually AIO), refactor vacuum's logic for skipping blocks such that
it is entirely confined to lazy_scan_skip(). This turns lazy_scan_skip()
and the skip state in LVRelState it uses into an iterator which yields
blocks to lazy_scan_heap(). Such a structure is conducive to an async
interface. While we are at it, rename lazy_scan_skip() to
heap_vac_scan_get_next_block(), which now more accurately describes it.
By always calling heap_vac_scan_get_next_block() -- instead of only when
we have reached the next unskippable block, we no longer need the
skipping_current_range variable. lazy_scan_heap() no longer needs to
manage the skipped range -- checking if we reached the end in order to
then call heap_vac_scan_get_next_block(). And
heap_vac_scan_get_next_block() can derive the visibility status of a
block from whether or not we are in a skippable range -- that is,
whether or not the next_block is equal to the next unskippable block.
---
src/backend/access/heap/vacuumlazy.c | 243 ++++++++++++++-------------
1 file changed, 126 insertions(+), 117 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 077164896fb..e5988262611 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,8 +212,8 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
- * pages greater than SKIP_PAGES_THRESHOLD.
+ * Parameters maintained by heap_vac_scan_get_next_block() to manage
+ * skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
*/
struct
{
@@ -238,7 +238,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
+static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock);
@@ -820,8 +822,11 @@ static void
lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -836,40 +841,17 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, 0);
- for (blkno = 0; blkno < rel_pages; blkno++)
+
+ while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
+ &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == vacrel->skip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, blkno + 1);
-
- Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacrel->skip.skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1089,20 +1071,14 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in skip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * heap_vac_scan_get_next_block() -- get next block for vacuum to process
*
- * skip->vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needed a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * heap_vac_scan_get_next_block() sets blkno to the next block that actually
+ * to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1112,14 +1088,25 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in skip->next_unskippable_block and next_unskippable_allvis.
- * skip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacrel->skip.next_unskippable_block and next_unskippable_allvis.
+ *
+ * The block number and visibility status of the next block to process are set
+ * in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
+ * returns false if there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1129,91 +1116,113 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
-lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
+static bool
+heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno, bool *all_visible_according_to_vm)
{
- /* Use local variables for better optimized loop code */
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block;
-
bool skipsallvis = false;
- vacrel->skip.next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ if (next_block >= vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- &vacrel->skip.vmbuffer);
+ *blkno = InvalidBlockNumber;
+ return false;
+ }
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->skip.next_unskippable_block)
+ {
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->skip.next_unskippable_block;
+
+ while (++next_unskippable_block < rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &vacrel->skip.vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (next_unskippable_block == rel_pages - 1)
- break;
+ vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ if (!vacrel->skip.next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacrel->skip.next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- next_unskippable_block++;
- }
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
- vacrel->skip.next_unskippable_block = next_unskippable_block;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->skip.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->skip.next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacrel->skip.skipping_current_range = false;
+ if (next_block == vacrel->skip.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
else
- {
- vacrel->skip.skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ *blkno = next_block;
+ return true;
}
/*
--
2.37.2
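For readers skimming the diff above: after this patch, the control flow in
lazy_scan_heap() reduces to an iterator-style loop of roughly the following
shape. This is condensed from the hunk above -- progress reporting, pruning,
and the rest of the per-block work are omitted -- so treat it as a sketch
rather than the literal code:

	BlockNumber blkno = InvalidBlockNumber; /* wraps to 0 on the first +1 */
	bool		all_visible_according_to_vm;

	while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
										&blkno, &all_visible_according_to_vm))
	{
		/*
		 * blkno is the next block that actually needs processing; skippable
		 * ranges have already been stepped over inside the iterator, and the
		 * block's visibility-map status arrives alongside it.
		 */
		vacrel->scanned_pages++;
		/* ... prune and vacuum block blkno ... */
	}

This "give me the next block" shape is what the streaming-read preparation
described in the commit message builds on.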
v5-0005-Streaming-Read-API.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Streaming-Read-API.patchDownload
From 1f10bf10a02502a9521d8ce8130d1964315da30c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 26 Feb 2024 23:48:31 +1300
Subject: [PATCH v5 5/7] Streaming Read API
---
contrib/pg_prewarm/pg_prewarm.c | 40 +-
src/backend/storage/Makefile | 2 +-
src/backend/storage/aio/Makefile | 14 +
src/backend/storage/aio/meson.build | 5 +
src/backend/storage/aio/streaming_read.c | 612 ++++++++++++++++++++++
src/backend/storage/buffer/bufmgr.c | 641 ++++++++++++++++-------
src/backend/storage/buffer/localbuf.c | 14 +-
src/backend/storage/meson.build | 1 +
src/include/storage/bufmgr.h | 45 ++
src/include/storage/streaming_read.h | 52 ++
src/tools/pgindent/typedefs.list | 3 +
11 files changed, 1218 insertions(+), 211 deletions(-)
create mode 100644 src/backend/storage/aio/Makefile
create mode 100644 src/backend/storage/aio/meson.build
create mode 100644 src/backend/storage/aio/streaming_read.c
create mode 100644 src/include/storage/streaming_read.h
diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 8541e4d6e46..1cc84bcb0c2 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -20,6 +20,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/smgr.h"
+#include "storage/streaming_read.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -38,6 +39,25 @@ typedef enum
static PGIOAlignedBlock blockbuffer;
+struct pg_prewarm_streaming_read_private
+{
+ BlockNumber blocknum;
+ int64 last_block;
+};
+
+static BlockNumber
+pg_prewarm_streaming_read_next(PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_data)
+{
+ struct pg_prewarm_streaming_read_private *p = pgsr_private;
+
+ if (p->blocknum <= p->last_block)
+ return p->blocknum++;
+
+ return InvalidBlockNumber;
+}
+
/*
* pg_prewarm(regclass, mode text, fork text,
* first_block int8, last_block int8)
@@ -183,18 +203,36 @@ pg_prewarm(PG_FUNCTION_ARGS)
}
else if (ptype == PREWARM_BUFFER)
{
+ struct pg_prewarm_streaming_read_private p;
+ PgStreamingRead *pgsr;
+
/*
* In buffer mode, we actually pull the data into shared_buffers.
*/
+
+ /* Set up the private state for our streaming buffer read callback. */
+ p.blocknum = first_block;
+ p.last_block = last_block;
+
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_FULL,
+ &p,
+ 0,
+ NULL,
+ BMR_REL(rel),
+ forkNumber,
+ pg_prewarm_streaming_read_next);
+
for (block = first_block; block <= last_block; ++block)
{
Buffer buf;
CHECK_FOR_INTERRUPTS();
- buf = ReadBufferExtended(rel, forkNumber, block, RBM_NORMAL, NULL);
+ buf = pg_streaming_read_buffer_get_next(pgsr, NULL);
ReleaseBuffer(buf);
++blocks_done;
}
+ Assert(pg_streaming_read_buffer_get_next(pgsr, NULL) == InvalidBuffer);
+ pg_streaming_read_free(pgsr);
}
/* Close relation, release lock. */
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS = aio buffer file freespace ipc large_object lmgr page smgr sync
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+ streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+ 'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..71f2c4a70b6
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,612 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+ bool need_wait;
+ bool advice_issued;
+ BlockNumber blocknum;
+ int nblocks;
+ int per_buffer_data_index;
+ Buffer buffers[MAX_BUFFERS_PER_TRANSFER];
+ ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+ int max_ios;
+ int ios_in_progress;
+ int max_pinned_buffers;
+ int pinned_buffers;
+ int pinned_buffers_trigger;
+ int next_tail_buffer;
+ int ramp_up_pin_limit;
+ int ramp_up_pin_stall;
+ bool finished;
+ bool advice_enabled;
+ void *pgsr_private;
+ PgStreamingReadBufferCB callback;
+
+ BufferAccessStrategy strategy;
+ BufferManagerRelation bmr;
+ ForkNumber forknum;
+
+ /* Sometimes we need to buffer one block for flow control. */
+ BlockNumber unget_blocknum;
+ void *unget_per_buffer_data;
+
+ /* Next expected block, for detecting sequential access. */
+ BlockNumber seq_blocknum;
+
+ /* Space for optional per-buffer private data. */
+ size_t per_buffer_data_size;
+ void *per_buffer_data;
+
+ /* Circular buffer of ranges. */
+ int size;
+ int head;
+ int tail;
+ PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy)
+{
+ PgStreamingRead *pgsr;
+ int size;
+ int max_ios;
+ uint32 max_pinned_buffers;
+
+
+ /*
+ * Decide how many assumed I/Os we will allow to run concurrently. That
+ * is, how many hints we will give the kernel that we will soon read
+ * certain blocks. This
+ * number also affects how far we look ahead for opportunities to start
+ * more I/Os.
+ */
+ if (flags & PGSR_FLAG_MAINTENANCE)
+ max_ios = maintenance_io_concurrency;
+ else
+ max_ios = effective_io_concurrency;
+
+ /*
+ * The desired level of I/O concurrency controls how far we are willing
+ * to look ahead. We also clamp it to at least MAX_BUFFERS_PER_TRANSFER
+ * so that we can have a chance to build up a full-sized read, even when
+ * max_ios is zero.
+ */
+ max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+ /*
+ * The *_io_concurrency GUCs might be set to 0, but we want to allow at
+ * least one, to keep our gating logic simple.
+ */
+ max_ios = Max(max_ios, 1);
+
+ /*
+ * Don't allow this backend to pin too many buffers. For now we'll apply
+ * the limit for the shared buffer pool and the local buffer pool, without
+ * worrying which it is.
+ */
+ LimitAdditionalPins(&max_pinned_buffers);
+ LimitAdditionalLocalPins(&max_pinned_buffers);
+ Assert(max_pinned_buffers > 0);
+
+ /*
+ * pgsr->ranges is a circular buffer. When it is empty, head == tail.
+ * When it is full, there is an empty element between head and tail. Head
+ * can also be empty (nblocks == 0), therefore we need two extra elements
+ * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+ * maximum possible number of occupied ranges of the smallest possible
+ * size of one.
+ */
+ size = max_pinned_buffers + 2;
+
+ pgsr = (PgStreamingRead *)
+ palloc0(offsetof(PgStreamingRead, ranges) +
+ sizeof(pgsr->ranges[0]) * size);
+
+ pgsr->max_ios = max_ios;
+ pgsr->per_buffer_data_size = per_buffer_data_size;
+ pgsr->max_pinned_buffers = max_pinned_buffers;
+ pgsr->pgsr_private = pgsr_private;
+ pgsr->strategy = strategy;
+ pgsr->size = size;
+
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+ /*
+ * This system supports prefetching advice. As long as direct I/O isn't
+ * enabled, and the caller hasn't promised sequential access, we can use
+ * it.
+ */
+ if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+ (flags & PGSR_FLAG_SEQUENTIAL) == 0)
+ pgsr->advice_enabled = true;
+#endif
+
+ /*
+ * We start off building small ranges, but double that quickly, for the
+ * benefit of users that don't know how far ahead they'll read. This can
+ * be disabled by users that already know they'll read all the way.
+ */
+ if (flags & PGSR_FLAG_FULL)
+ pgsr->ramp_up_pin_limit = INT_MAX;
+ else
+ pgsr->ramp_up_pin_limit = 1;
+
+ /*
+ * We want to avoid creating ranges that are smaller than they could be
+ * just because we hit max_pinned_buffers. We only look ahead when the
+ * number of pinned buffers falls below this trigger number, or put
+ * another way, we stop looking ahead when we wouldn't be able to build a
+ * "full sized" range.
+ */
+ pgsr->pinned_buffers_trigger =
+ Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+ /* Space for the callback to store extra data along with each block. */
+ if (per_buffer_data_size)
+ pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+ return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb)
+{
+ PgStreamingRead *result;
+
+ result = pg_streaming_read_buffer_alloc_internal(flags,
+ pgsr_private,
+ per_buffer_data_size,
+ strategy);
+ result->callback = next_block_cb;
+ result->bmr = bmr;
+ result->forknum = forknum;
+
+ return result;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ int result;
+
+ /*
+ * Find slot in the circular buffer of per-buffer data, without using the
+ * expensive % operator.
+ */
+ result = range->per_buffer_data_index + n;
+ if (result >= pgsr->max_pinned_buffers)
+ result -= pgsr->max_pinned_buffers;
+ Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+ return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+ return (char *) pgsr->per_buffer_data +
+ pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ return get_per_buffer_data_by_index(pgsr,
+ get_per_buffer_data_index(pgsr,
+ range,
+ n));
+}
+
+/*
+ * Start reading the head range, and create a new head range. The new head
+ * range is returned. It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *head_range;
+ PgStreamingReadRange *new_head_range;
+ int nblocks_pinned;
+ int flags;
+
+ /* Caller should make sure we never exceed max_ios. */
+ Assert(pgsr->ios_in_progress < pgsr->max_ios);
+
+ /* Should only call if the head range has some blocks to read. */
+ head_range = &pgsr->ranges[pgsr->head];
+ Assert(head_range->nblocks > 0);
+
+ /*
+ * If advice hasn't been suppressed, this system supports it, and this
+ * isn't a strictly sequential pattern, then we'll issue advice.
+ */
+ if (pgsr->advice_enabled && head_range->blocknum != pgsr->seq_blocknum)
+ flags = READ_BUFFERS_ISSUE_ADVICE;
+ else
+ flags = 0;
+
+
+ /* Start reading as many blocks as we can from the head range. */
+ nblocks_pinned = head_range->nblocks;
+ head_range->need_wait =
+ StartReadBuffers(pgsr->bmr,
+ head_range->buffers,
+ pgsr->forknum,
+ head_range->blocknum,
+ &nblocks_pinned,
+ pgsr->strategy,
+ flags,
+ &head_range->operation);
+
+ /* Did that start an I/O? */
+ if (head_range->need_wait && (flags & READ_BUFFERS_ISSUE_ADVICE))
+ {
+ head_range->advice_issued = true;
+ pgsr->ios_in_progress++;
+ Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+ }
+
+ /*
+ * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+ * but always at least one.
+ */
+ Assert(nblocks_pinned <= head_range->nblocks);
+ Assert(nblocks_pinned >= 1);
+ pgsr->pinned_buffers += nblocks_pinned;
+
+ /*
+ * Remember where the next block would be after that, so we can detect
+ * sequential access next time.
+ */
+ pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+ /*
+ * Create a new head range. There must be space, because we have enough
+ * elements for every range to hold just one block, up to the pin limit.
+ */
+ Assert(pgsr->size > pgsr->max_pinned_buffers);
+ Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+ if (++pgsr->head == pgsr->size)
+ pgsr->head = 0;
+ new_head_range = &pgsr->ranges[pgsr->head];
+ new_head_range->nblocks = 0;
+ new_head_range->advice_issued = false;
+
+ /*
+ * If we didn't manage to start the whole read above, we split the range,
+ * moving the remainder into the new head range.
+ */
+ if (nblocks_pinned < head_range->nblocks)
+ {
+ int nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+ head_range->nblocks = nblocks_pinned;
+
+ new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+ new_head_range->nblocks = nblocks_remaining;
+ }
+
+ /* The new range has per-buffer data starting after the previous range. */
+ new_head_range->per_buffer_data_index =
+ get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+ return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+ BlockNumber result;
+
+ if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+ {
+ /*
+ * If we had to unget a block, now it is time to return that one
+ * again.
+ */
+ result = pgsr->unget_blocknum;
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+ /*
+ * The same per_buffer_data element must have been used, and still
+ * contains whatever data the callback wrote into it. So we just
+ * sanity-check that we were called with the value that
+ * pg_streaming_unget_block() pushed back.
+ */
+ Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+ }
+ else
+ {
+ /* Use the installed callback directly. */
+ result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+ }
+
+ return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later. This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+ Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+ pgsr->unget_blocknum = blocknum;
+ pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *range;
+
+ /*
+ * If we're still ramping up, we may have to stall to wait for buffers to
+ * be consumed first before we do any more prefetching.
+ */
+ if (pgsr->ramp_up_pin_stall > 0)
+ {
+ Assert(pgsr->pinned_buffers > 0);
+ return;
+ }
+
+ /*
+ * If we're finished or can't start more I/O, then don't look ahead.
+ */
+ if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * We'll also wait until the number of pinned buffers falls below our
+ * trigger level, so that we have the chance to create a full range.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ do
+ {
+ BlockNumber blocknum;
+ void *per_buffer_data;
+
+ /* Do we have a full-sized range? */
+ range = &pgsr->ranges[pgsr->head];
+ if (range->nblocks == lengthof(range->buffers))
+ {
+ /* Start as much of it as we can. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /* If we're now at the I/O limit, stop here. */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * If we couldn't form a full range, then stop here to avoid
+ * creating small I/O.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ /*
+ * That might have been only partially started, but it always
+ * processes at least one block, so that'll do for now.
+ */
+ Assert(range->nblocks < lengthof(range->buffers));
+ }
+
+ /* Find per-buffer data slot for the next block. */
+ per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+ /* Find out which block the callback wants to read next. */
+ blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+ if (blocknum == InvalidBlockNumber)
+ {
+ /* End of stream. */
+ pgsr->finished = true;
+ break;
+ }
+
+ /*
+ * Is there a head range that we cannot extend, because the requested
+ * block is not consecutive?
+ */
+ if (range->nblocks > 0 &&
+ range->blocknum + range->nblocks != blocknum)
+ {
+ /* Yes. Start it, so we can begin building a new one. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * It's possible that it was only partially started, and we have a
+ * new range with the remainder. Keep starting I/Os until we get
+ * it all out of the way, or we hit the I/O limit.
+ */
+ while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * We have to 'unget' the block returned by the callback if we
+ * don't have enough I/O capacity left to start something.
+ */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ {
+ pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+ return;
+ }
+ }
+
+ /* If we have a new, empty range, initialize the start block. */
+ if (range->nblocks == 0)
+ {
+ range->blocknum = blocknum;
+ }
+
+ /* This block extends the range by one. */
+ Assert(range->blocknum + range->nblocks == blocknum);
+ range->nblocks++;
+
+ } while (pgsr->pinned_buffers + range->nblocks < pgsr->max_pinned_buffers &&
+ pgsr->pinned_buffers + range->nblocks < pgsr->ramp_up_pin_limit);
+
+ /* If we've hit the ramp-up limit, insert a stall. */
+ if (pgsr->pinned_buffers + range->nblocks >= pgsr->ramp_up_pin_limit)
+ {
+ /* Can't get here if an earlier stall hasn't finished. */
+ Assert(pgsr->ramp_up_pin_stall == 0);
+ /* Don't do any more prefetching until these buffers are consumed. */
+ pgsr->ramp_up_pin_stall = pgsr->ramp_up_pin_limit;
+ /* Double it. It will soon be out of the way. */
+ pgsr->ramp_up_pin_limit *= 2;
+ }
+
+ /* Start as much as we can. */
+ while (range->nblocks > 0)
+ {
+ range = pg_streaming_read_start_head_range(pgsr);
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ break;
+ }
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+ pg_streaming_read_look_ahead(pgsr);
+
+ /* See if we have one buffer to return. */
+ while (pgsr->tail != pgsr->head)
+ {
+ PgStreamingReadRange *tail_range;
+
+ tail_range = &pgsr->ranges[pgsr->tail];
+
+ /*
+ * Do we need to perform an I/O before returning the buffers from this
+ * range?
+ */
+ if (tail_range->need_wait)
+ {
+ WaitReadBuffers(&tail_range->operation);
+ tail_range->need_wait = false;
+
+ /*
+ * We don't really know if the kernel generated a physical I/O
+ * when we issued advice, let alone when it finished, but it has
+ * certainly finished now because we've performed the read.
+ */
+ if (tail_range->advice_issued)
+ {
+ Assert(pgsr->ios_in_progress > 0);
+ pgsr->ios_in_progress--;
+ }
+ }
+
+ /* Are there more buffers available in this range? */
+ if (pgsr->next_tail_buffer < tail_range->nblocks)
+ {
+ int buffer_index;
+ Buffer buffer;
+
+ buffer_index = pgsr->next_tail_buffer++;
+ buffer = tail_range->buffers[buffer_index];
+
+ Assert(BufferIsValid(buffer));
+
+ /* We are giving away ownership of this pinned buffer. */
+ Assert(pgsr->pinned_buffers > 0);
+ pgsr->pinned_buffers--;
+
+ if (pgsr->ramp_up_pin_stall > 0)
+ pgsr->ramp_up_pin_stall--;
+
+ if (per_buffer_data)
+ *per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+ return buffer;
+ }
+
+ /* Advance tail to next range, if there is one. */
+ if (++pgsr->tail == pgsr->size)
+ pgsr->tail = 0;
+ pgsr->next_tail_buffer = 0;
+
+ /*
+ * If tail crashed into head, and head is not empty, then it is time
+ * to start that range.
+ */
+ if (pgsr->tail == pgsr->head &&
+ pgsr->ranges[pgsr->head].nblocks > 0)
+ pg_streaming_read_start_head_range(pgsr);
+ }
+
+ Assert(pgsr->pinned_buffers == 0);
+
+ return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+ Buffer buffer;
+
+ /* Stop looking ahead. */
+ pgsr->finished = true;
+
+ /* Unpin anything that wasn't consumed. */
+ while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+ ReleaseBuffer(buffer);
+
+ Assert(pgsr->pinned_buffers == 0);
+ Assert(pgsr->ios_in_progress == 0);
+
+ /* Release memory. */
+ if (pgsr->per_buffer_data)
+ pfree(pgsr->per_buffer_data);
+
+ pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index bdf89bbc4dc..3b1b0ad99df 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
* and pin it so that no one can destroy it while this process
* is using it.
*
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ * two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
* ReleaseBuffer() -- unpin a buffer
*
* MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -472,10 +477,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
)
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy,
- bool *hit);
+ ReadBufferMode mode, BufferAccessStrategy strategy);
static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
ForkNumber fork,
BufferAccessStrategy strategy,
@@ -501,7 +505,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
static int SyncOneBuffer(int buf_id, bool skip_recently_used,
WritebackContext *wb_context);
static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
uint32 set_flag_bits, bool forget_owner);
static void AbortBufferIO(Buffer buffer);
@@ -782,7 +786,6 @@ Buffer
ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy)
{
- bool hit;
Buffer buf;
/*
@@ -795,15 +798,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions")));
- /*
- * Read the buffer, and update pgstat counters to reflect a cache hit or
- * miss.
- */
- pgstat_count_buffer_read(reln);
- buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
- forkNum, blockNum, mode, strategy, &hit);
- if (hit)
- pgstat_count_buffer_hit(reln);
+ buf = ReadBuffer_common(BMR_REL(reln),
+ forkNum, blockNum, mode, strategy);
+
return buf;
}
@@ -823,13 +820,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
BufferAccessStrategy strategy, bool permanent)
{
- bool hit;
-
SMgrRelation smgr = smgropen(rlocator, InvalidBackendId);
- return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
- RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
- mode, strategy, &hit);
+ return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+ RELPERSISTENCE_UNLOGGED),
+ forkNum, blockNum,
+ mode, strategy);
}
/*
@@ -995,35 +991,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
*/
if (buffer == InvalidBuffer)
{
- bool hit;
-
Assert(extended_by == 0);
- buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
- fork, extend_to - 1, mode, strategy,
- &hit);
+ buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
}
return buffer;
}
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK. The buffer must be already
+ * pinned. It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+ BufferDesc *bufHdr;
+ uint32 buf_state;
+
+ Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+ if (BufferIsLocal(buffer))
+ bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+ else
+ {
+ bufHdr = GetBufferDescriptor(buffer - 1);
+ if (mode == RBM_ZERO_AND_LOCK)
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ else
+ LockBufferForCleanup(buffer);
+ }
+
+ memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+ if (BufferIsLocal(buffer))
+ {
+ buf_state = pg_atomic_read_u32(&bufHdr->state);
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ buf_state = LockBufHdr(bufHdr);
+ buf_state |= BM_VALID;
+ UnlockBufHdr(bufHdr, buf_state);
+ }
+}
+
/*
* ReadBuffer_common -- common logic for all ReadBuffer variants
*
* *hit is set to true if the request was satisfied from shared buffer cache.
*/
static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy, bool *hit)
+ BufferAccessStrategy strategy)
{
- BufferDesc *bufHdr;
- Block bufBlock;
- bool found;
- IOContext io_context;
- IOObject io_object;
- bool isLocalBuf = SmgrIsTemp(smgr);
-
- *hit = false;
+ ReadBuffersOperation operation;
+ Buffer buffer;
+ int nblocks;
+ int flags;
/*
* Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1042,181 +1071,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
flags |= EB_LOCK_FIRST;
- return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
- forkNum, strategy, flags);
+ return ExtendBufferedRel(bmr, forkNum, strategy, flags);
}
- TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend);
+ nblocks = 1;
+ if (mode == RBM_ZERO_ON_ERROR)
+ flags = READ_BUFFERS_ZERO_ON_ERROR;
+ else
+ flags = 0;
+ if (StartReadBuffers(bmr,
+ &buffer,
+ forkNum,
+ blockNum,
+ &nblocks,
+ strategy,
+ flags,
+ &operation))
+ WaitReadBuffers(&operation);
+ Assert(nblocks == 1); /* single block can't be short */
+
+ if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+ ZeroBuffer(buffer, mode);
+
+ return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ BufferAccessStrategy strategy,
+ bool *foundPtr)
+{
+ BufferDesc *bufHdr;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ Assert(blockNum != P_NEW);
+ Assert(bmr.smgr);
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /*
- * We do not use a BufferAccessStrategy for I/O of temporary tables.
- * However, in some cases, the "strategy" may not be NULL, so we can't
- * rely on IOContextForStrategy() to set the right IOContext for us.
- * This may happen in cases like CREATE TEMPORARY TABLE AS...
- */
io_context = IOCONTEXT_NORMAL;
io_object = IOOBJECT_TEMP_RELATION;
- bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
- if (found)
- pgBufferUsage.local_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.local_blks_read++;
}
else
{
- /*
- * lookup the buffer. IO_IN_PROGRESS is set if the requested block is
- * not currently in memory.
- */
io_context = IOContextForStrategy(strategy);
io_object = IOOBJECT_RELATION;
- bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
- strategy, &found, io_context);
- if (found)
- pgBufferUsage.shared_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.shared_blks_read++;
}
- /* At this point we do NOT hold any locks. */
+ TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend);
- /* if it was already in the buffer pool, we're done */
- if (found)
+ ResourceOwnerEnlarge(CurrentResourceOwner);
+ if (isLocalBuf)
+ {
+ bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+ if (*foundPtr)
+ pgBufferUsage.local_blks_hit++;
+ }
+ else
+ {
+ bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+ strategy, foundPtr, io_context);
+ if (*foundPtr)
+ pgBufferUsage.shared_blks_hit++;
+ }
+ if (bmr.rel)
+ {
+ /*
+ * While pgBufferUsage's "read" counter isn't bumped unless we reach
+ * WaitReadBuffers() (so, not for hits, and not for buffers that are
+ * zeroed instead), the per-relation stats always count them.
+ */
+ pgstat_count_buffer_read(bmr.rel);
+ if (*foundPtr)
+ pgstat_count_buffer_hit(bmr.rel);
+ }
+ if (*foundPtr)
{
- /* Just need to update stats before we exit */
- *hit = true;
VacuumPageHit++;
pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
if (VacuumCostActive)
VacuumCostBalance += VacuumCostPageHit;
TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ }
- /*
- * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
- * on return.
- */
- if (!isLocalBuf)
- {
- if (mode == RBM_ZERO_AND_LOCK)
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
- LW_EXCLUSIVE);
- else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
- LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
- }
+ return BufferDescriptorGetBuffer(bufHdr);
+}
- return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks. On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() need
+ * not be called. If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed. Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers(). In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation)
+{
+ int actual_nblocks = *nblocks;
+
+ if (bmr.rel)
+ {
+ bmr.smgr = RelationGetSmgr(bmr.rel);
+ bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
}
- /*
- * if we have gotten to this point, we have allocated a buffer for the
- * page but its contents are not yet valid. IO_IN_PROGRESS is set for it,
- * if it's a shared buffer.
- */
- Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* spinlock not needed */
+ operation->bmr = bmr;
+ operation->forknum = forkNum;
+ operation->blocknum = blockNum;
+ operation->buffers = buffers;
+ operation->nblocks = actual_nblocks;
+ operation->strategy = strategy;
+ operation->flags = flags;
- bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+ operation->io_buffers_len = 0;
- /*
- * Read in the page, unless the caller intends to overwrite it and just
- * wants us to allocate a buffer.
- */
- if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
- MemSet((char *) bufBlock, 0, BLCKSZ);
- else
+ for (int i = 0; i < actual_nblocks; ++i)
{
- instr_time io_start = pgstat_prepare_io_time(track_io_timing);
+ bool found;
- smgrread(smgr, forkNum, blockNum, bufBlock);
+ buffers[i] = PrepareReadBuffer(bmr,
+ forkNum,
+ blockNum + i,
+ strategy,
+ &found);
- pgstat_count_io_op_time(io_object, io_context,
- IOOP_READ, io_start, 1);
+ if (found)
+ {
+ /*
+ * Terminate the read as soon as we get a hit. It could be a
+ * single buffer hit, or it could be a hit that follows a readable
+ * range. We don't want to create more than one readable range,
+ * so we stop here.
+ */
+ actual_nblocks = operation->nblocks = *nblocks = i + 1;
+ }
+ else
+ {
+ /* Extend the readable range to cover this block. */
+ operation->io_buffers_len++;
+ }
+ }
- /* check for garbage data */
- if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
- PIV_LOG_WARNING | PIV_REPORT_STAT))
+ if (operation->io_buffers_len > 0)
+ {
+ if (flags & READ_BUFFERS_ISSUE_ADVICE)
{
- if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s; zeroing out page",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
- MemSet((char *) bufBlock, 0, BLCKSZ);
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
+ /*
+ * In theory we should only do this if PrepareReadBuffer() had to
+ * allocate new buffers above. That way, if two calls to
+ * StartReadBuffers() were made for the same blocks before
+ * WaitReadBuffers(), only the first would issue the advice.
+ * That'd be a better simulation of true asynchronous I/O, which
+ * would only start the I/O once, but isn't done here for
+ * simplicity. Note also that the following call might actually
+ * issue two advice calls if we cross a segment boundary; in a
+ * true asynchronous version we might choose to process only one
+ * real I/O at a time in that case.
+ */
+ smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
}
+
+ /* Indicate that WaitReadBuffers() should be called. */
+ return true;
}
+ else
+ {
+ return false;
+ }
+}
- /*
- * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
- * content lock before marking the page as valid, to make sure that no
- * other backend sees the zeroed page before the caller has had a chance
- * to initialize it.
- *
- * Since no-one else can be looking at the page contents yet, there is no
- * difference between an exclusive lock and a cleanup-strength lock. (Note
- * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
- * they assert that the buffer is already valid.)
- */
- if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
- !isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+ if (BufferIsLocal(buffer))
{
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+ BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+ return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
}
+ else
+ return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ int nblocks;
+ BlockNumber blocknum;
+ ForkNumber forknum;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ /*
+ * Currently operations are only allowed to include a read of some range,
+ * with an optional extra buffer that is already pinned at the end. So
+ * nblocks can be at most one more than io_buffers_len.
+ */
+ Assert((operation->nblocks == operation->io_buffers_len) ||
+ (operation->nblocks == operation->io_buffers_len + 1));
+ /* Find the range of the physical read we need to perform. */
+ nblocks = operation->io_buffers_len;
+ if (nblocks == 0)
+ return; /* nothing to do */
+
+ buffers = &operation->buffers[0];
+ blocknum = operation->blocknum;
+ forknum = operation->forknum;
+ bmr = operation->bmr;
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /* Only need to adjust flags */
- uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
-
- buf_state |= BM_VALID;
- pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ io_context = IOCONTEXT_NORMAL;
+ io_object = IOOBJECT_TEMP_RELATION;
}
else
{
- /* Set BM_VALID, terminate IO, and wake up any waiters */
- TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ io_context = IOContextForStrategy(operation->strategy);
+ io_object = IOOBJECT_RELATION;
}
- VacuumPageMiss++;
- if (VacuumCostActive)
- VacuumCostBalance += VacuumCostPageMiss;
+ /*
+ * We count all these blocks as read by this backend. This is traditional
+ * behavior, but might turn out to be not true if we find that someone
+ * else has beaten us and completed the read of some of these blocks. In
+ * that case the system globally double-counts, but we traditionally don't
+ * count this as a "hit", and we don't have a separate counter for "miss,
+ * but another backend completed the read".
+ */
+ if (isLocalBuf)
+ pgBufferUsage.local_blks_read += nblocks;
+ else
+ pgBufferUsage.shared_blks_read += nblocks;
- TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ for (int i = 0; i < nblocks; ++i)
+ {
+ int io_buffers_len;
+ Buffer io_buffers[MAX_BUFFERS_PER_TRANSFER];
+ void *io_pages[MAX_BUFFERS_PER_TRANSFER];
+ instr_time io_start;
+ BlockNumber io_first_block;
- return BufferDescriptorGetBuffer(bufHdr);
+ /*
+ * Skip this block if someone else has already completed it. If an
+ * I/O is already in progress in another backend, this will wait for
+ * the outcome: either done, or something went wrong and we will
+ * retry.
+ */
+ if (!WaitReadBuffersCanStartIO(buffers[i], false))
+ {
+ /*
+ * Report this as a 'hit' for this backend, even though it must
+ * have started out as a miss in PrepareReadBuffer().
+ */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ continue;
+ }
+
+ /* We found a buffer that we need to read in. */
+ io_buffers[0] = buffers[i];
+ io_pages[0] = BufferGetBlock(buffers[i]);
+ io_first_block = blocknum + i;
+ io_buffers_len = 1;
+
+ /*
+ * How many neighboring-on-disk blocks can we scatter-read into
+ * other buffers at the same time? In this case we don't wait if we
+ * see an I/O already in progress. We already hold BM_IO_IN_PROGRESS
+ * for the head block, so we should get on with that I/O as soon as
+ * possible. We'll come back to this block again, above.
+ */
+ while ((i + 1) < nblocks &&
+ WaitReadBuffersCanStartIO(buffers[i + 1], true))
+ {
+ /* Must be consecutive block numbers. */
+ Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+ BufferGetBlockNumber(buffers[i]) + 1);
+
+ io_buffers[io_buffers_len] = buffers[++i];
+ io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+ }
+
+ io_start = pgstat_prepare_io_time(track_io_timing);
+ smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+ pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+ io_buffers_len);
+
+ /* Verify each block we read, and terminate the I/O. */
+ for (int j = 0; j < io_buffers_len; ++j)
+ {
+ BufferDesc *bufHdr;
+ Block bufBlock;
+
+ if (isLocalBuf)
+ {
+ bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+ bufBlock = LocalBufHdrGetBlock(bufHdr);
+ }
+ else
+ {
+ bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+ bufBlock = BufHdrGetBlock(bufHdr);
+ }
+
+ /* check for garbage data */
+ if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+ PIV_LOG_WARNING | PIV_REPORT_STAT))
+ {
+ if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s; zeroing out page",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ memset(bufBlock, 0, BLCKSZ);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ }
+
+ /* Terminate I/O and set BM_VALID. */
+ if (isLocalBuf)
+ {
+ uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ /* Set BM_VALID, terminate IO, and wake up any waiters */
+ TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ }
+
+ /* Report I/Os as completing individually. */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ false);
+ }
+
+ VacuumPageMiss += io_buffers_len;
+ if (VacuumCostActive)
+ VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+ }
}
/*
- * BufferAlloc -- subroutine for ReadBuffer. Handles lookup of a shared
- * buffer. If no buffer exists already, selects a replacement
- * victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers. Handles lookup of a shared
+ * buffer. If no buffer exists already, selects a replacement victim and
+ * evicts the old page, but does NOT read in new page.
*
* "strategy" can be a buffer replacement strategy object, or NULL for
* the default strategy. The selected buffer's usage_count is advanced when
@@ -1224,11 +1476,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
*
* The returned buffer is pinned and is already marked as holding the
* desired page. If it already did have the desired page, *foundPtr is
- * set true. Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true. Otherwise, *foundPtr is set false.
*
* io_context is passed as an output parameter to avoid calling
* IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1287,19 +1535,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(buf, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return buf;
@@ -1364,19 +1603,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(existing_buf_hdr, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return existing_buf_hdr;
@@ -1408,15 +1638,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
LWLockRelease(newPartitionLock);
/*
- * Buffer contents are currently invalid. Try to obtain the right to
- * start I/O. If StartBufferIO returns false, then someone else managed
- * to read it before we did, so there's nothing left for BufferAlloc() to
- * do.
+ * Buffer contents are currently invalid.
*/
- if (StartBufferIO(victim_buf_hdr, true))
- *foundPtr = false;
- else
- *foundPtr = true;
+ *foundPtr = false;
return victim_buf_hdr;
}
@@ -1770,7 +1994,7 @@ again:
* pessimistic, but outside of toy-sized shared_buffers it should allow
* sufficient pins.
*/
-static void
+void
LimitAdditionalPins(uint32 *additional_pins)
{
uint32 max_backends;
@@ -2035,7 +2259,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
buf_state &= ~BM_VALID;
UnlockBufHdr(existing_hdr, buf_state);
- } while (!StartBufferIO(existing_hdr, true));
+ } while (!StartBufferIO(existing_hdr, true, false));
}
else
{
@@ -2058,7 +2282,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
LWLockRelease(partition_lock);
/* XXX: could combine the locked operations in it with the above */
- StartBufferIO(victim_buf_hdr, true);
+ StartBufferIO(victim_buf_hdr, true, false);
}
}
@@ -2373,7 +2597,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
else
{
/*
- * If we previously pinned the buffer, it must surely be valid.
+ * If we previously pinned the buffer, it is likely to be valid, but
+ * it may not be if StartReadBuffers() was called and
+ * WaitReadBuffers() hasn't been called yet. We'll check by loading
+ * the flags without locking. This is racy, but it's OK to return
+ * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+ * it'll see that it's now valid.
*
* Note: We deliberately avoid a Valgrind client request here.
* Individual access methods can optionally superimpose buffer page
@@ -2382,7 +2611,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
* that the buffer page is legitimately non-accessible here. We
* cannot meddle with that.
*/
- result = true;
+ result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
}
ref->refcount++;
@@ -3450,7 +3679,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
* someone else flushed the buffer before we could, so we need not do
* anything.
*/
- if (!StartBufferIO(buf, false))
+ if (!StartBufferIO(buf, false, false))
return;
/* Setup error traceback support for ereport() */
@@ -5185,9 +5414,15 @@ WaitIO(BufferDesc *buf)
*
* Returns true if we successfully marked the buffer as I/O busy,
* false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend. In that case, false indicates either that the I/O was already
+ * finished, or is still in progress. This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
*/
static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
{
uint32 buf_state;
@@ -5200,6 +5435,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
if (!(buf_state & BM_IO_IN_PROGRESS))
break;
UnlockBufHdr(buf, buf_state);
+ if (nowait)
+ return false;
WaitIO(buf);
}
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 1f02fed250e..6956d4e5b49 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -109,10 +109,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
* LocalBufferAlloc -
* Find or create a local buffer for the given page of the given relation.
*
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local. Also, IO_IN_PROGRESS
- * does not get set. Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local. We support only default access
+ * strategy (hence, usage_count is always advanced).
*/
BufferDesc *
LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -288,7 +287,7 @@ GetLocalVictimBuffer(void)
}
/* see LimitAdditionalPins() */
-static void
+void
LimitAdditionalLocalPins(uint32 *additional_pins)
{
uint32 max_pins;
@@ -298,9 +297,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
/*
* In contrast to LimitAdditionalPins() other backends don't play a role
- * here. We can allow up to NLocBuffer pins in total.
+ * here. We can allow up to NLocBuffer pins in total, but it might not be
+ * initialized yet so read num_temp_buffers.
*/
- max_pins = (NLocBuffer - NLocalPinnedBuffers);
+ max_pins = (num_temp_buffers - NLocalPinnedBuffers);
if (*additional_pins >= max_pins)
*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+subdir('aio')
subdir('buffer')
subdir('file')
subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
#ifndef BUFMGR_H
#define BUFMGR_H
+#include "port/pg_iovec.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
#define BUFFER_LOCK_SHARE 1
#define BUFFER_LOCK_EXCLUSIVE 2
+/*
+ * Maximum number of buffers for multi-buffer I/O functions. This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
/*
* prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy,
bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers(). Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+ /* Parameters passed in to StartReadBuffers(). */
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ ForkNumber forknum;
+ BlockNumber blocknum;
+ int nblocks;
+ BufferAccessStrategy strategy;
+ int flags;
+
+ /* Range of buffers, if we need to perform a read. */
+ int io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forknum,
+ BlockNumber blocknum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
extern void ReleaseBuffer(Buffer buffer);
extern void UnlockReleaseBuffer(Buffer buffer);
extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
extern bool BgBufferSync(struct WritebackContext *wb_context);
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
/* in buf_init.c */
extern void InitBufferPool(void);
extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected. Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet. This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_private_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fc8b15d0cf2..cfb58cf4836 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
PgXmlErrorContext
PgXmlStrictness
Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
ReScanForeignScan_function
ReadBufPtrType
ReadBufferMode
+ReadBuffersOperation
ReadBytePtrType
ReadExtraTocPtrType
ReadFunc
--
2.37.2
v5-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchDownload
From 1bcc771d1b6baeb70b50a071a9274842051d6365 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v5 2/7] Add lazy_scan_skip unskippable state to LVRelState
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
add a struct to LVRelState containing variables needed to skip
ranges less than SKIP_PAGES_THRESHOLD.
lazy_scan_prune() and lazy_scan_new_or_empty() can now access the
buffer containing the relevant block of the visibility map through the
LVRelState.skip, so it no longer needs to be a separate function
parameter.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 154 ++++++++++++++++-----------
1 file changed, 90 insertions(+), 64 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e21c1124f5c..077164896fb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -210,6 +210,22 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
+ * pages greater than SKIP_PAGES_THRESHOLD.
+ */
+ struct
+ {
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Buffer containing next unskippable block's visibility info */
+ Buffer vmbuffer;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+ } skip;
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -220,19 +236,15 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
-
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- bool sharelock, Buffer vmbuffer);
+ bool sharelock);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ bool all_visible_according_to_vm,
bool *has_lpdead_items);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -809,12 +821,8 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -828,10 +836,9 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.vmbuffer = InvalidBuffer;
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, 0);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -840,26 +847,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
+ if (blkno == vacrel->skip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, blkno + 1);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacrel->skip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -902,10 +906,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacrel->skip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -930,7 +934,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
@@ -948,8 +952,7 @@ lazy_scan_heap(LVRelState *vacrel)
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* Check for new or empty pages before lazy_scan_[no]prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
- vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -991,7 +994,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1041,8 +1044,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacrel->skip.vmbuffer))
+ {
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1086,15 +1092,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in skip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block are set
+ * in skip->next_unskippable_block and next_unskippable_allvis.
+ * skip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1104,25 +1129,26 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
{
+ /* Use local variables for better optimized loop code */
BlockNumber rel_pages = vacrel->rel_pages,
next_unskippable_block = next_block;
+
bool skipsallvis = false;
- *next_unskippable_allvis = true;
+ vacrel->skip.next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &vacrel->skip.vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1143,7 +1169,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1168,6 +1194,8 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
next_unskippable_block++;
}
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
+
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
* pages. Since we're reading sequentially, the OS should be doing
@@ -1178,16 +1206,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacrel->skip.skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacrel->skip.skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
@@ -1220,7 +1246,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*/
static bool
lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
- Page page, bool sharelock, Buffer vmbuffer)
+ Page page, bool sharelock)
{
Size freespace;
@@ -1306,7 +1332,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
@@ -1342,10 +1368,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* any tuple that becomes dead after the call to heap_page_prune() can't need to
* be frozen, because it was visible to another session when vacuum started.
*
- * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * vacrel->skip.vmbuffer is the buffer containing the VM block with
+ * visibility information for the heap block, blkno.
+ * all_visible_according_to_vm is the saved visibility status of the heap block
+ * looked up earlier by the caller. We won't rely entirely on this status, as
+ * it may be out of date.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1355,7 +1382,6 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
- Buffer vmbuffer,
bool all_visible_according_to_vm,
bool *has_lpdead_items)
{
@@ -1789,7 +1815,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vacrel->skip.vmbuffer, visibility_cutoff_xid,
flags);
}
@@ -1800,11 +1826,11 @@ lazy_scan_prune(LVRelState *vacrel,
* buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno, &vacrel->skip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1828,7 +1854,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1838,7 +1864,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vacrel->skip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1860,7 +1886,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
--
2.37.2
v5-0006-Vacuum-first-pass-uses-Streaming-Read-interface.patchtext/x-patch; charset=US-ASCII; name=v5-0006-Vacuum-first-pass-uses-Streaming-Read-interface.patchDownload
From f7e49269cf4e5ef0d2ef3ba0be42351bc1030ddb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 11:29:02 -0500
Subject: [PATCH v5 6/7] Vacuum first pass uses Streaming Read interface
Now vacuum's first pass, which HOT prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by implementing a
streaming read callback which invokes heap_vac_scan_get_next_block().
---
src/backend/access/heap/vacuumlazy.c | 79 +++++++++++++++++++++-------
1 file changed, 59 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ea270941379..96150834614 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -59,6 +59,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/streaming_read.h"
#include "tcop/tcopprot.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -174,7 +175,12 @@ typedef struct LVRelState
char *relnamespace;
char *relname;
char *indname; /* Current index name */
- BlockNumber blkno; /* used only for heap operations */
+
+ /*
+ * The current block being processed by vacuum. Used only for heap
+ * operations. Primarily for error reporting and logging.
+ */
+ BlockNumber blkno;
OffsetNumber offnum; /* used only for heap operations */
VacErrPhase phase;
bool verbose; /* VACUUM VERBOSE? */
@@ -195,6 +201,12 @@ typedef struct LVRelState
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ /*
+ * The most recent block submitted in the streaming read callback by the
+ * first vacuum pass.
+ */
+ BlockNumber blkno_prefetch;
+
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
@@ -238,7 +250,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+static void heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
BlockNumber *blkno,
bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
@@ -422,6 +434,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->nonempty_pages = 0;
/* dead_items_alloc allocates vacrel->dead_items later on */
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ vacrel->blkno_prefetch = InvalidBlockNumber;
+
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
vacrel->new_live_tuples = 0;
@@ -782,6 +797,22 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
}
}
+static BlockNumber
+vacuum_scan_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private, void *per_buffer_data)
+{
+ LVRelState *vacrel = pgsr_private;
+ bool *all_visible_according_to_vm = per_buffer_data;
+
+ vacrel->blkno_prefetch++;
+
+ heap_vac_scan_get_next_block(vacrel,
+ vacrel->blkno_prefetch, &vacrel->blkno_prefetch,
+ all_visible_according_to_vm);
+
+ return vacrel->blkno_prefetch;
+}
+
/*
* lazy_scan_heap() -- workhorse function for VACUUM
*
@@ -821,12 +852,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
BlockNumber rel_pages = vacrel->rel_pages,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm;
- /* relies on InvalidBlockNumber overflowing to 0 */
- BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -834,6 +864,11 @@ lazy_scan_heap(LVRelState *vacrel)
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64 initprog_val[3];
+ PgStreamingRead *pgsr;
+
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(bool), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, vacuum_scan_pgsr_next);
/* Report that we're scanning the heap, advertising total # of blocks */
initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
@@ -844,13 +879,19 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
- &blkno, &all_visible_according_to_vm))
+ while (BufferIsValid(buf =
+ pg_streaming_read_buffer_get_next(pgsr, (void **) &all_visible_according_to_vm)))
{
- Buffer buf;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ BlockNumber blkno;
+
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
+
+ CheckBufferIsPinnedOnce(buf);
+
+ page = BufferGetPage(buf);
vacrel->scanned_pages++;
@@ -918,9 +959,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
@@ -976,7 +1014,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- all_visible_according_to_vm,
+ *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1033,7 +1071,7 @@ lazy_scan_heap(LVRelState *vacrel)
}
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, vacrel->rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1048,6 +1086,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ pg_streaming_read_free(pgsr);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1059,11 +1099,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (vacrel->rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, vacrel->rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, vacrel->rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1096,7 +1136,7 @@ lazy_scan_heap(LVRelState *vacrel)
*
* The block number and visibility status of the next block to process are set
* in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
- * returns false if there are no further blocks to process.
+ * sets blkno to InvalidBlockNumber if there are no further blocks to process.
*
* vacrel is an in/out parameter here; vacuum options and information about the
* relation are read and vacrel->skippedallvis is set to ensure we don't
@@ -1116,7 +1156,7 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static bool
+static void
heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
BlockNumber *blkno, bool *all_visible_according_to_vm)
{
@@ -1125,7 +1165,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
if (next_block >= vacrel->rel_pages)
{
*blkno = InvalidBlockNumber;
- return false;
+ return;
}
if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
@@ -1220,7 +1260,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*all_visible_according_to_vm = true;
*blkno = next_block;
- return true;
}
/*
--
2.37.2
v5-0007-Vacuum-second-pass-uses-Streaming-Read-interface.patchtext/x-patch; charset=US-ASCII; name=v5-0007-Vacuum-second-pass-uses-Streaming-Read-interface.patchDownload
From 4aa8c614440ac384459b07f803194bd08fbca554 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 27 Feb 2024 14:35:36 -0500
Subject: [PATCH v5 7/7] Vacuum second pass uses Streaming Read interface
Now vacuum's second pass, which removes dead items referring to dead
tuples catalogued in the first pass, uses the streaming read API by
implementing a streaming read callback which returns the next block
containing previously catalogued dead items. A new struct,
VacReapBlkState, is introduced to provide the caller with the starting
and ending indexes of dead items to vacuum.
---
src/backend/access/heap/vacuumlazy.c | 106 +++++++++++++++++++++------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 83 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 96150834614..99514fa960c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -207,6 +207,12 @@ typedef struct LVRelState
*/
BlockNumber blkno_prefetch;
+ /*
+ * The index of the next TID in dead_items to reap during the second
+ * vacuum pass.
+ */
+ int idx_prefetch;
+
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
@@ -248,6 +254,21 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+/*
+ * State set up in streaming read callback during vacuum's second pass which
+ * removes dead items referring to dead tuples cataloged in the first pass
+ */
+typedef struct VacReapBlkState
+{
+ /*
+ * The indexes of the TIDs of the first and last dead tuples in a single
+ * block in the currently vacuumed relation. The callback will set these
+ * up prior to adding this block to the stream.
+ */
+ int start_idx;
+ int end_idx;
+} VacReapBlkState;
+
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
static void heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
@@ -266,8 +287,9 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
-static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
- Buffer buffer, int index, Buffer vmbuffer);
+static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer,
+ VacReapBlkState *rbstate);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
@@ -2407,6 +2429,37 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_data)
+{
+ BlockNumber blkno;
+ LVRelState *vacrel = pgsr_private;
+ VacReapBlkState *rbstate = per_buffer_data;
+
+ VacDeadItems *dead_items = vacrel->dead_items;
+
+ if (vacrel->idx_prefetch == dead_items->num_items)
+ return InvalidBlockNumber;
+
+ blkno = ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+ rbstate->start_idx = vacrel->idx_prefetch;
+
+ for (; vacrel->idx_prefetch < dead_items->num_items; vacrel->idx_prefetch++)
+ {
+ BlockNumber curblkno =
+ ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+
+ if (blkno != curblkno)
+ break; /* past end of tuples for this block */
+ }
+
+ rbstate->end_idx = vacrel->idx_prefetch;
+
+ return blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2428,7 +2481,9 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
- int index = 0;
+ Buffer buf;
+ PgStreamingRead *pgsr;
+ VacReapBlkState *rbstate;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2446,17 +2501,21 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
VACUUM_ERRCB_PHASE_VACUUM_HEAP,
InvalidBlockNumber, InvalidOffsetNumber);
- while (index < vacrel->dead_items->num_items)
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(VacReapBlkState), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, vacuum_reap_lp_pgsr_next);
+
+ while (BufferIsValid(buf =
+ pg_streaming_read_buffer_get_next(pgsr,
+ (void **) &rbstate)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
vacuum_delay_point();
- blkno = ItemPointerGetBlockNumber(&vacrel->dead_items->items[index]);
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
/*
* Pin the visibility map page in case we need to mark the page
@@ -2466,10 +2525,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
- index = lazy_vacuum_heap_page(vacrel, blkno, buf, index, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, vmbuffer, rbstate);
/* Now that we've vacuumed the page, record its available space */
page = BufferGetPage(buf);
@@ -2490,9 +2547,11 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
*/
Assert(index > 0);
Assert(vacrel->num_index_scans > 1 ||
- (index == vacrel->lpdead_items &&
+ (rbstate->end_idx == vacrel->lpdead_items &&
vacuumed_pages == vacrel->lpdead_item_pages));
+ pg_streaming_read_free(pgsr);
+
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) index, vacuumed_pages)));
@@ -2509,13 +2568,12 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* cleanup lock is also acceptable). vmbuffer must be valid and already have
* a pin on blkno's visibility map page.
*
- * index is an offset into the vacrel->dead_items array for the first listed
- * LP_DEAD item on the page. The return value is the first index immediately
- * after all LP_DEAD items for the same page in the array.
+ * Given a block and dead items recorded during the first pass, set those items
+ * unused and truncate the line pointer array. Update the VM as appropriate.
*/
-static int
-lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
- int index, Buffer vmbuffer)
+static void
+lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer, VacReapBlkState *rbstate)
{
VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
@@ -2536,16 +2594,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
START_CRIT_SECTION();
- for (; index < dead_items->num_items; index++)
+ for (int i = rbstate->start_idx; i < rbstate->end_idx; i++)
{
- BlockNumber tblk;
OffsetNumber toff;
+ ItemPointer dead_item;
ItemId itemid;
- tblk = ItemPointerGetBlockNumber(&dead_items->items[index]);
- if (tblk != blkno)
- break; /* past end of tuples for this block */
- toff = ItemPointerGetOffsetNumber(&dead_items->items[index]);
+ dead_item = &dead_items->items[i];
+
+ Assert(ItemPointerGetBlockNumber(dead_item) == blkno);
+
+ toff = ItemPointerGetOffsetNumber(dead_item);
itemid = PageGetItemId(page, toff);
Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
@@ -2615,7 +2674,6 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
- return index;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfb58cf4836..af055d5c037 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2972,6 +2972,7 @@ VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
+VacReapBlkState
ValidIOData
ValidateIndexState
ValuesScan
--
2.37.2
On 27/02/2024 21:47, Melanie Plageman wrote:
The attached v5 has some simplifications when compared to v4 but takes
largely the same approach.
0001-0004 are refactoring.
I'm looking at just these 0001-0004 patches for now. I like those
changes a lot for the sake of readability even without any of the later
patches.
I made some further changes. I kept them as separate commits for easier
review; see the commit messages for details. Any thoughts on those changes?
I feel the heap_vac_scan_get_next_block() function could use some love. Maybe
Maybe just some rewording of the comments, or maybe some other
refactoring; not sure. But I'm pretty happy with the function signature
and how it's called.
BTW, do we have tests that would fail if we botched up
heap_vac_scan_get_next_block() so that it would skip pages incorrectly,
for example? Not asking you to write them for this patch, but I'm just
wondering.
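For what it's worth, here is a rough sketch of the kind of pg_regress test that
could catch botched skipping in heap_vac_scan_get_next_block(). This is only an
illustration, not part of the patch set: it assumes the contrib/pg_visibility
extension, and the table name, row counts, and the chosen ctid are made up; the
expected output would still need to be worked out.

-- Build a table large enough that the all-frozen ranges exceed
-- SKIP_PAGES_THRESHOLD (32 pages), with autovacuum kept out of the way.
CREATE EXTENSION IF NOT EXISTS pg_visibility;
CREATE TABLE vacskip (i int, filler text) WITH (autovacuum_enabled = off);
INSERT INTO vacskip SELECT g, repeat('x', 500) FROM generate_series(1, 10000) g;

-- Baseline: every page becomes all-visible and all-frozen.
VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) vacskip;

-- Dirty a single page in the middle, so the next VACUUM sees one
-- unskippable block surrounded by large skippable ranges.
DELETE FROM vacskip WHERE ctid = '(50,1)';
VACUUM vacskip;

-- Inspect the result: how many pages are missing their all-visible bit,
-- and whether any supposedly all-frozen page contains an unfrozen tuple.
SELECT count(*) FROM pg_visibility_map('vacskip') WHERE NOT all_visible;
SELECT * FROM pg_check_frozen('vacskip');

Whether those final queries are stable enough for the regression suite is an
open question; the point is just that comparing the VM state before and after a
plain VACUUM gives something concrete to check the skip logic against.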
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v6-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchtext/x-patch; charset=UTF-8; name=v6-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchDownload
From 6e0258c3e31e85526475f46b2e14cbbcbb861909 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v6 1/9] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b320c3f89a..1dc6cc8e4db 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1103,8 +1103,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1161,7 +1160,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1174,7 +1172,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.39.2
v6-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchtext/x-patch; charset=UTF-8; name=v6-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patchDownload
From 258bce44e3275bc628bf984892797eecaebf0404 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v6 2/9] Add lazy_scan_skip unskippable state to LVRelState
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
add a struct to LVRelState containing variables needed to skip
ranges less than SKIP_PAGES_THRESHOLD.
lazy_scan_prune() and lazy_scan_new_or_empty() can now access the
buffer containing the relevant block of the visibility map through the
LVRelState.skip, so it no longer needs to be a separate function
parameter.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 154 ++++++++++++++++-----------
1 file changed, 90 insertions(+), 64 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1dc6cc8e4db..0ddb986bc03 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,22 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
+ * pages greater than SKIP_PAGES_THRESHOLD.
+ */
+ struct
+ {
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Buffer containing next unskippable block's visibility info */
+ Buffer vmbuffer;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+ } skip;
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -214,19 +230,15 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
-
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- bool sharelock, Buffer vmbuffer);
+ bool sharelock);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ bool all_visible_according_to_vm,
bool *has_lpdead_items);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -803,12 +815,8 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -822,10 +830,9 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.vmbuffer = InvalidBuffer;
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, 0);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -834,26 +841,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
+ if (blkno == vacrel->skip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, blkno + 1);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacrel->skip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -896,10 +900,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacrel->skip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -924,7 +928,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
@@ -942,8 +946,7 @@ lazy_scan_heap(LVRelState *vacrel)
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* Check for new or empty pages before lazy_scan_[no]prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
- vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -985,7 +988,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1035,8 +1038,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacrel->skip.vmbuffer))
+ {
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1080,15 +1086,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in skip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block are set
+ * in skip->next_unskippable_block and next_unskippable_allvis.
+ * skip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1098,25 +1123,26 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
{
+ /* Use local variables for better optimized loop code */
BlockNumber rel_pages = vacrel->rel_pages,
next_unskippable_block = next_block;
+
bool skipsallvis = false;
- *next_unskippable_allvis = true;
+ vacrel->skip.next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &vacrel->skip.vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1137,7 +1163,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1162,6 +1188,8 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
next_unskippable_block++;
}
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
+
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
* pages. Since we're reading sequentially, the OS should be doing
@@ -1172,16 +1200,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacrel->skip.skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacrel->skip.skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
@@ -1214,7 +1240,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*/
static bool
lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
- Page page, bool sharelock, Buffer vmbuffer)
+ Page page, bool sharelock)
{
Size freespace;
@@ -1300,7 +1326,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
@@ -1336,10 +1362,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* any tuple that becomes dead after the call to heap_page_prune() can't need to
* be frozen, because it was visible to another session when vacuum started.
*
- * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * vacrel->skipstate.vmbuffer is the buffer containing the VM block with
+ * visibility information for the heap block, blkno.
+ * all_visible_according_to_vm is the saved visibility status of the heap block
+ * looked up earlier by the caller. We won't rely entirely on this status, as
+ * it may be out of date.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1349,7 +1376,6 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
- Buffer vmbuffer,
bool all_visible_according_to_vm,
bool *has_lpdead_items)
{
@@ -1783,7 +1809,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vacrel->skip.vmbuffer, visibility_cutoff_xid,
flags);
}
@@ -1794,11 +1820,11 @@ lazy_scan_prune(LVRelState *vacrel,
* buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno, &vacrel->skip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1822,7 +1848,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1832,7 +1858,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vacrel->skip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1854,7 +1880,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
--
2.39.2
v6-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchtext/x-patch; charset=UTF-8; name=v6-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchDownload
From 6b2b0313ecf5a8f35a3fbab9e3ba817a5cff2c73 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v6 3/9] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and
eventually AIO), refactor vacuum's logic for skipping blocks such that
it is entirely confined to lazy_scan_skip(). This turns lazy_scan_skip()
and the skip state in LVRelState it uses into an iterator which yields
blocks to lazy_scan_heap(). Such a structure is conducive to an async
interface. While we are at it, rename lazy_scan_skip() to
heap_vac_scan_get_next_block(), which now more accurately describes it.
By always calling heap_vac_scan_get_next_block() -- instead of only when
we have reached the next unskippable block -- we no longer need the
skipping_current_range variable. lazy_scan_heap() no longer needs to
manage the skipped range by checking whether it has reached the end
before calling heap_vac_scan_get_next_block() again. And
heap_vac_scan_get_next_block() can derive the visibility status of a
block from whether or not we are in a skippable range -- that is,
whether or not next_block is equal to the next unskippable block.
---
src/backend/access/heap/vacuumlazy.c | 243 ++++++++++++++-------------
1 file changed, 126 insertions(+), 117 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0ddb986bc03..99d160335e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -206,8 +206,8 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
- * pages greater than SKIP_PAGES_THRESHOLD.
+ * Parameters maintained by heap_vac_scan_get_next_block() to manage
+ * skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
*/
struct
{
@@ -232,7 +232,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
+static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock);
@@ -814,8 +816,11 @@ static void
lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -830,40 +835,17 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, 0);
- for (blkno = 0; blkno < rel_pages; blkno++)
+
+ while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
+ &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == vacrel->skip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, blkno + 1);
-
- Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacrel->skip.skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1083,20 +1065,14 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in skip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * heap_vac_scan_get_next_block() -- get next block for vacuum to process
*
- * skip->vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needed a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * heap_vac_scan_get_next_block() sets blkno to next block that actually needs
+ * to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1106,14 +1082,25 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in skip->next_unskippable_block and next_unskippable_allvis.
- * skip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacrel->skip->next_unskippable_block and next_unskippable_allvis.
+ *
+ * The block number and visibility status of the next block to process are set
+ * in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
+ * returns false if there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needed a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1123,91 +1110,113 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
-lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
+static bool
+heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno, bool *all_visible_according_to_vm)
{
- /* Use local variables for better optimized loop code */
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block;
-
bool skipsallvis = false;
- vacrel->skip.next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ if (next_block >= vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- &vacrel->skip.vmbuffer);
+ *blkno = InvalidBlockNumber;
+ return false;
+ }
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->skip.next_unskippable_block)
+ {
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->skip.next_unskippable_block;
+
+ while (++next_unskippable_block < rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &vacrel->skip.vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (next_unskippable_block == rel_pages - 1)
- break;
+ vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ if (!vacrel->skip.next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacrel->skip.next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- next_unskippable_block++;
- }
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
- vacrel->skip.next_unskippable_block = next_unskippable_block;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->skip.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->skip.next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacrel->skip.skipping_current_range = false;
+ if (next_block == vacrel->skip.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
else
- {
- vacrel->skip.skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ *blkno = next_block;
+ return true;
}
/*
--
2.39.2
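A condensed sketch of the loop shape the patch above gives
lazy_scan_heap() (paraphrased from the diff; not the exact committed
code). All skipping decisions now live behind the iterator, and the
caller just consumes the blocks it yields:

    BlockNumber blkno = InvalidBlockNumber;     /* InvalidBlockNumber + 1 wraps to 0 */
    bool        all_visible_according_to_vm;

    while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
                                        &blkno, &all_visible_according_to_vm))
    {
        /* read, prune, and vacuum block blkno; vmbuffer handling elided */
    }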
v6-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchtext/x-patch; charset=UTF-8; name=v6-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchDownload
From 026af3d4b19feabea8ed78795c64b3278fa93d7f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v6 4/9] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 99d160335e1..65d257aab83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1184,8 +1184,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
vacrel->skip.next_unskippable_block = next_unskippable_block;
--
2.39.2
v6-0005-Remove-unused-skipping_current_range-field.patchtext/x-patch; charset=UTF-8; name=v6-0005-Remove-unused-skipping_current_range-field.patchDownload
From b4047b941182af0643838fde056c298d5cc3ae32 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:13:42 +0200
Subject: [PATCH v6 5/9] Remove unused 'skipping_current_range' field
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65d257aab83..51391870bf3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,8 +217,6 @@ typedef struct LVRelState
Buffer vmbuffer;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
- /* Whether or not skippable blocks should be skipped */
- bool skipping_current_range;
} skip;
} LVRelState;
--
2.39.2
v6-0006-Move-vmbuffer-back-to-a-local-varible-in-lazy_sca.patchtext/x-patch; charset=UTF-8; name=v6-0006-Move-vmbuffer-back-to-a-local-varible-in-lazy_sca.patchDownload
From 27e431e8dc69bbf09d831cb1cf2903d16f177d74 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:58:57 +0200
Subject: [PATCH v6 6/9] Move vmbuffer back to a local variable in
lazy_scan_heap()
It felt confusing that we passed around the current block, 'blkno', as
an argument to lazy_scan_new_or_empty() and lazy_scan_prune(), but
'vmbuffer' was accessed directly in the 'scan_state'.
It was also a bit vague, when exactly 'vmbuffer' was valid. Calling
heap_vac_scan_get_next_block() set it, sometimes, to a buffer that
might or might not contain the VM bit for 'blkno'. But other
functions, like lazy_scan_prune(), assumed it to contain the correct
buffer. That was fixed up by visibilitymap_pin(). But clearly it was not
"owned" by heap_vac_scan_get_next_block(), like the other 'scan_state'
fields.
I moved it back to a local variable, like it was. Maybe there would be
even better ways to handle it, but at least this is not worse than
what we have in master currently.
---
src/backend/access/heap/vacuumlazy.c | 72 +++++++++++++---------------
1 file changed, 33 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 51391870bf3..3f1661cea61 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -213,8 +213,6 @@ typedef struct LVRelState
{
/* Next unskippable block */
BlockNumber next_unskippable_block;
- /* Buffer containing next unskippable block's visibility info */
- Buffer vmbuffer;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
} skip;
@@ -232,13 +230,14 @@ typedef struct LVSavedErrInfo
static void lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
BlockNumber *blkno,
- bool *all_visible_according_to_vm);
+ bool *all_visible_according_to_vm,
+ Buffer *vmbuffer);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- bool sharelock);
+ bool sharelock, Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- bool all_visible_according_to_vm,
+ Buffer vmbuffer, bool all_visible_according_to_vm,
bool *has_lpdead_items);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -815,11 +814,10 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- /* relies on InvalidBlockNumber overflowing to 0 */
- BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
+ BlockNumber blkno;
+ bool all_visible_according_to_vm;
+ Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -834,10 +832,9 @@ lazy_scan_heap(LVRelState *vacrel)
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
vacrel->skip.next_unskippable_block = InvalidBlockNumber;
- vacrel->skip.vmbuffer = InvalidBuffer;
while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
- &blkno, &all_visible_according_to_vm))
+ &blkno, &all_visible_according_to_vm, &vmbuffer))
{
Buffer buf;
Page page;
@@ -880,10 +877,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vacrel->skip.vmbuffer))
+ if (BufferIsValid(vmbuffer))
{
- ReleaseBuffer(vacrel->skip.vmbuffer);
- vacrel->skip.vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -908,7 +905,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
@@ -926,7 +923,8 @@ lazy_scan_heap(LVRelState *vacrel)
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* Check for new or empty pages before lazy_scan_[no]prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
+ vmbuffer))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -968,7 +966,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- all_visible_according_to_vm,
+ vmbuffer, all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1018,11 +1016,8 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vacrel->skip.vmbuffer))
- {
- ReleaseBuffer(vacrel->skip.vmbuffer);
- vacrel->skip.vmbuffer = InvalidBuffer;
- }
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1094,7 +1089,7 @@ lazy_scan_heap(LVRelState *vacrel)
* relation are read and vacrel->skippedallvis is set to ensure we don't
* advance relfrozenxid when we have skipped vacuuming all visible blocks.
*
- * skip->vmbuffer will contain the block from the VM containing visibility
+ * vmbuffer will contain the block from the VM containing visibility
* information for the next unskippable heap block. We may end up needed a
* different block from the VM (if we decide not to skip a skippable block).
* This is okay; visibilitymap_pin() will take care of this while processing
@@ -1110,7 +1105,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
static bool
heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
- BlockNumber *blkno, bool *all_visible_according_to_vm)
+ BlockNumber *blkno, bool *all_visible_according_to_vm, Buffer *vmbuffer)
{
bool skipsallvis = false;
@@ -1131,7 +1126,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- &vacrel->skip.vmbuffer);
+ vmbuffer);
vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
@@ -1245,7 +1240,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*/
static bool
lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
- Page page, bool sharelock)
+ Page page, bool sharelock, Buffer vmbuffer)
{
Size freespace;
@@ -1331,7 +1326,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vacrel->skip.vmbuffer, InvalidTransactionId,
+ vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
@@ -1367,11 +1362,10 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* any tuple that becomes dead after the call to heap_page_prune() can't need to
* be frozen, because it was visible to another session when vacuum started.
*
- * vacrel->skipstate.vmbuffer is the buffer containing the VM block with
- * visibility information for the heap block, blkno.
- * all_visible_according_to_vm is the saved visibility status of the heap block
- * looked up earlier by the caller. We won't rely entirely on this status, as
- * it may be out of date.
+ * vmbuffer is the buffer containing the VM block with visibility information
+ * for the heap block, blkno. all_visible_according_to_vm is the saved
+ * visibility status of the heap block looked up earlier by the caller. We
+ * won't rely entirely on this status, as it may be out of date.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1381,6 +1375,7 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
+ Buffer vmbuffer,
bool all_visible_according_to_vm,
bool *has_lpdead_items)
{
@@ -1814,8 +1809,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vacrel->skip.vmbuffer, visibility_cutoff_xid,
- flags);
+ vmbuffer, visibility_cutoff_xid, flags);
}
/*
@@ -1825,11 +1819,11 @@ lazy_scan_prune(LVRelState *vacrel,
* buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vacrel->skip.vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1853,7 +1847,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1863,7 +1857,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vacrel->skip.vmbuffer))
+ all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1885,7 +1879,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vacrel->skip.vmbuffer, InvalidTransactionId,
+ vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
--
2.39.2
v6-0007-Rename-skip_state.patchtext/x-patch; charset=UTF-8; name=v6-0007-Rename-skip_state.patchDownload
From 519e26a01b6e6974f9e0edb94b00756af053f7ee Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:27:57 +0200
Subject: [PATCH v6 7/9] Rename skip_state
I don't want to emphasize the "skipping" part. Rather, it's the state
owned by the heap_vac_scan_get_next_block() function.
---
src/backend/access/heap/vacuumlazy.c | 32 ++++++++++++++--------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f1661cea61..17e06065f7e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -206,8 +206,8 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by heap_vac_scan_get_next_block() to manage
- * skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
+ * State maintained by heap_vac_scan_get_next_block() to manage skipping
+ * ranges of pages greater than SKIP_PAGES_THRESHOLD.
*/
struct
{
@@ -215,7 +215,7 @@ typedef struct LVRelState
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
- } skip;
+ } get_next_block_state;
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -831,7 +831,7 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- vacrel->skip.next_unskippable_block = InvalidBlockNumber;
+ vacrel->get_next_block_state.next_unskippable_block = InvalidBlockNumber;
while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
&blkno, &all_visible_according_to_vm, &vmbuffer))
@@ -1079,7 +1079,7 @@ lazy_scan_heap(LVRelState *vacrel)
* process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in vacrel->skip->next_unskippable_block and next_unskippable_allvis.
+ * in vacrel->scan_sate->next_unskippable_block and next_unskippable_allvis.
*
* The block number and visibility status of the next block to process are set
* in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
@@ -1115,12 +1115,12 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
return false;
}
- if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
- next_block > vacrel->skip.next_unskippable_block)
+ if (vacrel->get_next_block_state.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->get_next_block_state.next_unskippable_block)
{
/* Use local variables for better optimized loop code */
BlockNumber rel_pages = vacrel->rel_pages;
- BlockNumber next_unskippable_block = vacrel->skip.next_unskippable_block;
+ BlockNumber next_unskippable_block = vacrel->get_next_block_state.next_unskippable_block;
while (++next_unskippable_block < rel_pages)
{
@@ -1128,9 +1128,9 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
next_unskippable_block,
vmbuffer);
- vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+ vacrel->get_next_block_state.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- if (!vacrel->skip.next_unskippable_allvis)
+ if (!vacrel->get_next_block_state.next_unskippable_allvis)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1155,7 +1155,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->skip.next_unskippable_allvis = false;
+ vacrel->get_next_block_state.next_unskippable_allvis = false;
break;
}
@@ -1179,7 +1179,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
}
}
- vacrel->skip.next_unskippable_block = next_unskippable_block;
+ vacrel->get_next_block_state.next_unskippable_block = next_unskippable_block;
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
@@ -1193,16 +1193,16 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
* pages then skipping makes updating relfrozenxid unsafe, which is a
* real downside.
*/
- if (vacrel->skip.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ if (vacrel->get_next_block_state.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
{
- next_block = vacrel->skip.next_unskippable_block;
+ next_block = vacrel->get_next_block_state.next_unskippable_block;
if (skipsallvis)
vacrel->skippedallvis = true;
}
}
- if (next_block == vacrel->skip.next_unskippable_block)
- *all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
+ if (next_block == vacrel->get_next_block_state.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->get_next_block_state.next_unskippable_allvis;
else
*all_visible_according_to_vm = true;
--
2.39.2
v6-0008-Track-current_block-in-the-skip-state.patchtext/x-patch; charset=UTF-8; name=v6-0008-Track-current_block-in-the-skip-state.patchDownload
From 6dfae936a29e2d3479273f8ab47778a596258b16 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 21:03:19 +0200
Subject: [PATCH v6 8/9] Track 'current_block' in the skip state
The caller was expected to always pass last blk + 1. It's not clear if
the next_unskippable block accounting would work correctly if you
passed something else. So rather than expecting the caller to do that,
have heap_vac_scan_get_next_block() keep track of the last returned
block itself, in the 'skip' state.
This is largely redundant with the LVRelState->blkno field. But that
one is currently only used for error reporting, so it feels best to
give heap_vac_scan_get_next_block() its own field that it owns.
---
src/backend/access/heap/vacuumlazy.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 17e06065f7e..535d70b71c3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -211,6 +211,8 @@ typedef struct LVRelState
*/
struct
{
+ BlockNumber current_block;
+
/* Next unskippable block */
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
@@ -228,7 +230,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+static bool heap_vac_scan_get_next_block(LVRelState *vacrel,
BlockNumber *blkno,
bool *all_visible_according_to_vm,
Buffer *vmbuffer);
@@ -831,10 +833,11 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ /* initialize for first heap_vac_scan_get_next_block() call */
+ vacrel->get_next_block_state.current_block = InvalidBlockNumber;
vacrel->get_next_block_state.next_unskippable_block = InvalidBlockNumber;
- while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
- &blkno, &all_visible_according_to_vm, &vmbuffer))
+ while (heap_vac_scan_get_next_block(vacrel, &blkno, &all_visible_according_to_vm, &vmbuffer))
{
Buffer buf;
Page page;
@@ -1061,11 +1064,9 @@ lazy_scan_heap(LVRelState *vacrel)
* heap_vac_scan_get_next_block() -- get next block for vacuum to process
*
* lazy_scan_heap() calls here every time it needs to get the next block to
- * prune and vacuum, using the visibility map, vacuum options, and various
- * thresholds to skip blocks which do not need to be processed. Caller passes
- * next_block, the next block in line. This block may end up being skipped.
- * heap_vac_scan_get_next_block() sets blkno to next block that actually needs
- * to be processed.
+ * prune and vacuum. We use the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed, and set blkno
+ * to next block that actually needs to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1104,14 +1105,16 @@ lazy_scan_heap(LVRelState *vacrel)
* choice to skip such a range is actually made, making everything safe.)
*/
static bool
-heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
- BlockNumber *blkno, bool *all_visible_according_to_vm, Buffer *vmbuffer)
+heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm, Buffer *vmbuffer)
{
+ /* Relies on InvalidBlockNumber + 1 == 0 */
+ BlockNumber next_block = vacrel->get_next_block_state.current_block + 1;
bool skipsallvis = false;
if (next_block >= vacrel->rel_pages)
{
- *blkno = InvalidBlockNumber;
+ vacrel->get_next_block_state.current_block = *blkno = InvalidBlockNumber;
return false;
}
@@ -1206,7 +1209,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
else
*all_visible_according_to_vm = true;
- *blkno = next_block;
+ vacrel->get_next_block_state.current_block = *blkno = next_block;
return true;
}
--
2.39.2
v6-0009-Comment-whitespace-cleanup.patchtext/x-patch; charset=UTF-8; name=v6-0009-Comment-whitespace-cleanup.patchDownload
From 619556cad4aad68d1711c12b962e9002e56d8db2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 21:35:11 +0200
Subject: [PATCH v6 9/9] Comment & whitespace cleanup
I moved some of the paragraphs to inside the
heap_vac_scan_get_next_block() function. I found the explanation in
the function comment at the old place to be too much detail. Someone
looking at the function signature and how to call it would not care
about all the details of what can or cannot be skipped.
The new place isn't great either, but it will do for now.
---
src/backend/access/heap/vacuumlazy.c | 41 ++++++++++++++++------------
1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 535d70b71c3..b8a2dcfbbac 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -228,6 +228,7 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_get_next_block(LVRelState *vacrel,
@@ -1068,30 +1069,17 @@ lazy_scan_heap(LVRelState *vacrel)
* thresholds to skip blocks which do not need to be processed, and set blkno
* to next block that actually needs to be processed.
*
- * A block is unskippable if it is not all visible according to the visibility
- * map. It is also unskippable if it is the last block in the relation, if the
- * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
- * vacuum.
- *
- * Even if a block is skippable, we may choose not to skip it if the range of
- * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
- * consequence, we must keep track of the next truly unskippable block and its
- * visibility status separate from the next block lazy_scan_heap() should
- * process (and its visibility status).
- *
- * The block number and visibility status of the next unskippable block are set
- * in vacrel->scan_sate->next_unskippable_block and next_unskippable_allvis.
- *
- * The block number and visibility status of the next block to process are set
- * in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
- * returns false if there are no further blocks to process.
+ * The block number and visibility status of the next block to process are
+ * returned in blkno and all_visible_according_to_vm.
+ * heap_vac_scan_get_next_block() returns false if there are no further blocks
+ * to process.
*
* vacrel is an in/out parameter here; vacuum options and information about the
* relation are read and vacrel->skippedallvis is set to ensure we don't
* advance relfrozenxid when we have skipped vacuuming all visible blocks.
*
* vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needed a
+ * information for the next unskippable heap block. We may end up needing a
* different block from the VM (if we decide not to skip a skippable block).
* This is okay; visibilitymap_pin() will take care of this while processing
* the block.
@@ -1125,6 +1113,23 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber *blkno,
BlockNumber rel_pages = vacrel->rel_pages;
BlockNumber next_unskippable_block = vacrel->get_next_block_state.next_unskippable_block;
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map. It is also unskippable if it is the last block in
+ * the relation, if the vacuum is an aggressive vacuum, or if
+ * DISABLE_PAGE_SKIPPING was passed to vacuum.
+ *
+ * Even if a block is skippable, we may choose not to skip it if the
+ * range of skippable blocks is too small (below
+ * SKIP_PAGES_THRESHOLD). As a consequence, we must keep track of the
+ * next truly unskippable block and its visibility status separate
+ * from the next block lazy_scan_heap() should process (and its
+ * visibility status).
+ *
+ * The block number and visibility status of the next unskippable
+ * block are set in vacrel->scan_sate->next_unskippable_block and
+ * next_unskippable_allvis.
+ */
while (++next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
--
2.39.2
On Tue, Feb 27, 2024 at 02:47:03PM -0500, Melanie Plageman wrote:
On Mon, Jan 29, 2024 at 8:18 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Fri, Jan 26, 2024 at 8:28 AM vignesh C <vignesh21@gmail.com> wrote:
CFBot shows that the patch does not apply anymore as in [1]:
=== applying patch
./v3-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelStat.patch
patching file src/backend/access/heap/vacuumlazy.c
...
Hunk #10 FAILED at 1042.
Hunk #11 FAILED at 1121.
Hunk #12 FAILED at 1132.
Hunk #13 FAILED at 1161.
Hunk #14 FAILED at 1172.
Hunk #15 FAILED at 1194.
...
6 out of 21 hunks FAILED -- saving rejects to file
src/backend/access/heap/vacuumlazy.c.rej
Please post an updated version for the same.
Fixed in attached rebased v4
In light of Thomas' update to the streaming read API [1], I have
rebased and updated this patch set.
The attached v5 has some simplifications when compared to v4 but takes
largely the same approach.
Attached is a patch set (v5a) which updates the streaming read user for
vacuum to fix an issue Andrey Borodin pointed out to me off-list.
Note that I started writing this email before seeing Heikki's upthread
review [1], so I will respond to that in a bit. There are no changes in
v5a to any of the prelim refactoring patches which Heikki reviewed in
that email. I only changed the vacuum streaming read users (last two
patches in the set).
Back to this patch set:
Andrey pointed out that it was failing to compile on Windows, and the
reason is that I had accidentally left an undefined variable "index" in
these places:
Assert(index > 0);
...
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) index, vacuumed_pages)));
See https://cirrus-ci.com/task/6312305361682432
I don't understand how this didn't warn me (or fail to compile) for an
assert build on my own workstation. It seems to think "index" is a
function? (Presumably the identifier resolved to the libc index()
function declared in <strings.h>, which would explain the lack of a
local warning.)
Anyway, thinking about what the correct assertion would be here:
Assert(index > 0);
Assert(vacrel->num_index_scans > 1 ||
(rbstate->end_idx == vacrel->lpdead_items &&
vacuumed_pages == vacrel->lpdead_item_pages));
I think I can just replace "index" with "rbstate->end_index". At the end
of reaping, this should have the same value that index would have had.
The issue with this is that if pg_streaming_read_buffer_get_next() somehow
never returned a valid buffer (because there were no dead items), rbstate
would potentially be uninitialized. The old assertion (index > 0) would
only have been true if there were some dead items, but there isn't an
explicit assertion in this function that there were some dead items.
Perhaps it is worth adding this? Even if we add this, perhaps it is
unacceptable from a programming standpoint to use rbstate in that scope?
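A rough sketch of that replacement, assuming rbstate is still in scope at
that point and that end_idx holds the count of dead item identifiers
processed (both assumptions, following the spelling in the quoted
assertion rather than "end_index" above):

    /* make the "there were dead items" assumption explicit */
    Assert(vacrel->lpdead_items > 0);

    /* replaces the old Assert(index > 0) */
    Assert(rbstate->end_idx > 0);
    Assert(vacrel->num_index_scans > 1 ||
           (rbstate->end_idx == vacrel->lpdead_items &&
            vacuumed_pages == vacrel->lpdead_item_pages));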
In addition to fixing this slip-up, I have done some performance testing
for streaming read vacuum. Note that these tests are for both vacuum
passes (1 and 2) using streaming read.
Performance results:
The TL;DR of my performance results is that streaming read vacuum is
faster. However, there is an issue with the interaction of the streaming
read code and the vacuum buffer access strategy which must be addressed.
Note that "master" in the results below is actually just a commit on my
branch [2] before the one adding the vacuum streaming read users. So it
includes all of my refactoring of the vacuum code from the preliminary
patches.
I tested two vacuum "data states". Both are relatively small tables
because the impact of streaming read can easily be seen even at small
table sizes. DDL for both data states is at the end of the email.
The first data state is a 28 MB table which has never been vacuumed and
has one or two dead tuples on every block. All of the blocks have dead
tuples, so all of the blocks must be vacuumed. We'll call this the
"sequential" data state.
The second data state is a 67 MB table which has been vacuumed and then
a small percentage of the blocks (non-consecutive blocks at irregular
intervals) are updated afterward. Because the visibility map has been
updated and only a few blocks have dead tuples, large ranges of blocks
do not need to be vacuumed. There is at least one run of blocks with
dead tuples larger than 1 block but most of the blocks with dead tuples
are a single block followed by many blocks with no dead tuples. We'll
call this the "few" data state.
I tested these data states with "master" and with streaming read vacuum
with three caching options:
- table data fully in shared buffers (via pg_prewarm)
- table data in the kernel buffer cache but not in shared buffers
- table data completely uncached
I tested the OS cached and uncached caching options with both the
default vacuum buffer access strategy and with BUFFER_USAGE_LIMIT 0
(which uses as many shared buffers as needed).
For the streaming read vacuum, I tested with maintenance_io_concurrency
10, 100, and 1000. 10 is the current default on master.
maintenance_io_concurrency is not used by vacuum on master AFAICT.
maintenance_io_concurrency is used by streaming read to determine how
many buffers it can pin at the same time (with the hope of combining
consecutive blocks into larger IOs) and, in the case of vacuum, it is
used to determine prefetch distance.
In the following results, I ran vacuum at least five times and averaged
the timing results.
Table data cached in shared buffers
===================================
Sequential data state
---------------------
The only noticeable difference in performance was that streaming read
vacuum took 2% longer than master (19 ms vs 18.6 ms on master). It was a bit more
noticeable at maintenance_io_concurrency 1000 than 10.
This may be resolved by a patch Thomas is working on to avoid pinning
too many buffers if larger IOs cannot be created (like in a fully SB
resident workload). We should revisit this when that patch is available.
Few data state
--------------
There was no difference in timing for any of the scenarios.
Table data cached in OS buffer cache
====================================
Sequential data state
---------------------
With the buffer access strategy disabled, streaming read vacuum took 11%
less time regardless of maintenance_io_concurrency (26 ms on master vs 23 ms).
With the default vacuum buffer access strategy,
maintenance_io_concurrency had a large impact:
Note that "mic" is maintenace_io_concurrency
| data state | code      | mic  | time (ms) |
+------------+-----------+------+-----------+
| sequential | master    | NA   |        99 |
| sequential | streaming | 10   |       122 |
| sequential | streaming | 100  |        28 |
The streaming read API calculates the maximum number of pinned buffers
as 4 * maintenance_io_concurrency. The default vacuum buffer access
strategy ring buffer is 256 kB -- which is 32 buffers.
With maintenance_io_concurrency 10, streaming read code wants to pin 40
buffers. There is likely an interaction between this and the buffer
access strategy which leads to the slowdown at
maintenance_io_concurrency 10.
We could change the default maintenance_io_concurrency, but a better
option is to take the buffer access strategy into account in the
streaming read code.
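For illustration only, the kind of clamping meant here might look
something like the following; none of these names are the real streaming
read API, just a hypothetical helper:

    /*
     * Hypothetical sketch: cap the streaming read pin limit at the size of
     * the caller's buffer access strategy ring, so a 256 kB vacuum ring
     * (32 buffers) is never asked to absorb 4 * maintenance_io_concurrency
     * (e.g. 40) pinned buffers at once.
     */
    static int
    streaming_read_max_pins(int maintenance_io_concurrency,
                            int strategy_ring_buffers)  /* 0 if no strategy */
    {
        int         max_pins = 4 * maintenance_io_concurrency;

        if (strategy_ring_buffers > 0 && max_pins > strategy_ring_buffers)
            max_pins = strategy_ring_buffers;

        return max_pins;
    }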
Few data state
--------------
There was no difference in timing for any of the scenarios.
Table data uncached
===================
Sequential data state
---------------------
When the buffer access strategy is disabled, streaming read vacuum takes
12% less time regardless of maintenance_io_concurrency (36 ms vs 41 ms on master).
With the default buffer access strategy (ring buffer 256 kB) and
maintenance_io_concurrency 10 (the default), the streaming read vacuum
takes 19% more time. But if we bump maintenance_io_concurrency up to
100+, streaming read vacuum takes 64% less time:
| data state | code      | mic  | time (ms) |
+------------+-----------+------+-----------+
| sequential | master    | NA   |       113 |
| sequential | streaming | 10   |       140 |
| sequential | streaming | 100  |        41 |
This is likely due to the same adverse interaction between streaming
reads' max pinned buffers and the buffer access strategy ring buffer
size.
Few data state
--------------
The buffer access strategy had no impact here, so all of these results
are with the default buffer access strategy. The streaming read vacuum
takes 20-25% less time than master vacuum.
| data state | code      | mic  | time (ms) |
+------------+-----------+------+-----------+
| few        | master    | NA   |       4.5 |
| few        | streaming | 10   |       3.4 |
| few        | streaming | 100  |       3.5 |
The improvement is likely due to prefetching and the one range of
consecutive blocks containing dead tuples which could be merged into a
larger IO.
Higher maintenance_io_concurrency only helps a little, probably because:
1) most of the blocks to vacuum are not consecutive, so we can't make
bigger IOs in most cases
2) we are not vacuuming enough blocks to want to prefetch more than 10
blocks.
This experiment should probably be redone with larger tables containing
more blocks needing vacuum. At 3-4 ms, a 20% performance difference
isn't really that interesting.
The next step (other than the preliminary refactoring patches) is to
decide how the streaming read API should use the buffer access strategy.
Sequential Data State DDL:
drop table if exists foo;
create table foo (a int) with (autovacuum_enabled=false, fillfactor=25);
insert into foo select i % 3 from generate_series(1,200000)i;
update foo set a = 5 where a = 1;
Few Data State DDL:
drop table if exists foo;
create table foo (a int) with (autovacuum_enabled=false, fillfactor=25);
insert into foo select i from generate_series(2,20000)i;
insert into foo select 1 from generate_series(1,200)i;
insert into foo select i from generate_series(2,20000)i;
insert into foo select 1 from generate_series(1,200)i;
insert into foo select i from generate_series(2,200000)i;
insert into foo select 1 from generate_series(1,200)i;
insert into foo select i from generate_series(2,20000)i;
insert into foo select 1 from generate_series(1,2000)i;
insert into foo select i from generate_series(2,20000)i;
insert into foo select 1 from generate_series(1,200)i;
insert into foo select i from generate_series(2,200000)i;
insert into foo select 1 from generate_series(1,200)i;
vacuum (freeze) foo;
update foo set a = 5 where a = 1;
- Melanie
[1]: /messages/by-id/1eeccf12-d5d1-4b7e-b88b-7342410129d7@iki.fi
[2]: https://github.com/melanieplageman/postgres/tree/vac_pgsr
Attachments:
v5a-0001-lazy_scan_skip-remove-unneeded-local-var-nskippa.patchtext/x-diff; charset=us-asciiDownload
From c7a16bbf477c08ebf553da855e0554077c21de69 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v5a 1/7] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b320c3f89a..1dc6cc8e4db 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1103,8 +1103,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1161,7 +1160,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1174,7 +1172,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.40.1
v5a-0002-Add-lazy_scan_skip-unskippable-state-to-LVRelSta.patchtext/x-diff; charset=us-asciiDownload
From c27436eaeedbc1551072922d92460ca1c30ed116 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v5a 2/7] Add lazy_scan_skip unskippable state to LVRelState
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits clearer, first
introduce a struct to LVRelState containing the variables needed to
manage skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
lazy_scan_prune() and lazy_scan_new_or_empty() can now access the
buffer containing the relevant block of the visibility map through
LVRelState.skip, so it no longer needs to be a separate function
parameter.
While we are at it, add additional information to the lazy_scan_skip()
comment, including descriptions of the role and expectations for its
function parameters.
---
src/backend/access/heap/vacuumlazy.c | 154 ++++++++++++++++-----------
1 file changed, 90 insertions(+), 64 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1dc6cc8e4db..0ddb986bc03 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,22 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
+ * pages greater than SKIP_PAGES_THRESHOLD.
+ */
+ struct
+ {
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Buffer containing next unskippable block's visibility info */
+ Buffer vmbuffer;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+ } skip;
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -214,19 +230,15 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
-
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- bool sharelock, Buffer vmbuffer);
+ bool sharelock);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ bool all_visible_according_to_vm,
bool *has_lpdead_items);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -803,12 +815,8 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
VacDeadItems *dead_items = vacrel->dead_items;
- Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -822,10 +830,9 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.vmbuffer = InvalidBuffer;
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, 0);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -834,26 +841,23 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
+ if (blkno == vacrel->skip.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
+ all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, blkno + 1);
- Assert(next_unskippable_block >= blkno + 1);
+ Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacrel->skip.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -896,10 +900,10 @@ lazy_scan_heap(LVRelState *vacrel)
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
- if (BufferIsValid(vmbuffer))
+ if (BufferIsValid(vacrel->skip.vmbuffer))
{
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
@@ -924,7 +928,7 @@ lazy_scan_heap(LVRelState *vacrel)
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway.
*/
- visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
+ visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
@@ -942,8 +946,7 @@ lazy_scan_heap(LVRelState *vacrel)
LockBuffer(buf, BUFFER_LOCK_SHARE);
/* Check for new or empty pages before lazy_scan_[no]prune call */
- if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
- vmbuffer))
+ if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock))
{
/* Processed as new/empty page (lock and pin released) */
continue;
@@ -985,7 +988,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1035,8 +1038,11 @@ lazy_scan_heap(LVRelState *vacrel)
}
vacrel->blkno = InvalidBlockNumber;
- if (BufferIsValid(vmbuffer))
- ReleaseBuffer(vmbuffer);
+ if (BufferIsValid(vacrel->skip.vmbuffer))
+ {
+ ReleaseBuffer(vacrel->skip.vmbuffer);
+ vacrel->skip.vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1080,15 +1086,34 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map. Caller passes next_block, the next
+ * block in line. The parameters of the skipped range are recorded in skip.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
+ *
+ * A block is unskippable if it is not all visible according to the visibility
+ * map. It is also unskippable if it is the last block in the relation, if the
+ * vacuum is an aggressive vacuum, or if DISABLE_PAGE_SKIPPING was passed to
+ * vacuum.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * Even if a block is skippable, we may choose not to skip it if the range of
+ * skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and its
+ * visibility status along with whether or not we are skipping the current
+ * range of skippable blocks. This can be used to derive the next block
+ * lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block are set
+ * in skip->next_unskippable_block and next_unskippable_allvis.
+ * skip->skipping_current_range indicates to the caller whether or not it is
+ * processing a skippable (and thus all-visible) block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1098,25 +1123,26 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
{
+ /* Use local variables for better optimized loop code */
BlockNumber rel_pages = vacrel->rel_pages,
next_unskippable_block = next_block;
+
bool skipsallvis = false;
- *next_unskippable_allvis = true;
+ vacrel->skip.next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &vacrel->skip.vmbuffer);
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1137,7 +1163,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacrel->skip.next_unskippable_allvis = false;
break;
}
@@ -1162,6 +1188,8 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
next_unskippable_block++;
}
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
+
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
* pages. Since we're reading sequentially, the OS should be doing
@@ -1172,16 +1200,14 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacrel->skip.skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacrel->skip.skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
-
- return next_unskippable_block;
}
/*
@@ -1214,7 +1240,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
*/
static bool
lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
- Page page, bool sharelock, Buffer vmbuffer)
+ Page page, bool sharelock)
{
Size freespace;
@@ -1300,7 +1326,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
@@ -1336,10 +1362,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* any tuple that becomes dead after the call to heap_page_prune() can't need to
* be frozen, because it was visible to another session when vacuum started.
*
- * vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * vacrel->skip.vmbuffer is the buffer containing the VM block with
+ * visibility information for the heap block, blkno.
+ * all_visible_according_to_vm is the saved visibility status of the heap block
+ * looked up earlier by the caller. We won't rely entirely on this status, as
+ * it may be out of date.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -1349,7 +1376,6 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
- Buffer vmbuffer,
bool all_visible_according_to_vm,
bool *has_lpdead_items)
{
@@ -1783,7 +1809,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vacrel->skip.vmbuffer, visibility_cutoff_xid,
flags);
}
@@ -1794,11 +1820,11 @@ lazy_scan_prune(LVRelState *vacrel,
* buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ visibilitymap_get_status(vacrel->rel, blkno, &vacrel->skip.vmbuffer) != 0)
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1822,7 +1848,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+ visibilitymap_clear(vacrel->rel, blkno, vacrel->skip.vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
@@ -1832,7 +1858,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vacrel->skip.vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1854,7 +1880,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
+ vacrel->skip.vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
VISIBILITYMAP_ALL_FROZEN);
}
--
2.40.1
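
For readers skimming the diff above, here is a condensed sketch of the caller
pattern this patch leaves lazy_scan_heap() with. It is an outline only --
locking, stats, and error handling are omitted, and it is not a verbatim
excerpt -- but the field names match the struct added above:

    /* lazy_scan_skip(vacrel, 0) has primed the first range before the loop */
    for (blkno = 0; blkno < rel_pages; blkno++)
    {
        bool        all_visible_according_to_vm;

        if (blkno == vacrel->skip.next_unskippable_block)
        {
            /* Unskippable: scan it, then set up the next skippable range. */
            all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
            lazy_scan_skip(vacrel, blkno + 1);
        }
        else
        {
            /* Inside a skippable range: skip it only if it is long enough. */
            if (vacrel->skip.skipping_current_range)
                continue;
            all_visible_according_to_vm = true;
        }

        /* ... read, prune, and vacuum blkno ... */
    }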
v5a-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patch (text/x-diff; charset=us-ascii)
From a33c255419eb539901b8d07852e1c4b189b615df Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v5a 3/7] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface (and
eventually AIO), refactor vacuum's logic for skipping blocks such that
it is entirely confined to lazy_scan_skip(). This turns lazy_scan_skip()
and the skip state it keeps in LVRelState into an iterator that yields
blocks to lazy_scan_heap(). Such a structure is conducive to an async
interface. While we are at it, rename lazy_scan_skip() to
heap_vac_scan_get_next_block(), which now more accurately describes it.
By always calling heap_vac_scan_get_next_block() -- instead of only when
we have reached the next unskippable block -- we no longer need the
skipping_current_range variable. lazy_scan_heap() no longer needs to
manage the skipped range by checking whether it has reached the end
before calling heap_vac_scan_get_next_block() again. And
heap_vac_scan_get_next_block() can derive the visibility status of a
block from whether or not we are in a skippable range -- that is,
whether or not next_block is equal to the next unskippable block.
---
src/backend/access/heap/vacuumlazy.c | 243 ++++++++++++++-------------
1 file changed, 126 insertions(+), 117 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0ddb986bc03..99d160335e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -206,8 +206,8 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
- * pages greater than SKIP_PAGES_THRESHOLD.
+ * Parameters maintained by heap_vac_scan_get_next_block() to manage
+ * skipping ranges of pages greater than SKIP_PAGES_THRESHOLD.
*/
struct
{
@@ -232,7 +232,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block);
+static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock);
@@ -814,8 +816,11 @@ static void
lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 */
+ BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -830,40 +835,17 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, 0);
- for (blkno = 0; blkno < rel_pages; blkno++)
+
+ while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
+ &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == vacrel->skip.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
- lazy_scan_skip(vacrel, blkno + 1);
-
- Assert(vacrel->skip.next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacrel->skip.skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1083,20 +1065,14 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in skip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ * heap_vac_scan_get_next_block() -- get next block for vacuum to process
*
- * skip->vmbuffer will contain the block from the VM containing visibility
- * information for the next unskippable heap block. We may end up needed a
- * different block from the VM (if we decide not to skip a skippable block).
- * This is okay; visibilitymap_pin() will take care of this while processing
- * the block.
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed. Caller passes
+ * next_block, the next block in line. This block may end up being skipped.
+ * heap_vac_scan_get_next_block() sets blkno to next block that actually needs
+ * to be processed.
*
* A block is unskippable if it is not all visible according to the visibility
* map. It is also unskippable if it is the last block in the relation, if the
@@ -1106,14 +1082,25 @@ lazy_scan_heap(LVRelState *vacrel)
* Even if a block is skippable, we may choose not to skip it if the range of
* skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
* consequence, we must keep track of the next truly unskippable block and its
- * visibility status along with whether or not we are skipping the current
- * range of skippable blocks. This can be used to derive the next block
- * lazy_scan_heap() must process and its visibility status.
+ * visibility status separate from the next block lazy_scan_heap() should
+ * process (and its visibility status).
*
* The block number and visibility status of the next unskippable block are set
- * in skip->next_unskippable_block and next_unskippable_allvis.
- * skip->skipping_current_range indicates to the caller whether or not it is
- * processing a skippable (and thus all-visible) block.
+ * in vacrel->skip.next_unskippable_block and next_unskippable_allvis.
+ *
+ * The block number and visibility status of the next block to process are set
+ * in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
+ * returns false if there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all visible blocks.
+ *
+ * skip->vmbuffer will contain the block from the VM containing visibility
+ * information for the next unskippable heap block. We may end up needing a
+ * different block from the VM (if we decide not to skip a skippable block).
+ * This is okay; visibilitymap_pin() will take care of this while processing
+ * the block.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1123,91 +1110,113 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
-lazy_scan_skip(LVRelState *vacrel, BlockNumber next_block)
+static bool
+heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+ BlockNumber *blkno, bool *all_visible_according_to_vm)
{
- /* Use local variables for better optimized loop code */
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block;
-
bool skipsallvis = false;
- vacrel->skip.next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ if (next_block >= vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- &vacrel->skip.vmbuffer);
+ *blkno = InvalidBlockNumber;
+ return false;
+ }
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->skip.next_unskippable_block)
+ {
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->skip.next_unskippable_block;
+
+ while (++next_unskippable_block < rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &vacrel->skip.vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (next_unskippable_block == rel_pages - 1)
- break;
+ vacrel->skip.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->skip.next_unskippable_allvis = false;
- break;
- }
+ if (!vacrel->skip.next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacrel->skip.next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
}
- vacuum_delay_point();
- next_unskippable_block++;
- }
+ vacrel->skip.next_unskippable_block = next_unskippable_block;
- vacrel->skip.next_unskippable_block = next_unskippable_block;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->skip.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->skip.next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacrel->skip.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacrel->skip.skipping_current_range = false;
+ if (next_block == vacrel->skip.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->skip.next_unskippable_allvis;
else
- {
- vacrel->skip.skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
+ *all_visible_according_to_vm = true;
+
+ *blkno = next_block;
+ return true;
}
/*
--
2.40.1
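
Condensed, the loop this patch produces looks like the sketch below (again an
outline, not an excerpt -- progress reporting, FSM vacuuming, and cleanup-lock
handling are omitted). The skip state plus heap_vac_scan_get_next_block() now
behave like an iterator:

    BlockNumber blkno = InvalidBlockNumber;     /* blkno + 1 wraps to 0 */
    bool        all_visible_according_to_vm;

    vacrel->skip.next_unskippable_block = InvalidBlockNumber;
    vacrel->skip.vmbuffer = InvalidBuffer;

    while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
                                        &blkno, &all_visible_according_to_vm))
    {
        /* ... read, prune, and vacuum blkno ... */
    }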
v5a-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac.patch (text/x-diff; charset=us-ascii)
From 4ab5ed0863ca4fcdc197811d60011971a29b8857 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v5a 4/7] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 99d160335e1..65d257aab83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1184,8 +1184,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
vacrel->skip.next_unskippable_block = next_unskippable_block;
--
2.40.1
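
To make the connection to the next patch concrete, here is a rough sketch of
how vacuum might eventually drive the streaming read API with
heap_vac_scan_get_next_block() supplying blocks from a callback. Only the
pg_streaming_read_buffer_alloc() / pg_streaming_read_buffer_get_next() /
pg_streaming_read_free() calls, the callback signature, and
PGSR_FLAG_MAINTENANCE come from the patch below; the callback body, the
current_block bookkeeping, and passing the visibility flag through
per_buffer_data are guesses for illustration, not code from this patch set:

    /*
     * Hypothetical callback: hand the streaming read machinery the next
     * block vacuum should read, or InvalidBlockNumber at end of relation.
     */
    static BlockNumber
    vacuum_scan_pgsr_next(PgStreamingRead *pgsr, void *pgsr_private,
                          void *per_buffer_data)
    {
        LVRelState *vacrel = pgsr_private;
        bool       *all_visible = per_buffer_data;  /* per-buffer payload */
        BlockNumber blkno;

        /* current_block is a field assumed for this sketch only. */
        if (!heap_vac_scan_get_next_block(vacrel, vacrel->current_block + 1,
                                          &blkno, all_visible))
            return InvalidBlockNumber;

        vacrel->current_block = blkno;
        return blkno;
    }

    /* Consumption side, mirroring the API added by the patch below. */
    PgStreamingRead *pgsr;
    Buffer      buf;
    void       *per_buffer_data;

    pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE,
                                          vacrel,
                                          sizeof(bool),
                                          vacrel->bstrategy,
                                          BMR_REL(vacrel->rel),
                                          MAIN_FORKNUM,
                                          vacuum_scan_pgsr_next);

    while ((buf = pg_streaming_read_buffer_get_next(pgsr, &per_buffer_data)) != InvalidBuffer)
    {
        bool    all_visible_according_to_vm = *(bool *) per_buffer_data;

        /* ... prune and vacuum the block held in buf, then release it ... */
    }

    pg_streaming_read_free(pgsr);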
v5a-0005-Streaming-Read-API.patch (text/x-diff; charset=us-ascii)
From 0c82c513818f1aa3e8aa982344aaac13f54629e4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Mar 2024 14:46:08 -0500
Subject: [PATCH v5a 5/7] Streaming Read API
---
src/backend/storage/Makefile | 2 +-
src/backend/storage/aio/Makefile | 14 +
src/backend/storage/aio/meson.build | 5 +
src/backend/storage/aio/streaming_read.c | 612 ++++++++++++++++++++++
src/backend/storage/buffer/bufmgr.c | 641 ++++++++++++++++-------
src/backend/storage/buffer/localbuf.c | 14 +-
src/backend/storage/meson.build | 1 +
src/include/storage/bufmgr.h | 45 ++
src/include/storage/streaming_read.h | 52 ++
src/tools/pgindent/typedefs.list | 3 +
10 files changed, 1179 insertions(+), 210 deletions(-)
create mode 100644 src/backend/storage/aio/Makefile
create mode 100644 src/backend/storage/aio/meson.build
create mode 100644 src/backend/storage/aio/streaming_read.c
create mode 100644 src/include/storage/streaming_read.h
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS = aio buffer file freespace ipc large_object lmgr page smgr sync
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+ streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+ 'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..71f2c4a70b6
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,612 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+ bool need_wait;
+ bool advice_issued;
+ BlockNumber blocknum;
+ int nblocks;
+ int per_buffer_data_index;
+ Buffer buffers[MAX_BUFFERS_PER_TRANSFER];
+ ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+ int max_ios;
+ int ios_in_progress;
+ int max_pinned_buffers;
+ int pinned_buffers;
+ int pinned_buffers_trigger;
+ int next_tail_buffer;
+ int ramp_up_pin_limit;
+ int ramp_up_pin_stall;
+ bool finished;
+ bool advice_enabled;
+ void *pgsr_private;
+ PgStreamingReadBufferCB callback;
+
+ BufferAccessStrategy strategy;
+ BufferManagerRelation bmr;
+ ForkNumber forknum;
+
+ /* Sometimes we need to buffer one block for flow control. */
+ BlockNumber unget_blocknum;
+ void *unget_per_buffer_data;
+
+ /* Next expected block, for detecting sequential access. */
+ BlockNumber seq_blocknum;
+
+ /* Space for optional per-buffer private data. */
+ size_t per_buffer_data_size;
+ void *per_buffer_data;
+
+ /* Circular buffer of ranges. */
+ int size;
+ int head;
+ int tail;
+ PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy)
+{
+ PgStreamingRead *pgsr;
+ int size;
+ int max_ios;
+ uint32 max_pinned_buffers;
+
+
+ /*
+ * Decide how many assumed I/Os we will allow to run concurrently. That
+ * is, advice to the kernel to tell it that we will soon read. This
+ * number also affects how far we look ahead for opportunities to start
+ * more I/Os.
+ */
+ if (flags & PGSR_FLAG_MAINTENANCE)
+ max_ios = maintenance_io_concurrency;
+ else
+ max_ios = effective_io_concurrency;
+
+ /*
+ * The desired level of I/O concurrency controls how far ahead we are
+ * willing to look. We also clamp it to at least
+ * MAX_BUFFERS_PER_TRANSFER so that we can have a chance to build up a
+ * full-sized read, even when max_ios is zero.
+ */
+ max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+ /*
+ * The *_io_concurrency GUCs might be set to 0, but we want to allow at
+ * least one, to keep our gating logic simple.
+ */
+ max_ios = Max(max_ios, 1);
+
+ /*
+ * Don't allow this backend to pin too many buffers. For now we'll apply
+ * the limit for the shared buffer pool and the local buffer pool, without
+ * worrying which it is.
+ */
+ LimitAdditionalPins(&max_pinned_buffers);
+ LimitAdditionalLocalPins(&max_pinned_buffers);
+ Assert(max_pinned_buffers > 0);
+
+ /*
+ * pgsr->ranges is a circular buffer. When it is empty, head == tail.
+ * When it is full, there is an empty element between head and tail. Head
+ * can also be empty (nblocks == 0), therefore we need two extra elements
+ * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+ * maximum possible number of occupied ranges of the smallest possible
+ * size of one.
+ */
+ size = max_pinned_buffers + 2;
+
+ pgsr = (PgStreamingRead *)
+ palloc0(offsetof(PgStreamingRead, ranges) +
+ sizeof(pgsr->ranges[0]) * size);
+
+ pgsr->max_ios = max_ios;
+ pgsr->per_buffer_data_size = per_buffer_data_size;
+ pgsr->max_pinned_buffers = max_pinned_buffers;
+ pgsr->pgsr_private = pgsr_private;
+ pgsr->strategy = strategy;
+ pgsr->size = size;
+
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+ /*
+ * This system supports prefetching advice. As long as direct I/O isn't
+ * enabled, and the caller hasn't promised sequential access, we can use
+ * it.
+ */
+ if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+ (flags & PGSR_FLAG_SEQUENTIAL) == 0)
+ pgsr->advice_enabled = true;
+#endif
+
+ /*
+ * We start off building small ranges, but double that quickly, for the
+ * benefit of users that don't know how far ahead they'll read. This can
+ * be disabled by users that already know they'll read all the way.
+ */
+ if (flags & PGSR_FLAG_FULL)
+ pgsr->ramp_up_pin_limit = INT_MAX;
+ else
+ pgsr->ramp_up_pin_limit = 1;
+
+ /*
+ * We want to avoid creating ranges that are smaller than they could be
+ * just because we hit max_pinned_buffers. We only look ahead when the
+ * number of pinned buffers falls below this trigger number, or put
+ * another way, we stop looking ahead when we wouldn't be able to build a
+ * "full sized" range.
+ */
+ pgsr->pinned_buffers_trigger =
+ Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+ /* Space for the callback to store extra data along with each block. */
+ if (per_buffer_data_size)
+ pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+ return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb)
+{
+ PgStreamingRead *result;
+
+ result = pg_streaming_read_buffer_alloc_internal(flags,
+ pgsr_private,
+ per_buffer_data_size,
+ strategy);
+ result->callback = next_block_cb;
+ result->bmr = bmr;
+ result->forknum = forknum;
+
+ return result;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ int result;
+
+ /*
+ * Find slot in the circular buffer of per-buffer data, without using the
+ * expensive % operator.
+ */
+ result = range->per_buffer_data_index + n;
+ if (result >= pgsr->max_pinned_buffers)
+ result -= pgsr->max_pinned_buffers;
+ Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+ return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+ return (char *) pgsr->per_buffer_data +
+ pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ return get_per_buffer_data_by_index(pgsr,
+ get_per_buffer_data_index(pgsr,
+ range,
+ n));
+}
+
+/*
+ * Start reading the head range, and create a new head range. The new head
+ * range is returned. It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *head_range;
+ PgStreamingReadRange *new_head_range;
+ int nblocks_pinned;
+ int flags;
+
+ /* Caller should make sure we never exceed max_ios. */
+ Assert(pgsr->ios_in_progress < pgsr->max_ios);
+
+ /* Should only call if the head range has some blocks to read. */
+ head_range = &pgsr->ranges[pgsr->head];
+ Assert(head_range->nblocks > 0);
+
+ /*
+ * If advice hasn't been suppressed, this system supports it, and this
+ * isn't a strictly sequential pattern, then we'll issue advice.
+ */
+ if (pgsr->advice_enabled && head_range->blocknum != pgsr->seq_blocknum)
+ flags = READ_BUFFERS_ISSUE_ADVICE;
+ else
+ flags = 0;
+
+
+ /* Start reading as many blocks as we can from the head range. */
+ nblocks_pinned = head_range->nblocks;
+ head_range->need_wait =
+ StartReadBuffers(pgsr->bmr,
+ head_range->buffers,
+ pgsr->forknum,
+ head_range->blocknum,
+ &nblocks_pinned,
+ pgsr->strategy,
+ flags,
+ &head_range->operation);
+
+ /* Did that start an I/O? */
+ if (head_range->need_wait && (flags & READ_BUFFERS_ISSUE_ADVICE))
+ {
+ head_range->advice_issued = true;
+ pgsr->ios_in_progress++;
+ Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+ }
+
+ /*
+ * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+ * but always at least one.
+ */
+ Assert(nblocks_pinned <= head_range->nblocks);
+ Assert(nblocks_pinned >= 1);
+ pgsr->pinned_buffers += nblocks_pinned;
+
+ /*
+ * Remember where the next block would be after that, so we can detect
+ * sequential access next time.
+ */
+ pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+ /*
+ * Create a new head range. There must be space, because we have enough
+ * elements for every range to hold just one block, up to the pin limit.
+ */
+ Assert(pgsr->size > pgsr->max_pinned_buffers);
+ Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+ if (++pgsr->head == pgsr->size)
+ pgsr->head = 0;
+ new_head_range = &pgsr->ranges[pgsr->head];
+ new_head_range->nblocks = 0;
+ new_head_range->advice_issued = false;
+
+ /*
+ * If we didn't manage to start the whole read above, we split the range,
+ * moving the remainder into the new head range.
+ */
+ if (nblocks_pinned < head_range->nblocks)
+ {
+ int nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+ head_range->nblocks = nblocks_pinned;
+
+ new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+ new_head_range->nblocks = nblocks_remaining;
+ }
+
+ /* The new range has per-buffer data starting after the previous range. */
+ new_head_range->per_buffer_data_index =
+ get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+ return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+ BlockNumber result;
+
+ if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+ {
+ /*
+ * If we had to unget a block, now it is time to return that one
+ * again.
+ */
+ result = pgsr->unget_blocknum;
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+ /*
+ * The same per_buffer_data element must have been used, and still
+ * contains whatever data the callback wrote into it. So we just
+ * sanity-check that we were called with the value that
+ * pg_streaming_unget_block() pushed back.
+ */
+ Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+ }
+ else
+ {
+ /* Use the installed callback directly. */
+ result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+ }
+
+ return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later. This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+ Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+ pgsr->unget_blocknum = blocknum;
+ pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *range;
+
+ /*
+ * If we're still ramping up, we may have to stall to wait for buffers to
+ * be consumed first before we do any more prefetching.
+ */
+ if (pgsr->ramp_up_pin_stall > 0)
+ {
+ Assert(pgsr->pinned_buffers > 0);
+ return;
+ }
+
+ /*
+ * If we're finished or can't start more I/O, then don't look ahead.
+ */
+ if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * We'll also wait until the number of pinned buffers falls below our
+ * trigger level, so that we have the chance to create a full range.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ do
+ {
+ BlockNumber blocknum;
+ void *per_buffer_data;
+
+ /* Do we have a full-sized range? */
+ range = &pgsr->ranges[pgsr->head];
+ if (range->nblocks == lengthof(range->buffers))
+ {
+ /* Start as much of it as we can. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /* If we're now at the I/O limit, stop here. */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * If we couldn't form a full range, then stop here to avoid
+ * creating small I/O.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ /*
+ * That might have only been partially started, but it always
+ * processes at least one block, so that'll do for now.
+ */
+ Assert(range->nblocks < lengthof(range->buffers));
+ }
+
+ /* Find per-buffer data slot for the next block. */
+ per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+ /* Find out which block the callback wants to read next. */
+ blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+ if (blocknum == InvalidBlockNumber)
+ {
+ /* End of stream. */
+ pgsr->finished = true;
+ break;
+ }
+
+ /*
+ * Is there a head range that we cannot extend, because the requested
+ * block is not consecutive?
+ */
+ if (range->nblocks > 0 &&
+ range->blocknum + range->nblocks != blocknum)
+ {
+ /* Yes. Start it, so we can begin building a new one. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * It's possible that it was only partially started, and we have a
+ * new range with the remainder. Keep starting I/Os until we get
+ * it all out of the way, or we hit the I/O limit.
+ */
+ while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * We have to 'unget' the block returned by the callback if we
+ * don't have enough I/O capacity left to start something.
+ */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ {
+ pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+ return;
+ }
+ }
+
+ /* If we have a new, empty range, initialize the start block. */
+ if (range->nblocks == 0)
+ {
+ range->blocknum = blocknum;
+ }
+
+ /* This block extends the range by one. */
+ Assert(range->blocknum + range->nblocks == blocknum);
+ range->nblocks++;
+
+ } while (pgsr->pinned_buffers + range->nblocks < pgsr->max_pinned_buffers &&
+ pgsr->pinned_buffers + range->nblocks < pgsr->ramp_up_pin_limit);
+
+ /* If we've hit the ramp-up limit, insert a stall. */
+ if (pgsr->pinned_buffers + range->nblocks >= pgsr->ramp_up_pin_limit)
+ {
+ /* Can't get here if an earlier stall hasn't finished. */
+ Assert(pgsr->ramp_up_pin_stall == 0);
+ /* Don't do any more prefetching until these buffers are consumed. */
+ pgsr->ramp_up_pin_stall = pgsr->ramp_up_pin_limit;
+ /* Double it. It will soon be out of the way. */
+ pgsr->ramp_up_pin_limit *= 2;
+ }
+
+ /* Start as much as we can. */
+ while (range->nblocks > 0)
+ {
+ range = pg_streaming_read_start_head_range(pgsr);
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ break;
+ }
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+ pg_streaming_read_look_ahead(pgsr);
+
+ /* See if we have one buffer to return. */
+ while (pgsr->tail != pgsr->head)
+ {
+ PgStreamingReadRange *tail_range;
+
+ tail_range = &pgsr->ranges[pgsr->tail];
+
+ /*
+ * Do we need to perform an I/O before returning the buffers from this
+ * range?
+ */
+ if (tail_range->need_wait)
+ {
+ WaitReadBuffers(&tail_range->operation);
+ tail_range->need_wait = false;
+
+ /*
+ * We don't really know if the kernel generated a physical I/O
+ * when we issued advice, let alone when it finished, but it has
+ * certainly finished now because we've performed the read.
+ */
+ if (tail_range->advice_issued)
+ {
+ Assert(pgsr->ios_in_progress > 0);
+ pgsr->ios_in_progress--;
+ }
+ }
+
+ /* Are there more buffers available in this range? */
+ if (pgsr->next_tail_buffer < tail_range->nblocks)
+ {
+ int buffer_index;
+ Buffer buffer;
+
+ buffer_index = pgsr->next_tail_buffer++;
+ buffer = tail_range->buffers[buffer_index];
+
+ Assert(BufferIsValid(buffer));
+
+ /* We are giving away ownership of this pinned buffer. */
+ Assert(pgsr->pinned_buffers > 0);
+ pgsr->pinned_buffers--;
+
+ if (pgsr->ramp_up_pin_stall > 0)
+ pgsr->ramp_up_pin_stall--;
+
+ if (per_buffer_data)
+ *per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+ return buffer;
+ }
+
+ /* Advance tail to next range, if there is one. */
+ if (++pgsr->tail == pgsr->size)
+ pgsr->tail = 0;
+ pgsr->next_tail_buffer = 0;
+
+ /*
+ * If tail crashed into head, and head is not empty, then it is time
+ * to start that range.
+ */
+ if (pgsr->tail == pgsr->head &&
+ pgsr->ranges[pgsr->head].nblocks > 0)
+ pg_streaming_read_start_head_range(pgsr);
+ }
+
+ Assert(pgsr->pinned_buffers == 0);
+
+ return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+ Buffer buffer;
+
+ /* Stop looking ahead. */
+ pgsr->finished = true;
+
+ /* Unpin anything that wasn't consumed. */
+ while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+ ReleaseBuffer(buffer);
+
+ Assert(pgsr->pinned_buffers == 0);
+ Assert(pgsr->ios_in_progress == 0);
+
+ /* Release memory. */
+ if (pgsr->per_buffer_data)
+ pfree(pgsr->per_buffer_data);
+
+ pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c5..729d1f91721 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
* and pin it so that no one can destroy it while this process
* is using it.
*
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ * two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
* ReleaseBuffer() -- unpin a buffer
*
* MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -471,10 +476,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
)
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy,
- bool *hit);
+ ReadBufferMode mode, BufferAccessStrategy strategy);
static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
ForkNumber fork,
BufferAccessStrategy strategy,
@@ -500,7 +504,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
static int SyncOneBuffer(int buf_id, bool skip_recently_used,
WritebackContext *wb_context);
static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
uint32 set_flag_bits, bool forget_owner);
static void AbortBufferIO(Buffer buffer);
@@ -781,7 +785,6 @@ Buffer
ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy)
{
- bool hit;
Buffer buf;
/*
@@ -794,15 +797,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions")));
- /*
- * Read the buffer, and update pgstat counters to reflect a cache hit or
- * miss.
- */
- pgstat_count_buffer_read(reln);
- buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
- forkNum, blockNum, mode, strategy, &hit);
- if (hit)
- pgstat_count_buffer_hit(reln);
+ buf = ReadBuffer_common(BMR_REL(reln),
+ forkNum, blockNum, mode, strategy);
+
return buf;
}
@@ -822,13 +819,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
BufferAccessStrategy strategy, bool permanent)
{
- bool hit;
-
SMgrRelation smgr = smgropen(rlocator, INVALID_PROC_NUMBER);
- return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
- RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
- mode, strategy, &hit);
+ return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+ RELPERSISTENCE_UNLOGGED),
+ forkNum, blockNum,
+ mode, strategy);
}
/*
@@ -994,35 +990,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
*/
if (buffer == InvalidBuffer)
{
- bool hit;
-
Assert(extended_by == 0);
- buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
- fork, extend_to - 1, mode, strategy,
- &hit);
+ buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
}
return buffer;
}
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK. The buffer must be already
+ * pinned. It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+ BufferDesc *bufHdr;
+ uint32 buf_state;
+
+ Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+ if (BufferIsLocal(buffer))
+ bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+ else
+ {
+ bufHdr = GetBufferDescriptor(buffer - 1);
+ if (mode == RBM_ZERO_AND_LOCK)
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ else
+ LockBufferForCleanup(buffer);
+ }
+
+ memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+ if (BufferIsLocal(buffer))
+ {
+ buf_state = pg_atomic_read_u32(&bufHdr->state);
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ buf_state = LockBufHdr(bufHdr);
+ buf_state |= BM_VALID;
+ UnlockBufHdr(bufHdr, buf_state);
+ }
+}
+
/*
* ReadBuffer_common -- common logic for all ReadBuffer variants
*
* *hit is set to true if the request was satisfied from shared buffer cache.
*/
static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy, bool *hit)
+ BufferAccessStrategy strategy)
{
- BufferDesc *bufHdr;
- Block bufBlock;
- bool found;
- IOContext io_context;
- IOObject io_object;
- bool isLocalBuf = SmgrIsTemp(smgr);
-
- *hit = false;
+ ReadBuffersOperation operation;
+ Buffer buffer;
+ int nblocks;
+ int flags;
/*
* Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1041,181 +1070,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
flags |= EB_LOCK_FIRST;
- return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
- forkNum, strategy, flags);
+ return ExtendBufferedRel(bmr, forkNum, strategy, flags);
}
- TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend);
+ nblocks = 1;
+ if (mode == RBM_ZERO_ON_ERROR)
+ flags = READ_BUFFERS_ZERO_ON_ERROR;
+ else
+ flags = 0;
+ if (StartReadBuffers(bmr,
+ &buffer,
+ forkNum,
+ blockNum,
+ &nblocks,
+ strategy,
+ flags,
+ &operation))
+ WaitReadBuffers(&operation);
+ Assert(nblocks == 1); /* single block can't be short */
+
+ if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+ ZeroBuffer(buffer, mode);
+
+ return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ BufferAccessStrategy strategy,
+ bool *foundPtr)
+{
+ BufferDesc *bufHdr;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ Assert(blockNum != P_NEW);
+ Assert(bmr.smgr);
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /*
- * We do not use a BufferAccessStrategy for I/O of temporary tables.
- * However, in some cases, the "strategy" may not be NULL, so we can't
- * rely on IOContextForStrategy() to set the right IOContext for us.
- * This may happen in cases like CREATE TEMPORARY TABLE AS...
- */
io_context = IOCONTEXT_NORMAL;
io_object = IOOBJECT_TEMP_RELATION;
- bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
- if (found)
- pgBufferUsage.local_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.local_blks_read++;
}
else
{
- /*
- * lookup the buffer. IO_IN_PROGRESS is set if the requested block is
- * not currently in memory.
- */
io_context = IOContextForStrategy(strategy);
io_object = IOOBJECT_RELATION;
- bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
- strategy, &found, io_context);
- if (found)
- pgBufferUsage.shared_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.shared_blks_read++;
}
- /* At this point we do NOT hold any locks. */
+ TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend);
- /* if it was already in the buffer pool, we're done */
- if (found)
+ ResourceOwnerEnlarge(CurrentResourceOwner);
+ if (isLocalBuf)
+ {
+ bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+ if (*foundPtr)
+ pgBufferUsage.local_blks_hit++;
+ }
+ else
+ {
+ bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+ strategy, foundPtr, io_context);
+ if (*foundPtr)
+ pgBufferUsage.shared_blks_hit++;
+ }
+ if (bmr.rel)
+ {
+ /*
+ * While pgBufferUsage's "read" counter isn't bumped unless we reach
+ * WaitReadBuffers() (so, not for hits, and not for buffers that are
+ * zeroed instead), the per-relation stats always count them.
+ */
+ pgstat_count_buffer_read(bmr.rel);
+ if (*foundPtr)
+ pgstat_count_buffer_hit(bmr.rel);
+ }
+ if (*foundPtr)
{
- /* Just need to update stats before we exit */
- *hit = true;
VacuumPageHit++;
pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
if (VacuumCostActive)
VacuumCostBalance += VacuumCostPageHit;
TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ }
- /*
- * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
- * on return.
- */
- if (!isLocalBuf)
- {
- if (mode == RBM_ZERO_AND_LOCK)
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
- LW_EXCLUSIVE);
- else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
- LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
- }
+ return BufferDescriptorGetBuffer(bufHdr);
+}
- return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks. On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() is not
+ * necessary. If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed. Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers(). In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation)
+{
+ int actual_nblocks = *nblocks;
+
+ if (bmr.rel)
+ {
+ bmr.smgr = RelationGetSmgr(bmr.rel);
+ bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
}
- /*
- * if we have gotten to this point, we have allocated a buffer for the
- * page but its contents are not yet valid. IO_IN_PROGRESS is set for it,
- * if it's a shared buffer.
- */
- Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* spinlock not needed */
+ operation->bmr = bmr;
+ operation->forknum = forkNum;
+ operation->blocknum = blockNum;
+ operation->buffers = buffers;
+ operation->nblocks = actual_nblocks;
+ operation->strategy = strategy;
+ operation->flags = flags;
- bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+ operation->io_buffers_len = 0;
- /*
- * Read in the page, unless the caller intends to overwrite it and just
- * wants us to allocate a buffer.
- */
- if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
- MemSet((char *) bufBlock, 0, BLCKSZ);
- else
+ for (int i = 0; i < actual_nblocks; ++i)
{
- instr_time io_start = pgstat_prepare_io_time(track_io_timing);
+ bool found;
- smgrread(smgr, forkNum, blockNum, bufBlock);
+ buffers[i] = PrepareReadBuffer(bmr,
+ forkNum,
+ blockNum + i,
+ strategy,
+ &found);
- pgstat_count_io_op_time(io_object, io_context,
- IOOP_READ, io_start, 1);
+ if (found)
+ {
+ /*
+ * Terminate the read as soon as we get a hit. It could be a
+ * single buffer hit, or it could be a hit that follows a readable
+ * range. We don't want to create more than one readable range,
+ * so we stop here.
+ */
+ actual_nblocks = operation->nblocks = *nblocks = i + 1;
+ }
+ else
+ {
+ /* Extend the readable range to cover this block. */
+ operation->io_buffers_len++;
+ }
+ }
- /* check for garbage data */
- if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
- PIV_LOG_WARNING | PIV_REPORT_STAT))
+ if (operation->io_buffers_len > 0)
+ {
+ if (flags & READ_BUFFERS_ISSUE_ADVICE)
{
- if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s; zeroing out page",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
- MemSet((char *) bufBlock, 0, BLCKSZ);
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
+ /*
+ * In theory we should only do this if PrepareReadBuffer() had to
+ * allocate new buffers above. That way, if two calls to
+ * StartReadBuffers() were made for the same blocks before
+ * WaitReadBuffers(), only the first would issue the advice.
+ * That'd be a better simulation of true asynchronous I/O, which
+ * would only start the I/O once, but isn't done here for
+ * simplicity. Note also that the following call might actually
+ * issue two advice calls if we cross a segment boundary; in a
+ * true asynchronous version we might choose to process only one
+ * real I/O at a time in that case.
+ */
+ smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
}
+
+ /* Indicate that WaitReadBuffers() should be called. */
+ return true;
}
+ else
+ {
+ return false;
+ }
+}
- /*
- * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
- * content lock before marking the page as valid, to make sure that no
- * other backend sees the zeroed page before the caller has had a chance
- * to initialize it.
- *
- * Since no-one else can be looking at the page contents yet, there is no
- * difference between an exclusive lock and a cleanup-strength lock. (Note
- * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
- * they assert that the buffer is already valid.)
- */
- if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
- !isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+ if (BufferIsLocal(buffer))
{
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+ BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+ return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
}
+ else
+ return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ int nblocks;
+ BlockNumber blocknum;
+ ForkNumber forknum;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ /*
+ * Currently operations are only allowed to include a read of some range,
+ * with an optional extra buffer that is already pinned at the end. So
+ * nblocks can be at most one more than io_buffers_len.
+ */
+ Assert((operation->nblocks == operation->io_buffers_len) ||
+ (operation->nblocks == operation->io_buffers_len + 1));
+ /* Find the range of the physical read we need to perform. */
+ nblocks = operation->io_buffers_len;
+ if (nblocks == 0)
+ return; /* nothing to do */
+
+ buffers = &operation->buffers[0];
+ blocknum = operation->blocknum;
+ forknum = operation->forknum;
+ bmr = operation->bmr;
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /* Only need to adjust flags */
- uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
-
- buf_state |= BM_VALID;
- pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ io_context = IOCONTEXT_NORMAL;
+ io_object = IOOBJECT_TEMP_RELATION;
}
else
{
- /* Set BM_VALID, terminate IO, and wake up any waiters */
- TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ io_context = IOContextForStrategy(operation->strategy);
+ io_object = IOOBJECT_RELATION;
}
- VacuumPageMiss++;
- if (VacuumCostActive)
- VacuumCostBalance += VacuumCostPageMiss;
+ /*
+ * We count all these blocks as read by this backend. This is traditional
+ * behavior, but might turn out to be not true if we find that someone
+ * else has beaten us and completed the read of some of these blocks. In
+ * that case the system globally double-counts, but we traditionally don't
+ * count this as a "hit", and we don't have a separate counter for "miss,
+ * but another backend completed the read".
+ */
+ if (isLocalBuf)
+ pgBufferUsage.local_blks_read += nblocks;
+ else
+ pgBufferUsage.shared_blks_read += nblocks;
- TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ for (int i = 0; i < nblocks; ++i)
+ {
+ int io_buffers_len;
+ Buffer io_buffers[MAX_BUFFERS_PER_TRANSFER];
+ void *io_pages[MAX_BUFFERS_PER_TRANSFER];
+ instr_time io_start;
+ BlockNumber io_first_block;
- return BufferDescriptorGetBuffer(bufHdr);
+ /*
+ * Skip this block if someone else has already completed it. If an
+ * I/O is already in progress in another backend, this will wait for
+ * the outcome: either done, or something went wrong and we will
+ * retry.
+ */
+ if (!WaitReadBuffersCanStartIO(buffers[i], false))
+ {
+ /*
+ * Report this as a 'hit' for this backend, even though it must
+ * have started out as a miss in PrepareReadBuffer().
+ */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ continue;
+ }
+
+ /* We found a buffer that we need to read in. */
+ io_buffers[0] = buffers[i];
+ io_pages[0] = BufferGetBlock(buffers[i]);
+ io_first_block = blocknum + i;
+ io_buffers_len = 1;
+
+ /*
+ * How many neighboring-on-disk blocks can we scatter-read into
+ * other buffers at the same time? In this case we don't wait if we
+ * see an I/O already in progress. We already hold BM_IO_IN_PROGRESS
+ * for the head block, so we should get on with that I/O as soon as
+ * possible. We'll come back to this block again, above.
+ */
+ while ((i + 1) < nblocks &&
+ WaitReadBuffersCanStartIO(buffers[i + 1], true))
+ {
+ /* Must be consecutive block numbers. */
+ Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+ BufferGetBlockNumber(buffers[i]) + 1);
+
+ io_buffers[io_buffers_len] = buffers[++i];
+ io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+ }
+
+ io_start = pgstat_prepare_io_time(track_io_timing);
+ smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+ pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+ io_buffers_len);
+
+ /* Verify each block we read, and terminate the I/O. */
+ for (int j = 0; j < io_buffers_len; ++j)
+ {
+ BufferDesc *bufHdr;
+ Block bufBlock;
+
+ if (isLocalBuf)
+ {
+ bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+ bufBlock = LocalBufHdrGetBlock(bufHdr);
+ }
+ else
+ {
+ bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+ bufBlock = BufHdrGetBlock(bufHdr);
+ }
+
+ /* check for garbage data */
+ if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+ PIV_LOG_WARNING | PIV_REPORT_STAT))
+ {
+ if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s; zeroing out page",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ memset(bufBlock, 0, BLCKSZ);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ }
+
+ /* Terminate I/O and set BM_VALID. */
+ if (isLocalBuf)
+ {
+ uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ /* Set BM_VALID, terminate IO, and wake up any waiters */
+ TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ }
+
+ /* Report I/Os as completing individually. */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ false);
+ }
+
+ VacuumPageMiss += io_buffers_len;
+ if (VacuumCostActive)
+ VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+ }
}
/*
- * BufferAlloc -- subroutine for ReadBuffer. Handles lookup of a shared
- * buffer. If no buffer exists already, selects a replacement
- * victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers. Handles lookup of a shared
+ * buffer. If no buffer exists already, selects a replacement victim and
+ * evicts the old page, but does NOT read in new page.
*
* "strategy" can be a buffer replacement strategy object, or NULL for
* the default strategy. The selected buffer's usage_count is advanced when
@@ -1223,11 +1475,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
*
* The returned buffer is pinned and is already marked as holding the
* desired page. If it already did have the desired page, *foundPtr is
- * set true. Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true. Otherwise, *foundPtr is set false.
*
* io_context is passed as an output parameter to avoid calling
* IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1286,19 +1534,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(buf, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return buf;
@@ -1363,19 +1602,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(existing_buf_hdr, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return existing_buf_hdr;
@@ -1407,15 +1637,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
LWLockRelease(newPartitionLock);
/*
- * Buffer contents are currently invalid. Try to obtain the right to
- * start I/O. If StartBufferIO returns false, then someone else managed
- * to read it before we did, so there's nothing left for BufferAlloc() to
- * do.
+ * Buffer contents are currently invalid.
*/
- if (StartBufferIO(victim_buf_hdr, true))
- *foundPtr = false;
- else
- *foundPtr = true;
+ *foundPtr = false;
return victim_buf_hdr;
}
@@ -1769,7 +1993,7 @@ again:
* pessimistic, but outside of toy-sized shared_buffers it should allow
* sufficient pins.
*/
-static void
+void
LimitAdditionalPins(uint32 *additional_pins)
{
uint32 max_backends;
@@ -2034,7 +2258,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
buf_state &= ~BM_VALID;
UnlockBufHdr(existing_hdr, buf_state);
- } while (!StartBufferIO(existing_hdr, true));
+ } while (!StartBufferIO(existing_hdr, true, false));
}
else
{
@@ -2057,7 +2281,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
LWLockRelease(partition_lock);
/* XXX: could combine the locked operations in it with the above */
- StartBufferIO(victim_buf_hdr, true);
+ StartBufferIO(victim_buf_hdr, true, false);
}
}
@@ -2372,7 +2596,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
else
{
/*
- * If we previously pinned the buffer, it must surely be valid.
+ * If we previously pinned the buffer, it is likely to be valid, but
+ * it may not be if StartReadBuffers() was called and
+ * WaitReadBuffers() hasn't been called yet. We'll check by loading
+ * the flags without locking. This is racy, but it's OK to return
+ * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+ * it'll see that it's now valid.
*
* Note: We deliberately avoid a Valgrind client request here.
* Individual access methods can optionally superimpose buffer page
@@ -2381,7 +2610,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
* that the buffer page is legitimately non-accessible here. We
* cannot meddle with that.
*/
- result = true;
+ result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
}
ref->refcount++;
@@ -3449,7 +3678,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
* someone else flushed the buffer before we could, so we need not do
* anything.
*/
- if (!StartBufferIO(buf, false))
+ if (!StartBufferIO(buf, false, false))
return;
/* Setup error traceback support for ereport() */
@@ -5184,9 +5413,15 @@ WaitIO(BufferDesc *buf)
*
* Returns true if we successfully marked the buffer as I/O busy,
* false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend. In that case, false indicates either that the I/O was already
+ * finished, or is still in progress. This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
*/
static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
{
uint32 buf_state;
@@ -5199,6 +5434,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
if (!(buf_state & BM_IO_IN_PROGRESS))
break;
UnlockBufHdr(buf, buf_state);
+ if (nowait)
+ return false;
WaitIO(buf);
}
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index fcfac335a57..985a2c7049c 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -108,10 +108,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
* LocalBufferAlloc -
* Find or create a local buffer for the given page of the given relation.
*
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local. Also, IO_IN_PROGRESS
- * does not get set. Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local. We support only default access
+ * strategy (hence, usage_count is always advanced).
*/
BufferDesc *
LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -287,7 +286,7 @@ GetLocalVictimBuffer(void)
}
/* see LimitAdditionalPins() */
-static void
+void
LimitAdditionalLocalPins(uint32 *additional_pins)
{
uint32 max_pins;
@@ -297,9 +296,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
/*
* In contrast to LimitAdditionalPins() other backends don't play a role
- * here. We can allow up to NLocBuffer pins in total.
+ * here. We can allow up to NLocBuffer pins in total, but it might not be
+ * initialized yet, so read num_temp_buffers.
*/
- max_pins = (NLocBuffer - NLocalPinnedBuffers);
+ max_pins = (num_temp_buffers - NLocalPinnedBuffers);
if (*additional_pins >= max_pins)
*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+subdir('aio')
subdir('buffer')
subdir('file')
subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
#ifndef BUFMGR_H
#define BUFMGR_H
+#include "port/pg_iovec.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
#define BUFFER_LOCK_SHARE 1
#define BUFFER_LOCK_EXCLUSIVE 2
+/*
+ * Maximum number of buffers for multi-buffer I/O functions. This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
/*
* prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy,
bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers(). Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+ /* Parameters passed in to StartReadBuffers(). */
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ ForkNumber forknum;
+ BlockNumber blocknum;
+ int nblocks;
+ BufferAccessStrategy strategy;
+ int flags;
+
+ /* Range of buffers, if we need to perform a read. */
+ int io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forknum,
+ BlockNumber blocknum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
extern void ReleaseBuffer(Buffer buffer);
extern void UnlockReleaseBuffer(Buffer buffer);
extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
extern bool BgBufferSync(struct WritebackContext *wb_context);
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
/* in buf_init.c */
extern void InitBufferPool(void);
extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected. Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet. This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_private_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 95ae7845d86..aea8babd71a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
PgXmlErrorContext
PgXmlStrictness
Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
ReScanForeignScan_function
ReadBufPtrType
ReadBufferMode
+ReadBuffersOperation
ReadBytePtrType
ReadExtraTocPtrType
ReadFunc
--
2.40.1
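(For readers skimming the patch: below is a minimal sketch of how a caller might drive the two-step StartReadBuffers()/WaitReadBuffers() API declared above. It is illustration only, not code from the patch set; 'rel', 'blkno', and 'strategy' stand in for the caller's own relation, start block, and buffer access strategy.)

	/* Sketch only: read a short run of consecutive blocks with the new API. */
	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
	ReadBuffersOperation operation;
	int			nblocks = Min(4, MAX_BUFFERS_PER_TRANSFER);

	/*
	 * Pin up to nblocks consecutive blocks starting at blkno.  nblocks is
	 * updated to the number actually covered.  If true is returned, an I/O
	 * was started and WaitReadBuffers() must be called with the same
	 * operation before the pages are accessed.
	 */
	if (StartReadBuffers(BMR_REL(rel), buffers, MAIN_FORKNUM, blkno,
						 &nblocks, strategy, 0, &operation))
		WaitReadBuffers(&operation);

	for (int i = 0; i < nblocks; i++)
	{
		Page		page = BufferGetPage(buffers[i]);

		/* ... examine page ... */
		ReleaseBuffer(buffers[i]);
	}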
v5a-0006-Vacuum-first-pass-uses-Streaming-Read-interface.patchtext/x-diff; charset=us-asciiDownload
From 2e5ed537b9053ff3212177c1732d6afa2100fa0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 11:29:02 -0500
Subject: [PATCH v5a 6/7] Vacuum first pass uses Streaming Read interface
Now vacuum's first pass, which HOT prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by implementing a
streaming read callback which invokes heap_vac_scan_get_next_block().
---
src/backend/access/heap/vacuumlazy.c | 79 +++++++++++++++++++++-------
1 file changed, 59 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65d257aab83..fbbc87938e4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -54,6 +54,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/streaming_read.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
@@ -168,7 +169,12 @@ typedef struct LVRelState
char *relnamespace;
char *relname;
char *indname; /* Current index name */
- BlockNumber blkno; /* used only for heap operations */
+
+ /*
+ * The current block being processed by vacuum. Used only for heap
+ * operations. Primarily for error reporting and logging.
+ */
+ BlockNumber blkno;
OffsetNumber offnum; /* used only for heap operations */
VacErrPhase phase;
bool verbose; /* VACUUM VERBOSE? */
@@ -189,6 +195,12 @@ typedef struct LVRelState
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ /*
+ * The most recent block submitted in the streaming read callback by the
+ * first vacuum pass.
+ */
+ BlockNumber blkno_prefetch;
+
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
@@ -232,7 +244,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
+static void heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
BlockNumber *blkno,
bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
@@ -416,6 +428,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->nonempty_pages = 0;
/* dead_items_alloc allocates vacrel->dead_items later on */
+ /* relies on InvalidBlockNumber overflowing to 0 */
+ vacrel->blkno_prefetch = InvalidBlockNumber;
+
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
vacrel->new_live_tuples = 0;
@@ -776,6 +791,22 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
}
}
+static BlockNumber
+vacuum_scan_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private, void *per_buffer_data)
+{
+ LVRelState *vacrel = pgsr_private;
+ bool *all_visible_according_to_vm = per_buffer_data;
+
+ vacrel->blkno_prefetch++;
+
+ heap_vac_scan_get_next_block(vacrel,
+ vacrel->blkno_prefetch, &vacrel->blkno_prefetch,
+ all_visible_according_to_vm);
+
+ return vacrel->blkno_prefetch;
+}
+
/*
* lazy_scan_heap() -- workhorse function for VACUUM
*
@@ -815,12 +846,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
BlockNumber rel_pages = vacrel->rel_pages,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm;
- /* relies on InvalidBlockNumber overflowing to 0 */
- BlockNumber blkno = InvalidBlockNumber;
VacDeadItems *dead_items = vacrel->dead_items;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -828,6 +858,11 @@ lazy_scan_heap(LVRelState *vacrel)
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64 initprog_val[3];
+ PgStreamingRead *pgsr;
+
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(bool), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, vacuum_scan_pgsr_next);
/* Report that we're scanning the heap, advertising total # of blocks */
initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
@@ -838,13 +873,19 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->skip.next_unskippable_block = InvalidBlockNumber;
vacrel->skip.vmbuffer = InvalidBuffer;
- while (heap_vac_scan_get_next_block(vacrel, blkno + 1,
- &blkno, &all_visible_according_to_vm))
+ while (BufferIsValid(buf =
+ pg_streaming_read_buffer_get_next(pgsr, (void **) &all_visible_according_to_vm)))
{
- Buffer buf;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ BlockNumber blkno;
+
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
+
+ CheckBufferIsPinnedOnce(buf);
+
+ page = BufferGetPage(buf);
vacrel->scanned_pages++;
@@ -912,9 +953,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vacrel->skip.vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
@@ -970,7 +1008,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- all_visible_according_to_vm,
+ *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1027,7 +1065,7 @@ lazy_scan_heap(LVRelState *vacrel)
}
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, vacrel->rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1042,6 +1080,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ pg_streaming_read_free(pgsr);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1053,11 +1093,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (vacrel->rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, vacrel->rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, vacrel->rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1090,7 +1130,7 @@ lazy_scan_heap(LVRelState *vacrel)
*
* The block number and visibility status of the next block to process are set
* in blkno and all_visible_according_to_vm. heap_vac_scan_get_next_block()
- * returns false if there are no further blocks to process.
+ * sets blkno to InvalidBlockNumber if there are no further blocks to process.
*
* vacrel is an in/out parameter here; vacuum options and information about the
* relation are read and vacrel->skippedallvis is set to ensure we don't
@@ -1110,7 +1150,7 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static bool
+static void
heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
BlockNumber *blkno, bool *all_visible_according_to_vm)
{
@@ -1119,7 +1159,7 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
if (next_block >= vacrel->rel_pages)
{
*blkno = InvalidBlockNumber;
- return false;
+ return;
}
if (vacrel->skip.next_unskippable_block == InvalidBlockNumber ||
@@ -1214,7 +1254,6 @@ heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
*all_visible_according_to_vm = true;
*blkno = next_block;
- return true;
}
/*
--
2.40.1
v5a-0007-Vacuum-second-pass-uses-Streaming-Read-interface.patchtext/x-diff; charset=us-asciiDownload
From 84a5907b486d1f6c2ffe029a2e28fc557065739f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 27 Feb 2024 14:35:36 -0500
Subject: [PATCH v5a 7/7] Vacuum second pass uses Streaming Read interface
Now vacuum's second pass, which removes dead items referring to dead
tuples catalogued in the first pass, uses the streaming read API by
implementing a streaming read callback which returns the next block
containing previously catalogued dead items. A new struct,
VacReapBlkState, is introduced to provide the caller with the starting
and ending indexes of dead items to vacuum.
ci-os-only:
---
src/backend/access/heap/vacuumlazy.c | 110 ++++++++++++++++++++-------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 85 insertions(+), 26 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fbbc87938e4..68c146984b1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -201,6 +201,12 @@ typedef struct LVRelState
*/
BlockNumber blkno_prefetch;
+ /*
+ * The index of the next TID in dead_items to reap during the second
+ * vacuum pass.
+ */
+ int idx_prefetch;
+
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
@@ -242,6 +248,21 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+/*
+ * State set up in streaming read callback during vacuum's second pass which
+ * removes dead items referring to dead tuples cataloged in the first pass
+ */
+typedef struct VacReapBlkState
+{
+ /*
+ * The indexes of the TIDs of the first and last dead tuples in a single
+ * block in the currently vacuumed relation. The callback will set these
+ * up prior to adding this block to the stream.
+ */
+ int start_idx;
+ int end_idx;
+} VacReapBlkState;
+
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
static void heap_vac_scan_get_next_block(LVRelState *vacrel, BlockNumber next_block,
@@ -260,8 +281,9 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
-static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
- Buffer buffer, int index, Buffer vmbuffer);
+static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer,
+ VacReapBlkState *rbstate);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
@@ -2401,6 +2423,37 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_data)
+{
+ BlockNumber blkno;
+ LVRelState *vacrel = pgsr_private;
+ VacReapBlkState *rbstate = per_buffer_data;
+
+ VacDeadItems *dead_items = vacrel->dead_items;
+
+ if (vacrel->idx_prefetch == dead_items->num_items)
+ return InvalidBlockNumber;
+
+ blkno = ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+ rbstate->start_idx = vacrel->idx_prefetch;
+
+ for (; vacrel->idx_prefetch < dead_items->num_items; vacrel->idx_prefetch++)
+ {
+ BlockNumber curblkno =
+ ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+
+ if (blkno != curblkno)
+ break; /* past end of tuples for this block */
+ }
+
+ rbstate->end_idx = vacrel->idx_prefetch;
+
+ return blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2422,7 +2475,9 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
- int index = 0;
+ Buffer buf;
+ PgStreamingRead *pgsr;
+ VacReapBlkState *rbstate;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2440,17 +2495,21 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
VACUUM_ERRCB_PHASE_VACUUM_HEAP,
InvalidBlockNumber, InvalidOffsetNumber);
- while (index < vacrel->dead_items->num_items)
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(VacReapBlkState), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, vacuum_reap_lp_pgsr_next);
+
+ while (BufferIsValid(buf =
+ pg_streaming_read_buffer_get_next(pgsr,
+ (void **) &rbstate)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
vacuum_delay_point();
- blkno = ItemPointerGetBlockNumber(&vacrel->dead_items->items[index]);
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
/*
* Pin the visibility map page in case we need to mark the page
@@ -2460,10 +2519,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
- index = lazy_vacuum_heap_page(vacrel, blkno, buf, index, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, vmbuffer, rbstate);
/* Now that we've vacuumed the page, record its available space */
page = BufferGetPage(buf);
@@ -2482,14 +2539,16 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
- Assert(index > 0);
+ Assert(rbstate->end_idx > 0);
Assert(vacrel->num_index_scans > 1 ||
- (index == vacrel->lpdead_items &&
+ (rbstate->end_idx == vacrel->lpdead_items &&
vacuumed_pages == vacrel->lpdead_item_pages));
+ pg_streaming_read_free(pgsr);
+
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
- vacrel->relname, (long long) index, vacuumed_pages)));
+ vacrel->relname, (long long) rbstate->end_idx, vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -2503,13 +2562,12 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* cleanup lock is also acceptable). vmbuffer must be valid and already have
* a pin on blkno's visibility map page.
*
- * index is an offset into the vacrel->dead_items array for the first listed
- * LP_DEAD item on the page. The return value is the first index immediately
- * after all LP_DEAD items for the same page in the array.
+ * Given a block and dead items recorded during the first pass, set those items
+ * unused and truncate the line pointer array. Update the VM as appropriate.
*/
-static int
-lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
- int index, Buffer vmbuffer)
+static void
+lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer, VacReapBlkState *rbstate)
{
VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
@@ -2530,16 +2588,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
START_CRIT_SECTION();
- for (; index < dead_items->num_items; index++)
+ for (int i = rbstate->start_idx; i < rbstate->end_idx; i++)
{
- BlockNumber tblk;
OffsetNumber toff;
+ ItemPointer dead_item;
ItemId itemid;
- tblk = ItemPointerGetBlockNumber(&dead_items->items[index]);
- if (tblk != blkno)
- break; /* past end of tuples for this block */
- toff = ItemPointerGetOffsetNumber(&dead_items->items[index]);
+ dead_item = &dead_items->items[i];
+
+ Assert(ItemPointerGetBlockNumber(dead_item) == blkno);
+
+ toff = ItemPointerGetOffsetNumber(dead_item);
itemid = PageGetItemId(page, toff);
Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
@@ -2609,7 +2668,6 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
- return index;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aea8babd71a..a8f0b5f091d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2972,6 +2972,7 @@ VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
+VacReapBlkState
ValidIOData
ValidateIndexState
ValuesScan
--
2.40.1
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
On 27/02/2024 21:47, Melanie Plageman wrote:
The attached v5 has some simplifications when compared to v4 but takes
largely the same approach. 0001-0004 are refactoring
I'm looking at just these 0001-0004 patches for now. I like those changes a
lot for the sake of readability even without any of the later patches.
Thanks! And thanks so much for the review!
I've done a small performance experiment comparing a branch with all of
the patches applied (your v6 0001-0009) with master. I made an 11 GB
table that has 1,394,328 blocks. For setup, I vacuumed it to update the
VM and made sure it was entirely in shared buffers. All of this was to
make sure all of the blocks would be skipped and we spend the majority
of the time spinning through the lazy_scan_heap() code. Then I ran
vacuum again (the actual test). I saw vacuum go from 13 ms to 10 ms
with the patches applied.
I think I need to do some profiling to see if the difference is actually
due to our code changes, but I thought I would share preliminary
results.
I made some further changes. I kept them as separate commits for easier
review, see the commit messages for details. Any thoughts on those changes?
I've given some inline feedback on most of the extra patches you added.
Short answer is they all seem fine to me except I have reservations
about 0008 because of the number of blkno variables flying around. I
didn't have a chance to rebase these into my existing changes today, so
either I will do it tomorrow or, if you are feeling like you're on a
roll and want to do it, that also works!
I feel the heap_vac_scan_get_next_block() function could use some love. Maybe
just some rewording of the comments, or maybe some other refactoring; not
sure. But I'm pretty happy with the function signature and how it's called.
I was wondering if we should remove the "get" and just go with
heap_vac_scan_next_block(). I didn't do that originally because I didn't
want to imply that the next block was literally the sequentially next
block, but I think maybe I was overthinking it.
Another idea is to call it heap_scan_vac_next_block() and then the order
of the words is more like the table AM functions that get the next block
(e.g. heapam_scan_bitmap_next_block()). Though maybe we don't want it to
be too similar to those since this isn't a table AM callback.
As for other refactoring and other rewording of comments and such, I
will take a pass at this tomorrow.
BTW, do we have tests that would fail if we botched up
heap_vac_scan_get_next_block() so that it would skip pages incorrectly, for
example? Not asking you to write them for this patch, but I'm just
wondering.
So, while developing this, when I messed up and skipped blocks I
shouldn't, vacuum would error out with the "found xmin from before
relfrozenxid" error -- which would cause random tests to fail. I know
that's not a correctly failing test of this code. I think there might be
some tests in the verify_heapam tests that could/do test this kind of
thing but I don't remember them failing for me during development -- so
I didn't spend much time looking at them.
I would also sometimes get freespace or VM tests that would fail because
those blocks that are incorrectly skipped were meant to be reflected in
the FSM or VM in those tests.
All of that is to say, perhaps we should write a more targeted test?
When I was writing the code, I added logging of skipped blocks and then
came up with different scenarios and ran them on master and with the
patch and diffed the logs.
From b4047b941182af0643838fde056c298d5cc3ae32 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:13:42 +0200
Subject: [PATCH v6 5/9] Remove unused 'skipping_current_range' field
---
 src/backend/access/heap/vacuumlazy.c | 2 --
 1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 65d257aab83..51391870bf3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,8 +217,6 @@ typedef struct LVRelState
 Buffer vmbuffer;
 /* Next unskippable block's visibility status */
 bool next_unskippable_allvis;
- /* Whether or not skippable blocks should be skipped */
- bool skipping_current_range;
 } skip;
 } LVRelState;
--
2.39.2
Oops! I thought I removed this. I must have forgotten
From 27e431e8dc69bbf09d831cb1cf2903d16f177d74 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:58:57 +0200
Subject: [PATCH v6 6/9] Move vmbuffer back to a local variable in
 lazy_scan_heap()
It felt confusing that we passed around the current block, 'blkno', as
an argument to lazy_scan_new_or_empty() and lazy_scan_prune(), but
'vmbuffer' was accessed directly in the 'scan_state'.
It was also a bit vague, when exactly 'vmbuffer' was valid. Calling
heap_vac_scan_get_next_block() set it, sometimes, to a buffer that
might or might not contain the VM bit for 'blkno'. But other
functions, like lazy_scan_prune(), assumed it to contain the correct
buffer. That was fixed up by visibilitymap_pin(). But clearly it was not
"owned" by heap_vac_scan_get_next_block(), like the other 'scan_state'
fields.
I moved it back to a local variable, like it was. Maybe there would be
even better ways to handle it, but at least this is not worse than
what we have in master currently.
I'm fine with this. I did it the way I did (grouping it with the
"next_unskippable_block" in the skip struct), because I think that this
vmbuffer is always the buffer containing the VM bit for the next
unskippable block -- which sometimes is the block returned by
heap_vac_scan_get_next_block() and sometimes isn't.
I agree it might be best as a local variable but perhaps we could retain
the comment about it being the block of the VM containing the bit for the
next unskippable block. (Honestly, the whole thing is very confusing).
From 519e26a01b6e6974f9e0edb94b00756af053f7ee Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:27:57 +0200
Subject: [PATCH v6 7/9] Rename skip_state
I don't want to emphasize the "skipping" part. Rather, it's the state
owned by the heap_vac_scan_get_next_block() function.
This makes sense to me. Skipping should be a private detail of vacuum's
get_next_block functionality. Though the name is a bit long. Maybe we
don't need the "get" and "state" parts (it is already in a struct with
state in the name)?
From 6dfae936a29e2d3479273f8ab47778a596258b16 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 21:03:19 +0200
Subject: [PATCH v6 8/9] Track 'current_block' in the skip state
The caller was expected to always pass last blk + 1. It's not clear if
the next_unskippable block accounting would work correctly if you
passed something else. So rather than expecting the caller to do that,
have heap_vac_scan_get_next_block() keep track of the last returned
block itself, in the 'skip' state.
This is largely redundant with the LVRelState->blkno field. But that
one is currently only used for error reporting, so it feels best to
give heap_vac_scan_get_next_block() its own field that it owns.
I understand and agree with you that relying on blkno + 1 is bad and we
should make the "next_block" state keep track of the current block.
Though, I now find it easy to confuse
lvrelstate->get_next_block_state->current_block, lvrelstate->blkno and
the local variable blkno in lazy_scan_heap(). I think it is a naming
thing and not that we shouldn't have all three. I'll think more about it
in the morning.
From 619556cad4aad68d1711c12b962e9002e56d8db2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 21:35:11 +0200
Subject: [PATCH v6 9/9] Comment & whitespace cleanup
I moved some of the paragraphs to inside the
heap_vac_scan_get_next_block() function. I found the explanation in
the function comment at the old place to be too much detail. Someone
looking at the function signature and how to call it would not care
about all the details of what can or cannot be skipped.
LGTM.
Thanks again.
- Melanie
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
I made some further changes. I kept them as separate commits for easier
review, see the commit messages for details. Any thoughts on those changes?
I've given some inline feedback on most of the extra patches you added.
Short answer is they all seem fine to me except I have reservations
about 0008 because of the number of blkno variables flying around. I
didn't have a chance to rebase these into my existing changes today, so
either I will do it tomorrow or, if you are feeling like you're on a
roll and want to do it, that also works!
Attached v7 contains all of the changes that you suggested plus some
additional cleanups here and there.
I feel the heap_vac_scan_get_next_block() function could use some love. Maybe
just some rewording of the comments, or maybe some other refactoring; not
sure. But I'm pretty happy with the function signature and how it's called.
I've cleaned up the comments on heap_vac_scan_next_block() in the first
couple patches (not so much in the streaming read user). Let me know if
it addresses your feelings or if I should look for other things I could
change.
I will say that now all of the variable names are *very* long. I didn't
want to remove the "state" from LVRelState->next_block_state. (In fact, I
kind of miss the "get". But I had to draw the line somewhere.) I think
without "state" in the name, next_block sounds too much like a function.
Any ideas for shortening the names of next_block_state and its members
or are you fine with them?
I was wondering if we should remove the "get" and just go with
heap_vac_scan_next_block(). I didn't do that originally because I didn't
want to imply that the next block was literally the sequentially next
block, but I think maybe I was overthinking it.
Another idea is to call it heap_scan_vac_next_block() and then the order
of the words is more like the table AM functions that get the next block
(e.g. heapam_scan_bitmap_next_block()). Though maybe we don't want it to
be too similar to those since this isn't a table AM callback.
I've done a version of this.
From 27e431e8dc69bbf09d831cb1cf2903d16f177d74 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 6 Mar 2024 20:58:57 +0200
Subject: [PATCH v6 6/9] Move vmbuffer back to a local variable in
 lazy_scan_heap()
It felt confusing that we passed around the current block, 'blkno', as
an argument to lazy_scan_new_or_empty() and lazy_scan_prune(), but
'vmbuffer' was accessed directly in the 'scan_state'.
It was also a bit vague, when exactly 'vmbuffer' was valid. Calling
heap_vac_scan_get_next_block() set it, sometimes, to a buffer that
might or might not contain the VM bit for 'blkno'. But other
functions, like lazy_scan_prune(), assumed it to contain the correct
buffer. That was fixed up by visibilitymap_pin(). But clearly it was not
"owned" by heap_vac_scan_get_next_block(), like the other 'scan_state'
fields.
I moved it back to a local variable, like it was. Maybe there would be
even better ways to handle it, but at least this is not worse than
what we have in master currently.
I'm fine with this. I did it the way I did (grouping it with the
"next_unskippable_block" in the skip struct), because I think that this
vmbuffer is always the buffer containing the VM bit for the next
unskippable block -- which sometimes is the block returned by
heap_vac_scan_get_next_block() and sometimes isn't.
I agree it might be best as a local variable but perhaps we could retain
the comment about it being the block of the VM containing the bit for the
next unskippable block. (Honestly, the whole thing is very confusing).
In 0001-0004 I've stuck with only having the local variable vmbuffer in
lazy_scan_heap().
In 0006 (introducing pass 1 vacuum streaming read user) I added a
vmbuffer back to the next_block_state (while also keeping the local
variable vmbuffer in lazy_scan_heap()). The vmbuffer in lazy_scan_heap()
contains the block of the VM containing visibility information for the next
unskippable block or for the current block if its visibility information
happens to be in the same block of the VM as either 1) the next
unskippable block or 2) the most recently processed heap block.
Streaming read vacuum separates this visibility check in
heap_vac_scan_next_block() from the main loop of lazy_scan_heap(), so we
can't just use a local variable anymore. Now the local variable vmbuffer
in lazy_scan_heap() will only already contain the block with the visibility
information for the to-be-processed block if it happens to be in the
same VM block as the most recently processed heap block. That means
potentially more VM fetches.
However, by adding a vmbuffer to next_block_state, the callback may be
able to avoid extra VM fetches from one invocation to the next.
Note that next_block->current_block in the streaming read vacuum context
is actually the prefetch block.
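To make that concrete, the first-pass callback ends up shaped roughly like this -- a sketch only, based on the earlier 0006 callback but using the v7 function name; field names may differ slightly from the attached patch:

static BlockNumber
vacuum_scan_pgsr_next(PgStreamingRead *pgsr,
					  void *pgsr_private, void *per_buffer_data)
{
	LVRelState *vacrel = pgsr_private;
	bool	   *all_visible_according_to_vm = per_buffer_data;

	/* Advance past the block most recently submitted to the stream. */
	vacrel->blkno_prefetch++;

	/*
	 * Pick the next block to submit, consulting the VM through the vmbuffer
	 * kept in the next_block state, so consecutive invocations can often
	 * reuse the same VM page.
	 */
	heap_vac_scan_next_block(vacrel, vacrel->blkno_prefetch,
							 &vacrel->blkno_prefetch,
							 all_visible_according_to_vm);

	return vacrel->blkno_prefetch;
}

The per-buffer data slot then carries the all-visible flag through to the consumer loop in lazy_scan_heap().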
- Melanie
Attachments:
v7-0001-lazy_scan_skip-remove-unneeded-local-var-nskippab.patchtext/x-diff; charset=us-asciiDownload
From 5018cf4a882d48bc424301400cb40aa7a36955b1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:30:59 -0500
Subject: [PATCH v7 1/7] lazy_scan_skip remove unneeded local var
nskippable_blocks
nskippable_blocks can be easily derived from next_unskippable_block's
progress when compared to the passed in next_block.
---
src/backend/access/heap/vacuumlazy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b320c3f89a..1dc6cc8e4db 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1103,8 +1103,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
bool *next_unskippable_allvis, bool *skipping_current_range)
{
BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ next_unskippable_block = next_block;
bool skipsallvis = false;
*next_unskippable_allvis = true;
@@ -1161,7 +1160,6 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
}
/*
@@ -1174,7 +1172,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
+ if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
*skipping_current_range = false;
else
{
--
2.40.1
v7-0002-Add-lazy_scan_skip-next-block-state-to-LVRelState.patchtext/x-diff; charset=us-asciiDownload
From 4d49028df51550af931f70c21a920a22ff09ba48 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:22:12 -0500
Subject: [PATCH v7 2/7] Add lazy_scan_skip next block state to LVRelState
Future commits will remove all skipping logic from lazy_scan_heap() and
confine it to lazy_scan_skip(). To make those commits more clear, first
introduce a struct to LVRelState containing members tracking the current
block and the information needed to determine whether or not to skip
ranges less than SKIP_PAGES_THRESHOLD.
While we are at it, expand the comments in lazy_scan_skip(), including
descriptions of the role and expectations of its function parameters and
more detail on when skippable blocks are not skipped.
Discussion: https://postgr.es/m/flat/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 124 ++++++++++++++++++---------
1 file changed, 84 insertions(+), 40 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1dc6cc8e4db..accc6303fa2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,22 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /*
+ * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
+ * pages greater than SKIP_PAGES_THRESHOLD.
+ */
+ struct
+ {
+ /* The last block lazy_scan_skip() returned and vacuum processed */
+ BlockNumber current_block;
+ /* Next unskippable block */
+ BlockNumber next_unskippable_block;
+ /* Next unskippable block's visibility status */
+ bool next_unskippable_allvis;
+ /* Whether or not skippable blocks should be skipped */
+ bool skipping_current_range;
+ } next_block_state;
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -214,13 +230,9 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
-
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static void lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -803,12 +815,9 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -822,10 +831,12 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ /* Initialize for first lazy_scan_skip() call */
+ vacrel->next_block_state.current_block = InvalidBlockNumber;
+ vacrel->next_block_state.next_unskippable_block = InvalidBlockNumber;
+
/* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
+ lazy_scan_skip(vacrel, &vmbuffer);
for (blkno = 0; blkno < rel_pages; blkno++)
{
Buffer buf;
@@ -834,26 +845,21 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
+ if (blkno == vacrel->next_block_state.next_unskippable_block)
{
/*
* Can't skip this page safely. Must scan the page. But
* determine the next skippable range after the page first.
*/
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
-
- Assert(next_unskippable_block >= blkno + 1);
+ all_visible_according_to_vm = vacrel->next_block_state.next_unskippable_allvis;
+ lazy_scan_skip(vacrel, &vmbuffer);
}
else
{
/* Last page always scanned (may need to set nonempty_pages) */
Assert(blkno < rel_pages - 1);
- if (skipping_current_range)
+ if (vacrel->next_block_state.skipping_current_range)
continue;
/* Current range is too small to skip -- just scan the page */
@@ -1036,7 +1042,10 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
+ {
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1080,15 +1089,20 @@ lazy_scan_heap(LVRelState *vacrel)
* lazy_scan_skip() -- set up range of skippable blocks using visibility map.
*
* lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * blocks to skip via the visibility map.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * vacrel is an in/out parameter here; vacuum options and information about the
+ * relation are read, members of vacrel->next_block_state are read and set as
+ * bookkeeping for this function, and vacrel->skippedallvis is set to ensure we
+ * don't advance relfrozenxid when we have skipped vacuuming all-visible
+ * blocks.
+ *
+ * vmbuffer is an output parameter which, upon return, will contain the block
+ * from the VM containing visibility information for the next unskippable heap
+ * block. If we decide not to skip this heap block, the caller is responsible
+ * for fetching the correct VM block into vmbuffer before using it. This is
+ * okay as providing it as an output parameter is an optimization, not a
+ * requirement.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1098,15 +1112,38 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static void
+lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block;
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ /* Relies on InvalidBlockNumber + 1 == 0 */
+ BlockNumber next_block = vacrel->next_block_state.current_block + 1;
+ BlockNumber next_unskippable_block = next_block;
+
bool skipsallvis = false;
- *next_unskippable_allvis = true;
+ vacrel->next_block_state.next_unskippable_allvis = true;
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map. It is also unskippable if it is the last block in the
+ * relation, if the vacuum is an aggressive vacuum, or if
+ * DISABLE_PAGE_SKIPPING was passed to vacuum.
+ *
+ * Even if a block is skippable, we may choose not to skip it if the range
+ * of skippable blocks is too small (below SKIP_PAGES_THRESHOLD). As a
+ * consequence, we must keep track of the next truly unskippable block and
+ * its visibility status along with whether or not we are skipping the
+ * current range of skippable blocks. This can be used to derive the next
+ * block lazy_scan_heap() must process and its visibility status.
+ *
+ * The block number and visibility status of the next unskippable block
+ * are set in next_block_state->next_unskippable_block and
+ * next_unskippable_allvis. next_block_state->skipping_current_range
+ * indicates to the caller whether or not it is processing a skippable
+ * (and thus all-visible) block.
+ */
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
@@ -1116,7 +1153,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
+ vacrel->next_block_state.next_unskippable_allvis = false;
break;
}
@@ -1137,7 +1174,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
+ vacrel->next_block_state.next_unskippable_allvis = false;
break;
}
@@ -1162,6 +1199,10 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
next_unskippable_block++;
}
+ Assert(vacrel->next_block_state.next_unskippable_block >=
+ vacrel->next_block_state.current_block);
+ vacrel->next_block_state.next_unskippable_block = next_unskippable_block;
+
/*
* We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
* pages. Since we're reading sequentially, the OS should be doing
@@ -1172,16 +1213,19 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* non-aggressive VACUUMs. If the range has any all-visible pages then
* skipping makes updating relfrozenxid unsafe, which is a real downside.
*/
- if (next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (vacrel->next_block_state.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
+ vacrel->next_block_state.skipping_current_range = false;
else
{
- *skipping_current_range = true;
+ vacrel->next_block_state.skipping_current_range = true;
if (skipsallvis)
vacrel->skippedallvis = true;
}
- return next_unskippable_block;
+ if (next_unskippable_block >= rel_pages)
+ next_block = InvalidBlockNumber;
+
+ vacrel->next_block_state.current_block = next_block;
}
/*
--
2.40.1
v7-0003-Confine-vacuum-skip-logic-to-lazy_scan_skip.patch (text/x-diff; charset=us-ascii)
From 991c5a7ed46cc5dee36352194058ffb06a4e8670 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v7 3/7] Confine vacuum skip logic to lazy_scan_skip
In preparation for vacuum to use the streaming read interface [1] (and
eventually AIO), refactor vacuum's logic for skipping blocks such that
it is entirely confined to lazy_scan_skip(). This turns lazy_scan_skip()
and its next block state in LVRelState into an iterator which yields
blocks to lazy_scan_heap(). Such a structure is conducive to an async
interface. While we are at it, rename lazy_scan_skip() to
heap_vac_scan_next_block(), which now more accurately describes it.
By always calling heap_vac_scan_next_block(), instead of only when we
have reached the next unskippable block, we no longer need the
skipping_current_range variable. Furthermore, lazy_scan_heap() no longer
needs to manage the skipped range by checking if we reached the end in
order to then call heap_vac_scan_next_block(). And
heap_vac_scan_next_block() can derive the visibility status of a block
from whether or not we are in a skippable range; that is, if the next
block is equal to the next unskippable block, then the block isn't all
visible, otherwise it is.
[1] https://postgr.es/m/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 228 ++++++++++++++-------------
1 file changed, 115 insertions(+), 113 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index accc6303fa2..8d715caccc1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -206,19 +206,20 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
/*
- * Parameters maintained by lazy_scan_skip() to manage skipping ranges of
- * pages greater than SKIP_PAGES_THRESHOLD.
+ * Parameters maintained by heap_vac_scan_next_block() to manage getting
+ * the next block for vacuum to process.
*/
struct
{
- /* The last block lazy_scan_skip() returned and vacuum processed */
+ /*
+ * The last block heap_vac_scan_next_block() returned and vacuum
+ * processed
+ */
BlockNumber current_block;
/* Next unskippable block */
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
- /* Whether or not skippable blocks should be skipped */
- bool skipping_current_range;
} next_block_state;
} LVRelState;
@@ -232,7 +233,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer);
+static bool heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
+ BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -816,6 +819,8 @@ lazy_scan_heap(LVRelState *vacrel)
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
@@ -831,41 +836,18 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Initialize for first lazy_scan_skip() call */
+ /* Initialize for first heap_vac_scan_next_block() call */
vacrel->next_block_state.current_block = InvalidBlockNumber;
vacrel->next_block_state.next_unskippable_block = InvalidBlockNumber;
- /* Set up an initial range of skippable blocks using the visibility map */
- lazy_scan_skip(vacrel, &vmbuffer);
- for (blkno = 0; blkno < rel_pages; blkno++)
+ while (heap_vac_scan_next_block(vacrel, &vmbuffer,
+ &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == vacrel->next_block_state.next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = vacrel->next_block_state.next_unskippable_allvis;
- lazy_scan_skip(vacrel, &vmbuffer);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (vacrel->next_block_state.skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1086,10 +1068,16 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_next_block() -- get next block for vacuum to process
+ *
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum, using the visibility map, vacuum options, and various
+ * thresholds to skip blocks which do not need to be processed and set blkno to
+ * the next block that actually needs to be processed.
*
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map.
+ * The block number and visibility status of the next block to process are set
+ * in blkno and all_visible_according_to_vm. heap_vac_scan_next_block()
+ * returns false if there are no further blocks to process.
*
* vacrel is an in/out parameter here; vacuum options and information about the
* relation are read, members of vacrel->next_block_state are read and set as
@@ -1112,19 +1100,14 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static void
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
+ BlockNumber *blkno, bool *all_visible_according_to_vm)
{
- /* Use local variables for better optimized loop code */
- BlockNumber rel_pages = vacrel->rel_pages;
/* Relies on InvalidBlockNumber + 1 == 0 */
BlockNumber next_block = vacrel->next_block_state.current_block + 1;
- BlockNumber next_unskippable_block = next_block;
-
bool skipsallvis = false;
- vacrel->next_block_state.next_unskippable_allvis = true;
-
/*
* A block is unskippable if it is not all visible according to the
* visibility map. It is also unskippable if it is the last block in the
@@ -1144,88 +1127,107 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer)
* indicates to the caller whether or not it is processing a skippable
* (and thus all-visible) block.
*/
- while (next_unskippable_block < rel_pages)
+ if (next_block >= vacrel->rel_pages)
{
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- vmbuffer);
+ vacrel->next_block_state.current_block = *blkno = InvalidBlockNumber;
+ return false;
+ }
+
+ if (vacrel->next_block_state.next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->next_block_state.next_unskippable_block)
+ {
+ /* Use local variables for better optimized loop code */
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->next_block_state.next_unskippable_block;
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ while (++next_unskippable_block < rel_pages)
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- vacrel->next_block_state.next_unskippable_allvis = false;
- break;
- }
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ vmbuffer);
- /*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
- */
- if (next_unskippable_block == rel_pages - 1)
- break;
+ vacrel->next_block_state.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- vacrel->next_block_state.next_unskippable_allvis = false;
- break;
- }
+ if (!vacrel->next_block_state.next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
break;
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ {
+ /* Caller shouldn't rely on all_visible_according_to_vm */
+ vacrel->next_block_state.next_unskippable_allvis = false;
+ break;
+ }
+
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
- }
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
- vacuum_delay_point();
- next_unskippable_block++;
- }
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
- Assert(vacrel->next_block_state.next_unskippable_block >=
- vacrel->next_block_state.current_block);
- vacrel->next_block_state.next_unskippable_block = next_unskippable_block;
+ vacuum_delay_point();
+ }
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (vacrel->next_block_state.next_unskippable_block - next_block < SKIP_PAGES_THRESHOLD)
- vacrel->next_block_state.skipping_current_range = false;
- else
- {
- vacrel->next_block_state.skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->next_block_state.next_unskippable_block = next_unskippable_block;
+
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->next_block_state.next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->next_block_state.next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
}
- if (next_unskippable_block >= rel_pages)
- next_block = InvalidBlockNumber;
+ if (next_block == vacrel->next_block_state.next_unskippable_block)
+ *all_visible_according_to_vm = vacrel->next_block_state.next_unskippable_allvis;
+ else
+ *all_visible_according_to_vm = true;
- vacrel->next_block_state.current_block = next_block;
+ vacrel->next_block_state.current_block = *blkno = next_block;
+ return true;
}
/*
@@ -1798,8 +1800,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
- * of last lazy_scan_skip() call), and from all_visible and all_frozen
- * variables
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables
*/
if (!all_visible_according_to_vm && all_visible)
{
@@ -1834,8 +1836,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
- * cleared after lazy_scan_skip() was called, so we must recheck with
- * buffer lock before concluding that the VM is corrupt.
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
--
2.40.1
v7-0004-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patch (text/x-diff; charset=us-ascii)
From 01be526bfb450d795dca7cabe3cd97687ef60156 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v7 4/7] Remove unneeded vacuum_delay_point from
heap_vac_scan_next_block
heap_vac_scan_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8d715caccc1..d2c8f27fc57 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1195,8 +1195,6 @@ heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
*/
skipsallvis = true;
}
-
- vacuum_delay_point();
}
vacrel->next_block_state.next_unskippable_block = next_unskippable_block;
--
2.40.1
v7-0005-Streaming-Read-API.patch (text/x-diff; charset=us-ascii)
From 4143bef6230138d85772f76a3129433f40d4195d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Mar 2024 14:46:08 -0500
Subject: [PATCH v7 5/7] Streaming Read API
---
src/backend/storage/Makefile | 2 +-
src/backend/storage/aio/Makefile | 14 +
src/backend/storage/aio/meson.build | 5 +
src/backend/storage/aio/streaming_read.c | 612 ++++++++++++++++++++++
src/backend/storage/buffer/bufmgr.c | 641 ++++++++++++++++-------
src/backend/storage/buffer/localbuf.c | 14 +-
src/backend/storage/meson.build | 1 +
src/include/storage/bufmgr.h | 45 ++
src/include/storage/streaming_read.h | 52 ++
src/tools/pgindent/typedefs.list | 3 +
10 files changed, 1179 insertions(+), 210 deletions(-)
create mode 100644 src/backend/storage/aio/Makefile
create mode 100644 src/backend/storage/aio/meson.build
create mode 100644 src/backend/storage/aio/streaming_read.c
create mode 100644 src/include/storage/streaming_read.h
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS = aio buffer file freespace ipc large_object lmgr page smgr sync
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+ streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+ 'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..71f2c4a70b6
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,612 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+ bool need_wait;
+ bool advice_issued;
+ BlockNumber blocknum;
+ int nblocks;
+ int per_buffer_data_index;
+ Buffer buffers[MAX_BUFFERS_PER_TRANSFER];
+ ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+ int max_ios;
+ int ios_in_progress;
+ int max_pinned_buffers;
+ int pinned_buffers;
+ int pinned_buffers_trigger;
+ int next_tail_buffer;
+ int ramp_up_pin_limit;
+ int ramp_up_pin_stall;
+ bool finished;
+ bool advice_enabled;
+ void *pgsr_private;
+ PgStreamingReadBufferCB callback;
+
+ BufferAccessStrategy strategy;
+ BufferManagerRelation bmr;
+ ForkNumber forknum;
+
+ /* Sometimes we need to buffer one block for flow control. */
+ BlockNumber unget_blocknum;
+ void *unget_per_buffer_data;
+
+ /* Next expected block, for detecting sequential access. */
+ BlockNumber seq_blocknum;
+
+ /* Space for optional per-buffer private data. */
+ size_t per_buffer_data_size;
+ void *per_buffer_data;
+
+ /* Circular buffer of ranges. */
+ int size;
+ int head;
+ int tail;
+ PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy)
+{
+ PgStreamingRead *pgsr;
+ int size;
+ int max_ios;
+ uint32 max_pinned_buffers;
+
+
+ /*
+ * Decide how many assumed I/Os we will allow to run concurrently. That
+ * is, advice to the kernel to tell it that we will soon read. This
+ * number also affects how far we look ahead for opportunities to start
+ * more I/Os.
+ */
+ if (flags & PGSR_FLAG_MAINTENANCE)
+ max_ios = maintenance_io_concurrency;
+ else
+ max_ios = effective_io_concurrency;
+
+ /*
+ * The desired level of I/O concurrency controls how far ahead we are
+ * willing to look ahead. We also clamp it to at least
+ * MAX_BUFFERS_PER_TRANSFER so that we can have a chance to build up a full
+ * sized read, even when max_ios is zero.
+ */
+ max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+ /*
+ * The *_io_concurrency GUCs might be set to 0, but we want to allow at
+ * least one, to keep our gating logic simple.
+ */
+ max_ios = Max(max_ios, 1);
+
+ /*
+ * Don't allow this backend to pin too many buffers. For now we'll apply
+ * the limit for the shared buffer pool and the local buffer pool, without
+ * worrying which it is.
+ */
+ LimitAdditionalPins(&max_pinned_buffers);
+ LimitAdditionalLocalPins(&max_pinned_buffers);
+ Assert(max_pinned_buffers > 0);
+
+ /*
+ * pgsr->ranges is a circular buffer. When it is empty, head == tail.
+ * When it is full, there is an empty element between head and tail. Head
+ * can also be empty (nblocks == 0), therefore we need two extra elements
+ * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+ * maximum possible number of occupied ranges of the smallest possible
+ * size of one.
+ */
+ size = max_pinned_buffers + 2;
+
+ pgsr = (PgStreamingRead *)
+ palloc0(offsetof(PgStreamingRead, ranges) +
+ sizeof(pgsr->ranges[0]) * size);
+
+ pgsr->max_ios = max_ios;
+ pgsr->per_buffer_data_size = per_buffer_data_size;
+ pgsr->max_pinned_buffers = max_pinned_buffers;
+ pgsr->pgsr_private = pgsr_private;
+ pgsr->strategy = strategy;
+ pgsr->size = size;
+
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+ /*
+ * This system supports prefetching advice. As long as direct I/O isn't
+ * enabled, and the caller hasn't promised sequential access, we can use
+ * it.
+ */
+ if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+ (flags & PGSR_FLAG_SEQUENTIAL) == 0)
+ pgsr->advice_enabled = true;
+#endif
+
+ /*
+ * We start off building small ranges, but double that quickly, for the
+ * benefit of users that don't know how far ahead they'll read. This can
+ * be disabled by users that already know they'll read all the way.
+ */
+ if (flags & PGSR_FLAG_FULL)
+ pgsr->ramp_up_pin_limit = INT_MAX;
+ else
+ pgsr->ramp_up_pin_limit = 1;
+
+ /*
+ * We want to avoid creating ranges that are smaller than they could be
+ * just because we hit max_pinned_buffers. We only look ahead when the
+ * number of pinned buffers falls below this trigger number, or put
+ * another way, we stop looking ahead when we wouldn't be able to build a
+ * "full sized" range.
+ */
+ pgsr->pinned_buffers_trigger =
+ Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+ /* Space for the callback to store extra data along with each block. */
+ if (per_buffer_data_size)
+ pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+ return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb)
+{
+ PgStreamingRead *result;
+
+ result = pg_streaming_read_buffer_alloc_internal(flags,
+ pgsr_private,
+ per_buffer_data_size,
+ strategy);
+ result->callback = next_block_cb;
+ result->bmr = bmr;
+ result->forknum = forknum;
+
+ return result;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ int result;
+
+ /*
+ * Find slot in the circular buffer of per-buffer data, without using the
+ * expensive % operator.
+ */
+ result = range->per_buffer_data_index + n;
+ if (result >= pgsr->max_pinned_buffers)
+ result -= pgsr->max_pinned_buffers;
+ Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+ return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+ return (char *) pgsr->per_buffer_data +
+ pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ return get_per_buffer_data_by_index(pgsr,
+ get_per_buffer_data_index(pgsr,
+ range,
+ n));
+}
+
+/*
+ * Start reading the head range, and create a new head range. The new head
+ * range is returned. It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *head_range;
+ PgStreamingReadRange *new_head_range;
+ int nblocks_pinned;
+ int flags;
+
+ /* Caller should make sure we never exceed max_ios. */
+ Assert(pgsr->ios_in_progress < pgsr->max_ios);
+
+ /* Should only call if the head range has some blocks to read. */
+ head_range = &pgsr->ranges[pgsr->head];
+ Assert(head_range->nblocks > 0);
+
+ /*
+ * If advice hasn't been suppressed, this system supports it, and this
+ * isn't a strictly sequential pattern, then we'll issue advice.
+ */
+ if (pgsr->advice_enabled && head_range->blocknum != pgsr->seq_blocknum)
+ flags = READ_BUFFERS_ISSUE_ADVICE;
+ else
+ flags = 0;
+
+
+ /* Start reading as many blocks as we can from the head range. */
+ nblocks_pinned = head_range->nblocks;
+ head_range->need_wait =
+ StartReadBuffers(pgsr->bmr,
+ head_range->buffers,
+ pgsr->forknum,
+ head_range->blocknum,
+ &nblocks_pinned,
+ pgsr->strategy,
+ flags,
+ &head_range->operation);
+
+ /* Did that start an I/O? */
+ if (head_range->need_wait && (flags & READ_BUFFERS_ISSUE_ADVICE))
+ {
+ head_range->advice_issued = true;
+ pgsr->ios_in_progress++;
+ Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+ }
+
+ /*
+ * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+ * but always at least one.
+ */
+ Assert(nblocks_pinned <= head_range->nblocks);
+ Assert(nblocks_pinned >= 1);
+ pgsr->pinned_buffers += nblocks_pinned;
+
+ /*
+ * Remember where the next block would be after that, so we can detect
+ * sequential access next time.
+ */
+ pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+ /*
+ * Create a new head range. There must be space, because we have enough
+ * elements for every range to hold just one block, up to the pin limit.
+ */
+ Assert(pgsr->size > pgsr->max_pinned_buffers);
+ Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+ if (++pgsr->head == pgsr->size)
+ pgsr->head = 0;
+ new_head_range = &pgsr->ranges[pgsr->head];
+ new_head_range->nblocks = 0;
+ new_head_range->advice_issued = false;
+
+ /*
+ * If we didn't manage to start the whole read above, we split the range,
+ * moving the remainder into the new head range.
+ */
+ if (nblocks_pinned < head_range->nblocks)
+ {
+ int nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+ head_range->nblocks = nblocks_pinned;
+
+ new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+ new_head_range->nblocks = nblocks_remaining;
+ }
+
+ /* The new range has per-buffer data starting after the previous range. */
+ new_head_range->per_buffer_data_index =
+ get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+ return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+ BlockNumber result;
+
+ if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+ {
+ /*
+ * If we had to unget a block, now it is time to return that one
+ * again.
+ */
+ result = pgsr->unget_blocknum;
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+ /*
+ * The same per_buffer_data element must have been used, and still
+ * contains whatever data the callback wrote into it. So we just
+ * sanity-check that we were called with the value that
+ * pg_streaming_unget_block() pushed back.
+ */
+ Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+ }
+ else
+ {
+ /* Use the installed callback directly. */
+ result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+ }
+
+ return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later. This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+ Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+ pgsr->unget_blocknum = blocknum;
+ pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *range;
+
+ /*
+ * If we're still ramping up, we may have to stall to wait for buffers to
+ * be consumed first before we do any more prefetching.
+ */
+ if (pgsr->ramp_up_pin_stall > 0)
+ {
+ Assert(pgsr->pinned_buffers > 0);
+ return;
+ }
+
+ /*
+ * If we're finished or can't start more I/O, then don't look ahead.
+ */
+ if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * We'll also wait until the number of pinned buffers falls below our
+ * trigger level, so that we have the chance to create a full range.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ do
+ {
+ BlockNumber blocknum;
+ void *per_buffer_data;
+
+ /* Do we have a full-sized range? */
+ range = &pgsr->ranges[pgsr->head];
+ if (range->nblocks == lengthof(range->buffers))
+ {
+ /* Start as much of it as we can. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /* If we're now at the I/O limit, stop here. */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * If we couldn't form a full range, then stop here to avoid
+ * creating small I/O.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ /*
+ * That might have only been partially started, but always
+ * processes at least one so that'll do for now.
+ */
+ Assert(range->nblocks < lengthof(range->buffers));
+ }
+
+ /* Find per-buffer data slot for the next block. */
+ per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+ /* Find out which block the callback wants to read next. */
+ blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+ if (blocknum == InvalidBlockNumber)
+ {
+ /* End of stream. */
+ pgsr->finished = true;
+ break;
+ }
+
+ /*
+ * Is there a head range that we cannot extend, because the requested
+ * block is not consecutive?
+ */
+ if (range->nblocks > 0 &&
+ range->blocknum + range->nblocks != blocknum)
+ {
+ /* Yes. Start it, so we can begin building a new one. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * It's possible that it was only partially started, and we have a
+ * new range with the remainder. Keep starting I/Os until we get
+ * it all out of the way, or we hit the I/O limit.
+ */
+ while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * We have to 'unget' the block returned by the callback if we
+ * don't have enough I/O capacity left to start something.
+ */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ {
+ pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+ return;
+ }
+ }
+
+ /* If we have a new, empty range, initialize the start block. */
+ if (range->nblocks == 0)
+ {
+ range->blocknum = blocknum;
+ }
+
+ /* This block extends the range by one. */
+ Assert(range->blocknum + range->nblocks == blocknum);
+ range->nblocks++;
+
+ } while (pgsr->pinned_buffers + range->nblocks < pgsr->max_pinned_buffers &&
+ pgsr->pinned_buffers + range->nblocks < pgsr->ramp_up_pin_limit);
+
+ /* If we've hit the ramp-up limit, insert a stall. */
+ if (pgsr->pinned_buffers + range->nblocks >= pgsr->ramp_up_pin_limit)
+ {
+ /* Can't get here if an earlier stall hasn't finished. */
+ Assert(pgsr->ramp_up_pin_stall == 0);
+ /* Don't do any more prefetching until these buffers are consumed. */
+ pgsr->ramp_up_pin_stall = pgsr->ramp_up_pin_limit;
+ /* Double it. It will soon be out of the way. */
+ pgsr->ramp_up_pin_limit *= 2;
+ }
+
+ /* Start as much as we can. */
+ while (range->nblocks > 0)
+ {
+ range = pg_streaming_read_start_head_range(pgsr);
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ break;
+ }
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+ pg_streaming_read_look_ahead(pgsr);
+
+ /* See if we have one buffer to return. */
+ while (pgsr->tail != pgsr->head)
+ {
+ PgStreamingReadRange *tail_range;
+
+ tail_range = &pgsr->ranges[pgsr->tail];
+
+ /*
+ * Do we need to perform an I/O before returning the buffers from this
+ * range?
+ */
+ if (tail_range->need_wait)
+ {
+ WaitReadBuffers(&tail_range->operation);
+ tail_range->need_wait = false;
+
+ /*
+ * We don't really know if the kernel generated a physical I/O
+ * when we issued advice, let alone when it finished, but it has
+ * certainly finished now because we've performed the read.
+ */
+ if (tail_range->advice_issued)
+ {
+ Assert(pgsr->ios_in_progress > 0);
+ pgsr->ios_in_progress--;
+ }
+ }
+
+ /* Are there more buffers available in this range? */
+ if (pgsr->next_tail_buffer < tail_range->nblocks)
+ {
+ int buffer_index;
+ Buffer buffer;
+
+ buffer_index = pgsr->next_tail_buffer++;
+ buffer = tail_range->buffers[buffer_index];
+
+ Assert(BufferIsValid(buffer));
+
+ /* We are giving away ownership of this pinned buffer. */
+ Assert(pgsr->pinned_buffers > 0);
+ pgsr->pinned_buffers--;
+
+ if (pgsr->ramp_up_pin_stall > 0)
+ pgsr->ramp_up_pin_stall--;
+
+ if (per_buffer_data)
+ *per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+ return buffer;
+ }
+
+ /* Advance tail to next range, if there is one. */
+ if (++pgsr->tail == pgsr->size)
+ pgsr->tail = 0;
+ pgsr->next_tail_buffer = 0;
+
+ /*
+ * If tail crashed into head, and head is not empty, then it is time
+ * to start that range.
+ */
+ if (pgsr->tail == pgsr->head &&
+ pgsr->ranges[pgsr->head].nblocks > 0)
+ pg_streaming_read_start_head_range(pgsr);
+ }
+
+ Assert(pgsr->pinned_buffers == 0);
+
+ return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+ Buffer buffer;
+
+ /* Stop looking ahead. */
+ pgsr->finished = true;
+
+ /* Unpin anything that wasn't consumed. */
+ while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+ ReleaseBuffer(buffer);
+
+ Assert(pgsr->pinned_buffers == 0);
+ Assert(pgsr->ios_in_progress == 0);
+
+ /* Release memory. */
+ if (pgsr->per_buffer_data)
+ pfree(pgsr->per_buffer_data);
+
+ pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c5..729d1f91721 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
* and pin it so that no one can destroy it while this process
* is using it.
*
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ * two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
* ReleaseBuffer() -- unpin a buffer
*
* MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -471,10 +476,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
)
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy,
- bool *hit);
+ ReadBufferMode mode, BufferAccessStrategy strategy);
static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
ForkNumber fork,
BufferAccessStrategy strategy,
@@ -500,7 +504,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
static int SyncOneBuffer(int buf_id, bool skip_recently_used,
WritebackContext *wb_context);
static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
uint32 set_flag_bits, bool forget_owner);
static void AbortBufferIO(Buffer buffer);
@@ -781,7 +785,6 @@ Buffer
ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy)
{
- bool hit;
Buffer buf;
/*
@@ -794,15 +797,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions")));
- /*
- * Read the buffer, and update pgstat counters to reflect a cache hit or
- * miss.
- */
- pgstat_count_buffer_read(reln);
- buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
- forkNum, blockNum, mode, strategy, &hit);
- if (hit)
- pgstat_count_buffer_hit(reln);
+ buf = ReadBuffer_common(BMR_REL(reln),
+ forkNum, blockNum, mode, strategy);
+
return buf;
}
@@ -822,13 +819,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
BufferAccessStrategy strategy, bool permanent)
{
- bool hit;
-
SMgrRelation smgr = smgropen(rlocator, INVALID_PROC_NUMBER);
- return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
- RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
- mode, strategy, &hit);
+ return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+ RELPERSISTENCE_UNLOGGED),
+ forkNum, blockNum,
+ mode, strategy);
}
/*
@@ -994,35 +990,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
*/
if (buffer == InvalidBuffer)
{
- bool hit;
-
Assert(extended_by == 0);
- buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
- fork, extend_to - 1, mode, strategy,
- &hit);
+ buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
}
return buffer;
}
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK. The buffer must be already
+ * pinned. It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+ BufferDesc *bufHdr;
+ uint32 buf_state;
+
+ Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+ if (BufferIsLocal(buffer))
+ bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+ else
+ {
+ bufHdr = GetBufferDescriptor(buffer - 1);
+ if (mode == RBM_ZERO_AND_LOCK)
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ else
+ LockBufferForCleanup(buffer);
+ }
+
+ memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+ if (BufferIsLocal(buffer))
+ {
+ buf_state = pg_atomic_read_u32(&bufHdr->state);
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ buf_state = LockBufHdr(bufHdr);
+ buf_state |= BM_VALID;
+ UnlockBufHdr(bufHdr, buf_state);
+ }
+}
+
/*
* ReadBuffer_common -- common logic for all ReadBuffer variants
*
* *hit is set to true if the request was satisfied from shared buffer cache.
*/
static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy, bool *hit)
+ BufferAccessStrategy strategy)
{
- BufferDesc *bufHdr;
- Block bufBlock;
- bool found;
- IOContext io_context;
- IOObject io_object;
- bool isLocalBuf = SmgrIsTemp(smgr);
-
- *hit = false;
+ ReadBuffersOperation operation;
+ Buffer buffer;
+ int nblocks;
+ int flags;
/*
* Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1041,181 +1070,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
flags |= EB_LOCK_FIRST;
- return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
- forkNum, strategy, flags);
+ return ExtendBufferedRel(bmr, forkNum, strategy, flags);
}
- TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend);
+ nblocks = 1;
+ if (mode == RBM_ZERO_ON_ERROR)
+ flags = READ_BUFFERS_ZERO_ON_ERROR;
+ else
+ flags = 0;
+ if (StartReadBuffers(bmr,
+ &buffer,
+ forkNum,
+ blockNum,
+ &nblocks,
+ strategy,
+ flags,
+ &operation))
+ WaitReadBuffers(&operation);
+ Assert(nblocks == 1); /* single block can't be short */
+
+ if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+ ZeroBuffer(buffer, mode);
+
+ return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ BufferAccessStrategy strategy,
+ bool *foundPtr)
+{
+ BufferDesc *bufHdr;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ Assert(blockNum != P_NEW);
+ Assert(bmr.smgr);
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /*
- * We do not use a BufferAccessStrategy for I/O of temporary tables.
- * However, in some cases, the "strategy" may not be NULL, so we can't
- * rely on IOContextForStrategy() to set the right IOContext for us.
- * This may happen in cases like CREATE TEMPORARY TABLE AS...
- */
io_context = IOCONTEXT_NORMAL;
io_object = IOOBJECT_TEMP_RELATION;
- bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
- if (found)
- pgBufferUsage.local_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.local_blks_read++;
}
else
{
- /*
- * lookup the buffer. IO_IN_PROGRESS is set if the requested block is
- * not currently in memory.
- */
io_context = IOContextForStrategy(strategy);
io_object = IOOBJECT_RELATION;
- bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
- strategy, &found, io_context);
- if (found)
- pgBufferUsage.shared_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.shared_blks_read++;
}
- /* At this point we do NOT hold any locks. */
+ TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend);
- /* if it was already in the buffer pool, we're done */
- if (found)
+ ResourceOwnerEnlarge(CurrentResourceOwner);
+ if (isLocalBuf)
+ {
+ bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+ if (*foundPtr)
+ pgBufferUsage.local_blks_hit++;
+ }
+ else
+ {
+ bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+ strategy, foundPtr, io_context);
+ if (*foundPtr)
+ pgBufferUsage.shared_blks_hit++;
+ }
+ if (bmr.rel)
+ {
+ /*
+ * While pgBufferUsage's "read" counter isn't bumped unless we reach
+ * WaitReadBuffers() (so, not for hits, and not for buffers that are
+ * zeroed instead), the per-relation stats always count them.
+ */
+ pgstat_count_buffer_read(bmr.rel);
+ if (*foundPtr)
+ pgstat_count_buffer_hit(bmr.rel);
+ }
+ if (*foundPtr)
{
- /* Just need to update stats before we exit */
- *hit = true;
VacuumPageHit++;
pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
if (VacuumCostActive)
VacuumCostBalance += VacuumCostPageHit;
TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ }
- /*
- * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
- * on return.
- */
- if (!isLocalBuf)
- {
- if (mode == RBM_ZERO_AND_LOCK)
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
- LW_EXCLUSIVE);
- else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
- LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
- }
+ return BufferDescriptorGetBuffer(bufHdr);
+}
- return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks. On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() is not
+ * necessary. If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed. Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers(). In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation)
+{
+ int actual_nblocks = *nblocks;
+
+ if (bmr.rel)
+ {
+ bmr.smgr = RelationGetSmgr(bmr.rel);
+ bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
}
- /*
- * if we have gotten to this point, we have allocated a buffer for the
- * page but its contents are not yet valid. IO_IN_PROGRESS is set for it,
- * if it's a shared buffer.
- */
- Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* spinlock not needed */
+ operation->bmr = bmr;
+ operation->forknum = forkNum;
+ operation->blocknum = blockNum;
+ operation->buffers = buffers;
+ operation->nblocks = actual_nblocks;
+ operation->strategy = strategy;
+ operation->flags = flags;
- bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+ operation->io_buffers_len = 0;
- /*
- * Read in the page, unless the caller intends to overwrite it and just
- * wants us to allocate a buffer.
- */
- if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
- MemSet((char *) bufBlock, 0, BLCKSZ);
- else
+ for (int i = 0; i < actual_nblocks; ++i)
{
- instr_time io_start = pgstat_prepare_io_time(track_io_timing);
+ bool found;
- smgrread(smgr, forkNum, blockNum, bufBlock);
+ buffers[i] = PrepareReadBuffer(bmr,
+ forkNum,
+ blockNum + i,
+ strategy,
+ &found);
- pgstat_count_io_op_time(io_object, io_context,
- IOOP_READ, io_start, 1);
+ if (found)
+ {
+ /*
+ * Terminate the read as soon as we get a hit. It could be a
+ * single buffer hit, or it could be a hit that follows a readable
+ * range. We don't want to create more than one readable range,
+ * so we stop here.
+ */
+ actual_nblocks = operation->nblocks = *nblocks = i + 1;
+ }
+ else
+ {
+ /* Extend the readable range to cover this block. */
+ operation->io_buffers_len++;
+ }
+ }
- /* check for garbage data */
- if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
- PIV_LOG_WARNING | PIV_REPORT_STAT))
+ if (operation->io_buffers_len > 0)
+ {
+ if (flags & READ_BUFFERS_ISSUE_ADVICE)
{
- if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s; zeroing out page",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
- MemSet((char *) bufBlock, 0, BLCKSZ);
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
+ /*
+ * In theory we should only do this if PrepareReadBuffer() had to
+ * allocate new buffers above. That way, if two calls to
+ * StartReadBuffers() were made for the same blocks before
+ * WaitReadBuffers(), only the first would issue the advice.
+ * That'd be a better simulation of true asynchronous I/O, which
+ * would only start the I/O once, but isn't done here for
+ * simplicity. Note also that the following call might actually
+ * issue two advice calls if we cross a segment boundary; in a
+ * true asynchronous version we might choose to process only one
+ * real I/O at a time in that case.
+ */
+ smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
}
+
+ /* Indicate that WaitReadBuffers() should be called. */
+ return true;
}
+ else
+ {
+ return false;
+ }
+}
- /*
- * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
- * content lock before marking the page as valid, to make sure that no
- * other backend sees the zeroed page before the caller has had a chance
- * to initialize it.
- *
- * Since no-one else can be looking at the page contents yet, there is no
- * difference between an exclusive lock and a cleanup-strength lock. (Note
- * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
- * they assert that the buffer is already valid.)
- */
- if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
- !isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+ if (BufferIsLocal(buffer))
{
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+ BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+ return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
}
+ else
+ return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ int nblocks;
+ BlockNumber blocknum;
+ ForkNumber forknum;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ /*
+ * Currently operations are only allowed to include a read of some range,
+ * with an optional extra buffer that is already pinned at the end. So
+ * nblocks can be at most one more than io_buffers_len.
+ */
+ Assert((operation->nblocks == operation->io_buffers_len) ||
+ (operation->nblocks == operation->io_buffers_len + 1));
+ /* Find the range of the physical read we need to perform. */
+ nblocks = operation->io_buffers_len;
+ if (nblocks == 0)
+ return; /* nothing to do */
+
+ buffers = &operation->buffers[0];
+ blocknum = operation->blocknum;
+ forknum = operation->forknum;
+ bmr = operation->bmr;
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /* Only need to adjust flags */
- uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
-
- buf_state |= BM_VALID;
- pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ io_context = IOCONTEXT_NORMAL;
+ io_object = IOOBJECT_TEMP_RELATION;
}
else
{
- /* Set BM_VALID, terminate IO, and wake up any waiters */
- TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ io_context = IOContextForStrategy(operation->strategy);
+ io_object = IOOBJECT_RELATION;
}
- VacuumPageMiss++;
- if (VacuumCostActive)
- VacuumCostBalance += VacuumCostPageMiss;
+ /*
+ * We count all these blocks as read by this backend. This is traditional
+ * behavior, but might turn out to be not true if we find that someone
+ * else has beaten us and completed the read of some of these blocks. In
+ * that case the system globally double-counts, but we traditionally don't
+ * count this as a "hit", and we don't have a separate counter for "miss,
+ * but another backend completed the read".
+ */
+ if (isLocalBuf)
+ pgBufferUsage.local_blks_read += nblocks;
+ else
+ pgBufferUsage.shared_blks_read += nblocks;
- TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ for (int i = 0; i < nblocks; ++i)
+ {
+ int io_buffers_len;
+ Buffer io_buffers[MAX_BUFFERS_PER_TRANSFER];
+ void *io_pages[MAX_BUFFERS_PER_TRANSFER];
+ instr_time io_start;
+ BlockNumber io_first_block;
- return BufferDescriptorGetBuffer(bufHdr);
+ /*
+ * Skip this block if someone else has already completed it. If an
+ * I/O is already in progress in another backend, this will wait for
+ * the outcome: either done, or something went wrong and we will
+ * retry.
+ */
+ if (!WaitReadBuffersCanStartIO(buffers[i], false))
+ {
+ /*
+ * Report this as a 'hit' for this backend, even though it must
+ * have started out as a miss in PrepareReadBuffer().
+ */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ continue;
+ }
+
+ /* We found a buffer that we need to read in. */
+ io_buffers[0] = buffers[i];
+ io_pages[0] = BufferGetBlock(buffers[i]);
+ io_first_block = blocknum + i;
+ io_buffers_len = 1;
+
+ /*
+ * How many neighboring-on-disk blocks can we scatter-read into
+ * other buffers at the same time? In this case we don't wait if we
+ * see an I/O already in progress. We already hold BM_IO_IN_PROGRESS
+ * for the head block, so we should get on with that I/O as soon as
+ * possible. We'll come back to this block again, above.
+ */
+ while ((i + 1) < nblocks &&
+ WaitReadBuffersCanStartIO(buffers[i + 1], true))
+ {
+ /* Must be consecutive block numbers. */
+ Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+ BufferGetBlockNumber(buffers[i]) + 1);
+
+ io_buffers[io_buffers_len] = buffers[++i];
+ io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+ }
+
+ io_start = pgstat_prepare_io_time(track_io_timing);
+ smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+ pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+ io_buffers_len);
+
+ /* Verify each block we read, and terminate the I/O. */
+ for (int j = 0; j < io_buffers_len; ++j)
+ {
+ BufferDesc *bufHdr;
+ Block bufBlock;
+
+ if (isLocalBuf)
+ {
+ bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+ bufBlock = LocalBufHdrGetBlock(bufHdr);
+ }
+ else
+ {
+ bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+ bufBlock = BufHdrGetBlock(bufHdr);
+ }
+
+ /* check for garbage data */
+ if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+ PIV_LOG_WARNING | PIV_REPORT_STAT))
+ {
+ if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s; zeroing out page",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ memset(bufBlock, 0, BLCKSZ);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ }
+
+ /* Terminate I/O and set BM_VALID. */
+ if (isLocalBuf)
+ {
+ uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ /* Set BM_VALID, terminate IO, and wake up any waiters */
+ TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ }
+
+ /* Report I/Os as completing individually. */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ false);
+ }
+
+ VacuumPageMiss += io_buffers_len;
+ if (VacuumCostActive)
+ VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+ }
}
/*
- * BufferAlloc -- subroutine for ReadBuffer. Handles lookup of a shared
- * buffer. If no buffer exists already, selects a replacement
- * victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers. Handles lookup of a shared
+ * buffer. If no buffer exists already, selects a replacement victim and
+ * evicts the old page, but does NOT read in new page.
*
* "strategy" can be a buffer replacement strategy object, or NULL for
* the default strategy. The selected buffer's usage_count is advanced when
@@ -1223,11 +1475,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
*
* The returned buffer is pinned and is already marked as holding the
* desired page. If it already did have the desired page, *foundPtr is
- * set true. Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true. Otherwise, *foundPtr is set false.
*
* io_context is passed as an output parameter to avoid calling
* IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1286,19 +1534,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(buf, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return buf;
@@ -1363,19 +1602,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(existing_buf_hdr, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return existing_buf_hdr;
@@ -1407,15 +1637,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
LWLockRelease(newPartitionLock);
/*
- * Buffer contents are currently invalid. Try to obtain the right to
- * start I/O. If StartBufferIO returns false, then someone else managed
- * to read it before we did, so there's nothing left for BufferAlloc() to
- * do.
+ * Buffer contents are currently invalid.
*/
- if (StartBufferIO(victim_buf_hdr, true))
- *foundPtr = false;
- else
- *foundPtr = true;
+ *foundPtr = false;
return victim_buf_hdr;
}
@@ -1769,7 +1993,7 @@ again:
* pessimistic, but outside of toy-sized shared_buffers it should allow
* sufficient pins.
*/
-static void
+void
LimitAdditionalPins(uint32 *additional_pins)
{
uint32 max_backends;
@@ -2034,7 +2258,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
buf_state &= ~BM_VALID;
UnlockBufHdr(existing_hdr, buf_state);
- } while (!StartBufferIO(existing_hdr, true));
+ } while (!StartBufferIO(existing_hdr, true, false));
}
else
{
@@ -2057,7 +2281,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
LWLockRelease(partition_lock);
/* XXX: could combine the locked operations in it with the above */
- StartBufferIO(victim_buf_hdr, true);
+ StartBufferIO(victim_buf_hdr, true, false);
}
}
@@ -2372,7 +2596,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
else
{
/*
- * If we previously pinned the buffer, it must surely be valid.
+ * If we previously pinned the buffer, it is likely to be valid, but
+ * it may not be if StartReadBuffers() was called and
+ * WaitReadBuffers() hasn't been called yet. We'll check by loading
+ * the flags without locking. This is racy, but it's OK to return
+ * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+ * it'll see that it's now valid.
*
* Note: We deliberately avoid a Valgrind client request here.
* Individual access methods can optionally superimpose buffer page
@@ -2381,7 +2610,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
* that the buffer page is legitimately non-accessible here. We
* cannot meddle with that.
*/
- result = true;
+ result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
}
ref->refcount++;
@@ -3449,7 +3678,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
* someone else flushed the buffer before we could, so we need not do
* anything.
*/
- if (!StartBufferIO(buf, false))
+ if (!StartBufferIO(buf, false, false))
return;
/* Setup error traceback support for ereport() */
@@ -5184,9 +5413,15 @@ WaitIO(BufferDesc *buf)
*
* Returns true if we successfully marked the buffer as I/O busy,
* false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend. In that case, false indicates either that the I/O was already
+ * finished, or is still in progress. This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
*/
static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
{
uint32 buf_state;
@@ -5199,6 +5434,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
if (!(buf_state & BM_IO_IN_PROGRESS))
break;
UnlockBufHdr(buf, buf_state);
+ if (nowait)
+ return false;
WaitIO(buf);
}
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index fcfac335a57..985a2c7049c 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -108,10 +108,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
* LocalBufferAlloc -
* Find or create a local buffer for the given page of the given relation.
*
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local. Also, IO_IN_PROGRESS
- * does not get set. Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local. We support only default access
+ * strategy (hence, usage_count is always advanced).
*/
BufferDesc *
LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -287,7 +286,7 @@ GetLocalVictimBuffer(void)
}
/* see LimitAdditionalPins() */
-static void
+void
LimitAdditionalLocalPins(uint32 *additional_pins)
{
uint32 max_pins;
@@ -297,9 +296,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
/*
* In contrast to LimitAdditionalPins() other backends don't play a role
- * here. We can allow up to NLocBuffer pins in total.
+ * here. We can allow up to NLocBuffer pins in total, but it might not be
+ * initialized yet so read num_temp_buffers.
*/
- max_pins = (NLocBuffer - NLocalPinnedBuffers);
+ max_pins = (num_temp_buffers - NLocalPinnedBuffers);
if (*additional_pins >= max_pins)
*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+subdir('aio')
subdir('buffer')
subdir('file')
subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
#ifndef BUFMGR_H
#define BUFMGR_H
+#include "port/pg_iovec.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
#define BUFFER_LOCK_SHARE 1
#define BUFFER_LOCK_EXCLUSIVE 2
+/*
+ * Maximum number of buffers for multi-buffer I/O functions. This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
/*
* prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy,
bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers(). Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+ /* Parameters passed in to StartReadBuffers(). */
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ ForkNumber forknum;
+ BlockNumber blocknum;
+ int nblocks;
+ BufferAccessStrategy strategy;
+ int flags;
+
+ /* Range of buffers, if we need to perform a read. */
+ int io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forknum,
+ BlockNumber blocknum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
extern void ReleaseBuffer(Buffer buffer);
extern void UnlockReleaseBuffer(Buffer buffer);
extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
extern bool BgBufferSync(struct WritebackContext *wb_context);
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
/* in buf_init.c */
extern void InitBufferPool(void);
extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected. Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet. This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_private_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cc3611e6068..5f637f07eeb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
PgXmlErrorContext
PgXmlStrictness
Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
ReScanForeignScan_function
ReadBufPtrType
ReadBufferMode
+ReadBuffersOperation
ReadBytePtrType
ReadExtraTocPtrType
ReadFunc
--
2.40.1
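For reference, here is a minimal sketch of how a caller might drive the
StartReadBuffers()/WaitReadBuffers() API declared in the patch above. The
relation "rel" and block number "blkno" are hypothetical names; only the
functions and macros shown in the patch are assumed:

    Buffer      buffers[MAX_BUFFERS_PER_TRANSFER];
    ReadBuffersOperation operation;
    int         nblocks = MAX_BUFFERS_PER_TRANSFER;

    /* Pin up to nblocks consecutive blocks starting at blkno. */
    if (StartReadBuffers(BMR_REL(rel), buffers, MAIN_FORKNUM, blkno,
                         &nblocks, NULL, 0, &operation))
        WaitReadBuffers(&operation);    /* perform the physical read(s) */

    /* buffers[0 .. nblocks - 1] are now pinned and valid. */
    for (int i = 0; i < nblocks; i++)
        ReleaseBuffer(buffers[i]);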
v7-0006-Vacuum-first-pass-uses-Streaming-Read-interface.patchtext/x-diff; charset=us-asciiDownload
From bc9d97de3729e65752ef6a6e9cbfc0808c4725ac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 11:29:02 -0500
Subject: [PATCH v7 6/7] Vacuum first pass uses Streaming Read interface
Now vacuum's first pass, which HOT prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by implementing a
streaming read callback which invokes heap_vac_scan_next_block().
---
src/backend/access/heap/vacuumlazy.c | 131 +++++++++++++++++++--------
1 file changed, 92 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d2c8f27fc57..d07a2a58b15 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -54,6 +54,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/streaming_read.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
@@ -168,7 +169,12 @@ typedef struct LVRelState
char *relnamespace;
char *relname;
char *indname; /* Current index name */
- BlockNumber blkno; /* used only for heap operations */
+
+ /*
+ * The current block being processed by vacuum. Used only for heap
+ * operations. Primarily for error reporting and logging.
+ */
+ BlockNumber blkno;
OffsetNumber offnum; /* used only for heap operations */
VacErrPhase phase;
bool verbose; /* VACUUM VERBOSE? */
@@ -220,6 +226,12 @@ typedef struct LVRelState
BlockNumber next_unskippable_block;
/* Next unskippable block's visibility status */
bool next_unskippable_allvis;
+
+ /*
+ * Buffer containing block of VM with visibility information for
+ * next_unskippable_block.
+ */
+ Buffer next_unskippable_vmbuffer;
} next_block_state;
} LVRelState;
@@ -233,8 +245,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber *blkno,
+static void heap_vac_scan_next_block(LVRelState *vacrel,
bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -777,6 +788,47 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
}
}
+static BlockNumber
+vacuum_scan_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private, void *per_buffer_data)
+{
+ LVRelState *vacrel = pgsr_private;
+ bool *all_visible_according_to_vm = per_buffer_data;
+
+ heap_vac_scan_next_block(vacrel,
+ all_visible_according_to_vm);
+
+ /*
+ * If there are no further blocks to vacuum in the relation, release the
+ * vmbuffer.
+ */
+ if (!BlockNumberIsValid(vacrel->next_block_state.current_block) &&
+ BufferIsValid(vacrel->next_block_state.next_unskippable_vmbuffer))
+ {
+ ReleaseBuffer(vacrel->next_block_state.next_unskippable_vmbuffer);
+ vacrel->next_block_state.next_unskippable_vmbuffer = InvalidBuffer;
+ }
+
+ return vacrel->next_block_state.current_block;
+}
+
+static inline PgStreamingRead *
+vac_scan_pgsr_alloc(LVRelState *vacrel, PgStreamingReadBufferCB next_block_cb)
+{
+ PgStreamingRead *result = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(bool), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, next_block_cb);
+
+ /*
+ * Initialize for first heap_vac_scan_next_block() call. These rely on
+ * InvalidBlockNumber + 1 = 0
+ */
+ vacrel->next_block_state.current_block = InvalidBlockNumber;
+ vacrel->next_block_state.next_unskippable_block = InvalidBlockNumber;
+
+ return result;
+}
+
/*
* lazy_scan_heap() -- workhorse function for VACUUM
*
@@ -816,10 +868,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm;
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
@@ -830,23 +882,27 @@ lazy_scan_heap(LVRelState *vacrel)
};
int64 initprog_val[3];
+ PgStreamingRead *pgsr = vac_scan_pgsr_alloc(vacrel, vacuum_scan_pgsr_next);
+
/* Report that we're scanning the heap, advertising total # of blocks */
initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
initprog_val[1] = rel_pages;
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Initialize for first heap_vac_scan_next_block() call */
- vacrel->next_block_state.current_block = InvalidBlockNumber;
- vacrel->next_block_state.next_unskippable_block = InvalidBlockNumber;
-
- while (heap_vac_scan_next_block(vacrel, &vmbuffer,
- &blkno, &all_visible_according_to_vm))
+ while (BufferIsValid(buf = pg_streaming_read_buffer_get_next(pgsr,
+ (void **) &all_visible_according_to_vm)))
{
- Buffer buf;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ BlockNumber blkno;
+
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
+
+ CheckBufferIsPinnedOnce(buf);
+
+ page = BufferGetPage(buf);
vacrel->scanned_pages++;
@@ -914,9 +970,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
@@ -973,7 +1026,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ vmbuffer, *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1030,7 +1083,7 @@ lazy_scan_heap(LVRelState *vacrel)
}
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, vacrel->rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1045,6 +1098,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ pg_streaming_read_free(pgsr);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1056,11 +1111,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (vacrel->rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, vacrel->rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, vacrel->rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1072,12 +1127,13 @@ lazy_scan_heap(LVRelState *vacrel)
*
* lazy_scan_heap() calls here every time it needs to get the next block to
* prune and vacuum, using the visibility map, vacuum options, and various
- * thresholds to skip blocks which do not need to be processed and set blkno to
- * the next block that actually needs to be processed.
+ * thresholds to skip blocks which do not need to be processed and set
+ * current_block to the next block that actually needs to be processed.
*
- * The block number and visibility status of the next block to process are set
- * in blkno and all_visible_according_to_vm. heap_vac_scan_next_block()
- * returns false if there are no further blocks to process.
+ * The number and visibility status of the next block to process are set in
+ * vacrel->next_block_state->current_block and all_visible_according_to_vm.
+ * vacrel->next_block_state->current_block is set to InvalidBlockNumber if
+ * there are no further blocks to process.
*
* vacrel is an in/out parameter here; vacuum options and information about the
* relation are read, members of vacrel->next_block_state are read and set as
@@ -1085,12 +1141,10 @@ lazy_scan_heap(LVRelState *vacrel)
* don't advance relfrozenxid when we have skipped vacuuming all-visible
* blocks.
*
- * vmbuffer is an output parameter which, upon return, will contain the block
- * from the VM containing visibility information for the next unskippable heap
- * block. If we decide not to skip this heap block, the caller is responsible
- * for fetching the correct VM block into vmbuffer before using it. This is
- * okay as providing it as an output parameter is an optimization, not a
- * requirement.
+ * vacrel->next_block_state->vmbuffer will contain visibility information for
+ * the next unskippable heap block. If we decide not to skip this heap block,
+ * the caller is responsible for fetching the correct VM block into the
+ * vmbuffer before using it.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1100,9 +1154,9 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static bool
-heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber *blkno, bool *all_visible_according_to_vm)
+static void
+heap_vac_scan_next_block(LVRelState *vacrel,
+ bool *all_visible_according_to_vm)
{
/* Relies on InvalidBlockNumber + 1 == 0 */
BlockNumber next_block = vacrel->next_block_state.current_block + 1;
@@ -1129,8 +1183,8 @@ heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
*/
if (next_block >= vacrel->rel_pages)
{
- vacrel->next_block_state.current_block = *blkno = InvalidBlockNumber;
- return false;
+ vacrel->next_block_state.current_block = InvalidBlockNumber;
+ return;
}
if (vacrel->next_block_state.next_unskippable_block == InvalidBlockNumber ||
@@ -1144,7 +1198,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &vacrel->next_block_state.next_unskippable_vmbuffer);
vacrel->next_block_state.next_unskippable_allvis = mapbits & VISIBILITYMAP_ALL_VISIBLE;
@@ -1224,8 +1278,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, Buffer *vmbuffer,
else
*all_visible_according_to_vm = true;
- vacrel->next_block_state.current_block = *blkno = next_block;
- return true;
+ vacrel->next_block_state.current_block = next_block;
}
/*
--
2.40.1
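For reference, the overall shape of a streaming read user, distilled from the
patch above -- a sketch only; MyScanState, "scan" and "rel" are illustrative
names, not anything defined in the patch set:

    typedef struct MyScanState
    {
        BlockNumber next;
        BlockNumber nblocks;
    } MyScanState;

    static BlockNumber
    my_next_block_cb(PgStreamingRead *pgsr, void *pgsr_private,
                     void *per_buffer_data)
    {
        MyScanState *scan = pgsr_private;

        /* Returning InvalidBlockNumber ends the stream. */
        if (scan->next >= scan->nblocks)
            return InvalidBlockNumber;
        return scan->next++;
    }

    ...

    PgStreamingRead *pgsr;
    Buffer      buf;
    void       *per_buffer_data;

    pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_DEFAULT, scan, 0, NULL,
                                          BMR_REL(rel), MAIN_FORKNUM,
                                          my_next_block_cb);
    while (BufferIsValid(buf = pg_streaming_read_buffer_get_next(pgsr,
                                                                 &per_buffer_data)))
    {
        /* ... use the pinned, valid buffer ... */
        ReleaseBuffer(buf);
    }
    pg_streaming_read_free(pgsr);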
v7-0007-Vacuum-second-pass-uses-Streaming-Read-interface.patchtext/x-diff; charset=us-asciiDownload
From c3cf35fcb3110da791e9edc1b3325dc8d0080068 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 27 Feb 2024 14:35:36 -0500
Subject: [PATCH v7 7/7] Vacuum second pass uses Streaming Read interface
Now vacuum's second pass, which removes dead items referring to dead
tuples catalogued in the first pass, uses the streaming read API by
implementing a streaming read callback which returns the next block
containing previously catalogued dead items. A new struct,
VacReapBlkState, is introduced to provide the caller with the starting
and ending indexes of dead items to vacuum.
---
src/backend/access/heap/vacuumlazy.c | 110 ++++++++++++++++++++-------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 85 insertions(+), 26 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d07a2a58b15..375b66a62c4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -195,6 +195,12 @@ typedef struct LVRelState
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ /*
+ * The index of the next TID in dead_items to reap during the second
+ * vacuum pass.
+ */
+ int idx_prefetch;
+
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
@@ -243,6 +249,21 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+/*
+ * State set up in streaming read callback during vacuum's second pass which
+ * removes dead items referring to dead tuples catalogued in the first pass
+ */
+typedef struct VacReapBlkState
+{
+ /*
+ * The indexes of the TIDs of the first and last dead tuples in a single
+ * block in the currently vacuumed relation. The callback will set these
+ * up prior to adding this block to the stream.
+ */
+ int start_idx;
+ int end_idx;
+} VacReapBlkState;
+
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
static void heap_vac_scan_next_block(LVRelState *vacrel,
@@ -260,8 +281,9 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
-static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
- Buffer buffer, int index, Buffer vmbuffer);
+static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer,
+ VacReapBlkState *rbstate);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
@@ -2426,6 +2448,37 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_data)
+{
+ BlockNumber blkno;
+ LVRelState *vacrel = pgsr_private;
+ VacReapBlkState *rbstate = per_buffer_data;
+
+ VacDeadItems *dead_items = vacrel->dead_items;
+
+ if (vacrel->idx_prefetch == dead_items->num_items)
+ return InvalidBlockNumber;
+
+ blkno = ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+ rbstate->start_idx = vacrel->idx_prefetch;
+
+ for (; vacrel->idx_prefetch < dead_items->num_items; vacrel->idx_prefetch++)
+ {
+ BlockNumber curblkno =
+ ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+
+ if (blkno != curblkno)
+ break; /* past end of tuples for this block */
+ }
+
+ rbstate->end_idx = vacrel->idx_prefetch;
+
+ return blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2447,7 +2500,9 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
- int index = 0;
+ Buffer buf;
+ PgStreamingRead *pgsr;
+ VacReapBlkState *rbstate;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2465,17 +2520,21 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
VACUUM_ERRCB_PHASE_VACUUM_HEAP,
InvalidBlockNumber, InvalidOffsetNumber);
- while (index < vacrel->dead_items->num_items)
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(VacReapBlkState), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, vacuum_reap_lp_pgsr_next);
+
+ while (BufferIsValid(buf =
+ pg_streaming_read_buffer_get_next(pgsr,
+ (void **) &rbstate)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
vacuum_delay_point();
- blkno = ItemPointerGetBlockNumber(&vacrel->dead_items->items[index]);
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
/*
* Pin the visibility map page in case we need to mark the page
@@ -2485,10 +2544,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
- index = lazy_vacuum_heap_page(vacrel, blkno, buf, index, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, vmbuffer, rbstate);
/* Now that we've vacuumed the page, record its available space */
page = BufferGetPage(buf);
@@ -2507,14 +2564,16 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
- Assert(index > 0);
+ Assert(rbstate->end_idx > 0);
Assert(vacrel->num_index_scans > 1 ||
- (index == vacrel->lpdead_items &&
+ (rbstate->end_idx == vacrel->lpdead_items &&
vacuumed_pages == vacrel->lpdead_item_pages));
+ pg_streaming_read_free(pgsr);
+
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
- vacrel->relname, (long long) index, vacuumed_pages)));
+ vacrel->relname, (long long) rbstate->end_idx, vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -2528,13 +2587,12 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* cleanup lock is also acceptable). vmbuffer must be valid and already have
* a pin on blkno's visibility map page.
*
- * index is an offset into the vacrel->dead_items array for the first listed
- * LP_DEAD item on the page. The return value is the first index immediately
- * after all LP_DEAD items for the same page in the array.
+ * Given a block and dead items recorded during the first pass, set those items
+ * dead and truncate the line pointer array. Update the VM as appropriate.
*/
-static int
-lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
- int index, Buffer vmbuffer)
+static void
+lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer, VacReapBlkState *rbstate)
{
VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
@@ -2555,16 +2613,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
START_CRIT_SECTION();
- for (; index < dead_items->num_items; index++)
+ for (int i = rbstate->start_idx; i < rbstate->end_idx; i++)
{
- BlockNumber tblk;
OffsetNumber toff;
+ ItemPointer dead_item;
ItemId itemid;
- tblk = ItemPointerGetBlockNumber(&dead_items->items[index]);
- if (tblk != blkno)
- break; /* past end of tuples for this block */
- toff = ItemPointerGetOffsetNumber(&dead_items->items[index]);
+ dead_item = &dead_items->items[i];
+
+ Assert(ItemPointerGetBlockNumber(dead_item) == blkno);
+
+ toff = ItemPointerGetOffsetNumber(dead_item);
itemid = PageGetItemId(page, toff);
Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
@@ -2634,7 +2693,6 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
- return index;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5f637f07eeb..20b85a69f9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2972,6 +2972,7 @@ VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
+VacReapBlkState
ValidIOData
ValidateIndexState
ValuesScan
--
2.40.1
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
I feel heap_vac_scan_get_next_block() function could use some love. Maybe
just some rewording of the comments, or maybe some other refactoring; not
sure. But I'm pretty happy with the function signature and how it's called.

I've cleaned up the comments on heap_vac_scan_next_block() in the first
couple patches (not so much in the streaming read user). Let me know if
it addresses your feelings or if I should look for other things I could
change.
Thanks, that is better. I think I now finally understand how the
function works, and now I can see some more issues and refactoring
opportunities :-).
Looking at current lazy_scan_skip() code in 'master', one thing now
caught my eye (and it's the same with your patches):
*next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
vmbuffer);

if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
*next_unskippable_allvis = false;
break;
}

/*
* Caller must scan the last page to determine whether it has tuples
* (caller must have the opportunity to set vacrel->nonempty_pages).
* This rule avoids having lazy_truncate_heap() take access-exclusive
* lock on rel to attempt a truncation that fails anyway, just because
* there are tuples on the last page (it is likely that there will be
* tuples on other nearby pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
if (next_unskippable_block == rel_pages - 1)
break;

/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
*next_unskippable_allvis = false;
break;
}

/*
* Aggressive VACUUM caller can't skip pages just because they are
* all-visible. They may still skip all-frozen pages, which can't
* contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
*/
if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
{
if (vacrel->aggressive)
break;

/*
* All-visible block is safe to skip in non-aggressive case. But
* remember that the final range contains such a block for later.
*/
skipsallvis = true;
}

/* XXX: is it OK to remove this? */
vacuum_delay_point();
next_unskippable_block++;
nskippable_blocks++;
}
Firstly, it seems silly to check DISABLE_PAGE_SKIPPING within the loop.
When DISABLE_PAGE_SKIPPING is set, we always return the next block and
set *next_unskippable_allvis = false regardless of the visibility map,
so why bother checking the visibility map at all?
Except at the very last block of the relation! If you look carefully,
at the last block we do return *next_unskippable_allvis = true, if the
VM says so, even if DISABLE_PAGE_SKIPPING is set. I think that's wrong.
Surely the intention was to pretend that none of the VM bits were set if
DISABLE_PAGE_SKIPPING is used, also for the last block.
This was changed in commit 980ae17310:
@@ -1311,7 +1327,11 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
  /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
  if (!vacrel->skipwithvm)
+ {
+     /* Caller shouldn't rely on all_visible_according_to_vm */
+     *next_unskippable_allvis = false;
      break;
+ }
Before that, *next_unskippable_allvis was set correctly according to the
VM, even when DISABLE_PAGE_SKIPPING was used. It's not clear to me why
that was changed. And I think setting it to 'true' would be a more
failsafe value than 'false'. When *next_unskippable_allvis is set to
true, the caller cannot rely on it because a concurrent modification
could immediately clear the VM bit. But because VACUUM is the only
process that sets VM bits, if it's set to false, the caller can assume
that it's still not set later on.
One consequence of that is that with DISABLE_PAGE_SKIPPING,
lazy_scan_heap() dirties all pages, even if there are no changes. The
attached test script demonstrates that.
ISTM we should revert the above hunk, and backpatch it to v16. I'm a
little wary because I don't understand why that change was made in the
first place, though. I think it was just an ill-advised attempt at
tidying up the code as part of the larger commit, but I'm not sure.
Peter, do you remember?
I wonder if we should give up trying to set all_visible_according_to_vm
correctly when we decide what to skip, and always do
"all_visible_according_to_vm = visibilitymap_get_status(...)" in
lazy_scan_prune(). It would be more expensive, but maybe it doesn't
matter in practice. It would get rid of this tricky bookkeeping in
heap_vac_scan_next_block().
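Roughly, i.e. something like this in lazy_scan_prune() -- untested sketch,
assuming the vmbuffer covering blkno is already pinned:

    all_visible_according_to_vm =
        (visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) &
         VISIBILITYMAP_ALL_VISIBLE) != 0;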
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
On Fri, Mar 8, 2024 at 8:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
ISTM we should revert the above hunk, and backpatch it to v16. I'm a
little wary because I don't understand why that change was made in the
first place, though. I think it was just an ill-advised attempt at
tidying up the code as part of the larger commit, but I'm not sure.
Peter, do you remember?
I think that it makes sense to set the VM when indicated by
lazy_scan_prune, independent of what either the visibility map or the
page's PD_ALL_VISIBLE marking say. The whole point of
DISABLE_PAGE_SKIPPING is to deal with VM corruption, after all.
In retrospect I didn't handle this particular aspect very well in
commit 980ae17310. The approach I took is a bit crude (and in any case
slightly wrong in that it is inconsistent in how it handles the last
page). But it has the merit of fixing the case where we just have the
VM's all-frozen bit set for a given block (not the all-visible bit
set) -- which is always wrong. There was good reason to be concerned
about that possibility when 980ae17310 went in.
--
Peter Geoghegan
On Fri, Mar 8, 2024 at 8:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
I feel heap_vac_scan_get_next_block() function could use some love. Maybe
just some rewording of the comments, or maybe some other refactoring; not
sure. But I'm pretty happy with the function signature and how it's called.

I've cleaned up the comments on heap_vac_scan_next_block() in the first
couple patches (not so much in the streaming read user). Let me know if
it addresses your feelings or if I should look for other things I could
change.

Thanks, that is better. I think I now finally understand how the
function works, and now I can see some more issues and refactoring
opportunities :-).

Looking at current lazy_scan_skip() code in 'master', one thing now
caught my eye (and it's the same with your patches):

*next_unskippable_allvis = true;
while (next_unskippable_block < rel_pages)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
vmbuffer);

if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
*next_unskippable_allvis = false;
break;
}

/*
* Caller must scan the last page to determine whether it has tuples
* (caller must have the opportunity to set vacrel->nonempty_pages).
* This rule avoids having lazy_truncate_heap() take access-exclusive
* lock on rel to attempt a truncation that fails anyway, just because
* there are tuples on the last page (it is likely that there will be
* tuples on other nearby pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
if (next_unskippable_block == rel_pages - 1)
break;

/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
if (!vacrel->skipwithvm)
{
/* Caller shouldn't rely on all_visible_according_to_vm */
*next_unskippable_allvis = false;
break;
}

/*
* Aggressive VACUUM caller can't skip pages just because they are
* all-visible. They may still skip all-frozen pages, which can't
* contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
*/
if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
{
if (vacrel->aggressive)
break;

/*
* All-visible block is safe to skip in non-aggressive case. But
* remember that the final range contains such a block for later.
*/
skipsallvis = true;
}

/* XXX: is it OK to remove this? */
vacuum_delay_point();
next_unskippable_block++;
nskippable_blocks++;
}

Firstly, it seems silly to check DISABLE_PAGE_SKIPPING within the loop.
When DISABLE_PAGE_SKIPPING is set, we always return the next block and
set *next_unskippable_allvis = false regardless of the visibility map,
so why bother checking the visibility map at all?

Except at the very last block of the relation! If you look carefully,
at the last block we do return *next_unskippable_allvis = true, if the
VM says so, even if DISABLE_PAGE_SKIPPING is set. I think that's wrong.
Surely the intention was to pretend that none of the VM bits were set if
DISABLE_PAGE_SKIPPING is used, also for the last block.
I agree that having next_unskippable_allvis and, as a consequence,
all_visible_according_to_vm set to true for the last block seems
wrong. It also makes sense from a loop-efficiency standpoint to
move it up to the top. However, making that change would have us end
up dirtying all pages in your example.
This was changed in commit 980ae17310:
@@ -1311,7 +1327,11 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
  /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
  if (!vacrel->skipwithvm)
+ {
+     /* Caller shouldn't rely on all_visible_according_to_vm */
+     *next_unskippable_allvis = false;
      break;
+ }

Before that, *next_unskippable_allvis was set correctly according to the
VM, even when DISABLE_PAGE_SKIPPING was used. It's not clear to me why
that was changed. And I think setting it to 'true' would be a more
failsafe value than 'false'. When *next_unskippable_allvis is set to
true, the caller cannot rely on it because a concurrent modification
could immediately clear the VM bit. But because VACUUM is the only
process that sets VM bits, if it's set to false, the caller can assume
that it's still not set later on.

One consequence of that is that with DISABLE_PAGE_SKIPPING,
lazy_scan_heap() dirties all pages, even if there are no changes. The
attached test script demonstrates that.
This does seem undesirable.
However, if we do as you suggest above and don't check
DISABLE_PAGE_SKIPPING in the loop, and instead return without checking
the VM when DISABLE_PAGE_SKIPPING is passed (setting
next_unskippable_allvis = false), then we fix the last-block issue but
still end up dirtying all pages in your example.
ISTM we should revert the above hunk, and backpatch it to v16. I'm a
little wary because I don't understand why that change was made in the
first place, though. I think it was just an ill-advised attempt at
tidying up the code as part of the larger commit, but I'm not sure.
Peter, do you remember?
If we revert this, then when all_visible_according_to_vm and
all_visible are both true in lazy_scan_prune(), the VM will only get
updated when all_frozen is true and the VM doesn't have the all-frozen
bit set yet. So maybe that is inconsistent with the goal of
DISABLE_PAGE_SKIPPING, which is to update the VM when its contents are
"suspect" (according to the docs).
I wonder if we should give up trying to set all_visible_according_to_vm
correctly when we decide what to skip, and always do
"all_visible_according_to_vm = visibilitymap_get_status(...)" in
lazy_scan_prune(). It would be more expensive, but maybe it doesn't
matter in practice. It would get rid of this tricky bookkeeping in
heap_vac_scan_next_block().
I did some experiments on this in the past and thought that it did
have a perf impact to call visibilitymap_get_status() every time. But
let me try and dig those up. (doesn't speak to whether or not it
matters in practice)
- Melanie
On Fri, Mar 8, 2024 at 10:41 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Mar 8, 2024 at 8:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
ISTM we should revert the above hunk, and backpatch it to v16. I'm a
little wary because I don't understand why that change was made in the
first place, though. I think it was just an ill-advised attempt at
tidying up the code as part of the larger commit, but I'm not sure.
Peter, do you remember?

I think that it makes sense to set the VM when indicated by
lazy_scan_prune, independent of what either the visibility map or the
page's PD_ALL_VISIBLE marking say. The whole point of
DISABLE_PAGE_SKIPPING is to deal with VM corruption, after all.
Not that it will be fun to maintain another special case in the VM
update code in lazy_scan_prune(), but we could add one: if
DISABLE_PAGE_SKIPPING was passed to vacuum and both
all_visible_according_to_vm and all_visible are true, update the VM but
don't dirty the page. The docs on DISABLE_PAGE_SKIPPING say it is meant
to deal with VM corruption -- they don't say anything about dealing
with incorrectly set PD_ALL_VISIBLE markings.
- Melanie
On Fri, Mar 8, 2024 at 10:48 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Not that it will be fun to maintain another special case in the VM
update code in lazy_scan_prune(), but we could add one: if
DISABLE_PAGE_SKIPPING was passed to vacuum and both
all_visible_according_to_vm and all_visible are true, update the VM but
don't dirty the page.
It wouldn't necessarily have to be a special case, I think.
We already conditionally set PD_ALL_VISIBLE/call PageIsAllVisible() in
the block where lazy_scan_prune marks a previously all-visible page
all-frozen -- we don't want to dirty the page unnecessarily there.
Making it conditional is defensive in that particular block (this was
also added by this same commit of mine), and avoids dirtying the page.
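For illustration, that existing pattern looks roughly like this -- a sketch
only; the variable names are assumed to match lazy_scan_prune(), and the
cutoff xid / flags handling is elided:

    if (!PageIsAllVisible(page))
    {
        /* Only dirty the heap page if PD_ALL_VISIBLE actually changes. */
        PageSetAllVisible(page);
        MarkBufferDirty(buf);
    }
    visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
                      vmbuffer, InvalidTransactionId, flags);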
Seems like it might be possible to simplify/consolidate the VM-setting
code that's now located at the end of lazy_scan_prune. Perhaps the two
distinct blocks that call visibilitymap_set() could be combined into
one.
--
Peter Geoghegan
On Fri, Mar 8, 2024 at 11:00 AM Peter Geoghegan <pg@bowt.ie> wrote:
Seems like it might be possible to simplify/consolidate the VM-setting
code that's now located at the end of lazy_scan_prune. Perhaps the two
distinct blocks that call visibilitymap_set() could be combined into
one.
FWIW I think that my error here might have had something to do with
hallucinating that the code already did things that way.
At the time this went in, I was working on a patchset that did things
this way (more or less). It broke the dependency on
all_visible_according_to_vm entirely, which simplified the
set-and-check-VM code that's now at the end of lazy_scan_prune.
Not sure how practical it'd be to do something like that now (not
offhand), but something to consider.
--
Peter Geoghegan
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
I will say that now all of the variable names are *very* long. I didn't
want to remove the "state" from LVRelState->next_block_state. (In fact, I
kind of miss the "get". But I had to draw the line somewhere.) I think
without "state" in the name, next_block sounds too much like a function.Any ideas for shortening the names of next_block_state and its members
or are you fine with them?
Hmm, we can remove the inner struct and add the fields directly into
LVRelState. LVRelState already contains many groups of variables, like
"Error reporting state", with no inner structs. I did it that way in the
attached patch. I also used local variables more.
I was wondering if we should remove the "get" and just go with
heap_vac_scan_next_block(). I didn't do that originally because I didn't
want to imply that the next block was literally the sequentially next
block, but I think maybe I was overthinking it.Another idea is to call it heap_scan_vac_next_block() and then the order
of the words is more like the table AM functions that get the next block
(e.g. heapam_scan_bitmap_next_block()). Though maybe we don't want it to
be too similar to those since this isn't a table AM callback.
I've done a version of this.
+1
However, by adding a vmbuffer to next_block_state, the callback may be
able to avoid extra VM fetches from one invocation to the next.
That's a good idea, holding separate VM buffer pins for the
next-unskippable block and the block we're processing. I adopted that
approach.
My compiler caught one small bug when I was playing with various
refactorings of this: heap_vac_scan_next_block() must set *blkno to
rel_pages, not InvalidBlockNumber, after the last block. The caller uses
the 'blkno' variable also after the loop, and assumes that it's set to
rel_pages.
I'm pretty happy with the attached patches now. The first one fixes the
existing bug I mentioned in the other email (based on the ongoing
discussion, that might not be how we want to fix it though). The second commit
is a squash of most of the patches. Third patch is the removal of the
delay point, that seems worthwhile to keep separate.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v8-0001-Set-all_visible_according_to_vm-correctly-with-DI.patchtext/x-patch; charset=UTF-8; name=v8-0001-Set-all_visible_according_to_vm-correctly-with-DI.patchDownload
From b68cb29c547de3c4acd10f31aad47b453d154666 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 16:00:22 +0200
Subject: [PATCH v8 1/3] Set all_visible_according_to_vm correctly with
DISABLE_PAGE_SKIPPING
It's important for 'all_visible_according_to_vm' to correctly reflect
whether the VM bit is set or not, even when we are not trusting the VM
to skip pages, because contrary to what the comment said,
lazy_scan_prune() relies on it.
If it's incorrectly set to 'false', when the VM bit is in fact set,
lazy_scan_prune() will try to set the VM bit again and dirty the page
unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all
heap pages were dirtied, even if there were no changes. We would also
fail to clear any VM bits that were set incorrectly.
This was broken in commit 980ae17310, so backpatch to v16.
Backpatch-through: 16
Reviewed-by: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/3df2b582-dc1c-46b6-99b6-38eddd1b2784@iki.fi
---
src/backend/access/heap/vacuumlazy.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b320c3f89a..ac55ebd2ae5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1136,11 +1136,7 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
if (!vacrel->skipwithvm)
- {
- /* Caller shouldn't rely on all_visible_according_to_vm */
- *next_unskippable_allvis = false;
break;
- }
/*
* Aggressive VACUUM caller can't skip pages just because they are
--
2.39.2
v8-0002-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchtext/x-patch; charset=UTF-8; name=v8-0002-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchDownload
From 47af1ca65cf55ca876869b43bff47f9d43f0750e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v8 2/3] Confine vacuum skip logic to lazy_scan_skip()
Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more
code into the function, so that the caller doesn't need to know about
ranges or skipping anymore. heap_vac_scan_next_block() returns the
next block to process, and the logic for determining that block is all
within the function. This makes the skipping logic easier to
understand, as it's all in the same function, and makes the calling
code easier to understand as it's less cluttered. The state variables
needed to manage the skipping logic are moved to LVRelState.
heap_vac_scan_next_block() now manages its own VM buffer separately
from the caller's vmbuffer variable. The caller's vmbuffer holds the
VM page for the current block it's processing, while
heap_vac_scan_next_block() keeps a pin on the VM page for the next
unskippable block. Most of the time they are the same, so we hold two
pins on the same buffer, but it's more convenient to manage them
separately.
This refactoring will also help future patches to switch to using a
streaming read interface, and eventually AIO
(https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com)
Author: Melanie Plageman, with some changes by me
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 256 +++++++++++++++------------
1 file changed, 141 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae5..0aa08762015 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,12 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* State maintained by heap_vac_scan_next_block() */
+ BlockNumber current_block; /* last block returned */
+ BlockNumber next_unskippable_block; /* next unskippable block */
+ bool next_unskippable_allvis; /* its visibility status */
+ Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -217,10 +223,8 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -803,12 +807,11 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -822,44 +825,19 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
- for (blkno = 0; blkno < rel_pages; blkno++)
+ /* Initialize for the first heap_vac_scan_next_block() call */
+ vacrel->current_block = InvalidBlockNumber;
+ vacrel->next_unskippable_block = InvalidBlockNumber;
+ vacrel->next_unskippable_allvis = false;
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+
+ while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
-
- Assert(next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1077,18 +1055,22 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum. The function uses the visibility map, vacuum options,
+ * and various thresholds to skip blocks which do not need to be processed and
+ * sets blkno to the next block that actually needs to be processed.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * The block number and visibility status of the next block to process are set
+ * in *blkno and *all_visible_according_to_vm. The return value is false if
+ * there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about
+ * the relation are read, and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks. It
+ * also holds information about the next unskippable block, as bookkeeping for
+ * this function.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1098,88 +1080,132 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ BlockNumber next_block;
bool skipsallvis = false;
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block;
+ bool next_unskippable_allvis;
+ Buffer next_unskippable_vmbuffer;
- *next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
- {
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- vmbuffer);
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
+ next_block = vacrel->current_block + 1;
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ /* Have we reached the end of the relation? */
+ if (next_block >= rel_pages)
+ {
+ if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
- break;
+ ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
+ *blkno = rel_pages;
+ return false;
+ }
+ next_unskippable_block = vacrel->next_unskippable_block;
+ next_unskippable_allvis = vacrel->next_unskippable_allvis;
+ if (next_unskippable_block == InvalidBlockNumber ||
+ next_block > next_unskippable_block)
+ {
/*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
+ * Find the next unskippable block using the visibility map.
*/
- if (next_unskippable_block == rel_pages - 1)
- break;
+ next_unskippable_block = next_block;
+ next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
+ for (;;)
+ {
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &next_unskippable_vmbuffer);
- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
- break;
+ next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
- /*
- * Aggressive VACUUM caller can't skip pages just because they are
- * all-visible. They may still skip all-frozen pages, which can't
- * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
- */
- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- {
- if (vacrel->aggressive)
+ /*
+ * A block is unskippable if it is not all visible according to
+ * the visibility map.
+ */
+ if (!next_unskippable_allvis)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
+
+ /*
+ * Caller must scan the last page to determine whether it has
+ * tuples (caller must have the opportunity to set
+ * vacrel->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to
+ * attempt a truncation that fails anyway, just because there are
+ * tuples on the last page (it is likely that there will be tuples
+ * on other nearby pages as well, but those can be skipped).
+ *
+ * Implement this by always treating the last block as unsafe to
+ * skip.
+ */
+ if (next_unskippable_block == rel_pages - 1)
+ break;
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
break;
/*
- * All-visible block is safe to skip in non-aggressive case. But
- * remember that the final range contains such a block for later.
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible. They may still skip all-frozen pages, which can't
+ * contain XIDs < OldestXmin (XIDs that aren't already frozen by
+ * now).
*/
- skipsallvis = true;
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case.
+ * But remember that the final range contains such a block for
+ * later.
+ */
+ skipsallvis = true;
+ }
+
+ vacuum_delay_point();
+ next_unskippable_block++;
}
+ /* write the local variables back to vacrel */
+ vacrel->next_unskippable_block = next_unskippable_block;
+ vacrel->next_unskippable_allvis = next_unskippable_allvis;
+ vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
- vacuum_delay_point();
- next_unskippable_block++;
- nskippable_blocks++;
+ /*
+ * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
}
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (next_block == next_unskippable_block)
+ *all_visible_according_to_vm = next_unskippable_allvis;
else
- {
- *skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
-
- return next_unskippable_block;
+ *all_visible_according_to_vm = true;
+ *blkno = vacrel->current_block = next_block;
+ return true;
}
/*
@@ -1752,8 +1778,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
- * of last lazy_scan_skip() call), and from all_visible and all_frozen
- * variables
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables
*/
if (!all_visible_according_to_vm && all_visible)
{
@@ -1788,8 +1814,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
- * cleared after lazy_scan_skip() was called, so we must recheck with
- * buffer lock before concluding that the VM is corrupt.
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
--
2.39.2
v8-0003-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchtext/x-patch; charset=UTF-8; name=v8-0003-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchDownload
From 941ae7522ab6ac24ca5981303e4e7f6e2cba7458 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v8 3/3] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0aa08762015..e1657ef4f9b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1172,7 +1172,6 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
skipsallvis = true;
}
- vacuum_delay_point();
next_unskippable_block++;
}
/* write the local variables back to vacrel */
--
2.39.2
On Fri, Mar 8, 2024 at 11:00 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Mar 8, 2024 at 10:48 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Not that it will be fun to maintain another special case in the VM
update code in lazy_scan_prune(), but we could have a special case
that checks if DISABLE_PAGE_SKIPPING was passed to vacuum and if
all_visible_according_to_vm is true and all_visible is true, we update
the VM but don't dirty the page.
It wouldn't necessarily have to be a special case, I think.
We already conditionally set PD_ALL_VISIBLE/call PageIsAllVisible() in
the block where lazy_scan_prune marks a previously all-visible page
all-frozen -- we don't want to dirty the page unnecessarily there.
Making it conditional is defensive in that particular block (this was
also added by this same commit of mine), and avoids dirtying the page.
Ah, I see. I got confused. Even if the VM is suspect, if the page is
all visible and the heap block is already set all-visible in the VM,
there is no need to update it.
This did make me realize that it seems like there is a case we don't
handle in master with the current code that would be fixed by changing
that code Heikki mentioned:
Right now, even if the heap block is incorrectly marked all-visible in
the VM, if DISABLE_PAGE_SKIPPING is passed to vacuum,
all_visible_according_to_vm will be passed to lazy_scan_prune() as
false. Then even if lazy_scan_prune() finds that the page is not
all-visible, we won't call visibilitymap_clear().
If we revert the code setting next_unskippable_allvis to false in
lazy_scan_skip() when vacrel->skipwithvm is false and allow
all_visible_according_to_vm to be true when the VM has it incorrectly
set to true, then once lazy_scan_prune() discovers the page is not
all-visible and assuming PD_ALL_VISIBLE is not marked so
PageIsAllVisible() returns false, we will call visibilitymap_clear()
to clear the incorrectly set VM bit (without dirtying the page).
Here is a table of the variable states at the end of lazy_scan_prune()
for clarity:
master:
all_visible_according_to_vm: false
all_visible: false
VM says all vis: true
PageIsAllVisible: false
if fixed:
all_visible_according_to_vm: true
all_visible: false
VM says all vis: true
PageIsAllVisible: false
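With the fix, that combination of states falls into the existing recheck
branch near the end of lazy_scan_prune(), roughly (paraphrasing master,
so details may differ):

    else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
             visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
    {
        elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
             vacrel->relname, blkno);
        visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
                            VISIBILITYMAP_VALID_BITS);
    }

i.e. the incorrectly set VM bit gets cleared without dirtying the heap page.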
Seems like it might be possible to simplify/consolidate the VM-setting
code that's now located at the end of lazy_scan_prune. Perhaps the two
distinct blocks that call visibilitymap_set() could be combined into
one.
I agree. I have some code to do that in an unproposed patch which
combines the VM updates into the prune record. We will definitely want
to reorganize the code when we do that record combining.
- Melanie
On Fri, Mar 8, 2024 at 11:31 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Fri, Mar 8, 2024 at 11:00 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Mar 8, 2024 at 10:48 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:Not that it will be fun to maintain another special case in the VM
update code in lazy_scan_prune(), but we could have a special case
that checks if DISABLE_PAGE_SKIPPING was passed to vacuum and if
all_visible_according_to_vm is true and all_visible is true, we update
the VM but don't dirty the page.It wouldn't necessarily have to be a special case, I think.
We already conditionally set PD_ALL_VISIBLE/call PageIsAllVisible() in
the block where lazy_scan_prune marks a previously all-visible page
all-frozen -- we don't want to dirty the page unnecessarily there.
Making it conditional is defensive in that particular block (this was
also added by this same commit of mine), and avoids dirtying the page.
Ah, I see. I got confused. Even if the VM is suspect, if the page is
all visible and the heap block is already set all-visible in the VM,
there is no need to update it.
This did make me realize that it seems like there is a case we don't
handle in master with the current code that would be fixed by changing
that code Heikki mentioned:
Right now, even if the heap block is incorrectly marked all-visible in
the VM, if DISABLE_PAGE_SKIPPING is passed to vacuum,
all_visible_according_to_vm will be passed to lazy_scan_prune() as
false. Then even if lazy_scan_prune() finds that the page is not
all-visible, we won't call visibilitymap_clear().
If we revert the code setting next_unskippable_allvis to false in
lazy_scan_skip() when vacrel->skipwithvm is false and allow
all_visible_according_to_vm to be true when the VM has it incorrectly
set to true, then once lazy_scan_prune() discovers the page is not
all-visible and assuming PD_ALL_VISIBLE is not marked so
PageIsAllVisible() returns false, we will call visibilitymap_clear()
to clear the incorrectly set VM bit (without dirtying the page).
Here is a table of the variable states at the end of lazy_scan_prune()
for clarity:
master:
all_visible_according_to_vm: false
all_visible: false
VM says all vis: true
PageIsAllVisible: false
if fixed:
all_visible_according_to_vm: true
all_visible: false
VM says all vis: true
PageIsAllVisible: false
Okay, I now see from Heikki's v8-0001 that he was already aware of this.
- Melanie
On Fri, Mar 08, 2024 at 06:07:33PM +0200, Heikki Linnakangas wrote:
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
I will say that now all of the variable names are *very* long. I didn't
want to remove the "state" from LVRelState->next_block_state. (In fact, I
kind of miss the "get". But I had to draw the line somewhere.) I think
without "state" in the name, next_block sounds too much like a function.Any ideas for shortening the names of next_block_state and its members
or are you fine with them?Hmm, we can remove the inner struct and add the fields directly into
LVRelState. LVRelState already contains many groups of variables, like
"Error reporting state", with no inner structs. I did it that way in the
attached patch. I also used local variables more.
+1; I like the result of this.
However, by adding a vmbuffer to next_block_state, the callback may be
able to avoid extra VM fetches from one invocation to the next.
That's a good idea, holding separate VM buffer pins for the next-unskippable
block and the block we're processing. I adopted that approach.
Cool. It can't be avoided with streaming read vacuum, but I wonder if
there would ever be adverse effects to doing it on master? Maybe if we
are doing a lot of skipping and the block of the VM for the heap blocks
we are processing ends up changing each time but we would have had the
right block of the VM if we used the one from
heap_vac_scan_next_block()?
Frankly, I'm in favor of just doing it now because it makes
lazy_scan_heap() less confusing.
My compiler caught one small bug when I was playing with various
refactorings of this: heap_vac_scan_next_block() must set *blkno to
rel_pages, not InvalidBlockNumber, after the last block. The caller uses the
'blkno' variable also after the loop, and assumes that it's set to
rel_pages.
Oops! Thanks for catching that.
I'm pretty happy with the attached patches now. The first one fixes the
existing bug I mentioned in the other email (based on the ongoing
discussion, that might not be how we want to fix it though).
ISTM we should still do the fix you mentioned -- seems like it has more
upsides than downsides?
From b68cb29c547de3c4acd10f31aad47b453d154666 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 16:00:22 +0200
Subject: [PATCH v8 1/3] Set all_visible_according_to_vm correctly with
DISABLE_PAGE_SKIPPING
It's important for 'all_visible_according_to_vm' to correctly reflect
whether the VM bit is set or not, even when we are not trusting the VM
to skip pages, because contrary to what the comment said,
lazy_scan_prune() relies on it.
If it's incorrectly set to 'false', when the VM bit is in fact set,
lazy_scan_prune() will try to set the VM bit again and dirty the page
unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all
heap pages were dirtied, even if there were no changes. We would also
fail to clear any VM bits that were set incorrectly.
This was broken in commit 980ae17310, so backpatch to v16.
LGTM.
From 47af1ca65cf55ca876869b43bff47f9d43f0750e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v8 2/3] Confine vacuum skip logic to lazy_scan_skip()
---
src/backend/access/heap/vacuumlazy.c | 256 +++++++++++++++------------
1 file changed, 141 insertions(+), 115 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae5..0aa08762015 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,12 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
Perhaps we should add a comment to the blkno member of LVRelState
indicating that it is used for error reporting and logging?
+ /* State maintained by heap_vac_scan_next_block() */
+ BlockNumber current_block; /* last block returned */
+ BlockNumber next_unskippable_block; /* next unskippable block */
+ bool next_unskippable_allvis; /* its visibility status */
+ Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
} LVRelState;
/*
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
+ BlockNumber next_block;
bool skipsallvis = false;
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block;
+ bool next_unskippable_allvis;
+ Buffer next_unskippable_vmbuffer;

- *next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
- {
- uint8 mapbits = visibilitymap_get_status(vacrel->rel,
- next_unskippable_block,
- vmbuffer);
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
+ next_block = vacrel->current_block + 1;

- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ /* Have we reached the end of the relation? */
+ if (next_block >= rel_pages)
+ {
+ if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
{
- Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
- break;
+ ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
Good catch here. Also, I noticed that I set current_block to
InvalidBlockNumber too, which seems strictly worse than leaving it as
rel_pages + 1 -- just in case a future dev makes a change that
accidentally causes heap_vac_scan_next_block() to be called again, in
which case InvalidBlockNumber + 1 would wrap back around to 0. So this all
looks correct to me.
+ *blkno = rel_pages;
+ return false;
+ }
+ next_unskippable_block = vacrel->next_unskippable_block;
+ next_unskippable_allvis = vacrel->next_unskippable_allvis;
Wish there was a newline here.
I see why you removed my treatise-level comment that was here about
unskipped skippable blocks. However, when I was trying to understand
this code, I did wish there was some comment that explained to me why we
needed all of the variables next_unskippable_block,
next_unskippable_allvis, all_visible_according_to_vm, and current_block.
The idea that we would choose not to skip a skippable block because of
kernel readahead makes sense. The part that I had trouble wrapping my
head around was that we want to also keep the visibility status of both
the beginning and ending blocks of the skippable range and then use
those to infer the visibility status of the intervening blocks without
another VM lookup if we decide not to skip them.
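To make that concrete, the inference amounts to something like the
following (simplified from the v8 patch, not new logic):

    /*
     * Blocks strictly before the next unskippable block were skippable,
     * i.e. all-visible in the VM, so their status can be inferred without
     * another VM lookup when we choose not to skip them.
     */
    if (next_block == next_unskippable_block)
        *all_visible_according_to_vm = next_unskippable_allvis;
    else
        *all_visible_according_to_vm = true;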
+ if (next_unskippable_block == InvalidBlockNumber ||
+ next_block > next_unskippable_block)
+ {
/*
- * Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
- *
- * Implement this by always treating the last block as unsafe to skip.
+ * Find the next unskippable block using the visibility map.
*/
- if (next_unskippable_block == rel_pages - 1)
- break;
+ next_unskippable_block = next_block;
+ next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
+ for (;;)
Ah yes, my old loop condition was redundant with the break if
next_unskippable_block == rel_pages - 1. This is better.
+ {
+ uint8 mapbits = visibilitymap_get_status(vacrel->rel,
+ next_unskippable_block,
+ &next_unskippable_vmbuffer);

- /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
- if (!vacrel->skipwithvm)
...
+ }
+
+ vacuum_delay_point();
+ next_unskippable_block++;
}
Would love a newline here
+ /* write the local variables back to vacrel */
+ vacrel->next_unskippable_block = next_unskippable_block;
+ vacrel->next_unskippable_allvis = next_unskippable_allvis;
+ vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
...
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
+ if (next_block == next_unskippable_block)
+ *all_visible_according_to_vm = next_unskippable_allvis;
else
- {
- *skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
- }
-
- return next_unskippable_block;
+ *all_visible_according_to_vm = true;
Also a newline here
+ *blkno = vacrel->current_block = next_block;
+ return true;
}
From 941ae7522ab6ac24ca5981303e4e7f6e2cba7458 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 31 Dec 2023 12:49:56 -0500
Subject: [PATCH v8 3/3] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there is
no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
---
src/backend/access/heap/vacuumlazy.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0aa08762015..e1657ef4f9b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1172,7 +1172,6 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
skipsallvis = true;
}

- vacuum_delay_point();
next_unskippable_block++;
}
/* write the local variables back to vacrel */
--
2.39.2
LGTM
- Melanie
On Fri, Mar 8, 2024 at 12:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Fri, Mar 08, 2024 at 06:07:33PM +0200, Heikki Linnakangas wrote:
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
I will say that now all of the variable names are *very* long. I didn't
want to remove the "state" from LVRelState->next_block_state. (In fact, I
kind of miss the "get". But I had to draw the line somewhere.) I think
without "state" in the name, next_block sounds too much like a function.Any ideas for shortening the names of next_block_state and its members
or are you fine with them?Hmm, we can remove the inner struct and add the fields directly into
LVRelState. LVRelState already contains many groups of variables, like
"Error reporting state", with no inner structs. I did it that way in the
attached patch. I also used local variables more.
+1; I like the result of this.
I did some perf testing of 0002 and 0003 using that fully-in-SB vacuum
test I mentioned in an earlier email. 0002 reduces vacuum time from an
average of 11.5 ms on master to 9.6 ms with 0002 applied. And 0003
reduces the time vacuum takes from 11.5 ms on master to 7.4 ms with
0003 applied.
I profiled them and 0002 seems to simply spend less time in
heap_vac_scan_next_block() than master did in lazy_scan_skip().
And 0003 reduces the time vacuum takes because vacuum_delay_point()
shows up pretty high in the profile.
Here are the profiles for my test.
profile of master:
+ 29.79% postgres postgres [.] visibilitymap_get_status
+ 27.35% postgres postgres [.] vacuum_delay_point
+ 17.00% postgres postgres [.] lazy_scan_skip
+ 6.59% postgres postgres [.] heap_vacuum_rel
+ 6.43% postgres postgres [.] BufferGetBlockNumber
profile with 0001-0002:
+ 40.30% postgres postgres [.] visibilitymap_get_status
+ 20.32% postgres postgres [.] vacuum_delay_point
+ 20.26% postgres postgres [.] heap_vacuum_rel
+ 5.17% postgres postgres [.] BufferGetBlockNumber
profile with 0001-0003
+ 59.77% postgres postgres [.] visibilitymap_get_status
+ 23.86% postgres postgres [.] heap_vacuum_rel
+ 6.59% postgres postgres [.] StrategyGetBuffer
Test DDL and setup:
psql -c "ALTER SYSTEM SET shared_buffers = '16 GB';"
psql -c "CREATE TABLE foo(id INT, a INT, b INT, c INT, d INT, e INT, f
INT, g INT) with (autovacuum_enabled=false, fillfactor=25);"
psql -c "INSERT INTO foo SELECT i, i, i, i, i, i, i, i FROM
generate_series(1, 46000000)i;"
psql -c "VACUUM (FREEZE) foo;"
pg_ctl restart
psql -c "SELECT pg_prewarm('foo');"
# make sure there isn't an ill-timed checkpoint
psql -c "\timing on" -c "vacuum (verbose) foo;"
- Melanie
On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Performance results:
The TL;DR of my performance results is that streaming read vacuum is
faster. However there is an issue with the interaction of the streaming
read code and the vacuum buffer access strategy which must be addressed.
I have investigated the interaction between
maintenance_io_concurrency, streaming reads, and the vacuum buffer
access strategy (BAS_VACUUM).
The streaming read API limits max_pinned_buffers to a pinned buffer
multiplier (currently 4) * maintenance_io_concurrency buffers with the
goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size.
Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with
default block size, that means that for a fully uncached vacuum in
which all blocks must be vacuumed and will be dirtied, you'd have to
set maintenance_io_concurrency at 8 or lower to see the same number of
reuses (and shared buffer consumption) as master.
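Spelled out with the numbers above (4 is the pinned buffer multiplier,
32 the BAS_VACUUM ring size in buffers):

    4 * maintenance_io_concurrency(8) = 32 pinned buffers, matching the ring
    4 * maintenance_io_concurrency(10, the default) = 40, exceeding it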
Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it
seems like we should force max_pinned_buffers to a value that
guarantees the expected shared buffer usage by vacuum. But that means
that maintenance_io_concurrency does not have a predictable impact on
streaming read vacuum.
What is the right thing to do here?
At the least, the default size of the BAS_VACUUM ring buffer should be
BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency
(probably rounded up to the next power of two) bytes.
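As a back-of-the-envelope illustration, assuming the default 8 kB BLCKSZ,
the pinned buffer multiplier of 4 mentioned above, and the default
maintenance_io_concurrency of 10 (assumptions, not numbers taken from any
of the patches):

    8192 * 4 * 10 = 327680 bytes = 320 kB
    rounded up to the next power of two: 512 kB = 64 buffers

which would be twice the current 256 kB BAS_VACUUM default.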
- Melanie
On Mon, Mar 11, 2024 at 5:31 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Performance results:
The TL;DR of my performance results is that streaming read vacuum is
faster. However there is an issue with the interaction of the streaming
read code and the vacuum buffer access strategy which must be addressed.
Woo.
I have investigated the interaction between
maintenance_io_concurrency, streaming reads, and the vacuum buffer
access strategy (BAS_VACUUM).
The streaming read API limits max_pinned_buffers to a pinned buffer
multiplier (currently 4) * maintenance_io_concurrency buffers with the
goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size.
Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with
default block size, that means that for a fully uncached vacuum in
which all blocks must be vacuumed and will be dirtied, you'd have to
set maintenance_io_concurrency at 8 or lower to see the same number of
reuses (and shared buffer consumption) as master.
Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it
seems like we should force max_pinned_buffers to a value that
guarantees the expected shared buffer usage by vacuum. But that means
that maintenance_io_concurrency does not have a predictable impact on
streaming read vacuum.
What is the right thing to do here?
At the least, the default size of the BAS_VACUUM ring buffer should be
BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency
(probably rounded up to the next power of two) bytes.
Hmm, does the v6 look-ahead distance control algorithm mitigate that
problem? Using the ABC classifications from the streaming read
thread, I think for A it should now pin only 1, for B 16 and for C, it
depends on the size of the random 'chunks': if you have a lot of size
1 random reads then it shouldn't go above 10 because of (default)
maintenance_io_concurrency. The only way to get up to very high
numbers would be to have a lot of random chunks triggering behaviour
C, but each made up of long runs of misses. For example one can
contrive a BHS query that happens to read pages 0-15 then 20-35 then
40-55 etc etc so that we want to get lots of wide I/Os running
concurrently. Unless vacuum manages to do something like that, it
shouldn't be able to exceed 32 buffers very easily.
I suspect that if we taught streaming_read.c to ask the
BufferAccessStrategy (if one is passed in) what its recommended pin
limit is (strategy->nbuffers?), we could just clamp
max_pinned_buffers, and it would be hard to find a workload where that
makes a difference, and we could think about more complicated logic
later.
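A minimal standalone sketch of that clamping idea (every name below is an
invented stand-in, not the actual streaming read or BufferAccessStrategy
API):

    #include <stdio.h>

    /* stand-in for the strategy's ring size, e.g. 32 buffers for BAS_VACUUM */
    typedef struct
    {
        int     ring_buffers;
    } FakeStrategy;

    static int
    clamp_pin_limit(int max_pinned_buffers, const FakeStrategy *strategy)
    {
        /* never pin more buffers than the strategy ring can hold */
        if (strategy != NULL && max_pinned_buffers > strategy->ring_buffers)
            max_pinned_buffers = strategy->ring_buffers;
        return max_pinned_buffers;
    }

    int
    main(void)
    {
        FakeStrategy vac = {32};

        /* 4 * maintenance_io_concurrency (10) = 40 would exceed the ring */
        printf("%d\n", clamp_pin_limit(40, &vac));  /* prints 32 */
        return 0;
    }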
In other words, I think/hope your complaints about excessive pinning
from v5 WRT all-cached heap scans might have also already improved
this case by happy coincidence? I haven't tried it out though, I just
read your description of the problem...
On 08/03/2024 19:34, Melanie Plageman wrote:
On Fri, Mar 08, 2024 at 06:07:33PM +0200, Heikki Linnakangas wrote:
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
However, by adding a vmbuffer to next_block_state, the callback may be
able to avoid extra VM fetches from one invocation to the next.
That's a good idea, holding separate VM buffer pins for the next-unskippable
block and the block we're processing. I adopted that approach.
Cool. It can't be avoided with streaming read vacuum, but I wonder if
there would ever be adverse effects to doing it on master? Maybe if we
are doing a lot of skipping and the block of the VM for the heap blocks
we are processing ends up changing each time but we would have had the
right block of the VM if we used the one from
heap_vac_scan_next_block()?
Frankly, I'm in favor of just doing it now because it makes
lazy_scan_heap() less confusing.
+1
I'm pretty happy with the attached patches now. The first one fixes the
existing bug I mentioned in the other email (based on the ongoing
discussion, that might not be how we want to fix it though).
ISTM we should still do the fix you mentioned -- seems like it has more
upsides than downsides?

From b68cb29c547de3c4acd10f31aad47b453d154666 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 16:00:22 +0200
Subject: [PATCH v8 1/3] Set all_visible_according_to_vm correctly with
DISABLE_PAGE_SKIPPING
It's important for 'all_visible_according_to_vm' to correctly reflect
whether the VM bit is set or not, even when we are not trusting the VM
to skip pages, because contrary to what the comment said,
lazy_scan_prune() relies on it.
If it's incorrectly set to 'false', when the VM bit is in fact set,
lazy_scan_prune() will try to set the VM bit again and dirty the page
unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all
heap pages were dirtied, even if there were no changes. We would also
fail to clear any VM bits that were set incorrectly.
This was broken in commit 980ae17310, so backpatch to v16.
LGTM.
Committed and backpatched this.
From 47af1ca65cf55ca876869b43bff47f9d43f0750e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v8 2/3] Confine vacuum skip logic to lazy_scan_skip()
---
src/backend/access/heap/vacuumlazy.c | 256 +++++++++++++++------------
1 file changed, 141 insertions(+), 115 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae5..0aa08762015 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,12 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */

Perhaps we should add a comment to the blkno member of LVRelState
indicating that it is used for error reporting and logging?
Well, it's already under the "/* Error reporting state */" section. I
agree this is a little confusing, the name 'blkno' doesn't convey that
it's supposed to be used just for error reporting. But it's a
pre-existing issue so I left it alone. It can be changed with a separate
patch if we come up with a good idea.
I see why you removed my treatise-level comment that was here about
unskipped skippable blocks. However, when I was trying to understand
this code, I did wish there was some comment that explained to me why we
needed all of the variables next_unskippable_block,
next_unskippable_allvis, all_visible_according_to_vm, and current_block.
The idea that we would choose not to skip a skippable block because of
kernel readahead makes sense. The part that I had trouble wrapping my
head around was that we want to also keep the visibility status of both
the beginning and ending blocks of the skippable range and then use
those to infer the visibility status of the intervening blocks without
another VM lookup if we decide not to skip them.
Right, I removed the comment because it looked a little out of place and
duplicated the other comments sprinkled in the function. I agree this
could still use some more comments though.
Here's yet another attempt at making this more readable. I moved the
logic to find the next unskippable block to a separate function, and
added comments to make the states more explicit. What do you think?
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v9-0001-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchtext/x-patch; charset=UTF-8; name=v9-0001-Confine-vacuum-skip-logic-to-lazy_scan_skip.patchDownload
From c21480e9da61e145573de3b502551dde1b8fa3f6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v9 1/2] Confine vacuum skip logic to lazy_scan_skip()
Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more
code into the function, so that the caller doesn't need to know about
ranges or skipping anymore. heap_vac_scan_next_block() returns the
next block to process, and the logic for determining that block is all
within the function. This makes the skipping logic easier to
understand, as it's all in the same function, and makes the calling
code easier to understand as it's less cluttered. The state variables
needed to manage the skipping logic are moved to LVRelState.
heap_vac_scan_next_block() now manages its own VM buffer separately
from the caller's vmbuffer variable. The caller's vmbuffer holds the
VM page for the current block it's processing, while
heap_vac_scan_next_block() keeps a pin on the VM page for the next
unskippable block. Most of the time they are the same, so we hold two
pins on the same buffer, but it's more convenient to manage them
separately.
For readability inside heap_vac_scan_next_block(), move the logic of
finding the next unskippable block to a separate function, and add some
comments.
This refactoring will also help future patches to switch to using a
streaming read interface, and eventually AIO
(https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com)
Author: Melanie Plageman, with some changes by me
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 233 +++++++++++++++++----------
1 file changed, 146 insertions(+), 87 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae..1757eb49b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,12 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* State maintained by heap_vac_scan_next_block() */
+ BlockNumber current_block; /* last block returned */
+ BlockNumber next_unskippable_block; /* next unskippable block */
+ bool next_unskippable_allvis; /* its visibility status */
+ Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -217,10 +223,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
+static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -803,12 +808,11 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -822,44 +826,19 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
- for (blkno = 0; blkno < rel_pages; blkno++)
+ /* Initialize for the first heap_vac_scan_next_block() call */
+ vacrel->current_block = InvalidBlockNumber;
+ vacrel->next_unskippable_block = InvalidBlockNumber;
+ vacrel->next_unskippable_allvis = false;
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+
+ while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
-
- Assert(next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1077,18 +1056,22 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum. The function uses the visibility map, vacuum options,
+ * and various thresholds to skip blocks which do not need to be processed and
+ * sets blkno to the next block that actually needs to be processed.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * The block number and visibility status of the next block to process are set
+ * in *blkno and *all_visible_according_to_vm. The return value is false if
+ * there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about
+ * the relation are read, and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks. It
+ * also holds information about the next unskippable block, as bookkeeping for
+ * this function.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1098,26 +1081,119 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
- bool skipsallvis = false;
+ BlockNumber next_block;
- *next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
+ next_block = vacrel->current_block + 1;
+
+ /* Have we reached the end of the relation? */
+ if (next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
+ {
+ ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ }
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
+
+ /*
+ * We must be in one of the three following states:
+ */
+ if (vacrel->next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->next_unskippable_block)
+ {
+ /*
+ * 1. We have just processed an unskippable block (or we're at the
+ * beginning of the scan). Find the next unskippable block using the
+ * visibility map.
+ */
+ bool skipsallvis;
+
+ find_next_unskippable_block(vacrel, &skipsallvis);
+
+ /*
+ * We now know the next block that we must process. It can be the
+ * next block after the one we just processed, or something further
+ * ahead. If it's further ahead, we can jump to it, but we choose to
+ * do so only if we can skip at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
+
+ /* Now we must be in one of the two remaining states: */
+ if (next_block < vacrel->next_unskippable_block)
+ {
+ /*
+ * 2. We are processing a range of blocks that we could have skipped
+ * but chose not to. We know that they are all-visible in the VM,
+ * otherwise they would've been unskippable.
+ */
+ *blkno = vacrel->current_block = next_block;
+ *all_visible_according_to_vm = true;
+ return true;
+ }
+ else
+ {
+ /*
+ * 3. We reached the next unskippable block. Process it. On next
+ * iteration, we will be back in state 1.
+ */
+ Assert(next_block == vacrel->next_unskippable_block);
+
+ *blkno = vacrel->current_block = next_block;
+ *all_visible_according_to_vm = vacrel->next_unskippable_allvis;
+ return true;
+ }
+}
+
+/*
+ * Find the next unskippable block in a vacuum scan using the visibility map.
+ */
+static void
+find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
+{
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
+ Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
+ bool next_unskippable_allvis;
+
+ *skipsallvis = false;
+
+ for (;;)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &next_unskippable_vmbuffer);
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if (!next_unskippable_allvis)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
break;
}
@@ -1152,34 +1228,17 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* All-visible block is safe to skip in non-aggressive case. But
* remember that the final range contains such a block for later.
*/
- skipsallvis = true;
+ *skipsallvis = true;
}
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
- }
-
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
- else
- {
- *skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
}
- return next_unskippable_block;
+ /* write the local variables back to vacrel */
+ vacrel->next_unskippable_block = next_unskippable_block;
+ vacrel->next_unskippable_allvis = next_unskippable_allvis;
+ vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
/*
@@ -1752,8 +1811,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
- * of last lazy_scan_skip() call), and from all_visible and all_frozen
- * variables
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables
*/
if (!all_visible_according_to_vm && all_visible)
{
@@ -1788,8 +1847,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
- * cleared after lazy_scan_skip() was called, so we must recheck with
- * buffer lock before concluding that the VM is corrupt.
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
--
2.39.2
v9-0002-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchtext/x-patch; charset=UTF-8; name=v9-0002-Remove-unneeded-vacuum_delay_point-from-heap_vac_.patchDownload
From 07437212d629ab00b571dfcadb41497a6c5b43d5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 11 Mar 2024 10:01:37 +0200
Subject: [PATCH v9 2/2] Remove unneeded vacuum_delay_point from
heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there
is no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
Author: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1757eb49b7..58ee12fdfb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1231,7 +1231,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
*skipsallvis = true;
}
- vacuum_delay_point();
next_unskippable_block++;
}
--
2.39.2
On Mon, Mar 11, 2024 at 11:29:44AM +0200, Heikki Linnakangas wrote:
I see why you removed my treatise-level comment that was here about
unskipped skippable blocks. However, when I was trying to understand
this code, I did wish there was some comment that explained to me why we
needed all of the variables next_unskippable_block,
next_unskippable_allvis, all_visible_according_to_vm, and current_block.

The idea that we would choose not to skip a skippable block because of
kernel readahead makes sense. The part that I had trouble wrapping my
head around was that we want to also keep the visibility status of both
the beginning and ending blocks of the skippable range and then use
those to infer the visibility status of the intervening blocks without
another VM lookup if we decide not to skip them.

Right, I removed the comment because it looked a little out of place and it
duplicated the other comments sprinkled in the function. I agree this could
still use some more comments though.

Here's yet another attempt at making this more readable. I moved the logic
to find the next unskippable block to a separate function, and added
comments to make the states more explicit. What do you think?
Oh, I like the new structure. Very cool! Just a few remarks:
From c21480e9da61e145573de3b502551dde1b8fa3f6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v9 1/2] Confine vacuum skip logic to lazy_scan_skip()

Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more
code into the function, so that the caller doesn't need to know about
ranges or skipping anymore. heap_vac_scan_next_block() returns the
next block to process, and the logic for determining that block is all
within the function. This makes the skipping logic easier to
understand, as it's all in the same function, and makes the calling
code easier to understand as it's less cluttered. The state variables
needed to manage the skipping logic are moved to LVRelState.

heap_vac_scan_next_block() now manages its own VM buffer separately
from the caller's vmbuffer variable. The caller's vmbuffer holds the
VM page for the current block it's processing, while
heap_vac_scan_next_block() keeps a pin on the VM page for the next
unskippable block. Most of the time they are the same, so we hold two
pins on the same buffer, but it's more convenient to manage them
separately.

For readability inside heap_vac_scan_next_block(), move the logic of
finding the next unskippable block to a separate function, and add some
comments.

This refactoring will also help future patches to switch to using a
streaming read interface, and eventually AIO
(/messages/by-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com)

Author: Melanie Plageman, with some changes by me
I'd argue you earned co-authorship by now :)
Discussion: /messages/by-id/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 233 +++++++++++++++++----------
 1 file changed, 146 insertions(+), 87 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae..1757eb49b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
+
 /*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_next_block() -- get next block for vacuum to process
 *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum. The function uses the visibility map, vacuum options,
+ * and various thresholds to skip blocks which do not need to be processed and

I wonder if "need" is too strong a word since this function
(heap_vac_scan_next_block()) specifically can set blkno to a block which
doesn't *need* to be processed but which it chooses to process because
of SKIP_PAGES_THRESHOLD.
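(For concreteness: whether a run of skippable pages is jumped over is just a
length test. A minimal standalone sketch, with 32 hard-coded as an assumed
SKIP_PAGES_THRESHOLD rather than taken from vacuumlazy.c:)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;
#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)	/* assumed value for this demo */

/* Is the all-visible run [next_block, next_unskippable) long enough to jump over? */
static bool
range_is_skipped(BlockNumber next_block, BlockNumber next_unskippable)
{
	return next_unskippable - next_block >= SKIP_PAGES_THRESHOLD;
}

int
main(void)
{
	printf("3-page run: %s\n", range_is_skipped(10, 13) ? "skipped" : "read anyway");
	printf("40-page run: %s\n", range_is_skipped(10, 50) ? "skipped" : "read anyway");
	return 0;
}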

+ * sets blkno to the next block that actually needs to be processed.
 *
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * The block number and visibility status of the next block to process are set
+ * in *blkno and *all_visible_according_to_vm. The return value is false if
+ * there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about
+ * the relation are read, and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks. It

Maybe this should say when we have skipped vacuuming all-visible blocks
which are not all-frozen or just blocks which are not all-frozen.

+ * also holds information about the next unskippable block, as bookkeeping for
+ * this function.
 *
 * Note: our opinion of which blocks can be skipped can go stale immediately.
 * It's okay if caller "misses" a page whose all-visible or all-frozen marking

Wonder if it makes sense to move this note to
find_next_unskippable_block().

@@ -1098,26 +1081,119 @@ lazy_scan_heap(LVRelState *vacrel)
 * older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
 * choice to skip such a range is actually made, making everything safe.)
 */
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
 {
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
- bool skipsallvis = false;
+ BlockNumber next_block;

- *next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
+ next_block = vacrel->current_block + 1;
+
+ /* Have we reached the end of the relation? */

No strong opinion on this, but I wonder if being at the end of the
relation counts as a fourth state?

+ if (next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
+ {
+ ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ }
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
+
+ /*
+ * We must be in one of the three following states:
+ */
+ if (vacrel->next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->next_unskippable_block)
+ {
+ /*
+ * 1. We have just processed an unskippable block (or we're at the
+ * beginning of the scan). Find the next unskippable block using the
+ * visibility map.
+ */

I would reorder the options in the comment or in the if statement since
they seem to be in the reverse order.

+ bool skipsallvis;
+
+ find_next_unskippable_block(vacrel, &skipsallvis);
+
+ /*
+ * We now know the next block that we must process. It can be the
+ * next block after the one we just processed, or something further
+ * ahead. If it's further ahead, we can jump to it, but we choose to
+ * do so only if we can skip at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+
+/*
+ * Find the next unskippable block in a vacuum scan using the visibility map.

To expand this comment, I might mention it is a helper function for
heap_vac_scan_next_block(). I would also say that the next unskippable
block and its visibility information are recorded in vacrel. And that
skipsallvis is set to true if any of the intervening skipped blocks are
not all-frozen.

+ */
+static void
+find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
+{
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
+ Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
+ bool next_unskippable_allvis;
+
+ *skipsallvis = false;
+
+ for (;;)
 {
 uint8 mapbits = visibilitymap_get_status(vacrel->rel,
 next_unskippable_block,
- vmbuffer);
+ &next_unskippable_vmbuffer);

- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;

Otherwise LGTM
- Melanie
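(To make the three states concrete, here is a toy, self-contained model of the
scan-skipping loop just discussed. It ignores aggressive vacuums, frozen bits
and DISABLE_PAGE_SKIPPING, and uses a plain bool array in place of the
visibility map; it is an illustration of the idea, not the patch's code.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define SKIP_PAGES_THRESHOLD 4	/* tiny value, just for the demo */

/* Toy "visibility map": one all-visible flag per block. */
static const bool all_visible[] = {
	false, true, true, false,				/* 2-page run: read anyway */
	true, true, true, true, true, false		/* 5-page run: skipped */
};
#define REL_PAGES ((BlockNumber) (sizeof(all_visible) / sizeof(all_visible[0])))

static BlockNumber current_block = InvalidBlockNumber;
static BlockNumber next_unskippable_block = InvalidBlockNumber;

static bool
next_block_to_process(BlockNumber *blkno, bool *vm_says_all_visible)
{
	BlockNumber next_block = current_block + 1;	/* wraps to 0 on the first call */

	if (next_block >= REL_PAGES)
		return false;

	/* State 1: we are past the last unskippable block we knew about. */
	if (next_unskippable_block == InvalidBlockNumber ||
		next_block > next_unskippable_block)
	{
		/* The last block is always treated as unskippable. */
		next_unskippable_block = next_block;
		while (next_unskippable_block < REL_PAGES - 1 &&
			   all_visible[next_unskippable_block])
			next_unskippable_block++;

		/* Jump over the all-visible run only if it is long enough. */
		if (next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
			next_block = next_unskippable_block;
	}

	/* States 2 and 3: an unskipped all-visible block, or the unskippable one. */
	current_block = *blkno = next_block;
	*vm_says_all_visible = (next_block < next_unskippable_block) ||
		all_visible[next_block];
	return true;
}

int
main(void)
{
	BlockNumber blkno;
	bool		allvis;

	while (next_block_to_process(&blkno, &allvis))
		printf("process block %u (all-visible per VM: %d)\n",
			   (unsigned) blkno, (int) allvis);
	return 0;
}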
On 11/03/2024 18:15, Melanie Plageman wrote:
On Mon, Mar 11, 2024 at 11:29:44AM +0200, Heikki Linnakangas wrote:

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae..1757eb49b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
+
 /*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_next_block() -- get next block for vacuum to process
 *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum. The function uses the visibility map, vacuum options,
+ * and various thresholds to skip blocks which do not need to be processed and
+ * sets blkno to the next block that actually needs to be processed.

I wonder if "need" is too strong a word since this function
(heap_vac_scan_next_block()) specifically can set blkno to a block which
doesn't *need* to be processed but which it chooses to process because
of SKIP_PAGES_THRESHOLD.
Ok yeah, there's a lot of "needs" here :-). Fixed.

 *
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * The block number and visibility status of the next block to process are set
+ * in *blkno and *all_visible_according_to_vm. The return value is false if
+ * there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about
+ * the relation are read, and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks. It

Maybe this should say when we have skipped vacuuming all-visible blocks
which are not all-frozen or just blocks which are not all-frozen.
Ok, rephrased.

+ * also holds information about the next unskippable block, as bookkeeping for
+ * this function.
 *
 * Note: our opinion of which blocks can be skipped can go stale immediately.
 * It's okay if caller "misses" a page whose all-visible or all-frozen marking

Wonder if it makes sense to move this note to
find_next_unskippable_block().
Moved.

@@ -1098,26 +1081,119 @@ lazy_scan_heap(LVRelState *vacrel)
 * older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
 * choice to skip such a range is actually made, making everything safe.)
 */
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
 {
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
- bool skipsallvis = false;
+ BlockNumber next_block;

- *next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
+ next_block = vacrel->current_block + 1;
+
+ /* Have we reached the end of the relation? */

No strong opinion on this, but I wonder if being at the end of the
relation counts as a fourth state?
Yeah, perhaps. But I think it makes sense to treat it as a special case.

+ if (next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
+ {
+ ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ }
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
+
+ /*
+ * We must be in one of the three following states:
+ */
+ if (vacrel->next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->next_unskippable_block)
+ {
+ /*
+ * 1. We have just processed an unskippable block (or we're at the
+ * beginning of the scan). Find the next unskippable block using the
+ * visibility map.
+ */

I would reorder the options in the comment or in the if statement since
they seem to be in the reverse order.
Reordered them in the statement.
It feels a bit wrong to test next_block > vacrel->next_unskippable_block
before vacrel->next_unskippable_block == InvalidBlockNumber. But it
works, and that order makes more sense in the comment IMHO.
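(The "it works" part comes down to unsigned wraparound plus || short-circuiting;
a tiny standalone check, using a stand-in typedef and constant rather than
PostgreSQL's headers:)

#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)	/* stand-in for block.h */

int
main(void)
{
	BlockNumber current_block = InvalidBlockNumber;			/* state before the first call */
	BlockNumber next_unskippable_block = InvalidBlockNumber;
	BlockNumber next_block = current_block + 1;				/* unsigned wraparound to 0 */

	assert(next_block == 0);

	/*
	 * next_block can never exceed InvalidBlockNumber (UINT32_MAX), so on the
	 * first call the "> next_unskippable_block" half of the test is false and
	 * the "== InvalidBlockNumber" half is what routes us into state 1.  With
	 * ||, either operand order gives the same result.
	 */
	assert(!(next_block > next_unskippable_block));
	assert(next_unskippable_block == InvalidBlockNumber);
	return 0;
}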

+ bool skipsallvis;
+
+ find_next_unskippable_block(vacrel, &skipsallvis);
+
+ /*
+ * We now know the next block that we must process. It can be the
+ * next block after the one we just processed, or something further
+ * ahead. If it's further ahead, we can jump to it, but we choose to
+ * do so only if we can skip at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+
+/*
+ * Find the next unskippable block in a vacuum scan using the visibility map.

To expand this comment, I might mention it is a helper function for
heap_vac_scan_next_block(). I would also say that the next unskippable
block and its visibility information are recorded in vacrel. And that
skipsallvis is set to true if any of the intervening skipped blocks are
not all-frozen.
Added comments.
Otherwise LGTM
Ok, pushed! Thank you, this is much more understandable now!
--
Heikki Linnakangas
Neon (https://neon.tech)
On Mon, Mar 11, 2024 at 2:47 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Otherwise LGTM
Ok, pushed! Thank you, this is much more understandable now!
Cool, thanks!
On Sun, Mar 10, 2024 at 11:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Mar 11, 2024 at 5:31 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I have investigated the interaction between
maintenance_io_concurrency, streaming reads, and the vacuum buffer
access strategy (BAS_VACUUM).

The streaming read API limits max_pinned_buffers to a pinned buffer
multiplier (currently 4) * maintenance_io_concurrency buffers with the
goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size.

Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with
default block size, that means that for a fully uncached vacuum in
which all blocks must be vacuumed and will be dirtied, you'd have to
set maintenance_io_concurrency at 8 or lower to see the same number of
reuses (and shared buffer consumption) as master.

Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it
seems like we should force max_pinned_buffers to a value that
guarantees the expected shared buffer usage by vacuum. But that means
that maintenance_io_concurrency does not have a predictable impact on
streaming read vacuum.

What is the right thing to do here?
At the least, the default size of the BAS_VACUUM ring buffer should be
BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency
(probably rounded up to the next power of two) bytes.

Hmm, does the v6 look-ahead distance control algorithm mitigate that
problem? Using the ABC classifications from the streaming read
thread, I think for A it should now pin only 1, for B 16 and for C, it
depends on the size of the random 'chunks': if you have a lot of size
1 random reads then it shouldn't go above 10 because of (default)
maintenance_io_concurrency. The only way to get up to very high
numbers would be to have a lot of random chunks triggering behaviour
C, but each made up of long runs of misses. For example one can
contrive a BHS query that happens to read pages 0-15 then 20-35 then
40-55 etc etc so that we want to get lots of wide I/Os running
concurrently. Unless vacuum manages to do something like that, it
shouldn't be able to exceed 32 buffers very easily.

I suspect that if we taught streaming_read.c to ask the
BufferAccessStrategy (if one is passed in) what its recommended pin
limit is (strategy->nbuffers?), we could just clamp
max_pinned_buffers, and it would be hard to find a workload where that
makes a difference, and we could think about more complicated logic
later.

In other words, I think/hope your complaints about excessive pinning
from v5 WRT all-cached heap scans might have also already improved
this case by happy coincidence? I haven't tried it out though, I just
read your description of the problem...
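(A rough standalone sketch of the clamp being floated here. The numbers and
the way the stream learns the strategy's ring size are stand-ins for this
illustration, not streaming_read.c code: the ring size comes from the 256 kB
BAS_VACUUM figure above with the default 8 kB block size, and the pin budget
from the "4 * maintenance_io_concurrency" rule with the default setting of 10.)

#include <stdio.h>

#define RING_BUFFERS 32					/* 256 kB / 8 kB BLCKSZ */
#define MAINTENANCE_IO_CONCURRENCY 10	/* default GUC value */
#define PINNED_BUFFER_MULTIPLIER 4

static int
clamp_max_pinned_buffers(int max_pinned_buffers, int strategy_nbuffers)
{
	/* Never plan to pin more buffers than the access strategy's ring holds. */
	if (strategy_nbuffers > 0 && max_pinned_buffers > strategy_nbuffers)
		max_pinned_buffers = strategy_nbuffers;
	return max_pinned_buffers;
}

int
main(void)
{
	int			unclamped = PINNED_BUFFER_MULTIPLIER * MAINTENANCE_IO_CONCURRENCY;

	printf("unclamped: %d pins, clamped to ring: %d pins\n",
		   unclamped, clamp_max_pinned_buffers(unclamped, RING_BUFFERS));
	return 0;
}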
I've rebased the attached v10 over top of the changes to
lazy_scan_heap() Heikki just committed and over the v6 streaming read
patch set. I started testing them and see that you are right, we no
longer pin too many buffers. However, the uncached example below is
now slower with streaming read than on master -- it looks to be
because it is doing twice as many WAL writes and syncs. I'm still
investigating why that is.
psql \
-c "create table small (a int) with (autovacuum_enabled=false,
fillfactor=25);" \
-c "insert into small select generate_series(1,200000) % 3;" \
-c "update small set a = 6 where a = 1;"
pg_ctl stop
# drop caches
pg_ctl start
psql -c "\timing on" -c "vacuum (verbose) small"
- Melanie
Attachments:
v10-0001-Streaming-Read-API.patchtext/x-patch; charset=US-ASCII; name=v10-0001-Streaming-Read-API.patchDownload
From 2b181d178f0cd55de45659540ec35536918a4c9d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 11 Mar 2024 15:36:39 -0400
Subject: [PATCH v10 1/3] Streaming Read API
---
src/backend/storage/Makefile | 2 +-
src/backend/storage/aio/Makefile | 14 +
src/backend/storage/aio/meson.build | 5 +
src/backend/storage/aio/streaming_read.c | 648 +++++++++++++++++++++++
src/backend/storage/buffer/bufmgr.c | 641 +++++++++++++++-------
src/backend/storage/buffer/localbuf.c | 14 +-
src/backend/storage/meson.build | 1 +
src/include/storage/bufmgr.h | 45 ++
src/include/storage/streaming_read.h | 52 ++
src/tools/pgindent/typedefs.list | 3 +
10 files changed, 1215 insertions(+), 210 deletions(-)
create mode 100644 src/backend/storage/aio/Makefile
create mode 100644 src/backend/storage/aio/meson.build
create mode 100644 src/backend/storage/aio/streaming_read.c
create mode 100644 src/include/storage/streaming_read.h
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS = aio buffer file freespace ipc large_object lmgr page smgr sync
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+ streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+ 'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..404c95b7d3c
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,648 @@
+#include "postgres.h"
+
+#include "catalog/pg_tablespace.h"
+#include "miscadmin.h"
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+ bool need_wait;
+ bool advice_issued;
+ BlockNumber blocknum;
+ int nblocks;
+ int per_buffer_data_index;
+ Buffer buffers[MAX_BUFFERS_PER_TRANSFER];
+ ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+ int max_ios;
+ int ios_in_progress;
+ int max_pinned_buffers;
+ int pinned_buffers;
+ int pinned_buffers_trigger;
+ int next_tail_buffer;
+ int distance;
+ bool finished;
+ bool advice_enabled;
+ void *pgsr_private;
+ PgStreamingReadBufferCB callback;
+
+ BufferAccessStrategy strategy;
+ BufferManagerRelation bmr;
+ ForkNumber forknum;
+
+ /* Sometimes we need to buffer one block for flow control. */
+ BlockNumber unget_blocknum;
+ void *unget_per_buffer_data;
+
+ /* Next expected block, for detecting sequential access. */
+ BlockNumber seq_blocknum;
+
+ /* Space for optional per-buffer private data. */
+ size_t per_buffer_data_size;
+ void *per_buffer_data;
+
+ /* Circular buffer of ranges. */
+ int size;
+ int head;
+ int tail;
+ PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_data_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb)
+{
+ PgStreamingRead *pgsr;
+ int size;
+ int max_ios;
+ uint32 max_pinned_buffers;
+ Oid tablespace_id;
+
+ /*
+ * Make sure our bmr's smgr and persistent are populated. The caller
+ * asserts that the storage manager will remain valid.
+ */
+ if (!bmr.smgr)
+ {
+ bmr.smgr = RelationGetSmgr(bmr.rel);
+ bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+ }
+
+ /*
+ * Decide how many assumed I/Os we will allow to run concurrently. That
+ * is, advice to the kernel to tell it that we will soon read. This
+ * number also affects how far we look ahead for opportunities to start
+ * more I/Os.
+ */
+ tablespace_id = bmr.smgr->smgr_rlocator.locator.spcOid;
+ if (!OidIsValid(MyDatabaseId) ||
+ (bmr.rel && IsCatalogRelation(bmr.rel)) ||
+ IsCatalogRelationOid(bmr.smgr->smgr_rlocator.locator.relNumber))
+ {
+ /*
+ * Avoid circularity while trying to look up tablespace settings or
+ * before spccache.c is ready.
+ */
+ max_ios = effective_io_concurrency;
+ }
+ else if (flags & PGSR_FLAG_MAINTENANCE)
+ max_ios = get_tablespace_maintenance_io_concurrency(tablespace_id);
+ else
+ max_ios = get_tablespace_io_concurrency(tablespace_id);
+
+ /*
+ * The desired level of I/O concurrency controls how far ahead we are
+ * willing to look ahead. We also clamp it to at least
+ * MAX_BUFFERS_PER_TRANSFER so that we can have a chance to build up a full
+ * sized read, even when max_ios is zero.
+ */
+ max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+ /* Don't allow this backend to pin more than its share of buffers. */
+ if (SmgrIsTemp(bmr.smgr))
+ LimitAdditionalLocalPins(&max_pinned_buffers);
+ else
+ LimitAdditionalPins(&max_pinned_buffers);
+ Assert(max_pinned_buffers > 0);
+
+ /*
+ * pgsr->ranges is a circular buffer. When it is empty, head == tail.
+ * When it is full, there is an empty element between head and tail. Head
+ * can also be empty (nblocks == 0), therefore we need two extra elements
+ * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+ * maximum possible number of occupied ranges of the smallest possible
+ * size of one.
+ */
+ size = max_pinned_buffers + 2;
+
+ pgsr = (PgStreamingRead *)
+ palloc0(offsetof(PgStreamingRead, ranges) +
+ sizeof(pgsr->ranges[0]) * size);
+
+ pgsr->max_ios = max_ios;
+ pgsr->per_buffer_data_size = per_buffer_data_size;
+ pgsr->max_pinned_buffers = max_pinned_buffers;
+ pgsr->pgsr_private = pgsr_private;
+ pgsr->strategy = strategy;
+ pgsr->size = size;
+
+ pgsr->callback = next_block_cb;
+ pgsr->bmr = bmr;
+ pgsr->forknum = forknum;
+
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+ /*
+ * This system supports prefetching advice. As long as direct I/O isn't
+ * enabled, and the caller hasn't promised sequential access, we can use
+ * it.
+ */
+ if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+ (flags & PGSR_FLAG_SEQUENTIAL) == 0)
+ pgsr->advice_enabled = true;
+#endif
+
+ /*
+ * Skip the initial ramp-up phase if the caller says we're going to be
+ * reading the whole relation. This way we start out doing full-sized
+ * reads.
+ */
+ if (flags & PGSR_FLAG_FULL)
+ pgsr->distance = Min(MAX_BUFFERS_PER_TRANSFER, pgsr->max_pinned_buffers);
+ else
+ pgsr->distance = 1;
+
+ /*
+ * We want to avoid creating ranges that are smaller than they could be
+ * just because we hit max_pinned_buffers. We only look ahead when the
+ * number of pinned buffers falls below this trigger number, or put
+ * another way, we stop looking ahead when we wouldn't be able to build a
+ * "full sized" range.
+ */
+ pgsr->pinned_buffers_trigger =
+ Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+ /* Space for the callback to store extra data along with each block. */
+ if (per_buffer_data_size)
+ pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+ return pgsr;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ int result;
+
+ /*
+ * Find slot in the circular buffer of per-buffer data, without using the
+ * expensive % operator.
+ */
+ result = range->per_buffer_data_index + n;
+ while (result >= pgsr->max_pinned_buffers)
+ result -= pgsr->max_pinned_buffers;
+ Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+ return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+ return (char *) pgsr->per_buffer_data +
+ pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+ return get_per_buffer_data_by_index(pgsr,
+ get_per_buffer_data_index(pgsr,
+ range,
+ n));
+}
+
+/*
+ * Start reading the head range, and create a new head range. The new head
+ * range is returned. It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *head_range;
+ PgStreamingReadRange *new_head_range;
+ int nblocks_pinned;
+ int flags;
+
+ /* Caller should make sure we never exceed max_ios. */
+ Assert((pgsr->ios_in_progress < pgsr->max_ios) ||
+ (pgsr->ios_in_progress == 0 && pgsr->max_ios == 0));
+
+ /* Should only call if the head range has some blocks to read. */
+ head_range = &pgsr->ranges[pgsr->head];
+ Assert(head_range->nblocks > 0);
+
+ /*
+ * If advice hasn't been suppressed, this system supports it, and this
+ * isn't a strictly sequential pattern, then we'll issue advice.
+ */
+ if (pgsr->advice_enabled &&
+ pgsr->max_ios > 0 &&
+ head_range->blocknum != pgsr->seq_blocknum)
+ flags = READ_BUFFERS_ISSUE_ADVICE;
+ else
+ flags = 0;
+
+ /* We shouldn't be trying to pin more buffers than we're allowed to. */
+ Assert(pgsr->pinned_buffers + head_range->nblocks <= pgsr->max_pinned_buffers);
+
+ /* Start reading as many blocks as we can from the head range. */
+ nblocks_pinned = head_range->nblocks;
+ head_range->need_wait =
+ StartReadBuffers(pgsr->bmr,
+ head_range->buffers,
+ pgsr->forknum,
+ head_range->blocknum,
+ &nblocks_pinned,
+ pgsr->strategy,
+ flags,
+ &head_range->operation);
+
+ if (head_range->need_wait)
+ {
+ int distance;
+
+ if (flags & READ_BUFFERS_ISSUE_ADVICE)
+ {
+ /*
+ * Since we've issued advice, we count an I/O in progress until we
+ * call WaitReadBuffers().
+ */
+ head_range->advice_issued = true;
+ pgsr->ios_in_progress++;
+ Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+
+ /*
+ * Look-ahead distance ramps up rapidly, so we can search for more
+ * I/Os to start.
+ */
+ distance = pgsr->distance * 2;
+ distance = Min(distance, pgsr->max_pinned_buffers);
+ pgsr->distance = distance;
+ }
+ else
+ {
+ /*
+ * There is no point in increasing look-ahead distance if we've
+ * already reached the full I/O size, since we're not issuing
+ * advice. Extra distance would only pin more buffers for no
+ * benefit.
+ */
+ if (pgsr->distance > MAX_BUFFERS_PER_TRANSFER)
+ {
+ /* Look-ahead distance gradually decays. */
+ pgsr->distance--;
+ }
+ else
+ {
+ /*
+ * Look-ahead distance ramps up rapidly, but not more than the
+ * full I/O size.
+ */
+ distance = pgsr->distance * 2;
+ distance = Min(distance, MAX_BUFFERS_PER_TRANSFER);
+ distance = Min(distance, pgsr->max_pinned_buffers);
+ pgsr->distance = distance;
+ }
+ }
+ }
+ else
+ {
+ /* No I/O necessary. Look-ahead distance gradually decays. */
+ if (pgsr->distance > 1)
+ pgsr->distance--;
+ }
+
+ /*
+ * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+ * but always at least one.
+ */
+ Assert(nblocks_pinned <= head_range->nblocks);
+ Assert(nblocks_pinned >= 1);
+ pgsr->pinned_buffers += nblocks_pinned;
+
+ /*
+ * Remember where the next block would be after that, so we can detect
+ * sequential access next time.
+ */
+ pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+ /*
+ * Create a new head range. There must be space, because we have enough
+ * elements for every range to hold just one block, up to the pin limit.
+ */
+ Assert(pgsr->size > pgsr->max_pinned_buffers);
+ Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+ if (++pgsr->head == pgsr->size)
+ pgsr->head = 0;
+ new_head_range = &pgsr->ranges[pgsr->head];
+ new_head_range->nblocks = 0;
+ new_head_range->advice_issued = false;
+
+ /*
+ * If we didn't manage to start the whole read above, we split the range,
+ * moving the remainder into the new head range.
+ */
+ if (nblocks_pinned < head_range->nblocks)
+ {
+ int nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+ head_range->nblocks = nblocks_pinned;
+
+ new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+ new_head_range->nblocks = nblocks_remaining;
+ }
+
+ /* The new range has per-buffer data starting after the previous range. */
+ new_head_range->per_buffer_data_index =
+ get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+ return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+ BlockNumber result;
+
+ if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+ {
+ /*
+ * If we had to unget a block, now it is time to return that one
+ * again.
+ */
+ result = pgsr->unget_blocknum;
+ pgsr->unget_blocknum = InvalidBlockNumber;
+
+ /*
+ * The same per_buffer_data element must have been used, and still
+ * contains whatever data the callback wrote into it. So we just
+ * sanity-check that we were called with the value that
+ * pg_streaming_unget_block() pushed back.
+ */
+ Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+ }
+ else
+ {
+ /* Use the installed callback directly. */
+ result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+ }
+
+ return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later. This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+ Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+ pgsr->unget_blocknum = blocknum;
+ pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+ PgStreamingReadRange *range;
+
+ /* If we're finished, don't look ahead. */
+ if (pgsr->finished)
+ return;
+
+ /*
+ * If we've already started the maximum allowed number of I/Os, don't look
+ * ahead. (The special case for max_ios == 0 is handled higher up.)
+ */
+ if (pgsr->max_ios > 0 && pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * We'll also wait until the number of pinned buffers falls below our
+ * trigger level, so that we have the chance to create a full range.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ do
+ {
+ BlockNumber blocknum;
+ void *per_buffer_data;
+
+ /* Do we have a full-sized range? */
+ range = &pgsr->ranges[pgsr->head];
+ if (range->nblocks == lengthof(range->buffers))
+ {
+ /* Start as much of it as we can. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /* If we're now at the I/O limit, stop here. */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ return;
+
+ /*
+ * If we couldn't form a full range, then stop here to avoid
+ * creating small I/O.
+ */
+ if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+ return;
+
+ /*
+ * That might have only been partially started, but it always
+ * processes at least one, so that'll do for now.
+ */
+ Assert(range->nblocks < lengthof(range->buffers));
+ }
+
+ /* Find per-buffer data slot for the next block. */
+ per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+ /* Find out which block the callback wants to read next. */
+ blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+ if (blocknum == InvalidBlockNumber)
+ {
+ /* End of stream. */
+ pgsr->finished = true;
+ break;
+ }
+
+ /*
+ * Is there a head range that we cannot extend, because the requested
+ * block is not consecutive?
+ */
+ if (range->nblocks > 0 &&
+ range->blocknum + range->nblocks != blocknum)
+ {
+ /* Yes. Start it, so we can begin building a new one. */
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * It's possible that it was only partially started, and we have a
+ * new range with the remainder. Keep starting I/Os until we get
+ * it all out of the way, or we hit the I/O limit.
+ */
+ while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+ range = pg_streaming_read_start_head_range(pgsr);
+
+ /*
+ * We have to 'unget' the block returned by the callback if we
+ * don't have enough I/O capacity left to start something.
+ */
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ {
+ pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+ return;
+ }
+ }
+
+ /* If we have a new, empty range, initialize the start block. */
+ if (range->nblocks == 0)
+ {
+ range->blocknum = blocknum;
+ }
+
+ /* This block extends the range by one. */
+ Assert(range->blocknum + range->nblocks == blocknum);
+ range->nblocks++;
+
+ } while (pgsr->pinned_buffers + range->nblocks < pgsr->distance);
+
+ /* Start as much as we can. */
+ while (range->nblocks > 0)
+ {
+ range = pg_streaming_read_start_head_range(pgsr);
+ if (pgsr->ios_in_progress == pgsr->max_ios)
+ break;
+ }
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+ /*
+ * The setting max_ios == 0 requires special t...
+ */
+ if (pgsr->max_ios > 0 || pgsr->pinned_buffers == 0)
+ pg_streaming_read_look_ahead(pgsr);
+
+ /* See if we have one buffer to return. */
+ while (pgsr->tail != pgsr->head)
+ {
+ PgStreamingReadRange *tail_range;
+
+ tail_range = &pgsr->ranges[pgsr->tail];
+
+ /*
+ * Do we need to perform an I/O before returning the buffers from this
+ * range?
+ */
+ if (tail_range->need_wait)
+ {
+ WaitReadBuffers(&tail_range->operation);
+ tail_range->need_wait = false;
+
+ /*
+ * We don't really know if the kernel generated a physical I/O
+ * when we issued advice, let alone when it finished, but it has
+ * certainly finished now because we've performed the read.
+ */
+ if (tail_range->advice_issued)
+ {
+ Assert(pgsr->ios_in_progress > 0);
+ pgsr->ios_in_progress--;
+ }
+ }
+
+ /* Are there more buffers available in this range? */
+ if (pgsr->next_tail_buffer < tail_range->nblocks)
+ {
+ int buffer_index;
+ Buffer buffer;
+
+ buffer_index = pgsr->next_tail_buffer++;
+ buffer = tail_range->buffers[buffer_index];
+
+ Assert(BufferIsValid(buffer));
+
+ /* We are giving away ownership of this pinned buffer. */
+ Assert(pgsr->pinned_buffers > 0);
+ pgsr->pinned_buffers--;
+
+ if (per_buffer_data)
+ *per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+ return buffer;
+ }
+
+ /* Advance tail to next range, if there is one. */
+ if (++pgsr->tail == pgsr->size)
+ pgsr->tail = 0;
+ pgsr->next_tail_buffer = 0;
+
+ /*
+ * If tail crashed into head, and head is not empty, then it is time
+ * to start that range.
+ */
+ if (pgsr->tail == pgsr->head &&
+ pgsr->ranges[pgsr->head].nblocks > 0)
+ pg_streaming_read_start_head_range(pgsr);
+ }
+
+ Assert(pgsr->pinned_buffers == 0);
+
+ return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+ Buffer buffer;
+
+ /* Stop looking ahead. */
+ pgsr->finished = true;
+
+ /* Unpin anything that wasn't consumed. */
+ while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+ ReleaseBuffer(buffer);
+
+ Assert(pgsr->pinned_buffers == 0);
+ Assert(pgsr->ios_in_progress == 0);
+
+ /* Release memory. */
+ if (pgsr->per_buffer_data)
+ pfree(pgsr->per_buffer_data);
+
+ pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c5..729d1f91721 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
* and pin it so that no one can destroy it while this process
* is using it.
*
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ * two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
* ReleaseBuffer() -- unpin a buffer
*
* MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -471,10 +476,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
)
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy,
- bool *hit);
+ ReadBufferMode mode, BufferAccessStrategy strategy);
static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
ForkNumber fork,
BufferAccessStrategy strategy,
@@ -500,7 +504,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
static int SyncOneBuffer(int buf_id, bool skip_recently_used,
WritebackContext *wb_context);
static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
uint32 set_flag_bits, bool forget_owner);
static void AbortBufferIO(Buffer buffer);
@@ -781,7 +785,6 @@ Buffer
ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy)
{
- bool hit;
Buffer buf;
/*
@@ -794,15 +797,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions")));
- /*
- * Read the buffer, and update pgstat counters to reflect a cache hit or
- * miss.
- */
- pgstat_count_buffer_read(reln);
- buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
- forkNum, blockNum, mode, strategy, &hit);
- if (hit)
- pgstat_count_buffer_hit(reln);
+ buf = ReadBuffer_common(BMR_REL(reln),
+ forkNum, blockNum, mode, strategy);
+
return buf;
}
@@ -822,13 +819,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
BufferAccessStrategy strategy, bool permanent)
{
- bool hit;
-
SMgrRelation smgr = smgropen(rlocator, INVALID_PROC_NUMBER);
- return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
- RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
- mode, strategy, &hit);
+ return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+ RELPERSISTENCE_UNLOGGED),
+ forkNum, blockNum,
+ mode, strategy);
}
/*
@@ -994,35 +990,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
*/
if (buffer == InvalidBuffer)
{
- bool hit;
-
Assert(extended_by == 0);
- buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
- fork, extend_to - 1, mode, strategy,
- &hit);
+ buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
}
return buffer;
}
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK. The buffer must be already
+ * pinned. It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+ BufferDesc *bufHdr;
+ uint32 buf_state;
+
+ Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+ if (BufferIsLocal(buffer))
+ bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+ else
+ {
+ bufHdr = GetBufferDescriptor(buffer - 1);
+ if (mode == RBM_ZERO_AND_LOCK)
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ else
+ LockBufferForCleanup(buffer);
+ }
+
+ memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+ if (BufferIsLocal(buffer))
+ {
+ buf_state = pg_atomic_read_u32(&bufHdr->state);
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ buf_state = LockBufHdr(bufHdr);
+ buf_state |= BM_VALID;
+ UnlockBufHdr(bufHdr, buf_state);
+ }
+}
+
/*
* ReadBuffer_common -- common logic for all ReadBuffer variants
*
* *hit is set to true if the request was satisfied from shared buffer cache.
*/
static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy, bool *hit)
+ BufferAccessStrategy strategy)
{
- BufferDesc *bufHdr;
- Block bufBlock;
- bool found;
- IOContext io_context;
- IOObject io_object;
- bool isLocalBuf = SmgrIsTemp(smgr);
-
- *hit = false;
+ ReadBuffersOperation operation;
+ Buffer buffer;
+ int nblocks;
+ int flags;
/*
* Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1041,181 +1070,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
flags |= EB_LOCK_FIRST;
- return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
- forkNum, strategy, flags);
+ return ExtendBufferedRel(bmr, forkNum, strategy, flags);
}
- TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend);
+ nblocks = 1;
+ if (mode == RBM_ZERO_ON_ERROR)
+ flags = READ_BUFFERS_ZERO_ON_ERROR;
+ else
+ flags = 0;
+ if (StartReadBuffers(bmr,
+ &buffer,
+ forkNum,
+ blockNum,
+ &nblocks,
+ strategy,
+ flags,
+ &operation))
+ WaitReadBuffers(&operation);
+ Assert(nblocks == 1); /* single block can't be short */
+
+ if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+ ZeroBuffer(buffer, mode);
+
+ return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ BufferAccessStrategy strategy,
+ bool *foundPtr)
+{
+ BufferDesc *bufHdr;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ Assert(blockNum != P_NEW);
+ Assert(bmr.smgr);
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /*
- * We do not use a BufferAccessStrategy for I/O of temporary tables.
- * However, in some cases, the "strategy" may not be NULL, so we can't
- * rely on IOContextForStrategy() to set the right IOContext for us.
- * This may happen in cases like CREATE TEMPORARY TABLE AS...
- */
io_context = IOCONTEXT_NORMAL;
io_object = IOOBJECT_TEMP_RELATION;
- bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
- if (found)
- pgBufferUsage.local_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.local_blks_read++;
}
else
{
- /*
- * lookup the buffer. IO_IN_PROGRESS is set if the requested block is
- * not currently in memory.
- */
io_context = IOContextForStrategy(strategy);
io_object = IOOBJECT_RELATION;
- bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
- strategy, &found, io_context);
- if (found)
- pgBufferUsage.shared_blks_hit++;
- else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
- mode == RBM_ZERO_ON_ERROR)
- pgBufferUsage.shared_blks_read++;
}
- /* At this point we do NOT hold any locks. */
+ TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend);
- /* if it was already in the buffer pool, we're done */
- if (found)
+ ResourceOwnerEnlarge(CurrentResourceOwner);
+ if (isLocalBuf)
+ {
+ bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+ if (*foundPtr)
+ pgBufferUsage.local_blks_hit++;
+ }
+ else
+ {
+ bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+ strategy, foundPtr, io_context);
+ if (*foundPtr)
+ pgBufferUsage.shared_blks_hit++;
+ }
+ if (bmr.rel)
+ {
+ /*
+ * While pgBufferUsage's "read" counter isn't bumped unless we reach
+ * WaitReadBuffers() (so, not for hits, and not for buffers that are
+ * zeroed instead), the per-relation stats always count them.
+ */
+ pgstat_count_buffer_read(bmr.rel);
+ if (*foundPtr)
+ pgstat_count_buffer_hit(bmr.rel);
+ }
+ if (*foundPtr)
{
- /* Just need to update stats before we exit */
- *hit = true;
VacuumPageHit++;
pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
if (VacuumCostActive)
VacuumCostBalance += VacuumCostPageHit;
TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ }
- /*
- * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
- * on return.
- */
- if (!isLocalBuf)
- {
- if (mode == RBM_ZERO_AND_LOCK)
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
- LW_EXCLUSIVE);
- else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
- LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
- }
+ return BufferDescriptorGetBuffer(bufHdr);
+}
- return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks. On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() is not
+ * necessary. If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed. Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers(). In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forkNum,
+ BlockNumber blockNum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation)
+{
+ int actual_nblocks = *nblocks;
+
+ if (bmr.rel)
+ {
+ bmr.smgr = RelationGetSmgr(bmr.rel);
+ bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
}
- /*
- * if we have gotten to this point, we have allocated a buffer for the
- * page but its contents are not yet valid. IO_IN_PROGRESS is set for it,
- * if it's a shared buffer.
- */
- Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* spinlock not needed */
+ operation->bmr = bmr;
+ operation->forknum = forkNum;
+ operation->blocknum = blockNum;
+ operation->buffers = buffers;
+ operation->nblocks = actual_nblocks;
+ operation->strategy = strategy;
+ operation->flags = flags;
- bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+ operation->io_buffers_len = 0;
- /*
- * Read in the page, unless the caller intends to overwrite it and just
- * wants us to allocate a buffer.
- */
- if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
- MemSet((char *) bufBlock, 0, BLCKSZ);
- else
+ for (int i = 0; i < actual_nblocks; ++i)
{
- instr_time io_start = pgstat_prepare_io_time(track_io_timing);
+ bool found;
- smgrread(smgr, forkNum, blockNum, bufBlock);
+ buffers[i] = PrepareReadBuffer(bmr,
+ forkNum,
+ blockNum + i,
+ strategy,
+ &found);
- pgstat_count_io_op_time(io_object, io_context,
- IOOP_READ, io_start, 1);
+ if (found)
+ {
+ /*
+ * Terminate the read as soon as we get a hit. It could be a
+ * single buffer hit, or it could be a hit that follows a readable
+ * range. We don't want to create more than one readable range,
+ * so we stop here.
+ */
+ actual_nblocks = operation->nblocks = *nblocks = i + 1;
+ }
+ else
+ {
+ /* Extend the readable range to cover this block. */
+ operation->io_buffers_len++;
+ }
+ }
- /* check for garbage data */
- if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
- PIV_LOG_WARNING | PIV_REPORT_STAT))
+ if (operation->io_buffers_len > 0)
+ {
+ if (flags & READ_BUFFERS_ISSUE_ADVICE)
{
- if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s; zeroing out page",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
- MemSet((char *) bufBlock, 0, BLCKSZ);
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("invalid page in block %u of relation %s",
- blockNum,
- relpath(smgr->smgr_rlocator, forkNum))));
+ /*
+ * In theory we should only do this if PrepareReadBuffers() had to
+ * allocate new buffers above. That way, if two calls to
+ * StartReadBuffers() were made for the same blocks before
+ * WaitReadBuffers(), only the first would issue the advice.
+ * That'd be a better simulation of true asynchronous I/O, which
+ * would only start the I/O once, but isn't done here for
+ * simplicity. Note also that the following call might actually
+ * issue two advice calls if we cross a segment boundary; in a
+ * true asynchronous version we might choose to process only one
+ * real I/O at a time in that case.
+ */
+ smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
}
+
+ /* Indicate that WaitReadBuffers() should be called. */
+ return true;
}
+ else
+ {
+ return false;
+ }
+}
- /*
- * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
- * content lock before marking the page as valid, to make sure that no
- * other backend sees the zeroed page before the caller has had a chance
- * to initialize it.
- *
- * Since no-one else can be looking at the page contents yet, there is no
- * difference between an exclusive lock and a cleanup-strength lock. (Note
- * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
- * they assert that the buffer is already valid.)
- */
- if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
- !isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+ if (BufferIsLocal(buffer))
{
- LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+ BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+ return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
}
+ else
+ return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ int nblocks;
+ BlockNumber blocknum;
+ ForkNumber forknum;
+ bool isLocalBuf;
+ IOContext io_context;
+ IOObject io_object;
+
+ /*
+ * Currently operations are only allowed to include a read of some range,
+ * with an optional extra buffer that is already pinned at the end. So
+ * nblocks can be at most one more than io_buffers_len.
+ */
+ Assert((operation->nblocks == operation->io_buffers_len) ||
+ (operation->nblocks == operation->io_buffers_len + 1));
+ /* Find the range of the physical read we need to perform. */
+ nblocks = operation->io_buffers_len;
+ if (nblocks == 0)
+ return; /* nothing to do */
+
+ buffers = &operation->buffers[0];
+ blocknum = operation->blocknum;
+ forknum = operation->forknum;
+ bmr = operation->bmr;
+
+ isLocalBuf = SmgrIsTemp(bmr.smgr);
if (isLocalBuf)
{
- /* Only need to adjust flags */
- uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
-
- buf_state |= BM_VALID;
- pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ io_context = IOCONTEXT_NORMAL;
+ io_object = IOOBJECT_TEMP_RELATION;
}
else
{
- /* Set BM_VALID, terminate IO, and wake up any waiters */
- TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ io_context = IOContextForStrategy(operation->strategy);
+ io_object = IOOBJECT_RELATION;
}
- VacuumPageMiss++;
- if (VacuumCostActive)
- VacuumCostBalance += VacuumCostPageMiss;
+ /*
+ * We count all these blocks as read by this backend. This is traditional
+ * behavior, but might turn out to be not true if we find that someone
+ * else has beaten us and completed the read of some of these blocks. In
+ * that case the system globally double-counts, but we traditionally don't
+ * count this as a "hit", and we don't have a separate counter for "miss,
+ * but another backend completed the read".
+ */
+ if (isLocalBuf)
+ pgBufferUsage.local_blks_read += nblocks;
+ else
+ pgBufferUsage.shared_blks_read += nblocks;
- TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
- smgr->smgr_rlocator.locator.spcOid,
- smgr->smgr_rlocator.locator.dbOid,
- smgr->smgr_rlocator.locator.relNumber,
- smgr->smgr_rlocator.backend,
- found);
+ for (int i = 0; i < nblocks; ++i)
+ {
+ int io_buffers_len;
+ Buffer io_buffers[MAX_BUFFERS_PER_TRANSFER];
+ void *io_pages[MAX_BUFFERS_PER_TRANSFER];
+ instr_time io_start;
+ BlockNumber io_first_block;
- return BufferDescriptorGetBuffer(bufHdr);
+ /*
+ * Skip this block if someone else has already completed it. If an
+ * I/O is already in progress in another backend, this will wait for
+ * the outcome: either done, or something went wrong and we will
+ * retry.
+ */
+ if (!WaitReadBuffersCanStartIO(buffers[i], false))
+ {
+ /*
+ * Report this as a 'hit' for this backend, even though it must
+ * have started out as a miss in PrepareReadBuffer().
+ */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ true);
+ continue;
+ }
+
+ /* We found a buffer that we need to read in. */
+ io_buffers[0] = buffers[i];
+ io_pages[0] = BufferGetBlock(buffers[i]);
+ io_first_block = blocknum + i;
+ io_buffers_len = 1;
+
+ /*
+ * How many neighboring-on-disk blocks can we scatter-read into
+ * other buffers at the same time? In this case we don't wait if we
+ * see an I/O already in progress. We already hold BM_IO_IN_PROGRESS
+ * for the head block, so we should get on with that I/O as soon as
+ * possible. We'll come back to this block again, above.
+ */
+ while ((i + 1) < nblocks &&
+ WaitReadBuffersCanStartIO(buffers[i + 1], true))
+ {
+ /* Must be consecutive block numbers. */
+ Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+ BufferGetBlockNumber(buffers[i]) + 1);
+
+ io_buffers[io_buffers_len] = buffers[++i];
+ io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+ }
+
+ io_start = pgstat_prepare_io_time(track_io_timing);
+ smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+ pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+ io_buffers_len);
+
+ /* Verify each block we read, and terminate the I/O. */
+ for (int j = 0; j < io_buffers_len; ++j)
+ {
+ BufferDesc *bufHdr;
+ Block bufBlock;
+
+ if (isLocalBuf)
+ {
+ bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+ bufBlock = LocalBufHdrGetBlock(bufHdr);
+ }
+ else
+ {
+ bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+ bufBlock = BufHdrGetBlock(bufHdr);
+ }
+
+ /* check for garbage data */
+ if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+ PIV_LOG_WARNING | PIV_REPORT_STAT))
+ {
+ if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s; zeroing out page",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ memset(bufBlock, 0, BLCKSZ);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("invalid page in block %u of relation %s",
+ io_first_block + j,
+ relpath(bmr.smgr->smgr_rlocator, forknum))));
+ }
+
+ /* Terminate I/O and set BM_VALID. */
+ if (isLocalBuf)
+ {
+ uint32 buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+ buf_state |= BM_VALID;
+ pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+ }
+ else
+ {
+ /* Set BM_VALID, terminate IO, and wake up any waiters */
+ TerminateBufferIO(bufHdr, false, BM_VALID, true);
+ }
+
+ /* Report I/Os as completing individually. */
+ TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+ bmr.smgr->smgr_rlocator.locator.spcOid,
+ bmr.smgr->smgr_rlocator.locator.dbOid,
+ bmr.smgr->smgr_rlocator.locator.relNumber,
+ bmr.smgr->smgr_rlocator.backend,
+ false);
+ }
+
+ VacuumPageMiss += io_buffers_len;
+ if (VacuumCostActive)
+ VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+ }
}
/*
- * BufferAlloc -- subroutine for ReadBuffer. Handles lookup of a shared
- * buffer. If no buffer exists already, selects a replacement
- * victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers. Handles lookup of a shared
+ * buffer. If no buffer exists already, selects a replacement victim and
+ * evicts the old page, but does NOT read in new page.
*
* "strategy" can be a buffer replacement strategy object, or NULL for
* the default strategy. The selected buffer's usage_count is advanced when
@@ -1223,11 +1475,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
*
* The returned buffer is pinned and is already marked as holding the
* desired page. If it already did have the desired page, *foundPtr is
- * set true. Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true. Otherwise, *foundPtr is set false.
*
* io_context is passed as an output parameter to avoid calling
* IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1286,19 +1534,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(buf, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return buf;
@@ -1363,19 +1602,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
{
/*
* We can only get here if (a) someone else is still reading in
- * the page, or (b) a previous read attempt failed. We have to
- * wait for any active read attempt to finish, and then set up our
- * own read attempt if the page is still not BM_VALID.
- * StartBufferIO does it all.
+ * the page, (b) a previous read attempt failed, or (c) someone
+ * called StartReadBuffers() but not yet WaitReadBuffers().
*/
- if (StartBufferIO(existing_buf_hdr, true))
- {
- /*
- * If we get here, previous attempts to read the buffer must
- * have failed ... but we shall bravely try again.
- */
- *foundPtr = false;
- }
+ *foundPtr = false;
}
return existing_buf_hdr;
@@ -1407,15 +1637,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
LWLockRelease(newPartitionLock);
/*
- * Buffer contents are currently invalid. Try to obtain the right to
- * start I/O. If StartBufferIO returns false, then someone else managed
- * to read it before we did, so there's nothing left for BufferAlloc() to
- * do.
+ * Buffer contents are currently invalid.
*/
- if (StartBufferIO(victim_buf_hdr, true))
- *foundPtr = false;
- else
- *foundPtr = true;
+ *foundPtr = false;
return victim_buf_hdr;
}
@@ -1769,7 +1993,7 @@ again:
* pessimistic, but outside of toy-sized shared_buffers it should allow
* sufficient pins.
*/
-static void
+void
LimitAdditionalPins(uint32 *additional_pins)
{
uint32 max_backends;
@@ -2034,7 +2258,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
buf_state &= ~BM_VALID;
UnlockBufHdr(existing_hdr, buf_state);
- } while (!StartBufferIO(existing_hdr, true));
+ } while (!StartBufferIO(existing_hdr, true, false));
}
else
{
@@ -2057,7 +2281,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
LWLockRelease(partition_lock);
/* XXX: could combine the locked operations in it with the above */
- StartBufferIO(victim_buf_hdr, true);
+ StartBufferIO(victim_buf_hdr, true, false);
}
}
@@ -2372,7 +2596,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
else
{
/*
- * If we previously pinned the buffer, it must surely be valid.
+ * If we previously pinned the buffer, it is likely to be valid, but
+ * it may not be if StartReadBuffers() was called and
+ * WaitReadBuffers() hasn't been called yet. We'll check by loading
+ * the flags without locking. This is racy, but it's OK to return
+ * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+ * it'll see that it's now valid.
*
* Note: We deliberately avoid a Valgrind client request here.
* Individual access methods can optionally superimpose buffer page
@@ -2381,7 +2610,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
* that the buffer page is legitimately non-accessible here. We
* cannot meddle with that.
*/
- result = true;
+ result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
}
ref->refcount++;
@@ -3449,7 +3678,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
* someone else flushed the buffer before we could, so we need not do
* anything.
*/
- if (!StartBufferIO(buf, false))
+ if (!StartBufferIO(buf, false, false))
return;
/* Setup error traceback support for ereport() */
@@ -5184,9 +5413,15 @@ WaitIO(BufferDesc *buf)
*
* Returns true if we successfully marked the buffer as I/O busy,
* false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend. In that case, false indicates either that the I/O was already
+ * finished, or is still in progress. This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
*/
static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
{
uint32 buf_state;
@@ -5199,6 +5434,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
if (!(buf_state & BM_IO_IN_PROGRESS))
break;
UnlockBufHdr(buf, buf_state);
+ if (nowait)
+ return false;
WaitIO(buf);
}
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index fcfac335a57..985a2c7049c 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -108,10 +108,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
* LocalBufferAlloc -
* Find or create a local buffer for the given page of the given relation.
*
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local. Also, IO_IN_PROGRESS
- * does not get set. Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local. We support only default access
+ * strategy (hence, usage_count is always advanced).
*/
BufferDesc *
LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -287,7 +286,7 @@ GetLocalVictimBuffer(void)
}
/* see LimitAdditionalPins() */
-static void
+void
LimitAdditionalLocalPins(uint32 *additional_pins)
{
uint32 max_pins;
@@ -297,9 +296,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
/*
* In contrast to LimitAdditionalPins() other backends don't play a role
- * here. We can allow up to NLocBuffer pins in total.
+ * here. We can allow up to NLocBuffer pins in total, but it might not be
+ * initialized yet so read num_temp_buffers.
*/
- max_pins = (NLocBuffer - NLocalPinnedBuffers);
+ max_pins = (num_temp_buffers - NLocalPinnedBuffers);
if (*additional_pins >= max_pins)
*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+subdir('aio')
subdir('buffer')
subdir('file')
subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
#ifndef BUFMGR_H
#define BUFMGR_H
+#include "port/pg_iovec.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
#define BUFFER_LOCK_SHARE 1
#define BUFFER_LOCK_EXCLUSIVE 2
+/*
+ * Maximum number of buffers for multi-buffer I/O functions. This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
/*
* prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy,
bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers(). Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+ /* Parameters passed in to StartReadBuffers(). */
+ BufferManagerRelation bmr;
+ Buffer *buffers;
+ ForkNumber forknum;
+ BlockNumber blocknum;
+ int nblocks;
+ BufferAccessStrategy strategy;
+ int flags;
+
+ /* Range of buffers, if we need to perform a read. */
+ int io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+ Buffer *buffers,
+ ForkNumber forknum,
+ BlockNumber blocknum,
+ int *nblocks,
+ BufferAccessStrategy strategy,
+ int flags,
+ ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
extern void ReleaseBuffer(Buffer buffer);
extern void UnlockReleaseBuffer(Buffer buffer);
extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
extern bool BgBufferSync(struct WritebackContext *wb_context);
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
/* in buf_init.c */
extern void InitBufferPool(void);
extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected. Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet. This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+ void *pgsr_private,
+ size_t per_buffer_private_size,
+ BufferAccessStrategy strategy,
+ BufferManagerRelation bmr,
+ ForkNumber forknum,
+ PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d3a7f75b080..299c77ea69f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
PgXmlErrorContext
PgXmlStrictness
Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
ReScanForeignScan_function
ReadBufPtrType
ReadBufferMode
+ReadBuffersOperation
ReadBytePtrType
ReadExtraTocPtrType
ReadFunc
--
2.40.1
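For anyone following along, here is a rough sketch of how a caller might use
the proposed StartReadBuffers()/WaitReadBuffers() pair, based only on the
prototypes added to bufmgr.h in the patch above. The function and variable
names (read_some_blocks, rel, strategy) are invented for illustration, and
error handling and buffer content locking are omitted:
static void
read_some_blocks(Relation rel, BlockNumber blocknum,
				 BufferAccessStrategy strategy)
{
	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
	ReadBuffersOperation operation;
	int			nblocks = MAX_BUFFERS_PER_TRANSFER;

	/*
	 * Pin up to nblocks buffers starting at blocknum.  On return, nblocks
	 * holds the number actually pinned, which may be fewer than requested.
	 */
	if (StartReadBuffers(BMR_REL(rel), buffers, MAIN_FORKNUM, blocknum,
						 &nblocks, strategy, READ_BUFFERS_ISSUE_ADVICE,
						 &operation))
		WaitReadBuffers(&operation);	/* perform/complete the physical read */

	for (int i = 0; i < nblocks; i++)
	{
		/* ... use BufferGetPage(buffers[i]) under an appropriate lock ... */
		ReleaseBuffer(buffers[i]);
	}
}
Note that nblocks is in/out: StartReadBuffers() may reduce it if it stops at
the first already-cached buffer it finds, and every buffer it returns comes
back pinned, so each one still needs a ReleaseBuffer().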
v10-0003-Vacuum-second-pass-uses-Streaming-Read-interface.patchtext/x-patch; charset=US-ASCII; name=v10-0003-Vacuum-second-pass-uses-Streaming-Read-interface.patchDownload
From 376edc13c9b03c80dd398c48b68d50a207aed099 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 27 Feb 2024 14:35:36 -0500
Subject: [PATCH v10 3/3] Vacuum second pass uses Streaming Read interface
Now vacuum's second pass, which removes dead items referring to dead
tuples catalogued in the first pass, uses the streaming read API by
implementing a streaming read callback which returns the next block
containing previously catalogued dead items. A new struct,
VacReapBlkState, is introduced to provide the caller with the starting
and ending indexes of dead items to vacuum.
---
src/backend/access/heap/vacuumlazy.c | 109 ++++++++++++++++++++-------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 84 insertions(+), 26 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a83727f0c7d..4744b1c6e21 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -190,6 +190,12 @@ typedef struct LVRelState
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ /*
+ * The index of the next TID in dead_items to reap during the second
+ * vacuum pass.
+ */
+ int idx_prefetch;
+
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
@@ -221,6 +227,20 @@ typedef struct LVSavedErrInfo
VacErrPhase phase;
} LVSavedErrInfo;
+/*
+ * State set up in the streaming read callback during vacuum's second pass, which
+ * removes dead items referring to dead tuples catalogued in the first pass
+ */
+typedef struct VacReapBlkState
+{
+ /*
+ * Indexes into the dead_items array: start_idx is the index of the first
+ * dead TID on this block, and end_idx is one past the last. The callback
+ * sets these up prior to adding this block to the stream.
+ */
+ int start_idx;
+ int end_idx;
+} VacReapBlkState;
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
@@ -240,8 +260,9 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
-static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
- Buffer buffer, int index, Buffer vmbuffer);
+static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer,
+ VacReapBlkState *rbstate);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
@@ -2400,6 +2421,37 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_pgsr_next(PgStreamingRead *pgsr,
+ void *pgsr_private,
+ void *per_buffer_data)
+{
+ BlockNumber blkno;
+ LVRelState *vacrel = pgsr_private;
+ VacReapBlkState *rbstate = per_buffer_data;
+
+ VacDeadItems *dead_items = vacrel->dead_items;
+
+ if (vacrel->idx_prefetch == dead_items->num_items)
+ return InvalidBlockNumber;
+
+ blkno = ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+ rbstate->start_idx = vacrel->idx_prefetch;
+
+ for (; vacrel->idx_prefetch < dead_items->num_items; vacrel->idx_prefetch++)
+ {
+ BlockNumber curblkno =
+ ItemPointerGetBlockNumber(&dead_items->items[vacrel->idx_prefetch]);
+
+ if (blkno != curblkno)
+ break; /* past end of tuples for this block */
+ }
+
+ rbstate->end_idx = vacrel->idx_prefetch;
+
+ return blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2421,7 +2473,9 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
- int index = 0;
+ Buffer buf;
+ PgStreamingRead *pgsr;
+ VacReapBlkState *rbstate;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2439,17 +2493,21 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
VACUUM_ERRCB_PHASE_VACUUM_HEAP,
InvalidBlockNumber, InvalidOffsetNumber);
- while (index < vacrel->dead_items->num_items)
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(VacReapBlkState), vacrel->bstrategy, BMR_REL(vacrel->rel),
+ MAIN_FORKNUM, vacuum_reap_lp_pgsr_next);
+
+ while (BufferIsValid(buf =
+ pg_streaming_read_buffer_get_next(pgsr,
+ (void **) &rbstate)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
vacuum_delay_point();
- blkno = ItemPointerGetBlockNumber(&vacrel->dead_items->items[index]);
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
/*
* Pin the visibility map page in case we need to mark the page
@@ -2459,10 +2517,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
- index = lazy_vacuum_heap_page(vacrel, blkno, buf, index, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, vmbuffer, rbstate);
/* Now that we've vacuumed the page, record its available space */
page = BufferGetPage(buf);
@@ -2481,14 +2537,16 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
- Assert(index > 0);
+ Assert(rbstate->end_idx > 0);
Assert(vacrel->num_index_scans > 1 ||
- (index == vacrel->lpdead_items &&
+ (rbstate->end_idx == vacrel->lpdead_items &&
vacuumed_pages == vacrel->lpdead_item_pages));
+ pg_streaming_read_free(pgsr);
+
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
- vacrel->relname, (long long) index, vacuumed_pages)));
+ vacrel->relname, (long long) rbstate->end_idx, vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -2502,13 +2560,12 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* cleanup lock is also acceptable). vmbuffer must be valid and already have
* a pin on blkno's visibility map page.
*
- * index is an offset into the vacrel->dead_items array for the first listed
- * LP_DEAD item on the page. The return value is the first index immediately
- * after all LP_DEAD items for the same page in the array.
+ * Given a block and dead items recorded during the first pass, set those
+ * items unused and truncate the line pointer array. Update the VM as
+ * appropriate.
*/
-static int
-lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
- int index, Buffer vmbuffer)
+static void
+lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
+ Buffer buffer, Buffer vmbuffer, VacReapBlkState *rbstate)
{
VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
@@ -2529,16 +2586,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
START_CRIT_SECTION();
- for (; index < dead_items->num_items; index++)
+ for (int i = rbstate->start_idx; i < rbstate->end_idx; i++)
{
- BlockNumber tblk;
OffsetNumber toff;
+ ItemPointer dead_item;
ItemId itemid;
- tblk = ItemPointerGetBlockNumber(&dead_items->items[index]);
- if (tblk != blkno)
- break; /* past end of tuples for this block */
- toff = ItemPointerGetOffsetNumber(&dead_items->items[index]);
+ dead_item = &dead_items->items[i];
+
+ Assert(ItemPointerGetBlockNumber(dead_item) == blkno);
+
+ toff = ItemPointerGetOffsetNumber(dead_item);
itemid = PageGetItemId(page, toff);
Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
@@ -2608,7 +2666,6 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
- return index;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 299c77ea69f..f4cfbce70b9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2973,6 +2973,7 @@ VacOptValue
VacuumParams
VacuumRelation
VacuumStmt
+VacReapBlkState
ValidIOData
ValidateIndexState
ValuesScan
--
2.40.1
v10-0002-Vacuum-first-pass-uses-Streaming-Read-interface.patchtext/x-patch; charset=US-ASCII; name=v10-0002-Vacuum-first-pass-uses-Streaming-Read-interface.patchDownload
From 04ce6a6e22d28cdf6e44a6fb7a8abe85abaca4e8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 11 Mar 2024 16:19:56 -0400
Subject: [PATCH v10 2/3] Vacuum first pass uses Streaming Read interface
Now vacuum's first pass, which HOT prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by converting
heap_vac_scan_next_block() to a streaming read callback.
---
src/backend/access/heap/vacuumlazy.c | 75 ++++++++++++++++------------
1 file changed, 44 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 18004907750..a83727f0c7d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -54,6 +54,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/streaming_read.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
@@ -223,8 +224,8 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm);
+static BlockNumber heap_vac_scan_next_block(PgStreamingRead *pgsr,
+ void *pgsr_private, void *per_buffer_data);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -806,10 +807,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
+ PgStreamingRead *pgsr;
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm;
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
@@ -826,19 +828,31 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_MAINTENANCE, vacrel,
+ sizeof(bool), vacrel->bstrategy,
+ BMR_REL(vacrel->rel),
+ MAIN_FORKNUM,
+ heap_vac_scan_next_block);
+
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ while (BufferIsValid(buf = pg_streaming_read_buffer_get_next(pgsr,
+ (void **) &all_visible_according_to_vm)))
{
- Buffer buf;
+ BlockNumber blkno;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
+
+ CheckBufferIsPinnedOnce(buf);
+ page = BufferGetPage(buf);
+
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -905,10 +919,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
-
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
* the page in lazy_scan_prune. But when it's not possible to acquire
@@ -964,7 +974,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ vmbuffer, *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1018,7 +1028,7 @@ lazy_scan_heap(LVRelState *vacrel)
ReleaseBuffer(vmbuffer);
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1033,6 +1043,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ pg_streaming_read_free(pgsr);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1044,11 +1056,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1058,14 +1070,14 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to get the next block to
- * prune and vacuum. The function uses the visibility map, vacuum options,
- * and various thresholds to skip blocks which do not need to be processed and
- * sets blkno to the next block to process.
+ * The streaming read machinery invokes heap_vac_scan_next_block(), the
+ * stream's callback, every time lazy_scan_heap() needs the next block to
+ * prune and vacuum. The function
+ * uses the visibility map, vacuum options, and various thresholds to skip
+ * blocks which do not need to be processed and returns the next block to
+ * process or InvalidBlockNumber if there are no remaining blocks.
*
- * The block number and visibility status of the next block to process are set
- * in *blkno and *all_visible_according_to_vm. The return value is false if
- * there are no further blocks to process.
+ * The visibility status of the next block to process is set in the
+ * per_buffer_data.
*
* vacrel is an in/out parameter here. Vacuum options and information about
* the relation are read. vacrel->skippedallvis is set if we skip a block
@@ -1073,11 +1085,13 @@ lazy_scan_heap(LVRelState *vacrel)
* relfrozenxid in that case. vacrel also holds information about the next
* unskippable block, as bookkeeping for this function.
*/
-static bool
-heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm)
+static BlockNumber
+heap_vac_scan_next_block(PgStreamingRead *pgsr,
+ void *pgsr_private, void *per_buffer_data)
{
BlockNumber next_block;
+ LVRelState *vacrel = pgsr_private;
+ bool *all_visible_according_to_vm = per_buffer_data;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1090,8 +1104,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
- *blkno = vacrel->rel_pages;
- return false;
+ return InvalidBlockNumber;
}
/*
@@ -1140,9 +1153,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
* but chose not to. We know that they are all-visible in the VM,
* otherwise they would've been unskippable.
*/
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = true;
- return true;
+ return vacrel->current_block;
}
else
{
@@ -1152,9 +1165,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
*/
Assert(next_block == vacrel->next_unskippable_block);
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = vacrel->next_unskippable_allvis;
- return true;
+ return vacrel->current_block;
}
}
--
2.40.1
On Tue, Mar 12, 2024 at 10:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I've rebased the attached v10 over top of the changes to
lazy_scan_heap() Heikki just committed and over the v6 streaming read
patch set. I started testing them and see that you are right, we no
longer pin too many buffers. However, the uncached example below is
now slower with streaming read than on master -- it looks to be
because it is doing twice as many WAL writes and syncs. I'm still
investigating why that is.
That makes sense to me. We have 256kB of buffers in our ring, but now
we're trying to read ahead 128kB at a time, so it works out that we
can only flush the WAL accumulated while dirtying half the blocks at a
time, which means we flush twice as often.
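To spell that arithmetic out (an illustrative calculation only, assuming the
default 8kB block size):
/* Illustrative only: ring size vs. read-ahead window, BLCKSZ assumed 8kB. */
#include <stdio.h>

int
main(void)
{
	int			blcksz = 8192;
	int			ring_buffers = (256 * 1024) / blcksz;	/* 32-buffer ring */
	int			read_ahead = (128 * 1024) / blcksz; /* 16 buffers pinned ahead */

	/* Only the remainder can sit dirty, accumulating WAL, before a flush. */
	printf("dirty buffers per WAL flush: %d\n", ring_buffers - read_ahead);
	return 0;
}
The same arithmetic with a 384kB ring leaves 32 buffers free to go dirty
between flushes.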
If I change the ring size to 384kB, allowing for that read-ahead
window, I see approximately the same WAL flushes. Surely we'd never
be able to get the behaviour to match *and* keep the same ring size?
We simply need those 16 extra buffers to have a chance of accumulating
32 dirty buffers, and the associated WAL. Do you see the same result,
or do you think something more than that is wrong here?
Here are some system call traces using your test that helped me see
the behaviour:
1. Unpatched, ie no streaming read, we flush 90kB of WAL generated by
32 pages before we write them out one at a time just before we read in
their replacements. One flush covers the LSNs of all the pages that
will be written, even though it's only called for the first page to be
written. That's because XLogFlush(lsn), if it decides to do anything,
flushes as far as it can... IOW when we hit the *oldest* dirty block,
that's when we write out the WAL up to where we dirtied the *newest*
block, which covers the 32 pwrite() calls here:
pwrite(30,...,90112,0xf90000) = 90112 (0x16000)
fdatasync(30) = 0 (0x0)
pwrite(27,...,8192,0x0) = 8192 (0x2000)
pread(27,...,8192,0x40000) = 8192 (0x2000)
pwrite(27,...,8192,0x2000) = 8192 (0x2000)
pread(27,...,8192,0x42000) = 8192 (0x2000)
pwrite(27,...,8192,0x4000) = 8192 (0x2000)
pread(27,...,8192,0x44000) = 8192 (0x2000)
pwrite(27,...,8192,0x6000) = 8192 (0x2000)
pread(27,...,8192,0x46000) = 8192 (0x2000)
pwrite(27,...,8192,0x8000) = 8192 (0x2000)
pread(27,...,8192,0x48000) = 8192 (0x2000)
pwrite(27,...,8192,0xa000) = 8192 (0x2000)
pread(27,...,8192,0x4a000) = 8192 (0x2000)
pwrite(27,...,8192,0xc000) = 8192 (0x2000)
pread(27,...,8192,0x4c000) = 8192 (0x2000)
pwrite(27,...,8192,0xe000) = 8192 (0x2000)
pread(27,...,8192,0x4e000) = 8192 (0x2000)
pwrite(27,...,8192,0x10000) = 8192 (0x2000)
pread(27,...,8192,0x50000) = 8192 (0x2000)
pwrite(27,...,8192,0x12000) = 8192 (0x2000)
pread(27,...,8192,0x52000) = 8192 (0x2000)
pwrite(27,...,8192,0x14000) = 8192 (0x2000)
pread(27,...,8192,0x54000) = 8192 (0x2000)
pwrite(27,...,8192,0x16000) = 8192 (0x2000)
pread(27,...,8192,0x56000) = 8192 (0x2000)
pwrite(27,...,8192,0x18000) = 8192 (0x2000)
pread(27,...,8192,0x58000) = 8192 (0x2000)
pwrite(27,...,8192,0x1a000) = 8192 (0x2000)
pread(27,...,8192,0x5a000) = 8192 (0x2000)
pwrite(27,...,8192,0x1c000) = 8192 (0x2000)
pread(27,...,8192,0x5c000) = 8192 (0x2000)
pwrite(27,...,8192,0x1e000) = 8192 (0x2000)
pread(27,...,8192,0x5e000) = 8192 (0x2000)
pwrite(27,...,8192,0x20000) = 8192 (0x2000)
pread(27,...,8192,0x60000) = 8192 (0x2000)
pwrite(27,...,8192,0x22000) = 8192 (0x2000)
pread(27,...,8192,0x62000) = 8192 (0x2000)
pwrite(27,...,8192,0x24000) = 8192 (0x2000)
pread(27,...,8192,0x64000) = 8192 (0x2000)
pwrite(27,...,8192,0x26000) = 8192 (0x2000)
pread(27,...,8192,0x66000) = 8192 (0x2000)
pwrite(27,...,8192,0x28000) = 8192 (0x2000)
pread(27,...,8192,0x68000) = 8192 (0x2000)
pwrite(27,...,8192,0x2a000) = 8192 (0x2000)
pread(27,...,8192,0x6a000) = 8192 (0x2000)
pwrite(27,...,8192,0x2c000) = 8192 (0x2000)
pread(27,...,8192,0x6c000) = 8192 (0x2000)
pwrite(27,...,8192,0x2e000) = 8192 (0x2000)
pread(27,...,8192,0x6e000) = 8192 (0x2000)
pwrite(27,...,8192,0x30000) = 8192 (0x2000)
pread(27,...,8192,0x70000) = 8192 (0x2000)
pwrite(27,...,8192,0x32000) = 8192 (0x2000)
pread(27,...,8192,0x72000) = 8192 (0x2000)
pwrite(27,...,8192,0x34000) = 8192 (0x2000)
pread(27,...,8192,0x74000) = 8192 (0x2000)
pwrite(27,...,8192,0x36000) = 8192 (0x2000)
pread(27,...,8192,0x76000) = 8192 (0x2000)
pwrite(27,...,8192,0x38000) = 8192 (0x2000)
pread(27,...,8192,0x78000) = 8192 (0x2000)
pwrite(27,...,8192,0x3a000) = 8192 (0x2000)
pread(27,...,8192,0x7a000) = 8192 (0x2000)
pwrite(27,...,8192,0x3c000) = 8192 (0x2000)
pread(27,...,8192,0x7c000) = 8192 (0x2000)
pwrite(27,...,8192,0x3e000) = 8192 (0x2000)
pread(27,...,8192,0x7e000) = 8192 (0x2000)
(Digression: this alternative tail-write-head-read pattern defeats the
read-ahead and write-behind on a bunch of OSes, but not Linux because
it only seems to worry about the reads, while other Unixes have
write-behind detection too, and I believe at least some are confused
by this pattern of tiny writes following along some distance behind
tiny reads; Andrew Gierth figured that out after noticing poor ring
buffer performance, and we eventually got that fixed for one such
system[1], separating the sequence detection for reads and writes.)
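The ordering behind that single big flush is the WAL-before-data rule. The
sketch below is a simplified paraphrase of what bufmgr.c does when it evicts
a dirty ring buffer, not the real FlushBuffer() code; locking, stats and the
data write itself are omitted:
static void
flush_wal_before_evicting(Page page)
{
	/* LSN of the WAL record that last dirtied this page */
	XLogRecPtr	lsn = PageGetLSN(page);

	/*
	 * WAL up to this LSN must be durable before the data page may be
	 * written.  XLogFlush() typically flushes as far as it conveniently can,
	 * so the flush triggered by the oldest dirty buffer also covers the
	 * newest one, which is why a single fdatasync precedes all 32 pwrite()
	 * calls in the trace above.
	 */
	XLogFlush(lsn);

	/* ... the caller would now write the data page out via smgr ... */
}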
2. With your patches, we replace all those little pread calls with
nice wide calls, yay!, but now we only manage to write out about half
the amount of WAL at a time as you discovered. The repeating blocks
of system calls now look like this, but there are twice as many of
them:
pwrite(32,...,40960,0x224000) = 40960 (0xa000)
fdatasync(32) = 0 (0x0)
pwrite(27,...,8192,0x5c000) = 8192 (0x2000)
preadv(27,[...],3,0x7e000) = 131072 (0x20000)
pwrite(27,...,8192,0x5e000) = 8192 (0x2000)
pwrite(27,...,8192,0x60000) = 8192 (0x2000)
pwrite(27,...,8192,0x62000) = 8192 (0x2000)
pwrite(27,...,8192,0x64000) = 8192 (0x2000)
pwrite(27,...,8192,0x66000) = 8192 (0x2000)
pwrite(27,...,8192,0x68000) = 8192 (0x2000)
pwrite(27,...,8192,0x6a000) = 8192 (0x2000)
pwrite(27,...,8192,0x6c000) = 8192 (0x2000)
pwrite(27,...,8192,0x6e000) = 8192 (0x2000)
pwrite(27,...,8192,0x70000) = 8192 (0x2000)
pwrite(27,...,8192,0x72000) = 8192 (0x2000)
pwrite(27,...,8192,0x74000) = 8192 (0x2000)
pwrite(27,...,8192,0x76000) = 8192 (0x2000)
pwrite(27,...,8192,0x78000) = 8192 (0x2000)
pwrite(27,...,8192,0x7a000) = 8192 (0x2000)
3. With your patches and test but this time using VACUUM
(BUFFER_USAGE_LIMIT = '384kB'), the repeating block grows bigger and
we get the larger WAL flushes back again, because now we're able to
collect 32 blocks' worth of WAL up front again:
pwrite(32,...,90112,0x50c000) = 90112 (0x16000)
fdatasync(32) = 0 (0x0)
pwrite(27,...,8192,0x1dc000) = 8192 (0x2000)
pread(27,...,131072,0x21e000) = 131072 (0x20000)
pwrite(27,...,8192,0x1de000) = 8192 (0x2000)
pwrite(27,...,8192,0x1e0000) = 8192 (0x2000)
pwrite(27,...,8192,0x1e2000) = 8192 (0x2000)
pwrite(27,...,8192,0x1e4000) = 8192 (0x2000)
pwrite(27,...,8192,0x1e6000) = 8192 (0x2000)
pwrite(27,...,8192,0x1e8000) = 8192 (0x2000)
pwrite(27,...,8192,0x1ea000) = 8192 (0x2000)
pwrite(27,...,8192,0x1ec000) = 8192 (0x2000)
pwrite(27,...,8192,0x1ee000) = 8192 (0x2000)
pwrite(27,...,8192,0x1f0000) = 8192 (0x2000)
pwrite(27,...,8192,0x1f2000) = 8192 (0x2000)
pwrite(27,...,8192,0x1f4000) = 8192 (0x2000)
pwrite(27,...,8192,0x1f6000) = 8192 (0x2000)
pwrite(27,...,8192,0x1f8000) = 8192 (0x2000)
pwrite(27,...,8192,0x1fa000) = 8192 (0x2000)
pwrite(27,...,8192,0x1fc000) = 8192 (0x2000)
preadv(27,[...],3,0x23e000) = 131072 (0x20000)
pwrite(27,...,8192,0x1fe000) = 8192 (0x2000)
pwrite(27,...,8192,0x200000) = 8192 (0x2000)
pwrite(27,...,8192,0x202000) = 8192 (0x2000)
pwrite(27,...,8192,0x204000) = 8192 (0x2000)
pwrite(27,...,8192,0x206000) = 8192 (0x2000)
pwrite(27,...,8192,0x208000) = 8192 (0x2000)
pwrite(27,...,8192,0x20a000) = 8192 (0x2000)
pwrite(27,...,8192,0x20c000) = 8192 (0x2000)
pwrite(27,...,8192,0x20e000) = 8192 (0x2000)
pwrite(27,...,8192,0x210000) = 8192 (0x2000)
pwrite(27,...,8192,0x212000) = 8192 (0x2000)
pwrite(27,...,8192,0x214000) = 8192 (0x2000)
pwrite(27,...,8192,0x216000) = 8192 (0x2000)
pwrite(27,...,8192,0x218000) = 8192 (0x2000)
pwrite(27,...,8192,0x21a000) = 8192 (0x2000)
4. For learning/exploration only, I rebased my experimental vectored
FlushBuffers() patch, which teaches the checkpointer to write relation
data out using smgrwritev(). The checkpointer explicitly sorts
blocks, but I think ring buffers should naturally often contain
consecutive blocks in ring order. Highly experimental POC code pushed
to a public branch[2], but I am not proposing anything here, just
trying to understand things. The nicest looking system call trace was
with BUFFER_USAGE_LIMIT set to 512kB, so it could do its writes, reads
and WAL writes 128kB at a time:
pwrite(32,...,131072,0xfc6000) = 131072 (0x20000)
fdatasync(32) = 0 (0x0)
pwrite(27,...,131072,0x6c0000) = 131072 (0x20000)
pread(27,...,131072,0x73e000) = 131072 (0x20000)
pwrite(27,...,131072,0x6e0000) = 131072 (0x20000)
pread(27,...,131072,0x75e000) = 131072 (0x20000)
pwritev(27,[...],3,0x77e000) = 131072 (0x20000)
preadv(27,[...],3,0x77e000) = 131072 (0x20000)
That was a fun experiment, but... I recognise that efficient cleaning
of ring buffers is a Hard Problem requiring more concurrency: it's
just too late to be flushing that WAL. But we also don't want to
start writing back data immediately after dirtying pages (cf. OS
write-behind for big sequential writes in traditional Unixes), because
we're not allowed to write data out without writing the WAL first and
we currently need to build up bigger WAL writes to do so efficiently
(cf. some other systems that can write out fragments of WAL
concurrently so the latency-vs-throughput trade-off doesn't have to be
so extreme). So we want to defer writing it, but not too long. We
need something cleaning our buffers (or at least flushing the
associated WAL, but preferably also writing the data) not too late and
not too early, and more in sync with our scan than the WAL writer is.
What that machinery should look like I don't know (but I believe
Andres has ideas).
[1]: https://github.com/freebsd/freebsd-src/commit/f2706588730a5d3b9a687ba8d4269e386650cc4f
[2]: https://github.com/macdice/postgres/tree/vectored-ring-buffer
On Sun, Mar 17, 2024 at 2:53 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Mar 12, 2024 at 10:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I've rebased the attached v10 over top of the changes to
lazy_scan_heap() Heikki just committed and over the v6 streaming read
patch set. I started testing them and see that you are right, we no
longer pin too many buffers. However, the uncached example below is
now slower with streaming read than on master -- it looks to be
because it is doing twice as many WAL writes and syncs. I'm still
investigating why that is.
--snip--
4. For learning/exploration only, I rebased my experimental vectored
FlushBuffers() patch, which teaches the checkpointer to write relation
data out using smgrwritev(). The checkpointer explicitly sorts
blocks, but I think ring buffers should naturally often contain
consecutive blocks in ring order. Highly experimental POC code pushed
to a public branch[2], but I am not proposing anything here, just
trying to understand things. The nicest looking system call trace was
with BUFFER_USAGE_LIMIT set to 512kB, so it could do its writes, reads
and WAL writes 128kB at a time:
pwrite(32,...,131072,0xfc6000) = 131072 (0x20000)
fdatasync(32) = 0 (0x0)
pwrite(27,...,131072,0x6c0000) = 131072 (0x20000)
pread(27,...,131072,0x73e000) = 131072 (0x20000)
pwrite(27,...,131072,0x6e0000) = 131072 (0x20000)
pread(27,...,131072,0x75e000) = 131072 (0x20000)
pwritev(27,[...],3,0x77e000) = 131072 (0x20000)
preadv(27,[...],3,0x77e000) = 131072 (0x20000)
That was a fun experiment, but... I recognise that efficient cleaning
of ring buffers is a Hard Problem requiring more concurrency: it's
just too late to be flushing that WAL. But we also don't want to
start writing back data immediately after dirtying pages (cf. OS
write-behind for big sequential writes in traditional Unixes), because
we're not allowed to write data out without writing the WAL first and
we currently need to build up bigger WAL writes to do so efficiently
(cf. some other systems that can write out fragments of WAL
concurrently so the latency-vs-throughput trade-off doesn't have to be
so extreme). So we want to defer writing it, but not too long. We
need something cleaning our buffers (or at least flushing the
associated WAL, but preferably also writing the data) not too late and
not too early, and more in sync with our scan than the WAL writer is.
What that machinery should look like I don't know (but I believe
Andres has ideas).
I've attached a WIP v11 streaming vacuum patch set here that is
rebased over master (by Thomas), so that I could add a CF entry for
it. It still has the problem with the extra WAL write and fsync calls
investigated by Thomas above. Thomas has some work in progress doing
streaming write-behind to alleviate the issues with the buffer access
strategy and streaming reads. When he gets a version of that ready to
share, he will start a new "Streaming Vacuum" thread.
- Melanie
Attachments:
v11-0002-Refactor-tidstore.c-memory-management.patchtext/x-patch; charset=US-ASCII; name=v11-0002-Refactor-tidstore.c-memory-management.patchDownload
From 050b6c3fa73c9153aeef58fcd306533c1008802e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 26 Apr 2024 08:32:44 +1200
Subject: [PATCH v11 2/3] Refactor tidstore.c memory management.
Previously, TidStoreIterateNext() would expand the set of offsets for
each block into a buffer that it overwrote each time. In order to be
able to collect the offsets for multiple blocks before working with
them, change the contract. Now, the offsets are obtained by a separate
call to TidStoreGetBlockOffsets(), which can be called at a later time,
and TidStoreIterResult objects are safe to copy and store in a
queue.
This will be used by a later patch, to avoid the need for expensive
extra copies of offset array and associated memory management.
---
src/backend/access/common/tidstore.c | 68 +++++++++----------
src/backend/access/heap/vacuumlazy.c | 9 ++-
src/include/access/tidstore.h | 12 ++--
.../modules/test_tidstore/test_tidstore.c | 9 ++-
4 files changed, 53 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index fb3949d69f6..c3c1987204b 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -147,9 +147,6 @@ struct TidStoreIter
TidStoreIterResult output;
};
-static void tidstore_iter_extract_tids(TidStoreIter *iter, BlockNumber blkno,
- BlocktableEntry *page);
-
/*
* Create a TidStore. The TidStore will live in the memory context that is
* CurrentMemoryContext at the time of this call. The TID storage, backed
@@ -486,13 +483,6 @@ TidStoreBeginIterate(TidStore *ts)
iter = palloc0(sizeof(TidStoreIter));
iter->ts = ts;
- /*
- * We start with an array large enough to contain at least the offsets
- * from one completely full bitmap element.
- */
- iter->output.max_offset = 2 * BITS_PER_BITMAPWORD;
- iter->output.offsets = palloc(sizeof(OffsetNumber) * iter->output.max_offset);
-
if (TidStoreIsShared(ts))
iter->tree_iter.shared = shared_ts_begin_iterate(ts->tree.shared);
else
@@ -503,9 +493,9 @@ TidStoreBeginIterate(TidStore *ts)
/*
- * Scan the TidStore and return the TIDs of the next block. The offsets in
- * each iteration result are ordered, as are the block numbers over all
- * iterations.
+ * Return a result that contains the next block number and that can be used to
+ * obtain the set of offsets by calling TidStoreGetBlockOffsets(). The result
+ * is copyable.
*/
TidStoreIterResult *
TidStoreIterateNext(TidStoreIter *iter)
@@ -521,10 +511,10 @@ TidStoreIterateNext(TidStoreIter *iter)
if (page == NULL)
return NULL;
- /* Collect TIDs from the key-value pair */
- tidstore_iter_extract_tids(iter, (BlockNumber) key, page);
+ iter->output.blkno = key;
+ iter->output.internal_page = page;
- return &(iter->output);
+ return &iter->output;
}
/*
@@ -540,7 +530,6 @@ TidStoreEndIterate(TidStoreIter *iter)
else
local_ts_end_iterate(iter->tree_iter.local);
- pfree(iter->output.offsets);
pfree(iter);
}
@@ -575,16 +564,19 @@ TidStoreGetHandle(TidStore *ts)
return (dsa_pointer) shared_ts_get_handle(ts->tree.shared);
}
-/* Extract TIDs from the given key-value pair */
-static void
-tidstore_iter_extract_tids(TidStoreIter *iter, BlockNumber blkno,
- BlocktableEntry *page)
+/*
+ * Given a TidStoreIterResult returned by TidStoreIterateNext(), extract the
+ * offset numbers. Returns the number of offsets filled in, if <=
+ * max_offsets. Otherwise, fills in as much as it can in the given space, and
+ * returns the size of the buffer that would be needed.
+ */
+int
+TidStoreGetBlockOffsets(TidStoreIterResult *result,
+ OffsetNumber *offsets,
+ int max_offsets)
{
- TidStoreIterResult *result = (&iter->output);
- int wordnum;
-
- result->num_offsets = 0;
- result->blkno = blkno;
+ BlocktableEntry *page = result->internal_page;
+ int num_offsets = 0;
if (page->header.nwords == 0)
{
@@ -592,31 +584,33 @@ tidstore_iter_extract_tids(TidStoreIter *iter, BlockNumber blkno,
for (int i = 0; i < NUM_FULL_OFFSETS; i++)
{
if (page->header.full_offsets[i] != InvalidOffsetNumber)
- result->offsets[result->num_offsets++] = page->header.full_offsets[i];
+ {
+ if (num_offsets < max_offsets)
+ offsets[num_offsets] = page->header.full_offsets[i];
+ num_offsets++;
+ }
}
}
else
{
- for (wordnum = 0; wordnum < page->header.nwords; wordnum++)
+ for (int wordnum = 0; wordnum < page->header.nwords; wordnum++)
{
bitmapword w = page->words[wordnum];
int off = wordnum * BITS_PER_BITMAPWORD;
- /* Make sure there is enough space to add offsets */
- if ((result->num_offsets + BITS_PER_BITMAPWORD) > result->max_offset)
- {
- result->max_offset *= 2;
- result->offsets = repalloc(result->offsets,
- sizeof(OffsetNumber) * result->max_offset);
- }
-
while (w != 0)
{
if (w & 1)
- result->offsets[result->num_offsets++] = (OffsetNumber) off;
+ {
+ if (num_offsets < max_offsets)
+ offsets[num_offsets] = (OffsetNumber) off;
+ num_offsets++;
+ }
off++;
w >>= 1;
}
}
}
+
+ return num_offsets;
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f76ef2e7c63..19c13671666 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2144,12 +2144,17 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
Buffer buf;
Page page;
Size freespace;
+ OffsetNumber offsets[MaxOffsetNumber];
+ int num_offsets;
vacuum_delay_point();
blkno = iter_result->blkno;
vacrel->blkno = blkno;
+ num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
+ Assert(num_offsets <= lengthof(offsets));
+
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -2161,8 +2166,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
- lazy_vacuum_heap_page(vacrel, blkno, buf, iter_result->offsets,
- iter_result->num_offsets, vmbuffer);
+ lazy_vacuum_heap_page(vacrel, blkno, buf, offsets,
+ num_offsets, vmbuffer);
/* Now that we've vacuumed the page, record its available space */
page = BufferGetPage(buf);
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index 32aa9995193..d95cabd7b5e 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -20,13 +20,14 @@
typedef struct TidStore TidStore;
typedef struct TidStoreIter TidStoreIter;
-/* Result struct for TidStoreIterateNext */
+/*
+ * Result struct for TidStoreIterateNext. This is copyable, but should be
+ * treated as opaque. Call TidStoreGetBlockOffsets() to obtain the offsets.
+ */
typedef struct TidStoreIterResult
{
BlockNumber blkno;
- int max_offset;
- int num_offsets;
- OffsetNumber *offsets;
+ void *internal_page;
} TidStoreIterResult;
extern TidStore *TidStoreCreateLocal(size_t max_bytes, bool insert_only);
@@ -42,6 +43,9 @@ extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumbe
extern bool TidStoreIsMember(TidStore *ts, ItemPointer tid);
extern TidStoreIter *TidStoreBeginIterate(TidStore *ts);
extern TidStoreIterResult *TidStoreIterateNext(TidStoreIter *iter);
+extern int TidStoreGetBlockOffsets(TidStoreIterResult *result,
+ OffsetNumber *offsets,
+ int max_offsets);
extern void TidStoreEndIterate(TidStoreIter *iter);
extern size_t TidStoreMemoryUsage(TidStore *ts);
extern dsa_pointer TidStoreGetHandle(TidStore *ts);
diff --git a/src/test/modules/test_tidstore/test_tidstore.c b/src/test/modules/test_tidstore/test_tidstore.c
index 3f6a11bf21c..94ddcf1de82 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -267,9 +267,14 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
iter = TidStoreBeginIterate(tidstore);
while ((iter_result = TidStoreIterateNext(iter)) != NULL)
{
- for (int i = 0; i < iter_result->num_offsets; i++)
+ OffsetNumber offsets[MaxOffsetNumber];
+ int num_offsets;
+
+ num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
+ Assert(num_offsets <= lengthof(offsets));
+ for (int i = 0; i < num_offsets; i++)
ItemPointerSet(&(items.iter_tids[num_iter_tids++]), iter_result->blkno,
- iter_result->offsets[i]);
+ offsets[i]);
}
TidStoreEndIterate(iter);
TidStoreUnlock(tidstore);
--
2.34.1
v11-0001-Use-streaming-I-O-in-VACUUM-first-pass.patchtext/x-patch; charset=US-ASCII; name=v11-0001-Use-streaming-I-O-in-VACUUM-first-pass.patchDownload
From fe4ab2059580d52c2855a6ea2c6bac80d06970c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 11 Mar 2024 16:19:56 -0400
Subject: [PATCH v11 1/3] Use streaming I/O in VACUUM first pass.
Now vacuum's first pass, which HOT-prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by converting
heap_vac_scan_next_block() to a read stream callback.
Author: Melanie Plageman <melanieplageman@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 80 +++++++++++++++++-----------
1 file changed, 49 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f88cf1e8ef..f76ef2e7c63 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -55,6 +55,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/read_stream.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
@@ -229,8 +230,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm);
+static BlockNumber heap_vac_scan_next_block(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -815,10 +817,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
+ ReadStream *stream;
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm;
TidStore *dead_items = vacrel->dead_items;
VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
@@ -836,19 +839,33 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items_info->max_bytes;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+ vacrel->bstrategy,
+ vacrel->rel,
+ MAIN_FORKNUM,
+ heap_vac_scan_next_block,
+ vacrel,
+ sizeof(bool));
+
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ while (BufferIsValid(buf = read_stream_next_buffer(stream,
+ (void **) &all_visible_according_to_vm)))
{
- Buffer buf;
+ BlockNumber blkno;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
+
+ CheckBufferIsPinnedOnce(buf);
+ page = BufferGetPage(buf);
+
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -914,10 +931,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
-
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
* the page in lazy_scan_prune. But when it's not possible to acquire
@@ -973,7 +986,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ vmbuffer, *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1027,7 +1040,7 @@ lazy_scan_heap(LVRelState *vacrel)
ReleaseBuffer(vmbuffer);
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1042,6 +1055,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ read_stream_end(stream);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1053,11 +1068,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1067,14 +1082,14 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to get the next block to
- * prune and vacuum. The function uses the visibility map, vacuum options,
- * and various thresholds to skip blocks which do not need to be processed and
- * sets blkno to the next block to process.
+ * The streaming read callback invokes heap_vac_scan_next_block() every time
+ * lazy_scan_heap() needs the next block to prune and vacuum. The function
+ * uses the visibility map, vacuum options, and various thresholds to skip
+ * blocks which do not need to be processed and returns the next block to
+ * process or InvalidBlockNumber if there are no remaining blocks.
*
- * The block number and visibility status of the next block to process are set
- * in *blkno and *all_visible_according_to_vm. The return value is false if
- * there are no further blocks to process.
+ * The visibility status of the next block to process is set in the
+ * per_buffer_data.
*
* vacrel is an in/out parameter here. Vacuum options and information about
* the relation are read. vacrel->skippedallvis is set if we skip a block
@@ -1082,11 +1097,14 @@ lazy_scan_heap(LVRelState *vacrel)
* relfrozenxid in that case. vacrel also holds information about the next
* unskippable block, as bookkeeping for this function.
*/
-static bool
-heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm)
+static BlockNumber
+heap_vac_scan_next_block(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data)
{
BlockNumber next_block;
+ LVRelState *vacrel = callback_private_data;
+ bool *all_visible_according_to_vm = per_buffer_data;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1099,8 +1117,8 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
- *blkno = vacrel->rel_pages;
- return false;
+ vacrel->current_block = vacrel->rel_pages;
+ return InvalidBlockNumber;
}
/*
@@ -1149,9 +1167,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
* but chose not to. We know that they are all-visible in the VM,
* otherwise they would've been unskippable.
*/
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = true;
- return true;
+ return vacrel->current_block;
}
else
{
@@ -1161,9 +1179,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
*/
Assert(next_block == vacrel->next_unskippable_block);
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = vacrel->next_unskippable_allvis;
- return true;
+ return vacrel->current_block;
}
}
--
2.34.1
v11-0003-Use-streaming-I-O-in-VACUUM-second-pass.patchtext/x-patch; charset=US-ASCII; name=v11-0003-Use-streaming-I-O-in-VACUUM-second-pass.patchDownload
From 5c74eccade69374a449f0b0fd4003545863c9538 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 27 Feb 2024 14:35:36 -0500
Subject: [PATCH v11 3/3] Use streaming I/O in VACUUM second pass.
Now vacuum's second pass, which removes dead items referring to dead
tuples collected in the first pass, uses a read stream that looks ahead
in the TidStore.
Originally developed by Melanie, refactored to work with the new
TidStore by Thomas.
Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 38 +++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 19c13671666..14eee89af83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2098,6 +2098,24 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_read_stream_next(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data)
+{
+ TidStoreIter *iter = callback_private_data;
+ TidStoreIterResult *iter_result;
+
+ iter_result = TidStoreIterateNext(iter);
+ if (iter_result == NULL)
+ return InvalidBlockNumber;
+
+ /* Save the TidStoreIterResult for later, so we can extract the offsets. */
+ memcpy(per_buffer_data, iter_result, sizeof(*iter_result));
+
+ return iter_result->blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2118,6 +2136,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
+ Buffer buf;
+ ReadStream *stream;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2138,10 +2158,18 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
InvalidBlockNumber, InvalidOffsetNumber);
iter = TidStoreBeginIterate(vacrel->dead_items);
- while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+ vacrel->bstrategy,
+ vacrel->rel,
+ MAIN_FORKNUM,
+ vacuum_reap_lp_read_stream_next,
+ iter,
+ sizeof(TidStoreIterResult));
+
+ while (BufferIsValid(buf = read_stream_next_buffer(stream,
+ (void **) &iter_result)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
OffsetNumber offsets[MaxOffsetNumber];
@@ -2149,8 +2177,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
vacuum_delay_point();
- blkno = iter_result->blkno;
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
Assert(num_offsets <= lengthof(offsets));
@@ -2163,8 +2190,6 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
lazy_vacuum_heap_page(vacrel, blkno, buf, offsets,
num_offsets, vmbuffer);
@@ -2177,6 +2202,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
vacuumed_pages++;
}
+ read_stream_end(stream);
TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
--
2.34.1
On Fri, Jun 28, 2024 at 05:36:25PM -0400, Melanie Plageman wrote:
I've attached a WIP v11 streaming vacuum patch set here that is
rebased over master (by Thomas), so that I could add a CF entry for
it. It still has the problem with the extra WAL write and fsync calls
investigated by Thomas above. Thomas has some work in progress doing
streaming write-behind to alleviate the issues with the buffer access
strategy and streaming reads. When he gets a version of that ready to
share, he will start a new "Streaming Vacuum" thread.
To avoid reviewing the wrong patch, I'm writing to verify the status here.
This is Needs Review in the commitfest. I think one of these two holds:
1. Needs Review is valid.
2. It's actually Waiting on Author. You're commissioning a review of the
future-thread patch, not this one.
If it's (1), given the WIP marking, what is the scope of the review you seek?
I'm guessing performance is out of scope; what else is in or out of scope?
On Sun, Jul 7, 2024 at 10:49 AM Noah Misch <noah@leadboat.com> wrote:
On Fri, Jun 28, 2024 at 05:36:25PM -0400, Melanie Plageman wrote:
I've attached a WIP v11 streaming vacuum patch set here that is
rebased over master (by Thomas), so that I could add a CF entry for
it. It still has the problem with the extra WAL write and fsync calls
investigated by Thomas above. Thomas has some work in progress doing
streaming write-behind to alleviate the issues with the buffer access
strategy and streaming reads. When he gets a version of that ready to
share, he will start a new "Streaming Vacuum" thread.
To avoid reviewing the wrong patch, I'm writing to verify the status here.
This is Needs Review in the commitfest. I think one of these two holds:
1. Needs Review is valid.
2. It's actually Waiting on Author. You're commissioning a review of the
future-thread patch, not this one.
If it's (1), given the WIP marking, what is the scope of the review you seek?
I'm guessing performance is out of scope; what else is in or out of scope?
Ah, you're right. I moved it to "Waiting on Author" as we are waiting
on Thomas' version which has a fix for the extra WAL write/sync
behavior.
Sorry for the "Needs Review" noise!
- Melanie
On Mon, Jul 8, 2024 at 2:49 AM Noah Misch <noah@leadboat.com> wrote:
what is the scope of the review you seek?
The patch "Refactor tidstore.c memory management." could definitely
use some review. I wasn't sure if that should be proposed in a new
thread of its own, but then the need for it comes from this
streamifying project, so... The basic problem was that we want to
build up a stream of blocks to be vacuumed (so that we can perform the
I/O combining etc) + some extra data attached to each buffer, in this
case the TID list, but the implementation of tidstore.c in master
would require us to make an extra intermediate copy of the TIDs,
because it keeps overwriting its internal buffer. The proposal here
is to make it so that you can get a tiny copyable object that can
later be used to retrieve the data into a caller-supplied buffer, so
that tidstore.c's iterator machinery doesn't have to have its own
internal buffer at all, and then calling code can safely queue up a
few of these at once.
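To make the intended calling pattern concrete, here is a minimal usage sketch of my own (not taken from the patch set; it assumes the usual PostgreSQL headers, and locking of a shared store is elided). TidStoreIterateNext() hands back a small copyable result, and the offsets are only materialized later, into a caller-supplied array, via TidStoreGetBlockOffsets(), so a caller may queue up several results before extracting anything.

#include "postgres.h"

#include "access/tidstore.h"

/*
 * Sketch only: walk a TidStore and pull each block's offsets into a
 * caller-supplied array.  The per-block processing is elided.
 */
void
consume_dead_items(TidStore *dead_items)
{
	TidStoreIter *iter = TidStoreBeginIterate(dead_items);
	TidStoreIterResult *iter_result;

	while ((iter_result = TidStoreIterateNext(iter)) != NULL)
	{
		OffsetNumber offsets[MaxOffsetNumber];
		int			num_offsets;

		/*
		 * iter_result is copyable, so this call could just as well be
		 * deferred until a few results have been queued up, which is what
		 * the streaming vacuum patches rely on.
		 */
		num_offsets = TidStoreGetBlockOffsets(iter_result, offsets,
											  lengthof(offsets));
		Assert(num_offsets <= lengthof(offsets));

		/* ... process iter_result->blkno using offsets[0..num_offsets-1] ... */
	}
	TidStoreEndIterate(iter);
}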
On Mon, Jul 15, 2024 at 03:26:32PM +1200, Thomas Munro wrote:
On Mon, Jul 8, 2024 at 2:49 AM Noah Misch <noah@leadboat.com> wrote:
what is the scope of the review you seek?
The patch "Refactor tidstore.c memory management." could definitely
use some review.
That's reasonable. radixtree already forbids mutations concurrent with
iteration, so there's no new concurrency hazard. One alternative is
per_buffer_data big enough for MaxOffsetNumber, but that might thrash caches
measurably. That patch is good to go apart from these trivialities:
-	return &(iter->output);
+	return &iter->output;
This cosmetic change is orthogonal to the patch's mission.
-	for (wordnum = 0; wordnum < page->header.nwords; wordnum++)
+	for (int wordnum = 0; wordnum < page->header.nwords; wordnum++)
Likewise.
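For scale, a rough back-of-the-envelope for the per_buffer_data-sized-for-MaxOffsetNumber alternative mentioned above (a sketch assuming the default 8 kB block size, 4-byte line pointers and 2-byte offset numbers; the constants are written out here rather than pulled from the headers):

#include <stdio.h>

int
main(void)
{
	size_t		blcksz = 8192;	/* default BLCKSZ */
	size_t		itemid_sz = 4;	/* sizeof(ItemIdData) */
	size_t		offnum_sz = 2;	/* sizeof(OffsetNumber) */
	size_t		max_offsets = blcksz / itemid_sz;	/* MaxOffsetNumber = 2048 */

	/* per_buffer_data large enough for a full page of offsets */
	printf("%zu bytes per queued buffer\n", max_offsets * offnum_sz);
	return 0;
}

That is roughly 4 kB of per-buffer payload for every buffer the read stream has in flight, versus the few words needed to copy a TidStoreIterResult with the approach the patch takes.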
On Tue, Jul 16, 2024 at 1:52 PM Noah Misch <noah@leadboat.com> wrote:
On Mon, Jul 15, 2024 at 03:26:32PM +1200, Thomas Munro wrote:
That's reasonable. radixtree already forbids mutations concurrent with
iteration, so there's no new concurrency hazard. One alternative is
per_buffer_data big enough for MaxOffsetNumber, but that might thrash caches
measurably. That patch is good to go apart from these trivialities:
Thanks! I have pushed that patch, without those changes you didn't like.
Here are Melanie's patches again. They work, and the WAL flush
frequency problem is mostly gone since we increased the BAS_VACUUM
default ring size (commit 98f320eb), but I'm still looking into how
this read-ahead and the write-behind generated by vacuum (using
patches not yet posted) should interact with each other and the ring
system, and bouncing ideas around about that with my colleagues. More
on that soon, hopefully. I suspect that there won't be changes to
these patches as a result, but I still want to hold off for a bit.
Attachments:
v12-0001-Use-streaming-I-O-in-VACUUM-first-pass.patchtext/x-patch; charset=US-ASCII; name=v12-0001-Use-streaming-I-O-in-VACUUM-first-pass.patchDownload
From ac826d0187252bf446fb5f12489def5208d20289 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 11 Mar 2024 16:19:56 -0400
Subject: [PATCH v12 1/2] Use streaming I/O in VACUUM first pass.
Now vacuum's first pass, which HOT-prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by converting
heap_vac_scan_next_block() to a read stream callback.
Author: Melanie Plageman <melanieplageman@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 80 +++++++++++++++++-----------
1 file changed, 49 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 835b53415d0..d92fac7e7e3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -55,6 +55,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/read_stream.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
@@ -229,8 +230,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm);
+static BlockNumber heap_vac_scan_next_block(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -815,10 +817,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ Buffer buf;
+ ReadStream *stream;
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm;
TidStore *dead_items = vacrel->dead_items;
VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
@@ -836,19 +839,33 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items_info->max_bytes;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+ vacrel->bstrategy,
+ vacrel->rel,
+ MAIN_FORKNUM,
+ heap_vac_scan_next_block,
+ vacrel,
+ sizeof(bool));
+
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ while (BufferIsValid(buf = read_stream_next_buffer(stream,
+ (void **) &all_visible_according_to_vm)))
{
- Buffer buf;
+ BlockNumber blkno;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
+
+ CheckBufferIsPinnedOnce(buf);
+ page = BufferGetPage(buf);
+
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -914,10 +931,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
-
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
* the page in lazy_scan_prune. But when it's not possible to acquire
@@ -973,7 +986,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ vmbuffer, *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1027,7 +1040,7 @@ lazy_scan_heap(LVRelState *vacrel)
ReleaseBuffer(vmbuffer);
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1042,6 +1055,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ read_stream_end(stream);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1053,11 +1068,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1067,14 +1082,14 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to get the next block to
- * prune and vacuum. The function uses the visibility map, vacuum options,
- * and various thresholds to skip blocks which do not need to be processed and
- * sets blkno to the next block to process.
+ * The streaming read callback invokes heap_vac_scan_next_block() every time
+ * lazy_scan_heap() needs the next block to prune and vacuum. The function
+ * uses the visibility map, vacuum options, and various thresholds to skip
+ * blocks which do not need to be processed and returns the next block to
+ * process or InvalidBlockNumber if there are no remaining blocks.
*
- * The block number and visibility status of the next block to process are set
- * in *blkno and *all_visible_according_to_vm. The return value is false if
- * there are no further blocks to process.
+ * The visibility status of the next block to process is set in the
+ * per_buffer_data.
*
* vacrel is an in/out parameter here. Vacuum options and information about
* the relation are read. vacrel->skippedallvis is set if we skip a block
@@ -1082,11 +1097,14 @@ lazy_scan_heap(LVRelState *vacrel)
* relfrozenxid in that case. vacrel also holds information about the next
* unskippable block, as bookkeeping for this function.
*/
-static bool
-heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm)
+static BlockNumber
+heap_vac_scan_next_block(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data)
{
BlockNumber next_block;
+ LVRelState *vacrel = callback_private_data;
+ bool *all_visible_according_to_vm = per_buffer_data;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1099,8 +1117,8 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
- *blkno = vacrel->rel_pages;
- return false;
+ vacrel->current_block = vacrel->rel_pages;
+ return InvalidBlockNumber;
}
/*
@@ -1149,9 +1167,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
* but chose not to. We know that they are all-visible in the VM,
* otherwise they would've been unskippable.
*/
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = true;
- return true;
+ return vacrel->current_block;
}
else
{
@@ -1161,9 +1179,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
*/
Assert(next_block == vacrel->next_unskippable_block);
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = vacrel->next_unskippable_allvis;
- return true;
+ return vacrel->current_block;
}
}
--
2.45.2
v12-0002-Use-streaming-I-O-in-VACUUM-second-pass.patchtext/x-patch; charset=US-ASCII; name=v12-0002-Use-streaming-I-O-in-VACUUM-second-pass.patchDownload
From 096f16b1e76ac28438752e7828b7c325f84edf4e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 27 Feb 2024 14:35:36 -0500
Subject: [PATCH v12 2/2] Use streaming I/O in VACUUM second pass.
Now vacuum's second pass, which removes dead items referring to dead
tuples collected in the first pass, uses a read stream that looks ahead
in the TidStore.
Author: Melanie Plageman <melanieplageman@gmail.com>
---
src/backend/access/heap/vacuumlazy.c | 38 +++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d92fac7e7e3..2b7d191d175 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2098,6 +2098,24 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_read_stream_next(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data)
+{
+ TidStoreIter *iter = callback_private_data;
+ TidStoreIterResult *iter_result;
+
+ iter_result = TidStoreIterateNext(iter);
+ if (iter_result == NULL)
+ return InvalidBlockNumber;
+
+ /* Save the TidStoreIterResult for later, so we can extract the offsets. */
+ memcpy(per_buffer_data, iter_result, sizeof(*iter_result));
+
+ return iter_result->blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2118,6 +2136,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
+ Buffer buf;
+ ReadStream *stream;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2138,10 +2158,18 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
InvalidBlockNumber, InvalidOffsetNumber);
iter = TidStoreBeginIterate(vacrel->dead_items);
- while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+ vacrel->bstrategy,
+ vacrel->rel,
+ MAIN_FORKNUM,
+ vacuum_reap_lp_read_stream_next,
+ iter,
+ sizeof(TidStoreIterResult));
+
+ while (BufferIsValid(buf = read_stream_next_buffer(stream,
+ (void **) &iter_result)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
OffsetNumber offsets[MaxOffsetNumber];
@@ -2149,8 +2177,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
vacuum_delay_point();
- blkno = iter_result->blkno;
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
Assert(num_offsets <= lengthof(offsets));
@@ -2163,8 +2190,6 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
lazy_vacuum_heap_page(vacrel, blkno, buf, offsets,
num_offsets, vmbuffer);
@@ -2177,6 +2202,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
vacuumed_pages++;
}
+ read_stream_end(stream);
TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
--
2.45.2
On 7/24/24 07:40, Thomas Munro wrote:
On Tue, Jul 16, 2024 at 1:52 PM Noah Misch <noah@leadboat.com> wrote:
On Mon, Jul 15, 2024 at 03:26:32PM +1200, Thomas Munro wrote:
That's reasonable. radixtree already forbids mutations concurrent with
iteration, so there's no new concurrency hazard. One alternative is
per_buffer_data big enough for MaxOffsetNumber, but that might thrash caches
measurably. That patch is good to go apart from these trivialities:
Thanks! I have pushed that patch, without those changes you didn't like.
Here's are Melanie's patches again. They work, and the WAL flush
frequency problem is mostly gone since we increased the BAS_VACUUM
default ring size (commit 98f320eb), but I'm still looking into how
this read-ahead and the write-behind generated by vacuum (using
patches not yet posted) should interact with each other and the ring
system, and bouncing ideas around about that with my colleagues. More
on that soon, hopefully. I suspect that there won't be changes to
these patches as a result, but I still want to hold off for a bit.
I've been looking at some other vacuum-related patches, so I took a look
at these remaining bits too. I don't have much to say about the code
(seems perfectly fine to me), so I decided to do a bit of testing.
I did a simple test (see the attached .sh script) that runs vacuum on a
table with different fractions of rows updated. The table has ~100 rows
per page, with 50% fill factor, and the updates touch ~1%, 0.1%, 0.05%
rows, and then even smaller fractions up to 0.0001%. This determines how
many pages get touched. With 1% fraction almost every page gets
modified, then the fraction quickly drops. With 0.0001% only about 0.01%
of pages gets modified.
Attached is a CSV with raw results from two machines, and also a PDF
with a comparison of the two builds (master vs. patched). In the vast
majority of cases, the patched build is much faster, usually 2-3x.
There are a couple cases where it regresses by ~30%, but only on one of
the machines with older storage (RAID on SATA SSDs), with 1% rows
updated (which means almost all pages get modified). So this is about
sequential access. It's a bit weird, but probably not a fatal issue.
There are also a couple of smaller regressions on the "xeon" machine with
M.2 SSD, for lower update fractions - 0.05% rows updated means 5% pages
need vacuum. But I think this is more a sign of us being too aggressive
in detecting (and forcing) sequential patterns - on master, we end up
scanning 50% pages, thanks to SKIP_PAGES_THRESHOLD. I'll start a new
thread about that ...
Anyway, these results look very nice - a couple limited regressions,
substantial speedups in plausible/common cases.
regards
--
Tomas Vondra
On Sun, Dec 15, 2024 at 10:10 AM Tomas Vondra <tomas@vondra.me> wrote:
I've been looking at some other vacuum-related patches, so I took a look
at these remaining bits too. I don't have much to say about the code
(seems perfectly fine to me), so I decided to do a bit of testing.
Thanks for doing this!
I did a simple test (see the attached .sh script) that runs vacuum on a
table with different fractions of rows updated. The table has ~100 rows
per page, with 50% fill factor, and the updates touch ~1%, 0.1%, 0.05%
rows, and then even smaller fractions up to 0.0001%. This determines how
many pages get touched. With 1% fraction almost every page gets
modified, then the fraction quickly drops. With 0.0001% only about 0.01%
of pages gets modified.
Attached is a CSV with raw results from two machines, and also a PDF
with a comparison of the two build (master vs. patched). In vast
majority of cases, the patched build is much faster, usually 2-3x.
There are a couple cases where it regresses by ~30%, but only on one of
the machines with older storage (RAID on SATA SSDs), with 1% rows
updated (which means almost all pages get modified). So this is about
sequential access. It's a bit weird, but probably not a fatal issue.
Actually, while rebasing these with the intent to start investigating
the regressions you report, I noticed something quite wrong with my
code. In lazy_scan_heap(), I had put read_stream_next_buffer() before
a few expensive operations (like a round of index vacuuming and dead
item reaping if the TID store is full). It returns the pinned buffer,
so this could mean a buffer remaining pinned for a whole round of
vacuuming of items from the TID store. Not good. Anyway, this version
has fixed that. I do wonder if there is any chance that this affected
your benchmarks.
I've attached a new v13. Perhaps you could give it another go and see
if the regressions are still there?
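In loop-shape terms, the fix amounts to the following (a condensed, heavily elided sketch of the lazy_scan_heap() loop as it appears in the attached v13-0001 patch, not standalone code): the dead-items-full handling now runs before the next buffer is requested, so no heap buffer returned by the stream stays pinned across a round of index vacuuming.

	while (true)
	{
		vacuum_delay_point();

		/*
		 * If dead_items is full: do a round of index and heap vacuuming,
		 * vacuum the FSM range, update progress (elided).  No stream
		 * buffer is pinned while this potentially long work runs.
		 */

		/* Only now pull the next, already-pinned buffer from the stream */
		buf = read_stream_next_buffer(stream,
									  (void **) &all_visible_according_to_vm);
		if (!BufferIsValid(buf))
			break;

		blkno = BufferGetBlockNumber(buf);
		page = BufferGetPage(buf);

		/* ... prune/vacuum the page, eventually releasing buf (elided) ... */
	}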
There is also a couple smaller regressions on the "xeon" machine with
M.2 SSD, for lower update fractions - 0.05% rows updated means 5% pages
need vacuum. But I think this is more a sign of us being too aggressive
in detecting (and forcing) sequential patterns - on master, we end up
scanning 50% pages, thanks to SKIP_PAGES_THRESHOLD. I'll start a new
thread about that ...
Hmm. If only 5% of pages need vacuum, then you are right, readahead
seems like it would often be wasted (depending on _which_ 5% needs
vacuuming). But, the read stream API will only prefetch and build up
larger I/Os of blocks we actually need. So, it seems like this would
behave the same on master and with the patch. That is, both would do
extra unneeded I/O because of SKIP_PAGES_THRESHOLD. Is the 5% of the
table that needs vacuuming dispersed randomly throughout or
concentrated?
If it is concentrated and readahead would be useful, then maybe we
need to increase read_ahead_kb. You mentioned off-list that the
read_ahead_kb on this machine for this SSD was 128kB -- the same as
io_combine_limit. If we want to read ahead, issuing 128 kB I/Os might
be thwarting us and increasing latency.
- Melanie
Attachments:
v13-0001-Use-streaming-I-O-in-VACUUM-s-first-phase.patchapplication/octet-stream; name=v13-0001-Use-streaming-I-O-in-VACUUM-s-first-phase.patchDownload
From 17cae28be7d432b53a7cd1b8fdc04c8abd92be02 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 15 Jan 2025 19:56:37 -0500
Subject: [PATCH v13 1/2] Use streaming I/O in VACUUM's first phase
Now vacuum's first phase, which HOT-prunes and records the TIDs of
non-removable dead tuples, uses the streaming read API by converting
heap_vac_scan_next_block() to a read stream callback.
---
src/backend/access/heap/vacuumlazy.c | 100 +++++++++++++++++----------
1 file changed, 62 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5b0e790e121..dce5dae9d02 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -108,6 +108,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/read_stream.h"
#include "utils/lsyscache.h"
#include "utils/pg_rusage.h"
#include "utils/timestamp.h"
@@ -296,8 +297,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm);
+static BlockNumber heap_vac_scan_next_block(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -904,10 +906,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
+ ReadStream *stream;
BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
+ blkno = 0,
next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
+ bool *all_visible_according_to_vm = NULL;
Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
@@ -923,26 +926,27 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = vacrel->dead_items_info->max_bytes;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+ vacrel->bstrategy,
+ vacrel->rel,
+ MAIN_FORKNUM,
+ heap_vac_scan_next_block,
+ vacrel,
+ sizeof(bool));
+
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ while (true)
{
Buffer buf;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
-
- /* Report as block scanned, update error traceback information */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
- update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
- blkno, InvalidOffsetNumber);
-
vacuum_delay_point();
/*
@@ -983,7 +987,8 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
+ * upper-level FSM pages. Note that blkno is the previously
+ * processed block.
*/
FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
blkno);
@@ -994,6 +999,24 @@ lazy_scan_heap(LVRelState *vacrel)
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
}
+ buf = read_stream_next_buffer(stream, (void **) &all_visible_according_to_vm);
+
+ if (!BufferIsValid(buf))
+ break;
+
+ Assert(all_visible_according_to_vm);
+ CheckBufferIsPinnedOnce(buf);
+ page = BufferGetPage(buf);
+
+ vacrel->scanned_pages++;
+
+ blkno = BufferGetBlockNumber(buf);
+
+ /* Report as block scanned, update error traceback information */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
+ blkno, InvalidOffsetNumber);
+
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -1001,10 +1024,6 @@ lazy_scan_heap(LVRelState *vacrel)
*/
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
- page = BufferGetPage(buf);
-
/*
* We need a buffer cleanup lock to prune HOT chains and defragment
* the page in lazy_scan_prune. But when it's not possible to acquire
@@ -1060,7 +1079,7 @@ lazy_scan_heap(LVRelState *vacrel)
*/
if (got_cleanup_lock)
lazy_scan_prune(vacrel, buf, blkno, page,
- vmbuffer, all_visible_according_to_vm,
+ vmbuffer, *all_visible_according_to_vm,
&has_lpdead_items);
/*
@@ -1114,7 +1133,7 @@ lazy_scan_heap(LVRelState *vacrel)
ReleaseBuffer(vmbuffer);
/* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
@@ -1129,6 +1148,8 @@ lazy_scan_heap(LVRelState *vacrel)
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
+ read_stream_end(stream);
+
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
* related heap vacuuming
@@ -1140,11 +1161,11 @@ lazy_scan_heap(LVRelState *vacrel)
* Vacuum the remainder of the Free Space Map. We must do this whether or
* not there were indexes, and whether or not we bypassed index vacuuming.
*/
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ if (rel_pages > next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, rel_pages);
/* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
/* Do final index cleanup (call each index's amvacuumcleanup routine) */
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
@@ -1154,14 +1175,14 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to get the next block to
- * prune and vacuum. The function uses the visibility map, vacuum options,
- * and various thresholds to skip blocks which do not need to be processed and
- * sets blkno to the next block to process.
+ * The streaming read callback invokes heap_vac_scan_next_block() every time
+ * lazy_scan_heap() needs the next block to prune and vacuum. The function
+ * uses the visibility map, vacuum options, and various thresholds to skip
+ * blocks which do not need to be processed and returns the next block to
+ * process or InvalidBlockNumber if there are no remaining blocks.
*
- * The block number and visibility status of the next block to process are set
- * in *blkno and *all_visible_according_to_vm. The return value is false if
- * there are no further blocks to process.
+ * The visibility status of the next block to process is set in the
+ * per_buffer_data.
*
* vacrel is an in/out parameter here. Vacuum options and information about
* the relation are read. vacrel->skippedallvis is set if we skip a block
@@ -1169,11 +1190,14 @@ lazy_scan_heap(LVRelState *vacrel)
* relfrozenxid in that case. vacrel also holds information about the next
* unskippable block, as bookkeeping for this function.
*/
-static bool
-heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
- bool *all_visible_according_to_vm)
+static BlockNumber
+heap_vac_scan_next_block(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data)
{
BlockNumber next_block;
+ LVRelState *vacrel = callback_private_data;
+ bool *all_visible_according_to_vm = per_buffer_data;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1186,8 +1210,8 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
}
- *blkno = vacrel->rel_pages;
- return false;
+ vacrel->current_block = vacrel->rel_pages;
+ return InvalidBlockNumber;
}
/*
@@ -1236,9 +1260,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
* but chose not to. We know that they are all-visible in the VM,
* otherwise they would've been unskippable.
*/
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = true;
- return true;
+ return vacrel->current_block;
}
else
{
@@ -1248,9 +1272,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
*/
Assert(next_block == vacrel->next_unskippable_block);
- *blkno = vacrel->current_block = next_block;
+ vacrel->current_block = next_block;
*all_visible_according_to_vm = vacrel->next_unskippable_allvis;
- return true;
+ return vacrel->current_block;
}
}
--
2.45.2
v13-0002-Use-streaming-I-O-in-VACUUM-s-third-phase.patchapplication/octet-stream; name=v13-0002-Use-streaming-I-O-in-VACUUM-s-third-phase.patchDownload
From 24c8f1da03f5bb887a23581fced19f74a47fb2d1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 15 Jan 2025 20:16:16 -0500
Subject: [PATCH v13 2/2] Use streaming I/O in VACUUM's third phase
Now vacuum's third phase (its second pass over the heap), which removes
dead items referring to dead tuples collected in the first phase, uses a
read stream that looks ahead in the TidStore.
---
src/backend/access/heap/vacuumlazy.c | 39 +++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index dce5dae9d02..c07ce578931 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2247,6 +2247,24 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
+static BlockNumber
+vacuum_reap_lp_read_stream_next(ReadStream *stream,
+ void *callback_private_data,
+ void *per_buffer_data)
+{
+ TidStoreIter *iter = callback_private_data;
+ TidStoreIterResult *iter_result;
+
+ iter_result = TidStoreIterateNext(iter);
+ if (iter_result == NULL)
+ return InvalidBlockNumber;
+
+ /* Save the TidStoreIterResult for later, so we can extract the offsets. */
+ memcpy(per_buffer_data, iter_result, sizeof(*iter_result));
+
+ return iter_result->blkno;
+}
+
/*
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
@@ -2267,6 +2285,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
static void
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
+ Buffer buf;
+ ReadStream *stream;
BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
@@ -2287,10 +2307,18 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
InvalidBlockNumber, InvalidOffsetNumber);
iter = TidStoreBeginIterate(vacrel->dead_items);
- while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+ vacrel->bstrategy,
+ vacrel->rel,
+ MAIN_FORKNUM,
+ vacuum_reap_lp_read_stream_next,
+ iter,
+ sizeof(TidStoreIterResult));
+
+ while (BufferIsValid(buf = read_stream_next_buffer(stream,
+ (void **) &iter_result)))
{
BlockNumber blkno;
- Buffer buf;
Page page;
Size freespace;
OffsetNumber offsets[MaxOffsetNumber];
@@ -2298,8 +2326,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
vacuum_delay_point();
- blkno = iter_result->blkno;
- vacrel->blkno = blkno;
+ vacrel->blkno = blkno = BufferGetBlockNumber(buf);
num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
Assert(num_offsets <= lengthof(offsets));
@@ -2312,8 +2339,6 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
/* We need a non-cleanup exclusive lock to mark dead_items unused */
- buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
lazy_vacuum_heap_page(vacrel, blkno, buf, offsets,
num_offsets, vmbuffer);
@@ -2326,6 +2351,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
vacuumed_pages++;
}
+
+ read_stream_end(stream);
TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
--
2.45.2
On 1/16/25 02:45, Melanie Plageman wrote:
On Sun, Dec 15, 2024 at 10:10 AM Tomas Vondra <tomas@vondra.me> wrote:
I've been looking at some other vacuum-related patches, so I took a look
at these remaining bits too. I don't have much to say about the code
(seems perfectly fine to me), so I decided to do a bit of testing.
Thanks for doing this!
I did a simple test (see the attached .sh script) that runs vacuum on a
table with different fractions of rows updated. The table has ~100 rows
per page, with 50% fill factor, and the updates touch ~1%, 0.1%, 0.05%
rows, and then even smaller fractions up to 0.0001%. This determines how
many pages get touched. With 1% fraction almost every page gets
modified, then the fraction quickly drops. With 0.0001% only about 0.01%
of pages gets modified.
Attached is a CSV with raw results from two machines, and also a PDF
with a comparison of the two build (master vs. patched). In vast
majority of cases, the patched build is much faster, usually 2-3x.
There are a couple cases where it regresses by ~30%, but only on one of
the machines with older storage (RAID on SATA SSDs), with 1% rows
updated (which means almost all pages get modified). So this is about
sequential access. It's a bit weird, but probably not a fatal issue.
Actually, while rebasing
the regressions you report, I noticed something quite wrong with my
code. In lazy_scan_heap(), I had put read_stream_next_buffer() before
a few expensive operations (like a round of index vacuuming and dead
item reaping if the TID store is full). It returns the pinned buffer,
so this could mean a buffer remaining pinned for a whole round of
vacuuming of items from the TID store. Not good. Anyway, this version
has fixed that. I do wonder if there is any chance that this affected
your benchmarks.
I've attached a new v13. Perhaps you could give it another go and see
if the regressions are still there?
Sure. I repeated the benchmark with v13, and it seems the behavior did
change. I no longer see the "big" regression when most of the pages get
updated (and need vacuuming).
I can't be 100% sure this is due to changes in the patch, because I did
some significant upgrades to the machine since that time - it has Ryzen
9900x instead of the ancient i5-2500k, new mobo/RAM/... It's pretty
much a new machine, I only kept the "old" SATA SSD RAID storage so that
I can do some tests with non-NVMe.
So there's a (small) chance the previous runs were hitting a bottleneck
that does not exist on the new hardware.
Anyway, just to make this information more complete, the machine now has
this configuration:
* Ryzen 9 9900x (12/24C), 64GB RAM
* storage:
- data: Samsung SSD 990 PRO 4TB (NVMe)
- raid-nvme: RAID0 4x Samsung SSD 990 PRO 1TB (NVMe)
- raid-sata: RAID0 6x Intel DC3700 100GB (SATA)
Attached is the script, raw results (CSV) and two PDFs summarizing the
results as a pivot table for different test parameters. Compared to the
earlier run I tweaked the script to also vary io_combine_limit (ioc), as
I wanted to see how it interacts with effective_io_concurrency (eic).
Looking at the new results, I don't see any regressions, except for two
cases - data (single NVMe) and raid-nvme (4x NVMe). There's a small area
of regression for eic=32 and perc=0.0005, but only with WAL-logging.
I'm not sure this is worth worrying about too much. It's a heuristic,
and for every heuristic there's some combination of parameters where it
doesn't quite do the optimal thing. The areas where the patch brings
massive improvements (or does not regress) are much more significant.
I personally am happy with this behavior; it seems to be performing fine.
There is also a couple smaller regressions on the "xeon" machine with
M.2 SSD, for lower update fractions - 0.05% rows updated means 5% pages
need vacuum. But I think this is more a sign of us being too aggressive
in detecting (and forcing) sequential patterns - on master, we end up
scanning 50% pages, thanks to SKIP_PAGES_THRESHOLD. I'll start a new
thread about that ...
Hmm. If only 5% of pages need vacuum, then you are right, readahead
seems like it would often be wasted (depending on _which_ 5% needs
vacuuming). But, the read stream API will only prefetch and build up
larger I/Os of blocks we actually need. So, it seems like this would
behave the same on master and with the patch. That is, both would do
extra unneeded I/O because of SKIP_PAGES_THRESHOLD. Is the 5% of the
table that needs vacuuming dispersed randomly throughout or
concentrated?
If it is concentrated and readahead would be useful, then maybe we
need to increase read_ahead_kb. You mentioned off-list that the
read_ahead_kb on this machine for this SSD was 128kB -- the same as
io_combine_limit. If we want to read ahead, issuing 128 kB I/Os might
be thwarting us and increasing latency.
I haven't run the new benchmark on the "xeon" machine, but I believe
this is the same regression that I mentioned above. I mean, it's the
same "area" where it happens, also for NVMe storage, so I guess it has
the same cause. But I don't have any particular insight into this.
FWIW there's one more interesting thing - entirely unrelated to this
patch, but visible in the "eic vs. ioc" results, which are meant to show
how these two GUC interact. For (eic <= 1) the ioc doesn't seem to
matter very much, but for (eic > 1) it's clear higher ioc values are
better. Except that it "breaks" at 512kB, because I didn't realize we
allow ioc only up to 256kB. So 512kB does nothing, it just defaults to
the 128kB limit.
So this brings two questions:
* Does it still make sense to default to eic=1? For this particular test
increasing eic to 4 often cuts the duration in half (especially on nvme
storage).
* Why are we limiting ioc to <= 256kB? Per the benchmark it seems it
might be beneficial to set even higher values.
regards
--
Tomas Vondra
Attachments:
ioc-eic.pdfapplication/pdf; name=ioc-eic.pdfDownload
:����{��"�5�K�1\9�>�[���\�:�5�D�� ��������'�P��k�=��<
�]�_��R�5�@�Ua���s��]���������qu���
�(����d�����J� �l���G��<���a����5���il�q�����1���M_��%��/���+�7&�"����`���r*��7}�k�u�3�
�w1��3�&~��kU�z����E����A�>� ������k��l�!�5��u�4�E��1��l��j�r��&���d��R�����`J.dM��e��������&$�%�
/�� W�Y��s�T����%� I��Z)�$Zq}���w:�{
�F~�}ljh%�
"��^����
4L?W�
�Q�A�P�����J0 F5���/'����n��J�{�|+�c5����/�$g�8(��eK.=UqPm�kK��:������#���v0a�B�U@�:[E�&�pR���R�
#G�F����+9@� >�:��=R �
xG�\�0v�Wc�oT&�������|�������G ���,����W����7����ZM�Kq�,���Qd[G�W/����������q[��W*� �������6�f�<���Z��U�C4#���O���5#�:,������-�5Lzm��Y����� ��_h���^�_^;�i���W�����$�_�����e�"� �$]No��[;��q�p� ��v4;�v�
V��d
����l��\dE�FT�)����x��6HF�5���~c�6�&�:���y�0�?/��I�'�\���-({���T�]���R�����<ppk�5S�"�� �R]��yw�C<|������ E�;�}0j����,R��4o��������K��J
|��M4��jx-<Nz�~Qf��Y�����
����9�M�q�w��u���o�A�~5�ZnY@�5gC�&�*^��Avf�x~��h���d������m�M�� Z����&���V�4�����/W��k����1/:9�2���Lo:�bN�� y��,���.��$����E�@&2�s6��7�� ��Uf�Q@�-�uM'�����Lz+�i���4��p�u�����Z�3[�s7`Ri�ti�]k�\����|����n����O�!A<h���muk�zKqrn���~��wY��W��-��pl�s}s�����:���Vc�������&k��n�69o.�����Y�����_KMz3K�6i�������������A�1���g�sl�,��G�&��:
�j�
|-)5��'�7���hckpo�]�����lfD*�G
��j,�������@$�_hT�hKk����S��x+W���j^������V� �~�st]p�1��{=xWq�U*}+&0����i���������)9.�xm�#�De������3q�h.;x~�S����
�I�%�.&G�����x��������[��W��(���gx����������������*�������sk��
(*�s��&Z
![z��p������3�x{rn�����)N����5/M���)/QI���}g]�I���3���qq7��!�kYz���}N�>�=����+����������^
~���?�����������/���N�lw��:w�_��5��2.����Gn)�P�\S��y"�?����K�3aX:&N+��?����Jh�=[��Y����r���I�!�k���]y�J�7��A|���
�Y�`\��~C�`\[���������������}�{;=����~c#a��&?|�p�
�@��.���Z�������$�����7�M-���V]�[��,1��R,�������}A/���������l1�xx%BB�-�9i����&��:J����Bv����RmM�f�6x5��/b\��+�Cf�s=��`�5!�h�z��y��C�f, �����Ch��r�0oqg��������sn�s�{O-��4��W��#����uh�_)P��{��'�� ����)���&������\-��Rcpo>@fp�`��8WW�9'�<��w�����]�+�����c�K�D%.)���>M����ix#������X�W�1����
�����Y����p�f�}�kl�
�-�����B�u��[��_�D-���4�/�:Lz
��i�8�8�o.���.�����S�{��ts]���7��'����-�]GKe��S���Q�@#��������=��q����y�.����X6��(=��~���ufgP��/T��::�����B���( �S�5%��f0_�W�F�J������g���:8��{�o��=�{���Yk��6.�mz?�5�o1[���~��H����=�����,�%������DG)� �����M�U�����cR���:����������l]}l��F/�m�����l[������K���7>�5����W.G�
t�yop4A>0>�7�n���[xk{uR�����^���(�z�r(K���_*�f/qKK��d<<d�s��P�/��>����qL�u��[A��o[p�z������#l����_o���w�o���i�cD��Qf�u�,�E�������������_�/_{r���o�'�{��qa}�
��!��Rc�y\�����Y����i�W�Y�~�����bt�/��������ye���]��M�0�S��� tZ)!t^��Am+��b��W����������J�WT��.q�>�F�}@�+�o�'Z�q���'gDW��X/5 t\y����<�w\9�w�P��e�G����o!e����[)��F���P�?_ �?cmt��XH������~�,�����|��.-�qE���c��z���SN�c�]��<�3�c�xE���\�<[c�{���=����m�1�q4�u��c��*�~�Lc�F�gpT�����?K��Ds6%�G���2��K
S���c����@��*wI(�)_����y�<��c{vU����;�1�G���(�pp��_�I�����N,M%�&�u�U�� ��IyR��H�Mt��(c���
�BJ"}����9�m�m��g_���:I*A�I�5�(�2b��0��,��%�*��4��
�
�f�\�V�U$�*P�4����*�H�Z[b�]L1�xX������81
c1����[�ZV�{���ZUa���8���[6
�����J]�"�f�\g@��#�ZE�R�,u/."W���� v)�9�M��0���\w����]F]J^�q9Xb�#ga�����1F�c]�eN�g���Ny2���VR]Q.��d�����{!����Nj6U�3VD���������c�X���z���}����d5����p1�����&
��m�.,�����R*��
�wa����.���}cC#)7� !y�tv!�IR���X�0����O�m���JM����BW��w
������������5�X����#���6�V���(p�ERQ9�*~H|����i����A/����(�D�S�pt����](��u2a��>)��Rb�����*c����{T%8t#!��s�$?��3���D�jo���K$�^���(,M�dcT�x���qS� p��$����
�]<}�7b]���� ��s�"t�r��&&tV�5���Q7�u{�������S���S����C��r?ie�%��yG~{c{(Y���] ������Bz��&��+hf�>��'H�<!I�5yj0~�P�*�"UK������B�q�$yH�����my,���0K�.H�j��f KA��O%K��D����L�%zh�8�_��Df������-�#��������!#@R@MR*��E���)�)��!H@L�"Q � �UIbV5���&�UHSf��C��5z� �B� @B�X�B�KY�{��z���#"P�Q���$���8%v�D}����X�,�*�.x%�v6�����V���.|�L)U��K��pD��[���y�t������N b��(���
�H3+D���^};���z,�RX�&: �*H �!M�2��pEj^^G:*�w�{�R�\��kM��z�-�ZIi�h��c��g]$��4�(�9v�P�Q�8J�8q��n;�5�M+�v���s�c�n�lZ�hR*�/��X�������Q1������,�Q&$���#��q'H�1�����N�_�������`o\EL��M�
��Vu�E}��fIX�K��Z��� ����}������ �XU�R �OC.N,n�g��F�A)����c�4��q? ,���}���c�������|�z��o�������O?vD�v=1�ko"JG���(����$<���������H+�~���FGR$�P����8�n����w����Y�]%~=��u%��o��<~��`M^1����M�p*���~W��"�}�$U���'�BR�i���"�[�j�(����?���u���"��c+��I%�B��bV ���O������b���7�����8���U����E^��Y���w_`I3�����%m�u^1u]��n���H����+����Ro���\�(�O@M���d�C�c}P�e�@�r~O���M`��~*I'�QG'����������N���x 9!���;�@0��G���1��Z���� ���<x*b �-�(����6g�(
y=,�Z��z�h����3�^{,����X�$*��|�h�
v'�U��ei/����E�������{����X��TM�E�R�)T�����)�
W��_t�` b��R�g)&�5�1���7��~��y���ti��|�KKz�����,�fkmw���,UyT�� ��V��E>E=�H����
��A!gA;��oc�-\�C����OC����,��Z��W��.2~�r�[�X��gQI�������X!�d��2AG$��L"�vT�dD)@R{�h�k�Eb���I���o�������� ��s8x���*0���!gk��m�1����'?�V9�KP�����JrV��#�7]H�#���������Z�����,,Uy]z��C�P8Y��
�h\�
��G;�%�.3d8`����f8��!�����)'�h�p
z�`u�t�X��dN�F�������H,3dp\���� �ea,E����h��J;*�"p�Td� ���-(23�&4�+���(��n8�/x�K>��e���������Ln����R��^R��xbU��0�
(@F@��60���8�Tx��Q�n��]�.��j$%�V1^��
}��)L��g� Tl]=S�a)6
m�����POMBU�IB�@����8FD�4�w�5�Ih��
�����Et�a&=WU���������d��A��%�n��:� �O
.Q#��1v �����CE��'� �
��c����vu8wN ����P �*�4�BA����X/��[��{}��X�%�����%h���&)���Q�M���R�eIQ?&�G-<8i,�*42==����+)���YZbZ�9���B�'e�NR��)�Z��R��i����p���� H� �j} >�K��O��h"���TG!s� ��C�
+z�����
��� ���������W^
endstream
endobj
13 0 obj
5997
endobj
11 0 obj
<<
/ExtGState <<
/a0 << /CA 1 /ca 1 >>
>>
/Font <<
/f-0-0 7 0 R
>>
>>
endobj
15 0 obj
<< /Type /ObjStm
/Length 16 0 R
/N 1
/First 5
/Filter /FlateDecode
>>
stream
x�34Q0����� ��
endstream
endobj
16 0 obj
17
endobj
19 0 obj
<< /Length 20 0 R
/Filter /FlateDecode
>>
stream
x���M���q���W�ew�K�������@�8�����|a���(�����J���7j]���XU�� ����������%������7�������?����/o���r �h��}v%�\�#�k��_~}��O���?���Oo�}�����������������/~{����KL��\��������_����>q�L�������yWZk�>�k�Q��Z�����w��G .���c�Gu������_��dNmS��RW��Z��\��P������S�����mS��R��~�j�M\9�RJ��W����Tb�.M\k�~N����%FGlU:�B[Kk�9����>l�]I��p��t`����[�9#���O���������E��*���V�q�?�C({/��m�C���Ps�9���n�f�)P����[/��TKD��
np��M~��M=������U��8-�A�R��R]*�������wM������b]�5����b2DW���?��(4� ����tag�X_;�,@�O-m�����z*� M�r���v����g$�����o~\ _]�@��I��|������)��0�B ��D�����I>`�5Z�c�m�K�u<�\����O[n��=�
K@n�<�L�T} �)o��i��+��Tb���M����� ��Q��zJ�`����.6����z�5FZ��@y!o��)D^[�U�Gv!)�7��l^m���
}.��F���-|������{���-4.���
�������X���7��i��������)�V�y���s1:��J��Ahe�q����w�)f�h�S�����ca^�U��jx�5m�jx������pp�)�k$j���r�n�G���%���������uq��B\���1�ZU7R� ��K1#�� ����3}x��x�g��"H�wb�������_ P�����(UaW�U����A��|�����KFv\,pA�����u�Ud�
�Z�t"6+;���bAl0�=:_����bR���K���![�=S����vw�=gE��i�|���`��`�`�����U�Wf��rS���D^��e�C��W���6��vG l1�c�������-�������`O
��]���e��������-�a}?�?���uu���z��4�[u7���������`�D��e�����> t���(��&��x��,��g{���F� �T�*������%V�3�����}�rm-���_ts-E���-{���U5�E��\/�����[��r��/
_�F5���k�������/�hU����o��Y� O�_�&
�@?��2�6�j����n�kM����
��--(U����5k=�s$���lxR�6J���%~�uku5�~��4e�Z�\]� �1�,��f�fd9�5 4Mvk����Iy�����xx�p���~����g��%u�7e����e;��jV3'�-�r5�g^kM�q^������9��,Q��)��Ue�v���s;����#�@����}x=h[���H�eH�Z2�?gL�X�@�V���g"W�b��na��G�`]S�.
���@�7ez���z���A���B�����K����P[����qL�I�~`���l{��W�M�-���f5�k*��8P�Z��}�/ ���k(��3�g8X]�4���#w�JU~�d��� ~,��T�fD������J�'��h�����4�3#�.�%��/n��X�:�QKy��8�9�dl�oL�)T��~rT�����|�z��R�kF[��� �lU��p��y���2��������z���V�Z���+�K��P)���5Vh��m�!<c��� =8�����<�����b�5[�b?/Y���F�@��r�]�W���)�������#��|f��@����~QkmM�Jwr�npN����M��o�oKn���f���O�z��� ��#���5@xKo����p6a��������%�^ �\�o8E2�[�;���1���?h����
b�o�<<�4�]�ksM���7u��'P�^��u�5�j��7�jt��Z�+�Y3k����p��,����!�����_l�M���A���s��<X�B5�@����&����b�"�gf)��ES��&�Ot~����p���������v�v��Q4X�rc�!���B��f����?�ZS+�M���Zc�3X���cGv>�/����xU���E�s���V��Wyb�u�d)�Z������+��5h��s����p��W�����I�������[���^Y�8h�O��b05�y��^2B\�.���N����P?���F��.�*^>b�����Z�{+�Vc�V295G��w����j��(���0�*�[�o�������Q��:��A������O�,����o���Ork��@o�-�<�[u��"d�o�p��RJ�_�K��������U�q��Q��g��������!>��z����|����l|�5���{��kr����'{���*�w�|�?�P�V��Y����>>�[����b�F��;�yZ}�� Dn���������.�s��Mp���1�'�m��#y�?�~�H������@^W������ o����[_ y��>�$��[�KY6��'�����owSm�������������������C���O���{�Hi����Y�\����1�&yCo~0;���I�|+����t�6�����z���r��/��=����Wx��_�Q�>��o�������������?~x��o��?������������^�#��>���~�;�������> �/����|w�P����q"dv-'T�;]����m;��q�B�\�W�:�G*�q�Y�����1Pr�B�����J.2*���'���}_5������^H�Vo��C��@9�\J�'09��;�����g@z�����;�H��ki��[�:�\J��[�����J��j�����������*�;C��t�t���a��f���>tk���G)��cz�������?y��?���i�������b�[~��r�w\XK I�m���R*�>1o����B�Plli>�ujp��b����R�bs��2�y��9��z/��T��=�XJ
{1!~/����4���TH�
���'/��$�Zk��
�@�R�]ZbS!e)6:�m&R�i/��P������)!�d���F)�)��RJSD���S*��)��
�O�J\���A)�D9�J���n�D�?����C���J��&�=\���������XT�W����5
+d��y��U����%�m%x���X�(������2����U.Xk���*%�r%�����\'����n��uBj�4�
X�tu��b��@��SR!1X��\���������c�ww�����u��Vl�.�k.���H�2���UH��0pM2^2���v9'�{�j2$�!��XX�*�\!$4V�������sNM �t����F)�)5#��8�w&p�LP�J/��jB�����V��(A~�� 2�0�-n6��)A"49����\��S���+���~�cFQ������X�X���D�(��4��|�����M��1��t�1%)v�[��i��`���������0^����+�x�S��B0ZO�J�l�
0�e�H�X'����(@�<��V�O����c]���J�� �b1������b
��
��R`�:/.�a��l�ey�'�i���o���1�x� �����0/�J���8��BPk�R�B[.�b=!�H[!{�$���X�� �[�
)E��RT�Zn��28��*m������8d����Ni<������� 1)*4�Iu�ML�<�:;p�!I�D=��1Z!Sh����ub��`�O�I�r�\�, �~�4!��.������X&y�{>�UMI,���c�����&vsU��P���F�P�O�����wF��3Q��.:���VYr����Tc�� /�oD��{���>�t�$k���� ������uH9$yg`� ���i��� �y�iJ�/�EG��
K������� �
�T�����Bnp����;e
sI�5��e�Jf��~�I����2Bv�d>��"d���V(%��\{1�$�.�����y\n�`V��y��-g����e^�[J�m�(����3�*����Ld����K�O'H.�J��G�&����4�<�����0k�Z��{a��8�A�D������fy����\,��X����
�Yl�j��x���0a6`� ��������^�S��^ �.�3��#f+#<���.�>aIE�$���5X�oz6�*x�q�pZxw^G�e�V*��`(�=���f�z.��+)�(` ���~�X�a^W�d
- ����4UaT���5����C��6C�b���WC�z�K���>������W(wC������LXV�� �<�%�g��-���u�k��`��k�1-�K*�@�W���hN6JMI� �dkz���?�TbR'3L��)1�H�����gm��.6I��!��u���������YU�v��~�����oO�4p�
oMp?A����(k <<���������:���SQb�l�]`1�iO�sKzP�r���Z�
k�$����mY��(p��i�-��i�``�Z3AT!�@��44�B��8�Q`���~\�P�O���a��^y( vK��`�fr�3Y[����6HKkRTK`p7��Y XPb%:<��xl;�~�lJ;Q�?(&X�`�5�"�>(��d����*�QBj����^�o1L����u��P��Q��2D����`+��%��[�����iA+ML Oy��/���Gk�F�K����||�@�D��M��� &y�}S[1���1�4�c�&v\$h�����] �lZ������P'��X?<�.�� :bkR����N��jy�~9G��q��uI��� �����P'��<�G�@�
�%��`�*Y����[��l��0��s�<�
��{p�A������uB��RV9u~X�-f���}y^[E[��g�>��4�u������.
�U����f����[F��;+�� {o���RJ4�aN��4�++����0k�]��W�m�:�gu#4��+U�� f&h3���'ZlL���D!Rxx��g�zf�z%L�a�`��<�����������3�$X�`�H|}��d6��O��N�)08��|��3�H.F��S����`� ����t��^mE��%��tr�G����i&��X��L�z�bSrB�!O. �e�
����P�O��*���I�=����$rM0��y��[BR�C���%9�h���z�~�����Yv�H�f��.?���k+ vB�)$��x�>a�!���r���k�:�0/�3x�Vz��Y�3S�M���fw>�-!���������m�S3%H�Z����[V0���������(�H����iUp���F�0�E��]�B�:�P
��j�Y�nY�V��\�=��O���������������c�$��� ���~1�c#�����q�E�iG ��UO�Y��#�Nx{H�<(7����H����*�Z��N�x8����������
~e���Z��x�sr^�:Gp����?
w�n�Tyn�zl\���Z�<������rM/��Cbx6G?p�+�I��%�
~��TV4�G�iI�f�[��h=P+���"zU L�l���E�����1HTJH���x��C ����JL�-�O��oa�?`� ���%M?�^�(Q)�b?���VT�Qm�Ji��QJ �$��28��{ �h����?}y�����@?�
endstream
endobj
20 0 obj
5879
endobj
18 0 obj
<<
/ExtGState <<
/a0 << /CA 1 /ca 1 >>
>>
/Font <<
/f-0-0 7 0 R
>>
>>
endobj
22 0 obj
<< /Type /ObjStm
/Length 23 0 R
/N 1
/First 5
/Filter /FlateDecode
>>
stream
x�32T0����� ��
endstream
endobj
23 0 obj
17
endobj
26 0 obj
<< /Length 27 0 R
/Filter /FlateDecode
>>
stream
x���A�l�q����8�{W�T���
��!'w7xa�3�C�`���s�z�}J��1^���~u�G��RI_������]��k8~���/���_9~�G������Q��Z8���_��1�����K
��_?~��7�������_~���������W���?~����}|��K�����K�����������J_R�J_b��o�����������Q�wE���"�]�t�k�������2���~h2
.�h���5���_��O���CA��XR���VD����U�����_+U9�
T��ct�v}\��Q�]���l��:����w>���K�E? ������p�54!����c
�w���>|�pJ���y�A���;}I"��J
�!�!/�� [��)��w�C������{-Rbi�0���Q� ��r�i���r����-NU<E�Zk
~����R_%ro�p���:�}�����/�*b��J��%�����@�[y�#�����Gp����7�m�r�>�����b��=U_�:�"�}�������$�������=�$�����j����/���WKm�m�\k�f W��~�O:�8�~�+�_����?���1���Z�~�N�eUg���]o]�u�5���l�7S����>dGd�%Rw�)�6��X�V��7�vX�d��zm��E5W�&!q����F�Pu�iQ�}H�q��������~������_����>&���KT�w����z,�y�G���u��������!k�:�H~��7���~x�����N�U��'~��No�}`G�*�����}_��*b���&��&>K��p������r��n�I"+����_=���x��Sp\
y��]V�����8���;��������U~E�K�W����)����LO8 �����`}�[v�L�J��n��3���qL�)�y
}��y��K�o�WXh#��!D����
�����usO���������b���<��P���{_]"���y���M��>��b#BO��`��|X�\[�����m��.�9����;)�U^��mRXm~7t���
���/�b6�o��D���������O���Y��������7�1\�
����ZX���f��M}����Y? rf�y.��}�lh�~�lc��1��q�?�c���
�7������6=�X�n����m|��������a���dW}�|���ptR���O]l�.x�f��1��e�4��y��c�s�����Qx��+��N�o��MA��-{\�8���5�V)�W�g-K����;1����V����9D�{sFg�x���f�GO.'��m���Z��Wd������T������n����U�R��]}`n2O���o�J�N~���M��g$�k�{K�S=�|5�Hi����Z�3��E�X����9M��q������[���}Es�����7�����l��!�\�t�oC����7����Y�F���DE�Bo�c#-���
�����$�;�F��QJa"y�����#[��j��������!����r�)xt��n{8�o�|�
s�3N!���0<;kSrqMf7�����"�����'Jw��.���}@���N����q�m�g�G�g-i'v%�OW�&��0G.50'z�?����
#��[@o�7��A~�s}�s�l����p��3��sq�,���� ���f:�T����������S�ZD�����z/���� �sP������C�,z�/�E��/�
�;��dw�G�CPn���7�=:�<����{��g5zW��7}����\�'w�~7�����YF��\��M�v���[AD�[����2���1�7��w[[�� ������8����a��&wS��<%��[!���� ���\��A����f�H���)��h[���a���{TB��������xkp| ���R���w��)�
��{���^T
����ju��K0��O79��K��l���2���.8o��j��������m������]��}�����}7�k��;�s����z�N����]�fB���-��t�~� �2�Ld�2U��P7��"��nw}W�P����Lf��b0�n�������;x�=���FZ8�I�����������t���w�|d����{C?W���o�<<�?q8x���������t]O�����d!��|wv���$/�x,��W��~�ah���Hx�?���K ��z��g�6���}s�g��xq��Rr��W�D�m�=������������7�|�y����|��w��
L��\��;v5�Iv��^[�k�f �b����z�*���+�����2T�)�D�m��v�������
���N�;!�Q���� u�����~�Xu����n�����~��<��;G9h6t%H~����K���=�_���V�E�V�;��@�M)@����?L�;t��;T���[�WW����Do��9�d�O����:����M�����1�j��qao���p�ql���Xg*3o�N����}���Y�x8���7c�wC��<�����h�C��]�{��Wm�)�;���<�&e�?��a�m���Ey#�#�����&���w�M�v�o��,no������~��o�e]������.�j#?��o�����k��Qt�m���q3�Bv������r�D�����&k���M�Q��w}�x����?������?���w�>�n���0����4~yw(�w+��:,��y�;m�pO�;���D� /������q<|w�=�c���R�M����=��_ ��<SM��jR}y7e��o��^�_'�bj�_��������un5��v������NM��p��(Q�����Er#�kg���7��)�����[����d�_��~jb��7Y��z�O���Oz��oN:��F��".�����
|;�s4��~�C�K��/��~J�h����V3���W��K3�����v�y�m6�t+p��?����.�����'rS��h=Iv��Sr�bm���M�����?��(oesM���������|t���MFM�D��&v������)���&���n���d ���&��_������%�~�gZ[��z}m�����Dm��;f��s6��_��Bt�X��0 �z+��P����du1���=�`����R��l:�7������H�����?~�������w���G���9����o����+r|����������.�9�������?�����s���41�m���Z P�P�:������^����O^�Sj�O����4��/^ S�� ���d���=�� ���'�:��������&�������
�L�x��H�_J�����Ka����O^K�\�j���Ki;�����ka�!��Qy=y)m�`� �����������'������W=�/�]�Rv����`v5���/��?�>|�����-$'%�v���8�B_����&���9��9>>�������f���X����z�F0�
��|}-8�����ho�F}���x%8�Pj<��M����o\�9�=�';v��gos��d�W`������}S���_��}(�f����I���d���I����|��Rx:T���iB���z�$��k_�xp���l�$m����-��s�f�WC����$`I`�1�� �{`s���=@
&�_� ,#�<����\L�^�D"����6��Rs*L�<9�
��D���z[�����,�2��B3���MJ))%���m
��E�3P��l)�gp%
��}�$�"P�����;I����@�D�
[*.S������#���`�D:���Q�^���"6���j���aFpR��lj����Y��[&���>����N���)�eX���3U����"��6��t)F���y���2���l��N�\S�N�#���A"0X}�|S��G��".j7F��S��9{��J�fs��=�Cy�F;�m�-F0��$>�`��2
f��X���i$].��c)��vRo�PF?��v��(��`�,+�'�p��s]��I�����5C��� �F������W�
�%�f��S� ���N��(�`�\�0U�����y�v�f���V>����v���N ]b��\ ���0�fI;�b�5�W�����2d�%���\�����=%Q������`��:s!��f��U%�<xf�HU�b��G��'����IW;��5~�&� �l���� ��� �Dt���%��]N������=Hg/0�`.CzY`oG���X������ AF�KL<<��^�g{{�;R��8DH����24�-���������g%E��N�x�A]`����!v��c�� ��D����,�l6WL��'
�^U �5�����f5��@7��z�D���]Z�����4%(�a�b]�����"M����_l����������yVM�BO�
�+|����s�:�L_\�+3��
=�]�������?��vR����������hN[���9l3�'���k��#Y��Z �g;���K@�Z��vm������������@���3 �C����(CX�Y�h��\���/��R�_m[D!��51!�z��(����� m�����W_��7g�I0z2�����"������z�s��.���I>�z����|C��_��5������7yI����D�;���o�,JY����,8�Q]��2�S�N��X�+��%�IA�R?&g�`�bT2G���]������$��pUR�|t��>��+��/����+D����T����0�����3ph�`g[�z���: T��'`=,0�� �����b&
���O�M��������(<l�=(,�J��1D��~t����
b���q��+s4Z& �`YU`O����'[��]��_��
_~��X'�.#� A� ��3� I�
�o:��\]�jE��p���b�e�s��6���X7&�\$�)A>�P�$] 1O�f�iJ�{�=P��F`kvQ��d2
�=O���u��J���x�'�2�d�y���q[c��F��h�!@P�����&��
��3�����\p��]�;ryv����Y�TF���?��hd�M"�kq����%������d�fU(T��U��Z�,o��=��*�~�W�8�j��i UF2��6k�=��������>B�A[�j5��H�M��(<3z��%��7�,
���WH�.5�<}65Ww�-��������}(,a`��_mYD!��E��^P?��e�L��� )�:G,�5��~�s~��JJ�*����A�3_+�P����W ��u:���x����`��<��g�������u�)JL3��#��.oQ�g�{x�A�J3��~qR�FF�� ��]������Y������F�'�����+a���,�`fF�����l.�"��W�������:�B
�x���K7�u)B�.�@�������#�iw�UV!_\�`�����)�J�Q���@T*A?s@����Y���8�yB��V������j��*fUv�]�o�q�p���g!'^G��D������iO�,������)�+�v�K��z���'p2Q {8;�V}��~
A��[,CJ��t0� ���5`d�[
�'4i��D���v���Z��R�����k��,phfFP�s3 ����6�!���pN#���6�.78W��V(������^x��IN���U�J��j�(�'I���H2���3������f���U]a�V3��
��X�j�R�&�U]�:�I�8�x\dl����3FP�P�Y0+�"h�a�Zd��x�Zq�P?������Pn���Pz�� '!'��F �5+Y�E���DD���|�������4�;�����p��_�=
uY���;H�,����&L�:n��iF���n V���
���J�`>��_����`�h1I��3�#�I���T �\�'���{H�����e�zoIbT��n��<