Assert("vacrel->eager_scan_remaining_successes > 0")

Started by Masahiko Sawada · 8 months ago · 12 messages
#1 Masahiko Sawada
sawada.mshk@gmail.com

Hi,

I hit the assertion failure in the subject line with the following script:

create table t (a int) with (autovacuum_enabled = off);
insert into t select generate_series(1, 1_000_000);
vacuum t;
insert into t select generate_series(1, 1_000_000);
set vacuum_freeze_min_age to 0;
vacuum t;

When the success count reaches 0, we disable eager scanning by
resetting the related fields as follows:

/*
* If we hit our success cap, permanently disable eager
* scanning by setting the other eager scan management
* fields to their disabled values.
*/
vacrel->eager_scan_remaining_fails = 0;
vacrel->next_eager_scan_region_start = InvalidBlockNumber;
vacrel->eager_scan_max_fails_per_region = 0;

However, there is a possibility that we have already eagerly scanned
another page and returned it to the read stream before we freeze the
eagerly-scanned page and disable the eager scan. In this case, the
next block that we retrieved from the read stream could also be an
eagerly-scanned page.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#2 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

(CC'ed to Melanie)

On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

I hit the assertion failure in the subject line with the following script:

create table t (a int) with (autovacuum_enabled = off);
insert into t select generate_series(1, 1_000_000);
vacuum t;
insert into t select generate_series(1, 1_000_000);
set vacuum_freeze_min_age to 0;
vacuum t;

When the success count reaches 0, we disable eager scanning by
resetting the related fields as follows:

/*
* If we hit our success cap, permanently disable eager
* scanning by setting the other eager scan management
* fields to their disabled values.
*/
vacrel->eager_scan_remaining_fails = 0;
vacrel->next_eager_scan_region_start = InvalidBlockNumber;
vacrel->eager_scan_max_fails_per_region = 0;

However, there is a possibility that we have already eagerly scanned
another page and returned it to the read stream before we freeze the
eagerly-scanned page and disable the eager scan. In this case, the
next block that we retrieved from the read stream could also be an
eagerly-scanned page.

I've added it to Open Items for v18.

If I understand correctly, there's a potential scenario where we might
eagerly scan more pages than permitted by the success and failure
caps. One question is: what is the practical magnitude of this excess
scanning? If the overflow could be substantial (for instance, eagerly
scanning 30% of table pages), we should consider revising our eager
scanning mechanism.

One potential solution would be to implement a counter tracking the
number of eagerly scanned but unprocessed pages. This counter could
then inform the decision-making process in
find_next_unskippable_block() regarding whether to proceed with eager
scanning of additional pages.

Alternatively, if the excess scanning proves negligible in practice,
we could adopt a simpler solution by allowing
vacrel->eager_scan_remaining_successes to accept negative values and
removing the assertion in question.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#3 Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#2)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Tue, May 20, 2025 at 5:23 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

However, there is a possibility that we have already eagerly scanned
another page and returned it to the read stream before we freeze the
eagerly-scanned page and disable the eager scan. In this case, the
next block that we retrieved from the read stream could also be an
eagerly-scanned page.

I've added it to Open Items for v18.

If I understand correctly, there's a potential scenario where we might
eagerly scan more pages than permitted by the success and failure
caps. One question is: what is the practical magnitude of this excess
scanning? If the overflow could be substantial (for instance, eagerly
scanning 30% of table pages), we should consider revising our eager
scanning mechanism.

Thanks for reporting this. Sorry I missed it initially.

I need to do some more investigation, but IIUC you are saying that this is
an interaction between the read stream and eager scan code? I tried to
confirm that was the case by setting io_combine_limit and
maintenance_io_concurrency to 1, which should be similar behavior to not
using the read stream WRT read combining etc. But, even doing so, your
repro tripped the assert. What made you suspect an interaction with the
read stream?

- Melanie

#4 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#3)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Tue, May 20, 2025 at 3:22 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Tue, May 20, 2025 at 5:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

However, there is a possibility that we have already eagerly scanned
another page and returned it to the read stream before we freeze the
eagerly-scanned page and disable the eager scan. In this case, the
next block that we retrieved from the read stream could also be an
eagerly-scanned page.

I've added it to Open Items for v18.

If I understand correctly, there's a potential scenario where we might
eagerly scan more pages than permitted by the success and failure
caps. One question is: what is the practical magnitude of this excess
scanning? If the overflow could be substantial (for instance, eagerly
scanning 30% of table pages), we should consider revising our eager
scanning mechanism.

Thanks for reporting this. Sorry I missed it initially.

I need to do some more investigation, but IIUC you are saying that this is an interaction between the read stream and eager scan code?

Yes.

I tried to confirm that was the case by setting io_combine_limit and maintenance_io_concurrency to 1, which should be similar behavior to not using the read stream WRT read combining etc. But, even doing so, your repro tripped the assert. What made you suspect an interaction with the read stream?

While I haven't identified exactly how the read stream is related to
this issue, what I observed while debugging is that, during a single
read_stream_next_buffer() call, heap_vac_scan_next_block() is invoked
multiple times to return blocks to the read stream, and that we
continued processing eagerly scanned pages even after the success
counter reached zero while processing the previous block. Even when I
set both io_combine_limit and maintenance_io_concurrency to 1, the
debug logs showed that vacuum still performs eager scanning two blocks
ahead of the current block being processed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#5 Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Tue, May 20, 2025 at 6:59 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

While I haven't identified exactly how the read stream is related to
this issue, what I observed while debugging is that, during a single
read_stream_next_buffer() call, heap_vac_scan_next_block() is invoked
multiple times to return blocks to the read stream, and that we
continued processing eagerly scanned pages even after the success
counter reached zero while processing the previous block. Even when I
set both io_combine_limit and maintenance_io_concurrency to 1, the
debug logs showed that vacuum still performs eager scanning two blocks
ahead of the current block being processed.

I spent some more time looking into this today. This assert no longer works
with streaming vacuum.
Even with the minimums (io_combine_limit set to 1 and
maintenance_io_concurrency set to 0), we will still invoke
read_stream_get_block() one extra time at the start of the read
stream.

To your point about eagerly scanning beyond the success and failure caps,
it is possible, but it is hard to get into the situation where you far
exceed them.

Though max_pinned_buffers is derived from io_combine_limit *
maintenance_io_concurrency, it is limited by the size of the buffer access
strategy ring (by default 2 MB for vacuum) and the size of shared buffers.
And because we won't prefetch if sequential access is detected (and the
prefetch distance has to first ramp up), it is unlikely we will both
prefetch the maximum distance of randomly located blocks and combine blocks
to the maximum IO size before returning to vacuum code from read stream
code. That is to say, the maximum number of extra all-visible blocks
scanned should not be very large.

We ran into a similar concern with the autoprewarm read stream user and
decided the overrun would not be large enough to merit special handling.

In this case, because we don't know if we will successfully freeze eagerly
scanned blocks until after they have been yielded by the read stream API,
there is no way to use the caps to limit the prefetch distance or read
combine size. Doing so would entail mixing logic into vacuum code about how
the read stream API works -- at least as far as I can tell.

I think the best course of action is to either change the assert to a guard

if (vacrel->eager_scan_remaining_successes > 0)
vacrel->eager_scan_remaining_successes--;

or make the counter signed and allow it to go negative (as you mentioned).

- Melanie

#6 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#5)
1 attachment(s)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Wed, May 21, 2025 at 10:09 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Tue, May 20, 2025 at 6:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

While I haven't identified exactly how the read stream is related to
this issue, what I observed while debugging is that, during a single
read_stream_next_buffer() call, heap_vac_scan_next_block() is invoked
multiple times to return blocks to the read stream, and that we
continued processing eagerly scanned pages even after the success
counter reached zero while processing the previous block. Even when I
set both io_combine_limit and maintenance_io_concurrency to 1, the
debug logs showed that vacuum still performs eager scanning two blocks
ahead of the current block being processed.

I spent some more time looking into this today. This assert no longer works with streaming vacuum.
Even with the minimums: io_combine_limit 1 and maintenance_io_concurrency set to 0, we will still invoke read_stream_get_block() one extra time at the start of the read stream.

To your point about eagerly scanning beyond the success and failure caps, it is possible, but it is hard to get into the situation where you far exceed them.

Though max_pinned_buffers is derived from io_combine_limit * maintenance_io_concurrency, it is limited by the size of the buffer access strategy ring (by default 2 MB for vacuum) and the size of shared buffers. And because we won't prefetch if sequential access is detected (and the prefetch distance has to first ramp up), it is unlikely we will both prefetch the maximum distance of randomly located blocks and combine blocks to the maximum IO size before returning to vacuum code from read stream code. That is to say, the maximum number of extra all-visible blocks scanned should not be very large.

We ran into a similar concern with the autoprewarm read stream user and decided the overrun would not be large enough to merit special handling.

In this case, because we don't know if we will successfully freeze eagerly scanned blocks until after they have been yielded by the read stream API, there is no way to use the caps to limit the prefetch distance or read combine size. Doing so would entail mixing logic into vacuum code about how the read stream API works -- at least as far as I can tell.

I concur with this analysis. Even in an extreme scenario where we set
high values for both io_combine_limit and maintenance_io_concurrency,
utilize a large shared buffer, and disable the ring buffer, the
potential overrun would likely not be huge.

I think the best course of action is to either change the assert to a guard

if (vacrel->eager_scan_remaining_successes > 0)
vacrel->eager_scan_remaining_successes--;

I've attached a patch that uses this idea. Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Fix-assertion-when-decrementing-eager_scan_remain.patch (application/octet-stream)
From 9ef95e019b09ea5f1bc62a88a2be4a0480197ada Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 21 May 2025 14:13:02 -0700
Subject: [PATCH v1] Fix assertion when decrementing
 eager_scan_remaining_successes counter.

Previously, we asserted that the eager scan's success counter was
positive before decrementing it. However, this assumption was
incorrect, as it's possible that some blocks have already been eagerly
scanned by the time eager scanning is disabled.

This commit replaces the assertion with a guard to handle this
scenario gracefully.

With this change, we continue to allow read-ahead operations by the
read stream that exceed the success and failure caps. While there is a
possibility that overruns will trigger eager scans of additional
pages (beyond 20% of the total number of all-visible but not
all-frozen pages), this does not pose a practical concern as the
overruns will not be huge and remain within acceptable range.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAD21AoConf6tkVCv-=JhQJj56kYsDwo4jG5+WqgT+ukSkYomSQ@mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f28326bad09..0811a59b37a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1413,11 +1413,24 @@ lazy_scan_heap(LVRelState *vacrel)
 
 			if (vm_page_frozen)
 			{
-				Assert(vacrel->eager_scan_remaining_successes > 0);
-				vacrel->eager_scan_remaining_successes--;
+				if (vacrel->eager_scan_remaining_successes > 0)
+					vacrel->eager_scan_remaining_successes--;
 
 				if (vacrel->eager_scan_remaining_successes == 0)
 				{
+					/*
+					 * Report only once that we disabled eager scanning. This
+					 * check is required because we might have eagerly read
+					 * more blocks and we could reach here even after
+					 * disabling eager scanning.
+					 */
+					if (vacrel->eager_scan_max_fails_per_region > 0)
+						ereport(vacrel->verbose ? INFO : DEBUG2,
+								(errmsg("disabling eager scanning after freezing %u eagerly scanned blocks of \"%s.%s.%s\"",
+										orig_eager_scan_success_limit,
+										vacrel->dbname, vacrel->relnamespace,
+										vacrel->relname)));
+
 					/*
 					 * If we hit our success cap, permanently disable eager
 					 * scanning by setting the other eager scan management
@@ -1426,12 +1439,6 @@ lazy_scan_heap(LVRelState *vacrel)
 					vacrel->eager_scan_remaining_fails = 0;
 					vacrel->next_eager_scan_region_start = InvalidBlockNumber;
 					vacrel->eager_scan_max_fails_per_region = 0;
-
-					ereport(vacrel->verbose ? INFO : DEBUG2,
-							(errmsg("disabling eager scanning after freezing %u eagerly scanned blocks of \"%s.%s.%s\"",
-									orig_eager_scan_success_limit,
-									vacrel->dbname, vacrel->relnamespace,
-									vacrel->relname)));
 				}
 			}
 			else
-- 
2.43.5

#7 Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#6)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Wed, May 21, 2025 at 6:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

if (vacrel->eager_scan_remaining_successes > 0)
vacrel->eager_scan_remaining_successes--;

I've attached a patch that uses this idea. Feedback is very welcome.

Thanks for writing the patch!

I actually think we have the same situation with
eager_scan_remaining_fails. Since the extra pages that are eagerly
scanned could either fail or succeed to be frozen, we probably also
need to change the assert in the failure case into a guard as well:

else
{
Assert(vacrel->eager_scan_remaining_fails > 0);
vacrel->eager_scan_remaining_fails--;
}

->

else if (vacrel->eager_scan_remaining_fails > 0)
vacrel->eager_scan_remaining_fails--;

In the comment you wrote, I would probably just change one thing

+                    /*
+                     * Report only once that we disabled eager scanning. This
+                     * check is required because we might have eagerly read
+                     * more blocks and we could reach here even after
+                     * disabling eager scanning.
+                     */

I would emphasize that we read ahead these blocks before executing the
code trying to freeze them. So, I might instead say something like:
"Report only once that we disabled eager scanning. We may eagerly read
ahead blocks in excess of the success or failure caps before
attempting to freeze them, so we could reach here even after disabling
additional eager scanning"

And then probably avoid repeating the whole comment above the
remaining fails guard.

- Melanie

#8 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#7)
1 attachment(s)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Thu, May 22, 2025 at 7:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Wed, May 21, 2025 at 6:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

if (vacrel->eager_scan_remaining_successes > 0)
vacrel->eager_scan_remaining_successes--;

I've attached a patch that uses this idea. Feedback is very welcome.

Thanks for writing the patch!

I actually think we have the same situation with
eager_scan_remaining_fails.

Good catch.

Since the extra pages that are eagerly
scanned could either fail or succeed to be frozen, we probably also
need to change the assert in the failure case into a guard as well:

else
{
Assert(vacrel->eager_scan_remaining_fails > 0);
vacrel->eager_scan_remaining_fails--;
}

->

else if (vacrel->eager_scan_remaining_fails > 0)
vacrel->eager_scan_remaining_fails--;

In the comment you wrote, I would probably just change one thing

+                    /*
+                     * Report only once that we disabled eager scanning. This
+                     * check is required because we might have eagerly read
+                     * more blocks and we could reach here even after
+                     * disabling eager scanning.
+                     */

I would emphasize that we read ahead these blocks before executing the
code trying to freeze them. So, I might instead say something like:
"Report only once that we disabled eager scanning. We may eagerly read
ahead blocks in excess of the success or failure caps before
attempting to freeze them, so we could reach here even after disabling
additional eager scanning"

And then probably avoid repeating the whole comment above the
remaining fails guard.

Agreed. I've updated the patch. Does this address your comments?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Fix-assertion-when-decrementing-eager-scanning-su.patch (application/octet-stream)
From 72677b0b97ce8534aef04cdf5ffc06a6aecadc75 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 21 May 2025 14:13:02 -0700
Subject: [PATCH v2] Fix assertion when decrementing eager scanning success and
 failure counters.

Previously, we asserted that the eager scan's success and failure
counters were positive before decrementing them. However, this
assumption was incorrect, as it's possible that some blocks have
already been eagerly scanned by the time eager scanning is disabled.

This commit replaces the assertions with guards to handle this
scenario gracefully.

With this change, we continue to allow read-ahead operations by the
read stream that exceed the success and failure caps. While there is a
possibility that overruns will trigger eager scans of additional
pages (beyond 20% of the total number of all-visible but not
all-frozen pages), this does not pose a practical concern as the
overruns will not be substantial and will remain within an acceptable
range.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAD21AoConf6tkVCv-=JhQJj56kYsDwo4jG5+WqgT+ukSkYomSQ@mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 29 ++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f28326bad09..708674d8fcf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1413,11 +1413,25 @@ lazy_scan_heap(LVRelState *vacrel)
 
 			if (vm_page_frozen)
 			{
-				Assert(vacrel->eager_scan_remaining_successes > 0);
-				vacrel->eager_scan_remaining_successes--;
+				if (vacrel->eager_scan_remaining_successes > 0)
+					vacrel->eager_scan_remaining_successes--;
 
 				if (vacrel->eager_scan_remaining_successes == 0)
 				{
+					/*
+					 * Report only once that we disabled eager scanning. We
+					 * may eagerly read ahead blocks in excess of the success
+					 * or failure caps before attempting to freeze them, so we
+					 * could reach here even after disabling additional eager
+					 * scanning.
+					 */
+					if (vacrel->eager_scan_max_fails_per_region > 0)
+						ereport(vacrel->verbose ? INFO : DEBUG2,
+								(errmsg("disabling eager scanning after freezing %u eagerly scanned blocks of \"%s.%s.%s\"",
+										orig_eager_scan_success_limit,
+										vacrel->dbname, vacrel->relnamespace,
+										vacrel->relname)));
+
 					/*
 					 * If we hit our success cap, permanently disable eager
 					 * scanning by setting the other eager scan management
@@ -1426,19 +1440,10 @@ lazy_scan_heap(LVRelState *vacrel)
 					vacrel->eager_scan_remaining_fails = 0;
 					vacrel->next_eager_scan_region_start = InvalidBlockNumber;
 					vacrel->eager_scan_max_fails_per_region = 0;
-
-					ereport(vacrel->verbose ? INFO : DEBUG2,
-							(errmsg("disabling eager scanning after freezing %u eagerly scanned blocks of \"%s.%s.%s\"",
-									orig_eager_scan_success_limit,
-									vacrel->dbname, vacrel->relnamespace,
-									vacrel->relname)));
 				}
 			}
-			else
-			{
-				Assert(vacrel->eager_scan_remaining_fails > 0);
+			else if (vacrel->eager_scan_remaining_fails > 0)
 				vacrel->eager_scan_remaining_fails--;
-			}
 		}
 
 		/*
-- 
2.43.5

#9 Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#8)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Thu, May 22, 2025 at 4:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Agreed. I've updated the patch. Does this address your comments?

Yep. LGTM.

I'd probably just remove the parenthetical remark about 20% from the
commit message since that only applies to the success cap and
referencing both the success and failure caps will make the sentence
very long.

from the commit message:
pages (beyond 20% of the total number of all-visible but not
all-frozen pages), this does not pose a practical concern as the

Thanks again for writing the patch!

- Melanie

#10 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#9)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Thu, May 22, 2025 at 1:19 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Thu, May 22, 2025 at 4:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Agreed. I've updated the patch. Does this address your comments?

Yep. LGTM.

I'd probably just remove the parenthetical remark about 20% from the
commit message since that only applies to the success cap and
referencing both the success and failure caps will make the sentence
very long.

from the commit message:
pages (beyond 20% of the total number of all-visible but not
all-frozen pages), this does not pose a practical concern as the

Agreed.

I'll remove that part and push early next week, barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#11 Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#10)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Fri, May 23, 2025 at 12:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'll remove that part and push early next week, barring any objections.

Great, thanks so much!

- Melanie

#12 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#11)
Re: Assert("vacrel->eager_scan_remaining_successes > 0")

On Fri, May 23, 2025 at 1:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Fri, May 23, 2025 at 12:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'll remove that part and push early next week, barring any objections.

Great, thanks so much!

Pushed the fix and closed the open item. Thank you for reviewing the patch!

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com