[patch] CLUSTER blocks scanned progress reporting

Started by Matthias van de Meentabout 5 years ago9 messages
#1Matthias van de Meent
boekewurm+postgres@gmail.com
1 attachment(s)

Hi,

The pg_stat_progress_cluster view can report incorrect
heap_blks_scanned values when synchronize_seqscans is enabled, because
it allows the sequential heap scan to not start at block 0. This can
result in wraparounds in the heap_blks_scanned column when the table
scan wraps around, and starting the next phase with heap_blks_scanned
!= heap_blks_total. This issue was introduced with the
pg_stat_progress_cluster view.

The attached patch fixes the issue by accounting for a non-0
heapScan->rs_startblock and calculating the correct number with a
non-0 heapScan->rs_startblock in mind.

The issue is reproducible starting from PG12 by stopping a
CLUSTER-command while it is sequential-scanning the table (seqscan
enforceable through enable_indexscan = off), and then re-starting the
seqscanning CLUSTER command (without other load/seq-scans on the
table).

Any thoughts?

Matthias van de Meent

Attachments:

v1-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patchDownload
From 12f7617f4999b7bf85cc2c9ef4c3b96aac3b26e8 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm@gmail.com>
Date: Fri, 20 Nov 2020 16:23:59 +0100
Subject: [PATCH v1] Fix CLUSTER progress reporting of number of blocks
 scanned.

The heapScan need not start at block 0, so heapScan->rs_cblock need not be the
correct value for amount of blocks scanned. A more correct value is
 ((heapScan->rs_cblock - heapScan->rs_startblock + heapScan->rs_nblocks) %
   heapScan->rs_nblocks), as it accounts for the wraparound and the initial
offset of the heapScan.

Signed-off-by: Matthias van de Meent <boekewurm@gmail.com>
---
 src/backend/access/heap/heapam_handler.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..3d0433633b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -798,9 +798,17 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 			/*
 			 * In scan-and-sort mode and also VACUUM FULL, set heap blocks
 			 * scanned
+			 *
+			 * Note that heapScan may start at an offset and wrap around, i.e.
+			 * rs_startblock may be >0, and rs_cblock may end with a number
+			 * below rs_startblock. To prevent showing this wraparound to the
+			 * user, we offset rs_cblock by rs_startblock (modulo rs_nblocks).
 			 */
 			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
-										 heapScan->rs_cblock + 1);
+										 (heapScan->rs_cblock +
+										 heapScan->rs_nblocks -
+										 heapScan->rs_startblock
+										 ) % heapScan->rs_nblocks + 1);
 		}
 
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
-- 
2.20.1

#2Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Matthias van de Meent (#1)
Re: [patch] CLUSTER blocks scanned progress reporting

On 2020/11/21 2:32, Matthias van de Meent wrote:

Hi,

The pg_stat_progress_cluster view can report incorrect
heap_blks_scanned values when synchronize_seqscans is enabled, because
it allows the sequential heap scan to not start at block 0. This can
result in wraparounds in the heap_blks_scanned column when the table
scan wraps around, and starting the next phase with heap_blks_scanned
!= heap_blks_total. This issue was introduced with the
pg_stat_progress_cluster view.

Good catch! I agree that this is a bug.

The attached patch fixes the issue by accounting for a non-0
heapScan->rs_startblock and calculating the correct number with a
non-0 heapScan->rs_startblock in mind.

Thanks for the patch! It basically looks good to me.

It's a bit waste of cycles to calculate and update the number of scanned
blocks every cycles. So I'm inclined to change the code as follows.
Thought?

+	BlockNumber	prev_cblock = InvalidBlockNumber;
<snip>
+			if (prev_cblock != heapScan->rs_cblock)
+			{
+				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+											 (heapScan->rs_cblock +
+											  heapScan->rs_nblocks -
+											  heapScan->rs_startblock
+												 ) % heapScan->rs_nblocks + 1);
+				prev_cblock = heapScan->rs_cblock;
+			}

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Fujii Masao (#2)
1 attachment(s)
Re: [patch] CLUSTER blocks scanned progress reporting

On Tue, 24 Nov 2020 at 15:05, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/21 2:32, Matthias van de Meent wrote:

Hi,

The pg_stat_progress_cluster view can report incorrect
heap_blks_scanned values when synchronize_seqscans is enabled, because
it allows the sequential heap scan to not start at block 0. This can
result in wraparounds in the heap_blks_scanned column when the table
scan wraps around, and starting the next phase with heap_blks_scanned
!= heap_blks_total. This issue was introduced with the
pg_stat_progress_cluster view.

Good catch! I agree that this is a bug.

The attached patch fixes the issue by accounting for a non-0
heapScan->rs_startblock and calculating the correct number with a
non-0 heapScan->rs_startblock in mind.

Thanks for the patch! It basically looks good to me.

Thanks for the feedback!

It's a bit waste of cycles to calculate and update the number of scanned
blocks every cycles. So I'm inclined to change the code as follows.
Thought?

+       BlockNumber     prev_cblock = InvalidBlockNumber;
<snip>
+                       if (prev_cblock != heapScan->rs_cblock)
+                       {
+                               pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+                                                                                        (heapScan->rs_cblock +
+                                                                                         heapScan->rs_nblocks -
+                                                                                         heapScan->rs_startblock
+                                                                                                ) % heapScan->rs_nblocks + 1);
+                               prev_cblock = heapScan->rs_cblock;
+                       }

That seems quite reasonable.

I noticed that with my proposed patch it is still possible to go to
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. As the
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.

Please find attached a patch applying the suggested changes.

Matthias van de Meent

Attachments:

v2-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patchDownload
From b3327cace3bebdb15006834e21672fc30cb2f0bb Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgresql@gmail.com>
Date: Fri, 20 Nov 2020 16:23:59 +0100
Subject: [PATCH v2] Fix CLUSTER progress reporting of number of blocks
 scanned.

The heapScan need not start at block 0, so heapScan->rs_cblock need not be the
correct value for amount of blocks scanned. A more correct value is
 ((heapScan->rs_cblock - heapScan->rs_startblock + heapScan->rs_nblocks) %
   heapScan->rs_nblocks), as it accounts for the wraparound and the initial
offset of the heapScan.

Additionally, a heap scan need not return tuples from the last scanned page.
This means that when table_scan_getnextslot returns false, we must manually
update the heap_blks_scanned parameter to the number of blocks in the heap
scan.
---
 src/backend/access/heap/heapam_handler.c | 28 ++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..f20d4bed07 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -698,6 +698,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	Datum	   *values;
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
+	BlockNumber prev_cblock = InvalidBlockNumber;
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -793,14 +794,37 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		else
 		{
 			if (!table_scan_getnextslot(tableScan, ForwardScanDirection, slot))
+			{
+				/*
+				 * A heap scan need not return tuples for the last page it has
+				 * scanned. To ensure that heap_blks_scanned is equivalent to
+				 * total_heap_blks after the table scan phase, this parameter
+				 * is manually updated to the correct value when the table scan
+				 * finishes.
+				 */
+				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+											 heapScan->rs_nblocks);
 				break;
+			}
 
 			/*
 			 * In scan-and-sort mode and also VACUUM FULL, set heap blocks
 			 * scanned
+			 *
+			 * Note that heapScan may start at an offset and wrap around, i.e.
+			 * rs_startblock may be >0, and rs_cblock may end with a number
+			 * below rs_startblock. To prevent showing this wraparound to the
+			 * user, we offset rs_cblock by rs_startblock (modulo rs_nblocks).
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
-										 heapScan->rs_cblock + 1);
+			if (prev_cblock != heapScan->rs_cblock)
+			{
+				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+											 (heapScan->rs_cblock +
+											  heapScan->rs_nblocks -
+											  heapScan->rs_startblock
+											 ) % heapScan->rs_nblocks + 1);
+				prev_cblock = heapScan->rs_cblock;
+			}
 		}
 
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
-- 
2.20.1

#4Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Matthias van de Meent (#3)
Re: [patch] CLUSTER blocks scanned progress reporting

On 2020/11/25 0:25, Matthias van de Meent wrote:

On Tue, 24 Nov 2020 at 15:05, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/21 2:32, Matthias van de Meent wrote:

Hi,

The pg_stat_progress_cluster view can report incorrect
heap_blks_scanned values when synchronize_seqscans is enabled, because
it allows the sequential heap scan to not start at block 0. This can
result in wraparounds in the heap_blks_scanned column when the table
scan wraps around, and starting the next phase with heap_blks_scanned
!= heap_blks_total. This issue was introduced with the
pg_stat_progress_cluster view.

Good catch! I agree that this is a bug.

The attached patch fixes the issue by accounting for a non-0
heapScan->rs_startblock and calculating the correct number with a
non-0 heapScan->rs_startblock in mind.

Thanks for the patch! It basically looks good to me.

Thanks for the feedback!

It's a bit waste of cycles to calculate and update the number of scanned
blocks every cycles. So I'm inclined to change the code as follows.
Thought?

+       BlockNumber     prev_cblock = InvalidBlockNumber;
<snip>
+                       if (prev_cblock != heapScan->rs_cblock)
+                       {
+                               pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+                                                                                        (heapScan->rs_cblock +
+                                                                                         heapScan->rs_nblocks -
+                                                                                         heapScan->rs_startblock
+                                                                                                ) % heapScan->rs_nblocks + 1);
+                               prev_cblock = heapScan->rs_cblock;
+                       }

That seems quite reasonable.

I noticed that with my proposed patch it is still possible to go to
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. As the
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.

Thanks for updating the patch!

Please let me check my understanding about this. I was thinking that even
when the last page contains only dead tuples, table_scan_getnextslot()
returns the last page (because SnapshotAny is used) and heap_blks_scanned
is incremented properly. And then, heapam_relation_copy_for_cluster()
handles all the dead tuples in that page. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#5Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Fujii Masao (#4)
Re: [patch] CLUSTER blocks scanned progress reporting

On Wed, 25 Nov 2020 at 10:35, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/25 0:25, Matthias van de Meent wrote:

I noticed that with my proposed patch it is still possible to go to
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. As the
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.

Thanks for updating the patch!

Please let me check my understanding about this. I was thinking that even
when the last page contains only dead tuples, table_scan_getnextslot()
returns the last page (because SnapshotAny is used) and heap_blks_scanned
is incremented properly. And then, heapam_relation_copy_for_cluster()
handles all the dead tuples in that page. No?

Yes, my description was incorrect.

The potential for a discrepancy exists for seemingly empty pages. I
thought that 'filled with only dead tuples' would be correct for
'seemingly empty', but SnapshotAny indeed returns all tuples on a
page. But pages can still be empty with SnapshotAny, through being
emptied by vacuum, so the last pages scanned can still be empty pages.
Then, there would be no tuple returned at the last pages of the scan,
and subsequently the CLUSTER/VACUUM FULL would start the next phase
without reporting on the last pages that were scanned and had no
tuples in them (such that heap_blks_scanned != heap_blks_total).

Vacuum truncates empty blocks from the end of the relation, and this
prevents empty blocks at the end of CLUSTER for the default case of
table scans starting at 0; but because the table scan might not start
at block 0, we can have an empty page at the end of the table scan due
to the last page of the scan not being the last page of the table.

Matthias van de Meent

#6Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Matthias van de Meent (#5)
Re: [patch] CLUSTER blocks scanned progress reporting

On 2020/11/25 20:52, Matthias van de Meent wrote:

On Wed, 25 Nov 2020 at 10:35, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/25 0:25, Matthias van de Meent wrote:

I noticed that with my proposed patch it is still possible to go to
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. As the
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.

Thanks for updating the patch!

Please let me check my understanding about this. I was thinking that even
when the last page contains only dead tuples, table_scan_getnextslot()
returns the last page (because SnapshotAny is used) and heap_blks_scanned
is incremented properly. And then, heapam_relation_copy_for_cluster()
handles all the dead tuples in that page. No?

Yes, my description was incorrect.

The potential for a discrepancy exists for seemingly empty pages. I
thought that 'filled with only dead tuples' would be correct for
'seemingly empty', but SnapshotAny indeed returns all tuples on a
page. But pages can still be empty with SnapshotAny, through being
emptied by vacuum, so the last pages scanned can still be empty pages.
Then, there would be no tuple returned at the last pages of the scan,
and subsequently the CLUSTER/VACUUM FULL would start the next phase
without reporting on the last pages that were scanned and had no
tuples in them (such that heap_blks_scanned != heap_blks_total).

Vacuum truncates empty blocks from the end of the relation, and this
prevents empty blocks at the end of CLUSTER for the default case of
table scans starting at 0; but because the table scan might not start
at block 0, we can have an empty page at the end of the table scan due
to the last page of the scan not being the last page of the table.

Thanks for explaining that! I understood that.
That situation can happen easily also by using VACUUM (TRUNCATE off) command.

+                                * A heap scan need not return tuples for the last page it has
+                                * scanned. To ensure that heap_blks_scanned is equivalent to
+                                * total_heap_blks after the table scan phase, this parameter
+                                * is manually updated to the correct value when the table scan
+                                * finishes.

So it's better to update this comment a bit? For example,

If the scanned last pages are empty, it's possible to go to the next phase
while heap_blks_scanned != heap_blks_total. To ensure that they are
equivalet after the table scan phase, this parameter is manually updated
to the correct value when the table scan finishes.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#7Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Fujii Masao (#6)
1 attachment(s)
Re: [patch] CLUSTER blocks scanned progress reporting

On Thu, 26 Nov 2020 at 00:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

+                                * A heap scan need not return tuples for the last page it has
+                                * scanned. To ensure that heap_blks_scanned is equivalent to
+                                * total_heap_blks after the table scan phase, this parameter
+                                * is manually updated to the correct value when the table scan
+                                * finishes.

So it's better to update this comment a bit? For example,

If the scanned last pages are empty, it's possible to go to the next phase
while heap_blks_scanned != heap_blks_total. To ensure that they are
equivalet after the table scan phase, this parameter is manually updated
to the correct value when the table scan finishes.

PFA a patch with updated message and comment. I've reworded the
messages to specifically mention empty pages and the need for setting
heap_blks_scanned = total_heap_blks explicitly.

Feel free to update the specifics, other than that I think this is a
complete fix for the issues at hand.

Regards,

Matthias van de Meent

Attachments:

v3-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patchDownload
From 66af670588b88b7fb4eb78aa6251609d10651e6e Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm@gmail.com>
Date: Fri, 20 Nov 2020 16:23:59 +0100
Subject: [PATCH v3] Fix CLUSTER progress reporting of number of blocks
 scanned.

The heapScan need not start at block 0, so heapScan->rs_cblock need not be the
correct value for amount of blocks scanned. A more correct value is
 ((heapScan->rs_cblock - heapScan->rs_startblock + heapScan->rs_nblocks) %
   heapScan->rs_nblocks), as it accounts for the wraparound and the initial
offset of the heapScan.

Additionally, if the last pages of the scan were empty, the heap scan does
not return tuples from the last scanned page. This means that when
table_scan_getnextslot returns false, indicating the end of the table scan, we
must manually update the heap_blks_scanned parameter to the number of blocks
in the heap scan to ensure that heap_blks_scanned == heap_blks_total when
starting the next phase.
---
 src/backend/access/heap/heapam_handler.c | 29 ++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..720d92cac9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -698,6 +698,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	Datum	   *values;
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
+	BlockNumber prev_cblock = InvalidBlockNumber;
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -793,14 +794,38 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		else
 		{
 			if (!table_scan_getnextslot(tableScan, ForwardScanDirection, slot))
+			{
+				/*
+				 * If the last pages of the scan were empty, we would go to
+				 * the next phase while heap_blks_scanned != heap_blks_total.
+				 * Instead, to ensure that heap_blks_scanned is equivalent to
+				 * total_heap_blks after the table scan phase, this parameter
+				 * is manually updated to the correct value when the table scan
+				 * finishes.
+				 */
+				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+											 heapScan->rs_nblocks);
 				break;
+			}
 
 			/*
 			 * In scan-and-sort mode and also VACUUM FULL, set heap blocks
 			 * scanned
+			 *
+			 * Note that heapScan may start at an offset and wrap around, i.e.
+			 * rs_startblock may be >0, and rs_cblock may end with a number
+			 * below rs_startblock. To prevent showing this wraparound to the
+			 * user, we offset rs_cblock by rs_startblock (modulo rs_nblocks).
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
-										 heapScan->rs_cblock + 1);
+			if (prev_cblock != heapScan->rs_cblock)
+			{
+				pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+											 (heapScan->rs_cblock +
+											  heapScan->rs_nblocks -
+											  heapScan->rs_startblock
+											 ) % heapScan->rs_nblocks + 1);
+				prev_cblock = heapScan->rs_cblock;
+			}
 		}
 
 		tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
-- 
2.20.1

#8Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Matthias van de Meent (#7)
Re: [patch] CLUSTER blocks scanned progress reporting

On 2020/11/27 1:45, Matthias van de Meent wrote:

On Thu, 26 Nov 2020 at 00:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

+                                * A heap scan need not return tuples for the last page it has
+                                * scanned. To ensure that heap_blks_scanned is equivalent to
+                                * total_heap_blks after the table scan phase, this parameter
+                                * is manually updated to the correct value when the table scan
+                                * finishes.

So it's better to update this comment a bit? For example,

If the scanned last pages are empty, it's possible to go to the next phase
while heap_blks_scanned != heap_blks_total. To ensure that they are
equivalet after the table scan phase, this parameter is manually updated
to the correct value when the table scan finishes.

PFA a patch with updated message and comment. I've reworded the
messages to specifically mention empty pages and the need for setting
heap_blks_scanned = total_heap_blks explicitly.

Thanks for updating the patch! It looks good to me.
Barring any objection, I will commit this patch (also back-patch to v12).

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#9Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#8)
Re: [patch] CLUSTER blocks scanned progress reporting

On 2020/11/27 1:51, Fujii Masao wrote:

On 2020/11/27 1:45, Matthias van de Meent wrote:

On Thu, 26 Nov 2020 at 00:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

+                                * A heap scan need not return tuples for the last page it has
+                                * scanned. To ensure that heap_blks_scanned is equivalent to
+                                * total_heap_blks after the table scan phase, this parameter
+                                * is manually updated to the correct value when the table scan
+                                * finishes.

So it's better to update this comment a bit? For example,

      If the scanned last pages are empty, it's possible to go to the next phase
      while heap_blks_scanned != heap_blks_total. To ensure that they are
      equivalet after the table scan phase, this parameter is manually updated
      to the correct value when the table scan finishes.

PFA a patch with updated message and comment. I've reworded the
messages to specifically mention empty pages and the need for setting
heap_blks_scanned = total_heap_blks explicitly.

Thanks for updating the patch! It looks good to me.
Barring any objection, I will commit this patch (also back-patch to v12).

Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION