Use pgBufferUsage for block reporting in analyze

Started by Anthonin Bonnefoy over 1 year ago, 17 messages
#1 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
1 attachment(s)

Hi,

The analyze logs within autovacuum use dedicated variables
VacuumPage{Hit,Miss,Dirty} to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more cases
(temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze, which makes VacuumPage{Hit,Miss,Dirty} unused and removable.
Both their increments in bufmgr and their declarations are removed.

Regards,
Anthonin

Attachments:

v1-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch (application/octet-stream)
From 07fd03cb95132ab0c8b50c59f7ea4df21d1f9ef1 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The analyze logs within autovacuum use dedicated variables
VacuumPage{Hit,Miss,Dirty} to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze, which makes VacuumPage{Hit,Miss,Dirty} completely unused
and removable. Both their increments in bufmgr and their declarations
are removed.
---
 src/backend/commands/analyze.c        | 24 +++++++++++-------------
 src/backend/commands/vacuum.c         |  3 ---
 src/backend/commands/vacuumparallel.c |  3 ---
 src/backend/storage/buffer/bufmgr.c   |  4 ----
 src/backend/utils/init/globals.c      |  4 ----
 src/include/miscadmin.h               |  4 ----
 6 files changed, 11 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7d2cd24997..cab5792525 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -737,9 +736,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 * happened as part of the analyze by subtracting out the
 			 * pre-analyze values which we saved above.
 			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -765,10 +763,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -792,9 +790,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
+							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
+							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 521ee74586..4d8c0e511d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..22c057fe61 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 49637284f9..996ee6fdc8 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1127,7 +1127,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1519,7 +1518,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2513,7 +2511,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5036,7 +5033,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index cc61937eef..ba7f45e8a2 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -151,10 +151,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90f9b21b25..6d07005d39 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -283,10 +283,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

#2 Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#1)
Re: Use pgBufferUsage for block reporting in analyze

On Fri, May 10, 2024 at 10:54:07AM +0200, Anthonin Bonnefoy wrote:

This patch replaces those Vacuum specific variables by pgBufferUsage
in analyze. This makes VacuumPage{Hit,Miss,Dirty} unused and removable.
This commit removes both their calls in bufmgr and their declarations.

Hmm, yeah, it looks like you're right. I can track all the blocks
read, hit and dirtied for VACUUM and ANALYZE in all the code paths
where these removed variables were incremented. This needs some
runtime check to make sure that the calculations are consistent before
and after the change (cannot do that now).

             appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-                             (long long) AnalyzePageHit,
-                             (long long) AnalyzePageMiss,
-                             (long long) AnalyzePageDirty);
+                             (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
+                             (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
+                             (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));

Perhaps this should say "read" rather than "miss" in the logs as the
two read variables for the shared and local blocks are used? For
consistency, at least.

That's not material for v17, only for v18.
--
Michael

#3 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#2)
2 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

Thanks for having a look.

On Fri, May 10, 2024 at 12:40 PM Michael Paquier <michael@paquier.xyz>
wrote:

This needs some runtime check to make sure that the calculations
are consistent before and after the fact (cannot do that now).

Yeah, testing this is also a bit painful as the buffer usage of analyze
is only displayed in the logs during autoanalyze. While looking at this,
I've thought of additional changes that could make testing easier and
improve consistency with VACUUM VERBOSE:
- Have ANALYZE VERBOSE output the buffer usage stats
- Add WAL usage to ANALYZE VERBOSE

The ANALYZE VERBOSE output would look like:
postgres=# analyze (verbose) pgbench_accounts ;
INFO: analyzing "public.pgbench_accounts"
INFO: "pgbench_accounts": scanned 1640 of 1640 pages, containing 100000
live rows and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
INFO: analyze of table "postgres.public.pgbench_accounts"
avg read rate: 124.120 MB/s, avg write rate: 0.110 MB/s
buffer usage: 533 hits, 1128 reads, 1 dirtied
WAL usage: 12 records, 1 full page images, 5729 bytes
system usage: CPU: user: 0.06 s, system: 0.00 s, elapsed: 0.07 s

Perhaps this should say "read" rather than "miss" in the logs as the
two read variables for the shared and local blocks are used? For
consistency, at least.

Sounds good.

That's not material for v17, only for v18.

Definitely.

I've split the patch into two parts:
1: Removal of the vacuum-specific variables; this is the same as the
initial patch.
2: Add buffer and WAL usage to the ANALYZE VERBOSE output, and rename
miss to reads.

Regards,
Anthonin

Attachments:

v2-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch (application/octet-stream)
From 012de0405b16869acf6510c51b58ef6649028af4 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The analyze logs within autovacuum use dedicated variables
VacuumPage{Hit,Miss,Dirty} to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze, which makes VacuumPage{Hit,Miss,Dirty} completely unused
and removable. Both their increments in bufmgr and their declarations
are removed.
---
 src/backend/commands/analyze.c        | 24 +++++++++++-------------
 src/backend/commands/vacuum.c         |  3 ---
 src/backend/commands/vacuumparallel.c |  3 ---
 src/backend/storage/buffer/bufmgr.c   |  4 ----
 src/backend/utils/init/globals.c      |  4 ----
 src/include/miscadmin.h               |  4 ----
 6 files changed, 11 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7d2cd24997..cab5792525 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -737,9 +736,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 * happened as part of the analyze by subtracting out the
 			 * pre-analyze values which we saved above.
 			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -765,10 +763,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -792,9 +790,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
+							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
+							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab202..7d8e9d2045 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..22c057fe61 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 49637284f9..996ee6fdc8 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1127,7 +1127,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1519,7 +1518,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2513,7 +2511,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5036,7 +5033,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index cc61937eef..ba7f45e8a2 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -151,10 +151,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90f9b21b25..6d07005d39 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -283,10 +283,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

v2-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch (application/octet-stream)
From 5635e88f7438371c22aef117648d29c0472a0e89 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Mon, 27 May 2024 09:51:12 +0200
Subject: Output buffer and wal usage with verbose analyze

Currently, the buffer usage generated by analyze is only logged during
autoanalyze. To make the behaviour more consistent with VACUUM VERBOSE,
this patch outputs the buffer and WAL usage of analyze on ANALYZE
VERBOSE.

Buffer misses are renamed to buffer reads to be consistent with
pgBufferUsage naming in both vacuum and analyze verbose output.

The patch also changes the delta time computation in VACUUM to use the
millisecond difference, both to simplify it and to make it more
consistent with what's done in ANALYZE.
---
 src/backend/access/heap/vacuumlazy.c | 13 ++++++-------
 src/backend/commands/analyze.c       | 28 +++++++++++++++++++++-------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8145ea8fc3..66e86c1d08 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -599,8 +599,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
-			long		secs_dur;
-			int			usecs_dur;
+			long		delay_in_ms;
 			WalUsage	walusage;
 			BufferUsage bufferusage;
 			StringInfoData buf;
@@ -609,7 +608,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			double		read_rate = 0,
 						write_rate = 0;
 
-			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
+			delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);
 			memset(&walusage, 0, sizeof(WalUsage));
 			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 			memset(&bufferusage, 0, sizeof(BufferUsage));
@@ -738,17 +737,17 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 				appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
 								 read_ms, write_ms);
 			}
-			if (secs_dur > 0 || usecs_dur > 0)
+			if (delay_in_ms > 0)
 			{
 				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
 							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
 							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
 							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index cab5792525..14d30af644 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -303,11 +305,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -340,7 +346,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	RestrictSearchPath();
 
 	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -718,15 +724,16 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
@@ -738,6 +745,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -775,7 +784,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+			appendStringInfo(&buf, _("analyze of table \"%s.%s.%s\"\n"),
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -789,13 +798,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
 							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
 							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
 							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3 (Apple Git-146)

#4 Karina Litskevich
litskevichkarina@gmail.com
In reply to: Anthonin Bonnefoy (#3)
Re: Use pgBufferUsage for block reporting in analyze

Hi Anthonin,

I suggest assigning the values
bufferusage.shared_blks_read + bufferusage.local_blks_read
and
bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied
to new variables and using them. This would keep the changed lines
within the 80-character limit and make the code more readable overall.

I also believe that changing "misses" to "reads" should belong to the
0001 patch, since we only change it because we replace AnalyzePageMiss
with bufferusage.shared_blks_read + bufferusage.local_blks_read in
0001.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/

#5 Karina Litskevich
litskevichkarina@gmail.com
In reply to: Karina Litskevich (#4)
Re: Use pgBufferUsage for block reporting in analyze

I wrote:

I suggest assigning values
bufferusage.shared_blks_read + bufferusage.local_blks_read
and
bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied
to new variables and using them. This would keep the changed lines within
the 80 symbols limit, and make the code more readable overall.

The same applies to
bufferusage.shared_blks_hit + bufferusage.local_blks_hit

#6 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Karina Litskevich (#5)
3 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

Hi,

Thanks for the review. I've updated the patches with the suggestions:
- moved the renaming of misses to reads to the first patch
- added intermediate variables for total block usage

I've also done some additional tests using the provided
vacuum_analyze_buffer_usage.sql script. It relies on
pg_stat_statements to check the results (only pgss gives information
on dirtied buffers). It gives the following output:

psql:vacuum_analyze_buffer_usage.sql:21: INFO: vacuuming
"postgres.pg_temp_7.vacuum_blks_stat_test"
...
buffer usage: 105 hits, 3 reads, 6 dirtied
...
query | sum_hit | sum_read | sum_dirtied
--------------------+---------+----------+-------------
VACUUM (VERBOSE... | 105 | 3 | 6

For vacuum, we have the same results with SKIP_DATABASE_STATS. Without
this setting, we would have block usage generated by
vac_update_datfrozenxid outside of vacuum_rel and therefore not
tracked by the verbose output. For the second test, the second patch
is needed to have ANALYZE (VERBOSE) output the block usage. It will
output the following:

psql:vacuum_analyze_buffer_usage.sql:29: INFO: analyzing
"pg_temp_7.vacuum_blks_stat_test"
...
buffer usage: 84 hits, 33 reads, 2 dirtied
...
query | sum_hit | sum_read | sum_dirtied
---------------------+---------+----------+-------------
ANALYZE (VERBOSE... | 91 | 38 | 2

There are additional buffer hits/reads reported by pgss; those come
from analyze_rel opening the relations in try_relation_open and are
not tracked by ANALYZE VERBOSE.

Attachments:

v3-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch (application/octet-stream)
From 3d038a7d87b1aa3090ac56b4bd84ce130b6569ea Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze

The analyze logs within autovacuum use dedicated variables
VacuumPage{Hit,Miss,Dirty} to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze, making VacuumPage{Hit,Miss,Dirty} completely unused and
removable. Both their increments in bufmgr and their declarations are
removed. Buffer misses are also renamed to buffer reads for better
consistency.
---
 src/backend/access/heap/vacuumlazy.c  | 22 +++++++++++-----
 src/backend/commands/analyze.c        | 36 ++++++++++++++++-----------
 src/backend/commands/vacuum.c         |  3 ---
 src/backend/commands/vacuumparallel.c |  3 ---
 src/backend/storage/buffer/bufmgr.c   |  4 ---
 src/backend/utils/init/globals.c      |  4 ---
 src/include/miscadmin.h               |  4 ---
 7 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f88cf1e8e..cd5d5fc319 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -608,6 +608,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int32		diff;
 			double		read_rate = 0,
 						write_rate = 0;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
 			memset(&walusage, 0, sizeof(WalUsage));
@@ -615,6 +618,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
+
 			initStringInfo(&buf);
 			if (verbose)
 			{
@@ -740,18 +750,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			if (secs_dur > 0 || usecs_dur > 0)
 			{
-				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+				read_rate = (double) BLCKSZ * total_blks_read /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
-				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
-							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
-							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf,
 							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
 							 (long long) walusage.wal_records,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7d2cd24997..258627d3db 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -731,15 +730,24 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			/*
 			 * Calculate the difference in the Page Hit/Miss/Dirty that
 			 * happened as part of the analyze by subtracting out the
 			 * pre-analyze values which we saved above.
 			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -765,10 +773,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * total_blks_read /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -791,10 +799,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab202..7d8e9d2045 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..22c057fe61 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6181673095..be70d6ad5b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1173,7 +1173,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1565,7 +1564,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2559,7 +2557,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5082,7 +5079,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 927bccfbea..d4d51c5173 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -152,10 +152,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90f9b21b25..6d07005d39 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -283,10 +283,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

v3-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
From cea4a55edc18d7ea1f6b89b87ad7d5dfd056a7bd Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Mon, 8 Jul 2024 08:30:31 +0200
Subject: Output buffer and wal usage with verbose analyze

Currently, buffer usage generated by analyze is only logged during
autoanalyze. To make behaviour more consistent with VACUUM VERBOSE, this
patch outputs the buffer and WAL usage of analyze on ANALYZE VERBOSE.

The patch also changes delta time computation in VACUUM to use ms
difference, to both simplify and make it more consistent with what's
done in ANALYZE.
---
 src/backend/access/heap/vacuumlazy.c | 11 +++++------
 src/backend/commands/analyze.c       | 26 ++++++++++++++++++++------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cd5d5fc319..af3f603b36 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -599,8 +599,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
-			long		secs_dur;
-			int			usecs_dur;
+			long		delay_in_ms;
 			WalUsage	walusage;
 			BufferUsage bufferusage;
 			StringInfoData buf;
@@ -612,7 +611,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int64		total_blks_read;
 			int64		total_blks_dirtied;
 
-			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
+			delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);
 			memset(&walusage, 0, sizeof(WalUsage));
 			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 			memset(&bufferusage, 0, sizeof(BufferUsage));
@@ -748,12 +747,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 				appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
 								 read_ms, write_ms);
 			}
-			if (secs_dur > 0 || usecs_dur > 0)
+			if (delay_in_ms > 0)
 			{
 				read_rate = (double) BLCKSZ * total_blks_read /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 				write_rate = (double) BLCKSZ * total_blks_dirtied /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 258627d3db..c4575365b8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -303,11 +305,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -340,7 +346,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	RestrictSearchPath();
 
 	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -718,15 +724,16 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
@@ -741,6 +748,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			total_blks_hit = bufferusage.shared_blks_hit +
 				bufferusage.local_blks_hit;
@@ -785,7 +794,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+			appendStringInfo(&buf, _("analyze of table \"%s.%s.%s\"\n"),
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -803,9 +812,14 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_hit,
 							 (long long) total_blks_read,
 							 (long long) total_blks_dirtied);
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3 (Apple Git-146)

vacuum_analyze_buffer_usage.sql
#7 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Anthonin Bonnefoy (#6)
Re: Use pgBufferUsage for block reporting in analyze

On Mon, Jul 8, 2024 at 2:35 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

> Hi,
>
> Thanks for the review, I've updated the patches with the suggestions:
> - moved renaming of misses to reads to the first patch
> - added intermediate variables for total blks usage

Thank you for working on this. 0001 patch looks good to me.

I like the 0002 patch idea. But with this patch, ANALYZE VERBOSE
writes something like this:

INFO: analyzing "public.test"
INFO: "test": scanned 443 of 443 pages, containing 100000 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
INFO: analyze of table "postgres.public.test"
avg read rate: 38.446 MB/s, avg write rate: 0.000 MB/s
buffer usage: 265 hits, 187 reads, 0 dirtied
WAL usage: 4 records, 0 full page images, 637 bytes
system usage: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.03 s

This seems inconsistent in some respects with what we do in VACUUM
VERBOSE. For example, in VACUUM VERBOSE outputs, we write the
statistics of pages, tuples, buffer usage, and WAL usage in one INFO
message:

INFO: vacuuming "postgres.public.test"
INFO: finished vacuuming "postgres.public.test": index scans: 0
pages: 0 removed, 443 remain, 1 scanned (0.23% of total)
tuples: 0 removed, 100000 remain, 0 are dead but not yet removable
removable cutoff: 754, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 23.438 MB/s, avg write rate: 0.000 MB/s
buffer usage: 5 hits, 3 reads, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

I'd suggest writing analyze verbose messages as something like:

INFO: finished analyzing "postgres.public.test"
pages: 443 of 443 scanned
tuples: 100000 live tuples, 0 are dead; 30000 tuples in sample, 100000
estimated total tuples
avg read rate: 38.446 MB/s, avg write rate: 0.000 MB/s
buffer usage: 265 hits, 187 reads, 0 dirtied
WAL usage: 4 records, 0 full page images, 637 bytes
system usage: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.03 s

The first line would vary depending on whether an autovacuum worker or
not. And the above suggestion includes a change of term "row" to
"tuple" for better consistency with VACUUM VERBOSE outputs. I think it
would be great if autoanalyze also writes logs in the same format.
IIUC with the patch, autoanalyze logs don't include the page and tuple
statistics.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#8 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Masahiko Sawada (#7)
3 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

On Mon, Jul 22, 2024 at 10:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> The first line would vary depending on whether an autovacuum worker or
> not. And the above suggestion includes a change of term "row" to
> "tuple" for better consistency with VACUUM VERBOSE outputs. I think it
> would be great if autoanalyze also writes logs in the same format.
> IIUC with the patch, autoanalyze logs don't include the page and tuple
> statistics.

One issue is that the number of scanned pages, live tuples and dead
tuples is only available in acquire_sample_rows, which is where the log
containing those stats is emitted. I've tried to implement the
following in 0003:
- Sampling functions now accept an AcquireSampleStats struct to store
pages and tuples stats
- Log is removed from sampling functions
- do_analyze_rel now outputs scanned pages and tuple statistics when
relevant. Sampling from an FDW doesn't provide those statistics, so
they are not displayed in that case.

This ensures that analyze logs are only emitted in do_analyze_rel,
making it possible to display the same output for both autoanalyze and
ANALYZE VERBOSE. With those changes, we have the following output for
analyze verbose of a table:
verbose of a table:

analyze (verbose) pgbench_accounts ;
INFO: analyzing "public.pgbench_accounts"
INFO: analyze of table "postgres.public.pgbench_accounts"
pages: 1640 of 1640 scanned
tuples: 100000 live tuples, 0 are dead; 30000 tuples in samples,
100000 estimated total tuples
avg read rate: 174.395 MB/s, avg write rate: 0.000 MB/s
buffer usage: 285 hits, 1384 reads, 0 dirtied
WAL usage: 14 records, 0 full page images, 1343 bytes
system usage: CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s

For a file_fdw, the output will look like:

analyze (verbose) pglog;
INFO: analyzing "public.pglog"
INFO: analyze of table "postgres.public.pglog"
tuples: 30000 tuples in samples, 60042 estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 182 hits, 0 reads, 0 dirtied
WAL usage: 118 records, 0 full page images, 13086 bytes
system usage: CPU: user: 0.40 s, system: 0.00 s, elapsed: 0.41 s

I've also slightly modified 0002 to display "automatic analyze" when
we're inside an autovacuum worker, similar to what's done with vacuum
output.

Regards,
Anthonin

Attachments:

v3-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
From d0f3c8dd28859bc4a48f0c5bc9cf81ecb7f93065 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Mon, 8 Jul 2024 08:30:31 +0200
Subject: Output buffer and wal usage with verbose analyze

Currently, buffer usage generated by analyze is only logged during
autoanalyze. To make behaviour more consistent with VACUUM VERBOSE, this
patch outputs the buffer and WAL usage of analyze on ANALYZE VERBOSE.

The patch also changes delta time computation in VACUUM to use ms
difference, to both simplify and make it more consistent with what's
done in ANALYZE.
---
 src/backend/access/heap/vacuumlazy.c | 11 +++++-----
 src/backend/commands/analyze.c       | 33 +++++++++++++++++++++++-----
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d82aa3d489..32cce88e26 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -599,8 +599,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
-			long		secs_dur;
-			int			usecs_dur;
+			long		delay_in_ms;
 			WalUsage	walusage;
 			BufferUsage bufferusage;
 			StringInfoData buf;
@@ -612,7 +611,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int64		total_blks_read;
 			int64		total_blks_dirtied;
 
-			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
+			delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);
 			memset(&walusage, 0, sizeof(WalUsage));
 			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 			memset(&bufferusage, 0, sizeof(BufferUsage));
@@ -748,12 +747,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 				appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
 								 read_ms, write_ms);
 			}
-			if (secs_dur > 0 || usecs_dur > 0)
+			if (delay_in_ms > 0)
 			{
 				read_rate = (double) BLCKSZ * total_blks_read /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 				write_rate = (double) BLCKSZ * total_blks_dirtied /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d4377186d7..d492772aac 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -303,11 +305,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -340,7 +346,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	RestrictSearchPath();
 
 	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -723,17 +729,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
+			char	   *msgfmt;
 			StringInfoData buf;
 			int64		total_blks_hit;
 			int64		total_blks_read;
@@ -746,6 +754,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			total_blks_hit = bufferusage.shared_blks_hit +
 				bufferusage.local_blks_hit;
@@ -790,7 +800,13 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+
+			if (AmAutoVacuumWorkerProcess())
+				msgfmt = _("automatic analyze of table \"%s.%s.%s\"\n");
+			else
+				msgfmt = _("analyze of table \"%s.%s.%s\"\n");
+
+			appendStringInfo(&buf, msgfmt,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -808,9 +824,14 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_hit,
 							 (long long) total_blks_read,
 							 (long long) total_blks_dirtied);
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3 (Apple Git-146)

v3-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
From b5be250dc3628f0dde0554300562912a62accbbb Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze

Analyze logs within autovacuum use the specific variables
VacuumPage{Hit,Miss,Dirty} to track the buffer usage count. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze. This makes VacuumPage{Hit,Miss,Dirty} completely unused and
removable. This commit removes both their updates in bufmgr and their
declarations. It also renames buffer misses to buffer reads for better
consistency.
---
 src/backend/access/heap/vacuumlazy.c  | 22 +++++++++++-----
 src/backend/commands/analyze.c        | 36 ++++++++++++++++-----------
 src/backend/commands/vacuum.c         |  3 ---
 src/backend/commands/vacuumparallel.c |  3 ---
 src/backend/storage/buffer/bufmgr.c   |  4 ---
 src/backend/utils/init/globals.c      |  4 ---
 src/include/miscadmin.h               |  4 ---
 7 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 835b53415d..d82aa3d489 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -608,6 +608,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int32		diff;
 			double		read_rate = 0,
 						write_rate = 0;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
 			memset(&walusage, 0, sizeof(WalUsage));
@@ -615,6 +618,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
+
 			initStringInfo(&buf);
 			if (verbose)
 			{
@@ -740,18 +750,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			if (secs_dur > 0 || usecs_dur > 0)
 			{
-				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+				read_rate = (double) BLCKSZ * total_blks_read /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
-				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
-							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
-							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf,
 							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
 							 (long long) walusage.wal_records,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c590a2adc3..d4377186d7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -736,15 +735,24 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			/*
 			 * Calculate the difference in the Page Hit/Miss/Dirty that
 			 * happened as part of the analyze by subtracting out the
 			 * pre-analyze values which we saved above.
 			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -770,10 +778,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * total_blks_read /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -796,10 +804,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab202..7d8e9d2045 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..22c057fe61 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4415ba648a..fb38c7bdd4 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1192,7 +1192,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1588,7 +1587,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2582,7 +2580,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5122,7 +5119,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 927bccfbea..d4d51c5173 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -152,10 +152,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90f9b21b25..6d07005d39 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -283,10 +283,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

v3-0003-Output-pages-and-tuples-stats-in-ANALYZE-VERBOSE.patchapplication/octet-stream; name=v3-0003-Output-pages-and-tuples-stats-in-ANALYZE-VERBOSE.patchDownload
From 25211c6b6c4eebdcdea92a956b8cfbe928fa94f8 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Jul 2024 10:15:35 +0200
Subject: Output pages and tuples stats in ANALYZE VERBOSE

Pages and tuples stats were only displayed as a log within the
acquire_sample functions, and ANALYZE VERBOSE couldn't display them as
they were not available; only the total live and dead tuples were
exported.

To make the behaviour more consistent with VACUUM VERBOSE and between
autoanalyze logs and ANALYZE VERBOSE:
- relevant sampling stats are gathered in an AcquireSampleStats struct
- the log is removed from the acquire sample functions.
- do_analyze_rel now outputs scanned pages and tuples statistics when
  relevant. Sampling from an FDW doesn't provide those statistics, so
  they are not displayed in those cases.

The sampling functions of both file_fdw and postgres_fdw were updated to reflect
those changes.
---
 contrib/file_fdw/file_fdw.c         |  36 +++------
 contrib/postgres_fdw/postgres_fdw.c |  29 ++-----
 src/backend/commands/analyze.c      | 113 ++++++++++++++--------------
 src/include/foreign/fdwapi.h        |  18 ++++-
 4 files changed, 89 insertions(+), 107 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..2a4b0d8a3b 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -160,9 +160,8 @@ static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
 						   FileFdwPlanState *fdw_private,
 						   Cost *startup_cost, Cost *total_cost);
-static int	file_acquire_sample_rows(Relation onerel, int elevel,
-									 HeapTuple *rows, int targrows,
-									 double *totalrows, double *totaldeadrows);
+static int	file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+									 int targrows, AcquireSampleStats * sstats);
 
 
 /*
@@ -1112,7 +1111,7 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * which must have at least targrows entries.
  * The actual number of rows selected is returned as the function result.
  * We also count the total number of rows in the file and return it into
- * *totalrows.  Note that *totaldeadrows is always set to 0.
+ * sstats->total_live_tuples.
  *
  * Note that the returned list of rows is not always in order by physical
  * position in the file.  Therefore, correlation estimates derived later
@@ -1120,11 +1119,9 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-file_acquire_sample_rows(Relation onerel, int elevel,
-						 HeapTuple *rows, int targrows,
-						 double *totalrows, double *totaldeadrows)
+file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+						 int targrows, AcquireSampleStats * sstats)
 {
-	int			numrows = 0;
 	double		rowstoskip = -1;	/* -1 means not set yet */
 	ReservoirStateData rstate;
 	TupleDesc	tupDesc;
@@ -1141,6 +1138,8 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 
 	Assert(onerel);
 	Assert(targrows > 0);
+	Assert(sstats->sampled_tuples == 0);
+	Assert(sstats->total_live_tuples == 0);
 
 	tupDesc = RelationGetDescr(onerel);
 	values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
@@ -1172,8 +1171,6 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
-	*totalrows = 0;
-	*totaldeadrows = 0;
 	for (;;)
 	{
 		/* Check for user-requested abort or sleep */
@@ -1196,9 +1193,9 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 		 * reach the end of the relation. This algorithm is from Jeff Vitter's
 		 * paper (see more info in commands/analyze.c).
 		 */
-		if (numrows < targrows)
+		if (sstats->sampled_tuples < targrows)
 		{
-			rows[numrows++] = heap_form_tuple(tupDesc, values, nulls);
+			rows[sstats->sampled_tuples++] = heap_form_tuple(tupDesc, values, nulls);
 		}
 		else
 		{
@@ -1208,7 +1205,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 			 * not-yet-incremented value of totalrows as t.
 			 */
 			if (rowstoskip < 0)
-				rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
+				rowstoskip = reservoir_get_next_S(&rstate, sstats->total_live_tuples, targrows);
 
 			if (rowstoskip <= 0)
 			{
@@ -1226,7 +1223,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 			rowstoskip -= 1;
 		}
 
-		*totalrows += 1;
+		sstats->total_live_tuples += 1;
 	}
 
 	/* Remove error callback. */
@@ -1240,14 +1237,5 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	pfree(values);
 	pfree(nulls);
 
-	/*
-	 * Emit some interesting relation info
-	 */
-	ereport(elevel,
-			(errmsg("\"%s\": file contains %.0f rows; "
-					"%d rows in sample",
-					RelationGetRelationName(onerel),
-					*totalrows, numrows)));
-
-	return numrows;
+	return sstats->sampled_tuples;
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e21..eddfef21e3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,10 +498,8 @@ static void process_query_params(ExprContext *econtext,
 								 FmgrInfo *param_flinfo,
 								 List *param_exprs,
 								 const char **param_values);
-static int	postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows,
-										  double *totaldeadrows);
+static int	postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+										  int targrows, AcquireSampleStats * sstats);
 static void analyze_row_processor(PGresult *res, int row,
 								  PgFdwAnalyzeState *astate);
 static void produce_tuple_asynchronously(AsyncRequest *areq, bool fetch);
@@ -5056,7 +5054,7 @@ postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
  * which must have at least targrows entries.
  * The actual number of rows selected is returned as the function result.
  * We also count the total number of rows in the table and return it into
- * *totalrows.  Note that *totaldeadrows is always set to 0.
+ * sstats->total_live_tuples.
  *
  * Note that the returned list of rows is not always in order by physical
  * position in the table.  Therefore, correlation estimates derived later
@@ -5064,10 +5062,8 @@ postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows,
-							  double *totaldeadrows)
+postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+							  int targrows, AcquireSampleStats * sstats)
 {
 	PgFdwAnalyzeState astate;
 	ForeignTable *table;
@@ -5345,26 +5341,17 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 
 	ReleaseConnection(conn);
 
-	/* We assume that we have no dead tuple. */
-	*totaldeadrows = 0.0;
-
 	/*
 	 * Without sampling, we've retrieved all living tuples from foreign
 	 * server, so report that as totalrows.  Otherwise use the reltuples
 	 * estimate we got from the remote side.
 	 */
 	if (method == ANALYZE_SAMPLE_OFF)
-		*totalrows = astate.samplerows;
+		sstats->total_live_tuples = astate.samplerows;
 	else
-		*totalrows = reltuples;
+		sstats->total_live_tuples = reltuples;
 
-	/*
-	 * Emit some interesting relation info
-	 */
-	ereport(elevel,
-			(errmsg("\"%s\": table contains %.0f rows, %d rows in sample",
-					RelationGetRelationName(relation),
-					*totalrows, astate.numrows)));
+	sstats->sampled_tuples = astate.numrows;
 
 	return astate.numrows;
 }
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d492772aac..7469226313 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -87,13 +87,12 @@ static void compute_index_stats(Relation onerel, double totalrows,
 								MemoryContext col_context);
 static VacAttrStats *examine_attribute(Relation onerel, int attnum,
 									   Node *index_expr);
-static int	acquire_sample_rows(Relation onerel, int elevel,
-								HeapTuple *rows, int targrows,
-								double *totalrows, double *totaldeadrows);
+static int	acquire_sample_rows(Relation onerel, HeapTuple *rows,
+								int targrows, AcquireSampleStats * sstats);
 static int	compare_rows(const void *a, const void *b, void *arg);
 static int	acquire_inherited_sample_rows(Relation onerel, int elevel,
 										  HeapTuple *rows, int targrows,
-										  double *totalrows, double *totaldeadrows);
+										  AcquireSampleStats * sstats);
 static void update_attstats(Oid relid, bool inh,
 							int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
@@ -296,8 +295,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	int			targrows,
 				numrows,
 				minrows;
-	double		totalrows,
-				totaldeadrows;
+	AcquireSampleStats sstats;
 	HeapTuple  *rows;
 	PGRUsage	ru0;
 	TimestampTz starttime = 0;
@@ -524,14 +522,16 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 								 inh ? PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS_INH :
 								 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
+
+	memset(&sstats, 0, sizeof(AcquireSampleStats));
+
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
 												rows, targrows,
-												&totalrows, &totaldeadrows);
+												&sstats);
 	else
-		numrows = (*acquirefunc) (onerel, elevel,
-								  rows, targrows,
-								  &totalrows, &totaldeadrows);
+		numrows = (*acquirefunc) (onerel, rows,
+								  targrows, &sstats);
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -562,7 +562,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			stats->compute_stats(stats,
 								 std_fetch_func,
 								 numrows,
-								 totalrows);
+								 sstats.total_live_tuples);
 
 			/*
 			 * If the appropriate flavor of the n_distinct option is
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		}
 
 		if (nindexes > 0)
-			compute_index_stats(onerel, totalrows,
+			compute_index_stats(onerel, sstats.total_live_tuples,
 								indexdata, nindexes,
 								rows, numrows,
 								col_context);
@@ -607,7 +607,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		}
 
 		/* Build extended statistics (if there are any). */
-		BuildRelationExtStatistics(onerel, inh, totalrows, numrows, rows,
+		BuildRelationExtStatistics(onerel, inh, sstats.total_live_tuples, numrows, rows,
 								   attr_cnt, vacattrstats);
 	}
 
@@ -641,7 +641,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		CommandCounterIncrement();
 		vac_update_relstats(onerel,
 							relpages,
-							totalrows,
+							sstats.total_live_tuples,
 							relallvisible,
 							hasindex,
 							InvalidTransactionId,
@@ -655,7 +655,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			AnlIndexData *thisdata = &indexdata[ind];
 			double		totalindexrows;
 
-			totalindexrows = ceil(thisdata->tupleFract * totalrows);
+			totalindexrows = ceil(thisdata->tupleFract * sstats.total_live_tuples);
 			vac_update_relstats(Irel[ind],
 								RelationGetNumberOfBlocks(Irel[ind]),
 								totalindexrows,
@@ -674,7 +674,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * in their pg_class entries except for reltuples and relhasindex.
 		 */
 		CommandCounterIncrement();
-		vac_update_relstats(onerel, -1, totalrows,
+		vac_update_relstats(onerel, -1, sstats.total_live_tuples,
 							0, hasindex, InvalidTransactionId,
 							InvalidMultiXactId,
 							NULL, NULL,
@@ -691,8 +691,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	 * columns; otherwise, there is still work for auto-analyze to do.
 	 */
 	if (!inh)
-		pgstat_report_analyze(onerel, totalrows, totaldeadrows,
-							  (va_cols == NIL));
+		pgstat_report_analyze(onerel, sstats.total_live_tuples,
+							  sstats.total_dead_tuples, (va_cols == NIL));
 	else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		pgstat_report_analyze(onerel, 0, 0, (va_cols == NIL));
 
@@ -810,6 +810,26 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
+
+			/*
+			 * Sampling from file_fdw and postgres_fdw won't report pages
+			 * stats nor live_tuples and dead_tuples. Just display tuples
+			 * sampled and total tuples estimate in this case.
+			 */
+			if (sstats.scanned_pages == 0)
+			{
+				appendStringInfo(&buf, _("tuples: %d tuples in samples, %.0f estimated total tuples\n"),
+								 sstats.sampled_tuples, sstats.total_live_tuples);
+			}
+			else
+			{
+				appendStringInfo(&buf, _("pages: %u of %u scanned\n"),
+								 sstats.scanned_pages, sstats.rel_pages);
+				appendStringInfo(&buf, _("tuples: %.0f live tuples, %.0f are dead; %d tuples in samples, %.0f estimated total tuples\n"),
+								 sstats.live_tuples, sstats.dead_tuples,
+								 sstats.sampled_tuples, sstats.total_live_tuples);
+			}
+
 			if (track_io_timing)
 			{
 				double		read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
@@ -1184,17 +1204,13 @@ block_sampling_read_stream_next(ReadStream *stream,
  * density near the start of the table.
  */
 static int
-acquire_sample_rows(Relation onerel, int elevel,
-					HeapTuple *rows, int targrows,
-					double *totalrows, double *totaldeadrows)
+acquire_sample_rows(Relation onerel, HeapTuple *rows,
+					int targrows, AcquireSampleStats * sstats)
 {
 	int			numrows = 0;	/* # rows now in reservoir */
 	double		samplerows = 0; /* total # rows collected */
-	double		liverows = 0;	/* # live rows seen */
-	double		deadrows = 0;	/* # dead rows seen */
 	double		rowstoskip = -1;	/* -1 means not set yet */
 	uint32		randseed;		/* Seed for block sampler(s) */
-	BlockNumber totalblocks;
 	TransactionId OldestXmin;
 	BlockSamplerData bs;
 	ReservoirStateData rstate;
@@ -1206,14 +1222,14 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 	Assert(targrows > 0);
 
-	totalblocks = RelationGetNumberOfBlocks(onerel);
+	sstats->rel_pages = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
 	OldestXmin = GetOldestNonRemovableTransactionId(onerel);
 
 	/* Prepare for sampling block numbers */
 	randseed = pg_prng_uint32(&pg_global_prng_state);
-	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
+	nblocks = BlockSampler_Init(&bs, sstats->rel_pages, targrows, randseed);
 
 	/* Report sampling block numbers */
 	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
@@ -1238,7 +1254,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	{
 		vacuum_delay_point();
 
-		while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+		while (table_scan_analyze_next_tuple(scan, OldestXmin, &sstats->live_tuples, &sstats->dead_tuples, slot))
 		{
 			/*
 			 * The first targrows sample rows are simply copied into the
@@ -1313,26 +1329,11 @@ acquire_sample_rows(Relation onerel, int elevel,
 	 */
 	if (bs.m > 0)
 	{
-		*totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
-		*totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
+		sstats->total_live_tuples = floor((sstats->live_tuples / bs.m) * sstats->rel_pages + 0.5);
+		sstats->total_dead_tuples = floor((sstats->dead_tuples / bs.m) * sstats->rel_pages + 0.5);
 	}
-	else
-	{
-		*totalrows = 0.0;
-		*totaldeadrows = 0.0;
-	}
-
-	/*
-	 * Emit some interesting relation info
-	 */
-	ereport(elevel,
-			(errmsg("\"%s\": scanned %d of %u pages, "
-					"containing %.0f live rows and %.0f dead rows; "
-					"%d rows in sample, %.0f estimated total rows",
-					RelationGetRelationName(onerel),
-					bs.m, totalblocks,
-					liverows, deadrows,
-					numrows, *totalrows)));
+	sstats->scanned_pages = bs.m;
+	sstats->sampled_tuples = numrows;
 
 	return numrows;
 }
@@ -1373,7 +1374,7 @@ compare_rows(const void *a, const void *b, void *arg)
 static int
 acquire_inherited_sample_rows(Relation onerel, int elevel,
 							  HeapTuple *rows, int targrows,
-							  double *totalrows, double *totaldeadrows)
+							  AcquireSampleStats * sstats)
 {
 	List	   *tableOIDs;
 	Relation   *rels;
@@ -1386,10 +1387,6 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	ListCell   *lc;
 	bool		has_child;
 
-	/* Initialize output parameters to zero now, in case we exit early */
-	*totalrows = 0;
-	*totaldeadrows = 0;
-
 	/*
 	 * Find all members of inheritance set.  We only need AccessShareLock on
 	 * the children.
@@ -1559,13 +1556,13 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 			if (childtargrows > 0)
 			{
 				int			childrows;
-				double		trows,
-							tdrows;
+				AcquireSampleStats childsstats;
+
+				memset(&childsstats, 0, sizeof(AcquireSampleStats));
 
 				/* Fetch a random sample of the child's rows */
-				childrows = (*acquirefunc) (childrel, elevel,
-											rows + numrows, childtargrows,
-											&trows, &tdrows);
+				childrows = (*acquirefunc) (childrel, rows + numrows,
+											childtargrows, &childsstats);
 
 				/* We may need to convert from child's rowtype to parent's */
 				if (childrows > 0 &&
@@ -1594,8 +1591,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 
 				/* And add to counts */
 				numrows += childrows;
-				*totalrows += trows;
-				*totaldeadrows += tdrows;
+				sstats->total_live_tuples += childsstats.total_live_tuples;
+				sstats->total_dead_tuples += childsstats.total_dead_tuples;
 			}
 		}
 
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index fcde3876b2..b27571f92a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -19,6 +19,18 @@
 /* To avoid including explain.h here, reference ExplainState thus: */
 struct ExplainState;
 
+typedef struct AcquireSampleStats
+{
+	BlockNumber rel_pages;		/* total number of pages */
+	BlockNumber scanned_pages;	/* # pages examined */
+	double		live_tuples;	/* # live tuples found in scanned pages */
+	double		dead_tuples;	/* # dead tuples found in scanned pages */
+
+	int			sampled_tuples; /* # sampled tuples */
+	double		total_live_tuples;	/* # estimated live tuples */
+	double		total_dead_tuples;	/* # estimated dead tuples */
+}			AcquireSampleStats;
+
 
 /*
  * Callback function signatures --- see fdwhandler.sgml for more info.
@@ -148,10 +160,8 @@ typedef void (*ExplainForeignModify_function) (ModifyTableState *mtstate,
 typedef void (*ExplainDirectModify_function) (ForeignScanState *node,
 											  struct ExplainState *es);
 
-typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
-									  HeapTuple *rows, int targrows,
-									  double *totalrows,
-									  double *totaldeadrows);
+typedef int (*AcquireSampleRowsFunc) (Relation relation, HeapTuple *rows,
+									  int targrows, AcquireSampleStats * sstats);
 
 typedef bool (*AnalyzeForeignTable_function) (Relation relation,
 											  AcquireSampleRowsFunc *func,
-- 
2.39.3 (Apple Git-146)

#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Anthonin Bonnefoy (#8)
Re: Use pgBufferUsage for block reporting in analyze

On Wed, Jul 24, 2024 at 1:58 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Mon, Jul 22, 2024 at 10:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The first line would vary depending on whether it's run by an
autovacuum worker or not. And the above suggestion includes a change of
the term "row" to "tuple" for better consistency with VACUUM VERBOSE
outputs. I think it would be great if autoanalyze also wrote logs in
the same format. IIUC with the patch, autoanalyze logs don't include
the page and tuple statistics.

One issue is that the number of scanned pages, live tuples and dead
tuples is only available in acquire_sample_rows, which is where the log
containing those stats is emitted. I've tried to implement the
following in 0003:
- Sampling functions now accept an AcquireSampleStats struct to store
pages and tuples stats
- Log is removed from sampling functions
- do_analyze_rel now outputs scanned and tuples statistics when
relevant. sampling from fdw doesn't provide those statistics so they
are not displayed in those cases.

Studying how we write verbose log messages, it seems that currently
ANALYZE (autoanalyze) lets tables and FDWs write logs in their own
format, which makes sense to me as some instrumentation for heap, such
as dead tuples, might not be relevant for FDWs, and an FDW might want
to write other information such as executed queries. An alternative
idea would be to pass a StringInfo to AcquireSampleRowsFunc() so that
the callback can write its messages there. This is somewhat similar to
what we do in the EXPLAIN command (cf. ExplainPropertyText() etc.). It
could be too much, but I think it could be better than writing logs in
a single format.

I've also slightly modified 0002 to display "automatic analyze" when
we're inside an autovacuum worker, similar to what's done with vacuum
output.

+1

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#10Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Masahiko Sawada (#9)
3 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

On Sat, Jul 27, 2024 at 12:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

An alternative idea would
be to pass StringInfo to AcquireSampleRowsFunc() so that callback can
write its messages there. This is somewhat similar to what we do in
the EXPLAIN command (cf, ExplainPropertyText() etc). It could be too
much but I think it could be better than writing logs in the single
format.

I've tested this approach; it definitely looks better. I've added a
logbuf StringInfo to AcquireSampleRowsFunc, which will receive the
logs. elevel was removed as it is no longer used. Since everything
is in the same log line, I've removed the relation name from the
acquire sample functions.
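To illustrate the shape of that change, here is a minimal standalone
sketch. The LogBuf type and mock_acquire_sample_rows are hypothetical
stand-ins for PostgreSQL's StringInfoData and the real
AcquireSampleRowsFunc callback, not the actual API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's StringInfoData. */
typedef struct LogBuf
{
	char		data[1024];
	int			len;
} LogBuf;

/*
 * Sketch of the revised callback shape: instead of taking elevel and
 * emitting its own ereport(), the sampler appends its per-source
 * details to a log buffer owned by the caller (do_analyze_rel), which
 * later folds them into a single log line.
 */
static int
mock_acquire_sample_rows(int targrows, double totalrows, LogBuf *logbuf)
{
	int			sampled = targrows < (int) totalrows ? targrows : (int) totalrows;

	logbuf->len += snprintf(logbuf->data + logbuf->len,
							sizeof(logbuf->data) - logbuf->len,
							"tuples: %.0f tuples; %d tuples in sample\n",
							totalrows, sampled);
	return sampled;
}
```

The caller would accumulate each sampler's lines and emit one ereport()
at the end, which is what keeps per-partition details grouped under a
single INFO/LOG message.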

For partitioned tables, the log now also names the partition being
sampled. The output will look like:

INFO: analyze of table "postgres.public.test_partition"
Sampling rows from child "public.test_partition_1"
pages: 5 of 5 scanned
tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
estimated total tuples
Sampling rows from child "public.test_partition_2"
pages: 5 of 5 scanned
tuples: 1000 live tuples, 0 are dead; 1000 tuples in sample, 1000
estimated total tuples
avg read rate: 2.604 MB/s, avg write rate: 0.000 MB/s
...

For a file_fdw table, the output will be:

INFO: analyze of table "postgres.public.pglog"
tuples: 60043 tuples; 30000 tuples in sample
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
...
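The avg read/write rates in those outputs follow the patch's formula:
blocks of BLCKSZ bytes converted to MiB, divided by elapsed seconds
derived from a millisecond delta. A standalone sketch of just that
computation, assuming the default 8 kB block size:

```c
#include <assert.h>

#define BLCKSZ 8192				/* default PostgreSQL block size, assumed here */

/*
 * Mirrors the rate computation in the verbose output: BLCKSZ * blocks
 * bytes, converted to MiB, divided by elapsed seconds derived from the
 * millisecond delta. Returns 0 when no time has elapsed, matching the
 * delay_in_ms > 0 guard in the patch.
 */
static double
rate_mb_per_s(long long blocks, long delay_in_ms)
{
	if (delay_in_ms <= 0)
		return 0.0;
	return (double) BLCKSZ * blocks / (1024 * 1024) / (delay_in_ms / 1000.0);
}
```

For example, 1280 blocks read over one second comes out to exactly
10 MB/s with 8 kB blocks.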

Regards,
Anthonin

Attachments:

v4-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patchapplication/octet-stream; name=v4-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patchDownload
From 23c046a0a60fea7f6cf3eb1197c4ba52b9ad74d5 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Mon, 8 Jul 2024 08:30:31 +0200
Subject: Output buffer and wal usage with verbose analyze

Currently, buffer usage generated by analyze is only logged during
autoanalyze. To make the behaviour more consistent with VACUUM VERBOSE,
this patch also outputs buffer and WAL usage on ANALYZE VERBOSE.

The patch also changes the delta time computation in VACUUM to use a
millisecond difference, both to simplify it and to make it more
consistent with what's done in ANALYZE.
---
 src/backend/access/heap/vacuumlazy.c | 11 +++++-----
 src/backend/commands/analyze.c       | 33 +++++++++++++++++++++++-----
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d82aa3d4896..32cce88e260 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -599,8 +599,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
-			long		secs_dur;
-			int			usecs_dur;
+			long		delay_in_ms;
 			WalUsage	walusage;
 			BufferUsage bufferusage;
 			StringInfoData buf;
@@ -612,7 +611,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int64		total_blks_read;
 			int64		total_blks_dirtied;
 
-			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
+			delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);
 			memset(&walusage, 0, sizeof(WalUsage));
 			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 			memset(&bufferusage, 0, sizeof(BufferUsage));
@@ -748,12 +747,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 				appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
 								 read_ms, write_ms);
 			}
-			if (secs_dur > 0 || usecs_dur > 0)
+			if (delay_in_ms > 0)
 			{
 				read_rate = (double) BLCKSZ * total_blks_read /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 				write_rate = (double) BLCKSZ * total_blks_dirtied /
-					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d4377186d77..d492772aacb 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -303,11 +305,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -340,7 +346,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	RestrictSearchPath();
 
 	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -723,17 +729,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
+			char	   *msgfmt;
 			StringInfoData buf;
 			int64		total_blks_hit;
 			int64		total_blks_read;
@@ -746,6 +754,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			total_blks_hit = bufferusage.shared_blks_hit +
 				bufferusage.local_blks_hit;
@@ -790,7 +800,13 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+
+			if (AmAutoVacuumWorkerProcess())
+				msgfmt = _("automatic analyze of table \"%s.%s.%s\"\n");
+			else
+				msgfmt = _("analyze of table \"%s.%s.%s\"\n");
+
+			appendStringInfo(&buf, msgfmt,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -808,9 +824,14 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_hit,
 							 (long long) total_blks_read,
 							 (long long) total_blks_dirtied);
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3 (Apple Git-146)

v4-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
From 847cbc32c77290df5381e9d23cdeaea0f30f265b Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze

Analyze logs within autovacuum use dedicated variables
VacuumPage{Hit,Miss,Dirty} to track the buffer usage count. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze. This makes VacuumPage{Hit,Miss,Dirty} completely unused and
removable. This commit removes both their calls in bufmgr and their
declarations. It also renames buffer misses to buffer reads for better
consistency.
---
 src/backend/access/heap/vacuumlazy.c  | 22 +++++++++++-----
 src/backend/commands/analyze.c        | 36 ++++++++++++++++-----------
 src/backend/commands/vacuum.c         |  3 ---
 src/backend/commands/vacuumparallel.c |  3 ---
 src/backend/storage/buffer/bufmgr.c   |  4 ---
 src/backend/utils/init/globals.c      |  4 ---
 src/include/miscadmin.h               |  4 ---
 7 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 835b53415d0..d82aa3d4896 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -608,6 +608,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int32		diff;
 			double		read_rate = 0,
 						write_rate = 0;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
 			memset(&walusage, 0, sizeof(WalUsage));
@@ -615,6 +618,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
+
 			initStringInfo(&buf);
 			if (verbose)
 			{
@@ -740,18 +750,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			if (secs_dur > 0 || usecs_dur > 0)
 			{
-				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+				read_rate = (double) BLCKSZ * total_blks_read /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
-				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
-							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
-							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf,
 							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
 							 (long long) walusage.wal_records,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c590a2adc35..d4377186d77 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -736,15 +735,24 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			/*
 			 * Calculate the difference in the Page Hit/Miss/Dirty that
 			 * happened as part of the analyze by subtracting out the
 			 * pre-analyze values which we saved above.
 			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -770,10 +778,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * total_blks_read /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -796,10 +804,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab2022..7d8e9d20454 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2a..22c057fe61b 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4415ba648ae..fb38c7bdd45 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1192,7 +1192,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1588,7 +1587,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2582,7 +2580,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5122,7 +5119,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 927bccfbea8..d4d51c51738 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -152,10 +152,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90f9b21b258..6d07005d39b 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -283,10 +283,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

v4-0003-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patch
From 4ebad7d4f6a56d8dbe4dd9e77cf0599c1b6dbe1c Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Thu, 25 Jul 2024 09:34:55 +0200
Subject: Pass StringInfo logbuf to AcquireSampleFunc

Currently, row sampling functions emit logs with interesting information
like scanned pages or live tuples within the function itself. This
prevents outputting that information during ANALYZE VERBOSE in a
consistent way (a single log line emitted at the end of the analysis).

To fix this, this patch modifies AcquireSampleRowsFunc to accept a
StringInfo that will store log output from the function. This lets
do_analyze_rel control when and where the sampling functions' logs are
emitted.

The elevel parameter has been removed from AcquireSampleRowsFunc's
parameters as it is now unused.

Also, rows were replaced by tuples in logs for consistency.
---
 contrib/file_fdw/file_fdw.c         | 21 ++++----
 contrib/postgres_fdw/postgres_fdw.c | 23 ++++----
 doc/src/sgml/fdwhandler.sgml        |  4 +-
 src/backend/commands/analyze.c      | 81 +++++++++++++++--------------
 src/include/foreign/fdwapi.h        |  7 ++-
 5 files changed, 67 insertions(+), 69 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a05..20d96abce4e 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -160,9 +160,9 @@ static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
 						   FileFdwPlanState *fdw_private,
 						   Cost *startup_cost, Cost *total_cost);
-static int	file_acquire_sample_rows(Relation onerel, int elevel,
-									 HeapTuple *rows, int targrows,
-									 double *totalrows, double *totaldeadrows);
+static int	file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+									 int targrows, double *totalrows,
+									 double *totaldeadrows, StringInfo logbuf);
 
 
 /*
@@ -1120,9 +1120,9 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-file_acquire_sample_rows(Relation onerel, int elevel,
-						 HeapTuple *rows, int targrows,
-						 double *totalrows, double *totaldeadrows)
+file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+						 int targrows, double *totalrows,
+						 double *totaldeadrows, StringInfo logbuf)
 {
 	int			numrows = 0;
 	double		rowstoskip = -1;	/* -1 means not set yet */
@@ -1241,13 +1241,10 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	pfree(nulls);
 
 	/*
-	 * Emit some interesting relation info
+	 * Add some interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": file contains %.0f rows; "
-					"%d rows in sample",
-					RelationGetRelationName(onerel),
-					*totalrows, numrows)));
+	appendStringInfo(logbuf, "tuples: %.0f tuples; %d tuples in sample\n",
+					 *totalrows, numrows);
 
 	return numrows;
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e217..54a65018474 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,10 +498,9 @@ static void process_query_params(ExprContext *econtext,
 								 FmgrInfo *param_flinfo,
 								 List *param_exprs,
 								 const char **param_values);
-static int	postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows,
-										  double *totaldeadrows);
+static int	postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+										  int targrows, double *totalrows,
+										  double *totaldeadrows, StringInfo logbuf);
 static void analyze_row_processor(PGresult *res, int row,
 								  PgFdwAnalyzeState *astate);
 static void produce_tuple_asynchronously(AsyncRequest *areq, bool fetch);
@@ -5064,10 +5063,9 @@ postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows,
-							  double *totaldeadrows)
+postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+							  int targrows, double *totalrows,
+							  double *totaldeadrows, StringInfo logbuf)
 {
 	PgFdwAnalyzeState astate;
 	ForeignTable *table;
@@ -5359,12 +5357,11 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 		*totalrows = reltuples;
 
 	/*
-	 * Emit some interesting relation info
+	 * Add some interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": table contains %.0f rows, %d rows in sample",
-					RelationGetRelationName(relation),
-					*totalrows, astate.numrows)));
+	appendStringInfo(logbuf,
+					 "tuples: %.0f tuples; %d tuples in sample\n",
+					 *totalrows, astate.numrows);
 
 	return astate.numrows;
 }
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index b80320504d6..d758b904ee5 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1393,11 +1393,11 @@ AnalyzeForeignTable(Relation relation,
 <programlisting>
 int
 AcquireSampleRowsFunc(Relation relation,
-                      int elevel,
                       HeapTuple *rows,
                       int targrows,
                       double *totalrows,
-                      double *totaldeadrows);
+                      double *totaldeadrows,
+                      StringInfo logbuf);
 </programlisting>
 
      A random sample of up to <parameter>targrows</parameter> rows should be collected
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d492772aacb..45e36298196 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -87,13 +87,13 @@ static void compute_index_stats(Relation onerel, double totalrows,
 								MemoryContext col_context);
 static VacAttrStats *examine_attribute(Relation onerel, int attnum,
 									   Node *index_expr);
-static int	acquire_sample_rows(Relation onerel, int elevel,
-								HeapTuple *rows, int targrows,
-								double *totalrows, double *totaldeadrows);
+static int	acquire_sample_rows(Relation onerel, HeapTuple *rows,
+								int targrows, double *totalrows,
+								double *totaldeadrows, StringInfo logbuf);
 static int	compare_rows(const void *a, const void *b, void *arg);
-static int	acquire_inherited_sample_rows(Relation onerel, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows, double *totaldeadrows);
+static int	acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows,
+										  int targrows, double *totalrows,
+										  double *totaldeadrows, StringInfo logbuf);
 static void update_attstats(Oid relid, bool inh,
 							int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
@@ -310,6 +310,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
+	StringInfoData logbuf;
 
 	verbose = (params->options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
@@ -334,6 +335,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 										ALLOCSET_DEFAULT_SIZES);
 	caller_context = MemoryContextSwitchTo(anl_context);
 
+	/* Initialize log buffer */
+	initStringInfo(&logbuf);
+
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
 	 * as that user.  Also lock down security-restricted operations and
@@ -525,13 +529,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 								 inh ? PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS_INH :
 								 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
 	if (inh)
-		numrows = acquire_inherited_sample_rows(onerel, elevel,
-												rows, targrows,
-												&totalrows, &totaldeadrows);
+		numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
+												&totalrows, &totaldeadrows, &logbuf);
 	else
-		numrows = (*acquirefunc) (onerel, elevel,
-								  rows, targrows,
-								  &totalrows, &totaldeadrows);
+		numrows = (*acquirefunc) (onerel, rows, targrows,
+								  &totalrows, &totaldeadrows, &logbuf);
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -810,6 +812,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
+
+			/* Append logs from sampling function */
+			appendBinaryStringInfo(&buf, logbuf.data, logbuf.len);
+
 			if (track_io_timing)
 			{
 				double		read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
@@ -1184,9 +1190,9 @@ block_sampling_read_stream_next(ReadStream *stream,
  * density near the start of the table.
  */
 static int
-acquire_sample_rows(Relation onerel, int elevel,
-					HeapTuple *rows, int targrows,
-					double *totalrows, double *totaldeadrows)
+acquire_sample_rows(Relation onerel, HeapTuple *rows,
+					int targrows, double *totalrows,
+					double *totaldeadrows, StringInfo logbuf)
 {
 	int			numrows = 0;	/* # rows now in reservoir */
 	double		samplerows = 0; /* total # rows collected */
@@ -1323,16 +1329,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	}
 
 	/*
-	 * Emit some interesting relation info
+	 * Add interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": scanned %d of %u pages, "
-					"containing %.0f live rows and %.0f dead rows; "
-					"%d rows in sample, %.0f estimated total rows",
-					RelationGetRelationName(onerel),
-					bs.m, totalblocks,
-					liverows, deadrows,
-					numrows, *totalrows)));
+	appendStringInfo(logbuf, _("pages: %u of %u scanned\n"),
+					 bs.m, totalblocks);
+	appendStringInfo(logbuf, _("tuples: %.0f live tuples, %.0f are dead; %d tuples in sample, %.0f estimated total tuples\n"),
+					 liverows, deadrows, numrows, *totalrows);
 
 	return numrows;
 }
@@ -1371,9 +1373,9 @@ compare_rows(const void *a, const void *b, void *arg)
  * children are foreign tables that don't support ANALYZE.
  */
 static int
-acquire_inherited_sample_rows(Relation onerel, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows, double *totaldeadrows)
+acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows,
+							  int targrows, double *totalrows,
+							  double *totaldeadrows, StringInfo logbuf)
 {
 	List	   *tableOIDs;
 	Relation   *rels;
@@ -1409,10 +1411,10 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		/* CCI because we already updated the pg_class row in this command */
 		CommandCounterIncrement();
 		SetRelationHasSubclass(RelationGetRelid(onerel), false);
-		ereport(elevel,
-				(errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables",
-						get_namespace_name(RelationGetNamespace(onerel)),
-						RelationGetRelationName(onerel))));
+		appendStringInfo(logbuf,
+						 "skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables\n",
+						 get_namespace_name(RelationGetNamespace(onerel)),
+						 RelationGetRelationName(onerel));
 		return 0;
 	}
 
@@ -1507,10 +1509,10 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	 */
 	if (!has_child)
 	{
-		ereport(elevel,
-				(errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no analyzable child tables",
-						get_namespace_name(RelationGetNamespace(onerel)),
-						RelationGetRelationName(onerel))));
+		appendStringInfo(logbuf,
+						 "skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no analyzable child tables\n",
+						 get_namespace_name(RelationGetNamespace(onerel)),
+						 RelationGetRelationName(onerel));
 		return 0;
 	}
 
@@ -1562,10 +1564,13 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 				double		trows,
 							tdrows;
 
+				appendStringInfo(logbuf, _("Sampling rows from child \"%s.%s\"\n"),
+								 get_namespace_name(RelationGetNamespace(childrel)),
+								 RelationGetRelationName(childrel));
 				/* Fetch a random sample of the child's rows */
-				childrows = (*acquirefunc) (childrel, elevel,
-											rows + numrows, childtargrows,
-											&trows, &tdrows);
+				childrows = (*acquirefunc) (childrel, rows + numrows,
+											childtargrows, &trows,
+											&tdrows, logbuf);
 
 				/* We may need to convert from child's rowtype to parent's */
 				if (childrows > 0 &&
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index fcde3876b28..b33a183b080 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -148,10 +148,9 @@ typedef void (*ExplainForeignModify_function) (ModifyTableState *mtstate,
 typedef void (*ExplainDirectModify_function) (ForeignScanState *node,
 											  struct ExplainState *es);
 
-typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
-									  HeapTuple *rows, int targrows,
-									  double *totalrows,
-									  double *totaldeadrows);
+typedef int (*AcquireSampleRowsFunc) (Relation relation, HeapTuple *rows,
+									  int targrows, double *totalrows,
+									  double *totaldeadrows, StringInfo logbuf);
 
 typedef bool (*AnalyzeForeignTable_function) (Relation relation,
 											  AcquireSampleRowsFunc *func,
-- 
2.39.3 (Apple Git-146)

#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Anthonin Bonnefoy (#10)
Re: Use pgBufferUsage for block reporting in analyze

Hi,

On Mon, Jul 29, 2024 at 12:12 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Sat, Jul 27, 2024 at 12:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

An alternative idea would
be to pass a StringInfo to AcquireSampleRowsFunc() so that the callback can
write its messages there. This is somewhat similar to what we do in
the EXPLAIN command (cf. ExplainPropertyText() etc.). It could be too
much, but I think it could be better than writing logs in a single
format.

I have one comment on 0001 patch:

                        /*
                         * Calculate the difference in the Page Hit/Miss/Dirty that
                         * happened as part of the analyze by subtracting out the
                         * pre-analyze values which we saved above.
                         */
-                        AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-                        AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-                        AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+                        memset(&bufferusage, 0, sizeof(BufferUsage));
+                        BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+                        total_blks_hit = bufferusage.shared_blks_hit +
+                                bufferusage.local_blks_hit;
+                        total_blks_read = bufferusage.shared_blks_read +
+                                bufferusage.local_blks_read;
+                        total_blks_dirtied = bufferusage.shared_blks_dirtied +
+                                bufferusage.local_blks_dirtied;

The comment should also be updated or removed.

And here are some comments on 0002 patch:

-           TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
+           delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);

I think that this change is to make vacuum code consistent with
analyze code, particularly the following part:

/*
* We do not expect an analyze to take > 25 days and it simplifies
* things a bit to use TimestampDifferenceMilliseconds.
*/
delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);

However, as the above comment says, delay_in_ms can only represent a
duration of up to 25 days. I guess it would not be a problem for analyze
cases but could be in vacuum cases, as vacuum can sometimes run for a
very long time. I've seen vacuums running for almost a month. So
I think we should keep this part.

---
         /* measure elapsed time iff autovacuum logging requires it */
-        if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+        if (instrument)

The comment should also be updated.

---
Could you split the 0002 patch into two patches? One is to have the
ANALYZE command (with the VERBOSE option) write the buffer usage, and the
second one is to add WAL usage to both the ANALYZE command and
autoanalyze. I think adding WAL usage to ANALYZE could be
controversial as it should not be a WAL-intensive operation, so I'd like
to keep them separate.

I've tested this approach; it definitely looks better. I've added a
logbuf StringInfo to AcquireSampleRowsFunc which will receive the
logs. elevel was removed as it is not used anymore. Since everything
is in the same log line, I've removed the relation name from the
acquire sample functions.

For partitioned tables, I've also added a line showing which partition
is currently being sampled. The output will look like:
INFO: analyze of table "postgres.public.test_partition"
Sampling rows from child "public.test_partition_1"
pages: 5 of 5 scanned
tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
estimated total tuples
Sampling rows from child "public.test_partition_2"
pages: 5 of 5 scanned
tuples: 1000 live tuples, 0 are dead; 1000 tuples in sample, 1000
estimated total tuples
avg read rate: 2.604 MB/s, avg write rate: 0.000 MB/s
...

For a file_fdw, the output will be:

INFO: analyze of table "postgres.public.pglog"
tuples: 60043 tuples; 30000 tuples in sample
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
...

Thank you for updating the patch. With your patch, I got the following
logs when executing ANALYZE VERBOSE on a partitioned table:

postgres(1:3971560)=# analyze (verbose) p;
INFO: analyzing "public.p" inheritance tree
INFO: analyze of table "postgres.public.p"
Sampling rows from child "public.c1"
pages: 10000 of 14750 scanned
tuples: 2259833 live tuples, 0 are dead; 10000 tuples in sample,
3333254 estimated total tuples
Sampling rows from child "public.c2"
pages: 10000 of 14750 scanned
tuples: 2260000 live tuples, 0 are dead; 10000 tuples in sample,
3333500 estimated total tuples
Sampling rows from child "public.c3"
pages: 10000 of 14750 scanned
tuples: 2259833 live tuples, 0 are dead; 10000 tuples in sample,
3333254 estimated total tuples
avg read rate: 335.184 MB/s, avg write rate: 0.031 MB/s
buffer usage: 8249 hits, 21795 reads, 2 dirtied
WAL usage: 6 records, 1 full page images, 8825 bytes
system usage: CPU: user: 0.46 s, system: 0.03 s, elapsed: 0.50 s
:

Whereas the current log messages are like follow:

INFO: analyzing "public.p" inheritance tree
INFO: "c1": scanned 10000 of 14750 pages, containing 2259833 live
rows and 0 dead rows; 10000 rows in sample, 3333254 estimated total
rows
INFO: "c2": scanned 10000 of 14750 pages, containing 2259834 live
rows and 0 dead rows; 10000 rows in sample, 3333255 estimated total
rows
INFO: "c3": scanned 10000 of 14750 pages, containing 2259833 live
rows and 0 dead rows; 10000 rows in sample, 3333254 estimated total
rows
:

It seems to me that the current style is more concise and readable
(one log line per table instead of three). We might need to consider a
better output format for partitioned tables as the number of
partitions could be high. I don't have a good idea now, though.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#12Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Masahiko Sawada (#11)
4 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

Hi,

On Tue, Jul 30, 2024 at 1:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have one comment on 0001 patch:
The comment should also be updated or removed.

Ok, I've removed the comment.

However, as the above comment says, delay_in_ms can only represent a
duration of up to 25 days. I guess it would not be a problem for analyze
cases but could be in vacuum cases, as vacuum can sometimes run for a
very long time. I've seen vacuums running for almost a month. So
I think we should keep this part.

Good point, I've reverted to using TimestampDifference for vacuum.

/* measure elapsed time iff autovacuum logging requires it */
-        if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+        if (instrument)

The comment should also be updated.

Updated.

Could you split the 0002 patch into two patches? One is to have the
ANALYZE command (with the VERBOSE option) write the buffer usage, and the
second one is to add WAL usage to both the ANALYZE command and
autoanalyze. I think adding WAL usage to ANALYZE could be
controversial as it should not be a WAL-intensive operation, so I'd like
to keep them separate.

I've split the patch: 0002 makes the VERBOSE output match autoanalyze's
logs (buffer/io/system usage) while 0003 adds the WAL usage output.

It seems to me that the current style is more concise and readable (3
rows per table vs. 1 row per table). We might need to consider a
better output format for partitioned tables as the number of
partitions could be high. I don't have a good idea now, though.

A possible change would be to pass an inh flag when an acquirefunc is
called from acquire_inherited_sample_rows. The acquirefunc could then
use an alternative log format to append to logbuf. This way, we could
have a more compact format for partitioned tables.

Regards,
Anthonin

Attachments:

v5-0003-Output-wal-usage-of-analyze-in-verbose-output-and.patchapplication/octet-stream; name=v5-0003-Output-wal-usage-of-analyze-in-verbose-output-and.patchDownload
From 6b039ab5eea171fbf55afb8e218f093e300e5c34 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Tue, 30 Jul 2024 08:39:15 +0200
Subject: Output wal usage of analyze in verbose output and logs

To make analyze output more consistent with vacuum output, this patch
adds the WAL usage of analyze to both verbose output and autoanalyze's
log output.
---
 src/backend/commands/analyze.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7fe2736777b..8b25eb99fb8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -305,6 +305,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
@@ -740,6 +741,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
 			char	   *msgfmt;
@@ -750,6 +752,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			total_blks_hit = bufferusage.shared_blks_hit +
 				bufferusage.local_blks_hit;
@@ -818,6 +822,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_hit,
 							 (long long) total_blks_read,
 							 (long long) total_blks_dirtied);
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(verbose ? INFO : LOG,
-- 
2.39.3 (Apple Git-146)

v5-0002-Output-analyze-details-on-analyze-verbose.patchapplication/octet-stream; name=v5-0002-Output-analyze-details-on-analyze-verbose.patchDownload
From 9dc332abe55c5689582715e11610075661a89903 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Tue, 30 Jul 2024 08:36:09 +0200
Subject: Output analyze details on analyze verbose

Currently, analyze details like buffer, io, and system usage are only
displayed in autoanalyze's logs.

To make behaviour more consistent with VACUUM VERBOSE, this patch
outputs those details on ANALYZE VERBOSE.
---
 src/backend/commands/analyze.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 50159232218..7fe2736777b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -308,6 +310,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -339,8 +344,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	save_nestlevel = NewGUCNestLevel();
 	RestrictSearchPath();
 
-	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	/*
+	 * measure elapsed time if called with verbose or if autovacuum logging
+	 * requires it
+	 */
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -723,17 +731,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
 			double		read_rate = 0;
 			double		write_rate = 0;
+			char	   *msgfmt;
 			StringInfoData buf;
 			int64		total_blks_hit;
 			int64		total_blks_read;
@@ -785,7 +794,13 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+
+			if (AmAutoVacuumWorkerProcess())
+				msgfmt = _("automatic analyze of table \"%s.%s.%s\"\n");
+			else
+				msgfmt = _("analyze of table \"%s.%s.%s\"\n");
+
+			appendStringInfo(&buf, msgfmt,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -805,7 +820,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3 (Apple Git-146)

v5-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patchapplication/octet-stream; name=v5-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patchDownload
From 3e596e05a04fb5e2d8dca4e8cab588283d0d9873 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze

Analyze logs within autovacuum use the specific variables
VacuumPage{Hit,Miss,Dirty} to track the buffer usage count. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze. This makes VacuumPage{Hit,Miss,Dirty} completely unused and
removable. This commit removes both their calls in bufmgr and their
declarations. It also renames buffer misses to buffer reads for better
consistency.
---
 src/backend/access/heap/vacuumlazy.c  | 22 ++++++++++----
 src/backend/commands/analyze.c        | 41 ++++++++++++++-------------
 src/backend/commands/vacuum.c         |  3 --
 src/backend/commands/vacuumparallel.c |  3 --
 src/backend/storage/buffer/bufmgr.c   |  4 ---
 src/backend/utils/init/globals.c      |  4 ---
 src/include/miscadmin.h               |  4 ---
 7 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 835b53415d0..d82aa3d4896 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -608,6 +608,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int32		diff;
 			double		read_rate = 0,
 						write_rate = 0;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
 			memset(&walusage, 0, sizeof(WalUsage));
@@ -615,6 +618,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
+
 			initStringInfo(&buf);
 			if (verbose)
 			{
@@ -740,18 +750,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			if (secs_dur > 0 || usecs_dur > 0)
 			{
-				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+				read_rate = (double) BLCKSZ * total_blks_read /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
-				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
-							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
-							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf,
 							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
 							 (long long) walusage.wal_records,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c590a2adc35..50159232218 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -736,15 +735,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
-			/*
-			 * Calculate the difference in the Page Hit/Miss/Dirty that
-			 * happened as part of the analyze by subtracting out the
-			 * pre-analyze values which we saved above.
-			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -770,10 +773,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * total_blks_read /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -796,10 +799,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab2022..7d8e9d20454 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2a..22c057fe61b 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4415ba648ae..fb38c7bdd45 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1192,7 +1192,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1588,7 +1587,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2582,7 +2580,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5122,7 +5119,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d8debd160e4..03a54451ac2 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -153,10 +153,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index ac16233b71f..25348e71eb9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -284,10 +284,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

v5-0004-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patchapplication/octet-stream; name=v5-0004-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patchDownload
From 2df4381fd685bd114450cc6afb5e3bea25d9f67d Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Thu, 25 Jul 2024 09:34:55 +0200
Subject: Pass StringInfo logbuf to AcquireSampleFunc

Currently, row sampling functions emit logs with interesting information
like scanned pages or live tuples from within the function itself. This
prevents outputting that information during ANALYZE VERBOSE in a
consistent way (a single log line emitted at the end of the analysis).

To fix this, this patch modifies AcquireSampleRowsFunc to accept a
StringInfo that stores the log output from the function. This lets
do_analyze_rel control when and where the sampling functions' logs are
emitted.

The elevel parameter has been removed from AcquireSampleRowsFunc's
parameters as it is now unused.

Also, "rows" was replaced by "tuples" in the logs for consistency.
---
 contrib/file_fdw/file_fdw.c         | 21 ++++----
 contrib/postgres_fdw/postgres_fdw.c | 23 ++++----
 doc/src/sgml/fdwhandler.sgml        |  4 +-
 src/backend/commands/analyze.c      | 81 +++++++++++++++--------------
 src/include/foreign/fdwapi.h        |  7 ++-
 5 files changed, 67 insertions(+), 69 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a05..20d96abce4e 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -160,9 +160,9 @@ static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
 						   FileFdwPlanState *fdw_private,
 						   Cost *startup_cost, Cost *total_cost);
-static int	file_acquire_sample_rows(Relation onerel, int elevel,
-									 HeapTuple *rows, int targrows,
-									 double *totalrows, double *totaldeadrows);
+static int	file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+									 int targrows, double *totalrows,
+									 double *totaldeadrows, StringInfo logbuf);
 
 
 /*
@@ -1120,9 +1120,9 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-file_acquire_sample_rows(Relation onerel, int elevel,
-						 HeapTuple *rows, int targrows,
-						 double *totalrows, double *totaldeadrows)
+file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+						 int targrows, double *totalrows,
+						 double *totaldeadrows, StringInfo logbuf)
 {
 	int			numrows = 0;
 	double		rowstoskip = -1;	/* -1 means not set yet */
@@ -1241,13 +1241,10 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	pfree(nulls);
 
 	/*
-	 * Emit some interesting relation info
+	 * Add some interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": file contains %.0f rows; "
-					"%d rows in sample",
-					RelationGetRelationName(onerel),
-					*totalrows, numrows)));
+	appendStringInfo(logbuf, "tuples: %.0f tuples; %d tuples in sample\n",
+					 *totalrows, numrows);
 
 	return numrows;
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e217..54a65018474 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,10 +498,9 @@ static void process_query_params(ExprContext *econtext,
 								 FmgrInfo *param_flinfo,
 								 List *param_exprs,
 								 const char **param_values);
-static int	postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows,
-										  double *totaldeadrows);
+static int	postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+										  int targrows, double *totalrows,
+										  double *totaldeadrows, StringInfo logbuf);
 static void analyze_row_processor(PGresult *res, int row,
 								  PgFdwAnalyzeState *astate);
 static void produce_tuple_asynchronously(AsyncRequest *areq, bool fetch);
@@ -5064,10 +5063,9 @@ postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows,
-							  double *totaldeadrows)
+postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+							  int targrows, double *totalrows,
+							  double *totaldeadrows, StringInfo logbuf)
 {
 	PgFdwAnalyzeState astate;
 	ForeignTable *table;
@@ -5359,12 +5357,11 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 		*totalrows = reltuples;
 
 	/*
-	 * Emit some interesting relation info
+	 * Add some interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": table contains %.0f rows, %d rows in sample",
-					RelationGetRelationName(relation),
-					*totalrows, astate.numrows)));
+	appendStringInfo(logbuf,
+					 "tuples: %.0f tuples; %d tuples in sample\n",
+					 *totalrows, astate.numrows);
 
 	return astate.numrows;
 }
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index b80320504d6..d758b904ee5 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1393,11 +1393,11 @@ AnalyzeForeignTable(Relation relation,
 <programlisting>
 int
 AcquireSampleRowsFunc(Relation relation,
-                      int elevel,
                       HeapTuple *rows,
                       int targrows,
                       double *totalrows,
-                      double *totaldeadrows);
+                      double *totaldeadrows,
+                      StringInfo logbuf);
 </programlisting>
 
      A random sample of up to <parameter>targrows</parameter> rows should be collected
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8b25eb99fb8..def9090850d 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -87,13 +87,13 @@ static void compute_index_stats(Relation onerel, double totalrows,
 								MemoryContext col_context);
 static VacAttrStats *examine_attribute(Relation onerel, int attnum,
 									   Node *index_expr);
-static int	acquire_sample_rows(Relation onerel, int elevel,
-								HeapTuple *rows, int targrows,
-								double *totalrows, double *totaldeadrows);
+static int	acquire_sample_rows(Relation onerel, HeapTuple *rows,
+								int targrows, double *totalrows,
+								double *totaldeadrows, StringInfo logbuf);
 static int	compare_rows(const void *a, const void *b, void *arg);
-static int	acquire_inherited_sample_rows(Relation onerel, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows, double *totaldeadrows);
+static int	acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows,
+										  int targrows, double *totalrows,
+										  double *totaldeadrows, StringInfo logbuf);
 static void update_attstats(Oid relid, bool inh,
 							int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
@@ -310,6 +310,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
+	StringInfoData logbuf;
 
 	verbose = (params->options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
@@ -334,6 +335,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 										ALLOCSET_DEFAULT_SIZES);
 	caller_context = MemoryContextSwitchTo(anl_context);
 
+	/* Initialize log buffer */
+	initStringInfo(&logbuf);
+
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
 	 * as that user.  Also lock down security-restricted operations and
@@ -528,13 +532,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 								 inh ? PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS_INH :
 								 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
 	if (inh)
-		numrows = acquire_inherited_sample_rows(onerel, elevel,
-												rows, targrows,
-												&totalrows, &totaldeadrows);
+		numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
+												&totalrows, &totaldeadrows, &logbuf);
 	else
-		numrows = (*acquirefunc) (onerel, elevel,
-								  rows, targrows,
-								  &totalrows, &totaldeadrows);
+		numrows = (*acquirefunc) (onerel, rows, targrows,
+								  &totalrows, &totaldeadrows, &logbuf);
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -808,6 +810,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
+
+			/* Append logs from sampling function */
+			appendBinaryStringInfo(&buf, logbuf.data, logbuf.len);
+
 			if (track_io_timing)
 			{
 				double		read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
@@ -1182,9 +1188,9 @@ block_sampling_read_stream_next(ReadStream *stream,
  * density near the start of the table.
  */
 static int
-acquire_sample_rows(Relation onerel, int elevel,
-					HeapTuple *rows, int targrows,
-					double *totalrows, double *totaldeadrows)
+acquire_sample_rows(Relation onerel, HeapTuple *rows,
+					int targrows, double *totalrows,
+					double *totaldeadrows, StringInfo logbuf)
 {
 	int			numrows = 0;	/* # rows now in reservoir */
 	double		samplerows = 0; /* total # rows collected */
@@ -1321,16 +1327,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	}
 
 	/*
-	 * Emit some interesting relation info
+	 * Add interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": scanned %d of %u pages, "
-					"containing %.0f live rows and %.0f dead rows; "
-					"%d rows in sample, %.0f estimated total rows",
-					RelationGetRelationName(onerel),
-					bs.m, totalblocks,
-					liverows, deadrows,
-					numrows, *totalrows)));
+	appendStringInfo(logbuf, _("pages: %u of %u scanned\n"),
+					 bs.m, totalblocks);
+	appendStringInfo(logbuf, _("tuples: %.0f live tuples, %.0f are dead; %d tuples in sample, %.0f estimated total tuples\n"),
+					 liverows, deadrows, numrows, *totalrows);
 
 	return numrows;
 }
@@ -1369,9 +1371,9 @@ compare_rows(const void *a, const void *b, void *arg)
  * children are foreign tables that don't support ANALYZE.
  */
 static int
-acquire_inherited_sample_rows(Relation onerel, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows, double *totaldeadrows)
+acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows,
+							  int targrows, double *totalrows,
+							  double *totaldeadrows, StringInfo logbuf)
 {
 	List	   *tableOIDs;
 	Relation   *rels;
@@ -1407,10 +1409,10 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		/* CCI because we already updated the pg_class row in this command */
 		CommandCounterIncrement();
 		SetRelationHasSubclass(RelationGetRelid(onerel), false);
-		ereport(elevel,
-				(errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables",
-						get_namespace_name(RelationGetNamespace(onerel)),
-						RelationGetRelationName(onerel))));
+		appendStringInfo(logbuf,
+						 "skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables\n",
+						 get_namespace_name(RelationGetNamespace(onerel)),
+						 RelationGetRelationName(onerel));
 		return 0;
 	}
 
@@ -1505,10 +1507,10 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	 */
 	if (!has_child)
 	{
-		ereport(elevel,
-				(errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no analyzable child tables",
-						get_namespace_name(RelationGetNamespace(onerel)),
-						RelationGetRelationName(onerel))));
+		appendStringInfo(logbuf,
+						 "skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no analyzable child tables\n",
+						 get_namespace_name(RelationGetNamespace(onerel)),
+						 RelationGetRelationName(onerel));
 		return 0;
 	}
 
@@ -1560,10 +1562,13 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 				double		trows,
 							tdrows;
 
+				appendStringInfo(logbuf, _("Sampling rows from child \"%s.%s\"\n"),
+								 get_namespace_name(RelationGetNamespace(childrel)),
+								 RelationGetRelationName(childrel));
 				/* Fetch a random sample of the child's rows */
-				childrows = (*acquirefunc) (childrel, elevel,
-											rows + numrows, childtargrows,
-											&trows, &tdrows);
+				childrows = (*acquirefunc) (childrel, rows + numrows,
+											childtargrows, &trows,
+											&tdrows, logbuf);
 
 				/* We may need to convert from child's rowtype to parent's */
 				if (childrows > 0 &&
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index fcde3876b28..b33a183b080 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -148,10 +148,9 @@ typedef void (*ExplainForeignModify_function) (ModifyTableState *mtstate,
 typedef void (*ExplainDirectModify_function) (ForeignScanState *node,
 											  struct ExplainState *es);
 
-typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
-									  HeapTuple *rows, int targrows,
-									  double *totalrows,
-									  double *totaldeadrows);
+typedef int (*AcquireSampleRowsFunc) (Relation relation, HeapTuple *rows,
+									  int targrows, double *totalrows,
+									  double *totaldeadrows, StringInfo logbuf);
 
 typedef bool (*AnalyzeForeignTable_function) (Relation relation,
 											  AcquireSampleRowsFunc *func,
-- 
2.39.3 (Apple Git-146)

#13Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Anthonin Bonnefoy (#12)
4 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

On Tue, Jul 30, 2024 at 9:21 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

A possible change would be to pass an inh flag when an acquirefunc is
called from acquire_inherited_sample_rows. The acquirefunc could then
use an alternative log format to append to logbuf. This way, we could
have a more compact format for partitioned tables.

I've just tested this; the result isn't great, as it creates
inconsistent output:

INFO: analyze of table "postgres.public.test_partition"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 rows in sample, 999 estimated total rows
"test_partition_2": scanned 5 of 5 pages, containing 1000 live tuples
and 0 dead tuples; 1000 rows in sample, 1000 estimated total rows
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
...
INFO: analyze of table "postgres.public.test_partition_1"
pages: 5 of 5 scanned
tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s

Maybe the best approach is to always use the compact form?

INFO: analyze of table "postgres.public.test_partition"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 tuples in sample, 999 estimated total tuples
"test_partition_2": scanned 5 of 5 pages, containing 1000 live tuples
and 0 dead tuples; 1000 tuples in sample, 1000 estimated total tuples
avg read rate: 1.953 MB/s, avg write rate: 0.000 MB/s
...
INFO: analyze of table "postgres.public.test_partition_1"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 tuples in sample, 999 estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s

I've updated the patchset with those changes. 0004 introduces the
StringInfo logbuf so we can emit the output as a single log entry,
including during ANALYZE VERBOSE, while using the compact form.

Regards,
Anthonin

Attachments:

v6-0004-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patchapplication/octet-stream; name=v6-0004-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patchDownload
From 49f89017b5a7f6127df635f3f6aa0b57faa303fc Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Thu, 25 Jul 2024 09:34:55 +0200
Subject: Pass StringInfo logbuf to AcquireSampleFunc

Currently, row sampling functions emit log messages with interesting
information, like scanned pages or live tuples, from within the
function itself. This prevents outputting that information during
ANALYZE VERBOSE in a consistent way (a single log line emitted at the
end of the analysis).

To fix this, this patch modifies AcquireSampleRowsFunc to accept a
StringInfo that stores the log output from the function. This lets
do_analyze_rel control when and where the sampling functions' logs are
emitted.

The elevel parameter has been removed from AcquireSampleRowsFunc's
parameters as it is now unused.

Also, "rows" was replaced by "tuples" in the log messages for consistency.
---
 contrib/file_fdw/file_fdw.c         | 21 ++++----
 contrib/postgres_fdw/postgres_fdw.c | 24 ++++-----
 doc/src/sgml/fdwhandler.sgml        |  4 +-
 src/backend/commands/analyze.c      | 82 ++++++++++++++++-------------
 src/include/foreign/fdwapi.h        |  7 ++-
 5 files changed, 69 insertions(+), 69 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a05..a42d4fe7413 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -160,9 +160,9 @@ static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
 						   FileFdwPlanState *fdw_private,
 						   Cost *startup_cost, Cost *total_cost);
-static int	file_acquire_sample_rows(Relation onerel, int elevel,
-									 HeapTuple *rows, int targrows,
-									 double *totalrows, double *totaldeadrows);
+static int	file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+									 int targrows, double *totalrows,
+									 double *totaldeadrows, StringInfo logbuf);
 
 
 /*
@@ -1120,9 +1120,9 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-file_acquire_sample_rows(Relation onerel, int elevel,
-						 HeapTuple *rows, int targrows,
-						 double *totalrows, double *totaldeadrows)
+file_acquire_sample_rows(Relation onerel, HeapTuple *rows,
+						 int targrows, double *totalrows,
+						 double *totaldeadrows, StringInfo logbuf)
 {
 	int			numrows = 0;
 	double		rowstoskip = -1;	/* -1 means not set yet */
@@ -1241,13 +1241,10 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	pfree(nulls);
 
 	/*
-	 * Emit some interesting relation info
+	 * Add some interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": file contains %.0f rows; "
-					"%d rows in sample",
-					RelationGetRelationName(onerel),
-					*totalrows, numrows)));
+	appendStringInfo(logbuf, "\"%s\": file contains %.0f tuples; %d tuples in sample\n",
+				  RelationGetRelationName(onerel), *totalrows, numrows);
 
 	return numrows;
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e217..046c49ac931 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,10 +498,9 @@ static void process_query_params(ExprContext *econtext,
 								 FmgrInfo *param_flinfo,
 								 List *param_exprs,
 								 const char **param_values);
-static int	postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows,
-										  double *totaldeadrows);
+static int	postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+										  int targrows, double *totalrows,
+										  double *totaldeadrows, StringInfo logbuf);
 static void analyze_row_processor(PGresult *res, int row,
 								  PgFdwAnalyzeState *astate);
 static void produce_tuple_asynchronously(AsyncRequest *areq, bool fetch);
@@ -5064,10 +5063,9 @@ postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
  * currently (the planner only pays attention to correlation for indexscans).
  */
 static int
-postgresAcquireSampleRowsFunc(Relation relation, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows,
-							  double *totaldeadrows)
+postgresAcquireSampleRowsFunc(Relation relation, HeapTuple *rows,
+							  int targrows, double *totalrows,
+							  double *totaldeadrows, StringInfo logbuf)
 {
 	PgFdwAnalyzeState astate;
 	ForeignTable *table;
@@ -5359,12 +5357,12 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 		*totalrows = reltuples;
 
 	/*
-	 * Emit some interesting relation info
+	 * Add some interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": table contains %.0f rows, %d rows in sample",
-					RelationGetRelationName(relation),
-					*totalrows, astate.numrows)));
+	appendStringInfo(logbuf,
+				  "\"%s\": table contains %.0f tuples; %d tuples in sample\n",
+				  RelationGetRelationName(relation),
+				  *totalrows, astate.numrows);
 
 	return astate.numrows;
 }
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index b80320504d6..d758b904ee5 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1393,11 +1393,11 @@ AnalyzeForeignTable(Relation relation,
 <programlisting>
 int
 AcquireSampleRowsFunc(Relation relation,
-                      int elevel,
                       HeapTuple *rows,
                       int targrows,
                       double *totalrows,
-                      double *totaldeadrows);
+                      double *totaldeadrows,
+                      StringInfo logbuf);
 </programlisting>
 
      A random sample of up to <parameter>targrows</parameter> rows should be collected
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8b25eb99fb8..d1f99dda2ae 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -87,13 +87,13 @@ static void compute_index_stats(Relation onerel, double totalrows,
 								MemoryContext col_context);
 static VacAttrStats *examine_attribute(Relation onerel, int attnum,
 									   Node *index_expr);
-static int	acquire_sample_rows(Relation onerel, int elevel,
-								HeapTuple *rows, int targrows,
-								double *totalrows, double *totaldeadrows);
+static int	acquire_sample_rows(Relation onerel, HeapTuple *rows,
+								int targrows, double *totalrows,
+								double *totaldeadrows, StringInfo logbuf);
 static int	compare_rows(const void *a, const void *b, void *arg);
-static int	acquire_inherited_sample_rows(Relation onerel, int elevel,
-										  HeapTuple *rows, int targrows,
-										  double *totalrows, double *totaldeadrows);
+static int	acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows,
+										  int targrows, double *totalrows,
+										  double *totaldeadrows, StringInfo logbuf);
 static void update_attstats(Oid relid, bool inh,
 							int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
@@ -310,6 +310,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
+	StringInfoData logbuf;
 
 	verbose = (params->options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
@@ -334,6 +335,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 										ALLOCSET_DEFAULT_SIZES);
 	caller_context = MemoryContextSwitchTo(anl_context);
 
+	/* Initialize log buffer */
+	initStringInfo(&logbuf);
+
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
 	 * as that user.  Also lock down security-restricted operations and
@@ -528,13 +532,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 								 inh ? PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS_INH :
 								 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
 	if (inh)
-		numrows = acquire_inherited_sample_rows(onerel, elevel,
-												rows, targrows,
-												&totalrows, &totaldeadrows);
+		numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
+												&totalrows, &totaldeadrows, &logbuf);
 	else
-		numrows = (*acquirefunc) (onerel, elevel,
-								  rows, targrows,
-								  &totalrows, &totaldeadrows);
+		numrows = (*acquirefunc) (onerel, rows, targrows,
+								  &totalrows, &totaldeadrows, &logbuf);
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -808,6 +810,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
+
+			/* Append logs from sampling function */
+			appendBinaryStringInfo(&buf, logbuf.data, logbuf.len);
+
 			if (track_io_timing)
 			{
 				double		read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
@@ -1182,9 +1188,9 @@ block_sampling_read_stream_next(ReadStream *stream,
  * density near the start of the table.
  */
 static int
-acquire_sample_rows(Relation onerel, int elevel,
-					HeapTuple *rows, int targrows,
-					double *totalrows, double *totaldeadrows)
+acquire_sample_rows(Relation onerel, HeapTuple *rows,
+					int targrows, double *totalrows,
+					double *totaldeadrows, StringInfo logbuf)
 {
 	int			numrows = 0;	/* # rows now in reservoir */
 	double		samplerows = 0; /* total # rows collected */
@@ -1321,16 +1327,16 @@ acquire_sample_rows(Relation onerel, int elevel,
 	}
 
 	/*
-	 * Emit some interesting relation info
+	 * Add interesting relation info to logbuf
 	 */
-	ereport(elevel,
-			(errmsg("\"%s\": scanned %d of %u pages, "
-					"containing %.0f live rows and %.0f dead rows; "
-					"%d rows in sample, %.0f estimated total rows",
-					RelationGetRelationName(onerel),
-					bs.m, totalblocks,
-					liverows, deadrows,
-					numrows, *totalrows)));
+	appendStringInfo(logbuf,
+				  "\"%s\": scanned %d of %u pages, "
+				  "containing %.0f live tuples and %.0f dead tuples; "
+				  "%d tuples in sample, %.0f estimated total tuples\n",
+				  RelationGetRelationName(onerel),
+				  bs.m, totalblocks,
+				  liverows, deadrows,
+				  numrows, *totalrows);
 
 	return numrows;
 }
@@ -1369,9 +1375,9 @@ compare_rows(const void *a, const void *b, void *arg)
  * children are foreign tables that don't support ANALYZE.
  */
 static int
-acquire_inherited_sample_rows(Relation onerel, int elevel,
-							  HeapTuple *rows, int targrows,
-							  double *totalrows, double *totaldeadrows)
+acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows,
+							  int targrows, double *totalrows,
+							  double *totaldeadrows, StringInfo logbuf)
 {
 	List	   *tableOIDs;
 	Relation   *rels;
@@ -1407,10 +1413,10 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		/* CCI because we already updated the pg_class row in this command */
 		CommandCounterIncrement();
 		SetRelationHasSubclass(RelationGetRelid(onerel), false);
-		ereport(elevel,
-				(errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables",
-						get_namespace_name(RelationGetNamespace(onerel)),
-						RelationGetRelationName(onerel))));
+		appendStringInfo(logbuf,
+						 "skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables\n",
+						 get_namespace_name(RelationGetNamespace(onerel)),
+						 RelationGetRelationName(onerel));
 		return 0;
 	}
 
@@ -1505,10 +1511,10 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	 */
 	if (!has_child)
 	{
-		ereport(elevel,
-				(errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no analyzable child tables",
-						get_namespace_name(RelationGetNamespace(onerel)),
-						RelationGetRelationName(onerel))));
+		appendStringInfo(logbuf,
+						 "skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no analyzable child tables\n",
+						 get_namespace_name(RelationGetNamespace(onerel)),
+						 RelationGetRelationName(onerel));
 		return 0;
 	}
 
@@ -1561,9 +1567,9 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 							tdrows;
 
 				/* Fetch a random sample of the child's rows */
-				childrows = (*acquirefunc) (childrel, elevel,
-											rows + numrows, childtargrows,
-											&trows, &tdrows);
+				childrows = (*acquirefunc) (childrel, rows + numrows,
+											childtargrows, &trows,
+											&tdrows, logbuf);
 
 				/* We may need to convert from child's rowtype to parent's */
 				if (childrows > 0 &&
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index fcde3876b28..b33a183b080 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -148,10 +148,9 @@ typedef void (*ExplainForeignModify_function) (ModifyTableState *mtstate,
 typedef void (*ExplainDirectModify_function) (ForeignScanState *node,
 											  struct ExplainState *es);
 
-typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
-									  HeapTuple *rows, int targrows,
-									  double *totalrows,
-									  double *totaldeadrows);
+typedef int (*AcquireSampleRowsFunc) (Relation relation, HeapTuple *rows,
+									  int targrows, double *totalrows,
+									  double *totaldeadrows, StringInfo logbuf);
 
 typedef bool (*AnalyzeForeignTable_function) (Relation relation,
 											  AcquireSampleRowsFunc *func,
-- 
2.39.3 (Apple Git-146)

v6-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patchapplication/octet-stream; name=v6-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patchDownload
From 3e596e05a04fb5e2d8dca4e8cab588283d0d9873 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Wed, 24 Apr 2024 10:40:33 +0200
Subject: Use pgBufferUsage for block reporting in analyze

Analyze logging within autovacuum uses dedicated variables
VacuumPage{Hit,Miss,Dirty} to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more
cases (temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze. This makes VacuumPage{Hit,Miss,Dirty} completely unused and
removable. This commit removes both their updates in bufmgr and their
declarations. It also renames "buffer misses" to "buffer reads" in the
log output for better consistency.
---
 src/backend/access/heap/vacuumlazy.c  | 22 ++++++++++----
 src/backend/commands/analyze.c        | 41 ++++++++++++++-------------
 src/backend/commands/vacuum.c         |  3 --
 src/backend/commands/vacuumparallel.c |  3 --
 src/backend/storage/buffer/bufmgr.c   |  4 ---
 src/backend/utils/init/globals.c      |  4 ---
 src/include/miscadmin.h               |  4 ---
 7 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 835b53415d0..d82aa3d4896 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -608,6 +608,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int32		diff;
 			double		read_rate = 0,
 						write_rate = 0;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
 			memset(&walusage, 0, sizeof(WalUsage));
@@ -615,6 +618,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
+
 			initStringInfo(&buf);
 			if (verbose)
 			{
@@ -740,18 +750,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			if (secs_dur > 0 || usecs_dur > 0)
 			{
-				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+				read_rate = (double) BLCKSZ * total_blks_read /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
-				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
-							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
-							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf,
 							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
 							 (long long) walusage.wal_records,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c590a2adc35..50159232218 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -736,15 +735,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
-			/*
-			 * Calculate the difference in the Page Hit/Miss/Dirty that
-			 * happened as part of the analyze by subtracting out the
-			 * pre-analyze values which we saved above.
-			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -770,10 +773,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * total_blks_read /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -796,10 +799,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab2022..7d8e9d20454 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2a..22c057fe61b 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4415ba648ae..fb38c7bdd45 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1192,7 +1192,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1588,7 +1587,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2582,7 +2580,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5122,7 +5119,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d8debd160e4..03a54451ac2 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -153,10 +153,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index ac16233b71f..25348e71eb9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -284,10 +284,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3 (Apple Git-146)

v6-0002-Output-analyze-details-on-analyze-verbose.patchapplication/octet-stream; name=v6-0002-Output-analyze-details-on-analyze-verbose.patchDownload
From 9dc332abe55c5689582715e11610075661a89903 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Tue, 30 Jul 2024 08:36:09 +0200
Subject: Output analyze details on analyze verbose

Currently, analyze details like buffer, I/O, and system usage are only
displayed in autoanalyze's logs.

To make the behaviour more consistent with VACUUM VERBOSE, this patch
outputs those details on ANALYZE VERBOSE as well.
---
 src/backend/commands/analyze.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 50159232218..7fe2736777b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -308,6 +310,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -339,8 +344,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	save_nestlevel = NewGUCNestLevel();
 	RestrictSearchPath();
 
-	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	/*
+	 * measure elapsed time if called with verbose or if autovacuum logging
+	 * requires it
+	 */
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -723,17 +731,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
 			double		read_rate = 0;
 			double		write_rate = 0;
+			char	   *msgfmt;
 			StringInfoData buf;
 			int64		total_blks_hit;
 			int64		total_blks_read;
@@ -785,7 +794,13 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+
+			if (AmAutoVacuumWorkerProcess())
+				msgfmt = _("automatic analyze of table \"%s.%s.%s\"\n");
+			else
+				msgfmt = _("analyze of table \"%s.%s.%s\"\n");
+
+			appendStringInfo(&buf, msgfmt,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -805,7 +820,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3 (Apple Git-146)

v6-0003-Output-wal-usage-of-analyze-in-verbose-output-and.patchapplication/octet-stream; name=v6-0003-Output-wal-usage-of-analyze-in-verbose-output-and.patchDownload
From 6b039ab5eea171fbf55afb8e218f093e300e5c34 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Tue, 30 Jul 2024 08:39:15 +0200
Subject: Output wal usage of analyze in verbose output and logs

To make analyze output more consistent with vacuum output, this patch
adds the WAL usage of analyze to both the verbose output and
autoanalyze's log output.
---
 src/backend/commands/analyze.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7fe2736777b..8b25eb99fb8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -305,6 +305,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
@@ -740,6 +741,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
 			char	   *msgfmt;
@@ -750,6 +752,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			total_blks_hit = bufferusage.shared_blks_hit +
 				bufferusage.local_blks_hit;
@@ -818,6 +822,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_hit,
 							 (long long) total_blks_read,
 							 (long long) total_blks_dirtied);
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(verbose ? INFO : LOG,
-- 
2.39.3 (Apple Git-146)

#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Anthonin Bonnefoy (#13)
2 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

On Wed, Jul 31, 2024 at 12:03 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Tue, Jul 30, 2024 at 9:21 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

A possible change would be to pass an inh flag when an acquirefunc is
called from acquire_inherited_sample_rows. The acquirefunc could then
use an alternative log format to append to logbuf. This way, we could
have a more compact format for partitioned tables.

I've just tested this, the result isn't great as it creates an
inconsistent output

INFO: analyze of table "postgres.public.test_partition"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 rows in sample, 999 estimated total rows
"test_partition_2": scanned 5 of 5 pages, containing 1000 live tuples
and 0 dead tuples; 1000 rows in sample, 1000 estimated total rows
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
...
INFO: analyze of table "postgres.public.test_partition_1"
pages: 5 of 5 scanned
tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s

Maybe the best approach is to always use the compact form?

INFO: analyze of table "postgres.public.test_partition"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 tuples in sample, 999 estimated total tuples
"test_partition_2": scanned 5 of 5 pages, containing 1000 live tuples
and 0 dead tuples; 1000 tuples in sample, 1000 estimated total tuples
avg read rate: 1.953 MB/s, avg write rate: 0.000 MB/s
...
INFO: analyze of table "postgres.public.test_partition_1"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 tuples in sample, 999 estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s

I've updated the patchset with those changes. 0004 introduces the
StringInfo logbuf so we can emit the output as a single log entry and
use the compact form during ANALYZE VERBOSE as well.

Fair point. I'll consider a better output format.

Meanwhile, I think we can push the 0001 and 0002 patches since they are
in good shape. I've updated their commit messages and slightly changed
the 0002 patch to write "finished analyzing table \"%s.%s.%s\"" instead
of "analyze of table \"%s.%s.%s\"".

Also, regarding 0003 patch, what is the main reason why we want to add
WAL usage to analyze reports? I think that analyze normally does not
write WAL records much so I'm not sure it's going to provide a good
insight for users.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-Use-pgBufferUsage-for-buffer-usage-tracking-in-an.patch (application/octet-stream)
From 8580beab337c57dd51f3009d230af17e54384079 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 31 Jul 2024 10:33:14 -0700
Subject: [PATCH v7 1/3] Use pgBufferUsage for buffer usage tracking in
 analyze.

Previously, (auto)analyze used global variables VacuumPageHit,
VacuumPageMiss, and VacuumPageDirty to track buffer usage. However,
pgBufferUsage provides a more generic way to track buffer usage with
support functions.

This change replaces those global variables with pgBufferUsage in
analyze. Since analyze was the sole user of those variables, it
removes their declarations. Vacuum also used those variables, but its
usage was already replaced with pgBufferUsage as part of a bug fix,
commit 5cd72cc0c.

Additionally, it adjusts the buffer usage message in both vacuum and
analyze for better consistency.

Author: Anthonin Bonnefoy
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c  | 22 ++++++++++----
 src/backend/commands/analyze.c        | 41 ++++++++++++++-------------
 src/backend/commands/vacuum.c         |  3 --
 src/backend/commands/vacuumparallel.c |  3 --
 src/backend/storage/buffer/bufmgr.c   |  4 ---
 src/backend/utils/init/globals.c      |  4 ---
 src/include/miscadmin.h               |  4 ---
 7 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 835b53415d..d82aa3d489 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -608,6 +608,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			int32		diff;
 			double		read_rate = 0,
 						write_rate = 0;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
 			TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
 			memset(&walusage, 0, sizeof(WalUsage));
@@ -615,6 +618,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
 
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
+
 			initStringInfo(&buf);
 			if (verbose)
 			{
@@ -740,18 +750,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			}
 			if (secs_dur > 0 || usecs_dur > 0)
 			{
-				read_rate = (double) BLCKSZ * (bufferusage.shared_blks_read + bufferusage.local_blks_read) /
+				read_rate = (double) BLCKSZ * total_blks_read /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
-				write_rate = (double) BLCKSZ * (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied) /
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
 					(1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
 			appendStringInfo(&buf,
-							 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
-							 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
-							 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
+							 _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf,
 							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
 							 (long long) walusage.wal_records,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c590a2adc3..5015923221 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -303,9 +303,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
-	int64		AnalyzePageHit = VacuumPageHit;
-	int64		AnalyzePageMiss = VacuumPageMiss;
-	int64		AnalyzePageDirty = VacuumPageDirty;
+	BufferUsage startbufferusage = pgBufferUsage;
+	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
@@ -736,15 +735,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			double		read_rate = 0;
 			double		write_rate = 0;
 			StringInfoData buf;
+			int64		total_blks_hit;
+			int64		total_blks_read;
+			int64		total_blks_dirtied;
 
-			/*
-			 * Calculate the difference in the Page Hit/Miss/Dirty that
-			 * happened as part of the analyze by subtracting out the
-			 * pre-analyze values which we saved above.
-			 */
-			AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-			AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-			AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+			memset(&bufferusage, 0, sizeof(BufferUsage));
+			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+			total_blks_hit = bufferusage.shared_blks_hit +
+				bufferusage.local_blks_hit;
+			total_blks_read = bufferusage.shared_blks_read +
+				bufferusage.local_blks_read;
+			total_blks_dirtied = bufferusage.shared_blks_dirtied +
+				bufferusage.local_blks_dirtied;
 
 			/*
 			 * We do not expect an analyze to take > 25 days and it simplifies
@@ -770,10 +773,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			if (delay_in_ms > 0)
 			{
-				read_rate = (double) BLCKSZ * AnalyzePageMiss / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
-				write_rate = (double) BLCKSZ * AnalyzePageDirty / (1024 * 1024) /
-					(delay_in_ms / 1000.0);
+				read_rate = (double) BLCKSZ * total_blks_read /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
+				write_rate = (double) BLCKSZ * total_blks_dirtied /
+					(1024 * 1024) / (delay_in_ms / 1000.0);
 			}
 
 			/*
@@ -796,10 +799,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 							 read_rate, write_rate);
-			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
-							 (long long) AnalyzePageHit,
-							 (long long) AnalyzePageMiss,
-							 (long long) AnalyzePageDirty);
+			appendStringInfo(&buf, _("buffer usage: %lld hits, %lld reads, %lld dirtied\n"),
+							 (long long) total_blks_hit,
+							 (long long) total_blks_read,
+							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(LOG,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab202..7d8e9d2045 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -603,9 +603,6 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 		VacuumFailsafeActive = false;
 		VacuumUpdateCosts();
 		VacuumCostBalance = 0;
-		VacuumPageHit = 0;
-		VacuumPageMiss = 0;
-		VacuumPageDirty = 0;
 		VacuumCostBalanceLocal = 0;
 		VacuumSharedCostBalance = NULL;
 		VacuumActiveNWorkers = NULL;
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..22c057fe61 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1043,9 +1043,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Set cost-based vacuum delay */
 	VacuumUpdateCosts();
 	VacuumCostBalance = 0;
-	VacuumPageHit = 0;
-	VacuumPageMiss = 0;
-	VacuumPageDirty = 0;
 	VacuumCostBalanceLocal = 0;
 	VacuumSharedCostBalance = &(shared->cost_balance);
 	VacuumActiveNWorkers = &(shared->active_nworkers);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4415ba648a..fb38c7bdd4 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1192,7 +1192,6 @@ PinBufferForBlock(Relation rel,
 	}
 	if (*foundPtr)
 	{
-		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
@@ -1588,7 +1587,6 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 											  false);
 		}
 
-		VacuumPageMiss += io_buffers_len;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
@@ -2582,7 +2580,6 @@ MarkBufferDirty(Buffer buffer)
 	 */
 	if (!(old_buf_state & BM_DIRTY))
 	{
-		VacuumPageDirty++;
 		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
@@ -5122,7 +5119,6 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		if (dirtied)
 		{
-			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d8debd160e..03a54451ac 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -153,10 +153,6 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int64		VacuumPageHit = 0;
-int64		VacuumPageMiss = 0;
-int64		VacuumPageDirty = 0;
-
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index ac16233b71..25348e71eb 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -284,10 +284,6 @@ extern PGDLLIMPORT int VacuumCostPageDirty;
 extern PGDLLIMPORT int VacuumCostLimit;
 extern PGDLLIMPORT double VacuumCostDelay;
 
-extern PGDLLIMPORT int64 VacuumPageHit;
-extern PGDLLIMPORT int64 VacuumPageMiss;
-extern PGDLLIMPORT int64 VacuumPageDirty;
-
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
-- 
2.39.3

v7-0002-Add-resource-statistics-reporting-to-ANALYZE-VERB.patch (application/octet-stream)
From e79b5dc65c0e0742e9644a8a772a8397aa70d558 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 31 Jul 2024 10:36:08 -0700
Subject: [PATCH v7 2/3] Add resource statistics reporting to ANALYZE VERBOSE.

Previously, log_autovacuum_min_duration utilized dedicated code for
logging resource statistics, such as system and buffer usage during
autoanalyze. However, this logging functionality was not utilized by
ANALYZE VERBOSE.

This commit adds resource statistics reporting to ANALYZE VERBOSE by
reusing the same logging code as autoanalyze. The detailed sampling
results will continue to be logged at the INFO message level.

Author: Anthonin Bonnefoy
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
---
 src/backend/commands/analyze.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5015923221..902eb1a450 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -288,7 +288,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 				ind;
 	Relation   *Irel;
 	int			nindexes;
-	bool		hasindex;
+	bool		verbose,
+				instrument,
+				hasindex;
 	VacAttrStats **vacattrstats;
 	AnlIndexData *indexdata;
 	int			targrows,
@@ -308,6 +310,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	verbose = (params->options & VACOPT_VERBOSE) != 0;
+	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
+							  params->log_min_duration >= 0));
 	if (inh)
 		ereport(elevel,
 				(errmsg("analyzing \"%s.%s\" inheritance tree",
@@ -339,8 +344,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	save_nestlevel = NewGUCNestLevel();
 	RestrictSearchPath();
 
-	/* measure elapsed time iff autovacuum logging requires it */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	/*
+	 * measure elapsed time if called with verbose or if autovacuum logging
+	 * requires it
+	 */
+	if (instrument)
 	{
 		if (track_io_timing)
 		{
@@ -723,17 +731,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/* Log the action if appropriate */
-	if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (instrument)
 	{
 		TimestampTz endtime = GetCurrentTimestamp();
 
-		if (params->log_min_duration == 0 ||
+		if (verbose || params->log_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, endtime,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
 			double		read_rate = 0;
 			double		write_rate = 0;
+			char	   *msgfmt;
 			StringInfoData buf;
 			int64		total_blks_hit;
 			int64		total_blks_read;
@@ -785,7 +794,13 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			 */
 
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic analyze of table \"%s.%s.%s\"\n"),
+
+			if (AmAutoVacuumWorkerProcess())
+				msgfmt = _("automatic analyze of table \"%s.%s.%s\"\n");
+			else
+				msgfmt = _("finished analyzing table \"%s.%s.%s\"\n");
+
+			appendStringInfo(&buf, msgfmt,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel));
@@ -805,7 +820,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_dirtied);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(LOG,
+			ereport(verbose ? INFO : LOG,
 					(errmsg_internal("%s", buf.data)));
 
 			pfree(buf.data);
-- 
2.39.3

#15Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Masahiko Sawada (#14)
Re: Use pgBufferUsage for block reporting in analyze

On Wed, Jul 31, 2024 at 9:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Meanwhile, I think we can push the 0001 and 0002 patches since they are
in good shape. I've updated their commit messages and slightly changed
the 0002 patch to write "finished analyzing table \"%s.%s.%s\"" instead
of "analyze of table \"%s.%s.%s\"".

Wouldn't it make sense to do the same for autoanalyze and write
"finished automatic analyze of table \"%s.%s.%s\"\n" instead of
"automatic analyze of table \"%s.%s.%s\"\n"?

Also, regarding 0003 patch, what is the main reason why we want to add
WAL usage to analyze reports? I think that analyze normally does not
write WAL records much so I'm not sure it's going to provide a good
insight for users.

There was no strong reason except for consistency with VACUUM VERBOSE
output. But as you said, it's not really providing valuable
information so it's probably better to keep the noise down and drop
it.

Regards,
Anthonin Bonnefoy

#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Anthonin Bonnefoy (#15)
Re: Use pgBufferUsage for block reporting in analyze

On Wed, Jul 31, 2024 at 11:27 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Wed, Jul 31, 2024 at 9:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Meanwhile, I think we can push the 0001 and 0002 patches since they are
in good shape. I've updated their commit messages and slightly changed
the 0002 patch to write "finished analyzing table \"%s.%s.%s\"" instead
of "analyze of table \"%s.%s.%s\"".

Wouldn't it make sense to do the same for autoanalyze and write
"finished automatic analyze of table \"%s.%s.%s\"\n" instead of
"automatic analyze of table \"%s.%s.%s\"\n"?

I think that the current style is consistent with autovacuum logs:

2024-08-01 16:04:48.088 PDT [12302] LOG: automatic vacuum of table
"postgres.public.test": index scans: 0
pages: 0 removed, 443 remain, 443 scanned (100.00% of total)
tuples: 0 removed, 100000 remain, 0 are dead but not yet removable
removable cutoff: 751, which was 0 XIDs old when operation ended
new relfrozenxid: 739, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had
0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 1.466 MB/s
buffer usage: 905 hits, 0 reads, 4 dirtied
system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.02 s
2024-08-01 16:04:48.125 PDT [12302] LOG: automatic analyze of table
"postgres.public.test"
avg read rate: 5.551 MB/s, avg write rate: 0.617 MB/s
buffer usage: 512 hits, 27 reads, 3 dirtied
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.03 s

Since the ANALYZE command writes a start log, I think it makes sense to
write "finished" at the end of the operation:

=# analyze verbose test;
INFO: analyzing "public.test"
INFO: "test": scanned 443 of 443 pages, containing 100000 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
INFO: finished analyzing table "postgres.public.test"
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 549 hits, 0 reads, 0 dirtied
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.03 s
ANALYZE

Also, regarding 0003 patch, what is the main reason why we want to add
WAL usage to analyze reports? I think that analyze normally does not
write WAL records much so I'm not sure it's going to provide a good
insight for users.

There was no strong reason except for consistency with VACUUM VERBOSE
output. But as you said, it's not really providing valuable
information so it's probably better to keep the noise down and drop
it.

Okay. I think writing WAL usage would not be very noisy and could
probably help in cases where (auto)analyze unexpectedly writes many WAL
records (e.g., many full page images), and it is consistent with the
(auto)vacuum logs as you mentioned. So let's go in this direction
unless others think differently.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#17Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#16)
1 attachment(s)
Re: Use pgBufferUsage for block reporting in analyze

On Fri, Aug 2, 2024 at 8:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 31, 2024 at 11:27 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Wed, Jul 31, 2024 at 9:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Meanwhile, I think we can push the 0001 and 0002 patches since they are
in good shape. I've updated their commit messages and slightly changed
the 0002 patch to write "finished analyzing table \"%s.%s.%s\"" instead
of "analyze of table \"%s.%s.%s\"".

Wouldn't it make sense to do the same for autoanalyze and write
"finished automatic analyze of table \"%s.%s.%s\"\n" instead of
"automatic analyze of table \"%s.%s.%s\"\n"?

I think that the current style is consistent with autovacuum logs:

2024-08-01 16:04:48.088 PDT [12302] LOG: automatic vacuum of table
"postgres.public.test": index scans: 0
pages: 0 removed, 443 remain, 443 scanned (100.00% of total)
tuples: 0 removed, 100000 remain, 0 are dead but not yet removable
removable cutoff: 751, which was 0 XIDs old when operation ended
new relfrozenxid: 739, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had
0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 1.466 MB/s
buffer usage: 905 hits, 0 reads, 4 dirtied
system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.02 s
2024-08-01 16:04:48.125 PDT [12302] LOG: automatic analyze of table
"postgres.public.test"
avg read rate: 5.551 MB/s, avg write rate: 0.617 MB/s
buffer usage: 512 hits, 27 reads, 3 dirtied
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.03 s

Since the ANALYZE command writes a start log, I think it makes sense to
write "finished" at the end of the operation:

=# analyze verbose test;
INFO: analyzing "public.test"
INFO: "test": scanned 443 of 443 pages, containing 100000 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
INFO: finished analyzing table "postgres.public.test"
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 549 hits, 0 reads, 0 dirtied
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.03 s
ANALYZE

Committed 0001 and 0002 patches.

Also, regarding 0003 patch, what is the main reason why we want to add
WAL usage to analyze reports? I think that analyze normally does not
write WAL records much so I'm not sure it's going to provide a good
insight for users.

There was no strong reason except for consistency with VACUUM VERBOSE
output. But as you said, it's not really providing valuable
information so it's probably better to keep the noise down and drop
it.

Okay. I think writing WAL usage would not be very noisy and could
probably help in cases where (auto)analyze unexpectedly writes many WAL
records (e.g., many full page images), and it is consistent with the
(auto)vacuum logs as you mentioned. So let's go in this direction
unless others think differently.

I've updated the patch to add WAL usage to analyze. I'm going to push
it this week, barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-Add-WAL-usage-reporting-to-ANALYZE.patch (application/octet-stream)
From 4a9967e55cc1b9821c9bd2f16dddfdb0854d8363 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Date: Tue, 30 Jul 2024 08:39:15 +0200
Subject: [PATCH v8] Add WAL usage reporting to ANALYZE.

This change adds WAL usage reporting to the output of ANALYZE VERBOSE
and autoanalyze reports. It aligns the analyze output with VACUUM,
providing consistency. Additionally, it aids in troubleshooting cases
where unexpected WAL records are generated during analyze operations.

Author: Anthonin Bonnefoy
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
---
 src/backend/commands/analyze.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 902eb1a450..38fb4c3ef2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -305,6 +305,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	WalUsage	startwalusage = pgWalUsage;
 	BufferUsage startbufferusage = pgBufferUsage;
 	BufferUsage bufferusage;
 	PgStat_Counter startreadtime = 0;
@@ -740,6 +741,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 									   params->log_min_duration))
 		{
 			long		delay_in_ms;
+			WalUsage	walusage;
 			double		read_rate = 0;
 			double		write_rate = 0;
 			char	   *msgfmt;
@@ -750,6 +752,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 
 			memset(&bufferusage, 0, sizeof(BufferUsage));
 			BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+			memset(&walusage, 0, sizeof(WalUsage));
+			WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
 
 			total_blks_hit = bufferusage.shared_blks_hit +
 				bufferusage.local_blks_hit;
@@ -818,6 +822,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 							 (long long) total_blks_hit,
 							 (long long) total_blks_read,
 							 (long long) total_blks_dirtied);
+			appendStringInfo(&buf,
+							 _("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
+							 (long long) walusage.wal_records,
+							 (long long) walusage.wal_fpi,
+							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
 			ereport(verbose ? INFO : LOG,
-- 
2.39.3