cleanup patches for incremental backup

Started by Robert Haasabout 2 years ago40 messages
#1Robert Haas
robertmhaas@gmail.com
4 attachment(s)

Hi,

I discovered that my patch to add WAL summarization added two new
SQL-callable functions but failed to document them. 0001 fixes that.

An outstanding item from the original thread was to write a better
test for the not-yet-committed pg_walsummary utility. But I discovered
that I couldn't do that because there were some race conditions that
couldn't easily be cured. So 0002 therefore adds a new function
pg_get_wal_summarizer_state() which returns various items of in-memory
state related to WAL summarization. We had some brief discussion of
this being desirable for other reasons; it's nice for users to be able
to look at this information in case of trouble (say, if the summarizer
is not keeping up).

0003 then adds the previously-proposed pg_walsummary utility, with
tests that depend on 0002.

0004 attempts to fix some problems detected by Coverity and subsequent
code inspection.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Document-WAL-summarization-information-functions.patchapplication/octet-stream; name=v1-0001-Document-WAL-summarization-information-functions.patchDownload
From 414f1dbbaf23072abeeebfa06a9950b67a82b71d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 4 Jan 2024 09:01:07 -0500
Subject: [PATCH v1 1/4] Document WAL summarization information functions.

Commit 174c480508ac25568561443e6d4a82d5c1103487 added two new
information functions but failed to document them anywhere.
---
 doc/src/sgml/func.sgml | 80 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cec21e42c0..de78d58d4b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26480,6 +26480,86 @@ SELECT collation for ('foo' COLLATE "de_DE");
 
   </sect2>
 
+  <sect2 id="functions-info-wal-summary">
+   <title>WAL Summarization Information Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-wal-summary"/>
+    print information about the status of WAL summarization.
+    See <xref linkend="guc-summarize-wal" />.
+   </para>
+
+   <table id="functions-wal-summary">
+    <title>WAL Summarization Information Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_available_wal_summaries</primary>
+        </indexterm>
+        <function>pg_available_wal_summaries</function> ()
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>tli</parameter> <type>bigint</type>,
+        <parameter>start_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>end_lsn</parameter> <type>pg_lsn</type> )
+       </para>
+       <para>
+        Returns information about the WAL summary files present in the
+        data directory, under <literal>pg_wal/summaries</literal>.
+        One row will be returned per WAL summary file. Each file summarizes
+        WAL on the indicated TLI within the indicated LSN range. This function
+        might be useful to determine whether enough WAL summaries are present
+        on the server to take an incremental backup based on some prior
+        backup whose start LSN is known.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_wal_summary_contents</primary>
+        </indexterm>
+        <function>pg_wal_summary_contents</function> ( <parameter>tli</parameter> <type>bigint</type>, <parameter>start_lsn</parameter> <type>pg_lsn</type>, <parameter>end_lsn</parameter> <type>pg_lsn</type> )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>relfilenode</parameter> <type>oid</type>,
+        <parameter>reltablespace</parameter> <type>oid</type>,
+        <parameter>reldatabase</parameter> <type>oid</type>,
+        <parameter>relforknumber</parameter> <type>smallint</type>,
+        <parameter>relblocknumber</parameter> <type>bigint</type>,
+        <parameter>is_limit_block</parameter> <type>boolean</type> )
+       </para>
+       <para>
+        Returns one information about the contents of a single WAL summary file
+        identified by TLI and starting and ending LSNs. Each row with
+        <literal>is_limit_block</literal> false indicates that the block
+        identified by the remaining output columns was modified by at least
+        one WAL record within the range of records summarized by this file.
+        Each row with <literal>is_limit_block</literal> true indicates either
+        that (a) the relation fork was truncated to the length given by
+        <literal>relblocknumber</literal> within the relevant range of WAL
+        records or (b) that the relation fork was created or dropped within
+        the relevant range of WAL records; in such cases,
+        <literal>relblocknumber</literal> will be zero.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   </sect1>
 
   <sect1 id="functions-admin">
-- 
2.39.3 (Apple Git-145)

v1-0002-Add-new-function-pg_get_wal_summarizer_state.patchapplication/octet-stream; name=v1-0002-Add-new-function-pg_get_wal_summarizer_state.patchDownload
From f72c824bcfd16c6ef23b0597495d52dc2e287c86 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 4 Jan 2024 13:37:41 -0500
Subject: [PATCH v1 2/4] Add new function pg_get_wal_summarizer_state().

This makes it possible to access information about the progress
of WAL summarization from SQL. The previously-added functions
pg_available_wal_summaries() and pg_wal_summary_contents() only
examine on-disk state, but this function exposes information from
the server's shared memory.

XXX. Bump catversion.
---
 doc/src/sgml/func.sgml                 | 28 +++++++++++
 src/backend/backup/walsummaryfuncs.c   | 39 ++++++++++++++++
 src/backend/postmaster/walsummarizer.c | 65 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat        |  9 ++++
 src/include/postmaster/walsummarizer.h |  4 ++
 5 files changed, 145 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index de78d58d4b..0f7d409e60 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26554,6 +26554,34 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <literal>relblocknumber</literal> will be zero.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_get_wal_summarizer_state</primary>
+        </indexterm>
+        <function>pg_get_wal_summarizer_state</function> ()
+        <returnvalue>record</returnvalue>
+        ( <parameter>summarized_tli</parameter> <type>bigint</type>,
+        <parameter>summarized_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>pending_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>summarizer_pid</parameter> <type>int</type> )
+       </para>
+       <para>
+        Returns information about the progress of the WAL summarizer. If the
+        WAL summarizer has never run since the instance was started, then
+        <literal>summarized_tli</literal> and <literal>summarized_lsn</literal>
+        will be <literal>0</literal> and <literal>0/0</literal> respectively;
+        otherwise, they will be the TLI and ending LSN of the last WAL summary
+        file written to disk. If the WAL summarizer is currently running,
+        <literal>pending_lsn</literal> will be the ending LSN of the last
+        record that it has consumed, which must always be greater than or
+        equal to <literal>summarized_lsn</literal>; if the WAL summarizer is
+        not running, it will be equal to <literal>summarized_lsn</literal>.
+        <literal>summarized_pid</literal> is the PID of the WAL summarizer
+        process, if it is running, and otherwise NULL.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/backup/walsummaryfuncs.c b/src/backend/backup/walsummaryfuncs.c
index 89c660c696..f082488b33 100644
--- a/src/backend/backup/walsummaryfuncs.c
+++ b/src/backend/backup/walsummaryfuncs.c
@@ -16,11 +16,13 @@
 #include "common/blkreftable.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/walsummarizer.h"
 #include "utils/fmgrprotos.h"
 #include "utils/pg_lsn.h"
 
 #define NUM_WS_ATTS			3
 #define NUM_SUMMARY_ATTS	6
+#define NUM_STATE_ATTS		4
 #define MAX_BLOCKS_PER_CALL	256
 
 /*
@@ -167,3 +169,40 @@ pg_wal_summary_contents(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+/*
+ * Returns information about the state of the WAL summarizer process.
+ */
+Datum
+pg_get_wal_summarizer_state(PG_FUNCTION_ARGS)
+{
+	Datum		values[NUM_STATE_ATTS];
+	bool		nulls[NUM_STATE_ATTS];
+	TimeLineID	summarized_tli;
+	XLogRecPtr	summarized_lsn;
+	XLogRecPtr	pending_lsn;
+	int			summarizer_pid;
+	TupleDesc	tupdesc;
+	HeapTuple	htup;
+
+	GetWalSummarizerState(&summarized_tli, &summarized_lsn, &pending_lsn,
+						  &summarizer_pid);
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	memset(nulls, 0, sizeof(nulls));
+
+	values[0] = Int64GetDatum((int64) summarized_tli);
+	values[1] = LSNGetDatum(summarized_lsn);
+	values[2] = LSNGetDatum(pending_lsn);
+
+	if (summarizer_pid < 0)
+		nulls[3] = true;
+	else
+		values[3] = Int32GetDatum(summarizer_pid);
+
+	htup = heap_form_tuple(tupdesc, values, nulls);
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index f828cc436a..7287f07529 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -142,6 +142,7 @@ static XLogRecPtr redo_pointer_at_last_summary_removal = InvalidXLogRecPtr;
 bool		summarize_wal = false;
 int			wal_summary_keep_time = 10 * 24 * 60;
 
+static void WalSummarizerShutdown(int code, Datum arg);
 static XLogRecPtr GetLatestLSN(TimeLineID *tli);
 static void HandleWalSummarizerInterrupts(void);
 static XLogRecPtr SummarizeWAL(TimeLineID tli, XLogRecPtr start_lsn,
@@ -245,6 +246,7 @@ WalSummarizerMain(void)
 	pqsignal(SIGUSR2, SIG_IGN); /* not used */
 
 	/* Advertise ourselves. */
+	on_shmem_exit(WalSummarizerShutdown, (Datum) 0);
 	LWLockAcquire(WALSummarizerLock, LW_EXCLUSIVE);
 	WalSummarizerCtl->summarizer_pgprocno = MyProc->pgprocno;
 	LWLockRelease(WALSummarizerLock);
@@ -417,6 +419,57 @@ WalSummarizerMain(void)
 	}
 }
 
+/*
+ * Get information about the state of the WAL summarizer.
+ */
+void
+GetWalSummarizerState(TimeLineID *summarized_tli, XLogRecPtr *summarized_lsn,
+					  XLogRecPtr *pending_lsn, int *summarizer_pid)
+{
+	LWLockAcquire(WALSummarizerLock, LW_SHARED);
+	if (!WalSummarizerCtl->initialized)
+	{
+		/*
+		 * If initialized is false, the rest of the structure contents are
+		 * undefined.
+		 */
+		*summarized_tli = 0;
+		*summarized_lsn = InvalidXLogRecPtr;
+		*pending_lsn = InvalidXLogRecPtr;
+		*summarizer_pid = -1;
+	}
+	else
+	{
+		int	summarizer_pgprocno = WalSummarizerCtl->summarizer_pgprocno;
+
+		*summarized_tli = WalSummarizerCtl->summarized_tli;
+		*summarized_lsn = WalSummarizerCtl->summarized_lsn;
+		if (summarizer_pgprocno == INVALID_PGPROCNO)
+		{
+			/*
+			 * If the summarizer has exited, the fact that it had processed
+			 * beyond summarized_lsn is irrelevant now.
+			 */
+			*pending_lsn = WalSummarizerCtl->summarized_lsn;
+			*summarizer_pid = -1;
+		}
+		else
+		{
+			*pending_lsn = WalSummarizerCtl->pending_lsn;
+
+			/*
+			 * We're not fussed about inexact answers here, since they could
+			 * become stale instantly, so we don't bother taking the lock, but
+			 * make sure that invalid PID values are normalized to -1.
+			 */
+			*summarizer_pid = GetPGProcByNumber(summarizer_pgprocno)->pid;
+			if (*summarizer_pid <= 0)
+				*summarizer_pid = -1;
+		}
+	}
+	LWLockRelease(WALSummarizerLock);
+}
+
 /*
  * Get the oldest LSN in this server's timeline history that has not yet been
  * summarized.
@@ -622,6 +675,18 @@ WaitForWalSummarization(XLogRecPtr lsn, long timeout, XLogRecPtr *pending_lsn)
 	return summarized_lsn;
 }
 
+/*
+ * On exit, update shared memory to make it clear that we're no longer
+ * running.
+ */
+static void
+WalSummarizerShutdown(int code, Datum arg)
+{
+	LWLockAcquire(WALSummarizerLock, LW_EXCLUSIVE);
+	WalSummarizerCtl->summarizer_pgprocno = INVALID_PGPROCNO;
+	LWLockRelease(WALSummarizerLock);
+}
+
 /*
  * Get the latest LSN that is eligible to be summarized, and set *tli to the
  * corresponding timeline.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7979392776..58811a6530 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12142,5 +12142,14 @@
   proargmodes => '{i,i,i,o,o,o,o,o,o}',
   proargnames => '{tli,start_lsn,end_lsn,relfilenode,reltablespace,reldatabase,relforknumber,relblocknumber,is_limit_block}',
   prosrc => 'pg_wal_summary_contents' },
+{ oid => '8438',
+  descr => 'WAL summarizer state',
+  proname => 'pg_get_wal_summarizer_state',
+  provolatile => 'v', proparallel => 's',
+  prorettype => 'record', proargtypes => '',
+  proallargtypes => '{int8,pg_lsn,pg_lsn,int4}',
+  proargmodes => '{o,o,o,o}',
+  proargnames => '{summarized_tli,summarized_lsn,pending_lsn,summarizer_pid}',
+  prosrc => 'pg_get_wal_summarizer_state' },
 
 ]
diff --git a/src/include/postmaster/walsummarizer.h b/src/include/postmaster/walsummarizer.h
index 75cb1d709f..d6e61b59ac 100644
--- a/src/include/postmaster/walsummarizer.h
+++ b/src/include/postmaster/walsummarizer.h
@@ -23,6 +23,10 @@ extern Size WalSummarizerShmemSize(void);
 extern void WalSummarizerShmemInit(void);
 extern void WalSummarizerMain(void) pg_attribute_noreturn();
 
+extern void GetWalSummarizerState(TimeLineID *summarized_tli,
+								  XLogRecPtr *summarized_lsn,
+								  XLogRecPtr *pending_lsn,
+								  int *summarizer_pid);
 extern XLogRecPtr GetOldestUnsummarizedLSN(TimeLineID *tli,
 										   bool *lsn_is_exact,
 										   bool reset_pending_lsn);
-- 
2.39.3 (Apple Git-145)

v1-0004-Repair-various-defects-in-dc212340058b4e7ecfc5a7a.patchapplication/octet-stream; name=v1-0004-Repair-various-defects-in-dc212340058b4e7ecfc5a7a.patchDownload
From 211d49090a0f2f9579140653eca632dac1f9d5c4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 5 Jan 2024 09:09:34 -0500
Subject: [PATCH v1 4/4] Repair various defects in
 dc212340058b4e7ecfc5a7a81ec50e7a207bf288.

pg_combinebackup had various problems:

* strncpy was used in various places where should be used instead,
  to avoid any possibility of the result not being \0-terminated.
* scan_for_existing_tablespaces() failed to close the directory,
  and an error when opening the directory was reported with the
  wrong pathname.
* write_reconstructed_file() contained some redundant and therefore
  dead code.
* flush_manifest() didn't check the result of pg_checksum_update()
  as we do in other places, and misused a local pathname variable
  that shouldn't exist at all.

In pg_basebackup, the wrong variable name was used in one place,
due to a copy and paste that was not properly adjusted.

Per Coverity and subsequent code inspection.
---
 src/bin/pg_basebackup/pg_basebackup.c       |  2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c | 15 +++++++++------
 src/bin/pg_combinebackup/reconstruct.c      |  5 ++---
 src/bin/pg_combinebackup/write_manifest.c   | 10 +++++-----
 4 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index b734cce5d4..3b3cc242e0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -709,7 +709,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
 					 PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
 					 "pg_xlog" : "pg_wal");
 
-			if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 &&
+			if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
 				errno != EEXIST)
 				pg_fatal("could not create directory \"%s\": %m", summarydir);
 		}
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 6027e241f3..31ead7f405 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -773,8 +773,8 @@ process_directory_recursively(Oid tsoid,
 	 */
 	if (relative_path == NULL)
 	{
-		strncpy(ifulldir, input_directory, MAXPGPATH);
-		strncpy(ofulldir, output_directory, MAXPGPATH);
+		strlcpy(ifulldir, input_directory, MAXPGPATH);
+		strlcpy(ofulldir, output_directory, MAXPGPATH);
 		if (OidIsValid(tsoid))
 			snprintf(manifest_prefix, MAXPGPATH, "pg_tblspc/%u/", tsoid);
 		else
@@ -855,7 +855,7 @@ process_directory_recursively(Oid tsoid,
 
 			/* Append new pathname component to relative path. */
 			if (relative_path == NULL)
-				strncpy(new_relative_path, de->d_name, MAXPGPATH);
+				strlcpy(new_relative_path, de->d_name, MAXPGPATH);
 			else
 				snprintf(new_relative_path, MAXPGPATH, "%s/%s", relative_path,
 						 de->d_name);
@@ -1131,7 +1131,7 @@ scan_for_existing_tablespaces(char *pathname, cb_options *opt)
 	pg_log_debug("scanning \"%s\"", pg_tblspc);
 
 	if ((dir = opendir(pg_tblspc)) == NULL)
-		pg_fatal("could not open directory \"%s\": %m", pathname);
+		pg_fatal("could not open directory \"%s\": %m", pg_tblspc);
 
 	while (errno = 0, (de = readdir(dir)) != NULL)
 	{
@@ -1203,8 +1203,8 @@ scan_for_existing_tablespaces(char *pathname, cb_options *opt)
 			{
 				if (strcmp(tsmap->old_dir, link_target) == 0)
 				{
-					strncpy(ts->old_dir, tsmap->old_dir, MAXPGPATH);
-					strncpy(ts->new_dir, tsmap->new_dir, MAXPGPATH);
+					strlcpy(ts->old_dir, tsmap->old_dir, MAXPGPATH);
+					strlcpy(ts->new_dir, tsmap->new_dir, MAXPGPATH);
 					ts->in_place = false;
 					break;
 				}
@@ -1238,6 +1238,9 @@ scan_for_existing_tablespaces(char *pathname, cb_options *opt)
 		tslist = ts;
 	}
 
+	if (closedir(dir) != 0)
+		pg_fatal("could not close directory \"%s\": %m", pg_tblspc);
+
 	return tslist;
 }
 
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index b835ec363e..873d307902 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -577,13 +577,12 @@ write_reconstructed_file(char *input_filename,
 			{
 				if (current_block == start_of_range)
 					appendStringInfo(&debug_buf, " %u:%s@" UINT64_FORMAT,
-									 current_block,
-									 s == NULL ? "ZERO" : s->filename,
+									 current_block, s->filename,
 									 (uint64) offsetmap[current_block]);
 				else
 					appendStringInfo(&debug_buf, " %u-%u:%s@" UINT64_FORMAT,
 									 start_of_range, current_block,
-									 s == NULL ? "ZERO" : s->filename,
+									 s->filename,
 									 (uint64) offsetmap[current_block]);
 			}
 
diff --git a/src/bin/pg_combinebackup/write_manifest.c b/src/bin/pg_combinebackup/write_manifest.c
index 608e84f3b4..01deb82cc9 100644
--- a/src/bin/pg_combinebackup/write_manifest.c
+++ b/src/bin/pg_combinebackup/write_manifest.c
@@ -241,8 +241,6 @@ escape_json(StringInfo buf, const char *str)
 static void
 flush_manifest(manifest_writer *mwriter)
 {
-	char		pathname[MAXPGPATH];
-
 	if (mwriter->fd == -1 &&
 		(mwriter->fd = open(mwriter->pathname,
 							O_WRONLY | O_CREAT | O_EXCL | PG_BINARY,
@@ -260,13 +258,15 @@ flush_manifest(manifest_writer *mwriter)
 				pg_fatal("could not write \"%s\": %m", mwriter->pathname);
 			else
 				pg_fatal("could not write file \"%s\": wrote only %d of %d bytes",
-						 pathname, (int) wb, mwriter->buf.len);
+						 mwriter->pathname, (int) wb, mwriter->buf.len);
 		}
 
-		if (mwriter->still_checksumming)
+		if (mwriter->still_checksumming &&
 			pg_checksum_update(&mwriter->manifest_ctx,
 							   (uint8 *) mwriter->buf.data,
-							   mwriter->buf.len);
+							   mwriter->buf.len) < 0)
+			pg_fatal("could not update checksum of file \"%s\"",
+					 mwriter->pathname);
 		resetStringInfo(&mwriter->buf);
 	}
 }
-- 
2.39.3 (Apple Git-145)

v1-0003-Add-new-pg_walsummary-tool.patchapplication/octet-stream; name=v1-0003-Add-new-pg_walsummary-tool.patchDownload
From a4e98d908e9d10438f7694ae7600c4b6a8d6d021 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 20 Dec 2023 15:52:59 -0500
Subject: [PATCH v1 3/4] Add new pg_walsummary tool.

This can dump the contents of the WAL summary files found in
pg_wal/summaries. Normally, this shouldn't really be something anyone
needs to do, but it may be needed for debugging problems with
incremental backup, or could possibly be used in some useful way by
external tools.
---
 doc/src/sgml/ref/allfiles.sgml        |   1 +
 doc/src/sgml/ref/pg_walsummary.sgml   | 122 +++++++++++
 doc/src/sgml/reference.sgml           |   1 +
 src/bin/Makefile                      |   1 +
 src/bin/meson.build                   |   1 +
 src/bin/pg_walsummary/.gitignore      |   1 +
 src/bin/pg_walsummary/Makefile        |  48 +++++
 src/bin/pg_walsummary/meson.build     |  30 +++
 src/bin/pg_walsummary/nls.mk          |   6 +
 src/bin/pg_walsummary/pg_walsummary.c | 280 ++++++++++++++++++++++++++
 src/bin/pg_walsummary/t/001_basic.pl  |  19 ++
 src/bin/pg_walsummary/t/002_blocks.pl |  88 ++++++++
 src/tools/pgindent/typedefs.list      |   2 +
 13 files changed, 600 insertions(+)
 create mode 100644 doc/src/sgml/ref/pg_walsummary.sgml
 create mode 100644 src/bin/pg_walsummary/.gitignore
 create mode 100644 src/bin/pg_walsummary/Makefile
 create mode 100644 src/bin/pg_walsummary/meson.build
 create mode 100644 src/bin/pg_walsummary/nls.mk
 create mode 100644 src/bin/pg_walsummary/pg_walsummary.c
 create mode 100644 src/bin/pg_walsummary/t/001_basic.pl
 create mode 100644 src/bin/pg_walsummary/t/002_blocks.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index fda4690eab..4a42999b18 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -219,6 +219,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
 <!ENTITY pgwaldump          SYSTEM "pg_waldump.sgml">
+<!ENTITY pgwalsummary       SYSTEM "pg_walsummary.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">
 <!ENTITY psqlRef            SYSTEM "psql-ref.sgml">
 <!ENTITY reindexdb          SYSTEM "reindexdb.sgml">
diff --git a/doc/src/sgml/ref/pg_walsummary.sgml b/doc/src/sgml/ref/pg_walsummary.sgml
new file mode 100644
index 0000000000..93e265ead7
--- /dev/null
+++ b/doc/src/sgml/ref/pg_walsummary.sgml
@@ -0,0 +1,122 @@
+<!--
+doc/src/sgml/ref/pg_walsummary.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgwalsummary">
+ <indexterm zone="app-pgwalsummary">
+  <primary>pg_walsummary</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_walsummary</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_walsummary</refname>
+  <refpurpose>print contents of WAL summary files</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_walsummary</command>
+   <arg rep="repeat" choice="opt"><replaceable>option</replaceable></arg>
+   <arg rep="repeat"><replaceable>file</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+  <para>
+   <application>pg_walsummary</application> is used to print the contents of
+   WAL summary files. These binary files are found with the
+   <literal>pg_wal/summaries</literal> subdirectory of the data directory,
+   and can be converted to text using this tool. This is not ordinarily
+   necessary, since WAL summary files primarily exist to support
+   <link linkend="backup-incremental-backup">incremental backup</link>,
+   but it may be useful for debugging purposes.
+  </para>
+
+  <para>
+   A WAL summary file is indexed by tablespace OID, relation OID, and relation
+   fork. For each relation fork, it stores the list of blocks that were
+   modified by WAL within the range summarized in the file. It can also
+   store a "limit block," which is 0 if the relation fork was created or
+   truncated within the relevant WAL range, and otherwise the shortest length
+   to which the relation fork was truncated. If the relation fork was not
+   created, deleted, or truncated within the relevant WAL range, the limit
+   block is undefined or infinite and will not be printed by this tool.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><option>-i</option></term>
+      <term><option>--indivudual</option></term>
+      <listitem>
+       <para>
+        By default, <literal>pg_walsummary</literal> prints one line of output
+        for each range of one or more consecutive modified blocks. This can
+        make the output a lot briefer, since a relation where all blocks from
+        0 through 999 were modified will produce only one line of output rather
+        than 1000 separate lines. This option requests a separate line of
+        output for every modified block.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not print any output, except for errors. This can be useful
+        when you want to know whether a WAL summary file can be successfully
+        parsed but don't care about the contents.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Shows help about <application>pg_walsummary</application> command line
+       arguments, and exits.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Environment</title>
+
+  <para>
+   The environment variable <envar>PG_COLOR</envar> specifies whether to use
+   color in diagnostic messages. Possible values are
+   <literal>always</literal>, <literal>auto</literal> and
+   <literal>never</literal>.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgbasebackup"/></member>
+   <member><xref linkend="app-pgcombinebackup"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index a07d2b5e01..aa94f6adf6 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -289,6 +289,7 @@
    &pgtesttiming;
    &pgupgrade;
    &pgwaldump;
+   &pgwalsummary;
    &postgres;
 
  </reference>
diff --git a/src/bin/Makefile b/src/bin/Makefile
index 117ffdab6a..fc789da17b 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
 	pg_upgrade \
 	pg_verifybackup \
 	pg_waldump \
+	pg_walsummary \
 	pgbench \
 	psql \
 	scripts
diff --git a/src/bin/meson.build b/src/bin/meson.build
index f292752c5c..aa60ebaa30 100644
--- a/src/bin/meson.build
+++ b/src/bin/meson.build
@@ -17,6 +17,7 @@ subdir('pg_test_timing')
 subdir('pg_upgrade')
 subdir('pg_verifybackup')
 subdir('pg_waldump')
+subdir('pg_walsummary')
 subdir('pgbench')
 subdir('pgevent')
 subdir('psql')
diff --git a/src/bin/pg_walsummary/.gitignore b/src/bin/pg_walsummary/.gitignore
new file mode 100644
index 0000000000..d71ec192fa
--- /dev/null
+++ b/src/bin/pg_walsummary/.gitignore
@@ -0,0 +1 @@
+pg_walsummary
diff --git a/src/bin/pg_walsummary/Makefile b/src/bin/pg_walsummary/Makefile
new file mode 100644
index 0000000000..2c24bc6db5
--- /dev/null
+++ b/src/bin/pg_walsummary/Makefile
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_walsummary
+#
+# Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/pg_walsummary/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_walsummary - print contents of WAL summary files"
+PGAPPICON=win32
+
+subdir = src/bin/pg_walsummary
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
+
+OBJS = \
+	$(WIN32RES) \
+	pg_walsummary.o
+
+all: pg_walsummary
+
+pg_walsummary: $(OBJS) | submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_walsummary$(X) '$(DESTDIR)$(bindir)/pg_walsummary$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_walsummary$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_walsummary$(X) $(OBJS)
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_walsummary/meson.build b/src/bin/pg_walsummary/meson.build
new file mode 100644
index 0000000000..f886a4cb36
--- /dev/null
+++ b/src/bin/pg_walsummary/meson.build
@@ -0,0 +1,30 @@
+# Copyright (c) 2022-2023, PostgreSQL Global Development Group
+
+pg_walsummary_sources = files(
+  'pg_walsummary.c',
+)
+
+if host_system == 'windows'
+  pg_walsummary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pg_walsummary',
+    '--FILEDESC', 'pg_walsummary - print contents of WAL summary files',])
+endif
+
+pg_walsummary = executable('pg_walsummary',
+  pg_walsummary_sources,
+  dependencies: [frontend_code],
+  kwargs: default_bin_args,
+)
+bin_targets += pg_walsummary
+
+tests += {
+  'name': 'pg_walsummary',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_basic.pl',
+      't/002_blocks.pl',
+    ],
+  }
+}
diff --git a/src/bin/pg_walsummary/nls.mk b/src/bin/pg_walsummary/nls.mk
new file mode 100644
index 0000000000..f411dcfe9e
--- /dev/null
+++ b/src/bin/pg_walsummary/nls.mk
@@ -0,0 +1,6 @@
+# src/bin/pg_combinebackup/nls.mk
+CATALOG_NAME     = pg_walsummary
+GETTEXT_FILES    = $(FRONTEND_COMMON_GETTEXT_FILES) \
+		   pg_walsummary.c
+GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS)
+GETTEXT_FLAGS    = $(FRONTEND_COMMON_GETTEXT_FLAGS)
diff --git a/src/bin/pg_walsummary/pg_walsummary.c b/src/bin/pg_walsummary/pg_walsummary.c
new file mode 100644
index 0000000000..0c0225eeb8
--- /dev/null
+++ b/src/bin/pg_walsummary/pg_walsummary.c
@@ -0,0 +1,280 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_walsummary.c
+ *		Prints the contents of WAL summary files.
+ *
+ * Copyright (c) 2017-2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/bin/pg_walsummary/pg_walsummary.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <fcntl.h>
+#include <limits.h>
+
+#include "common/blkreftable.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "lib/stringinfo.h"
+#include "getopt_long.h"
+
+typedef struct ws_options
+{
+	bool		individual;
+	bool		quiet;
+} ws_options;
+
+typedef struct ws_file_info
+{
+	int			fd;
+	char	   *filename;
+} ws_file_info;
+
+static BlockNumber *block_buffer = NULL;
+static unsigned block_buffer_size = 512;	/* Initial size. */
+
+static void dump_one_relation(ws_options *opt, RelFileLocator *rlocator,
+							  ForkNumber forknum, BlockNumber limit_block,
+							  BlockRefTableReader *reader);
+static void help(const char *progname);
+static int	compare_block_numbers(const void *a, const void *b);
+static int	walsummary_read_callback(void *callback_arg, void *data,
+									 int length);
+static void walsummary_error_callback(void *callback_arg, char *fmt,...) pg_attribute_printf(2, 3);
+
+/*
+ * Main program.
+ */
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"individual", no_argument, NULL, 'i'},
+		{"quiet", no_argument, NULL, 'q'},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	ws_options	opt;
+
+	memset(&opt, 0, sizeof(ws_options));
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	handle_help_version_opts(argc, argv, progname, help);
+
+	/* process command-line options */
+	while ((c = getopt_long(argc, argv, "f:iqw:",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'i':
+				opt.individual = true;
+				break;
+			case 'q':
+				opt.quiet = true;
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	if (optind >= argc)
+	{
+		pg_log_error("%s: no input files specified", progname);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	while (optind < argc)
+	{
+		ws_file_info ws;
+		BlockRefTableReader *reader;
+		RelFileLocator rlocator;
+		ForkNumber	forknum;
+		BlockNumber limit_block;
+
+		ws.filename = argv[optind++];
+		if ((ws.fd = open(ws.filename, O_RDONLY | PG_BINARY, 0)) < 0)
+			pg_fatal("could not open file \"%s\": %m", ws.filename);
+
+		reader = CreateBlockRefTableReader(walsummary_read_callback, &ws,
+										   ws.filename,
+										   walsummary_error_callback, NULL);
+		while (BlockRefTableReaderNextRelation(reader, &rlocator, &forknum,
+											   &limit_block))
+			dump_one_relation(&opt, &rlocator, forknum, limit_block, reader);
+
+		DestroyBlockRefTableReader(reader);
+		close(ws.fd);
+	}
+
+	exit(0);
+}
+
+/*
+ * Dump details for one relation.
+ */
+static void
+dump_one_relation(ws_options *opt, RelFileLocator *rlocator,
+				  ForkNumber forknum, BlockNumber limit_block,
+				  BlockRefTableReader *reader)
+{
+	unsigned	i = 0;
+	unsigned	nblocks;
+	BlockNumber startblock = InvalidBlockNumber;
+	BlockNumber endblock = InvalidBlockNumber;
+
+	/* Dump limit block, if any. */
+	if (limit_block != InvalidBlockNumber)
+		printf("TS %u, DB %u, REL %u, FORK %s: limit %u\n",
+			   rlocator->spcOid, rlocator->dbOid, rlocator->relNumber,
+			   forkNames[forknum], limit_block);
+
+	/* If we haven't allocated a block buffer yet, do that now. */
+	if (block_buffer == NULL)
+		block_buffer = palloc_array(BlockNumber, block_buffer_size);
+
+	/* Try to fill the block buffer. */
+	nblocks = BlockRefTableReaderGetBlocks(reader,
+										   block_buffer,
+										   block_buffer_size);
+
+	/* If we filled the block buffer completely, we must enlarge it. */
+	while (nblocks >= block_buffer_size)
+	{
+		unsigned	new_size;
+
+		/* Double the size, being careful about overflow. */
+		new_size = block_buffer_size * 2;
+		if (new_size < block_buffer_size)
+			new_size = PG_UINT32_MAX;
+		block_buffer = repalloc_array(block_buffer, BlockNumber, new_size);
+
+		/* Try to fill the newly-allocated space. */
+		nblocks +=
+			BlockRefTableReaderGetBlocks(reader,
+										 block_buffer + block_buffer_size,
+										 new_size - block_buffer_size);
+
+		/* Save the new size for later calls. */
+		block_buffer_size = new_size;
+	}
+
+	/* If we don't need to produce any output, skip the rest of this. */
+	if (opt->quiet)
+		return;
+
+	/*
+	 * Sort the returned block numbers. If the block reference table was using
+	 * the bitmap representation for a given chunk, the block numbers in that
+	 * chunk will already be sorted, but when the array-of-offsets
+	 * representation is used, we can receive block numbers here out of order.
+	 */
+	qsort(block_buffer, nblocks, sizeof(BlockNumber), compare_block_numbers);
+
+	/* Dump block references. */
+	while (i < nblocks)
+	{
+		/*
+		 * Find the next range of blocks to print, but if --individual was
+		 * specified, then consider each block a separate range.
+		 */
+		startblock = endblock = block_buffer[i++];
+		if (!opt->individual)
+		{
+			while (i < nblocks && block_buffer[i] == endblock + 1)
+			{
+				endblock++;
+				i++;
+			}
+		}
+
+		/* Format this range of block numbers as a string. */
+		if (startblock == endblock)
+			printf("TS %u, DB %u, REL %u, FORK %s: block %u\n",
+				   rlocator->spcOid, rlocator->dbOid, rlocator->relNumber,
+				   forkNames[forknum], startblock);
+		else
+			printf("TS %u, DB %u, REL %u, FORK %s: blocks %u..%u\n",
+				   rlocator->spcOid, rlocator->dbOid, rlocator->relNumber,
+				   forkNames[forknum], startblock, endblock);
+	}
+}
+
+/*
+ * Quicksort comparator for block numbers.
+ */
+static int
+compare_block_numbers(const void *a, const void *b)
+{
+	BlockNumber aa = *(BlockNumber *) a;
+	BlockNumber bb = *(BlockNumber *) b;
+
+	if (aa > bb)
+		return 1;
+	else if (aa == bb)
+		return 0;
+	else
+		return -1;
+}
+
+/*
+ * Error callback.
+ */
+void
+walsummary_error_callback(void *callback_arg, char *fmt,...)
+{
+	va_list		ap;
+
+	va_start(ap, fmt);
+	pg_log_generic_v(PG_LOG_ERROR, PG_LOG_PRIMARY, fmt, ap);
+	va_end(ap);
+
+	exit(1);
+}
+
+/*
+ * Read callback.
+ */
+int
+walsummary_read_callback(void *callback_arg, void *data, int length)
+{
+	ws_file_info *ws = callback_arg;
+	int			rc;
+
+	if ((rc = read(ws->fd, data, length)) < 0)
+		pg_fatal("could not read file \"%s\": %m", ws->filename);
+
+	return rc;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_walsummary"
+ */
+static void
+help(const char *progname)
+{
+	printf(_("%s prints the contents of a WAL summary file.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... FILE...\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -i, --individual          list block numbers individually, not as ranges\n"));
+	printf(_("  -q, --quiet               don't print anything, just parse the files\n"));
+	printf(_("  -?, --help                show this help, then exit\n"));
+
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/pg_walsummary/t/001_basic.pl b/src/bin/pg_walsummary/t/001_basic.pl
new file mode 100644
index 0000000000..10a232a150
--- /dev/null
+++ b/src/bin/pg_walsummary/t/001_basic.pl
@@ -0,0 +1,19 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+program_help_ok('pg_walsummary');
+program_version_ok('pg_walsummary');
+program_options_handling_ok('pg_walsummary');
+
+command_fails_like(
+	['pg_walsummary'],
+	qr/no input files specified/,
+	'input files must be specified');
+
+done_testing();
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
new file mode 100644
index 0000000000..170976f9e2
--- /dev/null
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -0,0 +1,88 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Compare;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Set up a new database instance.
+my $node1 = PostgreSQL::Test::Cluster->new('node1');
+$node1->init(has_archiving => 1, allows_streaming => 1);
+$node1->append_conf('postgresql.conf', 'summarize_wal = on');
+$node1->start;
+
+# See what's been summarized up until now.
+my $progress = $node1->safe_psql('postgres', <<EOM);
+SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
+EOM
+my ($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
+note("before insert, summarized TLI $summarized_tli through $summarized_lsn");
+
+# Create a table and insert a few test rows into it. VACUUM FREEZE it so that
+# autovacuum doesn't induce any future modifications unexpectedly. Then
+# trigger a checkpoint.
+$node1->safe_psql('postgres', <<EOM);
+CREATE TABLE mytable (a int, b text);
+INSERT INTO mytable
+SELECT
+	g, random()::text||random()::text||random()::text||random()::text
+FROM
+	generate_series(1, 400) g;
+VACUUM FREEZE;
+CHECKPOINT;
+EOM
+
+# Wait for a new summary to show up.
+$node1->poll_query_until('postgres', <<EOM);
+SELECT EXISTS (
+    SELECT * from pg_available_wal_summaries()
+    WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+)
+EOM
+
+# Again check the progress of WAL summarization.
+$progress = $node1->safe_psql('postgres', <<EOM);
+SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
+EOM
+($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
+note("after insert, summarized TLI $summarized_tli through $summarized_lsn");
+
+# Update a row in the first block of the table and trigger a checkpoint.
+$node1->safe_psql('postgres', <<EOM);
+UPDATE mytable SET b = 'abcdefghijklmnopqrstuvwxyz' WHERE a = 2;
+CHECKPOINT;
+EOM
+
+# Again wait for a new summary to show up.
+$node1->poll_query_until('postgres', <<EOM);
+SELECT EXISTS (
+    SELECT * from pg_available_wal_summaries()
+    WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+)
+EOM
+
+# Figure out the exact details for the new sumamry file.
+my $details = $node1->safe_psql('postgres', <<EOM);
+SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
+	WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+EOM
+my ($tli, $start_lsn, $end_lsn) = split(/\|/, $details);
+note("examining summary for TLI $tli from $start_lsn to $end_lsn");
+
+# Reconstruct the full pathname for the WAL summary file.
+my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
+					   $node1->data_dir, $tli,
+					   split(m@/@, $start_lsn),
+					   split(m@/@, $end_lsn);
+ok(-f $filename, "WAL summary file exists");
+
+# Run pg_walsummary on it. We expect block 0 to be modified, but block 1
+# to be unmodified, so the output should say block 0, not block 0..1 or
+# similar.
+my ($stdout, $stderr) = run_command([ 'pg_walsummary', $filename ]);
+like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified");
+is($stderr, '', 'stderr is empty');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fd46b7bd1..f582eb59e7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4039,3 +4039,5 @@ cb_tablespace_mapping
 manifest_data
 manifest_writer
 rfile
+ws_options
+ws_file_info
-- 
2.39.3 (Apple Git-145)

#2Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#1)
3 attachment(s)
Re: cleanup patches for incremental backup

Hello again,

I have now committed 0001.

I got some off-list review of 0004; specifically, Michael Paquier said
it looked OK, and Tom Lane found another bug. So I've added a fix for
that in what's now 0003.

Here's v2. I plan to commit the rest of this fairly soon if there are
no comments.

...Robert

Attachments:

v2-0002-Add-new-pg_walsummary-tool.patchapplication/octet-stream; name=v2-0002-Add-new-pg_walsummary-tool.patchDownload
From 044d7a5e49bee75ee47dc943c758d0e660055156 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 20 Dec 2023 15:52:59 -0500
Subject: [PATCH v2 2/3] Add new pg_walsummary tool.

This can dump the contents of the WAL summary files found in
pg_wal/summaries. Normally, this shouldn't really be something anyone
needs to do, but it may be needed for debugging problems with
incremental backup, or could possibly be used in some useful way by
external tools.
---
 doc/src/sgml/ref/allfiles.sgml        |   1 +
 doc/src/sgml/ref/pg_walsummary.sgml   | 122 +++++++++++
 doc/src/sgml/reference.sgml           |   1 +
 src/bin/Makefile                      |   1 +
 src/bin/meson.build                   |   1 +
 src/bin/pg_walsummary/.gitignore      |   1 +
 src/bin/pg_walsummary/Makefile        |  48 +++++
 src/bin/pg_walsummary/meson.build     |  30 +++
 src/bin/pg_walsummary/nls.mk          |   6 +
 src/bin/pg_walsummary/pg_walsummary.c | 280 ++++++++++++++++++++++++++
 src/bin/pg_walsummary/t/001_basic.pl  |  19 ++
 src/bin/pg_walsummary/t/002_blocks.pl |  88 ++++++++
 src/tools/pgindent/typedefs.list      |   2 +
 13 files changed, 600 insertions(+)
 create mode 100644 doc/src/sgml/ref/pg_walsummary.sgml
 create mode 100644 src/bin/pg_walsummary/.gitignore
 create mode 100644 src/bin/pg_walsummary/Makefile
 create mode 100644 src/bin/pg_walsummary/meson.build
 create mode 100644 src/bin/pg_walsummary/nls.mk
 create mode 100644 src/bin/pg_walsummary/pg_walsummary.c
 create mode 100644 src/bin/pg_walsummary/t/001_basic.pl
 create mode 100644 src/bin/pg_walsummary/t/002_blocks.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index fda4690eab..4a42999b18 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -219,6 +219,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
 <!ENTITY pgwaldump          SYSTEM "pg_waldump.sgml">
+<!ENTITY pgwalsummary       SYSTEM "pg_walsummary.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">
 <!ENTITY psqlRef            SYSTEM "psql-ref.sgml">
 <!ENTITY reindexdb          SYSTEM "reindexdb.sgml">
diff --git a/doc/src/sgml/ref/pg_walsummary.sgml b/doc/src/sgml/ref/pg_walsummary.sgml
new file mode 100644
index 0000000000..93e265ead7
--- /dev/null
+++ b/doc/src/sgml/ref/pg_walsummary.sgml
@@ -0,0 +1,122 @@
+<!--
+doc/src/sgml/ref/pg_walsummary.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgwalsummary">
+ <indexterm zone="app-pgwalsummary">
+  <primary>pg_walsummary</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_walsummary</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_walsummary</refname>
+  <refpurpose>print contents of WAL summary files</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_walsummary</command>
+   <arg rep="repeat" choice="opt"><replaceable>option</replaceable></arg>
+   <arg rep="repeat"><replaceable>file</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+  <para>
+   <application>pg_walsummary</application> is used to print the contents of
+   WAL summary files. These binary files are found with the
+   <literal>pg_wal/summaries</literal> subdirectory of the data directory,
+   and can be converted to text using this tool. This is not ordinarily
+   necessary, since WAL summary files primarily exist to support
+   <link linkend="backup-incremental-backup">incremental backup</link>,
+   but it may be useful for debugging purposes.
+  </para>
+
+  <para>
+   A WAL summary file is indexed by tablespace OID, relation OID, and relation
+   fork. For each relation fork, it stores the list of blocks that were
+   modified by WAL within the range summarized in the file. It can also
+   store a "limit block," which is 0 if the relation fork was created or
+   truncated within the relevant WAL range, and otherwise the shortest length
+   to which the relation fork was truncated. If the relation fork was not
+   created, deleted, or truncated within the relevant WAL range, the limit
+   block is undefined or infinite and will not be printed by this tool.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><option>-i</option></term>
+      <term><option>--indivudual</option></term>
+      <listitem>
+       <para>
+        By default, <literal>pg_walsummary</literal> prints one line of output
+        for each range of one or more consecutive modified blocks. This can
+        make the output a lot briefer, since a relation where all blocks from
+        0 through 999 were modified will produce only one line of output rather
+        than 1000 separate lines. This option requests a separate line of
+        output for every modified block.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        Do not print any output, except for errors. This can be useful
+        when you want to know whether a WAL summary file can be successfully
+        parsed but don't care about the contents.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-?</option></term>
+       <term><option>--help</option></term>
+       <listitem>
+       <para>
+       Shows help about <application>pg_walsummary</application> command line
+       arguments, and exits.
+       </para>
+       </listitem>
+     </varlistentry>
+
+    </variablelist>
+   </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Environment</title>
+
+  <para>
+   The environment variable <envar>PG_COLOR</envar> specifies whether to use
+   color in diagnostic messages. Possible values are
+   <literal>always</literal>, <literal>auto</literal> and
+   <literal>never</literal>.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="app-pgbasebackup"/></member>
+   <member><xref linkend="app-pgcombinebackup"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index a07d2b5e01..aa94f6adf6 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -289,6 +289,7 @@
    &pgtesttiming;
    &pgupgrade;
    &pgwaldump;
+   &pgwalsummary;
    &postgres;
 
  </reference>
diff --git a/src/bin/Makefile b/src/bin/Makefile
index 117ffdab6a..fc789da17b 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
 	pg_upgrade \
 	pg_verifybackup \
 	pg_waldump \
+	pg_walsummary \
 	pgbench \
 	psql \
 	scripts
diff --git a/src/bin/meson.build b/src/bin/meson.build
index f292752c5c..aa60ebaa30 100644
--- a/src/bin/meson.build
+++ b/src/bin/meson.build
@@ -17,6 +17,7 @@ subdir('pg_test_timing')
 subdir('pg_upgrade')
 subdir('pg_verifybackup')
 subdir('pg_waldump')
+subdir('pg_walsummary')
 subdir('pgbench')
 subdir('pgevent')
 subdir('psql')
diff --git a/src/bin/pg_walsummary/.gitignore b/src/bin/pg_walsummary/.gitignore
new file mode 100644
index 0000000000..d71ec192fa
--- /dev/null
+++ b/src/bin/pg_walsummary/.gitignore
@@ -0,0 +1 @@
+pg_walsummary
diff --git a/src/bin/pg_walsummary/Makefile b/src/bin/pg_walsummary/Makefile
new file mode 100644
index 0000000000..2c24bc6db5
--- /dev/null
+++ b/src/bin/pg_walsummary/Makefile
@@ -0,0 +1,48 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_walsummary
+#
+# Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/pg_walsummary/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_walsummary - print contents of WAL summary files"
+PGAPPICON=win32
+
+subdir = src/bin/pg_walsummary
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
+
+OBJS = \
+	$(WIN32RES) \
+	pg_walsummary.o
+
+all: pg_walsummary
+
+pg_walsummary: $(OBJS) | submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_walsummary$(X) '$(DESTDIR)$(bindir)/pg_walsummary$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_walsummary$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_walsummary$(X) $(OBJS)
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_walsummary/meson.build b/src/bin/pg_walsummary/meson.build
new file mode 100644
index 0000000000..f886a4cb36
--- /dev/null
+++ b/src/bin/pg_walsummary/meson.build
@@ -0,0 +1,30 @@
+# Copyright (c) 2022-2023, PostgreSQL Global Development Group
+
+pg_walsummary_sources = files(
+  'pg_walsummary.c',
+)
+
+if host_system == 'windows'
+  pg_walsummary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pg_walsummary',
+    '--FILEDESC', 'pg_walsummary - print contents of WAL summary files',])
+endif
+
+pg_walsummary = executable('pg_walsummary',
+  pg_walsummary_sources,
+  dependencies: [frontend_code],
+  kwargs: default_bin_args,
+)
+bin_targets += pg_walsummary
+
+tests += {
+  'name': 'pg_walsummary',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_basic.pl',
+      't/002_blocks.pl',
+    ],
+  }
+}
diff --git a/src/bin/pg_walsummary/nls.mk b/src/bin/pg_walsummary/nls.mk
new file mode 100644
index 0000000000..f411dcfe9e
--- /dev/null
+++ b/src/bin/pg_walsummary/nls.mk
@@ -0,0 +1,6 @@
+# src/bin/pg_combinebackup/nls.mk
+CATALOG_NAME     = pg_walsummary
+GETTEXT_FILES    = $(FRONTEND_COMMON_GETTEXT_FILES) \
+		   pg_walsummary.c
+GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS)
+GETTEXT_FLAGS    = $(FRONTEND_COMMON_GETTEXT_FLAGS)
diff --git a/src/bin/pg_walsummary/pg_walsummary.c b/src/bin/pg_walsummary/pg_walsummary.c
new file mode 100644
index 0000000000..0c0225eeb8
--- /dev/null
+++ b/src/bin/pg_walsummary/pg_walsummary.c
@@ -0,0 +1,280 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_walsummary.c
+ *		Prints the contents of WAL summary files.
+ *
+ * Copyright (c) 2017-2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/bin/pg_walsummary/pg_walsummary.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <fcntl.h>
+#include <limits.h>
+
+#include "common/blkreftable.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "lib/stringinfo.h"
+#include "getopt_long.h"
+
+typedef struct ws_options
+{
+	bool		individual;
+	bool		quiet;
+} ws_options;
+
+typedef struct ws_file_info
+{
+	int			fd;
+	char	   *filename;
+} ws_file_info;
+
+static BlockNumber *block_buffer = NULL;
+static unsigned block_buffer_size = 512;	/* Initial size. */
+
+static void dump_one_relation(ws_options *opt, RelFileLocator *rlocator,
+							  ForkNumber forknum, BlockNumber limit_block,
+							  BlockRefTableReader *reader);
+static void help(const char *progname);
+static int	compare_block_numbers(const void *a, const void *b);
+static int	walsummary_read_callback(void *callback_arg, void *data,
+									 int length);
+static void walsummary_error_callback(void *callback_arg, char *fmt,...) pg_attribute_printf(2, 3);
+
+/*
+ * Main program.
+ */
+int
+main(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"individual", no_argument, NULL, 'i'},
+		{"quiet", no_argument, NULL, 'q'},
+		{NULL, 0, NULL, 0}
+	};
+
+	const char *progname;
+	int			optindex;
+	int			c;
+	ws_options	opt;
+
+	memset(&opt, 0, sizeof(ws_options));
+
+	pg_logging_init(argv[0]);
+	progname = get_progname(argv[0]);
+	handle_help_version_opts(argc, argv, progname, help);
+
+	/* process command-line options */
+	while ((c = getopt_long(argc, argv, "f:iqw:",
+							long_options, &optindex)) != -1)
+	{
+		switch (c)
+		{
+			case 'i':
+				opt.individual = true;
+				break;
+			case 'q':
+				opt.quiet = true;
+				break;
+			default:
+				/* getopt_long already emitted a complaint */
+				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+				exit(1);
+		}
+	}
+
+	if (optind >= argc)
+	{
+		pg_log_error("%s: no input files specified", progname);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	while (optind < argc)
+	{
+		ws_file_info ws;
+		BlockRefTableReader *reader;
+		RelFileLocator rlocator;
+		ForkNumber	forknum;
+		BlockNumber limit_block;
+
+		ws.filename = argv[optind++];
+		if ((ws.fd = open(ws.filename, O_RDONLY | PG_BINARY, 0)) < 0)
+			pg_fatal("could not open file \"%s\": %m", ws.filename);
+
+		reader = CreateBlockRefTableReader(walsummary_read_callback, &ws,
+										   ws.filename,
+										   walsummary_error_callback, NULL);
+		while (BlockRefTableReaderNextRelation(reader, &rlocator, &forknum,
+											   &limit_block))
+			dump_one_relation(&opt, &rlocator, forknum, limit_block, reader);
+
+		DestroyBlockRefTableReader(reader);
+		close(ws.fd);
+	}
+
+	exit(0);
+}
+
+/*
+ * Dump details for one relation.
+ */
+static void
+dump_one_relation(ws_options *opt, RelFileLocator *rlocator,
+				  ForkNumber forknum, BlockNumber limit_block,
+				  BlockRefTableReader *reader)
+{
+	unsigned	i = 0;
+	unsigned	nblocks;
+	BlockNumber startblock = InvalidBlockNumber;
+	BlockNumber endblock = InvalidBlockNumber;
+
+	/* Dump limit block, if any. */
+	if (limit_block != InvalidBlockNumber)
+		printf("TS %u, DB %u, REL %u, FORK %s: limit %u\n",
+			   rlocator->spcOid, rlocator->dbOid, rlocator->relNumber,
+			   forkNames[forknum], limit_block);
+
+	/* If we haven't allocated a block buffer yet, do that now. */
+	if (block_buffer == NULL)
+		block_buffer = palloc_array(BlockNumber, block_buffer_size);
+
+	/* Try to fill the block buffer. */
+	nblocks = BlockRefTableReaderGetBlocks(reader,
+										   block_buffer,
+										   block_buffer_size);
+
+	/* If we filled the block buffer completely, we must enlarge it. */
+	while (nblocks >= block_buffer_size)
+	{
+		unsigned	new_size;
+
+		/* Double the size, being careful about overflow. */
+		new_size = block_buffer_size * 2;
+		if (new_size < block_buffer_size)
+			new_size = PG_UINT32_MAX;
+		block_buffer = repalloc_array(block_buffer, BlockNumber, new_size);
+
+		/* Try to fill the newly-allocated space. */
+		nblocks +=
+			BlockRefTableReaderGetBlocks(reader,
+										 block_buffer + block_buffer_size,
+										 new_size - block_buffer_size);
+
+		/* Save the new size for later calls. */
+		block_buffer_size = new_size;
+	}
+
+	/* If we don't need to produce any output, skip the rest of this. */
+	if (opt->quiet)
+		return;
+
+	/*
+	 * Sort the returned block numbers. If the block reference table was using
+	 * the bitmap representation for a given chunk, the block numbers in that
+	 * chunk will already be sorted, but when the array-of-offsets
+	 * representation is used, we can receive block numbers here out of order.
+	 */
+	qsort(block_buffer, nblocks, sizeof(BlockNumber), compare_block_numbers);
+
+	/* Dump block references. */
+	while (i < nblocks)
+	{
+		/*
+		 * Find the next range of blocks to print, but if --individual was
+		 * specified, then consider each block a separate range.
+		 */
+		startblock = endblock = block_buffer[i++];
+		if (!opt->individual)
+		{
+			while (i < nblocks && block_buffer[i] == endblock + 1)
+			{
+				endblock++;
+				i++;
+			}
+		}
+
+		/* Format this range of block numbers as a string. */
+		if (startblock == endblock)
+			printf("TS %u, DB %u, REL %u, FORK %s: block %u\n",
+				   rlocator->spcOid, rlocator->dbOid, rlocator->relNumber,
+				   forkNames[forknum], startblock);
+		else
+			printf("TS %u, DB %u, REL %u, FORK %s: blocks %u..%u\n",
+				   rlocator->spcOid, rlocator->dbOid, rlocator->relNumber,
+				   forkNames[forknum], startblock, endblock);
+	}
+}
+
+/*
+ * Quicksort comparator for block numbers.
+ */
+static int
+compare_block_numbers(const void *a, const void *b)
+{
+	BlockNumber aa = *(BlockNumber *) a;
+	BlockNumber bb = *(BlockNumber *) b;
+
+	if (aa > bb)
+		return 1;
+	else if (aa == bb)
+		return 0;
+	else
+		return -1;
+}
+
+/*
+ * Error callback.
+ */
+void
+walsummary_error_callback(void *callback_arg, char *fmt,...)
+{
+	va_list		ap;
+
+	va_start(ap, fmt);
+	pg_log_generic_v(PG_LOG_ERROR, PG_LOG_PRIMARY, fmt, ap);
+	va_end(ap);
+
+	exit(1);
+}
+
+/*
+ * Read callback.
+ */
+int
+walsummary_read_callback(void *callback_arg, void *data, int length)
+{
+	ws_file_info *ws = callback_arg;
+	int			rc;
+
+	if ((rc = read(ws->fd, data, length)) < 0)
+		pg_fatal("could not read file \"%s\": %m", ws->filename);
+
+	return rc;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_walsummary"
+ */
+static void
+help(const char *progname)
+{
+	printf(_("%s prints the contents of a WAL summary file.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... FILE...\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  -i, --individual          list block numbers individually, not as ranges\n"));
+	printf(_("  -q, --quiet               don't print anything, just parse the files\n"));
+	printf(_("  -?, --help                show this help, then exit\n"));
+
+	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
diff --git a/src/bin/pg_walsummary/t/001_basic.pl b/src/bin/pg_walsummary/t/001_basic.pl
new file mode 100644
index 0000000000..10a232a150
--- /dev/null
+++ b/src/bin/pg_walsummary/t/001_basic.pl
@@ -0,0 +1,19 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+program_help_ok('pg_walsummary');
+program_version_ok('pg_walsummary');
+program_options_handling_ok('pg_walsummary');
+
+command_fails_like(
+	['pg_walsummary'],
+	qr/no input files specified/,
+	'input files must be specified');
+
+done_testing();
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
new file mode 100644
index 0000000000..170976f9e2
--- /dev/null
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -0,0 +1,88 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Compare;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Set up a new database instance.
+my $node1 = PostgreSQL::Test::Cluster->new('node1');
+$node1->init(has_archiving => 1, allows_streaming => 1);
+$node1->append_conf('postgresql.conf', 'summarize_wal = on');
+$node1->start;
+
+# See what's been summarized up until now.
+my $progress = $node1->safe_psql('postgres', <<EOM);
+SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
+EOM
+my ($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
+note("before insert, summarized TLI $summarized_tli through $summarized_lsn");
+
+# Create a table and insert a few test rows into it. VACUUM FREEZE it so that
+# autovacuum doesn't induce any future modifications unexpectedly. Then
+# trigger a checkpoint.
+$node1->safe_psql('postgres', <<EOM);
+CREATE TABLE mytable (a int, b text);
+INSERT INTO mytable
+SELECT
+	g, random()::text||random()::text||random()::text||random()::text
+FROM
+	generate_series(1, 400) g;
+VACUUM FREEZE;
+CHECKPOINT;
+EOM
+
+# Wait for a new summary to show up.
+$node1->poll_query_until('postgres', <<EOM);
+SELECT EXISTS (
+    SELECT * from pg_available_wal_summaries()
+    WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+)
+EOM
+
+# Again check the progress of WAL summarization.
+$progress = $node1->safe_psql('postgres', <<EOM);
+SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
+EOM
+($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
+note("after insert, summarized TLI $summarized_tli through $summarized_lsn");
+
+# Update a row in the first block of the table and trigger a checkpoint.
+$node1->safe_psql('postgres', <<EOM);
+UPDATE mytable SET b = 'abcdefghijklmnopqrstuvwxyz' WHERE a = 2;
+CHECKPOINT;
+EOM
+
+# Again wait for a new summary to show up.
+$node1->poll_query_until('postgres', <<EOM);
+SELECT EXISTS (
+    SELECT * from pg_available_wal_summaries()
+    WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+)
+EOM
+
+# Figure out the exact details for the new sumamry file.
+my $details = $node1->safe_psql('postgres', <<EOM);
+SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
+	WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+EOM
+my ($tli, $start_lsn, $end_lsn) = split(/\|/, $details);
+note("examining summary for TLI $tli from $start_lsn to $end_lsn");
+
+# Reconstruct the full pathname for the WAL summary file.
+my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
+					   $node1->data_dir, $tli,
+					   split(m@/@, $start_lsn),
+					   split(m@/@, $end_lsn);
+ok(-f $filename, "WAL summary file exists");
+
+# Run pg_walsummary on it. We expect block 0 to be modified, but block 1
+# to be unmodified, so the output should say block 0, not block 0..1 or
+# similar.
+my ($stdout, $stderr) = run_command([ 'pg_walsummary', $filename ]);
+like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified");
+is($stderr, '', 'stderr is empty');
+
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fd46b7bd1..f582eb59e7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4039,3 +4039,5 @@ cb_tablespace_mapping
 manifest_data
 manifest_writer
 rfile
+ws_options
+ws_file_info
-- 
2.39.3 (Apple Git-145)

v2-0003-Repair-various-defects-in-dc212340058b4e7ecfc5a7a.patchapplication/octet-stream; name=v2-0003-Repair-various-defects-in-dc212340058b4e7ecfc5a7a.patchDownload
From a866596debade4226ed8480838af124f262414fd Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 9 Jan 2024 13:12:43 -0500
Subject: [PATCH v2 3/3] Repair various defects in
 dc212340058b4e7ecfc5a7a81ec50e7a207bf288.

pg_combinebackup had various problems:

* strncpy was used in various places where should be used instead,
  to avoid any possibility of the result not being \0-terminated.
* scan_for_existing_tablespaces() failed to close the directory,
  and an error when opening the directory was reported with the
  wrong pathname.
* write_reconstructed_file() contained some redundant and therefore
  dead code.
* flush_manifest() didn't check the result of pg_checksum_update()
  as we do in other places, and misused a local pathname variable
  that shouldn't exist at all.

In pg_basebackup, the wrong variable name was used in one place,
due to a copy and paste that was not properly adjusted.

In blkreftable.c, the loop incorrectly doubled chunkno instead of
max_chunks.

Per Coverity and subsequent code inspection by me and by Tom Lane.
---
 src/bin/pg_basebackup/pg_basebackup.c       |  2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c | 15 +++++++++------
 src/bin/pg_combinebackup/reconstruct.c      |  5 ++---
 src/bin/pg_combinebackup/write_manifest.c   | 10 +++++-----
 src/common/blkreftable.c                    |  2 +-
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index b734cce5d4..3b3cc242e0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -709,7 +709,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
 					 PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
 					 "pg_xlog" : "pg_wal");
 
-			if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 &&
+			if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
 				errno != EEXIST)
 				pg_fatal("could not create directory \"%s\": %m", summarydir);
 		}
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 6027e241f3..31ead7f405 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -773,8 +773,8 @@ process_directory_recursively(Oid tsoid,
 	 */
 	if (relative_path == NULL)
 	{
-		strncpy(ifulldir, input_directory, MAXPGPATH);
-		strncpy(ofulldir, output_directory, MAXPGPATH);
+		strlcpy(ifulldir, input_directory, MAXPGPATH);
+		strlcpy(ofulldir, output_directory, MAXPGPATH);
 		if (OidIsValid(tsoid))
 			snprintf(manifest_prefix, MAXPGPATH, "pg_tblspc/%u/", tsoid);
 		else
@@ -855,7 +855,7 @@ process_directory_recursively(Oid tsoid,
 
 			/* Append new pathname component to relative path. */
 			if (relative_path == NULL)
-				strncpy(new_relative_path, de->d_name, MAXPGPATH);
+				strlcpy(new_relative_path, de->d_name, MAXPGPATH);
 			else
 				snprintf(new_relative_path, MAXPGPATH, "%s/%s", relative_path,
 						 de->d_name);
@@ -1131,7 +1131,7 @@ scan_for_existing_tablespaces(char *pathname, cb_options *opt)
 	pg_log_debug("scanning \"%s\"", pg_tblspc);
 
 	if ((dir = opendir(pg_tblspc)) == NULL)
-		pg_fatal("could not open directory \"%s\": %m", pathname);
+		pg_fatal("could not open directory \"%s\": %m", pg_tblspc);
 
 	while (errno = 0, (de = readdir(dir)) != NULL)
 	{
@@ -1203,8 +1203,8 @@ scan_for_existing_tablespaces(char *pathname, cb_options *opt)
 			{
 				if (strcmp(tsmap->old_dir, link_target) == 0)
 				{
-					strncpy(ts->old_dir, tsmap->old_dir, MAXPGPATH);
-					strncpy(ts->new_dir, tsmap->new_dir, MAXPGPATH);
+					strlcpy(ts->old_dir, tsmap->old_dir, MAXPGPATH);
+					strlcpy(ts->new_dir, tsmap->new_dir, MAXPGPATH);
 					ts->in_place = false;
 					break;
 				}
@@ -1238,6 +1238,9 @@ scan_for_existing_tablespaces(char *pathname, cb_options *opt)
 		tslist = ts;
 	}
 
+	if (closedir(dir) != 0)
+		pg_fatal("could not close directory \"%s\": %m", pg_tblspc);
+
 	return tslist;
 }
 
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index b835ec363e..873d307902 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -577,13 +577,12 @@ write_reconstructed_file(char *input_filename,
 			{
 				if (current_block == start_of_range)
 					appendStringInfo(&debug_buf, " %u:%s@" UINT64_FORMAT,
-									 current_block,
-									 s == NULL ? "ZERO" : s->filename,
+									 current_block, s->filename,
 									 (uint64) offsetmap[current_block]);
 				else
 					appendStringInfo(&debug_buf, " %u-%u:%s@" UINT64_FORMAT,
 									 start_of_range, current_block,
-									 s == NULL ? "ZERO" : s->filename,
+									 s->filename,
 									 (uint64) offsetmap[current_block]);
 			}
 
diff --git a/src/bin/pg_combinebackup/write_manifest.c b/src/bin/pg_combinebackup/write_manifest.c
index 608e84f3b4..01deb82cc9 100644
--- a/src/bin/pg_combinebackup/write_manifest.c
+++ b/src/bin/pg_combinebackup/write_manifest.c
@@ -241,8 +241,6 @@ escape_json(StringInfo buf, const char *str)
 static void
 flush_manifest(manifest_writer *mwriter)
 {
-	char		pathname[MAXPGPATH];
-
 	if (mwriter->fd == -1 &&
 		(mwriter->fd = open(mwriter->pathname,
 							O_WRONLY | O_CREAT | O_EXCL | PG_BINARY,
@@ -260,13 +258,15 @@ flush_manifest(manifest_writer *mwriter)
 				pg_fatal("could not write \"%s\": %m", mwriter->pathname);
 			else
 				pg_fatal("could not write file \"%s\": wrote only %d of %d bytes",
-						 pathname, (int) wb, mwriter->buf.len);
+						 mwriter->pathname, (int) wb, mwriter->buf.len);
 		}
 
-		if (mwriter->still_checksumming)
+		if (mwriter->still_checksumming &&
 			pg_checksum_update(&mwriter->manifest_ctx,
 							   (uint8 *) mwriter->buf.data,
-							   mwriter->buf.len);
+							   mwriter->buf.len) < 0)
+			pg_fatal("could not update checksum of file \"%s\"",
+					 mwriter->pathname);
 		resetStringInfo(&mwriter->buf);
 	}
 }
diff --git a/src/common/blkreftable.c b/src/common/blkreftable.c
index bf0b563a38..20c80c779e 100644
--- a/src/common/blkreftable.c
+++ b/src/common/blkreftable.c
@@ -988,7 +988,7 @@ BlockRefTableEntryMarkBlockModified(BlockRefTableEntry *entry,
 		 */
 		max_chunks = Max(16, entry->nchunks);
 		while (max_chunks < chunkno + 1)
-			chunkno *= 2;
+			max_chunks *= 2;
 		Assert(max_chunks > chunkno);
 		extra_chunks = max_chunks - entry->nchunks;
 
-- 
2.39.3 (Apple Git-145)

v2-0001-Add-new-function-pg_get_wal_summarizer_state.patchapplication/octet-stream; name=v2-0001-Add-new-function-pg_get_wal_summarizer_state.patchDownload
From 46e4a46be8d8a53db903d815cb306c9a3dc75d14 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 4 Jan 2024 13:37:41 -0500
Subject: [PATCH v2 1/3] Add new function pg_get_wal_summarizer_state().

This makes it possible to access information about the progress
of WAL summarization from SQL. The previously-added functions
pg_available_wal_summaries() and pg_wal_summary_contents() only
examine on-disk state, but this function exposes information from
the server's shared memory.

XXX. Bump catversion.
---
 doc/src/sgml/func.sgml                 | 28 +++++++++++
 src/backend/backup/walsummaryfuncs.c   | 39 ++++++++++++++++
 src/backend/postmaster/walsummarizer.c | 65 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat        |  9 ++++
 src/include/postmaster/walsummarizer.h |  4 ++
 5 files changed, 145 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index de78d58d4b..0f7d409e60 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26554,6 +26554,34 @@ SELECT collation for ('foo' COLLATE "de_DE");
         <literal>relblocknumber</literal> will be zero.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_get_wal_summarizer_state</primary>
+        </indexterm>
+        <function>pg_get_wal_summarizer_state</function> ()
+        <returnvalue>record</returnvalue>
+        ( <parameter>summarized_tli</parameter> <type>bigint</type>,
+        <parameter>summarized_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>pending_lsn</parameter> <type>pg_lsn</type>,
+        <parameter>summarizer_pid</parameter> <type>int</type> )
+       </para>
+       <para>
+        Returns information about the progress of the WAL summarizer. If the
+        WAL summarizer has never run since the instance was started, then
+        <literal>summarized_tli</literal> and <literal>summarized_lsn</literal>
+        will be <literal>0</literal> and <literal>0/0</literal> respectively;
+        otherwise, they will be the TLI and ending LSN of the last WAL summary
+        file written to disk. If the WAL summarizer is currently running,
+        <literal>pending_lsn</literal> will be the ending LSN of the last
+        record that it has consumed, which must always be greater than or
+        equal to <literal>summarized_lsn</literal>; if the WAL summarizer is
+        not running, it will be equal to <literal>summarized_lsn</literal>.
+        <literal>summarized_pid</literal> is the PID of the WAL summarizer
+        process, if it is running, and otherwise NULL.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/backup/walsummaryfuncs.c b/src/backend/backup/walsummaryfuncs.c
index 89c660c696..f082488b33 100644
--- a/src/backend/backup/walsummaryfuncs.c
+++ b/src/backend/backup/walsummaryfuncs.c
@@ -16,11 +16,13 @@
 #include "common/blkreftable.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/walsummarizer.h"
 #include "utils/fmgrprotos.h"
 #include "utils/pg_lsn.h"
 
 #define NUM_WS_ATTS			3
 #define NUM_SUMMARY_ATTS	6
+#define NUM_STATE_ATTS		4
 #define MAX_BLOCKS_PER_CALL	256
 
 /*
@@ -167,3 +169,40 @@ pg_wal_summary_contents(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+/*
+ * Returns information about the state of the WAL summarizer process.
+ */
+Datum
+pg_get_wal_summarizer_state(PG_FUNCTION_ARGS)
+{
+	Datum		values[NUM_STATE_ATTS];
+	bool		nulls[NUM_STATE_ATTS];
+	TimeLineID	summarized_tli;
+	XLogRecPtr	summarized_lsn;
+	XLogRecPtr	pending_lsn;
+	int			summarizer_pid;
+	TupleDesc	tupdesc;
+	HeapTuple	htup;
+
+	GetWalSummarizerState(&summarized_tli, &summarized_lsn, &pending_lsn,
+						  &summarizer_pid);
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	memset(nulls, 0, sizeof(nulls));
+
+	values[0] = Int64GetDatum((int64) summarized_tli);
+	values[1] = LSNGetDatum(summarized_lsn);
+	values[2] = LSNGetDatum(pending_lsn);
+
+	if (summarizer_pid < 0)
+		nulls[3] = true;
+	else
+		values[3] = Int32GetDatum(summarizer_pid);
+
+	htup = heap_form_tuple(tupdesc, values, nulls);
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index f828cc436a..7287f07529 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -142,6 +142,7 @@ static XLogRecPtr redo_pointer_at_last_summary_removal = InvalidXLogRecPtr;
 bool		summarize_wal = false;
 int			wal_summary_keep_time = 10 * 24 * 60;
 
+static void WalSummarizerShutdown(int code, Datum arg);
 static XLogRecPtr GetLatestLSN(TimeLineID *tli);
 static void HandleWalSummarizerInterrupts(void);
 static XLogRecPtr SummarizeWAL(TimeLineID tli, XLogRecPtr start_lsn,
@@ -245,6 +246,7 @@ WalSummarizerMain(void)
 	pqsignal(SIGUSR2, SIG_IGN); /* not used */
 
 	/* Advertise ourselves. */
+	on_shmem_exit(WalSummarizerShutdown, (Datum) 0);
 	LWLockAcquire(WALSummarizerLock, LW_EXCLUSIVE);
 	WalSummarizerCtl->summarizer_pgprocno = MyProc->pgprocno;
 	LWLockRelease(WALSummarizerLock);
@@ -417,6 +419,57 @@ WalSummarizerMain(void)
 	}
 }
 
+/*
+ * Get information about the state of the WAL summarizer.
+ */
+void
+GetWalSummarizerState(TimeLineID *summarized_tli, XLogRecPtr *summarized_lsn,
+					  XLogRecPtr *pending_lsn, int *summarizer_pid)
+{
+	LWLockAcquire(WALSummarizerLock, LW_SHARED);
+	if (!WalSummarizerCtl->initialized)
+	{
+		/*
+		 * If initialized is false, the rest of the structure contents are
+		 * undefined.
+		 */
+		*summarized_tli = 0;
+		*summarized_lsn = InvalidXLogRecPtr;
+		*pending_lsn = InvalidXLogRecPtr;
+		*summarizer_pid = -1;
+	}
+	else
+	{
+		int	summarizer_pgprocno = WalSummarizerCtl->summarizer_pgprocno;
+
+		*summarized_tli = WalSummarizerCtl->summarized_tli;
+		*summarized_lsn = WalSummarizerCtl->summarized_lsn;
+		if (summarizer_pgprocno == INVALID_PGPROCNO)
+		{
+			/*
+			 * If the summarizer has exited, the fact that it had processed
+			 * beyond summarized_lsn is irrelevant now.
+			 */
+			*pending_lsn = WalSummarizerCtl->summarized_lsn;
+			*summarizer_pid = -1;
+		}
+		else
+		{
+			*pending_lsn = WalSummarizerCtl->pending_lsn;
+
+			/*
+			 * We're not fussed about inexact answers here, since they could
+			 * become stale instantly, so we don't bother taking the lock, but
+			 * make sure that invalid PID values are normalized to -1.
+			 */
+			*summarizer_pid = GetPGProcByNumber(summarizer_pgprocno)->pid;
+			if (*summarizer_pid <= 0)
+				*summarizer_pid = -1;
+		}
+	}
+	LWLockRelease(WALSummarizerLock);
+}
+
 /*
  * Get the oldest LSN in this server's timeline history that has not yet been
  * summarized.
@@ -622,6 +675,18 @@ WaitForWalSummarization(XLogRecPtr lsn, long timeout, XLogRecPtr *pending_lsn)
 	return summarized_lsn;
 }
 
+/*
+ * On exit, update shared memory to make it clear that we're no longer
+ * running.
+ */
+static void
+WalSummarizerShutdown(int code, Datum arg)
+{
+	LWLockAcquire(WALSummarizerLock, LW_EXCLUSIVE);
+	WalSummarizerCtl->summarizer_pgprocno = INVALID_PGPROCNO;
+	LWLockRelease(WALSummarizerLock);
+}
+
 /*
  * Get the latest LSN that is eligible to be summarized, and set *tli to the
  * corresponding timeline.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7979392776..58811a6530 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12142,5 +12142,14 @@
   proargmodes => '{i,i,i,o,o,o,o,o,o}',
   proargnames => '{tli,start_lsn,end_lsn,relfilenode,reltablespace,reldatabase,relforknumber,relblocknumber,is_limit_block}',
   prosrc => 'pg_wal_summary_contents' },
+{ oid => '8438',
+  descr => 'WAL summarizer state',
+  proname => 'pg_get_wal_summarizer_state',
+  provolatile => 'v', proparallel => 's',
+  prorettype => 'record', proargtypes => '',
+  proallargtypes => '{int8,pg_lsn,pg_lsn,int4}',
+  proargmodes => '{o,o,o,o}',
+  proargnames => '{summarized_tli,summarized_lsn,pending_lsn,summarizer_pid}',
+  prosrc => 'pg_get_wal_summarizer_state' },
 
 ]
diff --git a/src/include/postmaster/walsummarizer.h b/src/include/postmaster/walsummarizer.h
index 75cb1d709f..d6e61b59ac 100644
--- a/src/include/postmaster/walsummarizer.h
+++ b/src/include/postmaster/walsummarizer.h
@@ -23,6 +23,10 @@ extern Size WalSummarizerShmemSize(void);
 extern void WalSummarizerShmemInit(void);
 extern void WalSummarizerMain(void) pg_attribute_noreturn();
 
+extern void GetWalSummarizerState(TimeLineID *summarized_tli,
+								  XLogRecPtr *summarized_lsn,
+								  XLogRecPtr *pending_lsn,
+								  int *summarizer_pid);
 extern XLogRecPtr GetOldestUnsummarizedLSN(TimeLineID *tli,
 										   bool *lsn_is_exact,
 										   bool reset_pending_lsn);
-- 
2.39.3 (Apple Git-145)

#3Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#2)
Re: cleanup patches for incremental backup

On Tue, Jan 9, 2024 at 1:18 PM Robert Haas <robertmhaas@gmail.com> wrote:

Here's v2. I plan to commit the rest of this fairly soon if there are
no comments.

Done, with a minor adjustment to 0003.

--
Robert Haas
EDB: http://www.enterprisedb.com

In reply to: Robert Haas (#3)
1 attachment(s)
RE: cleanup patches for incremental backup

Hi,
Thank you for developing the new tool. I have attached a patch that corrects the spelling of the --individual option in the SGML file.

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Robert Haas <robertmhaas@gmail.com>
Sent: Friday, January 12, 2024 3:13 AM
To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Jakub Wartak <jakub.wartak@enterprisedb.com>
Subject: Re: cleanup patches for incremental backup

On Tue, Jan 9, 2024 at 1:18 PM Robert Haas <robertmhaas@gmail.com> wrote:

Here's v2. I plan to commit the rest of this fairly soon if there are
no comments.

Done, with a minor adjustment to 0003.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

pg_walsummary_sgml_v1.diffapplication/octet-stream; name=pg_walsummary_sgml_v1.diffDownload
diff --git a/doc/src/sgml/ref/pg_walsummary.sgml b/doc/src/sgml/ref/pg_walsummary.sgml
index 93e265ead7..96835d2f60 100644
--- a/doc/src/sgml/ref/pg_walsummary.sgml
+++ b/doc/src/sgml/ref/pg_walsummary.sgml
@@ -58,7 +58,7 @@ PostgreSQL documentation
     <variablelist>
      <varlistentry>
       <term><option>-i</option></term>
-      <term><option>--indivudual</option></term>
+      <term><option>--individual</option></term>
       <listitem>
        <para>
         By default, <literal>pg_walsummary</literal> prints one line of output
#5Robert Haas
robertmhaas@gmail.com
In reply to: Shinoda, Noriyoshi (HPE Services Japan - FSIP) (#4)
Re: cleanup patches for incremental backup

On Thu, Jan 11, 2024 at 8:00 PM Shinoda, Noriyoshi (HPE Services Japan
- FSIP) <noriyoshi.shinoda@hpe.com> wrote:

Thank you for developing the new tool. I have attached a patch that corrects the spelling of the --individual option in the SGML file.

Thanks, committed.

--
Robert Haas
EDB: http://www.enterprisedb.com

#6Alexander Lakhin
exclusion@gmail.com
In reply to: Robert Haas (#5)
1 attachment(s)
Re: cleanup patches for incremental backup

Hello Robert,

12.01.2024 17:50, Robert Haas wrote:

On Thu, Jan 11, 2024 at 8:00 PM Shinoda, Noriyoshi (HPE Services Japan
- FSIP) <noriyoshi.shinoda@hpe.com> wrote:

Thank you for developing the new tool. I have attached a patch that corrects the spelling of the --individual option in the SGML file.

Thanks, committed.

I've found one more typo in the sgml:
summarized_pid
And one in a comment:
sumamry

A trivial fix is attached.

Best regards,
Alexander

Attachments:

pg_walsummary-fixes.patchtext/x-patch; charset=UTF-8; name=pg_walsummary-fixes.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0f7d409e60..210c7c0b02 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26578,7 +26578,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
         record that it has consumed, which must always be greater than or
         equal to <literal>summarized_lsn</literal>; if the WAL summarizer is
         not running, it will be equal to <literal>summarized_lsn</literal>.
-        <literal>summarized_pid</literal> is the PID of the WAL summarizer
+        <literal>summarizer_pid</literal> is the PID of the WAL summarizer
         process, if it is running, and otherwise NULL.
        </para></entry>
       </row>
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index d473471bc7..d609d2c547 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -63,7 +63,7 @@ SELECT EXISTS (
 )
 EOM
 
-# Figure out the exact details for the new sumamry file.
+# Figure out the exact details for the new summary file.
 my $details = $node1->safe_psql('postgres', <<EOM);
 SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
 	WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
#7Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Lakhin (#6)
Re: cleanup patches for incremental backup

On Sat, Jan 13, 2024 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:

I've found one more typo in the sgml:
summarized_pid
And one in a comment:
sumamry

A trivial fix is attached.

Thanks, committed.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Robert Haas (#7)
1 attachment(s)
Re: cleanup patches for incremental backup

On Mon, 15 Jan 2024 at 17:58, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jan 13, 2024 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:

I've found one more typo in the sgml:
summarized_pid
And one in a comment:
sumamry

A trivial fix is attached.

Thanks, committed.

Off-list I was notified that the new WAL summarizer process was not
yet added to the glossary, so PFA a patch that does that.
In passing, it also adds "incremental backup" to the glossary, and
updates the documented types of backends in monitoring.sgml with the
new backend type, too.

Kind regards,

Matthias van de Meent.

Attachments:

v1-0001-incremental-backups-Add-new-items-to-glossary-mon.patchapplication/octet-stream; name=v1-0001-incremental-backups-Add-new-items-to-glossary-mon.patchDownload
From 712dd139243c4ec8e48b4db1fa1109c3437081f3 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Mon, 15 Jan 2024 21:20:38 +0100
Subject: [PATCH v1] incremental backups: Add new items to glossary,
 monitoring.sgml

The previous patches seem to have overlooked this.
---
 doc/src/sgml/glossary.sgml   | 37 ++++++++++++++++++++++++++++++++++++
 doc/src/sgml/monitoring.sgml | 18 +++++++++++++++++-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 5815fa4471..0f9d48a7dd 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -893,6 +893,29 @@
     </para>
    </glossdef>
   </glossentry>
+  <glossentry id="glossary-incremental-backup">
+   <glossterm>Incremental Backup</glossterm>
+   <glossdef>
+    <para>
+     A backup that contains only the data that was potentially changed since
+     the previous <glossentry id="glossary-basebackup">base backup</glossentry>
+     and other incremental backups. It is often (but not always) smaller than
+     the WAL that generated the changes between the last basebackup and the
+     end of this incremental backup. Like base backups, it is generated by the
+     tool <xref linkend="app-pgbasebackup"/>.
+    </para>
+    <para>
+     In combination with <glossterm linkend="glossary-wal">WAL</glossterm>
+     data, recent incremental backups and a base backup this can restore a
+     <glossterm linkend="glossary-db-cluster"database cluster</glossterm> to
+     a consistent state without having to replay all the WAL that was generated
+     since the last base backup.
+    </para>
+    <para>
+     For more information, see <xref linkend="backup-incremental-backup"/>
+    </para>
+   </glossdef>
+  </glossentry>
 
   <glossentry id="glossary-insert">
    <glossterm>Insert</glossterm>
@@ -2157,6 +2180,20 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-wal-summarizer">
+   <glossterm>WAL summarizer (process)</glossterm>
+   <glossdef>
+    <para>
+     A special <glossterm linkend="glossary-backend">backend process</glossterm>
+     that summarizes WAL data for
+     <glossterm linkend="glossary-incremental-backup">incremental backups</glossterm>.
+    </para>
+    <para>
+     For more information, see <xref linkend="runtime-config-wal-summarization"/>
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-wal-writer">
    <glossterm>WAL writer (process)</glossterm>
    <glossdef>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b804eb8b5e..6135d7a270 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -999,7 +999,8 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
        <literal>client backend</literal>, <literal>checkpointer</literal>,
        <literal>archiver</literal>, <literal>standalone backend</literal>,
        <literal>startup</literal>, <literal>walreceiver</literal>,
-       <literal>walsender</literal> and <literal>walwriter</literal>.
+       <literal>walsender</literal>, <literal>walwriter</literal> and
+       <literal>walsummarizer</literal>.
        In addition, background workers registered by extensions may have
        additional types.
       </para></entry>
@@ -4062,6 +4063,21 @@ description | Waiting for a newly initialized WAL file to reach durable storage
    </para>
   </note>
 
+  <note>
+   <para>
+    Every time an index is searched, the index's
+    <structname>pg_stat_all_indexes</structname>.<structfield>idx_scan</structfield>
+    field is incremented.  This usually happens once per index scan node
+    execution, but might take place several times during execution of a scan
+    that searches for multiple values together.  Queries that use certain
+    <acronym>SQL</acronym> constructs to search for rows matching any value
+    out of a list (or an array) of multiple scalar values might perform
+    multiple <quote>primitive</quote> index scans (up to one primitive scan
+    per scalar value) at runtime.  See <xref linkend="functions-comparisons"/>
+    for details.
+   </para>
+  </note>
+
  </sect2>
 
  <sect2 id="monitoring-pg-statio-all-tables-view">
-- 
2.40.1

#9Robert Haas
robertmhaas@gmail.com
In reply to: Matthias van de Meent (#8)
Re: cleanup patches for incremental backup

On Mon, Jan 15, 2024 at 3:31 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Off-list I was notified that the new WAL summarizer process was not
yet added to the glossary, so PFA a patch that does that.
In passing, it also adds "incremental backup" to the glossary, and
updates the documented types of backends in monitoring.sgml with the
new backend type, too.

I wonder if it's possible that you sent the wrong version of this
patch, because:

(1) The docs don't build with this applied. I'm not sure if it's the
only problem, but <glossterm linkend="glossary-db-cluster" is missing
the closing >.

(2) The changes to monitoring.sgml contain an unrelated change, about
pg_stat_all_indexes.idx_scan.

Also, I think the "For more information, see <xref linkend="whatever"
/> bit should have a period after the markup tag, as we seem to do in
other cases.

One other thought is that the incremental backup only replaces
relation files with incremental files, and it never does anything
about FSM files. So the statement that it only contains data that was
potentially changed isn't quite correct. It might be better to phrase
it the other way around i.e. it is like a full backup, except that
some files can be replaced by incremental files which omit blocks to
which no WAL-logged changes have been made.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Robert Haas (#9)
1 attachment(s)
Re: cleanup patches for incremental backup

On Tue, 16 Jan 2024 at 16:39, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 15, 2024 at 3:31 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Off-list I was notified that the new WAL summarizer process was not
yet added to the glossary, so PFA a patch that does that.
In passing, it also adds "incremental backup" to the glossary, and
updates the documented types of backends in monitoring.sgml with the
new backend type, too.

I wonder if it's possible that you sent the wrong version of this
patch, because:

(1) The docs don't build with this applied. I'm not sure if it's the
only problem, but <glossterm linkend="glossary-db-cluster" is missing
the closing >.

That's my mistake, I didn't check install-world yet due to unrelated
issues building the docs. I've since sorted out these issues (this was
a good stick to get that done), so this issue is fixed in the attached
patch.

(2) The changes to monitoring.sgml contain an unrelated change, about
pg_stat_all_indexes.idx_scan.

Thanks for noticing, fixed in attached.

Also, I think the "For more information, see <xref linkend="whatever"
/> bit should have a period after the markup tag, as we seem to do in
other cases.

Fixed.

One other thought is that the incremental backup only replaces
relation files with incremental files, and it never does anything
about FSM files. So the statement that it only contains data that was
potentially changed isn't quite correct. It might be better to phrase
it the other way around i.e. it is like a full backup, except that
some files can be replaced by incremental files which omit blocks to
which no WAL-logged changes have been made.

How about the attached?

Kind regards,

Matthias van de Meent

Attachments:

v2-0001-incremental-backups-Add-new-items-to-glossary-mon.patchapplication/octet-stream; name=v2-0001-incremental-backups-Add-new-items-to-glossary-mon.patchDownload
From 3d1092128c67c13004dff5a6ad58dbf35d05aa4d Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Mon, 15 Jan 2024 21:20:38 +0100
Subject: [PATCH v2] incremental backups: Add new items to glossary,
 monitoring.sgml

The previous patches seem to have overlooked this.
---
 doc/src/sgml/glossary.sgml   | 36 ++++++++++++++++++++++++++++++++++++
 doc/src/sgml/monitoring.sgml |  3 ++-
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 5815fa4471..2db0af766a 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -893,6 +893,28 @@
     </para>
    </glossdef>
   </glossentry>
+  <glossentry id="glossary-incremental-backup">
+   <glossterm>Incremental Backup</glossterm>
+   <glossdef>
+    <para>
+     A special <glossterm linkend="glossary-basebackup">base backup</glossterm>
+     that for some WAL-logged relations only contains the pages that were
+     modified since a previous backup, as opposed to the full relation data of
+     normal base backups. Like base backups, it is generated by the tool
+     <xref linkend="app-pgbasebackup"/>.
+    </para>
+    <para>
+     To restore incremental backups the tool <xref linkend="app-pgcombinebackup"/>
+     is used, which combines the incremental backups with a base backup and
+     <glossterm linkend="glossary-wal">WAL</glossterm> to restore a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm> to
+     a consistent state.
+    </para>
+    <para>
+     For more information, see <xref linkend="backup-incremental-backup"/>.
+    </para>
+   </glossdef>
+  </glossentry>
 
   <glossentry id="glossary-insert">
    <glossterm>Insert</glossterm>
@@ -2157,6 +2179,20 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-wal-summarizer">
+   <glossterm>WAL summarizer (process)</glossterm>
+   <glossdef>
+    <para>
+     A special <glossterm linkend="glossary-backend">backend process</glossterm>
+     that summarizes WAL data for
+     <glossterm linkend="glossary-incremental-backup">incremental backups</glossterm>.
+    </para>
+    <para>
+     For more information, see <xref linkend="runtime-config-wal-summarization"/>.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-wal-writer">
    <glossterm>WAL writer (process)</glossterm>
    <glossdef>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b804eb8b5e..6e74138a69 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -999,7 +999,8 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
        <literal>client backend</literal>, <literal>checkpointer</literal>,
        <literal>archiver</literal>, <literal>standalone backend</literal>,
        <literal>startup</literal>, <literal>walreceiver</literal>,
-       <literal>walsender</literal> and <literal>walwriter</literal>.
+       <literal>walsender</literal>, <literal>walwriter</literal> and
+       <literal>walsummarizer</literal>.
        In addition, background workers registered by extensions may have
        additional types.
       </para></entry>
-- 
2.40.1

#11Robert Haas
robertmhaas@gmail.com
In reply to: Matthias van de Meent (#10)
Re: cleanup patches for incremental backup

On Tue, Jan 16, 2024 at 3:22 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

One other thought is that the incremental backup only replaces
relation files with incremental files, and it never does anything
about FSM files. So the statement that it only contains data that was
potentially changed isn't quite correct. It might be better to phrase
it the other way around i.e. it is like a full backup, except that
some files can be replaced by incremental files which omit blocks to
which no WAL-logged changes have been made.

How about the attached?

I like the direction.

+     A special <glossterm linkend="glossary-basebackup">base backup</glossterm>
+     that for some WAL-logged relations only contains the pages that were
+     modified since a previous backup, as opposed to the full relation data of
+     normal base backups. Like base backups, it is generated by the tool
+     <xref linkend="app-pgbasebackup"/>.

Could we say "that for some files may contain only those pages that
were modified since a previous backup, as opposed to the full contents
of every file"? My thoughts are (1) there's no hard guarantee that an
incremental backup will replace even one file with an incremental
file, although in practice it is probably almost always going to
happen and (2) pg_combinebackup would actually be totally fine with
any file at all being sent incrementally; it's only that the server
isn't smart enough to figure out how to do this with e.g. SLRU data
right now.

+     To restore incremental backups the tool <xref
linkend="app-pgcombinebackup"/>
+     is used, which combines the incremental backups with a base backup and
+     <glossterm linkend="glossary-wal">WAL</glossterm> to restore a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm> to
+     a consistent state.

I wondered if this needed to be clearer that the chain of backups
could have length > 2. But on further reflection, I think it's fine,
unless you feel otherwise.

The rest LGTM.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Robert Haas (#11)
1 attachment(s)
Re: cleanup patches for incremental backup

On Tue, 16 Jan 2024 at 21:49, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 16, 2024 at 3:22 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
+     A special <glossterm linkend="glossary-basebackup">base backup</glossterm>
+     that for some WAL-logged relations only contains the pages that were
+     modified since a previous backup, as opposed to the full relation data of
+     normal base backups. Like base backups, it is generated by the tool
+     <xref linkend="app-pgbasebackup"/>.

Could we say "that for some files may contain only those pages that
were modified since a previous backup, as opposed to the full contents
of every file"?

Sure, added in attached.

+     To restore incremental backups the tool <pgcombinebackup>
+     is used, which combines the incremental backups with a base backup and
+ [...]
I wondered if this needed to be clearer that the chain of backups
could have length > 2. But on further reflection, I think it's fine,
unless you feel otherwise.

I removed "the" from the phrasing "the incremental backups", which
makes it a bit less restricted.

The rest LGTM.

In the latest patch I also fixed the casing of "Incremental Backup" to
"... backup", to be in line with most other multi-word items.

Thanks!

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachments:

v3-0001-incremental-backups-Add-new-items-to-glossary-mon.patchapplication/octet-stream; name=v3-0001-incremental-backups-Add-new-items-to-glossary-mon.patchDownload
From 12a2b5290f29e3897391cc7b7a2666198cb1e835 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Mon, 15 Jan 2024 21:20:38 +0100
Subject: [PATCH v3] incremental backups: Add new items to glossary,
 monitoring.sgml

The previous patches seem to have overlooked this.
---
 doc/src/sgml/glossary.sgml   | 35 +++++++++++++++++++++++++++++++++++
 doc/src/sgml/monitoring.sgml |  3 ++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 5815fa4471..a0150e5c48 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -893,6 +893,27 @@
     </para>
    </glossdef>
   </glossentry>
+  <glossentry id="glossary-incremental-backup">
+   <glossterm>Incremental backup</glossterm>
+   <glossdef>
+    <para>
+     A special <glossterm linkend="glossary-basebackup">base backup</glossterm>
+     that for some files may contain only those pages that were modified since
+     a previous backup, as opposed to the full contents of every file. Like
+     base backups, it is generated by the tool <xref linkend="app-pgbasebackup"/>.
+    </para>
+    <para>
+     To restore incremental backups the tool <xref linkend="app-pgcombinebackup"/>
+     is used, which combines incremental backups with a base backup and
+     <glossterm linkend="glossary-wal">WAL</glossterm> to restore a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm> to
+     a consistent state.
+    </para>
+    <para>
+     For more information, see <xref linkend="backup-incremental-backup"/>.
+    </para>
+   </glossdef>
+  </glossentry>
 
   <glossentry id="glossary-insert">
    <glossterm>Insert</glossterm>
@@ -2157,6 +2178,20 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-wal-summarizer">
+   <glossterm>WAL summarizer (process)</glossterm>
+   <glossdef>
+    <para>
+     A special <glossterm linkend="glossary-backend">backend process</glossterm>
+     that summarizes WAL data for
+     <glossterm linkend="glossary-incremental-backup">incremental backups</glossterm>.
+    </para>
+    <para>
+     For more information, see <xref linkend="runtime-config-wal-summarization"/>.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-wal-writer">
    <glossterm>WAL writer (process)</glossterm>
    <glossdef>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b804eb8b5e..6e74138a69 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -999,7 +999,8 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
        <literal>client backend</literal>, <literal>checkpointer</literal>,
        <literal>archiver</literal>, <literal>standalone backend</literal>,
        <literal>startup</literal>, <literal>walreceiver</literal>,
-       <literal>walsender</literal> and <literal>walwriter</literal>.
+       <literal>walsender</literal>, <literal>walwriter</literal> and
+       <literal>walsummarizer</literal>.
        In addition, background workers registered by extensions may have
        additional types.
       </para></entry>
-- 
2.40.1

#13Robert Haas
robertmhaas@gmail.com
In reply to: Matthias van de Meent (#12)
Re: cleanup patches for incremental backup

On Wed, Jan 17, 2024 at 1:42 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Sure, added in attached.

I think this mostly looks good now but I just realized that I think
this needs rephrasing:

+     To restore incremental backups the tool <xref
linkend="app-pgcombinebackup"/>
+     is used, which combines incremental backups with a base backup and
+     <glossterm linkend="glossary-wal">WAL</glossterm> to restore a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm> to
+     a consistent state.

The way this is worded, at least to me, it makes it sound like
pg_combinebackup is going to do the WAL recovery for you, which it
isn't. Maybe:

To restore incremental backups the tool <xref
linkend="app-pgcombinebackup"/> is used, which combines incremental
backups with a base backup. Afterwards, recovery can use <glossterm
linkend="glossary-wal">WAL</glossterm> to bring the <glossterm
linkend="glossary-db-cluster">database cluster</glossterm> to a
consistent state.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Robert Haas (#13)
Re: cleanup patches for incremental backup

On Wed, 17 Jan 2024 at 21:10, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 17, 2024 at 1:42 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Sure, added in attached.

I think this mostly looks good now but I just realized that I think
this needs rephrasing:

+     To restore incremental backups the tool <xref
linkend="app-pgcombinebackup"/>
+     is used, which combines incremental backups with a base backup and
+     <glossterm linkend="glossary-wal">WAL</glossterm> to restore a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm> to
+     a consistent state.

The way this is worded, at least to me, it makes it sound like
pg_combinebackup is going to do the WAL recovery for you, which it
isn't. Maybe:

To restore incremental backups the tool <xref
linkend="app-pgcombinebackup"/> is used, which combines incremental
backups with a base backup. Afterwards, recovery can use <glossterm
linkend="glossary-wal">WAL</glossterm> to bring the <glossterm
linkend="glossary-db-cluster">database cluster</glossterm> to a
consistent state.

Sure, that's fine with me.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#15Robert Haas
robertmhaas@gmail.com
In reply to: Matthias van de Meent (#14)
Re: cleanup patches for incremental backup

On Thu, Jan 18, 2024 at 4:50 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Sure, that's fine with me.

OK, committed that way.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#15)
Re: cleanup patches for incremental backup

I'm seeing some recent buildfarm failures for pg_walsummary:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2024-01-14%2006%3A21%3A58
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&amp;dt=2024-01-17%2021%3A10%3A36
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&amp;dt=2024-01-20%2018%3A58%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=taipan&amp;dt=2024-01-23%2002%3A46%3A57
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&amp;dt=2024-01-23%2020%3A23%3A36

The signature looks nearly identical in each:

# Failed test 'WAL summary file exists'
# at t/002_blocks.pl line 79.

# Failed test 'stdout shows block 0 modified'
# at t/002_blocks.pl line 85.
# ''
# doesn't match '(?^m:FORK main: block 0$)'

I haven't been able to reproduce the issue on my machine, and I haven't
figured out precisely what is happening yet, but I wanted to make sure
there is awareness.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#17Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#16)
Re: cleanup patches for incremental backup

On Wed, Jan 24, 2024 at 12:08 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

I'm seeing some recent buildfarm failures for pg_walsummary:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2024-01-14%2006%3A21%3A58
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&amp;dt=2024-01-17%2021%3A10%3A36
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&amp;dt=2024-01-20%2018%3A58%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=taipan&amp;dt=2024-01-23%2002%3A46%3A57
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&amp;dt=2024-01-23%2020%3A23%3A36

The signature looks nearly identical in each:

# Failed test 'WAL summary file exists'
# at t/002_blocks.pl line 79.

# Failed test 'stdout shows block 0 modified'
# at t/002_blocks.pl line 85.
# ''
# doesn't match '(?^m:FORK main: block 0$)'

I haven't been able to reproduce the issue on my machine, and I haven't
figured out precisely what is happening yet, but I wanted to make sure
there is awareness.

This is weird. There's a little more detail in the log file,
regress_log_002_blocks, e.g. from the first failure you linked:

[11:18:20.683](96.787s) # before insert, summarized TLI 1 through 0/14E09D0
[11:18:21.188](0.505s) # after insert, summarized TLI 1 through 0/14E0D08
[11:18:21.326](0.138s) # examining summary for TLI 1 from 0/14E0D08 to 0/155BAF0
# 1
...
[11:18:21.349](0.000s) # got: 'pg_walsummary: error: could
not open file "/home/nm/farm/gcc64/HEAD/pgsql.build/src/bin/pg_walsummary/tmp_check/t_002_blocks_node1_data/pgdata/pg_wal/summaries/0000000100000000014E0D0800000000155BAF0
# 1.summary": No such file or directory'

The "examining summary" line is generated based on the output of
pg_available_wal_summaries(). The way that works is that the server
calls readdir(), disassembles the filename into a TLI and two LSNs,
and returns the result. Then, a fraction of a second later, the test
script reassembles those components into a filename and finds the file
missing. If the logic to translate between filenames and TLIs & LSNs
were incorrect, the test would fail consistently. So the only
explanation that seems to fit the facts is the file disappearing out
from under us. But that really shouldn't happen. We do have code to
remove such files in MaybeRemoveOldWalSummaries(), but it's only
supposed to be nuking files more than 10 days old.

So I don't really have a theory here as to what could be happening. :-(

--
Robert Haas
EDB: http://www.enterprisedb.com

#18Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#17)
Re: cleanup patches for incremental backup

On Wed, Jan 24, 2024 at 12:46:16PM -0500, Robert Haas wrote:

The "examining summary" line is generated based on the output of
pg_available_wal_summaries(). The way that works is that the server
calls readdir(), disassembles the filename into a TLI and two LSNs,
and returns the result. Then, a fraction of a second later, the test
script reassembles those components into a filename and finds the file
missing. If the logic to translate between filenames and TLIs & LSNs
were incorrect, the test would fail consistently. So the only
explanation that seems to fit the facts is the file disappearing out
from under us. But that really shouldn't happen. We do have code to
remove such files in MaybeRemoveOldWalSummaries(), but it's only
supposed to be nuking files more than 10 days old.

So I don't really have a theory here as to what could be happening. :-(

There might be an overflow risk in the cutoff time calculation, but I doubt
that's the root cause of these failures:

/*
* Files should only be removed if the last modification time precedes the
* cutoff time we compute here.
*/
cutoff_time = time(NULL) - 60 * wal_summary_keep_time;

Otherwise, I think we'll probably need to add some additional logging to
figure out what is happening...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#19Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#18)
Re: cleanup patches for incremental backup

On Wed, Jan 24, 2024 at 1:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

There might be an overflow risk in the cutoff time calculation, but I doubt
that's the root cause of these failures:

/*
* Files should only be removed if the last modification time precedes the
* cutoff time we compute here.
*/
cutoff_time = time(NULL) - 60 * wal_summary_keep_time;

Otherwise, I think we'll probably need to add some additional logging to
figure out what is happening...

Where, though? I suppose we could:

1. Change the server code so that it logs each WAL summary file
removed at a log level high enough to show up in the test logs.

2. Change the TAP test so that it prints out readdir(WAL summary
directory) at various points in the test.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#19)
Re: cleanup patches for incremental backup

On Wed, Jan 24, 2024 at 02:08:08PM -0500, Robert Haas wrote:

On Wed, Jan 24, 2024 at 1:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

Otherwise, I think we'll probably need to add some additional logging to
figure out what is happening...

Where, though? I suppose we could:

1. Change the server code so that it logs each WAL summary file
removed at a log level high enough to show up in the test logs.

2. Change the TAP test so that it prints out readdir(WAL summary
directory) at various points in the test.

That seems like a reasonable starting point. Even if it doesn't help
determine the root cause, it should at least help rule out concurrent
summary removal.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#21Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#20)
1 attachment(s)
Re: cleanup patches for incremental backup

On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

That seems like a reasonable starting point. Even if it doesn't help
determine the root cause, it should at least help rule out concurrent
summary removal.

Here is a patch for that.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Temporary-patch-to-help-debug-pg_walsummary-test-.patchapplication/octet-stream; name=v1-0001-Temporary-patch-to-help-debug-pg_walsummary-test-.patchDownload
From aa9a129de3a7860c9d1db239ee4c23940eb40759 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 25 Jan 2024 10:02:41 -0500
Subject: [PATCH v1] Temporary patch to help debug pg_walsummary test failures.

The tests in 002_blocks.pl are failing in the buildfarm from time to
time, but we don't know how to reproduce the failure elsewhere. The
most obvious explanation seems to be the unexpected disappearance of a
WAL summary file, so bump up the logging level in
RemoveWalSummaryIfOlderThan to try to help us spot such problems. Also
adjust 002_blocks.pl to dump out a directory listing of the relevant
directory at various points.

This patch should be reverted once we sort out what's happening here.
---
 src/backend/backup/walsummary.c       |  3 ++-
 src/bin/pg_walsummary/t/002_blocks.pl | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/backend/backup/walsummary.c b/src/backend/backup/walsummary.c
index b549673a9d..bef75a9b47 100644
--- a/src/backend/backup/walsummary.c
+++ b/src/backend/backup/walsummary.c
@@ -251,7 +251,8 @@ RemoveWalSummaryIfOlderThan(WalSummaryFile *ws, time_t cutoff_time)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not stat file \"%s\": %m", path)));
-	ereport(DEBUG2,
+	/* XXX: was DEBUG2, temporarily increased to LOG */
+	ereport(LOG,
 			(errmsg_internal("removing file \"%s\"", path)));
 }
 
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index d609d2c547..40908da8cb 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -48,6 +48,7 @@ SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
 EOM
 ($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
 note("after insert, summarized TLI $summarized_tli through $summarized_lsn");
+note_wal_summary_dir("after insert", $node1);
 
 # Update a row in the first block of the table and trigger a checkpoint.
 $node1->safe_psql('postgres', <<EOM);
@@ -70,6 +71,7 @@ SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
 EOM
 my ($tli, $start_lsn, $end_lsn) = split(/\|/, $details);
 note("examining summary for TLI $tli from $start_lsn to $end_lsn");
+note_wal_summary_dir("after new summary", $node1);
 
 # Reconstruct the full pathname for the WAL summary file.
 my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
@@ -77,6 +79,7 @@ my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
 					   split(m@/@, $start_lsn),
 					   split(m@/@, $end_lsn);
 ok(-f $filename, "WAL summary file exists");
+note_wal_summary_dir("after existence check", $node1);
 
 # Run pg_walsummary on it. We expect block 0 to be modified, but depending
 # on where the new tuple ends up, block 1 might also be modified, so we
@@ -84,5 +87,16 @@ ok(-f $filename, "WAL summary file exists");
 my ($stdout, $stderr) = run_command([ 'pg_walsummary', '-i', $filename ]);
 like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified");
 is($stderr, '', 'stderr is empty');
+note_wal_summary_dir("after pg_walsummary run", $node1);
 
 done_testing();
+
+# XXX. Temporary debugging code.
+sub note_wal_summary_dir
+{
+	my ($flair, $node) = @_;
+
+	my $wsdir = sprintf "%s/pg_wal/summaries", $node->data_dir;
+	my @wsfiles = grep { $_ ne '.' && $_ ne '..' } slurp_dir($wsdir);
+	note("$flair pg_wal/summaries has: @wsfiles");
+}
-- 
2.39.3 (Apple Git-145)

#22Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#21)
Re: cleanup patches for incremental backup

On Thu, Jan 25, 2024 at 10:06:41AM -0500, Robert Haas wrote:

On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

That seems like a reasonable starting point. Even if it doesn't help
determine the root cause, it should at least help rule out concurrent
summary removal.

Here is a patch for that.

LGTM. The only thing I might add is the cutoff_time in that LOG.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#23Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#22)
1 attachment(s)
Re: cleanup patches for incremental backup

On Thu, Jan 25, 2024 at 11:08 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Thu, Jan 25, 2024 at 10:06:41AM -0500, Robert Haas wrote:

On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

That seems like a reasonable starting point. Even if it doesn't help
determine the root cause, it should at least help rule out concurrent
summary removal.

Here is a patch for that.

LGTM. The only thing I might add is the cutoff_time in that LOG.

Here is v2 with that addition.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Temporary-patch-to-help-debug-pg_walsummary-test-.patchapplication/octet-stream; name=v2-0001-Temporary-patch-to-help-debug-pg_walsummary-test-.patchDownload
From c79461a1147acc851e80c5b5885e7a1d3b591050 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 26 Jan 2024 11:03:29 -0500
Subject: [PATCH v2] Temporary patch to help debug pg_walsummary test failures.

The tests in 002_blocks.pl are failing in the buildfarm from time to
time, but we don't know how to reproduce the failure elsewhere. The
most obvious explanation seems to be the unexpected disappearance of a
WAL summary file, so bump up the logging level in
RemoveWalSummaryIfOlderThan to try to help us spot such problems, and
print the cutoff time in addition to the removed filename. Also
adjust 002_blocks.pl to dump out a directory listing of the relevant
directory at various points.

This patch should be reverted once we sort out what's happening here.
---
 src/backend/backup/walsummary.c       |  7 +++++++
 src/bin/pg_walsummary/t/002_blocks.pl | 14 ++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/src/backend/backup/walsummary.c b/src/backend/backup/walsummary.c
index b549673a9d..ae314d8b74 100644
--- a/src/backend/backup/walsummary.c
+++ b/src/backend/backup/walsummary.c
@@ -251,8 +251,15 @@ RemoveWalSummaryIfOlderThan(WalSummaryFile *ws, time_t cutoff_time)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not stat file \"%s\": %m", path)));
+	/* XXX temporarily changed to debug buildfarm failures */
+#if 0
 	ereport(DEBUG2,
 			(errmsg_internal("removing file \"%s\"", path)));
+#else
+	ereport(LOG,
+			(errmsg_internal("removing file \"%s\" cutoff_time=%llu", path,
+			 (unsigned long long) cutoff_time)));
+#endif
 }
 
 /*
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index d609d2c547..40908da8cb 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -48,6 +48,7 @@ SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
 EOM
 ($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
 note("after insert, summarized TLI $summarized_tli through $summarized_lsn");
+note_wal_summary_dir("after insert", $node1);
 
 # Update a row in the first block of the table and trigger a checkpoint.
 $node1->safe_psql('postgres', <<EOM);
@@ -70,6 +71,7 @@ SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
 EOM
 my ($tli, $start_lsn, $end_lsn) = split(/\|/, $details);
 note("examining summary for TLI $tli from $start_lsn to $end_lsn");
+note_wal_summary_dir("after new summary", $node1);
 
 # Reconstruct the full pathname for the WAL summary file.
 my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
@@ -77,6 +79,7 @@ my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
 					   split(m@/@, $start_lsn),
 					   split(m@/@, $end_lsn);
 ok(-f $filename, "WAL summary file exists");
+note_wal_summary_dir("after existence check", $node1);
 
 # Run pg_walsummary on it. We expect block 0 to be modified, but depending
 # on where the new tuple ends up, block 1 might also be modified, so we
@@ -84,5 +87,16 @@ ok(-f $filename, "WAL summary file exists");
 my ($stdout, $stderr) = run_command([ 'pg_walsummary', '-i', $filename ]);
 like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified");
 is($stderr, '', 'stderr is empty');
+note_wal_summary_dir("after pg_walsummary run", $node1);
 
 done_testing();
+
+# XXX. Temporary debugging code.
+sub note_wal_summary_dir
+{
+	my ($flair, $node) = @_;
+
+	my $wsdir = sprintf "%s/pg_wal/summaries", $node->data_dir;
+	my @wsfiles = grep { $_ ne '.' && $_ ne '..' } slurp_dir($wsdir);
+	note("$flair pg_wal/summaries has: @wsfiles");
+}
-- 
2.39.3 (Apple Git-145)

#24Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#23)
Re: cleanup patches for incremental backup

On Fri, Jan 26, 2024 at 11:04:37AM -0500, Robert Haas wrote:

Here is v2 with that addition.

Looks reasonable.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#25Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#24)
Re: cleanup patches for incremental backup

On Fri, Jan 26, 2024 at 12:39 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Fri, Jan 26, 2024 at 11:04:37AM -0500, Robert Haas wrote:

Here is v2 with that addition.

Looks reasonable.

Thanks for the report & review. I have committed that version.

--
Robert Haas
EDB: http://www.enterprisedb.com

#26Alexander Lakhin
exclusion@gmail.com
In reply to: Robert Haas (#25)
Re: cleanup patches for incremental backup

Hello Robert,

26.01.2024 21:37, Robert Haas wrote:

On Fri, Jan 26, 2024 at 12:39 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Fri, Jan 26, 2024 at 11:04:37AM -0500, Robert Haas wrote:

Here is v2 with that addition.

Looks reasonable.

Thanks for the report & review. I have committed that version.

While trying to reproduce the 002_blocks test failure, I've encountered
another anomaly (or two):
make -s check -C src/bin/pg_walsummary/ PROVE_TESTS="t/002*" PROVE_FLAGS="--timer"
# +++ tap check in src/bin/pg_walsummary +++
[05:40:38] t/002_blocks.pl .. # poll_query_until timed out executing this query:
# SELECT EXISTS (
#     SELECT * from pg_available_wal_summaries()
#     WHERE tli = 0 AND end_lsn > '0/0'
# )
#
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[05:40:38] t/002_blocks.pl .. ok   266739 ms ( 0.00 usr  0.01 sys + 17.51 cusr 26.79 csys = 44.31 CPU)
[05:45:05]
All tests successful.
Files=1, Tests=3, 267 wallclock secs ( 0.02 usr  0.02 sys + 17.51 cusr 26.79 csys = 44.34 CPU)
Result: PASS

It looks like the test may call pg_get_wal_summarizer_state() when
WalSummarizerCtl->initialized is false yet, i. e. before the first
GetOldestUnsummarizedLSN() call.
I could reproduce the issue easily (within 10 test runs) with
pg_usleep(100000L);
added inside WalSummarizerMain() just below:
sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);

But the fact that the test passes regardless of the timeout, make me
wonder, whether any test should fail when such timeout occurs?

Best regards,
Alexander

#27Alexander Lakhin
exclusion@gmail.com
In reply to: Robert Haas (#17)
Re: cleanup patches for incremental backup

24.01.2024 20:46, Robert Haas wrote:

This is weird. There's a little more detail in the log file,
regress_log_002_blocks, e.g. from the first failure you linked:

[11:18:20.683](96.787s) # before insert, summarized TLI 1 through 0/14E09D0
[11:18:21.188](0.505s) # after insert, summarized TLI 1 through 0/14E0D08
[11:18:21.326](0.138s) # examining summary for TLI 1 from 0/14E0D08 to 0/155BAF0
# 1
...
[11:18:21.349](0.000s) # got: 'pg_walsummary: error: could
not open file "/home/nm/farm/gcc64/HEAD/pgsql.build/src/bin/pg_walsummary/tmp_check/t_002_blocks_node1_data/pgdata/pg_wal/summaries/0000000100000000014E0D0800000000155BAF0
# 1.summary": No such file or directory'

The "examining summary" line is generated based on the output of
pg_available_wal_summaries(). The way that works is that the server
calls readdir(), disassembles the filename into a TLI and two LSNs,
and returns the result.

I'm discouraged by "\n1" in the file name and in the
"examining summary..." message.
regress_log_002_blocks from the following successful test run on the same
sungazer node contains:
[15:21:58.924](0.106s) # examining summary for TLI 1 from 0/155BAE0 to 0/155E750
[15:21:58.925](0.001s) ok 1 - WAL summary file exists

Best regards,
Alexander

#28Nathan Bossart
nathandbossart@gmail.com
In reply to: Alexander Lakhin (#27)
1 attachment(s)
Re: cleanup patches for incremental backup

On Sat, Jan 27, 2024 at 11:00:01AM +0300, Alexander Lakhin wrote:

24.01.2024 20:46, Robert Haas wrote:

This is weird. There's a little more detail in the log file,
regress_log_002_blocks, e.g. from the first failure you linked:

[11:18:20.683](96.787s) # before insert, summarized TLI 1 through 0/14E09D0
[11:18:21.188](0.505s) # after insert, summarized TLI 1 through 0/14E0D08
[11:18:21.326](0.138s) # examining summary for TLI 1 from 0/14E0D08 to 0/155BAF0
# 1
...
[11:18:21.349](0.000s) # got: 'pg_walsummary: error: could
not open file "/home/nm/farm/gcc64/HEAD/pgsql.build/src/bin/pg_walsummary/tmp_check/t_002_blocks_node1_data/pgdata/pg_wal/summaries/0000000100000000014E0D0800000000155BAF0
# 1.summary": No such file or directory'

The "examining summary" line is generated based on the output of
pg_available_wal_summaries(). The way that works is that the server
calls readdir(), disassembles the filename into a TLI and two LSNs,
and returns the result.

I'm discouraged by "\n1" in the file name and in the
"examining summary..." message.
regress_log_002_blocks from the following successful test run on the same
sungazer node contains:
[15:21:58.924](0.106s) # examining summary for TLI 1 from 0/155BAE0 to 0/155E750
[15:21:58.925](0.001s) ok 1 - WAL summary file exists

Ah, I think this query:

SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'

is returning more than one row in some cases. I attached a quick sketch of
an easy way to reproduce the issue as well as one way to fix it.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

repro_and_fix.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index 40908da8cb..5609b1bd14 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -64,10 +64,16 @@ SELECT EXISTS (
 )
 EOM
 
+$node1->safe_psql('postgres', <<EOM);
+UPDATE mytable SET b = 'abefghijklmnopqrstuvwxyz' WHERE a = 2;
+CHECKPOINT;
+EOM
+$node1->safe_psql('postgres', 'SELECT pg_sleep(1);');
+
 # Figure out the exact details for the new summary file.
 my $details = $node1->safe_psql('postgres', <<EOM);
 SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
-	WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+	WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn' ORDER BY end_lsn LIMIT 1
 EOM
 my ($tli, $start_lsn, $end_lsn) = split(/\|/, $details);
 note("examining summary for TLI $tli from $start_lsn to $end_lsn");
#29Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#28)
Re: cleanup patches for incremental backup

On Sat, Jan 27, 2024 at 10:31:09AM -0600, Nathan Bossart wrote:

On Sat, Jan 27, 2024 at 11:00:01AM +0300, Alexander Lakhin wrote:

I'm discouraged by "\n1" in the file name and in the
"examining summary..." message.
regress_log_002_blocks from the following successful test run on the same
sungazer node contains:
[15:21:58.924](0.106s) # examining summary for TLI 1 from 0/155BAE0 to 0/155E750
[15:21:58.925](0.001s) ok 1 - WAL summary file exists

Ah, I think this query:

SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'

is returning more than one row in some cases. I attached a quick sketch of
an easy way to reproduce the issue as well as one way to fix it.

The buildfarm just caught a failure with the new logging in place:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&amp;dt=2024-01-29%2018%3A09%3A10

I'm not totally sure my "fix" is correct, but I think this does confirm the
theory.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#30Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#29)
Re: cleanup patches for incremental backup

On Mon, Jan 29, 2024 at 1:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

Ah, I think this query:

SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'

is returning more than one row in some cases. I attached a quick sketch of
an easy way to reproduce the issue as well as one way to fix it.

The buildfarm just caught a failure with the new logging in place:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&amp;dt=2024-01-29%2018%3A09%3A10

I'm not totally sure my "fix" is correct, but I think this does confirm the
theory.

Ah. The possibilities of ending up with TWO new WAL summaries never
occurred to me. Things that never occurred to the developer are a
leading cause of bugs, and so here.

I'm wondering if what we need to do is run pg_walsummary on both
summary files in that case. If we just pick one or the other, how do
we know which one to pick?

--
Robert Haas
EDB: http://www.enterprisedb.com

#31Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#30)
Re: cleanup patches for incremental backup

On Mon, Jan 29, 2024 at 03:18:50PM -0500, Robert Haas wrote:

I'm wondering if what we need to do is run pg_walsummary on both
summary files in that case. If we just pick one or the other, how do
we know which one to pick?

Even if we do that, isn't it possible that none of the summaries will
include the change? Presently, we get the latest summarized LSN, make a
change, and then wait for the next summary file with a greater LSN than
what we saw before the change. But AFAICT there's no guarantee that means
the change has been summarized yet, although the chances of that happening
in a test are probably pretty small.

Could we get the LSN before and after making the change and then inspect
all summaries that include that LSN range?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#32Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#31)
Re: cleanup patches for incremental backup

On Mon, Jan 29, 2024 at 4:13 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Mon, Jan 29, 2024 at 03:18:50PM -0500, Robert Haas wrote:

I'm wondering if what we need to do is run pg_walsummary on both
summary files in that case. If we just pick one or the other, how do
we know which one to pick?

Even if we do that, isn't it possible that none of the summaries will
include the change? Presently, we get the latest summarized LSN, make a
change, and then wait for the next summary file with a greater LSN than
what we saw before the change. But AFAICT there's no guarantee that means
the change has been summarized yet, although the chances of that happening
in a test are probably pretty small.

Could we get the LSN before and after making the change and then inspect
all summaries that include that LSN range?

The trick here is that each WAL summary file covers one checkpoint
cycle. The intent of the test is to load data into the table,
checkpoint, see what summaries exist, then update a row, checkpoint
again, and see what summaries now exist. We expect one new summary
because there's been one new checkpoint. When I was thinking about
this yesterday, I was imagining that we were somehow getting an extra
checkpoint in some cases. But it looks like it's actually an
off-by-one situation. In
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&amp;dt=2024-01-29%2018%3A09%3A10
the new files that show up between "after insert" and "after new
summary" are:

00000001000000000152FAE000000000015AAAC8.summary (LSN distance ~500k)
00000001000000000152F7A8000000000152FAE0.summary (LSN distance 824 bytes)

The checkpoint after the inserts says:

LOG: checkpoint complete: wrote 14 buffers (10.9%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.956 s, sync=0.929 s, total=3.059
s; sync files=39, longest=0.373 s, average=0.024 s; distance=491 kB,
estimate=491 kB; lsn=0/15AAB20, redo lsn=0/15AAAC8

And the checkpoint after the single-row update says:

LOG: checkpoint complete: wrote 4 buffers (3.1%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.648 s, sync=0.355 s, total=2.798
s; sync files=3, longest=0.348 s, average=0.119 s; distance=11 kB,
estimate=443 kB; lsn=0/15AD770, redo lsn=0/15AD718

So both of the new WAL summary files that are appearing here are from
checkpoints that happened before the single-row update. The larger
file is the one covering the 400 inserts, and the smaller one is the
checkpoint before that. Which means that the "Wait for a new summary
to show up." code isn't actually waiting long enough, and then the
whole thing goes haywire. The problem is, I think, that this code
naively thinks it can just wait for summarized_lsn and everything will
be fine ... but that assumes we were caught up when we first measured
the summarized_lsn, and that need not be so, because it takes some
short but non-zero amount of time for the summarizer to catch up with
the WAL generated during initdb.

I think the solution here is to find a better way to wait for the
inserts to be summarized, one that actually does wait for that to
happen.

--
Robert Haas
EDB: http://www.enterprisedb.com

#33Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#32)
1 attachment(s)
Re: cleanup patches for incremental backup

On Tue, Jan 30, 2024 at 10:51 AM Robert Haas <robertmhaas@gmail.com> wrote:

I think the solution here is to find a better way to wait for the
inserts to be summarized, one that actually does wait for that to
happen.

Here's a patch for that. I now think
a7097ca630a25dd2248229f21ebce4968d85d10a was actually misguided, and
served only to mask some of the failures caused by waiting for the WAL
summary file.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Revise-pg_walsummary-s-002_blocks-test-to-avoid-s.patchapplication/octet-stream; name=v1-0001-Revise-pg_walsummary-s-002_blocks-test-to-avoid-s.patchDownload
From 890160b2760ac348af5edae7e6af07318b7fb20e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 30 Jan 2024 11:48:58 -0500
Subject: [PATCH v1] Revise pg_walsummary's 002_blocks test to avoid spurious
 failures.

Analysis of buildfarm results showed that the code that was intended
to wait for the inserts performed by this test to complete did not
actually do so. Try to make that logic more robust.

Improve error checking elsewhere in the script, too, so that we
don't miss things like poll_query_until failing.
---
 src/bin/pg_walsummary/t/002_blocks.pl | 47 ++++++++++++++++-----------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index 40908da8cb..eb128f460c 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -13,13 +13,6 @@ $node1->init(has_archiving => 1, allows_streaming => 1);
 $node1->append_conf('postgresql.conf', 'summarize_wal = on');
 $node1->start;
 
-# See what's been summarized up until now.
-my $progress = $node1->safe_psql('postgres', <<EOM);
-SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
-EOM
-my ($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
-note("before insert, summarized TLI $summarized_tli through $summarized_lsn");
-
 # Create a table and insert a few test rows into it. VACUUM FREEZE it so that
 # autovacuum doesn't induce any future modifications unexpectedly. Then
 # trigger a checkpoint.
@@ -31,22 +24,33 @@ SELECT
 FROM
 	generate_series(1, 400) g;
 VACUUM FREEZE;
+EOM
+
+# Record the current WAL insert LSN.
+my $base_lsn = $node1->safe_psql('postgres', <<EOM);
+SELECT pg_current_wal_insert_lsn()
+EOM
+note("just after insert, WAL insert LSN is $base_lsn");
+
+# Now perform a CHECKPOINT.
+$node1->safe_psql('postgres', <<EOM);
 CHECKPOINT;
 EOM
 
-# Wait for a new summary to show up.
-$node1->poll_query_until('postgres', <<EOM);
+# Wait for a new summary to show up, one that includes the inserts we just did.
+my $result = $node1->poll_query_until('postgres', <<EOM);
 SELECT EXISTS (
     SELECT * from pg_available_wal_summaries()
-    WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
+    WHERE end_lsn >= '$base_lsn'
 )
 EOM
+ok($result, "WAL summarization caught up after insert");
 
-# Again check the progress of WAL summarization.
-$progress = $node1->safe_psql('postgres', <<EOM);
+# Get a list of what summaries we now have.
+my $progress = $node1->safe_psql('postgres', <<EOM);
 SELECT summarized_tli, summarized_lsn FROM pg_get_wal_summarizer_state()
 EOM
-($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
+my ($summarized_tli, $summarized_lsn) = split(/\|/, $progress);
 note("after insert, summarized TLI $summarized_tli through $summarized_lsn");
 note_wal_summary_dir("after insert", $node1);
 
@@ -56,20 +60,23 @@ UPDATE mytable SET b = 'abcdefghijklmnopqrstuvwxyz' WHERE a = 2;
 CHECKPOINT;
 EOM
 
-# Again wait for a new summary to show up.
-$node1->poll_query_until('postgres', <<EOM);
+# Wait for a new summary to show up.
+$result = $node1->poll_query_until('postgres', <<EOM);
 SELECT EXISTS (
     SELECT * from pg_available_wal_summaries()
     WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
 )
 EOM
+ok($result, "got new WAL summary after update");
 
 # Figure out the exact details for the new summary file.
 my $details = $node1->safe_psql('postgres', <<EOM);
 SELECT tli, start_lsn, end_lsn from pg_available_wal_summaries()
 	WHERE tli = $summarized_tli AND end_lsn > '$summarized_lsn'
 EOM
-my ($tli, $start_lsn, $end_lsn) = split(/\|/, $details);
+my @lines = split(/\n/, $details);
+is(0+@lines, 1, "got exactly one new WAL summary");
+my ($tli, $start_lsn, $end_lsn) = split(/\|/, $lines[0]);
 note("examining summary for TLI $tli from $start_lsn to $end_lsn");
 note_wal_summary_dir("after new summary", $node1);
 
@@ -81,12 +88,14 @@ my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
 ok(-f $filename, "WAL summary file exists");
 note_wal_summary_dir("after existence check", $node1);
 
-# Run pg_walsummary on it. We expect block 0 to be modified, but depending
-# on where the new tuple ends up, block 1 might also be modified, so we
-# pass -i to pg_walsummary to make sure we don't end up with a 0..1 range.
+# Run pg_walsummary on it. We expect exactly two blocks to be modified,
+# block 0 and one other.
 my ($stdout, $stderr) = run_command([ 'pg_walsummary', '-i', $filename ]);
+note($stdout);
+@lines = split(/\n/, $stdout);
 like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified");
 is($stderr, '', 'stderr is empty');
+is(0+@lines, 2, "UPDATE modified 2 blocks");
 note_wal_summary_dir("after pg_walsummary run", $node1);
 
 done_testing();
-- 
2.39.3 (Apple Git-145)

#34Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#33)
Re: cleanup patches for incremental backup

On Tue, Jan 30, 2024 at 11:52 AM Robert Haas <robertmhaas@gmail.com> wrote:

Here's a patch for that. I now think
a7097ca630a25dd2248229f21ebce4968d85d10a was actually misguided, and
served only to mask some of the failures caused by waiting for the WAL
summary file.

Committed.

--
Robert Haas
EDB: http://www.enterprisedb.com

#35Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#18)
1 attachment(s)
Re: cleanup patches for incremental backup

On Wed, Jan 24, 2024 at 12:05:15PM -0600, Nathan Bossart wrote:

There might be an overflow risk in the cutoff time calculation, but I doubt
that's the root cause of these failures:

/*
* Files should only be removed if the last modification time precedes the
* cutoff time we compute here.
*/
cutoff_time = time(NULL) - 60 * wal_summary_keep_time;

I've attached a short patch for fixing this overflow risk. Specifically,
it limits wal_summary_keep_time to INT_MAX / SECS_PER_MINUTE, just like
log_rotation_age.

I considering checking for overflow when we subtract the keep-time from the
result of time(2), but AFAICT there's only a problem if time_t is unsigned,
which Wikipedia leads me to believe is unusual [0]https://en.wikipedia.org/wiki/Unix_time#Representing_the_number, so I figured we might
be able to just wait this one out until 2038.

Otherwise, I think we'll probably need to add some additional logging to
figure out what is happening...

Separately, I suppose it's probably time to revert the temporary debugging
code adding by commit 5ddf997. I can craft a patch for that, too.

[0]: https://en.wikipedia.org/wiki/Unix_time#Representing_the_number

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

overflow_fix.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ec2874c18c..c820d1f9ed 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -139,7 +139,7 @@ static XLogRecPtr redo_pointer_at_last_summary_removal = InvalidXLogRecPtr;
  * GUC parameters
  */
 bool		summarize_wal = false;
-int			wal_summary_keep_time = 10 * 24 * 60;
+int			wal_summary_keep_time = 10 * HOURS_PER_DAY * MINS_PER_HOUR;
 
 static void WalSummarizerShutdown(int code, Datum arg);
 static XLogRecPtr GetLatestLSN(TimeLineID *tli);
@@ -1474,7 +1474,7 @@ MaybeRemoveOldWalSummaries(void)
 	 * Files should only be removed if the last modification time precedes the
 	 * cutoff time we compute here.
 	 */
-	cutoff_time = time(NULL) - 60 * wal_summary_keep_time;
+	cutoff_time = time(NULL) - wal_summary_keep_time * SECS_PER_MINUTE;
 
 	/* Get all the summaries that currently exist. */
 	wslist = GetWalSummaries(0, InvalidXLogRecPtr, InvalidXLogRecPtr);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 57d9de4dd9..1e71e7db4a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3293,9 +3293,9 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_MIN,
 		},
 		&wal_summary_keep_time,
-		10 * 24 * 60,			/* 10 days */
+		10 * HOURS_PER_DAY * MINS_PER_HOUR, /* 10 days */
 		0,
-		INT_MAX,
+		INT_MAX / SECS_PER_MINUTE,
 		NULL, NULL, NULL
 	},
 
#36Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#35)
2 attachment(s)
Re: cleanup patches for incremental backup

On Thu, Mar 14, 2024 at 04:00:10PM -0500, Nathan Bossart wrote:

Separately, I suppose it's probably time to revert the temporary debugging
code adding by commit 5ddf997. I can craft a patch for that, too.

As promised...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Revert-Temporary-patch-to-help-debug-pg_walsummar.patchtext/x-diff; charset=us-asciiDownload
From 3923e30acb1d4e21ce311b25edbcd8b96b4223d2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 14 Mar 2024 20:36:48 -0500
Subject: [PATCH v1 1/2] Revert "Temporary patch to help debug pg_walsummary
 test failures."

This reverts commit 5ddf9973477729cf161b4ad0a1efd52f4fea9c88.
---
 src/backend/backup/walsummary.c       |  7 -------
 src/bin/pg_walsummary/t/002_blocks.pl | 14 --------------
 2 files changed, 21 deletions(-)

diff --git a/src/backend/backup/walsummary.c b/src/backend/backup/walsummary.c
index 4d047e1c02..322ae3c3ad 100644
--- a/src/backend/backup/walsummary.c
+++ b/src/backend/backup/walsummary.c
@@ -252,15 +252,8 @@ RemoveWalSummaryIfOlderThan(WalSummaryFile *ws, time_t cutoff_time)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not stat file \"%s\": %m", path)));
-	/* XXX temporarily changed to debug buildfarm failures */
-#if 0
 	ereport(DEBUG2,
 			(errmsg_internal("removing file \"%s\"", path)));
-#else
-	ereport(LOG,
-			(errmsg_internal("removing file \"%s\" cutoff_time=%llu", path,
-							 (unsigned long long) cutoff_time)));
-#endif
 }
 
 /*
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index d4bae3d564..52d3bd8840 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -51,7 +51,6 @@ my $summarized_lsn = $node1->safe_psql('postgres', <<EOM);
 SELECT MAX(end_lsn) AS summarized_lsn FROM pg_available_wal_summaries()
 EOM
 note("after insert, summarized through $summarized_lsn");
-note_wal_summary_dir("after insert", $node1);
 
 # Update a row in the first block of the table and trigger a checkpoint.
 $node1->safe_psql('postgres', <<EOM);
@@ -78,7 +77,6 @@ my @lines = split(/\n/, $details);
 is(0+@lines, 1, "got exactly one new WAL summary");
 my ($tli, $start_lsn, $end_lsn) = split(/\|/, $lines[0]);
 note("examining summary for TLI $tli from $start_lsn to $end_lsn");
-note_wal_summary_dir("after new summary", $node1);
 
 # Reconstruct the full pathname for the WAL summary file.
 my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
@@ -86,7 +84,6 @@ my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary",
 					   split(m@/@, $start_lsn),
 					   split(m@/@, $end_lsn);
 ok(-f $filename, "WAL summary file exists");
-note_wal_summary_dir("after existence check", $node1);
 
 # Run pg_walsummary on it. We expect exactly two blocks to be modified,
 # block 0 and one other.
@@ -96,16 +93,5 @@ note($stdout);
 like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified");
 is($stderr, '', 'stderr is empty');
 is(0+@lines, 2, "UPDATE modified 2 blocks");
-note_wal_summary_dir("after pg_walsummary run", $node1);
 
 done_testing();
-
-# XXX. Temporary debugging code.
-sub note_wal_summary_dir
-{
-	my ($flair, $node) = @_;
-
-	my $wsdir = sprintf "%s/pg_wal/summaries", $node->data_dir;
-	my @wsfiles = grep { $_ ne '.' && $_ ne '..' } slurp_dir($wsdir);
-	note("$flair pg_wal/summaries has: @wsfiles");
-}
-- 
2.25.1

v1-0002-Fix-possible-overflow-in-MaybeRemoveOldWalSummari.patchtext/x-diff; charset=us-asciiDownload
From 38dda89ee48736489d37e18dea186e90358468b0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 14 Mar 2024 20:46:02 -0500
Subject: [PATCH v1 2/2] Fix possible overflow in MaybeRemoveOldWalSummaries().

---
 src/backend/postmaster/walsummarizer.c | 4 ++--
 src/backend/utils/misc/guc_tables.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ec2874c18c..c820d1f9ed 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -139,7 +139,7 @@ static XLogRecPtr redo_pointer_at_last_summary_removal = InvalidXLogRecPtr;
  * GUC parameters
  */
 bool		summarize_wal = false;
-int			wal_summary_keep_time = 10 * 24 * 60;
+int			wal_summary_keep_time = 10 * HOURS_PER_DAY * MINS_PER_HOUR;
 
 static void WalSummarizerShutdown(int code, Datum arg);
 static XLogRecPtr GetLatestLSN(TimeLineID *tli);
@@ -1474,7 +1474,7 @@ MaybeRemoveOldWalSummaries(void)
 	 * Files should only be removed if the last modification time precedes the
 	 * cutoff time we compute here.
 	 */
-	cutoff_time = time(NULL) - 60 * wal_summary_keep_time;
+	cutoff_time = time(NULL) - wal_summary_keep_time * SECS_PER_MINUTE;
 
 	/* Get all the summaries that currently exist. */
 	wslist = GetWalSummaries(0, InvalidXLogRecPtr, InvalidXLogRecPtr);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 57d9de4dd9..1e71e7db4a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3293,9 +3293,9 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_MIN,
 		},
 		&wal_summary_keep_time,
-		10 * 24 * 60,			/* 10 days */
+		10 * HOURS_PER_DAY * MINS_PER_HOUR, /* 10 days */
 		0,
-		INT_MAX,
+		INT_MAX / SECS_PER_MINUTE,
 		NULL, NULL, NULL
 	},
 
-- 
2.25.1

#37Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#36)
Re: cleanup patches for incremental backup

On Thu, Mar 14, 2024 at 08:52:55PM -0500, Nathan Bossart wrote:

Subject: [PATCH v1 1/2] Revert "Temporary patch to help debug pg_walsummary
test failures."

Subject: [PATCH v1 2/2] Fix possible overflow in MaybeRemoveOldWalSummaries().

Assuming there are no objections or feedback, I plan to commit these two
patches within the next couple of days.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#38Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#37)
Re: cleanup patches for incremental backup

On Tue, Mar 19, 2024 at 01:15:02PM -0500, Nathan Bossart wrote:

Assuming there are no objections or feedback, I plan to commit these two
patches within the next couple of days.

Committed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#39Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#38)
Re: cleanup patches for incremental backup

On Wed, Mar 20, 2024 at 2:35 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Mar 19, 2024 at 01:15:02PM -0500, Nathan Bossart wrote:

Assuming there are no objections or feedback, I plan to commit these two
patches within the next couple of days.

Committed.

Thanks. Sorry you had to clean up after me. :-(

--
Robert Haas
EDB: http://www.enterprisedb.com

#40Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#39)
Re: cleanup patches for incremental backup

On Wed, Mar 20, 2024 at 02:53:01PM -0400, Robert Haas wrote:

On Wed, Mar 20, 2024 at 2:35 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

Committed.

Thanks. Sorry you had to clean up after me. :-(

No worries.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com