[PATCH] Teach pg_waldump to extract FPIs from the WAL

Started by David Christensenover 3 years ago64 messages
#1David Christensen
david.christensen@crunchydata.com
1 attachment(s)

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist. These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things
somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored
with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor
would the checksum be valid in
them (though WAL itself has checksums, so there is some protection
there). This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the
source image) in the outputted
block. (This could perhaps be a targetted flag if we decide we don't
always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question,
based on historical
information, and may come in handy for forensics work.

Best,

David

Attachments:

v1-pg_waldump-save-fpi.patchapplication/octet-stream; name=v1-pg_waldump-save-fpi.patchDownload
From e9063dba6d6eac55db15c148380a85aa394f647b Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory, which must be empty or not
exist.  These images are subject to the same filtering rules as normal display in pg_waldump, which
means that you can isolate the full page writes to a target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with formatting to make things
somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor would the checksum be valid in
them (though WAL itself has checksums, so there is some protection there).  This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the source image) in the outputted
block.  (This could perhaps be a targetted flag if we decide we don't always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question, based on historical
information, and may come in handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml |  57 +++++++++++++
 src/bin/pg_waldump/pg_waldump.c  | 132 ++++++++++++++++++++++++++++++-
 2 files changed, 188 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 57746d9421..215e63d962 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,63 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--raw-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        given <replaceable>save_path</replaceable>, which should not exist or
+        should be empty.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <table id="pgwaldump-save-fpi-components">
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+            as <literal>%08x-%08X</literal> instead of the
+            conventional <literal>%X/%X</literal> due to filesystem naming
+            limits</entry>
+           </row>
+           <row>
+            <entry>TSOID</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+           <row>
+            <entry>DBOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </table>        
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4f265ef546..bcb3e2c350 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -26,6 +26,9 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +73,10 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* output options */
+	bool        save_fpw;
+	char        *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +446,68 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		/* we will now extract the fullpage image from the XLogRecord and save
+		 * it to a calculated filename */
+
+		if (XLogRecHasBlockImage(record, block_id))
+		{
+			char page[BLCKSZ];
+
+			memset(page, 0, BLCKSZ);
+
+			if (RestoreBlockImage(record, block_id, page))
+			{
+				/* we have our extracted FPI, let's save it now */
+				char filename[MAXPGPATH];
+				FILE *OPF;
+				BlockNumber blk;
+				RelFileNode rnode;
+
+				XLogRecGetBlockTagExtended(record, block_id,
+										   &rnode, NULL, &blk, NULL);
+
+				/*
+				 * The page may be uninitialized. If so, we can't set the LSN because
+				 * that would corrupt the page.
+				 */
+				if (!PageIsNew(page))
+				{
+					PageSetLSN(page, record->ReadRecPtr);
+					/* if checksum field is non-zero then we have checksums enabled,
+					 * so recalculate the checksum with new LSN (yes, this is a hack)
+					 */
+					if (((PageHeader) page)->pd_checksum)
+						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+				}
+
+				/* TODO: do we need fork number here? */
+				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u", savepath,
+						 LSN_FORMAT_ARGS(record->ReadRecPtr),
+						 rnode.spcNode, rnode.dbNode, rnode.relNode, blk);
+
+				OPF = fopen(filename, PG_BINARY_W);
+				if (!OPF)
+					pg_fatal("Couldn't open file for output: %s", filename);
+
+				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+					pg_fatal("Couldn't write out complete FPI to file: %s", filename);
+
+				fclose(OPF);
+			}
+		}
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +748,7 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --raw-fpi=path     save found full page images to given path\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +782,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"raw-fpi", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +843,8 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +855,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -919,6 +992,10 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpw = true;
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1049,54 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		struct stat st;
+		bool		is_empty = false;
+
+		/* we accept an empty existing directory */
+		if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+		{
+			DIR		   *dir = opendir(config.save_fpw_path);
+
+			if (dir)
+			{
+				struct dirent *d;
+
+				is_empty = true;
+				while (errno = 0, (d = readdir(dir)))
+				{
+					if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
+					{
+						is_empty = false;
+						break;
+					}
+				}
+
+				if (errno)
+				{
+					pg_log_error("could not read raw-fpi directory \"%s\": %m",
+								 config.save_fpw_path);
+					goto bad_argument;
+				}
+
+				if (closedir(dir))
+				{
+					pg_log_error("could not close raw-fpi directory \"%s\": %m",
+								 config.save_fpw_path);
+					goto bad_argument;
+				}
+			}
+		}
+		if (!is_empty && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_fatal("could not create raw-fpi output directory \"%s\": %m",
+					 config.save_fpw_path);
+
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1275,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
-- 
2.32.0 (Apple Git-132)

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: David Christensen (#1)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On 2022-Apr-22, David Christensen wrote:

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

I already wrote and posted a patch to do exactly this, and found it the
only way to fix a customer problem, so +1 on having this feature. I
haven't reviewed David's patch.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Christensen (#1)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, 23 Apr 2022 at 00:51, David Christensen
<david.christensen@crunchydata.com> wrote:

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist. These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things

Regardless of my (lack of) opinion on the inclusion of this patch in
PG (I did not significantly review this patch); I noticed that you do
not yet identify the 'fork' of the FPI in the file name.

A lack of fork identifier in the exported file names would make
debugging much more difficult due to the relatively difficult to
identify data contained in !main forks, so I think this oversight
should be fixed, be it through `_forkname` postfix like normal fork
segments, or be it through `.<forknum>` numerical in- or postfix in
the filename.

-Matthias

#4David Christensen
david.christensen@crunchydata.com
In reply to: Matthias van de Meent (#3)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Apr 23, 2022 at 9:49 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Regardless of my (lack of) opinion on the inclusion of this patch in
PG (I did not significantly review this patch); I noticed that you do
not yet identify the 'fork' of the FPI in the file name.

A lack of fork identifier in the exported file names would make
debugging much more difficult due to the relatively difficult to
identify data contained in !main forks, so I think this oversight
should be fixed, be it through `_forkname` postfix like normal fork
segments, or be it through `.<forknum>` numerical in- or postfix in
the filename.

-Matthias

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Best,

David

Attachments:

v2-pg_waldump-save-fpi.patchapplication/octet-stream; name=v2-pg_waldump-save-fpi.patchDownload
From a016be601767956a26f8bc52c198b719e635d9a3 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory, which must be empty or not
exist.  These images are subject to the same filtering rules as normal display in pg_waldump, which
means that you can isolate the full page writes to a target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with formatting to make things
somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name will be appended on the output
file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw images do not have the current LSN stored with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor would the checksum be valid in
them (though WAL itself has checksums, so there is some protection there).  This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the source image) in the outputted
block.  (This could perhaps be a targetted flag if we decide we don't always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question, based on historical
information, and may come in handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml |  65 ++++++++++++++
 src/bin/pg_waldump/pg_waldump.c  | 144 ++++++++++++++++++++++++++++++-
 2 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 57746d9421..bd96798cef 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,71 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--raw-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        given <replaceable>save_path</replaceable>, which should not exist or
+        should be empty.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <table id="pgwaldump-save-fpi-components">
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+            as <literal>%08x-%08X</literal> instead of the
+            conventional <literal>%X/%X</literal> due to filesystem naming
+            limits</entry>
+           </row>
+           <row>
+            <entry>TSOID</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+           <row>
+            <entry>DBOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </table>        
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4f265ef546..fccc07ee8a 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/relpath.h"
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,10 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* output options */
+	bool        save_fpw;
+	char        *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +447,79 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		/* we will now extract the fullpage image from the XLogRecord and save
+		 * it to a calculated filename */
+
+		if (XLogRecHasBlockImage(record, block_id))
+		{
+			char page[BLCKSZ];
+
+			memset(page, 0, BLCKSZ);
+
+			if (RestoreBlockImage(record, block_id, page))
+			{
+				/* we have our extracted FPI, let's save it now */
+				char filename[MAXPGPATH];
+				char forkname[FORKNAMECHARS + 2]; /* _ + \0 */
+				FILE *OPF;
+				BlockNumber blk;
+				RelFileNode rnode;
+				ForkNumber  fork;
+
+				XLogRecGetBlockTagExtended(record, block_id,
+										   &rnode, &fork, &blk, NULL);
+
+				/*
+				 * The page may be uninitialized. If so, we can't set the LSN because
+				 * that would corrupt the page.
+				 */
+				if (!PageIsNew(page))
+				{
+					PageSetLSN(page, record->ReadRecPtr);
+					/* if checksum field is non-zero then we have checksums enabled,
+					 * so recalculate the checksum with new LSN (yes, this is a hack)
+					 */
+					if (((PageHeader) page)->pd_checksum)
+						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+				}
+
+				if (fork >= 0 && fork <= MAX_FORKNUM)
+				{
+					if (fork)
+						sprintf(forkname, "_%s", forkNames[fork]);
+					else
+						forkname[0] = 0;
+				}
+				else
+					pg_fatal("Found invalid fork number: %u", fork);
+
+				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+						 LSN_FORMAT_ARGS(record->ReadRecPtr),
+						 rnode.spcNode, rnode.dbNode, rnode.relNode, blk, forkname);
+
+				OPF = fopen(filename, PG_BINARY_W);
+				if (!OPF)
+					pg_fatal("Couldn't open file for output: %s", filename);
+
+				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+					pg_fatal("Couldn't write out complete FPI to file: %s", filename);
+
+				fclose(OPF);
+			}
+		}
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +760,7 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --raw-fpi=path     save found full page images to given path\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +794,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"raw-fpi", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +855,8 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +867,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -919,6 +1004,10 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpw = true;
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1061,54 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		struct stat st;
+		bool		is_empty = false;
+
+		/* we accept an empty existing directory */
+		if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+		{
+			DIR		   *dir = opendir(config.save_fpw_path);
+
+			if (dir)
+			{
+				struct dirent *d;
+
+				is_empty = true;
+				while (errno = 0, (d = readdir(dir)))
+				{
+					if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
+					{
+						is_empty = false;
+						break;
+					}
+				}
+
+				if (errno)
+				{
+					pg_log_error("could not read raw-fpi directory \"%s\": %m",
+								 config.save_fpw_path);
+					goto bad_argument;
+				}
+
+				if (closedir(dir))
+				{
+					pg_log_error("could not close raw-fpi directory \"%s\": %m",
+								 config.save_fpw_path);
+					goto bad_argument;
+				}
+			}
+		}
+		if (!is_empty && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_fatal("could not create raw-fpi output directory \"%s\": %m",
+					 config.save_fpw_path);
+
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1287,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
-- 
2.32.0 (Apple Git-132)

#5Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#4)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing. Fine for me to use one file for each block with a name
like what you are suggesting for each one of them.

+       /* we accept an empty existing directory */
+       if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+       {
I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work.  See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir().  I think that you'd better rely at
least on what pgcheckdir.c offers.
+       {"raw-fpi", required_argument, NULL, 'W'},
I think that we'd better rename this option.  "fpi", that is not used
much in the user-facing docs, is additionally not adapted when we have
an other option called -w/--fullpage.  I can think of
--save-fullpage.
+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.
+       /* we will now extract the fullpage image from the XLogRecord and save
+        * it to a calculated filename */
The format of this comment is incorrect.
+            <entry>The LSN of the record with this block, formatted
+            as <literal>%08x-%08X</literal> instead of the
+            conventional <literal>%X/%X</literal> due to filesystem naming
+            limits</entry>
The last part of the sentence about %X/%X could just be removed.  That
could be confusing, at worse.

+ PageSetLSN(page, record->ReadRecPtr);
Why is pd_lsn set?

git diff --check complains a bit.

This stuff should include some tests. With --end, the tests can
be cheap.
--
Michael

#6Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Michael Paquier (#5)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Hi,

On 4/25/22 8:11 AM, Michael Paquier wrote:

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing.

+1 on the idea.

FWIW, there is an extension doing this [1]https://github.com/bdrouvot/pg_wal_fp_extract but having the feature
included in pg_waldump would be great.

Bertrand

[1]: https://github.com/bdrouvot/pg_wal_fp_extract

#7Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#1)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Apr 23, 2022 at 4:21 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not
exist. These images are subject to the same filtering rules as normal
display in pg_waldump, which
means that you can isolate the full page writes to a target relation,
among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things
somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored
with them in the WAL
stream (as would be true for on-heap versions of the blocks), nor
would the checksum be valid in
them (though WAL itself has checksums, so there is some protection
there). This patch chooses to
place the LSN and calculate the proper checksum (if non-zero in the
source image) in the outputted
block. (This could perhaps be a targetted flag if we decide we don't
always want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question,
based on historical
information, and may come in handy for forensics work.

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_("  -W, --raw-fpi=path     save found full page images to
given path\n"));
2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

        if (XLogRecHasBlockImage(record, block_id))
            *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
    }
3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
4) Usually we start errors with lower case letters "could not ....."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;
5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.
6) Good to specify the known usages of FPIs in the documentation.
7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {
8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {
 9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
    for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
    {
        if (!XLogRecHasBlockRef(record, block_id))
            continue;

if (XLogRecHasBlockImage(record, block_id))
{

}
}
10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Regards,
Bharath Rupireddy.

#8David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#5)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing. Fine for me to use one file for each block with a name
like what you are suggesting for each one of them.

+       /* we accept an empty existing directory */
+       if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+       {
I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work.  See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir().  I think that you'd better rely at
least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

+       {"raw-fpi", required_argument, NULL, 'W'},
I think that we'd better rename this option.  "fpi", that is not used
much in the user-facing docs, is additionally not adapted when we have
an other option called -w/--fullpage.  I can think of
--save-fullpage.

Makes sense.

+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

+       /* we will now extract the fullpage image from the XLogRecord and save
+        * it to a calculated filename */
The format of this comment is incorrect.
+            <entry>The LSN of the record with this block, formatted
+            as <literal>%08x-%08X</literal> instead of the
+            conventional <literal>%X/%X</literal> due to filesystem naming
+            limits</entry>
The last part of the sentence about %X/%X could just be removed.  That
could be confusing, at worse.

Makes sense.

+ PageSetLSN(page, record->ReadRecPtr);
Why is pd_lsn set?

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well. (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

git diff --check complains a bit.

Can look into this.

This stuff should include some tests. With --end, the tests can
be cheap.

Yeah, makes sense, will include some in the next version.

David

#9David Christensen
david.christensen@crunchydata.com
In reply to: Drouvot, Bertrand (#6)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 2:00 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 4/25/22 8:11 AM, Michael Paquier wrote:

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:

Hi Matthias, great point. Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy. So +1 from me, as well, to have more tooling like what you are
proposing.

+1 on the idea.

FWIW, there is an extension doing this [1] but having the feature
included in pg_waldump would be great.

Cool, glad to see there is some interest; definitely some overlap in
forensics inside and outside the database both, as there are different
use cases for both.

David

#10David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#7)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream. I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues). Might help in extreme situations though.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Yeah, makes sense to have some overlap here; will review what is there
and see if there is some shared code base we can utilize. (ISTR some
work towards getting these two tools using more of the same code, and
this seems like another such instance.)

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_(" -W, --raw-fpi=path save found full page images to
given path\n"));

+1, though I've also thought there could be uses to have multiple
possible output formats here (most immediately, there may be cases
where we want *each* FPI for a block vs the *latest*, so files name
with/without the LSN component seem the easiest way forward here).
That would introduce some additional complexity though, so might need
to see if others think that makes any sense.

2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
*fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
}

Good point; my previous patch that got committed here (127aea2a65)
probably also needed this treatment.

3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */

Ack.

4) Usually we start errors with lower case letters "could not ....."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;

Ack.

5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.

What would be different here? Are there issues you can think of, or
just more from the pageinspect side of things?

6) Good to specify the known usages of FPIs in the documentation.

Ack. Prob good to get additional info/use cases from others, as mine
is fairly short. :-)

7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {

Ack.

8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {

Agreed; was mainly trying to prevent accidental expansion inside
`pg_wal` when an earlier version of the patch implied `.` as the
current dir with an optional path, but I've since made the path
non-optional and agree that this is unnecessarily restrictive.

9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
{

}
}

Yeah, a little redundant.

10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Can add; you thinking a separate flag to disable this with default true?

Best,

David

#11Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#10)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream. I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues). Might help in extreme situations though.

You could do a bunch of things with those images, even make things
worse if you are not careful enough.

10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Can add; you thinking a separate flag to disable this with default true?

We expect data generated by tools like pg_dump, pg_receivewal
(depending on the use --synchronous) or pg_basebackup to be consistent
when we exit from the call. FWIW, flushing this data does not seem
like a strong requirement for something aimed at being used page-level
chirurgy or lookups, because the WAL segments should still be around
even if the host holding the archives is unplugged.
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#8)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier <michael@paquier.xyz> wrote:

I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work. See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir(). I think that you'd better rely at
least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

OK. FWIW, pg_check_dir() is used in initdb and pg_basebackup because
these care about the behavior to use when specifying a target path.
You could reuse it, but use a different policy depending on its
returned result for the needs of what you see fit in this case.

+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

I was thinking about a comment, at least.

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well. (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

Hm. Okay. The argument goes both ways, I guess, depending on what we
want to do with those raw pages. Still you should not need pd_lsn if
the point is to be able to stick the pages back in place to attempt to
get back as much data as possible when loading it back to the shared
buffers?
--
Michael

#13David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#12)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 9:54 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier <michael@paquier.xyz> wrote:

I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work. See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir(). I think that you'd better rely at
least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

OK. FWIW, pg_check_dir() is used in initdb and pg_basebackup because
these care about the behavior to use when specifying a target path.
You could reuse it, but use a different policy depending on its
returned result for the needs of what you see fit in this case.

I have a new version of the patch (pending tests) that uses
pg_check_dir's return value to handle things appropriately, so at
least some code reuse now. It did end up simplifying a lot.

+                   PageSetLSN(page, record->ReadRecPtr);
+                   /* if checksum field is non-zero then we have checksums enabled,
+                    * so recalculate the checksum with new LSN (yes, this is a hack)
+                    */
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

I was thinking about a comment, at least.

New patch version has significantly more comments.

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well. (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

Hm. Okay. The argument goes both ways, I guess, depending on what we
want to do with those raw pages. Still you should not need pd_lsn if
the point is to be able to stick the pages back in place to attempt to
get back as much data as possible when loading it back to the shared
buffers?

Yeah, I can see that too; I think there's at least enough of an
argument for a flag to apply the fixups or just extract only the raw
page pre-modification. Not sure which should be the "default"
behavior; either `--raw` or `--fixup-metadata` or something could
work. (Naming suggestions welcomed.)

David

#14David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#11)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Apr 25, 2022 at 9:42 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream. I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues). Might help in extreme situations though.

You could do a bunch of things with those images, even make things
worse if you are not careful enough.

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Can add; you thinking a separate flag to disable this with default true?

We expect data generated by tools like pg_dump, pg_receivewal
(depending on the use --synchronous) or pg_basebackup to be consistent
when we exit from the call. FWIW, flushing this data does not seem
like a strong requirement for something aimed at being used page-level
chirurgy or lookups, because the WAL segments should still be around
even if the host holding the archives is unplugged.

I have added the fsync to the latest path (forthcoming), but I have no
strong preferences here as to the correct/expected behavior.

Best,

David

#15Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#14)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

This is definitely an expert mode toy.
--
Michael

#16David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#13)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Enclosed is v3 of this patch; this adds two modes for this feature,
one with the raw page `--save-fullpage/-W` and one with the
LSN+checksum fixups `--save-fullpage-fixup/-X`.

I've added at least some basic sanity-checking of the underlying
feature, as well as run the test file and the changes to pg_waldump.c
through pgindent/perltidy to make them adhere to project standards.
Threw in a rebase as well.

Would appreciate any additional feedback here.

Best,

David

Attachments:

v3-pg_waldump-save-fpi.patchapplication/octet-stream; name=v3-pg_waldump-save-fpi.patchDownload
From d85ccf8c89a55cf1761149ad9ebb9f836877fc24 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fullpage) and one
which applies the updated pd_lsn and pd_checksum (--save-fullpage-fixup).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  73 ++++++++
 src/bin/pg_waldump/pg_waldump.c           | 199 +++++++++++++++++++---
 src/bin/pg_waldump/t/002_save_fullpage.pl | 132 ++++++++++++++
 3 files changed, 379 insertions(+), 25 deletions(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 57746d9421..a4e1e6388d 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,79 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fullpage=<replaceable>save_path</replaceable></option></term>
+       <term><option>-X <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fullpage-fixup=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        If invoked using
+        the <literal>-X</literal>/<literal>--save-fullpage-fixup</literal>
+        option, this page image will include the <literal>pd_lsn</literal> of
+        the replayed record rather than the raw page image; as well,
+        the <literal>pd_checksum</literal> field will be updated if it had
+        previously existed.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <table id="pgwaldump-save-fpi-components">
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+            as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+           <row>
+            <entry>TSOID</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+           <row>
+            <entry>DBOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </table>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4f265ef546..dd1035e593 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,8 +24,12 @@
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,11 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	bool		save_fpw;
+	bool		fixup_fpw;
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +448,109 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (XLogRecHasBlockImage(record, block_id))
+		{
+			char		page[BLCKSZ];
+
+			memset(page, 0, BLCKSZ);
+
+			if (RestoreBlockImage(record, block_id, page))
+			{
+				/* we have our extracted FPI, let's save it now */
+				char		filename[MAXPGPATH];
+				char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+				FILE	   *OPF;
+				BlockNumber blk;
+				RelFileNode rnode;
+				ForkNumber	fork;
+
+				XLogRecGetBlockTagExtended(record, block_id,
+										   &rnode, &fork, &blk, NULL);
+
+				/*
+				 * The raw page as stored in the WAL record includes the LSN
+				 * of the block as it appeared when it was originally written,
+				 * however this differs than the effects of replaying this
+				 * same FPI in recovery, as recovery calls RestoreBlockImage()
+				 * and then sets the LSN as part of one action.  What this
+				 * means is that a page as recovered from WAL and the version
+				 * of the page saved here will differ by the LSN and the
+				 * checksum (if enabled).
+				 *
+				 * There are potentially use-cases for both versions (with and
+				 * without mentioned fixups), so allow this to be
+				 * user-selected, unless the restored page was empty, in which
+				 * case we leave it alone.
+				 */
+
+				if (fixup && !PageIsNew(page))
+				{
+					PageSetLSN(page, record->ReadRecPtr);
+
+					/*
+					 * If checksum field is non-zero then we have checksums
+					 * enabled, so recalculate the checksum with new LSN
+					 * (whether this is considered a hack or heuristics is an
+					 * exercise for the reader).
+					 *
+					 * We make this choice to allow pages saved by this
+					 * function to work as expected with the checksum-related
+					 * functions in pageinspect without having to worry about
+					 * zero_damaged_pages or other considerations.
+					 *
+					 * FPIs in WAL do not have the checksum field updated in
+					 * the page image; in a checksums-enabled cluster, this
+					 * task is handled by FlushBuffer() when a dirty buffer is
+					 * written out to disk.  Since we are running outside of
+					 * Postmaster that won't work in this case, so we handle
+					 * ourselves.
+					 */
+
+					if (((PageHeader) page)->pd_checksum)
+						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+				}
+
+				if (fork >= 0 && fork <= MAX_FORKNUM)
+				{
+					if (fork)
+						sprintf(forkname, "_%s", forkNames[fork]);
+					else
+						forkname[0] = 0;
+				}
+				else
+					pg_fatal("found invalid fork number: %u", fork);
+
+				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+						 LSN_FORMAT_ARGS(record->ReadRecPtr),
+						 rnode.spcNode, rnode.dbNode, rnode.relNode, blk, forkname);
+
+				OPF = fopen(filename, PG_BINARY_W);
+				if (!OPF)
+					pg_fatal("couldn't open file for output: %s", filename);
+
+				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+					pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+
+				fsync(fileno(OPF));
+				fclose(OPF);
+			}
+		}
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -660,29 +772,32 @@ usage(void)
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [STARTSEG [ENDSEG]]\n"), progname);
 	printf(_("\nOptions:\n"));
-	printf(_("  -b, --bkp-details      output detailed information about backup blocks\n"));
-	printf(_("  -B, --block=N          with --relation, only show records that modify block N\n"));
-	printf(_("  -e, --end=RECPTR       stop reading at WAL location RECPTR\n"));
-	printf(_("  -f, --follow           keep retrying after reaching end of WAL\n"));
-	printf(_("  -F, --fork=FORK        only show records that modify blocks in fork FORK;\n"
-			 "                         valid names are main, fsm, vm, init\n"));
-	printf(_("  -n, --limit=N          number of records to display\n"));
-	printf(_("  -p, --path=PATH        directory in which to find log segment files or a\n"
-			 "                         directory with a ./pg_wal that contains such files\n"
-			 "                         (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
-	printf(_("  -q, --quiet            do not print any output, except for errors\n"));
-	printf(_("  -r, --rmgr=RMGR        only show records generated by resource manager RMGR;\n"
-			 "                         use --rmgr=list to list valid resource manager names\n"));
-	printf(_("  -R, --relation=T/D/R   only show records that modify blocks in relation T/D/R\n"));
-	printf(_("  -s, --start=RECPTR     start reading at WAL location RECPTR\n"));
-	printf(_("  -t, --timeline=TLI     timeline from which to read log records\n"
-			 "                         (default: 1 or the value used in STARTSEG)\n"));
-	printf(_("  -V, --version          output version information, then exit\n"));
-	printf(_("  -w, --fullpage         only show records with a full page write\n"));
-	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
-	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
-			 "                         (optionally, show per-record statistics)\n"));
-	printf(_("  -?, --help             show this help, then exit\n"));
+	printf(_("  -b, --bkp-details               output detailed information about backup blocks\n"));
+	printf(_("  -B, --block=N                   with --relation, only show records that modify block N\n"));
+	printf(_("  -e, --end=RECPTR                stop reading at WAL location RECPTR\n"));
+	printf(_("  -f, --follow                    keep retrying after reaching end of WAL\n"));
+	printf(_("  -F, --fork=FORK                 only show records that modify blocks in fork FORK;\n"
+			 "                                  valid names are main, fsm, vm, init\n"));
+	printf(_("  -n, --limit=N                   number of records to display\n"));
+	printf(_("  -p, --path=PATH                 directory in which to find log segment files or a\n"
+			 "                                  directory with a ./pg_wal that contains such files\n"
+			 "                                  (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
+	printf(_("  -q, --quiet                     do not print any output, except for errors\n"));
+	printf(_("  -r, --rmgr=RMGR                 only show records generated by resource manager RMGR;\n"
+			 "                                  use --rmgr=list to list valid resource manager names\n"));
+	printf(_("  -R, --relation=T/D/R            only show records that modify blocks in relation T/D/R\n"));
+	printf(_("  -s, --start=RECPTR              start reading at WAL location RECPTR\n"));
+	printf(_("  -t, --timeline=TLI              timeline from which to read log records\n"
+			 "                                  (default: 1 or the value used in STARTSEG)\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -w, --fullpage                  only show records with a full page write\n"));
+	printf(_("  -W, --save-fullpage=path        save found full page images to given path\n"
+			 "                                  as LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --save-fullpage-fixup=path  like -W, but include page-level fixups in saved page\n"));
+	printf(_("  -x, --xid=XID                   only show records with transaction ID XID\n"));
+	printf(_("  -z, --stats[=record]            show statistics instead of records\n"
+			 "                                  (optionally, show per-record statistics)\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
 	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
 }
@@ -695,7 +810,7 @@ main(int argc, char **argv)
 	XLogReaderState *xlogreader_state;
 	XLogDumpPrivate private;
 	XLogDumpConfig config;
-	XLogStats stats;
+	XLogStats	stats;
 	XLogRecord *record;
 	XLogRecPtr	first_record;
 	char	   *waldir = NULL;
@@ -712,6 +827,8 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fullpage", required_argument, NULL, 'W'},
+		{"save-fullpage-fixup", required_argument, NULL, 'X'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +889,9 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw = false;
+	config.fixup_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +902,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -919,6 +1039,12 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+			case 'X':
+				config.save_fpw = true;
+				config.fixup_fpw = (option == 'X');
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1098,24 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		int			dir_status = pg_check_dir(config.save_fpw_path);
+
+		if (dir_status < 0)
+		{
+			pg_log_error("could not access output directory: %s", config.save_fpw_path);
+			goto bad_argument;
+		}
+
+		if (!dir_status && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1294,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..55aa9afb22
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,132 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+use File::Basename;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if 8192 != read $fh, $block, 8192;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init('-k');
+$primary->adjust_conf('postgresql.conf', 'max_wal_size', '100MB');
+$primary->start;
+
+# Sanity checks for command line options.
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fullpage' ],
+	'--save-fullpage fails without path');
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fullpage-fixup' ],
+	'--save-fullpage-fixup fails without path');
+
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+EOF
+
+# get the relation node, etc for the new table
+my $relation = $primary->safe_psql('db1',
+	q{SELECT CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END || '/' || pg_database.oid || '/' || relfilenode FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+$primary->stop;
+my $waldir = $primary->basedir . '/pgdata/pg_wal';
+system("rsync -av " . $primary->basedir . " /tmp");
+diag "using waldir: $waldir";
+diag "waldir content: @{[qx/ls -l $waldir/]}";
+#my $walfile = [glob("$waldir/*")]->[0];
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+#diag "using walfile: $walfile";
+
+# # extract files
+system(
+	"pg_waldump --save-fullpage $tmp_folder/raw --relation $relation $waldir/000* || true"
+);
+system(
+	"pg_waldump --save-fullpage-fixup $tmp_folder/fixup --relation $relation $waldir/000* || true"
+);
+system("rsync -av " . $tmp_folder . " /tmp/pgtesttmp/");
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fullpage
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since no fixup, verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# verify filename formats matches w/--save-fullpage-fixup
+for my $fullpath (glob "$tmp_folder/fixup/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since fixup, verify the lsn in the block equals file lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn eq $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN is the same as block-based one');
+
+	# verify checksum change; XXX: there could be valid clashes here,
+	# just validate that the page matches the expected checksum instead?
+	ok( $checksum == 0 || $checksums{$file} != $checksum,
+		'check for checksum change or no checksum');
+}
+
+# validate that we ended up with some files output and they were the same
+ok(%files > 0, 'verify we processed some files');
+ok((grep { $_ != 2 } values %files) == 0,
+	'ensure raw and fixup had same number of files');
+
+done_testing();
-- 
2.32.0 (Apple Git-132)

#17David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#16)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

...and pushing a couple fixups pointed out by cfbot, so here's v4.

On Mon, May 2, 2022 at 8:42 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Show quoted text

Enclosed is v3 of this patch; this adds two modes for this feature,
one with the raw page `--save-fullpage/-W` and one with the
LSN+checksum fixups `--save-fullpage-fixup/-X`.

I've added at least some basic sanity-checking of the underlying
feature, as well as run the test file and the changes to pg_waldump.c
through pgindent/perltidy to make them adhere to project standards.
Threw in a rebase as well.

Would appreciate any additional feedback here.

Best,

David

Attachments:

v4-pg_waldump-save-fpi.patchapplication/octet-stream; name=v4-pg_waldump-save-fpi.patchDownload
From acfd94c3db2497a8f690ed0c700d0dd4b65ead99 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fullpage) and one
which applies the updated pd_lsn and pd_checksum (--save-fullpage-fixup).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  79 +++++++++
 src/bin/pg_waldump/pg_waldump.c           | 199 +++++++++++++++++++---
 src/bin/pg_waldump/t/002_save_fullpage.pl | 132 ++++++++++++++
 3 files changed, 385 insertions(+), 25 deletions(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 57746d9421..2bd6062069 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fullpage=<replaceable>save_path</replaceable></option></term>
+       <term><option>-X <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fullpage-fixup=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        If invoked using
+        the <literal>-X</literal>/<literal>--save-fullpage-fixup</literal>
+        option, this page image will include the <literal>pd_lsn</literal> of
+        the replayed record rather than the raw page image; as well,
+        the <literal>pd_checksum</literal> field will be updated if it had
+        previously existed.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>TSOID</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DBOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4f265ef546..dd1035e593 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,8 +24,12 @@
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,11 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	bool		save_fpw;
+	bool		fixup_fpw;
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +448,109 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (XLogRecHasBlockImage(record, block_id))
+		{
+			char		page[BLCKSZ];
+
+			memset(page, 0, BLCKSZ);
+
+			if (RestoreBlockImage(record, block_id, page))
+			{
+				/* we have our extracted FPI, let's save it now */
+				char		filename[MAXPGPATH];
+				char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+				FILE	   *OPF;
+				BlockNumber blk;
+				RelFileNode rnode;
+				ForkNumber	fork;
+
+				XLogRecGetBlockTagExtended(record, block_id,
+										   &rnode, &fork, &blk, NULL);
+
+				/*
+				 * The raw page as stored in the WAL record includes the LSN
+				 * of the block as it appeared when it was originally written,
+				 * however this differs than the effects of replaying this
+				 * same FPI in recovery, as recovery calls RestoreBlockImage()
+				 * and then sets the LSN as part of one action.  What this
+				 * means is that a page as recovered from WAL and the version
+				 * of the page saved here will differ by the LSN and the
+				 * checksum (if enabled).
+				 *
+				 * There are potentially use-cases for both versions (with and
+				 * without mentioned fixups), so allow this to be
+				 * user-selected, unless the restored page was empty, in which
+				 * case we leave it alone.
+				 */
+
+				if (fixup && !PageIsNew(page))
+				{
+					PageSetLSN(page, record->ReadRecPtr);
+
+					/*
+					 * If checksum field is non-zero then we have checksums
+					 * enabled, so recalculate the checksum with new LSN
+					 * (whether this is considered a hack or heuristics is an
+					 * exercise for the reader).
+					 *
+					 * We make this choice to allow pages saved by this
+					 * function to work as expected with the checksum-related
+					 * functions in pageinspect without having to worry about
+					 * zero_damaged_pages or other considerations.
+					 *
+					 * FPIs in WAL do not have the checksum field updated in
+					 * the page image; in a checksums-enabled cluster, this
+					 * task is handled by FlushBuffer() when a dirty buffer is
+					 * written out to disk.  Since we are running outside of
+					 * Postmaster that won't work in this case, so we handle
+					 * ourselves.
+					 */
+
+					if (((PageHeader) page)->pd_checksum)
+						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+				}
+
+				if (fork >= 0 && fork <= MAX_FORKNUM)
+				{
+					if (fork)
+						sprintf(forkname, "_%s", forkNames[fork]);
+					else
+						forkname[0] = 0;
+				}
+				else
+					pg_fatal("found invalid fork number: %u", fork);
+
+				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+						 LSN_FORMAT_ARGS(record->ReadRecPtr),
+						 rnode.spcNode, rnode.dbNode, rnode.relNode, blk, forkname);
+
+				OPF = fopen(filename, PG_BINARY_W);
+				if (!OPF)
+					pg_fatal("couldn't open file for output: %s", filename);
+
+				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+					pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+
+				fsync(fileno(OPF));
+				fclose(OPF);
+			}
+		}
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -660,29 +772,32 @@ usage(void)
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [STARTSEG [ENDSEG]]\n"), progname);
 	printf(_("\nOptions:\n"));
-	printf(_("  -b, --bkp-details      output detailed information about backup blocks\n"));
-	printf(_("  -B, --block=N          with --relation, only show records that modify block N\n"));
-	printf(_("  -e, --end=RECPTR       stop reading at WAL location RECPTR\n"));
-	printf(_("  -f, --follow           keep retrying after reaching end of WAL\n"));
-	printf(_("  -F, --fork=FORK        only show records that modify blocks in fork FORK;\n"
-			 "                         valid names are main, fsm, vm, init\n"));
-	printf(_("  -n, --limit=N          number of records to display\n"));
-	printf(_("  -p, --path=PATH        directory in which to find log segment files or a\n"
-			 "                         directory with a ./pg_wal that contains such files\n"
-			 "                         (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
-	printf(_("  -q, --quiet            do not print any output, except for errors\n"));
-	printf(_("  -r, --rmgr=RMGR        only show records generated by resource manager RMGR;\n"
-			 "                         use --rmgr=list to list valid resource manager names\n"));
-	printf(_("  -R, --relation=T/D/R   only show records that modify blocks in relation T/D/R\n"));
-	printf(_("  -s, --start=RECPTR     start reading at WAL location RECPTR\n"));
-	printf(_("  -t, --timeline=TLI     timeline from which to read log records\n"
-			 "                         (default: 1 or the value used in STARTSEG)\n"));
-	printf(_("  -V, --version          output version information, then exit\n"));
-	printf(_("  -w, --fullpage         only show records with a full page write\n"));
-	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
-	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
-			 "                         (optionally, show per-record statistics)\n"));
-	printf(_("  -?, --help             show this help, then exit\n"));
+	printf(_("  -b, --bkp-details               output detailed information about backup blocks\n"));
+	printf(_("  -B, --block=N                   with --relation, only show records that modify block N\n"));
+	printf(_("  -e, --end=RECPTR                stop reading at WAL location RECPTR\n"));
+	printf(_("  -f, --follow                    keep retrying after reaching end of WAL\n"));
+	printf(_("  -F, --fork=FORK                 only show records that modify blocks in fork FORK;\n"
+			 "                                  valid names are main, fsm, vm, init\n"));
+	printf(_("  -n, --limit=N                   number of records to display\n"));
+	printf(_("  -p, --path=PATH                 directory in which to find log segment files or a\n"
+			 "                                  directory with a ./pg_wal that contains such files\n"
+			 "                                  (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
+	printf(_("  -q, --quiet                     do not print any output, except for errors\n"));
+	printf(_("  -r, --rmgr=RMGR                 only show records generated by resource manager RMGR;\n"
+			 "                                  use --rmgr=list to list valid resource manager names\n"));
+	printf(_("  -R, --relation=T/D/R            only show records that modify blocks in relation T/D/R\n"));
+	printf(_("  -s, --start=RECPTR              start reading at WAL location RECPTR\n"));
+	printf(_("  -t, --timeline=TLI              timeline from which to read log records\n"
+			 "                                  (default: 1 or the value used in STARTSEG)\n"));
+	printf(_("  -V, --version                   output version information, then exit\n"));
+	printf(_("  -w, --fullpage                  only show records with a full page write\n"));
+	printf(_("  -W, --save-fullpage=path        save found full page images to given path\n"
+			 "                                  as LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --save-fullpage-fixup=path  like -W, but include page-level fixups in saved page\n"));
+	printf(_("  -x, --xid=XID                   only show records with transaction ID XID\n"));
+	printf(_("  -z, --stats[=record]            show statistics instead of records\n"
+			 "                                  (optionally, show per-record statistics)\n"));
+	printf(_("  -?, --help                      show this help, then exit\n"));
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
 	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
 }
@@ -695,7 +810,7 @@ main(int argc, char **argv)
 	XLogReaderState *xlogreader_state;
 	XLogDumpPrivate private;
 	XLogDumpConfig config;
-	XLogStats stats;
+	XLogStats	stats;
 	XLogRecord *record;
 	XLogRecPtr	first_record;
 	char	   *waldir = NULL;
@@ -712,6 +827,8 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fullpage", required_argument, NULL, 'W'},
+		{"save-fullpage-fixup", required_argument, NULL, 'X'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +889,9 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw = false;
+	config.fixup_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +902,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -919,6 +1039,12 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+			case 'X':
+				config.save_fpw = true;
+				config.fixup_fpw = (option == 'X');
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1098,24 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		int			dir_status = pg_check_dir(config.save_fpw_path);
+
+		if (dir_status < 0)
+		{
+			pg_log_error("could not access output directory: %s", config.save_fpw_path);
+			goto bad_argument;
+		}
+
+		if (!dir_status && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1294,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..bf2389abf8
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,132 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+use File::Basename;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if 8192 != read $fh, $block, 8192;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init('-k');
+$primary->adjust_conf('postgresql.conf', 'max_wal_size', '100MB');
+$primary->start;
+
+# Sanity checks for command line options.
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fullpage' ],
+	'--save-fullpage fails without path');
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fullpage-fixup' ],
+	'--save-fullpage-fixup fails without path');
+
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+EOF
+
+# get the relation node, etc for the new table
+my $relation = $primary->safe_psql('db1',
+	q{SELECT CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END || '/' || pg_database.oid || '/' || relfilenode FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+$primary->stop;
+my $waldir = $primary->basedir . '/pgdata/pg_wal';
+system("rsync -av " . $primary->basedir . " /tmp");
+diag "using waldir: $waldir";
+diag "waldir content: @{[qx/ls -l $waldir/]}";
+#my $walfile = [glob("$waldir/*")]->[0];
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+#diag "using walfile: $walfile";
+
+# # extract files
+system(
+	"pg_waldump --save-fullpage $tmp_folder/raw --relation $relation $waldir/000* || true"
+);
+system(
+	"pg_waldump --save-fullpage-fixup $tmp_folder/fixup --relation $relation $waldir/000* || true"
+);
+system("rsync -av " . $tmp_folder . " /tmp/pgtesttmp/");
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fullpage
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since no fixup, verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# verify filename formats matches w/--save-fullpage-fixup
+for my $fullpath (glob "$tmp_folder/fixup/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since fixup, verify the lsn in the block equals file lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn eq $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN is the same as block-based one');
+
+	# verify checksum change; XXX: there could be valid clashes here,
+	# just validate that the page matches the expected checksum instead?
+	ok( $checksum == 0 || $checksums{$file} != $checksum,
+		'check for checksum change or no checksum');
+}
+
+# validate that we ended up with some files output and they were the same
+ok(keys %files > 0, 'verify we processed some files');
+ok((grep { $_ != 2 } values %files) == 0,
+	'ensure raw and fixup had same number of files');
+
+done_testing();
-- 
2.32.0 (Apple Git-132)

#18Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#15)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On 2022-Apr-27, Michael Paquier wrote:

On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

This is definitely an expert mode toy.

I remember Greg Mullane posted a tool that attempts to correct page CRC
mismatches[1]https://github.com/turnstep/pg_healer. This new tool might be useful to feed healing attempts,
too. (It's of course not in any way a *solution*, because the page
might have been modified by other WAL records since the last FPI, but it
could be a start towards building a solution that scoops page contents
from WAL.)

[1]: https://github.com/turnstep/pg_healer

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)

#19Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#18)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, May 03, 2022 at 10:34:41AM +0200, Alvaro Herrera wrote:

I remember Greg Mullane posted a tool that attempts to correct page CRC
mismatches[1]. This new tool might be useful to feed healing attempts,
too. (It's of course not in any way a *solution*, because the page
might have been modified by other WAL records since the last FPI, but it
could be a start towards building a solution that scoops page contents
from WAL.)

[1] https://github.com/turnstep/pg_healer

Fun. Thanks for mentioning that.
--
Michael

#20Ian Lawrence Barwick
barwick@gmail.com
In reply to: David Christensen (#17)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022年5月3日(火) 8:45 David Christensen <david.christensen@crunchydata.com>:

...and pushing a couple fixups pointed out by cfbot, so here's v4.

Hi

cfbot reports the patch no longer applies [1]http://cfbot.cputube.org/patch_40_3628.log. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1]: http://cfbot.cputube.org/patch_40_3628.log

Thanks

Ian Barwick

#21Justin Pryzby
pryzby@telsasoft.com
In reply to: Ian Lawrence Barwick (#20)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:

2022年5月3日(火) 8:45 David Christensen <david.christensen@crunchydata.com>:

...and pushing a couple fixups pointed out by cfbot, so here's v4.

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

More important than needing to be rebased, the patch has never passed
its current tests on windows.

As I recall, that's due to relying on "cp". And "rsync", which
shouldn't be assumed to exist by regression tests).

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3628

#22David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#21)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Nov 4, 2022, at 9:02 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:

2022年5月3日(火) 8:45 David Christensen <david.christensen@crunchydata.com>:

...and pushing a couple fixups pointed out by cfbot, so here's v4.

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

More important than needing to be rebased, the patch has never passed
its current tests on windows.

As I recall, that's due to relying on "cp". And "rsync", which
shouldn't be assumed to exist by regression tests).

I will work on supporting the windows compatibility here. Is there some list of guidelines for what you can and can’t use? I don’t have a windows machine available to develop on.

Was it failing on windows? I was attempting to skip it as I recall.

Best,

David

#23Justin Pryzby
pryzby@telsasoft.com
In reply to: David Christensen (#22)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Nov 04, 2022 at 09:16:29AM -0500, David Christensen wrote:

On Nov 4, 2022, at 9:02 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:

2022年5月3日(火) 8:45 David Christensen <david.christensen@crunchydata.com>:

...and pushing a couple fixups pointed out by cfbot, so here's v4.

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

More important than needing to be rebased, the patch has never passed
its current tests on windows.

As I recall, that's due to relying on "cp". And "rsync", which
shouldn't be assumed to exist by regression tests).

I will work on supporting the windows compatibility here. Is there some list of guidelines for what you can and can’t use? I don’t have a windows machine available to develop on.

I think a lot (most?) developers here don't have a windows environment
available, so now have been using cirrusci's tests to verify. If you
haven't used cirrusci directly (not via cfbot) before, start at:
src/tools/ci/README

There's not much assumed about the build environment, and not much more
assumed about the test environment. Most of the portability is handled
by using C and perl. I think there's even no assumption that "tar" is
available (except maybe for building releases). This patch should avoid
relying on tools that aren't already required.

As a practical matter, cfbot needs to pass, not only to demonstrate that
the patch consistently passes tests, but also because if the patch were
merged while it failed tests in cfbot, it would cause every other patch
to start to fail, too.

Was it failing on windows? I was attempting to skip it as I recall.

I don't see anything about skipping, and cirrus's logs from 2
commitfests ago were pruned. I looked at this patch earlier this year,
but never got around to replacing the calls to rsync and cp.

--
Justin

#24David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#23)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Nov 4, 2022 at 1:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

As I recall, that's due to relying on "cp". And "rsync", which
shouldn't be assumed to exist by regression tests).

Will poke around other TAP tests to see if there's a more consistent
interface, what perl version we can assume and available modules, etc.
If there's not some trivial wrapper at this point so all TAP tests
could use it regardless of OS, it would definitely be good to
introduce such a method.

I will work on supporting the windows compatibility here. Is there some list of guidelines for what you can and can’t use? I don’t have a windows machine available to develop on.

I think a lot (most?) developers here don't have a windows environment
available, so now have been using cirrusci's tests to verify. If you
haven't used cirrusci directly (not via cfbot) before, start at:
src/tools/ci/README

Thanks, good starting point.

Was it failing on windows? I was attempting to skip it as I recall.

I don't see anything about skipping, and cirrus's logs from 2
commitfests ago were pruned. I looked at this patch earlier this year,
but never got around to replacing the calls to rsync and cp.

Ah, it's skipped (not fixed) in my git repo, but never got around to
submitting that version through email. That explains it.

David

#25David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#24)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Hi Justin et al,

Enclosed is v5 of this patch which now passes the CirrusCI checks for
all supported OSes. I went ahead and reworked the test a bit so it's a
little more amenable to the OS-agnostic approach for testing.

Best,

David

Attachments:

v5-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchapplication/octet-stream; name=v5-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchDownload
From 92ecf6e3f2b2f5ae5224daa063b784444c40f906 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH v5] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fpi) and one
which applies the updated pd_lsn and pd_checksum (--fixup-fpi).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  79 +++++++++++
 src/bin/pg_waldump/pg_waldump.c           | 151 +++++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 137 ++++++++++++++++++++
 3 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..cc3ce91890 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <term><option>-X <replaceable>save_path</replaceable></option></term>
+       <term><option>--fixup-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        If invoked using
+        the <literal>-X</literal>/<literal>--fixup-fpi</literal>
+        option, this page image will include the <literal>pd_lsn</literal> of
+        the replayed record rather than the raw page image; as well,
+        the <literal>pd_checksum</literal> field will be updated if it had
+        previously existed.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>TSOID</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DBOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..2b6f772dc7 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,8 +24,12 @@
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,11 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	bool		save_fpw;
+	bool		fixup_fpw;
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +448,109 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (XLogRecHasBlockImage(record, block_id))
+		{
+			char		page[BLCKSZ];
+
+			memset(page, 0, BLCKSZ);
+
+			if (RestoreBlockImage(record, block_id, page))
+			{
+				/* we have our extracted FPI, let's save it now */
+				char		filename[MAXPGPATH];
+				char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+				FILE	   *OPF;
+				BlockNumber blk;
+				RelFileLocator rnode;
+				ForkNumber	fork;
+
+				XLogRecGetBlockTagExtended(record, block_id,
+										   &rnode, &fork, &blk, NULL);
+
+				/*
+				 * The raw page as stored in the WAL record includes the LSN
+				 * of the block as it appeared when it was originally written,
+				 * however this differs than the effects of replaying this
+				 * same FPI in recovery, as recovery calls RestoreBlockImage()
+				 * and then sets the LSN as part of one action.  What this
+				 * means is that a page as recovered from WAL and the version
+				 * of the page saved here will differ by the LSN and the
+				 * checksum (if enabled).
+				 *
+				 * There are potentially use-cases for both versions (with and
+				 * without mentioned fixups), so allow this to be
+				 * user-selected, unless the restored page was empty, in which
+				 * case we leave it alone.
+				 */
+
+				if (fixup && !PageIsNew(page))
+				{
+					PageSetLSN(page, record->ReadRecPtr);
+
+					/*
+					 * If checksum field is non-zero then we have checksums
+					 * enabled, so recalculate the checksum with new LSN
+					 * (whether this is considered a hack or heuristics is an
+					 * exercise for the reader).
+					 *
+					 * We make this choice to allow pages saved by this
+					 * function to work as expected with the checksum-related
+					 * functions in pageinspect without having to worry about
+					 * zero_damaged_pages or other considerations.
+					 *
+					 * FPIs in WAL do not have the checksum field updated in
+					 * the page image; in a checksums-enabled cluster, this
+					 * task is handled by FlushBuffer() when a dirty buffer is
+					 * written out to disk.  Since we are running outside of
+					 * Postmaster that won't work in this case, so we handle
+					 * ourselves.
+					 */
+
+					if (((PageHeader) page)->pd_checksum)
+						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+				}
+
+				if (fork >= 0 && fork <= MAX_FORKNUM)
+				{
+					if (fork)
+						sprintf(forkname, "_%s", forkNames[fork]);
+					else
+						forkname[0] = 0;
+				}
+				else
+					pg_fatal("found invalid fork number: %u", fork);
+
+				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+						 LSN_FORMAT_ARGS(record->ReadRecPtr),
+						 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+				OPF = fopen(filename, PG_BINARY_W);
+				if (!OPF)
+					pg_fatal("couldn't open file for output: %s", filename);
+
+				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+					pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+
+				fsync(fileno(OPF));
+				fclose(OPF);
+			}
+		}
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +791,9 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save found full page images to given path\n"
+			 "                         as LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to \n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +827,8 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
+		{"fixup-fpi", required_argument, NULL, 'X'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +889,9 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw = false;
+	config.fixup_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +902,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +1038,12 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+			case 'X':
+				config.save_fpw = true;
+				config.fixup_fpw = (option == 'X');
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1098,24 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		int			dir_status = pg_check_dir(config.save_fpw_path);
+
+		if (dir_status < 0)
+		{
+			pg_log_error("could not access output directory: %s", config.save_fpw_path);
+			goto bad_argument;
+		}
+
+		if (!dir_status && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1294,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..f605b4f6ad
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,137 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if 8192 != read $fh, $block, 8192;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
+
+# Sanity checks for command line options.
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+$primary->command_fails(
+	[ 'pg_waldump', '--fixup-fpi' ],
+	'--fixup-fpi fails without path');
+
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
+
+# get the relation node, etc for the new table
+my $relation = $primary->safe_psql('db1',
+	q{SELECT CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END || '/' || pg_database.oid || '/' || relfilenode FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+$primary->stop;
+my $waldir = $primary->basedir . '/pgdata/pg_wal';
+my $walfile = [glob("$waldir/*")]->[2]; # we want the 00000002 file
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+diag "using walfile: $walfile";
+
+# # # extract files
+# PostgreSQL::Test::RecursiveCopy::copypath(
+# 	$primary->data_dir . "/pg_wal",
+# 	$primary->tmp_folder);
+
+$primary->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+$primary->command_ok(['pg_waldump', '--fixup-fpi', "$tmp_folder/fixup", '--relation', $relation, $walfile]);
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since no fixup, verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# verify filename formats matches w/--fixup-fpi
+for my $fullpath (glob "$tmp_folder/fixup/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since fixup, verify the lsn in the block equals file lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn eq $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN is the same as block-based one');
+
+	# verify checksum change; XXX: there could be valid clashes here,
+	# just validate that the page matches the expected checksum instead?
+	ok( $checksum == 0 || $checksums{$file} != $checksum,
+		'check for checksum change or no checksum');
+}
+
+# validate that we ended up with some files output and they were the same
+ok(keys %files > 0, 'verify we processed some files');
+ok((grep { $_ != 2 } values %files) == 0,
+	'ensure raw and fixup had same number of files');
+
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#26Justin Pryzby
pryzby@telsasoft.com
In reply to: David Christensen (#25)
2 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote:

Hi Justin et al,

Enclosed is v5 of this patch which now passes the CirrusCI checks for
all supported OSes. I went ahead and reworked the test a bit so it's a
little more amenable to the OS-agnostic approach for testing.

Great, thanks.

This includes the changes that I'd started a few months ago.
Plus adding the test which was missing for meson.

+ format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable></literal>

I'd prefer if the abbreviations were "reltablespace" and "datoid"

Also, should the test case call pg_relation_filenode() rather than using
relfilenode directly ? Is it a problem that the test code assumes
pagesize=8192 ?

--
Justin

Attachments:

0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-stream.patchtext/x-diff; charset=us-asciiDownload
From d4cb23bc2f0edcc65b0bef75fd2e8d6e5aeda21f Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH 1/2] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fpi) and one
which applies the updated pd_lsn and pd_checksum (--fixup-fpi).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  79 +++++++++++
 src/bin/pg_waldump/pg_waldump.c           | 151 +++++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 137 ++++++++++++++++++++
 3 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5f..cc3ce918905 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <term><option>-X <replaceable>save_path</replaceable></option></term>
+       <term><option>--fixup-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        If invoked using
+        the <literal>-X</literal>/<literal>--fixup-fpi</literal>
+        option, this page image will include the <literal>pd_lsn</literal> of
+        the replayed record rather than the raw page image; as well,
+        the <literal>pd_checksum</literal> field will be updated if it had
+        previously existed.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>TSOID</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DBOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca58..2b6f772dc71 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,8 +24,12 @@
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,11 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	bool		save_fpw;
+	bool		fixup_fpw;
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +448,109 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (XLogRecHasBlockImage(record, block_id))
+		{
+			char		page[BLCKSZ];
+
+			memset(page, 0, BLCKSZ);
+
+			if (RestoreBlockImage(record, block_id, page))
+			{
+				/* we have our extracted FPI, let's save it now */
+				char		filename[MAXPGPATH];
+				char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+				FILE	   *OPF;
+				BlockNumber blk;
+				RelFileLocator rnode;
+				ForkNumber	fork;
+
+				XLogRecGetBlockTagExtended(record, block_id,
+										   &rnode, &fork, &blk, NULL);
+
+				/*
+				 * The raw page as stored in the WAL record includes the LSN
+				 * of the block as it appeared when it was originally written,
+				 * however this differs than the effects of replaying this
+				 * same FPI in recovery, as recovery calls RestoreBlockImage()
+				 * and then sets the LSN as part of one action.  What this
+				 * means is that a page as recovered from WAL and the version
+				 * of the page saved here will differ by the LSN and the
+				 * checksum (if enabled).
+				 *
+				 * There are potentially use-cases for both versions (with and
+				 * without mentioned fixups), so allow this to be
+				 * user-selected, unless the restored page was empty, in which
+				 * case we leave it alone.
+				 */
+
+				if (fixup && !PageIsNew(page))
+				{
+					PageSetLSN(page, record->ReadRecPtr);
+
+					/*
+					 * If checksum field is non-zero then we have checksums
+					 * enabled, so recalculate the checksum with new LSN
+					 * (whether this is considered a hack or heuristics is an
+					 * exercise for the reader).
+					 *
+					 * We make this choice to allow pages saved by this
+					 * function to work as expected with the checksum-related
+					 * functions in pageinspect without having to worry about
+					 * zero_damaged_pages or other considerations.
+					 *
+					 * FPIs in WAL do not have the checksum field updated in
+					 * the page image; in a checksums-enabled cluster, this
+					 * task is handled by FlushBuffer() when a dirty buffer is
+					 * written out to disk.  Since we are running outside of
+					 * Postmaster that won't work in this case, so we handle
+					 * ourselves.
+					 */
+
+					if (((PageHeader) page)->pd_checksum)
+						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+				}
+
+				if (fork >= 0 && fork <= MAX_FORKNUM)
+				{
+					if (fork)
+						sprintf(forkname, "_%s", forkNames[fork]);
+					else
+						forkname[0] = 0;
+				}
+				else
+					pg_fatal("found invalid fork number: %u", fork);
+
+				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+						 LSN_FORMAT_ARGS(record->ReadRecPtr),
+						 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+				OPF = fopen(filename, PG_BINARY_W);
+				if (!OPF)
+					pg_fatal("couldn't open file for output: %s", filename);
+
+				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+					pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+
+				fsync(fileno(OPF));
+				fclose(OPF);
+			}
+		}
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +791,9 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save found full page images to given path\n"
+			 "                         as LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to \n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +827,8 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
+		{"fixup-fpi", required_argument, NULL, 'X'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +889,9 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw = false;
+	config.fixup_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +902,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +1038,12 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+			case 'X':
+				config.save_fpw = true;
+				config.fixup_fpw = (option == 'X');
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1098,24 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		int			dir_status = pg_check_dir(config.save_fpw_path);
+
+		if (dir_status < 0)
+		{
+			pg_log_error("could not access output directory: %s", config.save_fpw_path);
+			goto bad_argument;
+		}
+
+		if (!dir_status && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1294,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 00000000000..f605b4f6adb
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,137 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if 8192 != read $fh, $block, 8192;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
+
+# Sanity checks for command line options.
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+$primary->command_fails(
+	[ 'pg_waldump', '--fixup-fpi' ],
+	'--fixup-fpi fails without path');
+
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
+
+# get the relation node, etc for the new table
+my $relation = $primary->safe_psql('db1',
+	q{SELECT CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END || '/' || pg_database.oid || '/' || relfilenode FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+$primary->stop;
+my $waldir = $primary->basedir . '/pgdata/pg_wal';
+my $walfile = [glob("$waldir/*")]->[2]; # we want the 00000002 file
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+diag "using walfile: $walfile";
+
+# # # extract files
+# PostgreSQL::Test::RecursiveCopy::copypath(
+# 	$primary->data_dir . "/pg_wal",
+# 	$primary->tmp_folder);
+
+$primary->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+$primary->command_ok(['pg_waldump', '--fixup-fpi', "$tmp_folder/fixup", '--relation', $relation, $walfile]);
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since no fixup, verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# verify filename formats matches w/--fixup-fpi
+for my $fullpath (glob "$tmp_folder/fixup/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since fixup, verify the lsn in the block equals file lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn eq $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN is the same as block-based one');
+
+	# verify checksum change; XXX: there could be valid clashes here,
+	# just validate that the page matches the expected checksum instead?
+	ok( $checksum == 0 || $checksums{$file} != $checksum,
+		'check for checksum change or no checksum');
+}
+
+# validate that we ended up with some files output and they were the same
+ok(keys %files > 0, 'verify we processed some files');
+ok((grep { $_ != 2 } values %files) == 0,
+	'ensure raw and fixup had same number of files');
+
+done_testing();
-- 
2.25.1

0002-f-justin.patchtext/x-diff; charset=us-asciiDownload
From 3c2c369c495760d8f96d3cdeab623c4f0372792a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 15 Sep 2022 13:39:53 -0500
Subject: [PATCH 2/2] f!justin

ci-os-only: freebsd
---
 src/bin/pg_waldump/meson.build            |   1 +
 src/bin/pg_waldump/pg_waldump.c           | 172 +++++++++++-----------
 src/bin/pg_waldump/t/002_save_fullpage.pl |   2 +-
 3 files changed, 86 insertions(+), 89 deletions(-)

diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 9605976870d..34e37bffc36 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -29,6 +29,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 2b6f772dc71..e35d15132f4 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -76,7 +76,6 @@ typedef struct XLogDumpConfig
 	bool		filter_by_fpw;
 
 	/* save options */
-	bool		save_fpw;
 	bool		fixup_fpw;
 	char	   *save_fpw_path;
 } XLogDumpConfig;
@@ -458,96 +457,94 @@ XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
 
 	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
 	{
+		char		page[BLCKSZ] = {0};
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
 		if (!XLogRecHasBlockRef(record, block_id))
 			continue;
 
-		if (XLogRecHasBlockImage(record, block_id))
-		{
-			char		page[BLCKSZ];
-
-			memset(page, 0, BLCKSZ);
-
-			if (RestoreBlockImage(record, block_id, page))
-			{
-				/* we have our extracted FPI, let's save it now */
-				char		filename[MAXPGPATH];
-				char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
-				FILE	   *OPF;
-				BlockNumber blk;
-				RelFileLocator rnode;
-				ForkNumber	fork;
-
-				XLogRecGetBlockTagExtended(record, block_id,
-										   &rnode, &fork, &blk, NULL);
-
-				/*
-				 * The raw page as stored in the WAL record includes the LSN
-				 * of the block as it appeared when it was originally written,
-				 * however this differs than the effects of replaying this
-				 * same FPI in recovery, as recovery calls RestoreBlockImage()
-				 * and then sets the LSN as part of one action.  What this
-				 * means is that a page as recovered from WAL and the version
-				 * of the page saved here will differ by the LSN and the
-				 * checksum (if enabled).
-				 *
-				 * There are potentially use-cases for both versions (with and
-				 * without mentioned fixups), so allow this to be
-				 * user-selected, unless the restored page was empty, in which
-				 * case we leave it alone.
-				 */
-
-				if (fixup && !PageIsNew(page))
-				{
-					PageSetLSN(page, record->ReadRecPtr);
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
 
-					/*
-					 * If checksum field is non-zero then we have checksums
-					 * enabled, so recalculate the checksum with new LSN
-					 * (whether this is considered a hack or heuristics is an
-					 * exercise for the reader).
-					 *
-					 * We make this choice to allow pages saved by this
-					 * function to work as expected with the checksum-related
-					 * functions in pageinspect without having to worry about
-					 * zero_damaged_pages or other considerations.
-					 *
-					 * FPIs in WAL do not have the checksum field updated in
-					 * the page image; in a checksums-enabled cluster, this
-					 * task is handled by FlushBuffer() when a dirty buffer is
-					 * written out to disk.  Since we are running outside of
-					 * Postmaster that won't work in this case, so we handle
-					 * ourselves.
-					 */
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
 
-					if (((PageHeader) page)->pd_checksum)
-						((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
-				}
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		/*
+		 * The raw page as stored in the WAL record includes the LSN
+		 * of the block as it appeared when it was originally written,
+		 * however this differs than the effects of replaying this
+		 * same FPI in recovery, as recovery calls RestoreBlockImage()
+		 * and then sets the LSN as part of one action.  What this
+		 * means is that a page as recovered from WAL and the version
+		 * of the page saved here will differ by the LSN and the
+		 * checksum (if enabled).
+		 *
+		 * There are potentially use-cases for both versions (with and
+		 * without mentioned fixups), so allow this to be
+		 * user-selected, unless the restored page was empty, in which
+		 * case we leave it alone.
+		 */
+
+		if (fixup && !PageIsNew(page))
+		{
+			PageSetLSN(page, record->ReadRecPtr);
+
+			/*
+			 * If checksum field is non-zero then we have checksums
+			 * enabled, so recalculate the checksum with new LSN
+			 * (whether this is considered a hack or heuristics is an
+			 * exercise for the reader).
+			 *
+			 * We make this choice to allow pages saved by this
+			 * function to work as expected with the checksum-related
+			 * functions in pageinspect without having to worry about
+			 * zero_damaged_pages or other considerations.
+			 *
+			 * FPIs in WAL do not have the checksum field updated in
+			 * the page image; in a checksums-enabled cluster, this
+			 * task is handled by FlushBuffer() when a dirty buffer is
+			 * written out to disk.  Since we are running outside of
+			 * Postmaster that won't work in this case, so we handle
+			 * ourselves.
+			 */
+
+			if (((PageHeader) page)->pd_checksum)
+				((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+		}
 
-				if (fork >= 0 && fork <= MAX_FORKNUM)
-				{
-					if (fork)
-						sprintf(forkname, "_%s", forkNames[fork]);
-					else
-						forkname[0] = 0;
-				}
-				else
-					pg_fatal("found invalid fork number: %u", fork);
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+		{
+			if (fork)
+				sprintf(forkname, "_%s", forkNames[fork]);
+			else
+				forkname[0] = 0;
+		}
+		else
+			pg_fatal("found invalid fork number: %u", fork);
 
-				snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
-						 LSN_FORMAT_ARGS(record->ReadRecPtr),
-						 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
 
-				OPF = fopen(filename, PG_BINARY_W);
-				if (!OPF)
-					pg_fatal("couldn't open file for output: %s", filename);
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("couldn't open file for output: %s", filename);
 
-				if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
-					pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+		if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+			pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
 
-				fsync(fileno(OPF));
-				fclose(OPF);
-			}
-		}
+		fsync(fileno(OPF));
+		fclose(OPF);
 	}
 }
 
@@ -791,9 +788,9 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
-	printf(_("  -W, --save-fpi=path    save found full page images to given path\n"
-			 "                         as LSN.T.D.R.B_F\n"));
-	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to \n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to saved page\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -889,7 +886,6 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
-	config.save_fpw = false;
 	config.fixup_fpw = false;
 	config.save_fpw_path = NULL;
 
@@ -1040,7 +1036,6 @@ main(int argc, char **argv)
 				break;
 			case 'W':
 			case 'X':
-				config.save_fpw = true;
 				config.fixup_fpw = (option == 'X');
 				config.save_fpw_path = pg_strdup(optarg);
 				break;
@@ -1108,7 +1103,8 @@ main(int argc, char **argv)
 			goto bad_argument;
 		}
 
-		if (!dir_status && mkdir(config.save_fpw_path, 0700) < 0)
+		/* Create the dir if it doesn't exist */
+		if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
 		{
 			pg_log_error("could not create output directory \"%s\": %m",
 						 config.save_fpw_path);
@@ -1294,7 +1290,7 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
-			else if (config.save_fpw)
+			else if (config.save_fpw_path)
 			{
 				if (XLogRecordHasFPW(xlogreader_state))
 					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
index f605b4f6adb..6df670c39ba 100644
--- a/src/bin/pg_waldump/t/002_save_fullpage.pl
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -60,7 +60,7 @@ EOF
 
 # get the relation node, etc for the new table
 my $relation = $primary->safe_psql('db1',
-	q{SELECT CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END || '/' || pg_database.oid || '/' || relfilenode FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+	q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END, pg_database.oid, relfilenode) FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
 );
 
 diag $relation;
-- 
2.25.1

#27David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#26)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, Nov 8, 2022 at 4:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote:

Hi Justin et al,

Enclosed is v5 of this patch which now passes the CirrusCI checks for
all supported OSes. I went ahead and reworked the test a bit so it's a
little more amenable to the OS-agnostic approach for testing.

Great, thanks.

This includes the changes that I'd started a few months ago.
Plus adding the test which was missing for meson.

Cool, will review, thanks.

+ format: <literal><replaceable>LSN</replaceable>.<replaceable>TSOID</replaceable>.<replaceable>DBOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable></literal>

I'd prefer if the abbreviations were "reltablespace" and "datoid"

Sure, no issues there.

Also, should the test case call pg_relation_filenode() rather than using
relfilenode directly ? Is it a problem that the test code assumes
pagesize=8192 ?

Both good points. Is pagesize just exposed via
`current_setting('block_size')` or is there a different approach?

David

#28David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#27)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Enclosed is v6, which squashes your refactor and adds the additional
recent suggestions.

Thanks!

Attachments:

v6-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchapplication/octet-stream; name=v6-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchDownload
From 02e9c308cd435e266dda95186127932bc2f50965 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH v6] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fpi) and one
which applies the updated pd_lsn and pd_checksum (--fixup-fpi).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  79 ++++++++++++
 src/bin/pg_waldump/meson.build            |   1 +
 src/bin/pg_waldump/pg_waldump.c           | 147 +++++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 142 +++++++++++++++++++++
 4 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..ecc973e86b 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <term><option>-X <replaceable>save_path</replaceable></option></term>
+       <term><option>--fixup-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        If invoked using
+        the <literal>-X</literal>/<literal>--fixup-fpi</literal>
+        option, this page image will include the <literal>pd_lsn</literal> of
+        the replayed record rather than the raw page image; as well,
+        the <literal>pd_checksum</literal> field will be updated if it had
+        previously existed.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 9605976870..34e37bffc3 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -29,6 +29,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..e35d15132f 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,8 +24,12 @@
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,10 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	bool		fixup_fpw;
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +447,107 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		char		page[BLCKSZ] = {0};
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		/*
+		 * The raw page as stored in the WAL record includes the LSN
+		 * of the block as it appeared when it was originally written,
+		 * however this differs than the effects of replaying this
+		 * same FPI in recovery, as recovery calls RestoreBlockImage()
+		 * and then sets the LSN as part of one action.  What this
+		 * means is that a page as recovered from WAL and the version
+		 * of the page saved here will differ by the LSN and the
+		 * checksum (if enabled).
+		 *
+		 * There are potentially use-cases for both versions (with and
+		 * without mentioned fixups), so allow this to be
+		 * user-selected, unless the restored page was empty, in which
+		 * case we leave it alone.
+		 */
+
+		if (fixup && !PageIsNew(page))
+		{
+			PageSetLSN(page, record->ReadRecPtr);
+
+			/*
+			 * If checksum field is non-zero then we have checksums
+			 * enabled, so recalculate the checksum with new LSN
+			 * (whether this is considered a hack or heuristics is an
+			 * exercise for the reader).
+			 *
+			 * We make this choice to allow pages saved by this
+			 * function to work as expected with the checksum-related
+			 * functions in pageinspect without having to worry about
+			 * zero_damaged_pages or other considerations.
+			 *
+			 * FPIs in WAL do not have the checksum field updated in
+			 * the page image; in a checksums-enabled cluster, this
+			 * task is handled by FlushBuffer() when a dirty buffer is
+			 * written out to disk.  Since we are running outside of
+			 * Postmaster that won't work in this case, so we handle
+			 * ourselves.
+			 */
+
+			if (((PageHeader) page)->pd_checksum)
+				((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+		}
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+		{
+			if (fork)
+				sprintf(forkname, "_%s", forkNames[fork]);
+			else
+				forkname[0] = 0;
+		}
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("couldn't open file for output: %s", filename);
+
+		if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+			pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+
+		fsync(fileno(OPF));
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +788,9 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to saved page\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +824,8 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
+		{"fixup-fpi", required_argument, NULL, 'X'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +886,8 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.fixup_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +898,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +1034,11 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+			case 'X':
+				config.fixup_fpw = (option == 'X');
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1093,25 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		int			dir_status = pg_check_dir(config.save_fpw_path);
+
+		if (dir_status < 0)
+		{
+			pg_log_error("could not access output directory: %s", config.save_fpw_path);
+			goto bad_argument;
+		}
+
+		/* Create the dir if it doesn't exist */
+		if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1290,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw_path)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..c98100bf21
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,142 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $blocksize = 8192;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
+
+# Sanity checks for command line options.
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+$primary->command_fails(
+	[ 'pg_waldump', '--fixup-fpi' ],
+	'--fixup-fpi fails without path');
+
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
+
+# get our blocksize from the db
+$blocksize = $primary->safe_psql('postgres', "SELECT current_setting('block_size')");
+
+# get the relation node, etc for the new table
+my $relation = $primary->safe_psql('db1',
+	q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END, pg_database.oid, pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+$primary->stop;
+my $waldir = $primary->basedir . '/pgdata/pg_wal';
+my $walfile = [glob("$waldir/*")]->[2]; # we want the 00000002 file
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+diag "using walfile: $walfile";
+
+# # # extract files
+# PostgreSQL::Test::RecursiveCopy::copypath(
+# 	$primary->data_dir . "/pg_wal",
+# 	$primary->tmp_folder);
+
+$primary->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+$primary->command_ok(['pg_waldump', '--fixup-fpi', "$tmp_folder/fixup", '--relation', $relation, $walfile]);
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since no fixup, verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# verify filename formats matches w/--fixup-fpi
+for my $fullpath (glob "$tmp_folder/fixup/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since fixup, verify the lsn in the block equals file lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn eq $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN is the same as block-based one');
+
+	# verify checksum change; XXX: there could be valid clashes here,
+	# just validate that the page matches the expected checksum instead?
+	ok( $checksum == 0 || $checksums{$file} != $checksum,
+		'check for checksum change or no checksum');
+}
+
+# validate that we ended up with some files output and they were the same
+ok(keys %files > 0, 'verify we processed some files');
+ok((grep { $_ != 2 } values %files) == 0,
+	'ensure raw and fixup had same number of files');
+
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#29sho kato
kato-sho@fujitsu.com
In reply to: David Christensen (#28)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hello,

I tested this patch on Linux and there is no problem.
Also, I reviewed this patch and commented below.

@@ -439,6 +447,107 @@ XLogRecordHasFPW(XLogReaderState *record)
+       if (fork >= 0 && fork <= MAX_FORKNUM)
+       {
+           if (fork)
+               sprintf(forkname, "_%s", forkNames[fork]);
+           else
+               forkname[0] = 0;
+       }
+       else
+           pg_fatal("found invalid fork number: %u", fork);

Should we add the comment if the main fork is saved without "_main" suffix for code readability?

@@ -679,6 +788,9 @@ usage(void)
             "                         (default: 1 or the value used in STARTSEG)\n"));
    printf(_("  -V, --version          output version information, then exit\n"));
    printf(_("  -w, --fullpage         only show records with a full page write\n"));
+   printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+            "                         LSN.T.D.R.B_F\n"));
+   printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to saved page\n"));
    printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
    printf(_("  -z, --stats[=record]   show statistics instead of records\n"
             "                         (optionally, show per-record statistics)\n"));

Since lower-case options are displayed at the top, should we switch the order of -x and -X?

@@ -972,6 +1093,25 @@ main(int argc, char **argv)
}
}

+       int         dir_status = pg_check_dir(config.save_fpw_path);
+
+       if (dir_status < 0)
+       {
+           pg_log_error("could not access output directory: %s", config.save_fpw_path);
+           goto bad_argument;
+       }

Should we output %s enclosed with \"?

Regards,
Sho Kato

#30Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#28)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 9, 2022 at 5:08 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Enclosed is v6, which squashes your refactor and adds the additional
recent suggestions.

Thanks for working on this feature. Here are some comments for now. I
haven't looked at the tests, I will take another look at the code and
tests after these and all other comments are addressed.

1. For ease of review, please split the test patch to 0002.

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.
3.
+        if (fork >= 0 && fork <= MAX_FORKNUM)
+        {
+            if (fork)
+                sprintf(forkname, "_%s", forkNames[fork]);
+            else
+                forkname[0] = 0;
+        }
+        else
+            pg_fatal("found invalid fork number: %u", fork);

Isn't the above complex? What's the problem with something like below?
Why do we need if (fork) - else block?

if (fork >= 0 && fork <= MAX_FORKNUM)
sprintf(forkname, "_%s", forkNames[fork]);
else
pg_fatal("found invalid fork number: %u", fork);

3.
+ char page[BLCKSZ] = {0};
I think when writing to a file, we need PGAlignedBlock rather than a
simple char array of bytes, see the description around PGAlignedBlock
for why it is so.

4.
+ if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can
avoid fileno() system calls, no? Do you need the file position to
remain the same after writing, hence pg_pwrite()?

5.
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
+
+        /* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

6.
+ if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Use pg_dir_create_mode instead of hard-coded 0007?

7.
+        if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
+        fsync(fileno(OPF));
+        fclose(OPF);
Since you're creating the directory in case it's not available, you
need to fsync the directory too?
8.
+            case 'W':
+            case 'X':
+                config.fixup_fpw = (option == 'X');
+                config.save_fpw_path = pg_strdup(optarg);
+                break;
Just set  config.fixup_fpw = false before the switch block starts,
like the other variables, and then perhaps doing like below is more
readable:
case 'W':
    config.save_fpw_path = pg_strdup(optarg);
case 'X':
   config.fixup_fpw = true;
   config.save_fpw_path = pg_strdup(optarg);

9.
+ if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Should we use pg_mkdir_p() instead of mkdir()?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Bharath Rupireddy (#30)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote:

On Wed, Nov 9, 2022 at 5:08 AM David Christensen <david.christensen@crunchydata.com> wrote:

Enclosed is v6, which squashes your refactor and adds the additional
recent suggestions.

Thanks for working on this feature. Here are some comments for now. I
haven't looked at the tests, I will take another look at the code and
tests after these and all other comments are addressed.

1. For ease of review, please split the test patch to 0002.

This is just my opinion, but .. why ? Since it's easy to
filter/skip/display a file, I don't think it's usually useful to have
separate patches for tests or docs.

6.
+ if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Use pg_dir_create_mode instead of hard-coded 0007?

I think I thought of that when I first looked at the patch ... but, I'm
not sure, since it says:

src/include/common/file_perm.h-/* Modes for creating directories and files IN THE DATA DIRECTORY */
src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode;

I was wondering if there's any reason to do "CREATE DATABASE". The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
1435 safe_psql('postgres',
335 safe_psql(
23 safe_psql($connect_db,

--
Justin

#32Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Justin Pryzby (#31)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On 2022-Nov-09, Justin Pryzby wrote:

On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote:

1. For ease of review, please split the test patch to 0002.

This is just my opinion, but .. why ? Since it's easy to
filter/skip/display a file, I don't think it's usually useful to have
separate patches for tests or docs.

I concur with Justin. When a patch is bugfix and a test is added that
verifies it, I like to keep the test in a separate commit (for submit
purposes and in my personal repo -- not for the official push!) so that
I can git-checkout to just the test and make sure it fails ahead of
pushing the fix commit. But for a new feature, there's no reason to do
so.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#33David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#30)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 9, 2022 at 6:30 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Nov 9, 2022 at 5:08 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Enclosed is v6, which squashes your refactor and adds the additional
recent suggestions.

Thanks for working on this feature. Here are some comments for now. I
haven't looked at the tests, I will take another look at the code and
tests after these and all other comments are addressed.

1. For ease of review, please split the test patch to 0002.

Per later discussion seems like new feature tests are fine in the same
patch, yes?

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.

How I was envisioning this was for cases like extreme surgery for
corrupted pages, where you extract the page from WAL but it has lsn
and checksum set so you could do something like `dd if=fixup-block
of=relation ...`, so it *simulates* the replay of said fullpage blocks
in cases where for some reason you can't play the intermediate
records; since this is always a fullpage block, it's capturing what
would be the snapshot so you could manually insert somewhere as needed
without needing to replay (say if dealing with an incomplete or
corrupted WAL stream).

3.
+        if (fork >= 0 && fork <= MAX_FORKNUM)
+        {
+            if (fork)
+                sprintf(forkname, "_%s", forkNames[fork]);
+            else
+                forkname[0] = 0;
+        }
+        else
+            pg_fatal("found invalid fork number: %u", fork);

Isn't the above complex? What's the problem with something like below?
Why do we need if (fork) - else block?

if (fork >= 0 && fork <= MAX_FORKNUM)
sprintf(forkname, "_%s", forkNames[fork]);
else
pg_fatal("found invalid fork number: %u", fork);

This was to suppress any suffix for main forks, but yes, could
simplify and include the `_` in the suffix name. Will include such a
change.

3.
+ char page[BLCKSZ] = {0};
I think when writing to a file, we need PGAlignedBlock rather than a
simple char array of bytes, see the description around PGAlignedBlock
for why it is so.

Easy enough change, and makes sense.

4.
+ if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can
avoid fileno() system calls, no? Do you need the file position to
remain the same after writing, hence pg_pwrite()?

I don't recall the original motivation, TBH.

5.
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
+
+        /* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

That is there in fixup mode (or should be). Are you thinking this
should also be set if not in fixup mode? That defeats the purpose of
the raw page extract, which is to see *exactly* what the WAL stream
has.

6.
+ if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Use pg_dir_create_mode instead of hard-coded 0007?

Sure.

7.
+        if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
+        fsync(fileno(OPF));
+        fclose(OPF);
Since you're creating the directory in case it's not available, you
need to fsync the directory too?

Sure.

8.
+            case 'W':
+            case 'X':
+                config.fixup_fpw = (option == 'X');
+                config.save_fpw_path = pg_strdup(optarg);
+                break;
Just set  config.fixup_fpw = false before the switch block starts,
like the other variables, and then perhaps doing like below is more
readable:
case 'W':
config.save_fpw_path = pg_strdup(optarg);
case 'X':
config.fixup_fpw = true;
config.save_fpw_path = pg_strdup(optarg);

Like separate opt processing with their own `break` statement?
Probably a bit more readable/conventional.

9.
+ if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Should we use pg_mkdir_p() instead of mkdir()?

Sure.

Thanks,

David

#34David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#31)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

6.
+ if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Use pg_dir_create_mode instead of hard-coded 0007?

I think I thought of that when I first looked at the patch ... but, I'm
not sure, since it says:

src/include/common/file_perm.h-/* Modes for creating directories and files IN THE DATA DIRECTORY */
src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode;

Looks like it's pretty evenly split in src/bin:

$ git grep -o -E -w '(pg_mkdir_p|mkdir)' '**.c' | sort | uniq -c
3 initdb/initdb.c:mkdir
3 initdb/initdb.c:pg_mkdir_p
1 pg_basebackup/bbstreamer_file.c:mkdir
2 pg_basebackup/pg_basebackup.c:pg_mkdir_p
1 pg_dump/pg_backup_directory.c:mkdir
1 pg_rewind/file_ops.c:mkdir
4 pg_upgrade/pg_upgrade.c:mkdir

So if that is the preferred approach I'll go ahead and use it.

I was wondering if there's any reason to do "CREATE DATABASE". The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
1435 safe_psql('postgres',
335 safe_psql(
23 safe_psql($connect_db,

If there was a reason, I don't recall offhand; I will test removing it
and if things still work will consider it good enough.

David

#35David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#34)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 9, 2022 at 2:08 PM David Christensen
<david.christensen@crunchydata.com> wrote:

Justin sez:

I was wondering if there's any reason to do "CREATE DATABASE". The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
1435 safe_psql('postgres',
335 safe_psql(
23 safe_psql($connect_db,

If there was a reason, I don't recall offhand; I will test removing it
and if things still work will consider it good enough.

Things blew up when I did that; rather than hunt it down, I just left it in. :-)

Enclosed is v7, with changes thus suggested thus far.

Attachments:

v7-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchapplication/octet-stream; name=v7-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchDownload
From b61a55343d896813c64be07b2269b8c0235615ff Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH v7] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0
00000000-01000150.1664.0.6115.0
00000000-010001E0.1664.0.6114.0
00000000-01000270.1663.1.6116.0
00000000-01000300.1663.1.6113.0
00000000-01000390.1663.1.6112.0
00000000-01000420.1663.1.8903.0
00000000-010004B0.1663.1.8902.0
00000000-01000540.1663.1.6111.0
00000000-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

00000000-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fpi) and one
which applies the updated pd_lsn and pd_checksum (--fixup-fpi).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  79 ++++++++++++
 src/bin/pg_waldump/meson.build            |   1 +
 src/bin/pg_waldump/pg_waldump.c           | 138 ++++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 142 ++++++++++++++++++++++
 4 files changed, 359 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..ecc973e86b 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <term><option>-X <replaceable>save_path</replaceable></option></term>
+       <term><option>--fixup-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        If invoked using
+        the <literal>-X</literal>/<literal>--fixup-fpi</literal>
+        option, this page image will include the <literal>pd_lsn</literal> of
+        the replayed record rather than the raw page image; as well,
+        the <literal>pd_checksum</literal> field will be updated if it had
+        previously existed.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             if coming from the main fork, will be empty, otherwise will be
+             one of <literal>_fsm</literal>, <literal>_vm</literal>,
+             or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 9605976870..34e37bffc3 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -29,6 +29,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..6769d21182 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,14 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +75,10 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	bool		fixup_fpw;
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +448,105 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath, bool fixup)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock	zerobuf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		page = (Page) zerobuf.data;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		/*
+		 * The raw page as stored in the WAL record includes the LSN
+		 * of the block as it appeared when it was originally written,
+		 * however this differs than the effects of replaying this
+		 * same FPI in recovery, as recovery calls RestoreBlockImage()
+		 * and then sets the LSN as part of one action.  What this
+		 * means is that a page as recovered from WAL and the version
+		 * of the page saved here will differ by the LSN and the
+		 * checksum (if enabled).
+		 *
+		 * There are potentially use-cases for both versions (with and
+		 * without mentioned fixups), so allow this to be
+		 * user-selected, unless the restored page was empty, in which
+		 * case we leave it alone.
+		 */
+
+		if (fixup && !PageIsNew(page))
+		{
+			PageSetLSN(page, record->ReadRecPtr);
+
+			/*
+			 * If checksum field is non-zero then we have checksums
+			 * enabled, so recalculate the checksum with new LSN
+			 * (whether this is considered a hack or heuristics is an
+			 * exercise for the reader).
+			 *
+			 * We make this choice to allow pages saved by this
+			 * function to work as expected with the checksum-related
+			 * functions in pageinspect without having to worry about
+			 * zero_damaged_pages or other considerations.
+			 *
+			 * FPIs in WAL do not have the checksum field updated in
+			 * the page image; in a checksums-enabled cluster, this
+			 * task is handled by FlushBuffer() when a dirty buffer is
+			 * written out to disk.  Since we are running outside of
+			 * Postmaster that won't work in this case, so we handle
+			 * ourselves.
+			 */
+
+			if (((PageHeader) page)->pd_checksum)
+				((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk);
+		}
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("couldn't open file for output: %s", filename);
+
+		if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
+			pg_fatal("couldn't write out complete fullpage image to file: %s", filename);
+
+		fsync(fileno(OPF));
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +787,9 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
+	printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to saved page\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +823,8 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
+		{"fixup-fpi", required_argument, NULL, 'X'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +885,8 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.fixup_fpw = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +897,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +1033,11 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+			case 'X':
+				config.fixup_fpw = (option == 'X');
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1092,17 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		/* Create the dir if it doesn't exist */
+		if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1281,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw_path)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path, config.fixup_fpw);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..8131ff2d93
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,142 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $blocksize = 8192;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
+
+# Sanity checks for command line options.
+$primary->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+$primary->command_fails(
+	[ 'pg_waldump', '--fixup-fpi' ],
+	'--fixup-fpi fails without path');
+
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
+
+# get our blocksize from the db
+$blocksize = $primary->safe_psql('postgres', "SELECT current_setting('block_size')");
+
+# get the relation node, etc for the new table
+my $relation = $primary->safe_psql('db1',
+	q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END, pg_database.oid, pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+$primary->stop;
+my $waldir = $primary->basedir . '/pgdata/pg_wal';
+my $walfile = [glob("$waldir/*")]->[2]; # we want the 00000002 file
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+diag "using walfile: $walfile";
+
+# # # extract files
+# PostgreSQL::Test::RecursiveCopy::copypath(
+# 	$primary->data_dir . "/pg_wal",
+# 	$primary->tmp_folder);
+
+$primary->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+$primary->command_ok(['pg_waldump', '--fixup-fpi', "$tmp_folder/fixup", '--relation', $relation, $walfile]);
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since no fixup, verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# verify filename formats matches w/--fixup-fpi
+for my $fullpath (glob "$tmp_folder/fixup/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# since fixup, verify the lsn in the block equals file lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn eq $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN is the same as block-based one');
+
+	# verify checksum change; XXX: there could be valid clashes here,
+	# just validate that the page matches the expected checksum instead?
+	ok( $checksum == 0 || $checksums{$file} != $checksum,
+		'check for checksum change or no checksum');
+}
+
+# validate that we ended up with some files output and they were the same
+ok(keys %files > 0, 'verify we processed some files');
+ok((grep { $_ != 2 } values %files) == 0,
+	'ensure raw and fixup had same number of files');
+
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#36Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#33)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Thu, Nov 10, 2022 at 1:31 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Thanks for providing the v7 patch, please see my comments and responses below.

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.

How I was envisioning this was for cases like extreme surgery for
corrupted pages, where you extract the page from WAL but it has lsn
and checksum set so you could do something like `dd if=fixup-block
of=relation ...`, so it *simulates* the replay of said fullpage blocks
in cases where for some reason you can't play the intermediate
records; since this is always a fullpage block, it's capturing what
would be the snapshot so you could manually insert somewhere as needed
without needing to replay (say if dealing with an incomplete or
corrupted WAL stream).

Recovery sets the page LSN after it replayed the WAL record on the
page right? Recovery does this - base_page/FPI +
apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
new_version_of_page. Essentially, in your patch, you are just setting
the WAL record LSN with the page contents being the base page's. I'm
still not sure what's the real use-case here. We don't have an
independent function in postgres, given a base page and a WAL record
that just replays the WAL record and output's the new version of the
page, so I think what you do in the patch with fixup option seems
wrong to me.

5.
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
+
+        /* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

case 'W':
config.save_fpw_path = pg_strdup(optarg);
case 'X':
config.fixup_fpw = true;
config.save_fpw_path = pg_strdup(optarg);

Like separate opt processing with their own `break` statement?
Probably a bit more readable/conventional.

Yes.

Some more comments:

1.
+ PGAlignedBlock zerobuf;
Essentially, it's not a zero buffer, please rename the variable to
something like 'buf' or 'page_buf' or someother?

2.
+ if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Replace pg_pwrite with fwrite() and avoid fileno() system calls that
should suffice here, AFICS, we don't need pg_pwrite.

3.
+    if (config.save_fpw_path != NULL)
+    {
+        /* Create the dir if it doesn't exist */
+        if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
I think  you still need pg_check_dir() here, how about something like below?

if (pg_check_dir(config.save_fpw_path) == 0)
{
if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
{
/* error */
}
}

4.
+        /* Create the dir if it doesn't exist */
+        if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+        {
+            pg_log_error("could not create output directory \"%s\": %m",
+                         config.save_fpw_path);
+            goto bad_argument;
Why is the directory creation error a bad_argument? I think you need
just pg_fatal() here.

5.
+ fsync(fileno(OPF));
+ fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

6. Speaking of which, do we need to do fsync()'s optionally? If we
were to write many such FPI files, aren't there going to be more
fsync() calls and imagine this feature being used on a production
server and a lot of fsync() will definitely make running server
fsync() ops slower. I think we need a new option whether pg_waldump
ever do fsync() or not, something similar to --no-sync of
pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would
like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can
make use of it.

7.
+ pg_fatal("couldn't write out complete fullpage image to
file: %s", filename);
We typically use "full page image" in the output strings, please correct.

8.
+
+            if (((PageHeader) page)->pd_checksum)
+                ((PageHeader) page)->pd_checksum =
pg_checksum_page((char *) page, blk);
Why do you need to set the page's checksum by yourself? I don't think
this is the right way, pg_waldump should just return what it sees in
the WAL record, of course, after verifying a few checks (like checksum
is correct or not), but it mustn't set or compute anything new in the
returned page.
Few comments on the tests:
1.
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
I don't think we need these many append_conf calls here, see how
others are doing it with just one single call:

$node->append_conf(
'postgresql.conf', qq{
listen_addresses = '$hostaddr'
krb_server_keyfile = '$keytab'
log_connections = on
lc_messages = 'C'
});

2.
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
Do you really need 100MB max_wal_size? Why can't you just initialize
your cluster with 1MB wal files instead of 16MB and set max_wal_size
to 4MB or a bit more, something like 019_replslot_limit.pl does?

3.
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
I don't think you need these many complex things to generate WAL,
multiple CHECKPOINT;s can make tests slower. To keep it simple, you
can just create a table, insert a single row, checkpoint, update the
row, switch the wal - no need to test if your feature generates
multiple WAL files, it's enough to test if it generates just one.
Please simplfiy the tests.
4.
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
Why do you need to set wal_level to replica, out of the box your
cluster comes with replica only no?
And why do you need archive_mode on and set the command to do nothing?
Why archiving is needed for testing your feature firstly?

5.
+my $primary = PostgreSQL::Test::Cluster->new('primary');
Can you rename your node to other than primary? Because this isn't a
test of replication where primary and standby nodes get created. How
about just 'node'?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#37David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#36)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Thu, Nov 10, 2022 at 2:33 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Nov 10, 2022 at 1:31 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Thanks for providing the v7 patch, please see my comments and responses below.

Hi Bharath, Thanks for the feedback.

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.

How I was envisioning this was for cases like extreme surgery for
corrupted pages, where you extract the page from WAL but it has lsn
and checksum set so you could do something like `dd if=fixup-block
of=relation ...`, so it *simulates* the replay of said fullpage blocks
in cases where for some reason you can't play the intermediate
records; since this is always a fullpage block, it's capturing what
would be the snapshot so you could manually insert somewhere as needed
without needing to replay (say if dealing with an incomplete or
corrupted WAL stream).

Recovery sets the page LSN after it replayed the WAL record on the
page right? Recovery does this - base_page/FPI +
apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
new_version_of_page. Essentially, in your patch, you are just setting
the WAL record LSN with the page contents being the base page's. I'm
still not sure what's the real use-case here. We don't have an
independent function in postgres, given a base page and a WAL record
that just replays the WAL record and output's the new version of the
page, so I think what you do in the patch with fixup option seems
wrong to me.

Well if it's not the same output then I guess you're right and there's
not a use for the `--fixup` mode. By the same token, I'd say
calculating/setting the checksum also wouldn't need to be done, we
should just include the page as included in the WAL stream.

5.
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
+
+        /* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

The WAL already has a checksum, so not certain this makes sense on its
own. Also I'm inclined to make it a warning if it doesn't match rather
than a fatal. (I'd also have to verify that the checksum is properly
set on the page prior to copying the FPI into WAL, which I'm pretty
sure it is but not certain.)

case 'W':
config.save_fpw_path = pg_strdup(optarg);
case 'X':
config.fixup_fpw = true;
config.save_fpw_path = pg_strdup(optarg);

Like separate opt processing with their own `break` statement?
Probably a bit more readable/conventional.

Yes.

Moot with the removal of the --fixup mode.

Some more comments:

1.
+ PGAlignedBlock zerobuf;
Essentially, it's not a zero buffer, please rename the variable to
something like 'buf' or 'page_buf' or someother?

Sure.

2.
+ if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Replace pg_pwrite with fwrite() and avoid fileno() system calls that
should suffice here, AFICS, we don't need pg_pwrite.

Sure.

3.
+    if (config.save_fpw_path != NULL)
+    {
+        /* Create the dir if it doesn't exist */
+        if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
I think  you still need pg_check_dir() here, how about something like below?

I was assuming pg_mkdir_p() acted just like mkdir -p, where it's just
an idempotent action, so an existing dir is just treated the same.
What's the benefit here? Would assume if a non-dir file existed at
that path or other permissions issues arose we'd just get an error
from pg_mkdir_p(). (Will review the code there and confirm.)

4.
+        /* Create the dir if it doesn't exist */
+        if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+        {
+            pg_log_error("could not create output directory \"%s\": %m",
+                         config.save_fpw_path);
+            goto bad_argument;
Why is the directory creation error a bad_argument? I think you need
just pg_fatal() here.

Sure. Was just following the other patterns I'd seen for argument handling.

5.
+ fsync(fileno(OPF));
+ fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

I'm not sure I get the value of the fsyncs here; if you are using this
tool at this capacity you're by definition doing some sort of
transient investigative steps. Since the WAL was fsync'd, you could
always rerun/recreate as needed in the unlikely event of an OS crash
in the middle of this investigation. Since this is outside the
purview of the database operations proper (unlike, say, initdb) seems
like it's unnecessary (or definitely shouldn't need to be selectable).
My thoughts are that if we're going to fsync, just do the fsyncs
unconditionally rather than complicate the interface further.

6. Speaking of which, do we need to do fsync()'s optionally? If we
were to write many such FPI files, aren't there going to be more
fsync() calls and imagine this feature being used on a production
server and a lot of fsync() will definitely make running server
fsync() ops slower. I think we need a new option whether pg_waldump
ever do fsync() or not, something similar to --no-sync of
pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would
like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can
make use of it.

See my thoughts on #5; basically I'm -0.5 on the fsyncs at all.

7.
+ pg_fatal("couldn't write out complete fullpage image to
file: %s", filename);
We typically use "full page image" in the output strings, please correct.

Sure.

8.
+
+            if (((PageHeader) page)->pd_checksum)
+                ((PageHeader) page)->pd_checksum =
pg_checksum_page((char *) page, blk);
Why do you need to set the page's checksum by yourself? I don't think
this is the right way, pg_waldump should just return what it sees in
the WAL record, of course, after verifying a few checks (like checksum
is correct or not), but it mustn't set or compute anything new in the
returned page.

This was in the --fixup codepath only, so will go away.

Few comments on the tests:
1.
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
I don't think we need these many append_conf calls here, see how
others are doing it with just one single call:

$node->append_conf(
'postgresql.conf', qq{
listen_addresses = '$hostaddr'
krb_server_keyfile = '$keytab'
log_connections = on
lc_messages = 'C'
});

Kk.

2.
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
Do you really need 100MB max_wal_size? Why can't you just initialize
your cluster with 1MB wal files instead of 16MB and set max_wal_size
to 4MB or a bit more, something like 019_replslot_limit.pl does?

Yeah, that could work; I wasn't aware that other tests were modifying
those params. In order to get a test that wouldn't barf when it hit
the end of the WAL stream (and so fail the test) I needed to ensure
there were multiple WAL files generated that would not be recycled so
I could dump a single WAL file without error. This was the approach I
was able to come up with. :-)

3.
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1',      <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
I don't think you need these many complex things to generate WAL,
multiple CHECKPOINT;s can make tests slower. To keep it simple, you
can just create a table, insert a single row, checkpoint, update the
row, switch the wal - no need to test if your feature generates
multiple WAL files, it's enough to test if it generates just one.
Please simplfiy the tests.

Can probably simplify more, but see the rationale on my last point.

4.
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
Why do you need to set wal_level to replica, out of the box your
cluster comes with replica only no?
And why do you need archive_mode on and set the command to do nothing?
Why archiving is needed for testing your feature firstly?

I think it had shown "minimal" in my testing; I was purposefully
failing archives so the WAL would stick around. Maybe a custom
archive command that just copied a single WAL file into a known
location so I could use that instead of the current approach would
work, though not sure how Windows support would work with that. Open
to other ideas to more cleanly get a single WAL file that isn't the
last one. (Earlier versions of this test were using /all/ of the
generated WAL files rather than a single one, so maybe I am
overcomplicating things for a single WAL file case.)

5.
+my $primary = PostgreSQL::Test::Cluster->new('primary');
Can you rename your node to other than primary? Because this isn't a
test of replication where primary and standby nodes get created. How
about just 'node'?

Sure, np.

Best,

David

#38Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#37)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Thu, Nov 10, 2022 at 9:52 PM David Christensen
<david.christensen@crunchydata.com> wrote:

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.

How I was envisioning this was for cases like extreme surgery for
corrupted pages, where you extract the page from WAL but it has lsn
and checksum set so you could do something like `dd if=fixup-block
of=relation ...`, so it *simulates* the replay of said fullpage blocks
in cases where for some reason you can't play the intermediate
records; since this is always a fullpage block, it's capturing what
would be the snapshot so you could manually insert somewhere as needed
without needing to replay (say if dealing with an incomplete or
corrupted WAL stream).

Recovery sets the page LSN after it replayed the WAL record on the
page right? Recovery does this - base_page/FPI +
apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
new_version_of_page. Essentially, in your patch, you are just setting
the WAL record LSN with the page contents being the base page's. I'm
still not sure what's the real use-case here. We don't have an
independent function in postgres, given a base page and a WAL record
that just replays the WAL record and output's the new version of the
page, so I think what you do in the patch with fixup option seems
wrong to me.

Well if it's not the same output then I guess you're right and there's
not a use for the `--fixup` mode. By the same token, I'd say
calculating/setting the checksum also wouldn't need to be done, we
should just include the page as included in the WAL stream.

Let's hear from others, we may be missing something here. I recommend
keeping the --fixup patch as 0002, in case if we decide to discard
it's easier, however I'll leave that to you.

5.
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
+
+        /* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

The WAL already has a checksum, so not certain this makes sense on its
own. Also I'm inclined to make it a warning if it doesn't match rather
than a fatal. (I'd also have to verify that the checksum is properly
set on the page prior to copying the FPI into WAL, which I'm pretty
sure it is but not certain.)

How about having it as an Assert()?

5.
+ fsync(fileno(OPF));
+ fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

I'm not sure I get the value of the fsyncs here; if you are using this
tool at this capacity you're by definition doing some sort of
transient investigative steps. Since the WAL was fsync'd, you could
always rerun/recreate as needed in the unlikely event of an OS crash
in the middle of this investigation. Since this is outside the
purview of the database operations proper (unlike, say, initdb) seems
like it's unnecessary (or definitely shouldn't need to be selectable).
My thoughts are that if we're going to fsync, just do the fsyncs
unconditionally rather than complicate the interface further.

-1 for fysnc() per file created as it can create a lot of sync load on
production servers impacting performance. How about just syncing the
directory at the end assuming it doesn't cost as much as fsync() per
FPI file created would?

4.
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
Why do you need to set wal_level to replica, out of the box your
cluster comes with replica only no?
And why do you need archive_mode on and set the command to do nothing?
Why archiving is needed for testing your feature firstly?

I think it had shown "minimal" in my testing; I was purposefully
failing archives so the WAL would stick around. Maybe a custom
archive command that just copied a single WAL file into a known
location so I could use that instead of the current approach would
work, though not sure how Windows support would work with that. Open
to other ideas to more cleanly get a single WAL file that isn't the
last one. (Earlier versions of this test were using /all/ of the
generated WAL files rather than a single one, so maybe I am
overcomplicating things for a single WAL file case.)

Typically we create a physical replication slot at the beginning so
that the server keeps the WAL required for you in pg_wal itself, for
instance, please see pg_walinspect:

-- Make sure checkpoints don't interfere with the test.
SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_walinspect_slot',
true, false);

2.
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
Do you really need 100MB max_wal_size? Why can't you just initialize
your cluster with 1MB wal files instead of 16MB and set max_wal_size
to 4MB or a bit more, something like 019_replslot_limit.pl does?

Yeah, that could work; I wasn't aware that other tests were modifying
those params. In order to get a test that wouldn't barf when it hit
the end of the WAL stream (and so fail the test) I needed to ensure
there were multiple WAL files generated that would not be recycled so
I could dump a single WAL file without error. This was the approach I
was able to come up with. :-)

If you do trick specified in the above comment i.e. using replication
slot to hold the WAL, I think you don't need to set max_wal_size at
all, if that's true, I think the tests can just spin up a node and run
the tests without bothering max_wal_size, archive_mode etc.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#39Justin Pryzby
pryzby@telsasoft.com
In reply to: David Christensen (#35)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote:

On Wed, Nov 9, 2022 at 2:08 PM David Christensen <david.christensen@crunchydata.com> wrote:

Justin sez:

I was wondering if there's any reason to do "CREATE DATABASE". The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
1435 safe_psql('postgres',
335 safe_psql(
23 safe_psql($connect_db,

If there was a reason, I don't recall offhand; I will test removing it
and if things still work will consider it good enough.

Things blew up when I did that; rather than hunt it down, I just left it in. :-)

+$primary->safe_psql('db1', <<EOF);

It worked for me when I removed the 3 references to db1.
That's good for efficiency of the test.

+my $blocksize = 8192;

I think this should be just "my $blocksize;" rather than setting a value
which is later overwriten.

--
Justin

#40David Christensen
david.christensen@crunchydata.com
In reply to: Justin Pryzby (#39)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Nov 11, 2022 at 8:15 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote:

On Wed, Nov 9, 2022 at 2:08 PM David Christensen <david.christensen@crunchydata.com> wrote:

Justin sez:

I was wondering if there's any reason to do "CREATE DATABASE". The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
1435 safe_psql('postgres',
335 safe_psql(
23 safe_psql($connect_db,

If there was a reason, I don't recall offhand; I will test removing it
and if things still work will consider it good enough.

Things blew up when I did that; rather than hunt it down, I just left it in. :-)

+$primary->safe_psql('db1', <<EOF);

It worked for me when I removed the 3 references to db1.
That's good for efficiency of the test.

I did figure that out later; fixed in git.

+my $blocksize = 8192;

I think this should be just "my $blocksize;" rather than setting a value
which is later overwriten.

Yep. Fixed in git.

David

#41David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#38)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Nov 11, 2022 at 4:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Well if it's not the same output then I guess you're right and there's
not a use for the `--fixup` mode. By the same token, I'd say
calculating/setting the checksum also wouldn't need to be done, we
should just include the page as included in the WAL stream.

Let's hear from others, we may be missing something here. I recommend
keeping the --fixup patch as 0002, in case if we decide to discard
it's easier, however I'll leave that to you.

I've whacked in `git` for now; I can resurrect if people consider it useful.

5.
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
+
+        /* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

The WAL already has a checksum, so not certain this makes sense on its
own. Also I'm inclined to make it a warning if it doesn't match rather
than a fatal. (I'd also have to verify that the checksum is properly
set on the page prior to copying the FPI into WAL, which I'm pretty
sure it is but not certain.)

How about having it as an Assert()?

Based on empirical testing, the checksums don't match, so
asserting/alerting on each block extracted seems next to useless, so
going to just remove that.

5.
+ fsync(fileno(OPF));
+ fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

I'm not sure I get the value of the fsyncs here; if you are using this
tool at this capacity you're by definition doing some sort of
transient investigative steps. Since the WAL was fsync'd, you could
always rerun/recreate as needed in the unlikely event of an OS crash
in the middle of this investigation. Since this is outside the
purview of the database operations proper (unlike, say, initdb) seems
like it's unnecessary (or definitely shouldn't need to be selectable).
My thoughts are that if we're going to fsync, just do the fsyncs
unconditionally rather than complicate the interface further.

-1 for fysnc() per file created as it can create a lot of sync load on
production servers impacting performance. How about just syncing the
directory at the end assuming it doesn't cost as much as fsync() per
FPI file created would?

I can fsync the dir if that's a useful compromise.

4.
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
Why do you need to set wal_level to replica, out of the box your
cluster comes with replica only no?
And why do you need archive_mode on and set the command to do nothing?
Why archiving is needed for testing your feature firstly?

I think it had shown "minimal" in my testing; I was purposefully
failing archives so the WAL would stick around. Maybe a custom
archive command that just copied a single WAL file into a known
location so I could use that instead of the current approach would
work, though not sure how Windows support would work with that. Open
to other ideas to more cleanly get a single WAL file that isn't the
last one. (Earlier versions of this test were using /all/ of the
generated WAL files rather than a single one, so maybe I am
overcomplicating things for a single WAL file case.)

Typically we create a physical replication slot at the beginning so
that the server keeps the WAL required for you in pg_wal itself, for
instance, please see pg_walinspect:

-- Make sure checkpoints don't interfere with the test.
SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_walinspect_slot',
true, false);

Will see if I can get something like this to work; I'm currently
stopping the server before running the file-based tests, but I suppose
there's no reason to do so, so a temporary slot that holds it around
until the test is complete is probably fine.

David

#42David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#41)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Enclosed is v8, which uses the replication slot method to retain WAL
as well as fsync'ing the output directory when everything is done.

Attachments:

v8-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchapplication/octet-stream; name=v8-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchDownload
From 82b5cc8356b49ca4a781e066f9abc703aead15c5 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Thu, 10 Nov 2022 10:29:38 -0600
Subject: [PATCH v8] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0_main
00000000-01000150.1664.0.6115.0_main
00000000-010001E0.1664.0.6114.0_main
00000000-01000270.1663.1.6116.0_main
00000000-01000300.1663.1.6113.0_main
00000000-01000390.1663.1.6112.0_main
00000000-01000420.1663.1.8903.0_main
00000000-010004B0.1663.1.8902.0_main
00000000-01000540.1663.1.6111.0_main
00000000-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  69 ++++++++++++++
 src/bin/pg_waldump/meson.build            |   1 +
 src/bin/pg_waldump/pg_waldump.c           |  95 +++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 104 ++++++++++++++++++++++
 4 files changed, 268 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..3c5bd9eb3f 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,75 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             The name of the fork the full page image came from.  One
+             of <literal>_main</literal>, <literal>_fsm</literal>,
+             <literal>_vm</literal>, or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 9605976870..34e37bffc3 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -29,6 +29,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..b79f09ad4f 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,15 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +76,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fpw_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +448,61 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock buf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		page = (Page) buf.data;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("couldn't open file for output: %s", filename);
+
+		if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+			pg_fatal("couldn't write out complete full page image to file: %s", filename);
+
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +743,8 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +778,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +839,7 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpw_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +850,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +986,9 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpw_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1043,17 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpw_path != NULL)
+	{
+		/* Create the dir if it doesn't exist */
+		if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpw_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1232,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpw_path)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
@@ -1167,6 +1254,12 @@ main(int argc, char **argv)
 	if (time_to_stop)
 		exit(0);
 
+	if (config.save_fpw_path != NULL)
+	{
+		/* Fsync our output directory */
+		fsync_fname(config.save_fpw_path, true);
+	}
+
 	if (errormsg)
 		pg_fatal("error in WAL record at %X/%X: %s",
 				 LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..c9a2ac733a
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,104 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $blocksize;
+
+# routine to extract the LSN and checksum from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo, $checksum) = unpack('VVv', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo, $checksum);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $node =  PostgreSQL::Test::Cluster->new('primary');
+$node->init(extra => ['-k'], allows_streaming => 1);
+$node->start;
+
+# Sanity checks for command line options.
+$node->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+
+# generate data/wal to examine that will have FPIs in them
+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+SELECT pg_switch_wal();
+EOF
+
+# get our blocksize from the db
+$blocksize = $node->safe_psql('postgres', "SELECT current_setting('block_size')");
+
+# get the relation node, etc for the new table
+my $relation = $node->safe_psql('postgres',
+	q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END, pg_database.oid, pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+my $waldir = $node->basedir . '/pgdata/pg_wal';
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we want the second WAL file, which will be a complete WAL file with full-page writes for our specific relation.
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+diag "using walfile: $walfile";
+
+ok($walfile, "Got a WAL file");
+
+$node->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my %checksums;
+my %files;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk, $checksum) = get_block_info($fullpath);
+
+	# verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+
+	# stash checksum for later comparisons
+	$checksums{$file} = $checksum;
+}
+
+# validate that we ended up with some files output and they were the same
+ok(keys %files > 0, 'verify we processed some files');
+
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#43Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#42)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, Nov 15, 2022 at 1:29 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Enclosed is v8, which uses the replication slot method to retain WAL
as well as fsync'ing the output directory when everything is done.

Thanks. It mostly is in good shape. However, few more comments:

1.
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
May I know what's the intention of the statement 'The images saved
....'? If it's not necessary and convey anything useful to the user,
can we remove it?

2.
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
I think we don't need the above includes as we got rid of verifying
page checksums. The patch compiles without them for me.

3.
+ char *save_fpw_path;
Can we rename the above variable to save_fpi_path, just to be in sync
with what we expose to the user, the option name 'save-fpi'?

4.
+    if (config.save_fpw_path != NULL)
+    {
+        /* Fsync our output directory */
+        fsync_fname(config.save_fpw_path, true);
+    }
I guess adding a comment there as to why we aren't fsyncing for every
file that gets created, but once per the directory at the end. That'd
help clarify doubts that other members might get while looking at the
code.
5.
+    if (config.save_fpw_path != NULL)
+    {
+        /* Fsync our output directory */
+        fsync_fname(config.save_fpw_path, true);
+    }
So, are we sure that we don't want to fsync for time_to_stop exit(0)
cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
meaning exiting with return code 0, shouldn't we fsync the directory?

6.
+ else if (config.save_fpw_path)
Let's use the same convention to check non-NULLness,
config.save_fpw_path != NULL.

7.
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+SELECT pg_switch_wal();
I don't think switching WAL after checkpoint is necessary here,
because the checkpoint ensures all the WAL gets flushed to disk.
Please remove it.

PS: I've seen the following code:
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.

8.
+$node->safe_psql('postgres', <<EOF);
+EOF
Why EOF is used here? Can't we do something like below to execute
multiple statements?
$node->safe_psql(
'postgres', qq[
SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
]);

Same here:
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ

9.
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.
Is it guaranteed that just looking at the second WAL file in the
pg_wal directory assures WAL file with FPIs? I think we have to save
the WAL file that contains FPIs, that is the file after, CHECKPOINT,
UPDATE and pg_switch_wal. I think you can store output LSN of
pg_switch_wal

10.
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ
+done_testing();

Do we need to explicitly drop the slot here? I think we don't
specifically drop the replication slot in all the places, my guess is
after done_testing(), the node would get destroyed and also the slot.
I think it's not required.

11.
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
Do we need to look for the exact match of the file that gets created
in the save-fpi path? While checking for this is great, it makes the
test code non-portable (may not work on Windows or other platforms,
no?) and complex? This way, you can get rid of get_block_info() as
well? And +for my $fullpath (glob "$tmp_folder/raw/*")
will also  get simplified.

I think you can further simplify the tests by:
create the node
generate an FPI
call pg_waldump with save-fpi option
check the target directory for a file that contains the relid,
something like '%relid%'.

The above would still serve the purpose, tests the code without much complexity.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#44David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#43)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Tue, Nov 15, 2022 at 4:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, Nov 15, 2022 at 1:29 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Enclosed is v8, which uses the replication slot method to retain WAL
as well as fsync'ing the output directory when everything is done.

Thanks. It mostly is in good shape. However, few more comments:

1.
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
May I know what's the intention of the statement 'The images saved
....'? If it's not necessary and convey anything useful to the user,
can we remove it?

Basically I mean if you're limiting to a specific relation or rmgr
type, etc, it only saves those FPIs. (So filtering is applied first
before considering whether to save the FPI or not.)

2.
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
I think we don't need the above includes as we got rid of verifying
page checksums. The patch compiles without them for me.

Good catch.

3.
+ char *save_fpw_path;
Can we rename the above variable to save_fpi_path, just to be in sync
with what we expose to the user, the option name 'save-fpi'?

Sure.

4.
+    if (config.save_fpw_path != NULL)
+    {
+        /* Fsync our output directory */
+        fsync_fname(config.save_fpw_path, true);
+    }
I guess adding a comment there as to why we aren't fsyncing for every
file that gets created, but once per the directory at the end. That'd
help clarify doubts that other members might get while looking at the
code.

Can do.

5.
+    if (config.save_fpw_path != NULL)
+    {
+        /* Fsync our output directory */
+        fsync_fname(config.save_fpw_path, true);
+    }
So, are we sure that we don't want to fsync for time_to_stop exit(0)
cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
meaning exiting with return code 0, shouldn't we fsync the directory?

We can. Like I've said before, since these aren't production parts of
the cluster I don't personally have much of an opinion if fsync() is
appropriate at all, so don't have strong feelings here.

6.
+ else if (config.save_fpw_path)
Let's use the same convention to check non-NULLness,
config.save_fpw_path != NULL.

Good catch.

7.
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+SELECT pg_switch_wal();
I don't think switching WAL after checkpoint is necessary here,
because the checkpoint ensures all the WAL gets flushed to disk.
Please remove it.

The point is to ensure we have a clean WAL segment that we know will
contain the relation we are filtering by. Will test if this still
holds without the extra pg_switch_wal(), but that's the rationale.

PS: I've seen the following code:
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.

I don't understand the question.

8.
+$node->safe_psql('postgres', <<EOF);
+EOF
Why EOF is used here? Can't we do something like below to execute
multiple statements?
$node->safe_psql(
'postgres', qq[
SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
]);

Same here:
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ

As a long-time perl programmer, heredocs seem more natural and easier
to read rather than a string that accomplishes the same function. If
there is an established project style I'll stick with it, but it just
rolled out that way. :-)

9.
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.
Is it guaranteed that just looking at the second WAL file in the
pg_wal directory assures WAL file with FPIs? I think we have to save
the WAL file that contains FPIs, that is the file after, CHECKPOINT,
UPDATE and pg_switch_wal. I think you can store output LSN of
pg_switch_wal

Yeah, I could look at that approach; originally this test was doing a
lot more, so this is sort of residual from that original
implementation. For a single file, this would probably be an
acceptable route.

10.
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ
+done_testing();

Do we need to explicitly drop the slot here? I think we don't
specifically drop the replication slot in all the places, my guess is
after done_testing(), the node would get destroyed and also the slot.
I think it's not required.

Maybe not required, but seems good form (and the other test I based
this on did do the cleanup).

11.
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
Do we need to look for the exact match of the file that gets created
in the save-fpi path? While checking for this is great, it makes the
test code non-portable (may not work on Windows or other platforms,
no?) and complex? This way, you can get rid of get_block_info() as
well? And +for my $fullpath (glob "$tmp_folder/raw/*")
will also  get simplified.

I think you can further simplify the tests by:
create the node
generate an FPI
call pg_waldump with save-fpi option
check the target directory for a file that contains the relid,
something like '%relid%'.

The above would still serve the purpose, tests the code without much complexity.

I disagree; I think there is utility in keeping the validation of the
expected output. Since we have the code that works for it (and does
work on Windows, per passing the CI tests) I'm not seeing why we
wouldn't want to continue to validate as much as possible.

Thanks,

David

#45David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#44)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Enclosed is v9.

- code style consistency (FPI instead of FPW) internally.
- cleanup of no-longer needed checksum-related pieces from code and tests.
- test cleanup/simplification.
- other comment cleanup.

Passes all CI checks.

Best,

David

Attachments:

v9-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchapplication/octet-stream; name=v9-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patchDownload
From e0fe44626cea657f5dd6a6e035406989c529a02a Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Thu, 10 Nov 2022 10:29:38 -0600
Subject: [PATCH v9] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0_main
00000000-01000150.1664.0.6115.0_main
00000000-010001E0.1664.0.6114.0_main
00000000-01000270.1663.1.6116.0_main
00000000-01000300.1663.1.6113.0_main
00000000-01000390.1663.1.6112.0_main
00000000-01000420.1663.1.8903.0_main
00000000-010004B0.1663.1.8902.0_main
00000000-01000540.1663.1.6111.0_main
00000000-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          |  69 +++++++++++++++
 src/bin/pg_waldump/meson.build            |   1 +
 src/bin/pg_waldump/pg_waldump.c           | 101 +++++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl |  96 ++++++++++++++++++++
 4 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..3c5bd9eb3f 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,75 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             The name of the fork the full page image came from.  One
+             of <literal>_main</literal>, <literal>_fsm</literal>,
+             <literal>_vm</literal>, or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 9605976870..34e37bffc3 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -29,6 +29,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..cbfafebe8a 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fpi_path;
 } XLogDumpConfig;
 
 
@@ -423,7 +430,7 @@ XLogRecordMatchesRelationBlock(XLogReaderState *record,
  * Boolean to return whether the given WAL record contains a full page write.
  */
 static bool
-XLogRecordHasFPW(XLogReaderState *record)
+XLogRecordHasFPI(XLogReaderState *record)
 {
 	int			block_id;
 
@@ -439,6 +446,61 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPIs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPIs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock buf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		page = (Page) buf.data;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("couldn't open file for output: %s", filename);
+
+		if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+			pg_fatal("couldn't write out complete full page image to file: %s", filename);
+
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +741,8 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +776,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +837,7 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpi_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +848,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:X:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +984,9 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpi_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1041,17 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpi_path != NULL)
+	{
+		/* Create the dir if it doesn't exist */
+		if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpi_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1139,7 +1219,7 @@ main(int argc, char **argv)
 											config.filter_by_relation_forknum))
 			continue;
 
-		if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+		if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
 			continue;
 
 		/* perform any per-record work */
@@ -1150,6 +1230,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpi_path != NULL)
+			{
+				if (XLogRecordHasFPI(xlogreader_state))
+					XLogRecordSaveFPIs(xlogreader_state, config.save_fpi_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
@@ -1164,6 +1249,16 @@ main(int argc, char **argv)
 	if (config.stats == true && !config.quiet)
 		XLogDumpDisplayStats(&config, &stats);
 
+	if (config.save_fpi_path != NULL)
+	{
+		/* We fsync our output directory only; since these files are not part
+		 * of the production database we do not require the performance hit
+		 * that fsyncing every FPI would entail, so are doing this as a
+		 * compromise. */
+
+		fsync_fname(config.save_fpi_path, true);
+	}
+
 	if (time_to_stop)
 		exit(0);
 
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..ba0d6c7ad6
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,96 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my ($blocksize, $walfile_name);
+
+# routine to extract the LSN from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo) = unpack('VV', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo);
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my $node =  PostgreSQL::Test::Cluster->new('primary');
+$node->init(extra => ['-k'], allows_streaming => 1);
+$node->start;
+
+# Sanity checks for command line options.
+$node->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+
+# generate data/wal to examine that will have FPIs in them
+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+EOF
+
+($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()), current_setting('block_size')");
+
+# get the relation node, etc for the new table
+my $relation = $node->safe_psql('postgres',
+	q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END, pg_database.oid, pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()}
+);
+
+diag $relation;
+
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+
+diag "using walfile: $walfile";
+
+ok(-f $walfile, "Got a WAL file");
+
+$node->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my %files;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+
+	# save filename for later verification
+	$files{$file}++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk) = get_block_info($fullpath);
+
+	# verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+}
+
+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
+
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regress_pg_waldump_slot')");
+
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#46Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#45)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 16, 2022 at 1:20 AM David Christensen
<david.christensen@crunchydata.com> wrote:

Enclosed is v9.

- code style consistency (FPI instead of FPW) internally.
- cleanup of no-longer needed checksum-related pieces from code and tests.
- test cleanup/simplification.
- other comment cleanup.

Passes all CI checks.

Thanks for the updated patch.

1.
-        if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+        if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
These changes are not related to this feature, hence renaming those
variables/function names must be dealt with separately. If required,
proposing another patch can be submitted to change filter_by_fpw to
filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().
2.
+        /* We fsync our output directory only; since these files are not part
+         * of the production database we do not require the performance hit
+         * that fsyncing every FPI would entail, so are doing this as a
+         * compromise. */
The commenting style doesn't match the standard that we follow
elsewhere in postgres, please refer to other multi-line comments.
3.
+        fsync_fname(config.save_fpi_path, true);
+    }
It looks like fsync_fname()/fsync() in general isn't recursive, in the
sense that it doesn't fsync the files under the directory, but the
directory only. So, the idea of directory fsync doesn't seem worth it.
We either 1) get rid of fsync entirely or 2) fsync all the files after
they are created and the directory at the end or 3) do option (2) with
--no-sync option similar to its friends. Since option (2) is a no go,
we can either choose option (1) or option (2). My vote at this point
is for option (1).
4.
+($walfile_name, $blocksize) = split '\|' =>
$node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
current_setting('block_size')");
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
I think there's something wrong with this, no? pg_switch_wal() can, at
times, return end+1 of the prior segment (see below snippet) and I'm
not sure if such a case can happen here.

* The return value is either the end+1 address of the switch record,
* or the end+1 address of the prior segment if we did not need to
* write a switch record because we are already at segment start.
*/
XLogRecPtr
RequestXLogSwitch(bool mark_unimportant)

5.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+ok(-f $walfile, "Got a WAL file");
Is this checking if the WAL file is present or not in PGDATA/pg_wal?
If yes, I think this isn't required as pg_switch_wal() ensures that
the WAL is written and flushed to disk.

6.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
Isn't "pgdata" hardcoded here? I think you might need to do the following:
$node->data_dir . '/pg_wal/' . $walfile_name;;

7.
+    # save filename for later verification
+    $files{$file}++;
+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
Why do we need to store filenames in an array when we later just check
the size of the array? Can't we use a boolean (file_found) or an int
variable (file_count) to verify that we found the file.
8.
+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_waldump_slot', true,
false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+EOF
The EOF with append_conf() is being used in 4 files and elsewhere in
the TAP test files (more than 100?) qq[] or quotes is being used. I
have no strong opinion here, I'll leave it to the other reviewers or
committer.
11.
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
Do we need to look for the exact match of the file that gets created
in the save-fpi path? While checking for this is great, it makes the
test code non-portable (may not work on Windows or other platforms,
no?) and complex? This way, you can get rid of get_block_info() as
well? And +for my $fullpath (glob "$tmp_folder/raw/*")
will also  get simplified.

I think you can further simplify the tests by:
create the node
generate an FPI
call pg_waldump with save-fpi option
check the target directory for a file that contains the relid,
something like '%relid%'.

The above would still serve the purpose, tests the code without much complexity.

I disagree; I think there is utility in keeping the validation of the
expected output. Since we have the code that works for it (and does
work on Windows, per passing the CI tests) I'm not seeing why we
wouldn't want to continue to validate as much as possible.

My intention is to simplify the tests further and I still stick to it.
It looks like the majority of test code is to form the file name in
the format that pg_waldump outputs and match with the file name in the
target directory - for instance, in get_block_info(), and in the loop
for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
need to aim for file format checks, it's enough to look for the
written file with '%relid%' by pg_waldump, if needed, the contents of
the files written/FPI can also be verified with, say, pg_checksum
tool. Others may have different opinions though.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#47David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#46)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Nov 16, 2022 at 3:30 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

1.
-        if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+        if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
These changes are not related to this feature, hence renaming those
variables/function names must be dealt with separately. If required,
proposing another patch can be submitted to change filter_by_fpw to
filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().

Not required; can revert the changes unrelated to this specific patch.
(I'd written the original ones for it, so didn't really think anything
of it... :-))

2.
+        /* We fsync our output directory only; since these files are not part
+         * of the production database we do not require the performance hit
+         * that fsyncing every FPI would entail, so are doing this as a
+         * compromise. */
The commenting style doesn't match the standard that we follow
elsewhere in postgres, please refer to other multi-line comments.

Will fix.

3.
+        fsync_fname(config.save_fpi_path, true);
+    }
It looks like fsync_fname()/fsync() in general isn't recursive, in the
sense that it doesn't fsync the files under the directory, but the
directory only. So, the idea of directory fsync doesn't seem worth it.
We either 1) get rid of fsync entirely or 2) fsync all the files after
they are created and the directory at the end or 3) do option (2) with
--no-sync option similar to its friends. Since option (2) is a no go,
we can either choose option (1) or option (2). My vote at this point
is for option (1).

Agree to remove.

4.
+($walfile_name, $blocksize) = split '\|' =>
$node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
current_setting('block_size')");
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
I think there's something wrong with this, no? pg_switch_wal() can, at
times, return end+1 of the prior segment (see below snippet) and I'm
not sure if such a case can happen here.

* The return value is either the end+1 address of the switch record,
* or the end+1 address of the prior segment if we did not need to
* write a switch record because we are already at segment start.
*/
XLogRecPtr
RequestXLogSwitch(bool mark_unimportant)

I think this approach is pretty common to get the walfile name, no?
While there might be an edge case here, since the rest of the test is
a controlled environment I'm inclined to just not worry about it; this
would require the changes prior to this to exactly fill a WAL segment
which strikes me as extremely unlikely to the point of impossible in
this specific scenario.

5.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+ok(-f $walfile, "Got a WAL file");
Is this checking if the WAL file is present or not in PGDATA/pg_wal?
If yes, I think this isn't required as pg_switch_wal() ensures that
the WAL is written and flushed to disk.

You are correct, probably another artifact of the earlier version.
That said, not sure I see the harm in keeping it as a sanity-check.

6.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
Isn't "pgdata" hardcoded here? I think you might need to do the following:
$node->data_dir . '/pg_wal/' . $walfile_name;;

Can fix.

7.
+    # save filename for later verification
+    $files{$file}++;
+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
Why do we need to store filenames in an array when we later just check
the size of the array? Can't we use a boolean (file_found) or an int
variable (file_count) to verify that we found the file.

Another artifact; we were comparing the files output between two
separate lists of arbitrary numbers of pages being written out and
verifying the raw/fixup versions had the same lists.

8.
+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_waldump_slot', true,
false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+EOF
The EOF with append_conf() is being used in 4 files and elsewhere in
the TAP test files (more than 100?) qq[] or quotes is being used. I
have no strong opinion here, I'll leave it to the other reviewers or
committer.

I'm inclined to leave it just for (personal) readability, but can
change if there's a strong consensus against.

11.
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
Do we need to look for the exact match of the file that gets created
in the save-fpi path? While checking for this is great, it makes the
test code non-portable (may not work on Windows or other platforms,
no?) and complex? This way, you can get rid of get_block_info() as
well? And +for my $fullpath (glob "$tmp_folder/raw/*")
will also  get simplified.

I think you can further simplify the tests by:
create the node
generate an FPI
call pg_waldump with save-fpi option
check the target directory for a file that contains the relid,
something like '%relid%'.

The above would still serve the purpose, tests the code without much complexity.

I disagree; I think there is utility in keeping the validation of the
expected output. Since we have the code that works for it (and does
work on Windows, per passing the CI tests) I'm not seeing why we
wouldn't want to continue to validate as much as possible.

My intention is to simplify the tests further and I still stick to it.
It looks like the majority of test code is to form the file name in
the format that pg_waldump outputs and match with the file name in the
target directory - for instance, in get_block_info(), and in the loop
for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
need to aim for file format checks, it's enough to look for the
written file with '%relid%' by pg_waldump, if needed, the contents of
the files written/FPI can also be verified with, say, pg_checksum
tool. Others may have different opinions though.

I would like to get broader feedback before changing this.

David

#48Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#47)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Thu, Nov 17, 2022 at 10:02 PM David Christensen
<david.christensen@crunchydata.com> wrote:

On Wed, Nov 16, 2022 at 3:30 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

1.
-        if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+        if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
These changes are not related to this feature, hence renaming those
variables/function names must be dealt with separately. If required,
proposing another patch can be submitted to change filter_by_fpw to
filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().

Not required; can revert the changes unrelated to this specific patch.
(I'd written the original ones for it, so didn't really think anything
of it... :-))

2.
+        /* We fsync our output directory only; since these files are not part
+         * of the production database we do not require the performance hit
+         * that fsyncing every FPI would entail, so are doing this as a
+         * compromise. */
The commenting style doesn't match the standard that we follow
elsewhere in postgres, please refer to other multi-line comments.

Will fix.

3.
+        fsync_fname(config.save_fpi_path, true);
+    }
It looks like fsync_fname()/fsync() in general isn't recursive, in the
sense that it doesn't fsync the files under the directory, but the
directory only. So, the idea of directory fsync doesn't seem worth it.
We either 1) get rid of fsync entirely or 2) fsync all the files after
they are created and the directory at the end or 3) do option (2) with
--no-sync option similar to its friends. Since option (2) is a no go,
we can either choose option (1) or option (2). My vote at this point
is for option (1).

Agree to remove.

4.
+($walfile_name, $blocksize) = split '\|' =>
$node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
current_setting('block_size')");
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
I think there's something wrong with this, no? pg_switch_wal() can, at
times, return end+1 of the prior segment (see below snippet) and I'm
not sure if such a case can happen here.

* The return value is either the end+1 address of the switch record,
* or the end+1 address of the prior segment if we did not need to
* write a switch record because we are already at segment start.
*/
XLogRecPtr
RequestXLogSwitch(bool mark_unimportant)

I think this approach is pretty common to get the walfile name, no?
While there might be an edge case here, since the rest of the test is
a controlled environment I'm inclined to just not worry about it; this
would require the changes prior to this to exactly fill a WAL segment
which strikes me as extremely unlikely to the point of impossible in
this specific scenario.

5.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+ok(-f $walfile, "Got a WAL file");
Is this checking if the WAL file is present or not in PGDATA/pg_wal?
If yes, I think this isn't required as pg_switch_wal() ensures that
the WAL is written and flushed to disk.

You are correct, probably another artifact of the earlier version.
That said, not sure I see the harm in keeping it as a sanity-check.

6.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
Isn't "pgdata" hardcoded here? I think you might need to do the following:
$node->data_dir . '/pg_wal/' . $walfile_name;;

Can fix.

7.
+    # save filename for later verification
+    $files{$file}++;
+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
Why do we need to store filenames in an array when we later just check
the size of the array? Can't we use a boolean (file_found) or an int
variable (file_count) to verify that we found the file.

Another artifact; we were comparing the files output between two
separate lists of arbitrary numbers of pages being written out and
verifying the raw/fixup versions had the same lists.

8.
+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_waldump_slot', true,
false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+EOF
The EOF with append_conf() is being used in 4 files and elsewhere in
the TAP test files (more than 100?) qq[] or quotes is being used. I
have no strong opinion here, I'll leave it to the other reviewers or
committer.

I'm inclined to leave it just for (personal) readability, but can
change if there's a strong consensus against.

11.
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
Do we need to look for the exact match of the file that gets created
in the save-fpi path? While checking for this is great, it makes the
test code non-portable (may not work on Windows or other platforms,
no?) and complex? This way, you can get rid of get_block_info() as
well? And +for my $fullpath (glob "$tmp_folder/raw/*")
will also  get simplified.

I think you can further simplify the tests by:
create the node
generate an FPI
call pg_waldump with save-fpi option
check the target directory for a file that contains the relid,
something like '%relid%'.

The above would still serve the purpose, tests the code without much complexity.

I disagree; I think there is utility in keeping the validation of the
expected output. Since we have the code that works for it (and does
work on Windows, per passing the CI tests) I'm not seeing why we
wouldn't want to continue to validate as much as possible.

My intention is to simplify the tests further and I still stick to it.
It looks like the majority of test code is to form the file name in
the format that pg_waldump outputs and match with the file name in the
target directory - for instance, in get_block_info(), and in the loop
for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
need to aim for file format checks, it's enough to look for the
written file with '%relid%' by pg_waldump, if needed, the contents of
the files written/FPI can also be verified with, say, pg_checksum
tool. Others may have different opinions though.

I would like to get broader feedback before changing this.

David, is there a plan to provide an updated patch in this commitfest?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#49David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#48)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

Hi Bharath,

I can get one sent in tomorrow.

Thanks,

David

#50Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#49)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:

I can get one sent in tomorrow.

-XLogRecordHasFPW(XLogReaderState *record)
+XLogRecordHasFPI(XLogReaderState *record)
This still refers to a FPW, so let's leave that out as well as any
renamings of this kind..
+   if (config.save_fpi_path != NULL)
+   {
+       /* Create the dir if it doesn't exist */
+       if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+       {
+           pg_log_error("could not create output directory \"%s\": %m",
+                        config.save_fpi_path);
+           goto bad_argument;
+       }
+   }
It seems to me that you could allow things to work even if the
directory exists and is empty.  See for example
verify_dir_is_empty_or_create() in pg_basebackup.c.
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
This is artistic to parse for people not used to regexps (I do, a
little).  Perhaps this could use a small comment with an example of
name or a reference describing this format?
+# Set umask so test directories and files are created with default permissions
+umask(0077);
Incorrect copy-paste coming from elsewhere like the TAP tests of
pg_basebackup with group permissions?  Doesn't 
PostgreSQL::Test::Utils::tempdir give already enough protection in
terms of umask() and permissions?
+   if (config.save_fpi_path != NULL)
+   {
+       /* We fsync our output directory only; since these files are not part
+        * of the production database we do not require the performance hit
+        * that fsyncing every FPI would entail, so are doing this as a
+        * compromise. */
+
+       fsync_fname(config.save_fpi_path, true);
+   }
Why is it necessary to flush anything at all, then?
+my $relation = $node->safe_psql('postgres',
+   q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN
dattablespace ELSE reltablespace END, pg_database.oid,
pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE
relname = 'test_table' AND datname = current_database()}
Could you rewrite that to be on multiple lines?

+diag "using walfile: $walfile";
You should avoid the use of "diag", as this would cause extra output
noise with a simple make check.

+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regress_pg_waldump_slot')")
That's not really necessary, the nodes are wiped out at the end of the
test.

+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
Using an EOF to execute a multi-line query would be a first.  Couldn't
you use the same thing as anywhere else?  009_twophase.pl just to
mention one.  (Mentioned by Bharath upthread, where he asked for an
extra opinion so here it is.)
--
Michael
#51David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#50)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:

I can get one sent in tomorrow.

This v10 should incorporate your feedback as well as Bharath's.

-XLogRecordHasFPW(XLogReaderState *record)
+XLogRecordHasFPI(XLogReaderState *record)
This still refers to a FPW, so let's leave that out as well as any
renamings of this kind..

Reverted those changes.

+   if (config.save_fpi_path != NULL)
+   {
+       /* Create the dir if it doesn't exist */
+       if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+       {
+           pg_log_error("could not create output directory \"%s\": %m",
+                        config.save_fpi_path);
+           goto bad_argument;
+       }
+   }
It seems to me that you could allow things to work even if the
directory exists and is empty.  See for example
verify_dir_is_empty_or_create() in pg_basebackup.c.

The `pg_mkdir_p()` supports an existing directory (and I don't think
we want to require it to be empty first), so this only errors when it
can't create a directory for some reason.

+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
This is artistic to parse for people not used to regexps (I do, a
little).  Perhaps this could use a small comment with an example of
name or a reference describing this format?

Added a description of what this is looking for.

+# Set umask so test directories and files are created with default permissions
+umask(0077);
Incorrect copy-paste coming from elsewhere like the TAP tests of
pg_basebackup with group permissions?  Doesn't
PostgreSQL::Test::Utils::tempdir give already enough protection in
terms of umask() and permissions?

I'd expect that's where that came from. Removed.

+   if (config.save_fpi_path != NULL)
+   {
+       /* We fsync our output directory only; since these files are not part
+        * of the production database we do not require the performance hit
+        * that fsyncing every FPI would entail, so are doing this as a
+        * compromise. */
+
+       fsync_fname(config.save_fpi_path, true);
+   }
Why is it necessary to flush anything at all, then?

I personally don't think it is, but added it per Bharath's request.
Removed in this revision.

+my $relation = $node->safe_psql('postgres',
+   q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN
dattablespace ELSE reltablespace END, pg_database.oid,
pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE
relname = 'test_table' AND datname = current_database()}
Could you rewrite that to be on multiple lines?

Sure, reformated.

+diag "using walfile: $walfile";
You should avoid the use of "diag", as this would cause extra output
noise with a simple make check.

Had been using it for debugging and didn't realize it'd cause issues.
Removed both instances.

+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regress_pg_waldump_slot')")
That's not really necessary, the nodes are wiped out at the end of the
test.

Removed.

+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
Using an EOF to execute a multi-line query would be a first.  Couldn't
you use the same thing as anywhere else?  009_twophase.pl just to
mention one.  (Mentioned by Bharath upthread, where he asked for an
extra opinion so here it is.)

Fair enough, while idiomatic perl to me, not a hill to die on;
converted to a standard multiline string.

Best,

David

Attachments:

v10-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchapplication/octet-stream; name=v10-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchDownload
From 2858c2ff9e17faca516cd782c606bce5028bc51e Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Thu, 15 Dec 2022 18:14:57 -0500
Subject: [PATCH v10] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0_main
00000000-01000150.1664.0.6115.0_main
00000000-010001E0.1664.0.6114.0_main
00000000-01000270.1663.1.6116.0_main
00000000-01000300.1663.1.6113.0_main
00000000-01000390.1663.1.6112.0_main
00000000-01000420.1663.1.8903.0_main
00000000-010004B0.1663.1.8902.0_main
00000000-01000540.1663.1.6111.0_main
00000000-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          | 69 ++++++++++++++++
 src/bin/pg_waldump/meson.build            |  1 +
 src/bin/pg_waldump/pg_waldump.c           | 87 +++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 99 +++++++++++++++++++++++
 4 files changed, 255 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..3c5bd9eb3f 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,75 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fpi=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             The name of the fork the full page image came from.  One
+             of <literal>_main</literal>, <literal>_fsm</literal>,
+             <literal>_vm</literal>, or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 9605976870..34e37bffc3 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -29,6 +29,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..90f8d48e5f 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fpi_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +446,61 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPIs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPIs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock buf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		page = (Page) buf.data;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("couldn't open file for output: %s", filename);
+
+		if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+			pg_fatal("couldn't write out complete full page image to file: %s", filename);
+
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +741,8 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fpi=path    save full page images to given path as\n"
+			 "                         LSN.T.D.R.B_F\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +776,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fpi", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +837,7 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpi_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +848,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +984,9 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpi_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1041,17 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpi_path != NULL)
+	{
+		/* Create the dir if it doesn't exist */
+		if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpi_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1230,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpi_path != NULL)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPIs(xlogreader_state, config.save_fpi_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..2fe2821eb6
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,99 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my ($blocksize, $walfile_name);
+
+# routine to extract the LSN from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo) = unpack('VV', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo);
+}
+
+my $node =  PostgreSQL::Test::Cluster->new('primary');
+$node->init(extra => ['-k'], allows_streaming => 1);
+$node->start;
+
+# Sanity checks for command line options.
+$node->command_fails(
+	[ 'pg_waldump', '--save-fpi' ],
+	'--save-fpi fails without path');
+
+# generate data/wal to examine that will have FPIs in them
+$node->safe_psql('postgres', "SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+");
+
+($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()), current_setting('block_size')");
+
+# get the relation node, etc for the new table
+my $relation = $node->safe_psql('postgres',
+	q{SELECT format(
+        '%s/%s/%s',
+        CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END,
+        pg_database.oid,
+        pg_relation_filenode(pg_class.oid))
+    FROM pg_class, pg_database
+    WHERE relname = 'test_table' AND
+        datname = current_database()}
+);
+
+my $walfile = $node->data_dir . '/pg_wal/' . $walfile_name;
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+
+ok(-f $walfile, "Got a WAL file");
+
+$node->command_ok(['pg_waldump', '--save-fpi', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+
+# This regexp will match filenames of the format XXXXXXXX-XXXXXXXX.dd.dd.dd.dd_fork
+# with the components being:
+# - WAL LSN in hex format,
+# - Decimal database oid
+# - Decimal tablespace oid or 0 for global
+# - Decimal relfilenumber
+# - Decimal block number
+# - fork this block came from (one of vm, init, fsm, or main)
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my $file_count = 0;
+
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+	$file_count++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk) = get_block_info($fullpath);
+
+	# verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+}
+
+# validate that we ended up with some FPIs saved
+ok($file_count > 0, 'verify we processed some files');
+
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#52Michael Paquier
michael@paquier.xyz
In reply to: David Christensen (#51)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote:

On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier <michael@paquier.xyz> wrote:
This v10 should incorporate your feedback as well as Bharath's.

Thanks for the new version. I have minor comments.

It seems to me that you could allow things to work even if the
directory exists and is empty. See for example
verify_dir_is_empty_or_create() in pg_basebackup.c.

The `pg_mkdir_p()` supports an existing directory (and I don't think
we want to require it to be empty first), so this only errors when it
can't create a directory for some reason.

Sure, but things can also be made so as we don't fail if the directory
exists and is empty? This would be more consistent with the base
directories created by pg_basebackup and initdb.

+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
Using an EOF to execute a multi-line query would be a first.  Couldn't
you use the same thing as anywhere else?  009_twophase.pl just to
mention one.  (Mentioned by Bharath upthread, where he asked for an
extra opinion so here it is.)

Fair enough, while idiomatic perl to me, not a hill to die on;
converted to a standard multiline string.

By the way, knowing that we have an option called --fullpage, could be
be better to use --save-fullpage for the option name?

+       OPF = fopen(filename, PG_BINARY_W);
+       if (!OPF)
+           pg_fatal("couldn't open file for output: %s", filename);
[..]
+       if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+           pg_fatal("couldn't write out complete full page image to file: %s", filename);
These should more more generic, as of "could not open file \"%s\"" and
"could not write file \"%s\"" as the file name provides all the
information about what this writes.
--
Michael
#53Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#51)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Dec 16, 2022 at 4:47 AM David Christensen
<david.christensen@crunchydata.com> wrote:

On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:

I can get one sent in tomorrow.

This v10 should incorporate your feedback as well as Bharath's.

Thanks for the patch. Here're some minor comments:

1. +my $node = PostgreSQL::Test::Cluster->new('primary');
Can the name be other than 'primary' because we don't create a standby
for this test? Something like - 'node_a' or 'node_extract_fpi' or some
other.

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When enabled with allows_streaming, there are a bunch of things that
happen to the node while initializing, I don't think we need all of
them for this.

3. +$node->init(extra => ['-k'], allows_streaming => 1);
Can we use --data-checksums instead of -k for more readability?
Perhaps, a comment on why we need that option helps greatly.

4.
+        page = (Page) buf.data;
+
+        if (!XLogRecHasBlockRef(record, block_id))
+            continue;
+
+        if (!XLogRecHasBlockImage(record, block_id))
+            continue;
+
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
Can you shift  page = (Page) buf.data; just before the last if
condition RestoreBlockImage() so that it doesn't get executed for the
other two continue statements?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#54David Christensen
david.christensen@crunchydata.com
In reply to: Bharath Rupireddy (#53)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Dec 16, 2022 at 4:47 AM David Christensen
<david.christensen@crunchydata.com> wrote:

On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:

I can get one sent in tomorrow.

This v10 should incorporate your feedback as well as Bharath's.

Thanks for the patch. Here're some minor comments:

1. +my $node = PostgreSQL::Test::Cluster->new('primary');
Can the name be other than 'primary' because we don't create a standby
for this test? Something like - 'node_a' or 'node_extract_fpi' or some
other.

Sure, no issues.

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When enabled with allows_streaming, there are a bunch of things that
happen to the node while initializing, I don't think we need all of
them for this.

I think the "allows_streaming" was required to ensure the WAL files
were preserved properly, and was the approach we ended up taking
rather than trying to fail the archive_command or other approaches I'd
taken earlier. I'd rather keep this if we can, unless you can propose
a different approach that would continue to work in the same way.

3. +$node->init(extra => ['-k'], allows_streaming => 1);
Can we use --data-checksums instead of -k for more readability?
Perhaps, a comment on why we need that option helps greatly.

Yeah, can spell out; don't recall exactly why we needed it offhand,
but will confirm or remove if insignificant.

4.
+        page = (Page) buf.data;
+
+        if (!XLogRecHasBlockRef(record, block_id))
+            continue;
+
+        if (!XLogRecHasBlockImage(record, block_id))
+            continue;
+
+        if (!RestoreBlockImage(record, block_id, page))
+            continue;
Can you shift  page = (Page) buf.data; just before the last if
condition RestoreBlockImage() so that it doesn't get executed for the
other two continue statements?

Sure; since it was just setting a pointer value I didn't consider it
to be a hotspot for optimization.

Best,

David

#55David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#52)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Dec 19, 2022 at 12:23 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote:

On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier <michael@paquier.xyz> wrote:
This v10 should incorporate your feedback as well as Bharath's.

Thanks for the new version. I have minor comments.

It seems to me that you could allow things to work even if the
directory exists and is empty. See for example
verify_dir_is_empty_or_create() in pg_basebackup.c.

The `pg_mkdir_p()` supports an existing directory (and I don't think
we want to require it to be empty first), so this only errors when it
can't create a directory for some reason.

Sure, but things can also be made so as we don't fail if the directory
exists and is empty? This would be more consistent with the base
directories created by pg_basebackup and initdb.

I guess I'm feeling a little dense here; how is this failing if there
is an existing empty directory?

+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
Using an EOF to execute a multi-line query would be a first.  Couldn't
you use the same thing as anywhere else?  009_twophase.pl just to
mention one.  (Mentioned by Bharath upthread, where he asked for an
extra opinion so here it is.)

Fair enough, while idiomatic perl to me, not a hill to die on;
converted to a standard multiline string.

By the way, knowing that we have an option called --fullpage, could be
be better to use --save-fullpage for the option name?

Works for me. I think I'd just wanted to avoid reformatting the
entire usage message which is why I'd gone with the shorter version.

+       OPF = fopen(filename, PG_BINARY_W);
+       if (!OPF)
+           pg_fatal("couldn't open file for output: %s", filename);
[..]
+       if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+           pg_fatal("couldn't write out complete full page image to file: %s", filename);
These should more more generic, as of "could not open file \"%s\"" and
"could not write file \"%s\"" as the file name provides all the
information about what this writes.

Sure, will update.

Best,

David

#56David Christensen
david.christensen@crunchydata.com
In reply to: David Christensen (#54)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Fri, Dec 23, 2022 at 12:57 PM David Christensen
<david.christensen@crunchydata.com> wrote:

On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

[snip]

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When enabled with allows_streaming, there are a bunch of things that
happen to the node while initializing, I don't think we need all of
them for this.

I think the "allows_streaming" was required to ensure the WAL files
were preserved properly, and was the approach we ended up taking
rather than trying to fail the archive_command or other approaches I'd
taken earlier. I'd rather keep this if we can, unless you can propose
a different approach that would continue to work in the same way.

Confirmed that we needed this in order to create the replication slot,
so this /is/ required for the test to work.

Enclosing v11 with yours and Michael's latest feedback.

Best,

David

Attachments:

v11-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchapplication/octet-stream; name=v11-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchDownload
From 573fa21454d032faef41b69c91d338fffbeda43c Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Fri, 23 Dec 2022 13:24:58 -0600
Subject: [PATCH v11] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0_main
00000000-01000150.1664.0.6115.0_main
00000000-010001E0.1664.0.6114.0_main
00000000-01000270.1663.1.6116.0_main
00000000-01000300.1663.1.6113.0_main
00000000-01000390.1663.1.6112.0_main
00000000-01000420.1663.1.8903.0_main
00000000-010004B0.1663.1.8902.0_main
00000000-01000540.1663.1.6111.0_main
00000000-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml          | 69 ++++++++++++++++
 src/bin/pg_waldump/meson.build            |  1 +
 src/bin/pg_waldump/pg_waldump.c           | 87 +++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 99 +++++++++++++++++++++++
 4 files changed, 255 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..090205e122 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,75 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fullpage=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             The name of the fork the full page image came from.  One
+             of <literal>_main</literal>, <literal>_fsm</literal>,
+             <literal>_vm</literal>, or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 3fa1b53e71..0428998350 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -31,6 +31,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..81aeef3b97 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fpi_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +446,61 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPIs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPIs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock buf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		page = (Page) buf.data;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("could not open file \"%s\"", filename);
+
+		if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+			pg_fatal("could not write file \"%s\"", filename);
+
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +741,8 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fullpage=path    save full page images to given path as\n"
+			 "                              LSN.T.D.R.B_F\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +776,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fullpage", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +837,7 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpi_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +848,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +984,9 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpi_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1041,17 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpi_path != NULL)
+	{
+		/* Create the dir if it doesn't exist */
+		if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpi_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1230,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpi_path != NULL)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPIs(xlogreader_state, config.save_fpi_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..35bb12e067
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,99 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my ($blocksize, $walfile_name);
+
+# routine to extract the LSN from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo) = unpack('VV', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo);
+}
+
+my $node =  PostgreSQL::Test::Cluster->new('node_a');
+$node->init(allows_streaming => 1);
+$node->start;
+
+# Sanity checks for command line options.
+$node->command_fails(
+	[ 'pg_waldump', '--save-fullpage' ],
+	'--save-fullpage fails without path');
+
+# generate data/wal to examine that will have FPIs in them
+$node->safe_psql('postgres', "SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+");
+
+($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()), current_setting('block_size')");
+
+# get the relation node, etc for the new table
+my $relation = $node->safe_psql('postgres',
+	q{SELECT format(
+        '%s/%s/%s',
+        CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END,
+        pg_database.oid,
+        pg_relation_filenode(pg_class.oid))
+    FROM pg_class, pg_database
+    WHERE relname = 'test_table' AND
+        datname = current_database()}
+);
+
+my $walfile = $node->data_dir . '/pg_wal/' . $walfile_name;
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+
+ok(-f $walfile, "Got a WAL file");
+
+$node->command_ok(['pg_waldump', '--save-fullpage', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+
+# This regexp will match filenames of the format XXXXXXXX-XXXXXXXX.dd.dd.dd.dd_fork
+# with the components being:
+# - WAL LSN in hex format,
+# - Decimal database oid
+# - Decimal tablespace oid or 0 for global
+# - Decimal relfilenumber
+# - Decimal block number
+# - fork this block came from (one of vm, init, fsm, or main)
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my $file_count = 0;
+
+# verify filename formats matches w/--save-fullpage
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+	$file_count++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk) = get_block_info($fullpath);
+
+	# verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+}
+
+# validate that we ended up with some FPIs saved
+ok($file_count > 0, 'verify we processed some files');
+
+done_testing();
-- 
2.37.1 (Apple Git-137.1)

#57Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Christensen (#56)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Dec 24, 2022 at 12:58 AM David Christensen
<david.christensen@crunchydata.com> wrote:

On Fri, Dec 23, 2022 at 12:57 PM David Christensen
<david.christensen@crunchydata.com> wrote:

On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When enabled with allows_streaming, there are a bunch of things that
happen to the node while initializing, I don't think we need all of
them for this.

I think the "allows_streaming" was required to ensure the WAL files
were preserved properly, and was the approach we ended up taking
rather than trying to fail the archive_command or other approaches I'd
taken earlier. I'd rather keep this if we can, unless you can propose
a different approach that would continue to work in the same way.

Confirmed that we needed this in order to create the replication slot,
so this /is/ required for the test to work.

The added test needs wal_level to be replica, but the TAP tests set it
to minimal if allows_streaming isn't passed. However, if passed
allows_streaming, it sets a bunch of other parameters which are not
required for this test (see note->init function in cluster.pm), hence
we could just set the required parameters wal_level = replica and
max_wal_senders for the replication slot to be created.

Enclosing v11 with yours and Michael's latest feedback.

Thanks for the patch. I've made the above change as well as renamed
the test file name to be save_fpi.pl, everything else remains the same
as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchapplication/octet-stream; name=v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchDownload
From cee0c0017342fc8847e4b099358245fac6046a98 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sat, 24 Dec 2022 12:39:12 +0000
Subject: [PATCH v12] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0_main
00000000-01000150.1664.0.6115.0_main
00000000-010001E0.1664.0.6114.0_main
00000000-01000270.1663.1.6116.0_main
00000000-01000300.1663.1.6113.0_main
00000000-01000390.1663.1.6112.0_main
00000000-01000420.1663.1.8903.0_main
00000000-010004B0.1663.1.8902.0_main
00000000-01000540.1663.1.6111.0_main
00000000-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml     |  69 ++++++++++++++++++
 src/bin/pg_waldump/meson.build       |   1 +
 src/bin/pg_waldump/pg_waldump.c      |  87 +++++++++++++++++++++-
 src/bin/pg_waldump/t/002_save_fpi.pl | 104 +++++++++++++++++++++++++++
 4 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fpi.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..090205e122 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,75 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>-W <replaceable>save_path</replaceable></option></term>
+       <term><option>--save-fullpage=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images seen in the WAL stream to the
+        <replaceable>save_path</replaceable> directory, which will be created
+        if it does not exist.  The images saved will be subject to the same
+        filtering and limiting criteria as display records, but in this
+        mode <application>pg_waldump</application> will not output any other
+        information.
+       </para>
+       <para>
+        The page images will be saved with the file
+        format: <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The dot-separated components are (in order):
+
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The LSN of the record with this block, formatted
+             as two 8-character hexadecimal numbers <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID for the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>relnode id for the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>the block number of this block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             The name of the fork the full page image came from.  One
+             of <literal>_main</literal>, <literal>_fsm</literal>,
+             <literal>_vm</literal>, or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 3fa1b53e71..98c6de1a80 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -31,6 +31,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fpi.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..81aeef3b97 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fpi_path;
 } XLogDumpConfig;
 
 
@@ -439,6 +446,61 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPIs stored in the given WAL record
+ */
+static void
+XLogRecordSaveFPIs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock buf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + \0 */
+		FILE	   *OPF;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		page = (Page) buf.data;
+
+		if (!RestoreBlockImage(record, block_id, page))
+			continue;
+
+		/* we have our extracted FPI, let's save it now */
+
+		XLogRecGetBlockTagExtended(record, block_id,
+								   &rnode, &fork, &blk, NULL);
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("found invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		OPF = fopen(filename, PG_BINARY_W);
+		if (!OPF)
+			pg_fatal("could not open file \"%s\"", filename);
+
+		if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+			pg_fatal("could not write file \"%s\"", filename);
+
+		fclose(OPF);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +741,8 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("  -W, --save-fullpage=path    save full page images to given path as\n"
+			 "                              LSN.T.D.R.B_F\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -712,6 +776,7 @@ main(int argc, char **argv)
 		{"limit", required_argument, NULL, 'n'},
 		{"path", required_argument, NULL, 'p'},
 		{"quiet", no_argument, NULL, 'q'},
+		{"save-fullpage", required_argument, NULL, 'W'},
 		{"relation", required_argument, NULL, 'R'},
 		{"rmgr", required_argument, NULL, 'r'},
 		{"start", required_argument, NULL, 's'},
@@ -772,6 +837,7 @@ main(int argc, char **argv)
 	config.filter_by_fpw = false;
 	config.stats = false;
 	config.stats_per_record = false;
+	config.save_fpi_path = NULL;
 
 	stats.startptr = InvalidXLogRecPtr;
 	stats.endptr = InvalidXLogRecPtr;
@@ -782,7 +848,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wx:z",
+	while ((option = getopt_long(argc, argv, "bB:e:fF:n:p:qr:R:s:t:wW:x:z",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -918,6 +984,9 @@ main(int argc, char **argv)
 			case 'w':
 				config.filter_by_fpw = true;
 				break;
+			case 'W':
+				config.save_fpi_path = pg_strdup(optarg);
+				break;
 			case 'x':
 				if (sscanf(optarg, "%u", &config.filter_by_xid) != 1)
 				{
@@ -972,6 +1041,17 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fpi_path != NULL)
+	{
+		/* Create the dir if it doesn't exist */
+		if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+		{
+			pg_log_error("could not create output directory \"%s\": %m",
+						 config.save_fpi_path);
+			goto bad_argument;
+		}
+	}
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1150,6 +1230,11 @@ main(int argc, char **argv)
 				XLogRecStoreStats(&stats, xlogreader_state);
 				stats.endptr = xlogreader_state->EndRecPtr;
 			}
+			else if (config.save_fpi_path != NULL)
+			{
+				if (XLogRecordHasFPW(xlogreader_state))
+					XLogRecordSaveFPIs(xlogreader_state, config.save_fpi_path);
+			}
 			else
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
diff --git a/src/bin/pg_waldump/t/002_save_fpi.pl b/src/bin/pg_waldump/t/002_save_fpi.pl
new file mode 100644
index 0000000000..244692320f
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fpi.pl
@@ -0,0 +1,104 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my ($blocksize, $walfile_name);
+
+# Function to extract the LSN from the given block structure
+sub get_block_info
+{
+	my $path = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "couldn't read full block\n" if $blocksize != read $fh, $block, $blocksize;
+	my ($lsn_hi, $lsn_lo) = unpack('VV', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo);
+}
+
+my $node =  PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+	'postgresql.conf', q{
+wal_level = 'replica'
+max_wal_senders = 4
+});
+$node->start;
+
+# Sanity checks for command line options.
+$node->command_fails(
+	[ 'pg_waldump', '--save-fullpage' ],
+	'--save-fullpage fails without path');
+
+# Generate data/WAL to examine that will have FPIs in them
+$node->safe_psql('postgres', "SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+");
+
+($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()), current_setting('block_size')");
+
+# Get the relation node, etc for the new table
+my $relation = $node->safe_psql('postgres',
+	q{SELECT format(
+        '%s/%s/%s',
+        CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END,
+        pg_database.oid,
+        pg_relation_filenode(pg_class.oid))
+    FROM pg_class, pg_database
+    WHERE relname = 'test_table' AND
+        datname = current_database()}
+);
+
+my $walfile = $node->data_dir . '/pg_wal/' . $walfile_name;
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+
+ok(-f $walfile, "Got a WAL file");
+
+$node->command_ok(['pg_waldump', '--save-fullpage', "$tmp_folder/raw", '--relation', $relation, $walfile]);
+
+# This regexp will match filenames of the format XXXXXXXX-XXXXXXXX.dd.dd.dd.dd_fork
+# with the components being:
+# - WAL LSN in hex format,
+# - Decimal database oid
+# - Decimal tablespace oid or 0 for global
+# - Decimal relfilenumber
+# - Decimal block number
+# - fork this block came from (one of vm, init, fsm, or main)
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my $file_count = 0;
+
+# Verify filename formats matches w/--save-fullpage
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+	$file_count++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk) = get_block_info($fullpath);
+
+	# Verify the lsn in the block comes before the file's lsn
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'verify file-based LSN precedes block-based one');
+}
+
+# Validate that we ended up with some FPIs saved
+ok($file_count > 0, 'verify we processed some files');
+
+done_testing();
-- 
2.34.1

#58Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#57)
1 attachment(s)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote:

Thanks for the patch. I've made the above change as well as renamed
the test file name to be save_fpi.pl, everything else remains the same
as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

I have done a review of that, and here are my notes:
- The variable names were a bit inconsistent, so I have switched most
of the new code to use "fullpage".
- The code was not able to handle the case of a target directory
existing but empty, so I have added a wrapper on pg_check_dir().
- XLogRecordHasFPW() could be checked directly in the function saving
the blocks. Still, there is no need for it as we apply the same
checks again in the inner loop of the routine.
- The new test has been renamed.
- RestoreBlockImage() would report a failure and the code would just
skip it and continue its work. This could point out to a compression
failure for example, so like any code paths calling this routine I
think that we'd better do a pg_fatal() and fail hard.
- I did not understand why there is a reason to make this option
conditional on the record prints or even the stats, so I have moved
the FPW save routine into a separate code path. The other two could
be silenced (or not) using --quiet for example, for the same result as
v12 without impacting the usability of this feature.
- Few tweaks to the docs, the --help output, the comments and the
tests.
- Indentation applied.

Being able to filter the blocks saved using start/end LSNs or just
--relation is really cool, especially as the file names use the same
order as what's needed for this option.

Comments?
--
Michael

Attachments:

v13-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patchtext/x-diff; charset=us-asciiDownload
From 66eedbb96858a4f6804509900b116d8a63b39925 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 26 Dec 2022 16:28:00 +0900
Subject: [PATCH v13] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk>_<fork> with
formatting to make things somewhat sortable; for instance:

00000000-010000C0.1663.1.6117.0_main
00000000-01000150.1664.0.6115.0_main
00000000-010001E0.1664.0.6114.0_main
00000000-01000270.1663.1.6116.0_main
00000000-01000300.1663.1.6113.0_main
00000000-01000390.1663.1.6112.0_main
00000000-01000420.1663.1.8903.0_main
00000000-010004B0.1663.1.8902.0_main
00000000-01000540.1663.1.6111.0_main
00000000-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 src/bin/pg_waldump/meson.build            |   1 +
 src/bin/pg_waldump/pg_waldump.c           | 107 +++++++++++++++++++++
 src/bin/pg_waldump/t/002_save_fullpage.pl | 111 ++++++++++++++++++++++
 doc/src/sgml/ref/pg_waldump.sgml          |  66 +++++++++++++
 4 files changed, 285 insertions(+)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 3fa1b53e71..0428998350 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -31,6 +31,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_basic.pl',
+      't/002_save_fullpage.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..d90a142c68 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fullpage_path;
 } XLogDumpConfig;
 
 
@@ -112,6 +119,37 @@ verify_directory(const char *directory)
 	return true;
 }
 
+/*
+ * Create if necessary the directory storing the full-page images extracted
+ * from the WAL records read.
+ */
+static void
+create_fullpage_directory(char *path)
+{
+	int			ret;
+
+	switch ((ret = pg_check_dir(path)))
+	{
+		case 0:
+			/* Does not exist, so create it */
+			if (pg_mkdir_p(path, pg_dir_create_mode) < 0)
+				pg_fatal("could not create directory \"%s\": %m", path);
+			break;
+		case 1:
+			/* Present and empty, so do nothing */
+			break;
+		case 2:
+		case 3:
+		case 4:
+			/* Exists and not empty */
+			pg_fatal("directory \"%s\" exists but is not empty", path);
+			break;
+		default:
+			/* Trouble accessing directory */
+			pg_fatal("could not access directory \"%s\": %m", path);
+	}
+}
+
 /*
  * Split a pathname as dirname(1) and basename(1) would.
  *
@@ -439,6 +477,61 @@ XLogRecordHasFPW(XLogReaderState *record)
 	return false;
 }
 
+/*
+ * Function to externally save all FPWs stored in the given WAL record.
+ * Decompression is applied to all the blocks saved, if necessary.
+ */
+static void
+XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath)
+{
+	int			block_id;
+
+	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+	{
+		PGAlignedBlock buf;
+		Page		page;
+		char		filename[MAXPGPATH];
+		char		forkname[FORKNAMECHARS + 2];	/* _ + terminating zero */
+		FILE	   *file;
+		BlockNumber blk;
+		RelFileLocator rnode;
+		ForkNumber	fork;
+
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		if (!XLogRecHasBlockImage(record, block_id))
+			continue;
+
+		page = (Page) buf.data;
+
+		/* Full page exists, so let's save it */
+		if (!RestoreBlockImage(record, block_id, page))
+			pg_fatal("%s", record->errormsg_buf);
+
+		(void) XLogRecGetBlockTagExtended(record, block_id,
+										  &rnode, &fork, &blk, NULL);
+
+		if (fork >= 0 && fork <= MAX_FORKNUM)
+			sprintf(forkname, "_%s", forkNames[fork]);
+		else
+			pg_fatal("invalid fork number: %u", fork);
+
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+				 LSN_FORMAT_ARGS(record->ReadRecPtr),
+				 rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
+
+		file = fopen(filename, PG_BINARY_W);
+		if (!file)
+			pg_fatal("could not open file \"%s\": %m", filename);
+
+		if (fwrite(page, BLCKSZ, 1, file) != 1)
+			pg_fatal("could not write file \"%s\": %m", filename);
+
+		fclose(file);
+	}
+}
+
 /*
  * Print a record to stdout
  */
@@ -679,6 +772,8 @@ usage(void)
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records with a full page write\n"));
+	printf(_("      --save-fullpage=PATH\n"
+			 "                         save full page images\n"));
 	printf(_("  -x, --xid=XID          only show records with transaction ID XID\n"));
 	printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 			 "                         (optionally, show per-record statistics)\n"));
@@ -719,6 +814,7 @@ main(int argc, char **argv)
 		{"xid", required_argument, NULL, 'x'},
 		{"version", no_argument, NULL, 'V'},
 		{"stats", optional_argument, NULL, 'z'},
+		{"save-fullpage", required_argument, NULL, 1},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -770,6 +866,7 @@ main(int argc, char **argv)
 	config.filter_by_relation_block_enabled = false;
 	config.filter_by_relation_forknum = InvalidForkNumber;
 	config.filter_by_fpw = false;
+	config.save_fullpage_path = NULL;
 	config.stats = false;
 	config.stats_per_record = false;
 
@@ -942,6 +1039,9 @@ main(int argc, char **argv)
 					}
 				}
 				break;
+			case 1:
+				config.save_fullpage_path = pg_strdup(optarg);
+				break;
 			default:
 				goto bad_argument;
 		}
@@ -972,6 +1072,9 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (config.save_fullpage_path != NULL)
+		create_fullpage_directory(config.save_fullpage_path);
+
 	/* parse files as start/end boundaries, extract path if not specified */
 	if (optind < argc)
 	{
@@ -1154,6 +1257,10 @@ main(int argc, char **argv)
 				XLogDumpDisplayRecord(&config, xlogreader_state);
 		}
 
+		/* save full pages if requested */
+		if (config.save_fullpage_path != NULL)
+			XLogRecordSaveFPWs(xlogreader_state, config.save_fullpage_path);
+
 		/* check whether we printed enough */
 		config.already_displayed_records++;
 		if (config.stop_after_records > 0 &&
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
new file mode 100644
index 0000000000..eac3c25a4e
--- /dev/null
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -0,0 +1,111 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use File::Basename;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my ($blocksize, $walfile_name);
+
+# Function to extract the LSN from the given block structure
+sub get_block_info
+{
+	my $path      = shift;
+	my $blocksize = shift;
+	my $block;
+
+	open my $fh, '<', $path or die "couldn't open file: $path\n";
+	die "could not read block\n"
+	  if $blocksize != read($fh, $block, $blocksize);
+	my ($lsn_hi, $lsn_lo) = unpack('VV', $block);
+
+	$lsn_hi = sprintf('%08X', $lsn_hi);
+	$lsn_lo = sprintf('%08X', $lsn_lo);
+
+	return ($lsn_hi, $lsn_lo);
+}
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+	'postgresql.conf', q{
+wal_level = 'replica'
+max_wal_senders = 4
+});
+$node->start;
+
+# Generate data/WAL to examine that will have full pages in them.
+$node->safe_psql(
+	'postgres',
+	"SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+-- Force FPWs on the next writes.
+CHECKPOINT;
+UPDATE test_table SET a = a + 1;
+");
+
+($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',
+	"SELECT pg_walfile_name(pg_switch_wal()), current_setting('block_size')");
+
+# Get the relation node, etc for the new table
+my $relation = $node->safe_psql(
+	'postgres',
+	q{SELECT format(
+        '%s/%s/%s',
+        CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END,
+        pg_database.oid,
+        pg_relation_filenode(pg_class.oid))
+    FROM pg_class, pg_database
+    WHERE relname = 'test_table' AND
+        datname = current_database()}
+);
+
+my $walfile    = $node->data_dir . '/pg_wal/' . $walfile_name;
+my $tmp_folder = PostgreSQL::Test::Utils::tempdir;
+
+ok(-f $walfile, "Got a WAL file");
+
+$node->command_ok(
+	[
+		'pg_waldump',      '--quiet',
+		'--save-fullpage', "$tmp_folder/raw",
+		'--relation',      $relation,
+		$walfile
+	]);
+
+# This regexp will match filenames formatted as:
+# XXXXXXXX-XXXXXXXX.dd.dd.dd.dd_fork with the components being:
+# - WAL LSN in hex format,
+# - Decimal database OID.
+# - Decimal tablespace OID or 0 for global.
+# - Decimal relfilenode.
+# - Decimal block number.
+# - Fork this block came from (vm, init, fsm, or main)
+my $file_re =
+  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+
+my $file_count = 0;
+
+# Verify filename formats matches --save-fullpage.
+for my $fullpath (glob "$tmp_folder/raw/*")
+{
+	my $file = File::Basename::basename($fullpath);
+
+	like($file, $file_re, "verify filename format for file $file");
+	$file_count++;
+
+	my ($hi_lsn_fn, $lo_lsn_fn) = ($file =~ $file_re);
+	my ($hi_lsn_bk, $lo_lsn_bk) = get_block_info($fullpath, $blocksize);
+
+	# The LSN on the block comes before the file's LSN.
+	ok( $hi_lsn_fn . $lo_lsn_fn gt $hi_lsn_bk . $lo_lsn_bk,
+		'LSN stored in the file precedes the one stored in the block');
+}
+
+ok($file_count > 0, 'verify that at least one block has been saved');
+
+done_testing();
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5..343f0482a9 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,72 @@ PostgreSQL documentation
        </listitem>
      </varlistentry>
 
+     <varlistentry>
+       <term><option>--save-fullpage=<replaceable>save_path</replaceable></option></term>
+       <listitem>
+       <para>
+        Save full page images found in the WAL records to the
+        <replaceable>save_path</replaceable> directory. The images saved
+        are subject to the same filtering and limiting criteria as the
+        records displayed.
+       </para>
+       <para>
+        The full page images are saved with the following file name format:
+        <literal><replaceable>LSN</replaceable>.<replaceable>RELTABLESPACE</replaceable>.<replaceable>DATOID</replaceable>.<replaceable>RELNODE</replaceable>.<replaceable>BLKNO</replaceable><replaceable>FORK</replaceable></literal>
+
+        The file names are composed of the following parts:
+        <informaltable>
+         <tgroup cols="2">
+          <thead>
+           <row>
+            <entry>Component</entry>
+            <entry>Description</entry>
+           </row>
+          </thead>
+
+          <tbody>
+           <row>
+            <entry>LSN</entry>
+            <entry>The <acronym>LSN</acronym> of the record with this image,
+             formatted as two 8-character hexadecimal numbers
+             <literal>%08X-%08X</literal></entry>
+           </row>
+
+           <row>
+            <entry>RELTABLESPACE</entry>
+            <entry>tablespace OID of the block</entry>
+           </row>
+
+           <row>
+            <entry>DATOID</entry>
+            <entry>database OID of the block</entry>
+           </row>
+
+           <row>
+            <entry>RELNODE</entry>
+            <entry>filenode of the block</entry>
+           </row>
+
+           <row>
+            <entry>BLKNO</entry>
+            <entry>block number of the block</entry>
+           </row>
+
+           <row>
+            <entry>FORK</entry>
+            <entry>
+             The name of the fork the full page image came from, as of
+             <literal>_main</literal>, <literal>_fsm</literal>,
+             <literal>_vm</literal>, or <literal>_init</literal>.
+            </entry>
+           </row>
+          </tbody>
+         </tgroup>
+        </informaltable>
+       </para>
+       </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-x <replaceable>xid</replaceable></option></term>
       <term><option>--xid=<replaceable>xid</replaceable></option></term>
-- 
2.39.0

#59Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#58)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier <michael@paquier.xyz> wrote:

I have done a review of that, and here are my notes:
- The variable names were a bit inconsistent, so I have switched most
of the new code to use "fullpage".

- The new test has been renamed.

- RestoreBlockImage() would report a failure and the code would just
skip it and continue its work. This could point out to a compression
failure for example, so like any code paths calling this routine I
think that we'd better do a pg_fatal() and fail hard.

- XLogRecordHasFPW() could be checked directly in the function saving
the blocks. Still, there is no need for it as we apply the same
checks again in the inner loop of the routine.

- Few tweaks to the docs, the --help output, the comments and the
tests.
- Indentation applied.

- I did not understand why there is a reason to make this option
conditional on the record prints or even the stats, so I have moved
the FPW save routine into a separate code path. The other two could
be silenced (or not) using --quiet for example, for the same result as
v12 without impacting the usability of this feature.

Looks good.

- The code was not able to handle the case of a target directory
existing but empty, so I have added a wrapper on pg_check_dir().

Looks okay and with the following, we impose the user-provided target
directory must be empty.

+        case 4:
+            /* Exists and not empty */
+            pg_fatal("directory \"%s\" exists but is not empty", path);

Being able to filter the blocks saved using start/end LSNs or just
--relation is really cool, especially as the file names use the same
order as what's needed for this option.

Comments?

+1. I think this feature will also be useful in pg_walinspect.
However, I'm a bit concerned that it can flood the running database
disk - if someone generates a lot of FPI files.

Overall, the v13 patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#60David Christensen
david.christensen@crunchydata.com
In reply to: Michael Paquier (#58)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Dec 26, 2022, at 1:29 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote:

Thanks for the patch. I've made the above change as well as renamed
the test file name to be save_fpi.pl, everything else remains the same
as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

I have done a review of that, and here are my notes:
- The variable names were a bit inconsistent, so I have switched most
of the new code to use "fullpage".
- The code was not able to handle the case of a target directory
existing but empty, so I have added a wrapper on pg_check_dir().
- XLogRecordHasFPW() could be checked directly in the function saving
the blocks. Still, there is no need for it as we apply the same
checks again in the inner loop of the routine.
- The new test has been renamed.
- RestoreBlockImage() would report a failure and the code would just
skip it and continue its work. This could point out to a compression
failure for example, so like any code paths calling this routine I
think that we'd better do a pg_fatal() and fail hard.
- I did not understand why there is a reason to make this option
conditional on the record prints or even the stats, so I have moved
the FPW save routine into a separate code path. The other two could
be silenced (or not) using --quiet for example, for the same result as
v12 without impacting the usability of this feature.
- Few tweaks to the docs, the --help output, the comments and the
tests.
- Indentation applied.

Being able to filter the blocks saved using start/end LSNs or just
--relation is really cool, especially as the file names use the same
order as what's needed for this option.

Sounds good, definitely along the ideas of what I’d originally envisioned.

Thanks,

David

#61Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#58)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Dec 26, 2022 at 04:28:52PM +0900, Michael Paquier wrote:

Comments?

+		file = fopen(filename, PG_BINARY_W);
+		if (!file)
+			pg_fatal("could not open file \"%s\": %m", filename);
+
+		if (fwrite(page, BLCKSZ, 1, file) != 1)
+			pg_fatal("could not write file \"%s\": %m", filename);
+
+		fclose(file);

fclose() should be tested, too:

+		if (fwrite(page, BLCKSZ, 1, file) != 1 || fclose(file) != 0)
+			pg_fatal("could not write file \"%s\": %m", filename);

--
Justin

#62Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#61)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Dec 26, 2022 at 02:39:03PM -0600, Justin Pryzby wrote:

fclose() should be tested, too:

Sure. Done that too, and applied the change after a last lookup.
--
Michael

#63Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#59)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Dec 26, 2022 at 04:18:18PM +0530, Bharath Rupireddy wrote:

+1. I think this feature will also be useful in pg_walinspect.
However, I'm a bit concerned that it can flood the running database
disk - if someone generates a lot of FPI files.

pg_read_file() and pg_waldump can be fed absolute paths outside the
data folder. So, well, just don't do that then :)
--
Michael

#64Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#59)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

On Mon, Dec 26, 2022 at 4:18 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier <michael@paquier.xyz> wrote:

+1. I think this feature will also be useful in pg_walinspect.

Just for the record - here's the pg_walinspect function to extract
FPIs from WAL records -
/messages/by-id/CALj2ACVCcvzd7WiWvD=6_7NBvVB_r6G0EGSxL4F8vosAi6Se4g@mail.gmail.com.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com