Add a new pg_walinspect function to extract FPIs from WAL records
Hi,
Here's a patch that implements the idea of extracting full page images
from WAL records [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8 [2]/messages/by-id/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com with a function in pg_walinspect. This new
function accepts start and end lsn and returns full page image info
such as WAL record lsn, tablespace oid, database oid, relfile number,
block number, fork name and the raw full page (as bytea). I'll
register this in the next commitfest.
Thoughts?
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
[2]: /messages/by-id/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Add-FPI-extract-function-to-pg_walinspect.patchapplication/octet-stream; name=v1-0001-Add-FPI-extract-function-to-pg_walinspect.patchDownload
From f80d957e5f8b95c434cb18f4ab9c0b7d31ac606a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 27 Dec 2022 11:16:49 +0000
Subject: [PATCH v1] Add FPI extract function to pg_walinspect
---
contrib/pg_walinspect/Makefile | 2 +-
.../pg_walinspect/expected/pg_walinspect.out | 46 ++++++++
contrib/pg_walinspect/meson.build | 1 +
.../pg_walinspect/pg_walinspect--1.0--1.1.sql | 24 ++++
contrib/pg_walinspect/pg_walinspect.c | 109 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 2 +-
contrib/pg_walinspect/sql/pg_walinspect.sql | 32 +++++
doc/src/sgml/pgwalinspect.sgml | 48 ++++++++
8 files changed, 262 insertions(+), 2 deletions(-)
create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7033878a79 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,7 +7,7 @@ OBJS = \
PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
REGRESS = pg_walinspect
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..5cd28402ad 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -73,6 +73,27 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
t
(1 row)
+-- ===================================================================
+-- Tests for getting full page images
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+-- Force FPWs on the next writes.
+CHECKPOINT;
+UPDATE sample_tbl SET col1 = col1 + 1;
+SELECT 1 FROM pg_switch_wal();
+ ?column?
+----------
+ 1
+(1 row)
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+ ok
+----
+ t
+(1 row)
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -98,6 +119,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
f
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege
+------------------------
+ f
+(1 row)
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
@@ -121,6 +149,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
@@ -129,6 +164,8 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
TO regress_pg_walinspect;
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
has_function_privilege
@@ -150,12 +187,21 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index 7917e03557..6ab97d3672 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -19,6 +19,7 @@ contrib_targets += pg_walinspect
install_data(
'pg_walinspect.control',
'pg_walinspect--1.0.sql',
+ 'pg_walinspect--1.0--1.1.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
new file mode 100644
index 0000000000..4ffbd91728
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
@@ -0,0 +1,24 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- pg_get_wal_fpi_info()
+--
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT tablespace_oid oid,
+ OUT database_oid oid,
+ OUT relfile_number oid,
+ OUT block_number int8,
+ OUT fork_name text,
+ OUT fpi bytea
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_fpi_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index beba4788c7..02fee9e2ab 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -35,6 +35,7 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
PG_FUNCTION_INFO_V1(pg_get_wal_stats);
PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_fpi_info);
static bool IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn);
static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
@@ -55,6 +56,7 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
Datum *values, bool *nulls, uint32 ncols);
static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
XLogRecPtr end_lsn, bool stats_per_record);
+static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
/*
* Check if the given LSN is in future. Also, return the LSN up to which the
@@ -616,3 +618,110 @@ pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ */
+static void
+GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
+{
+#define PG_GET_WAL_FPI_INFO_COLS 7
+ int block_id;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ PGAlignedBlock buf;
+ Page page;
+ bytea *raw_page;
+ BlockNumber blk;
+ RelFileLocator rnode;
+ ForkNumber fork;
+ Datum values[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ bool nulls[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ int i = 0;
+
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ if (!XLogRecHasBlockImage(record, block_id))
+ continue;
+
+ page = (Page) buf.data;
+
+ /* Full page exists, so let's output it. */
+ if (!RestoreBlockImage(record, block_id, page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("%s", record->errormsg_buf)));
+
+ (void) XLogRecGetBlockTagExtended(record, block_id,
+ &rnode, &fork, &blk, NULL);
+
+ values[i++] = LSNGetDatum(record->ReadRecPtr);
+ values[i++] = ObjectIdGetDatum(rnode.spcOid);
+ values[i++] = ObjectIdGetDatum(rnode.dbOid);
+ values[i++] = ObjectIdGetDatum(rnode.relNumber);
+ values[i++] = Int64GetDatum((int64) blk);
+
+ if (fork >= 0 && fork <= MAX_FORKNUM)
+ values[i++] = CStringGetTextDatum(forkNames[fork]);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("invalid fork number: %u", fork)));
+
+ /* Initialize bytea buffer to copy the FPI to. */
+ raw_page = (bytea *) palloc(BLCKSZ + VARHDRSZ);
+ SET_VARSIZE(raw_page, BLCKSZ + VARHDRSZ);
+
+ /* Take a verbatim copy of the FPI. */
+ memcpy(VARDATA(raw_page), page, BLCKSZ);
+
+ values[i++] = PointerGetDatum(raw_page);
+
+ Assert(i == PG_GET_WAL_FPI_INFO_COLS);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+#undef PG_GET_WAL_FPI_INFO_COLS
+}
+
+/*
+ * Get full page images and their info of all WAL records between start LSN and
+ * end LSN.
+ *
+ * This function emits an error if a future start or end WAL LSN i.e. WAL LSN
+ * the database system doesn't know about is specified.
+ */
+Datum
+pg_get_wal_fpi_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ XLogReaderState *xlogreader;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+
+ end_lsn = ValidateInputLSNs(false, start_lsn, end_lsn);
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ xlogreader = InitXLogReaderState(start_lsn);
+
+ while (ReadNextXLogRecord(xlogreader) &&
+ xlogreader->EndRecPtr <= end_lsn)
+ {
+ GetWALFPIInfo(fcinfo, xlogreader);
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index 017e56a2bb..efa3cb2cfe 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
# pg_walinspect extension
comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/pg_walinspect'
relocatable = true
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
index 1b265ea7bc..25f707de44 100644
--- a/contrib/pg_walinspect/sql/pg_walinspect.sql
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -52,6 +52,23 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2')
WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+-- ===================================================================
+-- Tests for getting full page images
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+
+-- Force FPWs on the next writes.
+CHECKPOINT;
+
+UPDATE sample_tbl SET col1 = col1 + 1;
+
+SELECT 1 FROM pg_switch_wal();
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -66,6 +83,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- no
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
@@ -79,6 +99,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
@@ -91,6 +114,9 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
+
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -100,6 +126,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
@@ -109,6 +138,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
+
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 1a1bee7d6a..98208c2deb 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -261,6 +261,54 @@ postgres=# select * from pg_get_wal_stats('0/14AFC30', '0/15011D7', true) where
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <function>
+ pg_get_wal_fpi_info(start_lsn pg_lsn,
+ end_lsn pg_lsn,
+ lsn OUT pg_lsn,
+ tablespace_oid OUT oid,
+ database_oid OUT oid,
+ relfile_number OUT oid,
+ block_number OUT int8,
+ fork_name OUT text,
+ fpi OUT bytea)
+ returns setof record
+ </function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw full page images and their information associated with all the
+ valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
+ If <replaceable>start_lsn</replaceable> or
+ <replaceable>end_lsn</replaceable> are not yet available, the function
+ will raise an error. For example, usage of the function is as follows:
+<screen>
+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number, block_number, fork_name, length(fpi) > 0 as fpi_ok FROM pg_get_wal_fpi_info('0/7418E60', '0/7518218');
+ lsn | tablespace_oid | database_oid | relfile_number | block_number | fork_name | fpi_ok
+-----------+----------------+--------------+----------------+--------------+-----------+--------
+ 0/7418F48 | 1663 | 5 | 16419 | 0 | main | t
+ 0/7419080 | 1663 | 5 | 16422 | 19 | main | t
+ 0/741B0B8 | 1663 | 5 | 16422 | 22 | main | t
+ 0/741BF48 | 1663 | 5 | 16422 | 20 | main | t
+ 0/74257B8 | 1663 | 5 | 16422 | 21 | main | t
+ 0/742FF30 | 1663 | 5 | 16425 | 42 | main | t
+ 0/7431F68 | 1663 | 5 | 16425 | 44 | main | t
+ 0/7439B78 | 1663 | 5 | 16425 | 43 | main | t
+ 0/74451E8 | 1663 | 5 | 16428 | 199 | main | t
+ 0/7447220 | 1663 | 5 | 16428 | 221 | main | t
+ 0/744E458 | 1663 | 5 | 16428 | 200 | main | t
+ 0/7457CB0 | 1663 | 5 | 16428 | 201 | main | t
+ 0/7461520 | 1663 | 5 | 16428 | 202 | main | t
+ 0/746AD90 | 1663 | 5 | 16428 | 203 | main | t
+(14 rows)
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
--
2.34.1
Hi,
On 12/27/22 12:48 PM, Bharath Rupireddy wrote:
Hi,
Here's a patch that implements the idea of extracting full page images
from WAL records [1] [2] with a function in pg_walinspect. This new
function accepts start and end lsn and returns full page image info
such as WAL record lsn, tablespace oid, database oid, relfile number,
block number, fork name and the raw full page (as bytea). I'll
register this in the next commitfest.Thoughts?
I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features".
And since [1] added FPI "extraction" then +1 for the proposed patch in this thread.
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
[2] /messages/by-id/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
I just have a few comments:
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ */
What about adding a few words about compression? (like "Decompression is applied if necessary"?)
+ /* Full page exists, so let's output it. */
+ if (!RestoreBlockImage(record, block_id, page))
"Full page exists, so let's output its info and content." instead?
+ <para>
+ Gets raw full page images and their information associated with all the
+ valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
Worth to add a few words about decompression too?
I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.
What do you think? (could be done later in another patch though).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features".
And since [1] added FPI "extraction" then +1 for the proposed patch in this thread.[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
[2] /messages/by-id/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.comI just have a few comments:
Thanks for reviewing.
+ +/* + * Get full page images and their info associated with a given WAL record. + */+ <para> + Gets raw full page images and their information associated with all the + valid WAL records between <replaceable>start_lsn</replaceable> and + <replaceable>end_lsn</replaceable>. Returns one row per full page image.Worth to add a few words about decompression too?
Done.
What about adding a few words about compression? (like "Decompression is applied if necessary"?)
+ /* Full page exists, so let's output it. */ + if (!RestoreBlockImage(record, block_id, page))"Full page exists, so let's output its info and content." instead?
Done.
I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.What do you think? (could be done later in another patch though).
I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1]SELECT * FROM page_header(:'page_from_table'); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 | 4 | 0 (1 row), but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.
I'm attaching the v2 patch for further review.
[1]: SELECT * FROM page_header(:'page_from_table'); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid -----------+----------+-------+-------+-------+---------+----------+---------+----------- 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 | 4 | 0 (1 row)
SELECT * FROM page_header(:'page_from_table');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
4 | 0
(1 row)
SELECT * FROM page_header(:'page_from_wal');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
4 | 735
(1 row)
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Add-FPI-extract-function-to-pg_walinspect.patchapplication/octet-stream; name=v2-0001-Add-FPI-extract-function-to-pg_walinspect.patchDownload
From 53aa9e78b80f3fa09ba299c2a165eeb4e87e946f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 5 Jan 2023 11:15:45 +0000
Subject: [PATCH v2] Add FPI extract function to pg_walinspect
---
contrib/pg_walinspect/Makefile | 2 +-
.../pg_walinspect/expected/pg_walinspect.out | 46 ++++++++
contrib/pg_walinspect/meson.build | 1 +
contrib/pg_walinspect/pg_walinspect.c | 110 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 2 +-
contrib/pg_walinspect/sql/pg_walinspect.sql | 32 +++++
doc/src/sgml/pgwalinspect.sgml | 49 ++++++++
7 files changed, 240 insertions(+), 2 deletions(-)
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7033878a79 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,7 +7,7 @@ OBJS = \
PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
REGRESS = pg_walinspect
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..5cd28402ad 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -73,6 +73,27 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
t
(1 row)
+-- ===================================================================
+-- Tests for getting full page images
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+-- Force FPWs on the next writes.
+CHECKPOINT;
+UPDATE sample_tbl SET col1 = col1 + 1;
+SELECT 1 FROM pg_switch_wal();
+ ?column?
+----------
+ 1
+(1 row)
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+ ok
+----
+ t
+(1 row)
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -98,6 +119,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
f
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege
+------------------------
+ f
+(1 row)
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
@@ -121,6 +149,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
@@ -129,6 +164,8 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
TO regress_pg_walinspect;
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
has_function_privilege
@@ -150,12 +187,21 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index f6e912a7d8..bf7b79b1b7 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -19,6 +19,7 @@ contrib_targets += pg_walinspect
install_data(
'pg_walinspect.control',
'pg_walinspect--1.0.sql',
+ 'pg_walinspect--1.0--1.1.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 08a4c2fb52..8f992960e8 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -35,6 +35,7 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
PG_FUNCTION_INFO_V1(pg_get_wal_stats);
PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_fpi_info);
static bool IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn);
static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
@@ -55,6 +56,7 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
Datum *values, bool *nulls, uint32 ncols);
static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
XLogRecPtr end_lsn, bool stats_per_record);
+static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
/*
* Check if the given LSN is in future. Also, return the LSN up to which the
@@ -616,3 +618,111 @@ pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ * Decompression is applied to the blocks, if necessary.
+ */
+static void
+GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
+{
+#define PG_GET_WAL_FPI_INFO_COLS 7
+ int block_id;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ PGAlignedBlock buf;
+ Page page;
+ bytea *raw_page;
+ BlockNumber blk;
+ RelFileLocator rnode;
+ ForkNumber fork;
+ Datum values[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ bool nulls[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ int i = 0;
+
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ if (!XLogRecHasBlockImage(record, block_id))
+ continue;
+
+ page = (Page) buf.data;
+
+ if (!RestoreBlockImage(record, block_id, page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("%s", record->errormsg_buf)));
+
+ /* Full page exists, so let's output its info and content. */
+ (void) XLogRecGetBlockTagExtended(record, block_id,
+ &rnode, &fork, &blk, NULL);
+
+ values[i++] = LSNGetDatum(record->ReadRecPtr);
+ values[i++] = ObjectIdGetDatum(rnode.spcOid);
+ values[i++] = ObjectIdGetDatum(rnode.dbOid);
+ values[i++] = ObjectIdGetDatum(rnode.relNumber);
+ values[i++] = Int64GetDatum((int64) blk);
+
+ if (fork >= 0 && fork <= MAX_FORKNUM)
+ values[i++] = CStringGetTextDatum(forkNames[fork]);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("invalid fork number: %u", fork)));
+
+ /* Initialize bytea buffer to copy the FPI to. */
+ raw_page = (bytea *) palloc(BLCKSZ + VARHDRSZ);
+ SET_VARSIZE(raw_page, BLCKSZ + VARHDRSZ);
+
+ /* Take a verbatim copy of the FPI. */
+ memcpy(VARDATA(raw_page), page, BLCKSZ);
+
+ values[i++] = PointerGetDatum(raw_page);
+
+ Assert(i == PG_GET_WAL_FPI_INFO_COLS);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+#undef PG_GET_WAL_FPI_INFO_COLS
+}
+
+/*
+ * Get full page images and their info of all WAL records between start LSN and
+ * end LSN. Decompression is applied to the blocks, if necessary.
+ *
+ * This function emits an error if a future start or end WAL LSN i.e. WAL LSN
+ * the database system doesn't know about is specified.
+ */
+Datum
+pg_get_wal_fpi_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ XLogReaderState *xlogreader;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+
+ end_lsn = ValidateInputLSNs(false, start_lsn, end_lsn);
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ xlogreader = InitXLogReaderState(start_lsn);
+
+ while (ReadNextXLogRecord(xlogreader) &&
+ xlogreader->EndRecPtr <= end_lsn)
+ {
+ GetWALFPIInfo(fcinfo, xlogreader);
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index 017e56a2bb..efa3cb2cfe 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
# pg_walinspect extension
comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/pg_walinspect'
relocatable = true
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
index 1b265ea7bc..25f707de44 100644
--- a/contrib/pg_walinspect/sql/pg_walinspect.sql
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -52,6 +52,23 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2')
WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+-- ===================================================================
+-- Tests for getting full page images
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+
+-- Force FPWs on the next writes.
+CHECKPOINT;
+
+UPDATE sample_tbl SET col1 = col1 + 1;
+
+SELECT 1 FROM pg_switch_wal();
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -66,6 +83,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- no
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
@@ -79,6 +99,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
@@ -91,6 +114,9 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
+
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -100,6 +126,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
@@ -109,6 +138,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
+
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 1a1bee7d6a..ef5e5852a9 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -261,6 +261,55 @@ postgres=# select * from pg_get_wal_stats('0/14AFC30', '0/15011D7', true) where
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <function>
+ pg_get_wal_fpi_info(start_lsn pg_lsn,
+ end_lsn pg_lsn,
+ lsn OUT pg_lsn,
+ tablespace_oid OUT oid,
+ database_oid OUT oid,
+ relfile_number OUT oid,
+ block_number OUT int8,
+ fork_name OUT text,
+ fpi OUT bytea)
+ returns setof record
+ </function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw full page images and their information associated with all the
+ valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
+ Decompression is applied to the blocks, if necessary.
+ If <replaceable>start_lsn</replaceable> or
+ <replaceable>end_lsn</replaceable> are not yet available, the function
+ will raise an error. For example, usage of the function is as follows:
+<screen>
+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number, block_number, fork_name, length(fpi) > 0 as fpi_ok FROM pg_get_wal_fpi_info('0/7418E60', '0/7518218');
+ lsn | tablespace_oid | database_oid | relfile_number | block_number | fork_name | fpi_ok
+-----------+----------------+--------------+----------------+--------------+-----------+--------
+ 0/7418F48 | 1663 | 5 | 16419 | 0 | main | t
+ 0/7419080 | 1663 | 5 | 16422 | 19 | main | t
+ 0/741B0B8 | 1663 | 5 | 16422 | 22 | main | t
+ 0/741BF48 | 1663 | 5 | 16422 | 20 | main | t
+ 0/74257B8 | 1663 | 5 | 16422 | 21 | main | t
+ 0/742FF30 | 1663 | 5 | 16425 | 42 | main | t
+ 0/7431F68 | 1663 | 5 | 16425 | 44 | main | t
+ 0/7439B78 | 1663 | 5 | 16425 | 43 | main | t
+ 0/74451E8 | 1663 | 5 | 16428 | 199 | main | t
+ 0/7447220 | 1663 | 5 | 16428 | 221 | main | t
+ 0/744E458 | 1663 | 5 | 16428 | 200 | main | t
+ 0/7457CB0 | 1663 | 5 | 16428 | 201 | main | t
+ 0/7461520 | 1663 | 5 | 16428 | 202 | main | t
+ 0/746AD90 | 1663 | 5 | 16428 | 203 | main | t
+(14 rows)
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
--
2.34.1
On Thu, 5 Jan 2023 at 18:52, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:I think it makes sense to somehow align the pg_walinspect functions with the pg_waldump "features".
And since [1] added FPI "extraction" then +1 for the proposed patch in this thread.[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
[2] /messages/by-id/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.comI just have a few comments:
Thanks for reviewing.
+ +/* + * Get full page images and their info associated with a given WAL record. + */+ <para> + Gets raw full page images and their information associated with all the + valid WAL records between <replaceable>start_lsn</replaceable> and + <replaceable>end_lsn</replaceable>. Returns one row per full page image.Worth to add a few words about decompression too?
Done.
What about adding a few words about compression? (like "Decompression is applied if necessary"?)
+ /* Full page exists, so let's output it. */ + if (!RestoreBlockImage(record, block_id, page))"Full page exists, so let's output its info and content." instead?
Done.
I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.What do you think? (could be done later in another patch though).
I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.I'm attaching the v2 patch for further review.
I felt one of the files was missing in the patch:
[13:39:03.534] contrib/pg_walinspect/meson.build:19:0: ERROR: File
pg_walinspect--1.0--1.1.sql does not exist.
Please post an updated version for the same.
Regards,
Vignesh
On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.What do you think? (could be done later in another patch though).
I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.I'm attaching the v2 patch for further review.
[1]
SELECT * FROM page_header(:'page_from_table');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
4 | 0
(1 row)SELECT * FROM page_header(:'page_from_wal');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
4 | 735
(1 row)
Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
further review. Sorry for the noise.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Add-FPI-extract-function-to-pg_walinspect.patchapplication/octet-stream; name=v3-0001-Add-FPI-extract-function-to-pg_walinspect.patchDownload
From 0618bf43054322927226a1cde5ca5dad5bcbeaa6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 6 Jan 2023 06:13:26 +0000
Subject: [PATCH v3] Add FPI extract function to pg_walinspect
---
contrib/pg_walinspect/Makefile | 2 +-
.../pg_walinspect/expected/pg_walinspect.out | 46 ++++++++
contrib/pg_walinspect/meson.build | 1 +
.../pg_walinspect/pg_walinspect--1.0--1.1.sql | 24 ++++
contrib/pg_walinspect/pg_walinspect.c | 110 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 2 +-
contrib/pg_walinspect/sql/pg_walinspect.sql | 32 +++++
doc/src/sgml/pgwalinspect.sgml | 49 ++++++++
8 files changed, 264 insertions(+), 2 deletions(-)
create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7033878a79 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,7 +7,7 @@ OBJS = \
PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
REGRESS = pg_walinspect
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..5cd28402ad 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -73,6 +73,27 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
t
(1 row)
+-- ===================================================================
+-- Tests for getting full page images
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+-- Force FPWs on the next writes.
+CHECKPOINT;
+UPDATE sample_tbl SET col1 = col1 + 1;
+SELECT 1 FROM pg_switch_wal();
+ ?column?
+----------
+ 1
+(1 row)
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+ ok
+----
+ t
+(1 row)
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -98,6 +119,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
f
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege
+------------------------
+ f
+(1 row)
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
@@ -121,6 +149,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
@@ -129,6 +164,8 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
TO regress_pg_walinspect;
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
has_function_privilege
@@ -150,12 +187,21 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index f6e912a7d8..bf7b79b1b7 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -19,6 +19,7 @@ contrib_targets += pg_walinspect
install_data(
'pg_walinspect.control',
'pg_walinspect--1.0.sql',
+ 'pg_walinspect--1.0--1.1.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
new file mode 100644
index 0000000000..4ffbd91728
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
@@ -0,0 +1,24 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- pg_get_wal_fpi_info()
+--
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT tablespace_oid oid,
+ OUT database_oid oid,
+ OUT relfile_number oid,
+ OUT block_number int8,
+ OUT fork_name text,
+ OUT fpi bytea
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_fpi_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 08a4c2fb52..8f992960e8 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -35,6 +35,7 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
PG_FUNCTION_INFO_V1(pg_get_wal_stats);
PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_fpi_info);
static bool IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn);
static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
@@ -55,6 +56,7 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
Datum *values, bool *nulls, uint32 ncols);
static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
XLogRecPtr end_lsn, bool stats_per_record);
+static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
/*
* Check if the given LSN is in future. Also, return the LSN up to which the
@@ -616,3 +618,111 @@ pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ * Decompression is applied to the blocks, if necessary.
+ */
+static void
+GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
+{
+#define PG_GET_WAL_FPI_INFO_COLS 7
+ int block_id;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ PGAlignedBlock buf;
+ Page page;
+ bytea *raw_page;
+ BlockNumber blk;
+ RelFileLocator rnode;
+ ForkNumber fork;
+ Datum values[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ bool nulls[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ int i = 0;
+
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ if (!XLogRecHasBlockImage(record, block_id))
+ continue;
+
+ page = (Page) buf.data;
+
+ if (!RestoreBlockImage(record, block_id, page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("%s", record->errormsg_buf)));
+
+ /* Full page exists, so let's output its info and content. */
+ (void) XLogRecGetBlockTagExtended(record, block_id,
+ &rnode, &fork, &blk, NULL);
+
+ values[i++] = LSNGetDatum(record->ReadRecPtr);
+ values[i++] = ObjectIdGetDatum(rnode.spcOid);
+ values[i++] = ObjectIdGetDatum(rnode.dbOid);
+ values[i++] = ObjectIdGetDatum(rnode.relNumber);
+ values[i++] = Int64GetDatum((int64) blk);
+
+ if (fork >= 0 && fork <= MAX_FORKNUM)
+ values[i++] = CStringGetTextDatum(forkNames[fork]);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("invalid fork number: %u", fork)));
+
+ /* Initialize bytea buffer to copy the FPI to. */
+ raw_page = (bytea *) palloc(BLCKSZ + VARHDRSZ);
+ SET_VARSIZE(raw_page, BLCKSZ + VARHDRSZ);
+
+ /* Take a verbatim copy of the FPI. */
+ memcpy(VARDATA(raw_page), page, BLCKSZ);
+
+ values[i++] = PointerGetDatum(raw_page);
+
+ Assert(i == PG_GET_WAL_FPI_INFO_COLS);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+#undef PG_GET_WAL_FPI_INFO_COLS
+}
+
+/*
+ * Get full page images and their info of all WAL records between start LSN and
+ * end LSN. Decompression is applied to the blocks, if necessary.
+ *
+ * This function emits an error if a future start or end WAL LSN i.e. WAL LSN
+ * the database system doesn't know about is specified.
+ */
+Datum
+pg_get_wal_fpi_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ XLogReaderState *xlogreader;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+
+ end_lsn = ValidateInputLSNs(false, start_lsn, end_lsn);
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ xlogreader = InitXLogReaderState(start_lsn);
+
+ while (ReadNextXLogRecord(xlogreader) &&
+ xlogreader->EndRecPtr <= end_lsn)
+ {
+ GetWALFPIInfo(fcinfo, xlogreader);
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index 017e56a2bb..efa3cb2cfe 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
# pg_walinspect extension
comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/pg_walinspect'
relocatable = true
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
index 1b265ea7bc..25f707de44 100644
--- a/contrib/pg_walinspect/sql/pg_walinspect.sql
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -52,6 +52,23 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2')
WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+-- ===================================================================
+-- Tests for getting full page images
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+
+-- Force FPWs on the next writes.
+CHECKPOINT;
+
+UPDATE sample_tbl SET col1 = col1 + 1;
+
+SELECT 1 FROM pg_switch_wal();
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -66,6 +83,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- no
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
@@ -79,6 +99,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
@@ -91,6 +114,9 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
+
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -100,6 +126,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
@@ -109,6 +138,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
+
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 1a1bee7d6a..ef5e5852a9 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -261,6 +261,55 @@ postgres=# select * from pg_get_wal_stats('0/14AFC30', '0/15011D7', true) where
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <function>
+ pg_get_wal_fpi_info(start_lsn pg_lsn,
+ end_lsn pg_lsn,
+ lsn OUT pg_lsn,
+ tablespace_oid OUT oid,
+ database_oid OUT oid,
+ relfile_number OUT oid,
+ block_number OUT int8,
+ fork_name OUT text,
+ fpi OUT bytea)
+ returns setof record
+ </function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw full page images and their information associated with all the
+ valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
+ Decompression is applied to the blocks, if necessary.
+ If <replaceable>start_lsn</replaceable> or
+ <replaceable>end_lsn</replaceable> are not yet available, the function
+ will raise an error. For example, usage of the function is as follows:
+<screen>
+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number, block_number, fork_name, length(fpi) > 0 as fpi_ok FROM pg_get_wal_fpi_info('0/7418E60', '0/7518218');
+ lsn | tablespace_oid | database_oid | relfile_number | block_number | fork_name | fpi_ok
+-----------+----------------+--------------+----------------+--------------+-----------+--------
+ 0/7418F48 | 1663 | 5 | 16419 | 0 | main | t
+ 0/7419080 | 1663 | 5 | 16422 | 19 | main | t
+ 0/741B0B8 | 1663 | 5 | 16422 | 22 | main | t
+ 0/741BF48 | 1663 | 5 | 16422 | 20 | main | t
+ 0/74257B8 | 1663 | 5 | 16422 | 21 | main | t
+ 0/742FF30 | 1663 | 5 | 16425 | 42 | main | t
+ 0/7431F68 | 1663 | 5 | 16425 | 44 | main | t
+ 0/7439B78 | 1663 | 5 | 16425 | 43 | main | t
+ 0/74451E8 | 1663 | 5 | 16428 | 199 | main | t
+ 0/7447220 | 1663 | 5 | 16428 | 221 | main | t
+ 0/744E458 | 1663 | 5 | 16428 | 200 | main | t
+ 0/7457CB0 | 1663 | 5 | 16428 | 201 | main | t
+ 0/7461520 | 1663 | 5 | 16428 | 202 | main | t
+ 0/746AD90 | 1663 | 5 | 16428 | 203 | main | t
+(14 rows)
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
--
2.34.1
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.What do you think? (could be done later in another patch though).
I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.I'm attaching the v2 patch for further review.
[1]
SELECT * FROM page_header(:'page_from_table');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
4 | 0
(1 row)SELECT * FROM page_header(:'page_from_wal');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
4 | 735
(1 row)Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
further review. Sorry for the noise.
I took a stab at how and what gets logged as FPI in WAL records:
Option 1:
WAL record with FPI contains both the unmodified table page from the
disk after checkpoint and new tuple (not applied to the unmodified
page) and the recovery (redo) applies the new tuple to the unmodified
page as part of recovery. A bit more WAL is needed to store both
unmodified page and new tuple data in the WAL record and recovery can
get slower a bit too as it needs to stitch the modified page.
Option 2:
WAL record with FPI contains only the modified page (new tuple applied
to the unmodified page from the disk after checkpoint) and the
recovery (redo) just returns the applied block as BLK_RESTORED.
Recovery can get faster with this approach and less WAL is needed to
store just the modified page.
My earlier understanding was that postgres does option (1), however, I
was wrong, option (2) is what actually postgres has implemented for
the obvious advantages specified.
I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-Add-FPI-extract-function-to-pg_walinspect.patchapplication/x-patch; name=v4-0001-Add-FPI-extract-function-to-pg_walinspect.patchDownload
From 6b577f15aede723f5cb8ea675d41d0efe9b96727 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 6 Jan 2023 17:07:49 +0000
Subject: [PATCH v4] Add FPI extract function to pg_walinspect
---
contrib/pg_walinspect/Makefile | 3 +-
.../pg_walinspect/expected/pg_walinspect.out | 84 ++++++++++++-
contrib/pg_walinspect/meson.build | 1 +
.../pg_walinspect/pg_walinspect--1.0--1.1.sql | 24 ++++
contrib/pg_walinspect/pg_walinspect.c | 110 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 2 +-
contrib/pg_walinspect/sql/pg_walinspect.sql | 58 ++++++++-
doc/src/sgml/pgwalinspect.sgml | 49 ++++++++
8 files changed, 327 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7412307ede 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,11 +7,12 @@ OBJS = \
PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
REGRESS = pg_walinspect
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf
+EXTRA_INSTALL = contrib/pageinspect
# Disabled because these tests require "wal_level=replica", which
# some installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..b4ae466edb 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_walinspect;
+CREATE EXTENSION pageinspect;
-- Make sure checkpoints don't interfere with the test.
SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
?column?
@@ -10,7 +11,7 @@ CREATE TABLE sample_tbl(col1 int, col2 int);
SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
-INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+INSERT INTO sample_tbl SELECT * FROM generate_series(3, 4);
-- ===================================================================
-- Tests for input validation
-- ===================================================================
@@ -73,6 +74,62 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
t
(1 row)
+-- ===================================================================
+-- Tests to get full page image (FPI) from WAL record and compare it
+-- with raw page (post-update) from table
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+-- Force FPI on the next update
+CHECKPOINT;
+-- Update table to generate an FPI. We intentionally update a single row
+-- here to ensure the predictability in verifying the FPI with raw page
+-- content from the table.
+UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+-- Get FPI from WAL record
+SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid' \gset
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+ ok
+----
+ t
+(1 row)
+
+-- Get raw page from table
+SELECT get_raw_page('sample_tbl', 0) AS page_from_table \gset
+-- Compare FPI from WAL record and page from table, they must be same
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table');
+ tuple_data_split
+----------------------
+ {"\\x01000000",NULL}
+ {"\\x02000000",NULL}
+ {"\\x03000000",NULL}
+ {"\\x04000000",NULL}
+ {"\\x64000000",NULL}
+(5 rows)
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+ tuple_data_split
+----------------------
+ {"\\x01000000",NULL}
+ {"\\x02000000",NULL}
+ {"\\x03000000",NULL}
+ {"\\x04000000",NULL}
+ {"\\x64000000",NULL}
+(5 rows)
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table')
+EXCEPT
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+ tuple_data_split
+------------------
+(0 rows)
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -98,6 +155,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
f
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege
+------------------------
+ f
+(1 row)
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
@@ -121,6 +185,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
@@ -129,6 +200,8 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
TO regress_pg_walinspect;
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
has_function_privilege
@@ -150,12 +223,21 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index f6e912a7d8..bf7b79b1b7 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -19,6 +19,7 @@ contrib_targets += pg_walinspect
install_data(
'pg_walinspect.control',
'pg_walinspect--1.0.sql',
+ 'pg_walinspect--1.0--1.1.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
new file mode 100644
index 0000000000..4ffbd91728
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
@@ -0,0 +1,24 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- pg_get_wal_fpi_info()
+--
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT tablespace_oid oid,
+ OUT database_oid oid,
+ OUT relfile_number oid,
+ OUT block_number int8,
+ OUT fork_name text,
+ OUT fpi bytea
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_fpi_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 08a4c2fb52..8f992960e8 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -35,6 +35,7 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
PG_FUNCTION_INFO_V1(pg_get_wal_stats);
PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_fpi_info);
static bool IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn);
static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
@@ -55,6 +56,7 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
Datum *values, bool *nulls, uint32 ncols);
static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
XLogRecPtr end_lsn, bool stats_per_record);
+static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
/*
* Check if the given LSN is in future. Also, return the LSN up to which the
@@ -616,3 +618,111 @@ pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ * Decompression is applied to the blocks, if necessary.
+ */
+static void
+GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
+{
+#define PG_GET_WAL_FPI_INFO_COLS 7
+ int block_id;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ PGAlignedBlock buf;
+ Page page;
+ bytea *raw_page;
+ BlockNumber blk;
+ RelFileLocator rnode;
+ ForkNumber fork;
+ Datum values[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ bool nulls[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ int i = 0;
+
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ if (!XLogRecHasBlockImage(record, block_id))
+ continue;
+
+ page = (Page) buf.data;
+
+ if (!RestoreBlockImage(record, block_id, page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("%s", record->errormsg_buf)));
+
+ /* Full page exists, so let's output its info and content. */
+ (void) XLogRecGetBlockTagExtended(record, block_id,
+ &rnode, &fork, &blk, NULL);
+
+ values[i++] = LSNGetDatum(record->ReadRecPtr);
+ values[i++] = ObjectIdGetDatum(rnode.spcOid);
+ values[i++] = ObjectIdGetDatum(rnode.dbOid);
+ values[i++] = ObjectIdGetDatum(rnode.relNumber);
+ values[i++] = Int64GetDatum((int64) blk);
+
+ if (fork >= 0 && fork <= MAX_FORKNUM)
+ values[i++] = CStringGetTextDatum(forkNames[fork]);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("invalid fork number: %u", fork)));
+
+ /* Initialize bytea buffer to copy the FPI to. */
+ raw_page = (bytea *) palloc(BLCKSZ + VARHDRSZ);
+ SET_VARSIZE(raw_page, BLCKSZ + VARHDRSZ);
+
+ /* Take a verbatim copy of the FPI. */
+ memcpy(VARDATA(raw_page), page, BLCKSZ);
+
+ values[i++] = PointerGetDatum(raw_page);
+
+ Assert(i == PG_GET_WAL_FPI_INFO_COLS);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+#undef PG_GET_WAL_FPI_INFO_COLS
+}
+
+/*
+ * Get full page images and their info of all WAL records between start LSN and
+ * end LSN. Decompression is applied to the blocks, if necessary.
+ *
+ * This function emits an error if a future start or end WAL LSN i.e. WAL LSN
+ * the database system doesn't know about is specified.
+ */
+Datum
+pg_get_wal_fpi_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ XLogReaderState *xlogreader;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+
+ end_lsn = ValidateInputLSNs(false, start_lsn, end_lsn);
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ xlogreader = InitXLogReaderState(start_lsn);
+
+ while (ReadNextXLogRecord(xlogreader) &&
+ xlogreader->EndRecPtr <= end_lsn)
+ {
+ GetWALFPIInfo(fcinfo, xlogreader);
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index 017e56a2bb..efa3cb2cfe 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
# pg_walinspect extension
comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/pg_walinspect'
relocatable = true
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
index 1b265ea7bc..6b4c3dd90a 100644
--- a/contrib/pg_walinspect/sql/pg_walinspect.sql
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -1,5 +1,7 @@
CREATE EXTENSION pg_walinspect;
+CREATE EXTENSION pageinspect;
+
-- Make sure checkpoints don't interfere with the test.
SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
@@ -11,7 +13,7 @@ INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
-INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+INSERT INTO sample_tbl SELECT * FROM generate_series(3, 4);
-- ===================================================================
-- Tests for input validation
@@ -52,6 +54,45 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2')
WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+-- ===================================================================
+-- Tests to get full page image (FPI) from WAL record and compare it
+-- with raw page (post-update) from table
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+
+-- Force FPI on the next update
+CHECKPOINT;
+
+-- Update table to generate an FPI. We intentionally update a single row
+-- here to ensure the predictability in verifying the FPI with raw page
+-- content from the table.
+UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+
+-- Get FPI from WAL record
+SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid' \gset
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid';
+
+-- Get raw page from table
+SELECT get_raw_page('sample_tbl', 0) AS page_from_table \gset
+
+-- Compare FPI from WAL record and page from table, they must be same
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table');
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table')
+EXCEPT
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -66,6 +107,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- no
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
@@ -79,6 +123,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
@@ -91,6 +138,9 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
+
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -100,6 +150,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
@@ -109,6 +162,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
+
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 1a1bee7d6a..ef5e5852a9 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -261,6 +261,55 @@ postgres=# select * from pg_get_wal_stats('0/14AFC30', '0/15011D7', true) where
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <function>
+ pg_get_wal_fpi_info(start_lsn pg_lsn,
+ end_lsn pg_lsn,
+ lsn OUT pg_lsn,
+ tablespace_oid OUT oid,
+ database_oid OUT oid,
+ relfile_number OUT oid,
+ block_number OUT int8,
+ fork_name OUT text,
+ fpi OUT bytea)
+ returns setof record
+ </function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw full page images and their information associated with all the
+ valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
+ Decompression is applied to the blocks, if necessary.
+ If <replaceable>start_lsn</replaceable> or
+ <replaceable>end_lsn</replaceable> are not yet available, the function
+ will raise an error. For example, usage of the function is as follows:
+<screen>
+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number, block_number, fork_name, length(fpi) > 0 as fpi_ok FROM pg_get_wal_fpi_info('0/7418E60', '0/7518218');
+ lsn | tablespace_oid | database_oid | relfile_number | block_number | fork_name | fpi_ok
+-----------+----------------+--------------+----------------+--------------+-----------+--------
+ 0/7418F48 | 1663 | 5 | 16419 | 0 | main | t
+ 0/7419080 | 1663 | 5 | 16422 | 19 | main | t
+ 0/741B0B8 | 1663 | 5 | 16422 | 22 | main | t
+ 0/741BF48 | 1663 | 5 | 16422 | 20 | main | t
+ 0/74257B8 | 1663 | 5 | 16422 | 21 | main | t
+ 0/742FF30 | 1663 | 5 | 16425 | 42 | main | t
+ 0/7431F68 | 1663 | 5 | 16425 | 44 | main | t
+ 0/7439B78 | 1663 | 5 | 16425 | 43 | main | t
+ 0/74451E8 | 1663 | 5 | 16428 | 199 | main | t
+ 0/7447220 | 1663 | 5 | 16428 | 221 | main | t
+ 0/744E458 | 1663 | 5 | 16428 | 200 | main | t
+ 0/7457CB0 | 1663 | 5 | 16428 | 201 | main | t
+ 0/7461520 | 1663 | 5 | 16428 | 202 | main | t
+ 0/746AD90 | 1663 | 5 | 16428 | 203 | main | t
+(14 rows)
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
--
2.34.1
Hi,
On 1/6/23 6:41 PM, Bharath Rupireddy wrote:
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the checkpoint.What do you think? (could be done later in another patch though).
I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.I'm attaching the v2 patch for further review.
[1]
SELECT * FROM page_header(:'page_from_table');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
4 | 0
(1 row)SELECT * FROM page_header(:'page_from_wal');
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
4 | 735
(1 row)Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
further review. Sorry for the noise.I took a stab at how and what gets logged as FPI in WAL records:
Option 1:
WAL record with FPI contains both the unmodified table page from the
disk after checkpoint and new tuple (not applied to the unmodified
page) and the recovery (redo) applies the new tuple to the unmodified
page as part of recovery. A bit more WAL is needed to store both
unmodified page and new tuple data in the WAL record and recovery can
get slower a bit too as it needs to stitch the modified page.Option 2:
WAL record with FPI contains only the modified page (new tuple applied
to the unmodified page from the disk after checkpoint) and the
recovery (redo) just returns the applied block as BLK_RESTORED.
Recovery can get faster with this approach and less WAL is needed to
store just the modified page.My earlier understanding was that postgres does option (1), however, I
was wrong, option (2) is what actually postgres has implemented for
the obvious advantages specified.I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.
Thanks for updating the patch!
+-- Compare FPI from WAL record and page from table, they must be same
I think "must be the same" or "must be identical" sounds better (but not 100% sure).
Except this nit, V4 looks good to me.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jan 10, 2023 at 09:29:03AM +0100, Drouvot, Bertrand wrote:
Thanks for updating the patch!
+-- Compare FPI from WAL record and page from table, they must be same
I think "must be the same" or "must be identical" sounds better (but not 100% sure).
Except this nit, V4 looks good to me.
+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number,
block_number, fork_name, length(fpi) > 0 as fpi_ok FROM
pg_get_wal_fpi_info('0/7418E60', '0/7518218');
This query in the docs is too long IMO. Could you split that across
multiple lines for readability?
+ pg_get_wal_fpi_info(start_lsn pg_lsn,
+ end_lsn pg_lsn,
+ lsn OUT pg_lsn,
+ tablespace_oid OUT oid,
+ database_oid OUT oid,
+ relfile_number OUT oid,
+ block_number OUT int8,
+ fork_name OUT text,
+ fpi OUT bytea)
I am a bit surprised by this format, used to define the functions part
of the module in the docs, while we have examples that actually show
what's printed out. I understand that this comes from the original
commit of the module, but the rendered docs are really hard to parse
as well, no? FWIW, I think that this had better be fixed as well in
the docs of v15.. Showing a full set of attributes for the returned
record is fine by me, still if these are too long we could just use
\x. For this one, I think that there is little point in showing 14
records, so I would stick with a style similar to pageinspect.
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT tablespace_oid oid,
Slight indentation issue here.
Using "relfile_number" would be a first, for what is defined in the
code and the docs as a filenode.
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+-- Get FPI from WAL record
+SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfile_number = :'sample_tbl_oid' \gset
I would be tempted to keep the checks run here minimal with only a
basic set of checks on the LSN, without the dependencies to
pageinspect (tuple_data_split and get_raw_page), which would be fine
enough to check the execution of the function.
FWIW, I am surprised by the design choice behind ValidateInputLSNs()
to allow data to be gathered until the end of WAL in some cases, but
to not allow it in others. It is likely too late to come back to this
choice for the existing functions in v15 (quoique?), but couldn't it
be useful to make this new FPI function work at least with an insanely
high LSN value to make sure that we fetch all the FPIs from a given
start position, up to the end of WAL? That looks like a pretty good
default behavior to me, rather than issuing an error when a LSN is
defined as in the future.. I am really wondering why we have
ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
just allow any LSN value in the future automatically, as we can know
the current insert or replay LSNs (depending on the recovery state).
--
Michael
On Wed, Jan 11, 2023 at 10:07 AM Michael Paquier <michael@paquier.xyz> wrote:
+postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number,
block_number, fork_name, length(fpi) > 0 as fpi_ok FROM
pg_get_wal_fpi_info('0/7418E60', '0/7518218');This query in the docs is too long IMO. Could you split that across
multiple lines for readability?
Done.
+ pg_get_wal_fpi_info(start_lsn pg_lsn, + end_lsn pg_lsn, + lsn OUT pg_lsn, + tablespace_oid OUT oid, + database_oid OUT oid, + relfile_number OUT oid, + block_number OUT int8, + fork_name OUT text, + fpi OUT bytea) I am a bit surprised by this format, used to define the functions part of the module in the docs, while we have examples that actually show what's printed out. I understand that this comes from the original commit of the module, but the rendered docs are really hard to parse as well, no? FWIW, I think that this had better be fixed as well in the docs of v15.. Showing a full set of attributes for the returned record is fine by me, still if these are too long we could just use \x.
Thanks. I'll work on that separately.
For this one, I think that there is little point in showing 14
records, so I would stick with a style similar to pageinspect.
I've done it that way for pg_get_wal_fpi_info. If this format looks
okay, I can propose to do the same for other functions (for
backpatching too) in a separate thread though.
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn, + IN end_lsn pg_lsn, + OUT lsn pg_lsn, + OUT tablespace_oid oid, Slight indentation issue here.
Done.
Using "relfile_number" would be a first, for what is defined in the
code and the docs as a filenode.
Yes, I've changed the column names to be consistent (like in pg_buffercache).
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset +-- Get FPI from WAL record +SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4') + WHERE relfile_number = :'sample_tbl_oid' \gset I would be tempted to keep the checks run here minimal with only a basic set of checks on the LSN, without the dependencies to pageinspect (tuple_data_split and get_raw_page), which would be fine enough to check the execution of the function.
I understand the concern here that creating dependency between
extensions just for testing isn't good.
I'm okay to just read the LSN (lsn1) from raw FPI (bytea stream) and
the WAL record's LSN (lsn2) and compare them to be lsn2 > lsn1. I'm
looking for a way to convert the first 8 bytes from bytea stream to
pg_lsn type, on a quick look I couldn't find direct conversion
functions, however, I'll try to figure out a way.
FWIW, I am surprised by the design choice behind ValidateInputLSNs()
to allow data to be gathered until the end of WAL in some cases, but
to not allow it in others. It is likely too late to come back to this
choice for the existing functions in v15 (quoique?), but couldn't it
Separate functions for users passing end_lsn by themselves and users
letting functions decide the end_lsn (current flush LSN or replay LSN)
were chosen for better and easier usability and easier validation of
user-entered input lsns.
We deliberated to have something like below:
pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false);
pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false);
We wanted to have better validation of the start_lsn and end_lsn, that
is, start_lsn < end_lsn and end_lsn mustn't fall into the future when
users specify it by themselves (otherwise, one can easily trick the
server by passing in the extreme end of the LSN - 0xFFFFFFFFFFFFFFFF).
And, we couldn't find a better way to deal with when till_end_of_wal
is passed as true (in the above version of the functions).
Another idea was to have something like below:
pg_get_wal_stats(start_lsn, end_lsn default '0/0');
pg_get_wal_records_info(start_lsn, end_lsn default '0/0');
When end_lsn is not entered or entered as invalid lsn, then return the
stats/info till end of the WAL. Again, we wanted to have some
validation of the user-entered end_lsn.
Instead of cooking multiple behaviours into a single function we opted
for till_end_of_wal versions. I still feel this is better unless
there's a strong reason against till_end_of_wal versions.
be useful to make this new FPI function work at least with an insanely
high LSN value to make sure that we fetch all the FPIs from a given
start position, up to the end of WAL? That looks like a pretty good
default behavior to me, rather than issuing an error when a LSN is
defined as in the future.. I am really wondering why we have
ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
just allow any LSN value in the future automatically, as we can know
the current insert or replay LSNs (depending on the recovery state).
Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?
With some of the review comments addressed, I'm attaching v5 patch
herewith. I would like to hear thoughts on the above open points
before writing the v6 patch.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0001-Add-FPI-extract-function-to-pg_walinspect.patchapplication/x-patch; name=v5-0001-Add-FPI-extract-function-to-pg_walinspect.patchDownload
From a611efa28d2caf86c3b228ec538d5a9c22931bb1 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 11 Jan 2023 12:09:46 +0000
Subject: [PATCH v5] Add FPI extract function to pg_walinspect
---
contrib/pg_walinspect/Makefile | 3 +-
.../pg_walinspect/expected/pg_walinspect.out | 84 ++++++++++++-
contrib/pg_walinspect/meson.build | 1 +
.../pg_walinspect/pg_walinspect--1.0--1.1.sql | 24 ++++
contrib/pg_walinspect/pg_walinspect.c | 110 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 2 +-
contrib/pg_walinspect/sql/pg_walinspect.sql | 58 ++++++++-
doc/src/sgml/pgwalinspect.sgml | 56 +++++++++
8 files changed, 334 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7412307ede 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,11 +7,12 @@ OBJS = \
PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
REGRESS = pg_walinspect
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf
+EXTRA_INSTALL = contrib/pageinspect
# Disabled because these tests require "wal_level=replica", which
# some installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..b422e450e2 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_walinspect;
+CREATE EXTENSION pageinspect;
-- Make sure checkpoints don't interfere with the test.
SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
?column?
@@ -10,7 +11,7 @@ CREATE TABLE sample_tbl(col1 int, col2 int);
SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
-INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+INSERT INTO sample_tbl SELECT * FROM generate_series(3, 4);
-- ===================================================================
-- Tests for input validation
-- ===================================================================
@@ -73,6 +74,62 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
t
(1 row)
+-- ===================================================================
+-- Tests to get full page image (FPI) from WAL record and compare it
+-- with raw page (post-update) from table
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+-- Force FPI on the next update
+CHECKPOINT;
+-- Update table to generate an FPI. We intentionally update a single row
+-- here to ensure the predictability in verifying the FPI with raw page
+-- content from the table.
+UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+-- Get FPI from WAL record
+SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfilenode = :'sample_tbl_oid' \gset
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfilenode = :'sample_tbl_oid';
+ ok
+----
+ t
+(1 row)
+
+-- Get raw page from table
+SELECT get_raw_page('sample_tbl', 0) AS page_from_table \gset
+-- Compare FPI from WAL record and page from table, they must be same
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table');
+ tuple_data_split
+----------------------
+ {"\\x01000000",NULL}
+ {"\\x02000000",NULL}
+ {"\\x03000000",NULL}
+ {"\\x04000000",NULL}
+ {"\\x64000000",NULL}
+(5 rows)
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+ tuple_data_split
+----------------------
+ {"\\x01000000",NULL}
+ {"\\x02000000",NULL}
+ {"\\x03000000",NULL}
+ {"\\x04000000",NULL}
+ {"\\x64000000",NULL}
+(5 rows)
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table')
+EXCEPT
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+ tuple_data_split
+------------------
+(0 rows)
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -98,6 +155,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
f
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege
+------------------------
+ f
+(1 row)
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
@@ -121,6 +185,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
@@ -129,6 +200,8 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
TO regress_pg_walinspect;
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
has_function_privilege
@@ -150,12 +223,21 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index f6e912a7d8..bf7b79b1b7 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -19,6 +19,7 @@ contrib_targets += pg_walinspect
install_data(
'pg_walinspect.control',
'pg_walinspect--1.0.sql',
+ 'pg_walinspect--1.0--1.1.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
new file mode 100644
index 0000000000..1e9e1e6115
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
@@ -0,0 +1,24 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- pg_get_wal_fpi_info()
+--
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT reltablespace oid,
+ OUT reldatabase oid,
+ OUT relfilenode oid,
+ OUT relblocknumber int8,
+ OUT forkname text,
+ OUT fpi bytea
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_fpi_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 08a4c2fb52..8f992960e8 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -35,6 +35,7 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
PG_FUNCTION_INFO_V1(pg_get_wal_stats);
PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_fpi_info);
static bool IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn);
static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
@@ -55,6 +56,7 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
Datum *values, bool *nulls, uint32 ncols);
static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
XLogRecPtr end_lsn, bool stats_per_record);
+static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
/*
* Check if the given LSN is in future. Also, return the LSN up to which the
@@ -616,3 +618,111 @@ pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ * Decompression is applied to the blocks, if necessary.
+ */
+static void
+GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
+{
+#define PG_GET_WAL_FPI_INFO_COLS 7
+ int block_id;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ PGAlignedBlock buf;
+ Page page;
+ bytea *raw_page;
+ BlockNumber blk;
+ RelFileLocator rnode;
+ ForkNumber fork;
+ Datum values[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ bool nulls[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ int i = 0;
+
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ if (!XLogRecHasBlockImage(record, block_id))
+ continue;
+
+ page = (Page) buf.data;
+
+ if (!RestoreBlockImage(record, block_id, page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("%s", record->errormsg_buf)));
+
+ /* Full page exists, so let's output its info and content. */
+ (void) XLogRecGetBlockTagExtended(record, block_id,
+ &rnode, &fork, &blk, NULL);
+
+ values[i++] = LSNGetDatum(record->ReadRecPtr);
+ values[i++] = ObjectIdGetDatum(rnode.spcOid);
+ values[i++] = ObjectIdGetDatum(rnode.dbOid);
+ values[i++] = ObjectIdGetDatum(rnode.relNumber);
+ values[i++] = Int64GetDatum((int64) blk);
+
+ if (fork >= 0 && fork <= MAX_FORKNUM)
+ values[i++] = CStringGetTextDatum(forkNames[fork]);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("invalid fork number: %u", fork)));
+
+ /* Initialize bytea buffer to copy the FPI to. */
+ raw_page = (bytea *) palloc(BLCKSZ + VARHDRSZ);
+ SET_VARSIZE(raw_page, BLCKSZ + VARHDRSZ);
+
+ /* Take a verbatim copy of the FPI. */
+ memcpy(VARDATA(raw_page), page, BLCKSZ);
+
+ values[i++] = PointerGetDatum(raw_page);
+
+ Assert(i == PG_GET_WAL_FPI_INFO_COLS);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+#undef PG_GET_WAL_FPI_INFO_COLS
+}
+
+/*
+ * Get full page images and their info of all WAL records between start LSN and
+ * end LSN. Decompression is applied to the blocks, if necessary.
+ *
+ * This function emits an error if a future start or end WAL LSN i.e. WAL LSN
+ * the database system doesn't know about is specified.
+ */
+Datum
+pg_get_wal_fpi_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ XLogReaderState *xlogreader;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+
+ end_lsn = ValidateInputLSNs(false, start_lsn, end_lsn);
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ xlogreader = InitXLogReaderState(start_lsn);
+
+ while (ReadNextXLogRecord(xlogreader) &&
+ xlogreader->EndRecPtr <= end_lsn)
+ {
+ GetWALFPIInfo(fcinfo, xlogreader);
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index 017e56a2bb..efa3cb2cfe 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
# pg_walinspect extension
comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/pg_walinspect'
relocatable = true
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
index 1b265ea7bc..3cd1beea35 100644
--- a/contrib/pg_walinspect/sql/pg_walinspect.sql
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -1,5 +1,7 @@
CREATE EXTENSION pg_walinspect;
+CREATE EXTENSION pageinspect;
+
-- Make sure checkpoints don't interfere with the test.
SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
@@ -11,7 +13,7 @@ INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
-INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+INSERT INTO sample_tbl SELECT * FROM generate_series(3, 4);
-- ===================================================================
-- Tests for input validation
@@ -52,6 +54,45 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2')
WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+-- ===================================================================
+-- Tests to get full page image (FPI) from WAL record and compare it
+-- with raw page (post-update) from table
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+
+-- Force FPI on the next update
+CHECKPOINT;
+
+-- Update table to generate an FPI. We intentionally update a single row
+-- here to ensure the predictability in verifying the FPI with raw page
+-- content from the table.
+UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+
+-- Get FPI from WAL record
+SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfilenode = :'sample_tbl_oid' \gset
+
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfilenode = :'sample_tbl_oid';
+
+-- Get raw page from table
+SELECT get_raw_page('sample_tbl', 0) AS page_from_table \gset
+
+-- Compare FPI from WAL record and page from table, they must be same
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table');
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_table')
+EXCEPT
+SELECT tuple_data_split('sample_tbl'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+ FROM heap_page_items(:'page_from_wal');
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -66,6 +107,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- no
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
@@ -79,6 +123,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
@@ -91,6 +138,9 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
+
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -100,6 +150,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
@@ -109,6 +162,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
+
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index c4c3efe932..d74432a39f 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -261,6 +261,62 @@ postgres=# select * from pg_get_wal_stats('0/14AFC30', '0/15011D7', true) where
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_fpi_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof record</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw full page images and their information associated with all the
+ valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
+ Decompression is applied to the blocks, if necessary.
+ If <replaceable>start_lsn</replaceable> or
+ <replaceable>end_lsn</replaceable> are not yet available, the function
+ will raise an error. For example, usage of the function is as follows:
+<screen>
+postgres=# \x
+postgres=# SELECT lsn, reltablespace, reldatabase, relfilenode, relblocknumber,
+ forkname, length(fpi) > 0 as fpi_ok
+ FROM pg_get_wal_fpi_info('0/1590ED0', '0/15BCC58');
+-[ RECORD 1 ]--+----------
+lsn | 0/1590F08
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16403
+relblocknumber | 4
+forkname | main
+fpi_ok | t
+-[ RECORD 2 ]--+----------
+lsn | 0/159DF90
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16403
+relblocknumber | 6
+forkname | main
+fpi_ok | t
+-[ RECORD 3 ]--+----------
+lsn | 0/15B9C20
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16406
+relblocknumber | 0
+forkname | main
+fpi_ok | t
+-[ RECORD 4 ]--+----------
+lsn | 0/15BBC58
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16406
+relblocknumber | 4
+forkname | main
+fpi_ok | t
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
--
2.34.1
On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote:
I've done it that way for pg_get_wal_fpi_info. If this format looks
okay, I can propose to do the same for other functions (for
backpatching too) in a separate thread though.
My vote would be to make that happen first, to have in place cleaner
basics for the docs. I could just do it and move on..
We deliberated to have something like below:
pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false);
pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false);We wanted to have better validation of the start_lsn and end_lsn, that
is, start_lsn < end_lsn and end_lsn mustn't fall into the future when
users specify it by themselves (otherwise, one can easily trick the
server by passing in the extreme end of the LSN - 0xFFFFFFFFFFFFFFFF).
And, we couldn't find a better way to deal with when till_end_of_wal
is passed as true (in the above version of the functions).Another idea was to have something like below:
pg_get_wal_stats(start_lsn, end_lsn default '0/0');
pg_get_wal_records_info(start_lsn, end_lsn default '0/0');When end_lsn is not entered or entered as invalid lsn, then return the
stats/info till end of the WAL. Again, we wanted to have some
validation of the user-entered end_lsn.
This reminds me of the slot advancing, where we discarded this case
just because it is useful to enforce a LSN far in the future.
Honestly, I cannot think of any case where I would use this level of
validation, especially having *two* functions to decide one behavior
or the other: this stuff could just use one function and use for
example NULL as a setup to enforce the end of WAL, on top of a LSN far
ahead.. But well..
be useful to make this new FPI function work at least with an insanely
high LSN value to make sure that we fetch all the FPIs from a given
start position, up to the end of WAL? That looks like a pretty good
default behavior to me, rather than issuing an error when a LSN is
defined as in the future.. I am really wondering why we have
ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
just allow any LSN value in the future automatically, as we can know
the current insert or replay LSNs (depending on the recovery state).Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?
I don't really want to make the interface more bloated with more
functions than necessary, TBH, though I get your point to push for
consistency in these functions. This makes me really wonder whether
we'd better make relax all the existing functions, but I may get
outvoted.
--
Michael
On Thu, Jan 12, 2023 at 11:23 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote:
I've done it that way for pg_get_wal_fpi_info. If this format looks
okay, I can propose to do the same for other functions (for
backpatching too) in a separate thread though.My vote would be to make that happen first, to have in place cleaner
basics for the docs. I could just do it and move on..
Sure. Posted a separate patch here -
/messages/by-id/CALj2ACVGcUpziGgQrcT-1G3dHWQQfWjYBu1YQ2ypv9y86dgogg@mail.gmail.com
This reminds me of the slot advancing, where we discarded this case
just because it is useful to enforce a LSN far in the future.
Honestly, I cannot think of any case where I would use this level of
validation, especially having *two* functions to decide one behavior
or the other: this stuff could just use one function and use for
example NULL as a setup to enforce the end of WAL, on top of a LSN far
ahead.. But well..
I understand. I don't mind discussing something like [1]CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, IN end_lsn pg_lsn DEFAULT NULL, OUT start_lsn pg_lsn, OUT end_lsn pg_lsn, OUT prev_lsn pg_lsn, OUT xid xid, OUT resource_manager text, OUT record_type text, OUT record_length int4, OUT main_data_length int4, OUT fpi_length int4, OUT description text, OUT block_ref text ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_get_wal_records_info' LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE; with the
following behaviour and discarding till_end_of_wal functions
altogether:
If start_lsn is NULL, error out/return NULL.
If end_lsn isn't specified, default to NULL, then determine the end_lsn.
If end_lsn is specified as NULL, then determine the end_lsn.
If end_lsn is specified as non-NULL, then determine if it is greater
than start_lsn if yes, go ahead do the job, otherwise error out.
I'll think a bit more on this and perhaps discuss it separately.
Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?
I don't really want to make the interface more bloated with more
functions than necessary, TBH, though I get your point to push for
consistency in these functions. This makes me really wonder whether
we'd better make relax all the existing functions, but I may get
outvoted.
I'll keep the FPI extract function simple as proposed in the patch and
I'll not go write till_end_of_wal version. If needed to get all the
FPIs till the end of WAL, one can always determine the end_lsn with
pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery,
and use the proposed function.
Note that I kept the FPI extract function test simple - ensure FPI
gets generated and check if the new function can fetch it, discarding
lsn or other FPI sanity checks. Whole FPI sanity check needs
pageinspect and we don't want to create the dependency just for tests.
And checking for FPI lsn with WAL record lsn requires us to fetch lsn
from raw bytea stream. As there's no straightforward way to convert
raw bytes from bytea to pg_lsn, doing that in a platform-agnostic
manner (little-endian and big-endian comes into play here) actually is
a no-go IMO. FWIW, for a little-endian system I had to do [2]select '\x00000000b8078901000000002c00601f00200420df020000e09f3800c09f3800a09f380080'::bytea AS fpi \gset select substring(:'fpi' from 3 for 16) AS rlsn \gset select substring(:'rlsn' from 7 for 2) || substring(:'rlsn' from 5 for 2) || substring(:'rlsn' from 3 for 2) || substring(:'rlsn' from 1 for 2) AS hi \gset select substring(:'rlsn' from 15 for 2) || substring(:'rlsn' from 13 for 2) || substring(:'rlsn' from 11 for 2) || substring(:'rlsn' from 9 for 2) AS lo \gset select (:'hi' || '/' || :'lo')::pg_lsn;.
Therefore I stick to the simple test unless there's a better way.
I'm attaching the v6 patch for further review.
[1]: CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, IN end_lsn pg_lsn DEFAULT NULL, OUT start_lsn pg_lsn, OUT end_lsn pg_lsn, OUT prev_lsn pg_lsn, OUT xid xid, OUT resource_manager text, OUT record_type text, OUT record_length int4, OUT main_data_length int4, OUT fpi_length int4, OUT description text, OUT block_ref text ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_get_wal_records_info' LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE;
CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
IN end_lsn pg_lsn DEFAULT NULL,
OUT start_lsn pg_lsn,
OUT end_lsn pg_lsn,
OUT prev_lsn pg_lsn,
OUT xid xid,
OUT resource_manager text,
OUT record_type text,
OUT record_length int4,
OUT main_data_length int4,
OUT fpi_length int4,
OUT description text,
OUT block_ref text
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_get_wal_records_info'
LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE;
CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn,
IN end_lsn pg_lsn DEFAULT NULL,
IN per_record boolean DEFAULT false,
OUT "resource_manager/record_type" text,
OUT count int8,
OUT count_percentage float8,
OUT record_size int8,
OUT record_size_percentage float8,
OUT fpi_size int8,
OUT fpi_size_percentage float8,
OUT combined_size int8,
OUT combined_size_percentage float8
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_get_wal_stats'
LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE STRICT PARALLEL SAFE;
[2]: select '\x00000000b8078901000000002c00601f00200420df020000e09f3800c09f3800a09f380080'::bytea AS fpi \gset select substring(:'fpi' from 3 for 16) AS rlsn \gset select substring(:'rlsn' from 7 for 2) || substring(:'rlsn' from 5 for 2) || substring(:'rlsn' from 3 for 2) || substring(:'rlsn' from 1 for 2) AS hi \gset select substring(:'rlsn' from 15 for 2) || substring(:'rlsn' from 13 for 2) || substring(:'rlsn' from 11 for 2) || substring(:'rlsn' from 9 for 2) AS lo \gset select (:'hi' || '/' || :'lo')::pg_lsn;
select '\x00000000b8078901000000002c00601f00200420df020000e09f3800c09f3800a09f380080'::bytea
AS fpi \gset
select substring(:'fpi' from 3 for 16) AS rlsn \gset
select substring(:'rlsn' from 7 for 2) || substring(:'rlsn' from 5 for
2) || substring(:'rlsn' from 3 for 2) || substring(:'rlsn' from 1 for
2) AS hi \gset
select substring(:'rlsn' from 15 for 2) || substring(:'rlsn' from 13
for 2) || substring(:'rlsn' from 11 for 2) || substring(:'rlsn' from 9
for 2) AS lo \gset
select (:'hi' || '/' || :'lo')::pg_lsn;
postgres=# select (:'hi' || '/' || :'lo')::pg_lsn;
pg_lsn
-----------
0/18907B8
(1 row)
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v6-0001-Add-FPI-extract-function-to-pg_walinspect.patchapplication/octet-stream; name=v6-0001-Add-FPI-extract-function-to-pg_walinspect.patchDownload
From 09f70c89e6f83704ebe4268fa23a8b1b89249c80 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 12 Jan 2023 11:47:42 +0000
Subject: [PATCH v6] Add FPI extract function to pg_walinspect
---
contrib/pg_walinspect/Makefile | 2 +-
.../pg_walinspect/expected/pg_walinspect.out | 44 ++++++-
contrib/pg_walinspect/meson.build | 1 +
.../pg_walinspect/pg_walinspect--1.0--1.1.sql | 24 ++++
contrib/pg_walinspect/pg_walinspect.c | 110 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 2 +-
contrib/pg_walinspect/sql/pg_walinspect.sql | 34 +++++-
doc/src/sgml/pgwalinspect.sgml | 56 +++++++++
8 files changed, 269 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7033878a79 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,7 +7,7 @@ OBJS = \
PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
REGRESS = pg_walinspect
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..096eb18681 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -10,7 +10,7 @@ CREATE TABLE sample_tbl(col1 int, col2 int);
SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
-INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+INSERT INTO sample_tbl SELECT * FROM generate_series(3, 4);
-- ===================================================================
-- Tests for input validation
-- ===================================================================
@@ -73,6 +73,23 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
t
(1 row)
+-- ===================================================================
+-- Tests to get full page image (FPI) from WAL record
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+-- Force FPI on the next update
+CHECKPOINT;
+-- Update table to generate an FPI
+UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+-- Check if we get FPI from WAL record
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfilenode = :'sample_tbl_oid';
+ ok
+----
+ t
+(1 row)
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -98,6 +115,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
f
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+ has_function_privilege
+------------------------
+ f
+(1 row)
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
@@ -121,6 +145,13 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
@@ -129,6 +160,8 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
TO regress_pg_walinspect;
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
has_function_privilege
@@ -150,12 +183,21 @@ SELECT has_function_privilege('regress_pg_walinspect',
t
(1 row)
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+ has_function_privilege
+------------------------
+ t
+(1 row)
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
FROM regress_pg_walinspect;
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/contrib/pg_walinspect/meson.build b/contrib/pg_walinspect/meson.build
index f6e912a7d8..bf7b79b1b7 100644
--- a/contrib/pg_walinspect/meson.build
+++ b/contrib/pg_walinspect/meson.build
@@ -19,6 +19,7 @@ contrib_targets += pg_walinspect
install_data(
'pg_walinspect.control',
'pg_walinspect--1.0.sql',
+ 'pg_walinspect--1.0--1.1.sql',
kwargs: contrib_data_args,
)
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
new file mode 100644
index 0000000000..1e9e1e6115
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql
@@ -0,0 +1,24 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_walinspect UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- pg_get_wal_fpi_info()
+--
+CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT reltablespace oid,
+ OUT reldatabase oid,
+ OUT relfilenode oid,
+ OUT relblocknumber int8,
+ OUT forkname text,
+ OUT fpi bytea
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_fpi_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn) TO pg_read_server_files;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index 08a4c2fb52..8f992960e8 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -35,6 +35,7 @@ PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
PG_FUNCTION_INFO_V1(pg_get_wal_stats);
PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_fpi_info);
static bool IsFutureLSN(XLogRecPtr lsn, XLogRecPtr *curr_lsn);
static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
@@ -55,6 +56,7 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
Datum *values, bool *nulls, uint32 ncols);
static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
XLogRecPtr end_lsn, bool stats_per_record);
+static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
/*
* Check if the given LSN is in future. Also, return the LSN up to which the
@@ -616,3 +618,111 @@ pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Get full page images and their info associated with a given WAL record.
+ * Decompression is applied to the blocks, if necessary.
+ */
+static void
+GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record)
+{
+#define PG_GET_WAL_FPI_INFO_COLS 7
+ int block_id;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ PGAlignedBlock buf;
+ Page page;
+ bytea *raw_page;
+ BlockNumber blk;
+ RelFileLocator rnode;
+ ForkNumber fork;
+ Datum values[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ bool nulls[PG_GET_WAL_FPI_INFO_COLS] = {0};
+ int i = 0;
+
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ if (!XLogRecHasBlockImage(record, block_id))
+ continue;
+
+ page = (Page) buf.data;
+
+ if (!RestoreBlockImage(record, block_id, page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("%s", record->errormsg_buf)));
+
+ /* Full page exists, so let's output its info and content. */
+ (void) XLogRecGetBlockTagExtended(record, block_id,
+ &rnode, &fork, &blk, NULL);
+
+ values[i++] = LSNGetDatum(record->ReadRecPtr);
+ values[i++] = ObjectIdGetDatum(rnode.spcOid);
+ values[i++] = ObjectIdGetDatum(rnode.dbOid);
+ values[i++] = ObjectIdGetDatum(rnode.relNumber);
+ values[i++] = Int64GetDatum((int64) blk);
+
+ if (fork >= 0 && fork <= MAX_FORKNUM)
+ values[i++] = CStringGetTextDatum(forkNames[fork]);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("invalid fork number: %u", fork)));
+
+ /* Initialize bytea buffer to copy the FPI to. */
+ raw_page = (bytea *) palloc(BLCKSZ + VARHDRSZ);
+ SET_VARSIZE(raw_page, BLCKSZ + VARHDRSZ);
+
+ /* Take a verbatim copy of the FPI. */
+ memcpy(VARDATA(raw_page), page, BLCKSZ);
+
+ values[i++] = PointerGetDatum(raw_page);
+
+ Assert(i == PG_GET_WAL_FPI_INFO_COLS);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+#undef PG_GET_WAL_FPI_INFO_COLS
+}
+
+/*
+ * Get full page images and their info of all WAL records between start LSN and
+ * end LSN. Decompression is applied to the blocks, if necessary.
+ *
+ * This function emits an error if a future start or end WAL LSN i.e. WAL LSN
+ * the database system doesn't know about is specified.
+ */
+Datum
+pg_get_wal_fpi_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ XLogReaderState *xlogreader;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+
+ end_lsn = ValidateInputLSNs(false, start_lsn, end_lsn);
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ xlogreader = InitXLogReaderState(start_lsn);
+
+ while (ReadNextXLogRecord(xlogreader) &&
+ xlogreader->EndRecPtr <= end_lsn)
+ {
+ GetWALFPIInfo(fcinfo, xlogreader);
+
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
index 017e56a2bb..efa3cb2cfe 100644
--- a/contrib/pg_walinspect/pg_walinspect.control
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -1,5 +1,5 @@
# pg_walinspect extension
comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
-default_version = '1.0'
+default_version = '1.1'
module_pathname = '$libdir/pg_walinspect'
relocatable = true
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
index 1b265ea7bc..0a0d5069ce 100644
--- a/contrib/pg_walinspect/sql/pg_walinspect.sql
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -11,7 +11,7 @@ INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
-INSERT INTO sample_tbl SELECT * FROM generate_series(1, 2);
+INSERT INTO sample_tbl SELECT * FROM generate_series(3, 4);
-- ===================================================================
-- Tests for input validation
@@ -52,6 +52,23 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2')
WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
+-- ===================================================================
+-- Tests to get full page image (FPI) from WAL record
+-- ===================================================================
+SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
+
+-- Force FPI on the next update
+CHECKPOINT;
+
+-- Update table to generate an FPI
+UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+
+SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+
+-- Check if we get FPI from WAL record
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
+ WHERE relfilenode = :'sample_tbl_oid';
+
-- ===================================================================
-- Tests for permissions
-- ===================================================================
@@ -66,6 +83,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- no
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+
-- Functions accessible by users with role pg_read_server_files
GRANT pg_read_server_files TO regress_pg_walinspect;
@@ -79,6 +99,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE pg_read_server_files FROM regress_pg_walinspect;
-- Superuser can grant execute to other users
@@ -91,6 +114,9 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
TO regress_pg_walinspect;
+GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ TO regress_pg_walinspect;
+
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -100,6 +126,9 @@ SELECT has_function_privilege('regress_pg_walinspect',
SELECT has_function_privilege('regress_pg_walinspect',
'pg_get_wal_stats(pg_lsn, pg_lsn, boolean) ', 'EXECUTE'); -- yes
+SELECT has_function_privilege('regress_pg_walinspect',
+ 'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+
REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn)
FROM regress_pg_walinspect;
@@ -109,6 +138,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
FROM regress_pg_walinspect;
+REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+ FROM regress_pg_walinspect;
+
-- ===================================================================
-- Clean up
-- ===================================================================
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index c727a70489..3a88318d96 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -294,6 +294,62 @@ combined_size_percentage | 96.67190261142746
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_fpi_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof record</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw full page images as <type>bytea</type> values (after applying
+ decompression when necessary) and their information associated with all
+ the valid WAL records between <replaceable>start_lsn</replaceable> and
+ <replaceable>end_lsn</replaceable>. Returns one row per full page image.
+ If <replaceable>start_lsn</replaceable> or
+ <replaceable>end_lsn</replaceable> are not yet available, the function
+ will raise an error. For example, usage of the function is as follows:
+<screen>
+postgres=# \x
+postgres=# SELECT lsn, reltablespace, reldatabase, relfilenode, relblocknumber,
+ forkname, substring(fpi for 24) as fpi_trimmed
+ FROM pg_get_wal_fpi_info('0/1801690', '0/1825C60');
+-[ RECORD 1 ]--+---------------------------------------------------
+lsn | 0/1807E20
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16396
+relblocknumber | 43
+forkname | main
+fpi_trimmed | \x00000000b89e660100000000a003c0030020042000000000
+-[ RECORD 2 ]--+---------------------------------------------------
+lsn | 0/1811690
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16396
+relblocknumber | 44
+forkname | main
+fpi_trimmed | \x0000000068d7660100000000a003c0030020042000000000
+-[ RECORD 3 ]--+---------------------------------------------------
+lsn | 0/181C040
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16399
+relblocknumber | 0
+forkname | main
+fpi_trimmed | \x00000000007a530100000000000400040020042000000000
+-[ RECORD 4 ]--+---------------------------------------------------
+lsn | 0/181E098
+reltablespace | 1663
+reldatabase | 5
+relfilenode | 16399
+relblocknumber | 45
+forkname | main
+fpi_trimmed | \x00000000f0135d01000000009c00e01b0020042000000000
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
--
2.34.1
On Thu, Jan 12, 2023 at 05:37:40PM +0530, Bharath Rupireddy wrote:
I understand. I don't mind discussing something like [1] with the
following behaviour and discarding till_end_of_wal functions
altogether:
If start_lsn is NULL, error out/return NULL.
If end_lsn isn't specified, default to NULL, then determine the end_lsn.
If end_lsn is specified as NULL, then determine the end_lsn.
If end_lsn is specified as non-NULL, then determine if it is greater
than start_lsn if yes, go ahead do the job, otherwise error out.I'll think a bit more on this and perhaps discuss it separately.
FWIW, I still find the current interface of the module bloated. So,
while it is possible to stick some pg_current_wal_lsn() calls to
bypass the error in most cases, enforcing the end of WAL with a NULL
or larger value would still be something I would push for based on my
own experience as there would be no need to worry about the latest LSN
as being two different values in two function contexts. You could
keep the functions as STRICT for consistency, and just allow larger
values as a synonym for the end of WAL.
Saying that, the version of pg_get_wal_fpi_info() committed respects
the current behavior of the module, with an error on an incorrect end
LSN.
I'll keep the FPI extract function simple as proposed in the patch and
I'll not go write till_end_of_wal version. If needed to get all the
FPIs till the end of WAL, one can always determine the end_lsn with
pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery,
and use the proposed function.
I was reading the patch this morning, and that's pretty much what I
would have done in terms of simplicity with a test checking that at
least one FPI has been generated. I have shortened a bit the
documentation, tweaked a few comments and applied the whole after
seeing the result.
One thing that I have been wondering about is whether it is worth
adding the block_id from the record in the output, but discarded this
idea as it could be confused with the physical block number, even if
this function is for advanced users.
--
Michael