Show various offset arrays for heap WAL records

Started by Andres Freundabout 3 years ago49 messages
#1Andres Freund
andres@anarazel.de
1 attachment(s)

Hi,

A couple times when investigating data corruption issues, the last time just
yesterday in [1]/messages/by-id/CANtu0ojby3eBdMXfs4QmS+K1avBc7NcRq_Ot5bnzrbwM+uQ55w@mail.gmail.com, I needed to see the offsets affected by PRUNE and VACUUM
records. As that's probably not just me, I think we should make that change
in-tree.

The attached patch adds details to XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_FREEZE_PAGE.

The biggest issue I have with the patch is that it's very hard to figure out
what punctuation to use where ;). The existing code is very inconsistent.

I chose to include infomask[2] for the different freeze plans mainly because
it looks odd to see different plans without a visible reason. But I'm not sure
that's the right choice.

Greetings,

Andres Freund

[1]: /messages/by-id/CANtu0ojby3eBdMXfs4QmS+K1avBc7NcRq_Ot5bnzrbwM+uQ55w@mail.gmail.com

Attachments:

v3-0001-heapdesc-Include-information-about-offset-arrays.patchtext/x-diff; charset=us-asciiDownload
From 2668f69edc5345a45b850eb4f33321f8bc18cb6e Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 9 Jan 2023 13:49:09 -0800
Subject: [PATCH v3] heapdesc: Include information about offset arrays

---
 src/backend/access/rmgrdesc/heapdesc.c | 94 ++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e82156..f1c4a0e4c64 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -114,6 +114,37 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "off %u", xlrec->offnum);
 	}
 }
+
+static void
+print_offset_array(StringInfo buf, const char *label, OffsetNumber *offsets, int count)
+{
+	if (count == 0)
+		return;
+	appendStringInfo(buf, "%s [", label);
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ", ");
+		appendStringInfo(buf, "%u", offsets[i]);
+	}
+	appendStringInfoString(buf, "]");
+}
+
+static void
+print_redirect_array(StringInfo buf, const char *label, OffsetNumber *offsets, int count)
+{
+	if (count == 0)
+		return;
+	appendStringInfo(buf, "%s [", label);
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ", ");
+		appendStringInfo(buf, "%u->%u", offsets[i * 2], offsets[i * 2 + 1]);
+	}
+	appendStringInfoString(buf, "]");
+}
+
 void
 heap2_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -129,12 +160,48 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec->snapshotConflictHorizon,
 						 xlrec->nredirected,
 						 xlrec->ndead);
+
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *end;
+			OffsetNumber *redirected;
+			OffsetNumber *nowdead;
+			OffsetNumber *nowunused;
+			int			nredirected;
+			int			nunused;
+			Size		datalen;
+
+			redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			nredirected = xlrec->nredirected;
+			end = (OffsetNumber *) ((char *) redirected + datalen);
+			nowdead = redirected + (nredirected * 2);
+			nowunused = nowdead + xlrec->ndead;
+			nunused = (end - nowunused);
+			Assert(nunused >= 0);
+
+			appendStringInfo(buf, " nunused %u", nunused);
+			print_offset_array(buf, ", unused:", nowunused, nunused);
+			print_redirect_array(buf, ", redirected:", redirected, nredirected);
+			print_offset_array(buf, ", dead:", nowdead, xlrec->ndead);
+		}
 	}
 	else if (info == XLOG_HEAP2_VACUUM)
 	{
 		xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
 
 		appendStringInfo(buf, "nunused %u", xlrec->nunused);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *nowunused;
+			Size		datalen;
+
+			nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			print_offset_array(buf, " unused:", nowunused, xlrec->nunused);
+		}
 	}
 	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
@@ -142,6 +209,30 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 
 		appendStringInfo(buf, "snapshotConflictHorizon %u nplans %u",
 						 xlrec->snapshotConflictHorizon, xlrec->nplans);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *offsets;
+			xl_heap_freeze_plan *plans;
+			int			curoff = 0;
+
+			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
+			offsets = (OffsetNumber *) ((char *) plans +
+										(xlrec->nplans *
+										 sizeof(xl_heap_freeze_plan)));
+
+			for (int p = 0; p < xlrec->nplans; p++)
+			{
+				xl_heap_freeze_plan plan;
+
+				memcpy(&plan, &plans[p], sizeof(xl_heap_freeze_plan));
+
+				appendStringInfo(buf, "; plan #%d: xmax %u infomask %u infomask2 %u",
+								 p, plan.xmax, plan.t_infomask, plan.t_infomask2);
+				print_offset_array(buf, ", offsets", offsets + curoff, plan.ntuples);
+				curoff += plan.ntuples;
+			}
+		}
 	}
 	else if (info == XLOG_HEAP2_VISIBLE)
 	{
@@ -153,9 +244,12 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
+		bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 
 		appendStringInfo(buf, "%d tuples flags 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
+		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+			print_offset_array(buf, " offsets", xlrec->offsets, xlrec->ntuples);
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
-- 
2.38.0

In reply to: Andres Freund (#1)
Re: Show various offset arrays for heap WAL records

On Mon, Jan 9, 2023 at 1:58 PM Andres Freund <andres@anarazel.de> wrote:

A couple times when investigating data corruption issues, the last time just
yesterday in [1], I needed to see the offsets affected by PRUNE and VACUUM
records. As that's probably not just me, I think we should make that change
in-tree.

I remember how useful this was when we were investigating that early
bug in 14, that turned out to be in parallel VACUUM. So I'm all in
favor of it.

The attached patch adds details to XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_FREEZE_PAGE.

I'm bound to end up doing the same in index access methods. Might make
sense for the utility routines to live somewhere more centralized, at
least when code reuse is likely. Practically every index AM has WAL
records that include a sorted page offset number array, just like
these ones. It's a very standard thing, obviously.

I chose to include infomask[2] for the different freeze plans mainly because
it looks odd to see different plans without a visible reason. But I'm not sure
that's the right choice.

I don't think that it is particularly necessary to do so in order for
the output to make sense -- pg_waldump is inherently a tool for
experts. What it comes down to for me is whether or not this
information is sufficiently useful to display, and/or can be (or needs
to be) controlled via some kind of verbosity knob.

I think that it easily could be useful, and I also think that it
easily could be a bit annoying. How hard would it be to invent a
general mechanism to control the verbosity of what we'll show for each
WAL record?

--
Peter Geoghegan

#3Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#2)
Re: Show various offset arrays for heap WAL records

Hi,

On 2023-01-09 19:59:42 -0800, Peter Geoghegan wrote:

On Mon, Jan 9, 2023 at 1:58 PM Andres Freund <andres@anarazel.de> wrote:

A couple times when investigating data corruption issues, the last time just
yesterday in [1], I needed to see the offsets affected by PRUNE and VACUUM
records. As that's probably not just me, I think we should make that change
in-tree.

I remember how useful this was when we were investigating that early
bug in 14, that turned out to be in parallel VACUUM. So I'm all in
favor of it.

Cool.

The attached patch adds details to XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_FREEZE_PAGE.

I'm bound to end up doing the same in index access methods. Might make
sense for the utility routines to live somewhere more centralized, at
least when code reuse is likely. Practically every index AM has WAL
records that include a sorted page offset number array, just like
these ones. It's a very standard thing, obviously.

Hm, there doesn't seem to be a great location for them today. I guess we could
add something like src/include/access/rmgrdesc_utils.h? And put the
implementation in src/backend/access/rmgrdesc/rmgrdesc_utils.c? I first was
thinking of just rmgrdesc.[ch], but custom rmgrs added
src/bin/pg_waldump/rmgrdesc.[ch] ...

I chose to include infomask[2] for the different freeze plans mainly because
it looks odd to see different plans without a visible reason. But I'm not sure
that's the right choice.

I don't think that it is particularly necessary to do so in order for
the output to make sense -- pg_waldump is inherently a tool for
experts. What it comes down to for me is whether or not this
information is sufficiently useful to display, and/or can be (or needs
to be) controlled via some kind of verbosity knob.

It seemed useful enough to me, but I likely also stare more at this stuff than
most. Compared to the list of offsets it's not that much content.

How hard would it be to invent a general mechanism to control the verbosity
of what we'll show for each WAL record?

Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
void (*rm_desc) (StringInfo buf, XLogReaderState *record);

so we'd need to patch all of them. That might be worth doing at some point,
but I don't want to tackle it right now.

Greetings,

Andres Freund

In reply to: Andres Freund (#3)
Re: Show various offset arrays for heap WAL records

On Tue, Jan 10, 2023 at 11:35 AM Andres Freund <andres@anarazel.de> wrote:

Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
void (*rm_desc) (StringInfo buf, XLogReaderState *record);

so we'd need to patch all of them. That might be worth doing at some point,
but I don't want to tackle it right now.

Okay. Let's just get the basics in soon, then.

I would like to have a similar capability for index access methods,
but mostly just for investigating performance. Whenever we've really
needed something like this for debugging it seems to have been a
heapam thing, just because there's a lot more that can go wrong with
pruning, which is spread across many different places.

--
Peter Geoghegan

#5Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#4)
Re: Show various offset arrays for heap WAL records

Hi,

On 2023-01-11 14:53:54 -0800, Peter Geoghegan wrote:

On Tue, Jan 10, 2023 at 11:35 AM Andres Freund <andres@anarazel.de> wrote:

Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
void (*rm_desc) (StringInfo buf, XLogReaderState *record);

so we'd need to patch all of them. That might be worth doing at some point,
but I don't want to tackle it right now.

Okay. Let's just get the basics in soon, then.

I would like to have a similar capability for index access methods,
but mostly just for investigating performance. Whenever we've really
needed something like this for debugging it seems to have been a
heapam thing, just because there's a lot more that can go wrong with
pruning, which is spread across many different places.

What are your thoughts about the place for the helper functions? You're ok
with rmgrdesc_utils.[ch]?

Greetings,

Andres Freund

In reply to: Andres Freund (#5)
Re: Show various offset arrays for heap WAL records

On Wed, Jan 11, 2023 at 3:00 PM Andres Freund <andres@anarazel.de> wrote:

What are your thoughts about the place for the helper functions? You're ok
with rmgrdesc_utils.[ch]?

Yeah, that seems okay.

We may well need to put more stuff in that file. We're overdue a big
overhaul of the rmgr output, so that everybody uses the same format
for everything. We made some progress on that for 16 already, by
standardizing on the name snapshotConflictHorizon, but a lot of
annoying inconsistencies still remain. Like the punctuation issue you
mentioned.

Ideally we'd be able to make the output more easy to manipulate via
the SQL interface from pg_walinspect, or perhaps via scripting. That
would require some rules that are imposed top-down, so that consumers
of the data can make certain general assumptions. But that's fairly
natural. It's not like there is just inherently a great deal of
diversity that we need to be considered. For example, the WAL records
used by each individual index access method are all very similar. In
fact the most important index AM WAL records used by each index AM
(e.g. insert, delete, vacuum) have virtually the same format as each
other already.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#6)
Re: Show various offset arrays for heap WAL records

On Wed, Jan 11, 2023 at 3:11 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Jan 11, 2023 at 3:00 PM Andres Freund <andres@anarazel.de> wrote:

What are your thoughts about the place for the helper functions? You're ok
with rmgrdesc_utils.[ch]?

Yeah, that seems okay.

BTW, while playing around with this patch today, I noticed that it
won't display the number of elements in each offset array directly.
Perhaps it's worth including that, too?

--
Peter Geoghegan

#8Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#7)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

Hi,

I have taken a stab at doing some of the tasks listed in this email.

I have made the new files rmgr_utils.c/h.

I have come up with a standard format that I like for the output and
used it in all the heap record types.

Examples below:

snapshotConflictHorizon: 2184, nplans: 2, plans [ { xmax: 0, infomask:
2816, infomask2: 2, ntuples: 5, offsets: [ 10, 11, 12, 18, 71 ] }, {
xmax: 0, infomask: 11008, infomask2: 2, ntuples: 2, offsets: [ 72, 73
] } ]

snapshotConflictHorizon: 2199, nredirected: 4, ndead: 0, nunused: 4,
redirected: [ 1->38, 2->39, 3->40, 4->41 ], dead: [], unused: [ 24,
25, 26, 27, 37 ]

I started documenting it in the rmgr_utils.h header file in a comment,
however it may be worth a README?

I haven't polished this description of the format (or added examples,
etc) or used it in the btree-related functions because I assume the
format and helper function API will need more discussion.

This is still a rough draft, as I anticipate changes will be requested.
I would split it into multiple patches, etc. But I am looking for
feedback on the suggested format and the array formatting helper
function API.

Perhaps there should also be example output of the offset arrays in
pgwalinspect docs?

I've changed the array format helper functions that Andres added to be a
single function with an additional layer of indirection so that any
record with an array can use it regardless of type and format of the
individual elements. The signature is based somewhat off of qsort_r()
and allows the user to pass a function with the the desired format of
the elements.

On a semi-unrelated note, I think it might be nice to have a comment in
heapam_xlog.h about what the infobits fields actually are and why they
exist -- e.g. we only need a subset of infomask[2] bits in these
records.
I put a random comment in the code where I think it should go.
I will delete it later, of course.

On Mon, Jan 9, 2023 at 11:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jan 9, 2023 at 1:58 PM Andres Freund <andres@anarazel.de> wrote:

The attached patch adds details to XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_FREEZE_PAGE.

I'm bound to end up doing the same in index access methods. Might make
sense for the utility routines to live somewhere more centralized, at
least when code reuse is likely. Practically every index AM has WAL
records that include a sorted page offset number array, just like
these ones. It's a very standard thing, obviously.

I plan to add these if the format and API I suggested seems like the
right direction.

On Tue, Jan 10, 2023 at 2:35 PM Andres Freund <andres@anarazel.de> wrote:

I chose to include infomask[2] for the different freeze plans mainly because
it looks odd to see different plans without a visible reason. But I'm not sure
that's the right choice.

I don't think that it is particularly necessary to do so in order for
the output to make sense -- pg_waldump is inherently a tool for
experts. What it comes down to for me is whether or not this
information is sufficiently useful to display, and/or can be (or needs
to be) controlled via some kind of verbosity knob.

It seemed useful enough to me, but I likely also stare more at this stuff than
most. Compared to the list of offsets it's not that much content.

Personally, I like having the infomasks for the freeze plans. If we
someday have a more structured input to rmgr_desc, we could then easily
have them in their own column and use functions like
heap_tuple_infomask_flags() on them.

How hard would it be to invent a general mechanism to control the verbosity
of what we'll show for each WAL record?

Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
void (*rm_desc) (StringInfo buf, XLogReaderState *record);

so we'd need to patch all of them. That might be worth doing at some point,
but I don't want to tackle it right now.

In terms of a more structured format, it seems like it would make the
most sense to pass a JSON or composite datatype structure to rm_desc
instead of that StringInfo.

I would also like to see functions like XLogRecGetBlockRefInfo() pass
something more useful than a stringinfo buffer so that we could easily
extract out the relfilenode in pgwalinspect.

On Mon, Jan 16, 2023 at 10:09 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Jan 11, 2023 at 3:11 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Jan 11, 2023 at 3:00 PM Andres Freund <andres@anarazel.de> wrote:

What are your thoughts about the place for the helper functions? You're ok
with rmgrdesc_utils.[ch]?

Yeah, that seems okay.

BTW, while playing around with this patch today, I noticed that it
won't display the number of elements in each offset array directly.
Perhaps it's worth including that, too?

I believe I have addressed this in the attached patch.

- Melanie

Attachments:

v1-0001-Add-rmgr_desc-utilities.patchapplication/octet-stream; name=v1-0001-Add-rmgr_desc-utilities.patchDownload
From 10ec7b3366f7ffdc31295831cb112863446fb578 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 25 Jan 2023 08:38:38 -0500
Subject: [PATCH v1] Add rmgr_desc utilities

- Begin to standardize format of rmgr_desc output, adding a new
  rmgr_utils file with comments about expected format.
- Add helper which prints arrays in a standard format and use it in
  heapdesc to print out offset arrays, relid arrays, freeze plan arrays,
  and relid arrays
---
 src/backend/access/rmgrdesc/Makefile         |   1 +
 src/backend/access/rmgrdesc/heapdesc.c       | 154 +++++++++++++++----
 src/backend/access/rmgrdesc/meson.build      |   1 +
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  88 +++++++++++
 src/bin/pg_waldump/Makefile                  |   2 +-
 src/include/access/heapam_xlog.h             |   1 +
 src/include/access/rmgrdesc_utils.h          |  33 ++++
 7 files changed, 251 insertions(+), 29 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/rmgrdesc_utils.c
 create mode 100644 src/include/access/rmgrdesc_utils.h

diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..cd95eec37f 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	nbtdesc.o \
 	relmapdesc.o \
 	replorigindesc.o \
+	rmgrdesc_utils.o \
 	seqdesc.o \
 	smgrdesc.o \
 	spgdesc.o \
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e8215..25a663c71b 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -15,22 +15,56 @@
 #include "postgres.h"
 
 #include "access/heapam_xlog.h"
+#include "access/rmgrdesc_utils.h"
+
 
 static void
 out_infobits(StringInfo buf, uint8 infobits)
 {
+	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
+		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
+		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
+		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
+		(infobits & XLHL_KEYS_UPDATED) == 0)
+		return;
+
+	appendStringInfoString(buf, ", infobits: [");
+
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, "IS_MULTI ");
+		appendStringInfoString(buf, " IS_MULTI");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, "LOCK_ONLY ");
+		appendStringInfoString(buf, ", LOCK_ONLY");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, "EXCL_LOCK ");
+		appendStringInfoString(buf, ", EXCL_LOCK");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, "KEYSHR_LOCK ");
+		appendStringInfoString(buf, ", KEYSHR_LOCK");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, "KEYS_UPDATED ");
+		appendStringInfoString(buf, ", KEYS_UPDATED");
+
+	appendStringInfoString(buf, " ]");
+}
+
+static void
+plan_elem_desc(StringInfo buf, void *restrict plan, void *restrict data)
+{
+	xl_heap_freeze_plan *new_plan = (xl_heap_freeze_plan *) plan;
+	OffsetNumber **offsets = data;
+
+	appendStringInfo(buf, "{ xmax: %u, infomask: %u, infomask2: %u, ntuples: %u",
+					 new_plan->xmax, new_plan->t_infomask,
+					 new_plan->t_infomask2,
+					 new_plan->ntuples);
+
+	appendStringInfoString(buf, ", offsets:");
+	array_desc(buf, *offsets, sizeof(OffsetNumber), new_plan->ntuples,
+			   &offset_elem_desc, NULL);
+
+	*offsets += new_plan->ntuples;
+
+	appendStringInfo(buf, " }");
 }
 
+
 void
 heap_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -42,14 +76,15 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_insert *xlrec = (xl_heap_insert *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X", xlrec->offnum,
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+						 xlrec->offnum,
 						 xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_DELETE)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
 						 xlrec->offnum,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
@@ -58,12 +93,12 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
@@ -71,39 +106,41 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax: %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
 	else if (info == XLOG_HEAP_TRUNCATE)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
-		int			i;
 
+		appendStringInfoString(buf, "flags: [");
 		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, "cascade ");
+			appendStringInfoString(buf, " CASCADE");
 		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, "restart_seqs ");
-		appendStringInfo(buf, "nrelids %u relids", xlrec->nrelids);
-		for (i = 0; i < xlrec->nrelids; i++)
-			appendStringInfo(buf, " %u", xlrec->relids[i]);
+			appendStringInfoString(buf, ", RESTART_SEQS");
+		appendStringInfoString(buf, " ]");
+
+		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
+		appendStringInfoString(buf, ", relids:");
+		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids, &relid_desc, NULL);
 	}
 	else if (info == XLOG_HEAP_CONFIRM)
 	{
 		xl_heap_confirm *xlrec = (xl_heap_confirm *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 	else if (info == XLOG_HEAP_LOCK)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off %u: xid %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -111,9 +148,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_inplace *xlrec = (xl_heap_inplace *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 }
+
 void
 heap2_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -125,43 +163,103 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_prune *xlrec = (xl_heap_prune *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nredirected %u ndead %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u",
 						 xlrec->snapshotConflictHorizon,
 						 xlrec->nredirected,
 						 xlrec->ndead);
+
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *end;
+			OffsetNumber *redirected;
+			OffsetNumber *nowdead;
+			OffsetNumber *nowunused;
+			int			nredirected;
+			int			nunused;
+			Size		datalen;
+
+			redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			nredirected = xlrec->nredirected;
+			end = (OffsetNumber *) ((char *) redirected + datalen);
+			nowdead = redirected + (nredirected * 2);
+			nowunused = nowdead + xlrec->ndead;
+			nunused = (end - nowunused);
+			Assert(nunused >= 0);
+
+			appendStringInfo(buf, ", nunused: %u", nunused);
+
+			appendStringInfoString(buf, ", redirected:");
+			array_desc(buf, redirected, sizeof(OffsetNumber) * 2, nredirected * 2, &redirect_elem_desc, NULL);
+			appendStringInfoString(buf, ", dead:");
+			array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead, &offset_elem_desc, NULL);
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_VACUUM)
 	{
 		xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
 
-		appendStringInfo(buf, "nunused %u", xlrec->nunused);
+		appendStringInfo(buf, "nunused: %u", xlrec->nunused);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *nowunused;
+			Size		datalen;
+
+			nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
 		xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nplans %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u",
 						 xlrec->snapshotConflictHorizon, xlrec->nplans);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			xl_heap_freeze_plan *plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
+			OffsetNumber *offsets = (OffsetNumber *) &plans[xlrec->nplans];
+
+			appendStringInfoString(buf, ", plans");
+			array_desc(buf,
+					   plans,
+					   sizeof(xl_heap_freeze_plan),
+					   xlrec->nplans,
+					   &plan_elem_desc,
+					   &offsets);
+
+		}
 	}
 	else if (info == XLOG_HEAP2_VISIBLE)
 	{
 		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u flags 0x%02X",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
 						 xlrec->snapshotConflictHorizon, xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
+		bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 
-		appendStringInfo(buf, "%d tuples flags 0x%02X", xlrec->ntuples,
+		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
+		appendStringInfoString(buf, ", offsets:");
+		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+			array_desc(buf, xlrec->offsets, sizeof(OffsetNumber), xlrec->ntuples, &offset_elem_desc, NULL);
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off %u: xmax %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->xmax, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -169,13 +267,13 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
 
-		appendStringInfo(buf, "rel %u/%u/%u; tid %u/%u",
+		appendStringInfo(buf, "rel: %u/%u/%u, tid: %u/%u",
 						 xlrec->target_locator.spcOid,
 						 xlrec->target_locator.dbOid,
 						 xlrec->target_locator.relNumber,
 						 ItemPointerGetBlockNumber(&(xlrec->target_tid)),
 						 ItemPointerGetOffsetNumber(&(xlrec->target_tid)));
-		appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+		appendStringInfo(buf, ", cmin: %u, cmax: %u, combo: %u",
 						 xlrec->cmin, xlrec->cmax, xlrec->combocid);
 	}
 }
diff --git a/src/backend/access/rmgrdesc/meson.build b/src/backend/access/rmgrdesc/meson.build
index 166cee67b6..f76e87e2d7 100644
--- a/src/backend/access/rmgrdesc/meson.build
+++ b/src/backend/access/rmgrdesc/meson.build
@@ -16,6 +16,7 @@ rmgr_desc_sources = files(
   'nbtdesc.c',
   'relmapdesc.c',
   'replorigindesc.c',
+  'rmgrdesc_utils.c',
   'seqdesc.c',
   'smgrdesc.c',
   'spgdesc.c',
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
new file mode 100644
index 0000000000..fa45c796a0
--- /dev/null
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -0,0 +1,88 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.c
+ *		support functions for rmgrdesc
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/rmgrdesc/rmgrdesc_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/rmgrdesc_utils.h"
+
+/*
+ * The expected formatting of a desc function for an xlog record is as follows:
+ * member1_name: member1_value, member2_name: member2_value
+ * If the value is a list, please use
+ * member3_name: [ member3_list_value1, member3_list_value2 ]
+ *
+ * The first item appended to the string should not be prepended by any spaces
+ * or comma, however all subsequent appends to the string are responsible for
+ * prepended themselves with a comma followed by a space.
+ *
+ * Arrays should have a space between the opening square bracket and first
+ * element and between the last element and closing brace.
+ *
+ * Flags should be in all caps.
+ *
+ * For lists/arrays of items, the number of those items should be listed at the
+ * beginning with all the other numbers.
+ *
+ * whether or not to leave out a n[item] if the number is 0 (not consistent for
+ * XLOG_HEAP2_VACUUM and XLOG_HEAP2_PRUNE
+ *
+ * composite objects in a list should be surrounded with { }
+ *
+ * etc.
+ *
+ * TODO: make this description better/maybe put in a README
+ */
+
+
+void
+array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+		   void (*elem_desc) (StringInfo buf, void *restrict elem, void *restrict data),
+		   void *restrict data)
+{
+	if (count == 0)
+	{
+		appendStringInfoString(buf, " []");
+		return;
+	}
+	appendStringInfo(buf, " [");
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ",");
+		appendStringInfoString(buf, " ");
+
+		elem_desc(buf, (char *) array + elem_size * i, data);
+	}
+	appendStringInfoString(buf, " ]");
+}
+
+void
+redirect_elem_desc(StringInfo buf, void *restrict offset, void *restrict data)
+{
+	OffsetNumber *new_offset = (OffsetNumber *) offset;
+
+	appendStringInfo(buf, "%u->%u", new_offset[0], new_offset[1]);
+}
+
+void
+offset_elem_desc(StringInfo buf, void *restrict offset, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(OffsetNumber *) offset);
+}
+
+void
+relid_desc(StringInfo buf, void *restrict relid, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(Oid *) relid);
+}
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index d6459e17c7..61c9de470c 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -18,7 +18,7 @@ OBJS = \
 
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 
-RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c)))
+RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c) $(top_srcdir)/src/backend/access/rmgrdesc/rmgrdesc_utils.c))
 RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
 
 
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 8cb0d8da19..e7a5179b9f 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -265,6 +265,7 @@ typedef struct xl_heap_vacuum
 #define SizeOfHeapVacuum (offsetof(xl_heap_vacuum, nunused) + sizeof(uint16))
 
 /* flags for infobits_set */
+// add a comment about this
 #define XLHL_XMAX_IS_MULTI		0x01
 #define XLHL_XMAX_LOCK_ONLY		0x02
 #define XLHL_XMAX_EXCL_LOCK		0x04
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
new file mode 100644
index 0000000000..7e4fcbdf25
--- /dev/null
+++ b/src/include/access/rmgrdesc_utils.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.h
+ *	  helper utilities for rmgrdesc
+ *
+ * Copyright (c) 2023 PostgreSQL Global Development Group
+ *
+ * src/include/access/rmgrdesc_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RMGRDESC_UTILS_H_
+#define RMGRDESC_UTILS_H_
+
+#include "storage/off.h"
+#include "access/heapam_xlog.h"
+
+void
+			array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+					   void (*elem_desc) (StringInfo buf, void *restrict elem, void *restrict data),
+					   void *restrict data);
+
+void
+			offset_elem_desc(StringInfo buf, void *restrict offset, void *restrict data);
+
+void
+			redirect_elem_desc(StringInfo buf, void *restrict offset, void *restrict data);
+
+void
+			relid_desc(StringInfo buf, void *restrict relid, void *restrict data);
+
+
+#endif							/* RMGRDESC_UTILS_H */
-- 
2.38.1

#9Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#8)
Re: Show various offset arrays for heap WAL records

On Fri, Jan 27, 2023 at 12:24 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I believe I have addressed this in the attached patch.

I'm not sure what's best in terms of formatting details but I
definitely like the idea of making pg_waldump show more details. I'd
even like to have a way to extract the tuple data, when it's
operations on tuples and we have those tuples in the payload. That'd
be a lot more verbose than what you are doing here, though, and I'm
not saying you should go do it right now or anything like that.

--
Robert Haas
EDB: http://www.enterprisedb.com

In reply to: Melanie Plageman (#8)
Re: Show various offset arrays for heap WAL records

On Fri, Jan 27, 2023 at 9:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I have taken a stab at doing some of the tasks listed in this email.

Cool.

I have made the new files rmgr_utils.c/h.

I have come up with a standard format that I like for the output and
used it in all the heap record types.

Examples below:

That seems like a reasonable approach.

I started documenting it in the rmgr_utils.h header file in a comment,
however it may be worth a README?

I haven't polished this description of the format (or added examples,
etc) or used it in the btree-related functions because I assume the
format and helper function API will need more discussion.

I think that standardization is good, but ISTM that we need clarity on
what the scope is -- what is *not* being standardized? It may or may
not be useful to call the end result an API. Or it may not make sense
to do so in the first committed version, even though we may ultimately
end up as something that deserves to be called an API. The obligation
to not break tools that are scraping the output in whatever way seems
kind of onerous right now -- just not having any gratuitous
inconsistencies (e.g., fixing totally inconsistent punctuation, making
the names for fields across WAL records consistent when they serve
exactly the same purpose) would be a big improvement.

As I mentioned in passing already, I actually don't think that the
B-Tree WAL records are all that special, as far as this stuff goes.
For example, the DELETE Btree record type is very similar to heapam's
PRUNE record type, and practically identical to Btree's VACUUM record
type. All of these record types use the same basic conventions, like a
snapshotConflictHorizon field for recovery conflicts (which is
generated in a very similar way during original execution, and
processed in precisely the same way during REDO), and arrays of page
offset numbers sorted in ascending order.

There are some remaining details where things from an index AM WAL
record aren't directly analogous (or pretty much identical) to some
other heapam WAL records, such as the way that the DELETE Btree record
type deals with deleting a subset of TIDs from a posting list index
tuple (generated by B-Tree deduplication). But even these exceptions
don't require all that much discussion. You could either choose to
only display the array of deleted index tuple page offset numbers, as
well as the similar array of "updated" index tuple page offset numbers
from xl_btree_delete, in which case you just display two arrays of
page offset numbers, in the same standard way. You may or may not want
to also show each individual xl_btree_update entry -- doing so would
be kinda like showing the details of individual freeze plans, except
that you'd probably display something very similar to the page offset
number display here too (even though these aren't page offset numbers,
they're 0-based offsets into the posting list's item pointer data
array).

BTW, there is also a tendency for non-btree index AM WAL records to be
fairly similar or even near-identical to the B-Tree WAL records. While
Hash indexes are very different to B-Tree indexes at a high level, it
is nevertheless the case that xl_hash_vacuum_one_page is directly
based on xl_btree_delete/xl_btree_vacuum, and that xl_hash_insert is
directly based on xl_btree_insert. There are some other WAL record
types that are completely different across hash and B-Tree, which is a
reflection of the fact that the index grows using a totally different
approach in each AM -- but that doesn't seem like something that
throws up any roadblocks for you (these can all be displayed as simple
structs anyway).

Speaking with my B-Tree hat on, I'd just be happy to be able to see
both of the page offset number arrays (the deleted + updated offset
number arrays from xl_btree_delete/xl_btree_vacuum), without also
being able to\ see output for each individual xl_btree_update
item-pointer-array-offset arrays -- just seeing that much is already a
huge improvement. That's why I'm a bit hesitant to use the term API
just yet, because an obligation to be consistent in whatever way seems
like it might block incremental progress.

Perhaps there should also be example output of the offset arrays in
pgwalinspect docs?

That would definitely make sense.

I've changed the array format helper functions that Andres added to be a
single function with an additional layer of indirection so that any
record with an array can use it regardless of type and format of the
individual elements. The signature is based somewhat off of qsort_r()
and allows the user to pass a function with the the desired format of
the elements.

That's handy.

Personally, I like having the infomasks for the freeze plans. If we
someday have a more structured input to rmgr_desc, we could then easily
have them in their own column and use functions like
heap_tuple_infomask_flags() on them.

I agree, in general, though long term the best approach is one that
has a configurable level of verbosity, with some kind of roughly
uniform definition of verbosity (kinda like DEBUG1 - DEBUG5, though
probably with only 2 or 3 distinct levels).

Obviously what you're doing here will lead to a significant increase
in the verbosity of the output for affected WAL records. I don't feel
too bad about that, though. It's really an existing problem, and one
that should be fixed either way. You kind of have to deal with this
already, by having a good psql pager, since record types such as
COMMIT_PREPARED, INVALIDATIONS, and RUNNING_XACTS are already very
verbose in roughly the same way. You only need to have one of these
record types output by a function like pg_get_wal_records_info() to
get absurdly wide output -- it hardly matters that most individual WAL
record types have terse output at that point.

How hard would it be to invent a general mechanism to control the verbosity
of what we'll show for each WAL record?

Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
void (*rm_desc) (StringInfo buf, XLogReaderState *record);

so we'd need to patch all of them. That might be worth doing at some point,
but I don't want to tackle it right now.

In terms of a more structured format, it seems like it would make the
most sense to pass a JSON or composite datatype structure to rm_desc
instead of that StringInfo.

I would also like to see functions like XLogRecGetBlockRefInfo() pass
something more useful than a stringinfo buffer so that we could easily
extract out the relfilenode in pgwalinspect.

That does seem particularly important. It's a pain to do this from
SQL. In general I'm okay with focussing on pg_walinspect over
pg_waldump, since it'll become more important over time. Obviously
pg_waldump needs to still work, but I think it's okay to care less
about pg_waldump usability.

BTW, while playing around with this patch today, I noticed that it
won't display the number of elements in each offset array directly.
Perhaps it's worth including that, too?

I believe I have addressed this in the attached patch.

Thanks for taking care of that.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#10)
Re: Show various offset arrays for heap WAL records

On Tue, Jan 31, 2023 at 1:52 PM Peter Geoghegan <pg@bowt.ie> wrote:

I would also like to see functions like XLogRecGetBlockRefInfo() pass
something more useful than a stringinfo buffer so that we could easily
extract out the relfilenode in pgwalinspect.

That does seem particularly important. It's a pain to do this from
SQL. In general I'm okay with focussing on pg_walinspect over
pg_waldump, since it'll become more important over time. Obviously
pg_waldump needs to still work, but I think it's okay to care less
about pg_waldump usability.

I just realized why you mentioned XLogRecGetBlockRefInfo() -- it
probably shouldn't even be used by pg_walinspect at all (just by
pg_waldump). Using something like XLogRecGetBlockRefInfo() within
pg_walinspect misses out on the opportunity to output information in a
more descriptive tuple format, with real data types. It's not just the
relfilenode, either -- it's the block numbers themselves. And the fork
number.

In other words, I suspect that this is out of scope for this patch,
strictly speaking. We simply shouldn't be using
XLogRecGetBlockRefInfo() in pg_walinspect in the first place. Rather,
pg_walinspect should be calling some other function that ultimately
allows the user to work with (say) an array of int8 from SQL for the
block numbers. There is no great reason not to, AFAICT, since this
information is completely generic -- it's not like the rmgr-specific
output from GetRmgr(), where fine grained type information is just a
nice-to-have, with usability issues of its own (on account of the
details being record type specific).

I've been managing this problem within my own custom pg_walinspect
queries by using my own custom ICU collation. I use ICU's natural sort
order to order based on block_ref, or based on a substring()
expression that extracts something interesting from block_ref, such as
relfilenode. You can create a custom collation for this like so, per
the docs:

CREATE COLLATION IF NOT EXISTS numeric (provider = icu, locale =
'en-u-kn-true');

Obviously this hack of mine works, but hardly anybody else would be
willing to take the time to figure something like this out. Plus it's
error prone when it doesn't really have to be. And it suggests that
the block_ref field isn't record type generic -- that's sort of
misleading IMV.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#10)
Re: Show various offset arrays for heap WAL records

On Tue, Jan 31, 2023 at 1:52 PM Peter Geoghegan <pg@bowt.ie> wrote:

Obviously what you're doing here will lead to a significant increase
in the verbosity of the output for affected WAL records. I don't feel
too bad about that, though. It's really an existing problem, and one
that should be fixed either way. You kind of have to deal with this
already, by having a good psql pager, since record types such as
COMMIT_PREPARED, INVALIDATIONS, and RUNNING_XACTS are already very
verbose in roughly the same way. You only need to have one of these
record types output by a function like pg_get_wal_records_info() to
get absurdly wide output -- it hardly matters that most individual WAL
record types have terse output at that point.

Actually the really wide output comes from COMMIT records. After I run
the regression tests, and execute some of my own custom pg_walinspect
queries, I see that some individual COMMIT records have a
length(description) of over 10,000 bytes/characters. There is even one
particular COMMIT record whose length(description) is about 46,000
bytes/characters. So *ludicrously* verbose GetRmgr() strings are not
uncommon today. The worst case (or even particularly bad cases) won't
be made any worse by this patch, because there are obviously limits on
the width of the arrays that it outputs details descriptions of, that
don't apply to these COMMIT records.

--
Peter Geoghegan

#13Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#12)
Re: Show various offset arrays for heap WAL records

On Tue, Jan 31, 2023 at 6:20 PM Peter Geoghegan <pg@bowt.ie> wrote:

Actually the really wide output comes from COMMIT records. After I run
the regression tests, and execute some of my own custom pg_walinspect
queries, I see that some individual COMMIT records have a
length(description) of over 10,000 bytes/characters. There is even one
particular COMMIT record whose length(description) is about 46,000
bytes/characters. So *ludicrously* verbose GetRmgr() strings are not
uncommon today. The worst case (or even particularly bad cases) won't
be made any worse by this patch, because there are obviously limits on
the width of the arrays that it outputs details descriptions of, that
don't apply to these COMMIT records.

If we're dumping a lot of details out of each WAL record, we might
want to switch to a multi-line format of some kind. No one enjoys a
460-character wide line, let alone 46000.

--
Robert Haas
EDB: http://www.enterprisedb.com

In reply to: Robert Haas (#13)
Re: Show various offset arrays for heap WAL records

On Wed, Feb 1, 2023 at 5:20 AM Robert Haas <robertmhaas@gmail.com> wrote:

If we're dumping a lot of details out of each WAL record, we might
want to switch to a multi-line format of some kind. No one enjoys a
460-character wide line, let alone 46000.

I generally prefer it when I can use psql without using expanded table
format mode, and without having to use a pager. Of course that isn't
always possible, but it often is. I just don't think that that's going
to become feasible with pg_walinspect queries any time soon, since it
really requires a comprehensive strategy to deal with the issue of
verbosity.

It seems practically mandatory to use a pager when running
pg_walinspect queries in psql right now -- pspg is good for this. I
really can't use expanded table mode here, since it obscures the
relationship between adjoining records. I'm usually looking through
rows/records in LSN order, and want to be able to easily compare the
LSNs (or other details) of groups of adjoining records.

--
Peter Geoghegan

#15Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#14)
Re: Show various offset arrays for heap WAL records

On Wed, Feb 1, 2023 at 12:47 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Feb 1, 2023 at 5:20 AM Robert Haas <robertmhaas@gmail.com> wrote:

If we're dumping a lot of details out of each WAL record, we might
want to switch to a multi-line format of some kind. No one enjoys a
460-character wide line, let alone 46000.

I generally prefer it when I can use psql without using expanded table
format mode, and without having to use a pager. Of course that isn't
always possible, but it often is. I just don't think that that's going
to become feasible with pg_walinspect queries any time soon, since it
really requires a comprehensive strategy to deal with the issue of
verbosity.

Well, if we're thinking of making the output a lot more verbose, it
seems like we should at least do a bit of brainstorming about what
that strategy could be.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#11)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Tue, Jan 31, 2023 at 5:48 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jan 31, 2023 at 1:52 PM Peter Geoghegan <pg@bowt.ie> wrote:

I would also like to see functions like XLogRecGetBlockRefInfo() pass
something more useful than a stringinfo buffer so that we could easily
extract out the relfilenode in pgwalinspect.

That does seem particularly important. It's a pain to do this from
SQL. In general I'm okay with focussing on pg_walinspect over
pg_waldump, since it'll become more important over time. Obviously
pg_waldump needs to still work, but I think it's okay to care less
about pg_waldump usability.

I just realized why you mentioned XLogRecGetBlockRefInfo() -- it
probably shouldn't even be used by pg_walinspect at all (just by
pg_waldump). Using something like XLogRecGetBlockRefInfo() within
pg_walinspect misses out on the opportunity to output information in a
more descriptive tuple format, with real data types. It's not just the
relfilenode, either -- it's the block numbers themselves. And the fork
number.

In other words, I suspect that this is out of scope for this patch,
strictly speaking. We simply shouldn't be using
XLogRecGetBlockRefInfo() in pg_walinspect in the first place. Rather,
pg_walinspect should be calling some other function that ultimately
allows the user to work with (say) an array of int8 from SQL for the
block numbers. There is no great reason not to, AFAICT, since this
information is completely generic -- it's not like the rmgr-specific
output from GetRmgr(), where fine grained type information is just a
nice-to-have, with usability issues of its own (on account of the
details being record type specific).

Something like the attached?

start_lsn | 0/19823390
end_lsn | 0/19824360
prev_lsn | 0/19821358
xid | 1355
resource_manager | Heap
record_type | UPDATE
record_length | 4021
main_data_length | 14
fpi_length | 3948
description | off 11 xmax 1355 flags 0x00 ; new off 109 xmax 0
block_ref |
[0:1][0:8]={{0,1663,5,17033,0,442,460,4244,0},{1,1663,5,17033,0,0,0,0,0}}

It is a bit annoying not to have information about what each block_ref
item in the array represents (previously in the string), so maybe the
format in the attached shouldn't be a replacement for what is already
displayed by pg_get_wal_records_info() and friends.

It could instead be a new function which returns information in this
format -- perhaps tuples with separate columns for each labeled block
ref field denormalized to repeat the wal record info for every block?

The one piece of information I didn't include in the new block_ref
columns is the compression type (since it is a string). Since I used the
forknum value instead of the forknum name, maybe it is defensible to
also provide a documented int value for the compression type and make
that an int too?

- Melanie

Attachments:

v1-0001-Return-block_ref-details-as-an-array.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Return-block_ref-details-as-an-array.patchDownload
From 8e2308b6b9047c42d3906943dec3fd11e975ef72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 1 Mar 2023 10:53:41 -0500
Subject: [PATCH v1] Return block_ref details as an array

In pg_walinspect, the block_ref column was a string, making it difficult
to access individual members without regular expressions. Now it is an
int array of int arrays for each block with the relevant information.

Note that in the case of a compressed block image, the compression type
is not listed as it is a string.
---
 contrib/pg_walinspect/pg_walinspect--1.0.sql |   6 +-
 contrib/pg_walinspect/pg_walinspect.c        | 135 +++++++++++++++----
 2 files changed, 114 insertions(+), 27 deletions(-)

diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql
index 08b3dd5556..eb8ff82dd8 100644
--- a/contrib/pg_walinspect/pg_walinspect--1.0.sql
+++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql
@@ -17,7 +17,7 @@ CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
     OUT main_data_length int4,
     OUT fpi_length int4,
     OUT description text,
-    OUT block_ref text
+    OUT block_ref int4[][]
 )
 AS 'MODULE_PATHNAME', 'pg_get_wal_record_info'
 LANGUAGE C STRICT PARALLEL SAFE;
@@ -40,7 +40,7 @@ CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
     OUT main_data_length int4,
     OUT fpi_length int4,
     OUT description text,
-    OUT block_ref text
+    OUT block_ref int4[][]
 )
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_get_wal_records_info'
@@ -63,7 +63,7 @@ CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
     OUT main_data_length int4,
     OUT fpi_length int4,
     OUT description text,
-    OUT block_ref text
+    OUT block_ref int4[][]
 )
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_get_wal_records_info_till_end_of_wal'
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
index b7b0a805ee..446449bbe1 100644
--- a/contrib/pg_walinspect/pg_walinspect.c
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -22,6 +22,7 @@
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/pg_lsn.h"
+#include "utils/array.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -57,6 +58,8 @@ static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
 static void GetWalStats(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
 						XLogRecPtr end_lsn, bool stats_per_record);
 static void GetWALFPIInfo(FunctionCallInfo fcinfo, XLogReaderState *record);
+static Datum
+walinspect_get_blkrefs(XLogReaderState *record, uint32 *fpi_data_size);
 
 /*
  * Check if the given LSN is in future. Also, return the LSN up to which the
@@ -174,6 +177,95 @@ ReadNextXLogRecord(XLogReaderState *xlogreader)
 	return record;
 }
 
+/* number of block ref items to display for every block (cols for our array) */
+#define WAL_BLOCK_INFO_TYPES 9
+
+static Datum
+walinspect_get_blkrefs(XLogReaderState *record, uint32 *fpi_data_size)
+{
+	DecodedBkpBlock *block;
+	ArrayType *aout;
+	Datum *cur_result;
+	Datum *result_datums;
+	bool *result_nulls;
+	int dims[2];
+	int lbs[2] = {0,0};
+	int nblocks = 0; /* nblocks is the number of rows of blocks */
+
+	int max_block_id  = XLogRecMaxBlockId(record);
+
+	/* determine how many blocks we have */
+	for (int block_id = 0; block_id <= max_block_id; block_id++)
+	{
+		if (XLogRecHasBlockRef(record, block_id))
+			nblocks++;
+	}
+
+	dims[0] = nblocks;
+	dims[1] = WAL_BLOCK_INFO_TYPES;
+
+	result_datums = palloc0(sizeof(Datum) * WAL_BLOCK_INFO_TYPES * nblocks);
+	result_nulls = palloc0(sizeof(bool) * WAL_BLOCK_INFO_TYPES * nblocks);
+
+	cur_result = result_datums;
+
+	for (int block_id = 0; block_id <= max_block_id; block_id++)
+	{
+		if (!XLogRecHasBlockRef(record, block_id))
+			continue;
+
+		block = XLogRecGetBlock(record, block_id);
+
+		*cur_result++ = Int32GetDatum(block_id);
+		*cur_result++ = Int32GetDatum(block->rlocator.spcOid);
+		*cur_result++ = Int32GetDatum(block->rlocator.dbOid);
+		*cur_result++ = Int32GetDatum(block->rlocator.relNumber);
+		*cur_result++ = Int32GetDatum(block->forknum);
+		*cur_result++ = Int32GetDatum(block->blkno);
+
+		if (!block->has_image)
+		{
+			cur_result += 3;
+			continue;
+		}
+
+		(*fpi_data_size) += block->bimg_len;
+
+		*cur_result++ = Int32GetDatum(block->hole_offset);
+		*cur_result++ = Int32GetDatum(block->hole_length);
+
+		if (!BKPIMAGE_COMPRESSED(block->bimg_info))
+		{
+			cur_result++;
+			continue;
+		}
+
+		*cur_result++ = Int32GetDatum(BLCKSZ - block->hole_length - block->bimg_len);
+	}
+
+	aout = construct_md_array(result_datums, result_nulls, 2, dims, lbs,
+			INT8OID, sizeof(int64), FLOAT8PASSBYVAL, TYPALIGN_DOUBLE);
+
+	PG_RETURN_POINTER(aout);
+}
+
+typedef enum wal_record_cols
+{
+	WAL_RECORD_START_LSN,
+	WAL_RECORD_END_LSN,
+	WAL_RECORD_PREV_LSN,
+	WAL_RECORD_XID,
+	WAL_RECORD_RMGR,
+	WAL_RECORD_TYPE,
+	WAL_RECORD_LENGTH,
+	WAL_RECORD_MAIN_DATA_LENGTH,
+	WAL_RECORD_FPI_LENGTH,
+	WAL_RECORD_DESC,
+	WAL_RECORD_BLOCK_REF
+} wal_record_cols;
+
+#define WAL_RECORD_NUM_COLS (WAL_RECORD_BLOCK_REF + 1)
+
 /*
  * Get a single WAL record info.
  */
@@ -183,40 +275,35 @@ GetWALRecordInfo(XLogReaderState *record, Datum *values,
 {
 	const char *id;
 	RmgrData	desc;
-	uint32		fpi_len = 0;
 	StringInfoData rec_desc;
-	StringInfoData rec_blk_ref;
-	uint32		main_data_len;
-	int			i = 0;
+	Datum blkref_array;
+	uint32		fpi_len = 0;
+
+	values[WAL_RECORD_START_LSN] = LSNGetDatum(record->ReadRecPtr);
+	values[WAL_RECORD_END_LSN] = LSNGetDatum(record->EndRecPtr);
+	values[WAL_RECORD_PREV_LSN] = LSNGetDatum(XLogRecGetPrev(record));
+	values[WAL_RECORD_XID] = TransactionIdGetDatum(XLogRecGetXid(record));
 
 	desc = GetRmgr(XLogRecGetRmid(record));
-	id = desc.rm_identify(XLogRecGetInfo(record));
+	values[WAL_RECORD_RMGR] = CStringGetTextDatum(desc.rm_name);
 
+	id = desc.rm_identify(XLogRecGetInfo(record));
 	if (id == NULL)
 		id = psprintf("UNKNOWN (%x)", XLogRecGetInfo(record) & ~XLR_INFO_MASK);
+	values[WAL_RECORD_TYPE] = CStringGetTextDatum(id);
 
-	initStringInfo(&rec_desc);
-	desc.rm_desc(&rec_desc, record);
-
-	/* Block references. */
-	initStringInfo(&rec_blk_ref);
-	XLogRecGetBlockRefInfo(record, false, true, &rec_blk_ref, &fpi_len);
+	values[WAL_RECORD_LENGTH] = UInt32GetDatum(XLogRecGetTotalLen(record));
+	values[WAL_RECORD_MAIN_DATA_LENGTH] = UInt32GetDatum(XLogRecGetDataLen(record));
 
-	main_data_len = XLogRecGetDataLen(record);
+	blkref_array = walinspect_get_blkrefs(record, &fpi_len);
 
-	values[i++] = LSNGetDatum(record->ReadRecPtr);
-	values[i++] = LSNGetDatum(record->EndRecPtr);
-	values[i++] = LSNGetDatum(XLogRecGetPrev(record));
-	values[i++] = TransactionIdGetDatum(XLogRecGetXid(record));
-	values[i++] = CStringGetTextDatum(desc.rm_name);
-	values[i++] = CStringGetTextDatum(id);
-	values[i++] = UInt32GetDatum(XLogRecGetTotalLen(record));
-	values[i++] = UInt32GetDatum(main_data_len);
-	values[i++] = UInt32GetDatum(fpi_len);
-	values[i++] = CStringGetTextDatum(rec_desc.data);
-	values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+	values[WAL_RECORD_FPI_LENGTH] = UInt32GetDatum(fpi_len);
+	initStringInfo(&rec_desc);
+	desc.rm_desc(&rec_desc, record);
+	values[WAL_RECORD_DESC] = CStringGetTextDatum(rec_desc.data);
+	values[WAL_RECORD_BLOCK_REF] = blkref_array;
 
-	Assert(i == ncols);
+	Assert(ncols == WAL_RECORD_NUM_COLS);
 }
 
 
-- 
2.37.2

#17Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Melanie Plageman (#16)
Re: Show various offset arrays for heap WAL records

On 01.03.23 17:11, Melanie Plageman wrote:

diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql
index 08b3dd5556..eb8ff82dd8 100644
--- a/contrib/pg_walinspect/pg_walinspect--1.0.sql
+++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql
@@ -17,7 +17,7 @@ CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
OUT main_data_length int4,
OUT fpi_length int4,
OUT description text,
-    OUT block_ref text
+    OUT block_ref int4[][]
)
AS 'MODULE_PATHNAME', 'pg_get_wal_record_info'
LANGUAGE C STRICT PARALLEL SAFE;

A change like this would require a new extension version and an upgrade
script.

I suppose it's ok to postpone that work while the actual meat of the
patch is still being worked out, but I figured I'd mention it in case it
wasn't considered yet.

#18Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#10)
2 attachment(s)
Re: Show various offset arrays for heap WAL records

Thanks for the various perspectives and feedback.

Attached v2 has additional info for xl_btree_vacuum and xl_btree_delete.

I've quoted various emails by various senders below and replied.

On Fri, Jan 27, 2023 at 3:02 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 27, 2023 at 12:24 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I believe I have addressed this in the attached patch.

I'm not sure what's best in terms of formatting details but I
definitely like the idea of making pg_waldump show more details. I'd
even like to have a way to extract the tuple data, when it's
operations on tuples and we have those tuples in the payload. That'd
be a lot more verbose than what you are doing here, though, and I'm
not saying you should go do it right now or anything like that.

If I'm not mistaken, this would be quite difficult without changing
rm_desc to return some kind of self-describing data type.

On Tue, Jan 31, 2023 at 4:52 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jan 27, 2023 at 9:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I started documenting it in the rmgr_utils.h header file in a comment,
however it may be worth a README?

I haven't polished this description of the format (or added examples,
etc) or used it in the btree-related functions because I assume the
format and helper function API will need more discussion.

I think that standardization is good, but ISTM that we need clarity on
what the scope is -- what is *not* being standardized? It may or may
not be useful to call the end result an API. Or it may not make sense
to do so in the first committed version, even though we may ultimately
end up as something that deserves to be called an API. The obligation
to not break tools that are scraping the output in whatever way seems
kind of onerous right now -- just not having any gratuitous
inconsistencies (e.g., fixing totally inconsistent punctuation, making
the names for fields across WAL records consistent when they serve
exactly the same purpose) would be a big improvement.

So, we can scrap any README or big comment, but are there other changes
to the code or structure you think would avoid it being seen as an
API?

As I mentioned in passing already, I actually don't think that the
B-Tree WAL records are all that special, as far as this stuff goes.
For example, the DELETE Btree record type is very similar to heapam's
PRUNE record type, and practically identical to Btree's VACUUM record
type. All of these record types use the same basic conventions, like a
snapshotConflictHorizon field for recovery conflicts (which is
generated in a very similar way during original execution, and
processed in precisely the same way during REDO), and arrays of page
offset numbers sorted in ascending order.

There are some remaining details where things from an index AM WAL
record aren't directly analogous (or pretty much identical) to some
other heapam WAL records, such as the way that the DELETE Btree record
type deals with deleting a subset of TIDs from a posting list index
tuple (generated by B-Tree deduplication). But even these exceptions
don't require all that much discussion. You could either choose to
only display the array of deleted index tuple page offset numbers, as
well as the similar array of "updated" index tuple page offset numbers
from xl_btree_delete, in which case you just display two arrays of
page offset numbers, in the same standard way. You may or may not want
to also show each individual xl_btree_update entry -- doing so would
be kinda like showing the details of individual freeze plans, except
that you'd probably display something very similar to the page offset
number display here too (even though these aren't page offset numbers,
they're 0-based offsets into the posting list's item pointer data
array).

I have added detail to xl_btree_delete and xl_btree_vacuum. I have added
the updated/deleted target offset numbers and the updated tuples
metadata.

I wondered if there was any reason to do xl_btree_dedup deduplication
intervals.

BTW, there is also a tendency for non-btree index AM WAL records to be
fairly similar or even near-identical to the B-Tree WAL records. While
Hash indexes are very different to B-Tree indexes at a high level, it
is nevertheless the case that xl_hash_vacuum_one_page is directly
based on xl_btree_delete/xl_btree_vacuum, and that xl_hash_insert is
directly based on xl_btree_insert. There are some other WAL record
types that are completely different across hash and B-Tree, which is a
reflection of the fact that the index grows using a totally different
approach in each AM -- but that doesn't seem like something that
throws up any roadblocks for you (these can all be displayed as simple
structs anyway).

I chose not to take on any other index types until I saw if this was viable.

Perhaps there should also be example output of the offset arrays in
pgwalinspect docs?

That would definitely make sense.

I wanted to include at least a minimal example for those following along
with this thread that would cause creation of one of the record types
which I have enhanced, but I had a little trouble making a reliable
example.

Below is my strategy for getting a Heap PRUNE record with redirects, but
it occasionally doesn't end up working and I wasn't sure why (I can do
more investigation if we think that having some kind of test for this is
useful).

CREATE EXTENSION pg_walinspect;
DROP TABLE IF EXISTS lsns;
CREATE TABLE lsns(name TEXT, lsn pg_lsn);

DROP TABLE IF EXISTS baz;
create table baz(a int, b int) with (autovacuum_enabled=false);
insert into baz select i, i % 3 from generate_series(1,100)i;

update baz set b = 0 where b = 1;
update baz set b = 7 where b = 0;
INSERT INTO lsns VALUES('start_lsn', (SELECT pg_current_wal_lsn()));
vacuum baz;
select count(*) from baz;
INSERT INTO lsns VALUES('end_lsn', (SELECT pg_current_wal_lsn()));
SELECT * FROM pg_get_wal_records_info((select lsn from lsns where name
= 'start_lsn'),
(select lsn from lsns where name = 'end_lsn'))
WHERE record_type LIKE 'PRUNE%' AND resource_manager = 'Heap2' LIMIT 1;

Personally, I like having the infomasks for the freeze plans. If we
someday have a more structured input to rmgr_desc, we could then easily
have them in their own column and use functions like
heap_tuple_infomask_flags() on them.

I agree, in general, though long term the best approach is one that
has a configurable level of verbosity, with some kind of roughly
uniform definition of verbosity (kinda like DEBUG1 - DEBUG5, though
probably with only 2 or 3 distinct levels).

Given this comment and Robert's concern quoted below, I am wondering if
the consensus is that a lack of verbosity control is a dealbreaker for
adding offsets or not.

On Wed, Feb 1, 2023 at 12:52 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 1, 2023 at 12:47 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Feb 1, 2023 at 5:20 AM Robert Haas <robertmhaas@gmail.com> wrote:

If we're dumping a lot of details out of each WAL record, we might
want to switch to a multi-line format of some kind. No one enjoys a
460-character wide line, let alone 46000.

I generally prefer it when I can use psql without using expanded table
format mode, and without having to use a pager. Of course that isn't
always possible, but it often is. I just don't think that that's going
to become feasible with pg_walinspect queries any time soon, since it
really requires a comprehensive strategy to deal with the issue of
verbosity.

Well, if we're thinking of making the output a lot more verbose, it
seems like we should at least do a bit of brainstorming about what
that strategy could be.

In terms of strategies for controlling output verbosity, it seems
difficult to do without changing the rmgrdesc function signature. Unless
you are thinking of trying to reparse the rmgrdesc string output on the
pg_walinspect/pg_waldump side?

I think if there was a more structured output of rmgrdesc, then this
would also solve the verbosity level problem. Consumers could decide on
their verbosity level -- in various pg_walinspect function outputs, that
would probably just be column selection. For pg_waldump, I imagine that
some kind of parameter or flag would work.

Unless you are suggesting that we add a verbosity parameter to the
rmgrdesc function API now?

On Thu, Mar 2, 2023 at 3:17 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 01.03.23 17:11, Melanie Plageman wrote:

diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql
index 08b3dd5556..eb8ff82dd8 100644
--- a/contrib/pg_walinspect/pg_walinspect--1.0.sql
+++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql
@@ -17,7 +17,7 @@ CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
OUT main_data_length int4,
OUT fpi_length int4,
OUT description text,
-    OUT block_ref text
+    OUT block_ref int4[][]
)
AS 'MODULE_PATHNAME', 'pg_get_wal_record_info'
LANGUAGE C STRICT PARALLEL SAFE;

A change like this would require a new extension version and an upgrade
script.

I suppose it's ok to postpone that work while the actual meat of the
patch is still being worked out, but I figured I'd mention it in case it
wasn't considered yet.

Thanks for letting me know. This pg_walinspect patch ended up being
discussed over in [1]/messages/by-id/CAAKRu_bORebdZmcV8V4cZBzU8M_C6tDDdbiPhCZ6i-iuSXW9TA@mail.gmail.com.

- Melanie

[1]: /messages/by-id/CAAKRu_bORebdZmcV8V4cZBzU8M_C6tDDdbiPhCZ6i-iuSXW9TA@mail.gmail.com

Attachments:

v2-0001-Add-rmgr_desc-utilities.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Add-rmgr_desc-utilities.patchDownload
From 068f0caaf1188be2dcc979efc54979260ed53d5b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 25 Jan 2023 08:38:38 -0500
Subject: [PATCH v2 1/2] Add rmgr_desc utilities

- Begin to standardize format of rmgr_desc output, adding a new
  rmgr_utils file with comments about expected format.
- Add helper which prints arrays in a standard format and use it in
  heapdesc to print out offset arrays, relid arrays, freeze plan arrays,
  and relid arrays
---
 src/backend/access/rmgrdesc/Makefile         |   1 +
 src/backend/access/rmgrdesc/heapdesc.c       | 153 +++++++++++++++----
 src/backend/access/rmgrdesc/meson.build      |   1 +
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  88 +++++++++++
 src/bin/pg_waldump/Makefile                  |   2 +-
 src/include/access/heapam_xlog.h             |   1 +
 src/include/access/rmgrdesc_utils.h          |  33 ++++
 7 files changed, 250 insertions(+), 29 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/rmgrdesc_utils.c
 create mode 100644 src/include/access/rmgrdesc_utils.h

diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..cd95eec37f 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	nbtdesc.o \
 	relmapdesc.o \
 	replorigindesc.o \
+	rmgrdesc_utils.o \
 	seqdesc.o \
 	smgrdesc.o \
 	spgdesc.o \
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e8215..38cff5ac26 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -15,22 +15,56 @@
 #include "postgres.h"
 
 #include "access/heapam_xlog.h"
+#include "access/rmgrdesc_utils.h"
+
 
 static void
 out_infobits(StringInfo buf, uint8 infobits)
 {
+	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
+		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
+		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
+		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
+		(infobits & XLHL_KEYS_UPDATED) == 0)
+		return;
+
+	appendStringInfoString(buf, ", infobits: [");
+
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, "IS_MULTI ");
+		appendStringInfoString(buf, " IS_MULTI");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, "LOCK_ONLY ");
+		appendStringInfoString(buf, ", LOCK_ONLY");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, "EXCL_LOCK ");
+		appendStringInfoString(buf, ", EXCL_LOCK");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, "KEYSHR_LOCK ");
+		appendStringInfoString(buf, ", KEYSHR_LOCK");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, "KEYS_UPDATED ");
+		appendStringInfoString(buf, ", KEYS_UPDATED");
+
+	appendStringInfoString(buf, " ]");
+}
+
+static void
+plan_elem_desc(StringInfo buf, void *restrict plan, void *restrict data)
+{
+	xl_heap_freeze_plan *new_plan = (xl_heap_freeze_plan *) plan;
+	OffsetNumber **offsets = data;
+
+	appendStringInfo(buf, "{ xmax: %u, infomask: %u, infomask2: %u, ntuples: %u",
+					 new_plan->xmax, new_plan->t_infomask,
+					 new_plan->t_infomask2,
+					 new_plan->ntuples);
+
+	appendStringInfoString(buf, ", offsets:");
+	array_desc(buf, *offsets, sizeof(OffsetNumber), new_plan->ntuples,
+			   &offset_elem_desc, NULL);
+
+	*offsets += new_plan->ntuples;
+
+	appendStringInfo(buf, " }");
 }
 
+
 void
 heap_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -42,14 +76,15 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_insert *xlrec = (xl_heap_insert *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X", xlrec->offnum,
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+						 xlrec->offnum,
 						 xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_DELETE)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
 						 xlrec->offnum,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
@@ -58,12 +93,12 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
@@ -71,39 +106,41 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax: %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
 	else if (info == XLOG_HEAP_TRUNCATE)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
-		int			i;
 
+		appendStringInfoString(buf, "flags: [");
 		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, "cascade ");
+			appendStringInfoString(buf, " CASCADE");
 		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, "restart_seqs ");
-		appendStringInfo(buf, "nrelids %u relids", xlrec->nrelids);
-		for (i = 0; i < xlrec->nrelids; i++)
-			appendStringInfo(buf, " %u", xlrec->relids[i]);
+			appendStringInfoString(buf, ", RESTART_SEQS");
+		appendStringInfoString(buf, " ]");
+
+		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
+		appendStringInfoString(buf, ", relids:");
+		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids, &relid_desc, NULL);
 	}
 	else if (info == XLOG_HEAP_CONFIRM)
 	{
 		xl_heap_confirm *xlrec = (xl_heap_confirm *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 	else if (info == XLOG_HEAP_LOCK)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off %u: xid %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -111,9 +148,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_inplace *xlrec = (xl_heap_inplace *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 }
+
 void
 heap2_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -125,43 +163,102 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_prune *xlrec = (xl_heap_prune *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nredirected %u ndead %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u",
 						 xlrec->snapshotConflictHorizon,
 						 xlrec->nredirected,
 						 xlrec->ndead);
+
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *end;
+			OffsetNumber *redirected;
+			OffsetNumber *nowdead;
+			OffsetNumber *nowunused;
+			int			nredirected;
+			int			nunused;
+			Size		datalen;
+
+			redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			nredirected = xlrec->nredirected;
+			end = (OffsetNumber *) ((char *) redirected + datalen);
+			nowdead = redirected + (nredirected * 2);
+			nowunused = nowdead + xlrec->ndead;
+			nunused = (end - nowunused);
+			Assert(nunused >= 0);
+
+			appendStringInfo(buf, ", nunused: %u", nunused);
+
+			appendStringInfoString(buf, ", redirected:");
+			array_desc(buf, redirected, sizeof(OffsetNumber) * 2, nredirected * 2, &redirect_elem_desc, NULL);
+			appendStringInfoString(buf, ", dead:");
+			array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead, &offset_elem_desc, NULL);
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_VACUUM)
 	{
 		xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
 
-		appendStringInfo(buf, "nunused %u", xlrec->nunused);
+		appendStringInfo(buf, "nunused: %u", xlrec->nunused);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *nowunused;
+
+			nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, NULL);
+
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
 		xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nplans %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u",
 						 xlrec->snapshotConflictHorizon, xlrec->nplans);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			xl_heap_freeze_plan *plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
+			OffsetNumber *offsets = (OffsetNumber *) &plans[xlrec->nplans];
+
+			appendStringInfoString(buf, ", plans:");
+			array_desc(buf,
+					   plans,
+					   sizeof(xl_heap_freeze_plan),
+					   xlrec->nplans,
+					   &plan_elem_desc,
+					   &offsets);
+
+		}
 	}
 	else if (info == XLOG_HEAP2_VISIBLE)
 	{
 		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u flags 0x%02X",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
 						 xlrec->snapshotConflictHorizon, xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
+		bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 
-		appendStringInfo(buf, "%d tuples flags 0x%02X", xlrec->ntuples,
+		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
+		appendStringInfoString(buf, ", offsets:");
+		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+			array_desc(buf, xlrec->offsets, sizeof(OffsetNumber), xlrec->ntuples, &offset_elem_desc, NULL);
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off %u: xmax %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->xmax, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -169,13 +266,13 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
 
-		appendStringInfo(buf, "rel %u/%u/%u; tid %u/%u",
+		appendStringInfo(buf, "rel: %u/%u/%u, tid: %u/%u",
 						 xlrec->target_locator.spcOid,
 						 xlrec->target_locator.dbOid,
 						 xlrec->target_locator.relNumber,
 						 ItemPointerGetBlockNumber(&(xlrec->target_tid)),
 						 ItemPointerGetOffsetNumber(&(xlrec->target_tid)));
-		appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+		appendStringInfo(buf, ", cmin: %u, cmax: %u, combo: %u",
 						 xlrec->cmin, xlrec->cmax, xlrec->combocid);
 	}
 }
diff --git a/src/backend/access/rmgrdesc/meson.build b/src/backend/access/rmgrdesc/meson.build
index 166cee67b6..f76e87e2d7 100644
--- a/src/backend/access/rmgrdesc/meson.build
+++ b/src/backend/access/rmgrdesc/meson.build
@@ -16,6 +16,7 @@ rmgr_desc_sources = files(
   'nbtdesc.c',
   'relmapdesc.c',
   'replorigindesc.c',
+  'rmgrdesc_utils.c',
   'seqdesc.c',
   'smgrdesc.c',
   'spgdesc.c',
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
new file mode 100644
index 0000000000..fa45c796a0
--- /dev/null
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -0,0 +1,88 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.c
+ *		support functions for rmgrdesc
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/rmgrdesc/rmgrdesc_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/rmgrdesc_utils.h"
+
+/*
+ * The expected formatting of a desc function for an xlog record is as follows:
+ * member1_name: member1_value, member2_name: member2_value
+ * If the value is a list, please use
+ * member3_name: [ member3_list_value1, member3_list_value2 ]
+ *
+ * The first item appended to the string should not be prepended by any spaces
+ * or comma, however all subsequent appends to the string are responsible for
+ * prepended themselves with a comma followed by a space.
+ *
+ * Arrays should have a space between the opening square bracket and first
+ * element and between the last element and closing brace.
+ *
+ * Flags should be in all caps.
+ *
+ * For lists/arrays of items, the number of those items should be listed at the
+ * beginning with all the other numbers.
+ *
+ * whether or not to leave out a n[item] if the number is 0 (not consistent for
+ * XLOG_HEAP2_VACUUM and XLOG_HEAP2_PRUNE
+ *
+ * composite objects in a list should be surrounded with { }
+ *
+ * etc.
+ *
+ * TODO: make this description better/maybe put in a README
+ */
+
+
+void
+array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+		   void (*elem_desc) (StringInfo buf, void *restrict elem, void *restrict data),
+		   void *restrict data)
+{
+	if (count == 0)
+	{
+		appendStringInfoString(buf, " []");
+		return;
+	}
+	appendStringInfo(buf, " [");
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ",");
+		appendStringInfoString(buf, " ");
+
+		elem_desc(buf, (char *) array + elem_size * i, data);
+	}
+	appendStringInfoString(buf, " ]");
+}
+
+void
+redirect_elem_desc(StringInfo buf, void *restrict offset, void *restrict data)
+{
+	OffsetNumber *new_offset = (OffsetNumber *) offset;
+
+	appendStringInfo(buf, "%u->%u", new_offset[0], new_offset[1]);
+}
+
+void
+offset_elem_desc(StringInfo buf, void *restrict offset, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(OffsetNumber *) offset);
+}
+
+void
+relid_desc(StringInfo buf, void *restrict relid, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(Oid *) relid);
+}
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index d6459e17c7..61c9de470c 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -18,7 +18,7 @@ OBJS = \
 
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 
-RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c)))
+RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c) $(top_srcdir)/src/backend/access/rmgrdesc/rmgrdesc_utils.c))
 RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
 
 
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index a2c67d1cd3..63075c51bd 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -265,6 +265,7 @@ typedef struct xl_heap_vacuum
 #define SizeOfHeapVacuum (offsetof(xl_heap_vacuum, nunused) + sizeof(uint16))
 
 /* flags for infobits_set */
+// add a comment about this
 #define XLHL_XMAX_IS_MULTI		0x01
 #define XLHL_XMAX_LOCK_ONLY		0x02
 #define XLHL_XMAX_EXCL_LOCK		0x04
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
new file mode 100644
index 0000000000..7e4fcbdf25
--- /dev/null
+++ b/src/include/access/rmgrdesc_utils.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.h
+ *	  helper utilities for rmgrdesc
+ *
+ * Copyright (c) 2023 PostgreSQL Global Development Group
+ *
+ * src/include/access/rmgrdesc_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RMGRDESC_UTILS_H_
+#define RMGRDESC_UTILS_H_
+
+#include "storage/off.h"
+#include "access/heapam_xlog.h"
+
+void
+			array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+					   void (*elem_desc) (StringInfo buf, void *restrict elem, void *restrict data),
+					   void *restrict data);
+
+void
+			offset_elem_desc(StringInfo buf, void *restrict offset, void *restrict data);
+
+void
+			redirect_elem_desc(StringInfo buf, void *restrict offset, void *restrict data);
+
+void
+			relid_desc(StringInfo buf, void *restrict relid, void *restrict data);
+
+
+#endif							/* RMGRDESC_UTILS_H */
-- 
2.37.2

v2-0002-Add-detail-to-some-btree-xlog-records.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Add-detail-to-some-btree-xlog-records.patchDownload
From 9a8a9d01bcd2e18e5f6240c7aedd0f982734c860 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Mar 2023 18:15:17 -0400
Subject: [PATCH v2 2/2] Add detail to some btree xlog records

---
 src/backend/access/rmgrdesc/nbtdesc.c        | 81 +++++++++++++++++---
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  6 ++
 src/include/access/rmgrdesc_utils.h          |  3 +
 3 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index c5dc543a0f..3f2f19aad7 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -15,6 +15,53 @@
 #include "postgres.h"
 
 #include "access/nbtxlog.h"
+#include "access/rmgrdesc_utils.h"
+
+static void
+btree_update_elem_desc(StringInfo buf, void *restrict update, void *restrict data)
+{
+	xl_btree_update *new_update = (xl_btree_update *) update;
+	OffsetNumber *updated_offset = *((OffsetNumber **) data);
+
+	appendStringInfo(buf, "{ updated offset: %u, ndeleted tids: %u", *updated_offset, new_update->ndeletedtids);
+
+	appendStringInfoString(buf, ", deleted tids:");
+
+	array_desc(buf, (char *) new_update + SizeOfBtreeUpdate,
+			sizeof(uint16), new_update->ndeletedtids, &uint16_elem_desc, NULL);
+
+	updated_offset++;
+
+	appendStringInfo(buf, " }");
+}
+
+static void
+btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted, uint16 nupdated)
+{
+	OffsetNumber *updatedoffsets;
+	xl_btree_update *updates;
+	OffsetNumber *data = (OffsetNumber *) block_data;
+
+	appendStringInfoString(buf, ", deleted:");
+	array_desc(buf, data, sizeof(OffsetNumber), ndeleted, &offset_elem_desc, NULL);
+
+	appendStringInfoString(buf, ", updated:");
+	array_desc(buf, data, sizeof(OffsetNumber), nupdated, &offset_elem_desc, NULL);
+
+	if (nupdated <= 0)
+		return;
+
+	updatedoffsets = (OffsetNumber *)
+		((char *) data + ndeleted * sizeof(OffsetNumber));
+	updates = (xl_btree_update *) ((char *) updatedoffsets +
+								nupdated *
+								sizeof(OffsetNumber));
+
+	appendStringInfoString(buf, ", updates:");
+	array_desc(buf, updates, sizeof(xl_btree_update),
+			nupdated, &btree_update_elem_desc,
+			&updatedoffsets);
+}
 
 void
 btree_desc(StringInfo buf, XLogReaderState *record)
@@ -31,7 +78,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_insert *xlrec = (xl_btree_insert *) rec;
 
-				appendStringInfo(buf, "off %u", xlrec->offnum);
+				appendStringInfo(buf, "off: %u", xlrec->offnum);
 				break;
 			}
 		case XLOG_BTREE_SPLIT_L:
@@ -39,7 +86,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_split *xlrec = (xl_btree_split *) rec;
 
-				appendStringInfo(buf, "level %u, firstrightoff %d, newitemoff %d, postingoff %d",
+				appendStringInfo(buf, "level: %u, firstrightoff: %d, newitemoff: %d, postingoff: %d",
 								 xlrec->level, xlrec->firstrightoff,
 								 xlrec->newitemoff, xlrec->postingoff);
 				break;
@@ -48,31 +95,41 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_dedup *xlrec = (xl_btree_dedup *) rec;
 
-				appendStringInfo(buf, "nintervals %u", xlrec->nintervals);
+				appendStringInfo(buf, "nintervals: %u", xlrec->nintervals);
 				break;
 			}
 		case XLOG_BTREE_VACUUM:
 			{
 				xl_btree_vacuum *xlrec = (xl_btree_vacuum *) rec;
 
-				appendStringInfo(buf, "ndeleted %u; nupdated %u",
-								 xlrec->ndeleted, xlrec->nupdated);
+				appendStringInfo(buf, "ndeleted: %u, nupdated: %u",
+						xlrec->ndeleted, xlrec->nupdated);
+
+				if (!XLogRecHasBlockImage(record, 0))
+					btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+							xlrec->ndeleted, xlrec->nupdated);
+
 				break;
 			}
 		case XLOG_BTREE_DELETE:
 			{
 				xl_btree_delete *xlrec = (xl_btree_delete *) rec;
 
-				appendStringInfo(buf, "snapshotConflictHorizon %u; ndeleted %u; nupdated %u",
+				appendStringInfo(buf, "snapshotConflictHorizon: %u, ndeleted: %u, nupdated: %u",
 								 xlrec->snapshotConflictHorizon,
 								 xlrec->ndeleted, xlrec->nupdated);
+
+				if (!XLogRecHasBlockImage(record, 0))
+					btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+							xlrec->ndeleted, xlrec->nupdated);
+
 				break;
 			}
 		case XLOG_BTREE_MARK_PAGE_HALFDEAD:
 			{
 				xl_btree_mark_page_halfdead *xlrec = (xl_btree_mark_page_halfdead *) rec;
 
-				appendStringInfo(buf, "topparent %u; leaf %u; left %u; right %u",
+				appendStringInfo(buf, "topparent: %u; leaf: %u; left: %u; right: %u",
 								 xlrec->topparent, xlrec->leafblk, xlrec->leftblk, xlrec->rightblk);
 				break;
 			}
@@ -81,11 +138,11 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_unlink_page *xlrec = (xl_btree_unlink_page *) rec;
 
-				appendStringInfo(buf, "left %u; right %u; level %u; safexid %u:%u; ",
+				appendStringInfo(buf, "left: %u; right: %u; level: %u; safexid: %u:%u; ",
 								 xlrec->leftsib, xlrec->rightsib, xlrec->level,
 								 EpochFromFullTransactionId(xlrec->safexid),
 								 XidFromFullTransactionId(xlrec->safexid));
-				appendStringInfo(buf, "leafleft %u; leafright %u; leaftopparent %u",
+				appendStringInfo(buf, "leafleft: %u; leafright: %u; leaftopparent: %u",
 								 xlrec->leafleftsib, xlrec->leafrightsib,
 								 xlrec->leaftopparent);
 				break;
@@ -94,14 +151,14 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_newroot *xlrec = (xl_btree_newroot *) rec;
 
-				appendStringInfo(buf, "lev %u", xlrec->level);
+				appendStringInfo(buf, "lev: %u", xlrec->level);
 				break;
 			}
 		case XLOG_BTREE_REUSE_PAGE:
 			{
 				xl_btree_reuse_page *xlrec = (xl_btree_reuse_page *) rec;
 
-				appendStringInfo(buf, "rel %u/%u/%u; snapshotConflictHorizon %u:%u",
+				appendStringInfo(buf, "rel: %u/%u/%u, snapshotConflictHorizon: %u:%u",
 								 xlrec->locator.spcOid, xlrec->locator.dbOid,
 								 xlrec->locator.relNumber,
 								 EpochFromFullTransactionId(xlrec->snapshotConflictHorizon),
@@ -114,7 +171,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 
 				xlrec = (xl_btree_metadata *) XLogRecGetBlockData(record, 0,
 																  NULL);
-				appendStringInfo(buf, "last_cleanup_num_delpages %u",
+				appendStringInfo(buf, "last_cleanup_num_delpages: %u",
 								 xlrec->last_cleanup_num_delpages);
 				break;
 			}
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index fa45c796a0..26da7e344f 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -86,3 +86,9 @@ relid_desc(StringInfo buf, void *restrict relid, void *restrict data)
 {
 	appendStringInfo(buf, "%u", *(Oid *) relid);
 }
+
+void
+uint16_elem_desc(StringInfo buf, void *restrict value, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(uint16 *) value);
+}
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 7e4fcbdf25..ddf50335b1 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -29,5 +29,8 @@ void
 void
 			relid_desc(StringInfo buf, void *restrict relid, void *restrict data);
 
+void
+			uint16_elem_desc(StringInfo buf, void *restrict value, void *restrict data);
+
 
 #endif							/* RMGRDESC_UTILS_H */
-- 
2.37.2

In reply to: Melanie Plageman (#18)
Re: Show various offset arrays for heap WAL records

On Mon, Mar 13, 2023 at 4:01 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Fri, Jan 27, 2023 at 3:02 PM Robert Haas <robertmhaas@gmail.com> wrote:

I'm not sure what's best in terms of formatting details but I
definitely like the idea of making pg_waldump show more details.

If I'm not mistaken, this would be quite difficult without changing
rm_desc to return some kind of self-describing data type.

I'd say that it would depend on how far you went with it. Basic
information about the tuple wouldn't require any of that. I suggest
leaving this part out for now, though.

So, we can scrap any README or big comment, but are there other changes
to the code or structure you think would avoid it being seen as an
API?

I think that it would be good to try to build something that looks
like an API, while making zero promises about its stability -- at
least until further notice. Kind of like how there are no guarantees
about the stability of internal interfaces within the Linux kernel.

There is no reason to not take a firm position on some things now.
Things like punctuation, and symbol names for generic cross-record
symbols like snapshotConflictHorizon. Many of the differences that
exist now are wholly gratuitous -- just accidents. It would make sense
to standardize-away these clearly unnecessary variations. And to
document the new standard. I'd be surprised if anybody disagreed with
me on this point.

I have added detail to xl_btree_delete and xl_btree_vacuum. I have added
the updated/deleted target offset numbers and the updated tuples
metadata.

I wondered if there was any reason to do xl_btree_dedup deduplication
intervals.

No reason. It wouldn't be hard to cover xl_btree_dedup deduplication
intervals -- each element is a page offset number, and a corresponding
count of index tuples to merge together in the REDO routine. That's
slightly different to anything else, but not in a way that seems like
it requires very much additional effort.

I wanted to include at least a minimal example for those following along
with this thread that would cause creation of one of the record types
which I have enhanced, but I had a little trouble making a reliable
example.

Below is my strategy for getting a Heap PRUNE record with redirects, but
it occasionally doesn't end up working and I wasn't sure why (I can do
more investigation if we think that having some kind of test for this is
useful).

I'm not sure, but offhand I think that there could be a number of
annoying little implementation details that make it hard to come up
with a perfectly reliable test case. Perhaps try it while using VACUUM
VERBOSE, with the proviso that we should only expect the revised
example workflow to show a redirect record as intended when the
VERBOSE output confirms that VACUUM actually ran as expected, in
whatever way. For example, VACUUM can't have failed to acquire a
cleanup lock on a heap page due to the current phase of the moon.
VACUUM shouldn't have its "removable cutoff" held back by
who-knows-what when the test case is run, either.

Some of the tests for VACUUM use a temp table, since they conveniently
cannot have their "removable cutoff" held back -- not since commit
a7212be8. Of course, that strategy won't help you here. Getting VACUUM
to behave very predictably for testing purposes has proven tricky at
times.

I agree, in general, though long term the best approach is one that
has a configurable level of verbosity, with some kind of roughly
uniform definition of verbosity (kinda like DEBUG1 - DEBUG5, though
probably with only 2 or 3 distinct levels).

Given this comment and Robert's concern quoted below, I am wondering if
the consensus is that a lack of verbosity control is a dealbreaker for
adding offsets or not.

There are several different things that seem important to me
personally. These are in tension with each other, to a degree. These
are:

1. Like Andres, I'd really like to have some way of inspecting things
like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
detail. These record types happen to be very important in general, and
the ability to see detailed information about the WAL record would
definitely help with some debugging scenarios. I've really missed
stuff like this while debugging serious issues under time pressure.

2. To a lesser extent I would like to see similar detailed information
for nbtree's DELETE, VACUUM, and possibly DEDUPLICATE record types.
They might also come in handy for debugging, in about the same way.

3. More manageable verbosity.

I think that it would be okay to put off coming up with a solution to
3, for now. 1 and 2 seem more important than 3.

I think if there was a more structured output of rmgrdesc, then this
would also solve the verbosity level problem. Consumers could decide on
their verbosity level -- in various pg_walinspect function outputs, that
would probably just be column selection. For pg_waldump, I imagine that
some kind of parameter or flag would work.

Unless you are suggesting that we add a verbosity parameter to the
rmgrdesc function API now?

The verbosity problem will get somewhat worse if we do just my items 1
and 2, so it would be nice if we at least had a strategy in mind that
delivers on item 3/verbosity -- though the implementation can appear
in a later release. Maybe something simple would work, like promising
to output (say) 30 characters or less in terse mode, and making no
such promise otherwise. Terse mode wouldn't just truncate the output
of verbose mode -- it would never display information that could in
principle exceed the 30 character allowance, even with records that
happen to fall under the limit.

I can't feel too bad about putting this part off. A pager like pspg is
already table stakes when using pg_walinspect in any sort of serious
way. As I said upthread, absurdly wide output is already reasonably
common in most cases.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#19)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Mon, Mar 13, 2023 at 6:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

There are several different things that seem important to me
personally. These are in tension with each other, to a degree. These
are:

1. Like Andres, I'd really like to have some way of inspecting things
like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
detail. These record types happen to be very important in general, and
the ability to see detailed information about the WAL record would
definitely help with some debugging scenarios. I've really missed
stuff like this while debugging serious issues under time pressure.

One problem that I often run into when performing analysis of VACUUM
using pg_walinspect is the issue of *who* pruned which heap page, for
any given PRUNE record. Was it VACUUM/autovacuum, or was it
opportunistic pruning? There is no way of knowing for sure right now.
You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
record coming from VACUUM; it could just have been an opportunistic
prune operation that happened to take place when a SELECT query ran,
before any XID was ever allocated.

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.

I think that it makes sense to treat this within the scope of
Melanie's ongoing work to improve the instrumentation of these records
-- meaning that it's in scope for Postgres 16. Admittedly this is a
slightly creative interpretation, so if others disagree then I won't
argue. This is quite a small patch, though, which makes debugging
significantly easier. I think that there could be a great deal of
utility in being able to easily "pair up" corresponding
"Heap2/PRUNE+BYVACUUM" and "Heap2/VACUUM" records in debugging
scenarios. I can imagine linking these to "Heap2/FREEZE_PAGE" and
"Heap2/VISIBLE" records, too, since they're all closely related record
types.

--
Peter Geoghegan

Attachments:

v1-0001-Record-which-PRUNE-records-are-from-VACUUM.patchapplication/octet-stream; name=v1-0001-Record-which-PRUNE-records-are-from-VACUUM.patchDownload
From b584717e9de72c1ac084698453667cb301551e36 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Tue, 21 Mar 2023 14:40:43 -0700
Subject: [PATCH v1] Record which PRUNE records are from VACUUM.

---
 src/include/access/heapam.h            |  2 +-
 src/include/access/heapam_xlog.h       |  8 +++++++-
 src/backend/access/heap/pruneheap.c    | 25 +++++++++++++++----------
 src/backend/access/rmgrdesc/heapdesc.c |  3 +++
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf502651..2eedd7efd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -289,7 +289,7 @@ extern int	heap_page_prune(Relation relation, Buffer buffer,
 							TransactionId old_snap_xmin,
 							TimestampTz old_snap_ts,
 							int *nnewlpdead,
-							OffsetNumber *off_loc);
+							OffsetNumber *off_loc_vacuum);
 extern void heap_page_prune_execute(Buffer buffer,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index a2c67d1cd..3372d1938 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -48,7 +48,7 @@
  * We ran out of opcodes, so heapam.c now has a second RmgrId.  These opcodes
  * are associated with RM_HEAP2_ID, but are not logically different from
  * the ones above associated with RM_HEAP_ID.  XLOG_HEAP_OPMASK applies to
- * these, too.
+ * these, too.  The BYVACUUM bit is used in place of RM_HEAP_ID's init bit.
  */
 #define XLOG_HEAP2_REWRITE		0x00
 #define XLOG_HEAP2_PRUNE		0x10
@@ -59,6 +59,12 @@
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
 
+/*
+ * RM_HEAP2_ID operations that take place during VACUUM set the BYVACUUM bit
+ * as instrumentation.  Only set in XLOG_HEAP2_PRUNE records, for now.
+ */
+#define XLOG_HEAP2_BYVACUUM		0x80
+
 /*
  * xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
  */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4e65cbcad..5f4ef384e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -257,8 +257,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * Sets *nnewlpdead for caller, indicating the number of items that were
  * newly set LP_DEAD during prune operation.
  *
- * off_loc is the offset location required by the caller to use in error
- * callback.
+ * off_loc_vacuum is the offset location required by VACUUM caller to use in
+ * error callback.  Opportunistic pruning caller should pass NULL.
  *
  * Returns the number of tuples deleted from the page during this call.
  */
@@ -268,7 +268,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 				TransactionId old_snap_xmin,
 				TimestampTz old_snap_ts,
 				int *nnewlpdead,
-				OffsetNumber *off_loc)
+				OffsetNumber *off_loc_vacuum)
 {
 	int			ndeleted = 0;
 	Page		page = BufferGetPage(buffer);
@@ -345,8 +345,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		if (off_loc)
-			*off_loc = offnum;
+		if (off_loc_vacuum)
+			*off_loc_vacuum = offnum;
 
 		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
 														   buffer);
@@ -364,8 +364,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 			continue;
 
 		/* see preceding loop */
-		if (off_loc)
-			*off_loc = offnum;
+		if (off_loc_vacuum)
+			*off_loc_vacuum = offnum;
 
 		/* Nothing to do if slot is empty or already dead */
 		itemid = PageGetItemId(page, offnum);
@@ -377,8 +377,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 	}
 
 	/* Clear the offset information once we have processed the given page. */
-	if (off_loc)
-		*off_loc = InvalidOffsetNumber;
+	if (off_loc_vacuum)
+		*off_loc_vacuum = InvalidOffsetNumber;
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -416,6 +416,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (RelationNeedsWAL(relation))
 		{
 			xl_heap_prune xlrec;
+			uint8		info = XLOG_HEAP2_PRUNE;
 			XLogRecPtr	recptr;
 
 			xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
@@ -445,7 +446,11 @@ heap_page_prune(Relation relation, Buffer buffer,
 				XLogRegisterBufData(0, (char *) prstate.nowunused,
 									prstate.nunused * sizeof(OffsetNumber));
 
-			recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+			/* Set bit indicating pruning by VACUUM where appropriate */
+			if (off_loc_vacuum)
+				info |= XLOG_HEAP2_BYVACUUM;
+
+			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(BufferGetPage(buffer), recptr);
 		}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e821..9231b7411 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -235,6 +235,9 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE:
 			id = "PRUNE";
 			break;
+		case XLOG_HEAP2_PRUNE | XLOG_HEAP2_BYVACUUM:
+			id = "PRUNE+BYVACUUM";
+			break;
 		case XLOG_HEAP2_VACUUM:
 			id = "VACUUM";
 			break;
-- 
2.39.2

In reply to: Peter Geoghegan (#20)
Re: Show various offset arrays for heap WAL records

On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan <pg@bowt.ie> wrote:

One problem that I often run into when performing analysis of VACUUM
using pg_walinspect is the issue of *who* pruned which heap page, for
any given PRUNE record. Was it VACUUM/autovacuum, or was it
opportunistic pruning? There is no way of knowing for sure right now.
You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
record coming from VACUUM; it could just have been an opportunistic
prune operation that happened to take place when a SELECT query ran,
before any XID was ever allocated.

In case it's unclear how much of a problem this can be, here's an example:

The misc.sql regression test does a bulk update of the table "onek". A
little later, one of the queries that appears under the section "copy"
from the same file SELECTs from "onek". This produces a succession of
opportunistic prune records that look exactly like what you'd expect from
a VACUUM when viewed through pg_walinspect (without this patch). Each
PRUNE record has XID 0. The records appear in ascending heap block
number order, since there is a sequential scan involved (we go through
heapgetpage() to get to heap_page_prune_opt(), where the query prunes
opportunistically).

Another slightly surprising fact revealed by the patch is the ratio of
opportunistic prunes ("Heap2/PRUNE") to prunes run during VACUUM
("Heap2/PRUNE+BYVACUUM") with the regression tests:

│ resource_manager/record_type │ Heap2/PRUNE │
│ count │ 4,521 │
│ count_perc │ 0.220 │
│ rec_size │ 412,442 │
│ avg_rec_size │ 91 │
│ rec_size_perc │ 0.194 │
│ fpi_size │ 632,828 │
│ fpi_size_perc │ 1.379 │
│ combined_size │ 1,045,270 │
│ combined_size_perc │ 0.404 │
├─[ RECORD 61 ]────────────────┼─────────────────────────────┤
│ resource_manager/record_type │ Heap2/PRUNE+BYVACUUM │
│ count │ 2,784 │
│ count_perc │ 0.135 │
│ rec_size │ 467,057 │
│ avg_rec_size │ 167 │
│ rec_size_perc │ 0.219 │
│ fpi_size │ 546,344 │
│ fpi_size_perc │ 1.190 │
│ combined_size │ 1,013,401 │
│ combined_size_perc │ 0.391 │
├─[ RECORD 62 ]────────────────┼─────────────────────────────┤
│ resource_manager/record_type │ Heap2/VACUUM │
│ count │ 3,463 │
│ count_perc │ 0.168 │
│ rec_size │ 610,038 │
│ avg_rec_size │ 176 │
│ rec_size_perc │ 0.286 │
│ fpi_size │ 893,964 │
│ fpi_size_perc │ 1.948 │
│ combined_size │ 1,504,002 │
│ combined_size_perc │ 0.581 │
├─[ RECORD 63 ]────────────────┼─────────────────────────────┤
│ resource_manager/record_type │ Heap2/VISIBLE │
│ count │ 7,293 │
│ count_perc │ 0.354 │
│ rec_size │ 431,382 │
│ avg_rec_size │ 59 │
│ rec_size_perc │ 0.202 │
│ fpi_size │ 1,794,048 │
│ fpi_size_perc │ 3.909 │
│ combined_size │ 2,225,430 │
│ combined_size_perc │ 0.859 │

--
Peter Geoghegan

#22Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#21)
Re: Show various offset arrays for heap WAL records

On Mon, Mar 13, 2023 at 9:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Mar 13, 2023 at 4:01 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

I have added detail to xl_btree_delete and xl_btree_vacuum. I have added
the updated/deleted target offset numbers and the updated tuples
metadata.

I wondered if there was any reason to do xl_btree_dedup deduplication
intervals.

No reason. It wouldn't be hard to cover xl_btree_dedup deduplication
intervals -- each element is a page offset number, and a corresponding
count of index tuples to merge together in the REDO routine. That's
slightly different to anything else, but not in a way that seems like
it requires very much additional effort.

I went to add dedup records and noticed that since the actual
BTDedupInterval struct is what is put in the xlog, I would need access
to that type from nbtdesc.c, however, including nbtree.h doesn't seem to
work because it includes files that cannot be included in frontend code.

I, of course, could make some local struct in nbtdesc.c which has an
OffsetNumber and a uint16, since the BTDedupInterval is pretty
straightforward, but that seems a bit annoying.
I'm probably missing something obvious, but is there a better way to do
this?

On another note, I've thought about how to include some example output
in docs, and, for example we could modify the example output in the
pgwalinspect docs which includes a PRUNE record already for
pg_get_wal_record_info() docs. We'd probably just want to keep it short.

- Melanie

In reply to: Melanie Plageman (#22)
Re: Show various offset arrays for heap WAL records

On Mon, Mar 27, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I went to add dedup records and noticed that since the actual
BTDedupInterval struct is what is put in the xlog, I would need access
to that type from nbtdesc.c, however, including nbtree.h doesn't seem to
work because it includes files that cannot be included in frontend code.

I suppose that the BTDedupInterval struct could have just as easily
gone in nbtxlog.h, next to xl_btree_dedup. There might have been a
moment where I thought about doing it that way, but I guess I found it
slightly preferable to use that symbol name (BTDedupInterval) rather
than (say) xl_btree_dedup_interval in places like the nearby
BTDedupStateData struct.

Actually, I suppose that it's hard to make that alternative work, at
least without
including nbtxlog.h in nbtree.h. Which sounds wrong.

I, of course, could make some local struct in nbtdesc.c which has an
OffsetNumber and a uint16, since the BTDedupInterval is pretty
straightforward, but that seems a bit annoying.
I'm probably missing something obvious, but is there a better way to do
this?

It was probably just one of those cases where I settled on the
arrangement that looked least odd overall. Not a particularly
principled approach. But the approach that I'm going to take once more
here. ;-)

All of the available alternatives are annoying in roughly the same
way, though perhaps to varying degrees. All except one: I'm okay with
just not adding coverage for deduplication records, for the time being
-- just seeing the number of intervals alone is relatively informative
with deduplication records, unlike (say) nbtree delete records. I'm
also okay with having coverage for dedup records if you feel it's
worth having. Your call.

If we're going to have coverage for deduplication records then it
seems to me that we have to have a struct in nbtxlog.h for your code
to work off of. It also seems likely that we'll want to use that same
struct within nbtxlog.c. What's less clear is what that means for the
BTDedupInterval struct. I don't think that we should include nbtxlog.h
in nbtree.h, nor should we do the converse.

I guess maybe two identical structs would be okay. BTDedupInterval,
and xl_btree_dedup_interval, with the former still used in nbtdedup.c,
and the latter used through a pointer at the point that nbtxlog.c
reads a dedup record. Then maybe at a sizeof() static assert beside
the existing btree_xlog_dedup() assertions that check that the dedup
state interval array matches the array taken from the WAL record.
That's still a bit weird, but I find it preferable to any alternative
that I can think of.

On another note, I've thought about how to include some example output
in docs, and, for example we could modify the example output in the
pgwalinspect docs which includes a PRUNE record already for
pg_get_wal_record_info() docs. We'd probably just want to keep it short.

Yeah. Perhaps a PRUNE record for one of the system catalogs whose
relfilenode is relatively recognizable. Say pg_class. It probably
doesn't matter that much, but there is perhaps some small value in
picking an example that is relatively easy to recreate later on (or to
approximately recreate). I'm certainly not insisting on that, though.

--
Peter Geoghegan

#24Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#23)
2 attachment(s)
Re: Show various offset arrays for heap WAL records

Attached v3 is cleaned up and includes a pg_walinspect docs update as
well as some edited comments in rmgr_utils.c

On Mon, Mar 27, 2023 at 6:27 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Mar 27, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I went to add dedup records and noticed that since the actual
BTDedupInterval struct is what is put in the xlog, I would need access
to that type from nbtdesc.c, however, including nbtree.h doesn't seem to
work because it includes files that cannot be included in frontend code.

I suppose that the BTDedupInterval struct could have just as easily
gone in nbtxlog.h, next to xl_btree_dedup. There might have been a
moment where I thought about doing it that way, but I guess I found it
slightly preferable to use that symbol name (BTDedupInterval) rather
than (say) xl_btree_dedup_interval in places like the nearby
BTDedupStateData struct.

Actually, I suppose that it's hard to make that alternative work, at
least without
including nbtxlog.h in nbtree.h. Which sounds wrong.

I, of course, could make some local struct in nbtdesc.c which has an
OffsetNumber and a uint16, since the BTDedupInterval is pretty
straightforward, but that seems a bit annoying.
I'm probably missing something obvious, but is there a better way to do
this?

It was probably just one of those cases where I settled on the
arrangement that looked least odd overall. Not a particularly
principled approach. But the approach that I'm going to take once more
here. ;-)

All of the available alternatives are annoying in roughly the same
way, though perhaps to varying degrees. All except one: I'm okay with
just not adding coverage for deduplication records, for the time being
-- just seeing the number of intervals alone is relatively informative
with deduplication records, unlike (say) nbtree delete records. I'm
also okay with having coverage for dedup records if you feel it's
worth having. Your call.

If we're going to have coverage for deduplication records then it
seems to me that we have to have a struct in nbtxlog.h for your code
to work off of. It also seems likely that we'll want to use that same
struct within nbtxlog.c. What's less clear is what that means for the
BTDedupInterval struct. I don't think that we should include nbtxlog.h
in nbtree.h, nor should we do the converse.

I guess maybe two identical structs would be okay. BTDedupInterval,
and xl_btree_dedup_interval, with the former still used in nbtdedup.c,
and the latter used through a pointer at the point that nbtxlog.c
reads a dedup record. Then maybe at a sizeof() static assert beside
the existing btree_xlog_dedup() assertions that check that the dedup
state interval array matches the array taken from the WAL record.
That's still a bit weird, but I find it preferable to any alternative
that I can think of.

I've omitted enhancements for the dedup record type for now.

On another note, I've thought about how to include some example output
in docs, and, for example we could modify the example output in the
pgwalinspect docs which includes a PRUNE record already for
pg_get_wal_record_info() docs. We'd probably just want to keep it short.

Yeah. Perhaps a PRUNE record for one of the system catalogs whose
relfilenode is relatively recognizable. Say pg_class. It probably
doesn't matter that much, but there is perhaps some small value in
picking an example that is relatively easy to recreate later on (or to
approximately recreate). I'm certainly not insisting on that, though.

I've added such an example to pg_walinspect docs.

On Tue, Mar 21, 2023 at 6:37 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Mar 13, 2023 at 6:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

There are several different things that seem important to me
personally. These are in tension with each other, to a degree. These
are:

1. Like Andres, I'd really like to have some way of inspecting things
like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
detail. These record types happen to be very important in general, and
the ability to see detailed information about the WAL record would
definitely help with some debugging scenarios. I've really missed
stuff like this while debugging serious issues under time pressure.

One problem that I often run into when performing analysis of VACUUM
using pg_walinspect is the issue of *who* pruned which heap page, for
any given PRUNE record. Was it VACUUM/autovacuum, or was it
opportunistic pruning? There is no way of knowing for sure right now.
You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
record coming from VACUUM; it could just have been an opportunistic
prune operation that happened to take place when a SELECT query ran,
before any XID was ever allocated.

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.

I think that it makes sense to treat this within the scope of
Melanie's ongoing work to improve the instrumentation of these records
-- meaning that it's in scope for Postgres 16. Admittedly this is a
slightly creative interpretation, so if others disagree then I won't
argue. This is quite a small patch, though, which makes debugging
significantly easier. I think that there could be a great deal of
utility in being able to easily "pair up" corresponding
"Heap2/PRUNE+BYVACUUM" and "Heap2/VACUUM" records in debugging
scenarios. I can imagine linking these to "Heap2/FREEZE_PAGE" and
"Heap2/VISIBLE" records, too, since they're all closely related record
types.

I really like this idea and would find it useful. I reviewed the patch
and tried it out and it worked for me and code looked fine as well.

I didn't include it in the attached patchset because I don't feel
confident enough in my own understanding of any potential implications
of splitting up these record types to definitively endorse it. But, if
someone else felt comfortable with it, I would like to see it in the
tree.

- Melanie

Attachments:

v1-0001-Record-which-PRUNE-records-are-from-VACUUM.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Record-which-PRUNE-records-are-from-VACUUM.patchDownload
From b584717e9de72c1ac084698453667cb301551e36 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Tue, 21 Mar 2023 14:40:43 -0700
Subject: [PATCH v1] Record which PRUNE records are from VACUUM.

---
 src/include/access/heapam.h            |  2 +-
 src/include/access/heapam_xlog.h       |  8 +++++++-
 src/backend/access/heap/pruneheap.c    | 25 +++++++++++++++----------
 src/backend/access/rmgrdesc/heapdesc.c |  3 +++
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf502651..2eedd7efd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -289,7 +289,7 @@ extern int	heap_page_prune(Relation relation, Buffer buffer,
 							TransactionId old_snap_xmin,
 							TimestampTz old_snap_ts,
 							int *nnewlpdead,
-							OffsetNumber *off_loc);
+							OffsetNumber *off_loc_vacuum);
 extern void heap_page_prune_execute(Buffer buffer,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index a2c67d1cd..3372d1938 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -48,7 +48,7 @@
  * We ran out of opcodes, so heapam.c now has a second RmgrId.  These opcodes
  * are associated with RM_HEAP2_ID, but are not logically different from
  * the ones above associated with RM_HEAP_ID.  XLOG_HEAP_OPMASK applies to
- * these, too.
+ * these, too.  The BYVACUUM bit is used in place of RM_HEAP_ID's init bit.
  */
 #define XLOG_HEAP2_REWRITE		0x00
 #define XLOG_HEAP2_PRUNE		0x10
@@ -59,6 +59,12 @@
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
 
+/*
+ * RM_HEAP2_ID operations that take place during VACUUM set the BYVACUUM bit
+ * as instrumentation.  Only set in XLOG_HEAP2_PRUNE records, for now.
+ */
+#define XLOG_HEAP2_BYVACUUM		0x80
+
 /*
  * xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
  */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4e65cbcad..5f4ef384e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -257,8 +257,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * Sets *nnewlpdead for caller, indicating the number of items that were
  * newly set LP_DEAD during prune operation.
  *
- * off_loc is the offset location required by the caller to use in error
- * callback.
+ * off_loc_vacuum is the offset location required by VACUUM caller to use in
+ * error callback.  Opportunistic pruning caller should pass NULL.
  *
  * Returns the number of tuples deleted from the page during this call.
  */
@@ -268,7 +268,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 				TransactionId old_snap_xmin,
 				TimestampTz old_snap_ts,
 				int *nnewlpdead,
-				OffsetNumber *off_loc)
+				OffsetNumber *off_loc_vacuum)
 {
 	int			ndeleted = 0;
 	Page		page = BufferGetPage(buffer);
@@ -345,8 +345,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		if (off_loc)
-			*off_loc = offnum;
+		if (off_loc_vacuum)
+			*off_loc_vacuum = offnum;
 
 		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
 														   buffer);
@@ -364,8 +364,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 			continue;
 
 		/* see preceding loop */
-		if (off_loc)
-			*off_loc = offnum;
+		if (off_loc_vacuum)
+			*off_loc_vacuum = offnum;
 
 		/* Nothing to do if slot is empty or already dead */
 		itemid = PageGetItemId(page, offnum);
@@ -377,8 +377,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 	}
 
 	/* Clear the offset information once we have processed the given page. */
-	if (off_loc)
-		*off_loc = InvalidOffsetNumber;
+	if (off_loc_vacuum)
+		*off_loc_vacuum = InvalidOffsetNumber;
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -416,6 +416,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (RelationNeedsWAL(relation))
 		{
 			xl_heap_prune xlrec;
+			uint8		info = XLOG_HEAP2_PRUNE;
 			XLogRecPtr	recptr;
 
 			xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
@@ -445,7 +446,11 @@ heap_page_prune(Relation relation, Buffer buffer,
 				XLogRegisterBufData(0, (char *) prstate.nowunused,
 									prstate.nunused * sizeof(OffsetNumber));
 
-			recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+			/* Set bit indicating pruning by VACUUM where appropriate */
+			if (off_loc_vacuum)
+				info |= XLOG_HEAP2_BYVACUUM;
+
+			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(BufferGetPage(buffer), recptr);
 		}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e821..9231b7411 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -235,6 +235,9 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE:
 			id = "PRUNE";
 			break;
+		case XLOG_HEAP2_PRUNE | XLOG_HEAP2_BYVACUUM:
+			id = "PRUNE+BYVACUUM";
+			break;
 		case XLOG_HEAP2_VACUUM:
 			id = "VACUUM";
 			break;
-- 
2.39.2

v3-0001-Add-rmgr_desc-utilities.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Add-rmgr_desc-utilities.patchDownload
From 5b33b8fa323db65f3cf3b0dcc343071a7975b649 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 25 Jan 2023 08:38:38 -0500
Subject: [PATCH v3 1/2] Add rmgr_desc utilities

- Begin to standardize format of rmgr_desc output, adding a new
  rmgr_utils file with comments about expected format.
- Add helper which prints arrays in a standard format and use it in
  heapdesc to print out offset arrays, relid arrays, freeze plan arrays,
  and redirect arrays.

Suggested by Andres Freund

Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de
---
 doc/src/sgml/pgwalinspect.sgml               |  14 +-
 src/backend/access/rmgrdesc/Makefile         |   1 +
 src/backend/access/rmgrdesc/heapdesc.c       | 152 +++++++++++++++----
 src/backend/access/rmgrdesc/meson.build      |   1 +
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  82 ++++++++++
 src/bin/pg_waldump/Makefile                  |   2 +-
 src/include/access/rmgrdesc_utils.h          |  29 ++++
 7 files changed, 245 insertions(+), 36 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/rmgrdesc_utils.c
 create mode 100644 src/include/access/rmgrdesc_utils.h

diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index d9ed8f0a9a..7fb089e83a 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -110,18 +110,18 @@ block_ref        | blkref #0: rel 1663/5/60221 fork main blk 2
       Returns one row per WAL record.  For example:
 <screen>
 postgres=# SELECT * FROM pg_get_wal_records_info('0/1E913618', '0/1E913740') LIMIT 1;
--[ RECORD 1 ]----+--------------------------------------------------------------
+-[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------------
 start_lsn        | 0/1E913618
 end_lsn          | 0/1E913650
 prev_lsn         | 0/1E9135A0
 xid              | 0
-resource_manager | Standby
-record_type      | RUNNING_XACTS
-record_length    | 50
-main_data_length | 24
+resource_manager | Heap2
+record_type      | PRUNE
+record_length    | 61
+main_data_length | 9
 fpi_length       | 0
-description      | nextXid 33775 latestCompletedXid 33774 oldestRunningXid 33775
-block_ref        |
+description      | snapshotConflictHorizon: 754, nredirected: 1, ndead: 1, nunused: 0, redirected: [ 1->5, 6->5 ], dead: [ 6 ], unused: []
+block_ref        | blkref #0: rel 1663/5/1259 fork main blk 1
 </screen>
      </para>
      <para>
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..cd95eec37f 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	nbtdesc.o \
 	relmapdesc.o \
 	replorigindesc.o \
+	rmgrdesc_utils.o \
 	seqdesc.o \
 	smgrdesc.o \
 	spgdesc.o \
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e8215..1041db239f 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -15,20 +15,53 @@
 #include "postgres.h"
 
 #include "access/heapam_xlog.h"
+#include "access/rmgrdesc_utils.h"
+
 
 static void
 out_infobits(StringInfo buf, uint8 infobits)
 {
+	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
+		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
+		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
+		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
+		(infobits & XLHL_KEYS_UPDATED) == 0)
+		return;
+
+	appendStringInfoString(buf, ", infobits: [");
+
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, "IS_MULTI ");
+		appendStringInfoString(buf, " IS_MULTI");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, "LOCK_ONLY ");
+		appendStringInfoString(buf, ", LOCK_ONLY");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, "EXCL_LOCK ");
+		appendStringInfoString(buf, ", EXCL_LOCK");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, "KEYSHR_LOCK ");
+		appendStringInfoString(buf, ", KEYSHR_LOCK");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, "KEYS_UPDATED ");
+		appendStringInfoString(buf, ", KEYS_UPDATED");
+
+	appendStringInfoString(buf, " ]");
+}
+
+static void
+plan_elem_desc(StringInfo buf, void *restrict plan, void *restrict data)
+{
+	xl_heap_freeze_plan *new_plan = (xl_heap_freeze_plan *) plan;
+	OffsetNumber **offsets = data;
+
+	appendStringInfo(buf, "{ xmax: %u, infomask: %u, infomask2: %u, ntuples: %u",
+					 new_plan->xmax, new_plan->t_infomask,
+					 new_plan->t_infomask2,
+					 new_plan->ntuples);
+
+	appendStringInfoString(buf, ", offsets:");
+	array_desc(buf, *offsets, sizeof(OffsetNumber), new_plan->ntuples,
+			   &offset_elem_desc, NULL);
+
+	*offsets += new_plan->ntuples;
+
+	appendStringInfo(buf, " }");
 }
 
 void
@@ -42,14 +75,15 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_insert *xlrec = (xl_heap_insert *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X", xlrec->offnum,
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+						 xlrec->offnum,
 						 xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_DELETE)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
 						 xlrec->offnum,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
@@ -58,12 +92,12 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
@@ -71,39 +105,41 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax: %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
 	else if (info == XLOG_HEAP_TRUNCATE)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
-		int			i;
 
+		appendStringInfoString(buf, "flags: [");
 		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, "cascade ");
+			appendStringInfoString(buf, " CASCADE");
 		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, "restart_seqs ");
-		appendStringInfo(buf, "nrelids %u relids", xlrec->nrelids);
-		for (i = 0; i < xlrec->nrelids; i++)
-			appendStringInfo(buf, " %u", xlrec->relids[i]);
+			appendStringInfoString(buf, ", RESTART_SEQS");
+		appendStringInfoString(buf, " ]");
+
+		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
+		appendStringInfoString(buf, ", relids:");
+		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids, &relid_desc, NULL);
 	}
 	else if (info == XLOG_HEAP_CONFIRM)
 	{
 		xl_heap_confirm *xlrec = (xl_heap_confirm *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 	else if (info == XLOG_HEAP_LOCK)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off %u: xid %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -111,9 +147,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_inplace *xlrec = (xl_heap_inplace *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 }
+
 void
 heap2_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -125,43 +162,102 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_prune *xlrec = (xl_heap_prune *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nredirected %u ndead %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u",
 						 xlrec->snapshotConflictHorizon,
 						 xlrec->nredirected,
 						 xlrec->ndead);
+
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *end;
+			OffsetNumber *redirected;
+			OffsetNumber *nowdead;
+			OffsetNumber *nowunused;
+			int			nredirected;
+			int			nunused;
+			Size		datalen;
+
+			redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			nredirected = xlrec->nredirected;
+			end = (OffsetNumber *) ((char *) redirected + datalen);
+			nowdead = redirected + (nredirected * 2);
+			nowunused = nowdead + xlrec->ndead;
+			nunused = (end - nowunused);
+			Assert(nunused >= 0);
+
+			appendStringInfo(buf, ", nunused: %u", nunused);
+
+			appendStringInfoString(buf, ", redirected:");
+			array_desc(buf, redirected, sizeof(OffsetNumber) * 2, nredirected * 2, &redirect_elem_desc, NULL);
+			appendStringInfoString(buf, ", dead:");
+			array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead, &offset_elem_desc, NULL);
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_VACUUM)
 	{
 		xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
 
-		appendStringInfo(buf, "nunused %u", xlrec->nunused);
+		appendStringInfo(buf, "nunused: %u", xlrec->nunused);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *nowunused;
+
+			nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, NULL);
+
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
 		xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nplans %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u",
 						 xlrec->snapshotConflictHorizon, xlrec->nplans);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			xl_heap_freeze_plan *plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
+			OffsetNumber *offsets = (OffsetNumber *) &plans[xlrec->nplans];
+
+			appendStringInfoString(buf, ", plans:");
+			array_desc(buf,
+					   plans,
+					   sizeof(xl_heap_freeze_plan),
+					   xlrec->nplans,
+					   &plan_elem_desc,
+					   &offsets);
+
+		}
 	}
 	else if (info == XLOG_HEAP2_VISIBLE)
 	{
 		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u flags 0x%02X",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
 						 xlrec->snapshotConflictHorizon, xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
+		bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 
-		appendStringInfo(buf, "%d tuples flags 0x%02X", xlrec->ntuples,
+		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
+		appendStringInfoString(buf, ", offsets:");
+		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+			array_desc(buf, xlrec->offsets, sizeof(OffsetNumber), xlrec->ntuples, &offset_elem_desc, NULL);
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off %u: xmax %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->xmax, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -169,13 +265,13 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
 
-		appendStringInfo(buf, "rel %u/%u/%u; tid %u/%u",
+		appendStringInfo(buf, "rel: %u/%u/%u, tid: %u/%u",
 						 xlrec->target_locator.spcOid,
 						 xlrec->target_locator.dbOid,
 						 xlrec->target_locator.relNumber,
 						 ItemPointerGetBlockNumber(&(xlrec->target_tid)),
 						 ItemPointerGetOffsetNumber(&(xlrec->target_tid)));
-		appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+		appendStringInfo(buf, ", cmin: %u, cmax: %u, combo: %u",
 						 xlrec->cmin, xlrec->cmax, xlrec->combocid);
 	}
 }
diff --git a/src/backend/access/rmgrdesc/meson.build b/src/backend/access/rmgrdesc/meson.build
index 166cee67b6..f76e87e2d7 100644
--- a/src/backend/access/rmgrdesc/meson.build
+++ b/src/backend/access/rmgrdesc/meson.build
@@ -16,6 +16,7 @@ rmgr_desc_sources = files(
   'nbtdesc.c',
   'relmapdesc.c',
   'replorigindesc.c',
+  'rmgrdesc_utils.c',
   'seqdesc.c',
   'smgrdesc.c',
   'spgdesc.c',
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
new file mode 100644
index 0000000000..2bfd11cb6e
--- /dev/null
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -0,0 +1,82 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.c
+ *		support functions for rmgrdesc
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/rmgrdesc/rmgrdesc_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/rmgrdesc_utils.h"
+
+/*
+ * The expected formatting of a desc function for an xlog record is as follows:
+ * member1_name: member1_value, member2_name: member2_value
+ * If the value is a list, please use
+ * member3_name: [ member3_list_value1, member3_list_value2 ]
+ *
+ * The first item appended to the string should not be prepended by any spaces
+ * or comma, however all subsequent appends to the string are responsible for
+ * prepending themselves with a comma followed by a space.
+ *
+ * Arrays should have a space between the opening square bracket and first
+ * element and between the last element and closing brace.
+ *
+ * Flags should be in all caps.
+ *
+ * For lists/arrays of items, the number of those items should be listed at the
+ * beginning with all of the other numbers.
+ *
+ * List punctuation should still be included even if there are 0 items.
+ *
+ * Composite objects in a list should be surrounded with { }.
+ */
+
+void
+array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+		   void (*elem_desc) (StringInfo buf, void *restrict elem, void *restrict data),
+		   void *restrict data)
+{
+	if (count == 0)
+	{
+		appendStringInfoString(buf, " []");
+		return;
+	}
+	appendStringInfo(buf, " [");
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ",");
+		appendStringInfoString(buf, " ");
+
+		elem_desc(buf, (char *) array + elem_size * i, data);
+	}
+	appendStringInfoString(buf, " ]");
+}
+
+void
+offset_elem_desc(StringInfo buf, void *restrict offset, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(OffsetNumber *) offset);
+}
+
+void
+redirect_elem_desc(StringInfo buf, void *restrict offset, void *restrict data)
+{
+	OffsetNumber *new_offset = (OffsetNumber *) offset;
+
+	appendStringInfo(buf, "%u->%u", new_offset[0], new_offset[1]);
+}
+
+void
+relid_desc(StringInfo buf, void *restrict relid, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(Oid *) relid);
+}
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index d6459e17c7..61c9de470c 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -18,7 +18,7 @@ OBJS = \
 
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 
-RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c)))
+RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c) $(top_srcdir)/src/backend/access/rmgrdesc/rmgrdesc_utils.c))
 RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
 
 
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
new file mode 100644
index 0000000000..763ae81859
--- /dev/null
+++ b/src/include/access/rmgrdesc_utils.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.h
+ *	  helper utilities for rmgrdesc
+ *
+ * Copyright (c) 2023 PostgreSQL Global Development Group
+ *
+ * src/include/access/rmgrdesc_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RMGRDESC_UTILS_H_
+#define RMGRDESC_UTILS_H_
+
+#include "storage/off.h"
+#include "access/heapam_xlog.h"
+
+extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+					   void (*elem_desc) (StringInfo buf, void *restrict elem, void *restrict data),
+					   void *restrict data);
+
+extern void offset_elem_desc(StringInfo buf, void *restrict offset, void *restrict data);
+
+extern void redirect_elem_desc(StringInfo buf, void *restrict offset, void *restrict data);
+
+extern void relid_desc(StringInfo buf, void *restrict relid, void *restrict data);
+
+
+#endif							/* RMGRDESC_UTILS_H */
-- 
2.37.2

In reply to: Melanie Plageman (#24)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Fri, Apr 7, 2023 at 1:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Attached v3 is cleaned up and includes a pg_walinspect docs update as
well as some edited comments in rmgr_utils.c

Attached v4 has some small tweaks on your v3. Mostly just whitespace
tweaks. Two slightly notable tweaks:

* I changed the approach to globbing in the Makefile, rather than use
your original overwide formulation for the new rmgrdesc_utils.c file.

What do you think of this approach?

* Removed use of the restrict keyword.

While "restrict" is C99, I'm not completely sure that it's totally
supported by Postgres. I'm a bit surprised that you opted to use it in
this particular patch.

I meant to ask you about this earlier...why use restrict in this patch?

I've added such an example to pg_walinspect docs.

There already was a PRUNE example, though -- for the
pg_get_wal_record_info function (singular, not to be confused with
pg_get_wal_records_info).

v4 makes the example a VACUUM record, which replaces the previous
pg_get_wal_record_info PRUNE example -- that needed to be updated
anyway. This approach has the advantage of not being too verbose,
which still showing some of this kind of detail.

This has the advantage of allowing pg_get_wal_records_info's example
to continue to be an example that lacks a block reference (and so has
a NULL block_ref). This is a useful contrast against the new
pg_get_wal_block_info function.

I really like this idea and would find it useful. I reviewed the patch
and tried it out and it worked for me and code looked fine as well.

I didn't include it in the attached patchset because I don't feel
confident enough in my own understanding of any potential implications
of splitting up these record types to definitively endorse it. But, if
someone else felt comfortable with it, I would like to see it in the
tree.

I'm not going to move on it now for 16, given the lack of feedback about it.

--
Peter Geoghegan

Attachments:

v4-0001-Add-rmgr_desc-utilities.patchapplication/octet-stream; name=v4-0001-Add-rmgr_desc-utilities.patchDownload
From 88934d500a11e2dc1a20d39a16af1e6fe850a859 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 7 Apr 2023 14:42:54 -0700
Subject: [PATCH v4] Add rmgr_desc utilities

- Begin to standardize format of rmgr_desc output, adding a new
  rmgr_utils file with comments about expected format.
- Add helper which prints arrays in a standard format and use it in
  heapdesc to print out offset arrays, relid arrays, freeze plan arrays,
  and redirect arrays.

Suggested by Andres Freund

Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de
---
 src/include/access/rmgrdesc_utils.h          |  25 +++
 src/backend/access/rmgrdesc/Makefile         |   1 +
 src/backend/access/rmgrdesc/heapdesc.c       | 159 +++++++++++++++----
 src/backend/access/rmgrdesc/meson.build      |   1 +
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  83 ++++++++++
 src/bin/pg_waldump/Makefile                  |   2 +-
 doc/src/sgml/pgwalinspect.sgml               |  20 +--
 7 files changed, 252 insertions(+), 39 deletions(-)
 create mode 100644 src/include/access/rmgrdesc_utils.h
 create mode 100644 src/backend/access/rmgrdesc/rmgrdesc_utils.c

diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
new file mode 100644
index 000000000..58da418bb
--- /dev/null
+++ b/src/include/access/rmgrdesc_utils.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.h
+ *	  helper utilities for rmgrdesc
+ *
+ * Copyright (c) 2023 PostgreSQL Global Development Group
+ *
+ * src/include/access/rmgrdesc_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RMGRDESC_UTILS_H_
+#define RMGRDESC_UTILS_H_
+
+#include "storage/off.h"
+#include "access/heapam_xlog.h"
+
+extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
+					   void *data);
+extern void offset_elem_desc(StringInfo buf, void *offset, void *data);
+extern void redirect_elem_desc(StringInfo buf, void *offset, void *data);
+extern void relid_desc(StringInfo buf, void *relid, void *data);
+
+#endif							/* RMGRDESC_UTILS_H */
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd8..cd95eec37 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	nbtdesc.o \
 	relmapdesc.o \
 	replorigindesc.o \
+	rmgrdesc_utils.o \
 	seqdesc.o \
 	smgrdesc.o \
 	spgdesc.o \
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e821..64427ac56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -15,20 +15,52 @@
 #include "postgres.h"
 
 #include "access/heapam_xlog.h"
+#include "access/rmgrdesc_utils.h"
 
 static void
 out_infobits(StringInfo buf, uint8 infobits)
 {
+	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
+		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
+		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
+		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
+		(infobits & XLHL_KEYS_UPDATED) == 0)
+		return;
+
+	appendStringInfoString(buf, ", infobits: [");
+
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, "IS_MULTI ");
+		appendStringInfoString(buf, " IS_MULTI");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, "LOCK_ONLY ");
+		appendStringInfoString(buf, ", LOCK_ONLY");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, "EXCL_LOCK ");
+		appendStringInfoString(buf, ", EXCL_LOCK");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, "KEYSHR_LOCK ");
+		appendStringInfoString(buf, ", KEYSHR_LOCK");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, "KEYS_UPDATED ");
+		appendStringInfoString(buf, ", KEYS_UPDATED");
+
+	appendStringInfoString(buf, " ]");
+}
+
+static void
+plan_elem_desc(StringInfo buf, void *plan, void *data)
+{
+	xl_heap_freeze_plan *new_plan = (xl_heap_freeze_plan *) plan;
+	OffsetNumber **offsets = data;
+
+	appendStringInfo(buf, "{ xmax: %u, infomask: %u, infomask2: %u, ntuples: %u",
+					 new_plan->xmax, new_plan->t_infomask,
+					 new_plan->t_infomask2,
+					 new_plan->ntuples);
+
+	appendStringInfoString(buf, ", offsets:");
+	array_desc(buf, *offsets, sizeof(OffsetNumber), new_plan->ntuples,
+			   &offset_elem_desc, NULL);
+
+	*offsets += new_plan->ntuples;
+
+	appendStringInfo(buf, " }");
 }
 
 void
@@ -42,14 +74,15 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_insert *xlrec = (xl_heap_insert *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X", xlrec->offnum,
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+						 xlrec->offnum,
 						 xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_DELETE)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X",
 						 xlrec->offnum,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
@@ -58,12 +91,12 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
@@ -71,39 +104,41 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, "; new off %u xmax %u",
+		appendStringInfo(buf, ", new off: %u, xmax: %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
 	}
 	else if (info == XLOG_HEAP_TRUNCATE)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
-		int			i;
 
+		appendStringInfoString(buf, "flags: [");
 		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, "cascade ");
+			appendStringInfoString(buf, " CASCADE");
 		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, "restart_seqs ");
-		appendStringInfo(buf, "nrelids %u relids", xlrec->nrelids);
-		for (i = 0; i < xlrec->nrelids; i++)
-			appendStringInfo(buf, " %u", xlrec->relids[i]);
+			appendStringInfoString(buf, ", RESTART_SEQS");
+		appendStringInfoString(buf, " ]");
+
+		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
+		appendStringInfoString(buf, ", relids:");
+		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids, &relid_desc, NULL);
 	}
 	else if (info == XLOG_HEAP_CONFIRM)
 	{
 		xl_heap_confirm *xlrec = (xl_heap_confirm *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 	else if (info == XLOG_HEAP_LOCK)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off %u: xid %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -111,9 +146,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_inplace *xlrec = (xl_heap_inplace *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off: %u", xlrec->offnum);
 	}
 }
+
 void
 heap2_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -125,43 +161,110 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_prune *xlrec = (xl_heap_prune *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nredirected %u ndead %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u",
 						 xlrec->snapshotConflictHorizon,
 						 xlrec->nredirected,
 						 xlrec->ndead);
+
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *end;
+			OffsetNumber *redirected;
+			OffsetNumber *nowdead;
+			OffsetNumber *nowunused;
+			int			nredirected;
+			int			nunused;
+			Size		datalen;
+
+			redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0,
+															  &datalen);
+
+			nredirected = xlrec->nredirected;
+			end = (OffsetNumber *) ((char *) redirected + datalen);
+			nowdead = redirected + (nredirected * 2);
+			nowunused = nowdead + xlrec->ndead;
+			nunused = (end - nowunused);
+			Assert(nunused >= 0);
+
+			appendStringInfo(buf, ", nunused: %u", nunused);
+
+			appendStringInfoString(buf, ", redirected:");
+			array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
+					   nredirected * 2, &redirect_elem_desc, NULL);
+			appendStringInfoString(buf, ", dead:");
+			array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead,
+					   &offset_elem_desc, NULL);
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
+					   &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_VACUUM)
 	{
 		xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
 
-		appendStringInfo(buf, "nunused %u", xlrec->nunused);
+		appendStringInfo(buf, "nunused: %u", xlrec->nunused);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *nowunused;
+
+			nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, NULL);
+
+			appendStringInfoString(buf, ", unused:");
+			array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
 		xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u nplans %u",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u",
 						 xlrec->snapshotConflictHorizon, xlrec->nplans);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			xl_heap_freeze_plan *plans;
+			OffsetNumber *offsets;
+
+			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
+			offsets = (OffsetNumber *) &plans[xlrec->nplans];
+			appendStringInfoString(buf, ", plans:");
+			array_desc(buf,
+					   plans,
+					   sizeof(xl_heap_freeze_plan),
+					   xlrec->nplans,
+					   &plan_elem_desc,
+					   &offsets);
+
+		}
 	}
 	else if (info == XLOG_HEAP2_VISIBLE)
 	{
 		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
 
-		appendStringInfo(buf, "snapshotConflictHorizon %u flags 0x%02X",
+		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
 						 xlrec->snapshotConflictHorizon, xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
+		bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 
-		appendStringInfo(buf, "%d tuples flags 0x%02X", xlrec->ntuples,
+		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
+
+		appendStringInfoString(buf, ", offsets:");
+		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+			array_desc(buf, xlrec->offsets, sizeof(OffsetNumber),
+					   xlrec->ntuples, &offset_elem_desc, NULL);
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off %u: xmax %u: flags 0x%02X ",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
 						 xlrec->offnum, xlrec->xmax, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -169,13 +272,13 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_new_cid *xlrec = (xl_heap_new_cid *) rec;
 
-		appendStringInfo(buf, "rel %u/%u/%u; tid %u/%u",
+		appendStringInfo(buf, "rel: %u/%u/%u, tid: %u/%u",
 						 xlrec->target_locator.spcOid,
 						 xlrec->target_locator.dbOid,
 						 xlrec->target_locator.relNumber,
 						 ItemPointerGetBlockNumber(&(xlrec->target_tid)),
 						 ItemPointerGetOffsetNumber(&(xlrec->target_tid)));
-		appendStringInfo(buf, "; cmin: %u, cmax: %u, combo: %u",
+		appendStringInfo(buf, ", cmin: %u, cmax: %u, combo: %u",
 						 xlrec->cmin, xlrec->cmax, xlrec->combocid);
 	}
 }
diff --git a/src/backend/access/rmgrdesc/meson.build b/src/backend/access/rmgrdesc/meson.build
index 166cee67b..f76e87e2d 100644
--- a/src/backend/access/rmgrdesc/meson.build
+++ b/src/backend/access/rmgrdesc/meson.build
@@ -16,6 +16,7 @@ rmgr_desc_sources = files(
   'nbtdesc.c',
   'relmapdesc.c',
   'replorigindesc.c',
+  'rmgrdesc_utils.c',
   'seqdesc.c',
   'smgrdesc.c',
   'spgdesc.c',
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
new file mode 100644
index 000000000..2b3c1887d
--- /dev/null
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -0,0 +1,83 @@
+/*-------------------------------------------------------------------------
+ *
+ * rmgrdesc_utils.c
+ *		support functions for rmgrdesc
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/rmgrdesc/rmgrdesc_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/rmgrdesc_utils.h"
+
+/*
+ * Guidelines for formatting desc functions:
+ *
+ * member1_name: member1_value, member2_name: member2_value
+ *
+ * If the value is a list, please use:
+ *
+ * member3_name: [ member3_list_value1, member3_list_value2 ]
+ *
+ * The first item appended to the string should not be prepended by any spaces
+ * or comma, however all subsequent appends to the string are responsible for
+ * prepending themselves with a comma followed by a space.
+ *
+ * Arrays should have a space between the opening square bracket and first
+ * element and between the last element and closing brace.
+ *
+ * Flags should be in ALL CAPS.
+ *
+ * For lists/arrays of items, the number of those items should be listed at
+ * the beginning with all of the other numbers.
+ *
+ * List punctuation should still be included even if there are 0 items.
+ *
+ * Composite objects in a list should be surrounded with { }.
+ */
+void
+array_desc(StringInfo buf, void *array, size_t elem_size, int count,
+		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
+		   void *data)
+{
+	if (count == 0)
+	{
+		appendStringInfoString(buf, " []");
+		return;
+	}
+	appendStringInfo(buf, " [");
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ",");
+		appendStringInfoString(buf, " ");
+
+		elem_desc(buf, (char *) array + elem_size * i, data);
+	}
+	appendStringInfoString(buf, " ]");
+}
+
+void
+offset_elem_desc(StringInfo buf, void *offset, void *data)
+{
+	appendStringInfo(buf, "%u", *(OffsetNumber *) offset);
+}
+
+void
+redirect_elem_desc(StringInfo buf, void *offset, void *data)
+{
+	OffsetNumber *new_offset = (OffsetNumber *) offset;
+
+	appendStringInfo(buf, "%u->%u", new_offset[0], new_offset[1]);
+}
+
+void
+relid_desc(StringInfo buf, void *relid, void *data)
+{
+	appendStringInfo(buf, "%u", *(Oid *) relid);
+}
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index d6459e17c..0ecf58203 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -18,7 +18,7 @@ OBJS = \
 
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 
-RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c)))
+RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
 RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
 
 
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index d9ed8f0a9..0928717ba 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -71,19 +71,19 @@
       after the <replaceable>in_lsn</replaceable> argument.  For
       example:
 <screen>
-postgres=# SELECT * FROM pg_get_wal_record_info('0/1E826E98');
--[ RECORD 1 ]----+----------------------------------------------------
-start_lsn        | 0/1E826F20
-end_lsn          | 0/1E826F60
-prev_lsn         | 0/1E826C80
+postgres=# SELECT * FROM pg_get_wal_record_info(''0/E84F5E8'');
+-[ RECORD 1 ]----+--------------------------------------------------
+start_lsn        | 0/E84F5E8
+end_lsn          | 0/E84F620
+prev_lsn         | 0/E84F5A8
 xid              | 0
 resource_manager | Heap2
-record_type      | PRUNE
-record_length    | 58
-main_data_length | 8
+record_type      | VACUUM
+record_length    | 50
+main_data_length | 2
 fpi_length       | 0
-description      | snapshotConflictHorizon 33748 nredirected 0 ndead 2
-block_ref        | blkref #0: rel 1663/5/60221 fork main blk 2
+description      | nunused: 1, unused: [ 22 ]
+block_ref        | blkref #0: rel 1663/16389/20884 fork main blk 126
 </screen>
      </para>
      <para>
-- 
2.34.1

#26Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#25)
Re: Show various offset arrays for heap WAL records

On Fri, Apr 7, 2023 at 5:43 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Apr 7, 2023 at 1:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Attached v3 is cleaned up and includes a pg_walinspect docs update as
well as some edited comments in rmgr_utils.c

Attached v4 has some small tweaks on your v3. Mostly just whitespace
tweaks. Two slightly notable tweaks:

* I changed the approach to globbing in the Makefile, rather than use
your original overwide formulation for the new rmgrdesc_utils.c file.

What do you think of this approach?

Seems fine.

* Removed use of the restrict keyword.

While "restrict" is C99, I'm not completely sure that it's totally
supported by Postgres. I'm a bit surprised that you opted to use it in
this particular patch.

I meant to ask you about this earlier...why use restrict in this patch?

So, I think the signature I meant to have was:

void
array_desc(StringInfo buf, void *array, size_t elem_size, int count,
void (*elem_desc) (StringInfo buf, const void *elem, void *data),
void *data)

Basically I wanted to indicate that elem was not and should not be
modified and data can be modified but that they should not be the same
element or overlap at all.

I've added such an example to pg_walinspect docs.

There already was a PRUNE example, though -- for the
pg_get_wal_record_info function (singular, not to be confused with
pg_get_wal_records_info).

v4 makes the example a VACUUM record, which replaces the previous
pg_get_wal_record_info PRUNE example -- that needed to be updated
anyway. This approach has the advantage of not being too verbose,
which still showing some of this kind of detail.

This has the advantage of allowing pg_get_wal_records_info's example
to continue to be an example that lacks a block reference (and so has
a NULL block_ref). This is a useful contrast against the new
pg_get_wal_block_info function.

LGTM

- Melanie

In reply to: Melanie Plageman (#26)
Re: Show various offset arrays for heap WAL records

On Fri, Apr 7, 2023 at 4:01 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

LGTM

Pushed, thanks.

--
Peter Geoghegan

#28Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#27)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Fri, Apr 7, 2023 at 7:09 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Apr 7, 2023 at 4:01 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

LGTM

Pushed, thanks.

It's come to my attention that I forgot to include the btree patch earlier.

PFA

Attachments:

v4-0002-Add-detail-to-some-btree-xlog-record-descs.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Add-detail-to-some-btree-xlog-record-descs.patchDownload
From 4f502b2513ba79d738e7ed87aaf7d18ed2a2e30f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Mar 2023 18:15:17 -0400
Subject: [PATCH v4 2/2] Add detail to some btree xlog record descs

Suggested by Peter Geoghegan

Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de
---
 src/backend/access/rmgrdesc/nbtdesc.c        | 84 +++++++++++++++++---
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  6 ++
 src/include/access/rmgrdesc_utils.h          |  2 +
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index c5dc543a0f..9ffece109d 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -15,6 +15,58 @@
 #include "postgres.h"
 
 #include "access/nbtxlog.h"
+#include "access/rmgrdesc_utils.h"
+
+static void btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted,
+						   uint16 nupdated);
+static void btree_update_elem_desc(StringInfo buf, void *restrict update, void
+								   *restrict data);
+
+static void
+btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted, uint16 nupdated)
+{
+	OffsetNumber *updatedoffsets;
+	xl_btree_update *updates;
+	OffsetNumber *data = (OffsetNumber *) block_data;
+
+	appendStringInfoString(buf, ", deleted:");
+	array_desc(buf, data, sizeof(OffsetNumber), ndeleted, &offset_elem_desc, NULL);
+
+	appendStringInfoString(buf, ", updated:");
+	array_desc(buf, data, sizeof(OffsetNumber), nupdated, &offset_elem_desc, NULL);
+
+	if (nupdated <= 0)
+		return;
+
+	updatedoffsets = (OffsetNumber *)
+		((char *) data + ndeleted * sizeof(OffsetNumber));
+	updates = (xl_btree_update *) ((char *) updatedoffsets +
+								   nupdated *
+								   sizeof(OffsetNumber));
+
+	appendStringInfoString(buf, ", updates:");
+	array_desc(buf, updates, sizeof(xl_btree_update),
+			   nupdated, &btree_update_elem_desc,
+			   &updatedoffsets);
+}
+
+static void
+btree_update_elem_desc(StringInfo buf, void *restrict update, void *restrict data)
+{
+	xl_btree_update *new_update = (xl_btree_update *) update;
+	OffsetNumber *updated_offset = *((OffsetNumber **) data);
+
+	appendStringInfo(buf, "{ updated offset: %u, ndeleted tids: %u", *updated_offset, new_update->ndeletedtids);
+
+	appendStringInfoString(buf, ", deleted tids:");
+
+	array_desc(buf, (char *) new_update + SizeOfBtreeUpdate,
+			   sizeof(uint16), new_update->ndeletedtids, &uint16_elem_desc, NULL);
+
+	updated_offset++;
+
+	appendStringInfo(buf, " }");
+}
 
 void
 btree_desc(StringInfo buf, XLogReaderState *record)
@@ -31,7 +83,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_insert *xlrec = (xl_btree_insert *) rec;
 
-				appendStringInfo(buf, "off %u", xlrec->offnum);
+				appendStringInfo(buf, "off: %u", xlrec->offnum);
 				break;
 			}
 		case XLOG_BTREE_SPLIT_L:
@@ -39,7 +91,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_split *xlrec = (xl_btree_split *) rec;
 
-				appendStringInfo(buf, "level %u, firstrightoff %d, newitemoff %d, postingoff %d",
+				appendStringInfo(buf, "level: %u, firstrightoff: %d, newitemoff: %d, postingoff: %d",
 								 xlrec->level, xlrec->firstrightoff,
 								 xlrec->newitemoff, xlrec->postingoff);
 				break;
@@ -48,31 +100,41 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_dedup *xlrec = (xl_btree_dedup *) rec;
 
-				appendStringInfo(buf, "nintervals %u", xlrec->nintervals);
+				appendStringInfo(buf, "nintervals: %u", xlrec->nintervals);
 				break;
 			}
 		case XLOG_BTREE_VACUUM:
 			{
 				xl_btree_vacuum *xlrec = (xl_btree_vacuum *) rec;
 
-				appendStringInfo(buf, "ndeleted %u; nupdated %u",
+				appendStringInfo(buf, "ndeleted: %u, nupdated: %u",
 								 xlrec->ndeleted, xlrec->nupdated);
+
+				if (!XLogRecHasBlockImage(record, 0))
+					btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+								   xlrec->ndeleted, xlrec->nupdated);
+
 				break;
 			}
 		case XLOG_BTREE_DELETE:
 			{
 				xl_btree_delete *xlrec = (xl_btree_delete *) rec;
 
-				appendStringInfo(buf, "snapshotConflictHorizon %u; ndeleted %u; nupdated %u",
+				appendStringInfo(buf, "snapshotConflictHorizon: %u, ndeleted: %u, nupdated: %u",
 								 xlrec->snapshotConflictHorizon,
 								 xlrec->ndeleted, xlrec->nupdated);
+
+				if (!XLogRecHasBlockImage(record, 0))
+					btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+								   xlrec->ndeleted, xlrec->nupdated);
+
 				break;
 			}
 		case XLOG_BTREE_MARK_PAGE_HALFDEAD:
 			{
 				xl_btree_mark_page_halfdead *xlrec = (xl_btree_mark_page_halfdead *) rec;
 
-				appendStringInfo(buf, "topparent %u; leaf %u; left %u; right %u",
+				appendStringInfo(buf, "topparent: %u; leaf: %u; left: %u; right: %u",
 								 xlrec->topparent, xlrec->leafblk, xlrec->leftblk, xlrec->rightblk);
 				break;
 			}
@@ -81,11 +143,11 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_unlink_page *xlrec = (xl_btree_unlink_page *) rec;
 
-				appendStringInfo(buf, "left %u; right %u; level %u; safexid %u:%u; ",
+				appendStringInfo(buf, "left: %u; right: %u; level: %u; safexid: %u:%u; ",
 								 xlrec->leftsib, xlrec->rightsib, xlrec->level,
 								 EpochFromFullTransactionId(xlrec->safexid),
 								 XidFromFullTransactionId(xlrec->safexid));
-				appendStringInfo(buf, "leafleft %u; leafright %u; leaftopparent %u",
+				appendStringInfo(buf, "leafleft: %u; leafright: %u; leaftopparent: %u",
 								 xlrec->leafleftsib, xlrec->leafrightsib,
 								 xlrec->leaftopparent);
 				break;
@@ -94,14 +156,14 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_newroot *xlrec = (xl_btree_newroot *) rec;
 
-				appendStringInfo(buf, "lev %u", xlrec->level);
+				appendStringInfo(buf, "lev: %u", xlrec->level);
 				break;
 			}
 		case XLOG_BTREE_REUSE_PAGE:
 			{
 				xl_btree_reuse_page *xlrec = (xl_btree_reuse_page *) rec;
 
-				appendStringInfo(buf, "rel %u/%u/%u; snapshotConflictHorizon %u:%u",
+				appendStringInfo(buf, "rel: %u/%u/%u, snapshotConflictHorizon: %u:%u",
 								 xlrec->locator.spcOid, xlrec->locator.dbOid,
 								 xlrec->locator.relNumber,
 								 EpochFromFullTransactionId(xlrec->snapshotConflictHorizon),
@@ -114,7 +176,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 
 				xlrec = (xl_btree_metadata *) XLogRecGetBlockData(record, 0,
 																  NULL);
-				appendStringInfo(buf, "last_cleanup_num_delpages %u",
+				appendStringInfo(buf, "last_cleanup_num_delpages: %u",
 								 xlrec->last_cleanup_num_delpages);
 				break;
 			}
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index 2bfd11cb6e..1268d6131e 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -80,3 +80,9 @@ relid_desc(StringInfo buf, void *restrict relid, void *restrict data)
 {
 	appendStringInfo(buf, "%u", *(Oid *) relid);
 }
+
+void
+uint16_elem_desc(StringInfo buf, void *restrict value, void *restrict data)
+{
+	appendStringInfo(buf, "%u", *(uint16 *) value);
+}
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 763ae81859..ce24accc3b 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -25,5 +25,7 @@ extern void redirect_elem_desc(StringInfo buf, void *restrict offset, void *rest
 
 extern void relid_desc(StringInfo buf, void *restrict relid, void *restrict data);
 
+extern void uint16_elem_desc(StringInfo buf, void *restrict value, void *restrict data);
+
 
 #endif							/* RMGRDESC_UTILS_H */
-- 
2.37.2

In reply to: Melanie Plageman (#28)
Re: Show various offset arrays for heap WAL records

On Fri, Apr 7, 2023 at 4:21 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

It's come to my attention that I forgot to include the btree patch earlier.

Pushed that one too.

Also removed the use of the "restrict" keyword here.

Thanks
--
Peter Geoghegan

In reply to: Peter Geoghegan (#29)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Fri, Apr 7, 2023 at 4:46 PM Peter Geoghegan <pg@bowt.ie> wrote:

Pushed that one too.

I noticed that the nbtree VACUUM and DELETE record types have their
update/xl_btree_update arrays output incorrectly. We cannot use the
generic array_desc() approach with xl_btree_update elements, because
they're variable-width elements. The problem is that array_desc() only deals
with fixed-width elements.

I also changed some of the details around whitespace in arrays in the
fixup patch (though I didn't do the same with objects). It doesn't
seem useful to use so much whitespace for long arrays of integers
(really page offset numbers). And I brought a few nbtree desc routines
that still used ";" characters as punctuation in line with the new
convention.

Finally, the patch revises the guidelines written for rmgr desc
routine authors. I don't think that we need to describe how to handle
outputting whitespace in detail. It'll be quite natural for other
rmgrs to use existing facilities such as array_desc() themselves,
which makes whitespace type inconsistencies unlikely. I've tried to
make the limits of the guidelines clear. The main goal is to avoid
gratuitous inconsistencies, and to provide a standard way of doing
things that many different rmgrs are likely to want to do, again and
again. But individual rmgrs still have a certain amount of discretion,
which seems like a good thing to me (the alternative requires that we
fix at least a couple of things in nbtdesc.c and in heapdesc.c, which
doesn't seem useful to me).

--
Peter Geoghegan

Attachments:

v1-0001-Fix-WAL-description-of-posting-list-updates.patchapplication/x-patch; name=v1-0001-Fix-WAL-description-of-posting-list-updates.patchDownload
From 44bfa575f5f28d1a8093e02b4a7993c1368cbeb5 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 8 Apr 2023 18:59:09 -0700
Subject: [PATCH v1] Fix WAL description of posting list updates.

---
 src/include/access/rmgrdesc_utils.h          |  41 ++++++-
 src/backend/access/rmgrdesc/heapdesc.c       |   2 +-
 src/backend/access/rmgrdesc/nbtdesc.c        | 121 ++++++++++---------
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  43 +------
 doc/src/sgml/pgwalinspect.sgml               |  34 +++---
 5 files changed, 123 insertions(+), 118 deletions(-)

diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 13552d6f3..16916e3e8 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -12,12 +12,49 @@
 #ifndef RMGRDESC_UTILS_H_
 #define RMGRDESC_UTILS_H_
 
+/*
+ * Guidelines for rmgr descriptor routine authors:
+ *
+ * The goal of these guidelines is to avoid gratuitous inconsistencies across
+ * each rmgr, and to allow users to parse desc output strings without too much
+ * difficulty.  This is not an API specification or an interchange format.
+ * (Only heapam and nbtree desc routines follow these guidelines at present,
+ * in any case.)
+ *
+ * Record descriptions are similar to JSON style key/value objects.  However,
+ * there is no explicit "string" type/string escaping.  Top-level { } brackets
+ * should be omitted.  For example:
+ *
+ * snapshotConflictHorizon: 0, flags: 0x03
+ *
+ * Record descriptions may contain variable-length arrays.  For example:
+ *
+ * nunused: 5, unused: [1, 2, 3, 4, 5]
+ *
+ * Nested objects are supported via { } brackets.  They generally appear
+ * inside variable-length arrays.  For example:
+ *
+ * ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
+ *
+ * Try to output things in an order that faithfully represents the order of
+ * things in the physical WAL record struct.  It's a good idea if the number
+ * of items in the array appears before the array.
+ *
+ * It's okay for individual WAL record types to invent their own conventions.
+ * For example, heapam's PRUNE records output the follow representation of
+ * redirects:
+ *
+ * ... redirected: [39->46], ...
+ *
+ * Arguably the PRUNE desc routine should be using object notation instead.
+ * This ad-hoc representation of redirects has the advantage of being terse in
+ * a context where that might matter a lot.
+ */
 extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
 					   void *data);
 extern void offset_elem_desc(StringInfo buf, void *offset, void *data);
 extern void redirect_elem_desc(StringInfo buf, void *offset, void *data);
-extern void relid_desc(StringInfo buf, void *relid, void *data);
-extern void uint16_elem_desc(StringInfo buf, void *value, void *data);
+extern void oid_elem_desc(StringInfo buf, void *relid, void *data);
 
 #endif							/* RMGRDESC_UTILS_H */
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 6dfd7d7d1..3bd083875 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -127,7 +127,7 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
 		appendStringInfoString(buf, ", relids:");
 		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids,
-				   &relid_desc, NULL);
+				   &oid_elem_desc, NULL);
 	}
 	else if (info == XLOG_HEAP_CONFIRM)
 	{
diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index 6bc043265..6edb02a0b 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -17,59 +17,8 @@
 #include "access/nbtxlog.h"
 #include "access/rmgrdesc_utils.h"
 
-static void btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted,
-						   uint16 nupdated);
-static void btree_update_elem_desc(StringInfo buf, void *update, void *data);
-
-static void
-btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted,
-			   uint16 nupdated)
-{
-	OffsetNumber *updatedoffsets;
-	xl_btree_update *updates;
-	OffsetNumber *data = (OffsetNumber *) block_data;
-
-	appendStringInfoString(buf, ", deleted:");
-	array_desc(buf, data, sizeof(OffsetNumber), ndeleted,
-			   &offset_elem_desc, NULL);
-
-	appendStringInfoString(buf, ", updated:");
-	array_desc(buf, data, sizeof(OffsetNumber), nupdated,
-			   &offset_elem_desc, NULL);
-
-	if (nupdated <= 0)
-		return;
-
-	updatedoffsets = (OffsetNumber *)
-		((char *) data + ndeleted * sizeof(OffsetNumber));
-	updates = (xl_btree_update *) ((char *) updatedoffsets +
-								   nupdated *
-								   sizeof(OffsetNumber));
-
-	appendStringInfoString(buf, ", updates:");
-	array_desc(buf, updates, sizeof(xl_btree_update),
-			   nupdated, &btree_update_elem_desc,
-			   &updatedoffsets);
-}
-
-static void
-btree_update_elem_desc(StringInfo buf, void *update, void *data)
-{
-	xl_btree_update *new_update = (xl_btree_update *) update;
-	OffsetNumber *updated_offset = *((OffsetNumber **) data);
-
-	appendStringInfo(buf, "{ updated offset: %u, ndeleted tids: %u",
-					 *updated_offset, new_update->ndeletedtids);
-
-	appendStringInfoString(buf, ", deleted tids:");
-
-	array_desc(buf, (char *) new_update + SizeOfBtreeUpdate, sizeof(uint16),
-			   new_update->ndeletedtids, &uint16_elem_desc, NULL);
-
-	updated_offset++;
-
-	appendStringInfo(buf, " }");
-}
+static void delvacuum_desc(StringInfo buf, char *block_data,
+						   uint16 ndeleted, uint16 nupdated);
 
 void
 btree_desc(StringInfo buf, XLogReaderState *record)
@@ -114,9 +63,8 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 								 xlrec->ndeleted, xlrec->nupdated);
 
 				if (!XLogRecHasBlockImage(record, 0))
-					btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+					delvacuum_desc(buf, XLogRecGetBlockData(record, 0, NULL),
 								   xlrec->ndeleted, xlrec->nupdated);
-
 				break;
 			}
 		case XLOG_BTREE_DELETE:
@@ -128,16 +76,15 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 								 xlrec->ndeleted, xlrec->nupdated);
 
 				if (!XLogRecHasBlockImage(record, 0))
-					btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+					delvacuum_desc(buf, XLogRecGetBlockData(record, 0, NULL),
 								   xlrec->ndeleted, xlrec->nupdated);
-
 				break;
 			}
 		case XLOG_BTREE_MARK_PAGE_HALFDEAD:
 			{
 				xl_btree_mark_page_halfdead *xlrec = (xl_btree_mark_page_halfdead *) rec;
 
-				appendStringInfo(buf, "topparent: %u; leaf: %u; left: %u; right: %u",
+				appendStringInfo(buf, "topparent: %u, leaf: %u, left: %u, right: %u",
 								 xlrec->topparent, xlrec->leafblk, xlrec->leftblk, xlrec->rightblk);
 				break;
 			}
@@ -146,11 +93,11 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_unlink_page *xlrec = (xl_btree_unlink_page *) rec;
 
-				appendStringInfo(buf, "left: %u; right: %u; level: %u; safexid: %u:%u; ",
+				appendStringInfo(buf, "left: %u, right: %u, level: %u, safexid: %u:%u, ",
 								 xlrec->leftsib, xlrec->rightsib, xlrec->level,
 								 EpochFromFullTransactionId(xlrec->safexid),
 								 XidFromFullTransactionId(xlrec->safexid));
-				appendStringInfo(buf, "leafleft: %u; leafright: %u; leaftopparent: %u",
+				appendStringInfo(buf, "leafleft: %u, leafright: %u, leaftopparent: %u",
 								 xlrec->leafleftsib, xlrec->leafrightsib,
 								 xlrec->leaftopparent);
 				break;
@@ -242,3 +189,57 @@ btree_identify(uint8 info)
 
 	return id;
 }
+
+static void
+delvacuum_desc(StringInfo buf, char *block_data,
+			   uint16 ndeleted, uint16 nupdated)
+{
+	OffsetNumber *deletedoffsets;
+	OffsetNumber *updatedoffsets;
+	xl_btree_update *updates;
+
+	/* Output deleted page offset number array */
+	appendStringInfoString(buf, ", deleted:");
+	deletedoffsets = (OffsetNumber *) block_data;
+	array_desc(buf, deletedoffsets, sizeof(OffsetNumber), ndeleted,
+			   &offset_elem_desc, NULL);
+
+	/*
+	 * Output updates as an array of "update objects", where each element
+	 * contains a page offset number from updated array.  (This is not the
+	 * most literal representation of the underlying physical data structure
+	 * that we could use.  Readability seems more important here.)
+	 */
+	appendStringInfoString(buf, ", updated: [");
+	updatedoffsets = (OffsetNumber *) (block_data + ndeleted *
+									   sizeof(OffsetNumber));
+	updates = (xl_btree_update *) ((char *) updatedoffsets +
+								   nupdated *
+								   sizeof(OffsetNumber));
+	for (int i = 0; i < nupdated; i++)
+	{
+		OffsetNumber off = updatedoffsets[i];
+
+		appendStringInfo(buf, "{ off: %u, nptids: %u, ptids: [",
+						 off, updates->ndeletedtids);
+		Assert(updates->ndeletedtids > 0);
+		for (int p = 0; p < updates->ndeletedtids; p++)
+		{
+			uint16 *ptid;
+
+			ptid = (uint16 *) ((char *) updates + SizeOfBtreeUpdate) + p;
+			appendStringInfo(buf, "%u", *ptid);
+
+			if (p < updates->ndeletedtids - 1)
+				appendStringInfoString(buf, ", ");
+		}
+		appendStringInfoString(buf, "] }");
+		if (i < nupdated - 1)
+			appendStringInfoString(buf, ", ");
+
+		updates = (xl_btree_update *)
+			((char *) updates + SizeOfBtreeUpdate +
+			 updates->ndeletedtids * sizeof(uint16));
+	}
+	appendStringInfoString(buf, "]");
+}
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index c95a41a25..90051f0df 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -16,31 +16,6 @@
 #include "access/rmgrdesc_utils.h"
 #include "storage/off.h"
 
-/*
- * Guidelines for formatting desc functions:
- *
- * member1_name: member1_value, member2_name: member2_value
- *
- * If the value is a list, please use:
- *
- * member3_name: [ member3_list_value1, member3_list_value2 ]
- *
- * The first item appended to the string should not be prepended by any spaces
- * or comma, however all subsequent appends to the string are responsible for
- * prepending themselves with a comma followed by a space.
- *
- * Arrays should have a space between the opening square bracket and first
- * element and between the last element and closing brace.
- *
- * Flags should be in ALL CAPS.
- *
- * For lists/arrays of items, the number of those items should be listed at
- * the beginning with all of the other numbers.
- *
- * List punctuation should still be included even if there are 0 items.
- *
- * Composite objects in a list should be surrounded with { }.
- */
 void
 array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
@@ -51,16 +26,14 @@ array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		appendStringInfoString(buf, " []");
 		return;
 	}
-	appendStringInfo(buf, " [");
+	appendStringInfoString(buf, " [");
 	for (int i = 0; i < count; i++)
 	{
-		if (i > 0)
-			appendStringInfoString(buf, ",");
-		appendStringInfoString(buf, " ");
-
 		elem_desc(buf, (char *) array + elem_size * i, data);
+		if (i < count - 1)
+			appendStringInfoString(buf, ", ");
 	}
-	appendStringInfoString(buf, " ]");
+	appendStringInfoString(buf, "]");
 }
 
 void
@@ -78,13 +51,7 @@ redirect_elem_desc(StringInfo buf, void *offset, void *data)
 }
 
 void
-relid_desc(StringInfo buf, void *relid, void *data)
+oid_elem_desc(StringInfo buf, void *relid, void *data)
 {
 	appendStringInfo(buf, "%u", *(Oid *) relid);
 }
-
-void
-uint16_elem_desc(StringInfo buf, void *value, void *data)
-{
-	appendStringInfo(buf, "%u", *(uint16 *) value);
-}
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index b3712be00..a1a4422fb 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -71,19 +71,19 @@
       after the <replaceable>in_lsn</replaceable> argument.  For
       example:
 <screen>
-postgres=# SELECT * FROM pg_get_wal_record_info('0/E84F5E8');
--[ RECORD 1 ]----+--------------------------------------------------
-start_lsn        | 0/E84F5E8
-end_lsn          | 0/E84F620
-prev_lsn         | 0/E84F5A8
+postgres=# SELECT * FROM pg_get_wal_record_info('0/E419E28');
+-[ RECORD 1 ]----+-------------------------------------------------
+start_lsn        | 0/E419E28
+end_lsn          | 0/E419E68
+prev_lsn         | 0/E419D78
 xid              | 0
 resource_manager | Heap2
 record_type      | VACUUM
-record_length    | 50
+record_length    | 58
 main_data_length | 2
 fpi_length       | 0
-description      | nunused: 1, unused: [ 22 ]
-block_ref        | blkref #0: rel 1663/16389/20884 fork main blk 126
+description      | nunused: 5, unused: [1, 2, 3, 4, 5]
+block_ref        | blkref #0: rel 1663/16385/1249 fork main blk 364
 </screen>
      </para>
      <para>
@@ -144,18 +144,18 @@ block_ref        |
       references.  Returns one row per block reference per WAL record.
       For example:
 <screen>
-postgres=# SELECT * FROM pg_get_wal_block_info('0/10E9D80', '0/10E9DC0');
+postgres=# SELECT * FROM pg_get_wal_block_info('0/1230278', '0/12302B8');
 -[ RECORD 1 ]-----+-----------------------------------
-start_lsn         | 0/10E9D80
-end_lsn           | 0/10E9DC0
-prev_lsn          | 0/10E9860
+start_lsn         | 0/1230278
+end_lsn           | 0/12302B8
+prev_lsn          | 0/122FD40
 block_id          | 0
 reltablespace     | 1663
 reldatabase       | 1
-relfilenode       | 2690
+relfilenode       | 2658
 relforknumber     | 0
-relblocknumber    | 5
-xid               | 117
+relblocknumber    | 11
+xid               | 341
 resource_manager  | Btree
 record_type       | INSERT_LEAF
 record_length     | 64
@@ -163,8 +163,8 @@ main_data_length  | 2
 block_data_length | 16
 block_fpi_length  | 0
 block_fpi_info    |
-description       | off 14
-block_data        | \x00005400020010001407000000000000
+description       | off: 46
+block_data        | \x00002a00070010402630000070696400
 block_fpi_data    |
 </screen>
      </para>
-- 
2.40.0

In reply to: Peter Geoghegan (#30)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Sun, Apr 9, 2023 at 5:12 PM Peter Geoghegan <pg@bowt.ie> wrote:

I noticed that the nbtree VACUUM and DELETE record types have their
update/xl_btree_update arrays output incorrectly. We cannot use the
generic array_desc() approach with xl_btree_update elements, because
they're variable-width elements. The problem is that array_desc() only deals
with fixed-width elements.

I pushed this fix just now, though without the updates to the
guidelines (or only minimal updates).

A remaining problem with arrays appears in "infobits" fields for
record types such as LOCK. Here's an example of the problem:

off: 34, xid: 3, flags: 0x00, infobits: [, LOCK_ONLY, EXCL_LOCK ]

Clearly the punctuation from the array is malformed.

A second issue (related to the first) is the name of the key itself,
"infobits". While "infobits" actually seems fine in this particular
example, I don't think that we want to do the same for record types
such as HEAP_UPDATE, since such records require that the description
show information about flags whose underlying field in the WAL record
struct is actually called "old_infobits_set". I think that we should
be outputting "old_infobits: [ ... ] " in the description of
HEAP_UPDATE records, which isn't the case right now.

A third issue is present in the nearby handling of xl_heap_truncate
status flags. It's the same basic array punctuation issue again, so
arguably this is the same issue as the first one.

Attached patch fixes all of these issues, and overhauls the guidelines
in the way originally proposed by the nbtree fix patch (since I didn't
keep that part of the nbtree patch when I pushed it today).

Note that the patch makes many individual (say) HOT_UPDATE records
have descriptions that look like this:

... old_infobits: [], ...

This differs from HEAD, where the output is totally suppressed because
there are no flag bits to show. I think that this behavior is more
logical and consistent overall.

--
Peter Geoghegan

Attachments:

v1-0001-Fix-heapdesc-infomask-array-output.patchapplication/octet-stream; name=v1-0001-Fix-heapdesc-infomask-array-output.patchDownload
From 06865859f3c5da8d735f52719ec0ac0632322be1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sun, 9 Apr 2023 18:14:27 -0700
Subject: [PATCH v1 1/2] Fix heapdesc infomask array output.

---
 src/include/access/rmgrdesc_utils.h          |  38 +++++++
 src/backend/access/rmgrdesc/heapdesc.c       | 105 +++++++++++++------
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  20 ----
 3 files changed, 112 insertions(+), 51 deletions(-)

diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 00e614649..16916e3e8 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -12,6 +12,44 @@
 #ifndef RMGRDESC_UTILS_H_
 #define RMGRDESC_UTILS_H_
 
+/*
+ * Guidelines for rmgr descriptor routine authors:
+ *
+ * The goal of these guidelines is to avoid gratuitous inconsistencies across
+ * each rmgr, and to allow users to parse desc output strings without too much
+ * difficulty.  This is not an API specification or an interchange format.
+ * (Only heapam and nbtree desc routines follow these guidelines at present,
+ * in any case.)
+ *
+ * Record descriptions are similar to JSON style key/value objects.  However,
+ * there is no explicit "string" type/string escaping.  Top-level { } brackets
+ * should be omitted.  For example:
+ *
+ * snapshotConflictHorizon: 0, flags: 0x03
+ *
+ * Record descriptions may contain variable-length arrays.  For example:
+ *
+ * nunused: 5, unused: [1, 2, 3, 4, 5]
+ *
+ * Nested objects are supported via { } brackets.  They generally appear
+ * inside variable-length arrays.  For example:
+ *
+ * ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
+ *
+ * Try to output things in an order that faithfully represents the order of
+ * things in the physical WAL record struct.  It's a good idea if the number
+ * of items in the array appears before the array.
+ *
+ * It's okay for individual WAL record types to invent their own conventions.
+ * For example, heapam's PRUNE records output the follow representation of
+ * redirects:
+ *
+ * ... redirected: [39->46], ...
+ *
+ * Arguably the PRUNE desc routine should be using object notation instead.
+ * This ad-hoc representation of redirects has the advantage of being terse in
+ * a context where that might matter a lot.
+ */
 extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
 					   void *data);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 3bd083875..a64d14c2c 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -18,29 +18,75 @@
 #include "access/rmgrdesc_utils.h"
 
 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
 {
-	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
-		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
-		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
-		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
-		(infobits & XLHL_KEYS_UPDATED) == 0)
-		return;
+	bool		first = true;
 
-	appendStringInfoString(buf, ", infobits: [");
+	appendStringInfo(buf, "%s: [", keyname);
 
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, " IS_MULTI");
+	{
+		appendStringInfoString(buf, "IS_MULTI");
+		first = false;
+	}
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, ", LOCK_ONLY");
+	{
+		if (first)
+			appendStringInfoString(buf, "LOCK_ONLY");
+		else
+			appendStringInfoString(buf, ", LOCK_ONLY");
+		first = false;
+	}
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, ", EXCL_LOCK");
+	{
+		if (first)
+			appendStringInfoString(buf, "EXCL_LOCK");
+		else
+			appendStringInfoString(buf, ", EXCL_LOCK");
+		first = false;
+	}
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, ", KEYSHR_LOCK");
+	{
+		if (first)
+			appendStringInfoString(buf, "KEYSHR_LOCK");
+		else
+			appendStringInfoString(buf, ", KEYSHR_LOCK");
+		first = false;
+	}
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, ", KEYS_UPDATED");
+	{
+		if (first)
+			appendStringInfoString(buf, "KEYS_UPDATED");
+		else
+			appendStringInfoString(buf, ", KEYS_UPDATED");
+		first = false;
+	}
 
-	appendStringInfoString(buf, " ]");
+	appendStringInfoString(buf, "]");
+}
+
+static void
+truncate_flags_desc(StringInfo buf, uint8 flags)
+{
+	bool		first = true;
+
+	appendStringInfoString(buf, "flags: [");
+
+	if (flags & XLH_TRUNCATE_CASCADE)
+	{
+		appendStringInfoString(buf, "CASCADE");
+		first = false;
+	}
+	if (flags & XLH_TRUNCATE_RESTART_SEQS)
+	{
+		if (first)
+			appendStringInfoString(buf, "RESTART_SEQS");
+		else
+			appendStringInfoString(buf, ", RESTART_SEQS");
+		first = false;
+	}
+
+	appendStringInfoString(buf, "]");
 }
 
 static void
@@ -82,20 +128,20 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X, ",
 						 xlrec->offnum,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
 		appendStringInfo(buf, ", new off: %u, xmax %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
@@ -104,11 +150,11 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
 		appendStringInfo(buf, ", new off: %u, xmax: %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
@@ -117,12 +163,7 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
 
-		appendStringInfoString(buf, "flags: [");
-		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, " CASCADE");
-		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, ", RESTART_SEQS");
-		appendStringInfoString(buf, " ]");
+		truncate_flags_desc(buf, xlrec->flags);
 
 		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
 		appendStringInfoString(buf, ", relids:");
@@ -139,9 +180,9 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X, ",
 						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP_INPLACE)
 	{
@@ -230,7 +271,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			OffsetNumber *offsets;
 
 			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
-			offsets = (OffsetNumber *) &plans[xlrec->nplans];
+			offsets = (OffsetNumber *) ((char *) plans +
+										(xlrec->nplans *
+										 sizeof(xl_heap_freeze_plan)));
 			appendStringInfoString(buf, ", plans:");
 			array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
 					   &plan_elem_desc, &offsets);
@@ -260,9 +303,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
 						 xlrec->offnum, xlrec->xmax, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP2_NEW_CID)
 	{
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index 3d16b1fcb..90051f0df 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -16,26 +16,6 @@
 #include "access/rmgrdesc_utils.h"
 #include "storage/off.h"
 
-/*
- * Guidelines for formatting desc functions:
- *
- * member1_name: member1_value, member2_name: member2_value
- *
- * If the value is a list, please use:
- *
- * member3_name: [ member3_list_value1, member3_list_value2 ]
- *
- * The first item appended to the string should not be prepended by any spaces
- * or comma, however all subsequent appends to the string are responsible for
- * prepending themselves with a comma followed by a space.
- *
- * Flags should be in ALL CAPS.
- *
- * For lists/arrays of items, the number of those items should be listed at
- * the beginning with all of the other numbers.
- *
- * Composite objects in a list should be surrounded with { }.
- */
 void
 array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
-- 
2.40.0

#32Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#31)
Re: Show various offset arrays for heap WAL records

On Sun, Apr 9, 2023 at 8:12 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Apr 7, 2023 at 4:46 PM Peter Geoghegan <pg@bowt.ie> wrote:

Pushed that one too.

I noticed that the nbtree VACUUM and DELETE record types have their
update/xl_btree_update arrays output incorrectly. We cannot use the
generic array_desc() approach with xl_btree_update elements, because
they're variable-width elements. The problem is that array_desc() only deals
with fixed-width elements.

You are right. I'm sorry for the rather egregious oversight.

I took a look at the first patch even though you've pushed the bugfix
part. Any reason you didn't use array_desc() for the inner array (of
"ptids")? I find that following the pattern of using array_desc (when it
is correct, of course!) helps me to quickly identify: "okay, this is an
array of x" without having to stare at the loop too much.

I will say that the prefix of p in "ptid" makes it sound like pointer to
a tid, which I don't believe is what you meant.

I also changed some of the details around whitespace in arrays in the
fixup patch (though I didn't do the same with objects). It doesn't
seem useful to use so much whitespace for long arrays of integers
(really page offset numbers). And I brought a few nbtree desc routines
that still used ";" characters as punctuation in line with the new
convention.

Cool.

Finally, the patch revises the guidelines written for rmgr desc
routine authors. I don't think that we need to describe how to handle
outputting whitespace in detail. It'll be quite natural for other
rmgrs to use existing facilities such as array_desc() themselves,
which makes whitespace type inconsistencies unlikely. I've tried to
make the limits of the guidelines clear. The main goal is to avoid
gratuitous inconsistencies, and to provide a standard way of doing
things that many different rmgrs are likely to want to do, again and
again. But individual rmgrs still have a certain amount of discretion,
which seems like a good thing to me (the alternative requires that we
fix at least a couple of things in nbtdesc.c and in heapdesc.c, which
doesn't seem useful to me).

I like the new guidelines you proposed (in the patch).
They are well-written and clear.

On Mon, Apr 10, 2023 at 3:18 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Sun, Apr 9, 2023 at 5:12 PM Peter Geoghegan <pg@bowt.ie> wrote:

I noticed that the nbtree VACUUM and DELETE record types have their
update/xl_btree_update arrays output incorrectly. We cannot use the
generic array_desc() approach with xl_btree_update elements, because
they're variable-width elements. The problem is that array_desc() only deals
with fixed-width elements.

I pushed this fix just now, though without the updates to the
guidelines (or only minimal updates).

A remaining problem with arrays appears in "infobits" fields for
record types such as LOCK. Here's an example of the problem:

off: 34, xid: 3, flags: 0x00, infobits: [, LOCK_ONLY, EXCL_LOCK ]

Clearly the punctuation from the array is malformed.

So, I did do this on purpose -- because I didn't want to have to do the
gymnastics to determine which flag was hit first (though it looks like I
mistakenly omitted the comma prepending IS_MULTI -- that was not
intentional).
I recognized that the output doesn't look nice, but I hadn't exactly
thought of it as malformed. Perhaps you are right.

I will say and I am still not a fan of the "if (first) else" logic in
your attached patch.

I've put my suggestion for how to do it instead inline with the code
diff below for clarity.

diff --git a/src/backend/access/rmgrdesc/heapdesc.c
b/src/backend/access/rmgrdesc/heapdesc.c
index 3bd083875..a64d14c2c 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -18,29 +18,75 @@
 #include "access/rmgrdesc_utils.h"
 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
...
     if (infobits & XLHL_KEYS_UPDATED)
-        appendStringInfoString(buf, ", KEYS_UPDATED");
+    {
+        if (first)
+            appendStringInfoString(buf, "KEYS_UPDATED");
+        else
+            appendStringInfoString(buf, ", KEYS_UPDATED");
+        first = false;
+    }

How about we have the flags use a trailing comma and space and then
overwrite the last one with something this:

if (infobits & XLHL_KEYS_UPDATED)
appendStringInfoString(buf, "KEYS_UPDATED, ");
buf->data[buf->len -= strlen(", ")] = '\0';

@@ -230,7 +271,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
OffsetNumber *offsets;

I don't prefer this to what I had, which is also correct, right?

             plans = (xl_heap_freeze_plan *)
XLogRecGetBlockData(record, 0, NULL);
-            offsets = (OffsetNumber *) &plans[xlrec->nplans];
+            offsets = (OffsetNumber *) ((char *) plans +
+                                        (xlrec->nplans *
+                                        sizeof(xl_heap_freeze_plan)));
             appendStringInfoString(buf, ", plans:");
             array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
                       &plan_elem_desc, &offsets);

A second issue (related to the first) is the name of the key itself,
"infobits". While "infobits" actually seems fine in this particular
example, I don't think that we want to do the same for record types
such as HEAP_UPDATE, since such records require that the description
show information about flags whose underlying field in the WAL record
struct is actually called "old_infobits_set". I think that we should
be outputting "old_infobits: [ ... ] " in the description of
HEAP_UPDATE records, which isn't the case right now.

--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -18,29 +18,75 @@
 #include "access/rmgrdesc_utils.h"
 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)

I like the keyname parameter.

Note that the patch makes many individual (say) HOT_UPDATE records
have descriptions that look like this:

... old_infobits: [], ...

This differs from HEAD, where the output is totally suppressed because
there are no flag bits to show. I think that this behavior is more
logical and consistent overall.

Yea, I think it is better to include things and show that they are empty
then omit them. I find it more clear.

- Melanie

In reply to: Melanie Plageman (#32)
Re: Show various offset arrays for heap WAL records

On Mon, Apr 10, 2023 at 3:04 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I took a look at the first patch even though you've pushed the bugfix
part. Any reason you didn't use array_desc() for the inner array (of
"ptids")? I find that following the pattern of using array_desc (when it
is correct, of course!) helps me to quickly identify: "okay, this is an
array of x" without having to stare at the loop too much.

It was fairly arbitrary. I was thinking "we can't use array_desc for
this", which wasn't 100% true, but seemed close enough. It helped that
this allowed me to remove uint16_elem_desc(), which likely wouldn't
have been reused later on.

I will say that the prefix of p in "ptid" makes it sound like pointer to
a tid, which I don't believe is what you meant.

I was thinking of the symbol name "ptid" from
_bt_delitems_delete_check() (it even appears in code comments). I
intended "posting list TID". But "pointer to a TID" actually kinda
works too, since these are offsets into a posting list (a simple
ItemPointerData array) for those TIDs that we're in the process of
removing/deleted from the tuple.

I like the new guidelines you proposed (in the patch).
They are well-written and clear.

Thanks. The guidelines might well become stricter in the future. Right
now I'd be happy if everybody could at least be in rough agreement
about the most basic things.

I recognized that the output doesn't look nice, but I hadn't exactly
thought of it as malformed. Perhaps you are right.

It does seem like an annoying thing to have to handle if you actually
want to parse the array. It requires a different approach to every
other array, which seems bad.

I will say and I am still not a fan of the "if (first) else" logic in
your attached patch.

I agree that my approach there was pretty ugly.

How about we have the flags use a trailing comma and space and then
overwrite the last one with something this:

if (infobits & XLHL_KEYS_UPDATED)
appendStringInfoString(buf, "KEYS_UPDATED, ");
buf->data[buf->len -= strlen(", ")] = '\0';

I'll try something like that instead.

-            offsets = (OffsetNumber *) &plans[xlrec->nplans];
+            offsets = (OffsetNumber *) ((char *) plans +
+                                        (xlrec->nplans *
+                                        sizeof(xl_heap_freeze_plan)));
appendStringInfoString(buf, ", plans:");
array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
&plan_elem_desc, &offsets);

I thought that it made sense to match the FREEZE_PAGE REDO routine.

Another fairly arbitrary change, to be honest.

Note that the patch makes many individual (say) HOT_UPDATE records
have descriptions that look like this:

... old_infobits: [], ...

This differs from HEAD, where the output is totally suppressed because
there are no flag bits to show. I think that this behavior is more
logical and consistent overall.

Yea, I think it is better to include things and show that they are empty
then omit them. I find it more clear.

Right. It makes sense for something like this, because generally
speaking the structures aren't nested in any real sense. They're also
very static -- WAL records have a fixed structure. So it's unlikely
that anybody is going to try to parse the description before knowing
which particular WAL record type (or perhaps types, plural) are
involved.

--
Peter Geoghegan

#34Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#33)
Re: Show various offset arrays for heap WAL records

On Mon, Apr 10, 2023 at 04:31:44PM -0700, Peter Geoghegan wrote:

On Mon, Apr 10, 2023 at 3:04 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

I will say that the prefix of p in "ptid" makes it sound like pointer to
a tid, which I don't believe is what you meant.

I was thinking of the symbol name "ptid" from
_bt_delitems_delete_check() (it even appears in code comments). I
intended "posting list TID". But "pointer to a TID" actually kinda
works too, since these are offsets into a posting list (a simple
ItemPointerData array) for those TIDs that we're in the process of
removing/deleted from the tuple.

If you keep the name, I'd explain it briefly in a comment above the code
then -- for those of us who spend less time with btrees. It is a tool
that will be often used by developers, so it is not unreasonable to
assume they may read the code if they are confused.

- Melanie

In reply to: Melanie Plageman (#34)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Mon, Apr 10, 2023 at 5:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

If you keep the name, I'd explain it briefly in a comment above the code
then -- for those of us who spend less time with btrees. It is a tool
that will be often used by developers, so it is not unreasonable to
assume they may read the code if they are confused.

Okay, I'll do something about that shortly.

Attached v2 deals with the "trailing comma and space in flags array"
heap desc issue using an approach that's along the same lines as your
suggested approach. What do you think?

--
Peter Geoghegan

Attachments:

v2-0001-Fix-heapdesc-infomask-array-output.patchapplication/octet-stream; name=v2-0001-Fix-heapdesc-infomask-array-output.patchDownload
From 5daddbd24e71c5088f0083a9e1e78d6f63d221cb Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sun, 9 Apr 2023 18:14:27 -0700
Subject: [PATCH v2 1/2] Fix heapdesc infomask array output.

---
 src/include/access/rmgrdesc_utils.h          | 38 +++++++++
 src/backend/access/rmgrdesc/heapdesc.c       | 84 ++++++++++++--------
 src/backend/access/rmgrdesc/rmgrdesc_utils.c | 20 -----
 3 files changed, 90 insertions(+), 52 deletions(-)

diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 00e614649..16916e3e8 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -12,6 +12,44 @@
 #ifndef RMGRDESC_UTILS_H_
 #define RMGRDESC_UTILS_H_
 
+/*
+ * Guidelines for rmgr descriptor routine authors:
+ *
+ * The goal of these guidelines is to avoid gratuitous inconsistencies across
+ * each rmgr, and to allow users to parse desc output strings without too much
+ * difficulty.  This is not an API specification or an interchange format.
+ * (Only heapam and nbtree desc routines follow these guidelines at present,
+ * in any case.)
+ *
+ * Record descriptions are similar to JSON style key/value objects.  However,
+ * there is no explicit "string" type/string escaping.  Top-level { } brackets
+ * should be omitted.  For example:
+ *
+ * snapshotConflictHorizon: 0, flags: 0x03
+ *
+ * Record descriptions may contain variable-length arrays.  For example:
+ *
+ * nunused: 5, unused: [1, 2, 3, 4, 5]
+ *
+ * Nested objects are supported via { } brackets.  They generally appear
+ * inside variable-length arrays.  For example:
+ *
+ * ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
+ *
+ * Try to output things in an order that faithfully represents the order of
+ * things in the physical WAL record struct.  It's a good idea if the number
+ * of items in the array appears before the array.
+ *
+ * It's okay for individual WAL record types to invent their own conventions.
+ * For example, heapam's PRUNE records output the follow representation of
+ * redirects:
+ *
+ * ... redirected: [39->46], ...
+ *
+ * Arguably the PRUNE desc routine should be using object notation instead.
+ * This ad-hoc representation of redirects has the advantage of being terse in
+ * a context where that might matter a lot.
+ */
 extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
 					   void *data);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 3bd083875..151569990 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -18,29 +18,53 @@
 #include "access/rmgrdesc_utils.h"
 
 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
 {
-	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
-		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
-		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
-		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
-		(infobits & XLHL_KEYS_UPDATED) == 0)
-		return;
+	appendStringInfo(buf, "%s: [", keyname);
 
-	appendStringInfoString(buf, ", infobits: [");
+	Assert(buf->data[buf->len - 1] != ' ');
 
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, " IS_MULTI");
+		appendStringInfoString(buf, "IS_MULTI, ");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, ", LOCK_ONLY");
+		appendStringInfoString(buf, "LOCK_ONLY, ");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, ", EXCL_LOCK");
+		appendStringInfoString(buf, "EXCL_LOCK, ");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, ", KEYSHR_LOCK");
+		appendStringInfoString(buf, "KEYSHR_LOCK, ");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, ", KEYS_UPDATED");
+		appendStringInfoString(buf, "KEYS_UPDATED, ");
 
-	appendStringInfoString(buf, " ]");
+	if (buf->data[buf->len - 1] == ' ')
+	{
+		/* Truncate-away final unneeded ", "  */
+		Assert(buf->data[buf->len - 2] == ',');
+		buf->len -= 2;
+		buf->data[buf->len] = '\0';
+	}
+
+	appendStringInfoString(buf, "]");
+}
+
+static void
+truncate_flags_desc(StringInfo buf, uint8 flags)
+{
+	appendStringInfoString(buf, "flags: [");
+
+	if (flags & XLH_TRUNCATE_CASCADE)
+		appendStringInfoString(buf, "CASCADE, ");
+	if (flags & XLH_TRUNCATE_RESTART_SEQS)
+		appendStringInfoString(buf, "RESTART_SEQS, ");
+
+	if (buf->data[buf->len - 1] == ' ')
+	{
+		/* Truncate-away final unneeded ", "  */
+		Assert(buf->data[buf->len - 2] == ',');
+		buf->len -= 2;
+		buf->data[buf->len] = '\0';
+	}
+
+	appendStringInfoString(buf, "]");
 }
 
 static void
@@ -82,20 +106,20 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X, ",
 						 xlrec->offnum,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
 		appendStringInfo(buf, ", new off: %u, xmax %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
@@ -104,11 +128,11 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
 						 xlrec->old_offnum,
 						 xlrec->old_xmax,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
 		appendStringInfo(buf, ", new off: %u, xmax: %u",
 						 xlrec->new_offnum,
 						 xlrec->new_xmax);
@@ -117,13 +141,7 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
 
-		appendStringInfoString(buf, "flags: [");
-		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, " CASCADE");
-		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, ", RESTART_SEQS");
-		appendStringInfoString(buf, " ]");
-
+		truncate_flags_desc(buf, xlrec->flags);
 		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
 		appendStringInfoString(buf, ", relids:");
 		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids,
@@ -139,9 +157,9 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X, ",
 						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP_INPLACE)
 	{
@@ -230,7 +248,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			OffsetNumber *offsets;
 
 			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
-			offsets = (OffsetNumber *) &plans[xlrec->nplans];
+			offsets = (OffsetNumber *) ((char *) plans +
+										(xlrec->nplans *
+										 sizeof(xl_heap_freeze_plan)));
 			appendStringInfoString(buf, ", plans:");
 			array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
 					   &plan_elem_desc, &offsets);
@@ -260,9 +280,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
 						 xlrec->offnum, xlrec->xmax, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP2_NEW_CID)
 	{
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index 3d16b1fcb..90051f0df 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -16,26 +16,6 @@
 #include "access/rmgrdesc_utils.h"
 #include "storage/off.h"
 
-/*
- * Guidelines for formatting desc functions:
- *
- * member1_name: member1_value, member2_name: member2_value
- *
- * If the value is a list, please use:
- *
- * member3_name: [ member3_list_value1, member3_list_value2 ]
- *
- * The first item appended to the string should not be prepended by any spaces
- * or comma, however all subsequent appends to the string are responsible for
- * prepending themselves with a comma followed by a space.
- *
- * Flags should be in ALL CAPS.
- *
- * For lists/arrays of items, the number of those items should be listed at
- * the beginning with all of the other numbers.
- *
- * Composite objects in a list should be surrounded with { }.
- */
 void
 array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
-- 
2.40.0

#36Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#35)
Re: Show various offset arrays for heap WAL records

Hi,

static void
infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
{
appendStringInfo(buf, "%s: [", keyname);

Why can we assume that there will be no space at the end here?

I know we need to be able to avoid doing the comma overwriting if no
flags were set. In general, we expect record description elements to
prepend themselves with commas and spaces, but these infobits, for
example, use a trailing comma and space. If someone adds a description
element with a trailing space, they will trip this assert. We should at
least add a comment explaining this assertion so someone knows what to
do if they trip it.

Otherwise, we can return early if no flags are set. That will probably
make for slightly messier code since we would still have to construct
the empty list.

Assert(buf->data[buf->len - 1] != ' ');

if (infobits & XLHL_XMAX_IS_MULTI)
appendStringInfoString(buf, "IS_MULTI, ");
if (infobits & XLHL_XMAX_LOCK_ONLY)
appendStringInfoString(buf, "LOCK_ONLY, ");
if (infobits & XLHL_XMAX_EXCL_LOCK)
appendStringInfoString(buf, "EXCL_LOCK, ");
if (infobits & XLHL_XMAX_KEYSHR_LOCK)
appendStringInfoString(buf, "KEYSHR_LOCK, ");
if (infobits & XLHL_KEYS_UPDATED)
appendStringInfoString(buf, "KEYS_UPDATED, ");

if (buf->data[buf->len - 1] == ' ')
{
/* Truncate-away final unneeded ", " */
Assert(buf->data[buf->len - 2] == ',');
buf->len -= 2;
buf->data[buf->len] = '\0';
}

appendStringInfoString(buf, "]");
}

Also you didn't add the same assert to truncate_flags_desc().

static void
truncate_flags_desc(StringInfo buf, uint8 flags)
{
appendStringInfoString(buf, "flags: [");

if (flags & XLH_TRUNCATE_CASCADE)
appendStringInfoString(buf, "CASCADE, ");
if (flags & XLH_TRUNCATE_RESTART_SEQS)
appendStringInfoString(buf, "RESTART_SEQS, ");

if (buf->data[buf->len - 1] == ' ')
{
/* Truncate-away final unneeded ", " */
Assert(buf->data[buf->len - 2] == ',');
buf->len -= 2;
buf->data[buf->len] = '\0';
}

appendStringInfoString(buf, "]");
}

Not the fault of this patch, but I also noticed that heap UPDATE and
HOT_UPDATE records have xmax twice and don't differentiate between new
and old. I think that was probably a mistake.

description | off: 119, xmax: 1105, flags: 0x00, old_infobits:
[], new off: 100, xmax 0

else if (info == XLOG_HEAP_UPDATE)
{
xl_heap_update *xlrec = (xl_heap_update *) rec;

appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
xlrec->old_offnum,
xlrec->old_xmax,
xlrec->flags);
infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
appendStringInfo(buf, ", new off: %u, xmax %u",
xlrec->new_offnum,
xlrec->new_xmax);
}
else if (info == XLOG_HEAP_HOT_UPDATE)
{
xl_heap_update *xlrec = (xl_heap_update *) rec;

appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X, ",
xlrec->old_offnum,
xlrec->old_xmax,
xlrec->flags);
infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
appendStringInfo(buf, ", new off: %u, xmax: %u",
xlrec->new_offnum,
xlrec->new_xmax);
}

Also not the fault of this patch, but looking at the output while using
this, I realized truncate record type has a stringified version of its
flags while other record types, like update, don't. Do you think this
makes sense? Perhaps not something we can change now, though...

description | off: 1, xmax: 1183, flags: 0x00, old_infobits: [],
new off: 119, xmax 0

Also not the fault of this patch, but I noticed that leaftopparent is
often InvalidBlockNumber--which shows up as 4294967295. I wonder if
anyone would be confused by this. Maybe devs know that this value is
InvalidBlockNumber. In the future, perhaps we should interpolate the
string "InvalidBlockNumber"?

description | left: 436, right: 389, level: 0, safexid: 0:1091,
leafleft: 436, leafright: 389, leaftopparent: 4294967295

- Melanie

In reply to: Melanie Plageman (#36)
Re: Show various offset arrays for heap WAL records

On Tue, Apr 11, 2023 at 7:40 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

static void
infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
{
appendStringInfo(buf, "%s: [", keyname);

Why can we assume that there will be no space at the end here?

I don't think that anybody is going to try that, but if they do then
the assertion will fail reliably.

I know we need to be able to avoid doing the comma overwriting if no
flags were set. In general, we expect record description elements to
prepend themselves with commas and spaces, but these infobits, for
example, use a trailing comma and space. If someone adds a description
element with a trailing space, they will trip this assert. We should at
least add a comment explaining this assertion so someone knows what to
do if they trip it.

The invariant here is roughly: caller's keyname argument cannot have
trailing spaces or punctuation characters. It looks like it would be
inconvenient to write a precise assertion for that, but it doesn't
feel particularly necessary, given that this is just a static helper
function.

Otherwise, we can return early if no flags are set. That will probably
make for slightly messier code since we would still have to construct
the empty list.

I prefer to keep this as simple as possible for now.

Also you didn't add the same assert to truncate_flags_desc().

That's because truncate_flags_desc doesn't have a "keyname" argument.
Though it does have an assertion at the end that is almost equivalent:
the "Assert(buf->data[buf->len - 2] == ',') assertion (a matching
assertion appears at the end of infobits_desc).

Not the fault of this patch, but I also noticed that heap UPDATE and
HOT_UPDATE records have xmax twice and don't differentiate between new
and old. I think that was probably a mistake.

description | off: 119, xmax: 1105, flags: 0x00, old_infobits:
[], new off: 100, xmax 0

That doesn't seem great to me either. I don't like this ambiguity,
because it seems like it makes the description hard to parse in a way
that flies in the face of what we're trying to do here, in general.
So it seems like it might be worth fixing now, in the scope of this
patch.

Also not the fault of this patch, but looking at the output while using
this, I realized truncate record type has a stringified version of its
flags while other record types, like update, don't. Do you think this
makes sense? Perhaps not something we can change now, though...

You don't have to look at the truncate record type (which is a
relatively obscure and unimportant record type) to see these kinds of
inconsistencies. You can see the same thing with HEAP_UPDATE and
HEAP_HOT_UPDATE, which have stringified constants for infomask bits,
but not for the xl_heap_update.flags status bits.

I don't see any principled reason why such an inconsistency should
exist -- and we're talking about a pretty glaring inconsistency here.
On the other hand I don't think that we're obligated to do anything
about it for 16.

Also not the fault of this patch, but I noticed that leaftopparent is
often InvalidBlockNumber--which shows up as 4294967295. I wonder if
anyone would be confused by this. Maybe devs know that this value is
InvalidBlockNumber. In the future, perhaps we should interpolate the
string "InvalidBlockNumber"?

description | left: 436, right: 389, level: 0, safexid: 0:1091,
leafleft: 436, leafright: 389, leaftopparent: 4294967295

In my personal opinion (this is a totally subjective question), the
current approach here is okay because (on its own) "leaftopparent:
4294967295" isn't any more or less meaningful than "leaftopparent:
InvalidBlockNumber". It's not as if the REDO routine actually relies
on the value ever being InvalidBlockNumber at all (except in an
assertion).

Plus it's easier to parse as-is. That's what swings it for me, in fact
(as with the "two xmax fields in update records" question).

This is the kind of question that tends to lead to bikeshedding. The
guidelines should avoid taking a firm position on these more
subjective questions, where we must make a subjective trade-off.
Especially a trade-off around how faithfully we represent the physical
WAL record versus readability (whatever "readability" means). I
pondered a similar trade-off in comments added to delvacuum_desc. That
contributed to my feeling that this is best left up to individual rmgr
desc routines.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#37)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Tue, Apr 11, 2023 at 10:34 AM Peter Geoghegan <pg@bowt.ie> wrote:

description | off: 119, xmax: 1105, flags: 0x00, old_infobits:
[], new off: 100, xmax 0

That doesn't seem great to me either. I don't like this ambiguity,
because it seems like it makes the description hard to parse in a way
that flies in the face of what we're trying to do here, in general.
So it seems like it might be worth fixing now, in the scope of this
patch.

Attached revision deals with this by spelling out the names in full
(e.g., "old_xmax" and "new_xmax"). It also reorders the output fields
to match the order from the physical UPDATE, HOT_UPDATE, and LOCK WAL
record types, on the theory that those should match the physical
record (unless there is a good reason not to, which doesn't apply
here). I also removed some inconsistencies between
xl_heap_lock_updated and xl_heap_lock, since they're very similar
record types.

The revision also adds an extra sentence to the guidelines, since this
seems like something that we're entitled to take a relatively firm
position on. Finally, it also adds a comment about the rules for
infobits_desc callers in header comments for the function, per your
concern about that.

--
Peter Geoghegan

Attachments:

v3-0001-Fix-heapdesc-infomask-array-output.patchapplication/octet-stream; name=v3-0001-Fix-heapdesc-infomask-array-output.patchDownload
From dff0f4691e7d8e2da607f7512148356afe8e4492 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sun, 9 Apr 2023 18:14:27 -0700
Subject: [PATCH v3 1/3] Fix heapdesc infomask array output.

---
 src/include/access/heapam_xlog.h             |   2 +-
 src/include/access/rmgrdesc_utils.h          |  39 +++++++
 src/backend/access/heap/heapam.c             |   6 +-
 src/backend/access/rmgrdesc/heapdesc.c       | 116 +++++++++++--------
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  20 ----
 5 files changed, 112 insertions(+), 71 deletions(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index e71d84895..bf1d05965 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -279,7 +279,7 @@ typedef struct xl_heap_vacuum
 /* This is what we need to know about lock */
 typedef struct xl_heap_lock
 {
-	TransactionId locking_xid;	/* might be a MultiXactId not xid */
+	TransactionId xmax;			/* might be a MultiXactId */
 	OffsetNumber offnum;		/* locked tuple's offset on page */
 	int8		infobits_set;	/* infomask and infomask2 bits to set */
 	uint8		flags;			/* XLH_LOCK_* flag bits */
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 00e614649..1686147f8 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -12,6 +12,45 @@
 #ifndef RMGRDESC_UTILS_H_
 #define RMGRDESC_UTILS_H_
 
+/*
+ * Guidelines for rmgr descriptor routine authors:
+ *
+ * The goal of these guidelines is to avoid gratuitous inconsistencies across
+ * each rmgr, and to allow users to parse desc output strings without too much
+ * difficulty.  This is not an API specification or an interchange format.
+ * (Only heapam and nbtree desc routines follow these guidelines at present,
+ * in any case.)
+ *
+ * Record descriptions are similar to JSON style key/value objects.  However,
+ * there is no explicit "string" type/string escaping.  Top-level { } brackets
+ * should be omitted.  For example:
+ *
+ * snapshotConflictHorizon: 0, flags: 0x03
+ *
+ * Record descriptions may contain variable-length arrays.  For example:
+ *
+ * nunused: 5, unused: [1, 2, 3, 4, 5]
+ *
+ * Nested objects are supported via { } brackets.  They generally appear
+ * inside variable-length arrays.  For example:
+ *
+ * ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
+ *
+ * Try to output things in an order that faithfully represents the order of
+ * things in the physical WAL record struct.  It's a good idea if the number
+ * of items in the array appears before the array.  It's also a good idea if
+ * key names are reliably unique (within the same nesting level).
+ *
+ * It's okay for individual WAL record types to invent their own conventions.
+ * For example, heapam's PRUNE records output the follow representation of
+ * redirects:
+ *
+ * ... redirected: [39->46], ...
+ *
+ * Arguably the PRUNE desc routine should be using object notation instead.
+ * This ad-hoc representation of redirects has the advantage of being terse in
+ * a context where that might matter a lot.
+ */
 extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
 					   void *data);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f389ceee1..b300a4675 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3567,7 +3567,7 @@ l2:
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
 
 			xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
-			xlrec.locking_xid = xmax_lock_old_tuple;
+			xlrec.xmax = xmax_lock_old_tuple;
 			xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
 												  oldtup.t_data->t_infomask2);
 			xlrec.flags =
@@ -4777,7 +4777,7 @@ failed:
 		XLogRegisterBuffer(0, *buffer, REGBUF_STANDARD);
 
 		xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
-		xlrec.locking_xid = xid;
+		xlrec.xmax = xid;
 		xlrec.infobits_set = compute_infobits(new_infomask,
 											  tuple->t_data->t_infomask2);
 		xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
@@ -9848,7 +9848,7 @@ heap_xlog_lock(XLogReaderState *record)
 						   BufferGetBlockNumber(buffer),
 						   offnum);
 		}
-		HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
+		HeapTupleHeaderSetXmax(htup, xlrec->xmax);
 		HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 		PageSetLSN(page, lsn);
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 3bd083875..de91a43f9 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -17,30 +17,58 @@
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
 
+/*
+ * NOTE: "keyname" argument cannot have trailing spaces or punctuation
+ * characters
+ */
 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
 {
-	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
-		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
-		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
-		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
-		(infobits & XLHL_KEYS_UPDATED) == 0)
-		return;
+	appendStringInfo(buf, "%s: [", keyname);
 
-	appendStringInfoString(buf, ", infobits: [");
+	Assert(buf->data[buf->len - 1] != ' ');
 
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, " IS_MULTI");
+		appendStringInfoString(buf, "IS_MULTI, ");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, ", LOCK_ONLY");
+		appendStringInfoString(buf, "LOCK_ONLY, ");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, ", EXCL_LOCK");
+		appendStringInfoString(buf, "EXCL_LOCK, ");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, ", KEYSHR_LOCK");
+		appendStringInfoString(buf, "KEYSHR_LOCK, ");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, ", KEYS_UPDATED");
+		appendStringInfoString(buf, "KEYS_UPDATED, ");
 
-	appendStringInfoString(buf, " ]");
+	if (buf->data[buf->len - 1] == ' ')
+	{
+		/* Truncate-away final unneeded ", "  */
+		Assert(buf->data[buf->len - 2] == ',');
+		buf->len -= 2;
+		buf->data[buf->len] = '\0';
+	}
+
+	appendStringInfoString(buf, "]");
+}
+
+static void
+truncate_flags_desc(StringInfo buf, uint8 flags)
+{
+	appendStringInfoString(buf, "flags: [");
+
+	if (flags & XLH_TRUNCATE_CASCADE)
+		appendStringInfoString(buf, "CASCADE, ");
+	if (flags & XLH_TRUNCATE_RESTART_SEQS)
+		appendStringInfoString(buf, "RESTART_SEQS, ");
+
+	if (buf->data[buf->len - 1] == ' ')
+	{
+		/* Truncate-away final unneeded ", "  */
+		Assert(buf->data[buf->len - 2] == ',');
+		buf->len -= 2;
+		buf->data[buf->len] = '\0';
+	}
+
+	appendStringInfoString(buf, "]");
 }
 
 static void
@@ -82,48 +110,36 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off: %u, flags: 0x%02X",
+		appendStringInfo(buf, "off: %u, flags: 0x%02X, ",
 						 xlrec->offnum,
 						 xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
 	}
 	else if (info == XLOG_HEAP_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
-						 xlrec->old_offnum,
-						 xlrec->old_xmax,
-						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, ", new off: %u, xmax %u",
-						 xlrec->new_offnum,
-						 xlrec->new_xmax);
+		appendStringInfo(buf, "old_off: %u, old_xmax: %u, ",
+						 xlrec->old_offnum, xlrec->old_xmax);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
+		appendStringInfo(buf, ", flags: 0x%02X, new_off: %u, new_xmax: %u",
+						 xlrec->flags, xlrec->new_offnum, xlrec->new_xmax);
 	}
 	else if (info == XLOG_HEAP_HOT_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
-						 xlrec->old_offnum,
-						 xlrec->old_xmax,
-						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, ", new off: %u, xmax: %u",
-						 xlrec->new_offnum,
-						 xlrec->new_xmax);
+		appendStringInfo(buf, "old_off: %u, old_xmax: %u, ",
+						 xlrec->old_offnum, xlrec->old_xmax);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
+		appendStringInfo(buf, ", flags: 0x%02X, new_off: %u, new_xmax: %u",
+						 xlrec->flags, xlrec->new_offnum, xlrec->new_xmax);
 	}
 	else if (info == XLOG_HEAP_TRUNCATE)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
 
-		appendStringInfoString(buf, "flags: [");
-		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, " CASCADE");
-		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, ", RESTART_SEQS");
-		appendStringInfoString(buf, " ]");
-
+		truncate_flags_desc(buf, xlrec->flags);
 		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
 		appendStringInfoString(buf, ", relids:");
 		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids,
@@ -139,9 +155,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
-						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		appendStringInfo(buf, "xmax: %u, off: %u, ",
+						 xlrec->xmax, xlrec->offnum);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
+		appendStringInfo(buf, ", flags: 0x%02X", xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_INPLACE)
 	{
@@ -230,7 +247,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			OffsetNumber *offsets;
 
 			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
-			offsets = (OffsetNumber *) &plans[xlrec->nplans];
+			offsets = (OffsetNumber *) ((char *) plans +
+										(xlrec->nplans *
+										 sizeof(xl_heap_freeze_plan)));
 			appendStringInfoString(buf, ", plans:");
 			array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
 					   &plan_elem_desc, &offsets);
@@ -251,18 +270,21 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
-		appendStringInfoString(buf, ", offsets:");
 		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+		{
+			appendStringInfoString(buf, ", offsets:");
 			array_desc(buf, xlrec->offsets, sizeof(OffsetNumber),
 					   xlrec->ntuples, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
-						 xlrec->offnum, xlrec->xmax, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		appendStringInfo(buf, "xmax: %u, off: %u, ",
+						 xlrec->xmax, xlrec->offnum);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
+		appendStringInfo(buf, ", flags: 0x%02X", xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_NEW_CID)
 	{
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index 3d16b1fcb..90051f0df 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -16,26 +16,6 @@
 #include "access/rmgrdesc_utils.h"
 #include "storage/off.h"
 
-/*
- * Guidelines for formatting desc functions:
- *
- * member1_name: member1_value, member2_name: member2_value
- *
- * If the value is a list, please use:
- *
- * member3_name: [ member3_list_value1, member3_list_value2 ]
- *
- * The first item appended to the string should not be prepended by any spaces
- * or comma, however all subsequent appends to the string are responsible for
- * prepending themselves with a comma followed by a space.
- *
- * Flags should be in ALL CAPS.
- *
- * For lists/arrays of items, the number of those items should be listed at
- * the beginning with all of the other numbers.
- *
- * Composite objects in a list should be surrounded with { }.
- */
 void
 array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
-- 
2.40.0

In reply to: Peter Geoghegan (#38)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On Tue, Apr 11, 2023 at 11:48 AM Peter Geoghegan <pg@bowt.ie> wrote:

Attached revision deals with this by spelling out the names in full
(e.g., "old_xmax" and "new_xmax"). It also reorders the output fields
to match the order from the physical UPDATE, HOT_UPDATE, and LOCK WAL
record types, on the theory that those should match the physical
record (unless there is a good reason not to, which doesn't apply
here).

I just noticed that we don't even show xmax in the case of DELETE
records. Perhaps the original assumption is that it must match the
record's own XID, but that's not true after the MultiXact enhancements
for foreign key locking added to 9.3 (and in any case there is no
reason at all to show xmax in UPDATE but not in DELETE).

Attached revision v4 fixes this, making DELETE, UPDATE, HOT_UPDATE,
LOCK, and LOCK_UPDATED record types consistent with each other in
terms of the key names output by the heap desc routine. The field
order also needed a couple of tweaks for struct consistency (and
cross-record consistency) for v4.

--
Peter Geoghegan

Attachments:

v4-0001-Fix-heapdesc-infomask-array-output.patchapplication/octet-stream; name=v4-0001-Fix-heapdesc-infomask-array-output.patchDownload
From a53bc02b3bffd910b28ce03b4a36053b6683c436 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sun, 9 Apr 2023 18:14:27 -0700
Subject: [PATCH v4 1/3] Fix heapdesc infomask array output.

---
 src/include/access/heapam_xlog.h             |   2 +-
 src/include/access/rmgrdesc_utils.h          |  39 ++++++
 src/backend/access/heap/heapam.c             |   6 +-
 src/backend/access/rmgrdesc/heapdesc.c       | 120 +++++++++++--------
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  20 ----
 5 files changed, 114 insertions(+), 73 deletions(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index e71d84895..bf1d05965 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -279,7 +279,7 @@ typedef struct xl_heap_vacuum
 /* This is what we need to know about lock */
 typedef struct xl_heap_lock
 {
-	TransactionId locking_xid;	/* might be a MultiXactId not xid */
+	TransactionId xmax;			/* might be a MultiXactId */
 	OffsetNumber offnum;		/* locked tuple's offset on page */
 	int8		infobits_set;	/* infomask and infomask2 bits to set */
 	uint8		flags;			/* XLH_LOCK_* flag bits */
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index 00e614649..1686147f8 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -12,6 +12,45 @@
 #ifndef RMGRDESC_UTILS_H_
 #define RMGRDESC_UTILS_H_
 
+/*
+ * Guidelines for rmgr descriptor routine authors:
+ *
+ * The goal of these guidelines is to avoid gratuitous inconsistencies across
+ * each rmgr, and to allow users to parse desc output strings without too much
+ * difficulty.  This is not an API specification or an interchange format.
+ * (Only heapam and nbtree desc routines follow these guidelines at present,
+ * in any case.)
+ *
+ * Record descriptions are similar to JSON style key/value objects.  However,
+ * there is no explicit "string" type/string escaping.  Top-level { } brackets
+ * should be omitted.  For example:
+ *
+ * snapshotConflictHorizon: 0, flags: 0x03
+ *
+ * Record descriptions may contain variable-length arrays.  For example:
+ *
+ * nunused: 5, unused: [1, 2, 3, 4, 5]
+ *
+ * Nested objects are supported via { } brackets.  They generally appear
+ * inside variable-length arrays.  For example:
+ *
+ * ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
+ *
+ * Try to output things in an order that faithfully represents the order of
+ * things in the physical WAL record struct.  It's a good idea if the number
+ * of items in the array appears before the array.  It's also a good idea if
+ * key names are reliably unique (within the same nesting level).
+ *
+ * It's okay for individual WAL record types to invent their own conventions.
+ * For example, heapam's PRUNE records output the follow representation of
+ * redirects:
+ *
+ * ... redirected: [39->46], ...
+ *
+ * Arguably the PRUNE desc routine should be using object notation instead.
+ * This ad-hoc representation of redirects has the advantage of being terse in
+ * a context where that might matter a lot.
+ */
 extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
 					   void *data);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f389ceee1..b300a4675 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3567,7 +3567,7 @@ l2:
 			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
 
 			xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
-			xlrec.locking_xid = xmax_lock_old_tuple;
+			xlrec.xmax = xmax_lock_old_tuple;
 			xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
 												  oldtup.t_data->t_infomask2);
 			xlrec.flags =
@@ -4777,7 +4777,7 @@ failed:
 		XLogRegisterBuffer(0, *buffer, REGBUF_STANDARD);
 
 		xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
-		xlrec.locking_xid = xid;
+		xlrec.xmax = xid;
 		xlrec.infobits_set = compute_infobits(new_infomask,
 											  tuple->t_data->t_infomask2);
 		xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
@@ -9848,7 +9848,7 @@ heap_xlog_lock(XLogReaderState *record)
 						   BufferGetBlockNumber(buffer),
 						   offnum);
 		}
-		HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
+		HeapTupleHeaderSetXmax(htup, xlrec->xmax);
 		HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 		PageSetLSN(page, lsn);
 		MarkBufferDirty(buffer);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 3bd083875..d182d8048 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -17,30 +17,58 @@
 #include "access/heapam_xlog.h"
 #include "access/rmgrdesc_utils.h"
 
+/*
+ * NOTE: "keyname" argument cannot have trailing spaces or punctuation
+ * characters
+ */
 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
 {
-	if ((infobits & XLHL_XMAX_IS_MULTI) == 0 &&
-		(infobits & XLHL_XMAX_LOCK_ONLY) == 0 &&
-		(infobits & XLHL_XMAX_EXCL_LOCK) == 0 &&
-		(infobits & XLHL_XMAX_KEYSHR_LOCK) == 0 &&
-		(infobits & XLHL_KEYS_UPDATED) == 0)
-		return;
+	appendStringInfo(buf, "%s: [", keyname);
 
-	appendStringInfoString(buf, ", infobits: [");
+	Assert(buf->data[buf->len - 1] != ' ');
 
 	if (infobits & XLHL_XMAX_IS_MULTI)
-		appendStringInfoString(buf, " IS_MULTI");
+		appendStringInfoString(buf, "IS_MULTI, ");
 	if (infobits & XLHL_XMAX_LOCK_ONLY)
-		appendStringInfoString(buf, ", LOCK_ONLY");
+		appendStringInfoString(buf, "LOCK_ONLY, ");
 	if (infobits & XLHL_XMAX_EXCL_LOCK)
-		appendStringInfoString(buf, ", EXCL_LOCK");
+		appendStringInfoString(buf, "EXCL_LOCK, ");
 	if (infobits & XLHL_XMAX_KEYSHR_LOCK)
-		appendStringInfoString(buf, ", KEYSHR_LOCK");
+		appendStringInfoString(buf, "KEYSHR_LOCK, ");
 	if (infobits & XLHL_KEYS_UPDATED)
-		appendStringInfoString(buf, ", KEYS_UPDATED");
+		appendStringInfoString(buf, "KEYS_UPDATED, ");
 
-	appendStringInfoString(buf, " ]");
+	if (buf->data[buf->len - 1] == ' ')
+	{
+		/* Truncate-away final unneeded ", "  */
+		Assert(buf->data[buf->len - 2] == ',');
+		buf->len -= 2;
+		buf->data[buf->len] = '\0';
+	}
+
+	appendStringInfoString(buf, "]");
+}
+
+static void
+truncate_flags_desc(StringInfo buf, uint8 flags)
+{
+	appendStringInfoString(buf, "flags: [");
+
+	if (flags & XLH_TRUNCATE_CASCADE)
+		appendStringInfoString(buf, "CASCADE, ");
+	if (flags & XLH_TRUNCATE_RESTART_SEQS)
+		appendStringInfoString(buf, "RESTART_SEQS, ");
+
+	if (buf->data[buf->len - 1] == ' ')
+	{
+		/* Truncate-away final unneeded ", "  */
+		Assert(buf->data[buf->len - 2] == ',');
+		buf->len -= 2;
+		buf->data[buf->len] = '\0';
+	}
+
+	appendStringInfoString(buf, "]");
 }
 
 static void
@@ -82,48 +110,36 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off: %u, flags: 0x%02X",
-						 xlrec->offnum,
-						 xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		appendStringInfo(buf, "xmax: %u, off: %u, ",
+						 xlrec->xmax, xlrec->offnum);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
+		appendStringInfo(buf, ", flags: 0x%02X", xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
-						 xlrec->old_offnum,
-						 xlrec->old_xmax,
-						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, ", new off: %u, xmax %u",
-						 xlrec->new_offnum,
-						 xlrec->new_xmax);
+		appendStringInfo(buf, "old_xmax: %u, old_off: %u, ",
+						 xlrec->old_xmax, xlrec->old_offnum);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
+		appendStringInfo(buf, ", flags: 0x%02X, new_xmax: %u, new_off: %u",
+						 xlrec->flags, xlrec->new_xmax, xlrec->new_offnum);
 	}
 	else if (info == XLOG_HEAP_HOT_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
-						 xlrec->old_offnum,
-						 xlrec->old_xmax,
-						 xlrec->flags);
-		out_infobits(buf, xlrec->old_infobits_set);
-		appendStringInfo(buf, ", new off: %u, xmax: %u",
-						 xlrec->new_offnum,
-						 xlrec->new_xmax);
+		appendStringInfo(buf, "old_xmax: %u, old_off: %u, ",
+						 xlrec->old_xmax, xlrec->old_offnum);
+		infobits_desc(buf, xlrec->old_infobits_set, "old_infobits");
+		appendStringInfo(buf, ", flags: 0x%02X, new_xmax: %u, new_off: %u",
+						 xlrec->flags, xlrec->new_xmax, xlrec->new_offnum);
 	}
 	else if (info == XLOG_HEAP_TRUNCATE)
 	{
 		xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
 
-		appendStringInfoString(buf, "flags: [");
-		if (xlrec->flags & XLH_TRUNCATE_CASCADE)
-			appendStringInfoString(buf, " CASCADE");
-		if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
-			appendStringInfoString(buf, ", RESTART_SEQS");
-		appendStringInfoString(buf, " ]");
-
+		truncate_flags_desc(buf, xlrec->flags);
 		appendStringInfo(buf, ", nrelids: %u", xlrec->nrelids);
 		appendStringInfoString(buf, ", relids:");
 		array_desc(buf, xlrec->relids, sizeof(Oid), xlrec->nrelids,
@@ -139,9 +155,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off: %u, xid: %u, flags: 0x%02X",
-						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		appendStringInfo(buf, "xmax: %u, off: %u, ",
+						 xlrec->xmax, xlrec->offnum);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
+		appendStringInfo(buf, ", flags: 0x%02X", xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_INPLACE)
 	{
@@ -230,7 +247,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			OffsetNumber *offsets;
 
 			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
-			offsets = (OffsetNumber *) &plans[xlrec->nplans];
+			offsets = (OffsetNumber *) ((char *) plans +
+										(xlrec->nplans *
+										 sizeof(xl_heap_freeze_plan)));
 			appendStringInfoString(buf, ", plans:");
 			array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
 					   &plan_elem_desc, &offsets);
@@ -251,18 +270,21 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
 						 xlrec->flags);
 
-		appendStringInfoString(buf, ", offsets:");
 		if (!XLogRecHasBlockImage(record, 0) && !isinit)
+		{
+			appendStringInfoString(buf, ", offsets:");
 			array_desc(buf, xlrec->offsets, sizeof(OffsetNumber),
 					   xlrec->ntuples, &offset_elem_desc, NULL);
+		}
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off: %u, xmax: %u, flags: 0x%02X",
-						 xlrec->offnum, xlrec->xmax, xlrec->flags);
-		out_infobits(buf, xlrec->infobits_set);
+		appendStringInfo(buf, "xmax: %u, off: %u, ",
+						 xlrec->xmax, xlrec->offnum);
+		infobits_desc(buf, xlrec->infobits_set, "infobits");
+		appendStringInfo(buf, ", flags: 0x%02X", xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_NEW_CID)
 	{
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index 3d16b1fcb..90051f0df 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -16,26 +16,6 @@
 #include "access/rmgrdesc_utils.h"
 #include "storage/off.h"
 
-/*
- * Guidelines for formatting desc functions:
- *
- * member1_name: member1_value, member2_name: member2_value
- *
- * If the value is a list, please use:
- *
- * member3_name: [ member3_list_value1, member3_list_value2 ]
- *
- * The first item appended to the string should not be prepended by any spaces
- * or comma, however all subsequent appends to the string are responsible for
- * prepending themselves with a comma followed by a space.
- *
- * Flags should be in ALL CAPS.
- *
- * For lists/arrays of items, the number of those items should be listed at
- * the beginning with all of the other numbers.
- *
- * Composite objects in a list should be surrounded with { }.
- */
 void
 array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
-- 
2.40.0

#40Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#39)
Re: Show various offset arrays for heap WAL records

On Tue, Apr 11, 2023 at 1:35 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Apr 11, 2023 at 7:40 AM Melanie Plageman <melanieplageman@gmail.com> wrote:

Not the fault of this patch, but I also noticed that heap UPDATE and
HOT_UPDATE records have xmax twice and don't differentiate between new
and old. I think that was probably a mistake.

description | off: 119, xmax: 1105, flags: 0x00, old_infobits:
[], new off: 100, xmax 0

That doesn't seem great to me either. I don't like this ambiguity,
because it seems like it makes the description hard to parse in a way
that flies in the face of what we're trying to do here, in general.
So it seems like it might be worth fixing now, in the scope of this
patch.

Agreed.

On Tue, Apr 11, 2023 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Apr 11, 2023 at 11:48 AM Peter Geoghegan <pg@bowt.ie> wrote:

Attached revision deals with this by spelling out the names in full
(e.g., "old_xmax" and "new_xmax"). It also reorders the output fields
to match the order from the physical UPDATE, HOT_UPDATE, and LOCK WAL
record types, on the theory that those should match the physical
record (unless there is a good reason not to, which doesn't apply
here).

I just noticed that we don't even show xmax in the case of DELETE
records. Perhaps the original assumption is that it must match the
record's own XID, but that's not true after the MultiXact enhancements
for foreign key locking added to 9.3 (and in any case there is no
reason at all to show xmax in UPDATE but not in DELETE).

Attached revision v4 fixes this, making DELETE, UPDATE, HOT_UPDATE,
LOCK, and LOCK_UPDATED record types consistent with each other in
terms of the key names output by the heap desc routine. The field
order also needed a couple of tweaks for struct consistency (and
cross-record consistency) for v4.

Code in v4 all seems fine to me.
I like the update guidelines comment.

I agree it would be nice for xl_heap_lock->locking_xid to be renamed
xmax for clarity. I would suggest that if you don't intend to put it
in a separate commit, you mention it explicitly in the final commit
message. Its motivation isn't immediately obvious to the reader.

- Melanie

In reply to: Melanie Plageman (#40)
Re: Show various offset arrays for heap WAL records

On Tue, Apr 11, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

That doesn't seem great to me either. I don't like this ambiguity,
because it seems like it makes the description hard to parse in a way
that flies in the face of what we're trying to do here, in general.
So it seems like it might be worth fixing now, in the scope of this
patch.

Agreed.

Great -- pushed a fix for this just now, which included that change.

I agree it would be nice for xl_heap_lock->locking_xid to be renamed
xmax for clarity. I would suggest that if you don't intend to put it
in a separate commit, you mention it explicitly in the final commit
message. Its motivation isn't immediately obvious to the reader.

What I ended up doing is making that part of a bug fix for a minor
buglet I noticed in passing -- it became part of the "Fix xl_heap_lock
WAL record field's data type" commit from a bit earlier on.

Thanks for your help with the follow-up work. Seems like we're done
with this now.

--
Peter Geoghegan

#42Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Peter Geoghegan (#41)
1 attachment(s)
Re: Show various offset arrays for heap WAL records

On 12/04/2023 01:29, Peter Geoghegan wrote:

Thanks for your help with the follow-up work. Seems like we're done
with this now.

This is still listed in the July commitfest; is there some work remaining?

I'm late to the party, but regarding commit c03c2eae0a, which added the
guidelines for writing formatting desc functions:

You moved the comment from rmgrdesc_utils.c into rmgrdesc_utils.h, but I
don't think that was a good idea. Our usual convention is to have the
function comment in the .c file, not at the declaration in the header
file. When I want to know what a function does, I jump to the .c file,
and might miss the comment in the header entirely.

Let's add a src/backend/access/rmgrdesc/README file. We don't currently
have any explanation anywhere why the rmgr desc functions are in a
separate directory. The README would be a good place to explain that,
and to have the formatting guidelines. See attached.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Add-rmgrdesc-README.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-rmgrdesc-README.patchDownload
From 5261e3b4c373aef47458bed14d339d4ea2e6df5a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 10 Jul 2023 10:41:35 +0300
Subject: [PATCH v1 1/1] Add rmgrdesc README.

In the README, briefly explain what rmgrdesc functions are, and why they
are in a separate directory. Commit c03c2eae0a added some guidelines on
the preferred output format; move that to the README too.

Discussion: TODO
---
 src/backend/access/rmgrdesc/README           | 60 ++++++++++++++++++++
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  4 ++
 src/include/access/rmgrdesc_utils.h          | 44 --------------
 3 files changed, 64 insertions(+), 44 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/README

diff --git a/src/backend/access/rmgrdesc/README b/src/backend/access/rmgrdesc/README
new file mode 100644
index 0000000000..abe84b9f11
--- /dev/null
+++ b/src/backend/access/rmgrdesc/README
@@ -0,0 +1,60 @@
+src/backend/access/rmgrdesc/README
+
+WAL resource manager description functions
+==========================================
+
+For debugging purposes, there is a "description function", or rmgrdesc
+function, for each WAL resource manager. The rmgrdesc function parses the WAL
+record and prints the contents of the WAL record in a somewhat human-readable
+format.
+
+The rmgrdesc functions for all resource managers are gathered in this
+directory, because they are also used in the stand-alone pg_waldump program.
+They could potentially be used by out-of-tree debugging tools too, although
+the the functions or the output format should not be considered a stable API.
+
+Guidelines for rmgrdesc output format
+=====================================
+
+The goal of these guidelines is to avoid gratuitous inconsistencies across
+each rmgr, and to allow users to parse desc output strings without too much
+difficulty.  This is not an API specification or an interchange format.
+(Only heapam and nbtree desc routines follow these guidelines at present, in
+any case.)
+
+Record descriptions are similar to JSON style key/value objects.  However,
+there is no explicit "string" type/string escaping.  Top-level { } brackets
+should be omitted.  For example:
+
+snapshotConflictHorizon: 0, flags: 0x03
+
+Record descriptions may contain variable-length arrays.  For example:
+
+nunused: 5, unused: [1, 2, 3, 4, 5]
+
+Nested objects are supported via { } brackets.  They generally appear inside
+variable-length arrays.  For example:
+
+ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
+
+Try to output things in an order that faithfully represents the order of
+fields from the underlying physical WAL record struct.  Key names should be
+unique (at the same nesting level) to make parsing easy.  It's a good idea if
+the number of items in the array appears before the array.
+
+It's okay for individual WAL record types to invent their own conventions.
+For example, Heap2's PRUNE record descriptions use a custom array format for
+the record's "redirected" field:
+
+... redirected: [1->4, 5->9], dead: [10, 11], unused: [3, 7, 8]
+
+Arguably the desc routine should be using object notation for this instead.
+However, there is value in using a custom format when it conveys useful
+information about the underlying physical data structures.
+
+This ad-hoc format has the advantage of being close to the format used for
+the "dead" and "unused" arrays (which follow the standard desc convention for
+page offset number arrays).  It suggests that the "redirected" elements shown
+are just pairs of page offset numbers (which is how it really works).
+
+rmgrdesc_utils.c contains some helper functions to print data in this format.
diff --git a/src/backend/access/rmgrdesc/rmgrdesc_utils.c b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
index 90051f0df9..808770524d 100644
--- a/src/backend/access/rmgrdesc/rmgrdesc_utils.c
+++ b/src/backend/access/rmgrdesc/rmgrdesc_utils.c
@@ -16,6 +16,10 @@
 #include "access/rmgrdesc_utils.h"
 #include "storage/off.h"
 
+/*
+ * Helper function to print an array, in the format described in the
+ * README.
+ */
 void
 array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 		   void (*elem_desc) (StringInfo buf, void *elem, void *data),
diff --git a/src/include/access/rmgrdesc_utils.h b/src/include/access/rmgrdesc_utils.h
index bd414699f2..c754d55621 100644
--- a/src/include/access/rmgrdesc_utils.h
+++ b/src/include/access/rmgrdesc_utils.h
@@ -12,50 +12,6 @@
 #ifndef RMGRDESC_UTILS_H_
 #define RMGRDESC_UTILS_H_
 
-/*
- * Guidelines for rmgrdesc routine authors:
- *
- * The goal of these guidelines is to avoid gratuitous inconsistencies across
- * each rmgr, and to allow users to parse desc output strings without too much
- * difficulty.  This is not an API specification or an interchange format.
- * (Only heapam and nbtree desc routines follow these guidelines at present,
- * in any case.)
- *
- * Record descriptions are similar to JSON style key/value objects.  However,
- * there is no explicit "string" type/string escaping.  Top-level { } brackets
- * should be omitted.  For example:
- *
- * snapshotConflictHorizon: 0, flags: 0x03
- *
- * Record descriptions may contain variable-length arrays.  For example:
- *
- * nunused: 5, unused: [1, 2, 3, 4, 5]
- *
- * Nested objects are supported via { } brackets.  They generally appear
- * inside variable-length arrays.  For example:
- *
- * ndeleted: 0, nupdated: 1, deleted: [], updated: [{ off: 45, nptids: 1, ptids: [0] }]
- *
- * Try to output things in an order that faithfully represents the order of
- * fields from the underlying physical WAL record struct.  Key names should be
- * unique (at the same nesting level) to make parsing easy.  It's a good idea
- * if the number of items in the array appears before the array.
- *
- * It's okay for individual WAL record types to invent their own conventions.
- * For example, Heap2's PRUNE record descriptions use a custom array format
- * for the record's "redirected" field:
- *
- * ... redirected: [1->4, 5->9], dead: [10, 11], unused: [3, 7, 8]
- *
- * Arguably the desc routine should be using object notation for this instead.
- * However, there is value in using a custom format when it conveys useful
- * information about the underlying physical data structures.
- *
- * This ad-hoc format has the advantage of being close to the format used for
- * the "dead" and "unused" arrays (which follow the standard desc convention
- * for page offset number arrays).  It suggests that the "redirected" elements
- * shown are just pairs of page offset numbers (which is how it really works).
- */
 extern void array_desc(StringInfo buf, void *array, size_t elem_size, int count,
 					   void (*elem_desc) (StringInfo buf, void *elem, void *data),
 					   void *data);
-- 
2.30.2

In reply to: Heikki Linnakangas (#42)
Re: Show various offset arrays for heap WAL records

On Mon, Jul 10, 2023 at 12:44 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

This is still listed in the July commitfest; is there some work remaining?

I don't think so; not in the scope of the original patch series from
Melanie, at least.

You moved the comment from rmgrdesc_utils.c into rmgrdesc_utils.h, but I
don't think that was a good idea. Our usual convention is to have the
function comment in the .c file, not at the declaration in the header
file. When I want to know what a function does, I jump to the .c file,
and might miss the comment in the header entirely.

I think that this was a gray area. It wasn't particularly obvious
where this would go. At least not to me.

Let's add a src/backend/access/rmgrdesc/README file. We don't currently
have any explanation anywhere why the rmgr desc functions are in a
separate directory. The README would be a good place to explain that,
and to have the formatting guidelines. See attached.

I agree that it's better this way, though.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#43)
Re: Show various offset arrays for heap WAL records

On Mon, Jul 10, 2023 at 10:29 PM Peter Geoghegan <pg@bowt.ie> wrote:

Let's add a src/backend/access/rmgrdesc/README file. We don't currently
have any explanation anywhere why the rmgr desc functions are in a
separate directory. The README would be a good place to explain that,
and to have the formatting guidelines. See attached.

I agree that it's better this way, though.

Did you forget to follow up here?

--
Peter Geoghegan

#45Melanie Plageman
melanieplageman@gmail.com
In reply to: Heikki Linnakangas (#42)
Re: Show various offset arrays for heap WAL records

On Mon, Jul 10, 2023 at 3:44 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I'm late to the party, but regarding commit c03c2eae0a, which added the
guidelines for writing formatting desc functions:

You moved the comment from rmgrdesc_utils.c into rmgrdesc_utils.h, but I
don't think that was a good idea. Our usual convention is to have the
function comment in the .c file, not at the declaration in the header
file. When I want to know what a function does, I jump to the .c file,
and might miss the comment in the header entirely.

Let's add a src/backend/access/rmgrdesc/README file. We don't currently
have any explanation anywhere why the rmgr desc functions are in a
separate directory. The README would be a good place to explain that,
and to have the formatting guidelines. See attached.

diff --git a/src/backend/access/rmgrdesc/README
b/src/backend/access/rmgrdesc/README
new file mode 100644
index 0000000000..abe84b9f11
--- /dev/null
+++ b/src/backend/access/rmgrdesc/README
@@ -0,0 +1,60 @@
+src/backend/access/rmgrdesc/README
+
+WAL resource manager description functions
+==========================================
+
+For debugging purposes, there is a "description function", or rmgrdesc
+function, for each WAL resource manager. The rmgrdesc function parses the WAL
+record and prints the contents of the WAL record in a somewhat human-readable
+format.
+
+The rmgrdesc functions for all resource managers are gathered in this
+directory, because they are also used in the stand-alone pg_waldump program.

"standalone" seems the more common spelling of this adjective in the
codebase today.

+They could potentially be used by out-of-tree debugging tools too, although
+the the functions or the output format should not be considered a stable API.

You have an extra "the".

I might phrase the last bit as "neither the description functions nor
the output format should be considered part of a stable API"

+Guidelines for rmgrdesc output format
+=====================================

I noticed you used === for both headings and wondered if it was
intentional. Other READMEs I looked at in src/backend/access tend to
have a single heading underlined with ==== and then subsequent
headings are underlined with -----. I could see an argument either way
here, but I just thought I would bring it up in case it was not a
conscious choice.

Otherwise, LGTM.

- Melanie

#46Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Melanie Plageman (#45)
Re: Show various offset arrays for heap WAL records

On 04/09/2023 23:02, Melanie Plageman wrote:

I might phrase the last bit as "neither the description functions nor
the output format should be considered part of a stable API"

+Guidelines for rmgrdesc output format
+=====================================

I noticed you used === for both headings and wondered if it was
intentional. Other READMEs I looked at in src/backend/access tend to
have a single heading underlined with ==== and then subsequent
headings are underlined with -----. I could see an argument either way
here, but I just thought I would bring it up in case it was not a
conscious choice.

Otherwise, LGTM.

Made these changes and committed. Thank you!

--
Heikki Linnakangas
Neon (https://neon.tech)

In reply to: Peter Geoghegan (#20)
1 attachment(s)
Recording whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning (Was: Show various offset arrays for heap WAL records)

On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan <pg@bowt.ie> wrote:

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.

Starting a new, dedicated thread to keep track of this in the CF app.

This patch bitrot. Attached is v2, rebased on top of HEAD.

--
Peter Geoghegan

Attachments:

v2-0001-Record-which-PRUNE-records-are-from-VACUUM.patchapplication/octet-stream; name=v2-0001-Record-which-PRUNE-records-are-from-VACUUM.patchDownload
From b358cdb295468c59ba0ac1320f46ccf3f1591b29 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Tue, 21 Mar 2023 14:40:43 -0700
Subject: [PATCH v2] Record which PRUNE records are from VACUUM.

---
 src/include/access/heapam.h            |  2 +-
 src/include/access/heapam_xlog.h       |  8 +++++++-
 src/backend/access/heap/pruneheap.c    | 25 +++++++++++++++----------
 src/backend/access/rmgrdesc/heapdesc.c |  3 +++
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2d7a0ea7..174073eb2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -321,7 +321,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							PruneResult *presult,
-							OffsetNumber *off_loc);
+							OffsetNumber *off_loc_vacuum);
 extern void heap_page_prune_execute(Buffer buffer,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index a03845078..d4c000f97 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -48,7 +48,7 @@
  * We ran out of opcodes, so heapam.c now has a second RmgrId.  These opcodes
  * are associated with RM_HEAP2_ID, but are not logically different from
  * the ones above associated with RM_HEAP_ID.  XLOG_HEAP_OPMASK applies to
- * these, too.
+ * these, too.  The BYVACUUM bit is used in place of RM_HEAP_ID's init bit.
  */
 #define XLOG_HEAP2_REWRITE		0x00
 #define XLOG_HEAP2_PRUNE		0x10
@@ -59,6 +59,12 @@
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
 
+/*
+ * RM_HEAP2_ID operations that take place during VACUUM set the BYVACUUM bit
+ * as instrumentation.  Only set in XLOG_HEAP2_PRUNE records, for now.
+ */
+#define XLOG_HEAP2_BYVACUUM		0x80
+
 /*
  * xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
  */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5f1abd95..37cc31e0f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -193,8 +193,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
- * off_loc is the offset location required by the caller to use in error
- * callback.
+ * off_loc_vacuum is the offset location required by VACUUM caller to use in
+ * error callback.  Opportunistic pruning caller should pass NULL.
  *
  * presult contains output parameters needed by callers such as the number of
  * tuples removed and the number of line pointers newly marked LP_DEAD.
@@ -204,7 +204,7 @@ void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
 				PruneResult *presult,
-				OffsetNumber *off_loc)
+				OffsetNumber *off_loc_vacuum)
 {
 	Page		page = BufferGetPage(buffer);
 	BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -284,8 +284,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Set the offset number so that we can display it along with any
 		 * error that occurred while processing this tuple.
 		 */
-		if (off_loc)
-			*off_loc = offnum;
+		if (off_loc_vacuum)
+			*off_loc_vacuum = offnum;
 
 		presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
 															buffer);
@@ -303,8 +303,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 			continue;
 
 		/* see preceding loop */
-		if (off_loc)
-			*off_loc = offnum;
+		if (off_loc_vacuum)
+			*off_loc_vacuum = offnum;
 
 		/* Nothing to do if slot is empty or already dead */
 		itemid = PageGetItemId(page, offnum);
@@ -317,8 +317,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 	}
 
 	/* Clear the offset information once we have processed the given page. */
-	if (off_loc)
-		*off_loc = InvalidOffsetNumber;
+	if (off_loc_vacuum)
+		*off_loc_vacuum = InvalidOffsetNumber;
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -356,6 +356,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (RelationNeedsWAL(relation))
 		{
 			xl_heap_prune xlrec;
+			uint8		info = XLOG_HEAP2_PRUNE;
 			XLogRecPtr	recptr;
 
 			xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
@@ -386,7 +387,11 @@ heap_page_prune(Relation relation, Buffer buffer,
 				XLogRegisterBufData(0, (char *) prstate.nowunused,
 									prstate.nunused * sizeof(OffsetNumber));
 
-			recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+			/* Set bit indicating pruning by VACUUM where appropriate */
+			if (off_loc_vacuum)
+				info |= XLOG_HEAP2_BYVACUUM;
+
+			recptr = XLogInsert(RM_HEAP2_ID, info);
 
 			PageSetLSN(BufferGetPage(buffer), recptr);
 		}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index f382c0f62..c456d6986 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -356,6 +356,9 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE:
 			id = "PRUNE";
 			break;
+		case XLOG_HEAP2_PRUNE | XLOG_HEAP2_BYVACUUM:
+			id = "PRUNE+BYVACUUM";
+			break;
 		case XLOG_HEAP2_VACUUM:
 			id = "VACUUM";
 			break;
-- 
2.43.0

#48Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Peter Geoghegan (#47)
Re: Recording whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning (Was: Show various offset arrays for heap WAL records)

On 09/12/2023 23:48, Peter Geoghegan wrote:

On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan <pg@bowt.ie> wrote:

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.

Starting a new, dedicated thread to keep track of this in the CF app.

This patch bitrot. Attached is v2, rebased on top of HEAD.

I included changes like this in commit f83d709760 ("Merge prune, freeze
and vacuum WAL record formats"). Marking this as Committed in the
commitfest.

--
Heikki Linnakangas
Neon (https://neon.tech)

In reply to: Heikki Linnakangas (#48)
Re: Recording whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning (Was: Show various offset arrays for heap WAL records)

On Mon, Mar 25, 2024 at 9:04 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I included changes like this in commit f83d709760 ("Merge prune, freeze
and vacuum WAL record formats"). Marking this as Committed in the
commitfest.

Thanks for making sure that that happened. I suspect that the amount
of pruning performed opportunistically is sometimes much higher than
generally assumed, so having a way of measuring that seems like it
might lead to valuable insights.

--
Peter Geoghegan