Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Started by movead.li@highgo.caover 5 years ago28 messages
#1movead.li@highgo.ca
movead.li@highgo.ca
1 attachment(s)

Hello hackers,

We know that pg_waldump can statistics size for every kind of records. When I use
the feature I find it misses some size for XLOG_SWITCH records. When a user does
a pg_wal_switch(), then postgres will discard the remaining size in the current wal
segment, and the pg_waldump tool misses the discard size.

I think it will be better if pg_waldump can show the matter, so I make a patch
which regards the discard size as a part of XLOG_SWITCH record, it works if we
want to display the detail of wal records or the statistics, and patch attached.

What's your opinion?

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

fix_waldump_size_for_wal_switch.patchapplication/octet-stream; name=fix_waldump_size_for_wal_switch.patchDownload
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..011606ecf2 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -380,9 +381,11 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
  * Calculate the size of a record, split into !FPI and FPI parts.
  */
 static void
-XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
+XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len, uint32 *junk_len)
 {
 	int			block_id;
+	RmgrId		rmid;
+	uint8		info;
 
 	/*
 	 * Calculate the amount of FPI data in the record.
@@ -403,6 +406,31 @@ XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
 	 * all the block images.
 	 */
 	*rec_len = XLogRecGetTotalLen(record) - *fpi_len;
+
+	rmid = XLogRecGetRmid(record);
+	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	/*
+	 * If it's a wal switch record, we should caculate the junk size which skipped
+	 * by this record.
+	 */
+	*junk_len = 0;
+	if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)
+	{
+		XLogSegNo	startSegNo;
+		XLogSegNo	endSegNo;
+
+		XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+		XLByteToSeg(record->EndRecPtr, endSegNo, record->segcxt.ws_segsize);
+
+		*junk_len = record->EndRecPtr - record->ReadRecPtr - XLogRecGetTotalLen(record);
+		/*
+		 * If the wal switch record spread on two segments, we should extra minus the
+		 * long page head.
+		 */
+		if(startSegNo != endSegNo)
+			*junk_len -= SizeOfXLogLongPHD;
+	}
 }
 
 /*
@@ -416,12 +444,14 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	uint8		recid;
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint32		junk_len;
 
 	stats->count++;
 
 	rmid = XLogRecGetRmid(record);
 
-	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	XLogDumpRecordLen(record, &rec_len, &fpi_len, &junk_len);
+	rec_len += junk_len;
 
 	/* Update per-rmgr statistics */
 
@@ -435,9 +465,7 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	 * are the rmgr's domain (resulting in sixteen possible entries per
 	 * RmgrId).
 	 */
-
 	recid = XLogRecGetInfo(record) >> 4;
-
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
@@ -453,6 +481,7 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 	const RmgrDescData *desc = &RmgrDescTable[XLogRecGetRmid(record)];
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint32		junk_len;
 	RelFileNode rnode;
 	ForkNumber	forknum;
 	BlockNumber blk;
@@ -461,11 +490,11 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 	XLogRecPtr	xl_prev = XLogRecGetPrev(record);
 	StringInfoData s;
 
-	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	XLogDumpRecordLen(record, &rec_len, &fpi_len, &junk_len);
 
 	printf("rmgr: %-11s len (rec/tot): %6u/%6u, tx: %10u, lsn: %X/%08X, prev %X/%08X, ",
 		   desc->rm_name,
-		   rec_len, XLogRecGetTotalLen(record),
+		   rec_len, XLogRecGetTotalLen(record) + junk_len,
 		   XLogRecGetXid(record),
 		   (uint32) (record->ReadRecPtr >> 32), (uint32) record->ReadRecPtr,
 		   (uint32) (xl_prev >> 32), (uint32) xl_prev);
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: movead.li@highgo.ca (#1)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Fri, 9 Oct 2020 13:41:25 +0800, "movead.li@highgo.ca" <movead.li@highgo.ca> wrote in

Hello hackers,

We know that pg_waldump can statistics size for every kind of records. When I use
the feature I find it misses some size for XLOG_SWITCH records. When a user does
a pg_wal_switch(), then postgres will discard the remaining size in the current wal
segment, and the pg_waldump tool misses the discard size.

I think it will be better if pg_waldump can show the matter, so I make a patch
which regards the discard size as a part of XLOG_SWITCH record, it works if we
want to display the detail of wal records or the statistics, and patch attached.

What's your opinion?

I think that the length of the XLOG_SWITCH record is no other than 24
bytes. Just adding the padding? garbage bytes to that length doesn't
seem the right thing to me.

If we want pg_waldump to show that length somewhere, it could be shown
at the end of that record explicitly:

rmgr: XLOG len (rec/tot): 24/16776848, tx: 0, lsn: 0/02000148, prev 0/02000110, desc: SWITCH, trailing-bytes: 16776944

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3movead.li@highgo.ca
movead.li@highgo.ca
In reply to: movead.li@highgo.ca (#1)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

I think that the length of the XLOG_SWITCH record is no other than 24
bytes. Just adding the padding? garbage bytes to that length doesn't
seem the right thing to me.

If we want pg_waldump to show that length somewhere, it could be shown
at the end of that record explicitly:

rmgr: XLOG len (rec/tot): 24/16776848, tx: 0, lsn: 0/02000148, prev 0/02000110, desc: SWITCH, trailing-bytes: 16776944

Thanks, I think it's good idea, and new patch attached.

Here's the lookes:
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/030000D8, prev 0/03000060, desc: SWITCH, trailing-bytes: 16776936

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

fix_waldump_size_for_wal_switch_v2.patchapplication/octet-stream; name=fix_waldump_size_for_wal_switch_v2.patchDownload
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..30159cc820 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -380,9 +381,11 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
  * Calculate the size of a record, split into !FPI and FPI parts.
  */
 static void
-XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
+XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len, uint32 *junk_len)
 {
 	int			block_id;
+	RmgrId		rmid;
+	uint8		info;
 
 	/*
 	 * Calculate the amount of FPI data in the record.
@@ -403,6 +406,31 @@ XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
 	 * all the block images.
 	 */
 	*rec_len = XLogRecGetTotalLen(record) - *fpi_len;
+
+	rmid = XLogRecGetRmid(record);
+	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	/*
+	 * If it's a wal switch record, we should caculate the junk size which skipped
+	 * by this record.
+	 */
+	*junk_len = 0;
+	if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)
+	{
+		XLogSegNo	startSegNo;
+		XLogSegNo	endSegNo;
+
+		XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+		XLByteToSeg(record->EndRecPtr, endSegNo, record->segcxt.ws_segsize);
+
+		*junk_len = record->EndRecPtr - record->ReadRecPtr - XLogRecGetTotalLen(record);
+		/*
+		 * If the wal switch record spread on two segments, we should extra minus the
+		 * long page head.
+		 */
+		if(startSegNo != endSegNo)
+			*junk_len -= SizeOfXLogLongPHD;
+	}
 }
 
 /*
@@ -416,12 +444,14 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	uint8		recid;
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint32		junk_len;
 
 	stats->count++;
 
 	rmid = XLogRecGetRmid(record);
 
-	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	XLogDumpRecordLen(record, &rec_len, &fpi_len, &junk_len);
+	rec_len += junk_len;
 
 	/* Update per-rmgr statistics */
 
@@ -435,9 +465,7 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	 * are the rmgr's domain (resulting in sixteen possible entries per
 	 * RmgrId).
 	 */
-
 	recid = XLogRecGetInfo(record) >> 4;
-
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
@@ -453,6 +481,7 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 	const RmgrDescData *desc = &RmgrDescTable[XLogRecGetRmid(record)];
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint32		junk_len;
 	RelFileNode rnode;
 	ForkNumber	forknum;
 	BlockNumber blk;
@@ -461,7 +490,7 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 	XLogRecPtr	xl_prev = XLogRecGetPrev(record);
 	StringInfoData s;
 
-	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	XLogDumpRecordLen(record, &rec_len, &fpi_len, &junk_len);
 
 	printf("rmgr: %-11s len (rec/tot): %6u/%6u, tx: %10u, lsn: %X/%08X, prev %X/%08X, ",
 		   desc->rm_name,
@@ -472,9 +501,15 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 
 	id = desc->rm_identify(info);
 	if (id == NULL)
-		printf("desc: UNKNOWN (%x) ", info & ~XLR_INFO_MASK);
+		printf("desc: UNKNOWN (%x)", info & ~XLR_INFO_MASK);
 	else
-		printf("desc: %s ", id);
+		printf("desc: %s", id);
+
+	if(0 != junk_len)
+	{
+		printf(", trailing-bytes: %u", junk_len);
+	}
+	printf(" ");
 
 	initStringInfo(&s);
 	desc->rm_desc(&s, record);
#4Michael Paquier
michael@paquier.xyz
In reply to: movead.li@highgo.ca (#3)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On Sat, Oct 10, 2020 at 09:50:02AM +0800, movead.li@highgo.ca wrote:

I think that the length of the XLOG_SWITCH record is no other than 24
bytes. Just adding the padding? garbage bytes to that length doesn't
seem the right thing to me.

Here's the lookes:
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/030000D8, prev 0/03000060, desc: SWITCH, trailing-bytes: 16776936

 static void
-XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
+XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len, uint32 *junk_len)
 {
If you wish to add more information about a XLOG_SWITCH record, I
don't think that changing the signature of XLogDumpRecordLen() is
adapted because the record length of this record is defined as
Horiguchi-san mentioned upthread, and the meaning of junk_len is
confusing here.  It seems to me that any extra information should be
added to xlog_desc() where there should be an extra code path for
(info == XLOG_SWITCH).  XLogReaderState should have all the
information you are lookng for.
--
Michael
#5movead.li@highgo.ca
movead.li@highgo.ca
In reply to: movead.li@highgo.ca (#1)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for reply.

If you wish to add more information about a XLOG_SWITCH record, I
don't think that changing the signature of XLogDumpRecordLen() is
adapted because the record length of this record is defined as
Horiguchi-san mentioned upthread, and the meaning of junk_len is
confusing here. It seems to me that any extra information should be
added to xlog_desc() where there should be an extra code path for
(info == XLOG_SWITCH). XLogReaderState should have all the
information you are lookng for.

We have two places to use the 'junk_len', one is when we show the
detail record information, another is when we statistics the percent
of all kind of wal record kinds(by --stat=record). The second place
will not run the xlog_desc(), so it's not a good chance to do the thing.

I am still can not understand why it can't adapted to change the
signature of XLogDumpRecordLen(), maybe we can add a new function
to caculate the 'junk_len' and rename the 'junk_len' as 'skiped_size' or
'switched_size'?

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

#6Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: movead.li@highgo.ca (#5)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Mon, 12 Oct 2020 09:46:37 +0800, "movead.li@highgo.ca" <movead.li@highgo.ca> wrote in

Thanks for reply.

If you wish to add more information about a XLOG_SWITCH record, I
don't think that changing the signature of XLogDumpRecordLen() is
adapted because the record length of this record is defined as
Horiguchi-san mentioned upthread, and the meaning of junk_len is
confusing here. It seems to me that any extra information should be
added to xlog_desc() where there should be an extra code path for
(info == XLOG_SWITCH). XLogReaderState should have all the
information you are lookng for.

We have two places to use the 'junk_len', one is when we show the
detail record information, another is when we statistics the percent
of all kind of wal record kinds(by --stat=record). The second place
will not run the xlog_desc(), so it's not a good chance to do the thing.

I am still can not understand why it can't adapted to change the
signature of XLogDumpRecordLen(), maybe we can add a new function
to caculate the 'junk_len' and rename the 'junk_len' as 'skiped_size' or
'switched_size'?

The reason is the function XLogDumpRecordLen is a common function
among all kind of LOG records, not belongs only to XLOG_SWICH. And the
junk_len is not useful for other than XLOG_SWITCH. Descriptions
specifc to XLOG_SWITCH is provided by xlog_desc().

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#7Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#6)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On Wed, Oct 14, 2020 at 10:29:44AM +0900, Kyotaro Horiguchi wrote:

The reason is the function XLogDumpRecordLen is a common function
among all kind of LOG records, not belongs only to XLOG_SWICH. And the
junk_len is not useful for other than XLOG_SWITCH. Descriptions
specifc to XLOG_SWITCH is provided by xlog_desc().

Yeah. In its current shape, it means that only pg_waldump would be
able to know this information. If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

On top of that, it seems to me that the calculation used in the patch
is wrong in two aspects at quick glance:
1) startSegNo and endSegNo point always to two different segments with
a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
segment border instead before extracting SizeOfXLogLongPHD, no?
2) This stuff should also check after the case of a WAL *page* border
where you'd need to adjust based on SizeOfXLogShortPHD instead.
--
Michael

#8Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#7)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Hi,

On 2020-10-14 15:52:43 +0900, Michael Paquier wrote:

Yeah. In its current shape, it means that only pg_waldump would be
able to know this information. If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

I'm not convinced by this argument. The only case where accounting for
the "wasted" length seems really interesting is for --stats=record - and
for that including it in the record description is useless. When looking
at plain records the length is sufficiently deducable by looking at the
next record's LSN.

Greetings,

Andres Freund

#9Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#8)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Wed, 14 Oct 2020 13:46:13 -0700, Andres Freund <andres@anarazel.de> wrote in

Hi,

On 2020-10-14 15:52:43 +0900, Michael Paquier wrote:

Yeah. In its current shape, it means that only pg_waldump would be
able to know this information. If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

I'm not convinced by this argument. The only case where accounting for
the "wasted" length seems really interesting is for --stats=record - and
for that including it in the record description is useless. When looking
at plain records the length is sufficiently deducable by looking at the
next record's LSN.

I'm not sure the exact motive of this proposal, but if we show the
wasted length in the stats result, I think it should be other than
existing record types.

  XLOG/CHECKPOINT_SHUTDOWN    1 (  0.50) ..
  ...
  Btree/INSERT_LEAF          63 ( 31.19) ..
+ EMPTY                       1 ( xx.xx) ..
----------------------------------------
  Total                      ...

By the way, I noticed that --stats=record shows two lines for
Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
all four bits of xl_info is used to identify record id.

The fourth bit of xl_info of XLOG records is used to signal the
existence of record has 'xinfo' field or not. So an XLOG record with
recid == 8 actually exists but it is really a record that recid == 0
and has xinfo field.

I didn't come up with a cleaner solution but the attached fixes that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_pg_waldump_xact_commit_record_stats.difftext/x-patch; charset=us-asciiDownload
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..c544b90d88 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -438,6 +438,13 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 
 	recid = XLogRecGetInfo(record) >> 4;
 
+	/*
+	 * XXX: There is a special case for XACT records. Those records use the MSB
+	 * of recid for another purpose. Ignore that bit in that case.
+	 */
+	if (rmid == RM_XACT_ID)
+		recid &= 0x07;
+
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
#10movead.li@highgo.ca
movead.li@highgo.ca
In reply to: Kyotaro Horiguchi (#6)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for all the suggestions.

Yeah. In its current shape, it means that only pg_waldump would be
able to know this information. If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

I have change the implement, move some code into xlog_desc().

On top of that, it seems to me that the calculation used in the patch
is wrong in two aspects at quick glance:
1) startSegNo and endSegNo point always to two different segments with
a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
segment border instead before extracting SizeOfXLogLongPHD, no?

Yes you are right, and I use 'record->EndRecPtr - 1' instead.

2) This stuff should also check after the case of a WAL *page* border
where you'd need to adjust based on SizeOfXLogShortPHD instead.

The 'junk_len -= SizeOfXLogLongPHD' code is considered for when the
remain size of a wal segment can not afford a XLogRecord struct for
XLOG_SWITCH, it needn't care *page* border.

I'm not sure the exact motive of this proposal, but if we show the
wasted length in the stats result, I think it should be other than
existing record types.

Yes agree, and now it looks like below as new patch:

movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 -z
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
XLOG 5 ( 31.25) 300 ( 0.00) 0 ( 0.00) 300 ( 0.00)
XLOGSwitchJunk 3 ( 18.75) 50330512 (100.00) 0 ( 0.00) 50330512 (100.00)

movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 --stat=record
XLOG/SWITCH 3 ( 18.75) 72 ( 0.00) 0 ( 0.00) 72 ( 0.00)
XLOG/SWITCH_JUNK 3 ( 18.75) 50330512 (100.00) 0 ( 0.00) 50330512 (100.00)

The shortcoming now is I do not know how to handle the 'count' of SWITCH_JUNK
in pg_waldump results. Currently I regard SWITCH_JUNK as one count.

By the way, I noticed that --stats=record shows two lines for
Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
all four bits of xl_info is used to identify record id.

Thanks,I didn't notice it before, and your patch added into v3 patch attached.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

fix_waldump_size_for_wal_switch_v3.patchapplication/octet-stream; name=fix_waldump_size_for_wal_switch_v3.patchDownload
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..9aee7279a4 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -74,6 +74,13 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		memcpy(&nextOid, rec, sizeof(Oid));
 		appendStringInfo(buf, "%u", nextOid);
 	}
+	else if (info == XLOG_SWITCH)
+	{
+		uint32	junk_len;
+		junk_len = xlog_switch_junk_len(record);
+
+		appendStringInfo(buf, "trailing-bytes: %u", junk_len);
+	}
 	else if (info == XLOG_RESTORE_POINT)
 	{
 		xl_restore_point *xlrec = (xl_restore_point *) rec;
@@ -142,6 +149,30 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 	}
 }
 
+uint32
+xlog_switch_junk_len(XLogReaderState *record)
+{
+	uint32 		junk_len;
+	XLogSegNo	startSegNo;
+	XLogSegNo	endSegNo;
+
+	XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+	XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
+
+	junk_len = record->EndRecPtr - record->ReadRecPtr - XLogRecGetTotalLen(record);
+	/*
+	 * If the wal switch record spread on two segments, we should extra minus the
+	 * long page head. I mean the case when the remain size of a wal segment can not
+	 * afford a XLogRecord struct for XLOG_SWITCH.
+	 */
+	if(startSegNo != endSegNo)
+		junk_len -= SizeOfXLogLongPHD;
+
+	Assert(junk_len >= 0);
+
+	return junk_len;
+}
+
 const char *
 xlog_identify(uint8 info)
 {
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..0df4ddcf72 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -66,6 +67,8 @@ typedef struct Stats
 typedef struct XLogDumpStats
 {
 	uint64		count;
+	uint64		count_xlog_switch;
+	uint32		junk_size;
 	Stats		rmgr_stats[RM_NEXT_ID];
 	Stats		record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
 } XLogDumpStats;
@@ -416,12 +419,21 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	uint8		recid;
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint32		junk_len;
+	uint8		info;
 
 	stats->count++;
 
 	rmid = XLogRecGetRmid(record);
+	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 
 	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)
+	{
+		junk_len = xlog_switch_junk_len(record);
+		stats->count_xlog_switch++;
+		stats->junk_size += junk_len;
+	}
 
 	/* Update per-rmgr statistics */
 
@@ -438,6 +450,14 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 
 	recid = XLogRecGetInfo(record) >> 4;
 
+	/*
+	 * XXX: There is a special case for XACT records. Those records use the MSB
+	 * of recid for another purpose. Ignore that bit in that case.
+	 */
+	if (rmid == RM_XACT_ID)
+		recid &= 0x07;
+
+
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
@@ -610,6 +630,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 	uint64		total_len = 0;
 	double		rec_len_pct,
 				fpi_len_pct;
+	const char 	*switch_junk_id;
 
 	/*
 	 * Each row shows its percentages of the total, so make a first pass to
@@ -622,7 +643,9 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 		total_rec_len += stats->rmgr_stats[ri].rec_len;
 		total_fpi_len += stats->rmgr_stats[ri].fpi_len;
 	}
+	total_rec_len += stats->junk_size;
 	total_len = total_rec_len + total_fpi_len;
+	total_count += stats->count_xlog_switch;
 
 	/*
 	 * 27 is strlen("Transaction/COMMIT_PREPARED"), 20 is strlen(2^64), 8 is
@@ -652,12 +675,20 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 			XLogDumpStatsRow(desc->rm_name,
 							 count, total_count, rec_len, total_rec_len,
 							 fpi_len, total_fpi_len, tot_len, total_len);
+			if(RM_XLOG_ID == ri)
+			{
+				switch_junk_id = "XLOGSwitchJunk";
+				XLogDumpStatsRow(psprintf("%s", switch_junk_id),
+									stats->count_xlog_switch, total_count, stats->junk_size, total_rec_len,
+									0, total_fpi_len, stats->junk_size, total_len);
+			}
 		}
 		else
 		{
 			for (rj = 0; rj < MAX_XLINFO_TYPES; rj++)
 			{
 				const char *id;
+				uint8		info;
 
 				count = stats->record_stats[ri][rj].count;
 				rec_len = stats->record_stats[ri][rj].rec_len;
@@ -676,6 +707,12 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 				XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id),
 								 count, total_count, rec_len, total_rec_len,
 								 fpi_len, total_fpi_len, tot_len, total_len);
+				info = (rj << 4) & ~XLR_INFO_MASK;
+				switch_junk_id = "XLOG/SWITCH_JUNK";
+				if( XLOG_SWITCH == info && stats->count_xlog_switch > 0)
+					XLogDumpStatsRow(psprintf("%s", switch_junk_id),
+									stats->count_xlog_switch, total_count, stats->junk_size, total_rec_len,
+									0, total_fpi_len, stats->junk_size, total_len);
 			}
 		}
 	}
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..8cbc309108 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -315,6 +315,7 @@ extern XLogRecPtr RequestXLogSwitch(bool mark_unimportant);
 
 extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli);
 
+extern uint32 xlog_switch_junk_len(XLogReaderState *record);
 /*
  * Exported for the functions in timeline.c and xlogarchive.c.  Only valid
  * in the startup process.
#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: movead.li@highgo.ca (#10)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Thu, 15 Oct 2020 12:56:02 +0800, "movead.li@highgo.ca" <movead.li@highgo.ca> wrote in

Thanks for all the suggestions.

Yeah. In its current shape, it means that only pg_waldump would be
able to know this information. If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

I have change the implement, move some code into xlog_desc().

Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that? (For clarity, I'm not
suggesting that you should reving it.)

On top of that, it seems to me that the calculation used in the patch
is wrong in two aspects at quick glance:
1) startSegNo and endSegNo point always to two different segments with
a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
segment border instead before extracting SizeOfXLogLongPHD, no?

Yes you are right, and I use 'record->EndRecPtr - 1' instead.

+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);

We use XLByteToPrevSeg instead for this purpose.

2) This stuff should also check after the case of a WAL *page* border
where you'd need to adjust based on SizeOfXLogShortPHD instead.

The 'junk_len -= SizeOfXLogLongPHD' code is considered for when the
remain size of a wal segment can not afford a XLogRecord struct for
XLOG_SWITCH, it needn't care *page* border.

I'm not sure the exact motive of this proposal, but if we show the
wasted length in the stats result, I think it should be other than
existing record types.

Yes agree, and now it looks like below as new patch:

You forgot to add a correction for short headers.

movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 -z
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
XLOG 5 ( 31.25) 300 ( 0.00) 0 ( 0.00) 300 ( 0.00)
XLOGSwitchJunk 3 ( 18.75) 50330512 (100.00) 0 ( 0.00) 50330512 (100.00)

movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 --stat=record
XLOG/SWITCH 3 ( 18.75) 72 ( 0.00) 0 ( 0.00) 72 ( 0.00)
XLOG/SWITCH_JUNK 3 ( 18.75) 50330512 (100.00) 0 ( 0.00) 50330512 (100.00)

The shortcoming now is I do not know how to handle the 'count' of SWITCH_JUNK
in pg_waldump results. Currently I regard SWITCH_JUNK as one count.

+ if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)

We need a comment for the special code path.
We don't follow this kind of convension. Rather use "variable =
constant".

+	{
+		junk_len = xlog_switch_junk_len(record);
+		stats->count_xlog_switch++;
+		stats->junk_size += junk_len;

junk_len is used only the last line above. We don't need that
temporary variable.

+ * If the wal switch record spread on two segments, we should extra minus the

This comment is sticking out of 80-column border. However, I'm not
sure if we have reached a conclustion about the target console-width.

+				info = (rj << 4) & ~XLR_INFO_MASK;
+				switch_junk_id = "XLOG/SWITCH_JUNK";
+				if( XLOG_SWITCH == info && stats->count_xlog_switch > 0)

This line order is strange. At least switch_junk_id is used only in
the if-then block.

I'm not confindent on which is better, but I feel that this is not a
work for display side, but for the recorder side like attached.

By the way, I noticed that --stats=record shows two lines for
Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
all four bits of xl_info is used to identify record id.

Thanks,I didn't notice it before, and your patch added into v3 patch attached.

Sorry for the confusion, but it would be a separate topic even if we
are going to fix that..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_waldump_size_for_wal_switch_v3-1.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..04042a50a4 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -142,6 +142,31 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 	}
 }
 
+uint32
+xlog_switch_junk_len(XLogReaderState *record)
+{
+	uint32 		junk_len;
+	XLogSegNo	startSegNo;
+	XLogSegNo	endSegNo;
+
+	XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+	XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
+
+	junk_len = record->EndRecPtr - record->ReadRecPtr - XLogRecGetTotalLen(record);
+	/*
+	 * If the wal switch record spread on two segments, we should extra minus the
+	 * long page head. I mean the case when the remain size of a wal segment can not
+	 * afford a XLogRecord struct for XLOG_SWITCH.
+	 */
+	if(startSegNo != endSegNo)
+		junk_len -= SizeOfXLogLongPHD;
+
+
+	Assert(junk_len >= 0);
+
+	return junk_len;
+}
+
 const char *
 xlog_identify(uint8 info)
 {
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..ab4e7c830f 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -66,8 +67,8 @@ typedef struct Stats
 typedef struct XLogDumpStats
 {
 	uint64		count;
-	Stats		rmgr_stats[RM_NEXT_ID];
-	Stats		record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
+	Stats		rmgr_stats[RM_NEXT_ID + 1];
+	Stats		record_stats[RM_NEXT_ID + 1][MAX_XLINFO_TYPES];
 } XLogDumpStats;
 
 #define fatal_error(...) do { pg_log_fatal(__VA_ARGS__); exit(EXIT_FAILURE); } while(0)
@@ -441,6 +442,22 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
+
+
+	/* Add junk-space stats for XLOG_SWITCH  */
+	if(rmid == RM_XLOG_ID && XLogRecGetInfo(record) == XLOG_SWITCH)
+	{
+		uint64 junk_len = xlog_switch_junk_len(record);
+
+		rmid = WALDUMP_RM_ID;
+		recid = 0;
+
+		stats->rmgr_stats[rmid].count++;
+		stats->rmgr_stats[rmid].rec_len += junk_len;
+
+		stats->record_stats[rmid][recid].count++;
+		stats->record_stats[rmid][recid].rec_len += junk_len;
+	}
 }
 
 /*
@@ -616,7 +633,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 	 * calculate column totals.
 	 */
 
-	for (ri = 0; ri < RM_NEXT_ID; ri++)
+	for (ri = 0; ri <= RM_NEXT_ID; ri++)
 	{
 		total_count += stats->rmgr_stats[ri].count;
 		total_rec_len += stats->rmgr_stats[ri].rec_len;
@@ -634,7 +651,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 		   "Type", "N", "(%)", "Record size", "(%)", "FPI size", "(%)", "Combined size", "(%)",
 		   "----", "-", "---", "-----------", "---", "--------", "---", "-------------", "---");
 
-	for (ri = 0; ri < RM_NEXT_ID; ri++)
+	for (ri = 0; ri <= RM_NEXT_ID; ri++)
 	{
 		uint64		count,
 					rec_len,
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..efddc70ac1 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -35,6 +35,18 @@
 #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask) \
 	{ name, desc, identify},
 
-const RmgrDescData RmgrDescTable[RM_MAX_ID + 1] = {
+static const char *waldump_xlogswitch_identify(uint8 info);
+
+const RmgrDescData RmgrDescTable[RM_MAX_ID + 2] = {
 #include "access/rmgrlist.h"
+	PG_RMGR(WALDUMP_RM_ID, "OTHERS", NULL, NULL, waldump_xlogswitch_identify, NULL, NULL, NULL)
 };
+
+static const char *
+waldump_xlogswitch_identify(uint8 info)
+{
+	if (info == 0)
+		return "XLOG_SWITCH_JUNK";
+	else
+		return NULL;
+}
diff --git a/src/bin/pg_waldump/rmgrdesc.h b/src/bin/pg_waldump/rmgrdesc.h
index 42f8483b48..2a67d1050e 100644
--- a/src/bin/pg_waldump/rmgrdesc.h
+++ b/src/bin/pg_waldump/rmgrdesc.h
@@ -20,4 +20,6 @@ typedef struct RmgrDescData
 
 extern const RmgrDescData RmgrDescTable[];
 
+#define WALDUMP_RM_ID RM_NEXT_ID
+
 #endif							/* RMGRDESC_H */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..8cbc309108 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -315,6 +315,7 @@ extern XLogRecPtr RequestXLogSwitch(bool mark_unimportant);
 
 extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli);
 
+extern uint32 xlog_switch_junk_len(XLogReaderState *record);
 /*
  * Exported for the functions in timeline.c and xlogarchive.c.  Only valid
  * in the startup process.
#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Thu, 15 Oct 2020 17:32:10 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 15 Oct 2020 12:56:02 +0800, "movead.li@highgo.ca" <movead.li@highgo.ca> wrote in

Thanks for all the suggestions.

Yeah. In its current shape, it means that only pg_waldump would be
able to know this information. If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

I have change the implement, move some code into xlog_desc().

Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that? (For clarity, I'm not
suggesting that you should reving it.)

Sorry. Maybe I deleted wrong letters in the "reving" above.

====
Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that? (For clarity, I'm not
suggesting that you should remove it.)
====

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13movead.li@highgo.ca
movead.li@highgo.ca
In reply to: Andres Freund (#8)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for all the suggestion, and new patch attached.

Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that? (For clarity, I'm not
suggesting that you should reving it.)

I think Andres is saying if we just log it in xlog_desc() then we can not get
the result for '--stats=record' case. And the patch solve the problem.

+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
We use XLByteToPrevSeg instead for this purpose.

Thanks and follow the suggestion.

You forgot to add a correction for short headers.

Infact, we need to consider this matter when the remain size of a wal
segment can not afford a XLogRecord struct for XLOG_SWITCH.
It's mean that if record->ReadRecPtr is on A wal segment, then
record->EndRecPtr is on (A+2) wal segment. So we need to minus
the longpagehead size on (A+1) wal segment.
In my thought we need not to care the short header, if my understand
is wrong?

+ if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)

We need a comment for the special code path.
We don't follow this kind of convension. Rather use "variable =
constant".
+ {
+ junk_len = xlog_switch_junk_len(record);
+ stats->count_xlog_switch++;
+ stats->junk_size += junk_len;

junk_len is used only the last line above. We don't need that
temporary variable.

+ * If the wal switch record spread on two segments, we should extra minus the
This comment is sticking out of 80-column border.  However, I'm not
sure if we have reached a conclustion about the target console-width.
+ info = (rj << 4) & ~XLR_INFO_MASK;
+ switch_junk_id = "XLOG/SWITCH_JUNK";
+ if( XLOG_SWITCH == info && stats->count_xlog_switch > 0)

This line order is strange. At least switch_junk_id is used only in
the if-then block.

Thanks and follow the suggestions.

I'm not confindent on which is better, but I feel that this is not a
work for display side, but for the recorder side like attached.

The patch really seems more clearly, but the new 'OTHERS' may confuse
the users and we hard to handle it with '--rmgr=RMGR' option. So I have
not use this design in this patch, let's see other's opinion.

Sorry for the confusion, but it would be a separate topic even if we
are going to fix that..

Sorry, I remove the code, make sense we should discuss it in a
separate topic.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

fix_waldump_size_for_wal_switch_v4.patchapplication/octet-stream; name=fix_waldump_size_for_wal_switch_v4.patchDownload
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..2edc02fb72 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -74,6 +74,13 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		memcpy(&nextOid, rec, sizeof(Oid));
 		appendStringInfo(buf, "%u", nextOid);
 	}
+	else if (info == XLOG_SWITCH)
+	{
+		uint32	junk_len;
+		junk_len = xlog_switch_junk_len(record);
+
+		appendStringInfo(buf, "trailing-bytes: %u", junk_len);
+	}
 	else if (info == XLOG_RESTORE_POINT)
 	{
 		xl_restore_point *xlrec = (xl_restore_point *) rec;
@@ -142,6 +149,31 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 	}
 }
 
+uint32
+xlog_switch_junk_len(XLogReaderState *record)
+{
+	uint32 		junk_len;
+	XLogSegNo	startSegNo;
+	XLogSegNo	endSegNo;
+
+	XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+	XLByteToPrevSeg(record->EndRecPtr, endSegNo, record->segcxt.ws_segsize);
+
+	junk_len = record->EndRecPtr - record->ReadRecPtr -
+											XLogRecGetTotalLen(record);
+	/*
+	 * If the wal switch record spread on two segments, we should extra minus
+	 * the long page head. I mean the case when the remain size of a wal
+	 * segment can not afford a XLogRecord struct for XLOG_SWITCH.
+	 */
+	if(startSegNo != endSegNo)
+		junk_len -= SizeOfXLogLongPHD;
+
+	Assert(junk_len >= 0);
+
+	return junk_len;
+}
+
 const char *
 xlog_identify(uint8 info)
 {
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..c6db93ff36 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -66,6 +67,7 @@ typedef struct Stats
 typedef struct XLogDumpStats
 {
 	uint64		count;
+	uint32		junk_size;
 	Stats		rmgr_stats[RM_NEXT_ID];
 	Stats		record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
 } XLogDumpStats;
@@ -416,12 +418,21 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	uint8		recid;
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint8		info;
 
 	stats->count++;
 
 	rmid = XLogRecGetRmid(record);
+	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 
 	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	/* Add junk-space stats for XLOG_SWITCH  */
+	if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
+	{
+		uint32 junk_len = xlog_switch_junk_len(record);
+
+		stats->junk_size += junk_len;
+	}
 
 	/* Update per-rmgr statistics */
 
@@ -622,6 +633,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 		total_rec_len += stats->rmgr_stats[ri].rec_len;
 		total_fpi_len += stats->rmgr_stats[ri].fpi_len;
 	}
+	total_rec_len += stats->junk_size;
 	total_len = total_rec_len + total_fpi_len;
 
 	/*
@@ -652,12 +664,20 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 			XLogDumpStatsRow(desc->rm_name,
 							 count, total_count, rec_len, total_rec_len,
 							 fpi_len, total_fpi_len, tot_len, total_len);
+			if(ri == RM_XLOG_ID)
+			{
+				XLogDumpStatsRow(psprintf("XLOGSwitchJunk"),
+								0, total_count, stats->junk_size, total_rec_len,
+								0, total_fpi_len, stats->junk_size, total_len);
+			}
+
 		}
 		else
 		{
 			for (rj = 0; rj < MAX_XLINFO_TYPES; rj++)
 			{
 				const char *id;
+				uint8		info;
 
 				count = stats->record_stats[ri][rj].count;
 				rec_len = stats->record_stats[ri][rj].rec_len;
@@ -676,6 +696,12 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 				XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id),
 								 count, total_count, rec_len, total_rec_len,
 								 fpi_len, total_fpi_len, tot_len, total_len);
+				info = (rj << 4) & ~XLR_INFO_MASK;
+				if(info == XLOG_SWITCH)
+					XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
+									0, total_count, stats->junk_size, total_rec_len,
+									0, total_fpi_len, stats->junk_size, total_len);
+
 			}
 		}
 	}
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..8cbc309108 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -315,6 +315,7 @@ extern XLogRecPtr RequestXLogSwitch(bool mark_unimportant);
 
 extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli);
 
+extern uint32 xlog_switch_junk_len(XLogReaderState *record);
 /*
  * Exported for the functions in timeline.c and xlogarchive.c.  Only valid
  * in the startup process.
#14Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: movead.li@highgo.ca (#13)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Fri, 16 Oct 2020 16:21:47 +0800, "movead.li@highgo.ca" <movead.li@highgo.ca> wrote in

Thanks for all the suggestion, and new patch attached.

Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that? (For clarity, I'm not
suggesting that you should reving it.)

I think Andres is saying if we just log it in xlog_desc() then we can not get
the result for '--stats=record' case. And the patch solve the problem.

Mmm.

and
for that including it in the record description is useless. When looking
at plain records the length is sufficiently deducable by looking at the
next record's LSN.

It looks to me "We can know that length by subtracting the LSN of
XLOG_SWITCH from the next record's LSN so it doesn't add any
information."

+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
We use XLByteToPrevSeg instead for this purpose.

Thanks and follow the suggestion.

You forgot to add a correction for short headers.

Infact, we need to consider this matter when the remain size of a wal
segment can not afford a XLogRecord struct for XLOG_SWITCH.
It's mean that if record->ReadRecPtr is on A wal segment, then
record->EndRecPtr is on (A+2) wal segment. So we need to minus
the longpagehead size on (A+1) wal segment.
In my thought we need not to care the short header, if my understand
is wrong?

Maybe.

When a page doesn't have sufficient space for a record, the record is
split into to pieces and the last half is recorded after the header of
the next page. If it next page is in the next segment, the header is a
long header and a short header otherwise.

If it were the last page of a segment,

ReadRecPtr
v
<--- SEG A ------->|<---------- SEG A+1 ----------------->|<-SEG A+2
<XLOG_SWITCH_FIRST>|<LONG HEADER><XLOG_SWTICH_LAST><EMPTY>|<LONG HEADER>

So the length of <EMPTY> is:

LOC(SEG A+2) - ReadRecPtr - LEN(long header) - LEN(XLOG_SWITCH)

If not, that is, it were not the last page of a segment.

<-------------------- SEG A ---------------------------->|<-SEG A+1
< page n ->|<-- page n + 1 ---------->|....|<-last page->|<-first page
<X_S_FIRST>|<SHORT HEADER><X_S_LAST><EMPTY..............>|<LONG HEADER>

So the length of <EMPTY> in this case is:

LOC(SEG A+1) - ReadRecPtr - LEN(short header) - LEN(XLOG_SWITCH)

I'm not confindent on which is better, but I feel that this is not a
work for display side, but for the recorder side like attached.

The patch really seems more clearly, but the new 'OTHERS' may confuse
the users and we hard to handle it with '--rmgr=RMGR' option. So I have
not use this design in this patch, let's see other's opinion.

Yeah, I don't like the "OTHERS", too.

Sorry for the confusion, but it would be a separate topic even if we
are going to fix that..

Sorry, I remove the code, make sense we should discuss it in a
separate topic.

Agreed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#15movead.li@highgo.ca
movead.li@highgo.ca
In reply to: movead.li@highgo.ca (#10)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

It looks to me "We can know that length by subtracting the LSN of
XLOG_SWITCH from the next record's LSN so it doesn't add any
information."

Sorry,maybe I miss this before.
But I think it will be better to show it clearly.

So the length of <EMPTY> in this case is:

LOC(SEG A+1) - ReadRecPtr - LEN(short header) - LEN(XLOG_SWITCH)

Thanks, I should not have missed this and fixed.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

fix_waldump_size_for_wal_switch_v5.patchapplication/octet-stream; name=fix_waldump_size_for_wal_switch_v5.patchDownload
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..c1ee96bb91 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -74,6 +74,13 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		memcpy(&nextOid, rec, sizeof(Oid));
 		appendStringInfo(buf, "%u", nextOid);
 	}
+	else if (info == XLOG_SWITCH)
+	{
+		uint32	junk_len;
+		junk_len = xlog_switch_junk_len(record);
+
+		appendStringInfo(buf, "trailing-bytes: %u", junk_len);
+	}
 	else if (info == XLOG_RESTORE_POINT)
 	{
 		xl_restore_point *xlrec = (xl_restore_point *) rec;
@@ -142,6 +149,33 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 	}
 }
 
+uint32
+xlog_switch_junk_len(XLogReaderState *record)
+{
+	uint32 		junk_len;
+	XLogSegNo	startSegNo;
+	XLogSegNo	endSegNo;
+
+	XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+	XLByteToPrevSeg(record->EndRecPtr, endSegNo, record->segcxt.ws_segsize);
+
+	junk_len = record->EndRecPtr - record->ReadRecPtr -
+											XLogRecGetTotalLen(record);
+	/*
+	 * If the wal switch record spread on two pages, we should extra minus
+	 * the page head, be careful if it's page at the top of a wal segment.
+	 */
+	if(startSegNo != endSegNo)
+		junk_len -= SizeOfXLogLongPHD;
+	else if(record->ReadRecPtr / XLOG_BLCKSZ !=
+				(record->EndRecPtr - 1) / XLOG_BLCKSZ)
+		junk_len -= SizeOfXLogShortPHD;
+
+	Assert(junk_len >= 0);
+
+	return junk_len;
+}
+
 const char *
 xlog_identify(uint8 info)
 {
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..c6db93ff36 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -66,6 +67,7 @@ typedef struct Stats
 typedef struct XLogDumpStats
 {
 	uint64		count;
+	uint32		junk_size;
 	Stats		rmgr_stats[RM_NEXT_ID];
 	Stats		record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
 } XLogDumpStats;
@@ -416,12 +418,21 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	uint8		recid;
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint8		info;
 
 	stats->count++;
 
 	rmid = XLogRecGetRmid(record);
+	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 
 	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	/* Add junk-space stats for XLOG_SWITCH  */
+	if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
+	{
+		uint32 junk_len = xlog_switch_junk_len(record);
+
+		stats->junk_size += junk_len;
+	}
 
 	/* Update per-rmgr statistics */
 
@@ -622,6 +633,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 		total_rec_len += stats->rmgr_stats[ri].rec_len;
 		total_fpi_len += stats->rmgr_stats[ri].fpi_len;
 	}
+	total_rec_len += stats->junk_size;
 	total_len = total_rec_len + total_fpi_len;
 
 	/*
@@ -652,12 +664,20 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 			XLogDumpStatsRow(desc->rm_name,
 							 count, total_count, rec_len, total_rec_len,
 							 fpi_len, total_fpi_len, tot_len, total_len);
+			if(ri == RM_XLOG_ID)
+			{
+				XLogDumpStatsRow(psprintf("XLOGSwitchJunk"),
+								0, total_count, stats->junk_size, total_rec_len,
+								0, total_fpi_len, stats->junk_size, total_len);
+			}
+
 		}
 		else
 		{
 			for (rj = 0; rj < MAX_XLINFO_TYPES; rj++)
 			{
 				const char *id;
+				uint8		info;
 
 				count = stats->record_stats[ri][rj].count;
 				rec_len = stats->record_stats[ri][rj].rec_len;
@@ -676,6 +696,12 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 				XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id),
 								 count, total_count, rec_len, total_rec_len,
 								 fpi_len, total_fpi_len, tot_len, total_len);
+				info = (rj << 4) & ~XLR_INFO_MASK;
+				if(info == XLOG_SWITCH)
+					XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
+									0, total_count, stats->junk_size, total_rec_len,
+									0, total_fpi_len, stats->junk_size, total_len);
+
 			}
 		}
 	}
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..8cbc309108 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -315,6 +315,7 @@ extern XLogRecPtr RequestXLogSwitch(bool mark_unimportant);
 
 extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli);
 
+extern uint32 xlog_switch_junk_len(XLogReaderState *record);
 /*
  * Exported for the functions in timeline.c and xlogarchive.c.  Only valid
  * in the startup process.
#16Noname
Shinya11.Kato@nttdata.com
In reply to: movead.li@highgo.ca (#15)
RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

When I execute pg_waldump, I found that XLOG/SWITCH_JUNK appears twice.
Is this problem solved by the way of correcting the previously discussed Transaction/COMMIT?

$ ../bin/pg_waldump --stats=record ../data/pg_wal/000000010000000000000001
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
XLOG/CHECKPOINT_SHUTDOWN 5 ( 0.01) 570 ( 0.01) 0 ( 0.00) 570 ( 0.01)
XLOG/CHECKPOINT_ONLINE 6 ( 0.02) 684 ( 0.02) 0 ( 0.00) 684 ( 0.01)
XLOG/NEXTOID 3 ( 0.01) 90 ( 0.00) 0 ( 0.00) 90 ( 0.00)
XLOG/FPI 290 ( 0.80) 14210 ( 0.34) 638216 ( 40.72) 652426 ( 11.30)
Transaction/COMMIT 12 ( 0.03) 408 ( 0.01) 0 ( 0.00) 408 ( 0.01)
Transaction/COMMIT 496 ( 1.36) 134497 ( 3.20) 0 ( 0.00) 134497 ( 2.33)
Storage/CREATE 13 ( 0.04) 546 ( 0.01) 0 ( 0.00) 546 ( 0.01)
CLOG/ZEROPAGE 1 ( 0.00) 30 ( 0.00) 0 ( 0.00) 30 ( 0.00)
Database/CREATE 2 ( 0.01) 84 ( 0.00) 0 ( 0.00) 84 ( 0.00)
Standby/LOCK 142 ( 0.39) 5964 ( 0.14) 0 ( 0.00) 5964 ( 0.10)
Standby/RUNNING_XACTS 13 ( 0.04) 666 ( 0.02) 0 ( 0.00) 666 ( 0.01)
Standby/INVALIDATIONS 136 ( 0.37) 12416 ( 0.30) 0 ( 0.00) 12416 ( 0.22)
Heap2/CLEAN 132 ( 0.36) 8994 ( 0.21) 0 ( 0.00) 8994 ( 0.16)
Heap2/FREEZE_PAGE 245 ( 0.67) 168704 ( 4.01) 0 ( 0.00) 168704 ( 2.92)
Heap2/CLEANUP_INFO 2 ( 0.01) 84 ( 0.00) 0 ( 0.00) 84 ( 0.00)
Heap2/VISIBLE 424 ( 1.16) 25231 ( 0.60) 352256 ( 22.48) 377487 ( 6.54)
XLOG/SWITCH_JUNK 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00)
Heap2/MULTI_INSERT 1511 ( 4.15) 287727 ( 6.84) 12872 ( 0.82) 300599 ( 5.21)
Heap2/MULTI_INSERT+INIT 46 ( 0.13) 71910 ( 1.71) 0 ( 0.00) 71910 ( 1.25)
Heap/INSERT 8849 ( 24.31) 1288414 ( 30.62) 25648 ( 1.64) 1314062 ( 22.76)
Heap/DELETE 25 ( 0.07) 1350 ( 0.03) 0 ( 0.00) 1350 ( 0.02)
Heap/UPDATE 173 ( 0.48) 55238 ( 1.31) 5964 ( 0.38) 61202 ( 1.06)
Heap/HOT_UPDATE 257 ( 0.71) 27585 ( 0.66) 1300 ( 0.08) 28885 ( 0.50)
XLOG/SWITCH_JUNK 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00)
Heap/LOCK 180 ( 0.49) 9800 ( 0.23) 129812 ( 8.28) 139612 ( 2.42)
Heap/INPLACE 214 ( 0.59) 44520 ( 1.06) 40792 ( 2.60) 85312 ( 1.48)
Heap/INSERT+INIT 171 ( 0.47) 171318 ( 4.07) 0 ( 0.00) 171318 ( 2.97)

Regards,
Shinya Kato

#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noname (#16)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for taking a look on this.

At Fri, 4 Dec 2020 04:20:47 +0000, <Shinya11.Kato@nttdata.com> wrote in

When I execute pg_waldump, I found that XLOG/SWITCH_JUNK appears twice.
Is this problem solved by the way of correcting the previously discussed Transaction/COMMIT?

$ ../bin/pg_waldump --stats=record ../data/pg_wal/000000010000000000000001
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---

..

XLOG/SWITCH_JUNK 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00)

...

XLOG/SWITCH_JUNK 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00)

Yeah, that's because of XLogDumpDisplayStats forgets to consider ri
(rmgr id) when showing the lines. If there's a record with info = 0x04
for other resources than RM_XLOG_ID, the spurious line is shown.

The first one is for XLOG_HEAP2_VISIBLE and the latter is for
XLOG_HEAP_HOT_UPDATE, that is, both of which are not for XLOG_SWITCH..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18Noname
Shinya11.Kato@nttdata.com
In reply to: Kyotaro Horiguchi (#17)
RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for the reply. > Mr.Horiguchi.

I reviewed the patch and found some problems.

+ if(startSegNo != endSegNo)
+ else if(record->ReadRecPtr / XLOG_BLCKSZ !=
+ if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
+ if(ri == RM_XLOG_ID)
+ if(info == XLOG_SWITCH)

You need to put a space after the "if".

@@ -24,6 +24,7 @@
#include "common/logging.h"
#include "getopt_long.h"
#include "rmgrdesc.h"
+#include "catalog/pg_control.h"

I think the include statements should be arranged in alphabetical order.

+ info = (rj << 4) & ~XLR_INFO_MASK;
+ if(info == XLOG_SWITCH)
+ XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
+ 0, total_count, stats->junk_size, total_rec_len,
+ 0, total_fpi_len, stats->junk_size, total_len);

Can't be described in the same way as "XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id)..."?
Only this part looks strange.

Why are the "count" and "fpi_len" fields 0?

I think you need to improve the duplicate output in column "XLOG/SWITCH_JUNK".

Regards,
Shinya Kato

#19movead.li@highgo.ca
movead.li@highgo.ca
In reply to: Kyotaro Horiguchi (#14)
1 attachment(s)
RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for review, and sorry for reply so later.

I reviewed the patch and found some problems.

+ if(startSegNo != endSegNo)
+ else if(record->ReadRecPtr / XLOG_BLCKSZ !=
+ if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
+ if(ri == RM_XLOG_ID)
+ if(info == XLOG_SWITCH)

You need to put a space after the "if".

All fix and thanks for point the issue.

@@ -24,6 +24,7 @@
#include "common/logging.h"
#include "getopt_long.h"
#include "rmgrdesc.h"
+#include "catalog/pg_control.h"

I think the include statements should be arranged in alphabetical order.

Fix.

+ info = (rj << 4) & ~XLR_INFO_MASK;
+ if(info == XLOG_SWITCH)
+ XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
+ 0, total_count, stats->junk_size, total_rec_len,
+ 0, total_fpi_len, stats->junk_size, total_len);

Can't be described in the same way as "XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id)..."?
Only this part looks strange.
Why are the "count" and "fpi_len" fields 0?

The 'SWITCH_JUNK' is not a real record and it relys on 'XLOG_SWITCH' record, so I think we can't count
'SWITCH_JUNK', so the "count" is 0. And it never contain FPI, so the "fpi_len" is 0.

But 0 value maybe looks strange, so in current version I show it like below:
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I think you need to improve the duplicate output in column "XLOG/SWITCH_JUNK".

Yes it's a bug and fixed.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachments:

fix_waldump_size_for_wal_switch_v6.patchapplication/octet-stream; name=fix_waldump_size_for_wal_switch_v6.patchDownload
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..0668511660 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -74,6 +74,13 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		memcpy(&nextOid, rec, sizeof(Oid));
 		appendStringInfo(buf, "%u", nextOid);
 	}
+	else if (info == XLOG_SWITCH)
+	{
+		uint32	junk_len;
+		junk_len = xlog_switch_junk_len(record);
+
+		appendStringInfo(buf, "trailing-bytes: %u", junk_len);
+	}
 	else if (info == XLOG_RESTORE_POINT)
 	{
 		xl_restore_point *xlrec = (xl_restore_point *) rec;
@@ -142,6 +149,33 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 	}
 }
 
+uint32
+xlog_switch_junk_len(XLogReaderState *record)
+{
+	uint32 		junk_len;
+	XLogSegNo	startSegNo;
+	XLogSegNo	endSegNo;
+
+	XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+	XLByteToPrevSeg(record->EndRecPtr, endSegNo, record->segcxt.ws_segsize);
+
+	junk_len = record->EndRecPtr - record->ReadRecPtr -
+											XLogRecGetTotalLen(record);
+	/*
+	 * If the wal switch record spread on two pages, we should extra minus
+	 * the page head, be careful if it's page at the top of a wal segment.
+	 */
+	if (startSegNo != endSegNo)
+		junk_len -= SizeOfXLogLongPHD;
+	else if (record->ReadRecPtr / XLOG_BLCKSZ !=
+				(record->EndRecPtr - 1) / XLOG_BLCKSZ)
+		junk_len -= SizeOfXLogShortPHD;
+
+	Assert(junk_len >= 0);
+
+	return junk_len;
+}
+
 const char *
 xlog_identify(uint8 info)
 {
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 164868d16e..c20f7f5703 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -20,6 +20,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
+#include "catalog/pg_control.h"
 #include "common/fe_memutils.h"
 #include "common/logging.h"
 #include "getopt_long.h"
@@ -66,6 +67,7 @@ typedef struct Stats
 typedef struct XLogDumpStats
 {
 	uint64		count;
+	uint32		junk_size;
 	Stats		rmgr_stats[RM_NEXT_ID];
 	Stats		record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
 } XLogDumpStats;
@@ -416,12 +418,21 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	uint8		recid;
 	uint32		rec_len;
 	uint32		fpi_len;
+	uint8		info;
 
 	stats->count++;
 
 	rmid = XLogRecGetRmid(record);
+	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 
 	XLogDumpRecordLen(record, &rec_len, &fpi_len);
+	/* Add junk-space stats for XLOG_SWITCH  */
+	if (rmid == RM_XLOG_ID && info == XLOG_SWITCH)
+	{
+		uint32 junk_len = xlog_switch_junk_len(record);
+
+		stats->junk_size += junk_len;
+	}
 
 	/* Update per-rmgr statistics */
 
@@ -595,6 +606,38 @@ XLogDumpStatsRow(const char *name,
 		   tot_len, tot_len_pct);
 }
 
+static void
+XLogDumpStatsRow_junk(bool detail,
+				 uint64 rec_len, uint64 total_rec_len,
+				 uint64 tot_len, uint64 total_len)
+{
+	double		rec_len_pct,
+				tot_len_pct;
+	char		*name = NULL;
+	char		pad = '-';
+
+	if (detail)
+		name = "XLOG/SWITCH_JUNK";
+	else
+		name = "XLOGSwitchJunk";
+
+	rec_len_pct = 0;
+	if (total_rec_len != 0)
+		rec_len_pct = 100 * (double) rec_len / total_rec_len;
+
+	tot_len_pct = 0;
+	if (total_len != 0)
+		tot_len_pct = 100 * (double) tot_len / total_len;
+
+	printf("%-27s "
+		   "%20c (%6c) "
+		   "%20" INT64_MODIFIER "u (%6.02f) "
+		   "%20c (%6c) "
+		   "%20" INT64_MODIFIER "u (%6.02f)\n",
+		   name, pad, pad, rec_len, rec_len_pct, pad, pad,
+		   tot_len, tot_len_pct);
+}
+
 
 /*
  * Display summary statistics about the records seen so far.
@@ -622,6 +665,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 		total_rec_len += stats->rmgr_stats[ri].rec_len;
 		total_fpi_len += stats->rmgr_stats[ri].fpi_len;
 	}
+	total_rec_len += stats->junk_size;
 	total_len = total_rec_len + total_fpi_len;
 
 	/*
@@ -652,12 +696,22 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 			XLogDumpStatsRow(desc->rm_name,
 							 count, total_count, rec_len, total_rec_len,
 							 fpi_len, total_fpi_len, tot_len, total_len);
+			if (ri == RM_XLOG_ID)
+			{
+				XLogDumpStatsRow_junk(false,
+								stats->junk_size, total_rec_len,
+								stats->junk_size, total_len);
+			}
+
 		}
 		else
 		{
+			bool	getswitch = false;
+
 			for (rj = 0; rj < MAX_XLINFO_TYPES; rj++)
 			{
 				const char *id;
+				uint8		info;
 
 				count = stats->record_stats[ri][rj].count;
 				rec_len = stats->record_stats[ri][rj].rec_len;
@@ -676,7 +730,15 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
 				XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id),
 								 count, total_count, rec_len, total_rec_len,
 								 fpi_len, total_fpi_len, tot_len, total_len);
+				info = (rj << 4) & ~XLR_INFO_MASK;
+				if (ri == RM_XLOG_ID && info == XLOG_SWITCH)
+					getswitch = true;
+
 			}
+			if (getswitch)
+				XLogDumpStatsRow_junk(true,
+									stats->junk_size, total_rec_len,
+									stats->junk_size, total_len);
 		}
 	}
 
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 9585ad17b3..687ba90238 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -315,6 +315,7 @@ extern XLogRecPtr RequestXLogSwitch(bool mark_unimportant);
 
 extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli);
 
+extern uint32 xlog_switch_junk_len(XLogReaderState *record);
 /*
  * Exported for the functions in timeline.c and xlogarchive.c.  Only valid
  * in the startup process.
#20Noname
Shinya11.Kato@nttdata.com
In reply to: movead.li@highgo.ca (#19)
RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Thanks for review, and sorry for reply so later.

I reviewed the patch and found some problems.

+ if(startSegNo != endSegNo)
+ else if(record->ReadRecPtr / XLOG_BLCKSZ !=
+ if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
+ if(ri == RM_XLOG_ID)
+ if(info == XLOG_SWITCH)

You need to put a space after the "if".

All fix and thanks for point the issue.

@@ -24,6 +24,7 @@
#include "common/logging.h"
#include "getopt_long.h"
#include "rmgrdesc.h"
+#include "catalog/pg_control.h"

I think the include statements should be arranged in alphabetical order.

Fix.

Thank you for your revision!

+ info = (rj << 4) & ~XLR_INFO_MASK;
+ if(info == XLOG_SWITCH)
+ XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
+ 0, total_count, stats->junk_size, total_rec_len,
+ 0, total_fpi_len, stats->junk_size, total_len);

Can't be described in the same way as "XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id)..."?
Only this part looks strange.
Why are the "count" and "fpi_len" fields 0?

The 'SWITCH_JUNK' is not a real record and it relys on 'XLOG_SWITCH' record, so I think we can't count
'SWITCH_JUNK', so the "count" is 0. And it never contain FPI, so the "fpi_len" is 0.

But 0 value maybe looks strange, so in current version I show it like below:
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I just wanted to know why the "count" and "fpi_len" fields 0 are.
So, it would be nice to have 0 values. Sorry for confusing you.

Regards,
Shinya Kato

#21David Steele
david@pgmasters.net
In reply to: Noname (#20)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On 1/7/21 2:55 AM, Shinya11.Kato@nttdata.com wrote:

But 0 value maybe looks strange, so in current version I show it like below:
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- ---
...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I just wanted to know why the "count" and "fpi_len" fields 0 are.
So, it would be nice to have 0 values. Sorry for confusing you.

Kato, it's not clear to me if you were asking for - to be changed back to 0?

You marked the patch as Ready for Committer so I assume not, but it
would be better to say clearly that you think this patch is ready for a
committer to look at.

Regards,
--
-David
david@pgmasters.net

#22Noname
Shinya11.Kato@nttdata.com
In reply to: David Steele (#21)
RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

But 0 value maybe looks strange, so in current version I show it like >below:
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- --- ...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I just wanted to know why the "count" and "fpi_len" fields 0 are.
So, it would be nice to have 0 values. Sorry for confusing you.

Kato, it's not clear to me if you were asking for - to be changed back to 0?

You marked the patch as Ready for Committer so I assume not, but it would be
better to say clearly that you think this patch is ready for a committer to look at.

Yes, I don't think it needs to be changed back to 0.
I think this patch is ready for a committer to look at.

Regards,
Shinya Kato

#23Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Noname (#22)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On 2021/03/19 15:06, Shinya11.Kato@nttdata.com wrote:

But 0 value maybe looks strange, so in current version I show it like >below:
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- --- ...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I just wanted to know why the "count" and "fpi_len" fields 0 are.
So, it would be nice to have 0 values. Sorry for confusing you.

Kato, it's not clear to me if you were asking for - to be changed back to 0?

You marked the patch as Ready for Committer so I assume not, but it would be
better to say clearly that you think this patch is ready for a committer to look at.

Yes, I don't think it needs to be changed back to 0.
I think this patch is ready for a committer to look at.

What's the use case of this feature? I read through this thread briefly,
but I'm still not sure how useful this feature is.

Horiguchi-san reported one issue upthread; --stats=record shows
two lines for Transaction/COMMIT record. Probably this should be
fixed separately.

Horiguchi-san,
Do you have updated version of that bug-fix patch?
Or you started another thread for that issue?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#24Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#23)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On 2021/03/19 18:27, Fujii Masao wrote:

On 2021/03/19 15:06, Shinya11.Kato@nttdata.com wrote:

But 0 value maybe looks strange, so in current version I show it like >below:
Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- --- ...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I just wanted to know why the "count" and "fpi_len" fields 0 are.
So, it would be nice to have 0 values. Sorry for confusing you.

Kato, it's not clear to me if you were asking for - to be changed back to 0?

You marked the patch as Ready for Committer so I assume not, but it would be
better to say clearly that you think this patch is ready for a committer to look at.

Yes, I don't think it needs to be changed back to 0.
I think this patch is ready for a committer to look at.

What's the use case of this feature? I read through this thread briefly,
but I'm still not sure how useful this feature is.

Horiguchi-san reported one issue upthread; --stats=record shows
two lines for Transaction/COMMIT record. Probably this should be
fixed separately.

Horiguchi-san,
Do you have updated version of that bug-fix patch?
Or you started another thread for that issue?

I confirmed that only XACT records need to be processed differently.
So the patch that Horiguchi-san posted upthread looks good and enough
to me. I added a bit more detail information into the comment in the patch.
Attached is the updated version of the patch. Since this issue looks like
a bug, I'm thinking to back-patch that. Thought?

Barring any objection, I will commit this.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

fix_pg_waldump_xact_commit_record_stats_fujii.patchtext/plain; charset=UTF-8; name=fix_pg_waldump_xact_commit_record_stats_fujii.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 610f65e471..4e90d82c7d 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -438,6 +438,15 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 
 	recid = XLogRecGetInfo(record) >> 4;
 
+	/*
+	 * XACT records need to be handled differently. Those records use the
+	 * first bit of thoes four bits for an optional flag variable and the
+	 * following three bits for the opcode. We filter opcode out of xl_info
+	 * and use it as the identifier of the record.
+	 */
+	if (rmid == RM_XACT_ID)
+		recid &= 0x07;
+
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
#25Noname
Shinya11.Kato@nttdata.com
In reply to: Fujii Masao (#24)
RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

-----Original Message-----
From: Fujii Masao <masao.fujii@oss.nttdata.com>
Sent: Monday, March 22, 2021 11:22 AM
To: Shinya11.Kato@nttdata.com; david@pgmasters.net; movead.li@highgo.ca
Cc: pgsql-hackers@postgresql.org; andres@anarazel.de; michael@paquier.xyz;
ahsan.hadi@highgo.ca; horikyota.ntt@gmail.com
Subject: Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On 2021/03/19 18:27, Fujii Masao wrote:

On 2021/03/19 15:06, Shinya11.Kato@nttdata.com wrote:

But 0 value maybe looks strange, so in current version I show it like

below:

Type N (%) Record size (%) FPI size (%) Combined size (%)
---- - --- ----------- --- -------- --- ------------- --- ...
XLOG/SWITCH_JUNK - ( -) 11006248 ( 72.26) - ( -) 11006248 ( 65.78)
Transaction/COMMIT 10 ( 0.03) 340 ( 0.00) 0 ( 0.00) 340 ( 0.00)

I just wanted to know why the "count" and "fpi_len" fields 0 are.
So, it would be nice to have 0 values. Sorry for confusing you.

Kato, it's not clear to me if you were asking for - to be changed back to 0?

You marked the patch as Ready for Committer so I assume not, but it
would be better to say clearly that you think this patch is ready for a

committer to look at.

Yes, I don't think it needs to be changed back to 0.
I think this patch is ready for a committer to look at.

What's the use case of this feature? I read through this thread
briefly, but I'm still not sure how useful this feature is.

Horiguchi-san reported one issue upthread; --stats=record shows two
lines for Transaction/COMMIT record. Probably this should be fixed
separately.

Horiguchi-san,
Do you have updated version of that bug-fix patch?
Or you started another thread for that issue?

I confirmed that only XACT records need to be processed differently.
So the patch that Horiguchi-san posted upthread looks good and enough to me.
I added a bit more detail information into the comment in the patch.
Attached is the updated version of the patch. Since this issue looks like a bug,
I'm thinking to back-patch that. Thought?

Barring any objection, I will commit this.

I think it's good except for a typo "thoes four bits"

Regards,
Shinya Kato

#26Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Noname (#25)
1 attachment(s)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On 2021/03/22 14:03, Shinya11.Kato@nttdata.com wrote:

Barring any objection, I will commit this.

I think it's good except for a typo "thoes four bits"

Thanks for the review! Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

fix_pg_waldump_xact_commit_record_stats_fujii_v2.patchtext/plain; charset=UTF-8; name=fix_pg_waldump_xact_commit_record_stats_fujii_v2.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 610f65e471..f8b8afe4a7 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -438,6 +438,15 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 
 	recid = XLogRecGetInfo(record) >> 4;
 
+	/*
+	 * XACT records need to be handled differently. Those records use the
+	 * first bit of those four bits for an optional flag variable and the
+	 * following three bits for the opcode. We filter opcode out of xl_info
+	 * and use it as the identifier of the record.
+	 */
+	if (rmid == RM_XACT_ID)
+		recid &= 0x07;
+
 	stats->record_stats[rmid][recid].count++;
 	stats->record_stats[rmid][recid].rec_len += rec_len;
 	stats->record_stats[rmid][recid].fpi_len += fpi_len;
#27Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#26)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

At Mon, 22 Mar 2021 14:07:43 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2021/03/22 14:03, Shinya11.Kato@nttdata.com wrote:

Barring any objection, I will commit this.

I think it's good except for a typo "thoes four bits"

Thanks for the review! Attached is the updated version of the patch.

Thanks for picking it up. LGTM and applies cleanly.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#28Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#27)
Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

On 2021/03/22 17:49, Kyotaro Horiguchi wrote:

At Mon, 22 Mar 2021 14:07:43 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2021/03/22 14:03, Shinya11.Kato@nttdata.com wrote:

Barring any objection, I will commit this.

I think it's good except for a typo "thoes four bits"

Thanks for the review! Attached is the updated version of the patch.

Thanks for picking it up. LGTM and applies cleanly.

Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION