Make pg_waldump report replication origin ID, LSN, and timestamp.

Started by Masahiko Sawadaabout 4 years ago9 messages
#1Masahiko Sawada
sawada.mshk@gmail.com
1 attachment(s)

Hi all,

I realized that pg_waldump doesn't show replication origin ID, LSN,
and timestamp of PREPARE TRANSACTION record. Commit 7b8a899bdeb
improved pg_waldump two years ago so that it reports the detail of
information of PREPARE TRANSACTION but ISTM that it overlooked showing
replication origin information. As far as I can see in the discussion
thread[1]/messages/by-id/CAHGQGwEvhASad4JJnCv=0dW2TJypZgW_Vpb-oZik2a3utCqcrA@mail.gmail.com, there was no discussion on that. These are helpful when
diagnosing 2PC related issues on the subscriber side.

I've attached a patch to add replication origin information to
xact_desc_prepare().

Regards,

[1]: /messages/by-id/CAHGQGwEvhASad4JJnCv=0dW2TJypZgW_Vpb-oZik2a3utCqcrA@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

0001-Make-pg_waldump-report-replication-origin-ID-LSN-and.patchapplication/x-patch; name=0001-Make-pg_waldump-report-replication-origin-ID-LSN-and.patchDownload
From b37d5a8cd5b84eeb850bd7ddeb903024652f20d2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 6 Dec 2021 14:32:57 +0900
Subject: [PATCH] Make pg_waldump report replication origin ID, LSN, and
 timestamp.

Commit 7b8a899bdeb made pg_waldump report the detail of information
about PREPARE TRANSACTION record, like global transaction
identifier. However, replication origin LSN and timestamp were not
reporeted. These are helpful when diagnosing 2PC-related troubles on
the subsciber side.

This commit makes xact_desc_prepare() report replication origin ID,
LSN, and timestamp.
---
 src/backend/access/rmgrdesc/xactdesc.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 4b0d10f073..ea3649d822 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "replication/origin.h"
 #include "storage/sinval.h"
 #include "storage/standbydefs.h"
 #include "utils/timestamp.h"
@@ -329,7 +330,7 @@ xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec)
 }
 
 static void
-xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec)
+xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec, RepOriginId origin_id)
 {
 	xl_xact_parsed_prepare parsed;
 
@@ -345,6 +346,12 @@ xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec)
 
 	standby_desc_invalidations(buf, parsed.nmsgs, parsed.msgs, parsed.dbId,
 							   parsed.tsId, xlrec->initfileinval);
+
+	if (origin_id != InvalidRepOriginId)
+		appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+						 origin_id,
+						 LSN_FORMAT_ARGS(parsed.origin_lsn),
+						 timestamptz_to_str(parsed.origin_timestamp));
 }
 
 static void
@@ -381,7 +388,8 @@ xact_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_xact_prepare *xlrec = (xl_xact_prepare *) rec;
 
-		xact_desc_prepare(buf, XLogRecGetInfo(record), xlrec);
+		xact_desc_prepare(buf, XLogRecGetInfo(record), xlrec,
+						  XLogRecGetOrigin(record));
 	}
 	else if (info == XLOG_XACT_ASSIGNMENT)
 	{
-- 
2.24.3 (Apple Git-128)

#2Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#1)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Mon, Dec 06, 2021 at 04:35:07PM +0900, Masahiko Sawada wrote:

I've attached a patch to add replication origin information to
xact_desc_prepare().

Yeah.

+   if (origin_id != InvalidRepOriginId)
+       appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+                        origin_id,
+                        LSN_FORMAT_ARGS(parsed.origin_lsn),
+                        timestamptz_to_str(parsed.origin_timestamp));

Shouldn't you check for parsed.origin_lsn instead? The replication
origin is stored there as far as I read EndPrepare().

Commit records check after XACT_XINFO_HAS_ORIGIN, but
xact_desc_abort() may include this information for ROLLBACK PREPARED
transactions so we could use the same logic as xact_desc_commit() for
the abort case, no?
--
Michael

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#2)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Mon, Dec 6, 2021 at 5:09 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 06, 2021 at 04:35:07PM +0900, Masahiko Sawada wrote:

I've attached a patch to add replication origin information to
xact_desc_prepare().

Yeah.

+   if (origin_id != InvalidRepOriginId)
+       appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+                        origin_id,
+                        LSN_FORMAT_ARGS(parsed.origin_lsn),
+                        timestamptz_to_str(parsed.origin_timestamp));

Shouldn't you check for parsed.origin_lsn instead? The replication
origin is stored there as far as I read EndPrepare().

Yeah, I was thinking check origin_lsn instead. That way, we don't show
invalid origin_timestamp and origin_lsn even if origin_id is set. But
as far as I read, the same is true for xact_desc_commit() (and
xact_desc_rollback()). That is, since apply workers always its origin
id and could do commit transactions that are not replicated from the
publisher, it's possible that xact_desc_commit() reports like:

rmgr: Transaction len (rec/tot): 117/ 117, tx: 725, lsn:
0/014BE938, prev 0/014BE908, desc: COMMIT 2021-12-06 22:04:44.462200
JST; inval msgs: catcache 55 catcache 54 catcache 64; origin: node 1,
lsn 0/0, at 2000-01-01 09:00:00.000000 JST

Also, looking at PrepareRedoAdd(), we check the replication origin id.
So I think that it'd be better to check origin_id for consistency.

Commit records check after XACT_XINFO_HAS_ORIGIN, but
xact_desc_abort() may include this information for ROLLBACK PREPARED
transactions so we could use the same logic as xact_desc_commit() for
the abort case, no?

Good catch! I'll submit an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#3)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Mon, Dec 6, 2021 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 6, 2021 at 5:09 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 06, 2021 at 04:35:07PM +0900, Masahiko Sawada wrote:

I've attached a patch to add replication origin information to
xact_desc_prepare().

Yeah.

+   if (origin_id != InvalidRepOriginId)
+       appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+                        origin_id,
+                        LSN_FORMAT_ARGS(parsed.origin_lsn),
+                        timestamptz_to_str(parsed.origin_timestamp));

Shouldn't you check for parsed.origin_lsn instead? The replication
origin is stored there as far as I read EndPrepare().

Yeah, I was thinking check origin_lsn instead. That way, we don't show
invalid origin_timestamp and origin_lsn even if origin_id is set. But
as far as I read, the same is true for xact_desc_commit() (and
xact_desc_rollback()). That is, since apply workers always its origin
id and could do commit transactions that are not replicated from the
publisher, it's possible that xact_desc_commit() reports like:

rmgr: Transaction len (rec/tot): 117/ 117, tx: 725, lsn:
0/014BE938, prev 0/014BE908, desc: COMMIT 2021-12-06 22:04:44.462200
JST; inval msgs: catcache 55 catcache 54 catcache 64; origin: node 1,
lsn 0/0, at 2000-01-01 09:00:00.000000 JST

Hmm, is that okay in the first place? This happens since the apply
worker updates twophaesstate value of pg_subscription after setting
the origin id and before entering the apply loop. No changes in this
transaction will be replicated but an empty transaction that has
origin id and doesn't have origin lsn and time will be replicated.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#5Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#3)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Mon, Dec 06, 2021 at 11:24:09PM +0900, Masahiko Sawada wrote:

On Mon, Dec 6, 2021 at 5:09 PM Michael Paquier <michael@paquier.xyz> wrote:

Shouldn't you check for parsed.origin_lsn instead? The replication
origin is stored there as far as I read EndPrepare().

Also, looking at PrepareRedoAdd(), we check the replication origin id.
So I think that it'd be better to check origin_id for consistency.

Okay, this consistency would make sense, then. Perhaps some comments
should be added to tell that?

Commit records check after XACT_XINFO_HAS_ORIGIN, but
xact_desc_abort() may include this information for ROLLBACK PREPARED
transactions so we could use the same logic as xact_desc_commit() for
the abort case, no?

Good catch! I'll submit an updated patch.

Thanks!
--
Michael

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#5)
1 attachment(s)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Wed, Dec 8, 2021 at 4:31 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 06, 2021 at 11:24:09PM +0900, Masahiko Sawada wrote:

On Mon, Dec 6, 2021 at 5:09 PM Michael Paquier <michael@paquier.xyz> wrote:

Shouldn't you check for parsed.origin_lsn instead? The replication
origin is stored there as far as I read EndPrepare().

Also, looking at PrepareRedoAdd(), we check the replication origin id.
So I think that it'd be better to check origin_id for consistency.

Okay, this consistency would make sense, then. Perhaps some comments
should be added to tell that?

Agreed. I've attached an updated patch that incorporated your review
comments. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0001-Make-pg_waldump-report-replication-origin-ID-LSN-.patchapplication/octet-stream; name=v2-0001-Make-pg_waldump-report-replication-origin-ID-LSN-.patchDownload
From efa03d6073a6033a1481d9e4bbfa2273e45e5d73 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 6 Dec 2021 14:32:57 +0900
Subject: [PATCH v2] Make pg_waldump report replication origin ID, LSN, and
 timestamp.

Commit 7b8a899bdeb made pg_waldump report the detail of information
about PREPARE TRANSACTION record, like global transaction
identifier. However, replication origin LSN and timestamp were not
reporeted. These are helpful when diagnosing 2PC-related troubles on
the subsciber side.

This commit makes xact_desc_prepare() and xact_desc_abort() report
replication origin ID, LSN, and timestamp.
---
 src/backend/access/rmgrdesc/xactdesc.c | 29 ++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 4b0d10f073..b5e9edd819 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "replication/origin.h"
 #include "storage/sinval.h"
 #include "storage/standbydefs.h"
 #include "utils/timestamp.h"
@@ -312,7 +313,7 @@ xact_desc_commit(StringInfo buf, uint8 info, xl_xact_commit *xlrec, RepOriginId
 }
 
 static void
-xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec)
+xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec, RepOriginId origin_id)
 {
 	xl_xact_parsed_abort parsed;
 
@@ -326,10 +327,18 @@ xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec)
 
 	xact_desc_relations(buf, "rels", parsed.nrels, parsed.xnodes);
 	xact_desc_subxacts(buf, parsed.nsubxacts, parsed.subxacts);
+
+	if (parsed.xinfo & XACT_XINFO_HAS_ORIGIN)
+	{
+		appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+						 origin_id,
+						 LSN_FORMAT_ARGS(parsed.origin_lsn),
+						 timestamptz_to_str(parsed.origin_timestamp));
+	}
 }
 
 static void
-xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec)
+xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec, RepOriginId origin_id)
 {
 	xl_xact_parsed_prepare parsed;
 
@@ -345,6 +354,16 @@ xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec)
 
 	standby_desc_invalidations(buf, parsed.nmsgs, parsed.msgs, parsed.dbId,
 							   parsed.tsId, xlrec->initfileinval);
+
+	/*
+	 * Check if the replication origin has been set in this record in the
+	 * same way as PrepareRedoAdd().
+	 */
+	if (origin_id != InvalidRepOriginId)
+		appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+						 origin_id,
+						 LSN_FORMAT_ARGS(parsed.origin_lsn),
+						 timestamptz_to_str(parsed.origin_timestamp));
 }
 
 static void
@@ -375,13 +394,15 @@ xact_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_xact_abort *xlrec = (xl_xact_abort *) rec;
 
-		xact_desc_abort(buf, XLogRecGetInfo(record), xlrec);
+		xact_desc_abort(buf, XLogRecGetInfo(record), xlrec,
+						XLogRecGetOrigin(record));
 	}
 	else if (info == XLOG_XACT_PREPARE)
 	{
 		xl_xact_prepare *xlrec = (xl_xact_prepare *) rec;
 
-		xact_desc_prepare(buf, XLogRecGetInfo(record), xlrec);
+		xact_desc_prepare(buf, XLogRecGetInfo(record), xlrec,
+						  XLogRecGetOrigin(record));
 	}
 	else if (info == XLOG_XACT_ASSIGNMENT)
 	{
-- 
2.24.3 (Apple Git-128)

#7Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#6)
1 attachment(s)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Wed, Dec 08, 2021 at 05:03:30PM +0900, Masahiko Sawada wrote:

Agreed. I've attached an updated patch that incorporated your review
comments. Please review it.

That looks correct to me. One thing that I have noticed while
reviewing is that we don't check XactCompletionApplyFeedback() in
xact_desc_commit(), which would happen if a transaction needs to do
a remote_apply on a standby. synchronous_commit is a user-settable
parameter, so it seems to me that it could be useful for debugging?

That's not related to your patch, but while we are looking at the
area..
--
Michael

Attachments:

v3-0001-Make-pg_waldump-report-replication-origin-ID-LSN-.patchtext/x-diff; charset=us-asciiDownload
From c821c67698535ded147e8686c182d91462a16599 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 9 Dec 2021 16:00:53 +0900
Subject: [PATCH v3] Make pg_waldump report replication origin ID, LSN, and
 timestamp.

Commit 7b8a899bdeb made pg_waldump report the detail of information
about PREPARE TRANSACTION record, like global transaction
identifier. However, replication origin LSN and timestamp were not
reporeted. These are helpful when diagnosing 2PC-related troubles on
the subsciber side.

This commit makes xact_desc_prepare() and xact_desc_abort() report
replication origin ID, LSN, and timestamp.
---
 src/backend/access/rmgrdesc/xactdesc.c | 32 ++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 4b0d10f073..fca03a00d9 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -16,6 +16,7 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "replication/origin.h"
 #include "storage/sinval.h"
 #include "storage/standbydefs.h"
 #include "utils/timestamp.h"
@@ -299,6 +300,9 @@ xact_desc_commit(StringInfo buf, uint8 info, xl_xact_commit *xlrec, RepOriginId
 							   parsed.tsId,
 							   XactCompletionRelcacheInitFileInval(parsed.xinfo));
 
+	if (XactCompletionApplyFeedback(parsed.xinfo))
+		appendStringInfoString(buf, "; apply_feedback");
+
 	if (XactCompletionForceSyncCommit(parsed.xinfo))
 		appendStringInfoString(buf, "; sync");
 
@@ -312,7 +316,7 @@ xact_desc_commit(StringInfo buf, uint8 info, xl_xact_commit *xlrec, RepOriginId
 }
 
 static void
-xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec)
+xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec, RepOriginId origin_id)
 {
 	xl_xact_parsed_abort parsed;
 
@@ -326,10 +330,18 @@ xact_desc_abort(StringInfo buf, uint8 info, xl_xact_abort *xlrec)
 
 	xact_desc_relations(buf, "rels", parsed.nrels, parsed.xnodes);
 	xact_desc_subxacts(buf, parsed.nsubxacts, parsed.subxacts);
+
+	if (parsed.xinfo & XACT_XINFO_HAS_ORIGIN)
+	{
+		appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+						 origin_id,
+						 LSN_FORMAT_ARGS(parsed.origin_lsn),
+						 timestamptz_to_str(parsed.origin_timestamp));
+	}
 }
 
 static void
-xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec)
+xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec, RepOriginId origin_id)
 {
 	xl_xact_parsed_prepare parsed;
 
@@ -345,6 +357,16 @@ xact_desc_prepare(StringInfo buf, uint8 info, xl_xact_prepare *xlrec)
 
 	standby_desc_invalidations(buf, parsed.nmsgs, parsed.msgs, parsed.dbId,
 							   parsed.tsId, xlrec->initfileinval);
+
+	/*
+	 * Check if the replication origin has been set in this record in the
+	 * same way as PrepareRedoAdd().
+	 */
+	if (origin_id != InvalidRepOriginId)
+		appendStringInfo(buf, "; origin: node %u, lsn %X/%X, at %s",
+						 origin_id,
+						 LSN_FORMAT_ARGS(parsed.origin_lsn),
+						 timestamptz_to_str(parsed.origin_timestamp));
 }
 
 static void
@@ -375,13 +397,15 @@ xact_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_xact_abort *xlrec = (xl_xact_abort *) rec;
 
-		xact_desc_abort(buf, XLogRecGetInfo(record), xlrec);
+		xact_desc_abort(buf, XLogRecGetInfo(record), xlrec,
+						XLogRecGetOrigin(record));
 	}
 	else if (info == XLOG_XACT_PREPARE)
 	{
 		xl_xact_prepare *xlrec = (xl_xact_prepare *) rec;
 
-		xact_desc_prepare(buf, XLogRecGetInfo(record), xlrec);
+		xact_desc_prepare(buf, XLogRecGetInfo(record), xlrec,
+						  XLogRecGetOrigin(record));
 	}
 	else if (info == XLOG_XACT_ASSIGNMENT)
 	{
-- 
2.34.1

#8Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#7)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Thu, Dec 9, 2021 at 4:02 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 08, 2021 at 05:03:30PM +0900, Masahiko Sawada wrote:

Agreed. I've attached an updated patch that incorporated your review
comments. Please review it.

That looks correct to me. One thing that I have noticed while
reviewing is that we don't check XactCompletionApplyFeedback() in
xact_desc_commit(), which would happen if a transaction needs to do
a remote_apply on a standby. synchronous_commit is a user-settable
parameter, so it seems to me that it could be useful for debugging?

Agreed.

Thank you for updating the patch. The patch looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#9Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#8)
Re: Make pg_waldump report replication origin ID, LSN, and timestamp.

On Thu, Dec 09, 2021 at 05:42:56PM +0900, Masahiko Sawada wrote:

Thank you for updating the patch. The patch looks good to me.

Done this way. Please note that while testing this patch, I have
found a completely different issue. I'll spawn a thread about that in
a minute..
--
Michael