pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

Started by Simon Riggsalmost 8 years ago10 messages
#1Simon Riggs
simon@2ndQuadrant.com

Store 2PC GID in commit/abort WAL recs for logical decoding

Store GID of 2PC in commit/abort WAL records when wal_level = logical.
This allows logical decoding to send the SAME gid to subscribers
across restarts of logical replication.

Track relica origin replay progress for 2PC.

(Edited from patch 0003 in the logical decoding 2PC series.)

Authors: Nikhil Sontakke, Stas Kelvich
Reviewed-by: Simon Riggs, Andres Freund

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1eb6d6527aae264b3e0b9c95aa70bb7a594ad1cf

Modified Files
--------------
src/backend/access/rmgrdesc/xactdesc.c | 39 ++++++++++++
src/backend/access/transam/twophase.c | 105 ++++++++++++++++++++++++++++-----
src/backend/access/transam/xact.c | 78 ++++++++++++++++++++++--
src/include/access/twophase.h | 5 +-
src/include/access/xact.h | 27 ++++++++-
5 files changed, 230 insertions(+), 24 deletions(-)

#2Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Simon Riggs (#1)
1 attachment(s)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

On 28/03/18 19:47, Simon Riggs wrote:

Store 2PC GID in commit/abort WAL recs for logical decoding

This forgot to update the comments in xl_xact_commit and xl_xact_abort,
for the new fields.

+
+               if (parsed->xinfo & XACT_XINFO_HAS_GID)
+               {
+                       int gidlen;
+                       strcpy(parsed->twophase_gid, data);
+                       gidlen = strlen(parsed->twophase_gid) + 1;
+                       data += MAXALIGN(gidlen);
+               }
+       }
+
+       if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+       {
+               xl_xact_origin xl_origin;
+
+               /* we're only guaranteed 4 byte alignment, so copy onto stack */
+               memcpy(&xl_origin, data, sizeof(xl_origin));
+
+               parsed->origin_lsn = xl_origin.origin_lsn;
+               parsed->origin_timestamp = xl_origin.origin_timestamp;
+
+               data += sizeof(xl_xact_origin);
}

There seems to be some confusion on the padding here. Firstly, what's
the point of zero-padding the GID length to the next MAXALIGN boundary,
which would be 8 bytes on 64-bit systems, if the start is only
guaranteed 4-byte alignment, per the comment at the memcpy() above.
Secondly, if we're memcpying the fields that follow anyway, why bother
with any alignment padding at all?

I propose the attached.

- Heikki

Attachments:

gid-in-wal-records-cleanup.patchtext/x-patch; name=gid-in-wal-records-cleanup.patchDownload
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 3b3c95f810..2e4ea62e62 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -105,10 +105,8 @@ ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *pars
 
 		if (parsed->xinfo & XACT_XINFO_HAS_GID)
 		{
-			int gidlen;
 			strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid));
-			gidlen = strlen(data) + 1;
-			data += MAXALIGN(gidlen);
+			data += strlen(data) + 1;
 		}
 	}
 
@@ -116,7 +114,7 @@ ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *pars
 	{
 		xl_xact_origin xl_origin;
 
-		/* we're only guaranteed 4 byte alignment, so copy onto stack */
+		/* we're not guaranteed any alignment, so copy onto stack */
 		memcpy(&xl_origin, data, sizeof(xl_origin));
 
 		parsed->origin_lsn = xl_origin.origin_lsn;
@@ -189,10 +187,8 @@ ParseAbortRecord(uint8 info, xl_xact_abort *xlrec, xl_xact_parsed_abort *parsed)
 
 		if (parsed->xinfo & XACT_XINFO_HAS_GID)
 		{
-			int gidlen;
 			strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid));
-			gidlen = strlen(data) + 1;
-			data += MAXALIGN(gidlen);
+			data += strlen(data) + 1;
 		}
 	}
 
@@ -200,7 +196,7 @@ ParseAbortRecord(uint8 info, xl_xact_abort *xlrec, xl_xact_parsed_abort *parsed)
 	{
 		xl_xact_origin xl_origin;
 
-		/* we're only guaranteed 4 byte alignment, so copy onto stack */
+		/* we're not guaranteed any alignment, so copy onto stack */
 		memcpy(&xl_origin, data, sizeof(xl_origin));
 
 		parsed->origin_lsn = xl_origin.origin_lsn;
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d6e4b7980f..5c05d545c4 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1129,9 +1129,11 @@ EndPrepare(GlobalTransaction gxact)
 	gxact->prepare_end_lsn = XLogInsert(RM_XACT_ID, XLOG_XACT_PREPARE);
 
 	if (replorigin)
+	{
 		/* Move LSNs forward for this replication origin */
 		replorigin_session_advance(replorigin_session_origin_lsn,
 								   gxact->prepare_end_lsn);
+	}
 
 	XLogFlush(gxact->prepare_end_lsn);
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b88d4ccf74..1dcf825625 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5248,7 +5248,6 @@ XactLogCommitRecord(TimestampTz commit_time,
 	xl_xact_origin xl_origin;
 
 	uint8		info;
-	int			gidlen = 0;
 
 	Assert(CritSectionCount > 0);
 
@@ -5314,10 +5313,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 		Assert(twophase_gid != NULL);
 
 		if (XLogLogicalInfoActive())
-		{
 			xl_xinfo.xinfo |= XACT_XINFO_HAS_GID;
-			gidlen = strlen(twophase_gid) + 1; /* include '\0' */
-		}
 	}
 
 	/* dump transaction origin information */
@@ -5371,12 +5367,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 	{
 		XLogRegisterData((char *) (&xl_twophase), sizeof(xl_xact_twophase));
 		if (xl_xinfo.xinfo & XACT_XINFO_HAS_GID)
-		{
-			static const char zeroes[MAXIMUM_ALIGNOF] = { 0 };
-			XLogRegisterData((char*) twophase_gid, gidlen);
-			if (MAXALIGN(gidlen) != gidlen)
-				XLogRegisterData((char*) zeroes, MAXALIGN(gidlen) - gidlen);
-		}
+			XLogRegisterData((char *) twophase_gid, strlen(twophase_gid));
 	}
 
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -5410,7 +5401,6 @@ XactLogAbortRecord(TimestampTz abort_time,
 	xl_xact_origin xl_origin;
 
 	uint8		info;
-	int			gidlen = 0;
 
 	Assert(CritSectionCount > 0);
 
@@ -5449,10 +5439,7 @@ XactLogAbortRecord(TimestampTz abort_time,
 		Assert(twophase_gid != NULL);
 
 		if (XLogLogicalInfoActive())
-		{
 			xl_xinfo.xinfo |= XACT_XINFO_HAS_GID;
-			gidlen = strlen(twophase_gid) + 1; /* include '\0' */
-		}
 	}
 
 	if (TransactionIdIsValid(twophase_xid) && XLogLogicalInfoActive())
@@ -5488,7 +5475,6 @@ XactLogAbortRecord(TimestampTz abort_time,
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_DBINFO)
 		XLogRegisterData((char *) (&xl_dbinfo), sizeof(xl_dbinfo));
 
-
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_SUBXACTS)
 	{
 		XLogRegisterData((char *) (&xl_subxacts),
@@ -5509,14 +5495,14 @@ XactLogAbortRecord(TimestampTz abort_time,
 	{
 		XLogRegisterData((char *) (&xl_twophase), sizeof(xl_xact_twophase));
 		if (xl_xinfo.xinfo & XACT_XINFO_HAS_GID)
-		{
-			static const char zeroes[MAXIMUM_ALIGNOF] = { 0 };
-			XLogRegisterData((char*) twophase_gid, gidlen);
-			if (MAXALIGN(gidlen) != gidlen)
-				XLogRegisterData((char*) zeroes, MAXALIGN(gidlen) - gidlen);
-		}
+			XLogRegisterData((char *) twophase_gid, strlen(twophase_gid) + 1);
 	}
 
+	/*
+	 * NOTE: twophase_gid is variable-length, so no alignment is guaranteed
+	 * after this point.
+	 */
+
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_ORIGIN)
 		XLogRegisterData((char *) (&xl_origin), sizeof(xl_xact_origin));
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a46396f2d9..3661b8d090 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -269,6 +269,7 @@ typedef struct xl_xact_commit
 	/* xl_xact_relfilenodes follows if XINFO_HAS_RELFILENODES */
 	/* xl_xact_invals follows if XINFO_HAS_INVALS */
 	/* xl_xact_twophase follows if XINFO_HAS_TWOPHASE */
+	/* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
 	/* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */
 } xl_xact_commit;
 #define MinSizeOfXactCommit (offsetof(xl_xact_commit, xact_time) + sizeof(TimestampTz))
@@ -278,11 +279,13 @@ typedef struct xl_xact_abort
 	TimestampTz xact_time;		/* time of abort */
 
 	/* xl_xact_xinfo follows if XLOG_XACT_HAS_INFO */
-	/* No db_info required */
+	/* xl_xact_dbinfo follows if XINFO_HAS_DBINFO */
 	/* xl_xact_subxacts follows if HAS_SUBXACT */
 	/* xl_xact_relfilenodes follows if HAS_RELFILENODES */
 	/* No invalidation messages needed. */
 	/* xl_xact_twophase follows if XINFO_HAS_TWOPHASE */
+	/* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
+	/* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */
 } xl_xact_abort;
 #define MinSizeOfXactAbort sizeof(xl_xact_abort)
 
#3Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#2)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

On Mon, Apr 09, 2018 at 03:30:39PM +0300, Heikki Linnakangas wrote:

There seems to be some confusion on the padding here. Firstly, what's the
point of zero-padding the GID length to the next MAXALIGN boundary, which
would be 8 bytes on 64-bit systems, if the start is only guaranteed 4-byte
alignment, per the comment at the memcpy() above. Secondly, if we're
memcpying the fields that follow anyway, why bother with any alignment
padding at all?

It seems to me that you are right here: those complications are not
necessary.

if (replorigin)
+ {
/* Move LSNs forward for this replication origin */
replorigin_session_advance(replorigin_session_origin_lsn,
gxact->prepare_end_lsn);
+ }
This is better style.

+   /* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
+   /* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */

Worth mentioning that the first one is also unaligned with your patch?
And that all the last fields of xl_xact_commit and xl_xact_abort are
kept as such on purpose?
--
Michael

#4Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Michael Paquier (#3)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

On 10/04/18 03:24, Michael Paquier wrote:

+   /* twophase_gid follows if XINFO_HAS_GID. As a null-terminated string. */
+   /* xl_xact_origin follows if XINFO_HAS_ORIGIN, stored unaligned! */

Worth mentioning that the first one is also unaligned with your patch?

Hmm. 'twophase_gid' is actually 4-byte aligned here. But it's a string,
so it doesn't matter whether it it is or not.

And that all the last fields of xl_xact_commit and xl_xact_abort are
kept as such on purpose?

I think that's clear without an explicit comment. If it wasn't on
purpose, we wouldn't have a comment pointing it out (or we would fix it
so that it was aligned).

Pushed, thanks for the review!

- Heikki

#5Nikhil Sontakke
nikhils@2ndquadrant.com
In reply to: Heikki Linnakangas (#4)
1 attachment(s)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

Hi Heikki,

Pushed, thanks for the review!

There was a slight oversight in the twophase_gid length calculation in
the XactLogCommitRecord() code path in the cf5a1890592 commit. The
corresponding XactLogAbortRecord() code path was ok. PFA, a small
patch to fix the oversight.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

gid_length.patchapplication/octet-stream; name=gid_length.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index f4e5ea84b9..b51eeb087e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5365,7 +5365,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 	{
 		XLogRegisterData((char *) (&xl_twophase), sizeof(xl_xact_twophase));
 		if (xl_xinfo.xinfo & XACT_XINFO_HAS_GID)
-			XLogRegisterData((char *) twophase_gid, strlen(twophase_gid));
+			XLogRegisterData((char *) twophase_gid, strlen(twophase_gid) + 1);
 	}
 
 	if (xl_xinfo.xinfo & XACT_XINFO_HAS_ORIGIN)
#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nikhil Sontakke (#5)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

On 2018-Jun-14, Nikhil Sontakke wrote:

There was a slight oversight in the twophase_gid length calculation in
the XactLogCommitRecord() code path in the cf5a1890592 commit. The
corresponding XactLogAbortRecord() code path was ok. PFA, a small
patch to fix the oversight.

Thanks, pushed.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nikhil Sontakke (#5)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

On 2018-Jun-14, Nikhil Sontakke wrote:

There was a slight oversight in the twophase_gid length calculation in
the XactLogCommitRecord() code path in the cf5a1890592 commit. The
corresponding XactLogAbortRecord() code path was ok. PFA, a small
patch to fix the oversight.

Forgot to add: maybe it would be useful to have tests in core where
these omissions become evident. Do you have some?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nikhil Sontakke (#5)
1 attachment(s)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

By the way, why do we need to strlen() the target buffer when strlcpy
already reports the length? (You could argue that there is a difference
if the string is truncated ... but surely we don't care about that case)
I propose the attached.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-no-need-for-separate-strlen.patchtext/plain; charset=us-asciiDownload
From 714260ddb13b440db68f4e6db1f8aef0a0baba0a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 15 Jun 2018 11:48:53 -0400
Subject: [PATCH] no need for separate strlen

---
 src/backend/access/rmgrdesc/xactdesc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 6d5ebd475b..da3a296e99 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -104,8 +104,8 @@ ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *pars
 
 		if (parsed->xinfo & XACT_XINFO_HAS_GID)
 		{
-			strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid));
-			data += strlen(data) + 1;
+			data += strlcpy(parsed->twophase_gid, data,
+							sizeof(parsed->twophase_gid)) + 1;
 		}
 	}
 
@@ -188,8 +188,8 @@ ParseAbortRecord(uint8 info, xl_xact_abort *xlrec, xl_xact_parsed_abort *parsed)
 
 		if (parsed->xinfo & XACT_XINFO_HAS_GID)
 		{
-			strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid));
-			data += strlen(data) + 1;
+			data += strlcpy(parsed->twophase_gid, data,
+							sizeof(parsed->twophase_gid)) + 1;
 		}
 	}
 
-- 
2.11.0

#9Nikhil Sontakke
nikhils@2ndquadrant.com
In reply to: Alvaro Herrera (#7)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

Hi Alvaro,

There was a slight oversight in the twophase_gid length calculation in
the XactLogCommitRecord() code path in the cf5a1890592 commit. The
corresponding XactLogAbortRecord() code path was ok. PFA, a small
patch to fix the oversight.

Forgot to add: maybe it would be useful to have tests in core where
these omissions become evident. Do you have some?

Thanks for the commit.

I do have some tests. They are part of the "logical decoding of 2PC"
patch which adds the needed infrastructure to *actually* use this code
present in the core as of now. I am going to submit it in the upcoming
commitfest.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#8)
Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding

On 2018-Jun-15, Alvaro Herrera wrote:

By the way, why do we need to strlen() the target buffer when strlcpy
already reports the length? (You could argue that there is a difference
if the string is truncated ... but surely we don't care about that case)
I propose the attached.

I decided not to push this after all. Yes, one strlen is saved, but
there is some code clarity lost also, and this is certainly not a
contention point.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services