WAL log only necessary part of 2PC GID

Started by Pavan Deolaseealmost 10 years ago7 messages
#1Pavan Deolasee
pavan.deolasee@gmail.com
1 attachment(s)

Hello Hackers,

The maximum size of the GID, used as a 2PC identifier is currently defined
as 200 bytes (see src/backend/access/transam/twophase.c). The actual GID
used by the applications though may be much smaller than that. So IMO
instead of WAL logging the entire 200 bytes during PREPARE TRANSACTION, we
should just WAL log strlen(gid) bytes.

The attached patch does that. The changes are limited to twophase.c and
some simple crash recovery tests seem to be work ok. In terms of
performance, a quick test shows marginal improvement in tps using the
script that Stas Kelvich used for his work on speeding up twophase
transactions. The only change I made is to keep the :scale unchanged
because increasing the :scale in every iteration will result in only a
handful updates (not sure why Stas had that in his original script)

\set naccounts 100000 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
:from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
:to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 25%
with default settings except full_page_writes which is turned off.

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

Actually, the above numbers probably include a lot of WAL generated because
of HOT pruning and page defragmentation. If we just look at the WAL
overhead caused by 2PC, the decline is somewhere close to 50%. I took
numbers using simple 1PC for reference and to understand the overhead of
2PC.

HEAD (1PC): 382 bytes / transaction

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

reduce_gid_wal.patchapplication/octet-stream; name=reduce_gid_wal.patchDownload
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8a22836..e4e88b6 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -866,7 +866,7 @@ typedef struct TwoPhaseFileHeader
 	int32		nabortrels;		/* number of delete-on-abort rels */
 	int32		ninvalmsgs;		/* number of cache invalidation messages */
 	bool		initfileinval;	/* does relcache init file need invalidation? */
-	char		gid[GIDSIZE];	/* GID for transaction */
+	uint32		gidlen;			/* length of the GID - GID follows the header */
 } TwoPhaseFileHeader;
 
 /*
@@ -977,9 +977,10 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
-	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
+	hdr.gidlen = strlen(gxact->gid) + 1; /* Include '\0' */
 
 	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
+	save_state_data(gxact->gid, hdr.gidlen);
 
 	/*
 	 * Add the additional info about subxacts, deletable files and cache
@@ -1360,6 +1361,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	hdr = (TwoPhaseFileHeader *) buf;
 	Assert(TransactionIdEquals(hdr->xid, xid));
 	bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+	bufptr += MAXALIGN(hdr->gidlen);
 	children = (TransactionId *) bufptr;
 	bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 	commitrels = (RelFileNode *) bufptr;
@@ -1915,6 +1917,7 @@ RecoverPreparedTransactions(void)
 			TwoPhaseFileHeader *hdr;
 			TransactionId *subxids;
 			GlobalTransaction gxact;
+			const char	*gid;
 			int			i;
 
 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
@@ -1947,6 +1950,8 @@ RecoverPreparedTransactions(void)
 			hdr = (TwoPhaseFileHeader *) buf;
 			Assert(TransactionIdEquals(hdr->xid, xid));
 			bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+			gid = (const char *) bufptr;
+			bufptr += MAXALIGN(hdr->gidlen);
 			subxids = (TransactionId *) bufptr;
 			bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 			bufptr += MAXALIGN(hdr->ncommitrels * sizeof(RelFileNode));
@@ -1975,7 +1980,7 @@ RecoverPreparedTransactions(void)
 			/*
 			 * Recreate its GXACT and dummy PGPROC
 			 */
-			gxact = MarkAsPreparing(xid, hdr->gid,
+			gxact = MarkAsPreparing(xid, gid,
 									hdr->prepared_at,
 									hdr->owner, hdr->database);
 			gxact->ondisk = true;
#2Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Pavan Deolasee (#1)
Re: WAL log only necessary part of 2PC GID

On 02/29/2016 08:45 AM, Pavan Deolasee wrote:

Hello Hackers,

The maximum size of the GID, used as a 2PC identifier is currently defined
as 200 bytes (see src/backend/access/transam/twophase.c). The actual GID
used by the applications though may be much smaller than that. So IMO
instead of WAL logging the entire 200 bytes during PREPARE TRANSACTION, we
should just WAL log strlen(gid) bytes.

The attached patch does that. The changes are limited to twophase.c and
some simple crash recovery tests seem to be work ok. In terms of
performance, a quick test shows marginal improvement in tps using the
script that Stas Kelvich used for his work on speeding up twophase
transactions. The only change I made is to keep the :scale unchanged
because increasing the :scale in every iteration will result in only a
handful updates (not sure why Stas had that in his original script)

\set naccounts 100000 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
:from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
:to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 25%
with default settings except full_page_writes which is turned off.

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

Actually, the above numbers probably include a lot of WAL generated because
of HOT pruning and page defragmentation. If we just look at the WAL
overhead caused by 2PC, the decline is somewhere close to 50%. I took
numbers using simple 1PC for reference and to understand the overhead of
2PC.

HEAD (1PC): 382 bytes / transaction

I can confirm the marginal speed up in tps due to the new WAL size.

The TWOPHASE_MAGIC constant should be changed, as the file header has
changed definition, right ?

Thanks for working on this !

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Jesper Pedersen (#2)
1 attachment(s)
Re: WAL log only necessary part of 2PC GID

On Fri, Mar 4, 2016 at 9:16 PM, Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

I can confirm the marginal speed up in tps due to the new WAL size.

The TWOPHASE_MAGIC constant should be changed, as the file header has
changed definition, right ?

Thanks for looking at it. I've revised the patch by incrementing the
TWOPHASE_MAGIC identifier.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

reduce_gid_wal_v2.patchapplication/octet-stream; name=reduce_gid_wal_v2.patchDownload
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8a22836..bdf5a6d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -851,7 +851,7 @@ TwoPhaseGetDummyProc(TransactionId xid)
 /*
  * Header for a 2PC state file
  */
-#define TWOPHASE_MAGIC	0x57F94532		/* format identifier */
+#define TWOPHASE_MAGIC	0x57F94533		/* format identifier */
 
 typedef struct TwoPhaseFileHeader
 {
@@ -866,7 +866,7 @@ typedef struct TwoPhaseFileHeader
 	int32		nabortrels;		/* number of delete-on-abort rels */
 	int32		ninvalmsgs;		/* number of cache invalidation messages */
 	bool		initfileinval;	/* does relcache init file need invalidation? */
-	char		gid[GIDSIZE];	/* GID for transaction */
+	uint32		gidlen;			/* length of the GID - GID follows the header */
 } TwoPhaseFileHeader;
 
 /*
@@ -977,9 +977,10 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
-	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
+	hdr.gidlen = strlen(gxact->gid) + 1; /* Include '\0' */
 
 	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
+	save_state_data(gxact->gid, hdr.gidlen);
 
 	/*
 	 * Add the additional info about subxacts, deletable files and cache
@@ -1360,6 +1361,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	hdr = (TwoPhaseFileHeader *) buf;
 	Assert(TransactionIdEquals(hdr->xid, xid));
 	bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+	bufptr += MAXALIGN(hdr->gidlen);
 	children = (TransactionId *) bufptr;
 	bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 	commitrels = (RelFileNode *) bufptr;
@@ -1915,6 +1917,7 @@ RecoverPreparedTransactions(void)
 			TwoPhaseFileHeader *hdr;
 			TransactionId *subxids;
 			GlobalTransaction gxact;
+			const char	*gid;
 			int			i;
 
 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
@@ -1947,6 +1950,8 @@ RecoverPreparedTransactions(void)
 			hdr = (TwoPhaseFileHeader *) buf;
 			Assert(TransactionIdEquals(hdr->xid, xid));
 			bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+			gid = (const char *) bufptr;
+			bufptr += MAXALIGN(hdr->gidlen);
 			subxids = (TransactionId *) bufptr;
 			bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 			bufptr += MAXALIGN(hdr->ncommitrels * sizeof(RelFileNode));
@@ -1975,7 +1980,7 @@ RecoverPreparedTransactions(void)
 			/*
 			 * Recreate its GXACT and dummy PGPROC
 			 */
-			gxact = MarkAsPreparing(xid, hdr->gid,
+			gxact = MarkAsPreparing(xid, gid,
 									hdr->prepared_at,
 									hdr->owner, hdr->database);
 			gxact->ondisk = true;
#4Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Pavan Deolasee (#3)
Re: WAL log only necessary part of 2PC GID

On 03/08/2016 11:54 PM, Pavan Deolasee wrote:

On Fri, Mar 4, 2016 at 9:16 PM, Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

I can confirm the marginal speed up in tps due to the new WAL size.

The TWOPHASE_MAGIC constant should be changed, as the file header has
changed definition, right ?

Thanks for looking at it. I've revised the patch by incrementing the
TWOPHASE_MAGIC identifier.

Great, thanks.

Marked "Ready for Committer".

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Petr Jelinek
petr@2ndquadrant.com
In reply to: Pavan Deolasee (#3)
Re: WAL log only necessary part of 2PC GID

Hi,

I wonder why you define the gidlen as uint32 when it would fit into
uint8 which in the current TwoPhaseFileHeader struct should be win of 8
bytes on padding (on 64bit). I think that's something worth considering
given that this patch aims to lower the size of the data.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Petr Jelinek (#5)
1 attachment(s)
Re: WAL log only necessary part of 2PC GID

On Wed, Mar 9, 2016 at 7:56 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

Hi,

I wonder why you define the gidlen as uint32 when it would fit into uint8
which in the current TwoPhaseFileHeader struct should be win of 8 bytes on
padding (on 64bit). I think that's something worth considering given that
this patch aims to lower the size of the data.

Hi Petr,

That sounds like a good idea; I didn't think about that. I would like to
make it uint16 though just in case if we decide to increase GIDSIZE from
200 to something more than 256 (Postgres-XL does that already). That still
fits in the same aligned width, on both 32 as well as 64-bit machines. New
version attached.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

reduce_gid_wal_v3.patchapplication/octet-stream; name=reduce_gid_wal_v3.patchDownload
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8a22836..9bd3acb 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -851,7 +851,7 @@ TwoPhaseGetDummyProc(TransactionId xid)
 /*
  * Header for a 2PC state file
  */
-#define TWOPHASE_MAGIC	0x57F94532		/* format identifier */
+#define TWOPHASE_MAGIC	0x57F94533		/* format identifier */
 
 typedef struct TwoPhaseFileHeader
 {
@@ -866,7 +866,7 @@ typedef struct TwoPhaseFileHeader
 	int32		nabortrels;		/* number of delete-on-abort rels */
 	int32		ninvalmsgs;		/* number of cache invalidation messages */
 	bool		initfileinval;	/* does relcache init file need invalidation? */
-	char		gid[GIDSIZE];	/* GID for transaction */
+	uint16		gidlen;			/* length of the GID - GID follows the header */
 } TwoPhaseFileHeader;
 
 /*
@@ -977,9 +977,10 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
-	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
+	hdr.gidlen = strlen(gxact->gid) + 1; /* Include '\0' */
 
 	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
+	save_state_data(gxact->gid, hdr.gidlen);
 
 	/*
 	 * Add the additional info about subxacts, deletable files and cache
@@ -1360,6 +1361,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	hdr = (TwoPhaseFileHeader *) buf;
 	Assert(TransactionIdEquals(hdr->xid, xid));
 	bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+	bufptr += MAXALIGN(hdr->gidlen);
 	children = (TransactionId *) bufptr;
 	bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 	commitrels = (RelFileNode *) bufptr;
@@ -1915,6 +1917,7 @@ RecoverPreparedTransactions(void)
 			TwoPhaseFileHeader *hdr;
 			TransactionId *subxids;
 			GlobalTransaction gxact;
+			const char	*gid;
 			int			i;
 
 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
@@ -1947,6 +1950,8 @@ RecoverPreparedTransactions(void)
 			hdr = (TwoPhaseFileHeader *) buf;
 			Assert(TransactionIdEquals(hdr->xid, xid));
 			bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+			gid = (const char *) bufptr;
+			bufptr += MAXALIGN(hdr->gidlen);
 			subxids = (TransactionId *) bufptr;
 			bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 			bufptr += MAXALIGN(hdr->ncommitrels * sizeof(RelFileNode));
@@ -1975,7 +1980,7 @@ RecoverPreparedTransactions(void)
 			/*
 			 * Recreate its GXACT and dummy PGPROC
 			 */
-			gxact = MarkAsPreparing(xid, hdr->gid,
+			gxact = MarkAsPreparing(xid, gid,
 									hdr->prepared_at,
 									hdr->owner, hdr->database);
 			gxact->ondisk = true;
#7Petr Jelinek
petr@2ndquadrant.com
In reply to: Pavan Deolasee (#6)
Re: WAL log only necessary part of 2PC GID

On 10/03/16 13:43, Pavan Deolasee wrote:

On Wed, Mar 9, 2016 at 7:56 PM, Petr Jelinek <petr@2ndquadrant.com
<mailto:petr@2ndquadrant.com>> wrote:

Hi,

I wonder why you define the gidlen as uint32 when it would fit into
uint8 which in the current TwoPhaseFileHeader struct should be win
of 8 bytes on padding (on 64bit). I think that's something worth
considering given that this patch aims to lower the size of the data.

Hi Petr,

That sounds like a good idea; I didn't think about that. I would like to
make it uint16 though just in case if we decide to increase GIDSIZE from
200 to something more than 256 (Postgres-XL does that already). That
still fits in the same aligned width, on both 32 as well as 64-bit
machines. New version attached.

Correct, and I see Simon committed it like this in meantime, thanks.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers