Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more places

Started by Bharath Rupireddyabout 2 years ago5 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
1 attachment(s)

Hi,

The commit 44cac934 replaced "char buf[BLCKSZ]" with PGAlignedBlock to
avoid issues on alignment-picky hardware. While it replaced most of the
instances, there are still some more left. How about we use PGAlignedBlock
there too, something like the attached patch? A note [2]I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. in the commit
44cac934 says that ensuring proper alignment makes kernel data transfers
fasters and the left-over "char buf[BLCKSZ]" either do read() or write()
system calls, so it might be worth to align them with PGAlignedBlock.

Thoughts?

PS: FWIW, I verified what difference actually char buf[BLCKSZ] and the
union PGAlignedBlock does make with alignment with a sample code like [3]#include <stdio.h>
which gives a different alignment requirement, see below:

size of data 8192, alignment of data 1
size of data_aligned 8192, alignment of data_aligned 8

[1]: commit 44cac9346479d4b0cc9195b0267fd13eb4e7442c Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sat Sep 1 15:27:12 2018 -0400
commit 44cac9346479d4b0cc9195b0267fd13eb4e7442c
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat Sep 1 15:27:12 2018 -0400

Avoid using potentially-under-aligned page buffers.

[2]: I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead.
I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster. I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

[3]: #include <stdio.h>
#include <stdio.h>

#define BLCKSZ 8192

typedef union PGAlignedBlock
{
char data[BLCKSZ];
double force_align_d;
long long int force_align_i64;
} PGAlignedBlock;

int main(int argc, char **argv)
{
char data[BLCKSZ];
PGAlignedBlock data_aligned;

printf("size of data %ld, alignment of data %ld\n", sizeof(data),
_Alignof(data));
printf("size of data_aligned %ld, alignment of data_aligned %ld\n",
sizeof(data_aligned), _Alignof(data_aligned));

return 0;
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Use-PGAlignedBlock-instead-of-char-buf-BLCKSZ-in-.patchapplication/x-patch; name=v1-0001-Use-PGAlignedBlock-instead-of-char-buf-BLCKSZ-in-.patchDownload
From 5b2f9d3db4f230c46382f9e1be8cacfeef4f537b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 3 Dec 2023 09:33:37 +0000
Subject: [PATCH v1] Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more
 places

Commit 44cac934 replaced "char buf[BLCKSZ]" with PGAlignedBlock to
avoid issues on alignment-picky hardware. While it replaced most
of the instances, there are still some more left. This commit
uses PGAlignedBlock in the left-over places. A note in the commit
44cac934 says that ensuring proper alignment makes kernel data
transfers fasters and the left-over "char buf[BLCKSZ]" either do
read() or write() system calls, so it might be worth to align them
with PGAlignedBlock.
---
 src/backend/access/transam/timeline.c | 12 +++++-----
 src/backend/utils/init/miscinit.c     | 32 +++++++++++++--------------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index 94e152694e..7f1bf1fdc7 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -307,7 +307,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	char		path[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
 	char		histfname[MAXFNAMELEN];
-	char		buffer[BLCKSZ];
+	PGAlignedBlock buffer;
 	int			srcfd;
 	int			fd;
 	int			nbytes;
@@ -354,7 +354,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 		{
 			errno = 0;
 			pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_READ);
-			nbytes = (int) read(srcfd, buffer, sizeof(buffer));
+			nbytes = (int) read(srcfd, buffer.data, sizeof(buffer));
 			pgstat_report_wait_end();
 			if (nbytes < 0 || errno != 0)
 				ereport(ERROR,
@@ -364,7 +364,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 				break;
 			errno = 0;
 			pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_WRITE);
-			if ((int) write(fd, buffer, nbytes) != nbytes)
+			if ((int) write(fd, buffer.data, nbytes) != nbytes)
 			{
 				int			save_errno = errno;
 
@@ -398,17 +398,17 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	 * If we did have a parent file, insert an extra newline just in case the
 	 * parent file failed to end with one.
 	 */
-	snprintf(buffer, sizeof(buffer),
+	snprintf(buffer.data, sizeof(buffer),
 			 "%s%u\t%X/%X\t%s\n",
 			 (srcfd < 0) ? "" : "\n",
 			 parentTLI,
 			 LSN_FORMAT_ARGS(switchpoint),
 			 reason);
 
-	nbytes = strlen(buffer);
+	nbytes = strlen(buffer.data);
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_WRITE);
-	if ((int) write(fd, buffer, nbytes) != nbytes)
+	if ((int) write(fd, buffer.data, nbytes) != nbytes)
 	{
 		int			save_errno = errno;
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 819936ec02..c3c3fb325b 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -1508,8 +1508,8 @@ AddToDataDirLockFile(int target_line, const char *str)
 	int			lineno;
 	char	   *srcptr;
 	char	   *destptr;
-	char		srcbuffer[BLCKSZ];
-	char		destbuffer[BLCKSZ];
+	PGAlignedBlock srcbuffer;
+	PGAlignedBlock destbuffer;
 
 	fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0);
 	if (fd < 0)
@@ -1521,7 +1521,7 @@ AddToDataDirLockFile(int target_line, const char *str)
 		return;
 	}
 	pgstat_report_wait_start(WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ);
-	len = read(fd, srcbuffer, sizeof(srcbuffer) - 1);
+	len = read(fd, srcbuffer.data, sizeof(srcbuffer) - 1);
 	pgstat_report_wait_end();
 	if (len < 0)
 	{
@@ -1532,13 +1532,13 @@ AddToDataDirLockFile(int target_line, const char *str)
 		close(fd);
 		return;
 	}
-	srcbuffer[len] = '\0';
+	srcbuffer.data[len] = '\0';
 
 	/*
 	 * Advance over lines we are not supposed to rewrite, then copy them to
 	 * destbuffer.
 	 */
-	srcptr = srcbuffer;
+	srcptr = srcbuffer.data;
 	for (lineno = 1; lineno < target_line; lineno++)
 	{
 		char	   *eol = strchr(srcptr, '\n');
@@ -1547,8 +1547,8 @@ AddToDataDirLockFile(int target_line, const char *str)
 			break;				/* not enough lines in file yet */
 		srcptr = eol + 1;
 	}
-	memcpy(destbuffer, srcbuffer, srcptr - srcbuffer);
-	destptr = destbuffer + (srcptr - srcbuffer);
+	memcpy(destbuffer.data, srcbuffer.data, srcptr - srcbuffer.data);
+	destptr = destbuffer.data + (srcptr - srcbuffer.data);
 
 	/*
 	 * Fill in any missing lines before the target line, in case lines are
@@ -1556,14 +1556,14 @@ AddToDataDirLockFile(int target_line, const char *str)
 	 */
 	for (; lineno < target_line; lineno++)
 	{
-		if (destptr < destbuffer + sizeof(destbuffer))
+		if (destptr < destbuffer.data + sizeof(destbuffer))
 			*destptr++ = '\n';
 	}
 
 	/*
 	 * Write or rewrite the target line.
 	 */
-	snprintf(destptr, destbuffer + sizeof(destbuffer) - destptr, "%s\n", str);
+	snprintf(destptr, destbuffer.data + sizeof(destbuffer) - destptr, "%s\n", str);
 	destptr += strlen(destptr);
 
 	/*
@@ -1572,7 +1572,7 @@ AddToDataDirLockFile(int target_line, const char *str)
 	if ((srcptr = strchr(srcptr, '\n')) != NULL)
 	{
 		srcptr++;
-		snprintf(destptr, destbuffer + sizeof(destbuffer) - destptr, "%s",
+		snprintf(destptr, destbuffer.data + sizeof(destbuffer) - destptr, "%s",
 				 srcptr);
 	}
 
@@ -1580,10 +1580,10 @@ AddToDataDirLockFile(int target_line, const char *str)
 	 * And rewrite the data.  Since we write in a single kernel call, this
 	 * update should appear atomic to onlookers.
 	 */
-	len = strlen(destbuffer);
+	len = strlen(destbuffer.data);
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE);
-	if (pg_pwrite(fd, destbuffer, len, 0) != len)
+	if (pg_pwrite(fd, destbuffer.data, len, 0) != len)
 	{
 		pgstat_report_wait_end();
 		/* if write didn't set errno, assume problem is no disk space */
@@ -1633,7 +1633,7 @@ RecheckDataDirLockFile(void)
 	int			fd;
 	int			len;
 	long		file_pid;
-	char		buffer[BLCKSZ];
+	PGAlignedBlock buffer;
 
 	fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0);
 	if (fd < 0)
@@ -1663,7 +1663,7 @@ RecheckDataDirLockFile(void)
 		}
 	}
 	pgstat_report_wait_start(WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ);
-	len = read(fd, buffer, sizeof(buffer) - 1);
+	len = read(fd, buffer.data, sizeof(buffer) - 1);
 	pgstat_report_wait_end();
 	if (len < 0)
 	{
@@ -1674,9 +1674,9 @@ RecheckDataDirLockFile(void)
 		close(fd);
 		return true;			/* treat read failure as nonfatal */
 	}
-	buffer[len] = '\0';
+	buffer.data[len] = '\0';
 	close(fd);
-	file_pid = atol(buffer);
+	file_pid = atol(buffer.data);
 	if (file_pid == getpid())
 		return true;			/* all is well */
 
-- 
2.34.1

#2Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#1)
Re: Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more places

On Mon, Dec 04, 2023 at 06:59:13AM +0530, Bharath Rupireddy wrote:

The commit 44cac934 replaced "char buf[BLCKSZ]" with PGAlignedBlock to
avoid issues on alignment-picky hardware. While it replaced most of the
instances, there are still some more left. How about we use PGAlignedBlock
there too, something like the attached patch? A note [2] in the commit
44cac934 says that ensuring proper alignment makes kernel data transfers
fasters and the left-over "char buf[BLCKSZ]" either do read() or write()
system calls, so it might be worth to align them with PGAlignedBlock.

Thoughts?

The buffers used to write the lock file and the TLI history file are
not page buffers, and this could make code readers think that these
are pages. So I am honestly not sure if there's a point in changing
them because the current code is not incorrect, isn't it? It looks
like 2042b3428d39 for the TLI history file and 52948169bcdd for the
lock file began using BLCKSZ because that was just a handy thing to
do, and because we know they would never get beyond that.
--
Michael

#3Peter Eisentraut
peter@eisentraut.org
In reply to: Michael Paquier (#2)
Re: Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more places

On 04.12.23 06:46, Michael Paquier wrote:

On Mon, Dec 04, 2023 at 06:59:13AM +0530, Bharath Rupireddy wrote:

The commit 44cac934 replaced "char buf[BLCKSZ]" with PGAlignedBlock to
avoid issues on alignment-picky hardware. While it replaced most of the
instances, there are still some more left. How about we use PGAlignedBlock
there too, something like the attached patch? A note [2] in the commit
44cac934 says that ensuring proper alignment makes kernel data transfers
fasters and the left-over "char buf[BLCKSZ]" either do read() or write()
system calls, so it might be worth to align them with PGAlignedBlock.

Thoughts?

The buffers used to write the lock file and the TLI history file are
not page buffers, and this could make code readers think that these
are pages.

The type is called "aligned block", not "aligned buffer" or "aligned
page", so I don't think it's incorrect to try to use it.

So I am honestly not sure if there's a point in changing

them because the current code is not incorrect, isn't it? It looks
like 2042b3428d39 for the TLI history file and 52948169bcdd for the
lock file began using BLCKSZ because that was just a handy thing to
do, and because we know they would never get beyond that.

Yeah, it's not clear why these need to be block-sized. We shouldn't
perpetuate this without more clarity about this.

#4Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#3)
Re: Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more places

Hi,

On 2023-12-04 15:53:44 +0100, Peter Eisentraut wrote:

On 04.12.23 06:46, Michael Paquier wrote:

On Mon, Dec 04, 2023 at 06:59:13AM +0530, Bharath Rupireddy wrote:

The commit 44cac934 replaced "char buf[BLCKSZ]" with PGAlignedBlock to
avoid issues on alignment-picky hardware. While it replaced most of the
instances, there are still some more left. How about we use PGAlignedBlock
there too, something like the attached patch? A note [2] in the commit
44cac934 says that ensuring proper alignment makes kernel data transfers
fasters and the left-over "char buf[BLCKSZ]" either do read() or write()
system calls, so it might be worth to align them with PGAlignedBlock.

Thoughts?

The buffers used to write the lock file and the TLI history file are
not page buffers, and this could make code readers think that these
are pages.

The type is called "aligned block", not "aligned buffer" or "aligned page",
so I don't think it's incorrect to try to use it.

Block is a type defined in bufmgr.h...

So I am honestly not sure if there's a point in changing

them because the current code is not incorrect, isn't it? It looks
like 2042b3428d39 for the TLI history file and 52948169bcdd for the
lock file began using BLCKSZ because that was just a handy thing to
do, and because we know they would never get beyond that.

Yeah, it's not clear why these need to be block-sized. We shouldn't
perpetuate this without more clarity about this.

If we change something, we should consider making buffers like these aligned
to page sizes, rather than just MAXALIGNED.

Greetings,

Andres Freund

#5Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#4)
Re: Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more places

On Mon, Dec 04, 2023 at 09:47:24AM -0800, Andres Freund wrote:

If we change something, we should consider making buffers like these aligned
to page sizes, rather than just MAXALIGNED.

You mean 4k kernel pages, right? That makes sense to me.
--
Michael