Use read streams in CREATE DATABASE command when the strategy is wal_log

Started by Nazir Bilal Yavuzover 1 year ago12 messages
#1Nazir Bilal Yavuz
byavuz81@gmail.com
3 attachment(s)

Hi,

I am working on using read streams in the CREATE DATABASE command when the
strategy is wal_log. RelationCopyStorageUsingBuffer() function is used in
this context. This function reads source buffers then copies them to the
destination buffers. I used read streams only when reading source buffers
because the destination buffers are read by 'RBM_ZERO_AND_LOCK' option, so
it is not important.

I created a ~6 GB table [1]CREATE TABLE t as select repeat('a', 100) || i || repeat('b', 500) as filler from generate_series(1, 9000000) as i; and created a new database with the wal_log
strategy using the database that table was created in as a template [2]CREATE DATABASE test_1 STRATEGY 'wal_log' TEMPLATE test;. My
benchmarking results are:

a. Timings:

patched:
12955.027 ms
12917.475 ms
13177.846 ms
12971.308 ms
13059.985 ms

master:
13156.375 ms
13054.071 ms
13151.607 ms
13152.633 ms
13160.538 ms

There is no difference in timings, the patched version is a tiny bit better
but it is negligible. I actually expected the patched version to be better
because there was no prefetching before, but the read stream API detects
sequential access and disables prefetching.

b. strace:

patched:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
68.02 3.749359 2 1285782 pwrite64
18.54 1.021734 21 46730 preadv
9.49 0.522889 826 633 fdatasync
2.55 0.140339 59 2368 pwritev
1.14 0.062583 409 153 fsync

master:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
59.71 3.831542 2 1288365 pwrite64
29.84 1.914540 2 747936 pread64
7.90 0.506843 837 605 fdatasync
1.58 0.101575 54 1856 pwritev
0.75 0.048431 400 121 fsync

There are fewer (~1/16) read system calls in the patched version.

c. perf:

patched:

- 97.83% 1.13% postgres postgres [.]
RelationCopyStorageUsingBuffer

- 97.83% RelationCopyStorageUsingBuffer

- 44.28% ReadBufferWithoutRelcache

+ 42.20% GetVictimBuffer

0.81% ZeroBuffer

+ 31.86% log_newpage_buffer

- 19.51% read_stream_next_buffer

- 17.92% WaitReadBuffers

+ 17.61% mdreadv

- 1.47% read_stream_start_pending_read

+ 1.46% StartReadBuffers

master:

- 97.68% 0.57% postgres postgres [.]
RelationCopyStorageUsingBuffer

- RelationCopyStorageUsingBuffer

- 65.48% ReadBufferWithoutRelcache

+ 41.16% GetVictimBuffer

- 20.42% WaitReadBuffers

+ 19.90% mdreadv

+ 1.85% StartReadBuffer

0.75% ZeroBuffer

+ 30.82% log_newpage_buffer

Patched version spends less CPU time in read calls and more CPU time in
other calls such as write.

There are three patch files attached. First two are optimization and adding
a way to create a read stream object by using SMgrRelation, these are
already proposed in the streaming I/O thread [3]/messages/by-id/CAN55FZ1yGvCzCW_aufu83VimdEYHbG_zuOY3J9JL-nBptyJyKA@mail.gmail.com. The third one is the
actual patch file.

Any kind of feedback would be appreciated.

[1]: CREATE TABLE t as select repeat('a', 100) || i || repeat('b', 500) as filler from generate_series(1, 9000000) as i;
filler from generate_series(1, 9000000) as i;
[2]: CREATE DATABASE test_1 STRATEGY 'wal_log' TEMPLATE test;
[3]: /messages/by-id/CAN55FZ1yGvCzCW_aufu83VimdEYHbG_zuOY3J9JL-nBptyJyKA@mail.gmail.com
/messages/by-id/CAN55FZ1yGvCzCW_aufu83VimdEYHbG_zuOY3J9JL-nBptyJyKA@mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v1-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchDownload
From 2c6947231ba6377f291a1bd7c9cee55e5833fdbf Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Mon, 8 Apr 2024 00:14:52 +0300
Subject: [PATCH v1 2/3] Add a way to create read stream object by using
 SMgrRelation

Currently read stream object can be created only by using Relation,
there is no way to create it by using SMgrRelation.

To achieve that, read_stream_begin_impl() function is created and
contents of the read_stream_begin_relation() is moved into this
function.

Both read_stream_begin_relation() and read_stream_begin_smgr_relation()
calls read_stream_begin_impl() by Relation and SMgrRelation
respectively.
---
 src/include/storage/read_stream.h     |  9 +++
 src/backend/storage/aio/read_stream.c | 83 ++++++++++++++++++++++-----
 2 files changed, 78 insertions(+), 14 deletions(-)

diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index fae09d2b4cc..c7d688d8c9c 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -15,6 +15,7 @@
 #define READ_STREAM_H
 
 #include "storage/bufmgr.h"
+#include "storage/smgr.h"
 
 /* Default tuning, reasonable for many users. */
 #define READ_STREAM_DEFAULT 0x00
@@ -56,6 +57,14 @@ extern ReadStream *read_stream_begin_relation(int flags,
 											  ReadStreamBlockNumberCB callback,
 											  void *callback_private_data,
 											  size_t per_buffer_data_size);
+extern ReadStream *read_stream_begin_smgr_relation(int flags,
+												   BufferAccessStrategy strategy,
+												   SMgrRelation smgr,
+												   char smgr_persistence,
+												   ForkNumber forknum,
+												   ReadStreamBlockNumberCB callback,
+												   void *callback_private_data,
+												   size_t per_buffer_data_size);
 extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
 extern void read_stream_reset(ReadStream *stream);
 extern void read_stream_end(ReadStream *stream);
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index d155dde5ce3..8dd100f016c 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -406,14 +406,16 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
  * write extra data for each block into the space provided to it.  It will
  * also receive callback_private_data for its own purposes.
  */
-ReadStream *
-read_stream_begin_relation(int flags,
-						   BufferAccessStrategy strategy,
-						   Relation rel,
-						   ForkNumber forknum,
-						   ReadStreamBlockNumberCB callback,
-						   void *callback_private_data,
-						   size_t per_buffer_data_size)
+static ReadStream *
+read_stream_begin_impl(int flags,
+					   BufferAccessStrategy strategy,
+					   Relation rel,
+					   SMgrRelation smgr,
+					   char smgr_persistence,
+					   ForkNumber forknum,
+					   ReadStreamBlockNumberCB callback,
+					   void *callback_private_data,
+					   size_t per_buffer_data_size)
 {
 	ReadStream *stream;
 	size_t		size;
@@ -422,9 +424,6 @@ read_stream_begin_relation(int flags,
 	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
-	SMgrRelation smgr;
-
-	smgr = RelationGetSmgr(rel);
 
 	/*
 	 * Decide how many I/Os we will allow to run at the same time.  That
@@ -434,7 +433,7 @@ read_stream_begin_relation(int flags,
 	 */
 	tablespace_id = smgr->smgr_rlocator.locator.spcOid;
 	if (!OidIsValid(MyDatabaseId) ||
-		IsCatalogRelation(rel) ||
+		(rel && IsCatalogRelation(rel)) ||
 		IsCatalogRelationOid(smgr->smgr_rlocator.locator.relNumber))
 	{
 		/*
@@ -548,8 +547,8 @@ read_stream_begin_relation(int flags,
 	for (int i = 0; i < max_ios; ++i)
 	{
 		stream->ios[i].op.rel = rel;
-		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;
+		stream->ios[i].op.smgr = smgr;
+		stream->ios[i].op.smgr_persistence = smgr_persistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
@@ -557,6 +556,62 @@ read_stream_begin_relation(int flags,
 	return stream;
 }
 
+/*
+ * Create a new read stream for reading a relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_relation(int flags,
+						   BufferAccessStrategy strategy,
+						   Relation rel,
+						   ForkNumber forknum,
+						   ReadStreamBlockNumberCB callback,
+						   void *callback_private_data,
+						   size_t per_buffer_data_size)
+{
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  rel,
+								  RelationGetSmgr(rel),
+								  rel->rd_rel->relpersistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
+/*
+ * Create a new read stream for reading a SMgr relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_smgr_relation(int flags,
+								BufferAccessStrategy strategy,
+								SMgrRelation smgr,
+								char smgr_persistence,
+								ForkNumber forknum,
+								ReadStreamBlockNumberCB callback,
+								void *callback_private_data,
+								size_t per_buffer_data_size)
+{
+	char		persistence;
+
+	if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  NULL,
+								  smgr,
+								  persistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
 /*
  * Pull one pinned buffer out of a stream.  Each call returns successive
  * blocks in the order specified by the callback.  If per_buffer_data_size was
-- 
2.43.0

v1-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchDownload
From b97e1f39ef53c331090597aaf2d38d66a9b49e89 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Tue, 16 Apr 2024 11:36:49 +0300
Subject: [PATCH v1 3/3] Use read streams in CREATE DATABASE when strategy is
 wal_log

CREATE DABASE command uses RelationCopyStorageUsingBuffer() function to
copy buffers when the strategy is wal_log. This function reads source
buffers then copies them to the destination buffers. Read streams are
used only only while reading source buffers because the destination
buffer is read by 'RBM_ZERO_AND_LOCK' option, so it is not important.
---
 src/backend/storage/buffer/bufmgr.c | 53 ++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 17cb7127f99..e02b699f6cf 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -54,6 +54,7 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
+#include "storage/read_stream.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
 #include "utils/memdebug.h"
@@ -135,6 +136,33 @@ typedef struct SMgrSortArray
 	SMgrRelation srel;
 } SMgrSortArray;
 
+/*
+ * Helper struct for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+struct copy_storage_using_buffer_read_stream_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};
+
+/*
+ * Callback function to get next block for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+static BlockNumber
+copy_storage_using_buffer_read_stream_next_block(ReadStream *stream,
+												 void *callback_private_data,
+												 void *per_buffer_data)
+{
+	struct copy_storage_using_buffer_read_stream_private *p = callback_private_data;
+
+	if (p->blocknum < p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /* GUC variables */
 bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
@@ -4639,6 +4667,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	PGIOAlignedBlock buf;
 	BufferAccessStrategy bstrategy_src;
 	BufferAccessStrategy bstrategy_dst;
+	struct copy_storage_using_buffer_read_stream_private p;
+	ReadStream *src_stream;
+	SMgrRelation src_smgr;
 
 	/*
 	 * In general, we want to write WAL whenever wal_level > 'minimal', but we
@@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	bstrategy_src = GetAccessStrategy(BAS_BULKREAD);
 	bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
 
+	/* Initalize streaming read */
+	p.blocknum = 0;
+	p.last_block = nblocks;
+	src_smgr = smgropen(srclocator, INVALID_PROC_NUMBER);
+	src_stream = read_stream_begin_smgr_relation(READ_STREAM_FULL,
+												 bstrategy_src,
+												 src_smgr,
+												 permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED,
+												 forkNum,
+												 copy_storage_using_buffer_read_stream_next_block,
+												 &p,
+												 0);
+
 	/* Iterate over each block of the source relation file. */
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		CHECK_FOR_INTERRUPTS();
 
 		/* Read block from source relation. */
-		srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-										   RBM_NORMAL, bstrategy_src,
-										   permanent);
+		srcBuf = read_stream_next_buffer(src_stream, NULL);
 		LockBuffer(srcBuf, BUFFER_LOCK_SHARE);
 		srcPage = BufferGetPage(srcBuf);
 
-		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
+		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum,
+										   BufferGetBlockNumber(srcBuf),
 										   RBM_ZERO_AND_LOCK, bstrategy_dst,
 										   permanent);
 		dstPage = BufferGetPage(dstBuf);
@@ -4699,6 +4742,8 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		UnlockReleaseBuffer(dstBuf);
 		UnlockReleaseBuffer(srcBuf);
 	}
+	Assert(read_stream_next_buffer(src_stream, NULL) == InvalidBuffer);
+	read_stream_end(src_stream);
 
 	FreeAccessStrategy(bstrategy_src);
 	FreeAccessStrategy(bstrategy_dst);
-- 
2.43.0

v1-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchDownload
From edcfacec40a70a8747c5f18777b6c28b0fccba7a Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Sun, 7 Apr 2024 22:33:36 +0300
Subject: [PATCH v1 1/3] Refactor PinBufferForBlock() to remove if checks about
 persistence

There are if checks in PinBufferForBlock() function to set persistence
of the relation and this function is called for the each block in the
relation. Instead of that, set persistence of the relation before
PinBufferForBlock() function.
---
 src/backend/storage/aio/read_stream.c |  2 +-
 src/backend/storage/buffer/bufmgr.c   | 31 +++++++++++----------------
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index f54dacdd914..d155dde5ce3 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
 	{
 		stream->ios[i].op.rel = rel;
 		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = 0;
+		stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 901b7230fb9..17cb7127f99 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1067,24 +1067,10 @@ PinBufferForBlock(Relation rel,
 	BufferDesc *bufHdr;
 	IOContext	io_context;
 	IOObject	io_object;
-	char		persistence;
 
 	Assert(blockNum != P_NEW);
 
-	/*
-	 * If there is no Relation it usually implies recovery and thus permanent,
-	 * but we take an argmument because CreateAndCopyRelationData can reach us
-	 * with only an SMgrRelation for an unlogged relation that we don't want
-	 * to flag with BM_PERMANENT.
-	 */
-	if (rel)
-		persistence = rel->rd_rel->relpersistence;
-	else if (smgr_persistence == 0)
-		persistence = RELPERSISTENCE_PERMANENT;
-	else
-		persistence = smgr_persistence;
-
-	if (persistence == RELPERSISTENCE_TEMP)
+	if (smgr_persistence == RELPERSISTENCE_TEMP)
 	{
 		io_context = IOCONTEXT_NORMAL;
 		io_object = IOOBJECT_TEMP_RELATION;
@@ -1101,7 +1087,7 @@ PinBufferForBlock(Relation rel,
 									   smgr->smgr_rlocator.locator.relNumber,
 									   smgr->smgr_rlocator.backend);
 
-	if (persistence == RELPERSISTENCE_TEMP)
+	if (smgr_persistence == RELPERSISTENCE_TEMP)
 	{
 		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, foundPtr);
 		if (*foundPtr)
@@ -1109,7 +1095,7 @@ PinBufferForBlock(Relation rel,
 	}
 	else
 	{
-		bufHdr = BufferAlloc(smgr, persistence, forkNum, blockNum,
+		bufHdr = BufferAlloc(smgr, smgr_persistence, forkNum, blockNum,
 							 strategy, foundPtr, io_context);
 		if (*foundPtr)
 			pgBufferUsage.shared_blks_hit++;
@@ -1157,6 +1143,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 	ReadBuffersOperation operation;
 	Buffer		buffer;
 	int			flags;
+	char		persistence;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1195,7 +1182,15 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		flags = 0;
 	operation.smgr = smgr;
 	operation.rel = rel;
-	operation.smgr_persistence = smgr_persistence;
+
+	if (rel)
+		persistence = rel->rd_rel->relpersistence;
+	else if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+	operation.smgr_persistence = persistence;
+
 	operation.forknum = forkNum;
 	operation.strategy = strategy;
 	if (StartReadBuffer(&operation,
-- 
2.43.0

#2Noah Misch
noah@leadboat.com
In reply to: Nazir Bilal Yavuz (#1)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:

I am working on using read streams in the CREATE DATABASE command when the
strategy is wal_log. RelationCopyStorageUsingBuffer() function is used in
this context. This function reads source buffers then copies them to the

Please rebase. I applied this to 40126ac for review purposes.

a. Timings:
b. strace:
c. perf:

Thanks for including those details. That's valuable confirmation.

Subject: [PATCH v1 1/3] Refactor PinBufferForBlock() to remove if checks about
persistence

There are if checks in PinBufferForBlock() function to set persistence
of the relation and this function is called for the each block in the
relation. Instead of that, set persistence of the relation before
PinBufferForBlock() function.

I tried with the following additional patch to see if PinBufferForBlock() ever
gets invalid smgr_relpersistence:

====
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1098,6 +1098,11 @@ PinBufferForBlock(Relation rel,

Assert(blockNum != P_NEW);

+	if (!(smgr_persistence == RELPERSISTENCE_TEMP ||
+		  smgr_persistence == RELPERSISTENCE_PERMANENT ||
+		  smgr_persistence == RELPERSISTENCE_UNLOGGED))
+		elog(WARNING, "unexpected relpersistence %d", smgr_persistence);
+
 	if (smgr_persistence == RELPERSISTENCE_TEMP)
 	{
 		io_context = IOCONTEXT_NORMAL;
====

That still gets relpersistence==0 in various src/test/regress cases. I think
the intent was to prevent that. If not, please add a comment about when
relpersistence==0 is still allowed.

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
{
stream->ios[i].op.rel = rel;
stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = 0;
+		stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
* The following members should be set by the caller. If only smgr is
* provided without rel, then smgr_persistence can be set to override the
* default assumption of RELPERSISTENCE_PERMANENT.
*/

--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
+/*
+ * Helper struct for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+struct copy_storage_using_buffer_read_stream_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};

Why is last_block an int64, not a BlockNumber?

@@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,

/* Iterate over each block of the source relation file. */
for (blkno = 0; blkno < nblocks; blkno++)
{
CHECK_FOR_INTERRUPTS();

/* Read block from source relation. */
-		srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-										   RBM_NORMAL, bstrategy_src,
-										   permanent);
+		srcBuf = read_stream_next_buffer(src_stream, NULL);
LockBuffer(srcBuf, BUFFER_LOCK_SHARE);

I think this should check for read_stream_next_buffer() returning
InvalidBuffer. pg_prewarm doesn't, but the other callers do, and I think the
other callers are a better model. LockBuffer() doesn't check the
InvalidBuffer case, so let's avoid the style of using a
read_stream_next_buffer() return value without checking.

Thanks,
nm

#3Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Noah Misch (#2)
3 attachment(s)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

Hi,

Thank you for the review!

On Fri, 12 Jul 2024 at 02:52, Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:

I am working on using read streams in the CREATE DATABASE command when the
strategy is wal_log. RelationCopyStorageUsingBuffer() function is used in
this context. This function reads source buffers then copies them to the

Please rebase. I applied this to 40126ac for review purposes.

Rebased.

Subject: [PATCH v1 1/3] Refactor PinBufferForBlock() to remove if checks about
persistence

There are if checks in PinBufferForBlock() function to set persistence
of the relation and this function is called for the each block in the
relation. Instead of that, set persistence of the relation before
PinBufferForBlock() function.

I tried with the following additional patch to see if PinBufferForBlock() ever
gets invalid smgr_relpersistence:

====
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1098,6 +1098,11 @@ PinBufferForBlock(Relation rel,

Assert(blockNum != P_NEW);

+       if (!(smgr_persistence == RELPERSISTENCE_TEMP ||
+                 smgr_persistence == RELPERSISTENCE_PERMANENT ||
+                 smgr_persistence == RELPERSISTENCE_UNLOGGED))
+               elog(WARNING, "unexpected relpersistence %d", smgr_persistence);
+
if (smgr_persistence == RELPERSISTENCE_TEMP)
{
io_context = IOCONTEXT_NORMAL;
====

That still gets relpersistence==0 in various src/test/regress cases. I think
the intent was to prevent that. If not, please add a comment about when
relpersistence==0 is still allowed.

I fixed it, it is caused by (mode == RBM_ZERO_AND_CLEANUP_LOCK | mode
== RBM_ZERO_AND_LOCK) case in the ReadBuffer_common(). The persistence
was not updated for this path before. I also added an assert check for
this problem to PinBufferForBlock().

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
{
stream->ios[i].op.rel = rel;
stream->ios[i].op.smgr = RelationGetSmgr(rel);
-             stream->ios[i].op.smgr_persistence = 0;
+             stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
* The following members should be set by the caller. If only smgr is
* provided without rel, then smgr_persistence can be set to override the
* default assumption of RELPERSISTENCE_PERMANENT.
*/

I believe it does not need to be updated but I renamed
'ReadBuffersOperation.smgr_persistence' as
'ReadBuffersOperation.persistence'. So, this comment is updated as
well. I think that rename suits better because persistence does not
need to come from smgr, it could come from relation, too. Do you think
it is a good idea? If it is, does it need a separate commit?

--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
+/*
+ * Helper struct for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+struct copy_storage_using_buffer_read_stream_private
+{
+     BlockNumber blocknum;
+     int64           last_block;
+};

Why is last_block an int64, not a BlockNumber?

You are right, the type of last_block should be BlockNumber; done. I
copied it from pg_prewarm_read_stream_private struct and I guess the
same should be applied to it as well but it is not the topic of this
thread, so I did not update it yet.

@@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,

/* Iterate over each block of the source relation file. */
for (blkno = 0; blkno < nblocks; blkno++)
{
CHECK_FOR_INTERRUPTS();

/* Read block from source relation. */
-             srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-                                                                                RBM_NORMAL, bstrategy_src,
-                                                                                permanent);
+             srcBuf = read_stream_next_buffer(src_stream, NULL);
LockBuffer(srcBuf, BUFFER_LOCK_SHARE);

I think this should check for read_stream_next_buffer() returning
InvalidBuffer. pg_prewarm doesn't, but the other callers do, and I think the
other callers are a better model. LockBuffer() doesn't check the
InvalidBuffer case, so let's avoid the style of using a
read_stream_next_buffer() return value without checking.

There is an assert in the LockBuffer which checks for the
InvalidBuffer. If that is not enough, we may add an if check for
InvalidBuffer but what should we do in this case? It should not
happen, so erroring out may be a good idea.

Updated patches are attached (without InvalidBuffer check for now).

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v2-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchDownload
From 977cea9451e602d8d3d5e4f0cf3cd7dfea11879e Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Sun, 7 Apr 2024 22:33:36 +0300
Subject: [PATCH v2 1/3] Refactor PinBufferForBlock() to remove if checks about
 persistence

There are if checks in PinBufferForBlock() function to set persistence
of the relation and this function is called for the each block in the
relation. Instead of that, set persistence of the relation before
PinBufferForBlock() function.
---
 src/include/storage/bufmgr.h          |  4 ++--
 src/backend/storage/aio/read_stream.c |  2 +-
 src/backend/storage/buffer/bufmgr.c   | 32 +++++++++++++--------------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index a1e71013d32..ba9a1a289a9 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -116,12 +116,12 @@ struct ReadBuffersOperation
 {
 	/*
 	 * The following members should be set by the caller.  If only smgr is
-	 * provided without rel, then smgr_persistence can be set to override the
+	 * provided without rel, then persistence can be set to override the
 	 * default assumption of RELPERSISTENCE_PERMANENT.
 	 */
 	Relation	rel;
 	struct SMgrRelationData *smgr;
-	char		smgr_persistence;
+	char		persistence;
 	ForkNumber	forknum;
 	BufferAccessStrategy strategy;
 
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 74b9bae6313..58221649f27 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -551,7 +551,7 @@ read_stream_begin_relation(int flags,
 	{
 		stream->ios[i].op.rel = rel;
 		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = 0;
+		stream->ios[i].op.persistence = rel->rd_rel->relpersistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 61816730955..7cbcdc63a6f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1104,7 +1104,7 @@ ZeroAndLockBuffer(Buffer buffer, ReadBufferMode mode, bool already_valid)
 static pg_attribute_always_inline Buffer
 PinBufferForBlock(Relation rel,
 				  SMgrRelation smgr,
-				  char smgr_persistence,
+				  char persistence,
 				  ForkNumber forkNum,
 				  BlockNumber blockNum,
 				  BufferAccessStrategy strategy,
@@ -1113,22 +1113,12 @@ PinBufferForBlock(Relation rel,
 	BufferDesc *bufHdr;
 	IOContext	io_context;
 	IOObject	io_object;
-	char		persistence;
 
 	Assert(blockNum != P_NEW);
 
-	/*
-	 * If there is no Relation it usually implies recovery and thus permanent,
-	 * but we take an argument because CreateAndCopyRelationData can reach us
-	 * with only an SMgrRelation for an unlogged relation that we don't want
-	 * to flag with BM_PERMANENT.
-	 */
-	if (rel)
-		persistence = rel->rd_rel->relpersistence;
-	else if (smgr_persistence == 0)
-		persistence = RELPERSISTENCE_PERMANENT;
-	else
-		persistence = smgr_persistence;
+	Assert((persistence == RELPERSISTENCE_TEMP ||
+			persistence == RELPERSISTENCE_PERMANENT ||
+			persistence == RELPERSISTENCE_UNLOGGED));
 
 	if (persistence == RELPERSISTENCE_TEMP)
 	{
@@ -1203,6 +1193,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 	ReadBuffersOperation operation;
 	Buffer		buffer;
 	int			flags;
+	char		persistence;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1224,12 +1215,19 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		return ExtendBufferedRel(BMR_REL(rel), forkNum, strategy, flags);
 	}
 
+	if (rel)
+		persistence = rel->rd_rel->relpersistence;
+	else if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
 	if (unlikely(mode == RBM_ZERO_AND_CLEANUP_LOCK ||
 				 mode == RBM_ZERO_AND_LOCK))
 	{
 		bool		found;
 
-		buffer = PinBufferForBlock(rel, smgr, smgr_persistence,
+		buffer = PinBufferForBlock(rel, smgr, persistence,
 								   forkNum, blockNum, strategy, &found);
 		ZeroAndLockBuffer(buffer, mode, found);
 		return buffer;
@@ -1241,7 +1239,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		flags = 0;
 	operation.smgr = smgr;
 	operation.rel = rel;
-	operation.smgr_persistence = smgr_persistence;
+	operation.persistence = persistence;
 	operation.forknum = forkNum;
 	operation.strategy = strategy;
 	if (StartReadBuffer(&operation,
@@ -1272,7 +1270,7 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
 
 		buffers[i] = PinBufferForBlock(operation->rel,
 									   operation->smgr,
-									   operation->smgr_persistence,
+									   operation->persistence,
 									   operation->forknum,
 									   blockNum + i,
 									   operation->strategy,
-- 
2.45.2

v2-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchDownload
From 53c2bd12d71e521d975002fb03ef53b74389ccb7 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Mon, 8 Apr 2024 00:14:52 +0300
Subject: [PATCH v2 2/3] Add a way to create read stream object by using
 SMgrRelation

Currently read stream object can be created only by using Relation,
there is no way to create it by using SMgrRelation.

To achieve that, read_stream_begin_impl() function is created and
contents of the read_stream_begin_relation() is moved into this
function.

Both read_stream_begin_relation() and read_stream_begin_smgr_relation()
calls read_stream_begin_impl() by Relation and SMgrRelation
respectively.
---
 src/include/storage/read_stream.h     | 10 ++++
 src/backend/storage/aio/read_stream.c | 83 ++++++++++++++++++++++-----
 2 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index f676d2cc20a..4e599904f26 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -15,6 +15,7 @@
 #define READ_STREAM_H
 
 #include "storage/bufmgr.h"
+#include "storage/smgr.h"
 
 /* Default tuning, reasonable for many users. */
 #define READ_STREAM_DEFAULT 0x00
@@ -57,6 +58,15 @@ extern ReadStream *read_stream_begin_relation(int flags,
 											  void *callback_private_data,
 											  size_t per_buffer_data_size);
 extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_data);
+extern ReadStream *read_stream_begin_smgr_relation(int flags,
+												   BufferAccessStrategy strategy,
+												   SMgrRelation smgr,
+												   char smgr_persistence,
+												   ForkNumber forknum,
+												   ReadStreamBlockNumberCB callback,
+												   void *callback_private_data,
+												   size_t per_buffer_data_size);
+extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
 extern void read_stream_reset(ReadStream *stream);
 extern void read_stream_end(ReadStream *stream);
 
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 58221649f27..cdec918d29d 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -406,14 +406,16 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
  * write extra data for each block into the space provided to it.  It will
  * also receive callback_private_data for its own purposes.
  */
-ReadStream *
-read_stream_begin_relation(int flags,
-						   BufferAccessStrategy strategy,
-						   Relation rel,
-						   ForkNumber forknum,
-						   ReadStreamBlockNumberCB callback,
-						   void *callback_private_data,
-						   size_t per_buffer_data_size)
+static ReadStream *
+read_stream_begin_impl(int flags,
+					   BufferAccessStrategy strategy,
+					   Relation rel,
+					   SMgrRelation smgr,
+					   char smgr_persistence,
+					   ForkNumber forknum,
+					   ReadStreamBlockNumberCB callback,
+					   void *callback_private_data,
+					   size_t per_buffer_data_size)
 {
 	ReadStream *stream;
 	size_t		size;
@@ -422,9 +424,6 @@ read_stream_begin_relation(int flags,
 	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
-	SMgrRelation smgr;
-
-	smgr = RelationGetSmgr(rel);
 
 	/*
 	 * Decide how many I/Os we will allow to run at the same time.  That
@@ -434,7 +433,7 @@ read_stream_begin_relation(int flags,
 	 */
 	tablespace_id = smgr->smgr_rlocator.locator.spcOid;
 	if (!OidIsValid(MyDatabaseId) ||
-		IsCatalogRelation(rel) ||
+		(rel && IsCatalogRelation(rel)) ||
 		IsCatalogRelationOid(smgr->smgr_rlocator.locator.relNumber))
 	{
 		/*
@@ -550,8 +549,8 @@ read_stream_begin_relation(int flags,
 	for (int i = 0; i < max_ios; ++i)
 	{
 		stream->ios[i].op.rel = rel;
-		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.persistence = rel->rd_rel->relpersistence;
+		stream->ios[i].op.smgr = smgr;
+		stream->ios[i].op.persistence = smgr_persistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
@@ -559,6 +558,62 @@ read_stream_begin_relation(int flags,
 	return stream;
 }
 
+/*
+ * Create a new read stream for reading a relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_relation(int flags,
+						   BufferAccessStrategy strategy,
+						   Relation rel,
+						   ForkNumber forknum,
+						   ReadStreamBlockNumberCB callback,
+						   void *callback_private_data,
+						   size_t per_buffer_data_size)
+{
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  rel,
+								  RelationGetSmgr(rel),
+								  rel->rd_rel->relpersistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
+/*
+ * Create a new read stream for reading a SMgr relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_smgr_relation(int flags,
+								BufferAccessStrategy strategy,
+								SMgrRelation smgr,
+								char smgr_persistence,
+								ForkNumber forknum,
+								ReadStreamBlockNumberCB callback,
+								void *callback_private_data,
+								size_t per_buffer_data_size)
+{
+	char		persistence;
+
+	if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  NULL,
+								  smgr,
+								  persistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
 /*
  * Pull one pinned buffer out of a stream.  Each call returns successive
  * blocks in the order specified by the callback.  If per_buffer_data_size was
-- 
2.45.2

v2-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchDownload
From 31a2d8a7eef71bcd56804535f4e8afa90c8e86d3 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Tue, 16 Jul 2024 13:22:15 +0300
Subject: [PATCH v2 3/3] Use read streams in CREATE DATABASE when strategy is
 wal_log

CREATE DABASE command uses RelationCopyStorageUsingBuffer() function to
copy buffers when the strategy is wal_log. This function reads source
buffers then copies them to the destination buffers. Read streams are
used only only while reading source buffers because the destination
buffer is read by 'RBM_ZERO_AND_LOCK' option, so it is not important.
---
 src/backend/storage/buffer/bufmgr.c | 53 ++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7cbcdc63a6f..2acc109c4db 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -54,6 +54,7 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
+#include "storage/read_stream.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
 #include "utils/memdebug.h"
@@ -135,6 +136,33 @@ typedef struct SMgrSortArray
 	SMgrRelation srel;
 } SMgrSortArray;
 
+/*
+ * Helper struct for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+struct copy_storage_using_buffer_read_stream_private
+{
+	BlockNumber blocknum;
+	BlockNumber last_block;
+};
+
+/*
+ * Callback function to get next block for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+static BlockNumber
+copy_storage_using_buffer_read_stream_next_block(ReadStream *stream,
+												 void *callback_private_data,
+												 void *per_buffer_data)
+{
+	struct copy_storage_using_buffer_read_stream_private *p = callback_private_data;
+
+	if (p->blocknum < p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /* GUC variables */
 bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
@@ -4688,6 +4716,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	PGIOAlignedBlock buf;
 	BufferAccessStrategy bstrategy_src;
 	BufferAccessStrategy bstrategy_dst;
+	struct copy_storage_using_buffer_read_stream_private p;
+	ReadStream *src_stream;
+	SMgrRelation src_smgr;
 
 	/*
 	 * In general, we want to write WAL whenever wal_level > 'minimal', but we
@@ -4716,19 +4747,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	bstrategy_src = GetAccessStrategy(BAS_BULKREAD);
 	bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
 
+	/* Initalize streaming read */
+	p.blocknum = 0;
+	p.last_block = nblocks;
+	src_smgr = smgropen(srclocator, INVALID_PROC_NUMBER);
+	src_stream = read_stream_begin_smgr_relation(READ_STREAM_FULL,
+												 bstrategy_src,
+												 src_smgr,
+												 permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED,
+												 forkNum,
+												 copy_storage_using_buffer_read_stream_next_block,
+												 &p,
+												 0);
+
 	/* Iterate over each block of the source relation file. */
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		CHECK_FOR_INTERRUPTS();
 
 		/* Read block from source relation. */
-		srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-										   RBM_NORMAL, bstrategy_src,
-										   permanent);
+		srcBuf = read_stream_next_buffer(src_stream, NULL);
 		LockBuffer(srcBuf, BUFFER_LOCK_SHARE);
 		srcPage = BufferGetPage(srcBuf);
 
-		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
+		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum,
+										   BufferGetBlockNumber(srcBuf),
 										   RBM_ZERO_AND_LOCK, bstrategy_dst,
 										   permanent);
 		dstPage = BufferGetPage(dstBuf);
@@ -4748,6 +4791,8 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		UnlockReleaseBuffer(dstBuf);
 		UnlockReleaseBuffer(srcBuf);
 	}
+	Assert(read_stream_next_buffer(src_stream, NULL) == InvalidBuffer);
+	read_stream_end(src_stream);
 
 	FreeAccessStrategy(bstrategy_src);
 	FreeAccessStrategy(bstrategy_dst);
-- 
2.45.2

#4Noah Misch
noah@leadboat.com
In reply to: Nazir Bilal Yavuz (#3)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

On Tue, Jul 16, 2024 at 02:11:20PM +0300, Nazir Bilal Yavuz wrote:

On Fri, 12 Jul 2024 at 02:52, Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
{
stream->ios[i].op.rel = rel;
stream->ios[i].op.smgr = RelationGetSmgr(rel);
-             stream->ios[i].op.smgr_persistence = 0;
+             stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
* The following members should be set by the caller. If only smgr is
* provided without rel, then smgr_persistence can be set to override the
* default assumption of RELPERSISTENCE_PERMANENT.
*/

I believe it does not need to be updated but I renamed
'ReadBuffersOperation.smgr_persistence' as
'ReadBuffersOperation.persistence'. So, this comment is updated as
well. I think that rename suits better because persistence does not
need to come from smgr, it could come from relation, too. Do you think
it is a good idea? If it is, does it need a separate commit?

The rename is good. I think the comment implies "persistence" is unused when
rel!=NULL. That implication is true before the patch but false after the
patch.

@@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,

/* Iterate over each block of the source relation file. */
for (blkno = 0; blkno < nblocks; blkno++)
{
CHECK_FOR_INTERRUPTS();

/* Read block from source relation. */
-             srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-                                                                                RBM_NORMAL, bstrategy_src,
-                                                                                permanent);
+             srcBuf = read_stream_next_buffer(src_stream, NULL);
LockBuffer(srcBuf, BUFFER_LOCK_SHARE);

I think this should check for read_stream_next_buffer() returning
InvalidBuffer. pg_prewarm doesn't, but the other callers do, and I think the
other callers are a better model. LockBuffer() doesn't check the
InvalidBuffer case, so let's avoid the style of using a
read_stream_next_buffer() return value without checking.

There is an assert in the LockBuffer which checks for the
InvalidBuffer. If that is not enough, we may add an if check for
InvalidBuffer but what should we do in this case? It should not
happen, so erroring out may be a good idea.

I like this style from read_stream_reset():

while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
...
}

That is, don't iterate over block numbers. Drain the stream until empty. If
the stream returns a number of blocks higher or lower than we expected, we
won't detect that, and that's okay. It's not a strong preference, so I'm open
to arguments against that from you or others. A counterargument could be that
read_stream_reset() doesn't know the buffer count, so it has no choice. The
counterargument could say that callers knowing the block count should use the
pg_prewarm() style, and others should use the read_stream_reset() style.

#5Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Noah Misch (#4)
3 attachment(s)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

Hi,

On Tue, 16 Jul 2024 at 15:19, Noah Misch <noah@leadboat.com> wrote:

On Tue, Jul 16, 2024 at 02:11:20PM +0300, Nazir Bilal Yavuz wrote:

On Fri, 12 Jul 2024 at 02:52, Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
{
stream->ios[i].op.rel = rel;
stream->ios[i].op.smgr = RelationGetSmgr(rel);
-             stream->ios[i].op.smgr_persistence = 0;
+             stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
* The following members should be set by the caller. If only smgr is
* provided without rel, then smgr_persistence can be set to override the
* default assumption of RELPERSISTENCE_PERMANENT.
*/

I believe it does not need to be updated but I renamed
'ReadBuffersOperation.smgr_persistence' as
'ReadBuffersOperation.persistence'. So, this comment is updated as
well. I think that rename suits better because persistence does not
need to come from smgr, it could come from relation, too. Do you think
it is a good idea? If it is, does it need a separate commit?

The rename is good. I think the comment implies "persistence" is unused when
rel!=NULL. That implication is true before the patch but false after the
patch.

What makes it false after the patch? I think the logic did not change.
If there is rel, the value of persistence is obtained from
'rel->rd_rel->relpersistence'. If there is no rel, then smgr is used
to obtain its value.

@@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,

/* Iterate over each block of the source relation file. */
for (blkno = 0; blkno < nblocks; blkno++)
{
CHECK_FOR_INTERRUPTS();

/* Read block from source relation. */
-             srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-                                                                                RBM_NORMAL, bstrategy_src,
-                                                                                permanent);
+             srcBuf = read_stream_next_buffer(src_stream, NULL);
LockBuffer(srcBuf, BUFFER_LOCK_SHARE);

I think this should check for read_stream_next_buffer() returning
InvalidBuffer. pg_prewarm doesn't, but the other callers do, and I think the
other callers are a better model. LockBuffer() doesn't check the
InvalidBuffer case, so let's avoid the style of using a
read_stream_next_buffer() return value without checking.

There is an assert in the LockBuffer which checks for the
InvalidBuffer. If that is not enough, we may add an if check for
InvalidBuffer but what should we do in this case? It should not
happen, so erroring out may be a good idea.

I like this style from read_stream_reset():

while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
...
}

That is, don't iterate over block numbers. Drain the stream until empty. If
the stream returns a number of blocks higher or lower than we expected, we
won't detect that, and that's okay. It's not a strong preference, so I'm open
to arguments against that from you or others. A counterargument could be that
read_stream_reset() doesn't know the buffer count, so it has no choice. The
counterargument could say that callers knowing the block count should use the
pg_prewarm() style, and others should use the read_stream_reset() style.

I think what you said in the counter argument makes sense. Also, there
is an 'Assert(read_stream_next_buffer(src_stream, NULL) ==
InvalidBuffer);' after the loop. Which means all the blocks in the
stream are read and there is no block left.

v3 is attached. The only change is 'read_stream.c' changes in the 0003
are moved to 0002.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v3-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchDownload
From 977cea9451e602d8d3d5e4f0cf3cd7dfea11879e Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Sun, 7 Apr 2024 22:33:36 +0300
Subject: [PATCH v3 1/3] Refactor PinBufferForBlock() to remove if checks about
 persistence

There are if checks in PinBufferForBlock() function to set persistence
of the relation and this function is called for the each block in the
relation. Instead of that, set persistence of the relation before
PinBufferForBlock() function.
---
 src/include/storage/bufmgr.h          |  4 ++--
 src/backend/storage/aio/read_stream.c |  2 +-
 src/backend/storage/buffer/bufmgr.c   | 32 +++++++++++++--------------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index a1e71013d32..ba9a1a289a9 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -116,12 +116,12 @@ struct ReadBuffersOperation
 {
 	/*
 	 * The following members should be set by the caller.  If only smgr is
-	 * provided without rel, then smgr_persistence can be set to override the
+	 * provided without rel, then persistence can be set to override the
 	 * default assumption of RELPERSISTENCE_PERMANENT.
 	 */
 	Relation	rel;
 	struct SMgrRelationData *smgr;
-	char		smgr_persistence;
+	char		persistence;
 	ForkNumber	forknum;
 	BufferAccessStrategy strategy;
 
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 74b9bae6313..58221649f27 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -551,7 +551,7 @@ read_stream_begin_relation(int flags,
 	{
 		stream->ios[i].op.rel = rel;
 		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = 0;
+		stream->ios[i].op.persistence = rel->rd_rel->relpersistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 61816730955..7cbcdc63a6f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1104,7 +1104,7 @@ ZeroAndLockBuffer(Buffer buffer, ReadBufferMode mode, bool already_valid)
 static pg_attribute_always_inline Buffer
 PinBufferForBlock(Relation rel,
 				  SMgrRelation smgr,
-				  char smgr_persistence,
+				  char persistence,
 				  ForkNumber forkNum,
 				  BlockNumber blockNum,
 				  BufferAccessStrategy strategy,
@@ -1113,22 +1113,12 @@ PinBufferForBlock(Relation rel,
 	BufferDesc *bufHdr;
 	IOContext	io_context;
 	IOObject	io_object;
-	char		persistence;
 
 	Assert(blockNum != P_NEW);
 
-	/*
-	 * If there is no Relation it usually implies recovery and thus permanent,
-	 * but we take an argument because CreateAndCopyRelationData can reach us
-	 * with only an SMgrRelation for an unlogged relation that we don't want
-	 * to flag with BM_PERMANENT.
-	 */
-	if (rel)
-		persistence = rel->rd_rel->relpersistence;
-	else if (smgr_persistence == 0)
-		persistence = RELPERSISTENCE_PERMANENT;
-	else
-		persistence = smgr_persistence;
+	Assert((persistence == RELPERSISTENCE_TEMP ||
+			persistence == RELPERSISTENCE_PERMANENT ||
+			persistence == RELPERSISTENCE_UNLOGGED));
 
 	if (persistence == RELPERSISTENCE_TEMP)
 	{
@@ -1203,6 +1193,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 	ReadBuffersOperation operation;
 	Buffer		buffer;
 	int			flags;
+	char		persistence;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1224,12 +1215,19 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		return ExtendBufferedRel(BMR_REL(rel), forkNum, strategy, flags);
 	}
 
+	if (rel)
+		persistence = rel->rd_rel->relpersistence;
+	else if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
 	if (unlikely(mode == RBM_ZERO_AND_CLEANUP_LOCK ||
 				 mode == RBM_ZERO_AND_LOCK))
 	{
 		bool		found;
 
-		buffer = PinBufferForBlock(rel, smgr, smgr_persistence,
+		buffer = PinBufferForBlock(rel, smgr, persistence,
 								   forkNum, blockNum, strategy, &found);
 		ZeroAndLockBuffer(buffer, mode, found);
 		return buffer;
@@ -1241,7 +1239,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		flags = 0;
 	operation.smgr = smgr;
 	operation.rel = rel;
-	operation.smgr_persistence = smgr_persistence;
+	operation.persistence = persistence;
 	operation.forknum = forkNum;
 	operation.strategy = strategy;
 	if (StartReadBuffer(&operation,
@@ -1272,7 +1270,7 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
 
 		buffers[i] = PinBufferForBlock(operation->rel,
 									   operation->smgr,
-									   operation->smgr_persistence,
+									   operation->persistence,
 									   operation->forknum,
 									   blockNum + i,
 									   operation->strategy,
-- 
2.45.2

v3-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchDownload
From 408a9487f64397739f23839754000b28883d8ab6 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 17 Jul 2024 11:30:51 +0300
Subject: [PATCH v3 2/3] Add a way to create read stream object by using
 SMgrRelation

Currently read stream object can be created only by using Relation,
there is no way to create it by using SMgrRelation.

To achieve that, read_stream_begin_impl() function is created and
contents of the read_stream_begin_relation() is moved into this
function.

Both read_stream_begin_relation() and read_stream_begin_smgr_relation()
calls read_stream_begin_impl() by Relation and SMgrRelation
respectively.
---
 src/include/storage/read_stream.h     | 10 ++++
 src/backend/storage/aio/read_stream.c | 83 ++++++++++++++++++++++-----
 2 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index f676d2cc20a..4e599904f26 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -15,6 +15,7 @@
 #define READ_STREAM_H
 
 #include "storage/bufmgr.h"
+#include "storage/smgr.h"
 
 /* Default tuning, reasonable for many users. */
 #define READ_STREAM_DEFAULT 0x00
@@ -57,6 +58,15 @@ extern ReadStream *read_stream_begin_relation(int flags,
 											  void *callback_private_data,
 											  size_t per_buffer_data_size);
 extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_data);
+extern ReadStream *read_stream_begin_smgr_relation(int flags,
+												   BufferAccessStrategy strategy,
+												   SMgrRelation smgr,
+												   char smgr_persistence,
+												   ForkNumber forknum,
+												   ReadStreamBlockNumberCB callback,
+												   void *callback_private_data,
+												   size_t per_buffer_data_size);
+extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
 extern void read_stream_reset(ReadStream *stream);
 extern void read_stream_end(ReadStream *stream);
 
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 58221649f27..bd50c55788e 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -406,14 +406,16 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
  * write extra data for each block into the space provided to it.  It will
  * also receive callback_private_data for its own purposes.
  */
-ReadStream *
-read_stream_begin_relation(int flags,
-						   BufferAccessStrategy strategy,
-						   Relation rel,
-						   ForkNumber forknum,
-						   ReadStreamBlockNumberCB callback,
-						   void *callback_private_data,
-						   size_t per_buffer_data_size)
+static ReadStream *
+read_stream_begin_impl(int flags,
+					   BufferAccessStrategy strategy,
+					   Relation rel,
+					   SMgrRelation smgr,
+					   char persistence,
+					   ForkNumber forknum,
+					   ReadStreamBlockNumberCB callback,
+					   void *callback_private_data,
+					   size_t per_buffer_data_size)
 {
 	ReadStream *stream;
 	size_t		size;
@@ -422,9 +424,6 @@ read_stream_begin_relation(int flags,
 	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
-	SMgrRelation smgr;
-
-	smgr = RelationGetSmgr(rel);
 
 	/*
 	 * Decide how many I/Os we will allow to run at the same time.  That
@@ -434,7 +433,7 @@ read_stream_begin_relation(int flags,
 	 */
 	tablespace_id = smgr->smgr_rlocator.locator.spcOid;
 	if (!OidIsValid(MyDatabaseId) ||
-		IsCatalogRelation(rel) ||
+		(rel && IsCatalogRelation(rel)) ||
 		IsCatalogRelationOid(smgr->smgr_rlocator.locator.relNumber))
 	{
 		/*
@@ -550,8 +549,8 @@ read_stream_begin_relation(int flags,
 	for (int i = 0; i < max_ios; ++i)
 	{
 		stream->ios[i].op.rel = rel;
-		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.persistence = rel->rd_rel->relpersistence;
+		stream->ios[i].op.smgr = smgr;
+		stream->ios[i].op.persistence = persistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
@@ -559,6 +558,62 @@ read_stream_begin_relation(int flags,
 	return stream;
 }
 
+/*
+ * Create a new read stream for reading a relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_relation(int flags,
+						   BufferAccessStrategy strategy,
+						   Relation rel,
+						   ForkNumber forknum,
+						   ReadStreamBlockNumberCB callback,
+						   void *callback_private_data,
+						   size_t per_buffer_data_size)
+{
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  rel,
+								  RelationGetSmgr(rel),
+								  rel->rd_rel->relpersistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
+/*
+ * Create a new read stream for reading a SMgr relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_smgr_relation(int flags,
+								BufferAccessStrategy strategy,
+								SMgrRelation smgr,
+								char smgr_persistence,
+								ForkNumber forknum,
+								ReadStreamBlockNumberCB callback,
+								void *callback_private_data,
+								size_t per_buffer_data_size)
+{
+	char		persistence;
+
+	if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  NULL,
+								  smgr,
+								  persistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
 /*
  * Pull one pinned buffer out of a stream.  Each call returns successive
  * blocks in the order specified by the callback.  If per_buffer_data_size was
-- 
2.45.2

v3-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchDownload
From 2aee4cc322bf47f7eb82ad48a472d4c9f2706f1d Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Tue, 16 Jul 2024 18:11:21 +0300
Subject: [PATCH v3 3/3] Use read streams in CREATE DATABASE when strategy is
 wal_log

CREATE DABASE command uses RelationCopyStorageUsingBuffer() function to
copy buffers when the strategy is wal_log. This function reads source
buffers then copies them to the destination buffers. Read streams are
used only only while reading source buffers because the destination
buffer is read by 'RBM_ZERO_AND_LOCK' option, so it is not important.
---
 src/backend/storage/buffer/bufmgr.c | 53 ++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7cbcdc63a6f..2acc109c4db 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -54,6 +54,7 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
+#include "storage/read_stream.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
 #include "utils/memdebug.h"
@@ -135,6 +136,33 @@ typedef struct SMgrSortArray
 	SMgrRelation srel;
 } SMgrSortArray;
 
+/*
+ * Helper struct for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+struct copy_storage_using_buffer_read_stream_private
+{
+	BlockNumber blocknum;
+	BlockNumber last_block;
+};
+
+/*
+ * Callback function to get next block for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+static BlockNumber
+copy_storage_using_buffer_read_stream_next_block(ReadStream *stream,
+												 void *callback_private_data,
+												 void *per_buffer_data)
+{
+	struct copy_storage_using_buffer_read_stream_private *p = callback_private_data;
+
+	if (p->blocknum < p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /* GUC variables */
 bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
@@ -4688,6 +4716,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	PGIOAlignedBlock buf;
 	BufferAccessStrategy bstrategy_src;
 	BufferAccessStrategy bstrategy_dst;
+	struct copy_storage_using_buffer_read_stream_private p;
+	ReadStream *src_stream;
+	SMgrRelation src_smgr;
 
 	/*
 	 * In general, we want to write WAL whenever wal_level > 'minimal', but we
@@ -4716,19 +4747,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	bstrategy_src = GetAccessStrategy(BAS_BULKREAD);
 	bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
 
+	/* Initalize streaming read */
+	p.blocknum = 0;
+	p.last_block = nblocks;
+	src_smgr = smgropen(srclocator, INVALID_PROC_NUMBER);
+	src_stream = read_stream_begin_smgr_relation(READ_STREAM_FULL,
+												 bstrategy_src,
+												 src_smgr,
+												 permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED,
+												 forkNum,
+												 copy_storage_using_buffer_read_stream_next_block,
+												 &p,
+												 0);
+
 	/* Iterate over each block of the source relation file. */
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		CHECK_FOR_INTERRUPTS();
 
 		/* Read block from source relation. */
-		srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-										   RBM_NORMAL, bstrategy_src,
-										   permanent);
+		srcBuf = read_stream_next_buffer(src_stream, NULL);
 		LockBuffer(srcBuf, BUFFER_LOCK_SHARE);
 		srcPage = BufferGetPage(srcBuf);
 
-		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
+		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum,
+										   BufferGetBlockNumber(srcBuf),
 										   RBM_ZERO_AND_LOCK, bstrategy_dst,
 										   permanent);
 		dstPage = BufferGetPage(dstBuf);
@@ -4748,6 +4791,8 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		UnlockReleaseBuffer(dstBuf);
 		UnlockReleaseBuffer(srcBuf);
 	}
+	Assert(read_stream_next_buffer(src_stream, NULL) == InvalidBuffer);
+	read_stream_end(src_stream);
 
 	FreeAccessStrategy(bstrategy_src);
 	FreeAccessStrategy(bstrategy_dst);
-- 
2.45.2

#6Noah Misch
noah@leadboat.com
In reply to: Nazir Bilal Yavuz (#5)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

On Wed, Jul 17, 2024 at 12:22:49PM +0300, Nazir Bilal Yavuz wrote:

On Tue, 16 Jul 2024 at 15:19, Noah Misch <noah@leadboat.com> wrote:

On Tue, Jul 16, 2024 at 02:11:20PM +0300, Nazir Bilal Yavuz wrote:

On Fri, 12 Jul 2024 at 02:52, Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
{
stream->ios[i].op.rel = rel;
stream->ios[i].op.smgr = RelationGetSmgr(rel);
-             stream->ios[i].op.smgr_persistence = 0;
+             stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
* The following members should be set by the caller. If only smgr is
* provided without rel, then smgr_persistence can be set to override the
* default assumption of RELPERSISTENCE_PERMANENT.
*/

I believe it does not need to be updated but I renamed
'ReadBuffersOperation.smgr_persistence' as
'ReadBuffersOperation.persistence'. So, this comment is updated as
well. I think that rename suits better because persistence does not
need to come from smgr, it could come from relation, too. Do you think
it is a good idea? If it is, does it need a separate commit?

The rename is good. I think the comment implies "persistence" is unused when
rel!=NULL. That implication is true before the patch but false after the
patch.

What makes it false after the patch? I think the logic did not change.
If there is rel, the value of persistence is obtained from
'rel->rd_rel->relpersistence'. If there is no rel, then smgr is used
to obtain its value.

First, the patch removes the "default assumption of RELPERSISTENCE_PERMANENT".
It's now an assertion failure.

The second point is about "If only smgr is provided without rel". Before the
patch, the extern functions that take a ReadBuffersOperation argument examine
smgr_persistence if and only if rel==NULL. That's consistent with the
comment. After the patch, StartReadBuffersImpl() calling PinBufferForBlock()
uses the field unconditionally.

On that note, does WaitReadBuffers() still have a reason to calculate its
persistence as follows, or should this patch make it "persistence =
operation->persistence"?

persistence = operation->rel
? operation->rel->rd_rel->relpersistence
: RELPERSISTENCE_PERMANENT;

I think what you said in the counter argument makes sense.

Okay.

#7Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Noah Misch (#6)
3 attachment(s)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

Hi,

On Wed, 17 Jul 2024 at 23:41, Noah Misch <noah@leadboat.com> wrote:

On Wed, Jul 17, 2024 at 12:22:49PM +0300, Nazir Bilal Yavuz wrote:

On Tue, 16 Jul 2024 at 15:19, Noah Misch <noah@leadboat.com> wrote:

On Tue, Jul 16, 2024 at 02:11:20PM +0300, Nazir Bilal Yavuz wrote:

On Fri, 12 Jul 2024 at 02:52, Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
{
stream->ios[i].op.rel = rel;
stream->ios[i].op.smgr = RelationGetSmgr(rel);
-             stream->ios[i].op.smgr_persistence = 0;
+             stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
* The following members should be set by the caller. If only smgr is
* provided without rel, then smgr_persistence can be set to override the
* default assumption of RELPERSISTENCE_PERMANENT.
*/

I believe it does not need to be updated but I renamed
'ReadBuffersOperation.smgr_persistence' as
'ReadBuffersOperation.persistence'. So, this comment is updated as
well. I think that rename suits better because persistence does not
need to come from smgr, it could come from relation, too. Do you think
it is a good idea? If it is, does it need a separate commit?

The rename is good. I think the comment implies "persistence" is unused when
rel!=NULL. That implication is true before the patch but false after the
patch.

What makes it false after the patch? I think the logic did not change.
If there is rel, the value of persistence is obtained from
'rel->rd_rel->relpersistence'. If there is no rel, then smgr is used
to obtain its value.

First, the patch removes the "default assumption of RELPERSISTENCE_PERMANENT".
It's now an assertion failure.

The second point is about "If only smgr is provided without rel". Before the
patch, the extern functions that take a ReadBuffersOperation argument examine
smgr_persistence if and only if rel==NULL. That's consistent with the
comment. After the patch, StartReadBuffersImpl() calling PinBufferForBlock()
uses the field unconditionally.

I see, thanks for the explanation. I removed that part of the comment.

On that note, does WaitReadBuffers() still have a reason to calculate its
persistence as follows, or should this patch make it "persistence =
operation->persistence"?

persistence = operation->rel
? operation->rel->rd_rel->relpersistence
: RELPERSISTENCE_PERMANENT;

Nice catch, I do not think it is needed now. WaitReadBuffers() is
called only from ReadBuffer_common() and read_stream_next_buffer().
For the ReadBuffer_common(), persistence is calculated before calling
WaitReadBuffers(). And for the read_stream_next_buffer(), it is
calculated while creating a read stream object in the
read_stream_begin_impl().

v4 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v4-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Refactor-PinBufferForBlock-to-remove-if-checks-ab.patchDownload
From 6d677bf4c1f870c03461532ebb96297c3950db2b Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Sun, 7 Apr 2024 22:33:36 +0300
Subject: [PATCH v4 1/3] Refactor PinBufferForBlock() to remove if checks about
 persistence

There are if checks in PinBufferForBlock() function to set persistence
of the relation and this function is called for the each block in the
relation. Instead of that, set persistence of the relation before
PinBufferForBlock() function.
---
 src/include/storage/bufmgr.h          |  6 ++---
 src/backend/storage/aio/read_stream.c |  2 +-
 src/backend/storage/buffer/bufmgr.c   | 37 ++++++++++++---------------
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index a1e71013d32..b4e454cf8af 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -115,13 +115,11 @@ typedef struct BufferManagerRelation
 struct ReadBuffersOperation
 {
 	/*
-	 * The following members should be set by the caller.  If only smgr is
-	 * provided without rel, then smgr_persistence can be set to override the
-	 * default assumption of RELPERSISTENCE_PERMANENT.
+	 * The following members should be set by the caller.
 	 */
 	Relation	rel;
 	struct SMgrRelationData *smgr;
-	char		smgr_persistence;
+	char		persistence;
 	ForkNumber	forknum;
 	BufferAccessStrategy strategy;
 
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 74b9bae6313..58221649f27 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -551,7 +551,7 @@ read_stream_begin_relation(int flags,
 	{
 		stream->ios[i].op.rel = rel;
 		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.smgr_persistence = 0;
+		stream->ios[i].op.persistence = rel->rd_rel->relpersistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 61816730955..61e8f726268 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1104,7 +1104,7 @@ ZeroAndLockBuffer(Buffer buffer, ReadBufferMode mode, bool already_valid)
 static pg_attribute_always_inline Buffer
 PinBufferForBlock(Relation rel,
 				  SMgrRelation smgr,
-				  char smgr_persistence,
+				  char persistence,
 				  ForkNumber forkNum,
 				  BlockNumber blockNum,
 				  BufferAccessStrategy strategy,
@@ -1113,22 +1113,13 @@ PinBufferForBlock(Relation rel,
 	BufferDesc *bufHdr;
 	IOContext	io_context;
 	IOObject	io_object;
-	char		persistence;
 
 	Assert(blockNum != P_NEW);
 
-	/*
-	 * If there is no Relation it usually implies recovery and thus permanent,
-	 * but we take an argument because CreateAndCopyRelationData can reach us
-	 * with only an SMgrRelation for an unlogged relation that we don't want
-	 * to flag with BM_PERMANENT.
-	 */
-	if (rel)
-		persistence = rel->rd_rel->relpersistence;
-	else if (smgr_persistence == 0)
-		persistence = RELPERSISTENCE_PERMANENT;
-	else
-		persistence = smgr_persistence;
+	/* Persistence should be set before */
+	Assert((persistence == RELPERSISTENCE_TEMP ||
+			persistence == RELPERSISTENCE_PERMANENT ||
+			persistence == RELPERSISTENCE_UNLOGGED));
 
 	if (persistence == RELPERSISTENCE_TEMP)
 	{
@@ -1203,6 +1194,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 	ReadBuffersOperation operation;
 	Buffer		buffer;
 	int			flags;
+	char		persistence;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1224,12 +1216,19 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		return ExtendBufferedRel(BMR_REL(rel), forkNum, strategy, flags);
 	}
 
+	if (rel)
+		persistence = rel->rd_rel->relpersistence;
+	else if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
 	if (unlikely(mode == RBM_ZERO_AND_CLEANUP_LOCK ||
 				 mode == RBM_ZERO_AND_LOCK))
 	{
 		bool		found;
 
-		buffer = PinBufferForBlock(rel, smgr, smgr_persistence,
+		buffer = PinBufferForBlock(rel, smgr, persistence,
 								   forkNum, blockNum, strategy, &found);
 		ZeroAndLockBuffer(buffer, mode, found);
 		return buffer;
@@ -1241,7 +1240,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
 		flags = 0;
 	operation.smgr = smgr;
 	operation.rel = rel;
-	operation.smgr_persistence = smgr_persistence;
+	operation.persistence = persistence;
 	operation.forknum = forkNum;
 	operation.strategy = strategy;
 	if (StartReadBuffer(&operation,
@@ -1272,7 +1271,7 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
 
 		buffers[i] = PinBufferForBlock(operation->rel,
 									   operation->smgr,
-									   operation->smgr_persistence,
+									   operation->persistence,
 									   operation->forknum,
 									   blockNum + i,
 									   operation->strategy,
@@ -1418,10 +1417,8 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 	buffers = &operation->buffers[0];
 	blocknum = operation->blocknum;
 	forknum = operation->forknum;
+	persistence = operation->persistence;
 
-	persistence = operation->rel
-		? operation->rel->rd_rel->relpersistence
-		: RELPERSISTENCE_PERMANENT;
 	if (persistence == RELPERSISTENCE_TEMP)
 	{
 		io_context = IOCONTEXT_NORMAL;
-- 
2.45.2

v4-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Add-a-way-to-create-read-stream-object-by-using-S.patchDownload
From 7ed354ca91ee5004c3c334058dde76ac6e37ad1b Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 17 Jul 2024 11:30:51 +0300
Subject: [PATCH v4 2/3] Add a way to create read stream object by using
 SMgrRelation

Currently read stream object can be created only by using Relation,
there is no way to create it by using SMgrRelation.

To achieve that, read_stream_begin_impl() function is created and
contents of the read_stream_begin_relation() is moved into this
function.

Both read_stream_begin_relation() and read_stream_begin_smgr_relation()
calls read_stream_begin_impl() by Relation and SMgrRelation
respectively.
---
 src/include/storage/read_stream.h     | 10 ++++
 src/backend/storage/aio/read_stream.c | 83 ++++++++++++++++++++++-----
 2 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index f676d2cc20a..4e599904f26 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -15,6 +15,7 @@
 #define READ_STREAM_H
 
 #include "storage/bufmgr.h"
+#include "storage/smgr.h"
 
 /* Default tuning, reasonable for many users. */
 #define READ_STREAM_DEFAULT 0x00
@@ -57,6 +58,15 @@ extern ReadStream *read_stream_begin_relation(int flags,
 											  void *callback_private_data,
 											  size_t per_buffer_data_size);
 extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_data);
+extern ReadStream *read_stream_begin_smgr_relation(int flags,
+												   BufferAccessStrategy strategy,
+												   SMgrRelation smgr,
+												   char smgr_persistence,
+												   ForkNumber forknum,
+												   ReadStreamBlockNumberCB callback,
+												   void *callback_private_data,
+												   size_t per_buffer_data_size);
+extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
 extern void read_stream_reset(ReadStream *stream);
 extern void read_stream_end(ReadStream *stream);
 
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 58221649f27..bd50c55788e 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -406,14 +406,16 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
  * write extra data for each block into the space provided to it.  It will
  * also receive callback_private_data for its own purposes.
  */
-ReadStream *
-read_stream_begin_relation(int flags,
-						   BufferAccessStrategy strategy,
-						   Relation rel,
-						   ForkNumber forknum,
-						   ReadStreamBlockNumberCB callback,
-						   void *callback_private_data,
-						   size_t per_buffer_data_size)
+static ReadStream *
+read_stream_begin_impl(int flags,
+					   BufferAccessStrategy strategy,
+					   Relation rel,
+					   SMgrRelation smgr,
+					   char persistence,
+					   ForkNumber forknum,
+					   ReadStreamBlockNumberCB callback,
+					   void *callback_private_data,
+					   size_t per_buffer_data_size)
 {
 	ReadStream *stream;
 	size_t		size;
@@ -422,9 +424,6 @@ read_stream_begin_relation(int flags,
 	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
-	SMgrRelation smgr;
-
-	smgr = RelationGetSmgr(rel);
 
 	/*
 	 * Decide how many I/Os we will allow to run at the same time.  That
@@ -434,7 +433,7 @@ read_stream_begin_relation(int flags,
 	 */
 	tablespace_id = smgr->smgr_rlocator.locator.spcOid;
 	if (!OidIsValid(MyDatabaseId) ||
-		IsCatalogRelation(rel) ||
+		(rel && IsCatalogRelation(rel)) ||
 		IsCatalogRelationOid(smgr->smgr_rlocator.locator.relNumber))
 	{
 		/*
@@ -550,8 +549,8 @@ read_stream_begin_relation(int flags,
 	for (int i = 0; i < max_ios; ++i)
 	{
 		stream->ios[i].op.rel = rel;
-		stream->ios[i].op.smgr = RelationGetSmgr(rel);
-		stream->ios[i].op.persistence = rel->rd_rel->relpersistence;
+		stream->ios[i].op.smgr = smgr;
+		stream->ios[i].op.persistence = persistence;
 		stream->ios[i].op.forknum = forknum;
 		stream->ios[i].op.strategy = strategy;
 	}
@@ -559,6 +558,62 @@ read_stream_begin_relation(int flags,
 	return stream;
 }
 
+/*
+ * Create a new read stream for reading a relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_relation(int flags,
+						   BufferAccessStrategy strategy,
+						   Relation rel,
+						   ForkNumber forknum,
+						   ReadStreamBlockNumberCB callback,
+						   void *callback_private_data,
+						   size_t per_buffer_data_size)
+{
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  rel,
+								  RelationGetSmgr(rel),
+								  rel->rd_rel->relpersistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
+/*
+ * Create a new read stream for reading a SMgr relation.
+ * See read_stream_begin_impl() for the detailed explanation.
+ */
+ReadStream *
+read_stream_begin_smgr_relation(int flags,
+								BufferAccessStrategy strategy,
+								SMgrRelation smgr,
+								char smgr_persistence,
+								ForkNumber forknum,
+								ReadStreamBlockNumberCB callback,
+								void *callback_private_data,
+								size_t per_buffer_data_size)
+{
+	char		persistence;
+
+	if (smgr_persistence == 0)
+		persistence = RELPERSISTENCE_PERMANENT;
+	else
+		persistence = smgr_persistence;
+
+	return read_stream_begin_impl(flags,
+								  strategy,
+								  NULL,
+								  smgr,
+								  persistence,
+								  forknum,
+								  callback,
+								  callback_private_data,
+								  per_buffer_data_size);
+}
+
 /*
  * Pull one pinned buffer out of a stream.  Each call returns successive
  * blocks in the order specified by the callback.  If per_buffer_data_size was
-- 
2.45.2

v4-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Use-read-streams-in-CREATE-DATABASE-when-strategy.patchDownload
From 0d9bd67772dbcec8b04dcd2c45abb4c1fa1c2b86 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Tue, 16 Jul 2024 18:11:21 +0300
Subject: [PATCH v4 3/3] Use read streams in CREATE DATABASE when strategy is
 wal_log

CREATE DABASE command uses RelationCopyStorageUsingBuffer() function to
copy buffers when the strategy is wal_log. This function reads source
buffers then copies them to the destination buffers. Read streams are
used only only while reading source buffers because the destination
buffer is read by 'RBM_ZERO_AND_LOCK' option, so it is not important.
---
 src/backend/storage/buffer/bufmgr.c | 53 ++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 61e8f726268..24a8ea6611b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -54,6 +54,7 @@
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
+#include "storage/read_stream.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
 #include "utils/memdebug.h"
@@ -135,6 +136,33 @@ typedef struct SMgrSortArray
 	SMgrRelation srel;
 } SMgrSortArray;
 
+/*
+ * Helper struct for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+struct copy_storage_using_buffer_read_stream_private
+{
+	BlockNumber blocknum;
+	BlockNumber last_block;
+};
+
+/*
+ * Callback function to get next block for read stream object used in
+ * RelationCopyStorageUsingBuffer() function.
+ */
+static BlockNumber
+copy_storage_using_buffer_read_stream_next_block(ReadStream *stream,
+												 void *callback_private_data,
+												 void *per_buffer_data)
+{
+	struct copy_storage_using_buffer_read_stream_private *p = callback_private_data;
+
+	if (p->blocknum < p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /* GUC variables */
 bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
@@ -4687,6 +4715,9 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	PGIOAlignedBlock buf;
 	BufferAccessStrategy bstrategy_src;
 	BufferAccessStrategy bstrategy_dst;
+	struct copy_storage_using_buffer_read_stream_private p;
+	ReadStream *src_stream;
+	SMgrRelation src_smgr;
 
 	/*
 	 * In general, we want to write WAL whenever wal_level > 'minimal', but we
@@ -4715,19 +4746,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	bstrategy_src = GetAccessStrategy(BAS_BULKREAD);
 	bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
 
+	/* Initalize streaming read */
+	p.blocknum = 0;
+	p.last_block = nblocks;
+	src_smgr = smgropen(srclocator, INVALID_PROC_NUMBER);
+	src_stream = read_stream_begin_smgr_relation(READ_STREAM_FULL,
+												 bstrategy_src,
+												 src_smgr,
+												 permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED,
+												 forkNum,
+												 copy_storage_using_buffer_read_stream_next_block,
+												 &p,
+												 0);
+
 	/* Iterate over each block of the source relation file. */
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		CHECK_FOR_INTERRUPTS();
 
 		/* Read block from source relation. */
-		srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
-										   RBM_NORMAL, bstrategy_src,
-										   permanent);
+		srcBuf = read_stream_next_buffer(src_stream, NULL);
 		LockBuffer(srcBuf, BUFFER_LOCK_SHARE);
 		srcPage = BufferGetPage(srcBuf);
 
-		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
+		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum,
+										   BufferGetBlockNumber(srcBuf),
 										   RBM_ZERO_AND_LOCK, bstrategy_dst,
 										   permanent);
 		dstPage = BufferGetPage(dstBuf);
@@ -4747,6 +4790,8 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		UnlockReleaseBuffer(dstBuf);
 		UnlockReleaseBuffer(srcBuf);
 	}
+	Assert(read_stream_next_buffer(src_stream, NULL) == InvalidBuffer);
+	read_stream_end(src_stream);
 
 	FreeAccessStrategy(bstrategy_src);
 	FreeAccessStrategy(bstrategy_dst);
-- 
2.45.2

#8Noah Misch
noah@leadboat.com
In reply to: Nazir Bilal Yavuz (#7)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote:

v4 is attached.

Removal of the PinBufferForBlock() comment about the "persistence =
RELPERSISTENCE_PERMANENT" fallback started to feel like a loss. I looked for
a way to re-add a comment about the fallback.
https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
shows no test coverage of that fallback, and I think the fallback is
unreachable. Hence, I've removed the fallback in a separate commit. I've
pushed that and your three patches. Thanks.

#9Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Noah Misch (#8)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

Hi,

On Sat, 20 Jul 2024 at 14:27, Noah Misch <noah@leadboat.com> wrote:

On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote:

v4 is attached.

Removal of the PinBufferForBlock() comment about the "persistence =
RELPERSISTENCE_PERMANENT" fallback started to feel like a loss. I looked for
a way to re-add a comment about the fallback.
https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
shows no test coverage of that fallback, and I think the fallback is
unreachable. Hence, I've removed the fallback in a separate commit. I've
pushed that and your three patches. Thanks.

Thanks for the separate commit and push!

With the separate commit (e00c45f685), does it make sense to rename
the smgr_persistence parameter of the ReadBuffer_common() to
persistence? Because, ExtendBufferedRelTo() calls ReadBuffer_common()
with rel's persistence now, not with smgr's persistence.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#10Noah Misch
noah@leadboat.com
In reply to: Nazir Bilal Yavuz (#9)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

On Sat, Jul 20, 2024 at 03:01:31PM +0300, Nazir Bilal Yavuz wrote:

On Sat, 20 Jul 2024 at 14:27, Noah Misch <noah@leadboat.com> wrote:

On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote:

v4 is attached.

Removal of the PinBufferForBlock() comment about the "persistence =
RELPERSISTENCE_PERMANENT" fallback started to feel like a loss. I looked for
a way to re-add a comment about the fallback.
https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
shows no test coverage of that fallback, and I think the fallback is
unreachable. Hence, I've removed the fallback in a separate commit. I've
pushed that and your three patches. Thanks.

Thanks for the separate commit and push!

With the separate commit (e00c45f685), does it make sense to rename
the smgr_persistence parameter of the ReadBuffer_common() to
persistence? Because, ExtendBufferedRelTo() calls ReadBuffer_common()
with rel's persistence now, not with smgr's persistence.

BMR_REL() doesn't set relpersistence, so bmr.relpersistence is associated with
bmr.smgr and is unset if bmr.rel is set. That is to say, bmr.relpersistence
is an smgr_persistence. It could make sense to change ReadBuffer_common() to
take a BufferManagerRelation instead of the three distinct arguments.

On a different naming topic, my review missed that field name
copy_storage_using_buffer_read_stream_private.last_block doesn't fit how the
field is used. Code uses it like an nblocks. So let's either rename the
field or change the code to use it as a last_block (e.g. initialize it to
nblocks-1, not nblocks).

#11Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Noah Misch (#10)
1 attachment(s)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

Hi,

On Sat, 20 Jul 2024 at 21:14, Noah Misch <noah@leadboat.com> wrote:

On Sat, Jul 20, 2024 at 03:01:31PM +0300, Nazir Bilal Yavuz wrote:

With the separate commit (e00c45f685), does it make sense to rename
the smgr_persistence parameter of the ReadBuffer_common() to
persistence? Because, ExtendBufferedRelTo() calls ReadBuffer_common()
with rel's persistence now, not with smgr's persistence.

BMR_REL() doesn't set relpersistence, so bmr.relpersistence is associated with
bmr.smgr and is unset if bmr.rel is set. That is to say, bmr.relpersistence
is an smgr_persistence. It could make sense to change ReadBuffer_common() to
take a BufferManagerRelation instead of the three distinct arguments.

Got it.

On a different naming topic, my review missed that field name
copy_storage_using_buffer_read_stream_private.last_block doesn't fit how the
field is used. Code uses it like an nblocks. So let's either rename the
field or change the code to use it as a last_block (e.g. initialize it to
nblocks-1, not nblocks).

I prefered renaming it as nblocks, since that is how it was used in
RelationCopyStorageUsingBuffer() before. Also, I realized that instead
of setting p.blocknum = 0; initializing blkno as 0 and using
p.blocknum = blkno makes sense. Because, p.blocknum and blkno should
always start with the same block number. The relevant patch is
attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v5-0001-Minimal-refactorings-on-RelationCopyStorageUsingB.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Minimal-refactorings-on-RelationCopyStorageUsingB.patchDownload
From 1110cfe915dd885c90f91014f8dbece20426efbc Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Mon, 22 Jul 2024 11:07:50 +0300
Subject: [PATCH v5] Minimal refactorings on RelationCopyStorageUsingBuffer()

- copy_storage_using_buffer_read_stream_private.last_block is renamed as
  nblocks.

- blkno is initialized as 0 and used while setting
  copy_storage_using_buffer_read_stream_private.blocknum.
---
 src/backend/storage/buffer/bufmgr.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index a6aeecdf534..29d862ccfde 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -143,7 +143,7 @@ typedef struct SMgrSortArray
 struct copy_storage_using_buffer_read_stream_private
 {
 	BlockNumber blocknum;
-	BlockNumber last_block;
+	BlockNumber nblocks;
 };
 
 /*
@@ -157,7 +157,7 @@ copy_storage_using_buffer_read_stream_next_block(ReadStream *stream,
 {
 	struct copy_storage_using_buffer_read_stream_private *p = callback_private_data;
 
-	if (p->blocknum < p->last_block)
+	if (p->blocknum < p->nblocks)
 		return p->blocknum++;
 
 	return InvalidBlockNumber;
@@ -4709,7 +4709,7 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	Page		dstPage;
 	bool		use_wal;
 	BlockNumber nblocks;
-	BlockNumber blkno;
+	BlockNumber blkno = 0;
 	PGIOAlignedBlock buf;
 	BufferAccessStrategy bstrategy_src;
 	BufferAccessStrategy bstrategy_dst;
@@ -4745,8 +4745,8 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
 
 	/* Initalize streaming read */
-	p.blocknum = 0;
-	p.last_block = nblocks;
+	p.blocknum = blkno;
+	p.nblocks = nblocks;
 	src_smgr = smgropen(srclocator, INVALID_PROC_NUMBER);
 	src_stream = read_stream_begin_smgr_relation(READ_STREAM_FULL,
 												 bstrategy_src,
@@ -4758,7 +4758,7 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 												 0);
 
 	/* Iterate over each block of the source relation file. */
-	for (blkno = 0; blkno < nblocks; blkno++)
+	for (; blkno < nblocks; blkno++)
 	{
 		CHECK_FOR_INTERRUPTS();
 
-- 
2.45.2

#12Noah Misch
noah@leadboat.com
In reply to: Nazir Bilal Yavuz (#11)
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

On Mon, Jul 22, 2024 at 12:00:45PM +0300, Nazir Bilal Yavuz wrote:

On Sat, 20 Jul 2024 at 21:14, Noah Misch <noah@leadboat.com> wrote:

On a different naming topic, my review missed that field name
copy_storage_using_buffer_read_stream_private.last_block doesn't fit how the
field is used. Code uses it like an nblocks. So let's either rename the
field or change the code to use it as a last_block (e.g. initialize it to
nblocks-1, not nblocks).

I prefered renaming it as nblocks, since that is how it was used in
RelationCopyStorageUsingBuffer() before. Also, I realized that instead
of setting p.blocknum = 0; initializing blkno as 0 and using
p.blocknum = blkno makes sense. Because, p.blocknum and blkno should
always start with the same block number. The relevant patch is
attached.

I felt the local variable change was not a clear improvement. It would have
been fine for the original patch to do it in that style, but the style of the
original patch was also fine. So I've pushed just the struct field rename.