[PATCH] Speedup truncates of relation forks

Started by Jamison, Kirkover 6 years ago37 messages
#1Jamison, Kirk
k.jamison@jp.fujitsu.com
1 attachment(s)

Hi all,

Attached is a patch to speed up the performance of truncates of relations.
This is also my first time to contribute my own patch,
and I'd gladly appreciate your feedback and advice.

A. Summary

Whenever we truncate relations, it scans the shared buffers thrice
(one per fork) which can be time-consuming. This patch improves
the performance of relation truncates by initially marking the
pages-to-be-truncated of relation forks, then simultaneously
truncating them, resulting to an improved performance in VACUUM,
autovacuum operations and their recovery performance.

B. Patch Details
The following functions were modified:

1. FreeSpaceMapTruncateRel() and visibilitymap_truncate()

a. CURRENT HEAD: These functions truncate the FSM pages and unused VM pages.

b. PATCH: Both functions only mark the pages to truncate and return a block number.

- We used to call smgrtruncate() in these functions, but these are now moved inside the RelationTruncate() and smgr_redo().

- The tentative renaming of the functions are: MarkFreeSpaceMapTruncateRel() and visibilitymap_mark_truncate(). Feel free to suggest better names.

2. RelationTruncate()

a. HEAD: Truncate FSM and VM first, then write WAL, and lastly truncate main fork.

b. PATCH: Now we mark FSM and VM pages first, write WAL, mark MAIN fork pages, then truncate all forks (MAIN, FSM, VM) simultaneously.

3. smgr_redo()

a. HEAD: Truncate main fork and the relation during XLOG replay, create fake rel cache for FSM and VM, truncate FSM, truncate VM, then free fake rel cache.

b. PATCH: Mark main fork dirty buffers, create fake rel cache, mark fsm and vm buffers, truncate marked pages of relation forks simultaneously, truncate relation during XLOG replay, then free fake rel cache.

4. smgrtruncate(), DropRelFileNodeBuffers()

- input arguments are changed to array of forknum and block numbers, int nforks (size of forkNum array)

- truncates the pages of relation forks simultaneously

5. smgrdounlinkfork()
I modified the function because it calls DropRelFileNodeBuffers. However, this is a dead code that can be removed.
I did not remove it for now because that's not for me but the community to decide.

C. Performance Test

I setup a synchronous streaming replication between a master-standby.

In postgresql.conf:
autovacuum = off
wal_level = replica
max_wal_senders = 5
wal_keep_segments = 16
max_locks_per_transaction = 10000
#shared_buffers = 8GB
#shared_buffers = 24GB

Objective: Measure VACUUM execution time; varying shared_buffers size.

1. Create table (ex. 10,000 tables). Insert data to tables.
2. DELETE FROM TABLE (ex. all rows of 10,000 tables)
3. psql -c "\timing on" (measures total execution of SQL queries)
4. VACUUM (whole db)

If you want to test with large number of relations,
you may use the stored functions I used here:
http://bit.ly/reltruncates

D. Results

HEAD results
1) 128MB shared_buffers = 48.885 seconds
2) 8GB shared_buffers = 5 min 30.695 s
3) 24GB shared_buffers = 14 min 13.598 s

PATCH results
1) 128MB shared_buffers = 42.736 s
2) 8GB shared_buffers = 2 min 26.464 s
3) 24GB shared_buffers = 5 min 35.848 s

The performance significantly improved compared to HEAD,
especially for large shared buffers.

---
Would appreciate to hear your thoughts, comments, advice.
Thank you in advance.

Regards,
Kirk Jamison

Attachments:

v1-0001-Speedup-truncate-of-relation-forks.patchapplication/octet-stream; name=v1-0001-Speedup-truncate-of-relation-forks.patchDownload
From a844cd4392bcea4dd3c04ff501675fa9534fc955 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Tue, 11 Jun 2019 01:41:43 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it scans the shared buffers thrice
(one per fork) which can be time-consuming. This patch improves
the performance of relation truncates by initially marking the
pages-to-be-truncated of relation forks, then simultaneously
truncating them, resulting to an improved performance in VACUUM,
autovacuum operations and their recovery performance.

---
 contrib/pg_visibility/pg_visibility.c     |  17 +++-
 src/backend/access/heap/visibilitymap.c   |  31 +++-----
 src/backend/catalog/storage.c             | 126 ++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  31 +++++---
 src/backend/storage/freespace/freespace.c |  38 +++------
 src/backend/storage/smgr/smgr.c           |  24 +++---
 src/include/access/visibilitymap.h        |   2 +-
 src/include/storage/bufmgr.h              |   4 +-
 src/include/storage/freespace.h           |   2 +-
 src/include/storage/smgr.h                |   7 +-
 10 files changed, 193 insertions(+), 89 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..2499415 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,10 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	newnblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +396,18 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	blocks[nforks] = visibilitymap_mark_truncate(rel, 0);
+	if (BlockNumberIsValid(blocks[nforks]))
+	{
+		forks[nforks] = VISIBILITYMAP_FORKNUM;
+		newnblocks = blocks[nforks];
+		nforks++;
+	}
+	smgrtruncate(rel->rd_smgr, forks, blocks, nforks);
+
+	/* Update the local smgr_vm_nblocks setting */
+	if (rel->rd_smgr)
+		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..2f1379c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_mark_truncate - mark the about-to-be-truncated VM
  *
  * NOTES
  *
@@ -430,7 +430,10 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_mark_truncate - mark the about-to-be-truncated VM
+ *
+ * Formerly, this function truncates VM relation forks. Instead, this just
+ * marks the dirty buffers.
  *
  * The caller must hold AccessExclusiveLock on the relation, to ensure that
  * other backends receive the smgr invalidation event that this function sends
@@ -438,8 +441,8 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_mark_truncate(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +462,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +483,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +531,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..3151632 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,11 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	new_nfsmblocks = InvalidBlockNumber;	/* FSM blocks */
+	BlockNumber	newnblocks = InvalidBlockNumber;	/* VM blocks */
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +247,34 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/*
+	 * We used to truncate FSM and VM forks here. Now we only mark the
+	 * dirty buffers of all forks about-to-be-truncated if they exist.
+	 */
+
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = FSM_FORKNUM;
+			new_nfsmblocks= blocks[nforks];	/* FSM blocks */
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			newnblocks = blocks[nforks]; 	/* VM blocks */
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -263,9 +287,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	 */
 	if (RelationNeedsWAL(rel))
 	{
-		/*
-		 * Make an XLOG entry reporting the file truncation.
-		 */
+		/* Make an XLOG entry reporting the file truncation */
 		XLogRecPtr	lsn;
 		xl_smgr_truncate xlrec;
 
@@ -290,8 +312,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, blocks, nforks);
+
+	/*
+	 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+	 * setting. smgrtruncate sent an smgr cache inval message, which will cause
+	 * other backends to invalidate their copy of smgr_fsm_nblocks and
+	 * smgr_vm_nblocks, and this one too at the next command boundary. But this
+	 * ensures it isn't outright wrong until then.
+	 */
+	if (rel->rd_smgr)
+	{
+		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	}
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.  This is
+	 * important because the just-truncated pages were likely marked as
+	 * all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel, new_nfsmblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +635,14 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	new_nfsmblocks = InvalidBlockNumber;
+		BlockNumber	newnblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +671,62 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = FSM_FORKNUM;
+				new_nfsmblocks= blocks[nforks];
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_mark_truncate(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				newnblocks = blocks[nforks];
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, blocks, nforks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/* Update the local smgr_fsm_nblocks and smgr_vm_nblocks setting */
+		if (rel->rd_smgr)
+		{
+			rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+			rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+		}
+
+		/* Update upper-level FSM pages to account for the truncation */
+		FreeSpaceMapVacuumRange(rel, new_nfsmblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7332e6b..123429c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2899,8 +2899,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2923,8 +2923,8 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   BlockNumber *firstDelBlock, int nforks)
 {
 	int			i;
 
@@ -2932,7 +2932,11 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			for (int j = 0; j < nforks; j++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
+											firstDelBlock[j]);
+		}
 		return;
 	}
 
@@ -2940,6 +2944,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		k = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2961,11 +2966,17 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 			continue;
 
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+		for (k = 0; k < nforks; k++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[k] &&
+				bufHdr->tag.blockNum >= firstDelBlock[k])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (k >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index c17b3f4..708c7cb 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,7 +247,10 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ *
+ * Formerly, this function truncates FSM relation forks. Instead, this just
+ * marks the dirty buffers and returns a block number.
  *
  * The caller must hold AccessExclusiveLock on the relation, to ensure that
  * other backends receive the smgr invalidation event that this function sends
@@ -255,8 +258,8 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +273,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +288,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +313,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dba8c39..b37560e 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -508,19 +508,21 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, bool isRedo, int nforks)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
 	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
 	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, 0, nforks);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -546,7 +548,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -643,13 +646,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, BlockNumber *nblocks, int nforks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks, nforks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -663,10 +668,9 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..4735d5f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_mark_truncate(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..5be6c0d 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   BlockNumber *firstDelBlock, int nforks);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..ff70b09 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 bool isRedo, int nforks);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 BlockNumber *nblocks, int nforks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#2Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Jamison, Kirk (#1)
Re: [PATCH] Speedup truncates of relation forks

On 6/11/19 9:34 AM, Jamison, Kirk wrote:

Hi all,

Attached is a patch to speed up the performance of truncates of relations.

Thanks for working on this!

*C.     **Performance Test*

I setup a synchronous streaming replication between a master-standby.

In postgresql.conf:
autovacuum = off
wal_level = replica
max_wal_senders = 5
wal_keep_segments = 16
max_locks_per_transaction = 10000
#shared_buffers = 8GB
#shared_buffers = 24GB

Objective: Measure VACUUM execution time; varying shared_buffers size.

1. Create table (ex. 10,000 tables). Insert data to tables.
2. DELETE FROM TABLE (ex. all rows of 10,000 tables)
3. psql -c "\timing on" (measures total execution of SQL queries)
4. VACUUM (whole db)

If you want to test with large number of relations,

you may use the stored functions I used here:
http://bit.ly/reltruncates

You should post these functions in this thread for the archives ;)

*D.     **Results*

HEAD results

1) 128MB shared_buffers = 48.885 seconds
2) 8GB shared_buffers = 5 min 30.695 s
3) 24GB shared_buffers = 14 min 13.598 s

PATCH results

1) 128MB shared_buffers = 42.736 s
2) 8GB shared_buffers = 2 min 26.464 s
3) 24GB shared_buffers = 5 min 35.848 s

The performance significantly improved compared to HEAD,
especially for large shared buffers.

From a user POW, the main issue with relation truncation is that it can block
queries on standby server during truncation replay.

It could be interesting if you can test this case and give results of your path.
Maybe by performing read queries on standby server and counting wait_event with
pg_wait_sampling?

Regards,

--
Adrien

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Jamison, Kirk (#1)
Re: [PATCH] Speedup truncates of relation forks

On Tue, Jun 11, 2019 at 07:34:35AM +0000, Jamison, Kirk wrote:

Hi all,

Attached is a patch to speed up the performance of truncates of relations.
This is also my first time to contribute my own patch,
and I'd gladly appreciate your feedback and advice.

Thanks for the patch. Please add it to the commitfest app, so that we
don't forget about it: https://commitfest.postgresql.org/23/

A. Summary

Whenever we truncate relations, it scans the shared buffers thrice
(one per fork) which can be time-consuming. This patch improves
the performance of relation truncates by initially marking the
pages-to-be-truncated of relation forks, then simultaneously
truncating them, resulting to an improved performance in VACUUM,
autovacuum operations and their recovery performance.

OK, so essentially the whole point is to scan the buffers only once, for
all forks at the same time (instead of three times).

B. Patch Details
The following functions were modified:

1. FreeSpaceMapTruncateRel() and visibilitymap_truncate()

a. CURRENT HEAD: These functions truncate the FSM pages and unused VM pages.

b. PATCH: Both functions only mark the pages to truncate and return a block number.

- We used to call smgrtruncate() in these functions, but these are now moved inside the RelationTruncate() and smgr_redo().

- The tentative renaming of the functions are: MarkFreeSpaceMapTruncateRel() and visibilitymap_mark_truncate(). Feel free to suggest better names.

2. RelationTruncate()

a. HEAD: Truncate FSM and VM first, then write WAL, and lastly truncate main fork.

b. PATCH: Now we mark FSM and VM pages first, write WAL, mark MAIN fork pages, then truncate all forks (MAIN, FSM, VM) simultaneously.

3. smgr_redo()

a. HEAD: Truncate main fork and the relation during XLOG replay, create fake rel cache for FSM and VM, truncate FSM, truncate VM, then free fake rel cache.

b. PATCH: Mark main fork dirty buffers, create fake rel cache, mark fsm and vm buffers, truncate marked pages of relation forks simultaneously, truncate relation during XLOG replay, then free fake rel cache.

4. smgrtruncate(), DropRelFileNodeBuffers()

- input arguments are changed to array of forknum and block numbers, int nforks (size of forkNum array)

- truncates the pages of relation forks simultaneously

5. smgrdounlinkfork()
I modified the function because it calls DropRelFileNodeBuffers. However, this is a dead code that can be removed.
I did not remove it for now because that's not for me but the community to decide.

You really don't need to extract the changes like this - such changes
are generally obvious from the diff.

You only need to explain things that are not obvious from the code
itself, e.g. non-trivial design decisions, etc.

C. Performance Test

I setup a synchronous streaming replication between a master-standby.

In postgresql.conf:
autovacuum = off
wal_level = replica
max_wal_senders = 5
wal_keep_segments = 16
max_locks_per_transaction = 10000
#shared_buffers = 8GB
#shared_buffers = 24GB

Objective: Measure VACUUM execution time; varying shared_buffers size.

1. Create table (ex. 10,000 tables). Insert data to tables.
2. DELETE FROM TABLE (ex. all rows of 10,000 tables)
3. psql -c "\timing on" (measures total execution of SQL queries)
4. VACUUM (whole db)

If you want to test with large number of relations,
you may use the stored functions I used here:
http://bit.ly/reltruncates

D. Results

HEAD results
1) 128MB shared_buffers = 48.885 seconds
2) 8GB shared_buffers = 5 min 30.695 s
3) 24GB shared_buffers = 14 min 13.598 s

PATCH results
1) 128MB shared_buffers = 42.736 s
2) 8GB shared_buffers = 2 min 26.464 s
3) 24GB shared_buffers = 5 min 35.848 s

The performance significantly improved compared to HEAD,
especially for large shared buffers.

Right, that seems nice. And it matches the expected 1:3 speedup, at
least for the larger shared_buffers cases.

Years ago I've implemented an optimization for many DROP TABLE commands
in a single transaction - instead of scanning buffers for each relation,
the code now accumulates a small number of relations into an array, and
then does a bsearch for each buffer.

Would something like that be applicable/useful here? That is, if we do
multiple TRUNCATE commands in a single transaction, can we optimize it
like this?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#3)
Re: [PATCH] Speedup truncates of relation forks

On 2019-Jun-12, Tomas Vondra wrote:

Years ago I've implemented an optimization for many DROP TABLE commands
in a single transaction - instead of scanning buffers for each relation,
the code now accumulates a small number of relations into an array, and
then does a bsearch for each buffer.

commit 279628a0a7cf582f7dfb68e25b7b76183dd8ff2f:
Accelerate end-of-transaction dropping of relations

When relations are dropped, at end of transaction we need to remove the
files and clean the buffer pool of buffers containing pages of those
relations. Previously we would scan the buffer pool once per relation
to clean up buffers. When there are many relations to drop, the
repeated scans make this process slow; so we now instead pass a list of
relations to drop and scan the pool once, checking each buffer against
the passed list. When the number of relations is larger than a
threshold (which as of this patch is being set to 20 relations) we sort
the array before starting, and bsearch the array; when it's smaller, we
simply scan the array linearly each time, because that's faster. The
exact optimal threshold value depends on many factors, but the
difference is not likely to be significant enough to justify making it
user-settable.

This has been measured to be a significant win (a 15x win when dropping
100,000 relations; an extreme case, but reportedly a real one).

Author: Tomas Vondra, some tweaks by me
Reviewed by: Robert Haas, Shigeru Hanada, Andres Freund, �lvaro Herrera

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tomas Vondra (#3)
RE: [PATCH] Speedup truncates of relation forks

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]

Years ago I've implemented an optimization for many DROP TABLE commands
in a single transaction - instead of scanning buffers for each relation,
the code now accumulates a small number of relations into an array, and
then does a bsearch for each buffer.

Would something like that be applicable/useful here? That is, if we do
multiple TRUNCATE commands in a single transaction, can we optimize it
like this?

Unfortunately not. VACUUM and autovacuum handles each table in a different transaction.

BTW, what we really want to do is to keep the failover time within 10 seconds. The customer periodically TRUNCATEs tens of thousands of tables. If failover unluckily happens immediately after those TRUNCATEs, the recovery on the standby could take much longer. But your past improvement seems likely to prevent that problem, if the customer TRUNCATEs tables in the same transaction.

On the other hand, it's now highly possible that the customer can only TRUNCATE a single table in a transaction, thus run as many transactions as the TRUNCATEd tables. So, we also want to speed up each TRUNCATE by touching only the buffers for the table, not scanning the whole shared buffers. Andres proposed one method that uses a radix tree, but we don't have an idea how to do it yet.

Speeding up each TRUNCATE and its recovery is a different topic. The patch proposed here is one possible improvement to shorten the failover time.

Regards
Takayuki Tsunakawa

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tsunakawa, Takayuki (#5)
Re: [PATCH] Speedup truncates of relation forks

On Wed, Jun 12, 2019 at 12:25 PM Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]

Years ago I've implemented an optimization for many DROP TABLE commands
in a single transaction - instead of scanning buffers for each relation,
the code now accumulates a small number of relations into an array, and
then does a bsearch for each buffer.

Would something like that be applicable/useful here? That is, if we do
multiple TRUNCATE commands in a single transaction, can we optimize it
like this?

Unfortunately not. VACUUM and autovacuum handles each table in a different transaction.

We do RelationTruncate() also when we truncate heaps that are created
in the current transactions or has a new relfilenodes in the current
transaction. So I think there is a room for optimization Thomas
suggested, although I'm not sure it's a popular use case.

I've not look at this patch deeply but in DropRelFileNodeBuffer I
think we can get the min value of all firstDelBlock and use it as the
lower bound of block number that we're interested in. That way we can
skip checking the array during scanning the buffer pool.

-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum,
bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+                                                        bool isRedo,
int nforks);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-                                                BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+                                                BlockNumber *nblocks,
int nforks);

Don't we use each elements of nblocks for each fork? That is, each
fork uses an element at its fork number in the nblocks array and sets
InvalidBlockNumber for invalid slots, instead of passing the valid
number of elements. That way the following code that exist at many places,

blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
if (BlockNumberIsValid(blocks[nforks]))
{
forks[nforks] = VISIBILITYMAP_FORKNUM;
nforks++;
}

would become

blocks[VISIBILITYMAP_FORKNUM] = visibilitymap_mark_truncate(rel, nblocks);

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#7Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Adrien Nayrat (#2)
RE: [PATCH] Speedup truncates of relation forks

On Tuesday, June 11, 2019 7:23 PM (GMT+9), Adrien Nayrat wrote:

Attached is a patch to speed up the performance of truncates of relations.

Thanks for working on this!

Thank you also for taking a look at my thread.

If you want to test with large number of relations,
you may use the stored functions I used here:
http://bit.ly/reltruncates

You should post these functions in this thread for the archives ;)

This is noted. Pasting it below:

create or replace function create_tables(numtabs int)
returns void as $$
declare query_string text;
begin
for i in 1..numtabs loop
query_string := 'create table tab_' || i::text || ' (a int);';
execute query_string;
end loop;
end;
$$ language plpgsql;

create or replace function delfrom_tables(numtabs int)
returns void as $$
declare query_string text;
begin
for i in 1..numtabs loop
query_string := 'delete from tab_' || i::text;
execute query_string;
end loop;
end;
$$ language plpgsql;

create or replace function insert_tables(numtabs int)
returns void as $$
declare query_string text;
begin
for i in 1..numtabs loop
query_string := 'insert into tab_' || i::text || ' VALUES (5);' ;
execute query_string;
end loop;
end;
$$ language plpgsql;

From a user POW, the main issue with relation truncation is that it can block
queries on standby server during truncation replay.

It could be interesting if you can test this case and give results of your
path.
Maybe by performing read queries on standby server and counting wait_event
with pg_wait_sampling?

Thanks for the suggestion. I tried using the extension pg_wait_sampling,
But I wasn't sure that I could replicate the problem of blocked queries on standby server.
Could you advise?
Here's what I did for now, similar to my previous test with hot standby setup,
but with additional read queries of wait events on standby server.

128MB shared_buffers
SELECT create_tables(10000);
SELECT insert_tables(10000);
SELECT delfrom_tables(10000);

[Before VACUUM]
Standby: SELECT the following view from pg_stat_waitaccum

wait_event_type | wait_event | calls | microsec
-----------------+-----------------+-------+----------
Client | ClientRead | 2 | 20887759
IO | DataFileRead | 175 | 2788
IO | RelationMapRead | 4 | 26
IO | SLRURead | 2 | 38

Primary: Execute VACUUM (induces relation truncates)

[After VACUUM]
Standby:
wait_event_type | wait_event | calls | microsec
-----------------+-----------------+-------+----------
Client | ClientRead | 7 | 77662067
IO | DataFileRead | 284 | 4523
IO | RelationMapRead | 10 | 51
IO | SLRURead | 3 | 57

Regards,
Kirk Jamison

#8Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Masahiko Sawada (#6)
RE: [PATCH] Speedup truncates of relation forks

From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]

We do RelationTruncate() also when we truncate heaps that are created
in the current transactions or has a new relfilenodes in the current
transaction. So I think there is a room for optimization Thomas
suggested, although I'm not sure it's a popular use case.

Right, and I don't think of a use case that motivates the opmitizaion, too.

I've not look at this patch deeply but in DropRelFileNodeBuffer I
think we can get the min value of all firstDelBlock and use it as the
lower bound of block number that we're interested in. That way we can
skip checking the array during scanning the buffer pool.

That sounds reasonable, although I haven't examined the code, either.

Don't we use each elements of nblocks for each fork? That is, each
fork uses an element at its fork number in the nblocks array and sets
InvalidBlockNumber for invalid slots, instead of passing the valid
number of elements. That way the following code that exist at many places,

I think the current patch tries to reduce the loop count in DropRelFileNodeBuffers() by passing the number of target forks.

Regards
Takayuki Tsunakawa

#9Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Masahiko Sawada (#6)
RE: [PATCH] Speedup truncates of relation forks

On Wednesday, June 12, 2019 4:29 PM (GMT+9), Masahiko Sawada wrote:

On Wed, Jun 12, 2019 at 12:25 PM Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]

Years ago I've implemented an optimization for many DROP TABLE
commands in a single transaction - instead of scanning buffers for
each relation, the code now accumulates a small number of relations
into an array, and then does a bsearch for each buffer.

Would something like that be applicable/useful here? That is, if we
do multiple TRUNCATE commands in a single transaction, can we
optimize it like this?

Unfortunately not. VACUUM and autovacuum handles each table in a different

transaction.

We do RelationTruncate() also when we truncate heaps that are created in the
current transactions or has a new relfilenodes in the current transaction.
So I think there is a room for optimization Thomas suggested, although I'm
not sure it's a popular use case.

I couldn't think of a use case too.

I've not look at this patch deeply but in DropRelFileNodeBuffer I think we
can get the min value of all firstDelBlock and use it as the lower bound of
block number that we're interested in. That way we can skip checking the array
during scanning the buffer pool.

I'll take note of this suggestion.
Could you help me expound more on this idea, skipping the internal loop by
comparing the min and buffer descriptor (bufHdr)?

In the current patch, I've implemented the following in DropRelFileNodeBuffers:
for (i = 0; i < NBuffers; i++)
{
...
buf_state = LockBufHdr(bufHdr);
for (k = 0; k < nforks; k++)
{
if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
bufHdr->tag.forkNum == forkNum[k] &&
bufHdr->tag.blockNum >= firstDelBlock[k])
{
InvalidateBuffer(bufHdr); /* releases spinlock */
break;
}

Don't we use each elements of nblocks for each fork? That is, each fork uses
an element at its fork number in the nblocks array and sets InvalidBlockNumber
for invalid slots, instead of passing the valid number of elements. That way
the following code that exist at many places,

blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
if (BlockNumberIsValid(blocks[nforks]))
{
forks[nforks] = VISIBILITYMAP_FORKNUM;
nforks++;
}

would become

blocks[VISIBILITYMAP_FORKNUM] = visibilitymap_mark_truncate(rel,
nblocks);

In the patch, we want to truncate all forks' blocks simultaneously, so
we optimize the invalidation of buffers and reduce the number of loops
using those values.
The suggestion above would have to remove the forks array and its
forksize (nforks), is it correct? But I think we’d need the fork array
and nforks to execute the truncation all at once.
If I'm missing something, I'd really appreciate your further comments.

--
Thank you everyone for taking a look at my thread.
I've also already added this patch to the CommitFest app.

Regards,
Kirk Jamison

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Jamison, Kirk (#9)
Re: [PATCH] Speedup truncates of relation forks

On Thu, Jun 13, 2019 at 6:30 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

On Wednesday, June 12, 2019 4:29 PM (GMT+9), Masahiko Sawada wrote:

On Wed, Jun 12, 2019 at 12:25 PM Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]

Years ago I've implemented an optimization for many DROP TABLE
commands in a single transaction - instead of scanning buffers for
each relation, the code now accumulates a small number of relations
into an array, and then does a bsearch for each buffer.

Would something like that be applicable/useful here? That is, if we
do multiple TRUNCATE commands in a single transaction, can we
optimize it like this?

Unfortunately not. VACUUM and autovacuum handles each table in a different

transaction.

We do RelationTruncate() also when we truncate heaps that are created in the
current transactions or has a new relfilenodes in the current transaction.
So I think there is a room for optimization Thomas suggested, although I'm
not sure it's a popular use case.

I couldn't think of a use case too.

I've not look at this patch deeply but in DropRelFileNodeBuffer I think we
can get the min value of all firstDelBlock and use it as the lower bound of
block number that we're interested in. That way we can skip checking the array
during scanning the buffer pool.

I'll take note of this suggestion.
Could you help me expound more on this idea, skipping the internal loop by
comparing the min and buffer descriptor (bufHdr)?

Yes. For example,

BlockNumber minBlock = InvalidBlockNumber;
(snip)
/* Get lower bound block number we're interested in */
for (i = 0; i < nforks; i++)
{
if (!BlockNumberIsValid(minBlock) ||
minBlock > firstDelBlock[i])
minBlock = firstDelBlock[i];
}

for (i = 0; i < NBuffers; i++)
{
(snip)
buf_state = LockBufHdr(bufHdr);

/* check with the lower bound and skip the loop */
if (bufHdr->tag.blockNum < minBlock)
{
UnlockBufHdr(bufHdr, buf_state);
continue;
}

for (k = 0; k < nforks; k++)
{
if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
bufHdr->tag.forkNum == forkNum[k] &&
bufHdr->tag.blockNum >= firstDelBlock[k])

But since we acquire the buffer header lock after all and the number
of the internal loops is small (at most 3 for now) the benefit will
not be big.

In the current patch, I've implemented the following in DropRelFileNodeBuffers:
for (i = 0; i < NBuffers; i++)
{
...
buf_state = LockBufHdr(bufHdr);
for (k = 0; k < nforks; k++)
{
if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
bufHdr->tag.forkNum == forkNum[k] &&
bufHdr->tag.blockNum >= firstDelBlock[k])
{
InvalidateBuffer(bufHdr); /* releases spinlock */
break;
}

Don't we use each elements of nblocks for each fork? That is, each fork uses
an element at its fork number in the nblocks array and sets InvalidBlockNumber
for invalid slots, instead of passing the valid number of elements. That way
the following code that exist at many places,

blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
if (BlockNumberIsValid(blocks[nforks]))
{
forks[nforks] = VISIBILITYMAP_FORKNUM;
nforks++;
}

would become

blocks[VISIBILITYMAP_FORKNUM] = visibilitymap_mark_truncate(rel,
nblocks);

In the patch, we want to truncate all forks' blocks simultaneously, so
we optimize the invalidation of buffers and reduce the number of loops
using those values.
The suggestion above would have to remove the forks array and its
forksize (nforks), is it correct? But I think we’d need the fork array
and nforks to execute the truncation all at once.

I meant that each forks can use the its forknumber'th element of
firstDelBlock[]. For example, if firstDelBlock = {1000,
InvalidBlockNumber, 20, InvalidBlockNumber}, we can invalid buffers
pertaining both greater than block number 1000 of main and greater
than block number 20 of vm. Since firstDelBlock[FSM_FORKNUM] ==
InvalidBlockNumber we don't invalid buffers of fsm.

As Tsunakawa-san mentioned, since your approach would reduce the loop
count your idea might be better than mine which always takes 4 loop
counts.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#11Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Masahiko Sawada (#10)
RE: [PATCH] Speedup truncates of relation forks

From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]

for (i = 0; i < NBuffers; i++)
{
(snip)
buf_state = LockBufHdr(bufHdr);

/* check with the lower bound and skip the loop */
if (bufHdr->tag.blockNum < minBlock)
{
UnlockBufHdr(bufHdr, buf_state);
continue;
}

for (k = 0; k < nforks; k++)
{
if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
bufHdr->tag.forkNum == forkNum[k] &&
bufHdr->tag.blockNum >= firstDelBlock[k])

But since we acquire the buffer header lock after all and the number
of the internal loops is small (at most 3 for now) the benefit will
not be big.

Yeah, so I think we can just compare the block number without locking the buffer header here.

Regards
Takayuki Tsunakawa

#12Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Masahiko Sawada (#10)
RE: [PATCH] Speedup truncates of relation forks

Hi Sawada-san,

On Thursday, June 13, 2019 8:01 PM, Masahiko Sawada wrote:

On Thu, Jun 13, 2019 at 6:30 PM Jamison, Kirk <k.jamison@jp.fujitsu.com>
wrote:

On Wednesday, June 12, 2019 4:29 PM (GMT+9), Masahiko Sawada wrote:

...
I've not look at this patch deeply but in DropRelFileNodeBuffer I
think we can get the min value of all firstDelBlock and use it as
the lower bound of block number that we're interested in. That way
we can skip checking the array during scanning the buffer pool.

I'll take note of this suggestion.
Could you help me expound more on this idea, skipping the internal
loop by comparing the min and buffer descriptor (bufHdr)?

Yes. For example,

BlockNumber minBlock = InvalidBlockNumber;
(snip)
/* Get lower bound block number we're interested in */
for (i = 0; i < nforks; i++)
{
if (!BlockNumberIsValid(minBlock) ||
minBlock > firstDelBlock[i])
minBlock = firstDelBlock[i];
}

for (i = 0; i < NBuffers; i++)
{
(snip)
buf_state = LockBufHdr(bufHdr);

/* check with the lower bound and skip the loop */
if (bufHdr->tag.blockNum < minBlock)
{
UnlockBufHdr(bufHdr, buf_state);
continue;
}

for (k = 0; k < nforks; k++)
{
if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
bufHdr->tag.forkNum == forkNum[k] &&
bufHdr->tag.blockNum >= firstDelBlock[k])

But since we acquire the buffer header lock after all and the number of the
internal loops is small (at most 3 for now) the benefit will not be big.

Thank you very much for your kind and detailed explanation.
I'll still consider your suggestions in the next patch and optimize it more
so that we could possibly not need to acquire the LockBufHdr anymore.

Don't we use each elements of nblocks for each fork? That is, each
fork uses an element at its fork number in the nblocks array and
sets InvalidBlockNumber for invalid slots, instead of passing the
valid number of elements. That way the following code that exist at
many places,

blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
if (BlockNumberIsValid(blocks[nforks]))
{
forks[nforks] = VISIBILITYMAP_FORKNUM;
nforks++;
}

would become

blocks[VISIBILITYMAP_FORKNUM] = visibilitymap_mark_truncate(rel,
nblocks);

In the patch, we want to truncate all forks' blocks simultaneously, so
we optimize the invalidation of buffers and reduce the number of loops
using those values.
The suggestion above would have to remove the forks array and its
forksize (nforks), is it correct? But I think we’d need the fork array
and nforks to execute the truncation all at once.

I meant that each forks can use the its forknumber'th element of
firstDelBlock[]. For example, if firstDelBlock = {1000, InvalidBlockNumber,
20, InvalidBlockNumber}, we can invalid buffers pertaining both greater than
block number 1000 of main and greater than block number 20 of vm. Since
firstDelBlock[FSM_FORKNUM] == InvalidBlockNumber we don't invalid buffers
of fsm.

As Tsunakawa-san mentioned, since your approach would reduce the loop count
your idea might be better than mine which always takes 4 loop counts.

Understood. Thank you again for the kind and detailed explanations.
I'll reconsider these approaches.

Regards,
Kirk Jamison

#13Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jamison, Kirk (#12)
1 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

Hi all,

Attached is the v2 of the patch. I added the optimization that Sawada-san
suggested for DropRelFileNodeBuffers, although I did not acquire the lock
when comparing the minBlock and target block.

There's actually a comment written in the source code that we could
pre-check buffer tag for forkNum and blockNum, but given that FSM and VM
blocks are small compared to main fork's, the additional benefit of doing so
would be small.

* We could check forkNum and blockNum as well as the rnode, but the
* incremental win from doing so seems small.

I personally think it's alright not to include the suggested pre-checking.
If that's the case, we can just follow the patch v1 version.

Thoughts?

Comments and reviews from other parts of the patch are also very much welcome.

Regards,
Kirk Jamison

Attachments:

v2-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v2-0001-Speedup-truncates-of-relation-forks.patchDownload
From c058bba2bfc47490f231d6f067bb470cdefccbc0 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Tue, 11 Jun 2019 01:41:43 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it involves several scans of the
shared buffers for every call of smgrtruncate() for each fork which
is time-consuming. This patch reduces the scan for all forks into one 
instead of three, and improves the relation truncates by initially
marking the pages-to-be-truncated of relation forks, then
simultaneously truncating them, resulting to an improved performance
in VACUUM, autovacuum operations and their recovery performance.
---
 contrib/pg_visibility/pg_visibility.c     |  17 +++-
 src/backend/access/heap/visibilitymap.c   |  31 +++-----
 src/backend/catalog/storage.c             | 126 ++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  45 ++++++++---
 src/backend/storage/freespace/freespace.c |  38 +++------
 src/backend/storage/smgr/smgr.c           |  24 +++---
 src/include/access/visibilitymap.h        |   2 +-
 src/include/storage/bufmgr.h              |   4 +-
 src/include/storage/freespace.h           |   2 +-
 src/include/storage/smgr.h                |   7 +-
 10 files changed, 207 insertions(+), 89 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..2499415 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,10 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	newnblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +396,18 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	blocks[nforks] = visibilitymap_mark_truncate(rel, 0);
+	if (BlockNumberIsValid(blocks[nforks]))
+	{
+		forks[nforks] = VISIBILITYMAP_FORKNUM;
+		newnblocks = blocks[nforks];
+		nforks++;
+	}
+	smgrtruncate(rel->rd_smgr, forks, blocks, nforks);
+
+	/* Update the local smgr_vm_nblocks setting */
+	if (rel->rd_smgr)
+		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..2f1379c 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_mark_truncate - mark the about-to-be-truncated VM
  *
  * NOTES
  *
@@ -430,7 +430,10 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_mark_truncate - mark the about-to-be-truncated VM
+ *
+ * Formerly, this function truncates VM relation forks. Instead, this just
+ * marks the dirty buffers.
  *
  * The caller must hold AccessExclusiveLock on the relation, to ensure that
  * other backends receive the smgr invalidation event that this function sends
@@ -438,8 +441,8 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_mark_truncate(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +462,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +483,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +531,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..3151632 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,11 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	new_nfsmblocks = InvalidBlockNumber;	/* FSM blocks */
+	BlockNumber	newnblocks = InvalidBlockNumber;	/* VM blocks */
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +247,34 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/*
+	 * We used to truncate FSM and VM forks here. Now we only mark the
+	 * dirty buffers of all forks about-to-be-truncated if they exist.
+	 */
+
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = FSM_FORKNUM;
+			new_nfsmblocks= blocks[nforks];	/* FSM blocks */
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			newnblocks = blocks[nforks]; 	/* VM blocks */
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -263,9 +287,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	 */
 	if (RelationNeedsWAL(rel))
 	{
-		/*
-		 * Make an XLOG entry reporting the file truncation.
-		 */
+		/* Make an XLOG entry reporting the file truncation */
 		XLogRecPtr	lsn;
 		xl_smgr_truncate xlrec;
 
@@ -290,8 +312,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, blocks, nforks);
+
+	/*
+	 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+	 * setting. smgrtruncate sent an smgr cache inval message, which will cause
+	 * other backends to invalidate their copy of smgr_fsm_nblocks and
+	 * smgr_vm_nblocks, and this one too at the next command boundary. But this
+	 * ensures it isn't outright wrong until then.
+	 */
+	if (rel->rd_smgr)
+	{
+		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	}
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.  This is
+	 * important because the just-truncated pages were likely marked as
+	 * all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel, new_nfsmblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +635,14 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	new_nfsmblocks = InvalidBlockNumber;
+		BlockNumber	newnblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +671,62 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = FSM_FORKNUM;
+				new_nfsmblocks= blocks[nforks];
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_mark_truncate(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				newnblocks = blocks[nforks];
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, blocks, nforks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/* Update the local smgr_fsm_nblocks and smgr_vm_nblocks setting */
+		if (rel->rd_smgr)
+		{
+			rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+			rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+		}
+
+		/* Update upper-level FSM pages to account for the truncation */
+		FreeSpaceMapVacuumRange(rel, new_nfsmblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7332e6b..dad8158 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2899,8 +2899,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2923,23 +2923,37 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   BlockNumber *firstDelBlock, int nforks)
 {
 	int			i;
+	BlockNumber minBlock = InvalidBlockNumber;
 
 	/* If it's a local relation, it's localbuf.c's problem. */
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			for (int j = 0; j < nforks; j++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
+											firstDelBlock[j]);
+		}
 		return;
 	}
 
+	/* Get the lower bound of target block number we're interested in */
+	for (i = 0; i < nforks; i++)
+	{
+		if (!BlockNumberIsValid(minBlock) ||
+			minBlock > firstDelBlock[i])
+			minBlock = firstDelBlock[i];
+	}
+
 	for (i = 0; i < NBuffers; i++)
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		k = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2960,12 +2974,23 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
 			continue;
 
+		/* Check with the lower bound block number and skip the loop */
+		if (bufHdr->tag.blockNum < minBlock)
+			continue; /* skip checking the buffer pool scan */
+
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (k = 0; k < nforks; k++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[k] &&
+				bufHdr->tag.blockNum >= firstDelBlock[k])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (k >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index c17b3f4..708c7cb 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,7 +247,10 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ *
+ * Formerly, this function truncates FSM relation forks. Instead, this just
+ * marks the dirty buffers and returns a block number.
  *
  * The caller must hold AccessExclusiveLock on the relation, to ensure that
  * other backends receive the smgr invalidation event that this function sends
@@ -255,8 +258,8 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +273,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +288,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +313,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dba8c39..b37560e 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -508,19 +508,21 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, bool isRedo, int nforks)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
 	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
 	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, 0, nforks);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -546,7 +548,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -643,13 +646,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, BlockNumber *nblocks, int nforks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks, nforks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -663,10 +668,9 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..4735d5f 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_mark_truncate(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..5be6c0d 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   BlockNumber *firstDelBlock, int nforks);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..ff70b09 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 bool isRedo, int nforks);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 BlockNumber *nblocks, int nforks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#14Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Jamison, Kirk (#7)
Re: [PATCH] Speedup truncates of relation forks

On 6/12/19 10:29 AM, Jamison, Kirk wrote:

From a user POW, the main issue with relation truncation is that it can block
queries on standby server during truncation replay.

It could be interesting if you can test this case and give results of your
path.
Maybe by performing read queries on standby server and counting wait_event
with pg_wait_sampling?

Thanks for the suggestion. I tried using the extension pg_wait_sampling,
But I wasn't sure that I could replicate the problem of blocked queries on standby server.
Could you advise?
Here's what I did for now, similar to my previous test with hot standby setup,
but with additional read queries of wait events on standby server.

128MB shared_buffers
SELECT create_tables(10000);
SELECT insert_tables(10000);
SELECT delfrom_tables(10000);

[Before VACUUM]
Standby: SELECT the following view from pg_stat_waitaccum

wait_event_type | wait_event | calls | microsec
-----------------+-----------------+-------+----------
Client | ClientRead | 2 | 20887759
IO | DataFileRead | 175 | 2788
IO | RelationMapRead | 4 | 26
IO | SLRURead | 2 | 38

Primary: Execute VACUUM (induces relation truncates)

[After VACUUM]
Standby:
wait_event_type | wait_event | calls | microsec
-----------------+-----------------+-------+----------
Client | ClientRead | 7 | 77662067
IO | DataFileRead | 284 | 4523
IO | RelationMapRead | 10 | 51
IO | SLRURead | 3 | 57

(Sorry for the delay, I forgot to answer you)

As far as I remember, you should see "relation" wait events (type lock) on
standby server. This is due to startup process acquiring AccessExclusiveLock for
the truncation and other backend waiting to acquire a lock to read the table.

On primary server, vacuum is able to cancel truncation:

/*
* We need full exclusive lock on the relation in order to do
* truncation. If we can't get it, give up rather than waiting --- we
* don't want to block other backends, and we don't want to deadlock
* (which is quite possible considering we already hold a lower-grade
* lock).
*/
vacrelstats->lock_waiter_detected = false;
lock_retry = 0;
while (true)
{
if (ConditionalLockRelation(onerel, AccessExclusiveLock))
break;

/*
* Check for interrupts while trying to (re-)acquire the exclusive
* lock.
*/
CHECK_FOR_INTERRUPTS();

if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
{
/*
* We failed to establish the lock in the specified number of
* retries. This means we give up truncating.
*/
vacrelstats->lock_waiter_detected = true;
ereport(elevel,
(errmsg("\"%s\": stopping truncate due to conflicting lock request",
RelationGetRelationName(onerel))));
return;
}

pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
}

To maximize chances to reproduce we can use big shared_buffers. But I am afraid
it is not easy to perform reproducible tests to compare results. Unfortunately I
don't have servers to perform tests.

Regards,

#15Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Adrien Nayrat (#14)
RE: [PATCH] Speedup truncates of relation forks

On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote:

As far as I remember, you should see "relation" wait events (type lock) on
standby server. This is due to startup process acquiring AccessExclusiveLock
for the truncation and other backend waiting to acquire a lock to read the
table.

Hi Adrien, thank you for taking time to reply.

I understand that RelationTruncate() can block read-only queries on
standby during redo. However, it's difficult for me to reproduce the
test case where I need to catch that wait for relation lock, because
one has to execute SELECT within the few milliseconds of redoing the
truncation of one table.

Instead, I just measured the whole recovery time, smgr_redo(),
to show the recovery improvement compared to head. Please refer below.

[Recovery Test]
I used the same stored functions and configurations in the previous email
& created "test" db.

$ createdb test
$ psql -d test

1. [Primary] Create 10,000 relations.
test=# SELECT create_tables(10000);

2. [P] Insert one row in each table.
test=# SELECT insert_tables(10000);

3. [P] Delete row of each table.
test=# SELECT delfrom_tables(10000);

4. [Standby] WAL application is stopped at Standby server.
test=# SELECT pg_wal_replay_pause();

5. [P] VACUUM is executed at Primary side, and measure its execution time.
test=# \timing on
test=# VACUUM;

Alternatively, you may use:
$ time psql -d test -c 'VACUUM;'
(Note: WAL has not replayed on standby because it's been paused.)

6. [P] Wait until VACUUM has finished execution. Then, stop primary server.
test=# pg_ctl stop -w

7. [S] Resume WAL replay, then promote standby (failover).
I used a shell script to execute recovery & promote standby server
because it's kinda difficult to measure recovery time. Please refer to the script below.
- "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
- "pg_ctl promote" to promote standby.
- The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured.

shell script:

PGDT=/path_to_storage_directory/

if [ "$1" = "resume" ]; then
psql -c "SELECT pg_wal_replay_resume();" test
date +%Y/%m/%d_%H:%M:%S.%3N
pg_ctl promote -D ${PGDT}
set +x
date +%Y/%m/%d_%H:%M:%S.%3N
while [ 1 ]
do
RS=`psql -Atc "select pg_is_in_recovery();" test`
if [ ${RS} = "f" ]; then
break
fi
done
date +%Y/%m/%d_%H:%M:%S.%3N
set -x
exit 0
fi

[Test Results]
shared_buffers = 24GB

1. HEAD
(wal replay resumed)
2019/07/01_08:48:50.326
server promoted
2019/07/01_08:49:50.482
2019/07/01_09:02:41.051

Recovery Time:
13 min 50.725 s -> Time difference from WAL replay to complete recovery
12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"

2. PATCH
(wal replay resumed)
2019/07/01_07:34:26.766
server promoted
2019/07/01_07:34:57.790
2019/07/01_07:34:57.809

Recovery Time:
31.043 s -> Time difference from WAL replay to complete recovery
00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"

[Conclusion]
The recovery time significantly improved compared to head
from 13 minutes to 30 seconds.

Any thoughts?
I'd really appreciate your comments/feedback about the patch and/or test.

Regards,
Kirk Jamison

#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Jamison, Kirk (#13)
Re: [PATCH] Speedup truncates of relation forks

On Mon, Jun 17, 2019 at 5:01 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

Hi all,

Attached is the v2 of the patch. I added the optimization that Sawada-san
suggested for DropRelFileNodeBuffers, although I did not acquire the lock
when comparing the minBlock and target block.

There's actually a comment written in the source code that we could
pre-check buffer tag for forkNum and blockNum, but given that FSM and VM
blocks are small compared to main fork's, the additional benefit of doing so
would be small.

* We could check forkNum and blockNum as well as the rnode, but the
* incremental win from doing so seems small.

I personally think it's alright not to include the suggested pre-checking.
If that's the case, we can just follow the patch v1 version.

Thoughts?

Comments and reviews from other parts of the patch are also very much welcome.

Thank you for updating the patch. Here is the review comments for v2 patch.

---
- *     visibilitymap_truncate - truncate the visibility map
+ *     visibilitymap_mark_truncate - mark the about-to-be-truncated VM
+ *
+ * Formerly, this function truncates VM relation forks. Instead, this just
+ * marks the dirty buffers.
  *
  * The caller must hold AccessExclusiveLock on the relation, to ensure that
  * other backends receive the smgr invalidation event that this function sends
  * before they access the VM again.
  *

I don't think we should describe about the previous behavior here.
Rather we need to describe what visibilitymap_mark_truncate does and
what it returns to the caller.

I'm not sure that visibilitymap_mark_truncate function name is
appropriate here since it actually truncate map bits, not only
marking. Perhaps we can still use visibilitymap_truncate or change to
visibilitymap_truncate_prepare, or something? Anyway, this function
truncates only tail bits in the last remaining map page and we can
have a rule that the caller must call smgrtruncate() later to actually
truncate pages.

The comment of second paragraph is now out of date since this function
no longer sends smgr invalidation message.

Is it worth to leave the current visibilitymap_truncate function as a
shortcut function, instead of replacing? That way we don't need to
change pg_truncate_visibility_map function.

The same comments are true for MarkFreeSpaceMapTruncateRel.

---
+       ForkNumber      forks[MAX_FORKNUM];
+       BlockNumber     blocks[MAX_FORKNUM];
+       BlockNumber     new_nfsmblocks = InvalidBlockNumber;    /* FSM blocks */
+       BlockNumber     newnblocks = InvalidBlockNumber;        /* VM blocks */
+       int             nforks = 0;

I think that we can have new_nfsmblocks and new_nvmblocks and wipe out
the comments.

---
-       /* Truncate the FSM first if it exists */
+       /*
+        * We used to truncate FSM and VM forks here. Now we only mark the
+        * dirty buffers of all forks about-to-be-truncated if they exist.
+        */
+

Again, I think we need the description of current behavior rather than
the history, except the case where the history is important.

---
-               /*
-                * Make an XLOG entry reporting the file truncation.
-                */
+               /* Make an XLOG entry reporting the file truncation */

Unnecessary change.

---
+       /*
+        * We might as well update the local smgr_fsm_nblocks and
smgr_vm_nblocks
+        * setting. smgrtruncate sent an smgr cache inval message,
which will cause
+        * other backends to invalidate their copy of smgr_fsm_nblocks and
+        * smgr_vm_nblocks, and this one too at the next command
boundary. But this
+        * ensures it isn't outright wrong until then.
+        */
+       if (rel->rd_smgr)
+       {
+               rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+               rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+       }

new_nfsmblocks and newnblocks could be InvalidBlockNumber when the
forks are already enough small. So the above code sets incorrect
values to smgr_{fsm,vm}_nblocks.

Also, I wonder if we can do the above code in smgrtruncate. Otherwise
we always need to set smgr_{fsm,vm}_nblocks after smgrtruncate, which
is inconvenient.

---
+               /* Update the local smgr_fsm_nblocks and
smgr_vm_nblocks setting */
+               if (rel->rd_smgr)
+               {
+                       rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+                       rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+               }

The save as above. And we need to set smgr_{fsm,vm}_nblocks in spite
of freeing the fake relcache soon?

---
+       /* Get the lower bound of target block number we're interested in */
+       for (i = 0; i < nforks; i++)
+       {
+               if (!BlockNumberIsValid(minBlock) ||
+                       minBlock > firstDelBlock[i])
+                       minBlock = firstDelBlock[i];
+       }

Maybe we can declare 'i' in the for statement (i.e. for (int i = 0;
...)) at every outer loops in this functions. And in the inner loop we
can use 'j'.

---
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-                                          BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+                                          BlockNumber *firstDelBlock,
int nforks)

I think it's better to declare *forkNum and nforks side by side for
readability. That is, we can have it as follows.

DropRelFileNodeBuffers (RelFileNodeBackend rnode, ForkNumber *forkNum,
int nforks, BlockNumber *firstDelBlock)

---
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, bool isRedo,
int nforks)

Same as above. The order of reln, *forknum, nforks, isRedo would be better.

---
@@ -383,6 +383,10 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
        Oid                     relid = PG_GETARG_OID(0);
        Relation        rel;
+       ForkNumber      forks[MAX_FORKNUM];
+       BlockNumber     blocks[MAX_FORKNUM];
+       BlockNumber     newnblocks = InvalidBlockNumber;
+       int             nforks = 0;

Why do we need the array of forks and blocks here? I think it's enough
to have one fork and one block number.

---
The comment of smgrdounlinkfork function needs to be updated. We now
can remove multiple forks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Jamison, Kirk (#15)
Re: [PATCH] Speedup truncates of relation forks

On 7/1/19 12:55 PM, Jamison, Kirk wrote:

On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote:

As far as I remember, you should see "relation" wait events (type lock) on
standby server. This is due to startup process acquiring AccessExclusiveLock
for the truncation and other backend waiting to acquire a lock to read the
table.

Hi Adrien, thank you for taking time to reply.

I understand that RelationTruncate() can block read-only queries on
standby during redo. However, it's difficult for me to reproduce the
test case where I need to catch that wait for relation lock, because
one has to execute SELECT within the few milliseconds of redoing the
truncation of one table.

Yes, that why your test by measuring vacuum execution time is better as it is
more reproductible.

Instead, I just measured the whole recovery time, smgr_redo(),
to show the recovery improvement compared to head. Please refer below.

[Recovery Test]
I used the same stored functions and configurations in the previous email
& created "test" db.

$ createdb test
$ psql -d test

1. [Primary] Create 10,000 relations.
test=# SELECT create_tables(10000);

2. [P] Insert one row in each table.
test=# SELECT insert_tables(10000);

3. [P] Delete row of each table.
test=# SELECT delfrom_tables(10000);

4. [Standby] WAL application is stopped at Standby server.
test=# SELECT pg_wal_replay_pause();

5. [P] VACUUM is executed at Primary side, and measure its execution time.
test=# \timing on
test=# VACUUM;

Alternatively, you may use:
$ time psql -d test -c 'VACUUM;'
(Note: WAL has not replayed on standby because it's been paused.)

6. [P] Wait until VACUUM has finished execution. Then, stop primary server.
test=# pg_ctl stop -w

7. [S] Resume WAL replay, then promote standby (failover).
I used a shell script to execute recovery & promote standby server
because it's kinda difficult to measure recovery time. Please refer to the script below.
- "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
- "pg_ctl promote" to promote standby.
- The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured.

shell script:

PGDT=/path_to_storage_directory/

if [ "$1" = "resume" ]; then
psql -c "SELECT pg_wal_replay_resume();" test
date +%Y/%m/%d_%H:%M:%S.%3N
pg_ctl promote -D ${PGDT}
set +x
date +%Y/%m/%d_%H:%M:%S.%3N
while [ 1 ]
do
RS=`psql -Atc "select pg_is_in_recovery();" test`
if [ ${RS} = "f" ]; then
break
fi
done
date +%Y/%m/%d_%H:%M:%S.%3N
set -x
exit 0
fi

[Test Results]
shared_buffers = 24GB

1. HEAD
(wal replay resumed)
2019/07/01_08:48:50.326
server promoted
2019/07/01_08:49:50.482
2019/07/01_09:02:41.051

Recovery Time:
13 min 50.725 s -> Time difference from WAL replay to complete recovery
12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"

2. PATCH
(wal replay resumed)
2019/07/01_07:34:26.766
server promoted
2019/07/01_07:34:57.790
2019/07/01_07:34:57.809

Recovery Time:
31.043 s -> Time difference from WAL replay to complete recovery
00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"

[Conclusion]
The recovery time significantly improved compared to head
from 13 minutes to 30 seconds.

Any thoughts?
I'd really appreciate your comments/feedback about the patch and/or test.

Thanks for the time you spend on this test, it is a huge win!
Although creating 10k tables and deleting tuples is not a common use case, it is
still good to know how your patch performs.
I will try to look deeper in your patch, but my knowledge on postgres internal
are limited :)

--
Adrien

#18Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Masahiko Sawada (#16)
1 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

On Tuesday, July 2, 2019 4:59 PM (GMT+9), Masahiko Sawada wrote:

Thank you for updating the patch. Here is the review comments for v2 patch.

Thank you so much for review!
I indicated the addressed parts below and attached the updated patch.

---
visibilitymap.c: visibilitymap_truncate()

I don't think we should describe about the previous behavior here.
Rather we need to describe what visibilitymap_mark_truncate does and what
it returns to the caller.

I'm not sure that visibilitymap_mark_truncate function name is appropriate
here since it actually truncate map bits, not only marking. Perhaps we can
still use visibilitymap_truncate or change to
visibilitymap_truncate_prepare, or something? Anyway, this function
truncates only tail bits in the last remaining map page and we can have a
rule that the caller must call smgrtruncate() later to actually truncate
pages.

The comment of second paragraph is now out of date since this function no
longer sends smgr invalidation message.

(1) I updated function name to "visibilitymap_truncate_prepare()" as suggested.
I think that parameter name fits, unless other reviewers suggest a better name.
I also updated its description more accurately: describing current behavior,
caller must eventually call smgrtruncate() to actually truncate vm pages,
and removed the outdated description.

Is it worth to leave the current visibilitymap_truncate function as a shortcut
function, instead of replacing? That way we don't need to change
pg_truncate_visibility_map function.

(2) Yeah, it's kinda displeasing that I had to add lines in pg_truncate_visibility_map.
By any chance, re: shortcut function, you meant to retain the function
visibilitymap_truncate() and just add another visibilitymap_truncate_prepare(),
isn't it? I'm not sure if it's worth the additional lines of adding
another function in visibilitymap.c, that's why I just updated the function itself
which just adds 10 lines to pg_truncate_visibility_map anyway.
Hmm. If it's not wrong to do it this way, then I will retain this change.
OTOH, if pg_visibility.c *must* not be modified, then I'll follow your advice.

----
pg_visibility.c: pg_truncate_visibility_map()

@@ -383,6 +383,10 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
{
Oid                     relid = PG_GETARG_OID(0);
Relation        rel;
+       ForkNumber      forks[MAX_FORKNUM];
+       BlockNumber     blocks[MAX_FORKNUM];
+       BlockNumber     newnblocks = InvalidBlockNumber;
+       int             nforks = 0;

Why do we need the array of forks and blocks here? I think it's enough to
have one fork and one block number.

(3) Thanks for the catch. Updated.

----
freespace.c: MarkFreeSpaceMapTruncateRel()

The same comments are true for MarkFreeSpaceMapTruncateRel.

+       BlockNumber     new_nfsmblocks = InvalidBlockNumber;    /* FSM
blocks */
+       BlockNumber     newnblocks = InvalidBlockNumber;        /* VM
blocks */
+       int             nforks = 0;

I think that we can have new_nfsmblocks and new_nvmblocks and wipe out the
comments.

(4) I updated the description accordingly, describing only the current behavior.
The caller must eventually call smgrtruncate() to actually truncate fsm pages.
I also removed the outdated description and irrelevant comments.

------
storage.c: RelationTruncate()

+        * We might as well update the local smgr_fsm_nblocks and
smgr_vm_nblocks
+        * setting. smgrtruncate sent an smgr cache inval message,
which will cause
+        * other backends to invalidate their copy of smgr_fsm_nblocks and
+        * smgr_vm_nblocks, and this one too at the next command
boundary. But this
+        * ensures it isn't outright wrong until then.
+        */
+       if (rel->rd_smgr)
+       {
+               rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+               rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+       }

new_nfsmblocks and newnblocks could be InvalidBlockNumber when the forks are
already enough small. So the above code sets incorrect values to
smgr_{fsm,vm}_nblocks.

Also, I wonder if we can do the above code in smgrtruncate. Otherwise we always
need to set smgr_{fsm,vm}_nblocks after smgrtruncate, which is inconvenient.

(5)
In my patch, did you mean that there's a possibility that these values
were already set to InvalidBlockNumber even before I did the setting, is it?
I'm not sure if IIUC, the point of the above code is to make sure that
smgr_{fsm,vm}_nblocks are not InvalidBlockNumber until the next command
boundary, and while we don't reach that boundary yet, we make sure
these values are valid within that window. Is my understanding correct?
Maybe following your advice that putting it inside the smgrtruncate loop
will make these values correct.
For example, below?

void smgrtruncate
{
...
CacheInvalidateSmgr(reln->smgr_rnode);

/* Do the truncation */
for (i = 0; i < nforks; i++)
{
smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);

if (forknum[i] == FSM_FORKNUM)
reln->smgr_fsm_nblocks = nblocks[i];
if (forknum[i] == VISIBILITYMAP_FORKNUM)
reln->smgr_vm_nblocks = nblocks[i];
}

Another problem I have is where I should call FreeSpaceMapVacuumRange to 
account for truncation of fsm pages. I also realized the upper bound
new_nfsmblocks might be incorrect in this case.
This is the cause why regression test fails in my updated patch...
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel->rd_smgr, new_nfsmblocks, InvalidBlockNumber);

-----------
storage.c: smgr_redo()

+               /* Update the local smgr_fsm_nblocks and
smgr_vm_nblocks setting */
+               if (rel->rd_smgr)
+               {
+                       rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+                       rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+               }

The save as above. And we need to set smgr_{fsm,vm}_nblocks in spite of freeing
the fake relcache soon?

(6) You're right. It's unnecessary in this case.
If I also put the smgr_{fsm,vm}_nblocks setting inside the smgrtruncate
as you suggested above, it will still be set after truncation? Hmm.
Perhaps it's ok, because in the current source code it also does the setting
whenever we call visibilitymap_truncate, FreeSpaceMapTruncateRel during redo.

-----------
bufmgr.c: DropRelFileNodeBuffers()

+       /* Get the lower bound of target block number we're interested in
*/
+       for (i = 0; i < nforks; i++)
+       {
+               if (!BlockNumberIsValid(minBlock) ||
+                       minBlock > firstDelBlock[i])
+                       minBlock = firstDelBlock[i];
+       }

Maybe we can declare 'i' in the for statement (i.e. for (int i = 0;
...)) at every outer loops in this functions. And in the inner loop we can
use 'j'.

(7) Agree. Updated.

I think it's better to declare *forkNum and nforks side by side for readability.
That is, we can have it as follows.

DropRelFileNodeBuffers (RelFileNodeBackend rnode, ForkNumber *forkNum, int
nforks, BlockNumber *firstDelBlock)

(8) Agree. I updated DropRelFileNodeBuffers, smgrtruncate and
smgrdounlinkfork accordingly.

---------
smgr.c: smgrdounlinkfork()

-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, bool isRedo,
int nforks)

Same as above. The order of reln, *forknum, nforks, isRedo would be better.

The comment of smgrdounlinkfork function needs to be updated. We now can
remove multiple forks.

(9) Agree. Updated accordingly.

I updated the patch based from comments,
but it still fails the regression test as indicated in (5) above.
Kindly verify if I correctly addressed the other parts as what you intended.

Thanks again for the review!
I'll update the patch again after further comments.

Regards,
Kirk Jamison

Attachments:

v3-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v3-0001-Speedup-truncates-of-relation-forks.patchDownload
From 93b4cefe99087e6738827d3cda518883e723a4a9 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Thu, 4 Jul 2019 10:59:05 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it involves several scans of the
shared buffers for every call of smgrtruncate() for each fork which
is time-consuming. This patch reduces the scan for all forks into one
instead of three, and improves the relation truncates by initially
marking the pages-to-be-truncated of relation forks, then
simultaneously truncating them, resulting to an improved performance
in VACUUM, autovacuum operations and their recovery performance.
---
 contrib/pg_visibility/pg_visibility.c     | 11 +++-
 src/backend/access/heap/visibilitymap.c   | 36 +++++------
 src/backend/catalog/storage.c             | 99 +++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       | 48 +++++++++++----
 src/backend/storage/freespace/freespace.c | 41 ++++---------
 src/backend/storage/smgr/smgr.c           | 49 ++++++++++-----
 src/include/access/visibilitymap.h        |  2 +-
 src/include/storage/bufmgr.h              |  4 +-
 src/include/storage/freespace.h           |  2 +-
 src/include/storage/smgr.h                |  7 ++-
 10 files changed, 199 insertions(+), 100 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..1aabde2 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,9 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
+	BlockNumber	newnblocks = InvalidBlockNumber;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +395,13 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block)
+	{
+		fork = VISIBILITYMAP_FORKNUM;
+		newnblocks = block;
+	}
+	smgrtruncate(rel->rd_smgr, &fork, 1, &block);
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..4cc7977 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..646d26c 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,11 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	new_nfsmblocks = InvalidBlockNumber;
+	BlockNumber	newnblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +247,34 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Mark the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = FSM_FORKNUM;
+			new_nfsmblocks= blocks[nforks];
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			newnblocks = blocks[nforks];
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +314,20 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel->rd_smgr, new_nfsmblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +624,14 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	new_nfsmblocks = InvalidBlockNumber;
+		BlockNumber	newnblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +660,52 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = FSM_FORKNUM;
+				new_nfsmblocks= blocks[nforks];
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				newnblocks = blocks[nforks];
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7332e6b..512c8a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2899,8 +2899,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2923,23 +2923,36 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
-	int			i;
+	BlockNumber minBlock = InvalidBlockNumber;
 
 	/* If it's a local relation, it's localbuf.c's problem. */
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			for (int i = 0; i < nforks; i++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[i],
+											firstDelBlock[i]);
+		}
 		return;
 	}
 
-	for (i = 0; i < NBuffers; i++)
+	/* Get the lower bound of target block number we're interested in */
+	for (int i = 0; i < nforks; i++)
+	{
+		if (!BlockNumberIsValid(minBlock) ||
+			minBlock > firstDelBlock[i])
+			minBlock = firstDelBlock[i];
+	}
+
+	for (int i = 0; i < NBuffers; i++)
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2960,12 +2973,23 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
 			continue;
 
+		/* Check with the lower bound block number and skip the loop */
+		if (bufHdr->tag.blockNum < minBlock)
+			continue; /* skip checking the buffer pool scan */
+
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index c17b3f4..9c29604 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,16 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function marks the dirty page and returns a block number.
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +270,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +285,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +310,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dba8c39..ea57de7 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -498,29 +498,30 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
+ *	smgrdounlinkfork() -- Immediately unlink each fork of a relation.
  *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
+ *		Each fork of the relation is removed from the store.  This should
+ *		not be used during transactional operations, since it can't be undone.
  *
  *		If isRedo is true, it is okay for the underlying file to be gone
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks, bool isRedo)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	/* Close each fork at smgr level */
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
+	 * Get rid of any remaining buffers for each fork. bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, nforks, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -546,7 +547,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -643,13 +645,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -663,10 +667,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. smgrtruncate sent an smgr cache inval message, which will
+		 * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+		 * smgr_vm_nblocks, and these ones too at the next command boundary. But
+		 * this ensures these aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..a24532c 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 int nforks, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#19Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jamison, Kirk (#18)
1 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

Hi,

I updated the patch based from comments, but it still fails the regression
test as indicated in (5) above.
Kindly verify if I correctly addressed the other parts as what you intended.

Thanks again for the review!
I'll update the patch again after further comments.

I updated the patch which is similar to V3 of the patch,
but addressing my problem in (5) in the previous email regarding FreeSpaceMapVacuumRange.
It seems to pass the regression test now. Kindly check for validation.
Thank you!

Regards,
Kirk Jamison

Attachments:

v4-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v4-0001-Speedup-truncates-of-relation-forks.patchDownload
From 969e07a05b0b067cba1ca02f549c522fba5873d0 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Thu, 4 Jul 2019 10:59:05 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it involves several scans of the
shared buffers for every call of smgrtruncate() for each fork which
is time-consuming. This patch reduces the scan for all forks into one
instead of three, and improves the relation truncates by initially
marking the pages-to-be-truncated of relation forks, then
simultaneously truncating them, resulting to an improved performance
in VACUUM, autovacuum operations and their recovery performance.
---
 contrib/pg_visibility/pg_visibility.c     |  11 ++-
 src/backend/access/heap/visibilitymap.c   |  36 ++++------
 src/backend/catalog/storage.c             | 111 ++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  48 +++++++++----
 src/backend/storage/freespace/freespace.c |  41 ++++-------
 src/backend/storage/smgr/smgr.c           |  49 ++++++++-----
 src/include/access/visibilitymap.h        |   2 +-
 src/include/storage/bufmgr.h              |   4 +-
 src/include/storage/freespace.h           |   2 +-
 src/include/storage/smgr.h                |   7 +-
 10 files changed, 211 insertions(+), 100 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..1aabde2 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,9 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
+	BlockNumber	newnblocks = InvalidBlockNumber;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +395,13 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block)
+	{
+		fork = VISIBILITYMAP_FORKNUM;
+		newnblocks = block;
+	}
+	smgrtruncate(rel->rd_smgr, &fork, 1, &block);
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..4cc7977 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..c9fd637 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,12 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+	BlockNumber	new_nfsmblocks = InvalidBlockNumber;
+	BlockNumber	newnblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +248,35 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Mark the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			first_removed_nblocks = nblocks;
+			forks[nforks] = FSM_FORKNUM;
+			new_nfsmblocks= blocks[nforks];
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			newnblocks = blocks[nforks];
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +316,21 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	//FreeSpaceMapVacuumRange(rel, new_nfsmblocks, InvalidBlockNumber);
+	FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +627,15 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+		BlockNumber	new_nfsmblocks = InvalidBlockNumber;
+		BlockNumber	newnblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +664,60 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				first_removed_nblocks = xlrec->blkno;
+				forks[nforks] = FSM_FORKNUM;
+				new_nfsmblocks= blocks[nforks];
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				newnblocks = blocks[nforks];
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/*
+		 * Update upper-level FSM pages to account for the truncation.
+		 * This is important because the just-truncated pages were likely
+		 * marked as all-free, and would be preferentially selected.
+		 */
+		FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7332e6b..512c8a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2899,8 +2899,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2923,23 +2923,36 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
-	int			i;
+	BlockNumber minBlock = InvalidBlockNumber;
 
 	/* If it's a local relation, it's localbuf.c's problem. */
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			for (int i = 0; i < nforks; i++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[i],
+											firstDelBlock[i]);
+		}
 		return;
 	}
 
-	for (i = 0; i < NBuffers; i++)
+	/* Get the lower bound of target block number we're interested in */
+	for (int i = 0; i < nforks; i++)
+	{
+		if (!BlockNumberIsValid(minBlock) ||
+			minBlock > firstDelBlock[i])
+			minBlock = firstDelBlock[i];
+	}
+
+	for (int i = 0; i < NBuffers; i++)
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2960,12 +2973,23 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
 			continue;
 
+		/* Check with the lower bound block number and skip the loop */
+		if (bufHdr->tag.blockNum < minBlock)
+			continue; /* skip checking the buffer pool scan */
+
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index c17b3f4..9c29604 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,16 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function marks the dirty page and returns a block number.
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +270,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +285,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +310,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dba8c39..ea57de7 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -498,29 +498,30 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
+ *	smgrdounlinkfork() -- Immediately unlink each fork of a relation.
  *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
+ *		Each fork of the relation is removed from the store.  This should
+ *		not be used during transactional operations, since it can't be undone.
  *
  *		If isRedo is true, it is okay for the underlying file to be gone
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks, bool isRedo)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	/* Close each fork at smgr level */
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
+	 * Get rid of any remaining buffers for each fork. bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, nforks, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -546,7 +547,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -643,13 +645,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -663,10 +667,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. smgrtruncate sent an smgr cache inval message, which will
+		 * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+		 * smgr_vm_nblocks, and these ones too at the next command boundary. But
+		 * this ensures these aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..a24532c 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 int nforks, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Jamison, Kirk (#19)
Re: [PATCH] Speedup truncates of relation forks

On Fri, Jul 5, 2019 at 3:03 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

I updated the patch which is similar to V3 of the patch,
but addressing my problem in (5) in the previous email regarding FreeSpaceMapVacuumRange.
It seems to pass the regression test now. Kindly check for validation.

Hi Kirk,

FYI there are a couple of compiler errors reported:

Windows compiler:

contrib/pg_visibility/pg_visibility.c(400): error C2143: syntax error
: missing ')' before '{'
[C:\projects\postgresql\pg_visibility.vcxproj]

GCC:

storage.c: In function ‘RelationTruncate’:
storage.c:238:14: error: variable ‘newnblocks’ set but not used
[-Werror=unused-but-set-variable]
BlockNumber newnblocks = InvalidBlockNumber;
^
storage.c:237:14: error: variable ‘new_nfsmblocks’ set but not used
[-Werror=unused-but-set-variable]
BlockNumber new_nfsmblocks = InvalidBlockNumber;
^
storage.c: In function ‘smgr_redo’:
storage.c:634:15: error: variable ‘newnblocks’ set but not used
[-Werror=unused-but-set-variable]
BlockNumber newnblocks = InvalidBlockNumber;
^
storage.c:633:15: error: variable ‘new_nfsmblocks’ set but not used
[-Werror=unused-but-set-variable]
BlockNumber new_nfsmblocks = InvalidBlockNumber;
^

--
Thomas Munro
https://enterprisedb.com

#21Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Thomas Munro (#20)
1 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

Hi Thomas,

Thanks for checking.

On Fri, Jul 5, 2019 at 3:03 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

I updated the patch which is similar to V3 of the patch, but
addressing my problem in (5) in the previous email regarding

FreeSpaceMapVacuumRange.

It seems to pass the regression test now. Kindly check for validation.

Hi Kirk,

FYI there are a couple of compiler errors reported:

Attached is the updated patch (V5) fixing the compiler errors.

Comments and reviews about the patch/tests are very much welcome.

Regards,
Kirk Jamison

Attachments:

v5-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v5-0001-Speedup-truncates-of-relation-forks.patchDownload
From e661b6ec1c0cee764830850d68799bf9a08bb99f Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Thu, 4 Jul 2019 10:59:05 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it involves several scans of the
shared buffers for every call of smgrtruncate() for each fork which
is time-consuming. This patch reduces the scan for all forks into one
instead of three, and improves the relation truncates by initially
marking and preparing the pages-to-be-truncated of relation forks,
then simultaneously truncating them, resulting to an improved
performance in VACUUM, autovacuum operations, and their recovery.
---
 contrib/pg_visibility/pg_visibility.c     |   8 ++-
 src/backend/access/heap/visibilitymap.c   |  36 ++++-------
 src/backend/catalog/storage.c             | 102 ++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  48 ++++++++++----
 src/backend/storage/freespace/freespace.c |  41 ++++--------
 src/backend/storage/smgr/smgr.c           |  49 +++++++++-----
 src/include/access/visibilitymap.h        |   2 +-
 src/include/storage/bufmgr.h              |   4 +-
 src/include/storage/freespace.h           |   2 +-
 src/include/storage/smgr.h                |   7 +-
 10 files changed, 199 insertions(+), 100 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..60eff7f 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,8 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +394,11 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block))
+		fork = VISIBILITYMAP_FORKNUM;
+
+	smgrtruncate(rel->rd_smgr, &fork, 1, &block);
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..4cc7977 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..395ef0f 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,10 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +246,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Mark the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			first_removed_nblocks = nblocks;
+			forks[nforks] = FSM_FORKNUM;
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +312,20 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +622,13 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +657,58 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				first_removed_nblocks = xlrec->blkno;
+				forks[nforks] = FSM_FORKNUM;
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/*
+		 * Update upper-level FSM pages to account for the truncation.
+		 * This is important because the just-truncated pages were likely
+		 * marked as all-free, and would be preferentially selected.
+		 */
+		FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7332e6b..512c8a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2899,8 +2899,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2923,23 +2923,36 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
-	int			i;
+	BlockNumber minBlock = InvalidBlockNumber;
 
 	/* If it's a local relation, it's localbuf.c's problem. */
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			for (int i = 0; i < nforks; i++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[i],
+											firstDelBlock[i]);
+		}
 		return;
 	}
 
-	for (i = 0; i < NBuffers; i++)
+	/* Get the lower bound of target block number we're interested in */
+	for (int i = 0; i < nforks; i++)
+	{
+		if (!BlockNumberIsValid(minBlock) ||
+			minBlock > firstDelBlock[i])
+			minBlock = firstDelBlock[i];
+	}
+
+	for (int i = 0; i < NBuffers; i++)
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2960,12 +2973,23 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
 			continue;
 
+		/* Check with the lower bound block number and skip the loop */
+		if (bufHdr->tag.blockNum < minBlock)
+			continue; /* skip checking the buffer pool scan */
+
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index c17b3f4..9c29604 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,16 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function marks the dirty page and returns a block number.
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +270,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +285,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +310,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dba8c39..ea57de7 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -498,29 +498,30 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
+ *	smgrdounlinkfork() -- Immediately unlink each fork of a relation.
  *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
+ *		Each fork of the relation is removed from the store.  This should
+ *		not be used during transactional operations, since it can't be undone.
  *
  *		If isRedo is true, it is okay for the underlying file to be gone
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks, bool isRedo)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	/* Close each fork at smgr level */
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
+	 * Get rid of any remaining buffers for each fork. bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, nforks, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -546,7 +547,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -643,13 +645,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -663,10 +667,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. smgrtruncate sent an smgr cache inval message, which will
+		 * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+		 * smgr_vm_nblocks, and these ones too at the next command boundary. But
+		 * this ensures these aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..a24532c 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 int nforks, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#22Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jamison, Kirk (#21)
1 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

Hi,

I repeated the recovery performance test before, and found out that I made a
wrong measurement.
Using the same steps indicated in the previous email (24GB shared_buffers for my case),
the recovery time still significantly improved compared to head
from "13 minutes" to "4 minutes 44 seconds" //not 30 seconds.
It's expected because the measurement of vacuum execution time (no failover)
which I reported in the first email is about the same (although 5 minutes):

HEAD results
3) 24GB shared_buffers = 14 min 13.598 s
PATCH results
3) 24GB shared_buffers = 5 min 35.848 s

Reattaching the patch here again. The V5 of the patch fixed the compile error
mentioned before and mainly addressed the comments/advice of Sawada-san.
- updated more accurate comments describing only current behavior, not history
- updated function name: visibilitymap_truncate_prepare()
- moved the setting of values for smgr_{fsm,vm}_nblocks inside the smgrtruncate()

I'd be grateful if anyone could provide comments, advice, or insights.
Thank you again in advance.

Regards,
Kirk Jamison

Attachments:

v5-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v5-0001-Speedup-truncates-of-relation-forks.patchDownload
From e661b6ec1c0cee764830850d68799bf9a08bb99f Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Thu, 4 Jul 2019 10:59:05 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it involves several scans of the
shared buffers for every call of smgrtruncate() for each fork which
is time-consuming. This patch reduces the scan for all forks into one
instead of three, and improves the relation truncates by initially
marking and preparing the pages-to-be-truncated of relation forks,
then simultaneously truncating them, resulting to an improved
performance in VACUUM, autovacuum operations, and their recovery.
---
 contrib/pg_visibility/pg_visibility.c     |   8 ++-
 src/backend/access/heap/visibilitymap.c   |  36 ++++-------
 src/backend/catalog/storage.c             | 102 ++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  48 ++++++++++----
 src/backend/storage/freespace/freespace.c |  41 ++++--------
 src/backend/storage/smgr/smgr.c           |  49 +++++++++-----
 src/include/access/visibilitymap.h        |   2 +-
 src/include/storage/bufmgr.h              |   4 +-
 src/include/storage/freespace.h           |   2 +-
 src/include/storage/smgr.h                |   7 +-
 10 files changed, 199 insertions(+), 100 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..60eff7f 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,8 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +394,11 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block))
+		fork = VISIBILITYMAP_FORKNUM;
+
+	smgrtruncate(rel->rd_smgr, &fork, 1, &block);
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..4cc7977 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..395ef0f 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,10 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +246,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Mark the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			first_removed_nblocks = nblocks;
+			forks[nforks] = FSM_FORKNUM;
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +312,20 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +622,13 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +657,58 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				first_removed_nblocks = xlrec->blkno;
+				forks[nforks] = FSM_FORKNUM;
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/*
+		 * Update upper-level FSM pages to account for the truncation.
+		 * This is important because the just-truncated pages were likely
+		 * marked as all-free, and would be preferentially selected.
+		 */
+		FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7332e6b..512c8a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2899,8 +2899,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2923,23 +2923,36 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
-	int			i;
+	BlockNumber minBlock = InvalidBlockNumber;
 
 	/* If it's a local relation, it's localbuf.c's problem. */
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			for (int i = 0; i < nforks; i++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[i],
+											firstDelBlock[i]);
+		}
 		return;
 	}
 
-	for (i = 0; i < NBuffers; i++)
+	/* Get the lower bound of target block number we're interested in */
+	for (int i = 0; i < nforks; i++)
+	{
+		if (!BlockNumberIsValid(minBlock) ||
+			minBlock > firstDelBlock[i])
+			minBlock = firstDelBlock[i];
+	}
+
+	for (int i = 0; i < NBuffers; i++)
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2960,12 +2973,23 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
 			continue;
 
+		/* Check with the lower bound block number and skip the loop */
+		if (bufHdr->tag.blockNum < minBlock)
+			continue; /* skip checking the buffer pool scan */
+
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index c17b3f4..9c29604 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,16 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function marks the dirty page and returns a block number.
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +270,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +285,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +310,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dba8c39..ea57de7 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -498,29 +498,30 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
+ *	smgrdounlinkfork() -- Immediately unlink each fork of a relation.
  *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
+ *		Each fork of the relation is removed from the store.  This should
+ *		not be used during transactional operations, since it can't be undone.
  *
  *		If isRedo is true, it is okay for the underlying file to be gone
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks, bool isRedo)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	/* Close each fork at smgr level */
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
+	 * Get rid of any remaining buffers for each fork. bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, nforks, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -546,7 +547,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -643,13 +645,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -663,10 +667,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. smgrtruncate sent an smgr cache inval message, which will
+		 * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+		 * smgr_vm_nblocks, and these ones too at the next command boundary. But
+		 * this ensures these aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..a24532c 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 int nforks, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#23Fujii Masao
masao.fujii@gmail.com
In reply to: Jamison, Kirk (#22)
Re: [PATCH] Speedup truncates of relation forks

On Wed, Jul 24, 2019 at 9:58 AM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

Hi,

I repeated the recovery performance test before, and found out that I made a
wrong measurement.
Using the same steps indicated in the previous email (24GB shared_buffers for my case),
the recovery time still significantly improved compared to head
from "13 minutes" to "4 minutes 44 seconds" //not 30 seconds.
It's expected because the measurement of vacuum execution time (no failover)
which I reported in the first email is about the same (although 5 minutes):

HEAD results
3) 24GB shared_buffers = 14 min 13.598 s
PATCH results
3) 24GB shared_buffers = 5 min 35.848 s

Reattaching the patch here again. The V5 of the patch fixed the compile error
mentioned before and mainly addressed the comments/advice of Sawada-san.
- updated more accurate comments describing only current behavior, not history
- updated function name: visibilitymap_truncate_prepare()
- moved the setting of values for smgr_{fsm,vm}_nblocks inside the smgrtruncate()

I'd be grateful if anyone could provide comments, advice, or insights.
Thank you again in advance.

Thanks for the patch!

-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks,
bool isRedo)

smgrdounlinkfork() is dead code. Per the discussion [1]/messages/by-id/1471.1339106082@sss.pgh.pa.us, this unused
function was left intentionally. But it's still dead code since 2012,
so I'd like to remove it. Or, even if we decide to keep that function
for some reasons, I don't think that we need to update that so that
it can unlink multiple forks at once. So, what about keeping
smgrdounlinkfork() as it is?

[1]: /messages/by-id/1471.1339106082@sss.pgh.pa.us
/messages/by-id/1471.1339106082@sss.pgh.pa.us

+ for (int i = 0; i < nforks; i++)

The variable "i" should not be declared in for loop
per PostgreSQL coding style.

+ /* Check with the lower bound block number and skip the loop */
+ if (bufHdr->tag.blockNum < minBlock)
+ continue; /* skip checking the buffer pool scan */

Because of the above code, the following source comment in bufmgr.c
should be updated.

* We could check forkNum and blockNum as well as the rnode, but the
* incremental win from doing so seems small.

And, first of all, is this check really useful for performance?
Since firstDelBlock for FSM fork is usually small,
minBlock would also be small. So I'm not sure how much
this is helpful for performance.

When relation is completely truncated at all (i.e., the number of block
to delete first is zero), can RelationTruncate() and smgr_redo() just
call smgrdounlinkall() like smgrDoPendingDeletes() does, instead of
calling MarkFreeSpaceMapTruncateRel(), visibilitymap_truncate_prepare()
and smgrtruncate()? ISTM that smgrdounlinkall() is faster and simpler.

Regards,

--
Fujii Masao

#24Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Fujii Masao (#23)
1 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

On Tuesday, September 3, 2019 9:44 PM (GMT+9), Fujii Masao wrote:

Thanks for the patch!

Thank you as well for the review!

-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks,
bool isRedo)

smgrdounlinkfork() is dead code. Per the discussion [1], this unused function
was left intentionally. But it's still dead code since 2012, so I'd like to
remove it. Or, even if we decide to keep that function for some reasons, I
don't think that we need to update that so that it can unlink multiple forks
at once. So, what about keeping
smgrdounlinkfork() as it is?

[1]
/messages/by-id/1471.1339106082@sss.pgh.pa.us

I also mentioned it from my first post if we can just remove this dead code.
If not, it would require to modify the function because it would also
need nforks as input argument when calling DropRelFileNodeBuffers. I kept my
changes in the latest patch.
So should I remove the function now or keep my changes?

+ for (int i = 0; i < nforks; i++)

The variable "i" should not be declared in for loop per PostgreSQL coding
style.

Fixed.

+ /* Check with the lower bound block number and skip the loop */ if
+ (bufHdr->tag.blockNum < minBlock) continue; /* skip checking the
+ buffer pool scan */

Because of the above code, the following source comment in bufmgr.c should
be updated.

* We could check forkNum and blockNum as well as the rnode, but the
* incremental win from doing so seems small.

And, first of all, is this check really useful for performance?
Since firstDelBlock for FSM fork is usually small, minBlock would also be
small. So I'm not sure how much this is helpful for performance.

This was a suggestion from Sawada-san in the previous email,
but he also thought that the performance benefit might be small..
so I just removed the related code block in this patch.

When relation is completely truncated at all (i.e., the number of block to
delete first is zero), can RelationTruncate() and smgr_redo() just call
smgrdounlinkall() like smgrDoPendingDeletes() does, instead of calling
MarkFreeSpaceMapTruncateRel(), visibilitymap_truncate_prepare() and
smgrtruncate()? ISTM that smgrdounlinkall() is faster and simpler.

I haven't applied this in my patch yet.
If my understanding is correct, smgrdounlinkall() is used for deleting
relation forks. However, we only truncate (not delete) relations
in RelationTruncate() and smgr_redo(). I'm not sure if it's correct to
use it here. Could you expound more your idea on using smgrdounlinkall?

Regards,
Kirk Jamison

Attachments:

v6-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v6-0001-Speedup-truncates-of-relation-forks.patchDownload
From ea70f5734ca6be1ea4d6ec701c419debf9cc5743 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Thu, 4 Jul 2019 10:59:05 +0000
Subject: [PATCH] Speedup truncates of relation forks

Whenever we truncate relations, it involves several scans of the
shared buffers for every call of smgrtruncate() for each fork which
is time-consuming. This patch reduces the scan for all forks into one
instead of three, and improves the relation truncates by initially
marking and preparing the pages-to-be-truncated of relation forks,
then simultaneously truncating them, resulting to an improved
performance in VACUUM, autovacuum operations, and their recovery.
---
 contrib/pg_visibility/pg_visibility.c     |   8 ++-
 src/backend/access/heap/visibilitymap.c   |  36 ++++-------
 src/backend/catalog/storage.c             | 102 ++++++++++++++++++++++++++----
 src/backend/storage/buffer/bufmgr.c       |  33 +++++++---
 src/backend/storage/freespace/freespace.c |  41 ++++--------
 src/backend/storage/smgr/smgr.c           |  49 +++++++++-----
 src/include/access/visibilitymap.h        |   2 +-
 src/include/storage/bufmgr.h              |   4 +-
 src/include/storage/freespace.h           |   2 +-
 src/include/storage/smgr.h                |   7 +-
 10 files changed, 186 insertions(+), 98 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..60eff7f 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,8 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +394,11 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block))
+		fork = VISIBILITYMAP_FORKNUM;
+
+	smgrtruncate(rel->rd_smgr, &fork, 1, &block);
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index a08922b..351fc31 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..395ef0f 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,10 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +246,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Mark the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			first_removed_nblocks = nblocks;
+			forks[nforks] = FSM_FORKNUM;
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +312,20 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Mark the MAIN fork */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +622,13 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +657,58 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we mark the about-to-be-truncated blocks of
+		 * relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = MarkFreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				first_removed_nblocks = xlrec->blkno;
+				forks[nforks] = FSM_FORKNUM;
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/*
+		 * Update upper-level FSM pages to account for the truncation.
+		 * This is important because the just-truncated pages were likely
+		 * marked as all-free, and would be preferentially selected.
+		 */
+		FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6f3a402..1f2b600 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2900,8 +2900,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2924,8 +2924,8 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
 	int			i;
 
@@ -2933,7 +2933,12 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			int		j;
+			for (j = 0; j < nforks; j++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
+											firstDelBlock[j]);
+		}
 		return;
 	}
 
@@ -2941,6 +2946,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2962,11 +2968,18 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 			continue;
 
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2383094..437181f 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,16 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * MarkFreeSpaceMapTruncateRel - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function marks the dirty page and returns a block number.
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +270,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +285,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +310,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index b0d9f21..82a2977 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -473,29 +473,30 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
+ *	smgrdounlinkfork() -- Immediately unlink each fork of a relation.
  *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
+ *		Each fork of the relation is removed from the store.  This should
+ *		not be used during transactional operations, since it can't be undone.
  *
  *		If isRedo is true, it is okay for the underlying file to be gone
  *		already.
  */
 void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks, bool isRedo)
 {
 	RelFileNodeBackend rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
+	int			i;
 
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
+	/* Close each fork at smgr level */
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_close(reln, forknum[i]);
 
 	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
+	 * Get rid of any remaining buffers for each fork. bufmgr will just drop
 	 * them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
+	DropRelFileNodeBuffers(rnode, forknum, nforks, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -521,7 +522,8 @@ smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * ERROR, because we've already decided to commit or abort the current
 	 * xact.
 	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
+	for (i = 0; i < nforks; i++)
+		smgrsw[which].smgr_unlink(rnode, forknum[i], isRedo);
 }
 
 /*
@@ -618,13 +620,15 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -638,10 +642,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. smgrtruncate sent an smgr cache inval message, which will
+		 * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+		 * smgr_vm_nblocks, and these ones too at the next command boundary. But
+		 * this ensures these aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..bf19a67 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber MarkFreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..a24532c 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum,
+							 int nforks, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -102,8 +103,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
-- 
1.8.3.1

#25Alvaro Herrera from 2ndQuadrant
alvherre@alvh.no-ip.org
In reply to: Jamison, Kirk (#24)
Re: [PATCH] Speedup truncates of relation forks

On 2019-Sep-05, Jamison, Kirk wrote:

I also mentioned it from my first post if we can just remove this dead code.
If not, it would require to modify the function because it would also
need nforks as input argument when calling DropRelFileNodeBuffers. I kept my
changes in the latest patch.
So should I remove the function now or keep my changes?

Please add a preliminary patch that removes the function. Dead code is
good, as long as it is gone. We can get it pushed ahead of the rest of
this.

What does it mean to "mark" a dirty page in FSM? We don't have the
concept of marking pages as far as I know (and I don't see that the
patch does any sort of mark). Do you mean to find where it is and
return its block number? If so, I wonder how this handles concurrent
table extension: are we keeping some sort of lock that prevents it?
(... or would we lose any newly added pages that receive tuples while
this truncation is ongoing?)

I think the new API of smgrtruncate() is fairly confusing. Would it be
better to define a new struct { ForkNum forknum; BlockNumber blkno; }
and pass an array of those?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Alvaro Herrera from 2ndQuadrant (#25)
2 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

On Friday, September 6, 2019 11:51 PM (GMT+9), Alvaro Herrera wrote:

Hi Alvaro,
Thank you very much for the review!

On 2019-Sep-05, Jamison, Kirk wrote:

I also mentioned it from my first post if we can just remove this dead code.
If not, it would require to modify the function because it would also
need nforks as input argument when calling DropRelFileNodeBuffers. I
kept my changes in the latest patch.
So should I remove the function now or keep my changes?

Please add a preliminary patch that removes the function. Dead code is good,
as long as it is gone. We can get it pushed ahead of the rest of this.

Alright. I've attached a separate patch removing the smgrdounlinkfork.

What does it mean to "mark" a dirty page in FSM? We don't have the concept
of marking pages as far as I know (and I don't see that the patch does any
sort of mark). Do you mean to find where it is and return its block number?

Yes. "Mark" is probably not a proper way to describe it, so I temporarily
changed it to "locate" and renamed the function to FreeSpaceMapLocateBlock().
If anyone could suggest a more appropriate name, that'd be appreciated.

If so, I wonder how this handles concurrent table extension: are we keeping
some sort of lock that prevents it?
(... or would we lose any newly added pages that receive tuples while this
truncation is ongoing?)

I moved the the description about acquiring AccessExclusiveLock
from FreeSpaceMapLocateBlock() and visibilitymap_truncate_prepare() to the
smgrtruncate description instead.
IIUC, in lazy_truncate_heap() we still acquire AccessExclusiveLock for the relation
before calling RelationTruncate(), which then calls smgrtruncate().
While holding the exclusive lock, the following are also called to check
if rel has not extended and verify that end pages contain no tuples while
we were vacuuming (with non-exclusive lock).
new_rel_pages = RelationGetNumberOfBlocks(onerel);
new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
I assume that the above would update the correct number of pages.
We then release the exclusive lock as soon as we have truncated the pages.

I think the new API of smgrtruncate() is fairly confusing. Would it be better
to define a new struct { ForkNum forknum; BlockNumber blkno; } and pass an
array of those?

This is for readbility, right? However, I think there's no need to define a
new structure for it, so I kept my changes.
smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks).
I also declared *forkNum and nforks next to each other as suggested by Sawada-san.

What do you think about these changes?

Regards,
Kirk Jamison

Attachments:

v1-0001-Remove-deadcode-smgrdounlinkfork.patchapplication/octet-stream; name=v1-0001-Remove-deadcode-smgrdounlinkfork.patchDownload
From 3aeda104f3c2cc9b0841e536cb39fcd16cb8d881 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Mon, 9 Sep 2019 06:09:04 +0000
Subject: [PATCH] Remove deadcode smgrdounlinkfork()

---
 src/backend/storage/smgr/smgr.c | 55 -----------------------------------------
 src/include/storage/smgr.h      |  1 -
 2 files changed, 56 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index b0d9f21..5b5a80e 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -343,9 +343,6 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  *
  *		If isRedo is true, it is okay for the underlying file(s) to be gone
  *		already.
- *
- *		This is equivalent to calling smgrdounlinkfork for each fork, but
- *		it's significantly quicker so should be preferred when possible.
  */
 void
 smgrdounlink(SMgrRelation reln, bool isRedo)
@@ -473,58 +470,6 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
- *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
- *
- *		If isRedo is true, it is okay for the underlying file to be gone
- *		already.
- */
-void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
-{
-	RelFileNodeBackend rnode = reln->smgr_rnode;
-	int			which = reln->smgr_which;
-
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
-
-	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
-	 * them without bothering to write the contents.
-	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
-
-	/*
-	 * It'd be nice to tell the stats collector to forget it immediately, too.
-	 * But we can't because we don't know the OID (and in cases involving
-	 * relfilenode swaps, it's not always clear which table OID to forget,
-	 * anyway).
-	 */
-
-	/*
-	 * Send a shared-inval message to force other backends to close any
-	 * dangling smgr references they may have for this rel.  We should do this
-	 * before starting the actual unlinking, in case we fail partway through
-	 * that step.  Note that the sinval message will eventually come back to
-	 * this backend, too, and thereby provide a backstop that we closed our
-	 * own smgr rel.
-	 */
-	CacheInvalidateSmgr(rnode);
-
-	/*
-	 * Delete the physical file(s).
-	 *
-	 * Note: smgr_unlink must treat deletion failure as a WARNING, not an
-	 * ERROR, because we've already decided to commit or abort the current
-	 * xact.
-	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
-}
-
-/*
  *	smgrextend() -- Add a new block to a file.
  *
  *		The semantics are nearly the same as smgrwrite(): write at the
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..7393727 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,6 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
-- 
1.8.3.1

v7-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v7-0001-Speedup-truncates-of-relation-forks.patchDownload
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..60eff7f 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,8 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,7 +394,11 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block))
+		fork = VISIBILITYMAP_FORKNUM;
+
+	smgrtruncate(rel->rd_smgr, &fork, 1, &block);
 
 	if (RelationNeedsWAL(rel))
 	{
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index a08922b..351fc31 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..a11e54a 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,10 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +246,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Find the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = FreeSpaceMapLocateBlock(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			first_removed_nblocks = nblocks;
+			forks[nforks] = FSM_FORKNUM;
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +312,20 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Pinpoint the MAIN fork and its blocks */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +622,13 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm_fork = false;
+		bool		main_fork = false;
+		bool		vm_fork = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +657,58 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we identify the about-to-be-truncated blocks
+		 * of relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main_fork = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = FreeSpaceMapLocateBlock(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				first_removed_nblocks = xlrec->blkno;
+				forks[nforks] = FSM_FORKNUM;
+				nforks++;
+				fsm_fork = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				nforks++;
+				vm_fork = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main_fork || fsm_fork || vm_fork)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main_fork)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/*
+		 * Update upper-level FSM pages to account for the truncation.
+		 * This is important because the just-truncated pages were likely
+		 * marked as all-free, and would be preferentially selected.
+		 */
+		FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6f3a402..1f2b600 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2900,8 +2900,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2924,8 +2924,8 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
 	int			i;
 
@@ -2933,7 +2933,12 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			int		j;
+			for (j = 0; j < nforks; j++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
+											firstDelBlock[j]);
+		}
 		return;
 	}
 
@@ -2941,6 +2946,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2962,11 +2968,18 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 			continue;
 
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2383094..473c2e5 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,17 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * FreeSpaceMapLocateBlock - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function finds the dirty page and returns a block number.
+ *
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+FreeSpaceMapLocateBlock(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +271,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +286,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +311,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 5b5a80e..0fc1f76 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -469,6 +469,7 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 	pfree(rnodes);
 }
 
+
 /*
  *	smgrextend() -- Add a new block to a file.
  *
@@ -561,15 +562,21 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  *					  of blocks
  *
  * The truncation is done immediately, so this can't be rolled back.
+ *
+ * The caller must hold AccessExclusiveLock on the relation, to ensure that
+ * other backends receive the smgr invalidation event that this function sends
+ * before they access the relation again.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -583,10 +590,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. smgrtruncate sent an smgr cache inval message, which will
+		 * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+		 * smgr_vm_nblocks, and these ones too at the next command boundary. But
+		 * this ensures these aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..0b834cb 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber FreeSpaceMapLocateBlock(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 7393727..1543d8d 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -101,8 +101,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
#27Fujii Masao
masao.fujii@gmail.com
In reply to: Jamison, Kirk (#26)
Re: [PATCH] Speedup truncates of relation forks

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

On Friday, September 6, 2019 11:51 PM (GMT+9), Alvaro Herrera wrote:

Hi Alvaro,
Thank you very much for the review!

On 2019-Sep-05, Jamison, Kirk wrote:

I also mentioned it from my first post if we can just remove this dead code.
If not, it would require to modify the function because it would also
need nforks as input argument when calling DropRelFileNodeBuffers. I
kept my changes in the latest patch.
So should I remove the function now or keep my changes?

Please add a preliminary patch that removes the function. Dead code is good,
as long as it is gone. We can get it pushed ahead of the rest of this.

Alright. I've attached a separate patch removing the smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead" function
for some reasons. So, in my opinion, it's better to just enclose the function
with #if NOT_USED and #endif, to keep the function itself as it is, and then
to start new discussion on hackers about the removal of that separatedly
from this patch.

Regards,

--
Fujii Masao

#28Fujii Masao
masao.fujii@gmail.com
In reply to: Jamison, Kirk (#24)
Re: [PATCH] Speedup truncates of relation forks

On Thu, Sep 5, 2019 at 5:53 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

On Tuesday, September 3, 2019 9:44 PM (GMT+9), Fujii Masao wrote:

Thanks for the patch!

Thank you as well for the review!

-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks,
bool isRedo)

smgrdounlinkfork() is dead code. Per the discussion [1], this unused function
was left intentionally. But it's still dead code since 2012, so I'd like to
remove it. Or, even if we decide to keep that function for some reasons, I
don't think that we need to update that so that it can unlink multiple forks
at once. So, what about keeping
smgrdounlinkfork() as it is?

[1]
/messages/by-id/1471.1339106082@sss.pgh.pa.us

I also mentioned it from my first post if we can just remove this dead code.
If not, it would require to modify the function because it would also
need nforks as input argument when calling DropRelFileNodeBuffers. I kept my
changes in the latest patch.
So should I remove the function now or keep my changes?

+ for (int i = 0; i < nforks; i++)

The variable "i" should not be declared in for loop per PostgreSQL coding
style.

Fixed.

+ /* Check with the lower bound block number and skip the loop */ if
+ (bufHdr->tag.blockNum < minBlock) continue; /* skip checking the
+ buffer pool scan */

Because of the above code, the following source comment in bufmgr.c should
be updated.

* We could check forkNum and blockNum as well as the rnode, but the
* incremental win from doing so seems small.

And, first of all, is this check really useful for performance?
Since firstDelBlock for FSM fork is usually small, minBlock would also be
small. So I'm not sure how much this is helpful for performance.

This was a suggestion from Sawada-san in the previous email,
but he also thought that the performance benefit might be small..
so I just removed the related code block in this patch.

When relation is completely truncated at all (i.e., the number of block to
delete first is zero), can RelationTruncate() and smgr_redo() just call
smgrdounlinkall() like smgrDoPendingDeletes() does, instead of calling
MarkFreeSpaceMapTruncateRel(), visibilitymap_truncate_prepare() and
smgrtruncate()? ISTM that smgrdounlinkall() is faster and simpler.

I haven't applied this in my patch yet.
If my understanding is correct, smgrdounlinkall() is used for deleting
relation forks. However, we only truncate (not delete) relations
in RelationTruncate() and smgr_redo(). I'm not sure if it's correct to
use it here. Could you expound more your idea on using smgrdounlinkall?

My this comment is pointless, so please ignore it. Sorry for noise..

Here are other comments for the latest patch:

+ block = visibilitymap_truncate_prepare(rel, 0);
+ if (BlockNumberIsValid(block))
+ fork = VISIBILITYMAP_FORKNUM;
+
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);

If visibilitymap_truncate_prepare() returns InvalidBlockNumber,
smgrtruncate() should not be called.

+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);

FreeSpaceMapVacuumRange() should be called only when FSM exists,
like the original code does?

Regards,

--
Fujii Masao

#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fujii Masao (#27)
Re: [PATCH] Speedup truncates of relation forks

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

Please add a preliminary patch that removes the function. Dead code is good,
as long as it is gone. We can get it pushed ahead of the rest of this.

Alright. I've attached a separate patch removing the smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead" function
for some reasons. So, in my opinion, it's better to just enclose the function
with #if NOT_USED and #endif, to keep the function itself as it is, and then
to start new discussion on hackers about the removal of that separatedly
from this patch.

I searched for anybody requesting to keep the function. I couldn't find
anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.pa.us

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere. I left it in place just in case we want
it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

In absence of objections, I'll commit a patch to remove it later today.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#30Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#29)
Re: [PATCH] Speedup truncates of relation forks

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

Please add a preliminary patch that removes the function. Dead code is good,
as long as it is gone. We can get it pushed ahead of the rest of this.

Alright. I've attached a separate patch removing the smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead" function
for some reasons. So, in my opinion, it's better to just enclose the function
with #if NOT_USED and #endif, to keep the function itself as it is, and then
to start new discussion on hackers about the removal of that separatedly
from this patch.

I searched for anybody requesting to keep the function. I couldn't find
anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.pa.us

Yes. And I found Andres.
/messages/by-id/20180621174129.hogefyopje4xaznu@alap3.anarazel.de

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere. I left it in place just in case we want
it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

Regards,

--
Fujii Masao

#31Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Fujii Masao (#30)
2 attachment(s)
RE: [PATCH] Speedup truncates of relation forks

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk <k.jamison@jp.fujitsu.com>

wrote:

Please add a preliminary patch that removes the function. Dead
code is good, as long as it is gone. We can get it pushed ahead of

the rest of this.

Alright. I've attached a separate patch removing the smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead"
function for some reasons. So, in my opinion, it's better to just
enclose the function with #if NOT_USED and #endif, to keep the
function itself as it is, and then to start new discussion on
hackers about the removal of that separatedly from this patch.

I searched for anybody requesting to keep the function. I couldn't
find anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.pa.us

Yes. And I found Andres.
/messages/by-id/20180621174129.hogefyopje4xaznu@al
ap3.anarazel.de

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere. I left it in place just in case we want
it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?
Re-attaching the patch that removes the deadcode: smgrdounlinkfork().

---
I've also fixed Fujii-san's comments below in the latest attached speedup truncate rel patch (v8).

Here are other comments for the latest patch:

+ block = visibilitymap_truncate_prepare(rel, 0); if
+ (BlockNumberIsValid(block)) fork = VISIBILITYMAP_FORKNUM;
+
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);

If visibilitymap_truncate_prepare() returns InvalidBlockNumber,
smgrtruncate() should not be called.

+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+ InvalidBlockNumber);

Thank you again for the review!

Regards,
Kirk Jamison

Attachments:

v1-0001-Remove-deadcode-smgrdounlinkfork.patchapplication/octet-stream; name=v1-0001-Remove-deadcode-smgrdounlinkfork.patchDownload
From 3aeda104f3c2cc9b0841e536cb39fcd16cb8d881 Mon Sep 17 00:00:00 2001
From: Kirk Jamison <k.jamison@jp.fujitsu.com>
Date: Mon, 9 Sep 2019 06:09:04 +0000
Subject: [PATCH] Remove deadcode smgrdounlinkfork()

---
 src/backend/storage/smgr/smgr.c | 55 -----------------------------------------
 src/include/storage/smgr.h      |  1 -
 2 files changed, 56 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index b0d9f21..5b5a80e 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -343,9 +343,6 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  *
  *		If isRedo is true, it is okay for the underlying file(s) to be gone
  *		already.
- *
- *		This is equivalent to calling smgrdounlinkfork for each fork, but
- *		it's significantly quicker so should be preferred when possible.
  */
 void
 smgrdounlink(SMgrRelation reln, bool isRedo)
@@ -473,58 +470,6 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 }
 
 /*
- *	smgrdounlinkfork() -- Immediately unlink one fork of a relation.
- *
- *		The specified fork of the relation is removed from the store.  This
- *		should not be used during transactional operations, since it can't be
- *		undone.
- *
- *		If isRedo is true, it is okay for the underlying file to be gone
- *		already.
- */
-void
-smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
-{
-	RelFileNodeBackend rnode = reln->smgr_rnode;
-	int			which = reln->smgr_which;
-
-	/* Close the fork at smgr level */
-	smgrsw[which].smgr_close(reln, forknum);
-
-	/*
-	 * Get rid of any remaining buffers for the fork.  bufmgr will just drop
-	 * them without bothering to write the contents.
-	 */
-	DropRelFileNodeBuffers(rnode, forknum, 0);
-
-	/*
-	 * It'd be nice to tell the stats collector to forget it immediately, too.
-	 * But we can't because we don't know the OID (and in cases involving
-	 * relfilenode swaps, it's not always clear which table OID to forget,
-	 * anyway).
-	 */
-
-	/*
-	 * Send a shared-inval message to force other backends to close any
-	 * dangling smgr references they may have for this rel.  We should do this
-	 * before starting the actual unlinking, in case we fail partway through
-	 * that step.  Note that the sinval message will eventually come back to
-	 * this backend, too, and thereby provide a backstop that we closed our
-	 * own smgr rel.
-	 */
-	CacheInvalidateSmgr(rnode);
-
-	/*
-	 * Delete the physical file(s).
-	 *
-	 * Note: smgr_unlink must treat deletion failure as a WARNING, not an
-	 * ERROR, because we've already decided to commit or abort the current
-	 * xact.
-	 */
-	smgrsw[which].smgr_unlink(rnode, forknum, isRedo);
-}
-
-/*
  *	smgrextend() -- Add a new block to a file.
  *
  *		The semantics are nearly the same as smgrwrite(): write at the
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index d286c8c..7393727 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,6 @@ extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
-extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 					   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
-- 
1.8.3.1

v8-0001-Speedup-truncates-of-relation-forks.patchapplication/octet-stream; name=v8-0001-Speedup-truncates-of-relation-forks.patchDownload
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb6..4d342ea 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -383,6 +383,8 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 	Relation	rel;
+	ForkNumber	fork;
+	BlockNumber	block;
 
 	rel = relation_open(relid, AccessExclusiveLock);
 
@@ -392,20 +394,26 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
 	RelationOpenSmgr(rel);
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	visibilitymap_truncate(rel, 0);
-
-	if (RelationNeedsWAL(rel))
+	block = visibilitymap_truncate_prepare(rel, 0);
+	if (BlockNumberIsValid(block))
 	{
-		xl_smgr_truncate xlrec;
+		fork = VISIBILITYMAP_FORKNUM;
+		smgrtruncate(rel->rd_smgr, &fork, 1, &block);
+
+		if (RelationNeedsWAL(rel))
+		{
+			xl_smgr_truncate xlrec;
 
-		xlrec.blkno = 0;
-		xlrec.rnode = rel->rd_node;
-		xlrec.flags = SMGR_TRUNCATE_VM;
+			xlrec.blkno = 0;
+			xlrec.rnode = rel->rd_node;
+			xlrec.flags = SMGR_TRUNCATE_VM;
 
-		XLogBeginInsert();
-		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+			XLogBeginInsert();
+			XLogRegisterData((char *) &xlrec, sizeof(xlrec));
 
-		XLogInsert(RM_SMGR_ID, XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
+			XLogInsert(RM_SMGR_ID, XLOG_SMGR_TRUNCATE |
+					   XLR_SPECIAL_REL_UPDATE);
+		}
 	}
 
 	/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index a08922b..351fc31 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -17,7 +17,7 @@
  *		visibilitymap_set	 - set a bit in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
+ *		visibilitymap_truncate_prepare - truncate only tail bits of map pages
  *
  * NOTES
  *
@@ -430,16 +430,18 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 }
 
 /*
- *	visibilitymap_truncate - truncate the visibility map
+ *	visibilitymap_truncate_prepare - truncate only tail bits of map page
+ *									 and return the block number for actual
+ *									 truncation later
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the VM again.
+ * Note that this does not truncate the actual visibility map pages.
+ * When this function is called, the caller must eventually follow it with
+ * smgrtruncate() call to actually truncate visibility map pages.
  *
  * nheapblocks is the new size of the heap.
  */
-void
-visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
+BlockNumber
+visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks)
 {
 	BlockNumber newnblocks;
 
@@ -459,7 +461,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	 * nothing to truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/*
 	 * Unless the new size is exactly at a visibility map page boundary, the
@@ -480,7 +482,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		if (!BufferIsValid(mapBuffer))
 		{
 			/* nothing to do, the file was already smaller */
-			return;
+			return InvalidBlockNumber;
 		}
 
 		page = BufferGetPage(mapBuffer);
@@ -528,20 +530,10 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
-		return;
+		return InvalidBlockNumber;
 	}
-
-	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
-
-	/*
-	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
-	 * sent an smgr cache inval message, which will cause other backends to
-	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
-	 * command boundary.  But this ensures it isn't outright wrong until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+	else
+		return newnblocks;
 }
 
 /*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f..623cf9f 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -231,6 +231,10 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 {
 	bool		fsm;
 	bool		vm;
+	ForkNumber	forks[MAX_FORKNUM];
+	BlockNumber	blocks[MAX_FORKNUM];
+	BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+	int		nforks = 0;
 
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(rel);
@@ -242,15 +246,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
 	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
 
-	/* Truncate the FSM first if it exists */
+	/* Find the dirty FSM page and return a block number. */
 	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
 	if (fsm)
-		FreeSpaceMapTruncateRel(rel, nblocks);
+	{
+		blocks[nforks] = FreeSpaceMapLocateBlock(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			first_removed_nblocks = nblocks;
+			forks[nforks] = FSM_FORKNUM;
+			nforks++;
+		}
+	}
 
-	/* Truncate the visibility map too if it exists. */
+	/*
+	 * Truncate only the tail bits of VM and return the block number
+	 * for actual truncation later in smgrtruncate.
+	 */
 	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
 	if (vm)
-		visibilitymap_truncate(rel, nblocks);
+	{
+		blocks[nforks] = visibilitymap_truncate_prepare(rel, nblocks);
+		if (BlockNumberIsValid(blocks[nforks]))
+		{
+			forks[nforks] = VISIBILITYMAP_FORKNUM;
+			nforks++;
+		}
+	}
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -290,8 +312,22 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+	/* Pinpoint the MAIN fork and its blocks */
+	forks[nforks] = MAIN_FORKNUM;
+	blocks[nforks] = nblocks;
+	nforks++;
+
+	/* Truncate relation forks simultaneously */
+	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
+
+	/*
+	 * Update upper-level FSM pages to account for the truncation.
+	 * This is important because the just-truncated pages were likely
+	 * marked as all-free, and would be preferentially selected.
+	 */
+	if (fsm)
+		FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+								InvalidBlockNumber);
 }
 
 /*
@@ -588,6 +624,13 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		ForkNumber	forks[MAX_FORKNUM];
+		BlockNumber	blocks[MAX_FORKNUM];
+		BlockNumber	first_removed_nblocks = InvalidBlockNumber;
+		int		nforks = 0;
+		bool		fsm = false;
+		bool		main = false;
+		bool		vm = false;
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
@@ -616,23 +659,60 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
+		/*
+		 * To speedup recovery, we identify the about-to-be-truncated blocks
+		 * of relation forks first, then truncate those simultaneously later.
+		 */
 		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
-
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			forks[nforks] = MAIN_FORKNUM;
+			blocks[nforks] = xlrec->blkno;
+			nforks++;
+			main = true;
 		}
 
-		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
 		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
 			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+		{
+			blocks[nforks] = FreeSpaceMapLocateBlock(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				first_removed_nblocks = xlrec->blkno;
+				forks[nforks] = FSM_FORKNUM;
+				nforks++;
+				fsm = true;
+			}
+		}
 		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
 			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+		{
+			blocks[nforks] = visibilitymap_truncate_prepare(rel, xlrec->blkno);
+			if (BlockNumberIsValid(blocks[nforks]))
+			{
+				forks[nforks] = VISIBILITYMAP_FORKNUM;
+				nforks++;
+				vm = true;
+			}
+		}
+
+		/* Truncate relation forks simultaneously */
+		if (main || fsm || vm)
+			smgrtruncate(reln, forks, nforks, blocks);
+
+		/* Also tell xlogutils.c about it */
+		if (main)
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+
+		/*
+		 * Update upper-level FSM pages to account for the truncation.
+		 * This is important because the just-truncated pages were likely
+		 * marked as all-free, and would be preferentially selected.
+		 */
+		if (fsm)
+			FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+									InvalidBlockNumber);
 
 		FreeFakeRelcacheEntry(rel);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6f3a402..1f2b600 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2900,8 +2900,8 @@ BufferGetLSNAtomic(Buffer buffer)
 /* ---------------------------------------------------------------------
  *		DropRelFileNodeBuffers
  *
- *		This function removes from the buffer pool all the pages of the
- *		specified relation fork that have block numbers >= firstDelBlock.
+ *		This function simultaneously removes from the buffer pool all the
+ *		pages of the relation forks that have block numbers >= firstDelBlock.
  *		(In particular, with firstDelBlock = 0, all pages are removed.)
  *		Dirty pages are simply dropped, without bothering to write them
  *		out first.  Therefore, this is NOT rollback-able, and so should be
@@ -2924,8 +2924,8 @@ BufferGetLSNAtomic(Buffer buffer)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
-					   BlockNumber firstDelBlock)
+DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+					   int nforks, BlockNumber *firstDelBlock)
 {
 	int			i;
 
@@ -2933,7 +2933,12 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	if (RelFileNodeBackendIsTemp(rnode))
 	{
 		if (rnode.backend == MyBackendId)
-			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
+		{
+			int		j;
+			for (j = 0; j < nforks; j++)
+				DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
+											firstDelBlock[j]);
+		}
 		return;
 	}
 
@@ -2941,6 +2946,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 	{
 		BufferDesc *bufHdr = GetBufferDescriptor(i);
 		uint32		buf_state;
+		int		j = 0;
 
 		/*
 		 * We can make this a tad faster by prechecking the buffer tag before
@@ -2962,11 +2968,18 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
 			continue;
 
 		buf_state = LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum &&
-			bufHdr->tag.blockNum >= firstDelBlock)
-			InvalidateBuffer(bufHdr);	/* releases spinlock */
-		else
+
+		for (j = 0; j < nforks; j++)
+		{
+			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+				bufHdr->tag.forkNum == forkNum[j] &&
+				bufHdr->tag.blockNum >= firstDelBlock[j])
+			{
+				InvalidateBuffer(bufHdr); /* releases spinlock */
+				break;
+			}
+		}
+		if (j >= nforks)
 			UnlockBufHdr(bufHdr, buf_state);
 	}
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2383094..473c2e5 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -247,16 +247,17 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 }
 
 /*
- * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
+ * FreeSpaceMapLocateBlock - adjust for truncation of a relation.
  *
- * The caller must hold AccessExclusiveLock on the relation, to ensure that
- * other backends receive the smgr invalidation event that this function sends
- * before they access the FSM again.
+ * This function finds the dirty page and returns a block number.
+ *
+ * The caller of this function must eventually call smgrtruncate() to actually
+ * truncate FSM pages.
  *
  * nblocks is the new size of the heap.
  */
-void
-FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
+BlockNumber
+FreeSpaceMapLocateBlock(Relation rel, BlockNumber nblocks)
 {
 	BlockNumber new_nfsmblocks;
 	FSMAddress	first_removed_address;
@@ -270,7 +271,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * truncate.
 	 */
 	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
-		return;
+		return InvalidBlockNumber;
 
 	/* Get the location in the FSM of the first removed heap block */
 	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
@@ -285,7 +286,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	{
 		buf = fsm_readbuf(rel, first_removed_address, false);
 		if (!BufferIsValid(buf))
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
@@ -310,33 +311,16 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+		return new_nfsmblocks;
 	}
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
 		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
-			return;				/* nothing to do; the FSM was already smaller */
+			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
+		else
+			return new_nfsmblocks;
 	}
-
-	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
-
-	/*
-	 * We might as well update the local smgr_fsm_nblocks setting.
-	 * smgrtruncate sent an smgr cache inval message, which will cause other
-	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
-	 * at the next command boundary.  But this ensures it isn't outright wrong
-	 * until then.
-	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
-
-	/*
-	 * Update upper-level FSM pages to account for the truncation.  This is
-	 * important because the just-truncated pages were likely marked as
-	 * all-free, and would be preferentially selected.
-	 */
-	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
 }
 
 /*
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 5b5a80e..b6d4d23 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -469,6 +469,7 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
 	pfree(rnodes);
 }
 
+
 /*
  *	smgrextend() -- Add a new block to a file.
  *
@@ -561,15 +562,21 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  *					  of blocks
  *
  * The truncation is done immediately, so this can't be rolled back.
+ *
+ * The caller must hold AccessExclusiveLock on the relation, to ensure that
+ * other backends receive the smgr invalidation event that this function sends
+ * before they access the relation again.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
+smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
 {
+	int		i;
+
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -583,10 +590,23 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 */
 	CacheInvalidateSmgr(reln->smgr_rnode);
 
-	/*
-	 * Do the truncation.
-	 */
-	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
+	/* Do the truncation */
+	for (i = 0; i < nforks; i++)
+	{
+		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
+
+		/*
+		 * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+		 * setting. The smgr cache inval message we sent will cause other backends
+		 * to invalidate their copy of smgr_fsm_nblocks and smgr_vm_nblocks, and
+		 * these ones too at the next command boundary. But these ensure these
+		 * aren't outright wrong until then.
+		 */
+		if (forknum[i] == FSM_FORKNUM)
+			reln->smgr_fsm_nblocks = nblocks[i];
+		if (forknum[i] == VISIBILITYMAP_FORKNUM)
+			reln->smgr_vm_nblocks = nblocks[i];
+	}
 }
 
 /*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 2d88043..1ab6a81 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -44,6 +44,6 @@ extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 							  uint8 flags);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
+extern BlockNumber visibilitymap_truncate_prepare(Relation rel, BlockNumber nheapblocks);
 
 #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 509f4b7..17b97f7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -190,8 +190,8 @@ extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
 extern void FlushOneBuffer(Buffer buffer);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
-								   ForkNumber forkNum, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+								   int nforks, BlockNumber *firstDelBlock);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index 8d8c465..0b834cb 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -30,7 +30,7 @@ extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
 extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 										Size spaceAvail);
 
-extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
+extern BlockNumber FreeSpaceMapLocateBlock(Relation rel, BlockNumber nblocks);
 extern void FreeSpaceMapVacuum(Relation rel);
 extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
 									BlockNumber end);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 7393727..1543d8d 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -101,8 +101,8 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
 						  BlockNumber blocknum, BlockNumber nblocks);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
-extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-						 BlockNumber nblocks);
+extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
+						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
 
#32Michael Paquier
michael@paquier.xyz
In reply to: Jamison, Kirk (#31)
Re: [PATCH] Speedup truncates of relation forks

On Tue, Sep 17, 2019 at 01:44:12AM +0000, Jamison, Kirk wrote:

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere. I left it in place just in case we want
it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?

Yes. Just adding my +1 to nuke the function.
--
Michael

#33Fujii Masao
masao.fujii@gmail.com
In reply to: Jamison, Kirk (#31)
1 attachment(s)
Re: [PATCH] Speedup truncates of relation forks

On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk <k.jamison@jp.fujitsu.com>

wrote:

Please add a preliminary patch that removes the function. Dead
code is good, as long as it is gone. We can get it pushed ahead of

the rest of this.

Alright. I've attached a separate patch removing the smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead"
function for some reasons. So, in my opinion, it's better to just
enclose the function with #if NOT_USED and #endif, to keep the
function itself as it is, and then to start new discussion on
hackers about the removal of that separatedly from this patch.

I searched for anybody requesting to keep the function. I couldn't
find anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.pa.us

Yes. And I found Andres.
/messages/by-id/20180621174129.hogefyopje4xaznu@al
ap3.anarazel.de

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere. I left it in place just in case we want
it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?
Re-attaching the patch that removes the deadcode: smgrdounlinkfork().

---
I've also fixed Fujii-san's comments below in the latest attached speedup truncate rel patch (v8).

Thanks for updating the patch!

+ block = visibilitymap_truncate_prepare(rel, 0);
+ if (BlockNumberIsValid(block))
  {
- xl_smgr_truncate xlrec;
+ fork = VISIBILITYMAP_FORKNUM;
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);
+
+ if (RelationNeedsWAL(rel))
+ {
+ xl_smgr_truncate xlrec;

I don't think this fix is right. Originally, WAL is generated
even in the case where visibilitymap_truncate_prepare() returns
InvalidBlockNumber. But the patch unexpectedly changed the logic
so that WAL is not generated in that case.

+ if (fsm)
+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+ InvalidBlockNumber);

This code means that FreeSpaceMapVacuumRange() is called if FSM exists
even if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
This seems not right. Originally, FreeSpaceMapVacuumRange() is not called
in the case where InvalidBlockNumber is returned.

So I updated the patch based on yours and fixed the above issues.
Attached. Could you review this one? If there is no issue in that,
I'm thinking to commit that.

Regards,

--
Fujii Masao

Attachments:

speedup_truncate_forks_fujii.patchapplication/octet-stream; name=speedup_truncate_forks_fujii.patchDownload
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1372bb638f..75b6d96440 100644
*** a/contrib/pg_visibility/pg_visibility.c
--- b/contrib/pg_visibility/pg_visibility.c
***************
*** 383,388 **** pg_truncate_visibility_map(PG_FUNCTION_ARGS)
--- 383,390 ----
  {
  	Oid			relid = PG_GETARG_OID(0);
  	Relation	rel;
+ 	ForkNumber	fork;
+ 	BlockNumber	block;
  
  	rel = relation_open(relid, AccessExclusiveLock);
  
***************
*** 392,398 **** pg_truncate_visibility_map(PG_FUNCTION_ARGS)
  	RelationOpenSmgr(rel);
  	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
  
! 	visibilitymap_truncate(rel, 0);
  
  	if (RelationNeedsWAL(rel))
  	{
--- 394,405 ----
  	RelationOpenSmgr(rel);
  	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
  
! 	block = visibilitymap_prepare_truncate(rel, 0);
! 	if (BlockNumberIsValid(block))
! 	{
! 		fork = VISIBILITYMAP_FORKNUM;
! 		smgrtruncate(rel->rd_smgr, &fork, 1, &block);
! 	}
  
  	if (RelationNeedsWAL(rel))
  	{
***************
*** 418,424 **** pg_truncate_visibility_map(PG_FUNCTION_ARGS)
  	 * here and when we sent the messages at our eventual commit.  However,
  	 * we're currently only sending a non-transactional smgr invalidation,
  	 * which will have been posted to shared memory immediately from within
! 	 * visibilitymap_truncate.  Therefore, there should be no race here.
  	 *
  	 * The reason why it's desirable to release the lock early here is because
  	 * of the possibility that someone will need to use this to blow away many
--- 425,431 ----
  	 * here and when we sent the messages at our eventual commit.  However,
  	 * we're currently only sending a non-transactional smgr invalidation,
  	 * which will have been posted to shared memory immediately from within
! 	 * smgr_truncate.  Therefore, there should be no race here.
  	 *
  	 * The reason why it's desirable to release the lock early here is because
  	 * of the possibility that someone will need to use this to blow away many
diff --git a/src/backend/access/heap/visibilindex a08922b079..262876772f 100644
*** a/src/backend/access/heap/visibilitymap.c
--- b/src/backend/access/heap/visibilitymap.c
***************
*** 17,23 ****
   *		visibilitymap_set	 - set a bit in a previously pinned page
   *		visibilitymap_get_status - get status of bits
   *		visibilitymap_count  - count number of bits set in visibility map
!  *		visibilitymap_truncate	- truncate the visibility map
   *
   * NOTES
   *
--- 17,24 ----
   *		visibilitymap_set	 - set a bit in a previously pinned page
   *		visibilitymap_get_status - get status of bits
   *		visibilitymap_count  - count number of bits set in visibility map
!  *		visibilitymap_prepare_truncate -
!  *			prepare for truncation of the visibility map
   *
   * NOTES
   *
***************
*** 430,445 **** visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
  }
  
  /*
!  *	visibilitymap_truncate - truncate the visibility map
!  *
!  * The caller must hold AccessExclusiveLock on the relation, to ensure that
!  * other backends receive the smgr invalidation event that this function sends
!  * before they access the VM again.
   *
   * nheapblocks is the new size of the heap.
   */
! void
! visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
  {
  	BlockNumber newnblocks;
  
--- 431,448 ----
  }
  
  /*
!  *	visibilitymap_prepare_truncate -
!  *			prepare for truncation of the visibility map
   *
   * nheapblocks is the new size of the heap.
+  *
+  * Return the number of blocks of new visibility map after it's truncated.
+  * If it's InvalidBlockNumber, there is nothing to truncate;
+  * otherwise the caller is responsible for calling smgrtruncate()
+  * to truncate the visibility map pages.
   */
! BlockNumber
! visibilitymap_prepare_truncate(Relation rel, BlockNumber nheapblocks)
  {
  	BlockNumber newnblocks;
  
***************
*** 459,465 **** visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
  	 * nothing to truncate.
  	 */
  	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
! 		return;
  
  	/*
  	 * Unless the new size is exactly at a visibility map page boundary, the
--- 462,468 ----
  	 * nothing to truncate.
  	 */
  	if (!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
! 		return InvalidBlockNumber;
  
  	/*
  	 * Unless the new size is exactly at a visibility map page boundary, the
***************
*** 480,486 **** visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
  		if (!BufferIsValid(mapBuffer))
  		{
  			/* nothing to do, the file was already smaller */
! 			return;
  		}
  
  		page = BufferGetPage(mapBuffer);
--- 483,489 ----
  		if (!BufferIsValid(mapBuffer))
  		{
  			/* nothing to do, the file was already smaller */
! 			return InvalidBlockNumber;
  		}
  
  		page = BufferGetPage(mapBuffer);
***************
*** 528,547 **** visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
  	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
  	{
  		/* nothing to do, the file was already smaller than requested size */
! 		return;
  	}
  
! 	/* Truncate the unused VM pages, and send smgr inval message */
! 	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
! 
! 	/*
! 	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
! 	 * sent an smgr cache inval message, which will cause other backends to
! 	 * invalidate their copy of smgr_vm_nblocks, and this one too at the next
! 	 * command boundary.  But this ensures it isn't outright wrong until then.
! 	 */
! 	if (rel->rd_smgr)
! 		rel->rd_smgr->smgr_vm_nblocks = newnblocks;
  }
  
  /*
--- 531,540 ----
  	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
  	{
  		/* nothing to do, the file was already smaller than requested size */
! 		return InvalidBlockNumber;
  	}
  
! 	return newnblocks;
  }
  
  /*
diff --git a/src/backend/catalog/storage.c b/sindex 3cc886f7fe..b8c9b6f9c6 100644
*** a/src/backend/catalog/storage.c
--- b/src/backend/catalog/storage.c
***************
*** 231,236 **** RelationTruncate(Relation rel, BlockNumber nblocks)
--- 231,240 ----
  {
  	bool		fsm;
  	bool		vm;
+ 	bool		need_fsm_vacuum = false;
+ 	ForkNumber	forks[MAX_FORKNUM];
+ 	BlockNumber	blocks[MAX_FORKNUM];
+ 	int		nforks = 0;
  
  	/* Open it at the smgr level if not already done */
  	RelationOpenSmgr(rel);
***************
*** 242,256 **** RelationTruncate(Relation rel, BlockNumber nblocks)
  	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
  	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
  
! 	/* Truncate the FSM first if it exists */
  	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
  	if (fsm)
! 		FreeSpaceMapTruncateRel(rel, nblocks);
  
! 	/* Truncate the visibility map too if it exists. */
  	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
  	if (vm)
! 		visibilitymap_truncate(rel, nblocks);
  
  	/*
  	 * We WAL-log the truncation before actually truncating, which means
--- 246,280 ----
  	rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber;
  	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
  
! 	/* Prepare for truncation of MAIN fork of the relation */
! 	forks[nforks] = MAIN_FORKNUM;
! 	blocks[nforks] = nblocks;
! 	nforks++;
! 
! 	/*  Prepare for truncation of the FSM if it exists */
  	fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM);
  	if (fsm)
! 	{
! 		blocks[nforks] = FreeSpaceMapPrepareTruncateRel(rel, nblocks);
! 		if (BlockNumberIsValid(blocks[nforks]))
! 		{
! 			forks[nforks] = FSM_FORKNUM;
! 			nforks++;
! 			need_fsm_vacuum = true;
! 		}
! 	}
  
! 	/* Prepare for truncation of the visibility map too if it exists */
  	vm = smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
  	if (vm)
! 	{
! 		blocks[nforks] = visibilitymap_prepare_truncate(rel, nblocks);
! 		if (BlockNumberIsValid(blocks[nforks]))
! 		{
! 			forks[nforks] = VISIBILITYMAP_FORKNUM;
! 			nforks++;
! 		}
! 	}
  
  	/*
  	 * We WAL-log the truncation before actually truncating, which means
***************
*** 290,297 **** RelationTruncate(Relation rel, BlockNumber nblocks)
  			XLogFlush(lsn);
  	}
  
! 	/* Do the real work */
! 	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
  }
  
  /*
--- 314,329 ----
  			XLogFlush(lsn);
  	}
  
! 	/* Do the real work to truncate relation forks */
! 	smgrtruncate(rel->rd_smgr, forks, nforks, blocks);
! 
! 	/*
! 	 * Update upper-level FSM pages to account for the truncation.
! 	 * This is important because the just-truncated pages were likely
! 	 * marked as all-free, and would be preferentially selected.
! 	 */
! 	if (need_fsm_vacuum)
! 		FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
  }
  
  /*
***************
*** 588,593 **** smgr_redo(XLogReaderState *record)
--- 620,629 ----
  		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
  		SMgrRelation reln;
  		Relation	rel;
+ 		ForkNumber	forks[MAX_FORKNUM];
+ 		BlockNumber	blocks[MAX_FORKNUM];
+ 		int		nforks = 0;
+ 		bool		need_fsm_vacuum = false;
  
  		reln = smgropen(xlrec->rnode, InvalidBackendId);
  
***************
*** 616,638 **** smgr_redo(XLogReaderState *record)
  		 */
  		XLogFlush(lsn);
  
  		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
  		{
! 			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
  
  			/* Also tell xlogutils.c about it */
  			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
  		}
  
! 		/* Truncate FSM and VM too */
  		rel = CreateFakeRelcacheEntry(xlrec->rnode);
  
  		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
  			smgrexists(reln, FSM_FORKNUM))
! 			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
  		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
  			smgrexists(reln, VISIBILITYMAP_FORKNUM))
! 			visibilitymap_truncate(rel, xlrec->blkno);
  
  		FreeFakeRelcacheEntry(rel);
  	}
--- 652,705 ----
  		 */
  		XLogFlush(lsn);
  
+ 		/* Prepare for truncation of MAIN fork */
  		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
  		{
! 			forks[nforks] = MAIN_FORKNUM;
! 			blocks[nforks] = xlrec->blkno;
! 			nforks++;
  
  			/* Also tell xlogutils.c about it */
  			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
  		}
  
! 		/* Prepare for truncation of FSM and VM too */
  		rel = CreateFakeRelcacheEntry(xlrec->rnode);
  
  		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
  			smgrexists(reln, FSM_FORKNUM))
! 		{
! 			blocks[nforks] = FreeSpaceMapPrepareTruncateRel(rel, xlrec->blkno);
! 			if (BlockNumberIsValid(blocks[nforks]))
! 			{
! 				forks[nforks] = FSM_FORKNUM;
! 				nforks++;
! 				need_fsm_vacuum = true;
! 			}
! 		}
  		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
  			smgrexists(reln, VISIBILITYMAP_FORKNUM))
! 		{
! 			blocks[nforks] = visibilitymap_prepare_truncate(rel, xlrec->blkno);
! 			if (BlockNumberIsValid(blocks[nforks]))
! 			{
! 				forks[nforks] = VISIBILITYMAP_FORKNUM;
! 				nforks++;
! 			}
! 		}
! 
! 		/* Do the real work to truncate relation forks */
! 		if (nforks > 0)
! 			smgrtruncate(reln, forks, nforks, blocks);
! 
! 		/*
! 		 * Update upper-level FSM pages to account for the truncation.
! 		 * This is important because the just-truncated pages were likely
! 		 * marked as all-free, and would be preferentially selected.
! 		 */
! 		if (need_fsm_vacuum)
! 			FreeSpaceMapVacuumRange(rel, xlrec->blkno,
! 									InvalidBlockNumber);
  
  		FreeFakeRelcacheEntry(rel);
  	}
diff --git a/src/backend/storage/bufindex 6f3a402854..483f705305 100644
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 2901,2907 **** BufferGetLSNAtomic(Buffer buffer)
   *		DropRelFileNodeBuffers
   *
   *		This function removes from the buffer pool all the pages of the
!  *		specified relation fork that have block numbers >= firstDelBlock.
   *		(In particular, with firstDelBlock = 0, all pages are removed.)
   *		Dirty pages are simply dropped, without bothering to write them
   *		out first.  Therefore, this is NOT rollback-able, and so should be
--- 2901,2907 ----
   *		DropRelFileNodeBuffers
   *
   *		This function removes from the buffer pool all the pages of the
!  *		specified relation forks that have block numbers >= firstDelBlock.
   *		(In particular, with firstDelBlock = 0, all pages are removed.)
   *		Dirty pages are simply dropped, without bothering to write them
   *		out first.  Therefore, this is NOT rollback-able, and so should be
***************
*** 2924,2939 **** BufferGetLSNAtomic(Buffer buffer)
   * --------------------------------------------------------------------
   */
  void
! DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
! 					   BlockNumber firstDelBlock)
  {
  	int			i;
  
  	/* If it's a local relation, it's localbuf.c's problem. */
  	if (RelFileNodeBackendIsTemp(rnode))
  	{
  		if (rnode.backend == MyBackendId)
! 			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
  		return;
  	}
  
--- 2924,2944 ----
   * --------------------------------------------------------------------
   */
  void
! DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
! 					   int nforks, BlockNumber *firstDelBlock)
  {
  	int			i;
+ 	int			j;
  
  	/* If it's a local relation, it's localbuf.c's problem. */
  	if (RelFileNodeBackendIsTemp(rnode))
  	{
  		if (rnode.backend == MyBackendId)
! 		{
! 			for (j = 0; j < nforks; j++)
! 				DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
! 											firstDelBlock[j]);
! 		}
  		return;
  	}
  
***************
*** 2962,2972 **** DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  			continue;
  
  		buf_state = LockBufHdr(bufHdr);
! 		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
! 			bufHdr->tag.forkNum == forkNum &&
! 			bufHdr->tag.blockNum >= firstDelBlock)
! 			InvalidateBuffer(bufHdr);	/* releases spinlock */
! 		else
  			UnlockBufHdr(bufHdr, buf_state);
  	}
  }
--- 2967,2984 ----
  			continue;
  
  		buf_state = LockBufHdr(bufHdr);
! 
! 		for (j = 0; j < nforks; j++)
! 		{
! 			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
! 				bufHdr->tag.forkNum == forkNum[j] &&
! 				bufHdr->tag.blockNum >= firstDelBlock[j])
! 			{
! 				InvalidateBuffer(bufHdr); /* releases spinlock */
! 				break;
! 			}
! 		}
! 		if (j >= nforks)
  			UnlockBufHdr(bufHdr, buf_state);
  	}
  }
diff --git a/src/backend/storage/freespaceindex 2383094cfd..b7a048c96c 100644
*** a/src/backend/storage/freespace/freespace.c
--- b/src/backend/storage/freespace/freespace.c
***************
*** 247,262 **** GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
  }
  
  /*
!  * FreeSpaceMapTruncateRel - adjust for truncation of a relation.
!  *
!  * The caller must hold AccessExclusiveLock on the relation, to ensure that
!  * other backends receive the smgr invalidation event that this function sends
!  * before they access the FSM again.
   *
   * nblocks is the new size of the heap.
   */
! void
! FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
  {
  	BlockNumber new_nfsmblocks;
  	FSMAddress	first_removed_address;
--- 247,264 ----
  }
  
  /*
!  * FreeSpaceMapPrepareTruncateRel - prepare for truncation of a relation.
   *
   * nblocks is the new size of the heap.
+  *
+  * Return the number of blocks of new FSM after it's truncated.
+  * If it's InvalidBlockNumber, there is nothing to truncate;
+  * otherwise the caller is responsible for calling smgrtruncate()
+  * to truncate the FSM pages, and FreeSpaceMapVacuumRange()
+  * to update upper-level pages in the FSM.
   */
! BlockNumber
! FreeSpaceMapPrepareTruncateRel(Relation rel, BlockNumber nblocks)
  {
  	BlockNumber new_nfsmblocks;
  	FSMAddress	first_removed_address;
***************
*** 270,276 **** FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
  	 * truncate.
  	 */
  	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
! 		return;
  
  	/* Get the location in the FSM of the first removed heap block */
  	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
--- 272,278 ----
  	 * truncate.
  	 */
  	if (!smgrexists(rel->rd_smgr, FSM_FORKNUM))
! 		return InvalidBlockNumber;
  
  	/* Get the location in the FSM of the first removed heap block */
  	first_removed_address = fsm_get_location(nblocks, &first_removed_slot);
***************
*** 285,291 **** FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
  	{
  		buf = fsm_readbuf(rel, first_removed_address, false);
  		if (!BufferIsValid(buf))
! 			return;				/* nothing to do; the FSM was already smaller */
  		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
  
  		/* NO EREPORT(ERROR) from here till changes are logged */
--- 287,293 ----
  	{
  		buf = fsm_readbuf(rel, first_removed_address, false);
  		if (!BufferIsValid(buf))
! 			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
  		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
  
  		/* NO EREPORT(ERROR) from here till changes are logged */
***************
*** 315,342 **** FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
  	{
  		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
  		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
! 			return;				/* nothing to do; the FSM was already smaller */
  	}
  
! 	/* Truncate the unused FSM pages, and send smgr inval message */
! 	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
! 
! 	/*
! 	 * We might as well update the local smgr_fsm_nblocks setting.
! 	 * smgrtruncate sent an smgr cache inval message, which will cause other
! 	 * backends to invalidate their copy of smgr_fsm_nblocks, and this one too
! 	 * at the next command boundary.  But this ensures it isn't outright wrong
! 	 * until then.
! 	 */
! 	if (rel->rd_smgr)
! 		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
! 
! 	/*
! 	 * Update upper-level FSM pages to account for the truncation.  This is
! 	 * important because the just-truncated pages were likely marked as
! 	 * all-free, and would be preferentially selected.
! 	 */
! 	FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
  }
  
  /*
--- 317,326 ----
  	{
  		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
  		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
! 			return InvalidBlockNumber;	/* nothing to do; the FSM was already smaller */
  	}
  
! 	return new_nfsmblocks;
  }
  
  /*
diff --git a/src/backend/storage/smgr/smgr.c b/sindex 5b5a80e890..b50c69b438 100644
*** a/src/backend/storage/smgr/smgr.c
--- b/src/backend/storage/smgr/smgr.c
***************
*** 469,474 **** smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
--- 469,475 ----
  	pfree(rnodes);
  }
  
+ 
  /*
   *	smgrextend() -- Add a new block to a file.
   *
***************
*** 557,575 **** smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  }
  
  /*
!  *	smgrtruncate() -- Truncate supplied relation to the specified number
!  *					  of blocks
   *
   * The truncation is done immediately, so this can't be rolled back.
   */
  void
! smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
  {
  	/*
  	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
  	 * just drop them without bothering to write the contents.
  	 */
! 	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
  
  	/*
  	 * Send a shared-inval message to force other backends to close any smgr
--- 558,582 ----
  }
  
  /*
!  *	smgrtruncate() -- Truncate the given forks of supplied relation to
!  *					  each specified numbers of blocks
   *
   * The truncation is done immediately, so this can't be rolled back.
+  *
+  * The caller must hold AccessExclusiveLock on the relation, to ensure that
+  * other backends receive the smgr invalidation event that this function sends
+  * before they access any forks of the relation again.
   */
  void
! smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks)
  {
+ 	int		i;
+ 
  	/*
  	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
  	 * just drop them without bothering to write the contents.
  	 */
! 	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nforks, nblocks);
  
  	/*
  	 * Send a shared-inval message to force other backends to close any smgr
***************
*** 583,592 **** smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
  	 */
  	CacheInvalidateSmgr(reln->smgr_rnode);
  
! 	/*
! 	 * Do the truncation.
! 	 */
! 	smgrsw[reln->smgr_which].smgr_truncate(reln, forknum, nblocks);
  }
  
  /*
--- 590,613 ----
  	 */
  	CacheInvalidateSmgr(reln->smgr_rnode);
  
! 	/* Do the truncation */
! 	for (i = 0; i < nforks; i++)
! 	{
! 		smgrsw[reln->smgr_which].smgr_truncate(reln, forknum[i], nblocks[i]);
! 
! 		/*
! 		 * We might as well update the local smgr_fsm_nblocks and
! 		 * smgr_vm_nblocks settings. The smgr cache inval message that
! 		 * this function sent will cause other backends to invalidate
! 		 * their copies of smgr_fsm_nblocks and smgr_vm_nblocks,
! 		 * and these ones too at the next command boundary.
! 		 * But these ensure they aren't outright wrong until then.
! 		 */
! 		if (forknum[i] == FSM_FORKNUM)
! 			reln->smgr_fsm_nblocks = nblocks[i];
! 		if (forknum[i] == VISIBILITYMAP_FORKNUM)
! 			reln->smgr_vm_nblocks = nblocks[i];
! 	}
  }
  
  /*
diff --git a/src/include/access/visibiindex 2d8804351a..0532b04e34 100644
*** a/src/include/access/visibilitymap.h
--- b/src/include/access/visibilitymap.h
***************
*** 44,49 **** extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  							  uint8 flags);
  extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
  extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
! extern void visibilitymap_truncate(Relation rel, BlockNumber nheapblocks);
  
  #endif							/* VISIBILITYMAP_H */
--- 44,50 ----
  							  uint8 flags);
  extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
  extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
! extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
! 							  BlockNumber nheapblocks);
  
  #endif							/* VISIBILITYMAP_H */
diff --git a/src/include/storage/bufmgr.hindex 509f4b7ef1..17b97f7e38 100644
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
***************
*** 190,197 **** extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
  extern void FlushOneBuffer(Buffer buffer);
  extern void FlushRelationBuffers(Relation rel);
  extern void FlushDatabaseBuffers(Oid dbid);
! extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
! 								   ForkNumber forkNum, BlockNumber firstDelBlock);
  extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
  extern void DropDatabaseBuffers(Oid dbid);
  
--- 190,197 ----
  extern void FlushOneBuffer(Buffer buffer);
  extern void FlushRelationBuffers(Relation rel);
  extern void FlushDatabaseBuffers(Oid dbid);
! extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
! 								   int nforks, BlockNumber *firstDelBlock);
  extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
  extern void DropDatabaseBuffers(Oid dbid);
  
diff --git a/src/include/storage/frindex 8d8c465d7b..b75f6fe946 100644
*** a/src/include/storage/freespace.h
--- b/src/include/storage/freespace.h
***************
*** 30,36 **** extern void RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
  extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
  										Size spaceAvail);
  
! extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
  extern void FreeSpaceMapVacuum(Relation rel);
  extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
  									BlockNumber end);
--- 30,37 ----
  extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
  										Size spaceAvail);
  
! extern BlockNumber FreeSpaceMapPrepareTruncateRel(Relation rel,
! 												  BlockNumber nblocks);
  extern void FreeSpaceMapVacuum(Relation rel);
  extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
  									BlockNumber end);
diff --git a/src/include/storage/smgr.index 7393727a4b..1543d8d870 100644
*** a/src/include/storage/smgr.h
--- b/src/include/storage/smgr.h
***************
*** 101,108 **** extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
  extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
  						  BlockNumber blocknum, BlockNumber nblocks);
  extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
! extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
! 						 BlockNumber nblocks);
  extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
  extern void AtEOXact_SMgr(void);
  
--- 101,108 ----
  extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
  						  BlockNumber blocknum, BlockNumber nblocks);
  extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
! extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
! 						 int nforks, BlockNumber *nblocks);
  extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
  extern void AtEOXact_SMgr(void);
  
#34Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#32)
Re: [PATCH] Speedup truncates of relation forks

On Tue, Sep 17, 2019 at 2:25 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Sep 17, 2019 at 01:44:12AM +0000, Jamison, Kirk wrote:

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere. I left it in place just in case we want
it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?

Yes. Just adding my +1 to nuke the function.

Okay, so committed.

Regards,

--
Fujii Masao

#35Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Fujii Masao (#33)
RE: [PATCH] Speedup truncates of relation forks

On Wednesday, September 18, 2019 8:38 PM, Fujii Masao wrote:

On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk <k.jamison@jp.fujitsu.com>
wrote:

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera
<alvherre@2ndquadrant.com>
wrote:

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk
<k.jamison@jp.fujitsu.com>

wrote:

Please add a preliminary patch that removes the function.
Dead code is good, as long as it is gone. We can get it
pushed ahead of

the rest of this.

Alright. I've attached a separate patch removing the

smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead"
function for some reasons. So, in my opinion, it's better to
just enclose the function with #if NOT_USED and #endif, to keep
the function itself as it is, and then to start new discussion
on hackers about the removal of that separatedly from this patch.

I searched for anybody requesting to keep the function. I
couldn't find anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.pa.u
s

Yes. And I found Andres.
/messages/by-id/20180621174129.hogefyopje4xazn
u@al
ap3.anarazel.de

As committed, the smgrdounlinkfork case is actually dead code;
it's never called from anywhere. I left it in place just in
case we want it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?
Re-attaching the patch that removes the deadcode: smgrdounlinkfork().

---
I've also fixed Fujii-san's comments below in the latest attached speedup

truncate rel patch (v8).

Thanks for updating the patch!

+ block = visibilitymap_truncate_prepare(rel, 0); if
+ (BlockNumberIsValid(block))
{
- xl_smgr_truncate xlrec;
+ fork = VISIBILITYMAP_FORKNUM;
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);
+
+ if (RelationNeedsWAL(rel))
+ {
+ xl_smgr_truncate xlrec;

I don't think this fix is right. Originally, WAL is generated even in the
case where visibilitymap_truncate_prepare() returns InvalidBlockNumber. But
the patch unexpectedly changed the logic so that WAL is not generated in that
case.

+ if (fsm)
+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+ InvalidBlockNumber);

This code means that FreeSpaceMapVacuumRange() is called if FSM exists even
if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
This seems not right. Originally, FreeSpaceMapVacuumRange() is not called
in the case where InvalidBlockNumber is returned.

So I updated the patch based on yours and fixed the above issues.
Attached. Could you review this one? If there is no issue in that, I'm thinking
to commit that.

Oops. Thanks for the catch to correct my fix and revision of some descriptions.
I also noticed you reordered the truncation of forks, by which main fork will be
truncated first instead of FSM. I'm not sure if the order matters now given that
we're truncating the forks simultaneously, so I'm ok with that change.

Just one minor comment:
+ * Return the number of blocks of new FSM after it's truncated.

"after it's truncated" is quite confusing.
How about, "as a result of previous truncation" or just end the sentence after new FSM?

Thank you for committing the other patch as well!

Regards,
Kirk Jamison

#36Fujii Masao
masao.fujii@gmail.com
In reply to: Jamison, Kirk (#35)
Re: [PATCH] Speedup truncates of relation forks

On Thu, Sep 19, 2019 at 9:42 AM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

On Wednesday, September 18, 2019 8:38 PM, Fujii Masao wrote:

On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk <k.jamison@jp.fujitsu.com>
wrote:

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera
<alvherre@2ndquadrant.com>
wrote:

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk
<k.jamison@jp.fujitsu.com>

wrote:

Please add a preliminary patch that removes the function.
Dead code is good, as long as it is gone. We can get it
pushed ahead of

the rest of this.

Alright. I've attached a separate patch removing the

smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead"
function for some reasons. So, in my opinion, it's better to
just enclose the function with #if NOT_USED and #endif, to keep
the function itself as it is, and then to start new discussion
on hackers about the removal of that separatedly from this patch.

I searched for anybody requesting to keep the function. I
couldn't find anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.pa.u
s

Yes. And I found Andres.
/messages/by-id/20180621174129.hogefyopje4xazn
u@al
ap3.anarazel.de

As committed, the smgrdounlinkfork case is actually dead code;
it's never called from anywhere. I left it in place just in
case we want it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?
Re-attaching the patch that removes the deadcode: smgrdounlinkfork().

---
I've also fixed Fujii-san's comments below in the latest attached speedup

truncate rel patch (v8).

Thanks for updating the patch!

+ block = visibilitymap_truncate_prepare(rel, 0); if
+ (BlockNumberIsValid(block))
{
- xl_smgr_truncate xlrec;
+ fork = VISIBILITYMAP_FORKNUM;
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);
+
+ if (RelationNeedsWAL(rel))
+ {
+ xl_smgr_truncate xlrec;

I don't think this fix is right. Originally, WAL is generated even in the
case where visibilitymap_truncate_prepare() returns InvalidBlockNumber. But
the patch unexpectedly changed the logic so that WAL is not generated in that
case.

+ if (fsm)
+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+ InvalidBlockNumber);

This code means that FreeSpaceMapVacuumRange() is called if FSM exists even
if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
This seems not right. Originally, FreeSpaceMapVacuumRange() is not called
in the case where InvalidBlockNumber is returned.

So I updated the patch based on yours and fixed the above issues.
Attached. Could you review this one? If there is no issue in that, I'm thinking
to commit that.

Oops. Thanks for the catch to correct my fix and revision of some descriptions.
I also noticed you reordered the truncation of forks, by which main fork will be
truncated first instead of FSM. I'm not sure if the order matters now given that
we're truncating the forks simultaneously, so I'm ok with that change.

I changed that order so that DropRelFileNodeBuffers() can scan shared_buffers
more efficiently. Usually the number of buffers for MAIN fork is larger than
the others, in shared_buffers. So it's better to compare MAIN fork first for
performance, during full scan of shared_buffers.

Just one minor comment:
+ * Return the number of blocks of new FSM after it's truncated.

"after it's truncated" is quite confusing.
How about, "as a result of previous truncation" or just end the sentence after new FSM?

Thanks for the comment!
I adopted the latter and committed the patch. Thanks!

Regards,

--
Fujii Masao

#37Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Fujii Masao (#36)
RE: [PATCH] Speedup truncates of relation forks

On Tuesday, September 24, 2019 5:41 PM (GMT+9), Fujii Masao wrote:

On Thu, Sep 19, 2019 at 9:42 AM Jamison, Kirk <k.jamison@jp.fujitsu.com>
wrote:

On Wednesday, September 18, 2019 8:38 PM, Fujii Masao wrote:

On Tue, Sep 17, 2019 at 10:44 AM Jamison, Kirk
<k.jamison@jp.fujitsu.com>
wrote:

On Friday, September 13, 2019 10:06 PM (GMT+9), Fujii Masao wrote:

On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera
<alvherre@2ndquadrant.com>
wrote:

On 2019-Sep-13, Fujii Masao wrote:

On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk
<k.jamison@jp.fujitsu.com>

wrote:

Please add a preliminary patch that removes the function.
Dead code is good, as long as it is gone. We can get it
pushed ahead of

the rest of this.

Alright. I've attached a separate patch removing the

smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead"
function for some reasons. So, in my opinion, it's better to
just enclose the function with #if NOT_USED and #endif, to
keep the function itself as it is, and then to start new
discussion on hackers about the removal of that separatedly from

this patch.

I searched for anybody requesting to keep the function. I
couldn't find anything. Tom said in 2012:
/messages/by-id/1471.1339106082@sss.pgh.
pa.u
s

Yes. And I found Andres.
/messages/by-id/20180621174129.hogefyopje4
xazn
u@al
ap3.anarazel.de

As committed, the smgrdounlinkfork case is actually dead
code; it's never called from anywhere. I left it in place
just in case we want it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

+1

The consensus is we remove it, right?
Re-attaching the patch that removes the deadcode: smgrdounlinkfork().

---
I've also fixed Fujii-san's comments below in the latest attached
speedup

truncate rel patch (v8).

Thanks for updating the patch!

+ block = visibilitymap_truncate_prepare(rel, 0); if
+ (BlockNumberIsValid(block))
{
- xl_smgr_truncate xlrec;
+ fork = VISIBILITYMAP_FORKNUM;
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);
+
+ if (RelationNeedsWAL(rel))
+ {
+ xl_smgr_truncate xlrec;

I don't think this fix is right. Originally, WAL is generated even
in the case where visibilitymap_truncate_prepare() returns
InvalidBlockNumber. But the patch unexpectedly changed the logic so
that WAL is not generated in that case.

+ if (fsm)
+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks,
+ InvalidBlockNumber);

This code means that FreeSpaceMapVacuumRange() is called if FSM
exists even if FreeSpaceMapLocateBlock() returns InvalidBlockNumber.
This seems not right. Originally, FreeSpaceMapVacuumRange() is not
called in the case where InvalidBlockNumber is returned.

So I updated the patch based on yours and fixed the above issues.
Attached. Could you review this one? If there is no issue in that,
I'm thinking to commit that.

Oops. Thanks for the catch to correct my fix and revision of some

descriptions.

I also noticed you reordered the truncation of forks, by which main
fork will be truncated first instead of FSM. I'm not sure if the order
matters now given that we're truncating the forks simultaneously, so I'm

ok with that change.

I changed that order so that DropRelFileNodeBuffers() can scan shared_buffers
more efficiently. Usually the number of buffers for MAIN fork is larger than
the others, in shared_buffers. So it's better to compare MAIN fork first for
performance, during full scan of shared_buffers.

Just one minor comment:
+ * Return the number of blocks of new FSM after it's truncated.

"after it's truncated" is quite confusing.
How about, "as a result of previous truncation" or just end the sentence

after new FSM?

Thanks for the comment!
I adopted the latter and committed the patch. Thanks!

Thank you very much Fujii-san for taking time to review
as well as for committing this patch!

Regards,
Kirk Jamison