Boundary value check in lazy_tid_reaped()

Started by Masahiko Sawada, 19 messages
#1 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
1 attachment(s)

Hi all,

In the current lazy vacuum implementation, some index AMs such as
btree indexes call lazy_tid_reaped() for each index tuple during
ambulkdelete to check whether the index tuple points to a (collected)
garbage tuple. In that function we simply call bsearch(3), but we
should be able to know the result without bsearch(3) if the index
tuple points to a heap tuple that is outside the range of the
collected garbage tuples. I've checked whether bsearch(3) does such a
boundary value check: looking at the bsearch(3) implementation in
FreeBSD libc[1], it does not, and the same is true for glibc.

So my proposal is to add a boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated in a narrow
range.

I've done three performance tests. maintenance_work_mem is set to
64MB, with which we can collect at most about 11 million tuples, and
the table size is 10GB. The table has one btree index. Here is the
average index vacuum (ambulkdelete) execution time over three trials:

* Test case 1

I made all pages dirty with 15 million garbage tuples to invoke index
vacuum twice.

HEAD: 164.7 s
Patched: 138.81 s

* Test case 2

I made all pages dirty with 10 million garbage tuples to invoke index
vacuum only once, checking the performance degradation.

HEAD: 127.8 s
Patched: 128.25 s

* Test case 3

I made a certain range of the table dirty with 1 million garbage tuples.

HEAD: 31.07 s
Patched: 9.41 s

I thought we could have a generic function wrapping bsearch(3) that
does the boundary value check and then calls bsearch(3), so that we
can use it in other similar places as well. But the attached patch
doesn't do that, as I'd like to hear opinions on the proposal first.
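
For illustration, a minimal sketch of such a wrapper (the name
bsearch_with_bound_check is made up here; this is not from the
attached patch):

#include <stdlib.h>

/*
 * Sketch only: reject keys outside the sorted array's range before
 * falling back to bsearch(3).  Assumes 'base' is sorted ascending.
 */
static void *
bsearch_with_bound_check(const void *key, const void *base, size_t nmemb,
						 size_t size,
						 int (*compar) (const void *, const void *))
{
	if (nmemb == 0)
		return NULL;

	/* Out of range: cheaper than letting bsearch(3) walk the array */
	if (compar(key, base) < 0 ||
		compar(key, (const char *) base + (nmemb - 1) * size) > 0)
		return NULL;

	return bsearch(key, base, nmemb, size, compar);
}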

Feedback is very welcome.

Regards,

[1]: https://svnweb.freebsd.org/base/head/lib/libc/stdlib/bsearch.c?revision=326025&view=markup

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

vacuum_boundary_check.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0da444af0..981789b988 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -167,6 +167,15 @@ typedef struct LVDeadTuples
 {
 	int			max_tuples;		/* # slots allocated in array */
 	int			num_tuples;		/* current # of entries */
+
+	/*
+	 * The boundaries of the block numbers of recorded dead tuples: every
+	 * blk in the itemptrs array satisfies min_blkno <= blk < max_blkno.
+	 * These values are InvalidBlockNumber if not set yet.
+	 */
+	BlockNumber	min_blkno;
+	BlockNumber	max_blkno;
+
 	/* List of TIDs of tuples we intend to delete */
 	/* NB: this list is ordered by TID address */
 	ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];	/* array of
@@ -1062,6 +1071,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 				vmbuffer = InvalidBuffer;
 			}
 
+			/* Update the upper bound */
+			vacrelstats->dead_tuples->max_blkno = blkno;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 									vacrelstats, lps, nindexes);
@@ -1083,6 +1095,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
 			next_fsm_block_to_vacuum = blkno;
 
+			/* Reset both boundaries for next heap scan */
+			vacrelstats->dead_tuples->min_blkno = InvalidBlockNumber;
+			vacrelstats->dead_tuples->max_blkno = InvalidBlockNumber;
+
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -1712,6 +1728,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	/* XXX put a threshold on min number of tuples here? */
 	if (dead_tuples->num_tuples > 0)
 	{
+		/* Update the upper bound */
+		vacrelstats->dead_tuples->max_blkno = blkno;
+		Assert(vacrelstats->dead_tuples->max_blkno == nblocks);
+
 		/* Work on all the indexes, and then the heap */
 		lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
 								lps, nindexes);
@@ -1797,6 +1817,8 @@ lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 {
 	Assert(!IsParallelWorker());
 	Assert(nindexes > 0);
+	Assert(BlockNumberIsValid(vacrelstats->dead_tuples->min_blkno) &&
+			BlockNumberIsValid(vacrelstats->dead_tuples->max_blkno));
 
 	/* Log cleanup info before we touch indexes */
 	vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -2907,6 +2929,8 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 	dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
 	dead_tuples->num_tuples = 0;
 	dead_tuples->max_tuples = (int) maxtuples;
+	dead_tuples->min_blkno = InvalidBlockNumber;
+	dead_tuples->max_blkno = InvalidBlockNumber;
 
 	vacrelstats->dead_tuples = dead_tuples;
 }
@@ -2928,6 +2952,10 @@ lazy_record_dead_tuple(LVDeadTuples *dead_tuples, ItemPointer itemptr)
 		dead_tuples->num_tuples++;
 		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
 									 dead_tuples->num_tuples);
+
+		/* Remember the lower bound */
+		if (!BlockNumberIsValid(dead_tuples->min_blkno))
+			dead_tuples->min_blkno = ItemPointerGetBlockNumberNoCheck(itemptr);
 	}
 }
 
@@ -2942,8 +2970,13 @@ static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
 	LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
+	BlockNumber	blkno = ItemPointerGetBlockNumberNoCheck(itemptr);
 	ItemPointer res;
 
+	/* Boundary value check */
+	if (blkno < dead_tuples->min_blkno || blkno >= dead_tuples->max_blkno)
+		return false;
+
 	res = (ItemPointer) bsearch((void *) itemptr,
 								(void *) dead_tuples->itemptrs,
 								dead_tuples->num_tuples,
#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Boundary value check in lazy_tid_reaped()

On Sun, Aug 30, 2020 at 11:08 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

So my proposal is to add a boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated in a narrow
range.

Makes sense if it's often out of range.

I thought we could have a generic function wrapping bsearch(3) that
does the boundary value check and then calls bsearch(3), so that we
can use it in other similar places as well. But the attached patch
doesn't do that, as I'd like to hear opinions on the proposal first.

I wonder if you would also see a speed-up with a bsearch() replacement
that is inlineable, so it can inline the comparator (instead of
calling it through a function pointer). I wonder if something more
like (lblk << 32 | loff) - (rblk << 32 | roff) would go faster than
the branchy comparator.

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#2)
Re: Boundary value check in lazy_tid_reaped()

On Tue, Sep 1, 2020 at 7:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sun, Aug 30, 2020 at 11:08 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

So my proposal is to add a boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated in a narrow
range.

Makes sense if it's often out of range.

... though I'm not sure why you need to add extra members to do it?

I thought we could have a generic function wrapping bsearch(3) that
does the boundary value check and then calls bsearch(3), so that we
can use it in other similar places as well. But the attached patch
doesn't do that, as I'd like to hear opinions on the proposal first.

I wonder if you would also see a speed-up with a bsearch() replacement
that is inlineable, so it can inline the comparator (instead of
calling it through a function pointer). I wonder if something more
like (lblk << 32 | loff) - (rblk << 32 | roff) would go faster than
the branchy comparator.

Erm, of course that expression won't work... it should be << 16, but
even then it would only work with a bsearch that uses int64_t
comparators, so I take that part back.

#4 Peter Geoghegan
pg@bowt.ie
In reply to: Thomas Munro (#2)
Re: Boundary value check in lazy_tid_reaped()

On Mon, Aug 31, 2020 at 12:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sun, Aug 30, 2020 at 11:08 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

So my proposal is to add boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated to a narrow
range.

Makes sense if it's often out of range.

I also think this is a good idea. But I also wonder if it goes far
enough. Only one or two dead TIDs in inconvenient heap pages can
completely defeat the optimization.

A related problem with the TID list is that it is very space
inefficient. It would be nice to fix that problem at some point. We
could make multiple index scans by VACUUM operations much rarer if we
tried. But how can we do all of this together?

I wonder if Roaring bitmaps would work well for this:

https://arxiv.org/abs/1709.07821

This data structure looks like it might work well in lazy_tid_reaped()
(for the TID array), because it offers effective bitmap compression
without compromising on binary search speed. Of course, you'd have to
encode TIDs as bitmaps to make use of the data structure, much like
tidbitmap.c. I imagine that the Roaring bitmap indirection would be
very CPU cache efficient in cases that are similar to the cases
Sawada-san is worried about, but a bit more complicated.

VACUUM would be able to make the summarizing information for the TID
bitmap resident in CPU cache. Only this well-cached summarizing
information (the top-level bitmap range indirection) will need to be
accessed by most individual calls to a Roaring-bitmap-aware
lazy_tid_reaped() that return false (i.e. calls that say "don't kill
this TID, it's alive"). These performance characteristics seem likely
when a certain amount of clustering of dead tuples takes place in the
heap. I bet having completely random positions for dead TIDs almost
never happens -- *some* clustering is practically certain to take
place, even without natural locality in the data (which is itself very
common). Think about how opportunistic pruning accumulates many
LP_DEAD items in heap pages.

There is a reference Roaring bitmap implementation in C, so it's
probably not that hard to experimentally determine how well it would
work:

https://github.com/RoaringBitmap/CRoaring

Lots of open source projects already use it, so there are no problems
with patents. Seems worth investigating. (I also wonder if we could
use it for clog.)

--
Peter Geoghegan

#5 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#4)
Re: Boundary value check in lazy_tid_reaped()

On Mon, Aug 31, 2020 at 1:56 PM Peter Geoghegan <pg@bowt.ie> wrote:

I wonder if Roaring bitmaps would work well for this:

https://arxiv.org/abs/1709.07821

Alternatively, perhaps we could use a negative cache of heap blocks
that have no tuples to kill at all. Maybe just a simple array whose
elements are BlockNumber pairs. Each pair represents a range of blocks
known to contain heap pages that *don't* have any TIDs for VACUUM to
kill. The array could be size limited to 8KB, allowing it to be
accessed in L1 cache throughout vacuuming. When the representation
cannot fit in 8KB, maybe we disable the negative cache for the rest of
the VACUUM operation.
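
A rough sketch of that idea (every name here is invented for
illustration):

#include "postgres.h"
#include "storage/block.h"

typedef struct BlockRange
{
	BlockNumber start;			/* first block in a run with no dead TIDs */
	BlockNumber end;			/* last block in that run */
} BlockRange;

#define NEGCACHE_MAX_RANGES (8192 / sizeof(BlockRange))		/* 1024 pairs */

typedef struct DeadTupleNegCache
{
	int			nranges;		/* -1 once the 8KB budget is exceeded */
	BlockRange	ranges[NEGCACHE_MAX_RANGES];	/* sorted by start block */
} DeadTupleNegCache;

/* true => this heap block is known to have nothing for VACUUM to kill */
static bool
negcache_block_is_clean(DeadTupleNegCache *cache, BlockNumber blkno)
{
	int			lo = 0;
	int			hi = cache->nranges - 1;	/* loop is skipped if disabled */

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (blkno < cache->ranges[mid].start)
			hi = mid - 1;
		else if (blkno > cache->ranges[mid].end)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}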

This is a bit like Masahiko's min_blkno + max_blkno idea, except: 1)
it's a negative cache, and 2) there are perhaps as many as 1024
min/max pairs -- not just one.

It's pretty clear that more than 90% of all calls to lazy_tid_reaped()
return false unless vacuum is running behind (false means "don't kill
this TID, it's alive"). But it could be much more than 90%. This may
be because autovacuum is configured to run more aggressively than the
default settings allow. But it may also happen when LP_DEAD killing in
indexes works well enough to do most of the cleanup needed in
practice. I bet pgbench finds that ~99% of all calls to
lazy_tid_reaped() that take place during index vacuuming find that the
TID doesn't need to be killed. So negative caching could really help.

--
Peter Geoghegan

#6 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Thomas Munro (#3)
Re: Boundary value check in lazy_tid_reaped()

On Tue, 1 Sep 2020 at 04:44, Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Sep 1, 2020 at 7:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sun, Aug 30, 2020 at 11:08 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

So my proposal is to add a boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated in a narrow
range.

Makes sense if it's often out of range.

... though I'm not sure why you need to add extra members to do it?

Indeed. We can use the first and last elements of the itemptrs array.

I thought we could have a generic function wrapping bsearch(3) that
does the boundary value check and then calls bsearch(3), so that we
can use it in other similar places as well. But the attached patch
doesn't do that, as I'd like to hear opinions on the proposal first.

I wonder if you would also see a speed-up with a bsearch() replacement
that is inlineable, so it can inline the comparator (instead of
calling it through a function pointer). I wonder if something more
like (lblk << 32 | loff) - (rblk << 32 | roff) would go faster than
the branchy comparator.

Erm, of course that expression won't work... it should be << 16, but
even then it would only work with a bsearch that uses int64_t
comparators, so I take that part back.

Yeah, it seems worth benchmarking the speed-up from inlining.
I'll do some performance tests with/without inlining on top of
the boundary value check.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7 Peter Geoghegan
pg@bowt.ie
In reply to: Masahiko Sawada (#6)
Re: Boundary value check in lazy_tid_reaped()

On Tue, Sep 8, 2020 at 1:23 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I wonder if you would also see a speed-up with a bsearch() replacement
that is inlineable, so it can inline the comparator (instead of
calling it through a function pointer). I wonder if something more
like (lblk << 32 | loff) - (rblk << 32 | roff) would go faster than
the branchy comparator.

Erm, of course that expression won't work... it should be << 16, but
even then it would only work with a bsearch that uses int64_t
comparators, so I take that part back.

Yeah, it seems worth benchmarking the speed-up from inlining.
I'll do some performance tests with/without inlining on top of
the boundary value check.

It sounds like Thomas was talking about something like
itemptr_encode() + itemptr_decode(). In case you didn't know, we
actually do something like this for the TID tuplesort used for CREATE
INDEX CONCURRENTLY.
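
(For the archives: those functions pack a TID into one 64-bit integer
so that TIDs compare as plain integers. A sketch consistent with them
follows, with names changed to mark it as a paraphrase; see
catalog/index.h for the real definitions.)

#include "postgres.h"
#include "storage/itemptr.h"

static inline int64
sketch_itemptr_encode(ItemPointer itemptr)
{
	/* block number in the upper bits, 16-bit offset in the low bits */
	return ((uint64) ItemPointerGetBlockNumber(itemptr) << 16) |
		(uint16) ItemPointerGetOffsetNumber(itemptr);
}

static inline void
sketch_itemptr_decode(ItemPointer itemptr, int64 encoded)
{
	ItemPointerSet(itemptr,
				   (BlockNumber) (encoded >> 16),
				   (OffsetNumber) (encoded & 0xffff));
}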

--
Peter Geoghegan

#8 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Peter Geoghegan (#5)
Re: Boundary value check in lazy_tid_reaped()

On Tue, 1 Sep 2020 at 08:01, Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Aug 31, 2020 at 1:56 PM Peter Geoghegan <pg@bowt.ie> wrote:

I wonder if Roaring bitmaps would work well for this:

https://arxiv.org/abs/1709.07821

Alternatively, perhaps we could use a negative cache of heap blocks
that have no tuples to kill at all. Maybe just a simple array whose
elements are BlockNumber pairs. Each pair represents a range of blocks
known to contain heap pages that *don't* have any TIDs for VACUUM to
kill. The array could be size limited to 8KB, allowing it to be
accessed in L1 cache throughout vacuuming. When the representation
cannot fit in 8KB, maybe we disable the negative cache for the rest of
the VACUUM operation.

This is a bit like Masahiko's min_blkno + max_blkno idea, except: 1)
it's a negative cache, and 2) there are perhaps as many as 1024
min/max pairs -- not just one.

Interesting idea. Maybe we need one more bsearch() for the min/max
pairs array when a TID should be killed?

It's pretty clear that more than 90% of all calls to lazy_tid_reaped()
return false unless vacuum is running behind (false means "don't kill
this TID, it's alive"). But it could be much more than 90%. This may
be because autovacuum is configured to run more aggressively than the
default settings allow. But it may also happen when LP_DEAD killing in
indexes works well enough to do most of the cleanup needed in
practice. I bet pgbench finds that ~99% of all calls to
lazy_tid_reaped() that take place during index vacuuming find that the
TID doesn't need to be killed. So negative caching could really help.

I agree that lazy_tid_reaped() returns false for most checks in the
case where autovacuum is not running behind. But I'm a bit concerned
that this would instead penalize the case where vacuum needs to kill
many TIDs and uses the min/max filter, because it needs to call
bsearch() twice. I think this is the case where autovacuum is running
behind and the user wants to complete the vacuum as soon as possible.
Since I expect that checking the filter using the min/max pairs array
should take a very short time, it might not be a problem, but I think
it's worth benchmarking the overhead in the worst case. Or I guess we
could use a bloom filter for this purpose instead.
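
For reference, core already has a bloom filter implementation in
src/include/lib/bloomfilter.h. A rough sketch of how it might
pre-filter lazy_tid_reaped() (function names and sizing here are
invented):

#include "postgres.h"
#include "lib/bloomfilter.h"
#include "storage/itemptr.h"

static bloom_filter *dead_tid_filter;	/* hypothetical */

/* Build the filter once the dead-TID array for this round is complete. */
static void
build_dead_tid_filter(ItemPointerData *itemptrs, int num_tuples,
					  int filter_kb)
{
	dead_tid_filter = bloom_create(num_tuples, filter_kb, 0);
	for (int i = 0; i < num_tuples; i++)
		bloom_add_element(dead_tid_filter, (unsigned char *) &itemptrs[i],
						  sizeof(ItemPointerData));
}

/* "lacks" answers are definite, so a hit here can skip the bsearch(). */
static bool
tid_definitely_alive(ItemPointer itemptr)
{
	return bloom_lacks_element(dead_tid_filter, (unsigned char *) itemptr,
							   sizeof(ItemPointerData));
}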

Also, I'm concerned that 1024 min/max pairs might not be enough in
practice, especially for very large tables.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Peter Geoghegan (#4)
1 attachment(s)
Re: Boundary value check in lazy_tid_reaped()

On Tue, 1 Sep 2020 at 05:56, Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Aug 31, 2020 at 12:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sun, Aug 30, 2020 at 11:08 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

So my proposal is to add a boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated in a narrow
range.

Makes sense if it's often out of range.

I also think this is a good idea. But I also wonder if it goes far
enough. Only one or two dead TIDs in inconvenient heap pages can
completely defeat the optimization.

A related problem with the TID list is that it is very space
inefficient. It would be nice to fix that problem at some point. We
could make multiple index scans by VACUUM operations much rarer if we
tried. But how can we do all of this together?

I wonder if Roaring bitmaps would work well for this:

https://arxiv.org/abs/1709.07821

This data structure looks like it might work well in lazy_tid_reaped()
(for the TID array), because it offers effective bitmap compression
without compromising on binary search speed. Of course, you'd have to
encode TIDs as bitmaps to make use of the data structure, much like
tidbitmap.c. I imagine that the Roaring bitmap indirection would be
very CPU cache efficient in cases that are similar to the cases
Sawada-san is worried about, but a bit more complicated.

VACUUM would be able to make the summarizing information for the TID
bitmap resident in CPU cache. Only this well-cached summarizing
information (the top-level bitmap range indirection) will need to be
accessed by most individual calls to a Roaring-bitmap-aware
lazy_tid_reaped() that return false (i.e. calls that say "don't kill
this TID, it's alive"). These performance characteristics seem likely
when a certain amount of clustering of dead tuples takes place in the
heap. I bet having completely random positions for dead TIDs almost
never happens -- *some* clustering is practically certain to take
place, even without natural locality in the data (which is itself very
common). Think about how opportunistic pruning accumulates many
LP_DEAD items in heap pages.

There is a reference Roaring bitmap implementation in C, so it's
probably not that hard to experimentally determine how well it would
work:

https://github.com/RoaringBitmap/CRoaring

Lots of open source projects already use it, so there are no problems
with patents. Seems worth investigating. (I also wonder if we could
use it for clog.)

Very interesting.

Before getting into the CRoaring bitmap, I've done some experiments
with three different methods of recording dead tuple TIDs: array,
array-minmax, and integer set.

'array' is the current method lazy vacuum uses. It just adds dead
tuple TIDs to the simple array of ItemPointerData.
'array-minmax' is the same as the 'array' method except that it checks
the min and max boundaries when deleting index dead tuples (i.e., in
lazy_tid_reaped()).
'intset' uses the integer set (src/backend/lib/integerset.c) to record
dead tuple TIDs. It's an in-memory data structure that holds 64-bit
integers.
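
A minimal sketch of the 'intset' method, assuming TIDs are recorded in
ascending TID order as integerset.c requires (the function names here
are invented; the attached patch does the real wiring):

#include "postgres.h"
#include "catalog/index.h"		/* itemptr_encode() */
#include "lib/integerset.h"

static IntegerSet *dead_tids;

static void
intset_method_init(void)
{
	dead_tids = intset_create();
}

/* integerset.c requires members to be added in increasing order */
static void
intset_method_record(ItemPointer itemptr)
{
	intset_add_member(dead_tids, itemptr_encode(itemptr));
}

/* the lazy_tid_reaped() equivalent */
static bool
intset_method_reaped(ItemPointer itemptr)
{
	return intset_is_member(dead_tids, itemptr_encode(itemptr));
}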

In the experiments, I created a 4GB table and made 20% of the total
tuples dirty in all test cases, while changing the distribution of
where dead tuples exist within the table. The table has only one
index, therefore I didn't use parallel vacuum. I set
maintenance_work_mem high enough that the index scan runs only once.
Here is the result, showing the "total execution time in seconds (heap
scan time/index vacuum time/table vacuum time/memory usage in MB)".
The numbers are rounded down.

1. Updated the table evenly (every block has at least one dead tuple).
array : 22 (8/12/2/114)
array-minmax : 20 (8/11/2/114)
intset : 17 (7/8/1/19)

2. Updated the middle of the table.
array : 25 (6/19/0/114)
array-minmax : 17 (6/11/0/114)
intset : 17 (6/11/0/7)

3. Updated the tail of the table.
array : 25 (6/19/0/114)
array-minmax : 18 (6/11/0/114)
intset : 18 (6/11/0/5)

4. Updated both the beginning and the tail of the table.
array : 25 (6/19/0/114)
array-minmax : 23 (6/17/0/114)
intset : 21 (6/14/0/6)

The memory usage is calculated as (# of dead tuples *
sizeof(ItemPointerData)) in both the 'array' and 'array-minmax' cases,
although the actual amount of palloc'd memory is different, and by
intset_memory_usage() in the 'intset' case.

Using the integer set is very memory efficient (5MB vs. 114MB in the
base case) and there is no 1GB limitation. Looking at the execution
time, I had expected that using the integer set would be slower at
recording TIDs than using the simple array, but in this experiment
there is no big difference among the methods. Perhaps the result would
vary if vacuum needs to record many more dead tuple TIDs. From the
results, I can see a good improvement in the integer set case, which
is probably a good start, but if we want to use it for lazy vacuum we
will need to improve it so that it can be used on DSA for parallel
vacuum.

I've attached the patch I used for the experiment; it adds an
xx_vacuum GUC parameter that controls the method of recording TIDs.
Please note that it doesn't support parallel vacuum.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

vacuum_strategy_poc.patch (application/x-patch)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4f2f38168d..304dae0b25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -80,6 +80,9 @@
 #include "utils/pg_rusage.h"
 #include "utils/timestamp.h"
 
+#include "lib/integerset.h"
+#include "catalog/index.h"
+
 
 /*
  * Space/time tradeoff parameters: do these need to be user-tunable?
@@ -167,6 +170,8 @@ typedef struct LVDeadTuples
 {
 	int			max_tuples;		/* # slots allocated in array */
 	int			num_tuples;		/* current # of entries */
+	MemoryContext	ctx;
+	IntegerSet		*intset;
 	/* List of TIDs of tuples we intend to delete */
 	/* NB: this list is ordered by TID address */
 	ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];	/* array of
@@ -338,6 +343,7 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
+int xx_vacuum = VACUUM_ARRAY;
 
 /* non-export function prototypes */
 static void lazy_scan_heap(Relation onerel, VacuumParams *params,
@@ -405,6 +411,41 @@ static void update_vacuum_error_info(LVRelStats *errinfo, LVSavedErrInfo *saved_
 									 OffsetNumber offnum);
 static void restore_vacuum_error_info(LVRelStats *errinfo, const LVSavedErrInfo *saved_err_info);
 
+static inline  bool
+check_memory_limit(LVDeadTuples *dead_tuples)
+{
+	if (xx_vacuum == VACUUM_ARRAY || xx_vacuum == VACUUM_ARRAY_MINMAX)
+		return ((dead_tuples->max_tuples - dead_tuples->num_tuples) < MaxHeapTuplesPerPage) &&
+			dead_tuples->num_tuples > 0;
+	else
+		return intset_memory_usage(dead_tuples->intset) >= (maintenance_work_mem * 1000L);
+}
+
+static inline int
+get_num_dead_tuples(LVDeadTuples *dead_tuples)
+{
+	if (xx_vacuum == VACUUM_ARRAY || xx_vacuum == VACUUM_ARRAY_MINMAX)
+		return dead_tuples->num_tuples;
+	else
+		return intset_num_entries(dead_tuples->intset);
+}
+
+static void
+init_dead_tuple_intset(LVDeadTuples *dead_tuples)
+{
+	MemoryContext oldctx;
+
+	if (dead_tuples->ctx)
+		MemoryContextDelete(dead_tuples->ctx);
+
+	dead_tuples->ctx = GenerationContextCreate(CurrentMemoryContext,
+											   "dead tuple intset",
+											   16 * 1024);
+	oldctx = MemoryContextSwitchTo(dead_tuples->ctx);
+	dead_tuples->intset = intset_create();
+	MemoryContextSwitchTo(oldctx);
+}
+
 
 /*
  *	heap_vacuum_rel() -- perform VACUUM for one heap relation
@@ -1036,8 +1077,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((dead_tuples->max_tuples - dead_tuples->num_tuples) < MaxHeapTuplesPerPage &&
-			dead_tuples->num_tuples > 0)
+		if (check_memory_limit(vacrelstats->dead_tuples))
 		{
 			/*
 			 * Before beginning index vacuuming, we release any pin we may
@@ -1064,6 +1104,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			 * valid.
 			 */
 			dead_tuples->num_tuples = 0;
+			if (xx_vacuum == VACUUM_INTSET)
+				init_dead_tuple_intset(dead_tuples);
 
 			/*
 			 * Vacuum the Free Space Map to make newly-freed space visible on
@@ -1250,7 +1292,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = dead_tuples->num_tuples;
+		prev_dead_count = get_num_dead_tuples(dead_tuples);
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -1702,7 +1744,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (dead_tuples->num_tuples > 0)
+	if (get_num_dead_tuples(dead_tuples) > 0)
 	{
 		/* Work on all the indexes, and then the heap */
 		lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
@@ -1861,35 +1903,78 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->dead_tuples->num_tuples)
+
+	if (xx_vacuum == VACUUM_INTSET)
 	{
-		BlockNumber tblk;
-		Buffer		buf;
-		Page		page;
-		Size		freespace;
+		uint64		val;
 
-		vacuum_delay_point();
+		intset_begin_iterate(vacrelstats->dead_tuples->intset);
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
-		vacrelstats->blkno = tblk;
-		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
-								 vac_strategy);
-		if (!ConditionalLockBufferForCleanup(buf))
+		while (tupindex != -1 &&
+			   intset_iterate_next(vacrelstats->dead_tuples->intset, &val))
 		{
-			ReleaseBuffer(buf);
-			++tupindex;
-			continue;
+			BlockNumber tblk;
+			Buffer		buf;
+			Page		page;
+			Size		freespace;
+			ItemPointerData itemptr;
+
+			vacuum_delay_point();
+			itemptr_decode(&itemptr, val);
+			tblk = ItemPointerGetBlockNumber(&itemptr);
+			vacrelstats->blkno = tblk;
+			buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
+									 vac_strategy);
+			if (!ConditionalLockBufferForCleanup(buf))
+			{
+				ReleaseBuffer(buf);
+				++tupindex;
+				continue;
+			}
+			tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+										&vmbuffer);
+
+			/* Now that we've compacted the page, record its available space */
+			page = BufferGetPage(buf);
+			freespace = PageGetHeapFreeSpace(page);
+
+			UnlockReleaseBuffer(buf);
+			RecordPageWithFreeSpace(onerel, tblk, freespace);
+			npages++;
 		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
-									&vmbuffer);
+	}
+	else
+	{
+		while (tupindex < vacrelstats->dead_tuples->num_tuples)
+		{
+			BlockNumber tblk;
+			Buffer		buf;
+			Page		page;
+			Size		freespace;
 
-		/* Now that we've compacted the page, record its available space */
-		page = BufferGetPage(buf);
-		freespace = PageGetHeapFreeSpace(page);
+			vacuum_delay_point();
 
-		UnlockReleaseBuffer(buf);
-		RecordPageWithFreeSpace(onerel, tblk, freespace);
-		npages++;
+			tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
+			vacrelstats->blkno = tblk;
+			buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
+									 vac_strategy);
+			if (!ConditionalLockBufferForCleanup(buf))
+			{
+				ReleaseBuffer(buf);
+				++tupindex;
+				continue;
+			}
+			tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+										&vmbuffer);
+
+			/* Now that we've compacted the page, record its available space */
+			page = BufferGetPage(buf);
+			freespace = PageGetHeapFreeSpace(page);
+
+			UnlockReleaseBuffer(buf);
+			RecordPageWithFreeSpace(onerel, tblk, freespace);
+			npages++;
+		}
 	}
 
 	/* Clear the block number information */
@@ -1901,10 +1986,19 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		vmbuffer = InvalidBuffer;
 	}
 
+	ereport(elevel,
+			(errmsg("%s memory usage %lu",
+					xx_vacuum == VACUUM_INTSET ? "INTSET":
+					xx_vacuum == VACUUM_ARRAY_MINMAX ? "ARRAY_MINMAX":
+					"ARRAY",
+					xx_vacuum == VACUUM_INTSET ?
+					intset_memory_usage(vacrelstats->dead_tuples->intset)
+					: (sizeof(ItemPointerData) * vacrelstats->dead_tuples->num_tuples))));
 	ereport(elevel,
 			(errmsg("\"%s\": removed %d row versions in %d pages",
 					vacrelstats->relname,
-					tupindex, npages),
+					get_num_dead_tuples(vacrelstats->dead_tuples),
+					npages),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 
 	/* Revert to the previous phase information for error traceback */
@@ -1941,19 +2035,50 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < dead_tuples->num_tuples; tupindex++)
+	if (xx_vacuum == VACUUM_INTSET)
 	{
-		BlockNumber tblk;
-		OffsetNumber toff;
-		ItemId		itemid;
+		uint64		val;
+		bool		nextblk = false;
+
+		while (intset_iterate_next(vacrelstats->dead_tuples->intset, &val))
+		{
+			BlockNumber tblk;
+			OffsetNumber toff;
+			ItemId		itemid;
+			ItemPointerData itemptr;
+
+			itemptr_decode(&itemptr, val);
+			tblk = ItemPointerGetBlockNumber(&itemptr);
+			if (tblk != blkno)
+			{
+				nextblk = true;
+				break;				/* past end of tuples for this block */
+			}
+			toff = ItemPointerGetOffsetNumber(&itemptr);
+			itemid = PageGetItemId(page, toff);
+			ItemIdSetUnused(itemid);
+			unused[uncnt++] = toff;
+		}
+
+		if (!nextblk)
+			tupindex = -1;
+	}
+	else
+	{
+		for (; tupindex < dead_tuples->num_tuples; tupindex++)
+		{
+			BlockNumber tblk;
+			OffsetNumber toff;
+			ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
-		if (tblk != blkno)
-			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
-		itemid = PageGetItemId(page, toff);
-		ItemIdSetUnused(itemid);
-		unused[uncnt++] = toff;
+			tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
+			if (tblk != blkno)
+				break;				/* past end of tuples for this block */
+			toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+			itemid = PageGetItemId(page, toff);
+			ItemIdSetUnused(itemid);
+			unused[uncnt++] = toff;
+		}
 	}
 
 	PageRepairFragmentation(page);
@@ -2471,7 +2596,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
 	ereport(elevel,
 			(errmsg(msg,
 					vacrelstats->indname,
-					dead_tuples->num_tuples),
+					get_num_dead_tuples(dead_tuples)),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 
 	/* Revert to the previous phase information for error traceback */
@@ -2892,13 +3017,24 @@ static void
 lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 {
 	LVDeadTuples *dead_tuples = NULL;
-	long		maxtuples;
 
-	maxtuples = compute_max_dead_tuples(relblocks, vacrelstats->useindex);
+	if (xx_vacuum == VACUUM_INTSET)
+	{
+		dead_tuples = (LVDeadTuples *) palloc(sizeof(LVDeadTuples));
+		dead_tuples->num_tuples = 0;
+		dead_tuples->ctx = NULL;
+		init_dead_tuple_intset(dead_tuples);
+	}
+	else
+	{
+		long		maxtuples;
+
+		maxtuples = compute_max_dead_tuples(relblocks, vacrelstats->useindex);
 
-	dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
-	dead_tuples->num_tuples = 0;
-	dead_tuples->max_tuples = (int) maxtuples;
+		dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
+		dead_tuples->num_tuples = 0;
+		dead_tuples->max_tuples = (int) maxtuples;
+	}
 
 	vacrelstats->dead_tuples = dead_tuples;
 }
@@ -2909,17 +3045,29 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 static void
 lazy_record_dead_tuple(LVDeadTuples *dead_tuples, ItemPointer itemptr)
 {
-	/*
-	 * The array shouldn't overflow under normal behavior, but perhaps it
-	 * could if we are given a really small maintenance_work_mem. In that
-	 * case, just forget the last few tuples (we'll get 'em next time).
-	 */
-	if (dead_tuples->num_tuples < dead_tuples->max_tuples)
+	if (xx_vacuum == VACUUM_INTSET)
+	{
+		uint64 val = itemptr_encode(itemptr);
+
+		if (intset_memory_usage(dead_tuples->intset) >= (maintenance_work_mem * 1000L))
+			return;
+
+		intset_add_member(dead_tuples->intset, val);
+	}
+	else
 	{
-		dead_tuples->itemptrs[dead_tuples->num_tuples] = *itemptr;
-		dead_tuples->num_tuples++;
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 dead_tuples->num_tuples);
+		/*
+		 * The array shouldn't overflow under normal behavior, but perhaps it
+		 * could if we are given a really small maintenance_work_mem. In that
+		 * case, just forget the last few tuples (we'll get 'em next time).
+		 */
+		if (dead_tuples->num_tuples < dead_tuples->max_tuples)
+		{
+			dead_tuples->itemptrs[dead_tuples->num_tuples] = *itemptr;
+			dead_tuples->num_tuples++;
+			pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+										 dead_tuples->num_tuples);
+		}
 	}
 }
 
@@ -2934,15 +3082,40 @@ static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
 	LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
-	ItemPointer res;
 
-	res = (ItemPointer) bsearch((void *) itemptr,
+	if (xx_vacuum == VACUUM_INTSET)
+		return intset_is_member(dead_tuples->intset,
+								itemptr_encode(itemptr));
+ 	else if (xx_vacuum == VACUUM_ARRAY_MINMAX)
+	{
+		ItemPointer res;
+		BlockNumber minblk, maxblk, blk;
+
+		blk = ItemPointerGetBlockNumber(itemptr);
+		minblk = ItemPointerGetBlockNumber(&(dead_tuples->itemptrs[0]));
+		maxblk = ItemPointerGetBlockNumber(&(dead_tuples->itemptrs[dead_tuples->num_tuples - 1]));
+
+		if (blk < minblk || blk > maxblk)
+			return false;
+
+		res = (ItemPointer) bsearch((void *) itemptr,
 								(void *) dead_tuples->itemptrs,
 								dead_tuples->num_tuples,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
+		return (res != NULL);
 
-	return (res != NULL);
+	}
+	else
+	{
+		ItemPointer res;
+		res = (ItemPointer) bsearch((void *) itemptr,
+								(void *) dead_tuples->itemptrs,
+								dead_tuples->num_tuples,
+								sizeof(ItemPointerData),
+								vac_cmp_itemptr);
+		return (res != NULL);
+	}
 }
 
 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..8b3036a32a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -482,6 +482,13 @@ const struct config_enum_entry ssl_protocol_versions_info[] = {
 	{NULL, 0, false}
 };
 
+const struct config_enum_entry xx_vacuum_options[] = {
+	{"array", VACUUM_ARRAY, false},
+	{"intset", VACUUM_INTSET, false},
+	{"array-minmax", VACUUM_ARRAY_MINMAX, false},
+	{NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 2),
 				 "array length mismatch");
 
@@ -4457,6 +4464,15 @@ static struct config_string ConfigureNamesString[] =
 
 static struct config_enum ConfigureNamesEnum[] =
 {
+	{
+		{"xx_vacuum", PGC_USERSET, RESOURCES,
+			gettext_noop("vacuum strategy."),
+		},
+		&xx_vacuum,
+		VACUUM_ARRAY, xx_vacuum_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"backslash_quote", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
 			gettext_noop("Sets whether \"\\'\" is allowed in string literals."),
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d9475c9989..ca50589a2b 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -23,6 +23,13 @@
 #include "storage/lock.h"
 #include "utils/relcache.h"
 
+enum xx_vacuum_options
+{
+	VACUUM_ARRAY = 0,
+	VACUUM_INTSET,
+	VACUUM_ARRAY_MINMAX
+};
+
 /*
  * Flags for amparallelvacuumoptions to control the participation of bulkdelete
  * and vacuumcleanup in parallel vacuum.
@@ -243,6 +250,7 @@ extern pg_atomic_uint32 *VacuumSharedCostBalance;
 extern pg_atomic_uint32 *VacuumActiveNWorkers;
 extern int	VacuumCostBalanceLocal;
 
+extern int xx_vacuum;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel);
#10 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#9)
Re: Boundary value check in lazy_tid_reaped()

On 2020-10-30 02:43, Masahiko Sawada wrote:

Using the integer set is very memory efficient (5MB vs. 114MB in the
base case) and there is no 1GB limitation. Looking at the execution
time, I had expected that using the integer set would be slower at
recording TIDs than using the simple array, but in this experiment
there is no big difference among the methods. Perhaps the result would
vary if vacuum needs to record many more dead tuple TIDs. From the
results, I can see a good improvement in the integer set case, which
is probably a good start, but if we want to use it for lazy vacuum we
will need to improve it so that it can be used on DSA for parallel
vacuum.

I've attached the patch I used for the experiment; it adds an
xx_vacuum GUC parameter that controls the method of recording TIDs.
Please note that it doesn't support parallel vacuum.

How do you want to proceed here? The approach of writing a wrapper
for bsearch with a bound check sounded like a good start. All the
other ideas discussed here seem like larger projects and would
probably be out of scope for this commit fest.

#11 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#10)
1 attachment(s)
Re: Boundary value check in lazy_tid_reaped()

On Wed, Jan 20, 2021 at 3:50 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 2020-10-30 02:43, Masahiko Sawada wrote:

Using the integer set is very memory efficient (5MB vs. 114MB in the
base case) and there is no 1GB limitation. Looking at the execution
time, I had expected that using the integer set would be slower at
recording TIDs than using the simple array, but in this experiment
there is no big difference among the methods. Perhaps the result would
vary if vacuum needs to record many more dead tuple TIDs. From the
results, I can see a good improvement in the integer set case, which
is probably a good start, but if we want to use it for lazy vacuum we
will need to improve it so that it can be used on DSA for parallel
vacuum.

I've attached the patch I used for the experiment; it adds an
xx_vacuum GUC parameter that controls the method of recording TIDs.
Please note that it doesn't support parallel vacuum.

How do you want to proceed here? The approach of writing a wrapper
for bsearch with a bound check sounded like a good start. All the
other ideas discussed here seem like larger projects and would
probably be out of scope for this commit fest.

Agreed. bsearch with a bound check showed a reasonable improvement in
my evaluation in terms of performance. Regarding memory efficiency, we
can experiment with other methods later.

I've attached the patch that adds a bound check for encoded item
pointers before the bsearch() in lazy_tid_reaped() and inlines the
function.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

bound_check_lazy_vacuum.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3d2265fad..93dcb9eca0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -61,6 +61,7 @@
 #include "access/visibilitymap.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "catalog/index.h"
 #include "catalog/storage.h"
 #include "commands/dbcommands.h"
 #include "commands/progress.h"
@@ -365,7 +366,7 @@ static BlockNumber count_nondeletable_pages(Relation onerel,
 static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVDeadTuples *dead_tuples,
 								   ItemPointer itemptr);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
+static inline bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 LVRelStats *vacrelstats,
@@ -2917,12 +2918,26 @@ lazy_record_dead_tuple(LVDeadTuples *dead_tuples, ItemPointer itemptr)
  *
  *		Assumes dead_tuples array is in sorted order.
  */
-static bool
+static inline bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
 	LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
+	int64 litem, ritem, item;
 	ItemPointer res;
 
+	litem = itemptr_encode(&(dead_tuples->itemptrs[0]));
+	ritem = itemptr_encode(&(dead_tuples->itemptrs[dead_tuples->num_tuples - 1]));
+	item = itemptr_encode(itemptr);
+
+	/*
+	 * Bound check. Since this function is called for every index tuple,
+	 * it needs to be fast. Doing the bound check before bsearch() avoids
+	 * the extra cost of bsearch(), especially if dead tuples on the heap
+	 * are concentrated in a certain range.
+	 */
+	if (item < litem || item > ritem)
+		return false;
+
 	res = (ItemPointer) bsearch((void *) itemptr,
 								(void *) dead_tuples->itemptrs,
 								dead_tuples->num_tuples,
#12 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#11)
Re: Boundary value check in lazy_tid_reaped()

On 21.01.21 14:11, Masahiko Sawada wrote:

Agreed. bsearch with a bound check showed a reasonable improvement in
my evaluation in terms of performance. Regarding memory efficiency, we
can experiment with other methods later.

I've attached the patch that adds a bound check for encoded item
pointers before the bsearch() in lazy_tid_reaped() and inlines the
function.

Do you have any data showing the effect of inlining lazy_tid_reaped()?
I mean, it probably won't hurt, but it wasn't part of the original patch
that you tested, so I wonder whether it has any noticeable effect.

#13 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#12)
Re: Boundary value check in lazy_tid_reaped()

On Mon, Mar 8, 2021 at 7:16 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 21.01.21 14:11, Masahiko Sawada wrote:

Agreed. bsearch with a bound check showed a reasonable improvement in
my evaluation in terms of performance. Regarding memory efficiency, we
can experiment with other methods later.

I've attached the patch that adds a bound check for encoded item
pointers before the bsearch() in lazy_tid_reaped() and inlines the
function.

Do you have any data showing the effect of inlining lazy_tid_reaped()?
I mean, it probably won't hurt, but it wasn't part of the original patch
that you tested, so I wonder whether it has any noticeable effect.

I've done some benchmarks while changing the distribution of where
dead tuples exist within the table. The table size is 4GB and 20% of
the total tuples are dirty. Here are the results of index vacuum
execution time, in seconds:

1. Updated the table evenly (every block has at least one dead tuple).
master : 8.15
inlining : 4.84
not-inlining : 5.01

2. Updated the middle of the table.
master : 8.71
inlining : 3.51
not-inlining : 3.58

3. Updated both the beginning and the tail of the table.
master : 8.44
inlining : 3.46
not-inlining : 3.50

There is no noticeable effect of inlining lazy_tid_reaped(). So it
would be better to not do that.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#14 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#13)
1 attachment(s)
Re: Boundary value check in lazy_tid_reaped()

On Tue, Mar 9, 2021 at 9:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Mar 8, 2021 at 7:16 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 21.01.21 14:11, Masahiko Sawada wrote:

Agreed. bsearch with a bound check showed a reasonable improvement in
my evaluation in terms of performance. Regarding memory efficiency, we
can experiment with other methods later.

I've attached the patch that adds a bound check for encoded item
pointers before the bsearch() in lazy_tid_reaped() and inlines the
function.

Do you have any data showing the effect of inlining lazy_tid_reaped()?
I mean, it probably won't hurt, but it wasn't part of the original patch
that you tested, so I wonder whether it has any noticeable effect.

I've done some benchmarks while changing the distribution of where
dead tuples exist within the table. The table size is 4GB and 20% of
the total tuples are dirty. Here are the results of index vacuum
execution time, in seconds:

1. Updated the table evenly (every block has at least one dead tuple).
master : 8.15
inlining : 4.84
not-inlining : 5.01

2. Updated the middle of the table.
master : 8.71
inlining : 3.51
not-inlining : 3.58

3. Updated both the beginning and the tail of the table.
master : 8.44
inlining : 3.46
not-inlining : 3.50

There is no noticeable effect of inlining lazy_tid_reaped(). So it
would be better to not do that.

Attached the patch that doesn't inline lazy_tid_reaped().

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

bound_check_lazy_vacuum_noinline.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d8f847b0e6..ff6aedfcb8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -61,6 +61,7 @@
 #include "access/visibilitymap.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "catalog/index.h"
 #include "catalog/storage.h"
 #include "commands/dbcommands.h"
 #include "commands/progress.h"
@@ -2923,8 +2924,22 @@ static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
 	LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
+	int64 litem, ritem, item;
 	ItemPointer res;
 
+	litem = itemptr_encode(&(dead_tuples->itemptrs[0]));
+	ritem = itemptr_encode(&(dead_tuples->itemptrs[dead_tuples->num_tuples - 1]));
+	item = itemptr_encode(itemptr);
+
+	/*
+	 * Bound check. Since this function is called for every index tuple,
+	 * it needs to be fast. Doing the bound check before bsearch() avoids
+	 * the extra cost of bsearch(), especially if dead tuples on the heap
+	 * are concentrated in a certain range.
+	 */
+	if (item < litem || item > ritem)
+		return false;
+
 	res = (ItemPointer) bsearch((void *) itemptr,
 								(void *) dead_tuples->itemptrs,
 								dead_tuples->num_tuples,
#15 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#14)
Re: Boundary value check in lazy_tid_reaped()

On 10.03.21 02:29, Masahiko Sawada wrote:

There is no noticeable effect of inlining lazy_tid_reaped(). So it
would be better to not do that.

Attached the patch that doesn't inline lazy_tid_reaped().

committed

#16 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#15)
Re: Boundary value check in lazy_tid_reaped()

On Wed, Mar 10, 2021 at 11:53 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 10.03.21 02:29, Masahiko Sawada wrote:

There is no noticeable effect of inlining lazy_tid_reaped(). So it
would be better to not do that.

Attached the patch that doesn't inline lazy_tid_reaped().

committed

Thank you!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#17 Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#7)
Re: Boundary value check in lazy_tid_reaped()

On Wed, Sep 9, 2020 at 7:33 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Sep 8, 2020 at 1:23 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I wonder if you would also see a speed-up with a bsearch() replacement
that is inlineable, so it can inline the comparator (instead of
calling it through a function pointer). I wonder if something more
like (lblk << 32 | loff) - (rblk << 32 | roff) would go faster than
the branchy comparator.

Erm, of course that expression won't work... it should be << 16, but
even then it would only work with a bsearch that uses int64_t
comparators, so I take that part back.

Yeah, it seems worth benchmarking the speed-up from inlining.
I'll do some performance tests with/without inlining on top of
the boundary value check.

It sounds like Thomas was talking about something like
itemptr_encode() + itemptr_decode(). In case you didn't know, we
actually do something like this for the TID tuplesort used for CREATE
INDEX CONCURRENTLY.

BTW I got around to trying this idea out for a specialised
bsearch_itemptr() using a wide comparator, over here:

/messages/by-id/CA+hUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg@mail.gmail.com

#18 Peter Geoghegan
pg@bowt.ie
In reply to: Thomas Munro (#17)
Re: Boundary value check in lazy_tid_reaped()

On Sun, Mar 14, 2021 at 4:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:

BTW I got around to trying this idea out for a specialised
bsearch_itemptr() using a wide comparator, over here:

Cool!

I have another thing that should be considered when we revisit this
area in the future: maybe we should structure the binary search to
look up multiple TIDs at once -- probably all of the TIDs from a given
leaf page, taken together.

There is now an nbtree function used for tuple deletion (all tuple
deletion, not just bottom-up deletion) that works like this:
_bt_delitems_delete_check(). I suspect that it would make sense to
generalize it to do the same thing for regular VACUUM. Perhaps this
idea would have to be combined with other techniques to show a real
benefit. It would probably be necessary to sort the TIDs first (just
like index_delete_sort() does for the _bt_delitems_delete_check() code
today), but that's probably no big deal.

It is typical to have 200 - 400 TIDs on an nbtree leaf page without
using deduplication. And with deduplication enabled you can have as
many as 1300 TIDs on a single 8KiB nbtree leaf page. It's easy to
imagine something like GCC's __builtin_prefetch() (or maybe just more
predictable access patterns in our "batch binary search") making
everything much faster through batching. This will also naturally make
btvacuumpage() much less "branchy", since of course it will no longer
need to process the page one TID at a time -- that helps too.
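
To sketch the idea (all names invented; both inputs assumed to be in
sorted TID order, as index_delete_sort() would arrange):

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * Batch binary search sketch: look up a whole leaf page's worth of
 * sorted TIDs against the sorted dead-TID array in one pass.  Because
 * both inputs are sorted, each probe reuses the previous probe's
 * lower bound.  dead[i] is set to true if index_tids[i] was found.
 */
static void
batch_tid_reaped(ItemPointerData *index_tids, bool *dead, int ntids,
				 ItemPointerData *dead_tids, int ndead)
{
	int			lo = 0;

	for (int i = 0; i < ntids; i++)
	{
		int			left = lo;
		int			right = ndead - 1;

		dead[i] = false;
		while (left <= right)
		{
			int			mid = left + (right - left) / 2;
			int32		cmp = ItemPointerCompare(&index_tids[i],
												 &dead_tids[mid]);

			if (cmp < 0)
				right = mid - 1;
			else if (cmp > 0)
				left = mid + 1;
			else
			{
				dead[i] = true;
				break;
			}
		}
		lo = left;				/* later (larger) TIDs cannot be earlier */
	}
}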

--
Peter Geoghegan

#19 Hannu Krosing
hannuk@google.com
In reply to: Peter Geoghegan (#18)
Re: Boundary value check in lazy_tid_reaped()

We could also go parallel in another direction - I have been mulling
over writing a "vectorized" bsearch which would use AVX2, where you
look up 64 (or more likely 32, so the TIDs also fit in a 256-bit
vector) TIDs at a time.

The trickiest part is that the search can complete at a different
iteration for each value.

Of course it is possible that this has very bad RAM access behaviour
and is no win at all even if it otherwise works.

--
Hannu Krosing
