initscan for MVCC snapshot
Hi:
I see initscan calls RelationGetwNumberOfBlocks every time and rescan calls
initscan as well. In my system, RelationGetNumberOfBlocks is expensive
(the reason
doesn't deserve a talk.. ), so in a nest loop + Bitmap heap scan case,
the
impact will be huge. The comments of initscan are below.
/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/
I still do not fully understand the comments. Looks we only need to call
multi times for non-MVCC snapshot, IIUC, does the following change
reasonable?
===
diff --git a/src/backend/access/heap/heapam.c
b/src/backend/access/heap/heapam.c
index 1b2f70499e..8238eabd8b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool
keep_startblock)
ParallelBlockTableScanDesc bpscan = NULL;
bool allow_strat;
bool allow_sync;
+ bool is_mvcc = scan->rs_base.rs_snapshot &&
IsMVCCSnapshot(scan->rs_base.rs_snapshot);
/*
* Determine the number of blocks we have to scan.
@@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool
keep_startblock)
scan->rs_nblocks = bpscan->phs_nblocks;
}
else
- scan->rs_nblocks =
RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
+ if (scan->rs_nblocks == -1 || !is_mvcc)
+ scan->rs_nblocks =
RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
/*
* If the table is large relative to NBuffers, use a bulk-read
access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
else
scan->rs_base.rs_key = NULL;
+ scan->rs_nblocks = -1;
initscan(scan, key, false);
--
Best Regards
Andy Fan
On Mon, Dec 7, 2020 at 8:26 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
I see initscan calls RelationGetwNumberOfBlocks every time and rescan
calls
initscan as well. In my system, RelationGetNumberOfBlocks is expensive
(the reason
doesn't deserve a talk.. ), so in a nest loop + Bitmap heap scan case,
the
impact will be huge. The comments of initscan are below./*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/I still do not fully understand the comments. Looks we only need to call
multi times for non-MVCC snapshot, IIUC, does the following change
reasonable?===
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 1b2f70499e..8238eabd8b 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) ParallelBlockTableScanDesc bpscan = NULL; bool allow_strat; bool allow_sync; + bool is_mvcc = scan->rs_base.rs_snapshot && IsMVCCSnapshot(scan->rs_base.rs_snapshot);/* * Determine the number of blocks we have to scan. @@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) scan->rs_nblocks = bpscan->phs_nblocks; } else - scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd); + if (scan->rs_nblocks == -1 || !is_mvcc) + scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);/*
* If the table is large relative to NBuffers, use a bulk-read
access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
else
scan->rs_base.rs_key = NULL;+ scan->rs_nblocks = -1;
initscan(scan, key, false);--
Best Regards
Andy Fan
I have tested this with an ext4 file system, and I can get a 7%+
performance improvement
for the given test case. Here are the steps:
create table t(a int, b char(8000));
insert into t select i, i from generate_series(1, 1000000)i;
create index on t(a);
delete from t where a <= 10000;
vacuum t;
alter system set enable_indexscan to off;
select pg_reload_conf();
cat 1.sql
select * from generate_series(1, 10000)i, t where i = t.a;
bin/pgbench -f 1.sql postgres -T 300 -c 10
Without this patch: latency average = 61.806 ms
with this patch: latency average = 57.484 ms
I think the result is good and I think we can probably make this change.
However, I'm not
sure about it.
--
Best Regards
Andy Fan
On Thu, Dec 10, 2020 at 7:31 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Mon, Dec 7, 2020 at 8:26 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
I see initscan calls RelationGetwNumberOfBlocks every time and rescan
calls
initscan as well. In my system, RelationGetNumberOfBlocks is expensive
(the reason
doesn't deserve a talk.. ), so in a nest loop + Bitmap heap scan case,
the
impact will be huge. The comments of initscan are below./*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/I still do not fully understand the comments. Looks we only need to call
multi times for non-MVCC snapshot, IIUC, does the following change
reasonable?===
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 1b2f70499e..8238eabd8b 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) ParallelBlockTableScanDesc bpscan = NULL; bool allow_strat; bool allow_sync; + bool is_mvcc = scan->rs_base.rs_snapshot && IsMVCCSnapshot(scan->rs_base.rs_snapshot);/* * Determine the number of blocks we have to scan. @@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) scan->rs_nblocks = bpscan->phs_nblocks; } else - scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd); + if (scan->rs_nblocks == -1 || !is_mvcc) + scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);/*
* If the table is large relative to NBuffers, use a bulk-read
access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
else
scan->rs_base.rs_key = NULL;+ scan->rs_nblocks = -1;
initscan(scan, key, false);--
Best Regards
Andy FanI have tested this with an ext4 file system, and I can get a 7%+
performance improvement
for the given test case. Here are the steps:create table t(a int, b char(8000));
insert into t select i, i from generate_series(1, 1000000)i;
create index on t(a);
delete from t where a <= 10000;
vacuum t;
alter system set enable_indexscan to off;
select pg_reload_conf();cat 1.sql
select * from generate_series(1, 10000)i, t where i = t.a;bin/pgbench -f 1.sql postgres -T 300 -c 10
Without this patch: latency average = 61.806 ms
with this patch: latency average = 57.484 msI think the result is good and I think we can probably make this change.
However, I'm not
sure about it.
The plan which was used is below, in the rescan of Bitmap Heap Scan,
mdnblocks will
be called 10000 times in current implementation, Within my patch, it will
be only called
once.
postgres=# explain (costs off) select * from generate_series(1, 10000)i, t
where i = t.a;
QUERY PLAN
------------------------------------------
Nested Loop
-> Function Scan on generate_series i
-> Bitmap Heap Scan on t
Recheck Cond: (a = i.i)
-> Bitmap Index Scan on t_a_idx
Index Cond: (a = i.i)
(6 rows)
--
Best Regards
Andy Fan