Bitmap reuse
For some queries PostgreSQL can spend most of its time creating the exact
same bitmap over and over. For example, in the below case: (also attached
as a file because line-wrapping is going to make a mess of it)
drop table if exists foo;
create table foo (x daterange, i int, t text);
insert into foo select daterange(x::date,x::date+3), random()*3000 from
(select now()-interval '3 years'*random() as x from
generate_series(1,1e6))foo;
vacuum analyze foo;
create index ON foo using gist ( x);
create index ON foo ( i);
explain (analyze, buffers) select * from generate_series(1,20) g(i), foo
where x && '[2019-08-09,2019-08-11)' and g.i=foo.i;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=170.21..3563.24 rows=33 width=54) (actual
time=1.295..24.890 rows=28 loops=1)
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> Function Scan on generate_series g (cost=0.00..0.20 rows=20
width=4) (actual time=0.007..0.014 rows=20 loops=1)
-> Bitmap Heap Scan on foo (cost=170.21..178.13 rows=2 width=50)
(actual time=1.238..1.240 rows=1 loops=20)
Recheck Cond: ((i = g.i) AND (x &&
'[2019-08-09,2019-08-11)'::daterange))
Heap Blocks: exact=28
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> BitmapAnd (cost=170.21..170.21 rows=2 width=0) (actual
time=1.234..1.234 rows=0 loops=20)
Buffers: shared hit=515 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_i_idx (cost=0.00..6.92
rows=333 width=0) (actual time=0.031..0.031 rows=327 loops=20)
Index Cond: (i = g.i)
Buffers: shared hit=55 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_x_idx (cost=0.00..161.78
rows=5000 width=0) (actual time=1.183..1.183 rows=3670 loops=20)
Index Cond: (x && '[2019-08-09,2019-08-11)'::daterange)
Buffers: shared hit=460
Note that the fast bitmap index scan is parameterized to the other side of
the nested loop, so has to be recomputed. While the slow one is
parameterized to a constant, so it could in principle just be reused.
What kind of infrastructure would be needed to detect this case and reuse
that bitmap?
Cheers,
Jeff
Attachments:
Jeff Janes <jeff.janes@gmail.com> writes:
For some queries PostgreSQL can spend most of its time creating the exact
same bitmap over and over. For example, in the below case: (also attached
as a file because line-wrapping is going to make a mess of it)
Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.
If the output of the subplan was indeed constant, I'd expect the planner
to stick a Materialize node atop it. That would almost certainly win
more than re-using the index output to scan the heap additional times.
regards, tom lane
On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
For some queries PostgreSQL can spend most of its time creating the exact
same bitmap over and over. For example, in the below case: (also attached
as a file because line-wrapping is going to make a mess of it)Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.
I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.
I didn't look in detail, but I'd think it would just be a matter of
caching the bitmap then in ExecReScanBitmapIndexScan(), if the
PlanState's chgParam indicate a parameter has changed, then throw away
the cache. Then if the cache is still valid in
MultiExecBitmapIndexScan(), return it. Not too sure about the memory
context part for the cache. As I said, I didn't look in detail.
David
David Rowley <dgrowleyml@gmail.com> writes:
On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.
I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.
To re-use that, you'd need a way to prevent the upper node from
destructively modifying the tidbitmap.
regards, tom lane
On Wed, 21 Jul 2021 at 13:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.To re-use that, you'd need a way to prevent the upper node from
destructively modifying the tidbitmap.
Yeah. And that would slow things down in the case where it was just
executed once as we'd need to make a copy of it to prevent the cached
version from being modified regardless if it would ever be used again
or not.
Maybe the planner would need to be involved in making the decision of
if the bitmap index scan should tuck away a carbon copy of the
resulting TIDBitmap after the first scan. That way on rescan we could
just make a copy of the cached version and return that. That saves
having to modify the callers to tell them not to damage the returned
TIDBitmap.
David
On Thu, 22 Jul 2021 at 01:54, David Rowley <dgrowleyml@gmail.com> wrote:
Maybe the planner would need to be involved in making the decision of
if the bitmap index scan should tuck away a carbon copy of the
resulting TIDBitmap after the first scan. That way on rescan we could
just make a copy of the cached version and return that. That saves
having to modify the callers to tell them not to damage the returned
TIDBitmap.
Oh but, meh. Caching could blow out work_mem... We might end up using
work_mem * 2.
David