Bitmap reuse

Started by Jeff Janesover 4 years ago6 messages

jeff.janes@gmail.com

over 4 years ago

1 attachment(s)

For some queries PostgreSQL can spend most of its time creating the exact
same bitmap over and over. For example, in the below case: (also attached
as a file because line-wrapping is going to make a mess of it)

drop table if exists foo;
create table foo (x daterange, i int, t text);
insert into foo select daterange(x::date,x::date+3), random()*3000 from
(select now()-interval '3 years'*random() as x from
generate_series(1,1e6))foo;
vacuum analyze foo;
create index ON foo using gist ( x);
create index ON foo ( i);
explain (analyze, buffers) select * from generate_series(1,20) g(i), foo
where x && '[2019-08-09,2019-08-11)' and g.i=foo.i;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=170.21..3563.24 rows=33 width=54) (actual
time=1.295..24.890 rows=28 loops=1)
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> Function Scan on generate_series g (cost=0.00..0.20 rows=20
width=4) (actual time=0.007..0.014 rows=20 loops=1)
-> Bitmap Heap Scan on foo (cost=170.21..178.13 rows=2 width=50)
(actual time=1.238..1.240 rows=1 loops=20)
Recheck Cond: ((i = g.i) AND (x &&
'[2019-08-09,2019-08-11)'::daterange))
Heap Blocks: exact=28
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> BitmapAnd (cost=170.21..170.21 rows=2 width=0) (actual
time=1.234..1.234 rows=0 loops=20)
Buffers: shared hit=515 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_i_idx (cost=0.00..6.92
rows=333 width=0) (actual time=0.031..0.031 rows=327 loops=20)
Index Cond: (i = g.i)
Buffers: shared hit=55 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_x_idx (cost=0.00..161.78
rows=5000 width=0) (actual time=1.183..1.183 rows=3670 loops=20)
Index Cond: (x && '[2019-08-09,2019-08-11)'::daterange)
Buffers: shared hit=460

Note that the fast bitmap index scan is parameterized to the other side of
the nested loop, so has to be recomputed. While the slow one is
parameterized to a constant, so it could in principle just be reused.

What kind of infrastructure would be needed to detect this case and reuse
that bitmap?

Cheers,

Jeff

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Jeff Janes (#1)

Re: Bitmap reuse

Jeff Janes <jeff.janes@gmail.com> writes:

For some queries PostgreSQL can spend most of its time creating the exact
same bitmap over and over. For example, in the below case: (also attached
as a file because line-wrapping is going to make a mess of it)

Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.

If the output of the subplan was indeed constant, I'd expect the planner
to stick a Materialize node atop it. That would almost certainly win
more than re-using the index output to scan the heap additional times.

regards, tom lane

David Rowley

dgrowleyml@gmail.com

over 4 years ago

In reply to: Tom Lane (#2)

Re: Bitmap reuse

On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

For some queries PostgreSQL can spend most of its time creating the exact
same bitmap over and over. For example, in the below case: (also attached
as a file because line-wrapping is going to make a mess of it)

Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.

I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.

I didn't look in detail, but I'd think it would just be a matter of
caching the bitmap then in ExecReScanBitmapIndexScan(), if the
PlanState's chgParam indicate a parameter has changed, then throw away
the cache. Then if the cache is still valid in
MultiExecBitmapIndexScan(), return it. Not too sure about the memory
context part for the cache. As I said, I didn't look in detail.

David

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: David Rowley (#3)

Re: Bitmap reuse

David Rowley <dgrowleyml@gmail.com> writes:

On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.

I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.

To re-use that, you'd need a way to prevent the upper node from
destructively modifying the tidbitmap.

regards, tom lane

David Rowley

dgrowleyml@gmail.com

over 4 years ago

In reply to: Tom Lane (#4)

Re: Bitmap reuse

On Wed, 21 Jul 2021 at 13:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.

To re-use that, you'd need a way to prevent the upper node from
destructively modifying the tidbitmap.

Yeah. And that would slow things down in the case where it was just
executed once as we'd need to make a copy of it to prevent the cached
version from being modified regardless if it would ever be used again
or not.

Maybe the planner would need to be involved in making the decision of
if the bitmap index scan should tuck away a carbon copy of the
resulting TIDBitmap after the first scan. That way on rescan we could
just make a copy of the cached version and return that. That saves
having to modify the callers to tell them not to damage the returned
TIDBitmap.

David

David Rowley

dgrowleyml@gmail.com

over 4 years ago

In reply to: David Rowley (#5)

Re: Bitmap reuse

On Thu, 22 Jul 2021 at 01:54, David Rowley <dgrowleyml@gmail.com> wrote:

Maybe the planner would need to be involved in making the decision of
if the bitmap index scan should tuck away a carbon copy of the
resulting TIDBitmap after the first scan. That way on rescan we could
just make a copy of the cached version and return that. That saves
having to modify the callers to tell them not to damage the returned
TIDBitmap.

Oh but, meh. Caching could blow out work_mem... We might end up using
work_mem * 2.

David

Bitmap reuse

Attachments: