Bitmap scan is undercosted?

Started by Vitaliy Garnashevich · over 8 years ago · 32 messages · pgsql-hackers
#1Vitaliy Garnashevich
vgarnashevich@gmail.com

Hi,

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

I've found some cases where similar issues with bitmap scans were
reported before:

/messages/by-id/1456154321.976561.528310154.6A623C0E@webmail.messagingengine.com

/messages/by-id/CA+wPC0MRMhF_8fD9dc8+QWZQzUvHahPRSv=xMtCmsVLSsy-p0w@mail.gmail.com

I've made a synthetic test, which kind of reproduces the issue:

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

I've been running the same query while enabling and disabling different
kinds of scans:

1) set enable_bitmapscan = on;  set enable_indexscan = off; set
enable_seqscan = off;
2) set enable_bitmapscan = off; set enable_indexscan = on;  set
enable_seqscan = off;
3) set enable_bitmapscan = off; set enable_indexscan = off; set
enable_seqscan = on;

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Here are the results for PostgreSQL 9.6 (for 9.3 and 10.1 the results
are very similar):

1) Aggregate  (cost=24821.70..24821.71 rows=1 width=8) (actual
time=184.591..184.591 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=47259
  ->  Bitmap Heap Scan on public.aaa  (cost=13038.21..24796.22
rows=10189 width=0) (actual time=122.492..178.006 rows=100000 loops=1)
        Output: num, flag
        Recheck Cond: (aaa.num = 1)
        Filter: aaa.flag
        Heap Blocks: exact=44248
        Buffers: shared hit=47259
        ->  BitmapAnd  (cost=13038.21..13038.21 rows=10189 width=0)
(actual time=110.699..110.699 rows=0 loops=1)
              Buffers: shared hit=3011
              ->  Bitmap Index Scan on i1  (cost=0.00..1158.94
rows=99667 width=0) (actual time=19.600..19.600 rows=100000 loops=1)
                    Index Cond: (aaa.num = 1)
                    Buffers: shared hit=276
              ->  Bitmap Index Scan on i2  (cost=0.00..11873.92
rows=1022332 width=0) (actual time=81.676..81.676 rows=1000000 loops=1)
                    Index Cond: (aaa.flag = true)
                    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 184.988 ms

2) Aggregate  (cost=67939.09..67939.10 rows=1 width=8) (actual
time=67.510..67.510 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..67910.95
rows=11256 width=0) (actual time=0.020..61.180 rows=100000 loops=1)
        Output: num, flag
        Index Cond: (aaa.num = 1)
        Filter: aaa.flag
        Buffers: shared hit=44524
Planning time: 0.096 ms
Execution time: 67.543 ms

3) Aggregate  (cost=169276.49..169276.50 rows=1 width=8) (actual
time=977.063..977.063 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44248
  ->  Seq Scan on public.aaa  (cost=0.00..169248.35 rows=11256 width=0)
(actual time=0.018..969.294 rows=100000 loops=1)
        Output: num, flag
        Filter: (aaa.flag AND (aaa.num = 1))
        Rows Removed by Filter: 9900000
        Buffers: shared hit=44248
Planning time: 0.099 ms
Execution time: 977.094 ms

The bitmap scan version runs more than twice as slow as the index scan
version, while being costed at less than half as much.

I've tried increasing cpu_tuple_cost and cpu_index_tuple_cost, and this
behavior remains even after a 6x increase in values, although the difference
in costs becomes much smaller. After increasing the settings more than 6x,
PostgreSQL decides to use a different plan for bitmap scans, so it's hard to
draw conclusions at that point.

Could such cases be fixed by tuning cost settings, or is that just how
PostgreSQL estimates bitmap scans, so it can't be fixed without modifying
the optimizer? Or am I missing something and this is the expected behavior?
Thoughts?

Regards,
Vitaliy

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Vitaliy Garnashevich (#1)
Re: Bitmap scan is undercosted?

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Me too, see also:
/messages/by-id/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@mail.gmail.com

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
that causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.

The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.

Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.

I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.

I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use a btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).

Justin

[0] I'm borrowing Jeff's language from here:
/messages/by-id/CAMkU=1xwGn+O0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q@mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.

#3Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Justin Pryzby (#2)
Re: Bitmap scan is undercosted?

On 01/12/2017 20:34, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Me too, see also:
/messages/by-id/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@mail.gmail.com

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
causes the index scan to be more expensive than expected, but that's only
somewhat important, since there's no joins involved.

I don't think the planner is smart enough to account for correlation
between values in different columns. When different values are used in
the filter (num=2, num=39, num=74), the query actually runs faster, while
still being about twice as slow as an index scan. But the cost does not
change much. It jumps up and down for different values, but it stays
close to the initial value.

Aggregate  (cost=24239.02..24239.03 rows=1 width=8) (actual
time=105.239..105.239 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=3011
  ->  Bitmap Heap Scan on public.aaa  (cost=12812.05..24214.48
rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)
        Output: num, flag
        Recheck Cond: (aaa.num = 39)
        Filter: aaa.flag
        Buffers: shared hit=3011
        ->  BitmapAnd  (cost=12812.05..12812.05 rows=9816 width=0)
(actual time=105.157..105.157 rows=0 loops=1)
              Buffers: shared hit=3011
              ->  Bitmap Index Scan on i1  (cost=0.00..1134.94
rows=97667 width=0) (actual time=15.725..15.725 rows=100000 loops=1)
                    Index Cond: (aaa.num = 39)
                    Buffers: shared hit=276
              ->  Bitmap Index Scan on i2  (cost=0.00..11671.96
rows=1005003 width=0) (actual time=77.920..77.920 rows=1000000 loops=1)
                    Index Cond: (aaa.flag = true)
                    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms

Aggregate  (cost=65785.99..65786.00 rows=1 width=8) (actual
time=48.587..48.587 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..65761.45 rows=9816
width=0) (actual time=48.583..48.583 rows=0 loops=1)
        Output: num, flag
        Index Cond: (aaa.num = 39)
        Filter: aaa.flag
        Rows Removed by Filter: 100000
        Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms

The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.

Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.

I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.

           table           | ~count  |  size  |  toast  |  idx   | size + toast + idx
---------------------------+---------+--------+---------+--------+--------------------
 aaa                       | 9999994 | 346 MB | 0 bytes | 428 MB | 774 MB

But the plan says all buffers are "shared hit", and none "read", so
that's probably not an issue.

I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use an btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).

Yes, BRIN indexes should be beneficial for our case, because there is a
date column which grows over time (not strictly regularly, but still).
Unfortunately, we're still migrating our databases from 9.3 to 9.6.
Anyway, thanks for the advice.


Justin

[0] I'm borrowing Jeff's language from here:
/messages/by-id/CAMkU=1xwGn+O0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q@mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#2)
Re: Bitmap scan is undercosted?

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
causes the index scan to be more expensive than expected, but that's only
somewhat important, since there's no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=100000 loops=1)
-> BitmapAnd (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=1000000 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Same on PG95:
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=100000 loops=1)
-> BitmapAnd (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=1000000 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin

#5Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Justin Pryzby (#4)
Re: Bitmap scan is undercosted?

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production
we also use SSD, hence lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan
and the bitmap scan plan is even bigger.

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
causes the index scan to be more expensive than expected, but that's only
somewhat important, since there's no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=100000 loops=1)
-> BitmapAnd (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=1000000 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Yes, that's what you get without planner hints, but it's strange to get
this plan when there is another one which runs 2-3 times faster, yet
happens to be estimated at twice the cost of the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on;  set
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5)
(actual time=0.020..57.765 rows=100000 loops=1)

vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5)
(actual time=122.137..182.811 rows=100000 loops=1)
  ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual
time=110.168..110.168 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667
width=0) (actual time=20.845..20.845 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666
width=0) (actual time=80.323..80.323 rows=1000000 loops=1)


Same on PG95:
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=100000 loops=1)
-> BitmapAnd (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=1000000 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#4)
Re: Bitmap scan is undercosted?

On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarrassed to say that I misread your message, despite your amply clear
subject. You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

Actually, that the table is "not correlated" on "num" column is maybe the
primary reason why PG avoids using an index scan. It (more or less correctly)
deduces that it's going to have to "read" a large fraction of the pages (even
if only to process a small fraction of the rows), which is costly, except it's
all cached. In your case, that overly penalizes the index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c. Since the
index is uncorrelated, it's returning something close to max_IO_cost. It looks
like effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it. Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
or
cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost -
| (spc_random_page_cost - spc_seq_page_cost)
| * sqrt(pages_fetched / T);
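As a rough sketch of how those two expressions behave, here is a hypothetical Python rendering with made-up cost inputs (the real functions in costsize.c carry many more terms):

```python
import math

def index_io_cost(min_io_cost: float, max_io_cost: float,
                  correlation: float) -> float:
    # cost_index(): interpolate between the perfectly-correlated best
    # case and the uncorrelated worst case by the squared correlation.
    csquared = correlation ** 2
    return max_io_cost + csquared * (min_io_cost - max_io_cost)

def bitmap_cost_per_page(random_page_cost: float, seq_page_cost: float,
                         pages_fetched: float, total_pages: float) -> float:
    # cost_bitmap_heap_scan(): the per-page charge slides from
    # random_page_cost toward seq_page_cost as the fetched fraction
    # of the table grows.
    frac = min(pages_fetched / total_pages, 1.0)
    return random_page_cost - (random_page_cost - seq_page_cost) * math.sqrt(frac)

# correlation ~0.007 on "num" leaves the index scan charged nearly the
# full worst-case IO cost (these inputs are illustrative, not real costs):
print(index_io_cost(1000.0, 60000.0, 0.0071))

# while a bitmap scan touching ~90% of 44248 pages pays close to
# seq_page_cost per page:
print(bitmap_cost_per_page(1.5, 1.0, 39600, 44248))
```

So with everything actually sitting in shared buffers, the index scan gets billed nearly max_IO_cost while the bitmap heap scan gets billed at almost sequential rates, which is one plausible reading of the cost gap in the earlier plans.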

Justin

#7Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#5)
Re: Bitmap scan is undercosted?

On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;

What is:

effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15 times
more buffers than the bitmap scan does. While 1.5 is a lot lower than 4.0,
it is still much higher than the true cost of reading a page from the
buffer cache. This is why the index scan is getting punished. You could
lower random_page_cost and seq_page_cost to 0, to remove those
considerations. (I'm not saying you should do this on your production
system, but rather you should do it as a way to investigate the issue. But
it might make sense on production as well)

For this test I'm using SSD and Windows (if that matters). On production
we also use SSD, hence lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan
and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD for
this particular test case.

Cheers,

Jeff

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Vitaliy Garnashevich (#5)
Re: Bitmap scan is undercosted?

On Sat, Dec 02, 2017 at 01:54:09AM +0200, Vitaliy Garnashevich wrote:

On 02/12/2017 01:11, Justin Pryzby wrote:

..which is what's wanted with no planner hints (PG10.1 here).

Yes, that's what you get without planner hints, but it's strange to get this
plan, when there is another one, which runs 2-3 times faster, but happens to
be estimated to be twice more costly than the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on;  set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=100000 loops=1)

vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=100000 loops=1)

I was able to get an index plan with:

SET random_page_cost=1; SET cpu_index_tuple_cost=.04; -- default: 0.005; see selfuncs.c
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Index Scan using i1 on public.aaa (cost=0.43..50120.71 rows=10754 width=5) (actual time=0.040..149.580 rows=100000 loops=1)

Or with:
SET random_page_cost=1; SET cpu_operator_cost=0.03; -- default: 0.0025 see cost_bitmap_tree_node()
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag= true;
Index Scan using i1 on public.aaa (cost=5.22..49328.00 rows=10754 width=5) (actual time=0.051..109.082 rows=100000 loops=1)

Or a combination trying to minimize the cost of the index scan:
postgres=# SET random_page_cost=1; SET cpu_index_tuple_cost=.0017; SET cpu_operator_cost=0.03; EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag= true;
Index Scan using i1 on public.aaa (cost=5.22..48977.10 rows=10754 width=5) (actual time=0.032..86.883 rows=100000 loops=1)

Not sure if that's reasonable, but maybe it helps to understand.

Justin

#9Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#7)
Re: Bitmap scan is undercosted?

On 02/12/2017 07:51, Jeff Janes wrote:

On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich
<vgarnashevich@gmail.com <mailto:vgarnashevich@gmail.com>> wrote:

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy
Garnashevich wrote:

We recently had an issue in production, where a bitmap
scan was chosen
instead of an index scan. Despite being 30x slower,
the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num,
(id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15
times more buffers than the bitmap scan does. While 1.5 is a lot lower
than 4.0, it is still much higher than the true cost of reading a page
from the buffer cache. This is why the index scan is getting punished.
You could lower random_page_cost and seq_page_cost to 0, to remove
those considerations.  (I'm not saying you should do this on your
production system, but rather you should do it as a way to investigate
the issue.  But it might make sense on production as well)

seq_page_cost = 1.0
random_page_cost = 1.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=11536.74..20856.96 rows=10257 width=5)
(actual time=108.338..108.338 rows=0 loops=1)
  ->  BitmapAnd  (cost=11536.74..11536.74 rows=10257 width=0) (actual
time=108.226..108.226 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1025.43 rows=100000
width=0) (actual time=18.563..18.563 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..10505.93 rows=1025666
width=0) (actual time=78.493..78.493 rows=1000000 loops=1)

Index Scan using i1 on aaa  (cost=0.44..44663.58 rows=10257 width=5)
(actual time=51.264..51.264 rows=0 loops=1)

Here I've used the filter num = 2, which produces rows=0 at BitmapAnd,
and thus avoids a lot of work at the "Bitmap Heap Scan" node, while still
leaving about the same proportion between bitmap and index - the bitmap
scan is twice as slow but half the cost. It does not matter much which
value is used in the filter, as long as it's other than num = 1.

seq_page_cost = 0.0
random_page_cost = 0.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5)
(actual time=82.212..82.212 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=100000 width=0)
(actual time=17.401..17.401 rows=100000 loops=1)

Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5)
(actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap index scan, and
finally it costs more than the index plan. But I doubt that the settings
seq_page_cost = random_page_cost = 0.0 should actually be used. Probably
it should instead be something like 1.0/1.0 or 1.0/1.1, with the other
costs increased to carry more weight.

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

Bitmap Heap Scan on aaa  (cost=36882.97..46587.82 rows=10257 width=5)
(actual time=106.045..106.045 rows=0 loops=1)
  ->  BitmapAnd  (cost=36882.97..36882.97 rows=10257 width=0) (actual
time=105.966..105.966 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..3276.74 rows=100000
width=0) (actual time=15.977..15.977 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..33584.72 rows=1025666
width=0) (actual time=79.208..79.208 rows=1000000 loops=1)

Index Scan using i1 on aaa  (cost=1.74..49914.89 rows=10257 width=5)
(actual time=50.144..50.144 rows=0 loops=1)

# x5 tuple/operator costs - switched to single bitmap index scan, but
now it costs more than the index scan
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.05;
set cpu_index_tuple_cost = 0.025;
set cpu_operator_cost = 0.0125;

Bitmap Heap Scan on aaa  (cost=4040.00..54538.00 rows=10257 width=5)
(actual time=82.338..82.338 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..4027.18 rows=100000 width=0)
(actual time=19.541..19.541 rows=100000 loops=1)

Index Scan using i1 on aaa  (cost=2.17..51665.32 rows=10257 width=5)
(actual time=49.545..49.545 rows=0 loops=1)

I've also tried seq_page_cost = 1.0, random_page_cost = 1.1, but that
would require a more than 10x increase in tuple/operator costs to make
the bitmap plan more costly than the index plan.

For this test I'm using SSD and Windows (if that matters). On
production we also use SSD, hence lower random_page_cost. But with
the default random_page_cost=4.0, the difference in cost between
the index scan plan and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD
for this particular test case.

Agree. I've just tried to justify the value of random_page_cost, which
is lower than like 2.0.


Cheers,

Jeff

#10Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#9)
Re: Bitmap scan is undercosted?

On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

seq_page_cost = 0.0
random_page_cost = 0.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa (cost=753.00..2003.00 rows=10257 width=5) (actual
time=82.212..82.212 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..750.43 rows=100000 width=0)
(actual time=17.401..17.401 rows=100000 loops=1)

Index Scan using i1 on aaa (cost=0.44..1750.43 rows=10257 width=5)
(actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap scan, and finally it
costs more than the index plan.

Right, so there is a cpu costing problem (which could only be fixed by
hacking postgresql and recompiling it), but it is much smaller of a problem
than the IO cost not being accurate due to the high hit rate. Fixing the
CPU costing problem is unlikely to make a difference to your real query.
If you set the page costs to zero, what happens to your real query?

But I doubt that the settings seq_page_cost = random_page_cost = 0.0
should actually be used.

Why not? If your production server really has everything in memory during
normal operation, that is the correct course of action. If you ever
restart the server, then you could have some unpleasant time getting it
back up to speed again, but pg_prewarm could help with that.

Probably it should instead be something like 1.0/1.0 or 1.0/1.1, with the
other costs increased, to carry more weight.

This doesn't make any sense to me. Halving the page costs is
mathematically the same as doubling all the other constants. But the first
way of doing things says what you are doing, and the second way is an
obfuscation of what you are doing.
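The scaling argument can be checked with a toy two-term cost model (a
sketch for illustration only, not the planner's real formulas; the page
and tuple counts below are made up): halving both page costs produces
exactly half the cost that doubling all the cpu_* costs does, for every
plan, so the relative ordering of plans cannot differ between the two
approaches.

```python
# Toy cost model: cost = pages * page_cost + tuples * cpu_cost.
# Illustrates that halving page costs is equivalent (up to a constant
# factor of 2) to doubling the cpu costs, so plan choice is unaffected.

def plan_cost(pages, tuples, page_cost, cpu_cost):
    return pages * page_cost + tuples * cpu_cost

# Two hypothetical plans: (pages touched, tuples processed).
plans = [(44248, 10_000_000), (39949, 100_000)]

halved_pages = [plan_cost(p, t, 0.5, 0.01) for p, t in plans]
doubled_cpu = [plan_cost(p, t, 1.0, 0.02) for p, t in plans]

# Every doubled-cpu cost is exactly 2x the halved-pages cost ...
assert all(abs(d - 2 * h) < 1e-9 for d, h in zip(doubled_cpu, halved_pages))

# ... so the cheapest plan is the same under either adjustment.
cheapest = lambda costs: min(range(len(costs)), key=costs.__getitem__)
assert cheapest(halved_pages) == cheapest(doubled_cpu)
```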

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
cpu_tuple_cost. That is because the unselective bitmap index scan does
not incur any cpu_tuple_cost, but does incur index_tuple and operator
costs. Unfortunately all other index scans in the system will also be
skewed by such a change if you make the change system-wide.

Incidentally, the "actual rows" field of BitmapAnd is always zero. That
field is not implemented for that node type.

Why do you have an index on flag in the first place? What does the index
accomplish, other than enticing the planner into bad plans? I don't know
how this translates back into your real query, but dropping that index
should be considered. Or replace both indexes with one on (num,flag).

Or you can re-write the part of the WHERE clause in a way that it can't use
an index, something like:

and flag::text ='t'

Cheers,

Jeff

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#10)
Re: Bitmap scan is undercosted?

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
cpu_tuple_cost. That is because the unselective bitmap index scan does
not incur any cpu_tuple_cost, but does incur index_tuple and operator
costs. Unfortunately all other index scans in the system will also be
skewed by such a change if you make the change system-wide.

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.
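The arithmetic behind that misestimate is simple (a sketch of the
independence assumption, not the planner's actual code path):

```python
# The planner multiplies per-clause selectivities as if the clauses were
# independent. For this data set: num takes 100 values, and 1/10 of rows
# have flag = true.
n_rows = 10_000_000
sel_num = 1 / 100      # selectivity of num = 1
sel_flag = 1 / 10      # selectivity of flag = true

est = n_rows * sel_num * sel_flag
print(round(est))      # 10000 -- close to the planner's ~9514-10257 estimates

# But here flag = true is implied by num = 1, so the true row count is:
actual = n_rows * sel_num
print(round(actual))   # 100000 -- 10x more heap rows than estimated
```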

I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

regards, tom lane

#12Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#11)
Re: Bitmap scan is undercosted?

On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
cpu_tuple_cost. That is because the unselective bitmap index scan does
not incur any cpu_tuple_cost, but does incur index_tuple and operator
costs. Unfortunately all other index scans in the system will also be
skewed by such a change if you make the change system-wide.

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.

But he also tested with num=2 and num=39, which reverses the situation so
the bitmap is 100% selective rather than the 90% the planner thinks it will
be.

But it is still slower for him (I am having trouble replicating that exact
behavior), so building the bitmap to rule out 100% of the rows is
empirically not worth it. I don't see how building it to rule out 90%, as
the planner thinks, would be any better.

I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

Due to that, when I disable bitmapscans and seqscans, I start getting slow
index scans on the wrong index, i2 rather than i1. I don't know why he
doesn't see that in his example.

Cheers,

Jeff

#13Justin Pryzby
pryzby@telsasoft.com
In reply to: Jeff Janes (#12)
Re: Bitmap scan is undercosted? - boolean correlation

On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals | {f,t}
most_common_freqs | {0.9014,0.0986}
correlation | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of the x
list (logical value) are the same value ('f'), and the CTIDs are in order on
the new index, so 90% of the values are 100% correlated.

It improves (by which I mean here that it spits out a lower number) if it's not
a 90/10 split:

CREATE TABLE aaa5 AS SELECT (id%100)::int num, (id%10>5)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX ON aaa5 (flag);

tablename | aaa5
attname | flag
correlation | 0.522184

Justin

#14Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#10)
Re: Bitmap scan is undercosted?

On 02/12/2017 23:17, Jeff Janes wrote:

Right, so there is a cpu costing problem (which could only be fixed by
hacking postgresql and recompiling it), but it is much smaller of a
problem than the IO cost not being accurate due to the high hit rate.
Fixing the CPU costing problem is unlikely to make a difference to
your real query.  If you set the page costs to zero, what happens to
your real query?

I can't reproduce the exact issue on the real database any more. The
query started to use the slow bitmap scan recently, and had been doing
so for some time lately, but now it's switched back to use the index
scan. The table involved in the query gets modified a lot. It has
hundreds of millions of rows. Lots of new rows are appended to it every
day, the oldest rows are sometimes removed. The table is analyzed at
least daily. It's possible that the statistics were updated and that
caused the query to be planned differently. But I would still like to
understand why the issue happened, and how to properly fix it, in case
it returns.

But I doubt that the settings seq_page_cost = random_page_cost =
0.0 should actually be used.

Why not?  If your production server really has everything in memory
during normal operation, that is the correct course of action.  If you
ever restart the server, then you could have some unpleasant time
getting it back up to speed again, but pg_prewarm could help with that.

In the real database, not everything is in memory. There are 200GB+ of
RAM, but DB is 500GB+. The table involved in the query itself is 60GB+
of data and 100GB+ of indexes. I'm running the test case in a way where
all reads are done from RAM, only to make it easier to reproduce and to
avoid unrelated effects.

As far as I know, costs in Postgres were designed to be relative to
seq_page_cost, which for that reason is usually defined as 1.0. Even if
everything were in RAM, page accesses would still not have zero cost.
Setting 0.0 just seems too extreme, as all other non-zero costs would
become infinitely large by comparison.

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not
increase cpu_tuple_cost.  That is because the  unselective bitmap
index scan does not incur any cpu_tuple_cost, but does incur
index_tuple and operator costs.  Unfortunately all other index scans
in the system will also be skewed by such a change if you make the
change system-wide.

Exactly. I'd like to understand why the worse plan is being chosen, and
1) if it's fixable by tuning costs, to figure out the right settings
which could be used in production, 2) if there is a bug in Postgres
optimizer, then to bring some attention to it, so that it's eventually
fixed in one of future releases, 3) if Postgres is supposed to work this
way, then at least I (and people who ever read this thread) would
understand it better.

Regards,
Vitaliy

#15Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Tom Lane (#11)
Re: Bitmap scan is undercosted?

On 03/12/2017 01:44, Tom Lane wrote:

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.

Here I've tried to make a test which would not have correlation between
the two columns.

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select floor(random()*100)::int num, (random()*10 <
1)::bool flag from generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";10000033;44248;226
"i1";10000033;27422;365
"i2";10000033;27422;365

select flag, count(*) from aaa group by flag order by flag;
f;9000661
t;999339

select num, count(*) from aaa group by num order by num;
0;99852
1;99631
2;99699
3;100493
...
96;100345
97;99322
98;100013
99;100030

explain (analyze,verbose,costs,buffers)
select * from aaa where num = 1 and flag = true;

Bitmap Heap Scan on public.aaa  (cost=12829.83..24729.85 rows=10340
width=5) (actual time=104.941..112.649 rows=9944 loops=1)
  Output: num, flag
  Recheck Cond: (aaa.num = 1)
  Filter: aaa.flag
  Heap Blocks: exact=8922
  Buffers: shared hit=11932
  ->  BitmapAnd  (cost=12829.83..12829.83 rows=10340 width=0) (actual
time=102.926..102.926 rows=0 loops=1)
        Buffers: shared hit=3010
        ->  Bitmap Index Scan on i1  (cost=0.00..1201.44 rows=103334
width=0) (actual time=15.459..15.459 rows=99631 loops=1)
              Index Cond: (aaa.num = 1)
              Buffers: shared hit=276
        ->  Bitmap Index Scan on i2  (cost=0.00..11622.97 rows=1000671
width=0) (actual time=76.906..76.906 rows=999339 loops=1)
              Index Cond: (aaa.flag = true)
              Buffers: shared hit=2734
Planning time: 0.110 ms
Execution time: 113.272 ms

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340
width=5) (actual time=0.027..47.075 rows=9944 loops=1)
  Output: num, flag
  Index Cond: (aaa.num = 1)
  Filter: aaa.flag
  Rows Removed by Filter: 89687
  Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms


I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

regards, tom lane

#16Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#12)
Re: Bitmap scan is undercosted?

On 03/12/2017 03:27, Jeff Janes wrote:

Due to that, when I disable bitmapscans and seqscans, I start getting
slow index scans on the wrong index, i2 rather than i1.  I don't know
why he doesn't see that in his example.

When I increase effective_cache_size to 1024MB, I start getting the plan
with the slower index i2, too.

Bitmap Heap Scan on public.aaa  (cost=12600.90..23688.70 rows=9488
width=5) (actual time=107.529..115.902 rows=9976 loops=1)
  ->  BitmapAnd  (cost=12600.90..12600.90 rows=9488 width=0) (actual
time=105.133..105.133 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1116.43 rows=96000
width=0) (actual time=16.313..16.313 rows=100508 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11479.47 rows=988338
width=0) (actual time=77.950..77.950 rows=1000200 loops=1)

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488
width=5) (actual time=0.020..285.695 rows=9976 loops=1)

Seq Scan on public.aaa  (cost=0.00..169248.54 rows=9488 width=5)
(actual time=0.024..966.469 rows=9976 loops=1)

This way the estimates and the actual times make more sense. But then
there's the question - maybe i1 just runs too fast and is costed
incorrectly? Why does that happen?

Here are the complete plans with the two different kinds of index scans
once again:

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340
width=5) (actual time=0.027..47.075 rows=9944 loops=1)
  Output: num, flag
  Index Cond: (aaa.num = 1)
  Filter: aaa.flag
  Rows Removed by Filter: 89687
  Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488
width=5) (actual time=0.020..285.695 rows=9976 loops=1)
  Output: num, flag
  Index Cond: (aaa.flag = true)
  Filter: (aaa.flag AND (aaa.num = 1))
  Rows Removed by Filter: 990224
  Buffers: shared hit=46984
Planning time: 0.098 ms
Execution time: 286.081 ms

// The test DB was populated with: create table aaa as select
floor(random()*100)::int num, (random()*10 < 1)::bool flag from
generate_series(1, 10000000) id;

Regards,
Vitaliy

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: Bitmap scan is undercosted?

I wrote:

I tried creating multiple-column statistics using the v10 facility for
that:
regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE
but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

After looking, I found that indeed dependency_is_compatible_clause()
rejects expressions like "flag" or "NOT flag", which it needn't since
those are fully equivalent to "flag = true" or "flag = false"
respectively. Moreover there's nothing else in
dependencies_clauselist_selectivity that depends on the exact form of
the clause under test, only on the semantics that it's an equality
condition on some Var. Hence I propose the attached patch, which
fixes the rowcount estimate for the example discussed in this thread:

create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
create statistics s1 on num, flag from aaa;
analyze aaa;

explain analyze select count(*) from aaa where num = 1 and flag = true;

Without patch:

Aggregate  (cost=43236.73..43236.74 rows=1 width=8) (actual
time=349.365..349.365 rows=1 loops=1)
  ->  Bitmap Heap Scan on aaa  (cost=20086.40..43212.94 rows=9514
width=0) (actual time=101.308..337.985 rows=100000 loops=1)
        Recheck Cond: (num = 1)
        Filter: flag
        Heap Blocks: exact=44248
        ->  BitmapAnd  (cost=20086.40..20086.40 rows=9514 width=0)
(actual time=92.214..92.214 rows=0 loops=1)
              ->  Bitmap Index Scan on i1  (cost=0.00..1776.43
rows=96000 width=0) (actual time=17.236..17.236 rows=100000 loops=1)
                    Index Cond: (num = 1)
              ->  Bitmap Index Scan on i2  (cost=0.00..18304.96
rows=991003 width=0) (actual time=72.168..72.168 rows=1000000 loops=1)
                    Index Cond: (flag = true)
Planning time: 0.254 ms
Execution time: 350.796 ms

With patch:

Aggregate  (cost=43496.19..43496.20 rows=1 width=8) (actual
time=359.195..359.195 rows=1 loops=1)
  ->  Bitmap Heap Scan on aaa  (cost=20129.64..43256.19 rows=96000
width=0) (actual time=99.750..347.353 rows=100000 loops=1)
        Recheck Cond: (num = 1)
        Filter: flag
        Heap Blocks: exact=44248
        ->  BitmapAnd  (cost=20129.64..20129.64 rows=9514 width=0)
(actual time=90.671..90.671 rows=0 loops=1)
              ->  Bitmap Index Scan on i1  (cost=0.00..1776.43
rows=96000 width=0) (actual time=16.946..16.946 rows=100000 loops=1)
                    Index Cond: (num = 1)
              ->  Bitmap Index Scan on i2  (cost=0.00..18304.96
rows=991003 width=0) (actual time=70.898..70.898 rows=1000000 loops=1)
                    Index Cond: (flag = true)
Planning time: 0.218 ms
Execution time: 360.608 ms

That's the right overall rowcount estimate for the scan, given the stats
it's working from. There's apparently still something wrong with bitmap
costing, since it's still estimating this as cheaper than the single-index
case --- noting the bogus rowcount estimate for the BitmapAnd, I suspect
that bitmap costing is taking shortcuts rather than using
clauselist_selectivity to estimate the overall selectivity of the bitmap
conditions. But whatever is causing that, it's independent of this
deficiency.

In addition to the bugfix proper, I improved some comments, got rid of
a NumRelids() test that's redundant with the preceding bms_membership()
test, and fixed dependencies_clauselist_selectivity so that
estimatedclauses actually is a pure output argument as stated by its
API contract.

regards, tom lane

Attachments:

fix-dependencies.c-for-boolean-quals.patch (text/x-diff, +107 -100)
#18Jeff Janes
jeff.janes@gmail.com
In reply to: Justin Pryzby (#13)
Re: Bitmap scan is undercosted? - boolean correlation

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals | {f,t}
most_common_freqs | {0.9014,0.0986}
correlation | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of
the x
list (logical value) are the same value ('t'), and the CTIDs are in order
on
the new index, so 90% of the values are 100% correlated.

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

It looks like it could be fixed with a few extra double calcs per distinct
value. Considering we already sorted the sample values using SQL-callable
collation-dependent comparators, I doubt a few C-level double calcs are
going to be meaningful.

Cheers,

Jeff

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#18)
Re: Bitmap scan is undercosted? - boolean correlation

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:

It thinks there's somewhat-high correlation since it gets a list of x
and y values (integer positions by logical and physical sort order) and
90% of the x list (logical value) are the same value ('t'), and the
CTIDs are in order on the new index, so 90% of the values are 100%
correlated.

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

Uh ... what do you propose doing instead? We'd have to do something with
ties, and it's not so obvious this way is wrong.

regards, tom lane

#20Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#19)
Re: Bitmap scan is undercosted? - boolean correlation

On Dec 3, 2017 15:31, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@telsasoft.com>

wrote:

It thinks there's somewhat-high correlation since it gets a list of x
and y values (integer positions by logical and physical sort order) and
90% of the x list (logical value) are the same value ('t'), and the
CTIDs are in order on the new index, so 90% of the values are 100%
correlated.

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

Uh ... what do you propose doing instead? We'd have to do something with
ties, and it's not so obvious this way is wrong.

Let them be tied. If there are 10 distinct values, number the values 0 to
9, and all rows of a given distinct value get the same number for the
logical order axis.

Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to
me. Although if we switched btree to store duplicate values with tid as a
tie breaker, then maybe it wouldn't be as obviously wrong.

Cheers,

Jeff

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#20)
#22Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Vitaliy Garnashevich (#16)
#23Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#21)
#24Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Vitaliy Garnashevich (#22)
#25Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#14)
#26Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#21)
#27Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#22)
#28Jeff Janes
jeff.janes@gmail.com
In reply to: Justin Pryzby (#23)
#29Justin Pryzby
pryzby@telsasoft.com
In reply to: Jeff Janes (#28)
#30Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#29)
#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#29)
#32Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#27)