Bitmap scan is undercosted?

Started by Vitaliy Garnashevich · over 8 years ago · 32 messages · pgsql-hackers
#1Vitaliy Garnashevich
vgarnashevich@gmail.com

Hi,

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

I've found some cases where similar issues with bitmap scans were
reported before:

/messages/by-id/1456154321.976561.528310154.6A623C0E@webmail.messagingengine.com

/messages/by-id/CA+wPC0MRMhF_8fD9dc8+QWZQzUvHahPRSv=xMtCmsVLSsy-p0w@mail.gmail.com

I've made a synthetic test, which kind of reproduces the issue:

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

I've been running the same query while enabling and disabling different
kinds of scans:

1) set enable_bitmapscan = on;  set enable_indexscan = off; set
enable_seqscan = off;
2) set enable_bitmapscan = off; set enable_indexscan = on;  set
enable_seqscan = off;
3) set enable_bitmapscan = off; set enable_indexscan = off; set
enable_seqscan = on;

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Here are the results for PostgreSQL 9.6 (for 9.3 and 10.1 the results
are very similar):

1) Aggregate  (cost=24821.70..24821.71 rows=1 width=8) (actual
time=184.591..184.591 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=47259
  ->  Bitmap Heap Scan on public.aaa  (cost=13038.21..24796.22
rows=10189 width=0) (actual time=122.492..178.006 rows=100000 loops=1)
        Output: num, flag
        Recheck Cond: (aaa.num = 1)
        Filter: aaa.flag
        Heap Blocks: exact=44248
        Buffers: shared hit=47259
        ->  BitmapAnd  (cost=13038.21..13038.21 rows=10189 width=0)
(actual time=110.699..110.699 rows=0 loops=1)
              Buffers: shared hit=3011
              ->  Bitmap Index Scan on i1  (cost=0.00..1158.94
rows=99667 width=0) (actual time=19.600..19.600 rows=100000 loops=1)
                    Index Cond: (aaa.num = 1)
                    Buffers: shared hit=276
              ->  Bitmap Index Scan on i2  (cost=0.00..11873.92
rows=1022332 width=0) (actual time=81.676..81.676 rows=1000000 loops=1)
                    Index Cond: (aaa.flag = true)
                    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 184.988 ms

2) Aggregate  (cost=67939.09..67939.10 rows=1 width=8) (actual
time=67.510..67.510 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..67910.95
rows=11256 width=0) (actual time=0.020..61.180 rows=100000 loops=1)
        Output: num, flag
        Index Cond: (aaa.num = 1)
        Filter: aaa.flag
        Buffers: shared hit=44524
Planning time: 0.096 ms
Execution time: 67.543 ms

3) Aggregate  (cost=169276.49..169276.50 rows=1 width=8) (actual
time=977.063..977.063 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44248
  ->  Seq Scan on public.aaa  (cost=0.00..169248.35 rows=11256 width=0)
(actual time=0.018..969.294 rows=100000 loops=1)
        Output: num, flag
        Filter: (aaa.flag AND (aaa.num = 1))
        Rows Removed by Filter: 9900000
        Buffers: shared hit=44248
Planning time: 0.099 ms
Execution time: 977.094 ms

The bitmap scan version runs more than twice as slow as the index scan
version, while being costed at less than half as much.

I've tried increasing cpu_tuple_cost and cpu_index_tuple_cost, and this
behavior remains even after a 6x increase in values, although the difference
in costs becomes much smaller. After increasing the settings more than 6x,
PostgreSQL decides to use a different plan for bitmap scans, so it's hard to
draw conclusions at that point.

Could such cases be fixed by tuning cost settings, or is that just how
PostgreSQL estimates bitmap scans, so it can't be fixed without modifying
the optimizer? Or am I missing something and this is the expected behavior?
Thoughts?

Regards,
Vitaliy

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Vitaliy Garnashevich (#1)
Re: Bitmap scan is undercosted?

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Me too, see also:
/messages/by-id/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@mail.gmail.com

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
that causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.

The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.

Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.

I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.

I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use a btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).

Justin

[0] I'm borrowing Jeff's language from here:
/messages/by-id/CAMkU=1xwGn+O0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q@mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.

#3Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Justin Pryzby (#2)
Re: Bitmap scan is undercosted?

On 01/12/2017 20:34, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Me too, see also:
/messages/by-id/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@mail.gmail.com

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";9999985;44248;226
"i1";9999985;27422;365
"i2";9999985;27422;365

The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
causes the index scan to be more expensive than expected, but that's only
somewhat important, since there's no joins involved.

I don't think the planner is smart enough to account for correlation
between values in different columns. When different values are used in
the filter (num=2, num=39, num=74), the query actually runs faster, while
still being about twice as slow as an index scan. But the cost does not
change much. It jumps up and down for different values, but it stays
close to the initial value.

Aggregate  (cost=24239.02..24239.03 rows=1 width=8) (actual
time=105.239..105.239 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=3011
  ->  Bitmap Heap Scan on public.aaa  (cost=12812.05..24214.48
rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)
        Output: num, flag
        Recheck Cond: (aaa.num = 39)
        Filter: aaa.flag
        Buffers: shared hit=3011
        ->  BitmapAnd  (cost=12812.05..12812.05 rows=9816 width=0)
(actual time=105.157..105.157 rows=0 loops=1)
              Buffers: shared hit=3011
              ->  Bitmap Index Scan on i1  (cost=0.00..1134.94
rows=97667 width=0) (actual time=15.725..15.725 rows=100000 loops=1)
                    Index Cond: (aaa.num = 39)
                    Buffers: shared hit=276
              ->  Bitmap Index Scan on i2  (cost=0.00..11671.96
rows=1005003 width=0) (actual time=77.920..77.920 rows=1000000 loops=1)
                    Index Cond: (aaa.flag = true)
                    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms

Aggregate  (cost=65785.99..65786.00 rows=1 width=8) (actual
time=48.587..48.587 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..65761.45 rows=9816
width=0) (actual time=48.583..48.583 rows=0 loops=1)
        Output: num, flag
        Index Cond: (aaa.num = 39)
        Filter: aaa.flag
        Rows Removed by Filter: 100000
        Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms

The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.

Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.

I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.

           table           | ~count  |  size  |  toast  |  idx   | size + toast + idx
---------------------------+---------+--------+---------+--------+--------------------
 aaa                       | 9999994 | 346 MB | 0 bytes | 428 MB | 774 MB

But the plan says all buffers are "shared hit", and none "read", so
that's probably not an issue.

I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use an btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).

Yes, BRIN indexes should be beneficial for our case, because there is a
date column which grows over time (not strictly regularly, but still).
Unfortunately, we're still migrating our databases from 9.3 to 9.6.
Anyway, thanks for the advice.


Justin

[0] I'm borrowing Jeff's language from here:
/messages/by-id/CAMkU=1xwGn+O0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q@mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#2)
Re: Bitmap scan is undercosted?

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
causes the index scan to be more expensive than expected, but that's only
somewhat important, since there's no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=100000 loops=1)
-> BitmapAnd (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=1000000 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Same on PG95:
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=100000 loops=1)
-> BitmapAnd (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=1000000 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin

#5Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Justin Pryzby (#4)
Re: Bitmap scan is undercosted?

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production
we also use SSD, hence lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan
and the bitmap scan plan is even bigger.

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
causes the index scan to be more expensive than expected, but that's only
somewhat important, since there's no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=100000 loops=1)
-> BitmapAnd (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=1000000 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Yes, that's what you get without planner hints, but it's strange to get
this plan when there is another one which runs 2-3 times faster, yet
happens to be estimated at twice the cost of the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on;  set
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5)
(actual time=0.020..57.765 rows=100000 loops=1)

vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5)
(actual time=122.137..182.811 rows=100000 loops=1)
  ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual
time=110.168..110.168 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667
width=0) (actual time=20.845..20.845 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666
width=0) (actual time=80.323..80.323 rows=1000000 loops=1)


Same on PG95:
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Bitmap Heap Scan on public.aaa (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=100000 loops=1)
-> BitmapAnd (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=100000 loops=1)
-> Bitmap Index Scan on i2 (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=1000000 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#4)
Re: Bitmap scan is undercosted?

On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarrassed to say that I misread your message, despite your amply clear
subject. You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while
testing.

Actually, that the table is "not correlated" on "num" column is maybe the
primary reason why PG avoids using an index scan. It (more or less correctly)
deduces that it's going to have to "read" a large fraction of the pages (even
if only to process a small fraction of the rows), which is costly, except it's
all cached. In your case, that overly penalizes the index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c. Since the
index is uncorrelated, it's returning something close to max_IO_cost. It looks
like effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it. Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
or
cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost -
| (spc_random_page_cost - spc_seq_page_cost)
| * sqrt(pages_fetched / T);
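As a rough sketch of how those two expressions behave, here is a hypothetical Python rendering with made-up cost inputs (the real functions in costsize.c carry many more terms):

```python
import math

def index_io_cost(min_io_cost: float, max_io_cost: float,
                  correlation: float) -> float:
    # cost_index(): interpolate between the perfectly-correlated best
    # case and the uncorrelated worst case by the squared correlation.
    csquared = correlation ** 2
    return max_io_cost + csquared * (min_io_cost - max_io_cost)

def bitmap_cost_per_page(random_page_cost: float, seq_page_cost: float,
                         pages_fetched: float, total_pages: float) -> float:
    # cost_bitmap_heap_scan(): the per-page charge slides from
    # random_page_cost toward seq_page_cost as the fetched fraction
    # of the table grows.
    frac = min(pages_fetched / total_pages, 1.0)
    return random_page_cost - (random_page_cost - seq_page_cost) * math.sqrt(frac)

# correlation ~0.007 on "num" leaves the index scan charged nearly the
# full worst-case IO cost (these inputs are illustrative, not real costs):
print(index_io_cost(1000.0, 60000.0, 0.0071))

# while a bitmap scan touching ~90% of 44248 pages pays close to
# seq_page_cost per page:
print(bitmap_cost_per_page(1.5, 1.0, 39600, 44248))
```

So with everything actually sitting in shared buffers, the index scan gets billed nearly max_IO_cost while the bitmap heap scan gets billed at almost sequential rates, which is one plausible reading of the cost gap in the earlier plans.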

Justin

#7Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#5)
Re: Bitmap scan is undercosted?

On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:

We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;

What is:

effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15 times
more buffers than the bitmap scan does. While 1.5 is a lot lower than 4.0,
it is still much higher than the true cost of reading a page from the
buffer cache. This is why the index scan is getting punished. You could
lower random_page_cost and seq_page_cost to 0, to remove those
considerations. (I'm not saying you should do this on your production
system, but rather you should do it as a way to investigate the issue. But
it might make sense on production as well)

For this test I'm using SSD and Windows (if that matters). On production
we also use SSD, hence lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan
and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD for
this particular test case.

Cheers,

Jeff

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Vitaliy Garnashevich (#5)
Re: Bitmap scan is undercosted?

On Sat, Dec 02, 2017 at 01:54:09AM +0200, Vitaliy Garnashevich wrote:

On 02/12/2017 01:11, Justin Pryzby wrote:

..which is what's wanted with no planner hints (PG10.1 here).

Yes, that's what you get without planner hints, but it's strange to get this
plan, when there is another one, which runs 2-3 times faster, but happens to
be estimated to be twice more costly than the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on;  set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=100000 loops=1)

vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=100000 loops=1)

I was able to get an index plan with:

SET random_page_cost=1; SET cpu_index_tuple_cost=.04; -- default: 0.005; see selfuncs.c
postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
Index Scan using i1 on public.aaa (cost=0.43..50120.71 rows=10754 width=5) (actual time=0.040..149.580 rows=100000 loops=1)

Or with:
SET random_page_cost=1; SET cpu_operator_cost=0.03; -- default: 0.0025 see cost_bitmap_tree_node()
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag= true;
Index Scan using i1 on public.aaa (cost=5.22..49328.00 rows=10754 width=5) (actual time=0.051..109.082 rows=100000 loops=1)

Or a combination trying to minimize the cost of the index scan:
postgres=# SET random_page_cost=1; SET cpu_index_tuple_cost=.0017; SET cpu_operator_cost=0.03; EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag= true;
Index Scan using i1 on public.aaa (cost=5.22..48977.10 rows=10754 width=5) (actual time=0.032..86.883 rows=100000 loops=1)

Not sure if that's reasonable, but maybe it helps to understand.

Justin

#9Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#7)
Re: Bitmap scan is undercosted?

On 02/12/2017 07:51, Jeff Janes wrote:

On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich
<vgarnashevich@gmail.com <mailto:vgarnashevich@gmail.com>> wrote:

On 02/12/2017 01:11, Justin Pryzby wrote:

I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:

On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy
Garnashevich wrote:

We recently had an issue in production, where a bitmap
scan was chosen
instead of an index scan. Despite being 30x slower,
the bitmap scan had
about the same cost as the index scan.
drop table if exists aaa;
create table aaa as select (id%100)::int num,
(id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15
times more buffers than the bitmap scan does. While 1.5 is a lot lower
than 4.0, it is still much higher than the true cost of reading a page
from the buffer cache. This is why the index scan is getting punished.
You could lower random_page_cost and seq_page_cost to 0, to remove
those considerations.  (I'm not saying you should do this on your
production system, but rather you should do it as a way to investigate
the issue.  But it might make sense on production as well)

seq_page_cost = 1.0
random_page_cost = 1.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=11536.74..20856.96 rows=10257 width=5)
(actual time=108.338..108.338 rows=0 loops=1)
  ->  BitmapAnd  (cost=11536.74..11536.74 rows=10257 width=0) (actual
time=108.226..108.226 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1025.43 rows=100000
width=0) (actual time=18.563..18.563 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..10505.93 rows=1025666
width=0) (actual time=78.493..78.493 rows=1000000 loops=1)

Index Scan using i1 on aaa  (cost=0.44..44663.58 rows=10257 width=5)
(actual time=51.264..51.264 rows=0 loops=1)

Here I've used the filter num = 2, which produces rows=0 at BitmapAnd,
and thus avoids a lot of work at the "Bitmap Heap Scan" node, while still
leaving about the same proportion between bitmap and index - the bitmap
scan is twice as slow but half the cost. It does not matter much which
value is used in the filter, as long as it's other than num = 1.

seq_page_cost = 0.0
random_page_cost = 0.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5)
(actual time=82.212..82.212 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=100000 width=0)
(actual time=17.401..17.401 rows=100000 loops=1)

Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5)
(actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap index scan, and
finally it costs more than the index plan. But I doubt that the settings
seq_page_cost = random_page_cost = 0.0 should actually be used. Probably
it should instead be something like 1.0/1.0 or 1.0/1.1, with the other
costs increased to carry more weight.

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

Bitmap Heap Scan on aaa  (cost=36882.97..46587.82 rows=10257 width=5)
(actual time=106.045..106.045 rows=0 loops=1)
  ->  BitmapAnd  (cost=36882.97..36882.97 rows=10257 width=0) (actual
time=105.966..105.966 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..3276.74 rows=100000
width=0) (actual time=15.977..15.977 rows=100000 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..33584.72 rows=1025666
width=0) (actual time=79.208..79.208 rows=1000000 loops=1)

Index Scan using i1 on aaa  (cost=1.74..49914.89 rows=10257 width=5)
(actual time=50.144..50.144 rows=0 loops=1)

# x5 tuple/operator costs - switched to single bitmap index scan, but
now it costs more than the index scan
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.05;
set cpu_index_tuple_cost = 0.025;
set cpu_operator_cost = 0.0125;

Bitmap Heap Scan on aaa  (cost=4040.00..54538.00 rows=10257 width=5)
(actual time=82.338..82.338 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..4027.18 rows=100000 width=0)
(actual time=19.541..19.541 rows=100000 loops=1)

Index Scan using i1 on aaa  (cost=2.17..51665.32 rows=10257 width=5)
(actual time=49.545..49.545 rows=0 loops=1)

I've also tried seq_page_cost = 1.0, random_page_cost = 1.1, but that
would require a more than 10x increase in tuple/operator costs to make
the bitmap plan more costly than the index plan.

For this test I'm using SSD and Windows (if that matters). On
production we also use SSD, hence lower random_page_cost. But with
the default random_page_cost=4.0, the difference in cost between
the index scan plan and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD
for this particular test case.

Agree. I've just tried to justify the value of random_page_cost, which
is lower than like 2.0.


Cheers,

Jeff

#10Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#9)
Re: Bitmap scan is undercosted?

On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

seq_page_cost = 0.0
random_page_cost = 0.0
explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa (cost=753.00..2003.00 rows=10257 width=5) (actual
time=82.212..82.212 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..750.43 rows=100000 width=0)
(actual time=17.401..17.401 rows=100000 loops=1)

Index Scan using i1 on aaa (cost=0.44..1750.43 rows=10257 width=5)
(actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap scan, and finally it
costs more than the index plan.

Right, so there is a cpu costing problem (which could only be fixed by
hacking postgresql and recompiling it), but it is much smaller of a problem
than the IO cost not being accurate due to the high hit rate. Fixing the
CPU costing problem is unlikely to make a difference to your real query.
If you set the page costs to zero, what happens to your real query?

But I doubt that the settings seq_page_cost = random_page_cost = 0.0
should actually be used.

Why not? If your production server really has everything in memory during
normal operation, that is the correct course of action. If you ever
restart the server, then you could have some unpleasant time getting it
back up to speed again, but pg_prewarm could help with that.

Probably it should instead be something like 1.0/1.0 or 1.0/1.1, with the
other costs increased, to carry more weight.

This doesn't make any sense to me. Halving the page costs is
mathematically the same as doubling all the other constants. But the first
way of doing things says what you are doing, and the second way is an
obfuscation of what you are doing.
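The scaling argument can be checked with a toy two-term cost model (a
sketch for illustration only, not the planner's real formulas; the page
and tuple counts below are made up): halving both page costs produces
exactly half the cost that doubling all the cpu_* costs does, for every
plan, so the relative ordering of plans cannot differ between the two
approaches.

```python
# Toy cost model: cost = pages * page_cost + tuples * cpu_cost.
# Illustrates that halving page costs is equivalent (up to a constant
# factor of 2) to doubling the cpu costs, so plan choice is unaffected.

def plan_cost(pages, tuples, page_cost, cpu_cost):
    return pages * page_cost + tuples * cpu_cost

# Two hypothetical plans: (pages touched, tuples processed).
plans = [(44248, 10_000_000), (39949, 100_000)]

halved_pages = [plan_cost(p, t, 0.5, 0.01) for p, t in plans]
doubled_cpu = [plan_cost(p, t, 1.0, 0.02) for p, t in plans]

# Every doubled-cpu cost is exactly 2x the halved-pages cost ...
assert all(abs(d - 2 * h) < 1e-9 for d, h in zip(doubled_cpu, halved_pages))

# ... so the cheapest plan is the same under either adjustment.
cheapest = lambda costs: min(range(len(costs)), key=costs.__getitem__)
assert cheapest(halved_pages) == cheapest(doubled_cpu)
```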

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
cpu_tuple_cost. That is because the unselective bitmap index scan does
not incur any cpu_tuple_cost, but does incur index_tuple and operator
costs. Unfortunately all other index scans in the system will also be
skewed by such a change if you make the change system-wide.

Incidentally, the "actual rows" field of BitmapAnd is always zero. That
field is not implemented for that node type.

Why do you have an index on flag in the first place? What does the index
accomplish, other than enticing the planner into bad plans? I don't know
how this translates back into your real query, but dropping that index
should be considered. Or replace both indexes with one on (num,flag).

Or you can re-write the part of the WHERE clause in a way that it can't use
an index, something like:

and flag::text ='t'

Cheers,

Jeff

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#10)
Re: Bitmap scan is undercosted?

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
cpu_tuple_cost. That is because the unselective bitmap index scan does
not incur any cpu_tuple_cost, but does incur index_tuple and operator
costs. Unfortunately all other index scans in the system will also be
skewed by such a change if you make the change system-wide.

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.
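The arithmetic behind that misestimate is simple (a sketch of the
independence assumption, not the planner's actual code path):

```python
# The planner multiplies per-clause selectivities as if the clauses were
# independent. For this data set: num takes 100 values, and 1/10 of rows
# have flag = true.
n_rows = 10_000_000
sel_num = 1 / 100      # selectivity of num = 1
sel_flag = 1 / 10      # selectivity of flag = true

est = n_rows * sel_num * sel_flag
print(round(est))      # 10000 -- close to the planner's ~9514-10257 estimates

# But here flag = true is implied by num = 1, so the true row count is:
actual = n_rows * sel_num
print(round(actual))   # 100000 -- 10x more heap rows than estimated
```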

I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

regards, tom lane

#12Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#11)
Re: Bitmap scan is undercosted?

On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
vgarnashevich@gmail.com> wrote:

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
cpu_tuple_cost. That is because the unselective bitmap index scan does
not incur any cpu_tuple_cost, but does incur index_tuple and operator
costs. Unfortunately all other index scans in the system will also be
skewed by such a change if you make the change system-wide.

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.

But he also tested with num=2 and num=39, which reverses the situation so
the bitmap is 100% selective rather than the 90% the planner thinks it will
be.

But it is still slower for him (I am having trouble replicating that exact
behavior), so building the bitmap to rule out 100% of the rows is
empirically not worth it. I don't see how building it to rule out 90%, as
the planner thinks, would be any better.

I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

Due to that, when I disable bitmapscans and seqscans, I start getting slow
index scans on the wrong index, i2 rather than i1. I don't know why he
doesn't see that in his example.

Cheers,

Jeff

#13Justin Pryzby
pryzby@telsasoft.com
In reply to: Jeff Janes (#12)
Re: Bitmap scan is undercosted? - boolean correlation

On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals | {f,t}
most_common_freqs | {0.9014,0.0986}
correlation | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of the x
list (logical value) are the same value ('f'), and the CTIDs are in order on
the new index, so 90% of the values are 100% correlated.

It improves (by which I mean here that it spits out a lower number) if it's not
a 90/10 split:

CREATE TABLE aaa5 AS SELECT (id%100)::int num, (id%10>5)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX ON aaa5 (flag);

tablename | aaa5
attname | flag
correlation | 0.522184

Justin

#14Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#10)
Re: Bitmap scan is undercosted?

On 02/12/2017 23:17, Jeff Janes wrote:

Right, so there is a cpu costing problem (which could only be fixed by
hacking postgresql and recompiling it), but it is much smaller of a
problem than the IO cost not being accurate due to the high hit rate.
Fixing the CPU costing problem is unlikely to make a difference to
your real query.  If you set the page costs to zero, what happens to
your real query?

I can't reproduce the exact issue on the real database any more. The
query started to use the slow bitmap scan recently, and had been doing
so for some time lately, but now it's switched back to use the index
scan. The table involved in the query gets modified a lot. It has
hundreds of millions of rows. Lots of new rows are appended to it every
day, the oldest rows are sometimes removed. The table is analyzed at
least daily. It's possible that the statistics were updated and that
caused the query to be planned differently. But I would still like to
understand why the issue happened, and how to properly fix it, in case
it returns.

But I doubt that the settings seq_page_cost = random_page_cost =
0.0 should actually be used.

Why not?  If your production server really has everything in memory
during normal operation, that is the correct course of action.  If you
ever restart the server, then you could have some unpleasant time
getting it back up to speed again, but pg_prewarm could help with that.

In the real database, not everything is in memory. There are 200GB+ of
RAM, but DB is 500GB+. The table involved in the query itself is 60GB+
of data and 100GB+ of indexes. I'm running the test case in a way where
all reads are done from RAM, only to make it easier to reproduce and to
avoid unrelated effects.

As far as I know, costs in Postgres were designed to be relative to
seq_page_cost, which for that reason is usually defined as 1.0. Even if
everything were in RAM, page accesses would still not have zero cost.
Setting 0.0 just seems too extreme, as all other non-zero costs would
become infinitely large by comparison.

If you really want to target the plan with the BitmapAnd, you should
increase cpu_index_tuple_cost and/or cpu_operator_cost but not
increase cpu_tuple_cost.  That is because the  unselective bitmap
index scan does not incur any cpu_tuple_cost, but does incur
index_tuple and operator costs.  Unfortunately all other index scans
in the system will also be skewed by such a change if you make the
change system-wide.

Exactly. I'd like to understand why the worse plan is being chosen, and
1) if it's fixable by tuning costs, to figure out the right settings
which could be used in production, 2) if there is a bug in Postgres
optimizer, then to bring some attention to it, so that it's eventually
fixed in one of future releases, 3) if Postgres is supposed to work this
way, then at least I (and people who ever read this thread) would
understand it better.

Regards,
Vitaliy

#15Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Tom Lane (#11)
Re: Bitmap scan is undercosted?

On 03/12/2017 01:44, Tom Lane wrote:

I think it'd be a serious error to screw around with your cost settings
on the basis of a single case in which the rowcount estimates are so
far off. It's really those estimates that are the problem AFAICS.

The core issue in this example is that, the way the test data is set up,
the "flag = true" condition actually adds no selectivity at all, because
every row with "num = 1" is certain to have "flag = true". If the planner
realized that, it would certainly not bother with BitmapAnd'ing the flag
index onto the results of the num index. But it doesn't know that those
columns are correlated, so it supposes that adding the extra index will
give a 10x reduction in the number of heap rows that have to be visited
(since it knows that only 1/10th of the rows have "flag = true").
*That* is what causes the overly optimistic cost estimate for the
two-index bitmapscan, and no amount of fiddling with the cost parameters
will make that better.

Here I've tried to make a test which would not have correlation between
the two columns.

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select floor(random()*100)::int num, (random()*10 <
1)::bool flag from generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";10000033;44248;226
"i1";10000033;27422;365
"i2";10000033;27422;365

select flag, count(*) from aaa group by flag order by flag;
f;9000661
t;999339

select num, count(*) from aaa group by num order by num;
0;99852
1;99631
2;99699
3;100493
...
96;100345
97;99322
98;100013
99;100030

explain (analyze,verbose,costs,buffers)
select * from aaa where num = 1 and flag = true;

Bitmap Heap Scan on public.aaa  (cost=12829.83..24729.85 rows=10340
width=5) (actual time=104.941..112.649 rows=9944 loops=1)
  Output: num, flag
  Recheck Cond: (aaa.num = 1)
  Filter: aaa.flag
  Heap Blocks: exact=8922
  Buffers: shared hit=11932
  ->  BitmapAnd  (cost=12829.83..12829.83 rows=10340 width=0) (actual
time=102.926..102.926 rows=0 loops=1)
        Buffers: shared hit=3010
        ->  Bitmap Index Scan on i1  (cost=0.00..1201.44 rows=103334
width=0) (actual time=15.459..15.459 rows=99631 loops=1)
              Index Cond: (aaa.num = 1)
              Buffers: shared hit=276
        ->  Bitmap Index Scan on i2  (cost=0.00..11622.97 rows=1000671
width=0) (actual time=76.906..76.906 rows=999339 loops=1)
              Index Cond: (aaa.flag = true)
              Buffers: shared hit=2734
Planning time: 0.110 ms
Execution time: 113.272 ms

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340
width=5) (actual time=0.027..47.075 rows=9944 loops=1)
  Output: num, flag
  Index Cond: (aaa.num = 1)
  Filter: aaa.flag
  Rows Removed by Filter: 89687
  Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms


I tried creating multiple-column statistics using the v10 facility for
that:

regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE

but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

regards, tom lane

#16Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#12)
Re: Bitmap scan is undercosted?

On 03/12/2017 03:27, Jeff Janes wrote:

Due to that, when I disable bitmapscans and seqscans, I start getting
slow index scans on the wrong index, i2 rather than i1.  I don't know
why he doesn't see that in his example.

When I increase effective_cache_size to 1024MB, I start getting the plan
with the slower index i2, too.

Bitmap Heap Scan on public.aaa  (cost=12600.90..23688.70 rows=9488
width=5) (actual time=107.529..115.902 rows=9976 loops=1)
  ->  BitmapAnd  (cost=12600.90..12600.90 rows=9488 width=0) (actual
time=105.133..105.133 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1116.43 rows=96000
width=0) (actual time=16.313..16.313 rows=100508 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11479.47 rows=988338
width=0) (actual time=77.950..77.950 rows=1000200 loops=1)

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488
width=5) (actual time=0.020..285.695 rows=9976 loops=1)

Seq Scan on public.aaa  (cost=0.00..169248.54 rows=9488 width=5)
(actual time=0.024..966.469 rows=9976 loops=1)

This way the estimates and the actual times make more sense. But then
there's the question - maybe i1 just runs too fast and is costed
incorrectly? Why does that happen?

Here are the complete plans with the two different kinds of index scans
once again:

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340
width=5) (actual time=0.027..47.075 rows=9944 loops=1)
  Output: num, flag
  Index Cond: (aaa.num = 1)
  Filter: aaa.flag
  Rows Removed by Filter: 89687
  Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms

Index Scan using i2 on public.aaa  (cost=0.44..48227.31 rows=9488
width=5) (actual time=0.020..285.695 rows=9976 loops=1)
  Output: num, flag
  Index Cond: (aaa.flag = true)
  Filter: (aaa.flag AND (aaa.num = 1))
  Rows Removed by Filter: 990224
  Buffers: shared hit=46984
Planning time: 0.098 ms
Execution time: 286.081 ms

// The test DB was populated with: create table aaa as select
floor(random()*100)::int num, (random()*10 < 1)::bool flag from
generate_series(1, 10000000) id;

Regards,
Vitaliy

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: Bitmap scan is undercosted?

I wrote:

I tried creating multiple-column statistics using the v10 facility for
that:
regression=# create statistics s1 on num, flag from aaa;
CREATE STATISTICS
regression=# analyze aaa;
ANALYZE
but that changed the estimate not at all, which surprised me because
dependency statistics are supposed to fix exactly this type of problem.
I suspect there may be something in the extended-stats code that causes it
not to work right for boolean columns --- this wouldn't be excessively
surprising because of the way the planner messes around with converting
"flag = true" to just "flag" and sometimes back again. But I've not
looked closer yet.

After looking, I found that indeed dependency_is_compatible_clause()
rejects expressions like "flag" or "NOT flag", which it needn't since
those are fully equivalent to "flag = true" or "flag = false"
respectively. Moreover there's nothing else in
dependencies_clauselist_selectivity that depends on the exact form of
the clause under test, only on the semantics that it's an equality
condition on some Var. Hence I propose the attached patch, which
fixes the rowcount estimate for the example discussed in this thread:

create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
create statistics s1 on num, flag from aaa;
analyze aaa;

explain analyze select count(*) from aaa where num = 1 and flag = true;

Without patch:

Aggregate  (cost=43236.73..43236.74 rows=1 width=8) (actual
time=349.365..349.365 rows=1 loops=1)
  ->  Bitmap Heap Scan on aaa  (cost=20086.40..43212.94 rows=9514
width=0) (actual time=101.308..337.985 rows=100000 loops=1)
        Recheck Cond: (num = 1)
        Filter: flag
        Heap Blocks: exact=44248
        ->  BitmapAnd  (cost=20086.40..20086.40 rows=9514 width=0)
(actual time=92.214..92.214 rows=0 loops=1)
              ->  Bitmap Index Scan on i1  (cost=0.00..1776.43
rows=96000 width=0) (actual time=17.236..17.236 rows=100000 loops=1)
                    Index Cond: (num = 1)
              ->  Bitmap Index Scan on i2  (cost=0.00..18304.96
rows=991003 width=0) (actual time=72.168..72.168 rows=1000000 loops=1)
                    Index Cond: (flag = true)
Planning time: 0.254 ms
Execution time: 350.796 ms

With patch:

Aggregate  (cost=43496.19..43496.20 rows=1 width=8) (actual
time=359.195..359.195 rows=1 loops=1)
  ->  Bitmap Heap Scan on aaa  (cost=20129.64..43256.19 rows=96000
width=0) (actual time=99.750..347.353 rows=100000 loops=1)
        Recheck Cond: (num = 1)
        Filter: flag
        Heap Blocks: exact=44248
        ->  BitmapAnd  (cost=20129.64..20129.64 rows=9514 width=0)
(actual time=90.671..90.671 rows=0 loops=1)
              ->  Bitmap Index Scan on i1  (cost=0.00..1776.43
rows=96000 width=0) (actual time=16.946..16.946 rows=100000 loops=1)
                    Index Cond: (num = 1)
              ->  Bitmap Index Scan on i2  (cost=0.00..18304.96
rows=991003 width=0) (actual time=70.898..70.898 rows=1000000 loops=1)
                    Index Cond: (flag = true)
Planning time: 0.218 ms
Execution time: 360.608 ms

That's the right overall rowcount estimate for the scan, given the stats
it's working from. There's apparently still something wrong with bitmap
costing, since it's still estimating this as cheaper than the single-index
case --- noting the bogus rowcount estimate for the BitmapAnd, I suspect
that bitmap costing is taking shortcuts rather than using
clauselist_selectivity to estimate the overall selectivity of the bitmap
conditions. But whatever is causing that, it's independent of this
deficiency.

In addition to the bugfix proper, I improved some comments, got rid of
a NumRelids() test that's redundant with the preceding bms_membership()
test, and fixed dependencies_clauselist_selectivity so that
estimatedclauses actually is a pure output argument as stated by its
API contract.

regards, tom lane

Attachments:

fix-dependencies.c-for-boolean-quals.patch (text/x-diff, +107 -100)
#18Jeff Janes
jeff.janes@gmail.com
In reply to: Justin Pryzby (#13)
Re: Bitmap scan is undercosted? - boolean correlation

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:

I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals | {f,t}
most_common_freqs | {0.9014,0.0986}
correlation | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of
the x
list (logical value) are the same value ('t'), and the CTIDs are in order
on
the new index, so 90% of the values are 100% correlated.

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

It looks like it could be fixed with a few extra double calcs per distinct
value. Considering we already sorted the sample values using SQL-callable
collation-dependent comparators, I doubt a few C-level double calcs are
going to be meaningful.

Cheers,

Jeff

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#18)
Re: Bitmap scan is undercosted? - boolean correlation

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:

It thinks there's somewhat-high correlation since it gets a list of x
and y values (integer positions by logical and physical sort order) and
90% of the x list (logical value) are the same value ('t'), and the
CTIDs are in order on the new index, so 90% of the values are 100%
correlated.

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

Uh ... what do you propose doing instead? We'd have to do something with
ties, and it's not so obvious this way is wrong.

regards, tom lane

#20Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#19)
Re: Bitmap scan is undercosted? - boolean correlation

On Dec 3, 2017 15:31, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@telsasoft.com>

wrote:

It thinks there's somewhat-high correlation since it gets a list of x
and y values (integer positions by logical and physical sort order) and
90% of the x list (logical value) are the same value ('t'), and the
CTIDs are in order on the new index, so 90% of the values are 100%
correlated.

But there is no index involved (except in the case of the functional
index). The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order. That is wrong, right? Is there
any argument that this is desirable?

Uh ... what do you propose doing instead? We'd have to do something with
ties, and it's not so obvious this way is wrong.

Let them be tied. If there are 10 distinct values, number the values 0 to
9, and all rows of a given distinct value get the same number for the
logical order axis.

Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to
me. Although if we switched btree to store duplicate values with tid as a
tie breaker, then maybe it wouldn't be as obviously wrong.

Cheers,

Jeff

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#20)
#22Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Vitaliy Garnashevich (#16)
#23Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#21)
#24Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Vitaliy Garnashevich (#22)
#25Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#14)
#26Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#21)
#27Jeff Janes
jeff.janes@gmail.com
In reply to: Vitaliy Garnashevich (#22)
#28Jeff Janes
jeff.janes@gmail.com
In reply to: Justin Pryzby (#23)
#29Justin Pryzby
pryzby@telsasoft.com
In reply to: Jeff Janes (#28)
#30Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#29)
#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#29)
#32Vitaliy Garnashevich
vgarnashevich@gmail.com
In reply to: Jeff Janes (#27)