BitmapAnd on correlated column?

Started by greigwiseover 6 years ago6 messagesgeneral
Jump to latest
#1greigwise
greigwise@comcast.net

I'm running the following query on Postgres version 10.8:

SELECT count(*) FROM test_table WHERE and id_column_1 IN (9954, 9690, 9689,
9688) AND
id_column_2 IN (75328, 51448, 48060, 48065, 51803, 51449, 51802, 48064,
48061, 48062, 48059, 48063, 121834, 326649, 303167, 303249, 336838, 334172,
303419, 48148, 329727, 320007, 205066, 205065, 75406, 48153, 56298, 51570,
62668, 48150, 110618, 48154, 48147, 110626, 96830, 110624, 110615, 110625,
110621, 110617, 115865, 196128, 179940, 156843, 186994, 125995, 206982,
240753, 245896, 131002, 215463, 303111, 303399, 336923, 325403, 336189,
335278, 335271, 334985, 325014, 59681, 48277, 68237, 48067, 48072, 49379,
60063, 85903, 117843, 51451, 65681, 117848, 117845, 48068, 48066, 48071,
50323, 51450, 48069, 121835, 117844, 48070, 49310, 293247, 134761, 210493,
325404, 117847, 117846, 169399, 187795, 117849, 200056, 309165, 106874,
310255, 117343, 240831, 117842, 270870, 259300, 259291, 307661, 335006,
329259, 119279, 293474, 303516, 119277, 303557, 307101, 302380, 301949,
252399, 51988, 65660, 68236, 51987, 86026, 50557, 82723, 51452, 50550,
50552, 50559, 70009, 50554, 50551, 50555, 50553, 55955, 82766, 119274,
57069, 58184, 111317, 84796, 122502, 119275, 119270, 119273, 89111, 307110,
105797, 105798, 57068, 119280, 122495, 132483, 293249, 140929, 307666,
307660, 210494, 294051, 210507, 211284, 74299, 235551, 119281, 269433,
269434, 269891, 240828, 50556, 307977);

There are 2 indexes, one on id_column_1 and one on id_column_2.

The plan looks like this:

Aggregate (cost=732.87..732.88 rows=1 width=8) (actual time=33.835..33.835
rows=1 loops=1)
-> Bitmap Heap Scan on test_table (cost=701.88..732.82 rows=18 width=0)
(actual time=13.073..31.288 rows=24965 loops=1)
Recheck Cond: ((id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[])) AND (id_column_2 = ANY
('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[])))
Heap Blocks: exact=20879
-> BitmapAnd (cost=701.88..701.88 rows=18 width=0) (actual
time=9.904..9.904 rows=0 loops=1)
-> Bitmap Index Scan on index_on_col1 (cost=0.00..95.31
rows=7676 width=0) (actual time=5.345..5.345 rows=28346 loops=1)
Index Cond: (id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[]))
-> Bitmap Index Scan on index_on_col2 (cost=0.00..606.32
rows=37941 width=0) (actual time=3.395..3.395 rows=24965 loops=1)
Index Cond: (id_column_2 = ANY
('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[]))
Planning time: 1.452 ms
Execution time: 34.036 ms

The thing is that id_column_1 is really dependent on id_column_2. So
there's really no point in scanning the index on id_column_1. In fact, if I
remove that in clause for id_column_1 from the query, I get a better plan:

Aggregate (cost=14045.14..14045.15 rows=1 width=8) (actual
time=22.743..22.743 rows=1 loops=1)
-> Index Only Scan using index_on_col2 on test_table
(cost=0.43..13950.28 rows=37941 width=0) (actual time=0.130..19.880
rows=24965 loops=1)
Index Cond: (id_column_2 = ANY
('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[]))
Heap Fetches: 24965
Planning time: 0.647 ms
Execution time: 22.781 ms

I thought maybe extended statistics would help, so I did this:

create statistics test (dependencies) on id_column_2, id_column_1 from
test_table;
analyze test_table;

But the plan was nearly identical to the first plan with the BitmapAND even
after creating the extended statistics:

Aggregate (cost=738.12..738.13 rows=1 width=8) (actual time=34.207..34.207
rows=1 loops=1)
-> Bitmap Heap Scan on test_table (cost=707.13..738.07 rows=18 width=0)
(actual time=14.230..31.782 rows=24965 loops=1)
Recheck Cond: ((id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[])) AND (id_column_2 = ANY
('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[])))
Heap Blocks: exact=20879
-> BitmapAnd (cost=707.13..707.13 rows=18 width=0) (actual
time=10.710..10.710 rows=0 loops=1)
-> Bitmap Index Scan on index_on_col1 (cost=0.00..95.58
rows=7711 width=0) (actual time=5.232..5.232 rows=28346 loops=1)
Index Cond: (id_column_1 = ANY
('{9954,9690,9689,9688}'::integer[]))
-> Bitmap Index Scan on index_on_col2 (cost=0.00..611.30
rows=38750 width=0) (actual time=4.010..4.010 rows=24965 loops=1)
Index Cond: (id_column_2 = ANY
('{75328,51448,48060,48065,51803,51449,51802,48064,48061,48062,48059,48063,121834,326649,303167,303249,336838,334172,303419,48148,329727,320007,205066,205065,75406,48153,56298,51570,62668,48150,110618,48154,48147,110626,96830,110624,110615,110625,110621,110617,115865,196128,179940,156843,186994,125995,206982,240753,245896,131002,215463,303111,303399,336923,325403,336189,335278,335271,334985,325014,59681,48277,68237,48067,48072,49379,60063,85903,117843,51451,65681,117848,117845,48068,48066,48071,50323,51450,48069,121835,117844,48070,49310,293247,134761,210493,325404,117847,117846,169399,187795,117849,200056,309165,106874,310255,117343,240831,117842,270870,259300,259291,307661,335006,329259,119279,293474,303516,119277,303557,307101,302380,301949,252399,51988,65660,68236,51987,86026,50557,82723,51452,50550,50552,50559,70009,50554,50551,50555,50553,55955,82766,119274,57069,58184,111317,84796,122502,119275,119270,119273,89111,307110,105797,105798,57068,119280,122495,132483,293249,140929,307666,307660,210494,294051,210507,211284,74299,235551,119281,269433,269434,269891,240828,50556,307977}'::integer[]))
Planning time: 1.073 ms
Execution time: 34.331 ms

So, I'm just wondering if there's anything I can do to influence the
optimize to pick the better plan using just the one index on id_column_2
(aside from re-writing the query).

Thanks,
Greig Wise

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

#2Rob Sargent
robjsargent@gmail.com
In reply to: greigwise (#1)
Re: BitmapAnd on correlated column?

On 10/3/19 3:22 PM, greigwise wrote:

I'm running the following query on Postgres version 10.8:

SELECT count(*) FROM test_table WHERE and id_column_1 IN (9954, 9690, 9689,
9688) AND
id_column_2 IN (75328, 51448, 48060, 48065, 51803, 51449, 51802, 48064,

-
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

Is "where and" just a typo?

#3greigwise
greigwise@comcast.net
In reply to: Rob Sargent (#2)
Re: BitmapAnd on correlated column?
#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: greigwise (#1)
Re: BitmapAnd on correlated column?

On Thu, 2019-10-03 at 14:22 -0700, greigwise wrote:

I'm running the following query on Postgres version 10.8:

SELECT count(*) FROM test_table WHERE and id_column_1 IN (9954,
9690, 9689, 9688) AND id_column_2 IN ([long list]);

There are 2 indexes, one on id_column_1 and one on id_column_2.

The plan looks like this:

Aggregate
-> Bitmap Heap Scan on test_table
-> BitmapAnd
-> Bitmap Index Scan on index_on_col1
-> Bitmap Index Scan on index_on_col2
Planning time: 1.452 ms
Execution time: 34.036 ms

The thing is that id_column_1 is really dependent on id_column_2. So
there's really no point in scanning the index on id_column_1. In
fact, if I remove that in clause for id_column_1 from the query, I
get a better plan:

Aggregate
-> Index Only Scan using index_on_col2 on test_table
Planning time: 0.647 ms
Execution time: 22.781 ms

I thought maybe extended statistics would help, so I did this:

create statistics test (dependencies) on id_column_2, id_column_1
from test_table;
analyze test_table;

But the plan was nearly identical to the first plan with the
BitmapAND even after creating the extended statistics:

So, I'm just wondering if there's anything I can do to influence the
optimize to pick the better plan using just the one index on
id_column_2 (aside from re-writing the query).

Extended statistics will tell PostgreSQL that it is very unlikely
that the first condition will contribute significantly, but that
is no proof that the condition can be omitted, so the optimizer
cannot just skip the condition.

You'll have to rewrite the query.

If one condition depends on the other, consider normalizing the table.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#5greigwise
greigwise@comcast.net
In reply to: Laurenz Albe (#4)
Re: BitmapAnd on correlated column?

Extended statistics will tell PostgreSQL that it is very unlikely
that the first condition will contribute significantly, but that
is no proof that the condition can be omitted, so the optimizer
cannot just skip the condition.

Granted it cannot skip the condition, but that doesn't mean that it has to
use that second index. It's doing a recheck on the conditions anyway,
right?

Thanks.
Greig

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: greigwise (#5)
Re: BitmapAnd on correlated column?

greigwise <greigwise@comcast.net> writes:

Granted it cannot skip the condition, but that doesn't mean that it has to
use that second index. It's doing a recheck on the conditions anyway,
right?

I'm afraid you're out of luck on that for now. In principle, the
presence of extended stats on the column combination might allow
the planner to figure out that the second IN condition adds little
selectivity. But I'm not sure how hard it would be, and I am sure
that the extended-stats logic hasn't been built out in that direction
yet.

regards, tom lane