Curious planning decision

Started by Steve Rogersonover 8 years ago2 messagesgeneral

steve.pg@yewtc.demon.co.uk

over 8 years ago

I have a query that behaves in an odd way, specifically it does different
things depending on if I use "=" or ">=" :

mydb=# explain analyze select count(*) from ctable c where c.date1 >=
'2017-08-10' and c.col2 = '637' and c.col3 = 'TEXT3' and c.col4 not in ('TEXT4');
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=104183.81..104183.82 rows=1 width=0) (actual
time=520.221..520.221 rows=1 loops=1)
-> Bitmap Heap Scan on ctable c (cost=83843.79..104161.45 rows=8947
width=0) (actual time=515.833..520.192 rows=540 loops=1)
Recheck Cond: ((date1 >= '2017-08-10'::date) AND (col2 = 637) AND
((col3)::text = 'TEXT3'::text))
Rows Removed by Index Recheck: 4894
Filter: ((col4)::text <> 'TEXT4'::text)
Rows Removed by Filter: 360
Heap Blocks: exact=1159
-> BitmapAnd (cost=83843.79..83843.79 rows=10294 width=0) (actual
time=515.629..515.629 rows=0 loops=1)
-> Bitmap Index Scan on ctable_date1 (cost=0.00..22534.71
rows=598419 width=0) (actual time=5.204..5.204 rows=37536 loops=1)
Index Cond: (date1 >= '2017-08-10'::date)
-> Bitmap Index Scan on ctable_col2_col3 (cost=0.00..61304.36
rows=1158379 width=0) (actual time=505.479..505.479 rows=1213850 loops=1)
Index Cond: ((col2 = 637) AND ((col3)::text = 'TEXT3'::text))
Planning time: 0.306 ms
Execution time: 520.283 ms
(14 rows)

mydb=# explain analyze select count(*) from ctable c where c.date1 =
'2017-08-10' and c.col2 = '637' and c.col3 = 'TEXT3' and c.col4 not in ('TEXT4');

QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9929.67..9929.68 rows=1 width=0) (actual time=10.500..10.501
rows=1 loops=1)
-> Index Scan using ctable_date1_col3_col4_in on ctable c
(cost=0.57..9929.05 rows=245 width=0) (actual time=0.062..10.456 rows=540 loops=1)
Index Cond: ((date1 = '2017-08-10'::date) AND ((col3)::text =
'TEXT3'::text))
Filter: (((col4)::text <> 'TEXT4'::text) AND (col2 = 637))
Rows Removed by Filter: 9059
Planning time: 0.499 ms
Execution time: 10.557 ms
(7 rows)

The used indexes are:
Indexes:

"ctable_date1" btree (date1)
"ctable_col2_col3" btree (col2, col3)
"ctable_date1_col2_col3" btree (date1, col2, col3)

ctable has about 60,000,000 rows.

[ Caveat: I sanitized the query and I might not have this character perfect,
though I believe it's ok ]
The date is "today" at the time of writing. The table is analyzed, to pg
should "know" that date1 <= "today" hence there is no data beyond today.

On similar - and more relevant lines the same query with a date range so as
above but "c.date1 >= '2017-08-01' and c.date1 <= '2017-08-10'" uses the
date1_col2_col3 index and takes about 143ms, whereas "c.date1 >= '2017-07-31'
and c.date1 <= '2017-08-10'" uses the date1/col2_col3 indexes takes about 902ms.

Is there a planner option I can tweak that might help?

Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Steve Rogerson (#1)

Re: Curious planning decision

Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes:

The date is "today" at the time of writing. The table is analyzed, to pg
should "know" that date1 <= "today" hence there is no data beyond today.

It doesn't seem to, given this:

-> Bitmap Index Scan on ctable_date1 (cost=0.00..22534.71
rows=598419 width=0) (actual time=5.204..5.204 rows=37536 loops=1)
Index Cond: (date1 >= '2017-08-10'::date)

You should try just "explain analyze select * from ctable c
where c.date1 >= '2017-08-10'" to confirm that that row estimate
is wrong in isolation, but it looks like it is off by more than 10X.

ctable has about 60,000,000 rows.

So basically, the planner is estimating 1% selectivity for this condition,
when the true figure is more like 0.0625%. I find this unsurprising if
you're using the default statistics target of 100 --- that means that the
accuracy of histogram-related predictions can't be expected to be any
better than 1%. If you crank up the stats target (for this column, or
the whole table, or globally) and re-analyze, the estimate should get
better, and then you should get a better plan.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general