Optimizer degradation since 8.0

Started by Teodor Sigaevover 19 years ago2 messages
#1Teodor Sigaev
teodor@sigaev.ru

Table with one million rows:

wow=# \d _document83
Table "public._document83"
Column | Type | Modifiers
------------+-----------------------------+-----------
_idrref | bytea | not null
_marked | boolean | not null
_date_time | timestamp without time zone | not null
_number | character(10) | not null
_posted | boolean | not null
Indexes:
"_document83ng_pkey" PRIMARY KEY, btree (_idrref)
"_document83_bydocdate_tr" btree (_date_time, _idrref)
"qq" btree (_date_time)

Query:
SELECT
_Date_Time,
_IDRRef,
FROM
_Document83
WHERE
_Date_Time = '2006-06-21 11:24:56'::timestamp AND
_IDRRef > '\\234?PPTP00\\021\\333\\000\\367\\004:v:'::bytea
OR _Date_Time > '2006-06-21 11:24:56'::timestamp
LIMIT 20;

Explain analyze in postgres 8.0:
Limit (cost=0.00..0.63 rows=20 width=44) (actual time=0.250..0.485 rows=20
loops=1)
-> Index Scan using qq, qq on _document83 (cost=0.00..6679.90 rows=211427
width=44) (actual time=0.238..0.416 rows=20 loops=1)
Index Cond: ((_date_time = '2006-06-21 11:24:56'::timestamp without
time zone) OR (_date_time > '2006-06-21 11:24:56'::timestamp without time zone))
Filter: (((_date_time = '2006-06-21 11:24:56'::timestamp without time
zone) AND (_idrref > '\\234?PPTP00\\021\\333\\000\\367\\004:v:'::bytea)) OR
(_date_time > '2006-06-21 11:24:56'::timestamp without time zone))
Total runtime: 2.313 ms

Explain analyze in postgres 8.1:
Limit (cost=0.00..2.82 rows=20 width=44) (actual time=1448.897..1610.386
rows=20 loops=1)
-> Seq Scan on _document83 (cost=0.00..29729.04 rows=210782 width=44)
(actual time=1448.889..1610.314 rows=20 loops=1)
Filter: ((_date_time > '2006-06-21 11:24:56'::timestamp without time
zone) OR ((_date_time = '2006-06-21 11:24:56'::timestamp without time zone) AND
(_idrref > E'\\234?PPTP00\\021\\333\\000\\367\\004:v:'::bytea)))
Total runtime: 1610.524 ms
(4 rows)

With ENABLE_SEQSCAN=OFF:
Limit (cost=1319.83..1321.23 rows=20 width=44) (actual time=193.261..193.382
rows=20 loops=1)
-> Bitmap Heap Scan on _document83 (cost=1319.83..16029.62 rows=210782
width=44) (actual time=193.253..193.314 rows=20 loops=1)
Recheck Cond: (((_date_time = '2006-06-21 11:24:56'::timestamp without
time zone) AND (_idrref > E'\\234?PPTP00\\021\\333\\000\\367\\004:v:'::bytea))
OR (_date_time > '2006-06-21 11:24:56'::timestamp without time zone))
-> BitmapOr (cost=1319.83..1319.83 rows=210788 width=0) (actual
time=191.203..191.203 rows=0 loops=1)
-> Bitmap Index Scan on _document83_bydocdate_tr
(cost=0.00..2.18 rows=30 width=0) (actual time=2.470..2.470 rows=43 loops=1)
Index Cond: ((_date_time = '2006-06-21
11:24:56'::timestamp without time zone) AND (_idrref >
E'\\234?PPTP00\\021\\333\\000\\367\\004:v:'::bytea))
-> Bitmap Index Scan on qq (cost=0.00..1317.65 rows=210758
width=0) (actual time=188.720..188.720 rows=275800 loops=1)
Index Cond: (_date_time > '2006-06-21 11:24:56'::timestamp
without time zone)
Total runtime: 193.872 ms

So, 8.0 is better at least in 100 times. Expected number of rows is close to
real value ( ~270000 ). Rewriting query with UNION ALL makes better performance
(about 1 ms):
Limit (cost=0.00..0.73 rows=20 width=44) (actual time=0.654..0.851 rows=20
loops=1)
-> Append (cost=0.00..7712.53 rows=210788 width=44) (actual
time=0.648..0.791 rows=20 loops=1)
-> Index Scan using qq on _document83 (cost=0.00..6.42 rows=30
width=44) (actual time=0.645..0.733 rows=20 loops=1)
Index Cond: (_date_time = '2006-06-21 11:24:56'::timestamp
without time zone)
Filter: (_idrref >
E'\\234?PPTP00\\021\\333\\000\\367\\004:v:'::bytea)
-> Index Scan using qq on _document83 (cost=0.00..5598.23
rows=210758 width=44) (never executed)
Index Cond: (_date_time > '2006-06-21 11:24:56'::timestamp
without time zone)
Total runtime: 1.059 ms
But it's not always possible to rewrite automatically generated query...

After adding 'order by', postgres uses index but plan becomes the same as
before, with seqscan=off.

Can I tweak something in 8.1 or it's a bug?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#1)
Re: Optimizer degradation since 8.0

Teodor Sigaev <teodor@sigaev.ru> writes:

Can I tweak something in 8.1 or it's a bug?

It's not a bug, it's an intentional change:

2005-04-24 21:30 tgl

* src/: backend/commands/explain.c,
backend/executor/nodeBitmapIndexscan.c,
backend/executor/nodeIndexscan.c, backend/nodes/copyfuncs.c,
backend/nodes/outfuncs.c, backend/optimizer/path/allpaths.c,
backend/optimizer/path/indxpath.c,
backend/optimizer/path/orindxpath.c,
backend/optimizer/plan/createplan.c,
backend/optimizer/plan/setrefs.c,
backend/optimizer/plan/subselect.c,
backend/optimizer/util/pathnode.c,
backend/optimizer/util/restrictinfo.c,
backend/utils/adt/selfuncs.c, include/executor/nodeIndexscan.h,
include/nodes/execnodes.h, include/nodes/plannodes.h,
include/nodes/relation.h, include/optimizer/paths.h,
include/optimizer/planmain.h, include/optimizer/restrictinfo.h:
Remove support for OR'd indexscans internal to a single IndexScan
plan node, as this behavior is now better done as a bitmap OR
indexscan. This allows considerable simplification in
nodeIndexscan.c itself as well as several planner modules concerned
with indexscan plan generation. Also we can improve the sharing of
code between regular and bitmap indexscans, since they are now
working with nigh-identical Plan nodes.

Your example shows a case where a plain indexscan's zero startup cost
is very useful. I'm disinclined to revert the above patch completely,
because the OR'd-indexscan code was a mess, but perhaps we could
consider ways to make bitmap scans start delivering tuples sooner
than after finishing the whole index scan. Not for 8.2 though :-(

regards, tom lane