Strange query plan
Tom,
I have a problem with slow query execution (postgresql 7.1.2):
There are 2 tables - idx, msg_prt:
bug=# \dt
List of relations
Name | Type | Owner
---------+-------+--------
idx | table | megera
msg_prt | table | megera
(2 rows)
bug=# \d idx
Table "idx"
Attribute | Type | Modifier
-----------+---------+----------
tid | integer |
lid | integer |
did | integer |
Index: idxidx
bug=# \d msg_prt
Table "msg_prt"
Attribute | Type | Modifier
-----------+---------+----------
tid | integer |
Index: mprt_tid
Also there are 2 indexes - idxidx, mprt_tid
bug=# \d idxidx
Index "idxidx"
Attribute | Type
-----------+---------
lid | integer
did | integer
tid | integer
unique btree
bug=# \d mprt_tid
Index "mprt_tid"
Attribute | Type
-----------+---------
tid | integer
unique btree
Query is:
select msg_prt.tid as mid from msg_prt
where exists
(select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
Plan for this query looks very ineffective and query is very slow:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:
Seq Scan on msg_prt (cost=0.00..119090807.13 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..1713.40 rows=1 width=4)
total: 6.80 sec; number: 1; for one: 6.796 sec;
Statistics on tables:
idx - 103651 rows
msg_prt - 69505 rows
There are only 16 rows in 'idx' table satisfied subselect condition.
I did vacuum analyze.
Adding another index 'create index tididx on idx (tid);' helps:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:
Seq Scan on msg_prt (cost=0.00..1134474.94 rows=69505 width=4)
SubPlan
-> Index Scan using tididx on idx (cost=0.00..16.31 rows=1 width=4)
total: 1.71 sec; number: 1; for one: 1.711 sec;
but still plan looks ineffective.
The best plan I've got eliminating IN predicate:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid = 1207 and idx.lid=59587 )
NOTICE: QUERY PLAN:
Seq Scan on msg_prt (cost=0.00..167368.47 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx on idx (cost=0.00..2.39 rows=1 width=4)
total: 0.54 sec; number: 1; for one: 0.541 sec;
Unfortunately I can't use this way in general case.
Does it's a known problem ?
data+schema is available from
http://www.sai.msu.su/~megera/postgres/data/bug.dump.gz
It's about 500Kb !
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
The best plan I've got eliminating IN predicate:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid = 1207 and idx.lid=59587 )
Surely that returns zero rows?
regards, tom lane
On Tue, 5 Jun 2001, Tom Lane wrote:
Oleg Bartunov <oleg@sai.msu.su> writes:
The best plan I've got eliminating IN predicate:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid = 1207 and idx.lid=59587 )Surely that returns zero rows?
Oops, sorry :-)
should be
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
but this is not a big win.
Anyway, what's about original query ?
regards, tom lane
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
should be
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
but this is not a big win.
Shouldn't be any win at all: the IN expression-list notation will get
translated to exactly that form.
Anyway, what's about original query ?
IN/EXISTS subqueries suck. This has been true for a long time and is
going to be true for a while longer, unless someone else fixes it before
I have a chance to look at it. See if you can't rewrite your query as
a plain join.
regards, tom lane
On Tue, 5 Jun 2001, Tom Lane wrote:
Oleg Bartunov <oleg@sai.msu.su> writes:
should be
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
but this is not a big win.Shouldn't be any win at all: the IN expression-list notation will get
translated to exactly that form.
Sure
Anyway, what's about original query ?
IN/EXISTS subqueries suck. This has been true for a long time and is
going to be true for a while longer, unless someone else fixes it before
I have a chance to look at it. See if you can't rewrite your query as
a plain join.
That's why we've moved to GiST and feel fine :-) That query we used in
our old project, now swithing to GiST.
regards, tom lane
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:
Seq Scan on msg_prt (cost=0.00..119090807.13 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..1713.40 rows=1 width=4)
Actually, this example does reveal an unnecessary inefficiency: the
planner is only using the "idx.lid in (1207,59587)" clause for the
indexscan, ignoring the fact that the did and tid clauses match the
additional columns of your three-column index. The attached patch
should improve matters.
regards, tom lane
*** src/backend/optimizer/path/indxpath.c.orig Sun May 20 16:28:18 2001
--- src/backend/optimizer/path/indxpath.c Tue Jun 5 12:38:21 2001
***************
*** 397,403 ****
clause, false);
}
! /*
* Given an OR subclause that has previously been determined to match
* the specified index, extract a list of specific opclauses that can be
* used as indexquals.
--- 397,403 ----
clause, false);
}
! /*----------
* Given an OR subclause that has previously been determined to match
* the specified index, extract a list of specific opclauses that can be
* used as indexquals.
***************
*** 406,415 ****
* given opclause. However, if the OR subclause is an AND, we have to
* scan it to find the opclause(s) that match the index. (There should
* be at least one, if match_or_subclause_to_indexkey succeeded, but there
! * could be more.) Also, we apply expand_indexqual_conditions() to convert
! * any special matching opclauses to indexable operators.
*
* The passed-in clause is not changed.
*/
List *
extract_or_indexqual_conditions(RelOptInfo *rel,
--- 406,430 ----
* given opclause. However, if the OR subclause is an AND, we have to
* scan it to find the opclause(s) that match the index. (There should
* be at least one, if match_or_subclause_to_indexkey succeeded, but there
! * could be more.)
! *
! * Also, we can look at other restriction clauses of the rel to discover
! * additional candidate indexquals: for example, consider
! * ... where (a = 11 or a = 12) and b = 42;
! * If we are dealing with an index on (a,b) then we can include the clause
! * b = 42 in the indexqual list generated for each of the OR subclauses.
! * Essentially, we are making an index-specific transformation from CNF to
! * DNF. (NOTE: when we do this, we end up with a slightly inefficient plan
! * because create_indexscan_plan is not very bright about figuring out which
! * restriction clauses are implied by the generated indexqual condition.
! * Currently we'll end up rechecking both the OR clause and the transferred
! * restriction clause as qpquals. FIXME someday.)
! *
! * Also, we apply expand_indexqual_conditions() to convert any special
! * matching opclauses to indexable operators.
*
* The passed-in clause is not changed.
+ *----------
*/
List *
extract_or_indexqual_conditions(RelOptInfo *rel,
***************
*** 417,470 ****
Expr *orsubclause)
{
List *quals = NIL;
! if (and_clause((Node *) orsubclause))
{
! /*
! * Extract relevant sub-subclauses in indexkey order. This is
! * just like group_clauses_by_indexkey() except that the input and
! * output are lists of bare clauses, not of RestrictInfo nodes.
! */
! int *indexkeys = index->indexkeys;
! Oid *classes = index->classlist;
! do
{
! int curIndxKey = indexkeys[0];
! Oid curClass = classes[0];
! List *clausegroup = NIL;
! List *item;
! foreach(item, orsubclause->args)
{
if (match_clause_to_indexkey(rel, index,
curIndxKey, curClass,
! lfirst(item), false))
! clausegroup = lappend(clausegroup, lfirst(item));
}
! /*
! * If no clauses match this key, we're done; we don't want to
! * look at keys to its right.
! */
! if (clausegroup == NIL)
! break;
!
! quals = nconc(quals, clausegroup);
!
! indexkeys++;
! classes++;
! } while (!DoneMatchingIndexKeys(indexkeys, index));
!
! if (quals == NIL)
! elog(ERROR, "extract_or_indexqual_conditions: no matching clause");
! }
! else
! {
! /* we assume the caller passed a valid indexable qual */
! quals = makeList1(orsubclause);
! }
return expand_indexqual_conditions(quals);
}
--- 432,503 ----
Expr *orsubclause)
{
List *quals = NIL;
+ int *indexkeys = index->indexkeys;
+ Oid *classes = index->classlist;
! /*
! * Extract relevant indexclauses in indexkey order. This is essentially
! * just like group_clauses_by_indexkey() except that the input and
! * output are lists of bare clauses, not of RestrictInfo nodes.
! */
! do
{
+ int curIndxKey = indexkeys[0];
+ Oid curClass = classes[0];
+ List *clausegroup = NIL;
+ List *item;
! if (and_clause((Node *) orsubclause))
! {
! foreach(item, orsubclause->args)
! {
! Expr *subsubclause = (Expr *) lfirst(item);
! if (match_clause_to_indexkey(rel, index,
! curIndxKey, curClass,
! subsubclause, false))
! clausegroup = lappend(clausegroup, subsubclause);
! }
! }
! else if (match_clause_to_indexkey(rel, index,
! curIndxKey, curClass,
! orsubclause, false))
{
! clausegroup = makeList1(orsubclause);
! }
! /*
! * If we found no clauses for this indexkey in the OR subclause
! * itself, try looking in the rel's top-level restriction list.
! */
! if (clausegroup == NIL)
! {
! foreach(item, rel->baserestrictinfo)
{
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(item);
+
if (match_clause_to_indexkey(rel, index,
curIndxKey, curClass,
! rinfo->clause, false))
! clausegroup = lappend(clausegroup, rinfo->clause);
}
+ }
! /*
! * If still no clauses match this key, we're done; we don't want to
! * look at keys to its right.
! */
! if (clausegroup == NIL)
! break;
!
! quals = nconc(quals, clausegroup);
!
! indexkeys++;
! classes++;
! } while (!DoneMatchingIndexKeys(indexkeys, index));
!
! if (quals == NIL)
! elog(ERROR, "extract_or_indexqual_conditions: no matching clause");
return expand_indexqual_conditions(quals);
}
On Tue, 5 Jun 2001, Tom Lane wrote:
Oleg Bartunov <oleg@sai.msu.su> writes:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:Seq Scan on msg_prt (cost=0.00..119090807.13 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..1713.40 rows=1 width=4)Actually, this example does reveal an unnecessary inefficiency: the
planner is only using the "idx.lid in (1207,59587)" clause for the
indexscan, ignoring the fact that the did and tid clauses match the
additional columns of your three-column index. The attached patch
should improve matters.regards, tom lane
Cool. Looks better
select msg_prt.tid as mid from msg_prt where exists (select idx.tid from idx where msg_prt.tid=idx.tid and idx.did=1 and idx.lid in ( 1207, 59587) )
NOTICE: QUERY PLAN:
Seq Scan on msg_prt (cost=0.00..333700.88 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..4.79 rows=1 width=4)
total: 3.15 sec; number: 1; for one: 3.153 sec;
interesting that droping index 'idxidx' and creating simple
create index tididx on idx (tid);
behaves better, while plan looks worse. Notice, index on tididx
estimates cost better (16).
select msg_prt.tid as mid from msg_prt where exists (select idx.tid from idx where msg_prt.tid=idx.tid and idx.did=1 and idx.lid in ( 1207, 59587) )
NOTICE: QUERY PLAN:
Seq Scan on msg_prt (cost=0.00..1134474.94 rows=69505 width=4)
SubPlan
-> Index Scan using tididx on idx (cost=0.00..16.31 rows=1 width=4)
total: 1.70 sec; number: 1; for one: 1.703 sec;
Interesting that earlier if I have 2 indexes - idxidx and tididx
optimizer choose tididx, while now (after patching) optimizer
always choose idxidx.
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83