Strange query plan

Started by Oleg Bartunovover 24 years ago7 messages

oleg@sai.msu.su

over 24 years ago

Tom,

I have a problem with slow query execution (postgresql 7.1.2):

bug=#  \d idx
          Table "idx"
 Attribute |  Type   | Modifier
-----------+---------+----------
 tid       | integer |
 lid       | integer |
 did       | integer |
Index: idxidx

bug=#  \d msg_prt
        Table "msg_prt"
 Attribute |  Type   | Modifier
-----------+---------+----------
 tid       | integer |
Index: mprt_tid

Also there are 2 indexes - idxidx, mprt_tid

bug=# \d idxidx
Index "idxidx"
Attribute | Type
-----------+---------
lid | integer
did | integer
tid | integer
unique btree

bug=# \d mprt_tid
Index "mprt_tid"
Attribute | Type
-----------+---------
tid | integer
unique btree

Query is:
select msg_prt.tid as mid from msg_prt
where exists
(select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )

Plan for this query looks very ineffective and query is very slow:

select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..119090807.13 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..1713.40 rows=1 width=4)

total: 6.80 sec; number: 1; for one: 6.796 sec;

Statistics on tables:
idx - 103651 rows
msg_prt - 69505 rows

There are only 16 rows in 'idx' table satisfied subselect condition.
I did vacuum analyze.

Adding another index 'create index tididx on idx (tid);' helps:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..1134474.94 rows=69505 width=4)
SubPlan
-> Index Scan using tididx on idx (cost=0.00..16.31 rows=1 width=4)

total: 1.71 sec; number: 1; for one: 1.711 sec;

but still plan looks ineffective.

The best plan I've got eliminating IN predicate:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid = 1207 and idx.lid=59587 )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..167368.47 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx on idx (cost=0.00..2.39 rows=1 width=4)

total: 0.54 sec; number: 1; for one: 0.541 sec;

Unfortunately I can't use this way in general case.

Does it's a known problem ?

data+schema is available from
http://www.sai.msu.su/~megera/postgres/data/bug.dump.gz
It's about 500Kb !

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Oleg Bartunov (#1)

Re: Strange query plan

Oleg Bartunov <oleg@sai.msu.su> writes:

The best plan I've got eliminating IN predicate:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid = 1207 and idx.lid=59587 )

Surely that returns zero rows?

regards, tom lane

Oleg Bartunov

oleg@sai.msu.su

over 24 years ago

In reply to: Tom Lane (#2)

Re: Strange query plan

On Tue, 5 Jun 2001, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

The best plan I've got eliminating IN predicate:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid = 1207 and idx.lid=59587 )

Surely that returns zero rows?

Oops, sorry :-)
should be
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));

but this is not a big win.

Anyway, what's about original query ?

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Oleg Bartunov (#3)

Re: Strange query plan

Oleg Bartunov <oleg@sai.msu.su> writes:

should be
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
but this is not a big win.

Shouldn't be any win at all: the IN expression-list notation will get
translated to exactly that form.

Anyway, what's about original query ?

IN/EXISTS subqueries suck. This has been true for a long time and is
going to be true for a while longer, unless someone else fixes it before
I have a chance to look at it. See if you can't rewrite your query as
a plain join.

regards, tom lane

Oleg Bartunov

oleg@sai.msu.su

over 24 years ago

In reply to: Tom Lane (#4)

Re: Strange query plan

On Tue, 5 Jun 2001, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

should be
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
but this is not a big win.

Shouldn't be any win at all: the IN expression-list notation will get
translated to exactly that form.

Sure

Anyway, what's about original query ?

IN/EXISTS subqueries suck. This has been true for a long time and is
going to be true for a while longer, unless someone else fixes it before
I have a chance to look at it. See if you can't rewrite your query as
a plain join.

That's why we've moved to GiST and feel fine :-) That query we used in
our old project, now swithing to GiST.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Oleg Bartunov (#1)

Re: Strange query plan

Oleg Bartunov <oleg@sai.msu.su> writes:

select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..119090807.13 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..1713.40 rows=1 width=4)

Actually, this example does reveal an unnecessary inefficiency: the
planner is only using the "idx.lid in (1207,59587)" clause for the
indexscan, ignoring the fact that the did and tid clauses match the
additional columns of your three-column index. The attached patch
should improve matters.

regards, tom lane

*** src/backend/optimizer/path/indxpath.c.orig	Sun May 20 16:28:18 2001
--- src/backend/optimizer/path/indxpath.c	Tue Jun  5 12:38:21 2001
***************
*** 397,403 ****
  										clause, false);
  }

! /*
   * Given an OR subclause that has previously been determined to match
   * the specified index, extract a list of specific opclauses that can be
   * used as indexquals.
--- 397,403 ----
  										clause, false);
  }

! /*----------
   * Given an OR subclause that has previously been determined to match
   * the specified index, extract a list of specific opclauses that can be
   * used as indexquals.
***************
*** 406,415 ****
   * given opclause.	However, if the OR subclause is an AND, we have to
   * scan it to find the opclause(s) that match the index.  (There should
   * be at least one, if match_or_subclause_to_indexkey succeeded, but there
!  * could be more.)	Also, we apply expand_indexqual_conditions() to convert
!  * any special matching opclauses to indexable operators.
   *
   * The passed-in clause is not changed.
   */
  List *
  extract_or_indexqual_conditions(RelOptInfo *rel,
--- 406,430 ----
   * given opclause.	However, if the OR subclause is an AND, we have to
   * scan it to find the opclause(s) that match the index.  (There should
   * be at least one, if match_or_subclause_to_indexkey succeeded, but there
!  * could be more.)
!  *
!  * Also, we can look at other restriction clauses of the rel to discover
!  * additional candidate indexquals: for example, consider
!  *			... where (a = 11 or a = 12) and b = 42;
!  * If we are dealing with an index on (a,b) then we can include the clause
!  * b = 42 in the indexqual list generated for each of the OR subclauses.
!  * Essentially, we are making an index-specific transformation from CNF to
!  * DNF.  (NOTE: when we do this, we end up with a slightly inefficient plan
!  * because create_indexscan_plan is not very bright about figuring out which
!  * restriction clauses are implied by the generated indexqual condition.
!  * Currently we'll end up rechecking both the OR clause and the transferred
!  * restriction clause as qpquals.  FIXME someday.)
!  *
!  * Also, we apply expand_indexqual_conditions() to convert any special
!  * matching opclauses to indexable operators.
   *
   * The passed-in clause is not changed.
+  *----------
   */
  List *
  extract_or_indexqual_conditions(RelOptInfo *rel,
***************
*** 417,470 ****
  								Expr *orsubclause)
  {
  	List	   *quals = NIL;

! if (and_clause((Node *) orsubclause))
{

! /*
! * Extract relevant sub-subclauses in indexkey order. This is
! * just like group_clauses_by_indexkey() except that the input and
! * output are lists of bare clauses, not of RestrictInfo nodes.
! */
! int *indexkeys = index->indexkeys;
! Oid *classes = index->classlist;

! do
{
! int curIndxKey = indexkeys[0];
! Oid curClass = classes[0];
! List *clausegroup = NIL;
! List *item;

! foreach(item, orsubclause->args)
{
if (match_clause_to_indexkey(rel, index,
curIndxKey, curClass,
! lfirst(item), false))
! clausegroup = lappend(clausegroup, lfirst(item));
}

! /*
! * If no clauses match this key, we're done; we don't want to
! * look at keys to its right.
! */
! if (clausegroup == NIL)
! break;
!
! quals = nconc(quals, clausegroup);
!
! indexkeys++;
! classes++;
! } while (!DoneMatchingIndexKeys(indexkeys, index));
!
! if (quals == NIL)
! elog(ERROR, "extract_or_indexqual_conditions: no matching clause");
! }
! else
! {
! /* we assume the caller passed a valid indexable qual */
! quals = makeList1(orsubclause);
! }

  	return expand_indexqual_conditions(quals);
  }
--- 432,503 ----
  								Expr *orsubclause)
  {
  	List	   *quals = NIL;
+ 	int		   *indexkeys = index->indexkeys;
+ 	Oid		   *classes = index->classlist;

! /*
! * Extract relevant indexclauses in indexkey order. This is essentially
! * just like group_clauses_by_indexkey() except that the input and
! * output are lists of bare clauses, not of RestrictInfo nodes.
! */
! do
{
+ int curIndxKey = indexkeys[0];
+ Oid curClass = classes[0];
+ List *clausegroup = NIL;
+ List *item;

! if (and_clause((Node *) orsubclause))
! {
! foreach(item, orsubclause->args)
! {
! Expr *subsubclause = (Expr *) lfirst(item);

! if (match_clause_to_indexkey(rel, index,
! curIndxKey, curClass,
! subsubclause, false))
! clausegroup = lappend(clausegroup, subsubclause);
! }
! }
! else if (match_clause_to_indexkey(rel, index,
! curIndxKey, curClass,
! orsubclause, false))
{
! clausegroup = makeList1(orsubclause);
! }

! 		/*
! 		 * If we found no clauses for this indexkey in the OR subclause
! 		 * itself, try looking in the rel's top-level restriction list.
! 		 */
! 		if (clausegroup == NIL)
! 		{
! 			foreach(item, rel->baserestrictinfo)
  			{
+ 				RestrictInfo *rinfo = (RestrictInfo *) lfirst(item);
+ 
  				if (match_clause_to_indexkey(rel, index,
  											 curIndxKey, curClass,
! 											 rinfo->clause, false))
! 					clausegroup = lappend(clausegroup, rinfo->clause);
  			}
+ 		}

! /*
! * If still no clauses match this key, we're done; we don't want to
! * look at keys to its right.
! */
! if (clausegroup == NIL)
! break;
!
! quals = nconc(quals, clausegroup);
!
! indexkeys++;
! classes++;
! } while (!DoneMatchingIndexKeys(indexkeys, index));
!
! if (quals == NIL)
! elog(ERROR, "extract_or_indexqual_conditions: no matching clause");

return expand_indexqual_conditions(quals);
}

Oleg Bartunov

oleg@sai.msu.su

over 24 years ago

In reply to: Tom Lane (#6)

Re: Strange query plan

On Tue, 5 Jun 2001, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid
and idx.did=1 and idx.lid in (1207,59587) )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..119090807.13 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..1713.40 rows=1 width=4)

Actually, this example does reveal an unnecessary inefficiency: the
planner is only using the "idx.lid in (1207,59587)" clause for the
indexscan, ignoring the fact that the did and tid clauses match the
additional columns of your three-column index. The attached patch
should improve matters.

regards, tom lane

Cool. Looks better
select msg_prt.tid as mid from msg_prt where exists (select idx.tid from idx where msg_prt.tid=idx.tid and idx.did=1 and idx.lid in ( 1207, 59587) )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..333700.88 rows=69505 width=4)
SubPlan
-> Index Scan using idxidx, idxidx on idx (cost=0.00..4.79 rows=1 width=4)

total: 3.15 sec; number: 1; for one: 3.153 sec;

interesting that droping index 'idxidx' and creating simple
create index tididx on idx (tid);
behaves better, while plan looks worse. Notice, index on tididx
estimates cost better (16).

select msg_prt.tid as mid from msg_prt where exists (select idx.tid from idx where msg_prt.tid=idx.tid and idx.did=1 and idx.lid in ( 1207, 59587) )
NOTICE: QUERY PLAN:

Seq Scan on msg_prt (cost=0.00..1134474.94 rows=69505 width=4)
SubPlan
-> Index Scan using tididx on idx (cost=0.00..16.31 rows=1 width=4)

total: 1.70 sec; number: 1; for one: 1.703 sec;

Interesting that earlier if I have 2 indexes - idxidx and tididx
optimizer choose tididx, while now (after patching) optimizer
always choose idxidx.