Backwards index scan

Started by Dmitry Tkachover 22 years ago10 messages
#1Dmitry Tkach
dmitry@openratings.com

I am not sure if this is really a bug, but it certainly looks like one
to me...

I have a table that looks something like this:

create table huge_table
(
int x,
int y
);
create index huge_table_idx on huge_table (x,y);

It contains about 80 million rows...
I am trying to get those rows that have a particular value for x,
ordered by y in descending order (and I am looking to get just a few
first ones), so I am running a query like:

declare mycursor cursor for select * from huge_table where x=10 order by
x desc, y desc;
fetch 10 from mycursor;

this query takes 10 to 20 minutes!

This is because there are *lots* (a few million) of matches for x=10,
and _bt_first () scans through them *all* sequentually to get to the
last one.

Now, if I change the query to look like:

declare mycursor cursor for select * from huge_table where x > 9 and x <
11 order by x desc, y desc;
(which is the same thing)
then fetch 10 from mycursor; returns right away (under a second), just
as I expected.

I understand that with the generic approach to operators in postgres it
is, probably, not very feasible to try and teach _bt_first () to handle
this situation automatically (it would need to know how to get
next/previous value for every indexable type)... I guess, that could be
done by adding another kind of strategy to pg_amop for example...
Another way to work around this would be to allow ordering spec to be a
part of CREATE INDEX (I know, that informix does that for example) - so
that, I could do
create index huge_table_idx on huge_table (x, y desc);
... and then select * from huge_table where x=10 order x, y desc;
would not require a backwards scan to begin with.

Can something like this be done? What do you think?

Thanks!

Dima

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Dmitry Tkach (#1)
Re: Backwards index scan

On Mon, 7 Jul 2003, Dmitry Tkach wrote:

I understand that with the generic approach to operators in postgres it
is, probably, not very feasible to try and teach _bt_first () to handle
this situation automatically (it would need to know how to get
next/previous value for every indexable type)... I guess, that could be
done by adding another kind of strategy to pg_amop for example...
Another way to work around this would be to allow ordering spec to be a
part of CREATE INDEX (I know, that informix does that for example) - so
that, I could do
create index huge_table_idx on huge_table (x, y desc);
... and then select * from huge_table where x=10 order x, y desc;
would not require a backwards scan to begin with.

Can something like this be done? What do you think?

If you make an opclass that orders in the reverse order you can use that
opclass in creating the index (which effectively can give you an index
like x, y desc by using the new opclass on y). There was some talk
recently about whether we should provide such opclasses as builtins or
contrib items.

#3Dmitry Tkach
dmitry@openratings.com
In reply to: Stephan Szabo (#2)
Re: Backwards index scan

If you make an opclass that orders in the reverse order you can use that
opclass in creating the index (which effectively can give you an index
like x, y desc by using the new opclass on y). There was some talk
recently about whether we should provide such opclasses as builtins or
contrib items.

Ah! Nice :-)
I did not think about it...

Thanks a lot for the hit!

Dima

#4Dmitry Tkach
dmitry@openratings.com
In reply to: Stephan Szabo (#2)
Re: Backwards index scan

Stephan Szabo wrote:

If you make an opclass that orders in the reverse order you can use that
opclass in creating the index (which effectively can give you an index
like x, y desc by using the new opclass on y). There was some talk
recently about whether we should provide such opclasses as builtins or
contrib items.

Actually, I just thought, it is not exactly equivalent, unless I am
missing something.
If I create this opclass and the index, and then make a query like
select * from huge_table where x=10 order by x,y desc

... it won't know to use the index for sorting, will it?
My understanding is that I'd have to get rid of the sort clause
completely, and just rely on the query plan, right?

In this situation, it will work... But it may be a problem when the
query is (a lot) more complicated, with several joins and a bunch of
different paths available to the planner - how can I guarantee then that
it will always choose this index and return the results in the right order?

Currently I just always use the sort clause, and that forces it to pick
the right index even if another path looks a little less expensive, but
with this custom opclass, I believe, having the sort clause will always
cause it to actually sort even if it does use the right index...

Or am I missing something here?

Thanks!

Dima

#5Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Dmitry Tkach (#4)
Re: Backwards index scan

On Tue, 8 Jul 2003, Dmitry Tkach wrote:

Stephan Szabo wrote:

If you make an opclass that orders in the reverse order you can use that
opclass in creating the index (which effectively can give you an index
like x, y desc by using the new opclass on y). There was some talk
recently about whether we should provide such opclasses as builtins or
contrib items.

Actually, I just thought, it is not exactly equivalent, unless I am
missing something.
If I create this opclass and the index, and then make a query like
select * from huge_table where x=10 order by x,y desc

... it won't know to use the index for sorting, will it?

I don't know the mechanics (haven't looked) but it seems to know
based on the way the operators are assigned to the opclass. I've
done some minimal tests and for queries like
select * from tab order by a, b desc
and then gotten effectively
Index scan using <index> on <table>
as the plan with no sort steps.

In this situation, it will work... But it may be a problem when the
query is (a lot) more complicated, with several joins and a bunch of
different paths available to the planner - how can I guarantee then that
it will always choose this index and return the results in the right order?

You can't guarantee that it'll always choose this index because it's
possible that the index is more expensive for a particular query, but
it should consider the index.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Tkach (#1)
Re: [GENERAL] Backwards index scan

[ reply redirected to a more appropriate list ]

Dmitry Tkach <dmitry@openratings.com> writes:

I am not sure if this is really a bug, but it certainly looks like one
to me...

It's not a bug, but I agree that _bt_first can be inefficient if there
are lots of matching keys.

This is because there are *lots* (a few million) of matches for x=10,
and _bt_first () scans through them *all* sequentually to get to the
last one.
I understand that with the generic approach to operators in postgres it
is, probably, not very feasible to try and teach _bt_first () to handle
this situation automatically (it would need to know how to get
next/previous value for every indexable type)...

I think what we'd want is variant versions of _bt_search and _bt_binsrch
that locate the first entry greater than the specified target key,
rather than the first entry greater than or equal to it. Given such
positioning, all the _bt_first cases that involve stepping over more
than one entry could be improved to require no more than one step.

Not sure whether it'd be better to make clone versions of these
functions, or to add a parameter to tell them which behavior is wanted.

regards, tom lane

#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#6)
Re: [GENERAL] Backwards index scan

Is this a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:

[ reply redirected to a more appropriate list ]

Dmitry Tkach <dmitry@openratings.com> writes:

I am not sure if this is really a bug, but it certainly looks like one
to me...

It's not a bug, but I agree that _bt_first can be inefficient if there
are lots of matching keys.

This is because there are *lots* (a few million) of matches for x=10,
and _bt_first () scans through them *all* sequentually to get to the
last one.
I understand that with the generic approach to operators in postgres it
is, probably, not very feasible to try and teach _bt_first () to handle
this situation automatically (it would need to know how to get
next/previous value for every indexable type)...

I think what we'd want is variant versions of _bt_search and _bt_binsrch
that locate the first entry greater than the specified target key,
rather than the first entry greater than or equal to it. Given such
positioning, all the _bt_first cases that involve stepping over more
than one entry could be improved to require no more than one step.

Not sure whether it'd be better to make clone versions of these
functions, or to add a parameter to tell them which behavior is wanted.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: [GENERAL] Backwards index scan

Tom Lane <tgl@sss.pgh.pa.us> writes:

Dmitry Tkach <dmitry@openratings.com> writes:

This is because there are *lots* (a few million) of matches for x=10,
and _bt_first () scans through them *all* sequentually to get to the
last one.

It's not a bug, but I agree that _bt_first can be inefficient if there
are lots of matching keys.

I think what we'd want is variant versions of _bt_search and _bt_binsrch
that locate the first entry greater than the specified target key,
rather than the first entry greater than or equal to it. Given such
positioning, all the _bt_first cases that involve stepping over more
than one entry could be improved to require no more than one step.

I have committed a fix into 7.5devel to do this properly. I think this
is the last case wherein btree is unnecessarily inefficient for large
numbers of equal keys.

regards, tom lane

#9Gaetano Mendola
mendola@bigfoot.com
In reply to: Tom Lane (#8)
Re: [GENERAL] Backwards index scan

Tom Lane wrote:

I have committed a fix into 7.5devel to do this properly. I think this
is the last case wherein btree is unnecessarily inefficient for large
numbers of equal keys.

Any chance to have it on 7.4.1 ?

Regards
Gaetano Mendola

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gaetano Mendola (#9)
Re: [GENERAL] Backwards index scan

Gaetano Mendola <mendola@bigfoot.com> writes:

Tom Lane wrote:

I have committed a fix into 7.5devel to do this properly. I think this
is the last case wherein btree is unnecessarily inefficient for large
numbers of equal keys.

Any chance to have it on 7.4.1 ?

No. It's inadequately tested to go into a stable release, and anyway
I'm not sure what the interactions are with previous 7.5-only changes
for cross-datatype indexing.

regards, tom lane