An optimisation question

Started by Constantin Teodorescuover 26 years ago3 messages

Constantin Teodorescu

teo@flex.ro

over 26 years ago

Hello,

I have a very big table (valori) with the following columns:
- data datetime
- debitor float8
- creditor float8

It has a btree index on data (non unique).

The following select is using the index:

select * from valori where data > '25-10-1999'

NOTICE: QUERY PLAN:
Index Scan using valori_data on valori (cost=1550.17 rows=24324
width=8)

But this one:

select data from valori order by desc limit 1
NOTICE: QUERY PLAN:

Sort (cost=3216.01 rows=72970 width=8)
-> Seq Scan on valori (cost=3216.01 rows=72970 width=8)

I thought that if the 'order by' implies an column which have a btree
index, the sort would not be actually executed and the index will be
used instead. But it seems that it won't.

Then, the question is : How should I retrieve extremely fast the first
'data' greater than a given value from that table.

Also, the following query :

select max(data) from valori where data<'2-3-1999'

is not using optimally the index, it just limit the records for the
aggregate function instead of picking the first value from the left of
the index tree lower than '2-3-1999'.

Waiting for some ideas,
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Constantin Teodorescu (#1)

Re: [HACKERS] An optimisation question

Constantin Teodorescu <teo@flex.ro> writes:

select data from valori order by desc limit 1
NOTICE: QUERY PLAN:
Sort (cost=3216.01 rows=72970 width=8)

-> Seq Scan on valori (cost=3216.01 rows=72970 width=8)

I thought that if the 'order by' implies an column which have a btree
index, the sort would not be actually executed and the index will be
used instead. But it seems that it won't.

That's fixed for 6.6. A workaround that partially solves the problem
for 6.5 is to add a dummy WHERE clause referencing the ORDER-BY item:
select data from valori where data > '1/1/1800'
order by data limit 1;
The WHERE is needed to get the 6.5 optimizer to consider the index
at all. In a quick test it seems this works for normal order but not
DESC order... you could try applying the backwards-index patch that
someone (Hiroshi or Tatsuo, I think) posted recently.

Also, the following query :
select max(data) from valori where data<'2-3-1999'
is not using optimally the index, it just limit the records for the
aggregate function instead of picking the first value from the left of
the index tree lower than '2-3-1999'.

There's no prospect of that happening anytime soon, I fear; there is no
connection between aggregate functions and indexes in the system, and
no easy way of making one. This workaround works in current sources:

explain select data from valori where data<'2-3-1999'
order by data desc limit 1;
NOTICE: QUERY PLAN:

Index Scan Backward using valori_i on valori (cost=21.67 rows=334 width=8)

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon30Aug1999065018+000037CA29AA.7C69D9D7@flex.ro | Resolved by subject fallback

Constantin Teodorescu

teo@flex.ro

over 26 years ago

In reply to: Tom Lane (#2)

Re: [HACKERS] An optimisation question

Tom Lane wrote:

That's fixed for 6.6. A workaround that partially solves the problem
for 6.5 is to add a dummy WHERE clause referencing the ORDER-BY item:
select data from valori where data > '1/1/1800'
order by data limit 1;
The WHERE is needed to get the 6.5 optimizer to consider the index
at all. In a quick test it seems this works for normal order but not
DESC order... you could try applying the backwards-index patch that
someone (Hiroshi or Tatsuo, I think) posted recently.

Yeap , I will search for it.

There's no prospect of that happening anytime soon, I fear; there is no
connection between aggregate functions and indexes in the system, and
no easy way of making one.

Understand that, but.
Selects that deal ONLY with columns included in an index should operate
exclusively on that index and return the results. Example : select
sum(price) , price*1.2, max(price) from products , assuming that price
is included in an index it would be less cost to scan the index rather
than the whole table.

I remember that Paradox tables had indexes and the index was also a
Paradox table or some sort of that. Internally it's possible that a
number of procedures related to tables could be applied to indexes. So,
a sum(price) from a_table could be easily switched to be done on any
index that contain the price field.

What do you think?

Constantin Teodorescu
FLEX Consulting BRaila, ROMANIA