Order by optimisations?

Started by Christopher Kings-Lynneover 20 years ago12 messages
#1Christopher Kings-Lynne
chriskl@familyhealth.com.au

Hi,

Does PostgreSQL do the following optimisation:

SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;

or in fact even better (for my situation)

SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01'
ORDER BY date;

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Chris

#2Hannu Krosing
hannu@skype.net
In reply to: Christopher Kings-Lynne (#1)
Re: Order by optimisations?

On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:

Hi,

Does PostgreSQL do the following optimisation:

SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;

or in fact even better (for my situation)

SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01'
ORDER BY date;

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Yes

try EXPLAIN ;)

--
Hannu Krosing <hannu@skype.net>

#3Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Hannu Krosing (#2)
Re: Order by optimisations?

Hannu Krosing wrote:

On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:

Hi,

Does PostgreSQL do the following optimisation:

SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;

or in fact even better (for my situation)

SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01'
ORDER BY date;

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Yes

try EXPLAIN ;)

Doesn't seem like it does:

usatest=# explain select * from users_myfoods_map where
date='2004-11-21' order by date;
QUERY PLAN
---------------------------------------------------------------------------
Sort (cost=17.17..17.48 rows=123 width=22)
Sort Key: date
-> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22)
Filter: (date = '2004-11-21'::date)
(4 rows)

The sort cost is non-zero. Or am I not looking at the right thing...

Chris

#4Michael Paesold
mpaesold@gmx.at
In reply to: Christopher Kings-Lynne (#1)
Re: Order by optimisations?

Christopher Kings-Lynne wrote:

Doesn't seem like it does:

usatest=# explain select * from users_myfoods_map where date='2004-11-21'
order by date;
QUERY PLAN
---------------------------------------------------------------------------
Sort (cost=17.17..17.48 rows=123 width=22)
Sort Key: date
-> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22)
Filter: (date = '2004-11-21'::date)
(4 rows)

The sort cost is non-zero. Or am I not looking at the right thing...

You are looking at the right thing, AFAIK. Well, it seems the planner cannot
reason that if a field should have only one value, sorting on that field is
not needed.

I remember there are examples where the planner will know that the input to
a sort is already sorted and will skip the sort. Tom will be able to explain
if this here is a reasonable optimization. I *guess* it could be done, with
some restrictions.

Best Regards,
Michael Paesold

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Christopher Kings-Lynne (#3)
Re: Order by optimisations?

Christopher Kings-Lynne said:

usatest=# explain select * from users_myfoods_map where
date='2004-11-21' order by date;

I assume that this is program generated SQL, as I hope a human would know
better than to write this. In which case, isn't the answer to improve the
generator rather than expect postgres to make up for its defficiencies?

cheers

andrew

#6Jochem van Dieten
jochemd@gmail.com
In reply to: Michael Paesold (#4)
Re: Order by optimisations?

On 7/14/05, Michael Paesold wrote:

Christopher Kings-Lynne wrote:

usatest=# explain select * from users_myfoods_map where date='2004-11-21'
order by date;
QUERY PLAN
---------------------------------------------------------------------------
Sort (cost=17.17..17.48 rows=123 width=22)
Sort Key: date
-> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22)
Filter: (date = '2004-11-21'::date)
(4 rows)

The sort cost is non-zero. Or am I not looking at the right thing...

You are looking at the right thing, AFAIK. Well, it seems the planner cannot
reason that if a field should have only one value, sorting on that field is
not needed.

For the planner to deduct that, it should first deduct that the field
should only have one value. Is that a deduction the planner can even
make for this query if we consider for instance implicit timestamp to
date casting?

I remember there are examples where the planner will know that the input to
a sort is already sorted and will skip the sort.

The planner knows the output of an indexscan is sorted. With a proper
index on the "date" field (I hope that is not really the name) and
favourable statistics the planner should switch to an indexscan and
the order node should disappear.

Jochem

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#2)
Re: Order by optimisations?

Hannu Krosing <hannu@skype.net> writes:

On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Yes

No, but in most cases this will use an index and hence will assume that
the index is responsible for ordering.

regards, tom lane

#8Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Andrew Dunstan (#5)
Re: Order by optimisations?

I assume that this is program generated SQL, as I hope a human would know
better than to write this. In which case, isn't the answer to improve the
generator rather than expect postgres to make up for its defficiencies?

Well, the issue in my case is we have user food diaries. Usually,
99.9999% of the time we pull up a single date of their diary. However,
for printing purposes we need a range.

It's a large query, so it's implemented as a simple PL/PSQL stored
procedure.

I was trying to avoid having to c&p the entire stored proc to make a
'range version'.

If PostgreSQL was smart enough to deal with a range of 1 day and a sort
on it efficiently, I'd just use the range stored proc exclusively....

Chris

#9Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#7)
Re: Order by optimisations?
Show quoted text

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Yes

No, but in most cases this will use an index and hence will assume that
the index is responsible for ordering.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#10Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#7)
Re: Order by optimisations?

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Yes

No, but in most cases this will use an index and hence will assume that
the index is responsible for ordering.

OK, so what's going on here?

usa=> explain select * from users_myfoods_map where user_id=1 and
date='2003-11-03' order by date;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Sort (cost=4.84..4.85 rows=2 width=22)
Sort Key: date
-> Index Scan using users_myfoods_map_user_id_date_key on
users_myfoods_map (cost=0.00..4.83 rows=2 width=22)
Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date))
(4 rows)

(That's on our enormous live table)

Chris

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#10)
Re: Order by optimisations?

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

OK, so what's going on here?

usa=> explain select * from users_myfoods_map where user_id=1 and
date='2003-11-03' order by date;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Sort (cost=4.84..4.85 rows=2 width=22)
Sort Key: date
-> Index Scan using users_myfoods_map_user_id_date_key on
users_myfoods_map (cost=0.00..4.83 rows=2 width=22)
Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date))
(4 rows)

Well, date evidently isn't the high-order key of this index. But why
exactly are you worried about a sort of 2 rows?

regards, tom lane

#12Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#11)
Re: Order by optimisations?

Well, date evidently isn't the high-order key of this index. But why
exactly are you worried about a sort of 2 rows?

Aha that's nailed it:

usa=> explain select * from users_myfoods_map where user_id=1 and date
between '2003-11-03' and '2003-11-03' order by user_id, date;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------
Index Scan using users_myfoods_map_user_id_date_key on
users_myfoods_map (cost=0.00..3.78 rows=1 width=22)
Index Cond: ((user_id = 1) AND (date >= '2003-11-03'::date) AND
(date <= '2003-11-03'::date))
(2 rows)

I don't care about this particular result. But imagine it running
thousands of times a minute, with result sets between 0 and 50 rows...

Chris