suggestion to improve planer

Started by Ľubomír Vargaalmost 17 years ago3 messageshackers

luvar@plaintext.sk

almost 17 years ago

Hi.

I hope, that this is right mailing list.

SELECT date, value FROM t_event
WHERE t_event.id in (SELECT id FROM t_event
WHERE date < '2009-08-25'
ORDER BY date DESC LIMIT 1)
ORDER BY date;
cost 6.4

SELECT date, value FROM t_event
WHERE t_event.id = (SELECT id FROM t_event
WHERE date < '2009-08-25'
ORDER BY date DESC LIMIT 1)
ORDER BY date;
cost 6.36..6.37

Why that two query dont have equal cost? If it is not problem, try add some
planer code to recognize that sublesect HAVE TO return just one row (limit 1)
and in plan could be used filter/index scan instead of hash aggregate. I have
also some complex query examples where cost difference is more visible.

Have a nice day.
--
Odborník na všetko je zlý odborník. Ja sa snažím byť výnimkou potvrdzujúcou
pravidlo.

Peter Eisentraut

peter_e@gmx.net

almost 17 years ago

In reply to: Ľubomír Varga (#1)

Re: suggestion to improve planer

On Thu, 2009-09-03 at 10:35 +0200, Ľubomír Varga wrote:

Hi.

I hope, that this is right mailing list.

SELECT date, value FROM t_event
WHERE t_event.id in (SELECT id FROM t_event
WHERE date < '2009-08-25'
ORDER BY date DESC LIMIT 1)
ORDER BY date;
cost 6.4

SELECT date, value FROM t_event
WHERE t_event.id = (SELECT id FROM t_event
WHERE date < '2009-08-25'
ORDER BY date DESC LIMIT 1)
ORDER BY date;
cost 6.36..6.37

Why that two query dont have equal cost? If it is not problem, try add some
planer code to recognize that sublesect HAVE TO return just one row (limit 1)
and in plan could be used filter/index scan instead of hash aggregate.

Well, there is always a tradeoff between more planner analysis and more
complicated and slow planning. Seeing that the cost estimates are close
enough for practical purposes, it doesn't seem worthwhile to fix
anything here.

I have
also some complex query examples where cost difference is more visible.

Having real examples where a change might actually improve runtime is
always more interesting than an academic exercise like the above.

Ľubomír Varga

luvar@plaintext.sk

almost 17 years ago

In reply to: Peter Eisentraut (#2)

Re: suggestion to improve planer

On Wednesday 09 September 2009 14:11:41 Peter Eisentraut wrote:

On Thu, 2009-09-03 at 10:35 +0200, Ľubomír Varga wrote:

Hi.

I hope, that this is right mailing list.

SELECT date, value FROM t_event
WHERE t_event.id in (SELECT id FROM t_event
WHERE date < '2009-08-25'
ORDER BY date DESC LIMIT 1)
ORDER BY date;
cost 6.4

SELECT date, value FROM t_event
WHERE t_event.id = (SELECT id FROM t_event
WHERE date < '2009-08-25'
ORDER BY date DESC LIMIT 1)
ORDER BY date;
cost 6.36..6.37

Why that two query dont have equal cost? If it is not problem, try add
some planer code to recognize that sublesect HAVE TO return just one row
(limit 1) and in plan could be used filter/index scan instead of hash
aggregate.

Well, there is always a tradeoff between more planner analysis and more
complicated and slow planning. Seeing that the cost estimates are close
enough for practical purposes, it doesn't seem worthwhile to fix
anything here.

I have
also some complex query examples where cost difference is more visible.

Having real examples where a change might actually improve runtime is
always more interesting than an academic exercise like the above.

Oka, real world example is attached as txt file. There are 3x2 queries and its
costs on my system/database. System/database is also real, and production.
First set for short time interval and second for long time interval.

Main poblem is, that I want to generate some statistical report and I dont
know for how many "devices" Iam going to generate it for. So if I make some
program, there will be something like:
..."t_device.imei in (" + this.getDevicesImeis() + ")"...
If i have only one device, there could be "=" instead of "in".

PS: Iam very confused about actual costs. I dont have exact query on which I
have seen that "in" for one device in array, is much worse than "=" for one
device. Actual costs are somehow different and "=" select gives about ten
time worse cost.

So probably just ignore my mail and keep planner as is.
My version of postgre: "PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by
GCC x86_64-pc-linux-gnu-gcc (GCC) 4.1.1 (Gentoo 4.1.1-r3)"

--
Odborník na všetko je zlý odborník. Ja sa snažím byť výnimkou potvrdzujúcou
pravidlo.

suggestion to improve planer

Attachments: