how to avoid repeating expensive computation in select

Started by Bob Priceabout 15 years ago10 messagesgeneral
Jump to latest
#1Bob Price
rjp_email@yahoo.com

I have been searching through the docs and mailing list and haven't found a way to do this, so I thought I would ask the community.

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the result is needed both as a returned value and as an expression in the WHERE clause.

As a simple example, consider the following query on a table with 'id' and 'value' columns, and an expensive computation represented as a function:

SELECT id, expensivefunc(value) AS score FROM mytable
WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;

It would be great if I could find a way to only compute expensivefunc(value) at most once per row, and not at all if the other WHERE constraints are not satisfied.

For this simple case I know that I could rewrite the SELECT as something like the following:

WITH other_where AS (
SELECT id, value FROM mytable WHERE id LIKE '%z%'
), calc_scores AS (
SELECT id, expensivefunc(value) AS score FROM other_where
)
SELECT id, score from calc_scores WHERE score > 0.5;

This works in this simple case, but my guess is that it probably adds a lot of overhead (is this true?), and I also have to deal with much more complicated scenarios with multiple expensive calculations that may not fit into this kind of rewrite.

Does anyone know of a simpler way to accomplish this?

For example, it would be great if there were a function that could reference the Nth select list item so it is only computed once, like:

SELECT id, expensivefunc(value) AS score FROM mytable
WHERE id LIKE '%z%' AND sel_list_item(2) > 0.5;

or if there were temporary variables in the WHERE expressions like:

SELECT id, tmp1 AS score FROM mytable
WHERE id LIKE '%z%' AND (tmp1 = expensivefunc(value)) > 0.5;

Any ideas anyone!

Thanks in advance!
Bob

#2Bill Moran
wmoran@potentialtech.com
In reply to: Bob Price (#1)
Re: how to avoid repeating expensive computation in select

In response to Bob Price <rjp_email@yahoo.com>:

I have been searching through the docs and mailing list and haven't found a way to do this, so I thought I would ask the community.

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the result is needed both as a returned value and as an expression in the WHERE clause.

As a simple example, consider the following query on a table with 'id' and 'value' columns, and an expensive computation represented as a function:

SELECT id, expensivefunc(value) AS score FROM mytable
WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;

It would be great if I could find a way to only compute expensivefunc(value) at most once per row, and not at all if the other WHERE constraints are not satisfied.

Two ways that I can think of:
1) If expensivefunc() doesn't have any side-effects, you can create it
as IMMUTABLE, which tells PostgreSQL that it can cache the result
for optimization purposes. IMMUTABLE is not the default.
2) Create a new column in the table that stores the value of
expensivefunc(value) and add a trigger to the table to ensure that
column is updated any time value is changed. This will slow down
inserts and updates a bit, but it means you can select/compare the
generated column directly with no calculation.

Which one of these is more practical for you depends on a number of
factors about the table, the data, and the function.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Bob Price (#1)
Re: how to avoid repeating expensive computation in select

Is using a pl/pgsql function a viable option. Within or without the use of
a function you can create a temporary table to hold the needed intermediate
results. You can even use a permanent working table and write functions to
perform the needed queries against it.

Especially for expensive calculation you want to consider whether it is
safe/reasonable to pre-calculate and store values instead of running the
calculation during each query.

If you need procedural language capabilities (variables, multiple uses of
the same data) trying to work out a solution in pure transactional SQL can
be difficult or outright impossible; you really need to use the procedural
facilities built into the server OR your application environment. In other
words put down the hammer and go find yourself a chainsaw :)

David J

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bob Price
Sent: Thursday, February 03, 2011 12:18 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] how to avoid repeating expensive computation in select

I have been searching through the docs and mailing list and haven't found a
way to do this, so I thought I would ask the community.

I would like to know if there is a way in PostgreSQL to avoid repeating an
expensive computation in a SELECT where the result is needed both as a
returned value and as an expression in the WHERE clause.

As a simple example, consider the following query on a table with 'id' and
'value' columns, and an expensive computation represented as a function:

SELECT id, expensivefunc(value) AS score FROM mytable
WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;

It would be great if I could find a way to only compute expensivefunc(value)
at most once per row, and not at all if the other WHERE constraints are not
satisfied.

For this simple case I know that I could rewrite the SELECT as something
like the following:

WITH other_where AS (
SELECT id, value FROM mytable WHERE id LIKE '%z%'
), calc_scores AS (
SELECT id, expensivefunc(value) AS score FROM other_where
)
SELECT id, score from calc_scores WHERE score > 0.5;

This works in this simple case, but my guess is that it probably adds a lot
of overhead (is this true?), and I also have to deal with much more
complicated scenarios with multiple expensive calculations that may not fit
into this kind of rewrite.

Does anyone know of a simpler way to accomplish this?

For example, it would be great if there were a function that could reference
the Nth select list item so it is only computed once, like:

SELECT id, expensivefunc(value) AS score FROM mytable
WHERE id LIKE '%z%' AND sel_list_item(2) > 0.5;

or if there were temporary variables in the WHERE expressions like:

SELECT id, tmp1 AS score FROM mytable
WHERE id LIKE '%z%' AND (tmp1 = expensivefunc(value)) > 0.5;

Any ideas anyone!

Thanks in advance!
Bob

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bob Price (#1)
Re: how to avoid repeating expensive computation in select

Bob Price <rjp_email@yahoo.com> writes:

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the result is needed both as a returned value and as an expression in the WHERE clause.

Use a subselect. You might need OFFSET 0 to prevent the planner from
"flattening" the subselect, eg

SELECT whatever FROM
(SELECT *, expensivefunc(value) AS score FROM mytable OFFSET 0) ss
WHERE id LIKE '%z%' AND score > 0.5;

Keep in mind that in the above formulation, expensivefunc will be
evaluated at rows that don't pass the LIKE test. So you probably want
to push down as much as you can into the sub-select's WHERE clause.
The planner will not help you with that if you put in the OFFSET 0
optimization-fence. It's a good idea to use EXPLAIN (or even better
EXPLAIN VERBOSE, if you're using >= 8.4) to confirm that you're getting
the plan you want.

regards, tom lane

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#4)
Re: how to avoid repeating expensive computation in select

Hello

2011/2/3 Tom Lane <tgl@sss.pgh.pa.us>:

Bob Price <rjp_email@yahoo.com> writes:

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the result is needed both as a returned value and as an expression in the WHERE clause.

Use a subselect.  You might need OFFSET 0 to prevent the planner from
"flattening" the subselect, eg

   SELECT whatever FROM
     (SELECT *, expensivefunc(value) AS score FROM mytable OFFSET 0) ss
   WHERE id LIKE '%z%' AND score > 0.5;

Keep in mind that in the above formulation, expensivefunc will be
evaluated at rows that don't pass the LIKE test.  So you probably want
to push down as much as you can into the sub-select's WHERE clause.
The planner will not help you with that if you put in the OFFSET 0
optimization-fence.  It's a good idea to use EXPLAIN (or even better
EXPLAIN VERBOSE, if you're using >= 8.4) to confirm that you're getting
the plan you want.

What about to increase a COST value? Can it help?

Regards

Pavel

Show quoted text

                       regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Bob Price
rjp_email@yahoo.com
In reply to: Bill Moran (#2)
Re: how to avoid repeating expensive computation in select

Thanks for all of the responses.

If the IMMUTABLE flag on a function does what the docs say then it might satisfy my needs.

Here is a more specific example of what I need to do, with a new custom data type and a new function:

- define new complex data type X
- create table mytable ( id varchar, value X )
- create function expensivefunc(X,X) which is implemented in C
- select id, expensivefunc(value, 'constantdata...'::X) as score
from mytable where expensivefunc(value, 'constantdata...'::X) > 0.5;

If I set the COST of expensivefunc high, and label it IMMUTABLE, will the query executor note that the two invocations to expensivefunc have the same inputs so it can only call it once and re-use the result the second time?

I imagine that it might be a problem to pass 'constantdata...'::X to both invocations. I guess that I could create a one-time use function that declared a variable with this 'constantdata...'::X value, and then pass this variable in both calls.

Would this work?

Thanks again!
Bob

--- On Thu, 2/3/11, Bill Moran <wmoran@potentialtech.com> wrote:
Show quoted text

From: Bill Moran <wmoran@potentialtech.com>
Subject: Re: [GENERAL] how to avoid repeating expensive computation in select
To: "Bob Price" <rjp_email@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Thursday, February 3, 2011, 12:23 PM
In response to Bob Price <rjp_email@yahoo.com>:

I have been searching through the docs and mailing

list and haven't found a way to do this, so I thought I
would ask the community.

I would like to know if there is a way in PostgreSQL

to avoid repeating an expensive computation in a SELECT
where the result is needed both as a returned value and as
an expression in the WHERE clause.

As a simple example, consider the following query on a

table with 'id' and 'value' columns, and an expensive
computation represented as a function:

SELECT id, expensivefunc(value) AS

score FROM mytable

WHERE id LIKE '%z%' AND

expensivefunc(value) > 0.5;

It would be great if I could find a way to only

compute expensivefunc(value) at most once per row, and not
at all if the other WHERE constraints are not satisfied.

Two ways that I can think of:
1) If expensivefunc() doesn't have any side-effects, you
can create it
as IMMUTABLE, which tells PostgreSQL that
it can cache the result
for optimization purposes.
IMMUTABLE is not the default.
2) Create a new column in the table that stores the value
of
expensivefunc(value) and add a trigger to
the table to ensure that
column is updated any time value is
changed. This will slow down
inserts and updates a bit, but it means
you can select/compare the
generated column directly with no
calculation.

Which one of these is more practical for you depends on a
number of
factors about the table, the data, and the function.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

#7Nicklas Avén
nicklas.aven@jordogskog.no
In reply to: Tom Lane (#4)
Re: how to avoid repeating expensive computation in select

That's interesting Tom.

This is a discussion coming up sometimes at PostGIS lists since PostGIS
often handles "expensive" calculations.

Regina wrote a blog post about it:
http://postgresonline.com/journal/archives/113-How-to-force-PostgreSQL-to-use-a-pre-calculated-value.html

I thought the "offset 0" trick was just a dirty hack, but coming from
you, Tom, I assume it is a robust way of doing it.

I also tried some of the queries we tried then, almost 2 years ago, and
I think it seems like PostgreSQL handles this much better in 9.0. Is
that possible?
What was strange then was that PostGIS functions marked immutable also
was recalculated between Select and where-clause and also if used
multiple times in the Select part.

But I think (from very few tests) that the result of the function was
reused in a better way now.

Can that be the case?

Thanks

Nicklas Avén

Show quoted text

On Thu, 2011-02-03 at 13:16 -0500, Tom Lane wrote:

Bob Price <rjp_email@yahoo.com> writes:

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the result is needed both as a returned value and as an expression in the WHERE clause.

Use a subselect. You might need OFFSET 0 to prevent the planner from
"flattening" the subselect, eg

SELECT whatever FROM
(SELECT *, expensivefunc(value) AS score FROM mytable OFFSET 0) ss
WHERE id LIKE '%z%' AND score > 0.5;

Keep in mind that in the above formulation, expensivefunc will be
evaluated at rows that don't pass the LIKE test. So you probably want
to push down as much as you can into the sub-select's WHERE clause.
The planner will not help you with that if you put in the OFFSET 0
optimization-fence. It's a good idea to use EXPLAIN (or even better
EXPLAIN VERBOSE, if you're using >= 8.4) to confirm that you're getting
the plan you want.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bob Price (#6)
Re: how to avoid repeating expensive computation in select

Bob Price <rjp_email@yahoo.com> writes:

If I set the COST of expensivefunc high, and label it IMMUTABLE, will the query executor note that the two invocations to expensivefunc have the same inputs so it can only call it once and re-use the result the second time?

No. There is a myth prevalent among certain wishful thinkers that
IMMUTABLE does something like that, but it doesn't. IMMUTABLE only
licenses the planner to fold a call *with constant arguments* into a
constant result, by executing the function once before the query
actually starts. Textually distinct calls of a function are not folded
together in any case.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicklas Avén (#7)
Re: how to avoid repeating expensive computation in select

Nicklas =?ISO-8859-1?Q?Av=E9n?= <nicklas.aven@jordogskog.no> writes:

I thought the "offset 0" trick was just a dirty hack, but coming from
you, Tom, I assume it is a robust way of doing it.

Well, I can't deny it's a dirty hack ... but it's not something we'll
break until we have a better solution.

I also tried some of the queries we tried then, almost 2 years ago, and
I think it seems like PostgreSQL handles this much better in 9.0. Is
that possible?

That observation is too vague to comment on. There are surely things
that are better in 9.0, but we haven't done anything lately that would
be likely to reduce the number of calls to a user-defined function per
se.

regards, tom lane

#10Bob Price
rjp_email@yahoo.com
In reply to: Tom Lane (#9)
Re: how to avoid repeating expensive computation in select

One last question. Are there any pitfalls if I roll my own ability to check for duplicate calls?

Since I am using my own defined data type, and my own function, I could
do this by:

1. in my data type X, adding fields for: a table oid, a row oid, a copy
of a reference to the last 2nd argument, and a copy of the last result
(in my case a double)

2. in my function (which takes two X args), if the new
table/row/ref-2nd-arg match the last data (saved in the first arg), then
return the last result, otherwise compute the new result and save this
info to use on the next call

This should enable only a single computation of the data for any given
row as long as the same args are used as parameters.

But, is this safe, or have any pitfalls I would need to look out for?

Thanks!
Bob

--- On Thu, 2/3/11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text

From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [GENERAL] how to avoid repeating expensive computation in select
To: "Bob Price" <rjp_email@yahoo.com>
Cc: "Bill Moran" <wmoran@potentialtech.com>, pgsql-general@postgresql.org
Date: Thursday, February 3, 2011, 5:33 PM
Bob Price <rjp_email@yahoo.com>
writes:

If I set the COST of expensivefunc high, and label it

IMMUTABLE, will the query executor note that the two
invocations to expensivefunc have the same inputs so it can
only call it once and re-use the result the second time?

No. There is a myth prevalent among certain wishful
thinkers that
IMMUTABLE does something like that, but it doesn't.
IMMUTABLE only
licenses the planner to fold a call *with constant
arguments* into a
constant result, by executing the function once before the
query
actually starts. Textually distinct calls of a
function are not folded
together in any case.

regards, tom lane