Query optimization with repeated calculations

Started by Robert McGeheeover 8 years ago2 messagesgeneral

rmcgehee@gmail.com

over 8 years ago

Hello,
I have a general query optimization question involving repeated
calculations.

I frequently want to make views that repeat a calculation over and over
again in lots of columns. In the example below, let’s say it’s (a+b), but
we can imagine the calculation being much more complicated.

For example:
CREATE VIEW AS
SELECT (a+b)*c as c1, (a+b)*d as d1, (a+b)*e as e1
FROM table;

My question is, is PostgreSQL "smart" enough to cache this (a+b)
calculation in the above example, or does it recalculate it in each column?
If it recalculates it in each column, would I generally get better
performance by using a subquery to convince PostgreSQL to cache the
result? For example:

CREATE VIEW AS
SELECT x*c as c1, x*d as d1, x*e as e1
FROM (SELECT (a+b) as x, * FROM table) x;

Or perhaps I could use a CTE? In some instances, I might need 2 or more
subqueries to “cache” all the calculations (if one column depends on
another column that depends on another column), and I feel that sometimes
Postgres makes better optimization decisions when there are not subqueries
involved, but I am not sure.

Obviously I could benchmark everything every single time this comes up, but
there is overhead in that and some kind of rule of thumb or general
guideline would be helpful as it is a frequent question for me.

Thank you!
Robert

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert McGehee (#1)

Re: Query optimization with repeated calculations

Robert McGehee <rmcgehee@gmail.com> writes:

I frequently want to make views that repeat a calculation over and over
again in lots of columns. In the example below, let’s say it’s (a+b), but
we can imagine the calculation being much more complicated.

For example:
CREATE VIEW AS
SELECT (a+b)*c as c1, (a+b)*d as d1, (a+b)*e as e1
FROM table;

My question is, is PostgreSQL "smart" enough to cache this (a+b)
calculation in the above example, or does it recalculate it in each column?

Postgres will not notice that there's duplicate subexpressions.

If it recalculates it in each column, would I generally get better
performance by using a subquery to convince PostgreSQL to cache the
result? For example:

CREATE VIEW AS
SELECT x*c as c1, x*d as d1, x*e as e1
FROM (SELECT (a+b) as x, * FROM table) x;

Maybe, if the subexpression is expensive enough (a+b probably doesn't
meet that threshold). You'd need to take care to prevent the
subquery from being "flattened" into the upper query. Typically
this might require an optimization fence like OFFSET 0, which is
problematic for a view because it could prevent optimizations that
are far more important than avoiding duplicated calculations.
CTEs are likewise a rather blunt tool that could cost more than
they save.

It might work better to put the redundant calculations in a LATERAL
subquery, ie

CREATE VIEW AS
SELECT x*c as c1, x*d as d1, x*e as e1
FROM tab, LATERAL (SELECT a+b as x OFFSET 0) ss;

I haven't really experimented, but I think that this'd dodge
most of the optimization pitfalls, except one: in this formulation,
a+b will be evaluated for every "tab" row even if the particular
use of the view does not demand any of the columns that use "x".

regards, tom lane