Avoiding duplication of code via views -- slower? How do people typically do this?

Started by Joe Van Dykabout 13 years ago9 messagesgeneral
Jump to latest
#1Joe Van Dyk
joe@tanga.com

See
https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txtfor
the code.

I have promotions(id, end_at, quantity) and promotion_usages(promotion_id).

I have a couple of things I typically want to retrieve, and I'd like those
things to be composable. In this case, finding recently-expired
promotions, finding promotions that have a quantity of one, and finding
promotions that were used.

My approach is to put these conditions into views, then I can join against
each one. But that approach is much slower than inlining all the code.

How is this typically done?

Thanks,
Joe

#2Jack Christensen
jack@jackchristensen.com
In reply to: Joe Van Dyk (#1)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

Joe Van Dyk wrote:

See
https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt
for the code.

I have promotions(id, end_at, quantity) and
promotion_usages(promotion_id).

I have a couple of things I typically want to retrieve, and I'd like
those things to be composable. In this case, finding recently-expired
promotions, finding promotions that have a quantity of one, and
finding promotions that were used.

My approach is to put these conditions into views, then I can join
against each one. But that approach is much slower than inlining all
the code.

How is this typically done?

Thanks,
Joe

From your first example on the gist I extracted this. It should avoid
the multiple scans and hash join the the join of the two views suffers
from.

create view promotions_with_filters as (
select *,
end_at > now() - '30 days'::interval as recently_expired,
quantity = 1 as one_time_use,
exists(select 1 from promotion_usages pu on pu.promotion_id = p.id)
as used
from promotions
);

select count(*) from promotions_with_filters where recently_expired and
one_time_use;

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Joe Van Dyk
joe@tanga.com
In reply to: Jack Christensen (#2)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

On Thu, Feb 14, 2013 at 6:31 PM, Jack Christensen
<jack@jackchristensen.com>wrote:

Joe Van Dyk wrote:

See https://gist.github.com/**joevandyk/4957646/raw/**
86d55472ff8b5a4a6740d9c673d18a**7005738467/gistfile1.txt<https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt&gt;for the code.

I have promotions(id, end_at, quantity) and promotion_usages(promotion_id)
**.

I have a couple of things I typically want to retrieve, and I'd like
those things to be composable. In this case, finding recently-expired
promotions, finding promotions that have a quantity of one, and finding
promotions that were used.

My approach is to put these conditions into views, then I can join
against each one. But that approach is much slower than inlining all the
code.

How is this typically done?

Thanks,
Joe

From your first example on the gist I extracted this. It should avoid

the multiple scans and hash join the the join of the two views suffers from.

create view promotions_with_filters as (
select *,
end_at > now() - '30 days'::interval as recently_expired,
quantity = 1 as one_time_use,
exists(select 1 from promotion_usages pu on pu.promotion_id = p.id)
as used
from promotions
);

select count(*) from promotions_with_filters where recently_expired and
one_time_use;

Perhaps I fat-fingered something somewhere... I tried that and I got this:
https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt

The with_filters view uses a different plan.

#4Chris Travers
chris.travers@gmail.com
In reply to: Joe Van Dyk (#1)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

On Thu, Feb 14, 2013 at 4:32 PM, Joe Van Dyk <joe@tanga.com> wrote:

See
https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txtfor the code.

I have promotions(id, end_at, quantity) and promotion_usages(promotion_id).

I have a couple of things I typically want to retrieve, and I'd like those
things to be composable. In this case, finding recently-expired
promotions, finding promotions that have a quantity of one, and finding
promotions that were used.

My approach is to put these conditions into views, then I can join against
each one. But that approach is much slower than inlining all the code.

How is this typically done?

First I am not usually a fan of trying to reduce code duplication by using
views. In general, my experience is that this makes it very easy to make
things slower, and it adds unexpected optimization hedges in unexpected
places.

Your problem here seems to be of this sort. You are joining together two
views in order to add filters. These operations are not really guaranteed
to be the same and so you have an unexpected optimization fence.

My general rule of thumb is to consider moving inline views and WITH
clauses into views as needed.

Now I think there are a bunch of ways to accomplish what you are trying to
do here.

At the risk of jumping straight ahead into advanced functionality and the
accusations that I am making use of magic wands, I will suggest an
object-relational approach to reducing code duplication. This would be to
eliminate most your filter views and make use instead of table methods.

CREATE FUNCTION recently_expired(promotion) returns bool language sql
immutable as
$$
select $1.ended at > now() - '30 days'::interval;
$$;

CREATE FUNCTION is_one_time(promotion) returns bool language sql immutable
as
$$
select $1.quantity = 1;
$$;

The one thing is you'd probably have to manually write in your join against
promotion_uses to make that effective, But you could instead do:

select p.id from promotions p join promotion_usages pu on pu.promotion_id = p.id

where p.is_one_time and p.recently_expired;

#5Jack Christensen
jack@jackchristensen.com
In reply to: Joe Van Dyk (#3)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

Joe Van Dyk wrote:

Perhaps I fat-fingered something somewhere... I tried that and I got
this:
https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt

The with_filters view uses a different plan.

Interesting. It is avoiding the hash join, but it is still evaluating
the exists column even when it is not referenced at all in the select. I
would have expected the optimizer to remove it entirely.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Van Dyk (#3)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

Joe Van Dyk <joe@tanga.com> writes:

Perhaps I fat-fingered something somewhere... I tried that and I got this:
https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt

Try without the useless "is true" bits.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Joe Van Dyk
joe@tanga.com
In reply to: Tom Lane (#6)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

On Fri, Feb 15, 2013 at 7:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joe Van Dyk <joe@tanga.com> writes:

Perhaps I fat-fingered something somewhere... I tried that and I got

this:

https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt

Try without the useless "is true" bits.

regards, tom lane

Huh, that did do the trick. Why does "is true" affect the plan?

without "is true" in the conditions:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=16676.66..16676.67 rows=1 width=0) (actual time=
95.648..95.648 rows=1 loops=1)
-> Bitmap Heap Scan on promotions p (cost=868.37..16619.49 rows=22868
width=0) (actual time=11.031..95.294 rows=2720 loops=1)
Recheck Cond: (end_at > (now() - '30 days'::interval))
Filter: ((quantity = 1) AND (SubPlan 1))
Rows Removed by Filter: 43073
-> Bitmap Index Scan on index_promotions_on_end_at
(cost=0.00..862.65 rows=46093 width=0) (actual time=10.783..10.783
rows=73234 loops=1)
Index Cond: (end_at > (now() - '30 days'::interval))
SubPlan 1
-> Index Only Scan using index_promotion_usages_on_promotion_id
on promotion_usages pu (cost=0.00..20.54 rows=178 width=0) (actual
time=0.001..0.001 rows=0 loops=44998)
Index Cond: (promotion_id = p.id)
Heap Fetches: 2720
Total runtime: 95.739 ms
(12 rows)

with "is true" in the conditions:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=94430.93..94430.94 rows=1 width=0) (actual
time=534.568..534.569 rows=1 loops=1)
-> Seq Scan on promotions p (cost=0.00..94373.76 rows=22868 width=0)
(actual time=0.306..534.165 rows=2720 loops=1)
Filter: (((quantity = 1) IS TRUE) AND ((end_at > (now() - '30
days'::interval)) IS TRUE) AND ((SubPlan 1) IS TRUE))
Rows Removed by Filter: 600105
SubPlan 1
-> Index Only Scan using index_promotion_usages_on_promotion_id
on promotion_usages pu (cost=0.00..20.54 rows=178 width=0) (actual
time=0.001..0.001 rows=0 loops=44998)
Index Cond: (promotion_id = p.id)
Heap Fetches: 2720
Total runtime: 534.627 ms
(9 rows)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Van Dyk (#7)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

Joe Van Dyk <joe@tanga.com> writes:

On Fri, Feb 15, 2013 at 7:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Try without the useless "is true" bits.

Huh, that did do the trick. Why does "is true" affect the plan?

Because "(x = y) IS TRUE" isn't the same as "x = y". (The behavior for
nulls is different.) And the planner only knows about using the latter
type of condition for indexscans. Since you need it to convert the
end_at condition into an indexscan to get a fast plan, you lose.

It's conceivable that we could teach the planner about this case, but
I haven't seen enough people doing that to make me think it's worth the
code space and planner cycles.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Merlin Moncure
mmoncure@gmail.com
In reply to: Chris Travers (#4)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?

On Fri, Feb 15, 2013 at 1:42 AM, Chris Travers <chris.travers@gmail.com> wrote:

On Thu, Feb 14, 2013 at 4:32 PM, Joe Van Dyk <joe@tanga.com> wrote:

See
https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt
for the code.

I have promotions(id, end_at, quantity) and
promotion_usages(promotion_id).

I have a couple of things I typically want to retrieve, and I'd like those
things to be composable. In this case, finding recently-expired promotions,
finding promotions that have a quantity of one, and finding promotions that
were used.

My approach is to put these conditions into views, then I can join against
each one. But that approach is much slower than inlining all the code.

How is this typically done?

First I am not usually a fan of trying to reduce code duplication by using
views. In general, my experience is that this makes it very easy to make
things slower, and it adds unexpected optimization hedges in unexpected
places.

Your problem here seems to be of this sort. You are joining together two
views in order to add filters. These operations are not really guaranteed
to be the same and so you have an unexpected optimization fence.

My general rule of thumb is to consider moving inline views and WITH clauses
into views as needed.

Now I think there are a bunch of ways to accomplish what you are trying to
do here.

At the risk of jumping straight ahead into advanced functionality and the
accusations that I am making use of magic wands, I will suggest an
object-relational approach to reducing code duplication. This would be to
eliminate most your filter views and make use instead of table methods.

CREATE FUNCTION recently_expired(promotion) returns bool language sql
immutable as
$$
select $1.ended at > now() - '30 days'::interval;
$$;

CREATE FUNCTION is_one_time(promotion) returns bool language sql immutable
as
$$
select $1.quantity = 1;
$$;

Unfortunately from performance point of view that is a much worse way
to do things. Pushing checks into function like that forces
processing into a iterative model which has a much worse set of
performance gotchas that have essentially no workaround. This is
because there is no way to force the function to be inlined. What I'd
like to see is to have a new function decoration, INLINE, that
introduces some constraints to how the function can be written and
forces the function to be expanded in the query at plan time. Only
then will there be a true alternative to using views, especially if
you could inline through a LATERAL function call.

merlin.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general