Query becomes slow when written as view

Started by Jan Strubeover 13 years ago5 messagesgeneral

js@deriva.de

over 13 years ago

Hi,

I have the following query which runs reasonably fast under PostgreSQL
9.1.8:

SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" = cc."ISIN" AND
cc.cache_time >= b._last_modified
WHERE b."ISIN" IN (SELECT "ISIN" from dtng."Z_BASE" LIMIT 1)

Here is the query plan:

Nested Loop Left Join (cost=0.08..16.65 rows=1 width=1053)
Join Filter: (cc.cache_time >= b._last_modified)
-> Nested Loop (cost=0.08..8.67 rows=1 width=644)
-> HashAggregate (cost=0.08..0.09 rows=1 width=13)
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.08 rows=1 width=13)
-> Limit (cost=0.00..0.07 rows=1 width=13)
-> Seq Scan on "Z_BASE" (cost=0.00..106515.68 rows=1637368 width=13)
-> Index Scan using "Z_BASE_pkey" on "Z_BASE" b (cost=0.00..8.57 rows=1
width=644)
Index Cond: (("ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
-> Index Scan using cached_comments_pkey on cached_comments cc
(cost=0.00..7.71 rows=1 width=425)
Index Cond: ((b."ISIN")::bpchar = ("ISIN")::bpchar)

When I´m trying to put this into a view, it becomes extremely slow:

CREATE VIEW export_comments AS
SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" = cc."ISIN" AND
cc.cache_time >= b._last_modified

SELECT *
FROM export_comments
WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

The query plan now is:

Hash Join (cost=79926.52..906644.87 rows=818684 width=45)
Hash Cond: ((b."ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
-> Hash Left Join (cost=79926.42..884049.08 rows=1637368 width=1053)
Hash Cond: ((b."ISIN")::bpchar = (cc."ISIN")::bpchar)
Join Filter: (cc.cache_time >= b._last_modified)
-> Seq Scan on "Z_BASE" b (cost=0.00..106515.68 rows=1637368 width=644)
-> Hash (cost=74620.41..74620.41 rows=77841 width=425)
-> Seq Scan on cached_comments cc (cost=0.00..74620.41 rows=77841 width=425)
-> Hash (cost=0.09..0.09 rows=1 width=13)
-> HashAggregate (cost=0.08..0.09 rows=1 width=13)
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.08 rows=1 width=13)
-> Limit (cost=0.00..0.07 rows=1 width=13)
-> Seq Scan on "Z_BASE" (cost=0.00..106515.68 rows=1637368 width=13)

By the way I get the same behaviour and query plan when I try this:

SELECT *
FROM (
-- above view definition
) x
WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

We already found out that the problem is the Perl function "get_comment"
which is very expensive. In the first case the function is called at
most once, but in the second case it is called many times. I believe
this is because of the hash join which causes the view to fetch
everything from dtng."Z_BASE" first?
The question is, how to avoid this? We tried to set the functions cost
from 100 to 10000000 but that did not help. (Because of the architecture
of the software that uses this query, we have the constraint that
structure of the final WHERE clause (WHERE "ISIN" IN (...)) must not be
altered.)

Thanks a lot for any idea,
Jan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Merlin Moncure

mmoncure@gmail.com

over 13 years ago

In reply to: Jan Strube (#1)

Re: Query becomes slow when written as view

On Thu, Feb 14, 2013 at 7:23 AM, Jan Strube <js@deriva.de> wrote:

Hi,

I have the following query which runs reasonably fast under PostgreSQL
9.1.8:

SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" = cc."ISIN" AND cc.cache_time

= b._last_modified

WHERE b."ISIN" IN (SELECT "ISIN" from dtng."Z_BASE" LIMIT 1)

Here is the query plan:

Nested Loop Left Join (cost=0.08..16.65 rows=1 width=1053)
Join Filter: (cc.cache_time >= b._last_modified)
-> Nested Loop (cost=0.08..8.67 rows=1 width=644)
-> HashAggregate (cost=0.08..0.09 rows=1 width=13)
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.08 rows=1 width=13)
-> Limit (cost=0.00..0.07 rows=1 width=13)
-> Seq Scan on "Z_BASE" (cost=0.00..106515.68 rows=1637368 width=13)
-> Index Scan using "Z_BASE_pkey" on "Z_BASE" b (cost=0.00..8.57 rows=1
width=644)
Index Cond: (("ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
-> Index Scan using cached_comments_pkey on cached_comments cc
(cost=0.00..7.71 rows=1 width=425)
Index Cond: ((b."ISIN")::bpchar = ("ISIN")::bpchar)

When I´m trying to put this into a view, it becomes extremely slow:

CREATE VIEW export_comments AS
SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" = cc."ISIN" AND cc.cache_time

= b._last_modified

SELECT *
FROM export_comments
WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

The query plan now is:

Hash Join (cost=79926.52..906644.87 rows=818684 width=45)
Hash Cond: ((b."ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
-> Hash Left Join (cost=79926.42..884049.08 rows=1637368 width=1053)
Hash Cond: ((b."ISIN")::bpchar = (cc."ISIN")::bpchar)
Join Filter: (cc.cache_time >= b._last_modified)
-> Seq Scan on "Z_BASE" b (cost=0.00..106515.68 rows=1637368 width=644)
-> Hash (cost=74620.41..74620.41 rows=77841 width=425)
-> Seq Scan on cached_comments cc (cost=0.00..74620.41 rows=77841 width=425)
-> Hash (cost=0.09..0.09 rows=1 width=13)
-> HashAggregate (cost=0.08..0.09 rows=1 width=13)
-> Subquery Scan on "ANY_subquery" (cost=0.00..0.08 rows=1 width=13)
-> Limit (cost=0.00..0.07 rows=1 width=13)
-> Seq Scan on "Z_BASE" (cost=0.00..106515.68 rows=1637368 width=13)

By the way I get the same behaviour and query plan when I try this:

SELECT *
FROM (
-- above view definition
) x
WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

We already found out that the problem is the Perl function "get_comment"
which is very expensive. In the first case the function is called at most
once, but in the second case it is called many times. I believe this is
because of the hash join which causes the view to fetch everything from
dtng."Z_BASE" first?
The question is, how to avoid this? We tried to set the functions cost from
100 to 10000000 but that did not help. (Because of the architecture of the
software that uses this query, we have the constraint that structure of the
final WHERE clause (WHERE "ISIN" IN (...)) must not be altered.)

Thanks a lot for any idea,
Jan

is your function stable/immutable, and if so is it decorated as such.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jan Strube

js@deriva.de

over 13 years ago

In reply to: Merlin Moncure (#2)

Re: Query becomes slow when written as view

is your function stable/immutable, and if so is it decorated as such.

merlin

No, it´s volatile.

Jan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jan Strube (#3)

Re: Query becomes slow when written as view

Jan Strube <js@deriva.de> writes:

is your function stable/immutable, and if so is it decorated as such.

No, it�s volatile.

Well, that's your problem. The planner won't push down the IN clause
past the volatile function for fear of changing the query's side-effects.

I'd question whether it's sane to have a view with volatile functions in
it at all. It certainly won't act much like the normal understanding of
a view ...

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jan Strube

js@deriva.de

over 13 years ago

In reply to: Tom Lane (#4)

Re: Query becomes slow when written as view

is your function stable/immutable, and if so is it decorated as such.

No, itŽs volatile.

Well, that's your problem. The planner won't push down the IN clause
past the volatile function for fear of changing the query's side-effects.

I'd question whether it's sane to have a view with volatile functions in
it at all. It certainly won't act much like the normal understanding of
a view ...

I see, thanks for the explanation.
In this case, the side effect is desired. The view should always return
a COMMENT. Either directly from one of the tables or generated from the
function which stores the COMMENT in cached_comments for the next select.
Is there perhaps a best practice to do a thing like that? Of course we
could declare the original function stable and call another volatile
function to store the data, as noted in the docs. But that would be
cheating...

Regards,
Jan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general