performance regression, 7.2.3 -> 7.3b5 w/ VIEW

Started by Ross J. Reedstromover 23 years ago7 messageshackers
Jump to latest
#1Ross J. Reedstrom
reedstrm@rice.edu

Hey Hackers -
I was testing beta5 and found a performance regression involving
application of constraints into a VIEW - I've got a view that is fairly
expensive, involving a subselet and an aggregate. When the query is
rewritten in 7.2.3, the toplevel constraint is used to filter before
the subselect - in 7.3b5, it comes after.

For this query, the difference is 160 ms vs. 2 sec. Any reason for this
change?

Here's the view def., and explain analyzes for the view, and two hand
rewritten versions (since the explain analyze in 7.2.3 doesn't display
the filter parameters)

Ross

CREATE VIEW current_modules AS
SELECT * FROM modules m
WHERE module_ident =
(SELECT max(module_ident) FROM modules
WHERE m.moduleid = moduleid GROUP BY moduleid);

repository=# explain analyze select * from current_modules where name ~ 'Fourier';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on modules m (cost=0.00..116090.23 rows=1 width=135) (actual time=18.74..1968.01 rows=37 loops=1)
Filter: ((module_ident = (subplan)) AND (name ~ 'Fourier'::text))
SubPlan
-> Aggregate (cost=0.00..25.57 rows=1 width=13) (actual time=0.41..0.41 rows=1 loops=4534)
-> Group (cost=0.00..25.55 rows=6 width=13) (actual time=0.08..0.37 rows=10 loops=4534)
-> Index Scan using moduleid_idx on modules (cost=0.00..25.54 rows=6 width=13) (actual time=0.06..0.27 rows=10 loops=4534)
Index Cond: ($0 = moduleid)
Total runtime: 1968.65 msec
(8 rows)

repository=# explain analyze select module_ident from modules m where m.name ~ 'Fourier' and m.module_ident = (SELECT max(modules.module_ident) as max from modules where (m.moduleid=moduleid) group by modules.moduleid);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on modules m (cost=0.00..116090.23 rows=1 width=4) (actual time=2.46..158.33 rows=37 loops=1)
Filter: ((name ~ 'Fourier'::text) AND (module_ident = (subplan)))
SubPlan
-> Aggregate (cost=0.00..25.57 rows=1 width=13) (actual time=0.35..0.35 rows=1 loops=270)
-> Group (cost=0.00..25.55 rows=6 width=13) (actual time=0.07..0.31 rows=9 loops=270)
-> Index Scan using moduleid_idx on modules (cost=0.00..25.54 rows=6 width=13) (actual time=0.06..0.22 rows=9 loops=270)
Index Cond: ($0 = moduleid)
Total runtime: 158.81 msec
(8 rows)

repository=# explain analyze select module_ident from modules m where m.module_ident = (SELECT max(modules.module_ident) as max from modules where (m.moduleid=moduleid) group by modules.moduleid) and m.name ~ 'Fourier';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on modules m (cost=0.00..116090.23 rows=1 width=4) (actual time=18.66..1959.31 rows=37 loops=1)
Filter: ((module_ident = (subplan)) AND (name ~ 'Fourier'::text))
SubPlan
-> Aggregate (cost=0.00..25.57 rows=1 width=13) (actual time=0.41..0.41 rows=1 loops=4534)
-> Group (cost=0.00..25.55 rows=6 width=13) (actual time=0.08..0.37 rows=10 loops=4534)
-> Index Scan using moduleid_idx on modules (cost=0.00..25.54 rows=6 width=13) (actual time=0.06..0.27 rows=10 loops=4534)
Index Cond: ($0 = moduleid)
Total runtime: 1959.84 msec
(8 rows)

#2Mike Mascari
mascarm@mascari.com
In reply to: Ross J. Reedstrom (#1)
Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

Ross J. Reedstrom wrote:

Hey Hackers -
I was testing beta5 and found a performance regression involving
application of constraints into a VIEW - I've got a view that is fairly
expensive, involving a subselet and an aggregate. When the query is
rewritten in 7.2.3, the toplevel constraint is used to filter before
the subselect - in 7.3b5, it comes after.

For this query, the difference is 160 ms vs. 2 sec. Any reason for this
change?

I could be way off base, but here's a shot in the dark:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3D0885E1.8F369ACA%40mascari.com&rnum=3&prev=/groups%3Fq%3DMike%2BMascari%2Bsecurity%2BTom%2BLane%26ie%3DUTF-8%26oe%3DUTF-8%26hl%3Den

At the time I thought PostgreSQL was doing something naughty by
allowing user functions to be invoked on data that would
ultimately not be returned. Now I know how Oracle uses VIEWS for
row security: Oracle functions invoked in DML statements can't
record any changes to the database. So if the above is the
cause, I wouldn't have any problems with the patch being
reversed. Maybe separate privileges for read-only vs. read-write
functions are in order at some point in the future though...

Mike Mascari
mascarm@mascari.com

#3Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Mike Mascari (#2)
Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

On Wed, Nov 13, 2002 at 02:40:40AM -0500, Mike Mascari wrote:

Ross J. Reedstrom wrote:

For this query, the difference is 160 ms vs. 2 sec. Any reason for this
change?

I could be way off base, but here's a shot in the dark:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3D0885E1.8F369ACA%40mascari.com&rnum=3&prev=/groups%3Fq%3DMike%2BMascari%2Bsecurity%2BTom%2BLane%26ie%3DUTF-8%26oe%3DUTF-8%26hl%3Den

At the time I thought PostgreSQL was doing something naughty by
allowing user functions to be invoked on data that would
ultimately not be returned. Now I know how Oracle uses VIEWS for
row security: Oracle functions invoked in DML statements can't
record any changes to the database. So if the above is the
cause, I wouldn't have any problems with the patch being
reversed. Maybe separate privileges for read-only vs. read-write
functions are in order at some point in the future though...

Bingo, that solved it. I'm back to 160 ms. What does Tom feel about
removing this? Is there some way the planner could have known which
was the smarter/faster order of application?

Ross

#4Tommi Maekitalo
t.maekitalo@epgmbh.de
In reply to: Ross J. Reedstrom (#1)
Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

Am Mittwoch, 13. November 2002 07:22 schrieb Ross J. Reedstrom:

Hey Hackers -

...

CREATE VIEW current_modules AS
SELECT * FROM modules m
WHERE module_ident =
(SELECT max(module_ident) FROM modules
WHERE m.moduleid = moduleid GROUP BY moduleid);

...

I just wonder if you really need the GROUP BY. The subselect should return
exactly one row and so max does without GROUP BY:
CREATE VIEW current_modules AS
SELECT * FROM modules m
WHERE module_ident =
(SELECT max(module_ident) FROM modules
WHERE m.moduleid = moduleid);

Tommi

--
Dr. Eckhardt + Partner GmbH
http://www.epgmbh.de

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ross J. Reedstrom (#3)
Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

"Ross J. Reedstrom" <reedstrm@rice.edu> writes:

Bingo, that solved it. I'm back to 160 ms. What does Tom feel about
removing this? Is there some way the planner could have known which
was the smarter/faster order of application?

As I said in the previous thread, I don't have a lot of patience with
the notion of expecting the planner to promise anything about evaluation
order of WHERE clauses. I wasn't thrilled with adding the patch, but
I'm even less thrilled with the idea of backing it out now.

There has been some discussion of reordering WHERE clauses based on
estimated cost --- a simple form of this would be to push any clauses
involving subplans to the end of the list. I haven't done anything
about that yet, mainly because I'm unsure if there are cases where it
would be worse than not doing it.

regards, tom lane

#6Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Tom Lane (#5)
Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

On Wed, Nov 13, 2002 at 08:58:04AM -0500, Tom Lane wrote:

"Ross J. Reedstrom" <reedstrm@rice.edu> writes:

Bingo, that solved it. I'm back to 160 ms. What does Tom feel about
removing this? Is there some way the planner could have known which
was the smarter/faster order of application?

As I said in the previous thread, I don't have a lot of patience with
the notion of expecting the planner to promise anything about evaluation
order of WHERE clauses. I wasn't thrilled with adding the patch, but
I'm even less thrilled with the idea of backing it out now.

Having read the previous thread, I realized you wouldn't be thrilled
about it, that's why I asked. While I agree in principle (don't promise
a particular order), the pragmatic corollary of that principle would say
if you don't favor a particular order, then don't change the order from
previous stable releases.

Unlike the previous thread, I'm not looking for a particular order:
there're no side-effects I'm trying to exploit, I just want the best
possible performance.

There has been some discussion of reordering WHERE clauses based on
estimated cost --- a simple form of this would be to push any clauses
involving subplans to the end of the list. I haven't done anything
about that yet, mainly because I'm unsure if there are cases where it
would be worse than not doing it.

Me either, though my gut says subplans are expensive. I _can_ trivially
write queries that do the wrong thing (suboptimal order of WHERE clauses)
with or without this patch.

It's clearly the wrong time to try to do anything fancier, but the
conservative thing to do (in my unbiased opinion ;-) is put it back
the way it was for the last stable release, on the principle of least
surprise - there seems to be no bug fixed or functionality gained by
keeping the change.

Seems like this is at least worth a TODO:

* Examine WHERE clause order optimization possibilities, particularly
with subplans

Ross

#7Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Tommi Maekitalo (#4)
Re: performance regression, 7.2.3 -> 7.3b5 w/ VIEW

You're right, I should remove that (cruft left over from when the
subselect wasn't). However, it has no impact on the planner at hand:
removing it does trim 25% from the execution time, but getting the
WHERE clauses used in the right order gains an order of magnitude.

Both apply. Thanks, I'll fix it.

Ross

Show quoted text

On Wed, Nov 13, 2002 at 09:28:38AM +0100, Tommi Maekitalo wrote:

Am Mittwoch, 13. November 2002 07:22 schrieb Ross J. Reedstrom:

Hey Hackers -

...

CREATE VIEW current_modules AS
SELECT * FROM modules m
WHERE module_ident =
(SELECT max(module_ident) FROM modules
WHERE m.moduleid = moduleid GROUP BY moduleid);

...

I just wonder if you really need the GROUP BY. The subselect should return
exactly one row and so max does without GROUP BY:
CREATE VIEW current_modules AS
SELECT * FROM modules m
WHERE module_ident =
(SELECT max(module_ident) FROM modules
WHERE m.moduleid = moduleid);