No optimization with a partition window in a view

Started by Pierreabout 12 years ago4 messagesbugs
Jump to latest
#1Pierre
pinaraf@pinaraf.info

Hello

I tried to avoid implementing window functions inside my ORM by using a view,
but it seems the optimizer is missing an obvious optimization and thus doing a
full table scan.

Affected versions : 9.2.4 and 9.3.2 (9.4 not tested yet)

How to reproduce :

=> create table test_history (i serial, piece integer not null, date timestamp
with time zone default now(), location integer not null);

=> create table test_history (i serial, piece integer not null, date timestamp
with time zone default now(), location integer not null);
CREATE TABLE

=> insert into test_history (piece, location) select i, 1 from
generate_series(1, 1000000) i;
INSERT 0 1000000
=> insert into test_history (piece, location) select i, 2 from
generate_series(1, 1000000) i;
INSERT 0 1000000
=> insert into test_history (piece, location) select i, 3 from
generate_series(1, 1000000) i;
^[[AINSERT 0 1000000
=> insert into test_history (piece, location) select i, 4 from
generate_series(1, 1000000, 2) i;
INSERT 0 500000

=> alter table test_history add primary key(i);
ALTER TABLE
=> create index on test_history(piece);
CREATE INDEX

=> select * from test_history where piece = 42;
i | piece | date | location
---------+-------+-------------------------------+----------
42 | 42 | 2014-02-15 08:52:50.946586+01 | 1
1000042 | 42 | 2014-02-15 08:52:56.634685+01 | 2
2000042 | 42 | 2014-02-15 08:53:00.762706+01 | 3
(3 rows)

Time: 0.158 ms

=> select *, lag(location, 1) over w, lead(location, 1) over w from test_history
where piece = 42 window w as (partition by piece order by date);
i | piece | date | location | lag | lead
---------+-------+-------------------------------+----------+-----+------
42 | 42 | 2014-02-15 08:52:50.946586+01 | 1 | | 2
1000042 | 42 | 2014-02-15 08:52:56.634685+01 | 2 | 1 | 3
2000042 | 42 | 2014-02-15 08:53:00.762706+01 | 3 | 2 |
(3 rows)

Time: 0.203 ms

=> create view test_history_lag_lead as select *, lag(location, 1) over w,
lead(location, 1) over w from test_history window w as (partition by piece order
by date);
CREATE VIEW
=> select * from test_history_lag_lead where piece = 42;
i | piece | date | location | lag | lead
---------+-------+-------------------------------+----------+-----+------
42 | 42 | 2014-02-15 08:52:50.946586+01 | 1 | | 2
1000042 | 42 | 2014-02-15 08:52:56.634685+01 | 2 | 1 | 3
2000042 | 42 | 2014-02-15 08:53:00.762706+01 | 3 | 2 |
(3 rows)

Time: 2915.756 ms

=> explain analyze select * from test_history_lag_lead where piece = 42;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on test_history_lag_lead (cost=653058.16..775558.16 rows=4
width=28) (actual time=1411.423..3375.794 rows=3 loops=1)
Filter: (test_history_lag_lead.piece = 42)
Rows Removed by Filter: 3499997
-> WindowAgg (cost=653058.16..731808.16 rows=3500000 width=20) (actual
time=1411.343..3206.959 rows=3500000 loops=1)
-> Sort (cost=653058.16..661808.16 rows=3500000 width=20) (actual
time=1411.337..1867.895 rows=3500000 loops=1)
Sort Key: test_history.piece, test_history.date
Sort Method: external merge Disk: 116328kB
-> Seq Scan on test_history (cost=0.00..57293.00 rows=3500000
width=20) (actual time=0.004..310.558 rows=3500000 loops=1)
Total runtime: 3386.455 ms
(9 rows)

=> explain analyze select *, lag(location, 1) over w, lead(location, 1) over w
from test_history where piece = 42 window w as (partition by piece order by
date);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=19.70..19.78 rows=4 width=20) (actual time=0.018..0.019 rows=3
loops=1)
-> Sort (cost=19.70..19.71 rows=4 width=20) (actual time=0.015..0.015
rows=3 loops=1)
Sort Key: date
Sort Method: quicksort Memory: 25kB
-> Index Scan using test_history_piece_idx on test_history
(cost=0.43..19.66 rows=4 width=20) (actual time=0.009..0.010 rows=3 loops=1)
Index Cond: (piece = 42)
Total runtime: 0.037 ms
(7 rows)

As you can see, the optimizer decided to do the WindowAgg first on the 3,5M
lines before filtering, instead of filtering using an Index scan and then doing
the aggregation.
IMHO, that optimization would be a special case possible only with windows on a
partition that is contained in the filtering fields, but I aint no expert here
:)

Thanks

Pierre Ducroquet

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Pierre (#1)
Re: No optimization with a partition window in a view

Hello, this seems to be a matter of subquery pushdown, query
transform involving subqueries or how to deal with views in
planner, rather than a bug.

I tried to avoid implementing window functions inside my ORM by using a view,
but it seems the optimizer is missing an obvious optimization and thus doing a
full table scan.

Affected versions : 9.2.4 and 9.3.2 (9.4 not tested yet)

9.4dev is also "affected".

How to reproduce :

<snip>

OK_A> => select * from test_history where piece = 42;
OK_B> => select *, lag(location, 1) over w, lead(location, 1) over w from test_history
where piece = 42 window w as (partition by piece order by date);

=> create view test_history_lag_lead as select *, lag(location, 1) over w,
lead(location, 1) over w from test_history window w as (partition by piece order
by date);

NG_C> => select * from test_history_lag_lead where piece = 42;

The equivalent for NG_C is not OK_B but this,

select * from
(select *, lag(location, 1) over w, lead(location,1) over w
from test_history window w as (partition by piece order
by date)) test_history_lag_lead
where piece = 42;

You will see this also falls down to SeqScan. Since views are
treated as monolithic substances. The planner simplly replaces a
view with a subquery and the subquery is planned individually -
separately from the upper part so that result comes. No amendment
is seen in my poor sight so far..

If your objective is simplification of queries or concealing
details and you are allowed to do these in different way,
functions could be usable.

CREATE OR REPLACE FUNCTION test_history_lag_lead(integer)
RETURNS TABLE(i integer, piece integer, date timestamp with time zone,
location integer, lag integer, lead integer) AS $$
SELECT *, lag(location, 1) OVER w, lead(location, 1) OVER w
FROM test_history WHERE piece = $1
WINDOW w AS (PARTITION BY piece ORDER BY date);
$$ LANGUAGE SQL;

select * from test_history_lag_lead(42);
...
Total runtime: 0.594 ms

(Seems slow but seqscan takes 13 sec on my rig..(altough without
compile optimizations))

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#2)
Re: No optimization with a partition window in a view

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, this seems to be a matter of subquery pushdown, query
transform involving subqueries or how to deal with views in
planner, rather than a bug.

In general, pushing a WHERE clause down through a window function call
*would* be a bug. I think it should be safe in this case because the
WHERE clause matches the window functions' partition clauses, so that
applying the WHERE removes either all or none of the rows of any
particular partition. We've not gotten around to that type of refinement
in window function planning, yet. It would take some infrastructure that
doesn't exist now --- I don't recall that we have any code that tries to
make that particular kind of proof.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#3)
Re: No optimization with a partition window in a view

Hello,

At Mon, 17 Feb 2014 23:10:53 -0500, Tom Lane wrote

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, this seems to be a matter of subquery pushdown, query
transform involving subqueries or how to deal with views in
planner, rather than a bug.

In general, pushing a WHERE clause down through a window function call
*would* be a bug.

I presumed without thought the first two queries are equivalent, but

http://www.postgresql.org/docs/9.2/static/tutorial-window.html

| The rows considered by a window function are those of the
| "virtual table" produced by the query's FROM clause as filtered
| by its WHERE, GROUP BY, and HAVING clauses if any. For example,
| a row removed because it does not meet the WHERE condition is
| not seen by any window function.

Pushing down WHERE clause relating to window functions is
certainly a bug for window functions in most cases.

I think it should be safe in this case because the WHERE clause
matches the window functions' partition clauses, so that
applying the WHERE removes either all or none of the rows of
any particular partition.

I understood that after examining data. How stupid!

We've not gotten around to that type of refinement in window
function planning, yet. It would take some infrastructure that
doesn't exist now --- I don't recall that we have any code that
tries to make that particular kind of proof.

# Althgough I dont't see the landscape of what kind of these
# optimizations could there be,

Subquery comes to me to seem to be a firm obstacle for many
things :-( But it makes everything simpler, too.

The infrastructure is, in simple imagination(?), that for
examining condition, pushing down, repeated exec time estimation
and selection involving upper nodes of the subquery, but this is
too naive design as it takes too long time to make exec plan as
you mentioned before , besides of ugliness. (But multiple
plannings for one subquery seems not needed for this case.)

Larger scale transformations which can dissolve subqueries would
be better. Feeling overwhelmed..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs