Problems pushing down WHERE-clause to underlying view

Started by Nicklas Avén · about 7 years ago · 8 messages · general
#1 Nicklas Avén
nicklas.aven@jordogskog.no

Hi

We have a system with 2 layers of views. It is about forestry.

The first layer contains the logic, like grouping log volumes together
into stems or harvesting areas, joining species names to codes, and
things like that.

The second layer just joins these underlying views to a table with user
ids and machine ids. So, when used by the application, there is a WHERE
clause containing the user id, which gives access to the correct data
in the underlying view based on machine id.

The underlying view in this case can return approx 22000 rows, grouped
from approx 8 million logs by harvest date, harvest object and so on.

The problem is that it always calculates all those 22000 rows even if
the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.

I have tried this on 2 servers and my laptop, PostgreSQL 9.6 and 10 and
get the same issue.

I have tried to pick the query apart to understand what is happening.

First, the underlying view looks like this, except I removed some fields
that don't affect the case, to save some space:

CREATE OR REPLACE VIEW underlying_view AS
 SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    s.species_group_name,
    p.product_group_name ,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM
    (
       SELECT
        hl.contractor_id,
        hl.machine_key,
        hl.operator_key,
        hl.object_key,
        hl.sub_object_key,
        date(hl.harvest_date) AS harvest_date, -- harvest_date is timestamptz since we use the time in other places
        hl.species_group_key,
        hl.product_key,
        sum(hl.m3_sub) AS m3_sub,
        count(*) AS number_of_logs
    FROM version_union_tables_r02.harvester_logs hl
    GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key
    ) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key;

In the next level is a view that looks like this:

CREATE OR REPLACE VIEW top_level_view AS
 SELECT *
   FROM underlying_view a
     JOIN contractor_access b ON a.machine_key = b.machine_key WHERE
b.active <> 0;

If I query this top_level_view like:

SELECT * FROM top_level_view WHERE user_id = 'name@email.address';

I haven't succeeded in preventing the underlying view from computing the
full dataset.

For the user_id 'name@email.address', the contractor_access table
returns 4 machine_keys in total, but only 1 of them has any hits in the
underlying view (the other 3 machines are not represented there).

We call the machine that we get from the contractor_access table 'machine1'.

Here is what I have tried to find out when the plan changes:

1) SELECT * FROM underlying_view WHERE machine_key = 'machine1'; --
returns 250 rows in approx 100 ms, so, works as expected

2) select * from (select 'machine1' machine_key) a, underlying_view b
where a.machine_key = b.machine_key; --same as above, works as expected

3) select * from (select * from contractor_access where user_id =
'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;      -- Here I
get hit: this returns the same 250 rows, but in over 4 seconds

/* So I thought I would try to force the machine_key down to the
underlying view with LATERAL, like this: */

4) select * from (select * from contractor_access where user_id =
'name@email.address') a,
lateral (select * from underlying_view where machine_key = a.machine_key) b;

But this doesn't work either. It returns the same 250 rows in approx 4
seconds.

My question is: is there some trick to force the planner to push down
the machine_key?

I cannot understand what is fooling the planner.
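
One possible workaround (a rough sketch, untested against this schema;
view_for_user is a made-up name) is to resolve the user's machine keys in a
separate step, so the view is then queried with a plain restriction on
machine_key rather than a join qual:

```sql
-- Hypothetical sketch: look up the machine keys first, then query the view
-- with them as a concrete array, so the qual on machine_key is an ordinary
-- restriction clause that can be pushed into the grouped subquery.
CREATE OR REPLACE FUNCTION view_for_user(p_user_id text)
RETURNS SETOF underlying_view
LANGUAGE plpgsql STABLE AS $$
DECLARE
  keys text[];
BEGIN
  -- Step 1: resolve the machine keys for this user.
  SELECT array_agg(machine_key) INTO keys
  FROM contractor_access
  WHERE user_id = p_user_id AND active <> 0;

  -- Step 2: restrict the view directly by machine_key.
  RETURN QUERY
    SELECT * FROM underlying_view
    WHERE machine_key = ANY (keys);
END;
$$;
```

which would then be called as, e.g., SELECT * FROM view_for_user('name@email.address');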

The tables are analyzed, and I have tried on several machines, so I do
not think it is misleading statistics.

I haven't done any configuration beyond setting
max_parallel_workers_per_gather to 0, since parallel workers make it
harder to understand what is happening.
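
For reference, that setting can be changed for the current session only:

```sql
-- Disable parallel workers for this session, to keep EXPLAIN plans simple.
SET max_parallel_workers_per_gather = 0;

-- Restore the configured default afterwards.
RESET max_parallel_workers_per_gather;
```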

Here is the query plan for query number 3 above:

EXPLAIN select * from (select * from contractor_access where user_id =
'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;

Aggregate  (cost=543839.03..543839.04 rows=1 width=8)
  ->  Hash Join  (cost=395402.74..543798.72 rows=16123 width=0)
        Hash Cond: ((hl.machine_key)::text = contractor_access.machine_key)
        ->  Hash Left Join  (cost=395395.10..533563.59 rows=806147
width=400)
              Hash Cond: (((hl.machine_key)::text =
(s.machine_key)::text) AND (hl.species_group_key = s.species_group_key))
              ->  Hash Left Join  (cost=395380.73..485122.31
rows=806147 width=32)
                    Hash Cond: (((hl.machine_key)::text =
(p.machine_key)::text) AND (hl.product_key = p.product_key))
                    ->  Hash Left Join (cost=395320.48..444697.18
rows=806147 width=36)
                          Hash Cond: (((hl.machine_key)::text =
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND
(hl.sub_object_key = o.sub_object_key))
                          ->  Hash Left Join (cost=395257.01..417426.05
rows=806147 width=36)
                                Hash Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                                Join Filter: (((hl.contractor_id)::text
= (mi.contractor_id)::text) OR ((hl.contractor_id IS NULL) AND
(mi.contractor_id IS NULL)))
                                ->  HashAggregate
(cost=395254.66..403316.13 rows=806147 width=86)
                                      Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
hl.harvest_date, hl.species_group_key, hl.product_key
                                      ->  Seq Scan on harvester_logs
hl  (cost=0.00..234025.22 rows=8061472 width=54)
                                ->  Hash  (cost=1.60..1.60 rows=60
width=35)
                                      ->  Seq Scan on machine_info mi 
(cost=0.00..1.60 rows=60 width=35)
                          ->  Hash  (cost=33.26..33.26 rows=1726 width=23)
                                ->  Seq Scan on objects o
(cost=0.00..33.26 rows=1726 width=23)
                    ->  Hash  (cost=37.90..37.90 rows=1490 width=29)
                          ->  Seq Scan on products p (cost=0.00..37.90
rows=1490 width=29)
              ->  Hash  (cost=7.55..7.55 rows=455 width=31)
                    ->  Seq Scan on species s  (cost=0.00..7.55
rows=455 width=31)
        ->  Hash  (cost=7.59..7.59 rows=4 width=21)
              ->  Seq Scan on contractor_access  (cost=0.00..7.59
rows=4 width=21)
                    Filter: (t4e_contractor_id =
'name@email.address'::text)

Thanks

Nicklas Avén

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Nicklas Avén (#1)
Re: Problems pushing down WHERE-clause to underlying view

On 2/15/19 7:28 AM, Nicklas Avén wrote:

Hi

We have a system with 2 layers of views. It is about forestry.

The first layer contains the logic like grouping volumes in logs
together to stems or harvesting areas and joining species names to codes
and things like that.

The second layer just joins this underlying views to a table with user
ids and machine ids. So, when used by the application there is a where
clause containing the user id which gives access to the correct data

in the underlying view based on machine id.

The underlying view in this case can return approx 22000 rows, grouped
from approx 8 million logs by harvest date, harvest object and so on.

The problem is that it always calculates all those 22000 rows even if
the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.

https://www.postgresql.org/docs/10/sql-createview.html

"CREATE VIEW defines a view of a query. The view is not physically
materialized. Instead, the query is run every time the view is
referenced in a query."

Might want to look at materialized view:
https://www.postgresql.org/docs/10/sql-creatematerializedview.html

"CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that it
also remembers the query used to initialize the view, so that it can be
refreshed later upon demand. A materialized view has many of the same
properties as a table, but there is no support for temporary
materialized views or automatic generation of OIDs."

I would also suggest running the EXPLAIN below with ANALYZE so actual
timings are returned. Also try:

SELECT
*
FROM
underlying_view AS a
JOIN
contractor_access AS b
ON
a.machine_key = b.machine_key
WHERE
user_id = 'name@email.address';


--
Adrian Klaver
adrian.klaver@aklaver.com

#3 Nicklas Avén
nicklas.aven@jordogskog.no
In reply to: Adrian Klaver (#2)
Re: Problems pushing down WHERE-clause to underlying view

On 2/15/19 5:06 PM, Adrian Klaver wrote:

On 2/15/19 7:28 AM, Nicklas Avén wrote:

Hi

The problem is that it always calculates all those 22000 rows even

if the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.

https://www.postgresql.org/docs/10/sql-createview.html

"CREATE VIEW defines a view of a query. The view is not physically

materialized. Instead, the query is run every time the view is
referenced in a query."

Sorry, I must have expressed myself badly. Of course a view is not
materialized.
I will explain below what I mean, without views.

Might want to look at materialized view:
https://www.postgresql.org/docs/10/sql-creatematerializedview.html

"CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that

it also remembers the query used to initialize the view, so that it can
be refreshed later upon demand. A materialized view has many of the same
properties as a table, but there is no support for temporary
materialized views or automatic generation of OIDs."

No, materialized views are not an option. We get new data into those
tables daily, and recalculating the full dataset for a lot of views like
this doesn't make sense.
Instead we maintain tables with the processed new data. But I want this
last layer of logic on top as views, for flexibility,
so we don't need cached tables for every possible type of grouping.
We started out with materialized views and it didn't work out well.
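
The incremental approach described above could look roughly like this (a
hypothetical sketch only; harvester_logs_daily and loaded_at are invented
names, not our actual schema):

```sql
-- Fold only newly loaded logs into a pre-aggregated table, instead of
-- refreshing a materialized view over the full 8-million-row history.
INSERT INTO harvester_logs_daily AS d
        (machine_key, harvest_date, m3_sub, number_of_logs)
SELECT machine_key, date(harvest_date), sum(m3_sub), count(*)
FROM harvester_logs
WHERE loaded_at >= now() - interval '1 day'   -- only the new rows
GROUP BY machine_key, date(harvest_date)
ON CONFLICT (machine_key, harvest_date)
DO UPDATE SET m3_sub = d.m3_sub + EXCLUDED.m3_sub,
              number_of_logs = d.number_of_logs + EXCLUDED.number_of_logs;
```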

I would also suggest running the EXPLAIN below with ANALYZE so actual

timings are returned. Also try:

SELECT
     *
FROM
     underlying_view AS b
JOIN
     contractor_access AS b
ON
     a.machine_key = b.machine_key
WHERE
     user_id = 'name@email.address'

Sorry again, I didn't mention this. I have tried that, since it is what
the top-level view does.
So the first step in trying to understand this was (of course) to apply
the WHERE clause directly to the query.

So, let's do that also on the underlying query (view).

Here I have 2 queries, where I apply the WHERE clause directly to the
query in the underlying view

(joining the contractor_access table directly on that query).

The first takes 30-40 ms and returns the same 250 rows as the second.
In the first I use the machine_key in the WHERE clause.
In the second query, which takes about 16 seconds to return the same 250
rows, I use the user_id from the contractor_access table.

I have also cleaned up the contractor_access table. So there is only 1
row now, with my email as user_id and the same machine_key as used in
the first query.

Query 1:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM ( SELECT hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
           FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where l.machine_key = '887655635442600'
;

which results in this query plan

Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual
time=25.804..27.134 rows=250 loops=1)
  ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1
width=19) (actual time=0.009..0.010 rows=1 loops=1)
        Filter: (machine_key = '887655635442600'::text)
  ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122)
(actual time=25.793..26.959 rows=250 loops=1)
        Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text)
AND (hl.species_group_key = s.species_group_key))
        ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624
width=120) (actual time=25.755..26.763 rows=250 loops=1)
              Hash Cond: (((hl.machine_key)::text =
(p.machine_key)::text) AND (hl.product_key = p.product_key))
              ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624
width=118) (actual time=25.706..26.543 rows=250 loops=1)
                    Hash Cond: (((hl.machine_key)::text =
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND
(hl.sub_object_key = o.sub_object_key))
                    ->  Hash Left Join  (cost=61799.78..62619.90
rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
                          Hash Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                          ->  HashAggregate (cost=61796.99..62079.79
rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                ->  Bitmap Heap Scan on harvester_logs
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual
time=0.909..11.573 rows=24151 loops=1)
                                      Recheck Cond:
((machine_key)::text = '887655635442600'::text)
                                      Heap Blocks: exact=538
                                      ->  Bitmap Index Scan on
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41
rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
                                            Index Cond:
((machine_key)::text = '887655635442600'::text)
                          ->  Hash  (cost=2.77..2.77 rows=1 width=38)
(actual time=0.023..0.023 rows=1 loops=1)
                                Buckets: 1024  Batches: 1  Memory
Usage: 9kB
                                ->  Seq Scan on machine_info mi
(cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
                                      Filter: ((machine_key)::text =
'887655635442600'::text)
                                      Rows Removed by Filter: 61
                    ->  Hash  (cost=16.12..16.12 rows=4 width=84)
(actual time=0.025..0.026 rows=3 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Bitmap Heap Scan on objects o
(cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 loops=1)
                                Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4
width=0) (actual time=0.015..0.015 rows=3 loops=1)
                                      Index Cond: ((machine_key)::text
= '887655635442600'::text)
              ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual
time=0.037..0.037 rows=26 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 10kB
                    ->  Bitmap Heap Scan on products p
(cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26
loops=1)
                          Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                          Heap Blocks: exact=1
                          ->  Bitmap Index Scan on
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26
width=0) (actual time=0.009..0.009 rows=26 loops=1)
                                Index Cond: ((machine_key)::text =
'887655635442600'::text)
        ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual
time=0.028..0.029 rows=12 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Bitmap Heap Scan on species s (cost=4.37..10.52
rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
                    Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                    Heap Blocks: exact=1
                    ->  Bitmap Index Scan on
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12
width=0) (actual time=0.008..0.008 rows=12 loops=1)
                          Index Cond: ((machine_key)::text =
'887655635442600'::text)
Planning time: 0.434 ms
Execution time: 27.370 ms

Next query, the slow one that calculates the whole dataset:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM ( SELECT hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
           FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where t4e_contractor_id = 'nicklas.aven@jordogskog.no';

results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122)
(actual time=15860.900..15888.766 rows=250 loops=1)
  Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
  ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69)
(actual time=15859.604..15887.287 rows=250 loops=1)
        Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
        Join Filter: (hl.species_group_key = s.species_group_key)
        Rows Removed by Join Filter: 2750
        ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044
width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
              Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
              Join Filter: (hl.product_key = p.product_key)
              Rows Removed by Join Filter: 6250
              ->  Merge Left Join  (cost=1779788.20..2022471.20
rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
                    Merge Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                    ->  Merge Join  (cost=1779783.74..2022437.81
rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
                          Merge Cond: ((hl.machine_key)::text =
ci.machine_key)
                          ->  GroupAggregate
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual
time=15088.353..15878.172 rows=2683 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                ->  Sort (cost=1779782.72..1800000.52
rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552
loops=1)
                                      Sort Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                      Sort Method: external merge Disk:
543456kB
                                      ->  Seq Scan on harvester_logs
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual
time=0.007..3169.984 rows=8084464 loops=1)
                          ->  Sort  (cost=1.02..1.03 rows=1 width=19)
(actual time=0.019..0.020 rows=1 loops=1)
                                Sort Key: ci.machine_key
                                Sort Method: quicksort  Memory: 25kB
                                ->  Seq Scan on contractor_access ci 
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                      Filter: (t4e_contractor_id =
'nicklas.aven@jordogskog.no'::text)
                    ->  Sort  (cost=4.47..4.62 rows=62 width=38)
(actual time=0.102..0.173 rows=266 loops=1)
                          Sort Key: mi.machine_key
                          Sort Method: quicksort  Memory: 30kB
                          ->  Seq Scan on machine_info mi
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62
loops=1)
              ->  Sort  (cost=127.50..131.23 rows=1491 width=35)
(actual time=1.205..3.071 rows=7204 loops=1)
                    Sort Key: p.machine_key
                    Sort Method: quicksort  Memory: 175kB
                    ->  Seq Scan on products p  (cost=0.00..48.91
rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
        ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual
time=0.385..1.233 rows=3259 loops=1)
              Sort Key: s.machine_key
              Sort Method: quicksort  Memory: 65kB
              ->  Seq Scan on species s  (cost=0.00..10.60 rows=460
width=37) (actual time=0.004..0.146 rows=460 loops=1)
  ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual
time=1.286..1.287 rows=1690 loops=1)
        Buckets: 2048  Batches: 1  Memory Usage: 190kB
        ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725
width=84) (actual time=0.004..0.600 rows=1725 loops=1)
Planning time: 0.527 ms
Execution time: 15945.641 ms

Thanks

Nicklas

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Nicklas Avén (#3)
Re: Problems pushing down WHERE-clause to underlying view

On 2/15/19 9:27 AM, Nicklas Avén wrote:

On 2/15/19 5:06 PM, Adrian Klaver wrote:

On 2/15/19 7:28 AM, Nicklas Avén wrote:

Hi

The problem is that it always calculates all those 22000 rows even

if the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.

https://www.postgresql.org/docs/10/sql-createview.html

"CREATE VIEW defines a view of a query. The view is not physically

materialized. Instead, the query is run every time the view is
referenced in a query."

Sorry, I must have expressed what I mean bad. Of course a view is not
materialized.
I will explain without views what I mean here below

Might want to look at materialized view:
https://www.postgresql.org/docs/10/sql-creatematerializedview.html

"CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that

it also remembers the query used to initialize the view, so that it can
be refreshed later upon demand. A materialized view has many of the same
properties as a table, but there is no support for temporary
materialized views or automatic generation of OIDs."

No, materialized views are not an option. We get new data into those
tables daily, and recalculating the full dataset for a lot of views like
this doesn't make sense.
Instead we have tables maintained with the processed new data. But I want
this last layer of logic on top as views, for flexibility,
so we don't need cached tables for every possible type of grouping.
We started out with materialized views and it didn't work out well.

I would also suggest running the EXPLAIN below with ANALYZE so actual

timings are returned. Also try:

SELECT
     *
FROM
     underlying_view AS a
JOIN
     contractor_access AS b
ON
     a.machine_key = b.machine_key
WHERE
     user_id = 'name@email.address'

Sorry again, I didn't mention that I have already tried this, since this
is what the top-level view does.
So the first step when trying to understand this was (of course) to apply
the where-clause directly to the query.

So, let's do that also on the underlying query (view) .

Here I have 2 queries, where I apply the where clause directly to the
query in the underlying view

(joining the contractor_access table directly on that query).

The first takes 30-40 ms and returns the same 250 rows as the second.
In the first I use the machine_key in the where clause.
The second query, which takes about 16 seconds to return the same 250
rows, uses the user_id in the contractor_access table.

I have also cleaned up the contractor_access table. So there is only 1
row now, with my email as user_id and the same machine_key as used in
the first query.

I have not had a chance to fully go through all of the below. Some
questions/suggestions:

1) Thanks for the formatted queries. If I could make a suggestion, when
aliasing could you include AS. It would make finding what l.* refers to
easier for those of us with old eyes:)

2) t4e_contractor_id is in the shiny_adm.contractor_access table?
If not where?

3) What is the schema for shiny_adm.contractor_access?
In particular what indexes are on it?

Query 1:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM ( SELECT hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
           FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where l.machine_key = '887655635442600'
;

which results in this query plan

Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual
time=25.804..27.134 rows=250 loops=1)
  ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1
width=19) (actual time=0.009..0.010 rows=1 loops=1)
        Filter: (machine_key = '887655635442600'::text)
  ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122)
(actual time=25.793..26.959 rows=250 loops=1)
        Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text)
AND (hl.species_group_key = s.species_group_key))
        ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624
width=120) (actual time=25.755..26.763 rows=250 loops=1)
              Hash Cond: (((hl.machine_key)::text =
(p.machine_key)::text) AND (hl.product_key = p.product_key))
              ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624
width=118) (actual time=25.706..26.543 rows=250 loops=1)
                    Hash Cond: (((hl.machine_key)::text =
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND
(hl.sub_object_key = o.sub_object_key))
                    ->  Hash Left Join  (cost=61799.78..62619.90
rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
                          Hash Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                          ->  HashAggregate (cost=61796.99..62079.79
rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                ->  Bitmap Heap Scan on harvester_logs
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual
time=0.909..11.573 rows=24151 loops=1)
                                      Recheck Cond:
((machine_key)::text = '887655635442600'::text)
                                      Heap Blocks: exact=538
                                      ->  Bitmap Index Scan on
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41
rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
                                            Index Cond:
((machine_key)::text = '887655635442600'::text)
                          ->  Hash  (cost=2.77..2.77 rows=1 width=38)
(actual time=0.023..0.023 rows=1 loops=1)
                                Buckets: 1024  Batches: 1  Memory
Usage: 9kB
                                ->  Seq Scan on machine_info mi
(cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
                                      Filter: ((machine_key)::text =
'887655635442600'::text)
                                      Rows Removed by Filter: 61
                    ->  Hash  (cost=16.12..16.12 rows=4 width=84)
(actual time=0.025..0.026 rows=3 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Bitmap Heap Scan on objects o
(cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3
loops=1)
                                Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4
width=0) (actual time=0.015..0.015 rows=3 loops=1)
                                      Index Cond: ((machine_key)::text
= '887655635442600'::text)
              ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual
time=0.037..0.037 rows=26 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 10kB
                    ->  Bitmap Heap Scan on products p
(cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26
loops=1)
                          Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                          Heap Blocks: exact=1
                          ->  Bitmap Index Scan on
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26
width=0) (actual time=0.009..0.009 rows=26 loops=1)
                                Index Cond: ((machine_key)::text =
'887655635442600'::text)
        ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual
time=0.028..0.029 rows=12 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Bitmap Heap Scan on species s (cost=4.37..10.52
rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
                    Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                    Heap Blocks: exact=1
                    ->  Bitmap Index Scan on
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12
width=0) (actual time=0.008..0.008 rows=12 loops=1)
                          Index Cond: ((machine_key)::text =
'887655635442600'::text)
Planning time: 0.434 ms
Execution time: 27.370 ms

Next query, the slow one that calculates the whole dataset:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM ( SELECT hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
           FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where t4e_contractor_id = 'nicklas.aven@jordogskog.no';

results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122)
(actual time=15860.900..15888.766 rows=250 loops=1)
  Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
  ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69)
(actual time=15859.604..15887.287 rows=250 loops=1)
        Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
        Join Filter: (hl.species_group_key = s.species_group_key)
        Rows Removed by Join Filter: 2750
        ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044
width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
              Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
              Join Filter: (hl.product_key = p.product_key)
              Rows Removed by Join Filter: 6250
              ->  Merge Left Join  (cost=1779788.20..2022471.20
rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
                    Merge Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                    ->  Merge Join  (cost=1779783.74..2022437.81
rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
                          Merge Cond: ((hl.machine_key)::text =
ci.machine_key)
                          ->  GroupAggregate
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual
time=15088.353..15878.172 rows=2683 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                ->  Sort (cost=1779782.72..1800000.52
rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552
loops=1)
                                      Sort Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                      Sort Method: external merge Disk:
543456kB
                                      ->  Seq Scan on harvester_logs
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual
time=0.007..3169.984 rows=8084464 loops=1)
                          ->  Sort  (cost=1.02..1.03 rows=1 width=19)
(actual time=0.019..0.020 rows=1 loops=1)
                                Sort Key: ci.machine_key
                                Sort Method: quicksort  Memory: 25kB
                                ->  Seq Scan on contractor_access ci
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                      Filter: (t4e_contractor_id =
'nicklas.aven@jordogskog.no'::text)
                    ->  Sort  (cost=4.47..4.62 rows=62 width=38)
(actual time=0.102..0.173 rows=266 loops=1)
                          Sort Key: mi.machine_key
                          Sort Method: quicksort  Memory: 30kB
                          ->  Seq Scan on machine_info mi
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62
loops=1)
              ->  Sort  (cost=127.50..131.23 rows=1491 width=35)
(actual time=1.205..3.071 rows=7204 loops=1)
                    Sort Key: p.machine_key
                    Sort Method: quicksort  Memory: 175kB
                    ->  Seq Scan on products p  (cost=0.00..48.91
rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
        ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual
time=0.385..1.233 rows=3259 loops=1)
              Sort Key: s.machine_key
              Sort Method: quicksort  Memory: 65kB
              ->  Seq Scan on species s  (cost=0.00..10.60 rows=460
width=37) (actual time=0.004..0.146 rows=460 loops=1)
  ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual
time=1.286..1.287 rows=1690 loops=1)
        Buckets: 2048  Batches: 1  Memory Usage: 190kB
        ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725
width=84) (actual time=0.004..0.600 rows=1725 loops=1)
Planning time: 0.527 ms
Execution time: 15945.641 ms

Thanks

Nicklas

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Nicklas Avén
nicklas.aven@jordogskog.no
In reply to: Adrian Klaver (#4)
Re: Problems pushing down WHERE-clause to underlying view

I have not had a chance to fully go through all of the below. Some

questions/suggestions:

1) Thanks for the formatted queries. If I could make a suggestion,

when aliasing could you include AS. It would make finding what l.*
refers to easier for those of us with old eyes:)

Yes, of course, sorry :-)

2) t4e_contractor_id is in the shiny_adm.contractor_access table?
If not where?

Yes, sorry again, it is there

3) What is the schema for shiny_adm.contractor_access?
In particular what indexes are on it?

shiny_adm.contractor_access looks like this:

CREATE TABLE shiny_adm.contractor_access
(
  machine_key text,
  t4e_contractor_id text,
  active integer DEFAULT 1,
  id serial NOT NULL,
  CONSTRAINT contractor_access_pkey PRIMARY KEY (id),
  CONSTRAINT contractor_unique UNIQUE (machine_key, t4e_contractor_id),
  CONSTRAINT co_check_t4e_co_email CHECK
(utils.verify_email(t4e_contractor_id))
)

CREATE INDEX idx_contractor
  ON shiny_adm.contractor_access
  USING btree
  (t4e_contractor_id COLLATE pg_catalog."default");

CREATE INDEX idx_contractor_mk
  ON shiny_adm.contractor_access
  USING btree
  (machine_key COLLATE pg_catalog."default");

I tried to format the below a little better with AS and some more
consistent indents.

I also, in the first query, changed the where clause to filter on
machine_key in the contractor_access table, just to illustrate the
problem better.

Both queries filter on the same table, which is joined the same way. But
in the second example the where-clause is not pushed down to the subquery l.

Thanks a lot for looking into it

Nicklas

Query 1:
EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
FROM
    (
        SELECT
            hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
        FROM
            version_union_tables_r02.harvester_logs hl
        GROUP BY
            hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key
    )  AS l
    LEFT JOIN version_union_tables_r02.machine_info  AS mi ON
l.machine_key::text = mi.machine_key::text
    LEFT JOIN version_union_tables_r02.objects AS  o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
    LEFT JOIN version_union_tables_r02.products AS  p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
    LEFT JOIN version_union_tables_r02.species  AS s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
    join shiny_adm.contractor_access AS ci on l.machine_key=ci.machine_key
where ci.machine_key = '887655635442600';

Resulting in this query plan:
Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual
time=27.801..29.225 rows=250 loops=1)
  ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1
width=19) (actual time=0.005..0.006 rows=1 loops=1)
        Filter: (machine_key = '887655635442600'::text)
  ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122)
(actual time=27.794..29.070 rows=250 loops=1)
        Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text)
AND (hl.species_group_key = s.species_group_key))
        ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624
width=120) (actual time=27.771..28.851 rows=250 loops=1)
              Hash Cond: (((hl.machine_key)::text =
(p.machine_key)::text) AND (hl.product_key = p.product_key))
              ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624
width=118) (actual time=27.736..28.628 rows=250 loops=1)
                    Hash Cond: (((hl.machine_key)::text =
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND
(hl.sub_object_key = o.sub_object_key))
                    ->  Hash Left Join  (cost=61799.78..62619.90
rows=22624 width=65) (actual time=27.709..28.416 rows=250 loops=1)
                          Hash Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                          ->  HashAggregate (cost=61796.99..62079.79
rows=22624 width=69) (actual time=27.677..28.217 rows=250 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                ->  Bitmap Heap Scan on harvester_logs
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual
time=1.040..12.977 rows=24151 loops=1)
                                      Recheck Cond:
((machine_key)::text = '887655635442600'::text)
                                      Heap Blocks: exact=538
                                      ->  Bitmap Index Scan on
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41
rows=22914 width=0) (actual time=0.996..0.997 rows=24151 loops=1)
                                            Index Cond:
((machine_key)::text = '887655635442600'::text)
                          ->  Hash  (cost=2.77..2.77 rows=1 width=38)
(actual time=0.018..0.018 rows=1 loops=1)
                                Buckets: 1024  Batches: 1  Memory
Usage: 9kB
                                ->  Seq Scan on machine_info mi
(cost=0.00..2.77 rows=1 width=38) (actual time=0.013..0.014 rows=1 loops=1)
                                      Filter: ((machine_key)::text =
'887655635442600'::text)
                                      Rows Removed by Filter: 61
                    ->  Hash  (cost=16.12..16.12 rows=4 width=84)
(actual time=0.020..0.020 rows=3 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Bitmap Heap Scan on objects o
(cost=4.31..16.12 rows=4 width=84) (actual time=0.015..0.016 rows=3 loops=1)
                                Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4
width=0) (actual time=0.011..0.012 rows=3 loops=1)
                                      Index Cond: ((machine_key)::text
= '887655635442600'::text)
              ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual
time=0.030..0.031 rows=26 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 10kB
                    ->  Bitmap Heap Scan on products p
(cost=4.48..38.19 rows=26 width=35) (actual time=0.010..0.019 rows=26
loops=1)
                          Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                          Heap Blocks: exact=1
                          ->  Bitmap Index Scan on
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26
width=0) (actual time=0.006..0.006 rows=26 loops=1)
                                Index Cond: ((machine_key)::text =
'887655635442600'::text)
        ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual
time=0.018..0.018 rows=12 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Bitmap Heap Scan on species s (cost=4.37..10.52
rows=12 width=37) (actual time=0.008..0.012 rows=12 loops=1)
                    Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                    Heap Blocks: exact=1
                    ->  Bitmap Index Scan on
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12
width=0) (actual time=0.004..0.005 rows=12 loops=1)
                          Index Cond: ((machine_key)::text =
'887655635442600'::text)
Planning time: 0.376 ms
Execution time: 29.435 ms

Next query, the slow one that calculates the whole dataset:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
FROM  shiny_adm.contractor_access ci join
    (
        SELECT
            hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
        FROM
            version_union_tables_r02.harvester_logs AS hl
        GROUP BY
            hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key
    )  AS l on l.machine_key=ci.machine_key
    LEFT JOIN version_union_tables_r02.machine_info  AS mi ON
l.machine_key::text = mi.machine_key::text
    LEFT JOIN version_union_tables_r02.objects AS o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
    LEFT JOIN version_union_tables_r02.products  AS p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
    LEFT JOIN version_union_tables_r02.species  AS s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no';

results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122)
(actual time=16336.200..16366.486 rows=250 loops=1)
  Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
  ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69)
(actual time=16334.747..16364.834 rows=250 loops=1)
        Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
        Join Filter: (hl.species_group_key = s.species_group_key)
        Rows Removed by Join Filter: 2750
        ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044
width=67) (actual time=16334.145..16362.241 rows=250 loops=1)
              Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
              Join Filter: (hl.product_key = p.product_key)
              Rows Removed by Join Filter: 6250
              ->  Merge Left Join  (cost=1779788.20..2022471.20
rows=4044 width=65) (actual time=16332.364..16356.313 rows=250 loops=1)
                    Merge Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                    ->  Merge Join  (cost=1779783.74..2022437.81
rows=4044 width=48) (actual time=16332.238..16355.855 rows=250 loops=1)
                          Merge Cond: ((hl.machine_key)::text =
ci.machine_key)
                          ->  GroupAggregate
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual
time=15552.813..16354.893 rows=2683 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                ->  Sort (cost=1779782.72..1800000.52
rows=8087121 width=61) (actual time=15552.795..15959.066 rows=942552
loops=1)
                                      Sort Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                      Sort Method: external merge Disk:
543456kB
                                      ->  Seq Scan on harvester_logs
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual
time=0.008..3221.502 rows=8084464 loops=1)
                          ->  Sort  (cost=1.02..1.03 rows=1 width=19)
(actual time=0.018..0.019 rows=1 loops=1)
                                Sort Key: ci.machine_key
                                Sort Method: quicksort  Memory: 25kB
                                ->  Seq Scan on contractor_access ci 
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                      Filter: (t4e_contractor_id =
'nicklas.aven@jordogskog.no'::text)
                    ->  Sort  (cost=4.47..4.62 rows=62 width=38)
(actual time=0.112..0.197 rows=266 loops=1)
                          Sort Key: mi.machine_key
                          Sort Method: quicksort  Memory: 30kB
                          ->  Seq Scan on machine_info mi
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.032 rows=62
loops=1)
              ->  Sort  (cost=127.50..131.23 rows=1491 width=35)
(actual time=1.353..3.404 rows=7204 loops=1)
                    Sort Key: p.machine_key
                    Sort Method: quicksort  Memory: 175kB
                    ->  Seq Scan on products p  (cost=0.00..48.91
rows=1491 width=35) (actual time=0.005..0.556 rows=1491 loops=1)
        ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual
time=0.436..1.363 rows=3259 loops=1)
              Sort Key: s.machine_key
              Sort Method: quicksort  Memory: 65kB
              ->  Seq Scan on species s  (cost=0.00..10.60 rows=460
width=37) (actual time=0.004..0.161 rows=460 loops=1)
  ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual
time=1.444..1.444 rows=1690 loops=1)
        Buckets: 2048  Batches: 1  Memory Usage: 190kB
        ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725
width=84) (actual time=0.004..0.656 rows=1725 loops=1)
Planning time: 0.653 ms
Execution time: 16428.966 ms

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Nicklas Avén (#5)
Re: Problems pushing down WHERE-clause to underlying view

On 2/15/19 12:43 PM, Nicklas Avén wrote:

Next query, the slow one that calculates the whole dataset:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
FROM  shiny_adm.contractor_access ci join
    (
        SELECT
            hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
        FROM
            version_union_tables_r02.harvester_logs AS hl
        GROUP BY
            hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key
    )  AS l on l.machine_key=ci.machine_key
    LEFT JOIN version_union_tables_r02.machine_info  AS mi ON
l.machine_key::text = mi.machine_key::text
    LEFT JOIN version_union_tables_r02.objects AS o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
    LEFT JOIN version_union_tables_r02.products  AS p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
    LEFT JOIN version_union_tables_r02.species  AS s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no';

To make it apples to apples, try changing the above to be more like the
first query:

...

AS l
LEFT JOIN version_union_tables_r02.machine_info AS mi ON
l.machine_key::text = mi.machine_key::text
LEFT JOIN version_union_tables_r02.objects AS o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
LEFT JOIN version_union_tables_r02.products AS p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
LEFT JOIN version_union_tables_r02.species AS s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
JOIN shiny_adm.contractor_access AS ci ON l.machine_key=ci.machine_key
WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no'

;

results in this query plan:

--
Adrian Klaver
adrian.klaver@aklaver.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicklas Avén (#5)
Re: Problems pushing down WHERE-clause to underlying view

Nicklas Avén <nicklas.aven@jordogskog.no> writes:

> I also, in the first query, changed the where clause to filter on
> machine_key in table contractor_access. Just to illustrate the problem
> better.
>
> Both queries filter on the same table, which is joined the same way. But
> in the second example the where clause is not pushed to the subquery.

The filters are totally different though. In one case you provide

where ci.machine_key = '887655635442600'

and there is also a join condition

l.machine_key=ci.machine_key

From these two things the planner can deduce

l.machine_key='887655635442600'

which is a restriction condition that it knows how to push down into the
"l" subquery. Furthermore, it can also deduce that it can restrict
all of the left-joined tables to consider only that value of their
join keys.

In query #2 you have no constant value for machine_key so none of that
happens.

IIRC, the propagated value doesn't have to be a constant, exactly,
just a fixed expression. So you might consider something like

<query 1 as written, up to the WHERE>
where ci.machine_key = (select machine_key from contractor_access
where t4e_contractor_id = 'nicklas.aven@jordogskog.no');

when you need to drive the lookup from something other than raw
machine_key. This'll fail, as-is, if there's more than one
contractor_access row with t4e_contractor_id =
'nicklas.aven@jordogskog.no', but you can probably adapt the idea
to make it work.
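One hedged way to confirm whether the trick worked (assuming the underlying_view definition and column names from the first mail): run the query under EXPLAIN and check where the restriction lands. If the pushdown succeeds, the machine_key condition should appear on the scan of harvester_logs underneath the aggregation step, rather than as a filter applied after all ~22000 grouped rows have been produced.

```sql
-- Sketch only, not run against the original schema; underlying_view and
-- contractor_access are the objects described earlier in this thread.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM underlying_view
WHERE machine_key = (SELECT machine_key
                     FROM shiny_adm.contractor_access
                     WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no');
```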

regards, tom lane

#8Nicklas Avén
nicklas.aven@jordogskog.no
In reply to: Tom Lane (#7)
Re: Problems pushing down WHERE-clause to underlying view

On 16 February 2019 06:02:50 GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> IIRC, the propagated value doesn't have to be a constant, exactly,
> just a fixed expression.

Thanks Tom
This is what I suspected was happening.
What I was hoping, though, was that the planner could see that the contractor_access table contains only a few hundred rows, and that the logs table, with millions of rows and an index on machine_key, should be reduced as much as possible before the grouping starts.
At first I thought this didn't happen because the logs table is hidden in a subquery, but since it works when filtering directly on machine_key, I guess that is not the problem.

But I am still a little confused about why I cannot trick this with LATERAL, as I showed in the first mail.

I guess I will have to rewrite this as a function and feed this query only one machine_key at a time.
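A minimal sketch of that workaround (the function name is hypothetical; it assumes the underlying_view and contractor_access objects from the first mail): query the view once per machine key, so each execution carries machine_key as a fixed parameter value, which per Tom's note is enough for the planner to push the restriction into the grouped subquery.

```sql
-- Hypothetical helper, untested against the real schema: loops over the
-- contractor's machines and runs the view query once per machine_key,
-- so each run has a single pushable machine_key value.
CREATE OR REPLACE FUNCTION report_for_contractor(p_contractor_id text)
RETURNS SETOF underlying_view
LANGUAGE plpgsql STABLE AS $$
DECLARE
    v_machine_key text;
BEGIN
    FOR v_machine_key IN
        SELECT machine_key
        FROM shiny_adm.contractor_access
        WHERE t4e_contractor_id = p_contractor_id
    LOOP
        RETURN QUERY
            SELECT *
            FROM underlying_view AS uv
            WHERE uv.machine_key = v_machine_key;
    END LOOP;
END;
$$;
```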

I think I have bumped into this before, and I might even have asked the same question some years ago; I think I recognize your answer. Sorry for not learning.

Thanks

Nicklas
