Slow query with join

Started by Mark Watsonabout 11 years ago6 messagesgeneral
Jump to latest
#1Mark Watson
mark.watson@jurisconcept.ca

Hello all,
I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit, as downloaded from EnterpriseDB, and is running on my dev system under Win 7 64-bit.
I hope someone can help me with a problem I'm having when joining a view with a table. The view is somewhat involved, but I can provide the details if necessary
A query on the table is quick (16 ms):
explain analyze select ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by ir_actor_id;

"Sort (cost=17.28..17.30 rows=8 width=4) (actual time=0.032..0.033 rows=8 loops=1)"
" Sort Key: ir_actor_id"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using ir_dos_id_idx on f_intervenant_ref (cost=0.28..17.16 rows=8 width=4) (actual time=0.019..0.024 rows=8 loops=1)"
" Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.180 ms"
"Execution time: 0.049 ms"

A query on the view is also quick (31 ms), using the results from the previous query on the table:
explain analyze select * from v_actor where v_actor.actor_id in(77170,77170,77184,77184,77185,77185,77186,77186);
"Nested Loop Left Join (cost=0.86..385.18 rows=8 width=138) (actual time=2.819..9.652 rows=4 loops=1)"
" Join Filter: (actor.type = 'physical'::business.actor_type)"
" -> Nested Loop Left Join (cost=0.58..105.16 rows=8 width=114) (actual time=0.015..0.032 rows=4 loops=1)"
" Join Filter: (actor.type = 'moral'::business.actor_type)"
" -> Index Scan using actor_pkey on actor (cost=0.29..38.66 rows=8 width=78) (actual time=0.010..0.018 rows=4 loops=1)"
" Index Cond: (id = ANY ('{77170,77170,77184,77184,77185,77185,77186,77186}'::integer[]))"
" Filter: (deleted IS FALSE)"
" -> Index Scan using moral_actor_pkey on moral_actor (cost=0.28..8.30 rows=1 width=40) (actual time=0.002..0.002 rows=0 loops=4)"
" Index Cond: (id = actor.id)"
" -> Index Scan using physical_actor_pkey on physical_actor (cost=0.29..8.30 rows=1 width=28) (actual time=0.003..0.004 rows=1 loops=4)"
" Index Cond: (id = actor.id)"
" SubPlan 1"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (never executed)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (never executed)"
" Sort Key: contact.rank"
" -> Index Scan using contact_actor_idx on contact (cost=0.28..8.29 rows=1 width=8) (never executed)"
" Index Cond: (actor_id = actor.id)"
" Filter: (NOT deleted)"
" SubPlan 2"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=4)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=4)"
" Sort Key: contact_1.rank"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using contact_actor_idx on contact contact_1 (cost=0.28..8.29 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=4)"
" Index Cond: (actor_id = actor.id)"
" Filter: (NOT deleted)"
"Planning time: 0.721 ms"
"Execution time: 9.759 ms"

However, when I combine the two queries into one, the result set takes 6742 ms:
explain analyze select * from v_actor where v_actor.actor_id in(select ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by ir_actor_id);
"Merge Semi Join (cost=71.79..108061.92 rows=8 width=1461) (actual time=7884.994..7927.699 rows=4 loops=1)"
" Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
" -> Merge Left Join (cost=0.85..554314.28 rows=20670 width=138) (actual time=2.820..7926.001 rows=3072 loops=1)"
" Merge Cond: (actor.id = physical_actor.id)"
" Join Filter: (actor.type = 'physical'::business.actor_type)"
" -> Merge Left Join (cost=0.57..1679.17 rows=20670 width=114) (actual time=0.016..4.768 rows=3072 loops=1)"
" Merge Cond: (actor.id = moral_actor.id)"
" Join Filter: (actor.type = 'moral'::business.actor_type)"
" -> Index Scan using actor_pkey on actor (cost=0.29..1275.50 rows=20670 width=78) (actual time=0.008..3.190 rows=3072 loops=1)"
" Filter: (deleted IS FALSE)"
" -> Index Scan using moral_actor_pkey on moral_actor (cost=0.28..268.78 rows=5548 width=40) (actual time=0.006..0.006 rows=1 loops=1)"
" -> Index Scan using physical_actor_pkey on physical_actor (cost=0.29..725.98 rows=15122 width=28) (actual time=0.003..3.208 rows=3072 loops=1)"
" SubPlan 1"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (never executed)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (never executed)"
" Sort Key: contact.rank"
" -> Index Scan using contact_actor_idx on contact (cost=0.28..8.29 rows=1 width=8) (never executed)"
" Index Cond: (actor_id = actor.id)"
" Filter: (NOT deleted)"
" SubPlan 2"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=3072)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3072)"
" Sort Key: contact_1.rank"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using contact_actor_idx on contact contact_1 (cost=0.28..8.29 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=3072)"
" Index Cond: (actor_id = actor.id)"
" Filter: (NOT deleted)"
" -> Materialize (cost=17.28..17.40 rows=8 width=4) (actual time=0.024..0.027 rows=8 loops=1)"
" -> Sort (cost=17.28..17.30 rows=8 width=4) (actual time=0.021..0.022 rows=8 loops=1)"
" Sort Key: f_intervenant_ref.ir_actor_id"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using ir_dos_id_idx on f_intervenant_ref (cost=0.28..17.16 rows=8 width=4) (actual time=0.012..0.017 rows=8 loops=1)"
" Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.820 ms"
"Execution time: 7927.838 ms"

Any suggestions to help me speed this up will be greatly appreciated.

Mark Watson
Service au client - R&D
www.jurisconcept.ca  

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Mark Watson (#1)
Re: Slow query with join

On Mon, Mar 16, 2015 at 11:50 AM, Marc Watson <mark.watson@jurisconcept.ca>
wrote:

Hello all,
I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit, as
downloaded from EnterpriseDB, and is running on my dev system under Win 7
64-bit.

​[...]​

However, when I combine the two queries into one, the result set takes
6742 ms:
explain analyze select * from v_actor where v_actor.actor_id in(select
ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by
ir_actor_id);

​You might want to consider whether the following is acceptable; but it
would depend on the relationship between f_intervenant_ref and v_actor:

SELECT *
FROM v_actor
JOIN f_intervenant_ref ON (actor_id = ir_actor_id)
WHERE ir_dos_id = '5226';

"IN" has issues due to necessary consideration of possible NULLs in the
list.

Furthermore; even in your original query there is no value to incorporating
an ORDER BY into the IN subquery.

I suspect that this second problem is preventing the planner from pushing
the subquery down into the view and so is forced to perform a "Merge Semi
Join" against the full (and thus expensive) view while the constants in the
second query can be pushed down and the planner is able to choose the
"Nested Loop Left Join" over 8 keys (4 rows) which ends up being very fast.

​The other option is to use EXISTS:

SELECT *
FROM v_actor
WHERE EXIST (SELECT 1 FROM f_intervenant_ref WHERE actor_id = ir_actor_id
AND ir_dos_id = '5226')

"Merge Semi Join (cost=71.79..108061.92 rows=8 width=1461) (actual

time=7884.994..7927.699 rows=4 loops=1)"
" Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
" -> Merge Left Join (cost=0.85..554314.28 rows=20670 width=138)
(actual time=2.820..7926.001 rows=3072 loops=1)"

​[...]​

" -> Materialize (cost=17.28..17.40 rows=8 width=4) (actual
time=0.024..0.027 rows=8 loops=1)"
" -> Sort (cost=17.28..17.30 rows=8 width=4) (actual
time=0.021..0.022 rows=8 loops=1)"
" Sort Key: f_intervenant_ref.ir_actor_id"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using ir_dos_id_idx on f_intervenant_ref
(cost=0.28..17.16 rows=8 width=4) (actual time=0.012..0.017 rows=8 loops=1)"
" Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.820 ms"
"Execution time: 7927.838 ms"

Any suggestions to help me speed this up will be greatly appreciated.

​David J.​

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Mark Watson (#1)
Re: Slow query with join

On 16.3.2015 19:50, Marc Watson wrote:

Hello all,
I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800,

64-bit, as downloaded from EnterpriseDB, and is running on my dev system
under Win 7 64-bit.

I hope someone can help me with a problem I'm having when joining a

view with a table. The view is somewhat involved, but I can provide the
details if necessary

A query on the table is quick (16 ms):

<

explain analyze select ir_actor_id from f_intervenant_ref where ir_dos_id = '5226' order by ir_actor_id;

"Sort (cost=17.28..17.30 rows=8 width=4) (actual time=0.032..0.033 rows=8 loops=1)"
" Sort Key: ir_actor_id"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using ir_dos_id_idx on f_intervenant_ref (cost=0.28..17.16 rows=8 width=4) (actual time=0.019..0.024 rows=8 loops=1)"
" Index Cond: ((ir_dos_id)::text = '5226'::text)"
"Planning time: 0.180 ms"
"Execution time: 0.049 ms"
..

ISTM the database is applying the IN() condition last, i.e. it executes

SELECT * FROM v_actor

and then proceeds to filter the result. I'd bet if you measure time for
that (SELECT * FROM v_actor) you'll get ~7 seconds.

First, get rid of the ORDER BY clauses in the subselects - it's
completely pointless, and might prevent proper optimization (e.g.
replacing the IN() with optimized joins.

I.e. try this:

SELECT * FROM v_actor
WHERE v_actor.actor_id IN (SELECT ir_actor_id FROM f_intervenant_ref
WHERE ir_dos_id = '5226');

I'd also try replacing this with EXISTS

SELECT * FROM v_actor
WHERE EXISTS (SELECT 1 FROM f_intervenant_ref
WHERE (actor_id = ir_actor_id)
AND (ir_dos_id = '5226'));

or even an explicit join

SELECT v_actor.* FROM v_actor JOIN f_intervenant_ref
ON (actor_id = ir_actor_id)
WHERE ir_dos_id = '5226');

That might behave a bit differently if there are multiple
f_intervenant_ref rows matching the actor. If that's the case, a simple
DISTINCT should fix that.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#3)
Re: Slow query with join

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 16.3.2015 19:50, Marc Watson wrote:

I hope someone can help me with a problem I'm having when joining a
view with a table. The view is somewhat involved, but I can provide the
details if necessary

First, get rid of the ORDER BY clauses in the subselects - it's
completely pointless, and might prevent proper optimization (e.g.
replacing the IN() with optimized joins.

I'm suspicious that the cause may be an ORDER BY in the view. It's
hard to tell when we've not seen the view definition, but I see that both
plans we've been shown are going to produce output sorted by actor.id.
Maybe that's happenstance, or maybe not.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Mark Watson
mark.watson@jurisconcept.ca
In reply to: Tom Lane (#4)
Re: Slow query with join

-----Message d'origine-----
De : Tom Lane [mailto:tgl@sss.pgh.pa.us]
Envoyé : March-16-15 5:07 PM
À : Tomas Vondra
Cc : pgsql-general@postgresql.org; Marc Watson
Objet : Re: [GENERAL] Slow query with join

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 16.3.2015 19:50, Marc Watson wrote:

I hope someone can help me with a problem I'm having when joining a
view with a table. The view is somewhat involved, but I can provide the
details if necessary

First, get rid of the ORDER BY clauses in the subselects - it's
completely pointless, and might prevent proper optimization (e.g.
replacing the IN() with optimized joins.

I'm suspicious that the cause may be an ORDER BY in the view. It's
hard to tell when we've not seen the view definition, but I see that both
plans we've been shown are going to produce output sorted by actor.id.
Maybe that's happenstance, or maybe not.

regards, tom lane

Thanks for replying.
First of all, there is an ORDER BY in the view - ORDER BY actor.id. Removing this, the query takes about 47 secs., which is also the same as if I simply do a select * from v_actor, with or without the ORDER BY in the view.
It is something in the view, which I will explore further. If I create a materialized view mv_actor, select * from mv_actor takes 8 sec and my original query using the materialized view takes 16ms.

David G. Johnston <david.g.johnston@gmail.com> wrote:

You might want to consider whether the following is acceptable; but it would depend on the relationship between f_intervenant_ref and v_actor:

SELECT *
FROM v_actor
JOIN f_intervenant_ref ON (actor_id = ir_actor_id)
WHERE ir_dos_id = '5226'

The relationship between f_intervenant_ref and v_actor is simple. ir_actor_id is an integer that contains the value from the table actor.id, a serial and the primary key. This gives the same 7 secs, or 49 secs with out the ORDER BY in the view. The explain analyze output (with the ORDER BY in the view) is :
"Merge Join (cost=54.78..108097.04 rows=8 width=1540) (actual time=6925.158..6962.631 rows=8 loops=1)"
" Merge Cond: (actor.id = f_intervenant_ref.ir_actor_id)"
" -> Merge Left Join (cost=0.85..554262.60 rows=20670 width=138) (actual time=2.840..6960.625 rows=3072 loops=1)"
" Merge Cond: (actor.id = physical_actor.id)"
" -> Merge Left Join (cost=0.57..1665.30 rows=20670 width=114) (actual time=0.014..3.791 rows=3072 loops=1)"
" Merge Cond: (actor.id = moral_actor.id)"
" -> Index Scan using actor_pkey on actor (cost=0.29..1275.50 rows=20670 width=78) (actual time=0.006..1.667 rows=3072 loops=1)"
" -> Index Scan using moral_actor_pkey on moral_actor (cost=0.28..268.78 rows=5548 width=40) (actual time=0.005..0.005 rows=1 loops=1)"
" -> Index Scan using physical_actor_pkey on physical_actor (cost=0.29..725.98 rows=15122 width=28) (actual time=0.003..3.072 rows=3072 loops=1)"
" SubPlan 1"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (never executed)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (never executed)"
" Sort Key: contact.rank"
" -> Index Scan using contact_actor_idx on contact (cost=0.28..8.29 rows=1 width=8) (never executed)"
" Index Cond: (actor_id = actor.id)"
" SubPlan 2"
" -> Limit (cost=8.30..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=3072)"
" -> Sort (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3072)"
" Sort Key: contact_1.rank"
" Sort Method: quicksort Memory: 25kB"
" -> Index Scan using contact_actor_idx on contact contact_1 (cost=0.28..8.29 rows=1 width=8) (actual time=0.002..0.003 rows=0 loops=3072)"
" Index Cond: (actor_id = actor.id)"
" -> Index Scan using ir_actor_id_idx on f_intervenant_ref (cost=0.28..62.58 rows=8 width=79) (actual time=0.386..0.449 rows=8 loops=1)"
" Filter: ((ir_dos_id)::text = '5226'::text)"
" Rows Removed by Filter: 891"
"Planning time: 0.864 ms"
"Execution time: 6962.773 ms"

The other option is to use EXISTS:
...

Same 7 secs.

A scaled-down version of my view runs quickly, so I'll gradually start inserting the missing pieces until I find the culprit. I'll let you know the cause when I find it.
Thx again,
Mark Watson

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Mark Watson
mark.watson@jurisconcept.ca
In reply to: Mark Watson (#5)
Re: Slow query with join

Update :

My query SELECT * FROM v_actor JOIN f_intervenant_ref ON (actor_id = ir_actor_id) WHERE ir_dos_id = '5226' took 7 secs.
If I substitute the _RETURN rule for the view and add the above join, it takes 31 ms.

Mark Watson
Service au client - R&D
Tél. 418 659-7272 ou 1 888 692-1050
www.jurisconcept.ca  

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general