Slow SQL?

Started by Bjørn T Johansenover 9 years ago8 messagesgeneral
Jump to latest
#1Bjørn T Johansen
btj@havleik.no

I am trying to move a small system from Oracle to PostgreSQL and I have come upon a sql that runs really slow compared to on the Oracle database and
I am not able to interpret why this is slow.

The SQL looks like this:

Select a.status, a.plass, a.navn, a.avlsnr,
date_part('day',(now() - s.dato)) dato_diff, v.tekst, COALESCE(a.avlsverdi,0)
From sed_uttak s, sem_avlsverdi a, semin_vare v
where a.aktiv = 1
And s.dato = (Select Max(y.dato)
From sed_uttak y
Where y.avlsnr = s.avlsnr)
And a.avlsnr = s.avlsnr
And s.sedtype = v.tallkode
And a.avlsnr Not In (Select avlsnr
From dyr_pause_mot)
Union
Select a.status, a.plass, a.navn, a.avlsnr,
date_part('day',(now() - s.dato)) dato_diff, 'Tappe pause', COALESCE(a.avlsverdi,0)
From sed_uttak s, sem_avlsverdi a, dyr_pause_mot p
Where s.dato = (Select Max(x.dato)
From sed_uttak x
Where x.avlsnr = s.avlsnr)
And a.avlsnr = s.avlsnr
And a.avlsnr = p.avlsnr
Union
Select a.status, a.plass, a.navn, a.avlsnr, null dato_diff, 'IKKE TAPPET', COALESCE(a.avlsverdi,0)
From sem_avlsverdi a
Where a.aktiv = 1
And a.avlsnr Not In (Select avlsnr From sed_uttak)
And a.avlsnr Not In (Select avlsnr From dyr_pause_mot)

And the explain result looks like this:

HashAggregate (cost=7288068.92..7288092.10 rows=2318 width=36) (actual time=10740.366..10741.879 rows=6475 loops=1)
Group Key: a.status, a.plass, a.navn, a.avlsnr, (date_part('day'::text, (now() - (s.dato)::timestamp with time zone))), v.tekst, (COALESCE((a.avlsverdi)::integer, 0))
-> Append (cost=1.46..7288028.35 rows=2318 width=36) (actual time=0.203..10730.906 rows=8915 loops=1)
-> Nested Loop (cost=1.46..7274678.41 rows=698 width=82) (actual time=0.203..10638.870 rows=8602 loops=1)
Join Filter: (s.sedtype = v.tallkode)
Rows Removed by Join Filter: 127006
-> Nested Loop (cost=1.46..7274438.07 rows=698 width=26) (actual time=0.189..10607.509 rows=6164 loops=1)
-> Seq Scan on sem_avlsverdi a (cost=1.04..153.19 rows=3238 width=16) (actual time=0.024..4.027 rows=6474 loops=1)
Filter: ((NOT (hashed SubPlan 5)) AND (aktiv = 1))
Rows Removed by Filter: 3
SubPlan 5
-> Seq Scan on dyr_pause_mot dyr_pause_mot_1 (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak s (cost=0.42..2246.53 rows=1 width=14) (actual time=1.404..1.637 rows=1 loops=6474)
Index Cond: (avlsnr = a.avlsnr)
Filter: (dato = (SubPlan 4))
Rows Removed by Filter: 42
SubPlan 4
-> Aggregate (cost=43.09..43.10 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=279035)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak y (cost=0.42..42.96 rows=52 width=8) (actual time=0.003..0.029 rows=76 loops=279035)
Index Cond: (avlsnr = s.avlsnr)
-> Materialize (cost=0.00..1.33 rows=22 width=60) (actual time=0.000..0.001 rows=22 loops=6164)
-> Seq Scan on semin_vare v (cost=0.00..1.22 rows=22 width=60) (actual time=0.002..0.005 rows=22 loops=1)
-> Nested Loop (cost=0.70..6761.33 rows=1 width=24) (actual time=0.342..1.801 rows=3 loops=1)
-> Nested Loop (cost=0.28..25.96 rows=3 width=20) (actual time=0.008..0.015 rows=3 loops=1)
-> Seq Scan on dyr_pause_mot p (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.002 rows=3 loops=1)
-> Index Scan using idx_avlsverdi_avlsnr on sem_avlsverdi a_1 (cost=0.28..8.30 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=3)
Index Cond: (avlsnr = p.avlsnr)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak s_1 (cost=0.42..2245.11 rows=1 width=12) (actual time=0.377..0.589 rows=1 loops=3)
Index Cond: (avlsnr = a_1.avlsnr)
Filter: (dato = (SubPlan 3))
Rows Removed by Filter: 27
SubPlan 3
-> Aggregate (cost=43.09..43.10 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=85)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak x (cost=0.42..42.96 rows=52 width=8) (actual time=0.002..0.015 rows=37 loops=85)
Index Cond: (avlsnr = s_1.avlsnr)
-> Seq Scan on sem_avlsverdi a_2 (cost=6393.04..6565.43 rows=1619 width=16) (actual time=87.448..89.059 rows=310 loops=1)
Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (aktiv = 1))
Rows Removed by Filter: 6167
SubPlan 1
-> Seq Scan on sed_uttak (cost=0.00..5694.20 rows=279120 width=4) (actual time=0.005..36.384 rows=279120 loops=1)
SubPlan 2
-> Seq Scan on dyr_pause_mot (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
Planning time: 0.927 ms
Execution time: 10742.300ms

Appreciate any pointers on where to look... :)

Regards,

BTJ

--
-----------------------------------------------------------------------------------------------
Bjørn T Johansen

btj@havleik.no
-----------------------------------------------------------------------------------------------
Someone wrote:
"I understand that if you play a Windows CD backwards you hear strange Satanic messages"
To which someone replied:
"It's even worse than that; play it forwards and it installs Windows"
-----------------------------------------------------------------------------------------------

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Noname
hamann.w@t-online.de
In reply to: Bjørn T Johansen (#1)
Re: Slow SQL?

Hi Bjorn,

I have experienced that some subqueries can be quite slow, and would suspect the NOT IN
clause. I occasionally rewrite
NOT IN (select key from unwanted_candidates)
as
IN (select key from possible_candidates except select key from unwanted_candidates)

Admittedly, I am not running latest version

Regards
Wolfgang Hamann

I am trying to move a small system from Oracle to PostgreSQL and I have come upon a sql that runs really slow compared to on the Oracle database and
I am not able to interpret why this is slow.

The SQL looks like this:

Select a.status, a.plass, a.navn, a.avlsnr,
date_part('day',(now() - s.dato)) dato_diff, v.tekst, COALESCE(a.avlsverdi,0)
From sed_uttak s, sem_avlsverdi a, semin_vare v
where a.aktiv = 1
And s.dato = (Select Max(y.dato)
From sed_uttak y
Where y.avlsnr = s.avlsnr)
And a.avlsnr = s.avlsnr
And s.sedtype = v.tallkode
And a.avlsnr Not In (Select avlsnr
From dyr_pause_mot)
Union
Select a.status, a.plass, a.navn, a.avlsnr,
date_part('day',(now() - s.dato)) dato_diff, 'Tappe pause', COALESCE(a.avlsverdi,0)
From sed_uttak s, sem_avlsverdi a, dyr_pause_mot p
Where s.dato = (Select Max(x.dato)
From sed_uttak x
Where x.avlsnr = s.avlsnr)
And a.avlsnr = s.avlsnr
And a.avlsnr = p.avlsnr
Union
Select a.status, a.plass, a.navn, a.avlsnr, null dato_diff, 'IKKE TAPPET', COALESCE(a.avlsverdi,0)
From sem_avlsverdi a
Where a.aktiv = 1
And a.avlsnr Not In (Select avlsnr From sed_uttak)
And a.avlsnr Not In (Select avlsnr From dyr_pause_mot)

And the explain result looks like this:

HashAggregate (cost=7288068.92..7288092.10 rows=2318 width=36) (actual time=10740.366..10741.879 rows=6475 loops=1)
Group Key: a.status, a.plass, a.navn, a.avlsnr, (date_part('day'::text, (now() - (s.dato)::timestamp with time zone))), v.tekst, (COALESCE((a.avlsverdi)::integer, 0))
-> Append (cost=1.46..7288028.35 rows=2318 width=36) (actual time=0.203..10730.906 rows=8915 loops=1)
-> Nested Loop (cost=1.46..7274678.41 rows=698 width=82) (actual time=0.203..10638.870 rows=8602 loops=1)
Join Filter: (s.sedtype = v.tallkode)
Rows Removed by Join Filter: 127006
-> Nested Loop (cost=1.46..7274438.07 rows=698 width=26) (actual time=0.189..10607.509 rows=6164 loops=1)
-> Seq Scan on sem_avlsverdi a (cost=1.04..153.19 rows=3238 width=16) (actual time=0.024..4.027 rows=6474 loops=1)
Filter: ((NOT (hashed SubPlan 5)) AND (aktiv = 1))
Rows Removed by Filter: 3
SubPlan 5
-> Seq Scan on dyr_pause_mot dyr_pause_mot_1 (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak s (cost=0.42..2246.53 rows=1 width=14) (actual time=1.404..1.637 rows=1 loops=6474)
Index Cond: (avlsnr = a.avlsnr)
Filter: (dato = (SubPlan 4))
Rows Removed by Filter: 42
SubPlan 4
-> Aggregate (cost=43.09..43.10 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=279035)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak y (cost=0.42..42.96 rows=52 width=8) (actual time=0.003..0.029 rows=76 loops=279035)
Index Cond: (avlsnr = s.avlsnr)
-> Materialize (cost=0.00..1.33 rows=22 width=60) (actual time=0.000..0.001 rows=22 loops=6164)
-> Seq Scan on semin_vare v (cost=0.00..1.22 rows=22 width=60) (actual time=0.002..0.005 rows=22 loops=1)
-> Nested Loop (cost=0.70..6761.33 rows=1 width=24) (actual time=0.342..1.801 rows=3 loops=1)
-> Nested Loop (cost=0.28..25.96 rows=3 width=20) (actual time=0.008..0.015 rows=3 loops=1)
-> Seq Scan on dyr_pause_mot p (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.002 rows=3 loops=1)
-> Index Scan using idx_avlsverdi_avlsnr on sem_avlsverdi a_1 (cost=0.28..8.30 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=3)
Index Cond: (avlsnr = p.avlsnr)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak s_1 (cost=0.42..2245.11 rows=1 width=12) (actual time=0.377..0.589 rows=1 loops=3)
Index Cond: (avlsnr = a_1.avlsnr)
Filter: (dato = (SubPlan 3))
Rows Removed by Filter: 27
SubPlan 3
-> Aggregate (cost=43.09..43.10 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=85)
-> Index Scan using idx_seduttak_avlsnr on sed_uttak x (cost=0.42..42.96 rows=52 width=8) (actual time=0.002..0.015 rows=37 loops=85)
Index Cond: (avlsnr = s_1.avlsnr)
-> Seq Scan on sem_avlsverdi a_2 (cost=6393.04..6565.43 rows=1619 width=16) (actual time=87.448..89.059 rows=310 loops=1)
Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (aktiv = 1))
Rows Removed by Filter: 6167
SubPlan 1
-> Seq Scan on sed_uttak (cost=0.00..5694.20 rows=279120 width=4) (actual time=0.005..36.384 rows=279120 loops=1)
SubPlan 2
-> Seq Scan on dyr_pause_mot (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
Planning time: 0.927 ms
Execution time: 10742.300ms

Appreciate any pointers on where to look... :)

Regards,

BTJ

--
-----------------------------------------------------------------------------------------------
Bjørn T Johansen

btj@havleik.no
-----------------------------------------------------------------------------------------------
Someone wrote:
"I understand that if you play a Windows CD backwards you hear strange Satanic messages"
To which someone replied:
"It's even worse than that; play it forwards and it installs Windows"
-----------------------------------------------------------------------------------------------

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Noname (#2)
Re: Slow SQL?

hamann.w@t-online.de wrote:
Bjørn T Johansen wrote:

I am trying to move a small system from Oracle to PostgreSQL and I have come upon a sql that runs
really slow compared to on the Oracle database and I am not able to interpret why this is slow.

I have experienced that some subqueries can be quite slow, and would suspect the NOT IN
clause. I occasionally rewrite
NOT IN (select key from unwanted_candidates)
as
IN (select key from possible_candidates except select key from unwanted_candidates)

I would try to rewrite these clauses to NOT EXISTS, for example

a.avlsnr Not In (Select avlsnr From dyr_pause_mot)

could be

NOT EXISTS (SELECT NULL FROM dyr_pause_mot WHERE avlsnr = a.avlsnr)

This can be executed as anti-join and is often more efficient.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Bjørn T Johansen
btj@havleik.no
In reply to: Laurenz Albe (#3)
Re: Slow SQL?

On Tue, 12 Jul 2016 09:12:23 +0000
Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

hamann.w@t-online.de wrote:
Bjørn T Johansen wrote:

I am trying to move a small system from Oracle to PostgreSQL and I have come upon a sql that runs
really slow compared to on the Oracle database and I am not able to interpret why this is slow.

I have experienced that some subqueries can be quite slow, and would suspect the NOT IN
clause. I occasionally rewrite
NOT IN (select key from unwanted_candidates)
as
IN (select key from possible_candidates except select key from unwanted_candidates)

I would try to rewrite these clauses to NOT EXISTS, for example

a.avlsnr Not In (Select avlsnr From dyr_pause_mot)

could be

NOT EXISTS (SELECT NULL FROM dyr_pause_mot WHERE avlsnr = a.avlsnr)

This can be executed as anti-join and is often more efficient.

Yours,
Laurenz Albe

Thx for your suggestions. Tried to use NOT EXISTS and the query was about half a second quicker so not much difference...
But when I try to run the 3 queries separately, then they are very quick, 2 barely measurable and the third takes about 1,5 seconds. The union query
takes a little over 9 seconds, so I guess the union part is the bottleneck?

BTJ

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Bjørn T Johansen (#1)
Re: Slow SQL?

On Tue, Jul 12, 2016 at 10:23:24AM +0200, Bjørn T Johansen wrote:

I am trying to move a small system from Oracle to PostgreSQL and
I have come upon a sql that runs really slow compared to on the Oracle
database and I am not able to interpret why this is slow.

I loaded your explain analyze to https://explain.depesz.com/, as:
https://explain.depesz.com/s/iXK

as you can see there, the problem is that you made 280 thousand checks
for "sed_uttak y", which seems to be related to this part:

Select a.status, a.plass, a.navn, a.avlsnr,
date_part('day',(now() - s.dato)) dato_diff, v.tekst, COALESCE(a.avlsverdi,0)
From sed_uttak s, sem_avlsverdi a, semin_vare v
where a.aktiv = 1
And s.dato = (Select Max(y.dato)
From sed_uttak y
Where y.avlsnr = s.avlsnr)

from what I understand, you're doing it to work on newest record from
sed_uttak, for each avlsnr.

What is rowcount in the table, and how many different avlsnr are there?

You might want to do something like:

with s as (
select distinct on (avlsnr) *
from sed_uttak
order by avlsnr desc, dato desc
)

and then use "s" instead of set_uttak, and get rid of the s.dato
= (select max....) checks.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bjørn T Johansen (#4)
Re: Slow SQL?

Bjørn T Johansen wrote:

Thx for your suggestions. Tried to use NOT EXISTS and the query was about half a second quicker so not
much difference...
But when I try to run the 3 queries separately, then they are very quick, 2 barely measurable and the
third takes about 1,5 seconds. The union query
takes a little over 9 seconds, so I guess the union part is the bottleneck?

Looking at EXPLAIN (ANALYZE) output should tell you.

If yes, is UNION ALL an option for you?
That should be cheaper.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bjørn T Johansen (#4)
Re: Slow SQL?

=?UTF-8?B?QmrDuHJu?= T Johansen <btj@havleik.no> writes:

But when I try to run the 3 queries separately, then they are very quick, 2 barely measurable and the third takes about 1,5 seconds. The union query
takes a little over 9 seconds, so I guess the union part is the bottleneck?

No; it's clear from your EXPLAIN output that the first UNION arm is taking
the bulk of the runtime. It's odd that it wouldn't be exactly the same
when run standalone. What does EXPLAIN ANALYZE show if you run that first
arm by itself?

I concur with depesz's observation that the MAX() subselects seem to be
pretty expensive. If you don't want to rewrite the query in a wholesale
fashion like he suggests, you might be able to make the MAX's cheaper by
providing an index on sed_uttak(avlsnr, dato); but I'm not sure how much
that will help.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Bjørn T Johansen
btj@havleik.no
In reply to: hubert depesz lubaczewski (#5)
Re: Slow SQL?

On Tue, 12 Jul 2016 14:34:50 +0200
hubert depesz lubaczewski <depesz@depesz.com> wrote:

On Tue, Jul 12, 2016 at 10:23:24AM +0200, Bjørn T Johansen wrote:

I am trying to move a small system from Oracle to PostgreSQL and
I have come upon a sql that runs really slow compared to on the Oracle
database and I am not able to interpret why this is slow.

I loaded your explain analyze to https://explain.depesz.com/, as:
https://explain.depesz.com/s/iXK

as you can see there, the problem is that you made 280 thousand checks
for "sed_uttak y", which seems to be related to this part:

Select a.status, a.plass, a.navn, a.avlsnr,
date_part('day',(now() - s.dato)) dato_diff, v.tekst, COALESCE(a.avlsverdi,0)
From sed_uttak s, sem_avlsverdi a, semin_vare v
where a.aktiv = 1
And s.dato = (Select Max(y.dato)
From sed_uttak y
Where y.avlsnr = s.avlsnr)

from what I understand, you're doing it to work on newest record from
sed_uttak, for each avlsnr.

What is rowcount in the table, and how many different avlsnr are there?

You might want to do something like:

with s as (
select distinct on (avlsnr) *
from sed_uttak
order by avlsnr desc, dato desc
)

and then use "s" instead of set_uttak, and get rid of the s.dato
= (select max....) checks.

Best regards,

depesz

Rewrote using the with s as method and now it's faster than the oracle querytime, so I am happy...
And I learned something new, never used the with s as way before...

Thx! :)

BTJ

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general