Help speeding up a left join aggregate

Started by Nickabout 14 years ago3 messagesgeneral
Jump to latest
#1Nick
nboutelier@gmail.com

I have a pretty well tuned setup, with appropriate indexes and 16GB of
available RAM. Should this be taking this long? I forced it to not use
a sequential scan and that only knocked a second off the plan.

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=105882.35..105882.47 rows=3 width=118) (actual
time=3931.567..3931.583 rows=4 loops=1)
Hash Cond: (songs_downloaded.advertisement_id = a.id)
-> HashAggregate (cost=105881.21..105881.26 rows=4 width=13)
(actual time=3931.484..3931.489 rows=3 loops=1)
-> Seq Scan on songs_downloaded (cost=0.00..95455.96
rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752
loops=1)
Filter: (advertiser_id = 6553406)
-> Hash (cost=1.10..1.10 rows=3 width=46) (actual
time=0.050..0.050 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on advertisements a (cost=0.00..1.10 rows=3
width=46) (actual time=0.037..0.041 rows=4 loops=1)
Filter: (advertiser_id = 6553406)
Total runtime: 3931.808 ms
(10 rows)

SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads,
COALESCE(sd.download_revenue,0) AS download_revenue
FROM advertisements a
LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS
download_revenue, COUNT(1) AS downloads FROM songs_downloaded WHERE
advertiser_id = 6553406 GROUP BY advertisement_id) AS sd ON a.id =
sd.advertisement_id
WHERE advertiser_id = 6553406

#2Alban Hertroys
haramrae@gmail.com
In reply to: Nick (#1)
Re: Help speeding up a left join aggregate

On 31 Jan 2012, at 4:55, Nick wrote:

I have a pretty well tuned setup, with appropriate indexes and 16GB of
available RAM. Should this be taking this long? I forced it to not use
a sequential scan and that only knocked a second off the plan.

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=105882.35..105882.47 rows=3 width=118) (actual
time=3931.567..3931.583 rows=4 loops=1)
Hash Cond: (songs_downloaded.advertisement_id = a.id)
-> HashAggregate (cost=105881.21..105881.26 rows=4 width=13)
(actual time=3931.484..3931.489 rows=3 loops=1)
-> Seq Scan on songs_downloaded (cost=0.00..95455.96
rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752
loops=1)
Filter: (advertiser_id = 6553406)
-> Hash (cost=1.10..1.10 rows=3 width=46) (actual
time=0.050..0.050 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on advertisements a (cost=0.00..1.10 rows=3
width=46) (actual time=0.037..0.041 rows=4 loops=1)
Filter: (advertiser_id = 6553406)
Total runtime: 3931.808 ms
(10 rows)

I bet the group by query would be far more selective on advertisement_id than on the advertiser_id it's currently using, wouldn't it?
Perhaps the query planner chooses the wrong filter here because the advertiser_id is in the inner query, while the advertisement_id is outside it. You could try and see what happens if you move the advertiser_id into the join condition:

SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads,
COALESCE(sd.download_revenue,0) AS download_revenue
FROM advertisements a
LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS
download_revenue, COUNT(1) AS downloads FROM songs_downloaded
GROUP BY advertisement_id) AS sd ON a.id =
sd.advertisement_id AND a.advertiser_id = sd.advertiser_id
WHERE advertiser_id = 6553406

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.

#3Volodymyr Kostyrko
c.kworr@gmail.com
In reply to: Nick (#1)
Re: Help speeding up a left join aggregate

Nick wrote:

I have a pretty well tuned setup, with appropriate indexes and 16GB of
available RAM. Should this be taking this long? I forced it to not use
a sequential scan and that only knocked a second off the plan.

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=105882.35..105882.47 rows=3 width=118) (actual
time=3931.567..3931.583 rows=4 loops=1)
Hash Cond: (songs_downloaded.advertisement_id = a.id)
-> HashAggregate (cost=105881.21..105881.26 rows=4 width=13)
(actual time=3931.484..3931.489 rows=3 loops=1)
-> Seq Scan on songs_downloaded (cost=0.00..95455.96
rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752
loops=1)
Filter: (advertiser_id = 6553406)
-> Hash (cost=1.10..1.10 rows=3 width=46) (actual
time=0.050..0.050 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on advertisements a (cost=0.00..1.10 rows=3
width=46) (actual time=0.037..0.041 rows=4 loops=1)
Filter: (advertiser_id = 6553406)
Total runtime: 3931.808 ms
(10 rows)

What indexes do you have? Can you show some?

I bet you need something like (advertiser_id, advertisement_id), because
plain index would not be sorted right.

SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads,
COALESCE(sd.download_revenue,0) AS download_revenue
FROM advertisements a
LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS
download_revenue, COUNT(1) AS downloads FROM songs_downloaded WHERE
advertiser_id = 6553406 GROUP BY advertisement_id) AS sd ON a.id =
sd.advertisement_id
WHERE advertiser_id = 6553406

--
Sphinx of black quartz judge my vow.