How to improve sql query to achieve the better plan

Started by Arup Rakshit · over 7 years ago · 4 messages · general
#1Arup Rakshit
ar@zeit.io

I have the query below, which takes 1873 ms. How can I improve it?

explain analyze select
    sum(coalesce(price_cents, 0)::bigint * coalesce(quantity, 0) * (1 - coalesce(workitems.discount, 0) / 100)) as total_budget_cents,
    sum(coalesce(price_cents, 0)::bigint * coalesce(quantity, 0) * (1 - coalesce(workitems.discount, 0) / 100) + coalesce(additional_cost_cents, 0) - coalesce(cost_reduction_cents, 0)) as final_budget_cents,
    projects.id as project_id
from projects
left join workitems on workitems.project_id = projects.id
where workitems.deleted_at is null
group by projects.id
order by project_id asc

And explain output is:

Sort  (cost=62851.33..62856.07 rows=1897 width=35) (actual time=1872.867..1873.003 rows=1229 loops=1)
  Sort Key: projects.id
  Sort Method: quicksort  Memory: 145kB
  ->  HashAggregate  (cost=62719.59..62748.04 rows=1897 width=35) (actual time=1871.281..1872.104 rows=1229 loops=1)
        Group Key: projects.id
        ->  Hash Right Join  (cost=159.68..45386.32 rows=364911 width=35) (actual time=2.226..637.936 rows=365784 loops=1)
              Hash Cond: (workitems.project_id = projects.id)
              Filter: (workitems.deleted_at IS NULL)
              Rows Removed by Filter: 257457
              ->  Seq Scan on workitems  (cost=0.00..36655.53 rows=623353 width=43) (actual time=0.020..220.215 rows=623175 loops=1)
              ->  Hash  (cost=135.97..135.97 rows=1897 width=16) (actual time=2.177..2.177 rows=1897 loops=1)
                    Buckets: 2048  Batches: 1  Memory Usage: 105kB
                    ->  Seq Scan on projects  (cost=0.00..135.97 rows=1897 width=16) (actual time=0.013..1.451 rows=1897 loops=1)
Planning time: 2.775 ms
Execution time: 1873.308 ms

The projects table has these indexes:

Indexes:
"projects_pkey" PRIMARY KEY, btree (id)
"index_projects_on_company_id" btree (company_id)
"index_projects_on_deleted_at" btree (deleted_at)
"index_projects_on_inspector_id" btree (inspector_id)
"index_projects_on_managed_offline_by_user_id" btree (managed_offline_by_user_id)
"index_projects_on_project_status_id" btree (project_status_id)
"index_projects_on_shipyard_id" btree (shipyard_id)
"index_projects_on_vessel_id" btree (vessel_id)

The workitems table has these indexes:

Indexes:
"workitems_pkey" PRIMARY KEY, btree (id)
"index_workitems_on_company_id" btree (company_id)
"index_workitems_on_deleted_at" btree (deleted_at)
"index_workitems_on_parent_workitem_id" btree (parent_workitem_id)
"index_workitems_on_project_id" btree (project_id)
"index_workitems_on_standard_workitem_id" btree (standard_workitem_id)
"index_workitems_on_workitem_category_id" btree (workitem_category_id)

Thanks,

Arup Rakshit
ar@zeit.io

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Arup Rakshit (#1)
Re: How to improve sql query to achieve the better plan

Hi

On Sun, 30 Sep 2018 at 18:23, Arup Rakshit <ar@zeit.io> wrote:


Maybe a partial (conditional) index can help:

CREATE INDEX ON workitems(project_id) WHERE deleted_at is null
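On PostgreSQL 11 or later, a covering variant of this partial index could in principle let the aggregate be computed from the index alone, avoiding the heap for live rows. This is only a sketch; the index name is made up, and the INCLUDE column list is an assumption derived from the columns the query sums:

```sql
-- Hypothetical covering partial index (requires PostgreSQL 11+ for INCLUDE):
CREATE INDEX index_workitems_live_budget
    ON workitems (project_id)
    INCLUDE (price_cents, quantity, discount,
             additional_cost_cents, cost_reduction_cents)
    WHERE deleted_at IS NULL;
```

Whether the planner actually prefers an index-only scan here depends on table visibility (vacuum state) and costs, so it would need to be verified with EXPLAIN ANALYZE.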

Regards

Pavel


#3Arup Rakshit
ar@zeit.io
In reply to: Pavel Stehule (#2)
Re: How to improve sql query to achieve the better plan

I just added it as you said, but I am getting the same plan.

Sort  (cost=62842.16..62846.91 rows=1897 width=35) (actual time=1845.831..1845.950 rows=1229 loops=1)
  Sort Key: projects.id
  Sort Method: quicksort  Memory: 145kB
  ->  HashAggregate  (cost=62710.42..62738.88 rows=1897 width=35) (actual time=1844.178..1845.060 rows=1229 loops=1)
        Group Key: projects.id
        ->  Hash Right Join  (cost=159.68..45382.09 rows=364807 width=35) (actual time=1.534..618.717 rows=365784 loops=1)
              Hash Cond: (workitems.project_id = projects.id)
              Filter: (workitems.deleted_at IS NULL)
              Rows Removed by Filter: 257457
              ->  Seq Scan on workitems  (cost=0.00..36653.75 rows=623175 width=43) (actual time=0.047..213.842 rows=623175 loops=1)
              ->  Hash  (cost=135.97..135.97 rows=1897 width=16) (actual time=1.478..1.478 rows=1897 loops=1)
                    Buckets: 2048  Batches: 1  Memory Usage: 105kB
                    ->  Seq Scan on projects  (cost=0.00..135.97 rows=1897 width=16) (actual time=0.006..0.914 rows=1897 loops=1)
Planning time: 0.498 ms
Execution time: 1846.100 ms

——————

Indexes:
"workitems_pkey" PRIMARY KEY, btree (id)
"index_workitems_on_company_id" btree (company_id)
"index_workitems_on_deleted_at" btree (deleted_at)
"index_workitems_on_parent_workitem_id" btree (parent_workitem_id)
"index_workitems_on_project_id" btree (project_id)
"index_workitems_on_standard_workitem_id" btree (standard_workitem_id)
"index_workitems_on_workitem_category_id" btree (workitem_category_id)
"patrial_index_workitems_200_1" btree (project_id) WHERE deleted_at IS NULL

Thanks,

Arup Rakshit
ar@zeit.io


On 30-Sep-2018, at 10:15 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

CREATE INDEX ON workitems(project_id) WHERE deleted_at is null

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Arup Rakshit (#3)
Re: How to improve sql query to achieve the better plan

On Sun, 30 Sep 2018 at 18:49, Arup Rakshit <ar@zeit.io> wrote:


Then there is not much more that can be done. Maybe you can try PostgreSQL 11 with its parallel hash join. The query processes roughly 600k rows, so a time of about 2 seconds is reasonable.
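To experiment with parallel hash join on PostgreSQL 11+, the per-query parallelism settings can be raised for a session. These are real PostgreSQL parameters, but the values below are illustrative assumptions for testing, not tuned recommendations:

```sql
-- Allow more parallel workers per Gather node for this session:
SET max_parallel_workers_per_gather = 4;
-- Let even smallish tables qualify for parallel scans while testing:
SET min_parallel_table_scan_size = '1MB';
```

Re-running EXPLAIN ANALYZE afterwards should show whether a Parallel Hash Join node appears and whether it helps.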
