BUG #14399: Order by id DESC causing bad query plan
The following bug has been logged on the website:
Bug reference: 14399
Logged by: Jamie Koceniak
Email address: jkoceniak@mediamath.com
PostgreSQL version: 9.4.6
Operating system: Linux
Description:
One table (orders) has 2M records and is joined to another table (customers)
with 75K records.
Query:
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
t2.id IN (select distinct customer_id from valid_customers)
ORDER BY t1.id
LIMIT 10 ;
-- The valid_customers subquery returns 200 records.
For some reason, the nested loop join is applying a join filter:
Rows Removed by Join Filter: 410976415
See anonymized query plan here:
https://explain.depesz.com/s/k9s5
If I remove the ORDER BY, the query returns in 1.5 ms.
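The numbers above fit a common ORDER BY ... LIMIT pattern. If the planner walks orders in id order through an index and checks each row against the valid-customer set, it can stop as soon as 10 rows match, which is cheap only when matches turn up early in that order. The reported 410,976,415 removed rows is roughly 2M × 200, consistent with (though not proof of) nearly every order row being compared against each of the ~200 valid customers. Below is a toy Python model of that plan shape, not PostgreSQL's executor; the function name `ordered_scan_with_join_filter` and all data are hypothetical.

```python
def ordered_scan_with_join_filter(orders, inner_ids, limit):
    """Toy model (hypothetical) of Limit -> Nested Loop with a Join Filter,
    where the outer side is read in descending order_id order, the way a
    backward index scan on orders.id would return it.

    orders:    list of (order_id, customer_id) tuples
    inner_ids: the valid customer ids, compared against every outer row
    Returns (matching order ids, outer rows read, rows removed by join filter).
    """
    matched = []
    outer_rows_read = 0
    removed_by_join_filter = 0
    for order_id, customer_id in sorted(orders, reverse=True):
        outer_rows_read += 1
        # A plain nested loop tests the outer row against every inner row.
        for cid in inner_ids:
            if customer_id == cid:
                matched.append(order_id)
            else:
                removed_by_join_filter += 1
        # The Limit node stops pulling outer rows once it has enough output.
        if len(matched) >= limit:
            break
    return matched[:limit], outer_rows_read, removed_by_join_filter


# Made-up data: 10,000 orders. Only ids 1..20 belong to customer 7 (valid);
# all other orders belong to customer 999, which is not in the valid set.
orders = [(i, 7 if i <= 20 else 999) for i in range(1, 10_001)]
valid_ids = list(range(200))  # 200 valid customer ids, as in the report

ids, read, removed = ordered_scan_with_join_filter(orders, valid_ids, limit=10)
print(ids)      # [20, 19, 18, 17, 16, 15, 14, 13, 12, 11]
print(read)     # 9990
print(removed)  # 1997990
```

In the toy data the ten matches sit at the far end of the scan, so 9,990 of 10,000 outer rows are read and about 200 filter rejections are counted per row read. LIMIT costing generally assumes qualifying rows are spread evenly, which is why such a plan can look cheap to the planner and still be slow when the matches are clustered.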
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi, I was wondering if this could get approved by the moderator.
One additional note: if I drop the index, the query plan uses a top-N heapsort instead of the nested loop with a join filter, and the query runs in 28 ms.
Sort Method: top-N heapsort Memory: 94kB
Thanks!
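A top-N heapsort reads every qualifying row but keeps only the best N in a bounded heap, so its memory stays small (94kB above) and its cost does not depend on where the matches sit in id order. A minimal Python sketch of the idea on made-up data; `top_n_desc` is a hypothetical name, and the set lookup stands in for whatever join strategy the real plan used.

```python
import heapq


def top_n_desc(orders, valid_ids, n):
    """Toy model (hypothetical) of Limit -> Sort (top-N heapsort).

    Reads every order, keeps those whose customer is valid, and retains only
    the n largest order ids seen so far in a min-heap of size n.
    Returns (ids in descending order, rows read).
    """
    valid = set(valid_ids)
    heap = []  # min-heap holding at most n order ids
    rows_read = 0
    for order_id, customer_id in orders:
        rows_read += 1
        if customer_id not in valid:
            continue
        if len(heap) < n:
            heapq.heappush(heap, order_id)
        elif order_id > heap[0]:
            # Evict the smallest retained id in favour of the larger new one.
            heapq.heapreplace(heap, order_id)
    return sorted(heap, reverse=True), rows_read


# Same made-up data shape: only ids 1..20 belong to a valid customer.
orders = [(i, 7 if i <= 20 else 999) for i in range(1, 10_001)]
valid_ids = list(range(200))

print(top_n_desc(orders, valid_ids, 10))
# ([20, 19, 18, 17, 16, 15, 14, 13, 12, 11], 10000)
```

The standard library's `heapq.nlargest(n, iterable)` performs the same bounded-heap selection; the explicit loop only makes the heap size visible.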
On Thu, Oct 27, 2016 at 5:16 PM, <jkoceniak@mediamath.com> wrote:
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
t2.id IN (select distinct customer_id from valid_customers)
ORDER BY t1.id
LIMIT 10 ;
Bug potential aside, the better way to write that is to use a proper
semi-join (i.e., EXISTS):
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1 FROM valid_customers t3 WHERE t3.customer_id = t2.id)
ORDER BY t1.id
LIMIT 10;
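The semi-join idea returns each outer row at most once, however many matching rows the subquery produces, so a DISTINCT inside the subquery is not needed for correctness (standard SQL gives IN (subquery) the same at-most-once semantics). A minimal Python contrast of an inner join and a semi-join on made-up rows; both helper names are hypothetical.

```python
def inner_join(orders, valid_rows):
    """Every (order, valid row) pair with equal customer ids: duplicates multiply."""
    return [o for o in orders for v in valid_rows if o[1] == v]


def semi_join(orders, valid_rows):
    """EXISTS semantics: keep each order once if any valid row matches."""
    valid = set(valid_rows)
    return [o for o in orders if o[1] in valid]


orders = [(1, 10), (2, 20), (3, 10)]  # (order_id, customer_id), made up
valid_rows = [10, 10, 30]             # customer 10 appears twice

print(inner_join(orders, valid_rows))  # [(1, 10), (1, 10), (3, 10), (3, 10)]
print(semi_join(orders, valid_rows))   # [(1, 10), (3, 10)]
```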
Note too that your query plan has a "Function Scan" node, which your
query does not imply...
Sorry I can't be of more help with the information you've provided.
David J.
Hi David,
Thanks for the suggestion on rewriting the query.
Unfortunately, it yields the same performance problem.
https://explain.depesz.com/s/fab
Rewritten query (sorry, I left out the function call in my original email):
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
EXISTS (select 1 from valid_customers(15348) t3 where t3.customer_id = t2.id)
ORDER BY t1.id DESC
LIMIT 10 ;
We also paginate; if you add, for example, LIMIT 10 OFFSET 90, performance is about 10 times worse.
If you sort by a non-indexed column instead, the query runs in 37 ms.
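With LIMIT 10 OFFSET 90, an ordered scan has to find 100 qualifying rows before it can return the page, so when matches are sparse along the scan order, the rows read grow roughly in proportion to OFFSET + LIMIT. A toy Python count under that assumption (hypothetical helper, made-up ids where 1 in 1,000 orders qualifies, per-row join cost ignored) shows a 10x gap:

```python
def rows_read_for_page(ids_in_scan_order, qualifies, offset, limit):
    """Count rows an ordered scan reads before it can return
    LIMIT `limit` OFFSET `offset`: it must find offset + limit qualifying rows
    (hypothetical helper; join cost per row is ignored)."""
    needed = offset + limit
    found = 0
    for rows_read, row_id in enumerate(ids_in_scan_order, start=1):
        if qualifies(row_id):
            found += 1
            if found == needed:
                return rows_read
    # Not enough qualifying rows: the whole input was read.
    return len(ids_in_scan_order)


def qualifies(order_id):
    # Made-up rule: 1 in 1,000 orders belongs to a valid customer.
    return order_id % 1000 == 1


ids_desc = list(range(200_000, 0, -1))  # made-up ids, scanned in DESC order

first_page = rows_read_for_page(ids_desc, qualifies, offset=0, limit=10)
tenth_page = rows_read_for_page(ids_desc, qualifies, offset=90, limit=10)
print(first_page, tenth_page)  # 10000 100000
```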
Here is the query plan using non-indexed field:
https://explain.depesz.com/s/BtF2
The query sorted by the non-indexed column looks like this:
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
EXISTS (select 1 from valid_customers(15348) t3 where t3.customer_id = t2.id)
ORDER BY t1.created_on desc
LIMIT 10 ;
Thanks,
Jamie
From: David Johnston <david.g.johnston@gmail.com>
Date: Tuesday, November 1, 2016 at 4:41 PM
To: Jamie Koceniak <jkoceniak@mediamath.com>
Cc: "pgsql-bugs@postgresql.org" <pgsql-bugs@postgresql.org>
Subject: Re: [BUGS] BUG #14399: Order by id DESC causing bad query plan