BUG #14399: Order by id DESC causing bad query plan

Started by Jamie Koceniak over 9 years ago, 4 messages, bugs
#1 Jamie Koceniak
jkoceniak@mediamath.com

The following bug has been logged on the website:

Bug reference: 14399
Logged by: Jamie Koceniak
Email address: jkoceniak@mediamath.com
PostgreSQL version: 9.4.6
Operating system: Linux
Description:

One table has 2M records (orders) joining to another table with 75K records
(customers).

Query:
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
t2.id IN (select distinct customer_id from valid_customers)
ORDER BY t1.id
LIMIT 10 ;

-- valid customers subquery contains 200 records.

For some reason, the nested loop join is applying a join filter:
Rows Removed by Join Filter: 410976415

See anonymized query plan here:
https://explain.depesz.com/s/k9s5

If I remove the ORDER BY, the query returns in 1.5 ms.
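
To compare the two cases directly, both variants can be run under EXPLAIN. The following is a minimal sketch that reuses the table and column names from the query above; the BUFFERS option is an addition not used in the original report, included only because it shows how many pages each plan touches.

```sql
-- Slow case from the report: ORDER BY on the indexed column plus LIMIT.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE t2.id IN (SELECT DISTINCT customer_id FROM valid_customers)
ORDER BY t1.id
LIMIT 10;

-- Fast case from the report: the same query without the ORDER BY.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE t2.id IN (SELECT DISTINCT customer_id FROM valid_customers)
LIMIT 10;
```

Note that EXPLAIN ANALYZE actually executes each statement, so the slow variant will take its full run time.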

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Jamie Koceniak
jkoceniak@mediamath.com
In reply to: Jamie Koceniak (#1)
Re: BUG #14399: Order by id DESC causing bad query plan

Hi, I was wondering if this could get approved by the moderator.

One added note: if I actually drop the index, the query plan does a top-N heapsort instead of the nested loop join filter and runs in 28 ms.

Sort Method: top-N heapsort Memory: 94kB

Thanks!
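
The same comparison can probably be made without dropping the index. A sketch, assuming the dropped index is a plain index on orders.id and that id is an integer type: ordering by an expression instead of the bare column keeps the planner from using that index to supply the sort order, so it has to choose a plan that sorts the joined rows instead. Whether this ends up as the same top-N heapsort plan would need to be confirmed with EXPLAIN ANALYZE.

```sql
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE t2.id IN (SELECT DISTINCT customer_id FROM valid_customers)
-- "id + 0" sorts exactly like "id", but because it is an expression,
-- a plain index on orders.id cannot be used to return rows pre-sorted.
-- Assumption: orders.id is an integer type.
ORDER BY t1.id + 0
LIMIT 10;
```

This is a workaround for testing the planner's choice, not a fix for the underlying misestimate.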


#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Jamie Koceniak (#1)
Re: BUG #14399: Order by id DESC causing bad query plan

On Thu, Oct 27, 2016 at 5:16 PM, <jkoceniak@mediamath.com> wrote:

Query:
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
t2.id IN (select distinct customer_id from valid_customers)
ORDER BY t1.id
LIMIT 10 ;

Bug potential aside, the better way to write that is to use a proper
semi-join (i.e., EXISTS):

SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1 FROM valid_customers t3 WHERE t3.customer_id = t2.id)
ORDER BY t1.id
LIMIT 10;

Note too that your query plan has a "function scan" node unlike what your
query implies...

Sorry I can't be of more help with the information you've provided.

David J.
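
A function scan node matters here because its row estimate comes from the function's declaration rather than from table statistics; for set-returning functions PostgreSQL assumes 1000 rows unless told otherwise. If valid_customers is in fact a set-returning function, as the function scan node suggests, the declared estimate can be inspected and adjusted. This is only a sketch: the single integer argument is an assumption, and the value 200 is taken from the row count mentioned in the report. It may or may not change the chosen plan.

```sql
-- Inspect the row estimate currently declared for the function.
-- Assumption: valid_customers is a set-returning function, not a table.
SELECT proname, prorows
FROM pg_proc
WHERE proname = 'valid_customers';

-- Declare a more realistic estimate (about 200 rows, per the report).
-- Assumption: the function takes a single integer argument.
ALTER FUNCTION valid_customers(integer) ROWS 200;
```

After changing the estimate, rerunning the query under EXPLAIN ANALYZE shows whether the planner still prefers the ordered index scan.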

#4 Jamie Koceniak
jkoceniak@mediamath.com
In reply to: David G. Johnston (#3)
Re: BUG #14399: Order by id DESC causing bad query plan

Hi David,

Thanks for the suggestion on rewriting the query.
Unfortunately, it yields the same performance problem:

https://explain.depesz.com/s/fab

Query rewritten (sorry, I did leave out the function call in the original email):
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
EXISTS (select 1 from valid_customers(15348) t3 where t3.customer_id = t2.id)
ORDER BY t1.id DESC
LIMIT 10 ;

We also paginate, and if you add LIMIT 10 OFFSET 90, for example, the performance is 10 times worse.
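
OFFSET gets slower because the executor still has to produce and discard the skipped rows before it can return the requested page. A common alternative is keyset pagination, sketched below under assumptions: :last_seen_id is a hypothetical psql variable holding the smallest t1.id from the previous page, and orders.id is unique. This only avoids re-reading skipped rows; it does not by itself change which plan is chosen.

```sql
-- In psql, set the hypothetical variable first, for example:
--   \set last_seen_id 1999990
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1 FROM valid_customers(15348) t3 WHERE t3.customer_id = t2.id)
  -- Continue strictly below the last id already shown (descending order).
  AND t1.id < :last_seen_id
ORDER BY t1.id DESC
LIMIT 10;
```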

If you actually sort by a non-indexed field, the query runs in 37 ms.
Here is the query plan using a non-indexed field:
https://explain.depesz.com/s/BtF2

So the query sorted by a non-indexed field looks like:

select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
EXISTS (select 1 from valid_customers(15348) t3 where t3.customer_id = t2.id)
ORDER BY t1.created_on DESC
LIMIT 10 ;

Thanks,
Jamie

From: David Johnston <david.g.johnston@gmail.com>
Date: Tuesday, November 1, 2016 at 4:41 PM
To: Jamie Koceniak <jkoceniak@mediamath.com>
Cc: "pgsql-bugs@postgresql.org" <pgsql-bugs@postgresql.org>
Subject: Re: [BUGS] BUG #14399: Order by id DESC causing bad query plan
