To prefer sorts or filters in postgres, that is the question....

Started by Bob Jonesalmost 8 years ago4 messagesgeneral

r.a.n.d.o.m.d.e.v.4+postgres@gmail.com

almost 8 years ago

Hi,

I've been playing around with hierarchical queries a bit and one thing
I wanted to do is build a query that gives the ultimate parent for a
given child.

The two queries below seem to be a good a shortlist as any.

I'm no query-plan guru, but these seem to be largely identical aside
from one uses "filter IS NULL" and the other uses "top-N heapsort".

Would there be a reason to prefer one over the other (or perhaps
there's an altogether more efficient way of doing this query ?!?).
My gut-instinct says the sort version ?

=> explain analyze with recursive cte(cmenu_id,depth,cmenu_parent) as (
SELECT cmenu_id,1 as depth,cmenu_parent
FROM cms_menu
WHERE cmenu_id='CHILDNODENAME' and cmenu_active=true
UNION ALL
SELECT c.cmenu_id,cte.depth-1,c.cmenu_parent
FROM cms_menu c
JOIN cte ON c.cmenu_id=cte.cmenu_parent WHERE cmenu_active=true)
select * from cte order by depth LIMIT 1;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=166.59..166.59 rows=1 width=68) (actual
time=0.132..0.132 rows=1 loops=1)
CTE cte
-> Recursive Union (cost=0.15..165.31 rows=51 width=68) (actual
time=0.023..0.070 rows=4 loops=1)
-> Index Scan using cms_menu_cmenu_id_key on cms_menu
(cost=0.15..8.17 rows=1 width=68) (actual time=0.022..0.022 rows=1
loops=1)
Index Cond: (cmenu_id = 'CHILDNODENAME'::text)
Filter: cmenu_active
-> Hash Join (cost=0.33..15.61 rows=5 width=68) (actual
time=0.009..0.010 rows=1 loops=4)
Hash Cond: (c.cmenu_id = cte_1.cmenu_parent)
-> Seq Scan on cms_menu c (cost=0.00..14.40
rows=220 width=64) (actual time=0.002..0.004 rows=12 loops=3)
Filter: cmenu_active
-> Hash (cost=0.20..0.20 rows=10 width=36) (actual
time=0.002..0.002 rows=1 loops=4)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> WorkTable Scan on cte cte_1
(cost=0.00..0.20 rows=10 width=36) (actual time=0.000..0.001 rows=1
loops=4)
-> Sort (cost=1.28..1.40 rows=51 width=68) (actual
time=0.131..0.131 rows=1 loops=1)
Sort Key: cte.depth
Sort Method: top-N heapsort Memory: 25kB
-> CTE Scan on cte (cost=0.00..1.02 rows=51 width=68)
(actual time=0.024..0.073 rows=4 loops=1)
Planning time: 0.221 ms
Execution time: 0.163 ms
(19 rows)

=>explain analyze with recursive cte(cmenu_id,cmenu_parent) as (
SELECT cmenu_id,cmenu_parent
FROM cms_menu
WHERE cmenu_id='CHILDNODENAME' and cmenu_active=true
UNION ALL
SELECT c.cmenu_id,c.cmenu_parent
FROM cms_menu c
JOIN cte ON c.cmenu_id=cte.cmenu_parent WHERE cmenu_active=true)
select * from cte where cmenu_parent IS NULL LIMIT 1;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=165.19..166.21 rows=1 width=64) (actual
time=0.069..0.069 rows=1 loops=1)
CTE cte
-> Recursive Union (cost=0.15..165.19 rows=51 width=64) (actual
time=0.020..0.064 rows=4 loops=1)
-> Index Scan using cms_menu_cmenu_id_key on cms_menu
(cost=0.15..8.17 rows=1 width=64) (actual time=0.019..0.020 rows=1
loops=1)
Index Cond: (cmenu_id = 'CHILDNODENAME'::text)
Filter: cmenu_active
-> Hash Join (cost=0.33..15.60 rows=5 width=64) (actual
time=0.011..0.012 rows=1 loops=3)
Hash Cond: (c.cmenu_id = cte_1.cmenu_parent)
-> Seq Scan on cms_menu c (cost=0.00..14.40
rows=220 width=64) (actual time=0.003..0.005 rows=9 loops=3)
Filter: cmenu_active
-> Hash (cost=0.20..0.20 rows=10 width=32) (actual
time=0.002..0.002 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> WorkTable Scan on cte cte_1
(cost=0.00..0.20 rows=10 width=32) (actual time=0.001..0.001 rows=1
loops=3)
-> CTE Scan on cte (cost=0.00..1.02 rows=1 width=64) (actual
time=0.068..0.068 rows=1 loops=1)
Filter: (cmenu_parent IS NULL)
Rows Removed by Filter: 3
Planning time: 0.302 ms
Execution time: 0.105 ms
(18 rows)

Laurenz Albe

laurenz.albe@cybertec.at

almost 8 years ago

In reply to: Bob Jones (#1)

Re: To prefer sorts or filters in postgres, that is the question....

Bob Jones wrote:

I've been playing around with hierarchical queries a bit and one thing
I wanted to do is build a query that gives the ultimate parent for a
given child.

The two queries below seem to be a good a shortlist as any.

I'm no query-plan guru, but these seem to be largely identical aside
from one uses "filter IS NULL" and the other uses "top-N heapsort".

Would there be a reason to prefer one over the other (or perhaps
there's an altogether more efficient way of doing this query ?!?).
My gut-instinct says the sort version ?

The ultimate criterion which one is better is the execution time,
but you probably need more data to tell with certainty.

At a short glance, I'd say that they are pretty much the same.
The filter and the top-1-sort will both require a single scan through
the result set and one operation per row found.
And the recursive queries are pretty similar, right?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Bob Jones

r.a.n.d.o.m.d.e.v.4+postgres@gmail.com

almost 8 years ago

In reply to: Laurenz Albe (#2)

Re: To prefer sorts or filters in postgres, that is the question....

At a short glance, I'd say that they are pretty much the same.
The filter and the top-1-sort will both require a single scan through
the result set and one operation per row found.
And the recursive queries are pretty similar, right?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Thanks Laurenz.

After sending my original message, I did briefly reconsider things.

My current thinking is that the filter is a bit like an "fgrep" and
the sort actually requires memory allocation and some "real work", and
thus I've settled on the filter for now pending experiments with a
larger quantity of data.

Laurenz Albe

laurenz.albe@cybertec.at

almost 8 years ago

In reply to: Bob Jones (#3)

Re: To prefer sorts or filters in postgres, that is the question....

Bob Jones wrote:

My current thinking is that the filter is a bit like an "fgrep" and
the sort actually requires memory allocation and some "real work", and
thus I've settled on the filter for now pending experiments with a
larger quantity of data.

That's fine.

A top-1-sort is less work than you maybe think:
You go through all items and find the biggest one.
So there is one comparison operator per row - very similar to
what happens when "grepping" for NULL values.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com