merge semi join cost calculation error
Hi
I try to understand to a issue
https://stackoverflow.com/questions/52685384/subquery-performance-on-simple-case
The user sent a plan:
QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)
He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
Semi Join because subnode has very high cost 8378397.
I cannot to emulate this case on my comp - so it looks like maybe some
build error. What do you think about?
Regards
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes:
The user sent a plan:
QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)
He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
Semi Join because subnode has very high cost 8378397.
The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.
regards, tom lane
po 8. 10. 2018 v 17:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
The user sent a plan:
QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips(cost=0.43..8378397.19
rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8)(actual
time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)He has PostgreSQL 10.5. I cannot to understand to too low total cost of
Merge
Semi Join because subnode has very high cost 8378397.
The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.
good tip - the table follows was too small for autovacuum, and it was
terrible effect. I didn't know about this optimization.
Thank you
Pavel
Show quoted text
regards, tom lane