merge semi join cost calculation error

Started by Pavel Stehuleover 7 years ago3 messageshackers

pavel.stehule@gmail.com

over 7 years ago

I try to understand to a issue
https://stackoverflow.com/questions/52685384/subquery-performance-on-simple-case

The user sent a plan:

QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)

He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
Semi Join because subnode has very high cost 8378397.

I cannot to emulate this case on my comp - so it looks like maybe some
build error. What do you think about?

Regards

Pavel

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Pavel Stehule (#1)

Re: merge semi join cost calculation error

Pavel Stehule <pavel.stehule@gmail.com> writes:

The user sent a plan:

QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)

He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
Semi Join because subnode has very high cost 8378397.

The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

over 7 years ago

In reply to: Tom Lane (#2)

Re: merge semi join cost calculation error

po 8. 10. 2018 v 17:00 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

The user sent a plan:

QUERY PLAN
Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
time=0.503..9557.396 rows=721 loops=1)
Merge Cond: (tips.users_id = follows.users_id_to)
-> Index Scan using tips_idx_users_id01 on tips

(cost=0.43..8378397.19

rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
-> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
rows=28 loops=1)
Sort Key: follows.users_id_to
Sort Method: quicksort Memory: 26kB
-> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8)

(actual

time=0.013..0.020 rows=28 loops=1)
Filter: (users_id_from = 1)

He has PostgreSQL 10.5. I cannot to understand to too low total cost of

Merge

Semi Join because subnode has very high cost 8378397.

The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.

good tip - the table follows was too small for autovacuum, and it was
terrible effect. I didn't know about this optimization.

Thank you

Pavel

Show quoted text

regards, tom lane