Nested Loop Left Join always shows rows=1
Query is:
"explain analyze select sum(A), count(*) from tab1 left outer join
tab_outer on id=out_id where id=10;"
output:
Aggregate (cost=31.91..31.92 rows=1 width=4) (actual
time=14.185..14.185 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..17.11 rows=2959 width=4)
(actual time=8.608..13.400 rows=2953 loops=1)
-> Index Scan using id_idx on tab1 (cost=0.00..8.27 rows=1
width=4) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: (id = 10)
-> Index Scan using out_id_idx on tab_outer (cost=0.00..8.83
rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1)
Index Cond: ((tab_outer.out_id = 10) AND (tab1.id =
tab_outer.out_id))
http://explain-analyze.info/query_plans/2519-query-plan-1240
I ran ANALYZE on both tables, but the plan shows rows=1 in the inner
table results.
If I change the "left outer join" into an "inner join" the bitmap index
scan of the inner table of the nested loop shows a correct number of
rows (not the exact same thing as "actual rows", but very close).
Am I reading the explain output in the wrong way?
I'm not complaining about performance issues, I'm just trying to learn
how to read plans.
Scara Maccai <m_lists@yahoo.it> writes:
Aggregate (cost=31.91..31.92 rows=1 width=4) (actual
time=14.185..14.185 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..17.11 rows=2959 width=4)
(actual time=8.608..13.400 rows=2953 loops=1)
-> Index Scan using id_idx on tab1 (cost=0.00..8.27 rows=1
width=4) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: (id = 10)
-> Index Scan using out_id_idx on tab_outer (cost=0.00..8.83
rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1)
Index Cond: ((tab_outer.out_id = 10) AND (tab1.id =
tab_outer.out_id))
This seems a bit broken :-( ... ideally it shouldn't be generating the
redundant index condition, either. I think the redundant condition is
probably indirectly responsible for the low rowcount estimate, too.
It doesn't really hurt anything, since the join output estimate is
correct, but it'd be nice to make it look better.
regards, tom lane
Tom Lane wrote:
Scara Maccai <m_lists@yahoo.it> writes:
-> Index Scan using id_idx on tab1 (cost=0.00..8.27 rows=1
width=4) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: (id = 10)
-> Index Scan using out_id_idx on tab_outer (cost=0.00..8.83
rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1)
Index Cond: ((tab_outer.out_id = 10) AND (tab1.id =
tab_outer.out_id))This seems a bit broken :-( ... ideally it shouldn't be generating the
redundant index condition, either.
Why is the index condition redundant? I guess the redundant part is
"AND (tab1.id = tab_outer.out_id)" but I'm not sure, and I would really appreciate if you could explain a little bit more...
The reason I'm asking is because I'm looking at the implementation of some VERY basic form of progress indicator based on "estimated rows vs rows already processed".
I guess this would work for our DB because the planner is usually very good at guessing the number of rows for each step of the plan, since our distributions are pretty simple.
But this left outer join thing I'm afraid would invalidate the method, since the inner table row count is always off...
Thank you for the reply.
(I added hackers too, since this seems a place where some
fixing/development would be necessary. I hope it's ok.)
Scara Maccai <m_lists@yahoo.it> writes:
I ran ANALYZE on both tables, but the plan shows rows=1 in the inner
table results.
If I change the "left outer join" into an "inner join" the bitmap index
scan of the inner table of the nested loop shows a correct number of
rows (not the exact same thing as "actual rows", but very close).
Yeah, this is a bug: it's effectively double-counting the selectivity of
the index clause. The error isn't enormously critical, since the join
size estimate is right; but it could perhaps lead to choosing a plain
indexscan when a bitmap scan would be better. I've applied a patch.
regards, tom lane