Bug #4284
I had an email today about an old bug that I reported back in July 2008.
http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php
I didn't receive any response at the time and I didn't really follow it up.
My report contained a full re-creation script to reproduce the problem and
tonight I'm having the same problem with CVS Head. To my untrained eye it
looks like the planner is not properly pushing down the row count.
I know this is a busy time for all, but this seems to be distressing a few
people. At the time I didn't find anyway apart from disabling nested loops.
Would anyone gifted in the art of the query planner be able to look at this?
David.
"David Rowley" <dgrowley@gmail.com> writes:
My report contained a full re-creation script to reproduce the problem and
tonight I'm having the same problem with CVS Head. To my untrained eye it
looks like the planner is not properly pushing down the row count.
It looks more like a multicolumn selectivity issue to me. The planner
is supposing that the join condition
ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid
AND t1.partcode = t2.partcode
is going to eliminate some fair-size fraction of t1 rows, whereas in
fact the construction of t2 is such that it won't eliminate any of them.
This is less obviously true for the join to t4, but I imagine from the
rowcounts that it's also true there. So you get an unreasonably small
rowcount for whichever join gets done first, and then the nestloop plan
looks like a good idea for the second join.
regards, tom lane
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 10 February 2009 22:30
To: David Rowley
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Bug #4284"David Rowley" <dgrowley@gmail.com> writes:
My report contained a full re-creation script to reproduce the problem
and
tonight I'm having the same problem with CVS Head. To my untrained eye
it
looks like the planner is not properly pushing down the row count.
It looks more like a multicolumn selectivity issue to me. The planner
is supposing that the join conditionON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid
AND t1.partcode = t2.partcodeis going to eliminate some fair-size fraction of t1 rows, whereas in
fact the construction of t2 is such that it won't eliminate any of them.
This is less obviously true for the join to t4, but I imagine from the
rowcounts that it's also true there. So you get an unreasonably small
rowcount for whichever join gets done first, and then the nestloop plan
looks like a good idea for the second join.
At work I've been (unwillingly) working with MS SQL Server 2008. I notice
that when I request a query plan it's an "Estimated" query plan. Perhaps
this is to get around problems with bad row estimates. In theory at least it
should be possible to revert back to another plan after the inner nested
queries have processed and before the join takes place... I know that's a
major change, I'm just theorising.
Apart from that, do you think that this could only be fixed with stats that
span multiple columns? I know this was talked about not so long ago.
David.
Tom Lane Wrote:
"David Rowley" <dgrowley@gmail.com> writes:
My report contained a full re-creation script to reproduce the problem
and
tonight I'm having the same problem with CVS Head. To my untrained eye
it
looks like the planner is not properly pushing down the row count.
It looks more like a multicolumn selectivity issue to me. The planner
is supposing that the join conditionON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid
AND t1.partcode = t2.partcodeis going to eliminate some fair-size fraction of t1 rows, whereas in
fact the construction of t2 is such that it won't eliminate any of them.
This is less obviously true for the join to t4, but I imagine from the
rowcounts that it's also true there. So you get an unreasonably small
rowcount for whichever join gets done first, and then the nestloop plan
looks like a good idea for the second join.
I thought about this after sending my reply to this last night. I remembered
when I created my test case I had to add the other tables to get the nest
loop behaviour. I'm not sure your guess about the multicolumn selectivity
issue is correct. I re-tested with the following query.
EXPLAIN ANALYZE SELECT t1.productiondate,
t1.partcode,
t1.batchcode,
t1.bestbefore
FROM batches t1
LEFT OUTER JOIN (SELECT productiondate,
lineid,
partcode,
SUM(quantity) AS quantity
FROM production
GROUP BY productiondate,partcode,lineid
) t4 ON t1.productiondate = t4.productiondate AND t1.lineid = t4.lineid AND
t1.partcode = t4.partcode;
The top line of the explain analyze is:
Merge Left Join (cost=464.51..510.72 rows=4200 width=21) (actual
time=107.872..157.882 rows=4200 loops=1)
Perfect row estimate!
It seems to be something to do with having those other tables in there.
David.
"David Rowley" <dgrowley@gmail.com> writes:
I thought about this after sending my reply to this last night. I remembered
when I created my test case I had to add the other tables to get the nest
loop behaviour. I'm not sure your guess about the multicolumn selectivity
issue is correct. I re-tested with the following query.
...
Perfect row estimate!
The reason for that is that the planner knows that a LEFT JOIN result
can't be smaller than the left input, so whatever join size estimate it
comes up with from statistics will be clamped to be at least that much.
If you change the test case to an inner join you still get the one-row
rowcount estimate, same as for the join to t2.
regards, tom lane