Estimated rows question
I was wondering why it seems that the query planner can't "see", based
on the histograms, that two join-columns have a very small
intersection, and adjust its row estimation accordingly. Clearly the
below query returns 1001 rows. It appears as if much or all of the
necessary machinery exists in mergejoinscansel, and indeed if you
inspect
leftstartsel, leftendsel, rightstartsel, rightendsel during execution
they are respectively 0.98, 1.00, 0.00, 0.020, which I believe makes sense.
Am I missing something obvious?
Thanks
Sam
create table table_a as select * from generate_series(1,61000) as pkey;
create table table_b as select * from generate_series(60000,110000) as pkey;
create unique index idx_a on table_a(pkey);
create unique index idx_b on table_b(pkey);
analyse table_a;
analyse table_b;
explain select * from table_a a inner join table_b b on a.pkey = b.pkey;
QUERY PLAN
-----------------------------------------------------------------------------------------
Merge Join (cost=1984.88..2550.42 rows=50001 width=8)
Merge Cond: (a.pkey = b.pkey)
-> Index Only Scan using idx_a on table_a a (cost=0.00..1864.32
rows=61000 width=4)
-> Index Only Scan using idx_b on table_b b (cost=0.00..1531.32
rows=50001 width=4)
[ sorry for slow response, but I'd not gotten time to think about this... ]
Sam Ross <elliptic@gmail.com> writes:
I was wondering why it seems that the query planner can't "see", based
on the histograms, that two join-columns have a very small
intersection, and adjust its row estimation accordingly.
The reason why not is that eqjoinsel() doesn't take any such
consideration into account. It's possible that it'd be a good idea
to teach it to do so. I'm not entirely convinced though. It would
add a fair amount of expense to that function, as well as adding
some possibly shaky assumptions, and I'm not sure how often we'd
get a usefully-better estimate in practice. OTOH, there are a lot
of shaky assumptions in eqjoinsel() already, and we did decide this
was worth worrying about in mergejoin cost estimation.
Do you want to try it and submit a patch for testing?
regards, tom lane
On Sat, Sep 1, 2012 at 8:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ sorry for slow response, but I'd not gotten time to think about this... ]
Sam Ross <elliptic@gmail.com> writes:
I was wondering why it seems that the query planner can't "see", based
on the histograms, that two join-columns have a very small
intersection, and adjust its row estimation accordingly.The reason why not is that eqjoinsel() doesn't take any such
consideration into account. It's possible that it'd be a good idea
to teach it to do so. I'm not entirely convinced though. It would
add a fair amount of expense to that function, as well as adding
some possibly shaky assumptions, and I'm not sure how often we'd
get a usefully-better estimate in practice. OTOH, there are a lot
of shaky assumptions in eqjoinsel() already, and we did decide this
was worth worrying about in mergejoin cost estimation.Do you want to try it and submit a patch for testing?
regards, tom lane
Thanks for the answer, and sorry for the slow reply -
I'm not sure I have the necessary expertise, but I'll be happy to give
it a shot. Is there an already-assembled library of queries that is
used to test purported improvements to the planner, or is it expected
that I come up with a convincing test-set myself?
Sam
Sam Ross <elliptic@gmail.com> writes:
On Sat, Sep 1, 2012 at 8:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Do you want to try it and submit a patch for testing?
Thanks for the answer, and sorry for the slow reply -
I'm not sure I have the necessary expertise, but I'll be happy to give
it a shot. Is there an already-assembled library of queries that is
used to test purported improvements to the planner, or is it expected
that I come up with a convincing test-set myself?
No, we don't really have much in that line :-(. There's the regression
tests of course, but they're mostly about functionality not quality of
statistical estimates. In practice, as long as a proposed change made
the estimates demonstrably better in some reasonable scenarios, and
didn't slow it down very much, that would probably be enough.
regards, tom lane
On Wed, Sep 12, 2012 at 07:19:06PM -0400, Tom Lane wrote:
Sam Ross <elliptic@gmail.com> writes:
On Sat, Sep 1, 2012 at 8:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Do you want to try it and submit a patch for testing?
Thanks for the answer, and sorry for the slow reply - I'm not sure
I have the necessary expertise, but I'll be happy to give it a
shot. Is there an already-assembled library of queries that is
used to test purported improvements to the planner, or is it
expected that I come up with a convincing test-set myself?No, we don't really have much in that line :-(.
Does anyone? If so, who?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate