Why is this a cross join?
I have a seemingly simple query that returns way too many records.
Here is the count query
select count(*) from (
select crm.*
from sap_crm_sales crm
inner join model_configurations mc on left(crm.customer_class, 6)
= left(mc.sap_code,6)
) as s
The crm table has about 44K records, the model table has about 1K the
join produces about 9 million records
It doesn't matter if I make it a left join or inner join the count
ends up the same.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Feb 17, 2013, at 9:16, Tim Uckun <timuckun@gmail.com> wrote:
I have a seemingly simple query that returns way too many records.
Here is the count queryselect count(*) from (
select crm.*
from sap_crm_sales crm
inner join model_configurations mc on left(crm.customer_class, 6)
= left(mc.sap_code,6)
) as sThe crm table has about 44K records, the model table has about 1K the
join produces about 9 million records
Apparently the first 6 characters of those fields are quite common, which gives you a result for every possible combination of the same 6-character value.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Apparently the first 6 characters of those fields are quite common, which
gives you a result for every possible combination of the same 6-character
value.
Mmmmm. That seems kind of weird. Is there any way to NOT have this
be a cross join? For example if I extracted the first six characters
into a field and then joined on them it would not be a cross join
right?
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 18/02/2013, at 7:58 AM, Tim Uckun <timuckun@gmail.com> wrote:
Apparently the first 6 characters of those fields are quite common, which
gives you a result for every possible combination of the same 6-character
value.Mmmmm. That seems kind of weird. Is there any way to NOT have this
be a cross join? For example if I extracted the first six characters
into a field and then joined on them it would not be a cross join
right?
In some way, every join is a cross join, with the results filtered according to the specificity of the join conditions. In this case:
inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6)
"customer_class" sounds like a fairly generic sort of field, so you'd expect many matches. Truncating the fields is likely to make this even less specific, returning more results.
Cheers,
Tony
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
In some way, every join is a cross join, with the results filtered according to the specificity of the join conditions. In this case:
inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6)
"customer_class" sounds like a fairly generic sort of field, so you'd expect many matches. Truncating the fields is likely to make this even less specific, returning more results.
I guess I am not explaining it properly..
Say I created new columns on both tables called "first_6" and
populated them with the substrings. If I did a inner join or a left
join on those fields would I still get a cross join?
inner join model_configurations mc on mc.first_6 = crm.first_6
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 02/17/2013 02:09 PM, Tim Uckun wrote:
In some way, every join is a cross join, with the results filtered according to the specificity of the join conditions. In this case:
inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6)
"customer_class" sounds like a fairly generic sort of field, so you'd expect many matches. Truncating the fields is likely to make this even less specific, returning more results.
I guess I am not explaining it properly..
Say I created new columns on both tables called "first_6" and
populated them with the substrings. If I did a inner join or a left
join on those fields would I still get a cross join?inner join model_configurations mc on mc.first_6 = crm.first_6
http://www.postgresql.org/docs/9.2/interactive/sql-select.html
...CROSS JOIN and INNER JOIN produce a simple Cartesian product, the
same result as you get from listing the two items at the top level of
FROM, but restricted by the join condition (if any). CROSS JOIN is
equivalent to INNER JOIN ON (TRUE), that is, no rows are removed by
qualification. These join types are just a notational convenience, since
they do nothing you couldn't do with plain FROM and WHERE...
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 18/02/2013, at 9:09 AM, Tim Uckun <timuckun@gmail.com> wrote:
In some way, every join is a cross join, with the results filtered according to the specificity of the join conditions. In this case:
inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6)
"customer_class" sounds like a fairly generic sort of field, so you'd expect many matches. Truncating the fields is likely to make this even less specific, returning more results.
I guess I am not explaining it properly..
Say I created new columns on both tables called "first_6" and
populated them with the substrings. If I did a inner join or a left
join on those fields would I still get a cross join?
No, it would be no different if you created new columns - the join condition is the same.
You're not actually getting a cross join, that would be many more records than 9million - you're just not setting a specific enough filter.
Cheers,
Tony
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 2/17/2013 2:09 PM, Tim Uckun wrote:
Say I created new columns on both tables called "first_6" and
populated them with the substrings. If I did a inner join or a left
join on those fields would I still get a cross join?inner join model_configurations mc on mc.first_6 = crm.first_6
without additional filtering, that will return a row for every row in
crm that matches a row in mc. if there's a 1000 rows in crm that each
match an average of 6 rows in mc, that would return 6000 rows.
crm left join mc would return all of those, plus all the rows in crm
that don't have any match in mc (the mc.* fields would be NULL in these
result rows).
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tim Uckun <timuckun@gmail.com> wrote:
I guess I am not explaining it properly..
Say I created new columns on both tables called "first_6" and
populated them with the substrings. If I did a inner join or a
left join on those fields would I still get a cross join?inner join model_configurations mc on mc.first_6 = crm.first_6
As others have said, that would not change the results -- you are
getting exactly what you are requesting, but apparently not what
you want. Can you explain what it is that you do want? If you
have ten rows in one table with the same "first six" value, and 20
rows in the other table with that same "first six" value, how do
you want them to be matched up? Do you want one row in the result
for every row in one of the tables? If so, how do you want to
determine which of the matching rows in the other table to choose,
and which to ignore?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
for every row in one of the tables? If so, how do you want to
determine which of the matching rows in the other table to choose,
and which to ignore?
In this case it's a simple lookup. Any of the matching rows will do
but I just want one.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Tim Uckun <timuckun@gmail.com> wrote:
how do you want to determine which of the matching rows in the
other table to choose, and which to ignore?Any of the matching rows will do but I just want one.
Well, if you don't care, this might give you what you want.
select distinct on (x.first6) crm.*, x.*
from from sap_crm_sales crm
join (select left(mc.sap_code,6) as first6, mc.*
from model_configurations mc) x
on x.first6 = left(crm.customer_class, 6)
order by x.first6;
I can think of about a dozen different things which fit the
description that you've given so far that you *might* want instead,
but unless you get more specific we can only guess.
If this isn't what you want, your best bet might might be to come
up with a self-contained example, where you create an populate
tables with a small amount of data, and show what you would like as
a result given that data set.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general