is a 'pairwise' possible / feasible in SQL?

Started by Rajarshi Guhaover 17 years ago8 messagesgeneral

rguha@indiana.edu

over 17 years ago

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, I have a table of the form

aid cid
- ---- -----
1 123
2 456
3 667
3 879
3 123
4 878
4 456
4 123
5 999
5 667
5 879

My goal is to identify for each pair of cid values, the number of
times they have the same aid

Thus for example I would have

pair count
- ---- -----
123 & 456 1
667 & 879 2
...

I currently do this by using a Python script to do a pairwise lookup, as

select count(aid) where cid = 123 and cid = 456;

but I was wondering whether I could construct a single SQL statement
to do this.

Any pointers would be appreciated,

Thanks,
- -------------------------------------------------------------------
Rajarshi Guha <rguha@indiana.edu>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
- -------------------------------------------------------------------
All great discoveries are made by mistake.
-- Young

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkiXRYUACgkQZqGSLFHnnoTJJQCgtvromGcYfQVGsekGFQJU6vTo
oHgAnjpfKSkZR0MqBjdE6WFGO8SBr2WH
=zZJk
-----END PGP SIGNATURE-----

david.t.wilson@gmail.com

over 17 years ago

In reply to: Rajarshi Guha (#1)

Re: is a 'pairwise' possible / feasible in SQL?

On Mon, Aug 4, 2008 at 2:08 PM, Rajarshi Guha <rguha@indiana.edu> wrote:

select count(aid) where cid = 123 and cid = 456;

but I was wondering whether I could construct a single SQL statement to do
this.

Any pointers would be appreciated,

Typed into gmail, so may need some tweaking, but something to the effect of:

select count(*) from table a inner join table b on a.aid=b.bid group
by a.cid,b.cid;

should do the trick, I'd think...

--
- David T. Wilson
david.t.wilson@gmail.com

scott.marlowe@gmail.com

over 17 years ago

In reply to: David Wilson (#2)

Re: is a 'pairwise' possible / feasible in SQL?

On Mon, Aug 4, 2008 at 1:02 PM, David Wilson <david.t.wilson@gmail.com> wrote:

On Mon, Aug 4, 2008 at 2:08 PM, Rajarshi Guha <rguha@indiana.edu> wrote:

select count(aid) where cid = 123 and cid = 456;

but I was wondering whether I could construct a single SQL statement to do
this.

Any pointers would be appreciated,

Typed into gmail, so may need some tweaking, but something to the effect of:

select count(*) from table a inner join table b on a.aid=b.bid group
by a.cid,b.cid;

should do the trick, I'd think...

But then you need remove the dups where you got:

667 999 2
999 667 2

Francisco Reyes

lists@stringsutils.com

over 17 years ago

In reply to: Rajarshi Guha (#1)

Re: is a 'pairwise' possible / feasible in SQL?

On 2:08 pm 08/04/08 Rajarshi Guha <rguha@indiana.edu> wrote:

pair count
- ---- -----
123 & 456 1
667 & 879 2

create temp table aic_cid
(
id smallint,
cid smallint

);

insert into aic_cid values (1,123);
insert into aic_cid values (2,456);
insert into aic_cid values (3,667);
insert into aic_cid values (3,879);
insert into aic_cid values (3,123);
insert into aic_cid values (4,878);
insert into aic_cid values (4,456);
insert into aic_cid values (4,123);
insert into aic_cid values (5,999);
insert into aic_cid values (5,667);
insert into aic_cid values (5,879);

select a.cid as ac, b.cid as bc, count(*) from aic_cid a left outer join
aic_cid b on a.cid <>b.cid and a.id = b.id where b.cid is not null group by
a.cid, b.cid order by a.cid;
ac | bc | count
-----+-----+-------
123 | 456 | 1
123 | 667 | 1
123 | 878 | 1
123 | 879 | 1
456 | 123 | 1
456 | 878 | 1
667 | 123 | 1
667 | 879 | 2
667 | 999 | 1
878 | 123 | 1
878 | 456 | 1
879 | 123 | 1
879 | 667 | 2
879 | 999 | 1
999 | 667 | 1
999 | 879 | 1

Is that what you are looking for?

rguha@indiana.edu

over 17 years ago

In reply to: Francisco Reyes (#4)

Re: is a 'pairwise' possible / feasible in SQL?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 4, 2008, at 4:55 PM, Francisco Reyes wrote:

On 2:08 pm 08/04/08 Rajarshi Guha <rguha@indiana.edu> wrote:

pair count
- ---- -----
123 & 456 1
667 & 879 2

<snip>

select a.cid as ac, b.cid as bc, count(*) from aic_cid a left
outer join
aic_cid b on a.cid <>b.cid and a.id = b.id where b.cid is not null
group by
a.cid, b.cid order by a.cid;
ac | bc | count
-----+-----+-------
123 | 456 | 1
123 | 667 | 1
123 | 878 | 1
123 | 879 | 1
456 | 123 | 1
456 | 878 | 1
667 | 123 | 1
667 | 879 | 2
667 | 999 | 1
878 | 123 | 1
878 | 456 | 1
879 | 123 | 1
879 | 667 | 2
879 | 999 | 1
999 | 667 | 1
999 | 879 | 1

Is that what you are looking for?

Thanks a lot - this is very close. Ideally, I'd want unique pairs, so
the row

879 | 999 | 1

is the same as

999 | 879 | 1

Can these duplicates be avoided?

- -------------------------------------------------------------------
Rajarshi Guha <rguha@indiana.edu>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
- -------------------------------------------------------------------
How I wish I were what I was when I wished I were what I am.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkiXbe8ACgkQZqGSLFHnnoRXPACeMcPqXG4QIf308ufnAHev9hlG
EEoAoLzU5tmL1ipiUIp69N9mOvnsfrES
=JOg1
-----END PGP SIGNATURE-----

Rainer.Pruy@Acrys.COM

over 17 years ago

In reply to: Rajarshi Guha (#5)

Re: is a 'pairwise' possible / feasible in SQL?

Rajarshi Guha wrote

On Aug 4, 2008, at 4:55 PM, Francisco Reyes wrote:

On 2:08 pm 08/04/08 Rajarshi Guha <rguha@indiana.edu> wrote:

pair count
- ---- -----
123 & 456 1
667 & 879 2

<snip>

select a.cid as ac, b.cid as bc, count(*) from aic_cid a left outer join
aic_cid b on a.cid <>b.cid and a.id = b.id where b.cid is not null
group by
a.cid, b.cid order by a.cid;
ac | bc | count
-----+-----+-------
123 | 456 | 1
123 | 667 | 1
123 | 878 | 1
123 | 879 | 1
456 | 123 | 1
456 | 878 | 1
667 | 123 | 1
667 | 879 | 2
667 | 999 | 1
878 | 123 | 1
878 | 456 | 1
879 | 123 | 1
879 | 667 | 2
879 | 999 | 1
999 | 667 | 1
999 | 879 | 1

Is that what you are looking for?

Thanks a lot - this is very close. Ideally, I'd want unique pairs, so
the row

879 | 999 | 1

is the same as

999 | 879 | 1

Can these duplicates be avoided?

Depends on values and other distinguishing attributes....
For the given example - assuming pairing of a given cid with itself is not to be expected:
add a "and a.cid < b.cid" to the query....

Rainer

Show quoted text

-------------------------------------------------------------------
Rajarshi Guha <rguha@indiana.edu>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
How I wish I were what I was when I wished I were what I am.

gnari@hive.is

over 17 years ago

In reply to: Rajarshi Guha (#5)

Re: is a 'pairwise' possible / feasible in SQL?

On mï¿½n, 2008-08-04 at 17:00 -0400, Rajarshi Guha wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 4, 2008, at 4:55 PM, Francisco Reyes wrote:

On 2:08 pm 08/04/08 Rajarshi Guha <rguha@indiana.edu> wrote:

pair count
- ---- -----
123 & 456 1
667 & 879 2

<snip>

select a.cid as ac, b.cid as bc, count(*) from aic_cid a left
outer join
aic_cid b on a.cid <>b.cid and a.id = b.id where b.cid is not null
group by
a.cid, b.cid order by a.cid;
ac | bc | count
-----+-----+-------
123 | 456 | 1
123 | 667 | 1
...
Is that what you are looking for?

Thanks a lot - this is very close. Ideally, I'd want unique pairs, so
the row

879 | 999 | 1

is the same as

999 | 879 | 1

Can these duplicates be avoided?

just add a ac<bc condition:
select a.cid as ac, b.cid as bc, count(*)
from aic_cid a left outer join aic_cid b
on a.cid <> b.cid and a.id = b.id
where b.cid is not null AND a.cid < b.cid
group by a.cid, b.cid
order by a.cid;

gnari

sam@samason.me.uk

over 17 years ago

In reply to: Rajarshi Guha (#5)

Re: is a 'pairwise' possible / feasible in SQL?

On Mon, Aug 04, 2008 at 05:00:31PM -0400, Rajarshi Guha wrote:

On Aug 4, 2008, at 4:55 PM, Francisco Reyes wrote:

select a.cid as ac, b.cid as bc, count(*) from aic_cid a left
outer join
aic_cid b on a.cid <>b.cid and a.id = b.id where b.cid is not null
group by
a.cid, b.cid order by a.cid;

Is that what you are looking for?

Thanks a lot - this is very close. Ideally, I'd want unique pairs

You just need to change the "a.cid <> b.cid" equality to something
non-symmetric, i.e. "a.cid < b.cid". I'm also not sure why an outer
join is being used. I've rewritten it to:

SELECT a.cid AS ac, b.cid AS bc, count(*)
FROM aic_cid a, aic_cid b
WHERE a.id = b.id AND a.cid < b.cid
GROUP BY a.cid, b.cid
ORDER BY a.cid, b.cid;

and seem to get similar results.

Sam