Selecting dupes from table

Started by Uros Gruberalmost 23 years ago8 messagesgeneral
Jump to latest
#1Uros Gruber
uros@sir-mag.com

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

--
Best regards,
Uros mailto:uros@sir-mag.com

#2Jean-Christian Imbeault
jc@mega-bucks.co.jp
In reply to: Uros Gruber (#1)
Re: Selecting dupes from table

Uros wrote:

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

Try:

select id,url,title from directory where group by id, url, title having
count(url) > 1 order by url;

Think it should work,

Jan-Christian Imbeault

#3Martijn van Oosterhout
kleptog@svana.org
In reply to: Uros Gruber (#1)
Re: Selecting dupes from table

On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote:

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

How about:

Duplicate urls would be given by:

select url from directory group by url having count(*) > 1;

To get all the entries with those urls, something like:

select id,url,title from directory,
(select url from directory group by url having count(*) > 1) as list
where list.url = directory.url;

I hope I got the syntax right.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington

#4Uros Gruber
uros@sir-mag.com
In reply to: Martijn van Oosterhout (#3)
Re: Selecting dupes from table

Hello Martijn,

Tuesday, June 24, 2003, 12:32:53 PM, you wrote:

MvO> On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote:

MvO> How about:

MvO> Duplicate urls would be given by:

MvO> select url from directory group by url having count(*) > 1;

MvO> To get all the entries with those urls, something like:

MvO> select id,url,title from directory,
MvO> (select url from directory group by url having count(*) > 1) as list
MvO> where list.url = directory.url;

MvO> I hope I got the syntax right.

I tried that before but got error:

ERROR: Column reference "url" is ambiguous

--
Best regards,
Uros mailto:uros@sir-mag.com

#5Martijn van Oosterhout
kleptog@svana.org
In reply to: Uros Gruber (#4)
Re: Selecting dupes from table

On Tue, Jun 24, 2003 at 01:12:05PM +0200, Uros wrote:

Hello Martijn,

MvO> select id,url,title from directory,
MvO> (select url from directory group by url having count(*) > 1) as list
MvO> where list.url = directory.url;

MvO> I hope I got the syntax right.

I tried that before but got error:

ERROR: Column reference "url" is ambiguous

Oh right, try:

select id,directory.url,title from directory,
(select url from directory group by url having count(*) > 1) as list
where list.url = directory.url;
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington

#6Hervé Piedvache
herve@elma.fr
In reply to: Uros Gruber (#1)
Re: Selecting dupes from table

Hi,

I think this could be good :

select d1.id, d1.url, d1.tittle
from directory d1
where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>d1.oid)
order by d1.url;

Hope this will help,

Regards,

Le Mardi 24 Juin 2003 12:16, Uros a ᅵcrit :

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

--
Hervᅵ Piedvache

Elma Ingᅵnierie Informatique
6 rue du Faubourg Saint-Honorᅵ
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

#7Uros Gruber
uros@sir-mag.com
In reply to: Hervé Piedvache (#6)
Re: Selecting dupes from table

Hello Hervᅵ,

Thanks a lot for help. Both Martijn van Oosterhout and yours do job god.
Execution time is now about 1 to 2 seconds. I also figured myself what i
was doing wrong with join and error i got.

If enybody need this here is both three solutions.

select distinct d1.id,d1.url,d1.title from directory d1 inner join directory d2 on d2.url = d1.url where d1.id <> d2.id ORDER by d1.url;

select id,directory.url,title from directory, (select url from directory group by url having count(*) > 1) as list where list.url = directory.url;

select d1.id, d1.url, d1.tittle from directory d1 where exists (select url from directory d2 where d1.url=d2.url and d2.oid>d1.oid) order by d1.url;

--
Best regards,
Uros mailto:uros@sir-mag.com

Tuesday, June 24, 2003, 1:27:44 PM, you wrote:

HP> Hi,

HP> I think this could be good :

HP> select d1.id, d1.url, d1.tittle
HP> from directory d1
HP> where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>>d1.oid)
HP> order by d1.url;

HP> Hope this will help,

HP> Regards,

HP> Le Mardi 24 Juin 2003 12:16, Uros a ᅵcrit :

Show quoted text

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

#8Bruno Wolff III
bruno@wolff.to
In reply to: Jean-Christian Imbeault (#2)
Re: Selecting dupes from table

On Tue, Jun 24, 2003 at 19:31:46 +0900,
Jean-Christian Imbeault <jc@mega-bucks.co.jp> wrote:

Try:

select id,url,title from directory where group by id, url, title having
count(url) > 1 order by url;

That won't work because the count is over the triples, rather than just
the url.