Selecting dupes from table

Started by Uros Gruberalmost 23 years ago8 messagesgeneral

Jump to latest

Uros Gruber

uros@sir-mag.com

almost 23 years ago

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

--
Best regards,
Uros mailto:uros@sir-mag.com

Jean-Christian Imbeault

jc@mega-bucks.co.jp

almost 23 years ago

In reply to: Uros Gruber (#1)

Re: Selecting dupes from table

Uros wrote:

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

Try:

select id,url,title from directory where group by id, url, title having
count(url) > 1 order by url;

Think it should work,

Jan-Christian Imbeault

Martijn van Oosterhout

kleptog@svana.org

almost 23 years ago

In reply to: Uros Gruber (#1)

Re: Selecting dupes from table

On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote:

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

How about:

Duplicate urls would be given by:

select url from directory group by url having count(*) > 1;

To get all the entries with those urls, something like:

select id,url,title from directory,
(select url from directory group by url having count(*) > 1) as list
where list.url = directory.url;

I hope I got the syntax right.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington

Uros Gruber

uros@sir-mag.com

almost 23 years ago

In reply to: Martijn van Oosterhout (#3)

Re: Selecting dupes from table

Hello Martijn,

Tuesday, June 24, 2003, 12:32:53 PM, you wrote:

MvO> On Tue, Jun 24, 2003 at 12:16:43PM +0200, Uros wrote:

MvO> How about:

MvO> Duplicate urls would be given by:

MvO> select url from directory group by url having count(*) > 1;

MvO> To get all the entries with those urls, something like:

MvO> select id,url,title from directory,
MvO> (select url from directory group by url having count(*) > 1) as list
MvO> where list.url = directory.url;

MvO> I hope I got the syntax right.

I tried that before but got error:

ERROR: Column reference "url" is ambiguous

--
Best regards,
Uros mailto:uros@sir-mag.com

Martijn van Oosterhout

kleptog@svana.org

almost 23 years ago

In reply to: Uros Gruber (#4)

Re: Selecting dupes from table

On Tue, Jun 24, 2003 at 01:12:05PM +0200, Uros wrote:

Hello Martijn,

MvO> select id,url,title from directory,
MvO> (select url from directory group by url having count(*) > 1) as list
MvO> where list.url = directory.url;

MvO> I hope I got the syntax right.

I tried that before but got error:

ERROR: Column reference "url" is ambiguous

Oh right, try:

select id,directory.url,title from directory,
(select url from directory group by url having count(*) > 1) as list
where list.url = directory.url;
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington

Hervé Piedvache

herve@elma.fr

almost 23 years ago

In reply to: Uros Gruber (#1)

Re: Selecting dupes from table

Hi,

I think this could be good :

select d1.id, d1.url, d1.tittle
from directory d1
where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>d1.oid)
order by d1.url;

Hope this will help,

Regards,

Le Mardi 24 Juin 2003 12:16, Uros a ï¿œcrit :

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

--
Hervï¿œ Piedvache

Elma Ingï¿œnierie Informatique
6 rue du Faubourg Saint-Honorï¿œ
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

Uros Gruber

uros@sir-mag.com

almost 23 years ago

In reply to: Hervé Piedvache (#6)

Re: Selecting dupes from table

Hello Hervï¿œ,

Thanks a lot for help. Both Martijn van Oosterhout and yours do job god.
Execution time is now about 1 to 2 seconds. I also figured myself what i
was doing wrong with join and error i got.

If enybody need this here is both three solutions.

select distinct d1.id,d1.url,d1.title from directory d1 inner join directory d2 on d2.url = d1.url where d1.id <> d2.id ORDER by d1.url;

select id,directory.url,title from directory, (select url from directory group by url having count(*) > 1) as list where list.url = directory.url;

select d1.id, d1.url, d1.tittle from directory d1 where exists (select url from directory d2 where d1.url=d2.url and d2.oid>d1.oid) order by d1.url;

--
Best regards,
Uros mailto:uros@sir-mag.com

Tuesday, June 24, 2003, 1:27:44 PM, you wrote:

HP> Hi,

HP> I think this could be good :

HP> select d1.id, d1.url, d1.tittle
HP> from directory d1
HP> where exists (select url from directory d2 where d1.url=d2.url and
d2.oid>>d1.oid)
HP> order by d1.url;

HP> Hope this will help,

HP> Regards,

HP> Le Mardi 24 Juin 2003 12:16, Uros a ï¿œcrit :

Show quoted text

Hello ,

I have table directory with 3 columns (id,url,title)

I want to list all entries with duplicate urls.

I tried this:

select id,url,title from directory where url IN
(select url from directory group by url having count(url) > 1)
ORDER by url;

but this takes 30 seconds with 25.000 entries. I have index on url.

Can I use any other query to select this faster.

Bruno Wolff III

bruno@wolff.to

almost 23 years ago

In reply to: Jean-Christian Imbeault (#2)

Re: Selecting dupes from table

On Tue, Jun 24, 2003 at 19:31:46 +0900,
Jean-Christian Imbeault <jc@mega-bucks.co.jp> wrote:

Try:

select id,url,title from directory where group by id, url, title having
count(url) > 1 order by url;

That won't work because the count is over the triples, rather than just
the url.