How to remove duplicate lines but save one of the lines?

Started by A Bover 17 years ago15 messagesgeneral

gentosaker@gmail.com

over 17 years ago

I have a table with rows like this
A 1
A 1
B 3
B 3
C 44
C 44
and so on.

and I want it to be
A 1
B 3
C 44

so how can I remove the all the duplicate lines but one?

Raymond O'Donnell

rod@iol.ie

over 17 years ago

In reply to: A B (#1)

Re: How to remove duplicate lines but save one of the lines?

On 21/07/2008 16:33, A B wrote:

I have a table with rows like this
A 1
A 1
B 3
B 3
C 44
C 44
and so on.

and I want it to be
A 1
B 3
C 44

so how can I remove the all the duplicate lines but one?

You could copy them into a new table, like so:

CREATE TABLE newtable AS
SELECT DISTINCT * FROM oldtable;

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

sramirez@vonage.com

over 17 years ago

In reply to: A B (#1)

Re: How to remove duplicate lines but save one of the lines?

There is probably a more elegant way of doing it, but a simple way of
doing it ( depending on the size of the table ) could be:

begin;

insert into foo select distinct * from orig_table;
delete from orig_table;
insert into orig_table select * from foo;

commit;

-Said

A B wrote:

Show quoted text

I have a table with rows like this
A 1
A 1
B 3
B 3
C 44
C 44
and so on.

and I want it to be
A 1
B 3
C 44

so how can I remove the all the duplicate lines but one?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Francisco Reyes

lists@stringsutils.com

over 17 years ago

In reply to: A B (#1)

Re: How to remove duplicate lines but save one of the lines?

On 11:33 am 07/21/08 "A B" <gentosaker@gmail.com> wrote:

and I want it to be
A 1
B 3
C 44

The slow way
select distinct field1, field2 from sometable.

The faster way
select field1,fields2 from sometable group by field1, field2.

Distinct should in theory be the same speed, but on several tests I have
done group by was faster. I posted a message to the list and there were
some explanations why group by was faster.. Hopefully someday they should
perform just as efficiently.

gentosaker@gmail.com

over 17 years ago

In reply to: Said Ramirez (#3)

Re: How to remove duplicate lines but save one of the lines?

There is probably a more elegant way of doing it, but a simple way of doing
it ( depending on the size of the table ) could be:

begin;

insert into foo select distinct * from orig_table;
delete from orig_table;
insert into orig_table select * from foo;

commit;

Just to make it clear to me
Here foo is a table that I have to create with the command
CREATE TABLE foo (....same columns as orig_table);
?

Is it possible to add a unique constraint to the table, with a
"delete" option so it will delete duplicates?

sramirez@vonage.com

over 17 years ago

In reply to: A B (#5)

Re: How to remove duplicate lines but save one of the lines?

Yes, here foo is a temp table. As others have pointed out, you could
probably do a create table foo as select distinct * from orig_table. I
would move the data back to orig_table, so that constraints and
privileges are maintainited. After you have done this, you can put a
uniq constraint on columns A & B. I am uncertain if you can do
something like ALTER TABLE orig_table ADD UNIQUE (A,B) ON DUPLICATE DELETE.
-Said

A B wrote:

There is probably a more elegant way of doing it, but a simple way

of doing

it ( depending on the size of the table ) could be:

begin;

insert into foo select distinct * from orig_table;
delete from orig_table;
insert into orig_table select * from foo;

commit;

Just to make it clear to me
Here foo is a table that I have to create with the command
CREATE TABLE foo (....same columns as orig_table);
?

Is it possible to add a unique constraint to the table, with a
"delete" option so it will delete duplicates?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Said Ramirez
Super Cool MySQL DBA
cel: 732 425 1929

Hoover, Jeffrey

jhoover@jcvi.org

over 17 years ago

In reply to: A B (#5)

Re: How to remove duplicate lines but save one of the lines?

minor refinement on suggestion:

-- CTAS (create table as) is easiest way to create table with same
structure
create table foo as select * from orig_table;
-- truncate is much more efficient than delete
truncate orig_table;
-- unchanged
insert into orig_table select * from foo;
-- recompute statistics
analyze orig_table

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of A B
Sent: Monday, July 21, 2008 11:51 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] How to remove duplicate lines but save one of the
lines?

There is probably a more elegant way of doing it, but a simple way of

doing

it ( depending on the size of the table ) could be:

begin;

insert into foo select distinct * from orig_table;
delete from orig_table;
insert into orig_table select * from foo;

commit;

Just to make it clear to me
Here foo is a table that I have to create with the command
CREATE TABLE foo (....same columns as orig_table);
?

Is it possible to add a unique constraint to the table, with a
"delete" option so it will delete duplicates?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

scott.marlowe@gmail.com

over 17 years ago

In reply to: A B (#5)

Re: How to remove duplicate lines but save one of the lines?

On Mon, Jul 21, 2008 at 9:51 AM, A B <gentosaker@gmail.com> wrote:

There is probably a more elegant way of doing it, but a simple way of doing
it ( depending on the size of the table ) could be:

begin;

insert into foo select distinct * from orig_table;
delete from orig_table;
insert into orig_table select * from foo;

commit;

Just to make it clear to me
Here foo is a table that I have to create with the command
CREATE TABLE foo (....same columns as orig_table);
?

If this is a live table with that you can't use that method on, you
can use this generic methodology to get rid of dups.

-- Create test table
smarlowe=# create table main (i int, t text);
CREATE TABLE
smarlowe=# insert into main values (1,'A');
INSERT 0 1
smarlowe=# insert into main values (1,'A');
INSERT 0 1
smarlowe=# insert into main values (3,'B');
INSERT 0 1
smarlowe=# insert into main values (3,'B');
INSERT 0 1
smarlowe=# insert into main values (44,'C');
INSERT 0 1
smarlowe=# insert into main values (44,'C');
INSERT 0 1
smarlowe=# select * from main;
i | t
----+---
1 | A
1 | A
3 | B
3 | B
44 | C
44 | C
(6 rows)

Add a new field for an int, set it to a sequence of numbers:

smarlowe=# alter table main add uniq int;
ALTER TABLE
smarlowe=# create sequence t
smarlowe-# ;
CREATE SEQUENCE
smarlowe=# update main set uniq=nextval('t');
UPDATE 6
smarlowe=# select * from main;
i | t | uniq
----+---+------
1 | A | 1
1 | A | 2
3 | B | 3
3 | B | 4
44 | C | 5
44 | C | 6
(6 rows)

This query will give us a list of "extra" ids:

smarlowe=# select distinct m1.uniq from main m1 join main m2 on
(m1.t=m2.t and m1.i=m2.i) where m1.uniq > m2.uniq;
uniq
------
2
4
6
(3 rows)

We use that query as a subselect to a delete:

smarlowe=# begin;
BEGIN
smarlowe=# delete from main where uniq in (select m1.uniq from main m1
join main m2 on (m1.t=m2.t and m1.i=m2.i) where m1.uniq > m2.uniq);
DELETE 3
smarlowe=# select * from main;
i | t | uniq
----+---+------
1 | A | 1
3 | B | 3
44 | C | 5
(3 rows)
smarlowe=# commit;
COMMIT

Is it possible to add a unique constraint to the table, with a
"delete" option so it will delete duplicates?

It is possible to add a unique constraint. Having it "delete" rows
automagically is not normal operation, but I'm sure some kind of user
defined trigger could be written to do that. But if you've got a
unique constraint on a unique set of data, new non-unique entries will
fail to enter.

smarlowe=# create unique index main_t_i on main (t,i);
CREATE INDEX
smarlowe=# insert into main (i,t) values (1,'A');
ERROR: duplicate key violates unique constraint "main_t_i"

btober@ct.metrocast.net

over 17 years ago

In reply to: A B (#1)

Re: How to remove duplicate lines but save one of the lines?

A B wrote:

I have a table with rows like this
A 1
A 1
B 3
B 3
C 44
C 44
and so on.

and I want it to be
A 1
B 3
C 44

so how can I remove the all the duplicate lines but one?

CREATE TEMP TABLE tmp AS SELECT DISTINCT * FROM t1;
DROP TABLE t1;
CREATE TABLE t1 AS SELECT * FROM tmp;

teemu.juntunen@e-ngine.fi

over 17 years ago

In reply to: A B (#1)

Is it possible to do some damage to database with SELECT query?

Hi,

is it possible to make a SELECT query with some nasty follow up commands,
which damages the database.

Something like:

SELECT *,(DROP DATABASE enterprise) AS roger FROM sales WHERE sales >
(UPDATE order SET order=1);

I know this wont work, but is there some possibility to modify database with
SELECT query?

I'm developing an ERP where I would like to implement a statistical program
where you can write your own SELECT queries.

Best Regards,
Teemu Juntunen

andreas.kretschmer@schollglas.com

over 17 years ago

In reply to: Teemu Juntunen (#10)

Re: Is it possible to do some damage to database with SELECT query?

am Tue, dem 22.07.2008, um 12:50:31 +0300 mailte Teemu Juntunen folgendes:

Hi,

First, don't hijack other threads!

is it possible to make a SELECT query with some nasty follow up commands,
which damages the database.

Something like:

SELECT *,(DROP DATABASE enterprise) AS roger FROM sales WHERE sales >
(UPDATE order SET order=1);

I know this wont work, but is there some possibility to modify database
with SELECT query?

Sure, with sql-injection. There are a lot to read via google, for
instance http://en.wikipedia.org/wiki/SQL_injection

HTH, Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

wmoran@collaborativefusion.com

over 17 years ago

In reply to: Teemu Juntunen (#10)

Re: Is it possible to do some damage to database with SELECT query?

In response to "Teemu Juntunen" <teemu.juntunen@e-ngine.fi>:

Hi,

is it possible to make a SELECT query with some nasty follow up commands,
which damages the database.

Something like:

SELECT *,(DROP DATABASE enterprise) AS roger FROM sales WHERE sales >
(UPDATE order SET order=1);

I know this wont work, but is there some possibility to modify database with
SELECT query?

I'm developing an ERP where I would like to implement a statistical program
where you can write your own SELECT queries.

Yes, it's easy to do with stored procedures. i.e.:
SELECT drop_table('important_table') FROM some_other_table;

This is managed with database permissions. Ensure that the user your
dynamic query engine is connecting as does not have permissions to
drop tables, or even delete rows from the tables (see the GRANT docs).
By doing so, you ensure that even if someone can create a dangerous
procedure and execute it via query, that they can't do any damage.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Karsten Hilbert

Karsten.Hilbert@gmx.net

over 17 years ago

In reply to: A. Kretschmer (#11)

Re: Is it possible to do some damage to database with SELECT query?

On Tue, Jul 22, 2008 at 12:20:46PM +0200, A. Kretschmer wrote:

is it possible to make a SELECT query with some nasty follow up commands,
which damages the database.

Something like:

SELECT *,(DROP DATABASE enterprise) AS roger FROM sales WHERE sales >
(UPDATE order SET order=1);

I know this wont work, but is there some possibility to modify database
with SELECT query?

Sure, with sql-injection. There are a lot to read via google, for
instance http://en.wikipedia.org/wiki/SQL_injection

That's why on important databases you'd configure them

"set default_transaction_read_only to on"

and only reverse that connect by connect when a writable
connection is truly needed. That way injectors will not only
have to hijack *any* connection but pick the right one, too.

It also nicely keeps average users from destroying their
data with admin tools like pgadmin etc.

And then there's role based per-table permissions, of course.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

Julio Cesar Sánchez González

knowhow@sistemasyconectividad.com.mx

over 17 years ago

In reply to: A B (#1)

Re: How to remove duplicate lines but save one of the lines?

A B wrote:

I have a table with rows like this
A 1
A 1
B 3
B 3
C 44
C 44
and so on.

and I want it to be
A 1
B 3
C 44

so how can I remove the all the duplicate lines but one?

Try with:

your table structure for example: create table yourtable(campo1 char,
num integer);

select * from yourtable;

sicodelico=# select * from yourtable ;
campo1 | num
--------+-----
A | 1
A | 1
B | 3
B | 3
C | 44
C | 44
(6 filas)

sicodelico=#

1) create temp sequence foo_id_seq start with 1;

2) alter table yourtable add column id integer;

3) update yourtable set id = nextval('foo_id_seq');

look this:

sicodelico=# select * from yourtable ;
campo1 | num | id
--------+-----+----
A | 1 | 1
A | 1 | 2
B | 3 | 3
B | 3 | 4
C | 44 | 5
C | 44 | 6
(6 filas)

4) delete from yourtable where campo1 in (select y.campo1 from yourtable
y where yourtable.id > y.id);

sicodelico=# select * from yourtable;
campo1 | num | id
--------+-----+----
A | 1 | 1
B | 3 | 3
C | 44 | 5
(3 filas)

5) alter table yourtable drop column id;

sicodelico=# select * from yourtable;
campo1 | num
--------+-----
A | 1
B | 3
C | 44
(3 filas)

have a lot of fun :)

--
Regards,

Julio Cesar Sï¿½nchez Gonzï¿½lez.

--
Ahora me he convertido en la muerte, destructora de mundos.
Soy la Muerte que se lleva todo, la fuente de las cosas que vendran.

www.sistemasyconectividad.com.mx http://darkavngr.blogspot.com/

kedarr@netcore.co.in

over 17 years ago

In reply to: Julio Cesar Sánchez González (#14)

Re: How to remove duplicate lines but save one of the lines?

Julio Cesar Sï¿½nchez Gonzï¿½lez wrote:

A B wrote:

I have a table with rows like this
A 1
A 1
B 3
B 3
C 44
C 44
and so on.

and I want it to be
A 1
B 3
C 44

so how can I remove the all the duplicate lines but one?

You think this would help?
create table temp(text varchar(20),id integer );
INSERT INTO temp values('A',10);
INSERT INTO temp values('A',10);
INSERT INTO temp values('B',20);
INSERT INTO temp values('B',20);
INSERT INTO temp values('B',20);

select * from temp;
text | id
------+----
A | 10
A | 10
B | 20
B | 20
B | 20

select text,id, count(1) from temp group by 1,2;

text | id | count
------+----+-------
A | 10 | 2
B | 20 | 3

and forget about the count from the result set.

--

Thanks & Regards

Kedar Parikh
Netcore Solutions Pvt. Ltd.

Tel: +91 (22) 6662 8135
Mob: +91 9819634734
Email: kedar@netcore.co.in
Web: www.netcore.co.in

===================================================================
sms START NEWS <your city> to 09845398453 for Breaking News and Top
Stories on Business, Sports & Politics. For more services visit
http://www.mytodaysms.com
===================================================================