Need help requiring uniqueness in text columns

Started by Matthew Wilsonover 18 years ago5 messagesgeneral
Jump to latest
#1Matthew Wilson
matt@tplus1.com

I have a table MESSAGE with a text column and a timestamp column. I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.

IDEA 1:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);

IDEA 2:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

I am speculating that an index on the md5 is cheaper than on a text
column. I'm willing to risk the chance of a hash collision.

I don't want to use this index to allow searching inside the text
column. I just want to protect against duplication.

Are either of these approaches any good? What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?

TIA

Matt

--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com

#2Stuart Bishop
stuart@stuartbishop.net
In reply to: Matthew Wilson (#1)
Re: Need help requiring uniqueness in text columns

Matthew Wilson wrote:

I have a table MESSAGE with a text column and a timestamp column. I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.

IDEA 1:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);

IDEA 2:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

I am speculating that an index on the md5 is cheaper than on a text
column. I'm willing to risk the chance of a hash collision.

I don't want to use this index to allow searching inside the text
column. I just want to protect against duplication.

Are either of these approaches any good? What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?

Idea 1 will only work if your text columns never grow too long - I can't
recall what the limit is off the top of my head but PG will issue warnings
when you reach it.

Idea 2 works. You can even use it for exact match searching if you like if
you do WHERE my_timestamp_col='...' AND md5(my_text_col)=md5('...')

Depending on your app, you might want to round your timestamps to avoid
getting two identical messages in the system only a millisecond apart.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Matthew Wilson (#1)
Re: Need help requiring uniqueness in text columns

Hello

IDEA 3:

Use two hash functions:

CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));

removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));

Regards
Pavel Stehule

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote:

I have a table MESSAGE with a text column and a timestamp column. I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.

IDEA 1:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);

IDEA 2:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

I am speculating that an index on the md5 is cheaper than on a text
column. I'm willing to risk the chance of a hash collision.

I don't want to use this index to allow searching inside the text
column. I just want to protect against duplication.

Are either of these approaches any good? What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?

TIA

Matt

--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#4Matthew Wilson
matt@tplus1.com
In reply to: Matthew Wilson (#1)
Re: Need help requiring uniqueness in text columns

On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote:

Hello

IDEA 3:

Use two hash functions:

CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));

removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));

Regards
Pavel Stehule

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

What is the advantage of this method?

--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Matthew Wilson (#4)
Re: Need help requiring uniqueness in text columns

On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote:

On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote:

Hello

IDEA 3:

Use two hash functions:

CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));

removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));

Regards
Pavel Stehule

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

What is the advantage of this method?

much less unique index.

Regards
Pavel

Show quoted text

--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/