Need help requiring uniqueness in text columns
I have a table MESSAGE with a text column and a timestamp column. I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.
IDEA 1:
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);
IDEA 2:
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
I am speculating that an index on the md5 is cheaper than on a text
column. I'm willing to risk the chance of a hash collision.
I don't want to use this index to allow searching inside the text
column. I just want to protect against duplication.
Are either of these approaches any good? What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?
TIA
Matt
--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com
Matthew Wilson wrote:
I have a table MESSAGE with a text column and a timestamp column. I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.IDEA 1:
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);
IDEA 2:
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
I am speculating that an index on the md5 is cheaper than on a text
column. I'm willing to risk the chance of a hash collision.I don't want to use this index to allow searching inside the text
column. I just want to protect against duplication.Are either of these approaches any good? What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?
Idea 1 will only work if your text columns never grow too long - I can't
recall what the limit is off the top of my head but PG will issue warnings
when you reach it.
Idea 2 works. You can even use it for exact match searching if you like if
you do WHERE my_timestamp_col='...' AND md5(my_text_col)=md5('...')
Depending on your app, you might want to round your timestamps to avoid
getting two identical messages in the system only a millisecond apart.
--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
Hello
IDEA 3:
Use two hash functions:
CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));
removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));
Regards
Pavel Stehule
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote:
I have a table MESSAGE with a text column and a timestamp column. I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.IDEA 1:
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);
IDEA 2:
CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
I am speculating that an index on the md5 is cheaper than on a text
column. I'm willing to risk the chance of a hash collision.I don't want to use this index to allow searching inside the text
column. I just want to protect against duplication.Are either of these approaches any good? What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?TIA
Matt
--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote:
Hello
IDEA 3:
Use two hash functions:
CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));
removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));Regards
Pavel StehuleCREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
What is the advantage of this method?
--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com
On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote:
On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote:
Hello
IDEA 3:
Use two hash functions:
CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));
removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));Regards
Pavel StehuleCREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
What is the advantage of this method?
much less unique index.
Regards
Pavel
Show quoted text
--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?