Best way to store case-insensitive data?
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:
1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER(). I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email. This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.
Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column? Thanks!
Mike
On 6/10/2010 3:50 PM, Mike Christensen wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER(). I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email. This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column? Thanks!Mike
There is citext in contrib, it makes case insensitive text columns.
-Andy
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen <mike@kitchenpc.com> wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER(). I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email. This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column? Thanks!
The contrib module citext provides a case insensitive text type.
Where do I get info on installing this?
Show quoted text
On Thu, Jun 10, 2010 at 2:15 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen <mike@kitchenpc.com> wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER(). I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email. This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column? Thanks!The contrib module citext provides a case insensitive text type.
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen <mike@kitchenpc.com> wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER(). I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email. This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column? Thanks!
Note the other option is to store an index on lower(column)
create index mycaseinsensitiveindex on table ((lower(column));
On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen <mike@kitchenpc.com> wrote:
Where do I get info on installing this?
Very much depends on OS and how you installed pgsql
On 6/10/2010 3:50 PM, Mike Christensen wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER(). I think both are slow,
correct?
Use a functional index and they won't be.
create index email_lower_idx on foo (lower(email));
select * from foo where lower(email) = lower('foo@bar');
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email. This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column? Thanks!Mike
Cheers,
Steve
Right now, I'm running 8.3.4 on Windows 2003, and I just used the MSI
installer to install it.
Show quoted text
On Thu, Jun 10, 2010 at 2:18 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen <mike@kitchenpc.com> wrote:
Where do I get info on installing this?
Very much depends on OS and how you installed pgsql
From this site:
http://developer.postgresql.org/pgdocs/postgres/citext.html
I couldn't tell if you still had to create an index on the lower case
value. It seems that it basically mimics the WHERE LOWER(email) =
LOWER(?) method. Since this part is incredibly performance critical,
maybe I'm better off storing my data all in lowercase and keeping the
DB case sensitive.
Show quoted text
On Thu, Jun 10, 2010 at 2:29 PM, Mike Christensen <mike@kitchenpc.com> wrote:
Right now, I'm running 8.3.4 on Windows 2003, and I just used the MSI
installer to install it.On Thu, Jun 10, 2010 at 2:18 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen <mike@kitchenpc.com> wrote:
Where do I get info on installing this?
Very much depends on OS and how you installed pgsql
On Thu, Jun 10, 2010 at 3:34 PM, Mike Christensen <mike@kitchenpc.com> wrote:
From this site:
http://developer.postgresql.org/pgdocs/postgres/citext.html
I couldn't tell if you still had to create an index on the lower case
value. It seems that it basically mimics the WHERE LOWER(email) =
LOWER(?) method. Since this part is incredibly performance critical,
maybe I'm better off storing my data all in lowercase and keeping the
DB case sensitive.
of course you'd still need an index. whether you store it lower case
in regular text or mixed case in a citext, the db would need an index
for good performance. But you wouldn't have to store a lower() index
for citext, just an index.
BTW, citext it new for 8.4, so it's probably not an option for you if
you're on 8.3
Ah, I should probably upgrade to 8.4. However, I'll probably just
wait for 9.0 to come out. So it seems like citext will be about the
same as casting both sides to LOWER(), plus putting an index on the
lowercase version of the text. I'd probably use that if it were out
of the box, but I'm trying to stay away from adding too many
dependencies.. I think I'll stick with my original approach of only
storing lowercase data in the DB, and perhaps put a CHECK constraint
on there to ensure no upper case letters sneak in.
Mike
Show quoted text
On Thu, Jun 10, 2010 at 2:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Jun 10, 2010 at 3:34 PM, Mike Christensen <mike@kitchenpc.com> wrote:
From this site:
http://developer.postgresql.org/pgdocs/postgres/citext.html
I couldn't tell if you still had to create an index on the lower case
value. It seems that it basically mimics the WHERE LOWER(email) =
LOWER(?) method. Since this part is incredibly performance critical,
maybe I'm better off storing my data all in lowercase and keeping the
DB case sensitive.of course you'd still need an index. whether you store it lower case
in regular text or mixed case in a citext, the db would need an index
for good performance. But you wouldn't have to store a lower() index
for citext, just an index.BTW, citext it new for 8.4, so it's probably not an option for you if
you're on 8.3
Heyho!
On Thursday 10 June 2010 22.50:23 Mike Christensen wrote:
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email.
I'd do it this way. Plus either a CHECK condition on the table (email =
lowercase(email)) (this will reliably catch all cases, but you will
experience failures until you have found all cases)
Or a BEFORE trigger that converts email to lowercase. (This is mostly
transparent for storing, but I usually try to avoid triggers that modify
data like this. But that's probably just me.)
In either case, obviously you'll still need to change the code that is used
for retrieving and comparing email addresses.
cheers
-- vbi
--
featured link: http://www.pool.ntp.org
Yup, I actually ended up doing this with this constraint:
ALTER TABLE Users ADD CONSTRAINT check_email CHECK (email ~ E'^[^A-Z]+$');
However, I like your version better so I'll use that instead :)
Mike
On Thu, Jun 10, 2010 at 11:48 PM, Adrian von Bidder
<avbidder@fortytwo.ch> wrote:
Show quoted text
Heyho!
On Thursday 10 June 2010 22.50:23 Mike Christensen wrote:
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email.I'd do it this way. Plus either a CHECK condition on the table (email =
lowercase(email)) (this will reliably catch all cases, but you will
experience failures until you have found all cases)Or a BEFORE trigger that converts email to lowercase. (This is mostly
transparent for storing, but I usually try to avoid triggers that modify
data like this. But that's probably just me.)In either case, obviously you'll still need to change the code that is used
for retrieving and comparing email addresses.cheers
-- vbi--
featured link: http://www.pool.ntp.org
On Thu, 10 Jun 2010 13:50:23 -0700, Mike Christensen wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:
NB: technically the local part in an email address can be case sensitive.
As RFC 5321 says:
The local-part of a mailbox MUST BE treated as case sensitive.
Therefore, SMTP implementations MUST take care to preserve the case
of mailbox local-parts. In particular, for some hosts, the user
"smith" is different from the user "Smith". However, exploiting the
case sensitivity of mailbox local-parts impedes interoperability and
is discouraged. Mailbox domains follow normal DNS rules and are
hence not case sensitive.
In practice I've yet to see a system having both smith and Smith
and them being different, but still it is theoretically posible.
--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.
Yea this is a valid point. It's very possible my design won't work
for the long term, and at some point I'll have to store the email name
exactly as it was entered, and allow the lookup logic to be case
insensitive with a lowercase index. However, I think the way I have
it now should not break any known email server heh.
Mike
2010/6/11 Michal Politowski <mpol@charybda.icm.edu.pl>:
Show quoted text
On Thu, 10 Jun 2010 13:50:23 -0700, Mike Christensen wrote:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly. The problem is, emails are
case-insensitive. I want foo@bar.com to be able to login with
FOO@Bar.com as well. There's two ways of doing this, that I can see:NB: technically the local part in an email address can be case sensitive.
As RFC 5321 says:
The local-part of a mailbox MUST BE treated as case sensitive.
Therefore, SMTP implementations MUST take care to preserve the case
of mailbox local-parts. In particular, for some hosts, the user
"smith" is different from the user "Smith". However, exploiting the
case sensitivity of mailbox local-parts impedes interoperability and
is discouraged. Mailbox domains follow normal DNS rules and are
hence not case sensitive.In practice I've yet to see a system having both smith and Smith
and them being different, but still it is theoretically posible.--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Friday 11 June 2010 09.27:15 Michal Politowski wrote:
[email address local part is case sensitive]
In practice I've yet to see a system having both smith and Smith
and them being different, but still it is theoretically posible.
I routinely modify email addresses I store to my addressbook to all
lowercase. I have yet to have a single case where this gets me into
problems; I think it's probably quite exotic to find a system that actually
is case sensitive.
cheers
-- vbi
--
featured product: ClamAV Antivirus - http://www.clamav.net/
2010/6/11 Mike Christensen <mike@kitchenpc.com>:
Yea this is a valid point. It's very possible my design won't work
for the long term, and at some point I'll have to store the email name
exactly as it was entered, and allow the lookup logic to be case
insensitive with a lowercase index. However, I think the way I have
it now should not break any known email server heh.
Instead of mangling data when you store it, mangle it later when you
retrieve it. with a functional index on the column, you get the
comparison data stored in an index, ready to go.
Performance test the index:
create test_index on table (lower(fieldname));
versus storing the emails in lower case.
n Sat, Jun 12, 2010 at 3:21 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Performance test the index:
create test_index on table (lower(fieldname));
versus storing the emails in lower case.
Some quick testing on that tells me that storing in lower case will be
about twice as fast at retrieval. But we're talking things like 1ms
versus 2ms.
Please don't top-post.
Mike Christensen wrote:
Ah, I should probably upgrade to 8.4. However, I'll probably just
wait for 9.0 to come out. So it seems like citext will be about the
same as casting both sides to LOWER(), plus putting an index on the
lowercase version of the text. I'd probably use that if it were out
of the box, but I'm trying to stay away from adding too many
dependencies.. I think I'll stick with my original approach of only
storing lowercase data in the DB, and perhaps put a CHECK constraint
on there to ensure no upper case letters sneak in.
If your db contains international text there are some corner cases where
lower( upper( val )) != val or upper( lower( val )) != val. Or there should
be, because that's what happens in certain languages.
For example, upper-case 'ß' should be 'SS' in German. Lower-case 'SS' is 'ss'.
--
Lew