Case Sensitivity
Hi,
I'm moving over from MS SQL. I've been googling this for ages now and suprisingly cannot find a clear answer.
I want my data in tables to be case insensitive.
This is so i can:
1. Put keys on natural key fields, for instance a product part number. I product number 'ABC123' inserted i need to disallow 'abc123' to be inserted as a second row. Please don't tell me i have to add another column holding a lowered version of the product number. My database is littered with this need and i would end up bloating my table schema.
2. I need to query case insensitively. SELECT * FROM product WHERE product_number = 'ABC123' should return the same row as SELECT * FROM product WHERE product_number = 'abc123'
Is there a database wide collation setting i can make, or a case insensitive character type. There are lots of online posts regarding using LOWER function for querying. This is a workaround for point (2) but does not remedy point (1) above.
Many thanks
Phillip
Phillip Smith
Hello
you should to use a citext datatype
http://www.postgresql.org/docs/9.0/interactive/citext.html
regards
Pavel Stehule
2011/1/13 Phillip Smith <phillip@softworks.com>:
Show quoted text
Hi,
I'm moving over from MS SQL. I've been googling this for ages now and
suprisingly cannot find a clear answer.I want my data in tables to be case insensitive.
This is so i can:
1. Put keys on natural key fields, for instance a product part number. I
product number 'ABC123' inserted i need to disallow 'abc123' to be inserted
as a second row. Please don't tell me i have to add another column holding a
lowered version of the product number. My database is littered with this
need and i would end up bloating my table schema.2. I need to query case insensitively. SELECT * FROM product WHERE
product_number = 'ABC123' should return the same row as SELECT * FROM
product WHERE product_number = 'abc123'Is there a database wide collation setting i can make, or a case
insensitive character type. There are lots of online posts regarding using
LOWER function for querying. This is a workaround for point (2) but does not
remedy point (1) above.Many thanks
Phillip
Phillip Smith
Thanks Pavel,
Is this the only way?. I would prefer to use a collation setting if this is possible. Do you know whether there would be a performance hit using non standard character strings?
Phillip
you should to use a citext datatype
Or:
- use a functional index with lower() to ensure uniqueness
- use a BEFORE trigger to lower() the inserted data
Karsten
--
GMX DSL Doppel-Flat ab 19,99 Euro/mtl.! Jetzt mit
gratis Handy-Flat! http://portal.gmx.net/de/go/dsl
Hi Karsten,
modifying user inputted data is not an option for me. I need to maintain the original data. I had read about indexing on a lower function. This all seems a bit of a bodgy workaround to me. A clean Case Insensitive collation setting is cleaner. Is this supported in postgreSQL 9?
Regards
Phillip
Phillip
2011/1/13 Phillip Smith <phillip@softworks.com>:
Thanks Pavel,
Is this the only way?. I would prefer to use a collation setting if this is
possible. Do you know whether there would be a performance hit using non
standard character strings?
It is preferred way. PostgreSQL doesn't support a collations, and what
I know collation in 9.1 will not allow a comparation between case
sensitive and case insensitive text, because collation in PostgreSQL
is based on unix locale, and there are not case insensitive locale.
I don't know about performance problems. It does same work like case
insensitive collation.
Regards
Pavel Stehule
Show quoted text
Phillip
On Thu, Jan 13, 2011 at 03:29:03PM -0000, Phillip Smith wrote:
modifying user inputted data is not an option for me. I need to
maintain the original data. I had read about indexing on a lower
function. This all seems a bit of a bodgy workaround to me. A clean
Case Insensitive collation setting is cleaner. Is this supported in
postgreSQL 9?
There isn't such a collation setting as far as I know, and it's
actually not as clean as you think. It turns out to work reasonably
well in ASCII-land, but not very well in other circumstances. In
general, the case-insensitive but case-preserving technique that
English-speaking computer users have come to find normal is a bad fit
for a wide variety of languages. (Even "Latin" characterset-using
languages have trouble, because of historic ways of handling accents.
Is the capital version of é E or É? Well, both, it turns out,
depending on whom you believe. I could bore you about the effects of
this in the DNS all day long, but I'll resist the temptation.)
Putting an index on lower(column) and then doing all your comparisons
with lower(datum) works fairly well, and I don't see how it's any more
bodgy than a database-wide case insensitive collation. For instance,
I can assure you that customers named Leblanc and LeBlanc care about
whether those two compare equally. In your customer name field, if
you have a database-wide collation setting, you can't make the
distinction.
A
--
Andrew Sullivan
ajs@crankycanuck.ca