COLLATE

Started by Filip Rembiałkowskialmost 20 years ago9 messagesgeneral
Jump to latest
#1Filip Rembiałkowski
plk.zuber@gmail.com

Hi all,

let's assume that we keep Unicode text data in the column.
sometimes we want to sort it according to specific collation order.
how can we force collation when running a query?
ideal solution would be having SQL92 standard COLLATE clauses in
SELECT statements. I heard it's work in progress, what's the status?

are there any "partial" solutions to the problem?

regards,

Filip Rembiałkowski

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Filip Rembiałkowski (#1)
Re: COLLATE

On Mon, Jun 05, 2006 at 05:04:25PM +0200, Filip Rembia??kowski wrote:

Hi all,

let's assume that we keep Unicode text data in the column.
sometimes we want to sort it according to specific collation order.
how can we force collation when running a query?
ideal solution would be having SQL92 standard COLLATE clauses in
SELECT statements. I heard it's work in progress, what's the status?

Yeah, I was working on it but got stuck on the planner/optimiser
changes. In the mean time the tree drifted and lack of interest, which
gets us where we are now...

are there any "partial" solutions to the problem?

Not that I know of.

Have a ncie day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#3Bruce Momjian
bruce@momjian.us
In reply to: Filip Rembiałkowski (#1)
Re: COLLATE

"Filip Rembia�kowski" <plk.zuber@gmail.com> writes:

let's assume that we keep Unicode text data in the column.
sometimes we want to sort it according to specific collation order.
how can we force collation when running a query?
ideal solution would be having SQL92 standard COLLATE clauses in
SELECT statements. I heard it's work in progress, what's the status?

are there any "partial" solutions to the problem?

I don't know the status of the full COLLATE support.

But the closest partial solution suggested so far is the pg_xfrm function that
has been implemented and posted at least three times by three different
posters to the postgres mailing lists. In the interest of avoiding a fourth
independent implementation I'll attach the one I use below, it's not big.

Attachments:

pg_strxfrm.ctext/x-csrcDownload
#4Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Martijn van Oosterhout (#2)
Re: COLLATE

On 6/5/06, Martijn van Oosterhout <kleptog@svana.org> wrote:

Yeah, I was working on it but got stuck on the planner/optimiser
changes. In the mean time the tree drifted and lack of interest, which
gets us where we are now...

Very bad news :-(
I were looking forward to this feature... So many troubles in my
projects would be vanished...

What is 'lack of interest'? Interest from community, or major
developers, or your personal one?

#5Filip Rembiałkowski
plk.zuber@gmail.com
In reply to: Bruce Momjian (#3)
Re: COLLATE

(group dupe)

05 Jun 2006 12:53:57 -0400, Greg Stark <gsstark@mit.edu>:

But the closest partial solution suggested so far is the pg_xfrm function that
has been implemented and posted at least three times by three different
posters to the postgres mailing lists. In the interest of avoiding a fourth
independent implementation I'll attach the one I use below, it's not big.

Thanks! It may be the only solution available at the moment.

But I have no idea how to compile/install it.
I imagine there should be separate /contrib/pg_strxfrm directory, with
the makefile etc.
is it correct? if so, how should the makefile look like?

I tried compiling but it failed:

pgdba@sulaco:~/src/postgresql-8.1.3/src$ gcc -I ./include -fPIC -c pg_strxfrm.c
pg_strxfrm.c: In function 'pg_strxfrm':
pg_strxfrm.c:98: error: 'Warn_restart' undeclared (first use in this function)
pg_strxfrm.c:98: error: (Each undeclared identifier is reported only once
pg_strxfrm.c:98: error: for each function it appears in.)

#6Martijn van Oosterhout
kleptog@svana.org
In reply to: Nikolay Samokhvalov (#4)
Re: COLLATE

On Wed, Jun 07, 2006 at 12:40:49AM +0400, Nikolay Samokhvalov wrote:

I were looking forward to this feature... So many troubles in my
projects would be vanished...

What is 'lack of interest'? Interest from community, or major
developers, or your personal one?

Kind of all three, feeding off eachother. There's just not enough
interest from any front to really get it moving. It's a fairly invasive
change and without significant support and interest from somewhere,
chances of completion let alone acceptance are pretty slim...

Have a nice day
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#7Tomi NA
hefest@gmail.com
In reply to: Martijn van Oosterhout (#6)
Re: COLLATE

On 6/7/06, Martijn van Oosterhout <kleptog@svana.org> wrote:

Kind of all three, feeding off eachother. There's just not enough
interest from any front to really get it moving. It's a fairly invasive
change and without significant support and interest from somewhere,
chances of completion let alone acceptance are pretty slim...

I don't get it. Maybe it's me or my environment.
About half of the work we do in my company is building webs on the
(excellent) eZ publish CMS. We routinely use MySQL even though the CMS
(at least declaratively) wolks on top of pgsql. We use MySQL on
inertia: it's the default db under eZ publish and it's easy to get
hosting.
On the other hand, I'd rather bite my hand off than use MySQL in some
other Internet apps (e-shop with a card payment interface and others).
Where am I going with all this?
Well, if you don't live in an english-speaking country, there's no
such thing as a single language web. Because of collation limitations,
however, postgresql would be the first to be crossed out on my list no
matter how good it is in all other respects.
I understand that the needed change is uncomfortably invasive, but not
beeing able to collate correctly is a show-stopping problem on a
professional site. The user opens a drop-down with several dozens of
cities, scrolls down a bit to where his city should be, dooesn't see
it because the collator places the weird letter all the way down at
the end of the list - and the user walks away. The other user can't
get info about the bus lines to the city or he doesn't buy a product
he want's because he doesn't see it where it's supposed to be. Another
user notices the error and dissregards the site as amateurish.
I understand I'm talking about a specific area of use, but isn't that
where the biggest growth in both the number of new applications as
well as the number of users is?
As it is, I'm happy to have a great RDBMS to build intranet, rich
client apps on - but that's about it.
I've used MySQL and MSSQL (unfortunately) and they both support
collation much better than pgsql. Not perfect, but much much better.
Maybe postgresql should try to set less ambitious goals and instead of
going for the holy grail of collation management (which is usualy
praiseworthy) try to provide at least db-level collation definitions
if table/row/cell level collation settings should proove too
challenging at the moment.

t.n.a.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#6)
Re: COLLATE

Martijn van Oosterhout <kleptog@svana.org> writes:

On Wed, Jun 07, 2006 at 12:40:49AM +0400, Nikolay Samokhvalov wrote:

What is 'lack of interest'? Interest from community, or major
developers, or your personal one?

Kind of all three, feeding off eachother. There's just not enough
interest from any front to really get it moving. It's a fairly invasive
change and without significant support and interest from somewhere,
chances of completion let alone acceptance are pretty slim...

I think there was also considerable concern about introducing a
dependency on a very large chunk of outside software (viz, ICU).
Loss of control, licensing questions, etc.

Of course, doing it *without* ICU is even more daunting :-(
but I'd like to think we'll get there someday.

regards, tom lane

#9Jan Behrens
jan.behrens@flexiguided.de
In reply to: Filip Rembiałkowski (#1)
Re: COLLATE

Filip Rembiałkowski wrote:

let's assume that we keep Unicode text data in the column.
sometimes we want to sort it according to specific collation order.
how can we force collation when running a query?

Hi Filip,

I had the same problem you have. As a solution I implemented a wrapper
function named collkey(), which transforms a Unicode string into a
sortable collation key using IBM's ICU library. Of course it's not as
nice as having a standard SQL command for it, but i think it's a nice
solution, until a good collation support is provided by PostgreSQL itself.

You can find the source here:
http://www.flexiguided.de/publications.pgcollkey.en.html

I can understand that people don't want PostgreSQL being dependent of
other libraries. The best solution for future would be implementing
independent collation functions inside of PostgreSQL, which also support
standard SQL syntax, but that's a lot of work, i think.

Jan Behrens