Problem with PostgreSQL string sorting

Started by Hari Sankar Aabout 9 years ago7 messagesbugs
Jump to latest
#1Hari Sankar A
hsshanthamhari@gmail.com

Hello All,

I am a Database administrator and we are using PostgreSQL-9.6.1 version in
RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE
and LC_CTYPE is en_US.UTF-8.

Lets say table person has field name with values likes

name
------
Abc
abc
.dcb
Dcb
$sdf
EDF

en_US.UTF-8 collate sorting this field as:

name
------
abc
Abc
.dcb
Dcb
EDF
$sdf

It ignore's Special characters.

"C" collate sorting this field as:
name
------
$sdf
.dcb
Abc
Dcb
EDF
abc

It does case sensitive sort.

My expected sorting should be: (Case insensitive and special character)

name
------
$sdf
.dcb
Abc
abc
Dcb
EDF

I can use lower() and collate 'C' to get this result. But I need a default
collate to support this.

Is there any collation that support both (Case insensitive and special
character) sorting in utf8?

your help would be really appreciated.

Thanks,
Hari

In reply to: Hari Sankar A (#1)
Re: Problem with PostgreSQL string sorting

On Wed, Feb 15, 2017 at 7:44 AM, Hari Sankar A <hsshanthamhari@gmail.com> wrote:

I am a Database administrator and we are using PostgreSQL-9.6.1 version in
RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE
and LC_CTYPE is en_US.UTF-8.

The behavior of each collation comes from the operating system's own
libc, except for the C collation, which is based on the ordering
implied by strcmp() comparisons. Generally, most implementations have
the behavior you describe, in that they assign least weight of all to
caseness and whitespace, and somewhat more weight to punctuation. I
don't think that there is much that can be done about it in practice,
though in principal there could be a collation that has all the
properties you want.

Future versions of PostgreSQL may have better support for this kind of thing.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#2)
Re: Problem with PostgreSQL string sorting

Peter Geoghegan <pg@bowt.ie> writes:

The behavior of each collation comes from the operating system's own
libc, except for the C collation, which is based on the ordering
implied by strcmp() comparisons. Generally, most implementations have
the behavior you describe, in that they assign least weight of all to
caseness and whitespace, and somewhat more weight to punctuation. I
don't think that there is much that can be done about it in practice,
though in principal there could be a collation that has all the
properties you want.

I believe most versions of libc use declarative specifications of locale
properties, so if you wanted to spend some time learning the language in
which those are written, you could make and install a locale that acts
like you want. It's possible there's one out there already, because
a lot of people seem to want that.

In any case, this is not a bug, this is a feature request; and it's not
even for a Postgres feature, it's for a libc feature.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#3)
Re: Problem with PostgreSQL string sorting

On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe most versions of libc use declarative specifications of locale
properties, so if you wanted to spend some time learning the language in
which those are written, you could make and install a locale that acts
like you want. It's possible there's one out there already, because
a lot of people seem to want that.

True. And, the declarative language glibc uses doesn't look that
complicated to me. However, you'd really have to vendor your own glibc
to make this work, since they are compiled to some binary
representation when glibc is built. That's probably prohibitively
complicated to manage.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Hari Sankar A
hsshanthamhari@gmail.com
In reply to: Peter Geoghegan (#4)
Re: Problem with PostgreSQL string sorting

Thank you all for your valuable comments.

I will work on glibc to get this feature.

Appreciate your help.

Thanks,
Hari
On Wed, Feb 15, 2017 at 12:14 PM Peter Geoghegan <pg@bowt.ie> wrote:

Show quoted text

On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe most versions of libc use declarative specifications of locale
properties, so if you wanted to spend some time learning the language in
which those are written, you could make and install a locale that acts
like you want. It's possible there's one out there already, because
a lot of people seem to want that.

True. And, the declarative language glibc uses doesn't look that
complicated to me. However, you'd really have to vendor your own glibc
to make this work, since they are compiled to some binary
representation when glibc is built. That's probably prohibitively
complicated to manage.

--
Peter Geoghegan

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#4)
Re: Problem with PostgreSQL string sorting

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe most versions of libc use declarative specifications of locale
properties, so if you wanted to spend some time learning the language in
which those are written, you could make and install a locale that acts
like you want. It's possible there's one out there already, because
a lot of people seem to want that.

True. And, the declarative language glibc uses doesn't look that
complicated to me. However, you'd really have to vendor your own glibc
to make this work, since they are compiled to some binary
representation when glibc is built. That's probably prohibitively
complicated to manage.

Really? "man localedef" certainly gives the impression that you can
just compile some more locales if you want. Admittedly, your system's
package manager might have a hissy fit about nonstandard files
suddenly appearing in /usr/lib/locale, but that's not glibc's fault.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#6)
Re: Problem with PostgreSQL string sorting

On Wed, Feb 15, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Really? "man localedef" certainly gives the impression that you can
just compile some more locales if you want. Admittedly, your system's
package manager might have a hissy fit about nonstandard files
suddenly appearing in /usr/lib/locale, but that's not glibc's fault.

I agree, but the apparent lack of any standard, high-level guidance on
creating a custom locale would leave me hesitant to take that course
of action.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs