PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Started by Adam PAPAIabout 15 years ago10 messagesgeneral
Jump to latest
#1Adam PAPAI
adam.papai@bsdsupportservice.hu

I have a PgSQL node on a FreeBSD 8.1-RELEASE.

(PostgreSQL) 9.0.3 client/server.

pkg_info | grep 'postgresql'
postgresql-client-9.0.3
postgresql-server-9.0.3

I have a lots of UTF8 database with hu_HU.UTF8 COLLATE and LC_CTYPE

For example:

test | pgsql | UTF8 | hu_HU.UTF-8 | hu_HU.UTF-8

The problem is with the "ORDER BY", which does not sort the results
corretly.

Table "public.erdekessegek"
Column | Type | Modifiers
--------+------------------------+-----------
lead | character varying(255) |

test=# select * from erdekessegek order by lead asc;
lead
-------
Adi
Bedi
Evi
Zsolt
�di
�vi
(6 rows)

which is not correct.

It should be:

Adi
�di
Bedi
Evi
�vi
Zsolt

The "chars" wich accents always at the end of the order by but I don't
know why.

What can I do to make it work? I've read all docs on this topic and all
sais the LC_COLLATE and LC_CTYPE solves it, but it seems not.

Any advice would be helpful.

Thanks in advance,

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Adam PAPAI (#1)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Hello

2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:

I have a PgSQL node on a FreeBSD 8.1-RELEASE.

(PostgreSQL) 9.0.3 client/server.

pkg_info | grep 'postgresql'
postgresql-client-9.0.3
postgresql-server-9.0.3

I have a lots of UTF8 database with hu_HU.UTF8 COLLATE and LC_CTYPE

For example:

test       | pgsql | UTF8      | hu_HU.UTF-8 | hu_HU.UTF-8

The problem is with the "ORDER BY", which does not sort the results
corretly.

       Table "public.erdekessegek"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 lead   | character varying(255) |

test=# select * from erdekessegek  order by lead asc;
 lead
-------
 Adi
 Bedi
 Evi
 Zsolt
 Ádi
 Évi
(6 rows)

which is not correct.

It should be:

Adi
Ádi
Bedi
Evi
Évi
Zsolt

The "chars" wich accents always at the end of the order by but I don't
know why.

What can I do to make it work? I've read all docs on this topic and all
sais the LC_COLLATE and LC_CTYPE solves it, but it seems not.

You should to initialize database with correct locale. You can't to
change locale after database is created.

/usr/local/pgsql91/bin/createdb test --locale=cs_CZ.utf-8 -e UTF-8
CREATE DATABASE test LC_COLLATE 'cs_CZ.utf-8' LC_CTYPE 'cs_CZ.utf-8';
COMMENT ON DATABASE test IS 'UTF-8';

test=# show lc_collate ;
lc_collate
-------------
cs_CZ.utf-8
(1 row)

test=# select * from (values('Sís'),('Šiška'),('Syrový'),('Vondra')) x
ORDER BY 1;
column1
---------
Sís
Syrový
Šiška
Vondra
(4 rows)

Regards

Pavel Stehule

Show quoted text

Any advice would be helpful.

Thanks in advance,

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adam PAPAI
adam.papai@bsdsupportservice.hu
In reply to: Pavel Stehule (#2)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Pavel Stehule wrote:

Hello

You should to initialize database with correct locale. You can't to
change locale after database is created.

/usr/local/pgsql91/bin/createdb test --locale=cs_CZ.utf-8 -e UTF-8
CREATE DATABASE test LC_COLLATE 'cs_CZ.utf-8' LC_CTYPE 'cs_CZ.utf-8';
COMMENT ON DATABASE test IS 'UTF-8';

test=# show lc_collate ;
lc_collate
-------------
cs_CZ.utf-8
(1 row)

test=# select * from (values('Sís'),('Šiška'),('Syrový'),('Vondra')) x
ORDER BY 1;
column1
---------
Sís
Syrový
Šiška
Vondra
(4 rows)

Regards

Pavel Stehule

Something is not correct for me.

I've re-initialised it again with:

--locale=hu_HU.UTF-8 --encoding=UTF-8

which generated everything LC_* to hu_HU.UTF-8.

I've created the DBs with createdb:

createdb -U pgsql "$DBNAME" -E "UTF-8" --locale="hu_HU.UTF-8"

Example:

createdb -U pgsql spiritnet -E UTF-8 --locale=hu_HU.UTF-8

spiritnet=# show lc_collate;
lc_collate
-------------
hu_HU.UTF-8
(1 row)

spiritnet=# show lc_ctype;
lc_ctype
-------------
hu_HU.UTF-8

Name | Owner | Encoding | Collation | Ctype
-----------------------+-------+-----------+-------------+-------------
spiritnet | pgsql | UTF8 | hu_HU.UTF-8 | hu_HU.UTF-8

And it STILL NOT working, the order by is absolutly incorrect. All chars
with accents, are located at the END of the list. Is it a problem only
with 9.0.3?

For me, your test shows:

"Syrový"
"Sís"
"Vondra"
"Šiška"

instead your correct order.

Any ideas?

Further information:

[root@radon /usr/local/pgsql]# pg_config
BINDIR = /usr/local/bin
DOCDIR = /usr/local/share/doc/postgresql
HTMLDIR = /usr/local/share/doc/postgresql
INCLUDEDIR = /usr/local/include
PKGINCLUDEDIR = /usr/local/include/postgresql
INCLUDEDIR-SERVER = /usr/local/include/postgresql/server
LIBDIR = /usr/local/lib
PKGLIBDIR = /usr/local/lib/postgresql
LOCALEDIR = /usr/local/share/locale
MANDIR = /usr/local/man
SHAREDIR = /usr/local/share/postgresql
SYSCONFDIR = /usr/local/etc/postgresql
PGXS = /usr/local/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-libraries=/usr/local/lib'
'--with-includes=/usr/local/include' '--enable-thread-safety'
'--with-openssl' '--with-libxml' '--enable-nls' '--with-gssapi'
'--prefix=/usr/local' '--mandir=/usr/local/man'
'--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.1'
'build_alias=amd64-portbld-freebsd8.1' 'CC=cc' 'CFLAGS=-O2 -pipe -O3
-funroll-loops -fno-strict-aliasing' 'LDFLAGS= -L/usr/local/lib
-rpath=/usr/lib:/usr/local/lib' 'CPP=cpp'
CC = cc
CPPFLAGS = -I/usr/local/include/libxml2 -I/usr/local/include
-I/usr/local/include -I/usr/local/include
CFLAGS = -O2 -pipe -O3 -funroll-loops -fno-strict-aliasing -Wall
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fPIC -DPIC
LDFLAGS = -L/usr/local/lib -rpath=/usr/lib:/usr/local/lib
-L/usr/local/lib -L/usr/local/lib -L/usr/local/lib -Wl,--as-needed
-Wl,-R'/usr/local/lib'
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lintl -lxml2 -lssl -lcrypto -lgssapi_krb5 -lz
-lreadline -lcrypt -lm
VERSION = PostgreSQL 9.0.3

[root@radon /usr/local/pgsql]# pg_controldata /usr/local/pgsql/data
pg_control version number: 903
Catalog version number: 201008051
Database system identifier: 5570316617043000695
Database cluster state: in production
pg_control last modified: Sat Feb 5 22:58:43 2011
Latest checkpoint location: 0/9008D748
Prior checkpoint location: 0/8FF99F68
Latest checkpoint's REDO location: 0/9005FEB0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/16405
Latest checkpoint's NextOID: 57344
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 654
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Time of latest checkpoint: Sat Feb 5 22:58:33 2011
Minimum recovery ending location: 0/0
Backup start location: 0/0
Current wal_level setting: minimal
Current max_connections setting: 40
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Adam PAPAI (#3)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:

Pavel Stehule wrote:

Hello

You should to initialize database with correct locale. You can't to
change locale after database is created.

/usr/local/pgsql91/bin/createdb test --locale=cs_CZ.utf-8 -e UTF-8
CREATE DATABASE test LC_COLLATE 'cs_CZ.utf-8' LC_CTYPE 'cs_CZ.utf-8';
COMMENT ON DATABASE test IS 'UTF-8';

test=# show lc_collate ;
 lc_collate
-------------
 cs_CZ.utf-8
(1 row)

test=# select * from (values('Sís'),('Šiška'),('Syrový'),('Vondra')) x
ORDER BY 1;
 column1
---------
 Sís
 Syrový
 Šiška
 Vondra
(4 rows)

Regards

Pavel Stehule

Something is not correct for me.

I've re-initialised it again with:

--locale=hu_HU.UTF-8 --encoding=UTF-8

which generated everything LC_* to hu_HU.UTF-8.

I've created the DBs with createdb:

createdb -U pgsql "$DBNAME" -E "UTF-8" --locale="hu_HU.UTF-8"

Example:

createdb -U pgsql spiritnet -E UTF-8 --locale=hu_HU.UTF-8

spiritnet=# show lc_collate;
 lc_collate
-------------
 hu_HU.UTF-8
(1 row)

spiritnet=# show lc_ctype;
 lc_ctype
-------------
 hu_HU.UTF-8

        Name          | Owner | Encoding  |  Collation  |    Ctype
-----------------------+-------+-----------+-------------+-------------
 spiritnet             | pgsql | UTF8      | hu_HU.UTF-8 | hu_HU.UTF-8

And it STILL NOT working, the order by is absolutly incorrect. All chars
with accents, are located at the END of the list. Is it a problem only
with 9.0.3?

For me, your test shows:

"Syrový"
"Sís"
"Vondra"
"Šiška"

instead your correct order.

Any ideas?

your system locales is correct? PostgreSQL uses only system libs

Regards

Pavel Stehule

Show quoted text

Further information:

[root@radon /usr/local/pgsql]# pg_config
BINDIR = /usr/local/bin
DOCDIR = /usr/local/share/doc/postgresql
HTMLDIR = /usr/local/share/doc/postgresql
INCLUDEDIR = /usr/local/include
PKGINCLUDEDIR = /usr/local/include/postgresql
INCLUDEDIR-SERVER = /usr/local/include/postgresql/server
LIBDIR = /usr/local/lib
PKGLIBDIR = /usr/local/lib/postgresql
LOCALEDIR = /usr/local/share/locale
MANDIR = /usr/local/man
SHAREDIR = /usr/local/share/postgresql
SYSCONFDIR = /usr/local/etc/postgresql
PGXS = /usr/local/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-libraries=/usr/local/lib'
'--with-includes=/usr/local/include' '--enable-thread-safety'
'--with-openssl' '--with-libxml' '--enable-nls' '--with-gssapi'
'--prefix=/usr/local' '--mandir=/usr/local/man'
'--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.1'
'build_alias=amd64-portbld-freebsd8.1' 'CC=cc' 'CFLAGS=-O2 -pipe -O3
-funroll-loops -fno-strict-aliasing' 'LDFLAGS= -L/usr/local/lib
-rpath=/usr/lib:/usr/local/lib' 'CPP=cpp'
CC = cc
CPPFLAGS = -I/usr/local/include/libxml2 -I/usr/local/include
-I/usr/local/include -I/usr/local/include
CFLAGS = -O2 -pipe -O3 -funroll-loops -fno-strict-aliasing -Wall
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fPIC -DPIC
LDFLAGS = -L/usr/local/lib -rpath=/usr/lib:/usr/local/lib
-L/usr/local/lib -L/usr/local/lib -L/usr/local/lib -Wl,--as-needed
-Wl,-R'/usr/local/lib'
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lintl -lxml2 -lssl -lcrypto -lgssapi_krb5 -lz
-lreadline -lcrypt -lm
VERSION = PostgreSQL 9.0.3

[root@radon /usr/local/pgsql]# pg_controldata /usr/local/pgsql/data
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5570316617043000695
Database cluster state:               in production
pg_control last modified:             Sat Feb  5 22:58:43 2011
Latest checkpoint location:           0/9008D748
Prior checkpoint location:            0/8FF99F68
Latest checkpoint's REDO location:    0/9005FEB0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/16405
Latest checkpoint's NextOID:          57344
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        654
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Sat Feb  5 22:58:33 2011
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            minimal
Current max_connections setting:      40
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

#5Adam PAPAI
adam.papai@bsdsupportservice.hu
In reply to: Pavel Stehule (#4)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Pavel Stehule wrote:

2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:

your system locales is correct? PostgreSQL uses only system libs

I've tested it with a fresh 8.4 and 9.0.

It's the same.

My FreeBSD 8.1 supports hu_HU.UTF-8, but I don't know why it's not working.

initdb output:

[root@titanium /usr/home/wooh]# /usr/local/etc/rc.d/postgresql initdb
The files belonging to this database system will be owned by user "pgsql".
This user must also own the server process.

The database cluster will be initialized with locale hu_HU.UTF-8.
The default text search configuration will be set to "hungarian".

creating directory /usr/local/pgsql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 40
selecting default shared_buffers ... 28MB
creating configuration files ... ok
creating template1 database in /usr/local/pgsql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

Perhaps the new FreeBSD 8.1 does something wrong. I'm gonna check it on
7.0/7.1/7.2.

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

#6Adam PAPAI
adam.papai@bsdsupportservice.hu
In reply to: Adam PAPAI (#5)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Adam PAPAI wrote:

Pavel Stehule wrote:

2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:

your system locales is correct? PostgreSQL uses only system libs

I've tested it with a fresh 8.4 and 9.0.

It's the same.

My FreeBSD 8.1 supports hu_HU.UTF-8, but I don't know why it's not working.

I've tested it with 8.4 and 9.0 with locale=hu_HU.ISO8859-2,
encoding=LATIN2, It's working correctly.

But not with locale=hu_HU.UTF-8, encoding=UTF-8

Is it related to the FreeBSD team or the PostgreSQL team?

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

#7Martijn van Oosterhout
kleptog@svana.org
In reply to: Adam PAPAI (#6)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

On Sun, Feb 06, 2011 at 11:02:25AM +0100, Adam PAPAI wrote:

I've tested it with 8.4 and 9.0 with locale=hu_HU.ISO8859-2,
encoding=LATIN2, It's working correctly.

But not with locale=hu_HU.UTF-8, encoding=UTF-8

Is it related to the FreeBSD team or the PostgreSQL team?

Last I checked *BSD did not support sorting in UTF-8. I know Apple
added it themselves because they needed it but I don't think it got
backported to *BSD.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
- Charles de Gaulle

#8Radosław Smogura
rsmogura@softperience.eu
In reply to: Adam PAPAI (#6)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

You need to create database with LC_COLLATE="hu_HU.utf8", e.g.

CREATE DATABASE tx2 ENCODING='UTF-8' TEMPLATE=template0
LC_COLLATE='pl_PL.utf8';

Kind regards,
Radosław Smogura
http://www.softperience.eu

Adam PAPAI <adam.papai@bsdsupportservice.hu> Sunday 06 February 2011 11:02:25

Show quoted text

Adam PAPAI wrote:

Pavel Stehule wrote:

2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:

your system locales is correct? PostgreSQL uses only system libs

I've tested it with a fresh 8.4 and 9.0.

It's the same.

My FreeBSD 8.1 supports hu_HU.UTF-8, but I don't know why it's not
working.

I've tested it with 8.4 and 9.0 with locale=hu_HU.ISO8859-2,
encoding=LATIN2, It's working correctly.

But not with locale=hu_HU.UTF-8, encoding=UTF-8

Is it related to the FreeBSD team or the PostgreSQL team?

#9Adam PAPAI
adam.papai@bsdsupportservice.hu
In reply to: Radosław Smogura (#8)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Radosław Smogura wrote:

You need to create database with LC_COLLATE="hu_HU.utf8", e.g.

CREATE DATABASE tx2 ENCODING='UTF-8' TEMPLATE=template0
LC_COLLATE='pl_PL.utf8';

Are you running it under FreeBSD?

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

#10Radosław Smogura
rsmogura@softperience.eu
In reply to: Adam PAPAI (#9)
Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

Adam PAPAI <adam.papai@bsdsupportservice.hu> Sunday 06 February 2011 14:13:51

Radosław Smogura wrote:

You need to create database with LC_COLLATE="hu_HU.utf8", e.g.

CREATE DATABASE tx2 ENCODING='UTF-8' TEMPLATE=template0
LC_COLLATE='pl_PL.utf8';

Are you running it under FreeBSD?

No, Gentoo. But, without creating db with LC_COLLATE I get wrong sroting too.