varchar sort ordering ignore blanks

Started by Luca Arzeniover 18 years ago2 messagesgeneral
Jump to latest
#1Luca Arzeni
l.arzeni@amadego.com

Hi there,
I have a table with a single column, pk of varchar type

The table contains few names, say:

XXXX A
XXXX C
XXXXB

In the first two records there is a between the XXXX and the following letter
A and C while, the third one has a B immediately following the XXXX (without
blanks).

In postgres 7.4.7 (debian sarge), if I issue a select to sort the record I
(correctly) obtain:
XXXX A
XXXX C
XXXXB

In postgres 8.1.9 (debian etch), if I issue a select to sort the record I
(mistakenly) obtain:
XXXX A
XXXXB
XXXX C

That is: the sort order in postgres 8.1.9 seems to ignore the blank.

In all cases I'm using locale LATIN9 during DB creation, but I tested also
with ASCII, UTF8 and LATIN1 encoding.

Can someone help me to get the correct order in postgres 8.1.9 ?

=== Sample code ===

CREATE TABLE t_table
(
c_column varchar(30) NOT NULL,
CONSTRAINT t_table_pk PRIMARY KEY (c_column)
)
WITHOUT OIDS;

INSERT INTO t_table(c_column) VALUES ('XXXX A');
INSERT INTO t_table(c_column) VALUES ('XXXXB');
INSERT INTO t_table(c_column) VALUES ('XXXX C');

select * from t_table order by c_column asc;

=============

Thanks, Luca Arzeni

#2Csaba Nagy
nagy@ecircle-ag.com
In reply to: Luca Arzeni (#1)
Re: varchar sort ordering ignore blanks

On Tue, 2008-01-15 at 16:32 +0100, Luca Arzeni wrote:

In all cases I'm using locale LATIN9 during DB creation, but I tested also
with ASCII, UTF8 and LATIN1 encoding.

I guess this has nothing to do with the encoding, but with the collation
rules used, which is governed by "lc_collate" parameter. See what you
get on both DBs for:

SHOW lc_collate ;

Quoting from the docs:

"The nature of some locale categories is that their value has to be
fixed for the lifetime of a database cluster. That is, once initdb has
run, you cannot change them anymore. LC_COLLATE and LC_CTYPE are those
categories. They affect the sort order of indexes, so they must be kept
fixed, or indexes on text columns will become corrupt. PostgreSQL
enforces this by recording the values of LC_COLLATE and LC_CTYPE that
are seen by initdb. The server automatically adopts those two values
when it is started."

See:
http://www.postgresql.org/docs/8.1/static/charset.html

HTH,
Csaba.