Unexpected behavior sorting strings

Started by Jimmy Thrasherabout 6 years ago4 messagesgeneral
Jump to latest
#1Jimmy Thrasher
jimmy@jimmythrasher.com

I'm seeing some unexpected behavior when sorting some strings, and it indicates I don't fully understand how postgresql string sorting works.

As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value.

In particular, I'm seeing the following. I would expect "< S" to come first, because "<" (0x3c) is less than ">" (0x3e).

```
supercatdev=# select unnest(array['> N', '< S']) as s order by s;
s
-----

N

< S
(2 rows)
```

I've broken this down further:
```
supercatdev=# select '> N' < '< S';
?column?
----------
t
(1 row)
```

Am I missing something about how sorting works?

Metadata:
- postgresql 9.5.19, running on Ubuntu 16LTS
- encoding, collate, and ctype are all UTF8 or en_US.UTF-8, as appropriate

Thanks!

Jimmy

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jimmy Thrasher (#1)
Re: Unexpected behavior sorting strings

"Jimmy Thrasher" <jimmy@jimmythrasher.com> writes:

As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value.

Only if you're using C locale. Other locales such as en_US have
completely different rules, which most hackers tend to find pretty
unintelligible and inconsistent :-(. In your example, I think
the first-pass sort is on just the letters, and only if those are
the same will it consider the punctuation.

regards, tom lane

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jimmy Thrasher (#1)
Re: Unexpected behavior sorting strings

On 4/8/20 7:35 AM, Jimmy Thrasher wrote:

I'm seeing some unexpected behavior when sorting some strings, and it indicates I don't fully understand how postgresql string sorting works.

As I understand it, postgresql sorts strings roughly like strcmp does: character by character based on encoding value.

In particular, I'm seeing the following. I would expect "< S" to come first, because "<" (0x3c) is less than ">" (0x3e).

```
supercatdev=# select unnest(array['> N', '< S']) as s order by s;
s
-----

N

< S
(2 rows)
```

I've broken this down further:
```
supercatdev=# select '> N' < '< S';
?column?
----------
t
(1 row)
```

Am I missing something about how sorting works?

I believe you are looking for 'C' collation:

test=# select unnest(array[('> N' collate "C") , ('< S' COLLATE "C")])
as s order by s;

s
-----
< S

N

(2 rows)

For more information see:

https://www.postgresql.org/docs/12/collation.html

Metadata:
- postgresql 9.5.19, running on Ubuntu 16LTS
- encoding, collate, and ctype are all UTF8 or en_US.UTF-8, as appropriate

Thanks!

Jimmy

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Jimmy Thrasher
jimmy@jimmythrasher.com
In reply to: Adrian Klaver (#3)
Re: Unexpected behavior sorting strings

Many thanks! That clarifies things well.

Jimmy

On Wed, Apr 8, 2020, at 11:49 AM, Adrian Klaver wrote:

On 4/8/20 7:35 AM, Jimmy Thrasher wrote:

<snip>

Show quoted text

Am I missing something about how sorting works?

I believe you are looking for 'C' collation:

test=# select unnest(array[('> N' collate "C") , ('< S' COLLATE "C")])
as s order by s;

s
-----
< S

N

(2 rows)

For more information see:

https://www.postgresql.org/docs/12/collation.html