Sorting Discrepancy in PostgreSQL 14.13

Started by [3반]김민지_4904 · over 1 year ago · 4 messages · bugs
#1[3반]김민지_4904
hzuiw33@gmail.com

==============================================
POSTGRESQL BUG REPORT TEMPLATE
==============================================

Your name : minji-kim
Your email address : hzuiw33@gmail.com

# System Configuration:
---------------------
Architecture (example: Intel Pentium) : Intel(R) Core(TM) Ultra 7 155H
Operating System (example: Linux 2.4.18) : VMware Workstation Pro (Ubuntu-22.04)
PostgreSQL version (example: PostgreSQL 9.6.6) : PostgreSQL 14.13
Compiler used (example: gcc 3.3.5) : X (sudo apt install postgresql postgresql-contrib)

# Please enter a FULL description of your problem:
------------------------------------------------

Sorting Discrepancy in PostgreSQL 14.13

When running the following commands in PostgreSQL 14.13:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0, MIN(ABS(CAST(c0 AS BIGINT))) OVER (ORDER BY c0 NULLS FIRST) AS
min_function_cast FROM t0;
DROP TABLE IF EXISTS t0;
```

The result is:

```
 c0  | min_function_cast
-----+-------------------
 -10 |                10
 15  |                10
 20  |                10
 -25 |                10
 -5  |                 5
(5 rows)
```

However, in other DBMS (SQLite, MySQL, OracleDB) and PostgreSQL version
17.0, the output is:

```
 c0  | min_function_cast
-----+-------------------
 -10 |                10
 -25 |                10
 -5  |                 5
 15  |                 5
 20  |                 5
(5 rows)
```

This discrepancy is due to different string sorting orders.

The minimized PoC is:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0 FROM t0 ORDER BY c0;
```

In PostgreSQL 14.13, the order is incorrect:
```
-10
15
20
-25
-5
```

While the correct order should be:

```
-10
-25
-5
15
20
```

as '-' is smaller than '1' or '2' in ASCII.

I'm doubtful this is a collation issue, as most collations basically
respect ASCII order.

Even if this issue appears to be related to collation, no warnings are
provided when migrating from this version.

# Please describe a way to repeat the problem.
# Please try to provide a concise reproducible example, if at all possible:
----------------------------------------------------------------------

Running the following commands in PostgreSQL 14.13:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0, MIN(ABS(CAST(c0 AS BIGINT))) OVER (ORDER BY c0 NULLS FIRST) AS
min_function_cast FROM t0;
DROP TABLE IF EXISTS t0;
```

Results in the incorrect sorting behavior.

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
Using the correct sort criteria will solve this problem.
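As a minimal sketch of what explicit sort criteria could look like, the query can name the collation itself instead of relying on the database default. The `COLLATE "C"` clauses and the outer `ORDER BY` are additions that are not part of the original report; they assume that byte-wise ordering of the strings is what is wanted.

```sql
-- Sketch only: pin the collation so the result does not depend on the
-- database's default collation. COLLATE "C" compares strings byte by byte.
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0,
       MIN(ABS(CAST(c0 AS BIGINT))) OVER (ORDER BY c0 COLLATE "C" NULLS FIRST)
           AS min_function_cast
FROM t0
ORDER BY c0 COLLATE "C" NULLS FIRST;  -- make the output row order explicit too
DROP TABLE IF EXISTS t0;
```

If numeric order is the real intent, ordering by `CAST(c0 AS BIGINT)` rather than by the text value would avoid collation entirely.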

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: [3반]김민지_4904 (#1)
Re: Sorting Discrepancy in PostgreSQL 14.13

Hi,

On 11/14/24 13:49, [3반]김민지_4904 wrote:

...

The minimized PoC is:

```sql
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
SELECT c0 FROM t0 ORDER BY c0;
```

In PostgreSQL 14.13, the order is incorrect:
```
-10
15
20
-25
-5
```

While the correct order should be:

```
-10
-25
-5
15
20
```

as '-' is smaller than '1' or '2' in ASCII.

I'm doubtful this is a collation issue, as most collations basically
respect ASCII order.

This is 99.999% due to the collation, so which collations are being used
on these systems? Also, I don't get this "incorrect" behavior on 14.13,
it behaves the same as 17 for me, producing the expected result.

regards

--
Tomas Vondra

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#2)
Re: Sorting Discrepancy in PostgreSQL 14.13

Tomas Vondra <tomas@vondra.me> writes:

On 11/14/24 13:49, [3반]김민지_4904 wrote:

I'm doubtful this is a collation issue, as most collations basically
respect ASCII order.

This is 99.999% due to the collation, so which collations are being used
on these systems? Also, I don't get this "incorrect" behavior on 14.13,
it behaves the same as 17 for me, producing the expected result.

It surely is a collation issue. Using a glibc-based system, I get

u8=# CREATE TABLE t0 (c0 TEXT);
CREATE TABLE
u8=# INSERT INTO t0 (c0) VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25');
INSERT 0 5
u8=# select * from t0 order by c0 collate "C";
c0
-----
-10
-25
-5
15
20
(5 rows)

u8=# select * from t0 order by c0 collate "en_US";
c0
-----
-10
15
20
-25
-5
(5 rows)

(In point of fact, most glibc collations do NOT "respect ASCII order".
They tend to ignore punctuation until it's needed as a tiebreaker.)
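The "ignore punctuation until it's needed as a tiebreaker" behavior can be roughly imitated in plain SQL. This is only an approximation for illustration, not how glibc implements its collations: the `regexp_replace` sort key is a stand-in invented here, and both keys are forced to `"C"` so the query behaves the same regardless of the default collation.

```sql
-- Rough model only: first compare the strings with everything except ASCII
-- letters and digits removed, then fall back to the full string to break ties.
SELECT c0
FROM (VALUES ('-10'), ('20'), ('-5'), ('15'), ('-25')) AS t0(c0)
ORDER BY regexp_replace(c0, '[^0-9A-Za-z]', '', 'g') COLLATE "C",
         c0 COLLATE "C";
```

For these five values the stripped keys are 10, 15, 20, 25 and 5, so the query should produce the same row order as the "en_US" example rather than the "C" one.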

So this is surely down to the PG 14.13 installation having a different
default collation than whatever it's compared to, which most likely
is caused by having run initdb with a different locale environment.
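One way to compare the two installations is to look at the collation recorded for each database when it was created. A minimal query against the `pg_database` catalog, to be run on both servers:

```sql
-- Default collation and character classification of the current database.
-- These are fixed when the database is created (by initdb or CREATE DATABASE).
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();
```

If one side reports `C` (or `POSIX`) and the other a locale such as `en_US.UTF-8`, that would account for the different orderings. On releases that support other locale providers, the `datlocprovider` column may also be worth checking.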

regards, tom lane

#4Peter Eisentraut
peter_e@gmx.net
In reply to: [3반]김민지_4904 (#1)
Re: Sorting Discrepancy in PostgreSQL 14.13

On 14.11.24 13:49, [3반]김민지_4904 wrote:

as '-' is smaller than '1' or '2' in ASCII.

I'm doubtful this is a collation issue, as most collations basically
respect ASCII order.

See also here for a possible explanation:
https://peter.eisentraut.org/blog/2023/04/12/how-collation-of-punctuation-and-whitespace-works