BUG #19045: Applying custom collation rules appears to erase existing rules

Started by PG Bug reporting form | 8 months ago | 3 messages | bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 19045
Logged by: Todd Lang
Email address: todd.lang@d2l.com
PostgreSQL version: 17.6
Operating system: Windows 10 64 bit
Description:

Setting up a collation on a table with the following:

DROP TABLE IF EXISTS test_table;
DROP COLLATION IF EXISTS CI_AS;
CREATE COLLATION CI_AS (PROVIDER=icu, LOCALE='en-US-u-ks-level2',
DETERMINISTIC=false);
CREATE TABLE test_table (field1 varchar(256) COLLATE CI_AS);
INSERT INTO test_table VALUES (U&'this is a string.');
INSERT INTO test_table VALUES (U&'THIS IS A STRING.');

Then issue the query:
SELECT * FROM test_table WHERE field1 = 'This is a string.';

This should provide:
"this is a string."
"THIS IS A STRING."

Now alter the collation slightly to include rules. (Note the CREATE
COLLATION line)

DROP TABLE IF EXISTS test_table;
DROP COLLATION IF EXISTS CI_AS;
CREATE COLLATION CI_AS (PROVIDER=icu, LOCALE='en-US-u-ks-level2',
DETERMINISTIC=false, rules='');
CREATE TABLE test_table (field1 varchar(256) COLLATE CI_AS);
INSERT INTO test_table VALUES (U&'this is a string.');
INSERT INTO test_table VALUES (U&'THIS IS A STRING.');

Now issue:
SELECT * FROM test_table WHERE field1 = 'This is a string.';

There are no results.

From the documentation it seems that any text supplied should be additional
rules to the standard rules.
In `pg_locale_icu.c` in the `make_icu_collator` method at line 455, it seems
that it does a simple:

u_strcpy(all_rules, std_rules);
u_strcat(all_rules, my_rules);

which, with the above change, should just append nothing to the standard
rules and cause no change. This, however, is not the case.

I have tried it with various permutations of the `rules`, and while any
rules supplied during the CREATE COLLATION call appear to function, it seems
that all standard rules are forgotten when this option is utilized.

#2 Todd Lang
Todd.Lang@D2L.com
In reply to: PG Bug reporting form (#1)
RE: BUG #19045: Applying custom collation rules appears to erase existing rules

FWIW, I've started putting in some logging to see if I can figure out what's going on here.

What seems to happen is that at backend\utils\adt\pg_locale_c:347 it asks for the existing rules to prepare to append the custom rules. However, I can't seem to track it actually returning any rules. The length returned is always 0. It then dutifully appends the custom rules to this empty set of rules and then applies them, and that is exactly the behaviour I seem to be observing. I'm still trying to figure out why icu_getRules isn't returning the rules for the supplied locale.
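
To make the observation above concrete, here is a minimal, self-contained sketch of the copy-then-append step in plain C. It is not the PostgreSQL code: `uchar16`, `u16_len`, and `concat_rules` are hypothetical names invented for illustration, and `uchar16` only stands in for ICU's `UChar`. (In ICU's C API the call that returns a collator's rule string is `ucol_getRules`, which is presumably what is meant by icu_getRules here.) The sketch assumes, as the logging suggests, that the standard rules come back with length 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for ICU's UChar, a 16-bit UTF-16 code unit (assumption for this sketch). */
typedef uint16_t uchar16;

/* Hypothetical helper: number of code units before the terminating 0. */
static size_t u16_len(const uchar16 *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}

/*
 * Hypothetical helper mirroring the copy-then-append step: returns a newly
 * allocated string holding std_rules followed by my_rules, or NULL if the
 * allocation fails. The caller frees the result.
 */
static uchar16 *concat_rules(const uchar16 *std_rules, const uchar16 *my_rules)
{
    assert(std_rules != NULL && my_rules != NULL);

    size_t std_len = u16_len(std_rules);
    size_t my_len = u16_len(my_rules);
    uchar16 *all_rules = malloc((std_len + my_len + 1) * sizeof(uchar16));

    if (all_rules == NULL)
        return NULL;

    /* Copy the standard rules first (nothing is copied if they are empty). */
    memcpy(all_rules, std_rules, std_len * sizeof(uchar16));
    /* Append the custom rules, including their terminating 0. */
    memcpy(all_rules + std_len, my_rules, (my_len + 1) * sizeof(uchar16));
    return all_rules;
}
```

With both inputs empty, `concat_rules` returns an empty string, and with an empty standard-rules string the result is exactly the custom rules. That matches the logged behaviour: whatever is built from the combined string sees only the text supplied in `rules`. Whether the `ks-level2` setting from the locale string is expected to survive that step is the open question in this thread.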

-----Original Message-----
From: PG Bug reporting form <noreply@postgresql.org>
Sent: Tuesday, September 9, 2025 11:14 AM
To: pgsql-bugs@lists.postgresql.org
Cc: Todd Lang <Todd.Lang@D2L.com>
Subject: BUG #19045: Applying custom collation rules appears to erase existing rules

#3 Todd Lang
Todd.Lang@D2L.com
In reply to: Todd Lang (#2)
RE: BUG #19045: Applying custom collation rules appears to erase existing rules

My apologies, that should be pg_locale_icu.c, not pg_locale_c.

-----Original Message-----
From: Todd Lang <Todd.Lang@D2L.com>
Sent: Thursday, September 11, 2025 2:15 PM
To: Todd Lang <Todd.Lang@D2L.com>; pgsql-bugs@lists.postgresql.org
Subject: RE: BUG #19045: Applying custom collation rules appears to erase existing rules
