[PATCH] Fix ICU strength not being honored in collation rules

Started by Luis Felippe, 3 months ago, 3 messages
#1Luis Felippe
luisfelippe@protonmail.com
1 attachment(s)

Hello,

I have run into an issue where specifying the rules argument for "CREATE COLLATION" forces the collation strength to tertiary, even if a different strength is explicitly set in the rules string. I discovered that this is because ucol_openRules is called with the strength argument UCOL_DEFAULT_STRENGTH, which overrides whatever the rules string specifies with UCOL_TERTIARY.

This fix changes this call to pass UCOL_DEFAULT instead. This way, UCOL_TERTIARY is still specified by default, but the strength explicitly set on the rules string is not overwritten. This is important because there is currently no way to create a collation with custom tailoring rules with a strength other than tertiary.
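
To illustrate the mechanism, here is a simplified, self-contained model of the behavior described above. It does not call ICU: the enum, strength_from_rules(), effective_strength() and demo_strength_model() are hypothetical names invented for this sketch, and the override rule is an assumption taken from the description in this message (an explicit strength argument wins over the rules string, while the "default" marker leaves the rules string in control).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for ICU's strength constants; these are NOT the ICU types. */
typedef enum
{
	MODEL_DEFAULT = -1,		/* plays the role of UCOL_DEFAULT */
	MODEL_PRIMARY = 0,
	MODEL_SECONDARY = 1,
	MODEL_TERTIARY = 2		/* plays the role of UCOL_DEFAULT_STRENGTH */
} model_strength;

/*
 * Strength requested by a "[strength N]" option in the rules string
 * (N = 1..3), or tertiary when the option is absent or out of range.
 */
model_strength
strength_from_rules(const char *rules)
{
	const char *opt = strstr(rules, "[strength ");
	int			level = 0;

	if (opt != NULL &&
		sscanf(opt, "[strength %d]", &level) == 1 &&
		level >= 1 && level <= 3)
		return (model_strength) (level - 1);
	return MODEL_TERTIARY;
}

/*
 * Assumed override rule: any argument other than MODEL_DEFAULT replaces
 * the strength found in the rules; MODEL_DEFAULT keeps it.
 */
model_strength
effective_strength(const char *rules, model_strength arg)
{
	if (arg != MODEL_DEFAULT)
		return arg;
	return strength_from_rules(rules);
}

/* Mirrors the before/after cases listed in this message. */
void
demo_strength_model(void)
{
	/* before the patch: the tertiary argument always wins */
	assert(effective_strength("[strength 2]", MODEL_TERTIARY) == MODEL_TERTIARY);
	assert(effective_strength("[strength 1]", MODEL_TERTIARY) == MODEL_TERTIARY);
	/* after the patch: the rules string decides, tertiary remains the fallback */
	assert(effective_strength("", MODEL_DEFAULT) == MODEL_TERTIARY);
	assert(effective_strength("[strength 2]", MODEL_DEFAULT) == MODEL_SECONDARY);
	assert(effective_strength("[strength 1]", MODEL_DEFAULT) == MODEL_PRIMARY);
}
```

Calling demo_strength_model() from a test program exercises both variants. The model only covers the strength argument and ignores everything else a collator does with its rules.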

What happens currently:

CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '', deterministic = false); -- strength: tertiary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 2]', deterministic = false); -- strength: tertiary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 1]', deterministic = false); -- strength: tertiary

What happens after the patch:

CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '', deterministic = false); -- strength: tertiary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 2]', deterministic = false); -- strength: secondary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 1]', deterministic = false); -- strength: primary

As this only affects cases where the strength is explicitly set but was previously ignored, I do not think it is a breaking change.

I have successfully compiled and tested PostgreSQL after this change, and it behaves as documented above.

Thank you in advance,

Luis

Attachments:

0001-Fix-ICU-strength-not-being-honored-in-collation-rule.patch (text/x-patch)
From fa37c67416fcf472c01cf59c8bb12c7dba2ab284 Mon Sep 17 00:00:00 2001
From: lfpraca <luisfelippe@protonmail.com>
Date: Mon, 27 Oct 2025 14:27:27 -0300
Subject: [PATCH] Fix ICU strength not being honored in collation rules

---
 src/backend/utils/adt/pg_locale_icu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 05bad20..08a461a 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -466,7 +466,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 
 		status = U_ZERO_ERROR;
 		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
-											UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
+											UCOL_DEFAULT, UCOL_DEFAULT,
 											NULL, &status);
 		if (U_FAILURE(status))
 		{
-- 
2.49.1

#2Daniel Verite
daniel@manitou-mail.org
In reply to: Luis Felippe (#1)
Re: [PATCH] Fix ICU strength not being honored in collation rules

Luis Felippe wrote:

This fix changes this call to pass UCOL_DEFAULT instead. This way,
UCOL_TERTIARY is still specified by default, but the strength explicitly set
on the rules string is not overwritten. This is important because there is
currently no way to create a collation with custom tailoring rules with
a strength other than tertiary.

Yes. There was a previous report recently [1], with a proposed fix [2]
identical to yours.

As this only affects cases where the strength is explicitly set but was
previously ignored, I do not think it is a breaking change.

The fix may change sort results for collations affected by the problem
(that's the point of the fix!), so even if it's for the better, it's
theoretically a breaking change for databases that may have collations
like that.

[1]: /messages/by-id/YT2PPF959236618377A072745A280E278F4BE1DA@YT2PPF959236618.CANPRD01.PROD.OUTLOOK.COM

[2]: https://commitfest.postgresql.org/patch/6084/

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

#3Luis Felippe
luisfelippe@protonmail.com
In reply to: Daniel Verite (#2)
Re: [PATCH] Fix ICU strength not being honored in collation rules

Daniel Verite wrote:

Yes. There was a previous report recently [1], with a proposed fix [2]
identical to yours.

It is great to know this is already being addressed.

The fix may change sort results for collations affected by the problem
(that's the point of the fix!), so even if it's for the better, it's
theoretically a breaking change for databases that may have collations
like that.

While this is technically a breaking change, it only affects cases where the strength attribute is explicitly set. Cases where the strength is indirectly set — for example, by specifying a locale with a different default strength (e.g. und-u-ks-level-2) — continue to behave as before, where providing any tailoring rules resets the strength to tertiary.

Explicitly setting the strength attribute is, by definition, an intentional change to the collation strength. PostgreSQL currently accepts this attribute but silently ignores it, which is a clear correctness issue rather than an intentional behavioral characteristic. The fix therefore aligns the implementation with user expectations and with the documented meaning of the attribute.

Given that the change only affects cases where an explicitly set strength is currently ignored, and brings behavior in line with both specification and intent, I think it would be reasonable, and beneficial, to include it in the next minor release.

Best regards,

Luis