Unicode update and some tooling improvements
This is the annual update of the Unicode data. I also worked a bit on
the tooling. The update-unicode target under meson did not update the
data in contrib/unaccent/, so I added that. I also fixed a Python
deprecation warning in the generation script and made some light changes
in the surrounding documentation.
Attachments:
0001-Fix-Python-deprecation-warning.patch (+1 -2)
0002-doc-Fix-capitalization-of-Unicode.patch (+1 -2)
0003-Implement-unaccent-Unicode-data-update-in-meson.patch (+63 -19)
0004-Update-RELEASE_CHANGES.patch (+1 -3)
0005-Update-Unicode-data-to-CLDR-48.1.patch (+2 -3)
0006-Update-Unicode-data-to-Unicode-17.0.0.patch (+4034 -3675)
On Feb 27, 2026, at 04:36, Peter Eisentraut <peter@eisentraut.org> wrote:
This is the annual update of the Unicode data. I also worked a bit on the tooling. The update-unicode target under meson did not update the data in contrib/unaccent/, so I added that. I also fixed a Python deprecation warning in the generation script and made some light changes in the surrounding documentation.
Overall looks good to me.
To verify this patch, I upgraded my local ICU to version 78.2, then ran the Python script:
```
chaol@ChaodeMacBook-Air postgresql % python3 contrib/unaccent/generate_unaccent_rules.py \
--unicode-data-file src/common/unicode/UnicodeData.txt \
--latin-ascii-file contrib/unaccent/Latin-ASCII.xml \
/tmp/unaccent.rules.new
chaol@ChaodeMacBook-Air postgresql % diff -u contrib/unaccent/unaccent.rules /tmp/unaccent.rules.new # no difference
```
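The same regenerate-and-compare check can be scripted. The following is only a minimal sketch using Python's standard difflib; the helper name `unified_diff_lines` is hypothetical and not part of the patch set, and the paths in the comment are the ones from the transcript above.

```python
import difflib
from pathlib import Path


def unified_diff_lines(old_path, new_path):
    """Return the unified-diff lines between two text files.

    An empty list means the files have identical content, which is the
    "no difference" outcome of the manual `diff -u` check.
    """
    old = Path(old_path).read_text(encoding="utf-8").splitlines(keepends=True)
    new = Path(new_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(old, new, fromfile=str(old_path), tofile=str(new_path))
    )


# Example call (paths as used in the transcript above):
# unified_diff_lines("contrib/unaccent/unaccent.rules", "/tmp/unaccent.rules.new")
```

An empty result corresponds to the regenerated rules file matching the one in the tree.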
I also ran a clean meson build and specifically verified the new Unicode wiring:
```
chaol@ChaodeMacBook-Air postgresql % ninja -C build update-unicode # passed
```
And test:
```
chaol@ChaodeMacBook-Air postgresql % ninja -C build -t targets | grep update-unicode
update-unicode: phony
chaol@ChaodeMacBook-Air postgresql % ninja -C build test # passed
ninja: Entering directory `build'
[406/407] Running all tests
…
Ok: 333
Fail: 0
Skipped: 30
Full log written to /Users/chaol/Documents/code/postgresql/build/meson-logs/testlog.txt
```
Only a small comment on 0003:
```
# Meson 0.57.0 and 0.57.1 are buggy, therefore >=0.57.2.
- meson_version: '>=0.57.2',
+ # FIXME: update comment
+ meson_version: '>=0.58',
```
Why leave a FIXME instead of just updating the comment? I saw that the installation.sgml doc has already been updated.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 27.02.26 03:50, Chao Li wrote:
Only a small comment on 0003:
```
# Meson 0.57.0 and 0.57.1 are buggy, therefore >=0.57.2.
- meson_version: '>=0.57.2',
+ # FIXME: update comment
+ meson_version: '>=0.58',
```
Why leave a FIXME instead of just updating the comment? I saw that the installation.sgml doc has already been updated.
It wasn't meant to be committed that way. I just didn't want to spend
the time crafting a comment before there was general agreement to
proceed in a way that requires a Meson version update.
26.02.2026 23:36, Peter Eisentraut wrote:
This is the annual update of the Unicode data. I also worked a bit on
the tooling. The update-unicode target under meson did not update the
data in contrib/unaccent/, so I added that. I also fixed a Python
deprecation warning in the generation script and made some light changes
in the surrounding documentation.
Installed, tested, checked it out.
I hope I'm not late.
"[PATCH 3/6] Implement unaccent Unicode data update in meson"
The idea of raising the minimum Meson version is good.
But it seems like we can do without raising the version.
As I understand it, the minimum version is being raised because of
.replace(), but it can be successfully replaced here with the following
construct:
cldr_version_dashed = '-'.join(CLDR_VERSION.split('.'))
url = cldr_baseurl.format(cldr_version_dashed, f)
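As a quick sanity check of the split/join idea, here is a small sketch of the string logic. It is written in Python, not Meson, so it only illustrates the equivalence; `dashed` is a hypothetical helper name, and the value "48.1" is assumed from the title of patch 0005.

```python
# Assumed value, taken from the patch title "Update Unicode data to CLDR 48.1".
CLDR_VERSION = "48.1"


def dashed(version: str) -> str:
    """Turn dots into dashes using only split() and join(), without replace()."""
    return "-".join(version.split("."))


cldr_version_dashed = dashed(CLDR_VERSION)

# The split/join form gives the same string as a direct replace().
assert cldr_version_dashed == CLDR_VERSION.replace(".", "-")
print(cldr_version_dashed)  # prints 48-1
```

Whether the split/join form is preferable to raising the minimum version is a separate question; the sketch only shows that the two spellings produce the same result.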
I would raise the minimum Meson version, but in a separate patch so
that the commit log is "loud":
- Increase the minimum version for Meson.
This would be useful for users who read commit logs.
Currently, the minimum Meson version is raised "secretly" inside the
patch. Failing a separate patch, the commit log for this patch should
at least mention it explicitly.
Otherwise, looks good to me.
I am in favor of regular Unicode updates. 🙂
--
Best regards,
Alexander Borisov