more Unicode data updates

Started by Peter Eisentraut, almost 7 years ago, 3 messages, hackers
#1 Peter Eisentraut
peter_e@gmx.net

src/include/common/unicode_norm_table.h also should be updated to the
latest Unicode tables, as described in src/common/unicode. See attached
patches. This also passes the tests described in
src/common/unicode/README. (That is, the old code does not pass the
current Unicode test file, but the updated code does pass it.)
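
For illustration only, the kind of conformance check referred to above can be sketched in Python. This is not the test program from src/common/unicode; it assumes the semicolon-separated, five-column layout of Unicode's NormalizationTest.txt, assumes NFKC is the form of interest, and uses Python's unicodedata module as a stand-in for the generated C table. The function names parse_fields and nfkc_conforms are hypothetical.

```python
import unicodedata

def parse_fields(line):
    """Split one NormalizationTest.txt-style data line into five strings (c1..c5).

    Each column holds space-separated hexadecimal code points; anything
    after '#' is a comment and is ignored.
    """
    data = line.split("#", 1)[0].strip()
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]

def nfkc_conforms(line):
    """Return True if NFKC of every column equals column 4 (the NFKC column)."""
    c1, c2, c3, c4, c5 = parse_fields(line)
    return all(unicodedata.normalize("NFKC", c) == c4 for c in (c1, c2, c3, c4, c5))

# Two sample lines in the assumed format (comments shortened).
SAMPLE = [
    "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE",
    "00A0;00A0;00A0;0020;0020; # NO-BREAK SPACE",
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(nfkc_conforms(line), line)
```

A stale table would show up as lines for which such a check returns False when run against a newer test file, which is the situation described above for the old code.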

I also checked contrib/unaccent/ but it seems up to date.

It seems to me that we ought to make this part of the standard major
release preparations. There is a new Unicode standard approximately
once a year; see <https://unicode.org/Public/>. (The 13.0.0 listed
there is not released yet.)

It would also be nice to unify and automate all these "update to latest
Unicode" steps.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Correct-script-name-in-README-file.patch (text/plain, +1 -2)
0002-Make-script-output-more-pgindent-compatible.patch (text/plain, +2 -2)
0003-Update-unicode_norm_table.h-to-Unicode-12.1.0.patch (text/plain, +2017 -1966)

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#1)
Re: more Unicode data updates

On Thu, Jun 20, 2019 at 8:35 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

> src/include/common/unicode_norm_table.h also should be updated to the
> latest Unicode tables, as described in src/common/unicode. See attached
> patches. This also passes the tests described in
> src/common/unicode/README. (That is, the old code does not pass the
> current Unicode test file, but the updated code does pass it.)
>
> I also checked contrib/unaccent/ but it seems up to date.
>
> It seems to me that we ought to make this part of the standard major
> release preparations. There is a new Unicode standard approximately
> once a year; see <https://unicode.org/Public/>. (The 13.0.0 listed
> there is not released yet.)
>
> It would also be nice to unify and automate all these "update to latest
> Unicode" steps.

+1, great idea. Every piece of the system that derives from Unicode
data should derive from the same version, and the version should be
mentioned in the release notes when it changes, and should be
documented somewhere centrally. I wondered about that when working on
the unaccent generator script but didn't wonder hard enough.

--
Thomas Munro
https://enterprisedb.com

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#1)
Re: more Unicode data updates

On 2019-06-19 22:34, Peter Eisentraut wrote:

> src/include/common/unicode_norm_table.h also should be updated to the
> latest Unicode tables, as described in src/common/unicode. See attached
> patches. This also passes the tests described in
> src/common/unicode/README. (That is, the old code does not pass the
> current Unicode test file, but the updated code does pass it.)

Committed.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services