pgsql: Fix off-by-one with NFC recomposition for Hangul U+11A7 (TBASE)

Started by Michael Paquier 19 days ago (1 message, committers)
#1 Michael Paquier
michael@paquier.xyz

Fix off-by-one with NFC recomposition for Hangul U+11A7 (TBASE)

The NFC recomposition treated TBASE (U+11A7) as a valid trailing
consonant (T), which contradicts the Unicode specification: TBASE is
one below the start of the T range, which begins at U+11A8.
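The corrected bound can be sketched as a standalone predicate. This is a minimal illustration, not the code in unicode_norm.c. The constant names follow the Hangul algorithm in the Unicode Standard (section 3.12), and `is_trailing_jamo` is a hypothetical name. The key point is that the lower bound is exclusive:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TBASE  0x11A7u  /* one below the first trailing consonant */
#define TCOUNT 28u      /* T index 0 (no consonant) plus 27 trailing consonants */

static_assert(TBASE + 1 == 0x11A8, "first trailing consonant is U+11A8");

/* Can c combine as the trailing consonant (T) of an LV syllable? */
static bool is_trailing_jamo(uint32_t c)
{
    /* Exclusive lower bound: TBASE itself is not a trailing consonant. */
    return c > TBASE && c < TBASE + TCOUNT;
}
```

With this bound, the accepted range is U+11A8 through U+11C2, and U+11A7 is rejected.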

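As a result, a TBASE following an LV syllable was silently swallowed
during normalization. It was combined with the syllable using a T index
of zero, which leaves the syllable unchanged and produces an incorrect
result.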
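The following sketch shows how the old bound swallows the character. It is a minimal model of LV + T composition under the Unicode Hangul algorithm and is not PostgreSQL's `recompose_code()`. The `buggy` flag and the function names are hypothetical and exist only to compare the two bounds side by side:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hangul constants per the Unicode Standard, section 3.12 */
#define SBASE  0xAC00u
#define TBASE  0x11A7u
#define LCOUNT 19u
#define VCOUNT 21u
#define TCOUNT 28u
#define SCOUNT (LCOUNT * VCOUNT * TCOUNT)

static_assert(SCOUNT == 11172, "11172 precomposed Hangul syllables");

/* An LV syllable is a precomposed syllable whose T index is zero. */
static bool is_lv_syllable(uint32_t c)
{
    return c >= SBASE && c < SBASE + SCOUNT && (c - SBASE) % TCOUNT == 0;
}

/*
 * Try to compose an LV syllable with a following trailing consonant.
 * 'buggy' selects the pre-fix lower bound (t >= TBASE); otherwise the
 * corrected bound (t > TBASE) is used.  Returns false if no composition.
 */
static bool compose_lv_t(uint32_t lv, uint32_t t, bool buggy, uint32_t *result)
{
    bool in_range = (buggy ? t >= TBASE : t > TBASE) && t < TBASE + TCOUNT;

    if (!is_lv_syllable(lv) || !in_range)
        return false;
    /* The T index is t - TBASE; with t == TBASE it is 0, so lv is unchanged. */
    *result = lv + (t - TBASE);
    return true;
}
```

With the buggy bound, composing U+AC00 with U+11A7 succeeds and yields U+AC00 again, so U+11A7 disappears from the output. With the corrected bound, the composition is refused and both code points are kept. A genuine trailing consonant such as U+11A8 still composes, giving U+AC01.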

Regression tests are added to cover more patterns of Hangul
recomposition and decomposition, in addition to a test for the TBASE
problem. Diego submitted the code fix, and I wrote the tests.
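The committed regression tests are SQL, in unicode.sql. The C sketch below is a complementary illustration of the property those decomposition and recomposition patterns rely on, not a copy of the committed tests. Canonical decomposition never emits U+11A7, because a T index of 0 means "no trailing consonant". As a result, every precomposed syllable round-trips when recomposition uses the corrected bound. All function names here are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hangul constants per the Unicode Standard, section 3.12 */
#define SBASE  0xAC00u
#define LBASE  0x1100u
#define VBASE  0x1161u
#define TBASE  0x11A7u
#define LCOUNT 19u
#define VCOUNT 21u
#define TCOUNT 28u
#define NCOUNT (VCOUNT * TCOUNT)
#define SCOUNT (LCOUNT * NCOUNT)

/* Canonically decompose a precomposed syllable into 2 (LV) or 3 (LVT) jamo. */
static int decompose_hangul(uint32_t s, uint32_t out[3])
{
    assert(s >= SBASE && s < SBASE + SCOUNT);
    uint32_t sindex = s - SBASE;
    uint32_t tindex = sindex % TCOUNT;

    out[0] = LBASE + sindex / NCOUNT;
    out[1] = VBASE + (sindex % NCOUNT) / TCOUNT;
    if (tindex == 0)
        return 2;               /* no trailing consonant; TBASE is never emitted */
    out[2] = TBASE + tindex;    /* always in U+11A8..U+11C2 */
    return 3;
}

/* Recompose the output of decompose_hangul(), using the corrected T bound. */
static uint32_t recompose_hangul(const uint32_t *jamo, int n)
{
    uint32_t s = SBASE + ((jamo[0] - LBASE) * VCOUNT + (jamo[1] - VBASE)) * TCOUNT;

    if (n == 3 && jamo[2] > TBASE && jamo[2] < TBASE + TCOUNT)
        s += jamo[2] - TBASE;
    return s;
}

/* Return 1 if every precomposed syllable survives a round trip, else 0. */
static int hangul_round_trip_ok(void)
{
    for (uint32_t s = SBASE; s < SBASE + SCOUNT; s++)
    {
        uint32_t jamo[3];
        int n = decompose_hangul(s, jamo);

        if (recompose_hangul(jamo, n) != s)
            return 0;
    }
    return 1;
}
```

For example, U+D7A3 (the last precomposed syllable) decomposes into U+1112, U+1175, U+11C2, and U+AC00 decomposes into only two jamo, U+1100 and U+1161.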

Author: Diego Frias <mail@dzfrias.dev>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: /messages/by-id/B92ED640-7D4A-4505-B09F-3548F58CBB16@dzfrias.dev
Backpatch-through: 14

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/8bb935d619f6397ca91742195965d20b0ee5df6c

Modified Files
--------------
src/common/unicode_norm.c | 2 +-
src/test/regress/expected/unicode.out | 78 +++++++++++++++++++++++++++++++++++
src/test/regress/sql/unicode.sql | 20 +++++++++
3 files changed, 99 insertions(+), 1 deletion(-)