How to display complicated Chinese character: Biang.

Started by jian he, almost 4 years ago, 2 messages, general

jian.universality@gmail.com

almost 4 years ago

Inspired by this thread:
/messages/by-id/011f01d8757e$f5d69700$e183c500$@ndensan.co.jp
I am trying to display some special Chinese characters in PostgreSQL. For now I
am using PostgreSQL 15 beta1. The OS is Ubuntu 20.

localhost:5433 admin@test=# show LC_COLLATE;
+------------+
| lc_collate |
+------------+
| C.UTF-8    |
+------------+

localhost:5433 admin@test=# select icu_unicode_version();
+---------------------+
| icu_unicode_version |
+---------------------+
| 13.0                |
+---------------------+

icu_unicode_version is a function provided by an extension, not by core PostgreSQL.

Wikipedia article about the character Biang: https://en.wikipedia.org/wiki/Biangbiang_noodles

quote:

The character's traditional and simplified forms were added to Unicode
<https://en.wikipedia.org/wiki/Unicode> version 13.0 in March 2020 in the CJK
Unified Ideographs Extension G
<https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_G> block
of the newly allocated Tertiary Ideographic Plane
<https://en.wikipedia.org/wiki/Tertiary_Ideographic_Plane>.[19]
<https://en.wikipedia.org/wiki/Biangbiang_noodles#cite_note-20> The
corresponding Unicode characters are:

Unicode character info: https://www.compart.com/en/unicode/U+30EDD

Query:

with strings(s) as (
  values (U&'\+0030EDD')
)
select s,
       octet_length(s),
       char_length(s),
       (select count(*) from icu_character_boundaries(s,'en')) as graphemes
from strings;

Result:

+-----+--------------+-------------+-----------+
| s   | octet_length | char_length | graphemes |
+-----+--------------+-------------+-----------+
| ロD |            4 |           2 |         2 |
+-----+--------------+-------------+-----------+

That does not seem right; shouldn't graphemes be 1?
I am also not sure whether values (U&'\+0030EDD') is the same as 𰻝.
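
One way to check that last point outside the database is to list the code points of the string that came back and compare them with the intended character. The following is only a small Python sketch; the helper name code_points is made up for illustration, and the two strings are the displayed result and the character U+30EDD from the links above.

```python
def code_points(s):
    """Return the code points of s in U+XXXX notation, one entry per character."""
    return [f"U+{ord(ch):04X}" for ch in s]

# The string shown in the "s" column of the result: KATAKANA LETTER RO plus "D".
returned = "\u30edD"
# The character that was intended: U+30EDD, the traditional form of Biang.
expected = "\U00030EDD"

print(code_points(returned))  # ['U+30ED', 'U+0044']
print(code_points(expected))  # ['U+30EDD']
print(returned == expected)   # False
```

Two code points instead of one would explain both char_length = 2 and graphemes = 2.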

--
I recommend David Deutsch's <<The Beginning of Infinity>>

Jian

Laurenz Albe

laurenz.albe@cybertec.at

almost 4 years ago

In reply to: jian he (#1)

Re: How to display complicated Chinese character: Biang.

On Thu, 2022-06-02 at 12:45 +0530, jian he wrote:

Trying to display some special Chinese characters in Postgresql.

localhost:5433 admin@test=# show LC_COLLATE;
+------------+
| lc_collate |
+------------+
| C.UTF-8    |
+------------+

with strings(s) as (
values (U&'\+0030EDD')
)
select s,
octet_length(s),
char_length(s),
(select count(*) from icu_character_boundaries(s,'en')) as graphemes from strings;
+-----+--------------+-------------+-----------+
| s   | octet_length | char_length | graphemes |
+-----+--------------+-------------+-----------+
| ロD |            4 |           2 |         2 |
+-----+--------------+-------------+-----------+
Seems not right. graphemes should be 1?

You have an extra "0" there; "\+" Unicode escapes have exactly 6 hexadecimal digits, so yours was read as U+30ED (ロ) followed by a literal "D". With the extra "0" removed:

WITH strings(s) AS (
  VALUES (U&'\+030EDD')
)
SELECT s,
       octet_length(s),
       char_length(s)
FROM strings;

 s  │ octet_length │ char_length
════╪══════════════╪═════════════
 𰻝 │            4 │           1
(1 row)
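
To make the fixed-width rule concrete, here is a rough Python sketch that imitates how the body of a U&'...' string is decoded. It is not PostgreSQL's parser: the function names are invented for illustration, only the default backslash escape character is handled, and malformed escapes and surrogate pairs are passed over, whereas the server would validate them.

```python
import re

# Escape forms in U&'...' strings with the default escape character:
#   \+XXXXXX  exactly 6 hex digits
#   \XXXX     exactly 4 hex digits
#   \\        a literal backslash
# Anything else is left untouched here (the real server raises an error).
_ESCAPE = re.compile(r"\\(?:\+([0-9A-Fa-f]{6})|([0-9A-Fa-f]{4})|(\\))")

def decode_unicode_escapes(body):
    """Decode the escapes in the body of a U&'...' literal (illustrative only)."""
    def replace(match):
        if match.group(3):
            return "\\"
        return chr(int(match.group(1) or match.group(2), 16))
    return _ESCAPE.sub(replace, body)

def describe(body):
    """Return (string, UTF-8 byte count, code point count)."""
    s = decode_unicode_escapes(body)
    return s, len(s.encode("utf-8")), len(s)

wrong = describe(r"\+0030EDD")  # 6 digits "0030ED" are consumed, "D" is left over
right = describe(r"\+030EDD")   # 6 digits "030EDD" give the single code point U+30EDD

print(wrong)  # ('ロD', 4, 2)
print(right)  # ('𰻝', 4, 1)
```

Both strings happen to take 4 bytes in UTF-8 (3 + 1 for the first, 4 for the second), which is why octet_length alone did not reveal the problem.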

PostgreSQL doesn't have a function "icu_character_boundaries".

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com