BUG #16222: [[:print:]] doesn't correctly handle Emoji skin tone modifiers on MacOS
The following bug has been logged on the website:
Bug reference: 16222
Logged by: Mack Earnhardt
Email address: mack@agilereasoning.com
PostgreSQL version: 11.6
Operating system: MacOS Catalina
Description:
On Linux heroku-18, these expressions both eval true:
select '✌'~'\A[[:print:]]*\Z';
select '✌🏻'~'\A[[:print:]]*\Z';
On MacOS Catalina, the 1st evals true but the 2nd evals false.
PG Bug reporting form <noreply@postgresql.org> writes:
On Linux heroku-18, these expressions both eval true:
select '✌'~'\A[[:print:]]*\Z';
select '✌🏻'~'\A[[:print:]]*\Z';
On MacOS Catalina, the 1st evals true but the 2nd evals false.
This is entirely a function of what your operating system's
locale support does. So it could be that you chose the wrong
LC_CTYPE setting for the macOS database -- in C locale, for
example, "false" is the right answer. However, we've observed
that macOS's UTF8-based locales seem pretty brain-dead about
handling of multibyte characters :-(. So it's likely that this
boils down to being Apple's bug. I haven't detected any interest
on their part in improving their POSIX locale support, unfortunately.
regards, tom lane
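
For reference, a minimal Python sketch that breaks the two test strings from the report into their code points. It is an illustration only and assumes the regex engine classifies each code point separately against the locale's print class; it reads Python's bundled Unicode tables, not the operating system's locale data, so it does not reproduce the macOS result. The helper name `describe` is hypothetical.

```python
import unicodedata


def describe(text):
    """Return (code point, UTF-8 byte count, general category, name) per character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",                 # code point in U+XXXX form
            len(ch.encode("utf-8")),            # number of bytes in UTF-8
            unicodedata.category(ch),           # Unicode general category
            unicodedata.name(ch, "<unnamed>"),  # character name, if known
        ))
    return rows


if __name__ == "__main__":
    # First string: the hand alone. Second: the hand plus a skin tone modifier.
    for sample in ("\u270c", "\u270c\U0001f3fb"):
        for row in describe(sample):
            print(row)
        print()
```

The second string turns out to be two code points, and the skin tone modifier (U+1F3FB) is a separate four-byte character outside the Basic Multilingual Plane, so `[[:print:]]*` matches only if the locale's tables treat that code point as printable.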
Hi Tom,
You’re correct. I thought the fact that Terminal and Vim both display correct-ish was enough to rule out the OS. It wasn’t.
The database LC_CTYPE is set to en_US.UTF-8, as is my bash terminal. When I put the two queries in a text file and use `egrep '^[[:print:]]+$'`, only the first line is recognized.
Thanks for helping me narrow this down!
-M
On Jan 21, 2020, at 12:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
On Linux heroku-18, these expressions both eval true:
select '✌'~'\A[[:print:]]*\Z';
select '✌🏻'~'\A[[:print:]]*\Z';
On MacOS Catalina, the 1st evals true but the 2nd evals false.
This is entirely a function of what your operating system's
locale support does. So it could be that you chose the wrong
LC_CTYPE setting for the macOS database -- in C locale, for
example, "false" is the right answer. However, we've observed
that macOS's UTF8-based locales seem pretty brain-dead about
handling of multibyte characters :-(. So it's likely that this
boils down to being Apple's bug. I haven't detected any interest
on their part in improving their POSIX locale support, unfortunately.
regards, tom lane