[PATCH] Add Hebrew and Arabic combining characters to unaccent.rules

Started by Noname, 7 months ago (1 message)
#1 Noname
e3718e7@tutamail.com

Hi,

This patch adds the combining diacritical mark ranges from the Hebrew and Arabic Unicode blocks (cantillation marks, vowel marks, etc.) to the list of code points that `unaccent` strips. A few punctuation code points are interspersed between these ranges, so they cannot be merged into larger contiguous blocks.
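To illustrate why the ranges end up fragmented, here is a minimal sketch using Python's standard `unicodedata` module. It is not taken from the patch: the helper names `combining_ranges` and `strip_marks` are hypothetical, and using general category `Mn` (nonspacing mark) as the selection criterion is an assumption made for illustration, so the ranges in the patch itself may differ.

```python
import unicodedata


def combining_ranges(first, last):
    """Return inclusive (start, end) runs of nonspacing marks (category Mn)
    found between code points first and last.

    Hypothetical helper for illustration; not part of the patch.
    """
    ranges = []
    start = None
    for cp in range(first, last + 1):
        if unicodedata.category(chr(cp)) == "Mn":
            if start is None:
                start = cp  # a new run of combining marks begins here
        elif start is not None:
            ranges.append((start, cp - 1))  # run ended at the previous code point
            start = None
    if start is not None:
        ranges.append((start, last))  # run extends to the end of the scanned span
    return ranges


def strip_marks(text):
    """Drop every nonspacing mark from text, roughly the effect the new
    rules are meant to have for these scripts (an assumption)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Hebrew cantillation marks and points live in U+0591..U+05C7.
for start, end in combining_ranges(0x0591, 0x05C7):
    print(f"U+{start:04X}..U+{end:04X}")
```

In the Hebrew span, the gaps between runs are punctuation such as U+05BE (MAQAF), U+05C0 (PASEQ), U+05C3 (SOF PASUQ) and U+05C6 (NUN HAFUKHA), which is why one large range cannot be used. As a usage example, `strip_marks("\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd")` removes the vowel points and the shin dot, leaving only the four base letters.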

Attachments:

0001-Add-Hebrew-and-Arabic-combining-characters-to-unacce.patch (application/octet-stream, +118 -4)