[PATCH] Add Hebrew and Arabic combining characters to unaccent.rules
Started by Noname, 7 months ago, 1 message
Hi,
This patch adds the ranges of combining diacritical marks in the Hebrew and Arabic Unicode blocks (cantillation marks, vowel points, etc.) to the list of code points that `unaccent` strips. A few punctuation code points are interspersed among the marks, so the ranges cannot be merged into larger contiguous blocks.
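To illustrate why the ranges end up fragmented, here is a minimal Python sketch (not part of the patch) that groups the combining marks of each block into contiguous runs and lists the code points that fall in the gaps. It assumes the Unicode general category `Mn` (nonspacing mark) is a reasonable stand-in for the marks the patch targets; the actual ranges in the patch may be chosen differently. The helper names `nonspacing_marks`, `contiguous_ranges`, and `gap_characters` are hypothetical.

```python
import unicodedata

# Block boundaries as defined by the Unicode standard.
BLOCKS = {"Hebrew": (0x0590, 0x05FF), "Arabic": (0x0600, 0x06FF)}


def nonspacing_marks(first, last):
    """Code points in [first, last] with general category Mn.

    Assumption: Mn approximates "combining diacritical mark" here;
    the patch itself may select its ranges by other criteria.
    """
    return [cp for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)) == "Mn"]


def contiguous_ranges(codepoints):
    """Collapse a sorted list of code points into (start, end) runs."""
    ranges = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))           # start a new run
    return ranges


def gap_characters(ranges):
    """Code points lying between consecutive runs."""
    gaps = []
    for (_, end), (start, _) in zip(ranges, ranges[1:]):
        gaps.extend(range(end + 1, start))
    return gaps


if __name__ == "__main__":
    for block, (first, last) in BLOCKS.items():
        ranges = contiguous_ranges(nonspacing_marks(first, last))
        print(block)
        for start, end in ranges:
            print(f"  marks U+{start:04X}..U+{end:04X}")
        for cp in gap_characters(ranges):
            name = unicodedata.name(chr(cp), "<unassigned>")
            cat = unicodedata.category(chr(cp))
            print(f"  gap   U+{cp:04X} {cat} {name}")
```

In the Hebrew block, for example, U+05BE HEBREW PUNCTUATION MAQAF sits between two runs of marks, so a single range covering both runs would also strip that punctuation. In the Arabic block the gaps also contain letters and digits, not only punctuation.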