encoding affects ICU regex character classification

Started by Jeff Davis about 2 years ago (12 messages)
#1 Jeff Davis
pgsql@j-davis.com

The following query:

SELECT U&'\017D' ~ '[[:alpha:]]' collate "en-US-x-icu";

returns true if the server encoding is UTF8, and false if the server
encoding is LATIN9. That's a bug -- any behavior involving ICU should
be encoding-independent.

The problem seems to be confusion between pg_wchar and a unicode code
point in pg_wc_isalpha() and related functions.

It might be good to introduce some infrastructure here that can convert
a pg_wchar into a Unicode code point, or decode a string of bytes into
a string of 32-bit code points. Right now, that's possible, but it
involves pg_wchar2mb() followed by encoding conversion to UTF8,
followed by decoding the UTF8 to a code point. (Is there an easier path
that I missed?)
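
[Editor's sketch] To make the mismatch concrete: this is a minimal illustration, assuming that for a single-byte server encoding the pg_wchar value is simply the byte value. In LATIN9 (ISO-8859-15), U+017D is stored as byte 0xB4; read as a code point, 0xB4 is U+00B4 ACUTE ACCENT, which is not alphabetic. The helper name latin9_byte_to_codepoint() is hypothetical and not part of PostgreSQL; it covers only the eight positions where ISO-8859-15 differs from ISO-8859-1.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): map one LATIN9
 * (ISO-8859-15) byte to its Unicode code point. LATIN9 differs from
 * LATIN1 in exactly eight positions; every other byte maps to the
 * code point with the same numeric value.
 */
uint32_t
latin9_byte_to_codepoint(unsigned char b)
{
	switch (b)
	{
		case 0xA4: return 0x20AC;	/* EURO SIGN */
		case 0xA6: return 0x0160;	/* LATIN CAPITAL LETTER S WITH CARON */
		case 0xA8: return 0x0161;	/* LATIN SMALL LETTER S WITH CARON */
		case 0xB4: return 0x017D;	/* LATIN CAPITAL LETTER Z WITH CARON */
		case 0xB8: return 0x017E;	/* LATIN SMALL LETTER Z WITH CARON */
		case 0xBC: return 0x0152;	/* LATIN CAPITAL LIGATURE OE */
		case 0xBD: return 0x0153;	/* LATIN SMALL LIGATURE OE */
		case 0xBE: return 0x0178;	/* LATIN CAPITAL LETTER Y WITH DIAERESIS */
		default:   return b;		/* identical to LATIN1 / Unicode */
	}
}
```

Classifying latin9_byte_to_codepoint(0xB4) rather than the raw 0xB4 is the kind of conversion step the proposed infrastructure would generalize to all encodings.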

One wrinkle is MULE_INTERNAL, which doesn't have any conversion path to
UTF8. That's not important for ICU (because ICU is not allowed for that
encoding), but I'd like it if we could make this infrastructure
independent of ICU, because I have some follow-up proposals to simplify
character classification here and in ts_locale.c.

Thoughts?

Regards,
Jeff Davis

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#1)
Re: encoding affects ICU regex character classification

Jeff Davis <pgsql@j-davis.com> writes:

> The problem seems to be confusion between pg_wchar and a unicode code
> point in pg_wc_isalpha() and related functions.

Yeah, that's an ancient sore spot: we don't really know what the
representation of wchar is. We assume it's Unicode code points
for UTF8 locales, but libc isn't required to do that AFAIK. See
comment block starting about line 20 in regc_pg_locale.c.

I doubt that ICU has much to do with this directly.

We'd have to find an alternate source of knowledge to replace the
<wctype.h> functions if we wanted to fix it fully ... can ICU do that?

regards, tom lane

#3 Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#2)
3 attachment(s)
Re: encoding affects ICU regex character classification

On Wed, 2023-11-29 at 18:56 -0500, Tom Lane wrote:

> We'd have to find an alternate source of knowledge to replace the
> <wctype.h> functions if we wanted to fix it fully ... can ICU do
> that?

My follow-up proposal is exactly along those lines, except that we
don't even need ICU.

By adding a couple lookup tables generated from the Unicode data files,
we can offer a pg_u_isalpha() family of functions. As a bonus, I have
some exhaustive tests to compare with what ICU does so we can protect
ourselves from simple mistakes.
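
[Editor's sketch] The lookup-table idea can be illustrated with a small stand-alone example. This is only a sketch: the table below is a truncated excerpt (the first eight Alphabetic ranges as they appear in the attached patch, covering code points up to U+01BA), and the names demo_range, demo_alphabetic and demo_isalpha are hypothetical stand-ins for the patch's pg_unicode_range, unicode_alphabetic and pg_u_isalpha().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of Unicode code points. */
typedef struct
{
	uint32_t	first;
	uint32_t	last;
} demo_range;

/*
 * Truncated excerpt of the Alphabetic ranges, sorted and non-overlapping.
 * Anything above U+01BA is reported as non-alphabetic by this demo table.
 */
static const demo_range demo_alphabetic[] = {
	{0x0041, 0x005a},
	{0x0061, 0x007a},
	{0x00aa, 0x00aa},
	{0x00b5, 0x00b5},
	{0x00ba, 0x00ba},
	{0x00c0, 0x00d6},
	{0x00d8, 0x00f6},
	{0x00f8, 0x01ba}
};

/* Binary search: is the code point inside any range of the table? */
bool
demo_isalpha(uint32_t code)
{
	size_t		lo = 0;
	size_t		hi = sizeof(demo_alphabetic) / sizeof(demo_alphabetic[0]);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (code > demo_alphabetic[mid].last)
			lo = mid + 1;
		else if (code < demo_alphabetic[mid].first)
			hi = mid;
		else
			return true;
	}

	return false;
}
```

With this excerpt, demo_isalpha(0x017D) is true while demo_isalpha(0x00B4) is false, which mirrors the UTF8 versus LATIN9 discrepancy reported at the top of the thread.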

I might as well send it now; patch attached (0003 is the interesting
one).

I also tested against the iswalpha() family of functions, and those
have very similar behavior (apart from the "C" locale, of course).
Character classification is not localized at all in libc or ICU as far
as I can tell.

There are some differences, and I don't understand why those
differences exist, so perhaps that's worth discussing. Some differences
seem to be related to the titlecase/uppercase distinction. Others are
strange, like how glibc counts some digit characters (outside 0-9) as
alphabetic. And some seem arbitrary, like excluding a few whitespace
characters. I can try to post more details if that would be helpful.

Another issue is that right now we are doing the wrong thing with ICU:
we should be using the u_isUAlphabetic() family of functions, not the
u_isalpha() family of functions.

Regards,
Jeff Davis

Attachments:

v1-0003-Add-Unicode-property-tables.patch (text/x-patch; charset=UTF-8)
From f0c004846542f6b415005f9f9c949199b3f3bcfd Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sat, 18 Nov 2023 15:34:24 -0800
Subject: [PATCH v1 3/3] Add Unicode property tables.

---
 src/common/unicode/Makefile                   |    6 +-
 src/common/unicode/category_test.c            |  167 +-
 .../generate-unicode_category_table.pl        |  203 +-
 src/common/unicode/meson.build                |    4 +-
 src/common/unicode_category.c                 |  210 +-
 src/include/common/unicode_category.h         |   19 +-
 src/include/common/unicode_category_table.h   | 2532 +++++++++++++++++
 7 files changed, 3078 insertions(+), 63 deletions(-)

diff --git a/src/common/unicode/Makefile b/src/common/unicode/Makefile
index 04d81dd5cb..27f0408d8b 100644
--- a/src/common/unicode/Makefile
+++ b/src/common/unicode/Makefile
@@ -29,13 +29,13 @@ update-unicode: unicode_category_table.h unicode_east_asian_fw_table.h unicode_n
 # These files are part of the Unicode Character Database. Download
 # them on demand.  The dependency on Makefile.global is for
 # UNICODE_VERSION.
-CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt: $(top_builddir)/src/Makefile.global
+CompositionExclusions.txt DerivedCoreProperties.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt PropList.txt UnicodeData.txt: $(top_builddir)/src/Makefile.global
 	$(DOWNLOAD) https://www.unicode.org/Public/$(UNICODE_VERSION)/ucd/$(@F)
 
 unicode_version.h: generate-unicode_version.pl
 	$(PERL) $< --version $(UNICODE_VERSION)
 
-unicode_category_table.h: generate-unicode_category_table.pl UnicodeData.txt
+unicode_category_table.h: generate-unicode_category_table.pl DerivedCoreProperties.txt PropList.txt UnicodeData.txt
 	$(PERL) $<
 
 # Generation of conversion tables used for string normalization with
@@ -82,4 +82,4 @@ clean:
 	rm -f $(OBJS) category_test category_test.o norm_test norm_test.o
 
 distclean: clean
-	rm -f CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt norm_test_table.h unicode_category_table.h unicode_norm_table.h
+	rm -f CompositionExclusions.txt DerivedCoreProperties.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt PropList.txt UnicodeData.txt norm_test_table.h unicode_category_table.h unicode_norm_table.h
diff --git a/src/common/unicode/category_test.c b/src/common/unicode/category_test.c
index d9ea806eb8..02b4f0698a 100644
--- a/src/common/unicode/category_test.c
+++ b/src/common/unicode/category_test.c
@@ -1,6 +1,7 @@
 /*-------------------------------------------------------------------------
  * category_test.c
- *		Program to test Unicode general category functions.
+ *		Program to test Unicode general category and character class
+ *		functions.
  *
  * Portions Copyright (c) 2017-2023, PostgreSQL Global Development Group
  *
@@ -14,17 +15,21 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-
 #ifdef USE_ICU
 #include <unicode/uchar.h>
 #endif
-#include "common/unicode_category.h"
+
 #include "common/unicode_version.h"
+#include "common/unicode_category.h"
+
+static int	pg_unicode_version = 0;
+#ifdef USE_ICU
+static int	icu_unicode_version = 0;
+#endif
 
 /*
  * Parse version into integer for easy comparison.
  */
-#ifdef USE_ICU
 static int
 parse_unicode_version(const char *version)
 {
@@ -39,56 +44,116 @@ parse_unicode_version(const char *version)
 
 	return major * 100 + minor;
 }
-#endif
 
+#ifdef USE_ICU
 /*
- * Exhaustively test that the Unicode category for each codepoint matches that
- * returned by ICU.
+ * Test Postgres Unicode tables by comparing with ICU. Test the General
+ * Category, as well as the properties Alphabetic, Lowercase, Uppercase,
+ * White_Space, and Hex_Digit.
  */
-int
-main(int argc, char **argv)
+static void
+icu_test()
 {
-#ifdef USE_ICU
-	int			pg_unicode_version = parse_unicode_version(PG_UNICODE_VERSION);
-	int			icu_unicode_version = parse_unicode_version(U_UNICODE_VERSION);
 	int			pg_skipped_codepoints = 0;
 	int			icu_skipped_codepoints = 0;
 
-	printf("category_test: Postgres Unicode version:\t%s\n", PG_UNICODE_VERSION);
-	printf("category_test: ICU Unicode version:\t\t%s\n", U_UNICODE_VERSION);
-
-	for (UChar32 code = 0; code <= 0x10ffff; code++)
+	for (pg_wchar code = 0; code <= 0x10ffff; code++)
 	{
 		uint8_t		pg_category = unicode_category(code);
 		uint8_t		icu_category = u_charType(code);
 
+		bool		isalpha = pg_u_isalpha(code);
+		bool		islower = pg_u_islower(code);
+		bool		isupper = pg_u_isupper(code);
+		bool		ispunct = pg_u_ispunct(code);
+		bool		isdigit = pg_u_isdigit(code);
+		bool		isxdigit = pg_u_isxdigit(code);
+		bool		isalnum = pg_u_isalnum(code);
+		bool		isspace = pg_u_isspace(code);
+		bool		isblank = pg_u_isblank(code);
+		bool		iscntrl = pg_u_iscntrl(code);
+		bool		isgraph = pg_u_isgraph(code);
+		bool		isprint = pg_u_isprint(code);
+
+		bool		icu_isalpha = u_isUAlphabetic(code);
+		bool		icu_islower = u_isULowercase(code);
+		bool		icu_isupper = u_isUUppercase(code);
+		bool		icu_ispunct = u_ispunct(code);
+		bool		icu_isdigit = u_isdigit(code);
+
+		/*
+		 * ICU documents that UCHAR_POSIX_XDIGIT should match the xdigit class
+		 * in: http://www.unicode.org/reports/tr18/#Compatibility_Properties,
+		 * but it does not, so we use UCHAR_HEX_DIGIT instead.
+		 */
+		bool		icu_isxdigit = u_hasBinaryProperty(code,
+													   UCHAR_HEX_DIGIT);
+
+		bool		icu_isalnum = u_hasBinaryProperty(code,
+													  UCHAR_POSIX_ALNUM);
+		bool		icu_isspace = u_isUWhiteSpace(code);
+		bool		icu_isblank = u_isblank(code);
+		bool		icu_iscntrl = icu_category == PG_U_CONTROL;
+		bool		icu_isgraph = u_hasBinaryProperty(code,
+													  UCHAR_POSIX_GRAPH);
+		bool		icu_isprint = u_hasBinaryProperty(code,
+													  UCHAR_POSIX_PRINT);
+
+		/*
+		 * A version mismatch means that some assigned codepoints in the newer
+		 * version may be unassigned in the older version. That's OK, though
+		 * the test will not cover those codepoints marked unassigned in the
+		 * older version (that is, it will no longer be an exhaustive test).
+		 */
+		if (pg_category == PG_U_UNASSIGNED &&
+			icu_category != PG_U_UNASSIGNED &&
+			pg_unicode_version < icu_unicode_version)
+		{
+			pg_skipped_codepoints++;
+			continue;
+		}
+
+		if (icu_category == PG_U_UNASSIGNED &&
+			pg_category != PG_U_UNASSIGNED &&
+			icu_unicode_version < pg_unicode_version)
+		{
+			icu_skipped_codepoints++;
+			continue;
+		}
+
 		if (pg_category != icu_category)
 		{
-			/*
-			 * A version mismatch means that some assigned codepoints in the
-			 * newer version may be unassigned in the older version. That's
-			 * OK, though the test will not cover those codepoints marked
-			 * unassigned in the older version (that is, it will no longer be
-			 * an exhaustive test).
-			 */
-			if (pg_category == PG_U_UNASSIGNED &&
-				pg_unicode_version < icu_unicode_version)
-				pg_skipped_codepoints++;
-			else if (icu_category == PG_U_UNASSIGNED &&
-					 icu_unicode_version < pg_unicode_version)
-				icu_skipped_codepoints++;
-			else
-			{
-				printf("category_test: FAILURE for codepoint 0x%06x\n", code);
-				printf("category_test: Postgres category:	%02d %s %s\n", pg_category,
-					   unicode_category_abbrev(pg_category),
-					   unicode_category_string(pg_category));
-				printf("category_test: ICU category:		%02d %s %s\n", icu_category,
-					   unicode_category_abbrev(icu_category),
-					   unicode_category_string(icu_category));
-				printf("\n");
-				exit(1);
-			}
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres category:	%02d %s %s\n", pg_category,
+				   unicode_category_abbrev(pg_category),
+				   unicode_category_string(pg_category));
+			printf("category_test: ICU category:		%02d %s %s\n", icu_category,
+				   unicode_category_abbrev(icu_category),
+				   unicode_category_string(icu_category));
+			printf("\n");
+			exit(1);
+		}
+
+		if (isalpha != icu_isalpha ||
+			islower != icu_islower ||
+			isupper != icu_isupper ||
+			ispunct != icu_ispunct ||
+			isdigit != icu_isdigit ||
+			isxdigit != icu_isxdigit ||
+			isalnum != icu_isalnum ||
+			isspace != icu_isspace ||
+			isblank != icu_isblank ||
+			iscntrl != icu_iscntrl ||
+			isgraph != icu_isgraph ||
+			isprint != icu_isprint)
+		{
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres	property	alpha/lower/upper/punct/digit/xdigit/alnum/space/blank/cntrl/graph/print: %d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d\n",
+				   isalpha, islower, isupper, ispunct, isdigit, isxdigit, isalnum, isspace, isblank, iscntrl, isgraph, isprint);
+			printf("category_test: ICU property		alpha/lower/upper/punct/digit/xdigit/alnum/space/blank/cntrl/graph/print: %d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d\n",
+				   icu_isalpha, icu_islower, icu_isupper, icu_ispunct, icu_isdigit, icu_isxdigit, icu_isalnum, icu_isspace, icu_isblank, icu_iscntrl, icu_isgraph, icu_isprint);
+			printf("\n");
+			exit(1);
 		}
 	}
 
@@ -99,10 +164,22 @@ main(int argc, char **argv)
 		printf("category_test: skipped %d codepoints unassigned in ICU due to Unicode version mismatch\n",
 			   icu_skipped_codepoints);
 
-	printf("category_test: success\n");
-	exit(0);
+	printf("category_test: ICU test successful\n");
+}
+#endif
+
+int
+main(int argc, char **argv)
+{
+	pg_unicode_version = parse_unicode_version(PG_UNICODE_VERSION);
+	printf("category_test: Postgres Unicode version:\t%s\n", PG_UNICODE_VERSION);
+
+#ifdef USE_ICU
+	icu_unicode_version = parse_unicode_version(U_UNICODE_VERSION);
+	printf("category_test: ICU Unicode version:\t\t%s\n", U_UNICODE_VERSION);
+
+	icu_test();
 #else
-	printf("category_test: ICU support required for test; skipping\n");
-	exit(0);
+	printf("category_test: ICU not available; skipping\n");
 #endif
 }
diff --git a/src/common/unicode/generate-unicode_category_table.pl b/src/common/unicode/generate-unicode_category_table.pl
index 992b877ede..9545728443 100644
--- a/src/common/unicode/generate-unicode_category_table.pl
+++ b/src/common/unicode/generate-unicode_category_table.pl
@@ -120,8 +120,6 @@ if ($range_category ne $CATEGORY_UNASSIGNED) {
 							category => $range_category});
 }
 
-my $num_ranges = scalar @category_ranges;
-
 # See: https://www.unicode.org/reports/tr44/#General_Category_Values
 my $categories = {
 	Cn => 'PG_U_UNASSIGNED',
@@ -156,11 +154,98 @@ my $categories = {
 	Pf => 'PG_U_FINAL_PUNCTUATION'
 };
 
-# Start writing out the output files
+# Find White_Space and Hex_Digit characters
+my @white_space = ();
+my @hex_digits = ();
+my @join_control = ();
+open($FH, '<', "$output_path/PropList.txt")
+  or die "Could not open $output_path/PropList.txt: $!.";
+while (my $line = <$FH>)
+{
+	my $pattern = qr/([0-9A-F\.]+)\s*;\s*(\w+)\s*#.*/s;
+	next unless $line =~ $pattern;
+
+	my $code = $line =~ s/$pattern/$1/rg;
+	my $property = $line =~ s/$pattern/$2/rg;
+	my $start;
+	my $end;
+
+	if ($code =~ /\.\./) {
+		# code range
+	    my @sp = split /\.\./, $code;
+		$start = hex($sp[0]);
+		$end = hex($sp[1]);
+	} else {
+		# single code point
+		$start = hex($code);
+		$end = hex($code);
+	}
+
+	if ($property eq "White_Space") {
+		push @white_space, {start => $start, end => $end};
+	}
+	elsif ($property eq "Hex_Digit") {
+		push @hex_digits, {start => $start, end => $end};
+	}
+	elsif ($property eq "Join_Control") {
+		push @join_control, {start => $start, end => $end};
+	}
+}
+
+# Find Alphabetic, Lowercase, and Uppercase characters
+my @alphabetic = ();
+my @lowercase = ();
+my @uppercase = ();
+open($FH, '<', "$output_path/DerivedCoreProperties.txt")
+  or die "Could not open $output_path/DerivedCoreProperties.txt: $!.";
+while (my $line = <$FH>)
+{
+	my $pattern = qr/^([0-9A-F\.]+)\s*;\s*(\w+)\s*#.*$/s;
+	next unless $line =~ $pattern;
+
+	my $code = $line =~ s/$pattern/$1/rg;
+	my $property = $line =~ s/$pattern/$2/rg;
+	my $start;
+	my $end;
+
+	if ($code =~ /\.\./) {
+		# code range
+	    my @sp = split /\.\./, $code;
+	    die "line: {$line} code: {$code} sp[0] {$sp[0]} sp[1] {$sp[1]}"
+		  unless $sp[0] =~ /^[0-9A-F]+$/ &&  $sp[1] =~ /^[0-9A-F]+$/;
+		$start = hex($sp[0]);
+		$end = hex($sp[1]);
+	} else {
+	    die "line: {$line} code: {$code}" unless $code =~ /^[0-9A-F]+$/;
+		# single code point
+		$start = hex($code);
+		$end = hex($code);
+	}
+
+	if ($property eq "Alphabetic") {
+		push @alphabetic, {start => $start, end => $end};
+	}
+	elsif ($property eq "Lowercase") {
+		push @lowercase, {start => $start, end => $end};
+	}
+	elsif ($property eq "Uppercase") {
+		push @uppercase, {start => $start, end => $end};
+	}
+}
+
+my $num_category_ranges = scalar @category_ranges;
+my $num_alphabetic_ranges = scalar @alphabetic;
+my $num_lowercase_ranges = scalar @lowercase;
+my $num_uppercase_ranges = scalar @uppercase;
+my $num_white_space_ranges = scalar @white_space;
+my $num_hex_digit_ranges = scalar @hex_digits;
+my $num_join_control_ranges = scalar @join_control;
+
+# Start writing out the output file
 open my $OT, '>', $output_table_file
   or die "Could not open output file $output_table_file: $!\n";
 
-print $OT <<HEADER;
+print $OT <<"HEADER";
 /*-------------------------------------------------------------------------
  *
  * unicode_category_table.h
@@ -188,11 +273,20 @@ typedef struct
 	uint8		category;		/* General Category */
 }			pg_category_range;
 
-/* table of Unicode codepoint ranges and their categories */
-static const pg_category_range unicode_categories[$num_ranges] =
+typedef struct
 {
+	uint32		first;			/* Unicode codepoint */
+	uint32		last;			/* Unicode codepoint */
+}			pg_unicode_range;
+
 HEADER
 
+print $OT <<"CATEGORY_TABLE";
+/* table of Unicode codepoint ranges and their categories */
+static const pg_category_range unicode_categories[$num_category_ranges] =
+{
+CATEGORY_TABLE
+
 my $firsttime = 1;
 foreach my $range (@category_ranges) {
 	printf $OT ",\n" unless $firsttime;
@@ -202,4 +296,101 @@ foreach my $range (@category_ranges) {
 	die "category missing: $range->{category}" unless $category;
 	printf $OT "\t{0x%06x, 0x%06x, %s}", $range->{start}, $range->{end}, $category;
 }
+
+print $OT "\n};\n\n";
+
+print $OT <<"ALPHABETIC_TABLE";
+/* table of Unicode codepoint ranges of Alphabetic characters */
+static const pg_unicode_range unicode_alphabetic[$num_alphabetic_ranges] =
+{
+ALPHABETIC_TABLE
+
+$firsttime = 1;
+foreach my $range (@alphabetic) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"LOWERCASE_TABLE";
+/* table of Unicode codepoint ranges of Lowercase characters */
+static const pg_unicode_range unicode_lowercase[$num_lowercase_ranges] =
+{
+LOWERCASE_TABLE
+
+$firsttime = 1;
+foreach my $range (@lowercase) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"UPPERCASE_TABLE";
+/* table of Unicode codepoint ranges of Uppercase characters */
+static const pg_unicode_range unicode_uppercase[$num_uppercase_ranges] =
+{
+UPPERCASE_TABLE
+
+$firsttime = 1;
+foreach my $range (@uppercase) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"WHITE_SPACE_TABLE";
+/* table of Unicode codepoint ranges of White_Space characters */
+static const pg_unicode_range unicode_white_space[$num_white_space_ranges] =
+{
+WHITE_SPACE_TABLE
+
+$firsttime = 1;
+foreach my $range (@white_space) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"HEX_DIGITS_TABLE";
+/* table of Unicode codepoint ranges of Hex_Digit characters */
+static const pg_unicode_range unicode_hex_digit[$num_hex_digit_ranges] =
+{
+HEX_DIGITS_TABLE
+
+$firsttime = 1;
+foreach my $range (@hex_digits) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"JOIN_CONTROL_TABLE";
+/* table of Unicode codepoint ranges of Join_Control characters */
+static const pg_unicode_range unicode_join_control[$num_join_control_ranges] =
+{
+JOIN_CONTROL_TABLE
+
+$firsttime = 1;
+foreach my $range (@join_control) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
 print $OT "\n};\n";
diff --git a/src/common/unicode/meson.build b/src/common/unicode/meson.build
index e8cfdc1df4..3526ddb846 100644
--- a/src/common/unicode/meson.build
+++ b/src/common/unicode/meson.build
@@ -11,7 +11,7 @@ endif
 
 # These files are part of the Unicode Character Database. Download them on
 # demand.
-foreach f : ['CompositionExclusions.txt', 'DerivedNormalizationProps.txt', 'EastAsianWidth.txt', 'NormalizationTest.txt', 'UnicodeData.txt']
+foreach f : ['CompositionExclusions.txt', 'DerivedCoreProperties.txt', 'DerivedNormalizationProps.txt', 'EastAsianWidth.txt', 'NormalizationTest.txt', 'PropList.txt', 'UnicodeData.txt']
   url = unicode_baseurl.format(UNICODE_VERSION, f)
   target = custom_target(f,
     output: f,
@@ -26,7 +26,7 @@ update_unicode_targets = []
 
 update_unicode_targets += \
   custom_target('unicode_category_table.h',
-    input: [unicode_data['UnicodeData.txt']],
+    input: [unicode_data['UnicodeData.txt'], unicode_data['DerivedCoreProperties.txt'], unicode_data['PropList.txt']],
     output: ['unicode_category_table.h'],
     command: [
       perl, files('generate-unicode_category_table.pl'),
diff --git a/src/common/unicode_category.c b/src/common/unicode_category.c
index 189cd6eca3..0b7a9947cc 100644
--- a/src/common/unicode_category.c
+++ b/src/common/unicode_category.c
@@ -1,6 +1,8 @@
 /*-------------------------------------------------------------------------
  * unicode_category.c
- *		Determine general category of Unicode characters.
+ *		Determine general category and character class of Unicode
+ *		characters. Encoding must be UTF8, where we assume that the pg_wchar
+ *		representation is a code point.
  *
  * Portions Copyright (c) 2017-2023, PostgreSQL Global Development Group
  *
@@ -18,24 +20,78 @@
 #include "common/unicode_category.h"
 #include "common/unicode_category_table.h"
 
+/*
+ * We use a mask word for convenience when testing for multiple categories at
+ * once. The number of Unicode General Categories should never grow, so a
+ * 32-bit mask is fine.
+ */
+#define PG_U_CATEGORY_MASK(X) ((uint32)(1 << (X)))
+
+#define PG_U_LU_MASK PG_U_CATEGORY_MASK(PG_U_UPPERCASE_LETTER)
+#define PG_U_LL_MASK PG_U_CATEGORY_MASK(PG_U_LOWERCASE_LETTER)
+#define PG_U_LT_MASK PG_U_CATEGORY_MASK(PG_U_TITLECASE_LETTER)
+#define PG_U_LC_MASK (PG_U_LU_MASK|PG_U_LL_MASK|PG_U_LT_MASK)
+#define PG_U_LM_MASK PG_U_CATEGORY_MASK(PG_U_MODIFIER_LETTER)
+#define PG_U_LO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_LETTER)
+#define PG_U_L_MASK (PG_U_LU_MASK|PG_U_LL_MASK|PG_U_LT_MASK|PG_U_LM_MASK|\
+					 PG_U_LO_MASK)
+#define PG_U_MN_MASK PG_U_CATEGORY_MASK(PG_U_NONSPACING_MARK)
+#define PG_U_ME_MASK PG_U_CATEGORY_MASK(PG_U_ENCLOSING_MARK)
+#define PG_U_MC_MASK PG_U_CATEGORY_MASK(PG_U_SPACING_MARK)
+#define PG_U_M_MASK (PG_U_MN_MASK|PG_U_MC_MASK|PG_U_ME_MASK)
+#define PG_U_ND_MASK PG_U_CATEGORY_MASK(PG_U_DECIMAL_NUMBER)
+#define PG_U_NL_MASK PG_U_CATEGORY_MASK(PG_U_LETTER_NUMBER)
+#define PG_U_NO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_NUMBER)
+#define PG_U_N_MASK (PG_U_ND_MASK|PG_U_NL_MASK|PG_U_NO_MASK)
+#define PG_U_PC_MASK PG_U_CATEGORY_MASK(PG_U_CONNECTOR_PUNCTUATION)
+#define PG_U_PD_MASK PG_U_CATEGORY_MASK(PG_U_DASH_PUNCTUATION)
+#define PG_U_PS_MASK PG_U_CATEGORY_MASK(PG_U_OPEN_PUNCTUATION)
+#define PG_U_PE_MASK PG_U_CATEGORY_MASK(PG_U_CLOSE_PUNCTUATION)
+#define PG_U_PI_MASK PG_U_CATEGORY_MASK(PG_U_INITIAL_PUNCTUATION)
+#define PG_U_PF_MASK PG_U_CATEGORY_MASK(PG_U_FINAL_PUNCTUATION)
+#define PG_U_PO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_PUNCTUATION)
+#define PG_U_P_MASK (PG_U_PC_MASK|PG_U_PD_MASK|PG_U_PS_MASK|PG_U_PE_MASK|\
+					 PG_U_PI_MASK|PG_U_PF_MASK|PG_U_PO_MASK)
+#define PG_U_SM_MASK PG_U_CATEGORY_MASK(PG_U_MATH_SYMBOL)
+#define PG_U_SC_MASK PG_U_CATEGORY_MASK(PG_U_CURRENCY_SYMBOL)
+#define PG_U_SK_MASK PG_U_CATEGORY_MASK(PG_U_MODIFIER_SYMBOL)
+#define PG_U_SO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_SYMBOL)
+#define PG_U_S_MASK (PG_U_SM_MASK|PG_U_SC_MASK|PG_U_SK_MASK|PG_U_SO_MASK)
+#define PG_U_ZS_MASK PG_U_CATEGORY_MASK(PG_U_SPACE_SEPARATOR)
+#define PG_U_ZL_MASK PG_U_CATEGORY_MASK(PG_U_LINE_SEPARATOR)
+#define PG_U_ZP_MASK PG_U_CATEGORY_MASK(PG_U_PARAGRAPH_SEPARATOR)
+#define PG_U_Z_MASK (PG_U_ZS_MASK|PG_U_ZL_MASK|PG_U_ZP_MASK)
+#define PG_U_CC_MASK PG_U_CATEGORY_MASK(PG_U_CONTROL)
+#define PG_U_CF_MASK PG_U_CATEGORY_MASK(PG_U_FORMAT)
+#define PG_U_CS_MASK PG_U_CATEGORY_MASK(PG_U_SURROGATE)
+#define PG_U_CO_MASK PG_U_CATEGORY_MASK(PG_U_PRIVATE_USE)
+#define PG_U_CN_MASK PG_U_CATEGORY_MASK(PG_U_UNASSIGNED)
+#define PG_U_C_MASK (PG_U_CC_MASK|PG_U_CF_MASK|PG_U_CS_MASK|PG_U_CO_MASK|\
+					 PG_U_CN_MASK)
+
+#define PG_U_CHARACTER_TAB	0x09
+
+static bool range_search(const pg_unicode_range * tbl, Size size,
+						 pg_wchar code);
+
 /*
  * Unicode general category for the given codepoint.
  */
 pg_unicode_category
-unicode_category(pg_wchar ucs)
+unicode_category(pg_wchar code)
 {
 	int			min = 0;
 	int			mid;
 	int			max = lengthof(unicode_categories) - 1;
 
-	Assert(ucs <= 0x10ffff);
+	Assert(code <= 0x10ffff);
 
 	while (max >= min)
 	{
 		mid = (min + max) / 2;
-		if (ucs > unicode_categories[mid].last)
+		if (code > unicode_categories[mid].last)
 			min = mid + 1;
-		else if (ucs < unicode_categories[mid].first)
+		else if (code < unicode_categories[mid].first)
 			max = mid - 1;
 		else
 			return unicode_categories[mid].category;
@@ -44,6 +100,123 @@ unicode_category(pg_wchar ucs)
 	return PG_U_UNASSIGNED;
 }
 
+/*
+ * The following functions implement the regex character classification as
+ * described at: http://www.unicode.org/reports/tr18/#Compatibility_Properties
+ */
+
+bool
+pg_u_isdigit(pg_wchar code)
+{
+	return unicode_category(code) == PG_U_DECIMAL_NUMBER;
+}
+
+bool
+pg_u_isalpha(pg_wchar code)
+{
+	return range_search(unicode_alphabetic, lengthof(unicode_alphabetic),
+						code);
+}
+
+bool
+pg_u_isalnum(pg_wchar code)
+{
+	return pg_u_isalpha(code) || pg_u_isdigit(code);
+}
+
+bool
+pg_u_isword(pg_wchar code)
+{
+	uint32 category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+
+	return
+		category_mask & (PG_U_M_MASK|PG_U_ND_MASK|PG_U_PC_MASK) ||
+		pg_u_isalpha(code) ||
+		range_search(unicode_join_control, lengthof(unicode_join_control),
+					 code);
+}
+
+bool
+pg_u_isupper(pg_wchar code)
+{
+	return range_search(unicode_uppercase, lengthof(unicode_uppercase), code);
+}
+
+bool
+pg_u_islower(pg_wchar code)
+{
+	return range_search(unicode_lowercase, lengthof(unicode_lowercase), code);
+}
+
+bool
+pg_u_isblank(pg_wchar code)
+{
+	return code == PG_U_CHARACTER_TAB ||
+		unicode_category(code) == PG_U_SPACE_SEPARATOR;
+}
+
+bool
+pg_u_iscntrl(pg_wchar code)
+{
+	return unicode_category(code) == PG_U_CONTROL;
+}
+
+bool
+pg_u_isgraph(pg_wchar code)
+{
+	uint32 category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+
+	if (category_mask & (PG_U_CC_MASK|PG_U_CS_MASK|PG_U_CN_MASK) ||
+		pg_u_isspace(code))
+		return false;
+	return true;
+}
+
+bool
+pg_u_isprint(pg_wchar code)
+{
+	pg_unicode_category category = unicode_category(code);
+
+	if (category == PG_U_CONTROL)
+		return false;
+
+	return pg_u_isgraph(code) || pg_u_isblank(code);
+}
+
+bool
+pg_u_ispunct(pg_wchar code)
+{
+	uint32 category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+
+	return category_mask & PG_U_P_MASK;
+}
+
+/* posix variant */
+bool
+pg_u_ispunct_posix(pg_wchar code)
+{
+	uint32 category_mask;
+
+	if (pg_u_isalpha(code))
+		return false;
+
+	category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+	return category_mask & (PG_U_P_MASK|PG_U_S_MASK);
+}
+
+bool
+pg_u_isspace(pg_wchar code)
+{
+	return range_search(unicode_white_space, lengthof(unicode_white_space),
+						code);
+}
+
+bool
+pg_u_isxdigit(pg_wchar code)
+{
+	return range_search(unicode_hex_digit, lengthof(unicode_hex_digit), code);
+}
+
 /*
  * Description of Unicode general category.
  */
@@ -191,3 +364,30 @@ unicode_category_abbrev(pg_unicode_category category)
 	Assert(false);
 	return "??";				/* keep compiler quiet */
 }
+
+/*
+ * Binary search to test if given codepoint exists in one of the ranges in the
+ * given table.
+ */
+static bool
+range_search(const pg_unicode_range * tbl, Size size, pg_wchar code)
+{
+	int			min = 0;
+	int			mid;
+	int			max = size - 1;
+
+	Assert(code <= 0x10ffff);
+
+	while (max >= min)
+	{
+		mid = (min + max) / 2;
+		if (code > tbl[mid].last)
+			min = mid + 1;
+		else if (code < tbl[mid].first)
+			max = mid - 1;
+		else
+			return true;
+	}
+
+	return false;
+}
diff --git a/src/include/common/unicode_category.h b/src/include/common/unicode_category.h
index 81d38c7411..25c7937a43 100644
--- a/src/include/common/unicode_category.h
+++ b/src/include/common/unicode_category.h
@@ -62,7 +62,22 @@ typedef enum pg_unicode_category
 } pg_unicode_category;
 
 extern pg_unicode_category unicode_category(pg_wchar ucs);
-const char *unicode_category_string(pg_unicode_category category);
-const char *unicode_category_abbrev(pg_unicode_category category);
+extern const char *unicode_category_string(pg_unicode_category category);
+extern const char *unicode_category_abbrev(pg_unicode_category category);
+
+extern bool	pg_u_isdigit(pg_wchar c);
+extern bool	pg_u_isalpha(pg_wchar c);
+extern bool	pg_u_isalnum(pg_wchar c);
+extern bool	pg_u_isword(pg_wchar c);
+extern bool	pg_u_isupper(pg_wchar c);
+extern bool	pg_u_islower(pg_wchar c);
+extern bool	pg_u_isblank(pg_wchar c);
+extern bool	pg_u_iscntrl(pg_wchar c);
+extern bool	pg_u_isgraph(pg_wchar c);
+extern bool	pg_u_isprint(pg_wchar c);
+extern bool	pg_u_ispunct(pg_wchar c);
+extern bool	pg_u_ispunct_posix(pg_wchar c);
+extern bool	pg_u_isspace(pg_wchar c);
+extern bool	pg_u_isxdigit(pg_wchar c);
 
 #endif							/* UNICODE_CATEGORY_H */
diff --git a/src/include/common/unicode_category_table.h b/src/include/common/unicode_category_table.h
index 14f1ea0677..86cdc9c0ed 100644
--- a/src/include/common/unicode_category_table.h
+++ b/src/include/common/unicode_category_table.h
@@ -25,6 +25,12 @@ typedef struct
 	uint8		category;		/* General Category */
 }			pg_category_range;
 
+typedef struct
+{
+	uint32		first;			/* Unicode codepoint */
+	uint32		last;			/* Unicode codepoint */
+}			pg_unicode_range;
+
 /* table of Unicode codepoint ranges and their categories */
 static const pg_category_range unicode_categories[3302] =
 {
@@ -3331,3 +3337,2529 @@ static const pg_category_range unicode_categories[3302] =
 	{0x0f0000, 0x0ffffd, PG_U_PRIVATE_USE},
 	{0x100000, 0x10fffd, PG_U_PRIVATE_USE}
 };
+
+/* table of Unicode codepoint ranges of Alphabetic characters */
+static const pg_unicode_range unicode_alphabetic[1141] =
+{
+	{0x000041, 0x00005a},
+	{0x000061, 0x00007a},
+	{0x0000aa, 0x0000aa},
+	{0x0000b5, 0x0000b5},
+	{0x0000ba, 0x0000ba},
+	{0x0000c0, 0x0000d6},
+	{0x0000d8, 0x0000f6},
+	{0x0000f8, 0x0001ba},
+	{0x0001bb, 0x0001bb},
+	{0x0001bc, 0x0001bf},
+	{0x0001c0, 0x0001c3},
+	{0x0001c4, 0x000293},
+	{0x000294, 0x000294},
+	{0x000295, 0x0002af},
+	{0x0002b0, 0x0002c1},
+	{0x0002c6, 0x0002d1},
+	{0x0002e0, 0x0002e4},
+	{0x0002ec, 0x0002ec},
+	{0x0002ee, 0x0002ee},
+	{0x000345, 0x000345},
+	{0x000370, 0x000373},
+	{0x000374, 0x000374},
+	{0x000376, 0x000377},
+	{0x00037a, 0x00037a},
+	{0x00037b, 0x00037d},
+	{0x00037f, 0x00037f},
+	{0x000386, 0x000386},
+	{0x000388, 0x00038a},
+	{0x00038c, 0x00038c},
+	{0x00038e, 0x0003a1},
+	{0x0003a3, 0x0003f5},
+	{0x0003f7, 0x000481},
+	{0x00048a, 0x00052f},
+	{0x000531, 0x000556},
+	{0x000559, 0x000559},
+	{0x000560, 0x000588},
+	{0x0005b0, 0x0005bd},
+	{0x0005bf, 0x0005bf},
+	{0x0005c1, 0x0005c2},
+	{0x0005c4, 0x0005c5},
+	{0x0005c7, 0x0005c7},
+	{0x0005d0, 0x0005ea},
+	{0x0005ef, 0x0005f2},
+	{0x000610, 0x00061a},
+	{0x000620, 0x00063f},
+	{0x000640, 0x000640},
+	{0x000641, 0x00064a},
+	{0x00064b, 0x000657},
+	{0x000659, 0x00065f},
+	{0x00066e, 0x00066f},
+	{0x000670, 0x000670},
+	{0x000671, 0x0006d3},
+	{0x0006d5, 0x0006d5},
+	{0x0006d6, 0x0006dc},
+	{0x0006e1, 0x0006e4},
+	{0x0006e5, 0x0006e6},
+	{0x0006e7, 0x0006e8},
+	{0x0006ed, 0x0006ed},
+	{0x0006ee, 0x0006ef},
+	{0x0006fa, 0x0006fc},
+	{0x0006ff, 0x0006ff},
+	{0x000710, 0x000710},
+	{0x000711, 0x000711},
+	{0x000712, 0x00072f},
+	{0x000730, 0x00073f},
+	{0x00074d, 0x0007a5},
+	{0x0007a6, 0x0007b0},
+	{0x0007b1, 0x0007b1},
+	{0x0007ca, 0x0007ea},
+	{0x0007f4, 0x0007f5},
+	{0x0007fa, 0x0007fa},
+	{0x000800, 0x000815},
+	{0x000816, 0x000817},
+	{0x00081a, 0x00081a},
+	{0x00081b, 0x000823},
+	{0x000824, 0x000824},
+	{0x000825, 0x000827},
+	{0x000828, 0x000828},
+	{0x000829, 0x00082c},
+	{0x000840, 0x000858},
+	{0x000860, 0x00086a},
+	{0x000870, 0x000887},
+	{0x000889, 0x00088e},
+	{0x0008a0, 0x0008c8},
+	{0x0008c9, 0x0008c9},
+	{0x0008d4, 0x0008df},
+	{0x0008e3, 0x0008e9},
+	{0x0008f0, 0x000902},
+	{0x000903, 0x000903},
+	{0x000904, 0x000939},
+	{0x00093a, 0x00093a},
+	{0x00093b, 0x00093b},
+	{0x00093d, 0x00093d},
+	{0x00093e, 0x000940},
+	{0x000941, 0x000948},
+	{0x000949, 0x00094c},
+	{0x00094e, 0x00094f},
+	{0x000950, 0x000950},
+	{0x000955, 0x000957},
+	{0x000958, 0x000961},
+	{0x000962, 0x000963},
+	{0x000971, 0x000971},
+	{0x000972, 0x000980},
+	{0x000981, 0x000981},
+	{0x000982, 0x000983},
+	{0x000985, 0x00098c},
+	{0x00098f, 0x000990},
+	{0x000993, 0x0009a8},
+	{0x0009aa, 0x0009b0},
+	{0x0009b2, 0x0009b2},
+	{0x0009b6, 0x0009b9},
+	{0x0009bd, 0x0009bd},
+	{0x0009be, 0x0009c0},
+	{0x0009c1, 0x0009c4},
+	{0x0009c7, 0x0009c8},
+	{0x0009cb, 0x0009cc},
+	{0x0009ce, 0x0009ce},
+	{0x0009d7, 0x0009d7},
+	{0x0009dc, 0x0009dd},
+	{0x0009df, 0x0009e1},
+	{0x0009e2, 0x0009e3},
+	{0x0009f0, 0x0009f1},
+	{0x0009fc, 0x0009fc},
+	{0x000a01, 0x000a02},
+	{0x000a03, 0x000a03},
+	{0x000a05, 0x000a0a},
+	{0x000a0f, 0x000a10},
+	{0x000a13, 0x000a28},
+	{0x000a2a, 0x000a30},
+	{0x000a32, 0x000a33},
+	{0x000a35, 0x000a36},
+	{0x000a38, 0x000a39},
+	{0x000a3e, 0x000a40},
+	{0x000a41, 0x000a42},
+	{0x000a47, 0x000a48},
+	{0x000a4b, 0x000a4c},
+	{0x000a51, 0x000a51},
+	{0x000a59, 0x000a5c},
+	{0x000a5e, 0x000a5e},
+	{0x000a70, 0x000a71},
+	{0x000a72, 0x000a74},
+	{0x000a75, 0x000a75},
+	{0x000a81, 0x000a82},
+	{0x000a83, 0x000a83},
+	{0x000a85, 0x000a8d},
+	{0x000a8f, 0x000a91},
+	{0x000a93, 0x000aa8},
+	{0x000aaa, 0x000ab0},
+	{0x000ab2, 0x000ab3},
+	{0x000ab5, 0x000ab9},
+	{0x000abd, 0x000abd},
+	{0x000abe, 0x000ac0},
+	{0x000ac1, 0x000ac5},
+	{0x000ac7, 0x000ac8},
+	{0x000ac9, 0x000ac9},
+	{0x000acb, 0x000acc},
+	{0x000ad0, 0x000ad0},
+	{0x000ae0, 0x000ae1},
+	{0x000ae2, 0x000ae3},
+	{0x000af9, 0x000af9},
+	{0x000afa, 0x000afc},
+	{0x000b01, 0x000b01},
+	{0x000b02, 0x000b03},
+	{0x000b05, 0x000b0c},
+	{0x000b0f, 0x000b10},
+	{0x000b13, 0x000b28},
+	{0x000b2a, 0x000b30},
+	{0x000b32, 0x000b33},
+	{0x000b35, 0x000b39},
+	{0x000b3d, 0x000b3d},
+	{0x000b3e, 0x000b3e},
+	{0x000b3f, 0x000b3f},
+	{0x000b40, 0x000b40},
+	{0x000b41, 0x000b44},
+	{0x000b47, 0x000b48},
+	{0x000b4b, 0x000b4c},
+	{0x000b56, 0x000b56},
+	{0x000b57, 0x000b57},
+	{0x000b5c, 0x000b5d},
+	{0x000b5f, 0x000b61},
+	{0x000b62, 0x000b63},
+	{0x000b71, 0x000b71},
+	{0x000b82, 0x000b82},
+	{0x000b83, 0x000b83},
+	{0x000b85, 0x000b8a},
+	{0x000b8e, 0x000b90},
+	{0x000b92, 0x000b95},
+	{0x000b99, 0x000b9a},
+	{0x000b9c, 0x000b9c},
+	{0x000b9e, 0x000b9f},
+	{0x000ba3, 0x000ba4},
+	{0x000ba8, 0x000baa},
+	{0x000bae, 0x000bb9},
+	{0x000bbe, 0x000bbf},
+	{0x000bc0, 0x000bc0},
+	{0x000bc1, 0x000bc2},
+	{0x000bc6, 0x000bc8},
+	{0x000bca, 0x000bcc},
+	{0x000bd0, 0x000bd0},
+	{0x000bd7, 0x000bd7},
+	{0x000c00, 0x000c00},
+	{0x000c01, 0x000c03},
+	{0x000c04, 0x000c04},
+	{0x000c05, 0x000c0c},
+	{0x000c0e, 0x000c10},
+	{0x000c12, 0x000c28},
+	{0x000c2a, 0x000c39},
+	{0x000c3d, 0x000c3d},
+	{0x000c3e, 0x000c40},
+	{0x000c41, 0x000c44},
+	{0x000c46, 0x000c48},
+	{0x000c4a, 0x000c4c},
+	{0x000c55, 0x000c56},
+	{0x000c58, 0x000c5a},
+	{0x000c5d, 0x000c5d},
+	{0x000c60, 0x000c61},
+	{0x000c62, 0x000c63},
+	{0x000c80, 0x000c80},
+	{0x000c81, 0x000c81},
+	{0x000c82, 0x000c83},
+	{0x000c85, 0x000c8c},
+	{0x000c8e, 0x000c90},
+	{0x000c92, 0x000ca8},
+	{0x000caa, 0x000cb3},
+	{0x000cb5, 0x000cb9},
+	{0x000cbd, 0x000cbd},
+	{0x000cbe, 0x000cbe},
+	{0x000cbf, 0x000cbf},
+	{0x000cc0, 0x000cc4},
+	{0x000cc6, 0x000cc6},
+	{0x000cc7, 0x000cc8},
+	{0x000cca, 0x000ccb},
+	{0x000ccc, 0x000ccc},
+	{0x000cd5, 0x000cd6},
+	{0x000cdd, 0x000cde},
+	{0x000ce0, 0x000ce1},
+	{0x000ce2, 0x000ce3},
+	{0x000cf1, 0x000cf2},
+	{0x000cf3, 0x000cf3},
+	{0x000d00, 0x000d01},
+	{0x000d02, 0x000d03},
+	{0x000d04, 0x000d0c},
+	{0x000d0e, 0x000d10},
+	{0x000d12, 0x000d3a},
+	{0x000d3d, 0x000d3d},
+	{0x000d3e, 0x000d40},
+	{0x000d41, 0x000d44},
+	{0x000d46, 0x000d48},
+	{0x000d4a, 0x000d4c},
+	{0x000d4e, 0x000d4e},
+	{0x000d54, 0x000d56},
+	{0x000d57, 0x000d57},
+	{0x000d5f, 0x000d61},
+	{0x000d62, 0x000d63},
+	{0x000d7a, 0x000d7f},
+	{0x000d81, 0x000d81},
+	{0x000d82, 0x000d83},
+	{0x000d85, 0x000d96},
+	{0x000d9a, 0x000db1},
+	{0x000db3, 0x000dbb},
+	{0x000dbd, 0x000dbd},
+	{0x000dc0, 0x000dc6},
+	{0x000dcf, 0x000dd1},
+	{0x000dd2, 0x000dd4},
+	{0x000dd6, 0x000dd6},
+	{0x000dd8, 0x000ddf},
+	{0x000df2, 0x000df3},
+	{0x000e01, 0x000e30},
+	{0x000e31, 0x000e31},
+	{0x000e32, 0x000e33},
+	{0x000e34, 0x000e3a},
+	{0x000e40, 0x000e45},
+	{0x000e46, 0x000e46},
+	{0x000e4d, 0x000e4d},
+	{0x000e81, 0x000e82},
+	{0x000e84, 0x000e84},
+	{0x000e86, 0x000e8a},
+	{0x000e8c, 0x000ea3},
+	{0x000ea5, 0x000ea5},
+	{0x000ea7, 0x000eb0},
+	{0x000eb1, 0x000eb1},
+	{0x000eb2, 0x000eb3},
+	{0x000eb4, 0x000eb9},
+	{0x000ebb, 0x000ebc},
+	{0x000ebd, 0x000ebd},
+	{0x000ec0, 0x000ec4},
+	{0x000ec6, 0x000ec6},
+	{0x000ecd, 0x000ecd},
+	{0x000edc, 0x000edf},
+	{0x000f00, 0x000f00},
+	{0x000f40, 0x000f47},
+	{0x000f49, 0x000f6c},
+	{0x000f71, 0x000f7e},
+	{0x000f7f, 0x000f7f},
+	{0x000f80, 0x000f83},
+	{0x000f88, 0x000f8c},
+	{0x000f8d, 0x000f97},
+	{0x000f99, 0x000fbc},
+	{0x001000, 0x00102a},
+	{0x00102b, 0x00102c},
+	{0x00102d, 0x001030},
+	{0x001031, 0x001031},
+	{0x001032, 0x001036},
+	{0x001038, 0x001038},
+	{0x00103b, 0x00103c},
+	{0x00103d, 0x00103e},
+	{0x00103f, 0x00103f},
+	{0x001050, 0x001055},
+	{0x001056, 0x001057},
+	{0x001058, 0x001059},
+	{0x00105a, 0x00105d},
+	{0x00105e, 0x001060},
+	{0x001061, 0x001061},
+	{0x001062, 0x001064},
+	{0x001065, 0x001066},
+	{0x001067, 0x00106d},
+	{0x00106e, 0x001070},
+	{0x001071, 0x001074},
+	{0x001075, 0x001081},
+	{0x001082, 0x001082},
+	{0x001083, 0x001084},
+	{0x001085, 0x001086},
+	{0x001087, 0x00108c},
+	{0x00108d, 0x00108d},
+	{0x00108e, 0x00108e},
+	{0x00108f, 0x00108f},
+	{0x00109a, 0x00109c},
+	{0x00109d, 0x00109d},
+	{0x0010a0, 0x0010c5},
+	{0x0010c7, 0x0010c7},
+	{0x0010cd, 0x0010cd},
+	{0x0010d0, 0x0010fa},
+	{0x0010fc, 0x0010fc},
+	{0x0010fd, 0x0010ff},
+	{0x001100, 0x001248},
+	{0x00124a, 0x00124d},
+	{0x001250, 0x001256},
+	{0x001258, 0x001258},
+	{0x00125a, 0x00125d},
+	{0x001260, 0x001288},
+	{0x00128a, 0x00128d},
+	{0x001290, 0x0012b0},
+	{0x0012b2, 0x0012b5},
+	{0x0012b8, 0x0012be},
+	{0x0012c0, 0x0012c0},
+	{0x0012c2, 0x0012c5},
+	{0x0012c8, 0x0012d6},
+	{0x0012d8, 0x001310},
+	{0x001312, 0x001315},
+	{0x001318, 0x00135a},
+	{0x001380, 0x00138f},
+	{0x0013a0, 0x0013f5},
+	{0x0013f8, 0x0013fd},
+	{0x001401, 0x00166c},
+	{0x00166f, 0x00167f},
+	{0x001681, 0x00169a},
+	{0x0016a0, 0x0016ea},
+	{0x0016ee, 0x0016f0},
+	{0x0016f1, 0x0016f8},
+	{0x001700, 0x001711},
+	{0x001712, 0x001713},
+	{0x00171f, 0x001731},
+	{0x001732, 0x001733},
+	{0x001740, 0x001751},
+	{0x001752, 0x001753},
+	{0x001760, 0x00176c},
+	{0x00176e, 0x001770},
+	{0x001772, 0x001773},
+	{0x001780, 0x0017b3},
+	{0x0017b6, 0x0017b6},
+	{0x0017b7, 0x0017bd},
+	{0x0017be, 0x0017c5},
+	{0x0017c6, 0x0017c6},
+	{0x0017c7, 0x0017c8},
+	{0x0017d7, 0x0017d7},
+	{0x0017dc, 0x0017dc},
+	{0x001820, 0x001842},
+	{0x001843, 0x001843},
+	{0x001844, 0x001878},
+	{0x001880, 0x001884},
+	{0x001885, 0x001886},
+	{0x001887, 0x0018a8},
+	{0x0018a9, 0x0018a9},
+	{0x0018aa, 0x0018aa},
+	{0x0018b0, 0x0018f5},
+	{0x001900, 0x00191e},
+	{0x001920, 0x001922},
+	{0x001923, 0x001926},
+	{0x001927, 0x001928},
+	{0x001929, 0x00192b},
+	{0x001930, 0x001931},
+	{0x001932, 0x001932},
+	{0x001933, 0x001938},
+	{0x001950, 0x00196d},
+	{0x001970, 0x001974},
+	{0x001980, 0x0019ab},
+	{0x0019b0, 0x0019c9},
+	{0x001a00, 0x001a16},
+	{0x001a17, 0x001a18},
+	{0x001a19, 0x001a1a},
+	{0x001a1b, 0x001a1b},
+	{0x001a20, 0x001a54},
+	{0x001a55, 0x001a55},
+	{0x001a56, 0x001a56},
+	{0x001a57, 0x001a57},
+	{0x001a58, 0x001a5e},
+	{0x001a61, 0x001a61},
+	{0x001a62, 0x001a62},
+	{0x001a63, 0x001a64},
+	{0x001a65, 0x001a6c},
+	{0x001a6d, 0x001a72},
+	{0x001a73, 0x001a74},
+	{0x001aa7, 0x001aa7},
+	{0x001abf, 0x001ac0},
+	{0x001acc, 0x001ace},
+	{0x001b00, 0x001b03},
+	{0x001b04, 0x001b04},
+	{0x001b05, 0x001b33},
+	{0x001b35, 0x001b35},
+	{0x001b36, 0x001b3a},
+	{0x001b3b, 0x001b3b},
+	{0x001b3c, 0x001b3c},
+	{0x001b3d, 0x001b41},
+	{0x001b42, 0x001b42},
+	{0x001b43, 0x001b43},
+	{0x001b45, 0x001b4c},
+	{0x001b80, 0x001b81},
+	{0x001b82, 0x001b82},
+	{0x001b83, 0x001ba0},
+	{0x001ba1, 0x001ba1},
+	{0x001ba2, 0x001ba5},
+	{0x001ba6, 0x001ba7},
+	{0x001ba8, 0x001ba9},
+	{0x001bac, 0x001bad},
+	{0x001bae, 0x001baf},
+	{0x001bba, 0x001be5},
+	{0x001be7, 0x001be7},
+	{0x001be8, 0x001be9},
+	{0x001bea, 0x001bec},
+	{0x001bed, 0x001bed},
+	{0x001bee, 0x001bee},
+	{0x001bef, 0x001bf1},
+	{0x001c00, 0x001c23},
+	{0x001c24, 0x001c2b},
+	{0x001c2c, 0x001c33},
+	{0x001c34, 0x001c35},
+	{0x001c36, 0x001c36},
+	{0x001c4d, 0x001c4f},
+	{0x001c5a, 0x001c77},
+	{0x001c78, 0x001c7d},
+	{0x001c80, 0x001c88},
+	{0x001c90, 0x001cba},
+	{0x001cbd, 0x001cbf},
+	{0x001ce9, 0x001cec},
+	{0x001cee, 0x001cf3},
+	{0x001cf5, 0x001cf6},
+	{0x001cfa, 0x001cfa},
+	{0x001d00, 0x001d2b},
+	{0x001d2c, 0x001d6a},
+	{0x001d6b, 0x001d77},
+	{0x001d78, 0x001d78},
+	{0x001d79, 0x001d9a},
+	{0x001d9b, 0x001dbf},
+	{0x001de7, 0x001df4},
+	{0x001e00, 0x001f15},
+	{0x001f18, 0x001f1d},
+	{0x001f20, 0x001f45},
+	{0x001f48, 0x001f4d},
+	{0x001f50, 0x001f57},
+	{0x001f59, 0x001f59},
+	{0x001f5b, 0x001f5b},
+	{0x001f5d, 0x001f5d},
+	{0x001f5f, 0x001f7d},
+	{0x001f80, 0x001fb4},
+	{0x001fb6, 0x001fbc},
+	{0x001fbe, 0x001fbe},
+	{0x001fc2, 0x001fc4},
+	{0x001fc6, 0x001fcc},
+	{0x001fd0, 0x001fd3},
+	{0x001fd6, 0x001fdb},
+	{0x001fe0, 0x001fec},
+	{0x001ff2, 0x001ff4},
+	{0x001ff6, 0x001ffc},
+	{0x002071, 0x002071},
+	{0x00207f, 0x00207f},
+	{0x002090, 0x00209c},
+	{0x002102, 0x002102},
+	{0x002107, 0x002107},
+	{0x00210a, 0x002113},
+	{0x002115, 0x002115},
+	{0x002119, 0x00211d},
+	{0x002124, 0x002124},
+	{0x002126, 0x002126},
+	{0x002128, 0x002128},
+	{0x00212a, 0x00212d},
+	{0x00212f, 0x002134},
+	{0x002135, 0x002138},
+	{0x002139, 0x002139},
+	{0x00213c, 0x00213f},
+	{0x002145, 0x002149},
+	{0x00214e, 0x00214e},
+	{0x002160, 0x002182},
+	{0x002183, 0x002184},
+	{0x002185, 0x002188},
+	{0x0024b6, 0x0024e9},
+	{0x002c00, 0x002c7b},
+	{0x002c7c, 0x002c7d},
+	{0x002c7e, 0x002ce4},
+	{0x002ceb, 0x002cee},
+	{0x002cf2, 0x002cf3},
+	{0x002d00, 0x002d25},
+	{0x002d27, 0x002d27},
+	{0x002d2d, 0x002d2d},
+	{0x002d30, 0x002d67},
+	{0x002d6f, 0x002d6f},
+	{0x002d80, 0x002d96},
+	{0x002da0, 0x002da6},
+	{0x002da8, 0x002dae},
+	{0x002db0, 0x002db6},
+	{0x002db8, 0x002dbe},
+	{0x002dc0, 0x002dc6},
+	{0x002dc8, 0x002dce},
+	{0x002dd0, 0x002dd6},
+	{0x002dd8, 0x002dde},
+	{0x002de0, 0x002dff},
+	{0x002e2f, 0x002e2f},
+	{0x003005, 0x003005},
+	{0x003006, 0x003006},
+	{0x003007, 0x003007},
+	{0x003021, 0x003029},
+	{0x003031, 0x003035},
+	{0x003038, 0x00303a},
+	{0x00303b, 0x00303b},
+	{0x00303c, 0x00303c},
+	{0x003041, 0x003096},
+	{0x00309d, 0x00309e},
+	{0x00309f, 0x00309f},
+	{0x0030a1, 0x0030fa},
+	{0x0030fc, 0x0030fe},
+	{0x0030ff, 0x0030ff},
+	{0x003105, 0x00312f},
+	{0x003131, 0x00318e},
+	{0x0031a0, 0x0031bf},
+	{0x0031f0, 0x0031ff},
+	{0x003400, 0x004dbf},
+	{0x004e00, 0x00a014},
+	{0x00a015, 0x00a015},
+	{0x00a016, 0x00a48c},
+	{0x00a4d0, 0x00a4f7},
+	{0x00a4f8, 0x00a4fd},
+	{0x00a500, 0x00a60b},
+	{0x00a60c, 0x00a60c},
+	{0x00a610, 0x00a61f},
+	{0x00a62a, 0x00a62b},
+	{0x00a640, 0x00a66d},
+	{0x00a66e, 0x00a66e},
+	{0x00a674, 0x00a67b},
+	{0x00a67f, 0x00a67f},
+	{0x00a680, 0x00a69b},
+	{0x00a69c, 0x00a69d},
+	{0x00a69e, 0x00a69f},
+	{0x00a6a0, 0x00a6e5},
+	{0x00a6e6, 0x00a6ef},
+	{0x00a717, 0x00a71f},
+	{0x00a722, 0x00a76f},
+	{0x00a770, 0x00a770},
+	{0x00a771, 0x00a787},
+	{0x00a788, 0x00a788},
+	{0x00a78b, 0x00a78e},
+	{0x00a78f, 0x00a78f},
+	{0x00a790, 0x00a7ca},
+	{0x00a7d0, 0x00a7d1},
+	{0x00a7d3, 0x00a7d3},
+	{0x00a7d5, 0x00a7d9},
+	{0x00a7f2, 0x00a7f4},
+	{0x00a7f5, 0x00a7f6},
+	{0x00a7f7, 0x00a7f7},
+	{0x00a7f8, 0x00a7f9},
+	{0x00a7fa, 0x00a7fa},
+	{0x00a7fb, 0x00a801},
+	{0x00a802, 0x00a802},
+	{0x00a803, 0x00a805},
+	{0x00a807, 0x00a80a},
+	{0x00a80b, 0x00a80b},
+	{0x00a80c, 0x00a822},
+	{0x00a823, 0x00a824},
+	{0x00a825, 0x00a826},
+	{0x00a827, 0x00a827},
+	{0x00a840, 0x00a873},
+	{0x00a880, 0x00a881},
+	{0x00a882, 0x00a8b3},
+	{0x00a8b4, 0x00a8c3},
+	{0x00a8c5, 0x00a8c5},
+	{0x00a8f2, 0x00a8f7},
+	{0x00a8fb, 0x00a8fb},
+	{0x00a8fd, 0x00a8fe},
+	{0x00a8ff, 0x00a8ff},
+	{0x00a90a, 0x00a925},
+	{0x00a926, 0x00a92a},
+	{0x00a930, 0x00a946},
+	{0x00a947, 0x00a951},
+	{0x00a952, 0x00a952},
+	{0x00a960, 0x00a97c},
+	{0x00a980, 0x00a982},
+	{0x00a983, 0x00a983},
+	{0x00a984, 0x00a9b2},
+	{0x00a9b4, 0x00a9b5},
+	{0x00a9b6, 0x00a9b9},
+	{0x00a9ba, 0x00a9bb},
+	{0x00a9bc, 0x00a9bd},
+	{0x00a9be, 0x00a9bf},
+	{0x00a9cf, 0x00a9cf},
+	{0x00a9e0, 0x00a9e4},
+	{0x00a9e5, 0x00a9e5},
+	{0x00a9e6, 0x00a9e6},
+	{0x00a9e7, 0x00a9ef},
+	{0x00a9fa, 0x00a9fe},
+	{0x00aa00, 0x00aa28},
+	{0x00aa29, 0x00aa2e},
+	{0x00aa2f, 0x00aa30},
+	{0x00aa31, 0x00aa32},
+	{0x00aa33, 0x00aa34},
+	{0x00aa35, 0x00aa36},
+	{0x00aa40, 0x00aa42},
+	{0x00aa43, 0x00aa43},
+	{0x00aa44, 0x00aa4b},
+	{0x00aa4c, 0x00aa4c},
+	{0x00aa4d, 0x00aa4d},
+	{0x00aa60, 0x00aa6f},
+	{0x00aa70, 0x00aa70},
+	{0x00aa71, 0x00aa76},
+	{0x00aa7a, 0x00aa7a},
+	{0x00aa7b, 0x00aa7b},
+	{0x00aa7c, 0x00aa7c},
+	{0x00aa7d, 0x00aa7d},
+	{0x00aa7e, 0x00aaaf},
+	{0x00aab0, 0x00aab0},
+	{0x00aab1, 0x00aab1},
+	{0x00aab2, 0x00aab4},
+	{0x00aab5, 0x00aab6},
+	{0x00aab7, 0x00aab8},
+	{0x00aab9, 0x00aabd},
+	{0x00aabe, 0x00aabe},
+	{0x00aac0, 0x00aac0},
+	{0x00aac2, 0x00aac2},
+	{0x00aadb, 0x00aadc},
+	{0x00aadd, 0x00aadd},
+	{0x00aae0, 0x00aaea},
+	{0x00aaeb, 0x00aaeb},
+	{0x00aaec, 0x00aaed},
+	{0x00aaee, 0x00aaef},
+	{0x00aaf2, 0x00aaf2},
+	{0x00aaf3, 0x00aaf4},
+	{0x00aaf5, 0x00aaf5},
+	{0x00ab01, 0x00ab06},
+	{0x00ab09, 0x00ab0e},
+	{0x00ab11, 0x00ab16},
+	{0x00ab20, 0x00ab26},
+	{0x00ab28, 0x00ab2e},
+	{0x00ab30, 0x00ab5a},
+	{0x00ab5c, 0x00ab5f},
+	{0x00ab60, 0x00ab68},
+	{0x00ab69, 0x00ab69},
+	{0x00ab70, 0x00abbf},
+	{0x00abc0, 0x00abe2},
+	{0x00abe3, 0x00abe4},
+	{0x00abe5, 0x00abe5},
+	{0x00abe6, 0x00abe7},
+	{0x00abe8, 0x00abe8},
+	{0x00abe9, 0x00abea},
+	{0x00ac00, 0x00d7a3},
+	{0x00d7b0, 0x00d7c6},
+	{0x00d7cb, 0x00d7fb},
+	{0x00f900, 0x00fa6d},
+	{0x00fa70, 0x00fad9},
+	{0x00fb00, 0x00fb06},
+	{0x00fb13, 0x00fb17},
+	{0x00fb1d, 0x00fb1d},
+	{0x00fb1e, 0x00fb1e},
+	{0x00fb1f, 0x00fb28},
+	{0x00fb2a, 0x00fb36},
+	{0x00fb38, 0x00fb3c},
+	{0x00fb3e, 0x00fb3e},
+	{0x00fb40, 0x00fb41},
+	{0x00fb43, 0x00fb44},
+	{0x00fb46, 0x00fbb1},
+	{0x00fbd3, 0x00fd3d},
+	{0x00fd50, 0x00fd8f},
+	{0x00fd92, 0x00fdc7},
+	{0x00fdf0, 0x00fdfb},
+	{0x00fe70, 0x00fe74},
+	{0x00fe76, 0x00fefc},
+	{0x00ff21, 0x00ff3a},
+	{0x00ff41, 0x00ff5a},
+	{0x00ff66, 0x00ff6f},
+	{0x00ff70, 0x00ff70},
+	{0x00ff71, 0x00ff9d},
+	{0x00ff9e, 0x00ff9f},
+	{0x00ffa0, 0x00ffbe},
+	{0x00ffc2, 0x00ffc7},
+	{0x00ffca, 0x00ffcf},
+	{0x00ffd2, 0x00ffd7},
+	{0x00ffda, 0x00ffdc},
+	{0x010000, 0x01000b},
+	{0x01000d, 0x010026},
+	{0x010028, 0x01003a},
+	{0x01003c, 0x01003d},
+	{0x01003f, 0x01004d},
+	{0x010050, 0x01005d},
+	{0x010080, 0x0100fa},
+	{0x010140, 0x010174},
+	{0x010280, 0x01029c},
+	{0x0102a0, 0x0102d0},
+	{0x010300, 0x01031f},
+	{0x01032d, 0x010340},
+	{0x010341, 0x010341},
+	{0x010342, 0x010349},
+	{0x01034a, 0x01034a},
+	{0x010350, 0x010375},
+	{0x010376, 0x01037a},
+	{0x010380, 0x01039d},
+	{0x0103a0, 0x0103c3},
+	{0x0103c8, 0x0103cf},
+	{0x0103d1, 0x0103d5},
+	{0x010400, 0x01044f},
+	{0x010450, 0x01049d},
+	{0x0104b0, 0x0104d3},
+	{0x0104d8, 0x0104fb},
+	{0x010500, 0x010527},
+	{0x010530, 0x010563},
+	{0x010570, 0x01057a},
+	{0x01057c, 0x01058a},
+	{0x01058c, 0x010592},
+	{0x010594, 0x010595},
+	{0x010597, 0x0105a1},
+	{0x0105a3, 0x0105b1},
+	{0x0105b3, 0x0105b9},
+	{0x0105bb, 0x0105bc},
+	{0x010600, 0x010736},
+	{0x010740, 0x010755},
+	{0x010760, 0x010767},
+	{0x010780, 0x010785},
+	{0x010787, 0x0107b0},
+	{0x0107b2, 0x0107ba},
+	{0x010800, 0x010805},
+	{0x010808, 0x010808},
+	{0x01080a, 0x010835},
+	{0x010837, 0x010838},
+	{0x01083c, 0x01083c},
+	{0x01083f, 0x010855},
+	{0x010860, 0x010876},
+	{0x010880, 0x01089e},
+	{0x0108e0, 0x0108f2},
+	{0x0108f4, 0x0108f5},
+	{0x010900, 0x010915},
+	{0x010920, 0x010939},
+	{0x010980, 0x0109b7},
+	{0x0109be, 0x0109bf},
+	{0x010a00, 0x010a00},
+	{0x010a01, 0x010a03},
+	{0x010a05, 0x010a06},
+	{0x010a0c, 0x010a0f},
+	{0x010a10, 0x010a13},
+	{0x010a15, 0x010a17},
+	{0x010a19, 0x010a35},
+	{0x010a60, 0x010a7c},
+	{0x010a80, 0x010a9c},
+	{0x010ac0, 0x010ac7},
+	{0x010ac9, 0x010ae4},
+	{0x010b00, 0x010b35},
+	{0x010b40, 0x010b55},
+	{0x010b60, 0x010b72},
+	{0x010b80, 0x010b91},
+	{0x010c00, 0x010c48},
+	{0x010c80, 0x010cb2},
+	{0x010cc0, 0x010cf2},
+	{0x010d00, 0x010d23},
+	{0x010d24, 0x010d27},
+	{0x010e80, 0x010ea9},
+	{0x010eab, 0x010eac},
+	{0x010eb0, 0x010eb1},
+	{0x010f00, 0x010f1c},
+	{0x010f27, 0x010f27},
+	{0x010f30, 0x010f45},
+	{0x010f70, 0x010f81},
+	{0x010fb0, 0x010fc4},
+	{0x010fe0, 0x010ff6},
+	{0x011000, 0x011000},
+	{0x011001, 0x011001},
+	{0x011002, 0x011002},
+	{0x011003, 0x011037},
+	{0x011038, 0x011045},
+	{0x011071, 0x011072},
+	{0x011073, 0x011074},
+	{0x011075, 0x011075},
+	{0x011080, 0x011081},
+	{0x011082, 0x011082},
+	{0x011083, 0x0110af},
+	{0x0110b0, 0x0110b2},
+	{0x0110b3, 0x0110b6},
+	{0x0110b7, 0x0110b8},
+	{0x0110c2, 0x0110c2},
+	{0x0110d0, 0x0110e8},
+	{0x011100, 0x011102},
+	{0x011103, 0x011126},
+	{0x011127, 0x01112b},
+	{0x01112c, 0x01112c},
+	{0x01112d, 0x011132},
+	{0x011144, 0x011144},
+	{0x011145, 0x011146},
+	{0x011147, 0x011147},
+	{0x011150, 0x011172},
+	{0x011176, 0x011176},
+	{0x011180, 0x011181},
+	{0x011182, 0x011182},
+	{0x011183, 0x0111b2},
+	{0x0111b3, 0x0111b5},
+	{0x0111b6, 0x0111be},
+	{0x0111bf, 0x0111bf},
+	{0x0111c1, 0x0111c4},
+	{0x0111ce, 0x0111ce},
+	{0x0111cf, 0x0111cf},
+	{0x0111da, 0x0111da},
+	{0x0111dc, 0x0111dc},
+	{0x011200, 0x011211},
+	{0x011213, 0x01122b},
+	{0x01122c, 0x01122e},
+	{0x01122f, 0x011231},
+	{0x011232, 0x011233},
+	{0x011234, 0x011234},
+	{0x011237, 0x011237},
+	{0x01123e, 0x01123e},
+	{0x01123f, 0x011240},
+	{0x011241, 0x011241},
+	{0x011280, 0x011286},
+	{0x011288, 0x011288},
+	{0x01128a, 0x01128d},
+	{0x01128f, 0x01129d},
+	{0x01129f, 0x0112a8},
+	{0x0112b0, 0x0112de},
+	{0x0112df, 0x0112df},
+	{0x0112e0, 0x0112e2},
+	{0x0112e3, 0x0112e8},
+	{0x011300, 0x011301},
+	{0x011302, 0x011303},
+	{0x011305, 0x01130c},
+	{0x01130f, 0x011310},
+	{0x011313, 0x011328},
+	{0x01132a, 0x011330},
+	{0x011332, 0x011333},
+	{0x011335, 0x011339},
+	{0x01133d, 0x01133d},
+	{0x01133e, 0x01133f},
+	{0x011340, 0x011340},
+	{0x011341, 0x011344},
+	{0x011347, 0x011348},
+	{0x01134b, 0x01134c},
+	{0x011350, 0x011350},
+	{0x011357, 0x011357},
+	{0x01135d, 0x011361},
+	{0x011362, 0x011363},
+	{0x011400, 0x011434},
+	{0x011435, 0x011437},
+	{0x011438, 0x01143f},
+	{0x011440, 0x011441},
+	{0x011443, 0x011444},
+	{0x011445, 0x011445},
+	{0x011447, 0x01144a},
+	{0x01145f, 0x011461},
+	{0x011480, 0x0114af},
+	{0x0114b0, 0x0114b2},
+	{0x0114b3, 0x0114b8},
+	{0x0114b9, 0x0114b9},
+	{0x0114ba, 0x0114ba},
+	{0x0114bb, 0x0114be},
+	{0x0114bf, 0x0114c0},
+	{0x0114c1, 0x0114c1},
+	{0x0114c4, 0x0114c5},
+	{0x0114c7, 0x0114c7},
+	{0x011580, 0x0115ae},
+	{0x0115af, 0x0115b1},
+	{0x0115b2, 0x0115b5},
+	{0x0115b8, 0x0115bb},
+	{0x0115bc, 0x0115bd},
+	{0x0115be, 0x0115be},
+	{0x0115d8, 0x0115db},
+	{0x0115dc, 0x0115dd},
+	{0x011600, 0x01162f},
+	{0x011630, 0x011632},
+	{0x011633, 0x01163a},
+	{0x01163b, 0x01163c},
+	{0x01163d, 0x01163d},
+	{0x01163e, 0x01163e},
+	{0x011640, 0x011640},
+	{0x011644, 0x011644},
+	{0x011680, 0x0116aa},
+	{0x0116ab, 0x0116ab},
+	{0x0116ac, 0x0116ac},
+	{0x0116ad, 0x0116ad},
+	{0x0116ae, 0x0116af},
+	{0x0116b0, 0x0116b5},
+	{0x0116b8, 0x0116b8},
+	{0x011700, 0x01171a},
+	{0x01171d, 0x01171f},
+	{0x011720, 0x011721},
+	{0x011722, 0x011725},
+	{0x011726, 0x011726},
+	{0x011727, 0x01172a},
+	{0x011740, 0x011746},
+	{0x011800, 0x01182b},
+	{0x01182c, 0x01182e},
+	{0x01182f, 0x011837},
+	{0x011838, 0x011838},
+	{0x0118a0, 0x0118df},
+	{0x0118ff, 0x011906},
+	{0x011909, 0x011909},
+	{0x01190c, 0x011913},
+	{0x011915, 0x011916},
+	{0x011918, 0x01192f},
+	{0x011930, 0x011935},
+	{0x011937, 0x011938},
+	{0x01193b, 0x01193c},
+	{0x01193f, 0x01193f},
+	{0x011940, 0x011940},
+	{0x011941, 0x011941},
+	{0x011942, 0x011942},
+	{0x0119a0, 0x0119a7},
+	{0x0119aa, 0x0119d0},
+	{0x0119d1, 0x0119d3},
+	{0x0119d4, 0x0119d7},
+	{0x0119da, 0x0119db},
+	{0x0119dc, 0x0119df},
+	{0x0119e1, 0x0119e1},
+	{0x0119e3, 0x0119e3},
+	{0x0119e4, 0x0119e4},
+	{0x011a00, 0x011a00},
+	{0x011a01, 0x011a0a},
+	{0x011a0b, 0x011a32},
+	{0x011a35, 0x011a38},
+	{0x011a39, 0x011a39},
+	{0x011a3a, 0x011a3a},
+	{0x011a3b, 0x011a3e},
+	{0x011a50, 0x011a50},
+	{0x011a51, 0x011a56},
+	{0x011a57, 0x011a58},
+	{0x011a59, 0x011a5b},
+	{0x011a5c, 0x011a89},
+	{0x011a8a, 0x011a96},
+	{0x011a97, 0x011a97},
+	{0x011a9d, 0x011a9d},
+	{0x011ab0, 0x011af8},
+	{0x011c00, 0x011c08},
+	{0x011c0a, 0x011c2e},
+	{0x011c2f, 0x011c2f},
+	{0x011c30, 0x011c36},
+	{0x011c38, 0x011c3d},
+	{0x011c3e, 0x011c3e},
+	{0x011c40, 0x011c40},
+	{0x011c72, 0x011c8f},
+	{0x011c92, 0x011ca7},
+	{0x011ca9, 0x011ca9},
+	{0x011caa, 0x011cb0},
+	{0x011cb1, 0x011cb1},
+	{0x011cb2, 0x011cb3},
+	{0x011cb4, 0x011cb4},
+	{0x011cb5, 0x011cb6},
+	{0x011d00, 0x011d06},
+	{0x011d08, 0x011d09},
+	{0x011d0b, 0x011d30},
+	{0x011d31, 0x011d36},
+	{0x011d3a, 0x011d3a},
+	{0x011d3c, 0x011d3d},
+	{0x011d3f, 0x011d41},
+	{0x011d43, 0x011d43},
+	{0x011d46, 0x011d46},
+	{0x011d47, 0x011d47},
+	{0x011d60, 0x011d65},
+	{0x011d67, 0x011d68},
+	{0x011d6a, 0x011d89},
+	{0x011d8a, 0x011d8e},
+	{0x011d90, 0x011d91},
+	{0x011d93, 0x011d94},
+	{0x011d95, 0x011d95},
+	{0x011d96, 0x011d96},
+	{0x011d98, 0x011d98},
+	{0x011ee0, 0x011ef2},
+	{0x011ef3, 0x011ef4},
+	{0x011ef5, 0x011ef6},
+	{0x011f00, 0x011f01},
+	{0x011f02, 0x011f02},
+	{0x011f03, 0x011f03},
+	{0x011f04, 0x011f10},
+	{0x011f12, 0x011f33},
+	{0x011f34, 0x011f35},
+	{0x011f36, 0x011f3a},
+	{0x011f3e, 0x011f3f},
+	{0x011f40, 0x011f40},
+	{0x011fb0, 0x011fb0},
+	{0x012000, 0x012399},
+	{0x012400, 0x01246e},
+	{0x012480, 0x012543},
+	{0x012f90, 0x012ff0},
+	{0x013000, 0x01342f},
+	{0x013441, 0x013446},
+	{0x014400, 0x014646},
+	{0x016800, 0x016a38},
+	{0x016a40, 0x016a5e},
+	{0x016a70, 0x016abe},
+	{0x016ad0, 0x016aed},
+	{0x016b00, 0x016b2f},
+	{0x016b40, 0x016b43},
+	{0x016b63, 0x016b77},
+	{0x016b7d, 0x016b8f},
+	{0x016e40, 0x016e7f},
+	{0x016f00, 0x016f4a},
+	{0x016f4f, 0x016f4f},
+	{0x016f50, 0x016f50},
+	{0x016f51, 0x016f87},
+	{0x016f8f, 0x016f92},
+	{0x016f93, 0x016f9f},
+	{0x016fe0, 0x016fe1},
+	{0x016fe3, 0x016fe3},
+	{0x016ff0, 0x016ff1},
+	{0x017000, 0x0187f7},
+	{0x018800, 0x018cd5},
+	{0x018d00, 0x018d08},
+	{0x01aff0, 0x01aff3},
+	{0x01aff5, 0x01affb},
+	{0x01affd, 0x01affe},
+	{0x01b000, 0x01b122},
+	{0x01b132, 0x01b132},
+	{0x01b150, 0x01b152},
+	{0x01b155, 0x01b155},
+	{0x01b164, 0x01b167},
+	{0x01b170, 0x01b2fb},
+	{0x01bc00, 0x01bc6a},
+	{0x01bc70, 0x01bc7c},
+	{0x01bc80, 0x01bc88},
+	{0x01bc90, 0x01bc99},
+	{0x01bc9e, 0x01bc9e},
+	{0x01d400, 0x01d454},
+	{0x01d456, 0x01d49c},
+	{0x01d49e, 0x01d49f},
+	{0x01d4a2, 0x01d4a2},
+	{0x01d4a5, 0x01d4a6},
+	{0x01d4a9, 0x01d4ac},
+	{0x01d4ae, 0x01d4b9},
+	{0x01d4bb, 0x01d4bb},
+	{0x01d4bd, 0x01d4c3},
+	{0x01d4c5, 0x01d505},
+	{0x01d507, 0x01d50a},
+	{0x01d50d, 0x01d514},
+	{0x01d516, 0x01d51c},
+	{0x01d51e, 0x01d539},
+	{0x01d53b, 0x01d53e},
+	{0x01d540, 0x01d544},
+	{0x01d546, 0x01d546},
+	{0x01d54a, 0x01d550},
+	{0x01d552, 0x01d6a5},
+	{0x01d6a8, 0x01d6c0},
+	{0x01d6c2, 0x01d6da},
+	{0x01d6dc, 0x01d6fa},
+	{0x01d6fc, 0x01d714},
+	{0x01d716, 0x01d734},
+	{0x01d736, 0x01d74e},
+	{0x01d750, 0x01d76e},
+	{0x01d770, 0x01d788},
+	{0x01d78a, 0x01d7a8},
+	{0x01d7aa, 0x01d7c2},
+	{0x01d7c4, 0x01d7cb},
+	{0x01df00, 0x01df09},
+	{0x01df0a, 0x01df0a},
+	{0x01df0b, 0x01df1e},
+	{0x01df25, 0x01df2a},
+	{0x01e000, 0x01e006},
+	{0x01e008, 0x01e018},
+	{0x01e01b, 0x01e021},
+	{0x01e023, 0x01e024},
+	{0x01e026, 0x01e02a},
+	{0x01e030, 0x01e06d},
+	{0x01e08f, 0x01e08f},
+	{0x01e100, 0x01e12c},
+	{0x01e137, 0x01e13d},
+	{0x01e14e, 0x01e14e},
+	{0x01e290, 0x01e2ad},
+	{0x01e2c0, 0x01e2eb},
+	{0x01e4d0, 0x01e4ea},
+	{0x01e4eb, 0x01e4eb},
+	{0x01e7e0, 0x01e7e6},
+	{0x01e7e8, 0x01e7eb},
+	{0x01e7ed, 0x01e7ee},
+	{0x01e7f0, 0x01e7fe},
+	{0x01e800, 0x01e8c4},
+	{0x01e900, 0x01e943},
+	{0x01e947, 0x01e947},
+	{0x01e94b, 0x01e94b},
+	{0x01ee00, 0x01ee03},
+	{0x01ee05, 0x01ee1f},
+	{0x01ee21, 0x01ee22},
+	{0x01ee24, 0x01ee24},
+	{0x01ee27, 0x01ee27},
+	{0x01ee29, 0x01ee32},
+	{0x01ee34, 0x01ee37},
+	{0x01ee39, 0x01ee39},
+	{0x01ee3b, 0x01ee3b},
+	{0x01ee42, 0x01ee42},
+	{0x01ee47, 0x01ee47},
+	{0x01ee49, 0x01ee49},
+	{0x01ee4b, 0x01ee4b},
+	{0x01ee4d, 0x01ee4f},
+	{0x01ee51, 0x01ee52},
+	{0x01ee54, 0x01ee54},
+	{0x01ee57, 0x01ee57},
+	{0x01ee59, 0x01ee59},
+	{0x01ee5b, 0x01ee5b},
+	{0x01ee5d, 0x01ee5d},
+	{0x01ee5f, 0x01ee5f},
+	{0x01ee61, 0x01ee62},
+	{0x01ee64, 0x01ee64},
+	{0x01ee67, 0x01ee6a},
+	{0x01ee6c, 0x01ee72},
+	{0x01ee74, 0x01ee77},
+	{0x01ee79, 0x01ee7c},
+	{0x01ee7e, 0x01ee7e},
+	{0x01ee80, 0x01ee89},
+	{0x01ee8b, 0x01ee9b},
+	{0x01eea1, 0x01eea3},
+	{0x01eea5, 0x01eea9},
+	{0x01eeab, 0x01eebb},
+	{0x01f130, 0x01f149},
+	{0x01f150, 0x01f169},
+	{0x01f170, 0x01f189},
+	{0x020000, 0x02a6df},
+	{0x02a700, 0x02b739},
+	{0x02b740, 0x02b81d},
+	{0x02b820, 0x02cea1},
+	{0x02ceb0, 0x02ebe0},
+	{0x02ebf0, 0x02ee5d},
+	{0x02f800, 0x02fa1d},
+	{0x030000, 0x03134a},
+	{0x031350, 0x0323af}
+};
+
+/* table of Unicode codepoint ranges of Lowercase characters */
+static const pg_unicode_range unicode_lowercase[686] =
+{
+	{0x000061, 0x00007a},
+	{0x0000aa, 0x0000aa},
+	{0x0000b5, 0x0000b5},
+	{0x0000ba, 0x0000ba},
+	{0x0000df, 0x0000f6},
+	{0x0000f8, 0x0000ff},
+	{0x000101, 0x000101},
+	{0x000103, 0x000103},
+	{0x000105, 0x000105},
+	{0x000107, 0x000107},
+	{0x000109, 0x000109},
+	{0x00010b, 0x00010b},
+	{0x00010d, 0x00010d},
+	{0x00010f, 0x00010f},
+	{0x000111, 0x000111},
+	{0x000113, 0x000113},
+	{0x000115, 0x000115},
+	{0x000117, 0x000117},
+	{0x000119, 0x000119},
+	{0x00011b, 0x00011b},
+	{0x00011d, 0x00011d},
+	{0x00011f, 0x00011f},
+	{0x000121, 0x000121},
+	{0x000123, 0x000123},
+	{0x000125, 0x000125},
+	{0x000127, 0x000127},
+	{0x000129, 0x000129},
+	{0x00012b, 0x00012b},
+	{0x00012d, 0x00012d},
+	{0x00012f, 0x00012f},
+	{0x000131, 0x000131},
+	{0x000133, 0x000133},
+	{0x000135, 0x000135},
+	{0x000137, 0x000138},
+	{0x00013a, 0x00013a},
+	{0x00013c, 0x00013c},
+	{0x00013e, 0x00013e},
+	{0x000140, 0x000140},
+	{0x000142, 0x000142},
+	{0x000144, 0x000144},
+	{0x000146, 0x000146},
+	{0x000148, 0x000149},
+	{0x00014b, 0x00014b},
+	{0x00014d, 0x00014d},
+	{0x00014f, 0x00014f},
+	{0x000151, 0x000151},
+	{0x000153, 0x000153},
+	{0x000155, 0x000155},
+	{0x000157, 0x000157},
+	{0x000159, 0x000159},
+	{0x00015b, 0x00015b},
+	{0x00015d, 0x00015d},
+	{0x00015f, 0x00015f},
+	{0x000161, 0x000161},
+	{0x000163, 0x000163},
+	{0x000165, 0x000165},
+	{0x000167, 0x000167},
+	{0x000169, 0x000169},
+	{0x00016b, 0x00016b},
+	{0x00016d, 0x00016d},
+	{0x00016f, 0x00016f},
+	{0x000171, 0x000171},
+	{0x000173, 0x000173},
+	{0x000175, 0x000175},
+	{0x000177, 0x000177},
+	{0x00017a, 0x00017a},
+	{0x00017c, 0x00017c},
+	{0x00017e, 0x000180},
+	{0x000183, 0x000183},
+	{0x000185, 0x000185},
+	{0x000188, 0x000188},
+	{0x00018c, 0x00018d},
+	{0x000192, 0x000192},
+	{0x000195, 0x000195},
+	{0x000199, 0x00019b},
+	{0x00019e, 0x00019e},
+	{0x0001a1, 0x0001a1},
+	{0x0001a3, 0x0001a3},
+	{0x0001a5, 0x0001a5},
+	{0x0001a8, 0x0001a8},
+	{0x0001aa, 0x0001ab},
+	{0x0001ad, 0x0001ad},
+	{0x0001b0, 0x0001b0},
+	{0x0001b4, 0x0001b4},
+	{0x0001b6, 0x0001b6},
+	{0x0001b9, 0x0001ba},
+	{0x0001bd, 0x0001bf},
+	{0x0001c6, 0x0001c6},
+	{0x0001c9, 0x0001c9},
+	{0x0001cc, 0x0001cc},
+	{0x0001ce, 0x0001ce},
+	{0x0001d0, 0x0001d0},
+	{0x0001d2, 0x0001d2},
+	{0x0001d4, 0x0001d4},
+	{0x0001d6, 0x0001d6},
+	{0x0001d8, 0x0001d8},
+	{0x0001da, 0x0001da},
+	{0x0001dc, 0x0001dd},
+	{0x0001df, 0x0001df},
+	{0x0001e1, 0x0001e1},
+	{0x0001e3, 0x0001e3},
+	{0x0001e5, 0x0001e5},
+	{0x0001e7, 0x0001e7},
+	{0x0001e9, 0x0001e9},
+	{0x0001eb, 0x0001eb},
+	{0x0001ed, 0x0001ed},
+	{0x0001ef, 0x0001f0},
+	{0x0001f3, 0x0001f3},
+	{0x0001f5, 0x0001f5},
+	{0x0001f9, 0x0001f9},
+	{0x0001fb, 0x0001fb},
+	{0x0001fd, 0x0001fd},
+	{0x0001ff, 0x0001ff},
+	{0x000201, 0x000201},
+	{0x000203, 0x000203},
+	{0x000205, 0x000205},
+	{0x000207, 0x000207},
+	{0x000209, 0x000209},
+	{0x00020b, 0x00020b},
+	{0x00020d, 0x00020d},
+	{0x00020f, 0x00020f},
+	{0x000211, 0x000211},
+	{0x000213, 0x000213},
+	{0x000215, 0x000215},
+	{0x000217, 0x000217},
+	{0x000219, 0x000219},
+	{0x00021b, 0x00021b},
+	{0x00021d, 0x00021d},
+	{0x00021f, 0x00021f},
+	{0x000221, 0x000221},
+	{0x000223, 0x000223},
+	{0x000225, 0x000225},
+	{0x000227, 0x000227},
+	{0x000229, 0x000229},
+	{0x00022b, 0x00022b},
+	{0x00022d, 0x00022d},
+	{0x00022f, 0x00022f},
+	{0x000231, 0x000231},
+	{0x000233, 0x000239},
+	{0x00023c, 0x00023c},
+	{0x00023f, 0x000240},
+	{0x000242, 0x000242},
+	{0x000247, 0x000247},
+	{0x000249, 0x000249},
+	{0x00024b, 0x00024b},
+	{0x00024d, 0x00024d},
+	{0x00024f, 0x000293},
+	{0x000295, 0x0002af},
+	{0x0002b0, 0x0002b8},
+	{0x0002c0, 0x0002c1},
+	{0x0002e0, 0x0002e4},
+	{0x000345, 0x000345},
+	{0x000371, 0x000371},
+	{0x000373, 0x000373},
+	{0x000377, 0x000377},
+	{0x00037a, 0x00037a},
+	{0x00037b, 0x00037d},
+	{0x000390, 0x000390},
+	{0x0003ac, 0x0003ce},
+	{0x0003d0, 0x0003d1},
+	{0x0003d5, 0x0003d7},
+	{0x0003d9, 0x0003d9},
+	{0x0003db, 0x0003db},
+	{0x0003dd, 0x0003dd},
+	{0x0003df, 0x0003df},
+	{0x0003e1, 0x0003e1},
+	{0x0003e3, 0x0003e3},
+	{0x0003e5, 0x0003e5},
+	{0x0003e7, 0x0003e7},
+	{0x0003e9, 0x0003e9},
+	{0x0003eb, 0x0003eb},
+	{0x0003ed, 0x0003ed},
+	{0x0003ef, 0x0003f3},
+	{0x0003f5, 0x0003f5},
+	{0x0003f8, 0x0003f8},
+	{0x0003fb, 0x0003fc},
+	{0x000430, 0x00045f},
+	{0x000461, 0x000461},
+	{0x000463, 0x000463},
+	{0x000465, 0x000465},
+	{0x000467, 0x000467},
+	{0x000469, 0x000469},
+	{0x00046b, 0x00046b},
+	{0x00046d, 0x00046d},
+	{0x00046f, 0x00046f},
+	{0x000471, 0x000471},
+	{0x000473, 0x000473},
+	{0x000475, 0x000475},
+	{0x000477, 0x000477},
+	{0x000479, 0x000479},
+	{0x00047b, 0x00047b},
+	{0x00047d, 0x00047d},
+	{0x00047f, 0x00047f},
+	{0x000481, 0x000481},
+	{0x00048b, 0x00048b},
+	{0x00048d, 0x00048d},
+	{0x00048f, 0x00048f},
+	{0x000491, 0x000491},
+	{0x000493, 0x000493},
+	{0x000495, 0x000495},
+	{0x000497, 0x000497},
+	{0x000499, 0x000499},
+	{0x00049b, 0x00049b},
+	{0x00049d, 0x00049d},
+	{0x00049f, 0x00049f},
+	{0x0004a1, 0x0004a1},
+	{0x0004a3, 0x0004a3},
+	{0x0004a5, 0x0004a5},
+	{0x0004a7, 0x0004a7},
+	{0x0004a9, 0x0004a9},
+	{0x0004ab, 0x0004ab},
+	{0x0004ad, 0x0004ad},
+	{0x0004af, 0x0004af},
+	{0x0004b1, 0x0004b1},
+	{0x0004b3, 0x0004b3},
+	{0x0004b5, 0x0004b5},
+	{0x0004b7, 0x0004b7},
+	{0x0004b9, 0x0004b9},
+	{0x0004bb, 0x0004bb},
+	{0x0004bd, 0x0004bd},
+	{0x0004bf, 0x0004bf},
+	{0x0004c2, 0x0004c2},
+	{0x0004c4, 0x0004c4},
+	{0x0004c6, 0x0004c6},
+	{0x0004c8, 0x0004c8},
+	{0x0004ca, 0x0004ca},
+	{0x0004cc, 0x0004cc},
+	{0x0004ce, 0x0004cf},
+	{0x0004d1, 0x0004d1},
+	{0x0004d3, 0x0004d3},
+	{0x0004d5, 0x0004d5},
+	{0x0004d7, 0x0004d7},
+	{0x0004d9, 0x0004d9},
+	{0x0004db, 0x0004db},
+	{0x0004dd, 0x0004dd},
+	{0x0004df, 0x0004df},
+	{0x0004e1, 0x0004e1},
+	{0x0004e3, 0x0004e3},
+	{0x0004e5, 0x0004e5},
+	{0x0004e7, 0x0004e7},
+	{0x0004e9, 0x0004e9},
+	{0x0004eb, 0x0004eb},
+	{0x0004ed, 0x0004ed},
+	{0x0004ef, 0x0004ef},
+	{0x0004f1, 0x0004f1},
+	{0x0004f3, 0x0004f3},
+	{0x0004f5, 0x0004f5},
+	{0x0004f7, 0x0004f7},
+	{0x0004f9, 0x0004f9},
+	{0x0004fb, 0x0004fb},
+	{0x0004fd, 0x0004fd},
+	{0x0004ff, 0x0004ff},
+	{0x000501, 0x000501},
+	{0x000503, 0x000503},
+	{0x000505, 0x000505},
+	{0x000507, 0x000507},
+	{0x000509, 0x000509},
+	{0x00050b, 0x00050b},
+	{0x00050d, 0x00050d},
+	{0x00050f, 0x00050f},
+	{0x000511, 0x000511},
+	{0x000513, 0x000513},
+	{0x000515, 0x000515},
+	{0x000517, 0x000517},
+	{0x000519, 0x000519},
+	{0x00051b, 0x00051b},
+	{0x00051d, 0x00051d},
+	{0x00051f, 0x00051f},
+	{0x000521, 0x000521},
+	{0x000523, 0x000523},
+	{0x000525, 0x000525},
+	{0x000527, 0x000527},
+	{0x000529, 0x000529},
+	{0x00052b, 0x00052b},
+	{0x00052d, 0x00052d},
+	{0x00052f, 0x00052f},
+	{0x000560, 0x000588},
+	{0x0010d0, 0x0010fa},
+	{0x0010fc, 0x0010fc},
+	{0x0010fd, 0x0010ff},
+	{0x0013f8, 0x0013fd},
+	{0x001c80, 0x001c88},
+	{0x001d00, 0x001d2b},
+	{0x001d2c, 0x001d6a},
+	{0x001d6b, 0x001d77},
+	{0x001d78, 0x001d78},
+	{0x001d79, 0x001d9a},
+	{0x001d9b, 0x001dbf},
+	{0x001e01, 0x001e01},
+	{0x001e03, 0x001e03},
+	{0x001e05, 0x001e05},
+	{0x001e07, 0x001e07},
+	{0x001e09, 0x001e09},
+	{0x001e0b, 0x001e0b},
+	{0x001e0d, 0x001e0d},
+	{0x001e0f, 0x001e0f},
+	{0x001e11, 0x001e11},
+	{0x001e13, 0x001e13},
+	{0x001e15, 0x001e15},
+	{0x001e17, 0x001e17},
+	{0x001e19, 0x001e19},
+	{0x001e1b, 0x001e1b},
+	{0x001e1d, 0x001e1d},
+	{0x001e1f, 0x001e1f},
+	{0x001e21, 0x001e21},
+	{0x001e23, 0x001e23},
+	{0x001e25, 0x001e25},
+	{0x001e27, 0x001e27},
+	{0x001e29, 0x001e29},
+	{0x001e2b, 0x001e2b},
+	{0x001e2d, 0x001e2d},
+	{0x001e2f, 0x001e2f},
+	{0x001e31, 0x001e31},
+	{0x001e33, 0x001e33},
+	{0x001e35, 0x001e35},
+	{0x001e37, 0x001e37},
+	{0x001e39, 0x001e39},
+	{0x001e3b, 0x001e3b},
+	{0x001e3d, 0x001e3d},
+	{0x001e3f, 0x001e3f},
+	{0x001e41, 0x001e41},
+	{0x001e43, 0x001e43},
+	{0x001e45, 0x001e45},
+	{0x001e47, 0x001e47},
+	{0x001e49, 0x001e49},
+	{0x001e4b, 0x001e4b},
+	{0x001e4d, 0x001e4d},
+	{0x001e4f, 0x001e4f},
+	{0x001e51, 0x001e51},
+	{0x001e53, 0x001e53},
+	{0x001e55, 0x001e55},
+	{0x001e57, 0x001e57},
+	{0x001e59, 0x001e59},
+	{0x001e5b, 0x001e5b},
+	{0x001e5d, 0x001e5d},
+	{0x001e5f, 0x001e5f},
+	{0x001e61, 0x001e61},
+	{0x001e63, 0x001e63},
+	{0x001e65, 0x001e65},
+	{0x001e67, 0x001e67},
+	{0x001e69, 0x001e69},
+	{0x001e6b, 0x001e6b},
+	{0x001e6d, 0x001e6d},
+	{0x001e6f, 0x001e6f},
+	{0x001e71, 0x001e71},
+	{0x001e73, 0x001e73},
+	{0x001e75, 0x001e75},
+	{0x001e77, 0x001e77},
+	{0x001e79, 0x001e79},
+	{0x001e7b, 0x001e7b},
+	{0x001e7d, 0x001e7d},
+	{0x001e7f, 0x001e7f},
+	{0x001e81, 0x001e81},
+	{0x001e83, 0x001e83},
+	{0x001e85, 0x001e85},
+	{0x001e87, 0x001e87},
+	{0x001e89, 0x001e89},
+	{0x001e8b, 0x001e8b},
+	{0x001e8d, 0x001e8d},
+	{0x001e8f, 0x001e8f},
+	{0x001e91, 0x001e91},
+	{0x001e93, 0x001e93},
+	{0x001e95, 0x001e9d},
+	{0x001e9f, 0x001e9f},
+	{0x001ea1, 0x001ea1},
+	{0x001ea3, 0x001ea3},
+	{0x001ea5, 0x001ea5},
+	{0x001ea7, 0x001ea7},
+	{0x001ea9, 0x001ea9},
+	{0x001eab, 0x001eab},
+	{0x001ead, 0x001ead},
+	{0x001eaf, 0x001eaf},
+	{0x001eb1, 0x001eb1},
+	{0x001eb3, 0x001eb3},
+	{0x001eb5, 0x001eb5},
+	{0x001eb7, 0x001eb7},
+	{0x001eb9, 0x001eb9},
+	{0x001ebb, 0x001ebb},
+	{0x001ebd, 0x001ebd},
+	{0x001ebf, 0x001ebf},
+	{0x001ec1, 0x001ec1},
+	{0x001ec3, 0x001ec3},
+	{0x001ec5, 0x001ec5},
+	{0x001ec7, 0x001ec7},
+	{0x001ec9, 0x001ec9},
+	{0x001ecb, 0x001ecb},
+	{0x001ecd, 0x001ecd},
+	{0x001ecf, 0x001ecf},
+	{0x001ed1, 0x001ed1},
+	{0x001ed3, 0x001ed3},
+	{0x001ed5, 0x001ed5},
+	{0x001ed7, 0x001ed7},
+	{0x001ed9, 0x001ed9},
+	{0x001edb, 0x001edb},
+	{0x001edd, 0x001edd},
+	{0x001edf, 0x001edf},
+	{0x001ee1, 0x001ee1},
+	{0x001ee3, 0x001ee3},
+	{0x001ee5, 0x001ee5},
+	{0x001ee7, 0x001ee7},
+	{0x001ee9, 0x001ee9},
+	{0x001eeb, 0x001eeb},
+	{0x001eed, 0x001eed},
+	{0x001eef, 0x001eef},
+	{0x001ef1, 0x001ef1},
+	{0x001ef3, 0x001ef3},
+	{0x001ef5, 0x001ef5},
+	{0x001ef7, 0x001ef7},
+	{0x001ef9, 0x001ef9},
+	{0x001efb, 0x001efb},
+	{0x001efd, 0x001efd},
+	{0x001eff, 0x001f07},
+	{0x001f10, 0x001f15},
+	{0x001f20, 0x001f27},
+	{0x001f30, 0x001f37},
+	{0x001f40, 0x001f45},
+	{0x001f50, 0x001f57},
+	{0x001f60, 0x001f67},
+	{0x001f70, 0x001f7d},
+	{0x001f80, 0x001f87},
+	{0x001f90, 0x001f97},
+	{0x001fa0, 0x001fa7},
+	{0x001fb0, 0x001fb4},
+	{0x001fb6, 0x001fb7},
+	{0x001fbe, 0x001fbe},
+	{0x001fc2, 0x001fc4},
+	{0x001fc6, 0x001fc7},
+	{0x001fd0, 0x001fd3},
+	{0x001fd6, 0x001fd7},
+	{0x001fe0, 0x001fe7},
+	{0x001ff2, 0x001ff4},
+	{0x001ff6, 0x001ff7},
+	{0x002071, 0x002071},
+	{0x00207f, 0x00207f},
+	{0x002090, 0x00209c},
+	{0x00210a, 0x00210a},
+	{0x00210e, 0x00210f},
+	{0x002113, 0x002113},
+	{0x00212f, 0x00212f},
+	{0x002134, 0x002134},
+	{0x002139, 0x002139},
+	{0x00213c, 0x00213d},
+	{0x002146, 0x002149},
+	{0x00214e, 0x00214e},
+	{0x002170, 0x00217f},
+	{0x002184, 0x002184},
+	{0x0024d0, 0x0024e9},
+	{0x002c30, 0x002c5f},
+	{0x002c61, 0x002c61},
+	{0x002c65, 0x002c66},
+	{0x002c68, 0x002c68},
+	{0x002c6a, 0x002c6a},
+	{0x002c6c, 0x002c6c},
+	{0x002c71, 0x002c71},
+	{0x002c73, 0x002c74},
+	{0x002c76, 0x002c7b},
+	{0x002c7c, 0x002c7d},
+	{0x002c81, 0x002c81},
+	{0x002c83, 0x002c83},
+	{0x002c85, 0x002c85},
+	{0x002c87, 0x002c87},
+	{0x002c89, 0x002c89},
+	{0x002c8b, 0x002c8b},
+	{0x002c8d, 0x002c8d},
+	{0x002c8f, 0x002c8f},
+	{0x002c91, 0x002c91},
+	{0x002c93, 0x002c93},
+	{0x002c95, 0x002c95},
+	{0x002c97, 0x002c97},
+	{0x002c99, 0x002c99},
+	{0x002c9b, 0x002c9b},
+	{0x002c9d, 0x002c9d},
+	{0x002c9f, 0x002c9f},
+	{0x002ca1, 0x002ca1},
+	{0x002ca3, 0x002ca3},
+	{0x002ca5, 0x002ca5},
+	{0x002ca7, 0x002ca7},
+	{0x002ca9, 0x002ca9},
+	{0x002cab, 0x002cab},
+	{0x002cad, 0x002cad},
+	{0x002caf, 0x002caf},
+	{0x002cb1, 0x002cb1},
+	{0x002cb3, 0x002cb3},
+	{0x002cb5, 0x002cb5},
+	{0x002cb7, 0x002cb7},
+	{0x002cb9, 0x002cb9},
+	{0x002cbb, 0x002cbb},
+	{0x002cbd, 0x002cbd},
+	{0x002cbf, 0x002cbf},
+	{0x002cc1, 0x002cc1},
+	{0x002cc3, 0x002cc3},
+	{0x002cc5, 0x002cc5},
+	{0x002cc7, 0x002cc7},
+	{0x002cc9, 0x002cc9},
+	{0x002ccb, 0x002ccb},
+	{0x002ccd, 0x002ccd},
+	{0x002ccf, 0x002ccf},
+	{0x002cd1, 0x002cd1},
+	{0x002cd3, 0x002cd3},
+	{0x002cd5, 0x002cd5},
+	{0x002cd7, 0x002cd7},
+	{0x002cd9, 0x002cd9},
+	{0x002cdb, 0x002cdb},
+	{0x002cdd, 0x002cdd},
+	{0x002cdf, 0x002cdf},
+	{0x002ce1, 0x002ce1},
+	{0x002ce3, 0x002ce4},
+	{0x002cec, 0x002cec},
+	{0x002cee, 0x002cee},
+	{0x002cf3, 0x002cf3},
+	{0x002d00, 0x002d25},
+	{0x002d27, 0x002d27},
+	{0x002d2d, 0x002d2d},
+	{0x00a641, 0x00a641},
+	{0x00a643, 0x00a643},
+	{0x00a645, 0x00a645},
+	{0x00a647, 0x00a647},
+	{0x00a649, 0x00a649},
+	{0x00a64b, 0x00a64b},
+	{0x00a64d, 0x00a64d},
+	{0x00a64f, 0x00a64f},
+	{0x00a651, 0x00a651},
+	{0x00a653, 0x00a653},
+	{0x00a655, 0x00a655},
+	{0x00a657, 0x00a657},
+	{0x00a659, 0x00a659},
+	{0x00a65b, 0x00a65b},
+	{0x00a65d, 0x00a65d},
+	{0x00a65f, 0x00a65f},
+	{0x00a661, 0x00a661},
+	{0x00a663, 0x00a663},
+	{0x00a665, 0x00a665},
+	{0x00a667, 0x00a667},
+	{0x00a669, 0x00a669},
+	{0x00a66b, 0x00a66b},
+	{0x00a66d, 0x00a66d},
+	{0x00a681, 0x00a681},
+	{0x00a683, 0x00a683},
+	{0x00a685, 0x00a685},
+	{0x00a687, 0x00a687},
+	{0x00a689, 0x00a689},
+	{0x00a68b, 0x00a68b},
+	{0x00a68d, 0x00a68d},
+	{0x00a68f, 0x00a68f},
+	{0x00a691, 0x00a691},
+	{0x00a693, 0x00a693},
+	{0x00a695, 0x00a695},
+	{0x00a697, 0x00a697},
+	{0x00a699, 0x00a699},
+	{0x00a69b, 0x00a69b},
+	{0x00a69c, 0x00a69d},
+	{0x00a723, 0x00a723},
+	{0x00a725, 0x00a725},
+	{0x00a727, 0x00a727},
+	{0x00a729, 0x00a729},
+	{0x00a72b, 0x00a72b},
+	{0x00a72d, 0x00a72d},
+	{0x00a72f, 0x00a731},
+	{0x00a733, 0x00a733},
+	{0x00a735, 0x00a735},
+	{0x00a737, 0x00a737},
+	{0x00a739, 0x00a739},
+	{0x00a73b, 0x00a73b},
+	{0x00a73d, 0x00a73d},
+	{0x00a73f, 0x00a73f},
+	{0x00a741, 0x00a741},
+	{0x00a743, 0x00a743},
+	{0x00a745, 0x00a745},
+	{0x00a747, 0x00a747},
+	{0x00a749, 0x00a749},
+	{0x00a74b, 0x00a74b},
+	{0x00a74d, 0x00a74d},
+	{0x00a74f, 0x00a74f},
+	{0x00a751, 0x00a751},
+	{0x00a753, 0x00a753},
+	{0x00a755, 0x00a755},
+	{0x00a757, 0x00a757},
+	{0x00a759, 0x00a759},
+	{0x00a75b, 0x00a75b},
+	{0x00a75d, 0x00a75d},
+	{0x00a75f, 0x00a75f},
+	{0x00a761, 0x00a761},
+	{0x00a763, 0x00a763},
+	{0x00a765, 0x00a765},
+	{0x00a767, 0x00a767},
+	{0x00a769, 0x00a769},
+	{0x00a76b, 0x00a76b},
+	{0x00a76d, 0x00a76d},
+	{0x00a76f, 0x00a76f},
+	{0x00a770, 0x00a770},
+	{0x00a771, 0x00a778},
+	{0x00a77a, 0x00a77a},
+	{0x00a77c, 0x00a77c},
+	{0x00a77f, 0x00a77f},
+	{0x00a781, 0x00a781},
+	{0x00a783, 0x00a783},
+	{0x00a785, 0x00a785},
+	{0x00a787, 0x00a787},
+	{0x00a78c, 0x00a78c},
+	{0x00a78e, 0x00a78e},
+	{0x00a791, 0x00a791},
+	{0x00a793, 0x00a795},
+	{0x00a797, 0x00a797},
+	{0x00a799, 0x00a799},
+	{0x00a79b, 0x00a79b},
+	{0x00a79d, 0x00a79d},
+	{0x00a79f, 0x00a79f},
+	{0x00a7a1, 0x00a7a1},
+	{0x00a7a3, 0x00a7a3},
+	{0x00a7a5, 0x00a7a5},
+	{0x00a7a7, 0x00a7a7},
+	{0x00a7a9, 0x00a7a9},
+	{0x00a7af, 0x00a7af},
+	{0x00a7b5, 0x00a7b5},
+	{0x00a7b7, 0x00a7b7},
+	{0x00a7b9, 0x00a7b9},
+	{0x00a7bb, 0x00a7bb},
+	{0x00a7bd, 0x00a7bd},
+	{0x00a7bf, 0x00a7bf},
+	{0x00a7c1, 0x00a7c1},
+	{0x00a7c3, 0x00a7c3},
+	{0x00a7c8, 0x00a7c8},
+	{0x00a7ca, 0x00a7ca},
+	{0x00a7d1, 0x00a7d1},
+	{0x00a7d3, 0x00a7d3},
+	{0x00a7d5, 0x00a7d5},
+	{0x00a7d7, 0x00a7d7},
+	{0x00a7d9, 0x00a7d9},
+	{0x00a7f2, 0x00a7f4},
+	{0x00a7f6, 0x00a7f6},
+	{0x00a7f8, 0x00a7f9},
+	{0x00a7fa, 0x00a7fa},
+	{0x00ab30, 0x00ab5a},
+	{0x00ab5c, 0x00ab5f},
+	{0x00ab60, 0x00ab68},
+	{0x00ab69, 0x00ab69},
+	{0x00ab70, 0x00abbf},
+	{0x00fb00, 0x00fb06},
+	{0x00fb13, 0x00fb17},
+	{0x00ff41, 0x00ff5a},
+	{0x010428, 0x01044f},
+	{0x0104d8, 0x0104fb},
+	{0x010597, 0x0105a1},
+	{0x0105a3, 0x0105b1},
+	{0x0105b3, 0x0105b9},
+	{0x0105bb, 0x0105bc},
+	{0x010780, 0x010780},
+	{0x010783, 0x010785},
+	{0x010787, 0x0107b0},
+	{0x0107b2, 0x0107ba},
+	{0x010cc0, 0x010cf2},
+	{0x0118c0, 0x0118df},
+	{0x016e60, 0x016e7f},
+	{0x01d41a, 0x01d433},
+	{0x01d44e, 0x01d454},
+	{0x01d456, 0x01d467},
+	{0x01d482, 0x01d49b},
+	{0x01d4b6, 0x01d4b9},
+	{0x01d4bb, 0x01d4bb},
+	{0x01d4bd, 0x01d4c3},
+	{0x01d4c5, 0x01d4cf},
+	{0x01d4ea, 0x01d503},
+	{0x01d51e, 0x01d537},
+	{0x01d552, 0x01d56b},
+	{0x01d586, 0x01d59f},
+	{0x01d5ba, 0x01d5d3},
+	{0x01d5ee, 0x01d607},
+	{0x01d622, 0x01d63b},
+	{0x01d656, 0x01d66f},
+	{0x01d68a, 0x01d6a5},
+	{0x01d6c2, 0x01d6da},
+	{0x01d6dc, 0x01d6e1},
+	{0x01d6fc, 0x01d714},
+	{0x01d716, 0x01d71b},
+	{0x01d736, 0x01d74e},
+	{0x01d750, 0x01d755},
+	{0x01d770, 0x01d788},
+	{0x01d78a, 0x01d78f},
+	{0x01d7aa, 0x01d7c2},
+	{0x01d7c4, 0x01d7c9},
+	{0x01d7cb, 0x01d7cb},
+	{0x01df00, 0x01df09},
+	{0x01df0b, 0x01df1e},
+	{0x01df25, 0x01df2a},
+	{0x01e030, 0x01e06d},
+	{0x01e922, 0x01e943}
+};
+
+/* table of Unicode codepoint ranges of Uppercase characters */
+static const pg_unicode_range unicode_uppercase[651] =
+{
+	{0x000041, 0x00005a},
+	{0x0000c0, 0x0000d6},
+	{0x0000d8, 0x0000de},
+	{0x000100, 0x000100},
+	{0x000102, 0x000102},
+	{0x000104, 0x000104},
+	{0x000106, 0x000106},
+	{0x000108, 0x000108},
+	{0x00010a, 0x00010a},
+	{0x00010c, 0x00010c},
+	{0x00010e, 0x00010e},
+	{0x000110, 0x000110},
+	{0x000112, 0x000112},
+	{0x000114, 0x000114},
+	{0x000116, 0x000116},
+	{0x000118, 0x000118},
+	{0x00011a, 0x00011a},
+	{0x00011c, 0x00011c},
+	{0x00011e, 0x00011e},
+	{0x000120, 0x000120},
+	{0x000122, 0x000122},
+	{0x000124, 0x000124},
+	{0x000126, 0x000126},
+	{0x000128, 0x000128},
+	{0x00012a, 0x00012a},
+	{0x00012c, 0x00012c},
+	{0x00012e, 0x00012e},
+	{0x000130, 0x000130},
+	{0x000132, 0x000132},
+	{0x000134, 0x000134},
+	{0x000136, 0x000136},
+	{0x000139, 0x000139},
+	{0x00013b, 0x00013b},
+	{0x00013d, 0x00013d},
+	{0x00013f, 0x00013f},
+	{0x000141, 0x000141},
+	{0x000143, 0x000143},
+	{0x000145, 0x000145},
+	{0x000147, 0x000147},
+	{0x00014a, 0x00014a},
+	{0x00014c, 0x00014c},
+	{0x00014e, 0x00014e},
+	{0x000150, 0x000150},
+	{0x000152, 0x000152},
+	{0x000154, 0x000154},
+	{0x000156, 0x000156},
+	{0x000158, 0x000158},
+	{0x00015a, 0x00015a},
+	{0x00015c, 0x00015c},
+	{0x00015e, 0x00015e},
+	{0x000160, 0x000160},
+	{0x000162, 0x000162},
+	{0x000164, 0x000164},
+	{0x000166, 0x000166},
+	{0x000168, 0x000168},
+	{0x00016a, 0x00016a},
+	{0x00016c, 0x00016c},
+	{0x00016e, 0x00016e},
+	{0x000170, 0x000170},
+	{0x000172, 0x000172},
+	{0x000174, 0x000174},
+	{0x000176, 0x000176},
+	{0x000178, 0x000179},
+	{0x00017b, 0x00017b},
+	{0x00017d, 0x00017d},
+	{0x000181, 0x000182},
+	{0x000184, 0x000184},
+	{0x000186, 0x000187},
+	{0x000189, 0x00018b},
+	{0x00018e, 0x000191},
+	{0x000193, 0x000194},
+	{0x000196, 0x000198},
+	{0x00019c, 0x00019d},
+	{0x00019f, 0x0001a0},
+	{0x0001a2, 0x0001a2},
+	{0x0001a4, 0x0001a4},
+	{0x0001a6, 0x0001a7},
+	{0x0001a9, 0x0001a9},
+	{0x0001ac, 0x0001ac},
+	{0x0001ae, 0x0001af},
+	{0x0001b1, 0x0001b3},
+	{0x0001b5, 0x0001b5},
+	{0x0001b7, 0x0001b8},
+	{0x0001bc, 0x0001bc},
+	{0x0001c4, 0x0001c4},
+	{0x0001c7, 0x0001c7},
+	{0x0001ca, 0x0001ca},
+	{0x0001cd, 0x0001cd},
+	{0x0001cf, 0x0001cf},
+	{0x0001d1, 0x0001d1},
+	{0x0001d3, 0x0001d3},
+	{0x0001d5, 0x0001d5},
+	{0x0001d7, 0x0001d7},
+	{0x0001d9, 0x0001d9},
+	{0x0001db, 0x0001db},
+	{0x0001de, 0x0001de},
+	{0x0001e0, 0x0001e0},
+	{0x0001e2, 0x0001e2},
+	{0x0001e4, 0x0001e4},
+	{0x0001e6, 0x0001e6},
+	{0x0001e8, 0x0001e8},
+	{0x0001ea, 0x0001ea},
+	{0x0001ec, 0x0001ec},
+	{0x0001ee, 0x0001ee},
+	{0x0001f1, 0x0001f1},
+	{0x0001f4, 0x0001f4},
+	{0x0001f6, 0x0001f8},
+	{0x0001fa, 0x0001fa},
+	{0x0001fc, 0x0001fc},
+	{0x0001fe, 0x0001fe},
+	{0x000200, 0x000200},
+	{0x000202, 0x000202},
+	{0x000204, 0x000204},
+	{0x000206, 0x000206},
+	{0x000208, 0x000208},
+	{0x00020a, 0x00020a},
+	{0x00020c, 0x00020c},
+	{0x00020e, 0x00020e},
+	{0x000210, 0x000210},
+	{0x000212, 0x000212},
+	{0x000214, 0x000214},
+	{0x000216, 0x000216},
+	{0x000218, 0x000218},
+	{0x00021a, 0x00021a},
+	{0x00021c, 0x00021c},
+	{0x00021e, 0x00021e},
+	{0x000220, 0x000220},
+	{0x000222, 0x000222},
+	{0x000224, 0x000224},
+	{0x000226, 0x000226},
+	{0x000228, 0x000228},
+	{0x00022a, 0x00022a},
+	{0x00022c, 0x00022c},
+	{0x00022e, 0x00022e},
+	{0x000230, 0x000230},
+	{0x000232, 0x000232},
+	{0x00023a, 0x00023b},
+	{0x00023d, 0x00023e},
+	{0x000241, 0x000241},
+	{0x000243, 0x000246},
+	{0x000248, 0x000248},
+	{0x00024a, 0x00024a},
+	{0x00024c, 0x00024c},
+	{0x00024e, 0x00024e},
+	{0x000370, 0x000370},
+	{0x000372, 0x000372},
+	{0x000376, 0x000376},
+	{0x00037f, 0x00037f},
+	{0x000386, 0x000386},
+	{0x000388, 0x00038a},
+	{0x00038c, 0x00038c},
+	{0x00038e, 0x00038f},
+	{0x000391, 0x0003a1},
+	{0x0003a3, 0x0003ab},
+	{0x0003cf, 0x0003cf},
+	{0x0003d2, 0x0003d4},
+	{0x0003d8, 0x0003d8},
+	{0x0003da, 0x0003da},
+	{0x0003dc, 0x0003dc},
+	{0x0003de, 0x0003de},
+	{0x0003e0, 0x0003e0},
+	{0x0003e2, 0x0003e2},
+	{0x0003e4, 0x0003e4},
+	{0x0003e6, 0x0003e6},
+	{0x0003e8, 0x0003e8},
+	{0x0003ea, 0x0003ea},
+	{0x0003ec, 0x0003ec},
+	{0x0003ee, 0x0003ee},
+	{0x0003f4, 0x0003f4},
+	{0x0003f7, 0x0003f7},
+	{0x0003f9, 0x0003fa},
+	{0x0003fd, 0x00042f},
+	{0x000460, 0x000460},
+	{0x000462, 0x000462},
+	{0x000464, 0x000464},
+	{0x000466, 0x000466},
+	{0x000468, 0x000468},
+	{0x00046a, 0x00046a},
+	{0x00046c, 0x00046c},
+	{0x00046e, 0x00046e},
+	{0x000470, 0x000470},
+	{0x000472, 0x000472},
+	{0x000474, 0x000474},
+	{0x000476, 0x000476},
+	{0x000478, 0x000478},
+	{0x00047a, 0x00047a},
+	{0x00047c, 0x00047c},
+	{0x00047e, 0x00047e},
+	{0x000480, 0x000480},
+	{0x00048a, 0x00048a},
+	{0x00048c, 0x00048c},
+	{0x00048e, 0x00048e},
+	{0x000490, 0x000490},
+	{0x000492, 0x000492},
+	{0x000494, 0x000494},
+	{0x000496, 0x000496},
+	{0x000498, 0x000498},
+	{0x00049a, 0x00049a},
+	{0x00049c, 0x00049c},
+	{0x00049e, 0x00049e},
+	{0x0004a0, 0x0004a0},
+	{0x0004a2, 0x0004a2},
+	{0x0004a4, 0x0004a4},
+	{0x0004a6, 0x0004a6},
+	{0x0004a8, 0x0004a8},
+	{0x0004aa, 0x0004aa},
+	{0x0004ac, 0x0004ac},
+	{0x0004ae, 0x0004ae},
+	{0x0004b0, 0x0004b0},
+	{0x0004b2, 0x0004b2},
+	{0x0004b4, 0x0004b4},
+	{0x0004b6, 0x0004b6},
+	{0x0004b8, 0x0004b8},
+	{0x0004ba, 0x0004ba},
+	{0x0004bc, 0x0004bc},
+	{0x0004be, 0x0004be},
+	{0x0004c0, 0x0004c1},
+	{0x0004c3, 0x0004c3},
+	{0x0004c5, 0x0004c5},
+	{0x0004c7, 0x0004c7},
+	{0x0004c9, 0x0004c9},
+	{0x0004cb, 0x0004cb},
+	{0x0004cd, 0x0004cd},
+	{0x0004d0, 0x0004d0},
+	{0x0004d2, 0x0004d2},
+	{0x0004d4, 0x0004d4},
+	{0x0004d6, 0x0004d6},
+	{0x0004d8, 0x0004d8},
+	{0x0004da, 0x0004da},
+	{0x0004dc, 0x0004dc},
+	{0x0004de, 0x0004de},
+	{0x0004e0, 0x0004e0},
+	{0x0004e2, 0x0004e2},
+	{0x0004e4, 0x0004e4},
+	{0x0004e6, 0x0004e6},
+	{0x0004e8, 0x0004e8},
+	{0x0004ea, 0x0004ea},
+	{0x0004ec, 0x0004ec},
+	{0x0004ee, 0x0004ee},
+	{0x0004f0, 0x0004f0},
+	{0x0004f2, 0x0004f2},
+	{0x0004f4, 0x0004f4},
+	{0x0004f6, 0x0004f6},
+	{0x0004f8, 0x0004f8},
+	{0x0004fa, 0x0004fa},
+	{0x0004fc, 0x0004fc},
+	{0x0004fe, 0x0004fe},
+	{0x000500, 0x000500},
+	{0x000502, 0x000502},
+	{0x000504, 0x000504},
+	{0x000506, 0x000506},
+	{0x000508, 0x000508},
+	{0x00050a, 0x00050a},
+	{0x00050c, 0x00050c},
+	{0x00050e, 0x00050e},
+	{0x000510, 0x000510},
+	{0x000512, 0x000512},
+	{0x000514, 0x000514},
+	{0x000516, 0x000516},
+	{0x000518, 0x000518},
+	{0x00051a, 0x00051a},
+	{0x00051c, 0x00051c},
+	{0x00051e, 0x00051e},
+	{0x000520, 0x000520},
+	{0x000522, 0x000522},
+	{0x000524, 0x000524},
+	{0x000526, 0x000526},
+	{0x000528, 0x000528},
+	{0x00052a, 0x00052a},
+	{0x00052c, 0x00052c},
+	{0x00052e, 0x00052e},
+	{0x000531, 0x000556},
+	{0x0010a0, 0x0010c5},
+	{0x0010c7, 0x0010c7},
+	{0x0010cd, 0x0010cd},
+	{0x0013a0, 0x0013f5},
+	{0x001c90, 0x001cba},
+	{0x001cbd, 0x001cbf},
+	{0x001e00, 0x001e00},
+	{0x001e02, 0x001e02},
+	{0x001e04, 0x001e04},
+	{0x001e06, 0x001e06},
+	{0x001e08, 0x001e08},
+	{0x001e0a, 0x001e0a},
+	{0x001e0c, 0x001e0c},
+	{0x001e0e, 0x001e0e},
+	{0x001e10, 0x001e10},
+	{0x001e12, 0x001e12},
+	{0x001e14, 0x001e14},
+	{0x001e16, 0x001e16},
+	{0x001e18, 0x001e18},
+	{0x001e1a, 0x001e1a},
+	{0x001e1c, 0x001e1c},
+	{0x001e1e, 0x001e1e},
+	{0x001e20, 0x001e20},
+	{0x001e22, 0x001e22},
+	{0x001e24, 0x001e24},
+	{0x001e26, 0x001e26},
+	{0x001e28, 0x001e28},
+	{0x001e2a, 0x001e2a},
+	{0x001e2c, 0x001e2c},
+	{0x001e2e, 0x001e2e},
+	{0x001e30, 0x001e30},
+	{0x001e32, 0x001e32},
+	{0x001e34, 0x001e34},
+	{0x001e36, 0x001e36},
+	{0x001e38, 0x001e38},
+	{0x001e3a, 0x001e3a},
+	{0x001e3c, 0x001e3c},
+	{0x001e3e, 0x001e3e},
+	{0x001e40, 0x001e40},
+	{0x001e42, 0x001e42},
+	{0x001e44, 0x001e44},
+	{0x001e46, 0x001e46},
+	{0x001e48, 0x001e48},
+	{0x001e4a, 0x001e4a},
+	{0x001e4c, 0x001e4c},
+	{0x001e4e, 0x001e4e},
+	{0x001e50, 0x001e50},
+	{0x001e52, 0x001e52},
+	{0x001e54, 0x001e54},
+	{0x001e56, 0x001e56},
+	{0x001e58, 0x001e58},
+	{0x001e5a, 0x001e5a},
+	{0x001e5c, 0x001e5c},
+	{0x001e5e, 0x001e5e},
+	{0x001e60, 0x001e60},
+	{0x001e62, 0x001e62},
+	{0x001e64, 0x001e64},
+	{0x001e66, 0x001e66},
+	{0x001e68, 0x001e68},
+	{0x001e6a, 0x001e6a},
+	{0x001e6c, 0x001e6c},
+	{0x001e6e, 0x001e6e},
+	{0x001e70, 0x001e70},
+	{0x001e72, 0x001e72},
+	{0x001e74, 0x001e74},
+	{0x001e76, 0x001e76},
+	{0x001e78, 0x001e78},
+	{0x001e7a, 0x001e7a},
+	{0x001e7c, 0x001e7c},
+	{0x001e7e, 0x001e7e},
+	{0x001e80, 0x001e80},
+	{0x001e82, 0x001e82},
+	{0x001e84, 0x001e84},
+	{0x001e86, 0x001e86},
+	{0x001e88, 0x001e88},
+	{0x001e8a, 0x001e8a},
+	{0x001e8c, 0x001e8c},
+	{0x001e8e, 0x001e8e},
+	{0x001e90, 0x001e90},
+	{0x001e92, 0x001e92},
+	{0x001e94, 0x001e94},
+	{0x001e9e, 0x001e9e},
+	{0x001ea0, 0x001ea0},
+	{0x001ea2, 0x001ea2},
+	{0x001ea4, 0x001ea4},
+	{0x001ea6, 0x001ea6},
+	{0x001ea8, 0x001ea8},
+	{0x001eaa, 0x001eaa},
+	{0x001eac, 0x001eac},
+	{0x001eae, 0x001eae},
+	{0x001eb0, 0x001eb0},
+	{0x001eb2, 0x001eb2},
+	{0x001eb4, 0x001eb4},
+	{0x001eb6, 0x001eb6},
+	{0x001eb8, 0x001eb8},
+	{0x001eba, 0x001eba},
+	{0x001ebc, 0x001ebc},
+	{0x001ebe, 0x001ebe},
+	{0x001ec0, 0x001ec0},
+	{0x001ec2, 0x001ec2},
+	{0x001ec4, 0x001ec4},
+	{0x001ec6, 0x001ec6},
+	{0x001ec8, 0x001ec8},
+	{0x001eca, 0x001eca},
+	{0x001ecc, 0x001ecc},
+	{0x001ece, 0x001ece},
+	{0x001ed0, 0x001ed0},
+	{0x001ed2, 0x001ed2},
+	{0x001ed4, 0x001ed4},
+	{0x001ed6, 0x001ed6},
+	{0x001ed8, 0x001ed8},
+	{0x001eda, 0x001eda},
+	{0x001edc, 0x001edc},
+	{0x001ede, 0x001ede},
+	{0x001ee0, 0x001ee0},
+	{0x001ee2, 0x001ee2},
+	{0x001ee4, 0x001ee4},
+	{0x001ee6, 0x001ee6},
+	{0x001ee8, 0x001ee8},
+	{0x001eea, 0x001eea},
+	{0x001eec, 0x001eec},
+	{0x001eee, 0x001eee},
+	{0x001ef0, 0x001ef0},
+	{0x001ef2, 0x001ef2},
+	{0x001ef4, 0x001ef4},
+	{0x001ef6, 0x001ef6},
+	{0x001ef8, 0x001ef8},
+	{0x001efa, 0x001efa},
+	{0x001efc, 0x001efc},
+	{0x001efe, 0x001efe},
+	{0x001f08, 0x001f0f},
+	{0x001f18, 0x001f1d},
+	{0x001f28, 0x001f2f},
+	{0x001f38, 0x001f3f},
+	{0x001f48, 0x001f4d},
+	{0x001f59, 0x001f59},
+	{0x001f5b, 0x001f5b},
+	{0x001f5d, 0x001f5d},
+	{0x001f5f, 0x001f5f},
+	{0x001f68, 0x001f6f},
+	{0x001fb8, 0x001fbb},
+	{0x001fc8, 0x001fcb},
+	{0x001fd8, 0x001fdb},
+	{0x001fe8, 0x001fec},
+	{0x001ff8, 0x001ffb},
+	{0x002102, 0x002102},
+	{0x002107, 0x002107},
+	{0x00210b, 0x00210d},
+	{0x002110, 0x002112},
+	{0x002115, 0x002115},
+	{0x002119, 0x00211d},
+	{0x002124, 0x002124},
+	{0x002126, 0x002126},
+	{0x002128, 0x002128},
+	{0x00212a, 0x00212d},
+	{0x002130, 0x002133},
+	{0x00213e, 0x00213f},
+	{0x002145, 0x002145},
+	{0x002160, 0x00216f},
+	{0x002183, 0x002183},
+	{0x0024b6, 0x0024cf},
+	{0x002c00, 0x002c2f},
+	{0x002c60, 0x002c60},
+	{0x002c62, 0x002c64},
+	{0x002c67, 0x002c67},
+	{0x002c69, 0x002c69},
+	{0x002c6b, 0x002c6b},
+	{0x002c6d, 0x002c70},
+	{0x002c72, 0x002c72},
+	{0x002c75, 0x002c75},
+	{0x002c7e, 0x002c80},
+	{0x002c82, 0x002c82},
+	{0x002c84, 0x002c84},
+	{0x002c86, 0x002c86},
+	{0x002c88, 0x002c88},
+	{0x002c8a, 0x002c8a},
+	{0x002c8c, 0x002c8c},
+	{0x002c8e, 0x002c8e},
+	{0x002c90, 0x002c90},
+	{0x002c92, 0x002c92},
+	{0x002c94, 0x002c94},
+	{0x002c96, 0x002c96},
+	{0x002c98, 0x002c98},
+	{0x002c9a, 0x002c9a},
+	{0x002c9c, 0x002c9c},
+	{0x002c9e, 0x002c9e},
+	{0x002ca0, 0x002ca0},
+	{0x002ca2, 0x002ca2},
+	{0x002ca4, 0x002ca4},
+	{0x002ca6, 0x002ca6},
+	{0x002ca8, 0x002ca8},
+	{0x002caa, 0x002caa},
+	{0x002cac, 0x002cac},
+	{0x002cae, 0x002cae},
+	{0x002cb0, 0x002cb0},
+	{0x002cb2, 0x002cb2},
+	{0x002cb4, 0x002cb4},
+	{0x002cb6, 0x002cb6},
+	{0x002cb8, 0x002cb8},
+	{0x002cba, 0x002cba},
+	{0x002cbc, 0x002cbc},
+	{0x002cbe, 0x002cbe},
+	{0x002cc0, 0x002cc0},
+	{0x002cc2, 0x002cc2},
+	{0x002cc4, 0x002cc4},
+	{0x002cc6, 0x002cc6},
+	{0x002cc8, 0x002cc8},
+	{0x002cca, 0x002cca},
+	{0x002ccc, 0x002ccc},
+	{0x002cce, 0x002cce},
+	{0x002cd0, 0x002cd0},
+	{0x002cd2, 0x002cd2},
+	{0x002cd4, 0x002cd4},
+	{0x002cd6, 0x002cd6},
+	{0x002cd8, 0x002cd8},
+	{0x002cda, 0x002cda},
+	{0x002cdc, 0x002cdc},
+	{0x002cde, 0x002cde},
+	{0x002ce0, 0x002ce0},
+	{0x002ce2, 0x002ce2},
+	{0x002ceb, 0x002ceb},
+	{0x002ced, 0x002ced},
+	{0x002cf2, 0x002cf2},
+	{0x00a640, 0x00a640},
+	{0x00a642, 0x00a642},
+	{0x00a644, 0x00a644},
+	{0x00a646, 0x00a646},
+	{0x00a648, 0x00a648},
+	{0x00a64a, 0x00a64a},
+	{0x00a64c, 0x00a64c},
+	{0x00a64e, 0x00a64e},
+	{0x00a650, 0x00a650},
+	{0x00a652, 0x00a652},
+	{0x00a654, 0x00a654},
+	{0x00a656, 0x00a656},
+	{0x00a658, 0x00a658},
+	{0x00a65a, 0x00a65a},
+	{0x00a65c, 0x00a65c},
+	{0x00a65e, 0x00a65e},
+	{0x00a660, 0x00a660},
+	{0x00a662, 0x00a662},
+	{0x00a664, 0x00a664},
+	{0x00a666, 0x00a666},
+	{0x00a668, 0x00a668},
+	{0x00a66a, 0x00a66a},
+	{0x00a66c, 0x00a66c},
+	{0x00a680, 0x00a680},
+	{0x00a682, 0x00a682},
+	{0x00a684, 0x00a684},
+	{0x00a686, 0x00a686},
+	{0x00a688, 0x00a688},
+	{0x00a68a, 0x00a68a},
+	{0x00a68c, 0x00a68c},
+	{0x00a68e, 0x00a68e},
+	{0x00a690, 0x00a690},
+	{0x00a692, 0x00a692},
+	{0x00a694, 0x00a694},
+	{0x00a696, 0x00a696},
+	{0x00a698, 0x00a698},
+	{0x00a69a, 0x00a69a},
+	{0x00a722, 0x00a722},
+	{0x00a724, 0x00a724},
+	{0x00a726, 0x00a726},
+	{0x00a728, 0x00a728},
+	{0x00a72a, 0x00a72a},
+	{0x00a72c, 0x00a72c},
+	{0x00a72e, 0x00a72e},
+	{0x00a732, 0x00a732},
+	{0x00a734, 0x00a734},
+	{0x00a736, 0x00a736},
+	{0x00a738, 0x00a738},
+	{0x00a73a, 0x00a73a},
+	{0x00a73c, 0x00a73c},
+	{0x00a73e, 0x00a73e},
+	{0x00a740, 0x00a740},
+	{0x00a742, 0x00a742},
+	{0x00a744, 0x00a744},
+	{0x00a746, 0x00a746},
+	{0x00a748, 0x00a748},
+	{0x00a74a, 0x00a74a},
+	{0x00a74c, 0x00a74c},
+	{0x00a74e, 0x00a74e},
+	{0x00a750, 0x00a750},
+	{0x00a752, 0x00a752},
+	{0x00a754, 0x00a754},
+	{0x00a756, 0x00a756},
+	{0x00a758, 0x00a758},
+	{0x00a75a, 0x00a75a},
+	{0x00a75c, 0x00a75c},
+	{0x00a75e, 0x00a75e},
+	{0x00a760, 0x00a760},
+	{0x00a762, 0x00a762},
+	{0x00a764, 0x00a764},
+	{0x00a766, 0x00a766},
+	{0x00a768, 0x00a768},
+	{0x00a76a, 0x00a76a},
+	{0x00a76c, 0x00a76c},
+	{0x00a76e, 0x00a76e},
+	{0x00a779, 0x00a779},
+	{0x00a77b, 0x00a77b},
+	{0x00a77d, 0x00a77e},
+	{0x00a780, 0x00a780},
+	{0x00a782, 0x00a782},
+	{0x00a784, 0x00a784},
+	{0x00a786, 0x00a786},
+	{0x00a78b, 0x00a78b},
+	{0x00a78d, 0x00a78d},
+	{0x00a790, 0x00a790},
+	{0x00a792, 0x00a792},
+	{0x00a796, 0x00a796},
+	{0x00a798, 0x00a798},
+	{0x00a79a, 0x00a79a},
+	{0x00a79c, 0x00a79c},
+	{0x00a79e, 0x00a79e},
+	{0x00a7a0, 0x00a7a0},
+	{0x00a7a2, 0x00a7a2},
+	{0x00a7a4, 0x00a7a4},
+	{0x00a7a6, 0x00a7a6},
+	{0x00a7a8, 0x00a7a8},
+	{0x00a7aa, 0x00a7ae},
+	{0x00a7b0, 0x00a7b4},
+	{0x00a7b6, 0x00a7b6},
+	{0x00a7b8, 0x00a7b8},
+	{0x00a7ba, 0x00a7ba},
+	{0x00a7bc, 0x00a7bc},
+	{0x00a7be, 0x00a7be},
+	{0x00a7c0, 0x00a7c0},
+	{0x00a7c2, 0x00a7c2},
+	{0x00a7c4, 0x00a7c7},
+	{0x00a7c9, 0x00a7c9},
+	{0x00a7d0, 0x00a7d0},
+	{0x00a7d6, 0x00a7d6},
+	{0x00a7d8, 0x00a7d8},
+	{0x00a7f5, 0x00a7f5},
+	{0x00ff21, 0x00ff3a},
+	{0x010400, 0x010427},
+	{0x0104b0, 0x0104d3},
+	{0x010570, 0x01057a},
+	{0x01057c, 0x01058a},
+	{0x01058c, 0x010592},
+	{0x010594, 0x010595},
+	{0x010c80, 0x010cb2},
+	{0x0118a0, 0x0118bf},
+	{0x016e40, 0x016e5f},
+	{0x01d400, 0x01d419},
+	{0x01d434, 0x01d44d},
+	{0x01d468, 0x01d481},
+	{0x01d49c, 0x01d49c},
+	{0x01d49e, 0x01d49f},
+	{0x01d4a2, 0x01d4a2},
+	{0x01d4a5, 0x01d4a6},
+	{0x01d4a9, 0x01d4ac},
+	{0x01d4ae, 0x01d4b5},
+	{0x01d4d0, 0x01d4e9},
+	{0x01d504, 0x01d505},
+	{0x01d507, 0x01d50a},
+	{0x01d50d, 0x01d514},
+	{0x01d516, 0x01d51c},
+	{0x01d538, 0x01d539},
+	{0x01d53b, 0x01d53e},
+	{0x01d540, 0x01d544},
+	{0x01d546, 0x01d546},
+	{0x01d54a, 0x01d550},
+	{0x01d56c, 0x01d585},
+	{0x01d5a0, 0x01d5b9},
+	{0x01d5d4, 0x01d5ed},
+	{0x01d608, 0x01d621},
+	{0x01d63c, 0x01d655},
+	{0x01d670, 0x01d689},
+	{0x01d6a8, 0x01d6c0},
+	{0x01d6e2, 0x01d6fa},
+	{0x01d71c, 0x01d734},
+	{0x01d756, 0x01d76e},
+	{0x01d790, 0x01d7a8},
+	{0x01d7ca, 0x01d7ca},
+	{0x01e900, 0x01e921},
+	{0x01f130, 0x01f149},
+	{0x01f150, 0x01f169},
+	{0x01f170, 0x01f189}
+};
+
+/* table of Unicode codepoint ranges of White_Space characters */
+static const pg_unicode_range unicode_white_space[11] =
+{
+	{0x000009, 0x00000d},
+	{0x000020, 0x000020},
+	{0x000085, 0x000085},
+	{0x0000a0, 0x0000a0},
+	{0x001680, 0x001680},
+	{0x002000, 0x00200a},
+	{0x002028, 0x002028},
+	{0x002029, 0x002029},
+	{0x00202f, 0x00202f},
+	{0x00205f, 0x00205f},
+	{0x003000, 0x003000}
+};
+
+/* table of Unicode codepoint ranges of Hex_Digit characters */
+static const pg_unicode_range unicode_hex_digit[6] =
+{
+	{0x000030, 0x000039},
+	{0x000041, 0x000046},
+	{0x000061, 0x000066},
+	{0x00ff10, 0x00ff19},
+	{0x00ff21, 0x00ff26},
+	{0x00ff41, 0x00ff46}
+};
+
+/* table of Unicode codepoint ranges of Join_Control characters */
+static const pg_unicode_range unicode_join_control[1] =
+{
+	{0x00200c, 0x00200d}
+};
-- 
2.34.1
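
The range tables in the patch above are sorted and non-overlapping, so a membership test can be a plain binary search. The following is a minimal sketch of such a lookup, not code from the patch: the names example_range, example_range_search and example_is_white_space are hypothetical, and only the table contents are copied from the White_Space table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's pg_unicode_range (assumed layout). */
typedef struct
{
	uint32_t	first;			/* first code point in the range */
	uint32_t	last;			/* last code point in the range, inclusive */
} example_range;

/* Contents copied from the White_Space table in the patch above. */
static const example_range example_white_space[11] =
{
	{0x000009, 0x00000d},
	{0x000020, 0x000020},
	{0x000085, 0x000085},
	{0x0000a0, 0x0000a0},
	{0x001680, 0x001680},
	{0x002000, 0x00200a},
	{0x002028, 0x002028},
	{0x002029, 0x002029},
	{0x00202f, 0x00202f},
	{0x00205f, 0x00205f},
	{0x003000, 0x003000}
};

/*
 * Binary search over a sorted, non-overlapping range table.
 * Returns true if "code" falls inside any range, false otherwise.
 */
static bool
example_range_search(const example_range *tbl, size_t size, uint32_t code)
{
	size_t		lo = 0;
	size_t		hi = size;		/* half-open interval [lo, hi) */

	assert(code <= 0x10ffff);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (code > tbl[mid].last)
			lo = mid + 1;
		else if (code < tbl[mid].first)
			hi = mid;
		else
			return true;
	}

	return false;
}

/* Hypothetical convenience wrapper for the White_Space property. */
static bool
example_is_white_space(uint32_t code)
{
	return example_range_search(example_white_space,
								sizeof(example_white_space) / sizeof(example_white_space[0]),
								code);
}
```

With this sketch, example_is_white_space(0x2003) is true because the code point lies inside the range 0x2000..0x200a, while example_is_white_space(0x200b) is false because it falls in a gap between ranges.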

v1-0002-Shrink-unicode-category-table.patch (text/x-patch; charset=UTF-8)
From 35cd57cd65205573be3a3eff91affe307da405d0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 22 Nov 2023 11:30:31 -0800
Subject: [PATCH v1 2/3] Shrink unicode category table.

Missing entries can implicitly be considered "unassigned".
---
 .../generate-unicode_category_table.pl        |  21 +-
 src/common/unicode_category.c                 |   6 +-
 src/include/common/unicode_category_table.h   | 711 +-----------------
 3 files changed, 15 insertions(+), 723 deletions(-)
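
To illustrate the commit message, here is a hedged sketch of a category lookup that treats a missing entry as "unassigned". It is not the patched unicode_category() itself: the enum, struct and function names are hypothetical, and the table is only a short excerpt of entries that appear in the diff below, with the U+0378..U+0379 and U+0380..U+0383 gaps left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the category enum, for illustration only. */
typedef enum
{
	EX_UNASSIGNED = 0,
	EX_UPPERCASE_LETTER,
	EX_LOWERCASE_LETTER,
	EX_MODIFIER_LETTER,
	EX_OTHER_PUNCTUATION,
	EX_MODIFIER_SYMBOL
} example_category;

/* Hypothetical stand-in for pg_category_range (assumed layout). */
typedef struct
{
	uint32_t	first;
	uint32_t	last;
	example_category category;
} example_category_range;

/* Excerpt of the shrunken table: unassigned gaps have no entry at all. */
static const example_category_range example_categories[] =
{
	{0x000376, 0x000376, EX_UPPERCASE_LETTER},
	{0x000377, 0x000377, EX_LOWERCASE_LETTER},
	{0x00037a, 0x00037a, EX_MODIFIER_LETTER},
	{0x00037b, 0x00037d, EX_LOWERCASE_LETTER},
	{0x00037e, 0x00037e, EX_OTHER_PUNCTUATION},
	{0x00037f, 0x00037f, EX_UPPERCASE_LETTER},
	{0x000384, 0x000385, EX_MODIFIER_SYMBOL}
};

/*
 * Binary search for the range containing "ucs".  If no range contains
 * it, the code point is reported as unassigned instead of being an error.
 */
static example_category
example_unicode_category(uint32_t ucs)
{
	int			min = 0;
	int			max = (int) (sizeof(example_categories) /
							 sizeof(example_categories[0])) - 1;

	assert(ucs <= 0x10ffff);

	while (max >= min)
	{
		int			mid = (min + max) / 2;

		if (ucs > example_categories[mid].last)
			min = mid + 1;
		else if (ucs < example_categories[mid].first)
			max = mid - 1;
		else
			return example_categories[mid].category;
	}

	/* fell into a gap (or outside the table): implicitly unassigned */
	return EX_UNASSIGNED;
}
```

In this excerpt, a lookup of 0x378 finds no containing range and returns EX_UNASSIGNED, which matches what the removed {0x000378, 0x000379, PG_U_UNASSIGNED} entry used to report. Code points outside the excerpt also come back as unassigned here only because the excerpt is incomplete.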

diff --git a/src/common/unicode/generate-unicode_category_table.pl b/src/common/unicode/generate-unicode_category_table.pl
index 8f03425e0b..992b877ede 100644
--- a/src/common/unicode/generate-unicode_category_table.pl
+++ b/src/common/unicode/generate-unicode_category_table.pl
@@ -72,7 +72,10 @@ while (my $line = <$FH>)
 	# the current range, emit the current range and initialize a new
 	# range representing the gap.
 	if ($range_end + 1 != $code && $range_category ne $gap_category) {
-		push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+		if ($range_category ne $CATEGORY_UNASSIGNED) {
+			push(@category_ranges, {start => $range_start, end => $range_end,
+									category => $range_category});
+		}
 		$range_start = $range_end + 1;
 		$range_end = $code - 1;
 		$range_category = $gap_category;
@@ -80,7 +83,10 @@ while (my $line = <$FH>)
 
 	# different category; new range
 	if ($range_category ne $category) {
-		push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+		if ($range_category ne $CATEGORY_UNASSIGNED) {
+			push(@category_ranges, {start => $range_start, end => $range_end,
+									category => $range_category});
+		}
 		$range_start = $code;
 		$range_end = $code;
 		$range_category = $category;
@@ -109,14 +115,9 @@ die "<..., First> entry with no corresponding <..., Last> entry"
   if $gap_category ne $CATEGORY_UNASSIGNED;
 
 # emit final range
-push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
-
-# emit range for any unassigned code points after last entry
-if ($range_end < 0x10FFFF) {
-	$range_start = $range_end + 1;
-	$range_end = 0x10FFFF;
-	$range_category = $CATEGORY_UNASSIGNED;
-	push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+if ($range_category ne $CATEGORY_UNASSIGNED) {
+	push(@category_ranges, {start => $range_start, end => $range_end,
+							category => $range_category});
 }
 
 my $num_ranges = scalar @category_ranges;
diff --git a/src/common/unicode_category.c b/src/common/unicode_category.c
index cec9c0d998..189cd6eca3 100644
--- a/src/common/unicode_category.c
+++ b/src/common/unicode_category.c
@@ -28,8 +28,7 @@ unicode_category(pg_wchar ucs)
 	int			mid;
 	int			max = lengthof(unicode_categories) - 1;
 
-	Assert(ucs >= unicode_categories[0].first &&
-		   ucs <= unicode_categories[max].last);
+	Assert(ucs <= 0x10ffff);
 
 	while (max >= min)
 	{
@@ -42,8 +41,7 @@ unicode_category(pg_wchar ucs)
 			return unicode_categories[mid].category;
 	}
 
-	Assert(false);
-	return (pg_unicode_category) - 1;
+	return PG_U_UNASSIGNED;
 }
 
 /*
diff --git a/src/include/common/unicode_category_table.h b/src/include/common/unicode_category_table.h
index 06ad50d215..14f1ea0677 100644
--- a/src/include/common/unicode_category_table.h
+++ b/src/include/common/unicode_category_table.h
@@ -26,7 +26,7 @@ typedef struct
 }			pg_category_range;
 
 /* table of Unicode codepoint ranges and their categories */
-static const pg_category_range unicode_categories[4009] =
+static const pg_category_range unicode_categories[3302] =
 {
 	{0x000000, 0x00001f, PG_U_CONTROL},
 	{0x000020, 0x000020, PG_U_SPACE_SEPARATOR},
@@ -397,23 +397,18 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000375, 0x000375, PG_U_MODIFIER_SYMBOL},
 	{0x000376, 0x000376, PG_U_UPPERCASE_LETTER},
 	{0x000377, 0x000377, PG_U_LOWERCASE_LETTER},
-	{0x000378, 0x000379, PG_U_UNASSIGNED},
 	{0x00037a, 0x00037a, PG_U_MODIFIER_LETTER},
 	{0x00037b, 0x00037d, PG_U_LOWERCASE_LETTER},
 	{0x00037e, 0x00037e, PG_U_OTHER_PUNCTUATION},
 	{0x00037f, 0x00037f, PG_U_UPPERCASE_LETTER},
-	{0x000380, 0x000383, PG_U_UNASSIGNED},
 	{0x000384, 0x000385, PG_U_MODIFIER_SYMBOL},
 	{0x000386, 0x000386, PG_U_UPPERCASE_LETTER},
 	{0x000387, 0x000387, PG_U_OTHER_PUNCTUATION},
 	{0x000388, 0x00038a, PG_U_UPPERCASE_LETTER},
-	{0x00038b, 0x00038b, PG_U_UNASSIGNED},
 	{0x00038c, 0x00038c, PG_U_UPPERCASE_LETTER},
-	{0x00038d, 0x00038d, PG_U_UNASSIGNED},
 	{0x00038e, 0x00038f, PG_U_UPPERCASE_LETTER},
 	{0x000390, 0x000390, PG_U_LOWERCASE_LETTER},
 	{0x000391, 0x0003a1, PG_U_UPPERCASE_LETTER},
-	{0x0003a2, 0x0003a2, PG_U_UNASSIGNED},
 	{0x0003a3, 0x0003ab, PG_U_UPPERCASE_LETTER},
 	{0x0003ac, 0x0003ce, PG_U_LOWERCASE_LETTER},
 	{0x0003cf, 0x0003cf, PG_U_UPPERCASE_LETTER},
@@ -654,18 +649,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00052d, 0x00052d, PG_U_LOWERCASE_LETTER},
 	{0x00052e, 0x00052e, PG_U_UPPERCASE_LETTER},
 	{0x00052f, 0x00052f, PG_U_LOWERCASE_LETTER},
-	{0x000530, 0x000530, PG_U_UNASSIGNED},
 	{0x000531, 0x000556, PG_U_UPPERCASE_LETTER},
-	{0x000557, 0x000558, PG_U_UNASSIGNED},
 	{0x000559, 0x000559, PG_U_MODIFIER_LETTER},
 	{0x00055a, 0x00055f, PG_U_OTHER_PUNCTUATION},
 	{0x000560, 0x000588, PG_U_LOWERCASE_LETTER},
 	{0x000589, 0x000589, PG_U_OTHER_PUNCTUATION},
 	{0x00058a, 0x00058a, PG_U_DASH_PUNCTUATION},
-	{0x00058b, 0x00058c, PG_U_UNASSIGNED},
 	{0x00058d, 0x00058e, PG_U_OTHER_SYMBOL},
 	{0x00058f, 0x00058f, PG_U_CURRENCY_SYMBOL},
-	{0x000590, 0x000590, PG_U_UNASSIGNED},
 	{0x000591, 0x0005bd, PG_U_NONSPACING_MARK},
 	{0x0005be, 0x0005be, PG_U_DASH_PUNCTUATION},
 	{0x0005bf, 0x0005bf, PG_U_NONSPACING_MARK},
@@ -675,12 +666,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0005c4, 0x0005c5, PG_U_NONSPACING_MARK},
 	{0x0005c6, 0x0005c6, PG_U_OTHER_PUNCTUATION},
 	{0x0005c7, 0x0005c7, PG_U_NONSPACING_MARK},
-	{0x0005c8, 0x0005cf, PG_U_UNASSIGNED},
 	{0x0005d0, 0x0005ea, PG_U_OTHER_LETTER},
-	{0x0005eb, 0x0005ee, PG_U_UNASSIGNED},
 	{0x0005ef, 0x0005f2, PG_U_OTHER_LETTER},
 	{0x0005f3, 0x0005f4, PG_U_OTHER_PUNCTUATION},
-	{0x0005f5, 0x0005ff, PG_U_UNASSIGNED},
 	{0x000600, 0x000605, PG_U_FORMAT},
 	{0x000606, 0x000608, PG_U_MATH_SYMBOL},
 	{0x000609, 0x00060a, PG_U_OTHER_PUNCTUATION},
@@ -716,17 +704,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0006fd, 0x0006fe, PG_U_OTHER_SYMBOL},
 	{0x0006ff, 0x0006ff, PG_U_OTHER_LETTER},
 	{0x000700, 0x00070d, PG_U_OTHER_PUNCTUATION},
-	{0x00070e, 0x00070e, PG_U_UNASSIGNED},
 	{0x00070f, 0x00070f, PG_U_FORMAT},
 	{0x000710, 0x000710, PG_U_OTHER_LETTER},
 	{0x000711, 0x000711, PG_U_NONSPACING_MARK},
 	{0x000712, 0x00072f, PG_U_OTHER_LETTER},
 	{0x000730, 0x00074a, PG_U_NONSPACING_MARK},
-	{0x00074b, 0x00074c, PG_U_UNASSIGNED},
 	{0x00074d, 0x0007a5, PG_U_OTHER_LETTER},
 	{0x0007a6, 0x0007b0, PG_U_NONSPACING_MARK},
 	{0x0007b1, 0x0007b1, PG_U_OTHER_LETTER},
-	{0x0007b2, 0x0007bf, PG_U_UNASSIGNED},
 	{0x0007c0, 0x0007c9, PG_U_DECIMAL_NUMBER},
 	{0x0007ca, 0x0007ea, PG_U_OTHER_LETTER},
 	{0x0007eb, 0x0007f3, PG_U_NONSPACING_MARK},
@@ -734,7 +719,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0007f6, 0x0007f6, PG_U_OTHER_SYMBOL},
 	{0x0007f7, 0x0007f9, PG_U_OTHER_PUNCTUATION},
 	{0x0007fa, 0x0007fa, PG_U_MODIFIER_LETTER},
-	{0x0007fb, 0x0007fc, PG_U_UNASSIGNED},
 	{0x0007fd, 0x0007fd, PG_U_NONSPACING_MARK},
 	{0x0007fe, 0x0007ff, PG_U_CURRENCY_SYMBOL},
 	{0x000800, 0x000815, PG_U_OTHER_LETTER},
@@ -745,22 +729,15 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000825, 0x000827, PG_U_NONSPACING_MARK},
 	{0x000828, 0x000828, PG_U_MODIFIER_LETTER},
 	{0x000829, 0x00082d, PG_U_NONSPACING_MARK},
-	{0x00082e, 0x00082f, PG_U_UNASSIGNED},
 	{0x000830, 0x00083e, PG_U_OTHER_PUNCTUATION},
-	{0x00083f, 0x00083f, PG_U_UNASSIGNED},
 	{0x000840, 0x000858, PG_U_OTHER_LETTER},
 	{0x000859, 0x00085b, PG_U_NONSPACING_MARK},
-	{0x00085c, 0x00085d, PG_U_UNASSIGNED},
 	{0x00085e, 0x00085e, PG_U_OTHER_PUNCTUATION},
-	{0x00085f, 0x00085f, PG_U_UNASSIGNED},
 	{0x000860, 0x00086a, PG_U_OTHER_LETTER},
-	{0x00086b, 0x00086f, PG_U_UNASSIGNED},
 	{0x000870, 0x000887, PG_U_OTHER_LETTER},
 	{0x000888, 0x000888, PG_U_MODIFIER_SYMBOL},
 	{0x000889, 0x00088e, PG_U_OTHER_LETTER},
-	{0x00088f, 0x00088f, PG_U_UNASSIGNED},
 	{0x000890, 0x000891, PG_U_FORMAT},
-	{0x000892, 0x000897, PG_U_UNASSIGNED},
 	{0x000898, 0x00089f, PG_U_NONSPACING_MARK},
 	{0x0008a0, 0x0008c8, PG_U_OTHER_LETTER},
 	{0x0008c9, 0x0008c9, PG_U_MODIFIER_LETTER},
@@ -789,37 +766,24 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000972, 0x000980, PG_U_OTHER_LETTER},
 	{0x000981, 0x000981, PG_U_NONSPACING_MARK},
 	{0x000982, 0x000983, PG_U_SPACING_MARK},
-	{0x000984, 0x000984, PG_U_UNASSIGNED},
 	{0x000985, 0x00098c, PG_U_OTHER_LETTER},
-	{0x00098d, 0x00098e, PG_U_UNASSIGNED},
 	{0x00098f, 0x000990, PG_U_OTHER_LETTER},
-	{0x000991, 0x000992, PG_U_UNASSIGNED},
 	{0x000993, 0x0009a8, PG_U_OTHER_LETTER},
-	{0x0009a9, 0x0009a9, PG_U_UNASSIGNED},
 	{0x0009aa, 0x0009b0, PG_U_OTHER_LETTER},
-	{0x0009b1, 0x0009b1, PG_U_UNASSIGNED},
 	{0x0009b2, 0x0009b2, PG_U_OTHER_LETTER},
-	{0x0009b3, 0x0009b5, PG_U_UNASSIGNED},
 	{0x0009b6, 0x0009b9, PG_U_OTHER_LETTER},
-	{0x0009ba, 0x0009bb, PG_U_UNASSIGNED},
 	{0x0009bc, 0x0009bc, PG_U_NONSPACING_MARK},
 	{0x0009bd, 0x0009bd, PG_U_OTHER_LETTER},
 	{0x0009be, 0x0009c0, PG_U_SPACING_MARK},
 	{0x0009c1, 0x0009c4, PG_U_NONSPACING_MARK},
-	{0x0009c5, 0x0009c6, PG_U_UNASSIGNED},
 	{0x0009c7, 0x0009c8, PG_U_SPACING_MARK},
-	{0x0009c9, 0x0009ca, PG_U_UNASSIGNED},
 	{0x0009cb, 0x0009cc, PG_U_SPACING_MARK},
 	{0x0009cd, 0x0009cd, PG_U_NONSPACING_MARK},
 	{0x0009ce, 0x0009ce, PG_U_OTHER_LETTER},
-	{0x0009cf, 0x0009d6, PG_U_UNASSIGNED},
 	{0x0009d7, 0x0009d7, PG_U_SPACING_MARK},
-	{0x0009d8, 0x0009db, PG_U_UNASSIGNED},
 	{0x0009dc, 0x0009dd, PG_U_OTHER_LETTER},
-	{0x0009de, 0x0009de, PG_U_UNASSIGNED},
 	{0x0009df, 0x0009e1, PG_U_OTHER_LETTER},
 	{0x0009e2, 0x0009e3, PG_U_NONSPACING_MARK},
-	{0x0009e4, 0x0009e5, PG_U_UNASSIGNED},
 	{0x0009e6, 0x0009ef, PG_U_DECIMAL_NUMBER},
 	{0x0009f0, 0x0009f1, PG_U_OTHER_LETTER},
 	{0x0009f2, 0x0009f3, PG_U_CURRENCY_SYMBOL},
@@ -829,194 +793,121 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0009fc, 0x0009fc, PG_U_OTHER_LETTER},
 	{0x0009fd, 0x0009fd, PG_U_OTHER_PUNCTUATION},
 	{0x0009fe, 0x0009fe, PG_U_NONSPACING_MARK},
-	{0x0009ff, 0x000a00, PG_U_UNASSIGNED},
 	{0x000a01, 0x000a02, PG_U_NONSPACING_MARK},
 	{0x000a03, 0x000a03, PG_U_SPACING_MARK},
-	{0x000a04, 0x000a04, PG_U_UNASSIGNED},
 	{0x000a05, 0x000a0a, PG_U_OTHER_LETTER},
-	{0x000a0b, 0x000a0e, PG_U_UNASSIGNED},
 	{0x000a0f, 0x000a10, PG_U_OTHER_LETTER},
-	{0x000a11, 0x000a12, PG_U_UNASSIGNED},
 	{0x000a13, 0x000a28, PG_U_OTHER_LETTER},
-	{0x000a29, 0x000a29, PG_U_UNASSIGNED},
 	{0x000a2a, 0x000a30, PG_U_OTHER_LETTER},
-	{0x000a31, 0x000a31, PG_U_UNASSIGNED},
 	{0x000a32, 0x000a33, PG_U_OTHER_LETTER},
-	{0x000a34, 0x000a34, PG_U_UNASSIGNED},
 	{0x000a35, 0x000a36, PG_U_OTHER_LETTER},
-	{0x000a37, 0x000a37, PG_U_UNASSIGNED},
 	{0x000a38, 0x000a39, PG_U_OTHER_LETTER},
-	{0x000a3a, 0x000a3b, PG_U_UNASSIGNED},
 	{0x000a3c, 0x000a3c, PG_U_NONSPACING_MARK},
-	{0x000a3d, 0x000a3d, PG_U_UNASSIGNED},
 	{0x000a3e, 0x000a40, PG_U_SPACING_MARK},
 	{0x000a41, 0x000a42, PG_U_NONSPACING_MARK},
-	{0x000a43, 0x000a46, PG_U_UNASSIGNED},
 	{0x000a47, 0x000a48, PG_U_NONSPACING_MARK},
-	{0x000a49, 0x000a4a, PG_U_UNASSIGNED},
 	{0x000a4b, 0x000a4d, PG_U_NONSPACING_MARK},
-	{0x000a4e, 0x000a50, PG_U_UNASSIGNED},
 	{0x000a51, 0x000a51, PG_U_NONSPACING_MARK},
-	{0x000a52, 0x000a58, PG_U_UNASSIGNED},
 	{0x000a59, 0x000a5c, PG_U_OTHER_LETTER},
-	{0x000a5d, 0x000a5d, PG_U_UNASSIGNED},
 	{0x000a5e, 0x000a5e, PG_U_OTHER_LETTER},
-	{0x000a5f, 0x000a65, PG_U_UNASSIGNED},
 	{0x000a66, 0x000a6f, PG_U_DECIMAL_NUMBER},
 	{0x000a70, 0x000a71, PG_U_NONSPACING_MARK},
 	{0x000a72, 0x000a74, PG_U_OTHER_LETTER},
 	{0x000a75, 0x000a75, PG_U_NONSPACING_MARK},
 	{0x000a76, 0x000a76, PG_U_OTHER_PUNCTUATION},
-	{0x000a77, 0x000a80, PG_U_UNASSIGNED},
 	{0x000a81, 0x000a82, PG_U_NONSPACING_MARK},
 	{0x000a83, 0x000a83, PG_U_SPACING_MARK},
-	{0x000a84, 0x000a84, PG_U_UNASSIGNED},
 	{0x000a85, 0x000a8d, PG_U_OTHER_LETTER},
-	{0x000a8e, 0x000a8e, PG_U_UNASSIGNED},
 	{0x000a8f, 0x000a91, PG_U_OTHER_LETTER},
-	{0x000a92, 0x000a92, PG_U_UNASSIGNED},
 	{0x000a93, 0x000aa8, PG_U_OTHER_LETTER},
-	{0x000aa9, 0x000aa9, PG_U_UNASSIGNED},
 	{0x000aaa, 0x000ab0, PG_U_OTHER_LETTER},
-	{0x000ab1, 0x000ab1, PG_U_UNASSIGNED},
 	{0x000ab2, 0x000ab3, PG_U_OTHER_LETTER},
-	{0x000ab4, 0x000ab4, PG_U_UNASSIGNED},
 	{0x000ab5, 0x000ab9, PG_U_OTHER_LETTER},
-	{0x000aba, 0x000abb, PG_U_UNASSIGNED},
 	{0x000abc, 0x000abc, PG_U_NONSPACING_MARK},
 	{0x000abd, 0x000abd, PG_U_OTHER_LETTER},
 	{0x000abe, 0x000ac0, PG_U_SPACING_MARK},
 	{0x000ac1, 0x000ac5, PG_U_NONSPACING_MARK},
-	{0x000ac6, 0x000ac6, PG_U_UNASSIGNED},
 	{0x000ac7, 0x000ac8, PG_U_NONSPACING_MARK},
 	{0x000ac9, 0x000ac9, PG_U_SPACING_MARK},
-	{0x000aca, 0x000aca, PG_U_UNASSIGNED},
 	{0x000acb, 0x000acc, PG_U_SPACING_MARK},
 	{0x000acd, 0x000acd, PG_U_NONSPACING_MARK},
-	{0x000ace, 0x000acf, PG_U_UNASSIGNED},
 	{0x000ad0, 0x000ad0, PG_U_OTHER_LETTER},
-	{0x000ad1, 0x000adf, PG_U_UNASSIGNED},
 	{0x000ae0, 0x000ae1, PG_U_OTHER_LETTER},
 	{0x000ae2, 0x000ae3, PG_U_NONSPACING_MARK},
-	{0x000ae4, 0x000ae5, PG_U_UNASSIGNED},
 	{0x000ae6, 0x000aef, PG_U_DECIMAL_NUMBER},
 	{0x000af0, 0x000af0, PG_U_OTHER_PUNCTUATION},
 	{0x000af1, 0x000af1, PG_U_CURRENCY_SYMBOL},
-	{0x000af2, 0x000af8, PG_U_UNASSIGNED},
 	{0x000af9, 0x000af9, PG_U_OTHER_LETTER},
 	{0x000afa, 0x000aff, PG_U_NONSPACING_MARK},
-	{0x000b00, 0x000b00, PG_U_UNASSIGNED},
 	{0x000b01, 0x000b01, PG_U_NONSPACING_MARK},
 	{0x000b02, 0x000b03, PG_U_SPACING_MARK},
-	{0x000b04, 0x000b04, PG_U_UNASSIGNED},
 	{0x000b05, 0x000b0c, PG_U_OTHER_LETTER},
-	{0x000b0d, 0x000b0e, PG_U_UNASSIGNED},
 	{0x000b0f, 0x000b10, PG_U_OTHER_LETTER},
-	{0x000b11, 0x000b12, PG_U_UNASSIGNED},
 	{0x000b13, 0x000b28, PG_U_OTHER_LETTER},
-	{0x000b29, 0x000b29, PG_U_UNASSIGNED},
 	{0x000b2a, 0x000b30, PG_U_OTHER_LETTER},
-	{0x000b31, 0x000b31, PG_U_UNASSIGNED},
 	{0x000b32, 0x000b33, PG_U_OTHER_LETTER},
-	{0x000b34, 0x000b34, PG_U_UNASSIGNED},
 	{0x000b35, 0x000b39, PG_U_OTHER_LETTER},
-	{0x000b3a, 0x000b3b, PG_U_UNASSIGNED},
 	{0x000b3c, 0x000b3c, PG_U_NONSPACING_MARK},
 	{0x000b3d, 0x000b3d, PG_U_OTHER_LETTER},
 	{0x000b3e, 0x000b3e, PG_U_SPACING_MARK},
 	{0x000b3f, 0x000b3f, PG_U_NONSPACING_MARK},
 	{0x000b40, 0x000b40, PG_U_SPACING_MARK},
 	{0x000b41, 0x000b44, PG_U_NONSPACING_MARK},
-	{0x000b45, 0x000b46, PG_U_UNASSIGNED},
 	{0x000b47, 0x000b48, PG_U_SPACING_MARK},
-	{0x000b49, 0x000b4a, PG_U_UNASSIGNED},
 	{0x000b4b, 0x000b4c, PG_U_SPACING_MARK},
 	{0x000b4d, 0x000b4d, PG_U_NONSPACING_MARK},
-	{0x000b4e, 0x000b54, PG_U_UNASSIGNED},
 	{0x000b55, 0x000b56, PG_U_NONSPACING_MARK},
 	{0x000b57, 0x000b57, PG_U_SPACING_MARK},
-	{0x000b58, 0x000b5b, PG_U_UNASSIGNED},
 	{0x000b5c, 0x000b5d, PG_U_OTHER_LETTER},
-	{0x000b5e, 0x000b5e, PG_U_UNASSIGNED},
 	{0x000b5f, 0x000b61, PG_U_OTHER_LETTER},
 	{0x000b62, 0x000b63, PG_U_NONSPACING_MARK},
-	{0x000b64, 0x000b65, PG_U_UNASSIGNED},
 	{0x000b66, 0x000b6f, PG_U_DECIMAL_NUMBER},
 	{0x000b70, 0x000b70, PG_U_OTHER_SYMBOL},
 	{0x000b71, 0x000b71, PG_U_OTHER_LETTER},
 	{0x000b72, 0x000b77, PG_U_OTHER_NUMBER},
-	{0x000b78, 0x000b81, PG_U_UNASSIGNED},
 	{0x000b82, 0x000b82, PG_U_NONSPACING_MARK},
 	{0x000b83, 0x000b83, PG_U_OTHER_LETTER},
-	{0x000b84, 0x000b84, PG_U_UNASSIGNED},
 	{0x000b85, 0x000b8a, PG_U_OTHER_LETTER},
-	{0x000b8b, 0x000b8d, PG_U_UNASSIGNED},
 	{0x000b8e, 0x000b90, PG_U_OTHER_LETTER},
-	{0x000b91, 0x000b91, PG_U_UNASSIGNED},
 	{0x000b92, 0x000b95, PG_U_OTHER_LETTER},
-	{0x000b96, 0x000b98, PG_U_UNASSIGNED},
 	{0x000b99, 0x000b9a, PG_U_OTHER_LETTER},
-	{0x000b9b, 0x000b9b, PG_U_UNASSIGNED},
 	{0x000b9c, 0x000b9c, PG_U_OTHER_LETTER},
-	{0x000b9d, 0x000b9d, PG_U_UNASSIGNED},
 	{0x000b9e, 0x000b9f, PG_U_OTHER_LETTER},
-	{0x000ba0, 0x000ba2, PG_U_UNASSIGNED},
 	{0x000ba3, 0x000ba4, PG_U_OTHER_LETTER},
-	{0x000ba5, 0x000ba7, PG_U_UNASSIGNED},
 	{0x000ba8, 0x000baa, PG_U_OTHER_LETTER},
-	{0x000bab, 0x000bad, PG_U_UNASSIGNED},
 	{0x000bae, 0x000bb9, PG_U_OTHER_LETTER},
-	{0x000bba, 0x000bbd, PG_U_UNASSIGNED},
 	{0x000bbe, 0x000bbf, PG_U_SPACING_MARK},
 	{0x000bc0, 0x000bc0, PG_U_NONSPACING_MARK},
 	{0x000bc1, 0x000bc2, PG_U_SPACING_MARK},
-	{0x000bc3, 0x000bc5, PG_U_UNASSIGNED},
 	{0x000bc6, 0x000bc8, PG_U_SPACING_MARK},
-	{0x000bc9, 0x000bc9, PG_U_UNASSIGNED},
 	{0x000bca, 0x000bcc, PG_U_SPACING_MARK},
 	{0x000bcd, 0x000bcd, PG_U_NONSPACING_MARK},
-	{0x000bce, 0x000bcf, PG_U_UNASSIGNED},
 	{0x000bd0, 0x000bd0, PG_U_OTHER_LETTER},
-	{0x000bd1, 0x000bd6, PG_U_UNASSIGNED},
 	{0x000bd7, 0x000bd7, PG_U_SPACING_MARK},
-	{0x000bd8, 0x000be5, PG_U_UNASSIGNED},
 	{0x000be6, 0x000bef, PG_U_DECIMAL_NUMBER},
 	{0x000bf0, 0x000bf2, PG_U_OTHER_NUMBER},
 	{0x000bf3, 0x000bf8, PG_U_OTHER_SYMBOL},
 	{0x000bf9, 0x000bf9, PG_U_CURRENCY_SYMBOL},
 	{0x000bfa, 0x000bfa, PG_U_OTHER_SYMBOL},
-	{0x000bfb, 0x000bff, PG_U_UNASSIGNED},
 	{0x000c00, 0x000c00, PG_U_NONSPACING_MARK},
 	{0x000c01, 0x000c03, PG_U_SPACING_MARK},
 	{0x000c04, 0x000c04, PG_U_NONSPACING_MARK},
 	{0x000c05, 0x000c0c, PG_U_OTHER_LETTER},
-	{0x000c0d, 0x000c0d, PG_U_UNASSIGNED},
 	{0x000c0e, 0x000c10, PG_U_OTHER_LETTER},
-	{0x000c11, 0x000c11, PG_U_UNASSIGNED},
 	{0x000c12, 0x000c28, PG_U_OTHER_LETTER},
-	{0x000c29, 0x000c29, PG_U_UNASSIGNED},
 	{0x000c2a, 0x000c39, PG_U_OTHER_LETTER},
-	{0x000c3a, 0x000c3b, PG_U_UNASSIGNED},
 	{0x000c3c, 0x000c3c, PG_U_NONSPACING_MARK},
 	{0x000c3d, 0x000c3d, PG_U_OTHER_LETTER},
 	{0x000c3e, 0x000c40, PG_U_NONSPACING_MARK},
 	{0x000c41, 0x000c44, PG_U_SPACING_MARK},
-	{0x000c45, 0x000c45, PG_U_UNASSIGNED},
 	{0x000c46, 0x000c48, PG_U_NONSPACING_MARK},
-	{0x000c49, 0x000c49, PG_U_UNASSIGNED},
 	{0x000c4a, 0x000c4d, PG_U_NONSPACING_MARK},
-	{0x000c4e, 0x000c54, PG_U_UNASSIGNED},
 	{0x000c55, 0x000c56, PG_U_NONSPACING_MARK},
-	{0x000c57, 0x000c57, PG_U_UNASSIGNED},
 	{0x000c58, 0x000c5a, PG_U_OTHER_LETTER},
-	{0x000c5b, 0x000c5c, PG_U_UNASSIGNED},
 	{0x000c5d, 0x000c5d, PG_U_OTHER_LETTER},
-	{0x000c5e, 0x000c5f, PG_U_UNASSIGNED},
 	{0x000c60, 0x000c61, PG_U_OTHER_LETTER},
 	{0x000c62, 0x000c63, PG_U_NONSPACING_MARK},
-	{0x000c64, 0x000c65, PG_U_UNASSIGNED},
 	{0x000c66, 0x000c6f, PG_U_DECIMAL_NUMBER},
-	{0x000c70, 0x000c76, PG_U_UNASSIGNED},
 	{0x000c77, 0x000c77, PG_U_OTHER_PUNCTUATION},
 	{0x000c78, 0x000c7e, PG_U_OTHER_NUMBER},
 	{0x000c7f, 0x000c7f, PG_U_OTHER_SYMBOL},
@@ -1025,101 +916,68 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000c82, 0x000c83, PG_U_SPACING_MARK},
 	{0x000c84, 0x000c84, PG_U_OTHER_PUNCTUATION},
 	{0x000c85, 0x000c8c, PG_U_OTHER_LETTER},
-	{0x000c8d, 0x000c8d, PG_U_UNASSIGNED},
 	{0x000c8e, 0x000c90, PG_U_OTHER_LETTER},
-	{0x000c91, 0x000c91, PG_U_UNASSIGNED},
 	{0x000c92, 0x000ca8, PG_U_OTHER_LETTER},
-	{0x000ca9, 0x000ca9, PG_U_UNASSIGNED},
 	{0x000caa, 0x000cb3, PG_U_OTHER_LETTER},
-	{0x000cb4, 0x000cb4, PG_U_UNASSIGNED},
 	{0x000cb5, 0x000cb9, PG_U_OTHER_LETTER},
-	{0x000cba, 0x000cbb, PG_U_UNASSIGNED},
 	{0x000cbc, 0x000cbc, PG_U_NONSPACING_MARK},
 	{0x000cbd, 0x000cbd, PG_U_OTHER_LETTER},
 	{0x000cbe, 0x000cbe, PG_U_SPACING_MARK},
 	{0x000cbf, 0x000cbf, PG_U_NONSPACING_MARK},
 	{0x000cc0, 0x000cc4, PG_U_SPACING_MARK},
-	{0x000cc5, 0x000cc5, PG_U_UNASSIGNED},
 	{0x000cc6, 0x000cc6, PG_U_NONSPACING_MARK},
 	{0x000cc7, 0x000cc8, PG_U_SPACING_MARK},
-	{0x000cc9, 0x000cc9, PG_U_UNASSIGNED},
 	{0x000cca, 0x000ccb, PG_U_SPACING_MARK},
 	{0x000ccc, 0x000ccd, PG_U_NONSPACING_MARK},
-	{0x000cce, 0x000cd4, PG_U_UNASSIGNED},
 	{0x000cd5, 0x000cd6, PG_U_SPACING_MARK},
-	{0x000cd7, 0x000cdc, PG_U_UNASSIGNED},
 	{0x000cdd, 0x000cde, PG_U_OTHER_LETTER},
-	{0x000cdf, 0x000cdf, PG_U_UNASSIGNED},
 	{0x000ce0, 0x000ce1, PG_U_OTHER_LETTER},
 	{0x000ce2, 0x000ce3, PG_U_NONSPACING_MARK},
-	{0x000ce4, 0x000ce5, PG_U_UNASSIGNED},
 	{0x000ce6, 0x000cef, PG_U_DECIMAL_NUMBER},
-	{0x000cf0, 0x000cf0, PG_U_UNASSIGNED},
 	{0x000cf1, 0x000cf2, PG_U_OTHER_LETTER},
 	{0x000cf3, 0x000cf3, PG_U_SPACING_MARK},
-	{0x000cf4, 0x000cff, PG_U_UNASSIGNED},
 	{0x000d00, 0x000d01, PG_U_NONSPACING_MARK},
 	{0x000d02, 0x000d03, PG_U_SPACING_MARK},
 	{0x000d04, 0x000d0c, PG_U_OTHER_LETTER},
-	{0x000d0d, 0x000d0d, PG_U_UNASSIGNED},
 	{0x000d0e, 0x000d10, PG_U_OTHER_LETTER},
-	{0x000d11, 0x000d11, PG_U_UNASSIGNED},
 	{0x000d12, 0x000d3a, PG_U_OTHER_LETTER},
 	{0x000d3b, 0x000d3c, PG_U_NONSPACING_MARK},
 	{0x000d3d, 0x000d3d, PG_U_OTHER_LETTER},
 	{0x000d3e, 0x000d40, PG_U_SPACING_MARK},
 	{0x000d41, 0x000d44, PG_U_NONSPACING_MARK},
-	{0x000d45, 0x000d45, PG_U_UNASSIGNED},
 	{0x000d46, 0x000d48, PG_U_SPACING_MARK},
-	{0x000d49, 0x000d49, PG_U_UNASSIGNED},
 	{0x000d4a, 0x000d4c, PG_U_SPACING_MARK},
 	{0x000d4d, 0x000d4d, PG_U_NONSPACING_MARK},
 	{0x000d4e, 0x000d4e, PG_U_OTHER_LETTER},
 	{0x000d4f, 0x000d4f, PG_U_OTHER_SYMBOL},
-	{0x000d50, 0x000d53, PG_U_UNASSIGNED},
 	{0x000d54, 0x000d56, PG_U_OTHER_LETTER},
 	{0x000d57, 0x000d57, PG_U_SPACING_MARK},
 	{0x000d58, 0x000d5e, PG_U_OTHER_NUMBER},
 	{0x000d5f, 0x000d61, PG_U_OTHER_LETTER},
 	{0x000d62, 0x000d63, PG_U_NONSPACING_MARK},
-	{0x000d64, 0x000d65, PG_U_UNASSIGNED},
 	{0x000d66, 0x000d6f, PG_U_DECIMAL_NUMBER},
 	{0x000d70, 0x000d78, PG_U_OTHER_NUMBER},
 	{0x000d79, 0x000d79, PG_U_OTHER_SYMBOL},
 	{0x000d7a, 0x000d7f, PG_U_OTHER_LETTER},
-	{0x000d80, 0x000d80, PG_U_UNASSIGNED},
 	{0x000d81, 0x000d81, PG_U_NONSPACING_MARK},
 	{0x000d82, 0x000d83, PG_U_SPACING_MARK},
-	{0x000d84, 0x000d84, PG_U_UNASSIGNED},
 	{0x000d85, 0x000d96, PG_U_OTHER_LETTER},
-	{0x000d97, 0x000d99, PG_U_UNASSIGNED},
 	{0x000d9a, 0x000db1, PG_U_OTHER_LETTER},
-	{0x000db2, 0x000db2, PG_U_UNASSIGNED},
 	{0x000db3, 0x000dbb, PG_U_OTHER_LETTER},
-	{0x000dbc, 0x000dbc, PG_U_UNASSIGNED},
 	{0x000dbd, 0x000dbd, PG_U_OTHER_LETTER},
-	{0x000dbe, 0x000dbf, PG_U_UNASSIGNED},
 	{0x000dc0, 0x000dc6, PG_U_OTHER_LETTER},
-	{0x000dc7, 0x000dc9, PG_U_UNASSIGNED},
 	{0x000dca, 0x000dca, PG_U_NONSPACING_MARK},
-	{0x000dcb, 0x000dce, PG_U_UNASSIGNED},
 	{0x000dcf, 0x000dd1, PG_U_SPACING_MARK},
 	{0x000dd2, 0x000dd4, PG_U_NONSPACING_MARK},
-	{0x000dd5, 0x000dd5, PG_U_UNASSIGNED},
 	{0x000dd6, 0x000dd6, PG_U_NONSPACING_MARK},
-	{0x000dd7, 0x000dd7, PG_U_UNASSIGNED},
 	{0x000dd8, 0x000ddf, PG_U_SPACING_MARK},
-	{0x000de0, 0x000de5, PG_U_UNASSIGNED},
 	{0x000de6, 0x000def, PG_U_DECIMAL_NUMBER},
-	{0x000df0, 0x000df1, PG_U_UNASSIGNED},
 	{0x000df2, 0x000df3, PG_U_SPACING_MARK},
 	{0x000df4, 0x000df4, PG_U_OTHER_PUNCTUATION},
-	{0x000df5, 0x000e00, PG_U_UNASSIGNED},
 	{0x000e01, 0x000e30, PG_U_OTHER_LETTER},
 	{0x000e31, 0x000e31, PG_U_NONSPACING_MARK},
 	{0x000e32, 0x000e33, PG_U_OTHER_LETTER},
 	{0x000e34, 0x000e3a, PG_U_NONSPACING_MARK},
-	{0x000e3b, 0x000e3e, PG_U_UNASSIGNED},
 	{0x000e3f, 0x000e3f, PG_U_CURRENCY_SYMBOL},
 	{0x000e40, 0x000e45, PG_U_OTHER_LETTER},
 	{0x000e46, 0x000e46, PG_U_MODIFIER_LETTER},
@@ -1127,33 +985,21 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000e4f, 0x000e4f, PG_U_OTHER_PUNCTUATION},
 	{0x000e50, 0x000e59, PG_U_DECIMAL_NUMBER},
 	{0x000e5a, 0x000e5b, PG_U_OTHER_PUNCTUATION},
-	{0x000e5c, 0x000e80, PG_U_UNASSIGNED},
 	{0x000e81, 0x000e82, PG_U_OTHER_LETTER},
-	{0x000e83, 0x000e83, PG_U_UNASSIGNED},
 	{0x000e84, 0x000e84, PG_U_OTHER_LETTER},
-	{0x000e85, 0x000e85, PG_U_UNASSIGNED},
 	{0x000e86, 0x000e8a, PG_U_OTHER_LETTER},
-	{0x000e8b, 0x000e8b, PG_U_UNASSIGNED},
 	{0x000e8c, 0x000ea3, PG_U_OTHER_LETTER},
-	{0x000ea4, 0x000ea4, PG_U_UNASSIGNED},
 	{0x000ea5, 0x000ea5, PG_U_OTHER_LETTER},
-	{0x000ea6, 0x000ea6, PG_U_UNASSIGNED},
 	{0x000ea7, 0x000eb0, PG_U_OTHER_LETTER},
 	{0x000eb1, 0x000eb1, PG_U_NONSPACING_MARK},
 	{0x000eb2, 0x000eb3, PG_U_OTHER_LETTER},
 	{0x000eb4, 0x000ebc, PG_U_NONSPACING_MARK},
 	{0x000ebd, 0x000ebd, PG_U_OTHER_LETTER},
-	{0x000ebe, 0x000ebf, PG_U_UNASSIGNED},
 	{0x000ec0, 0x000ec4, PG_U_OTHER_LETTER},
-	{0x000ec5, 0x000ec5, PG_U_UNASSIGNED},
 	{0x000ec6, 0x000ec6, PG_U_MODIFIER_LETTER},
-	{0x000ec7, 0x000ec7, PG_U_UNASSIGNED},
 	{0x000ec8, 0x000ece, PG_U_NONSPACING_MARK},
-	{0x000ecf, 0x000ecf, PG_U_UNASSIGNED},
 	{0x000ed0, 0x000ed9, PG_U_DECIMAL_NUMBER},
-	{0x000eda, 0x000edb, PG_U_UNASSIGNED},
 	{0x000edc, 0x000edf, PG_U_OTHER_LETTER},
-	{0x000ee0, 0x000eff, PG_U_UNASSIGNED},
 	{0x000f00, 0x000f00, PG_U_OTHER_LETTER},
 	{0x000f01, 0x000f03, PG_U_OTHER_SYMBOL},
 	{0x000f04, 0x000f12, PG_U_OTHER_PUNCTUATION},
@@ -1176,9 +1022,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000f3d, 0x000f3d, PG_U_CLOSE_PUNCTUATION},
 	{0x000f3e, 0x000f3f, PG_U_SPACING_MARK},
 	{0x000f40, 0x000f47, PG_U_OTHER_LETTER},
-	{0x000f48, 0x000f48, PG_U_UNASSIGNED},
 	{0x000f49, 0x000f6c, PG_U_OTHER_LETTER},
-	{0x000f6d, 0x000f70, PG_U_UNASSIGNED},
 	{0x000f71, 0x000f7e, PG_U_NONSPACING_MARK},
 	{0x000f7f, 0x000f7f, PG_U_SPACING_MARK},
 	{0x000f80, 0x000f84, PG_U_NONSPACING_MARK},
@@ -1186,18 +1030,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000f86, 0x000f87, PG_U_NONSPACING_MARK},
 	{0x000f88, 0x000f8c, PG_U_OTHER_LETTER},
 	{0x000f8d, 0x000f97, PG_U_NONSPACING_MARK},
-	{0x000f98, 0x000f98, PG_U_UNASSIGNED},
 	{0x000f99, 0x000fbc, PG_U_NONSPACING_MARK},
-	{0x000fbd, 0x000fbd, PG_U_UNASSIGNED},
 	{0x000fbe, 0x000fc5, PG_U_OTHER_SYMBOL},
 	{0x000fc6, 0x000fc6, PG_U_NONSPACING_MARK},
 	{0x000fc7, 0x000fcc, PG_U_OTHER_SYMBOL},
-	{0x000fcd, 0x000fcd, PG_U_UNASSIGNED},
 	{0x000fce, 0x000fcf, PG_U_OTHER_SYMBOL},
 	{0x000fd0, 0x000fd4, PG_U_OTHER_PUNCTUATION},
 	{0x000fd5, 0x000fd8, PG_U_OTHER_SYMBOL},
 	{0x000fd9, 0x000fda, PG_U_OTHER_PUNCTUATION},
-	{0x000fdb, 0x000fff, PG_U_UNASSIGNED},
 	{0x001000, 0x00102a, PG_U_OTHER_LETTER},
 	{0x00102b, 0x00102c, PG_U_SPACING_MARK},
 	{0x00102d, 0x001030, PG_U_NONSPACING_MARK},
@@ -1234,58 +1074,35 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00109d, 0x00109d, PG_U_NONSPACING_MARK},
 	{0x00109e, 0x00109f, PG_U_OTHER_SYMBOL},
 	{0x0010a0, 0x0010c5, PG_U_UPPERCASE_LETTER},
-	{0x0010c6, 0x0010c6, PG_U_UNASSIGNED},
 	{0x0010c7, 0x0010c7, PG_U_UPPERCASE_LETTER},
-	{0x0010c8, 0x0010cc, PG_U_UNASSIGNED},
 	{0x0010cd, 0x0010cd, PG_U_UPPERCASE_LETTER},
-	{0x0010ce, 0x0010cf, PG_U_UNASSIGNED},
 	{0x0010d0, 0x0010fa, PG_U_LOWERCASE_LETTER},
 	{0x0010fb, 0x0010fb, PG_U_OTHER_PUNCTUATION},
 	{0x0010fc, 0x0010fc, PG_U_MODIFIER_LETTER},
 	{0x0010fd, 0x0010ff, PG_U_LOWERCASE_LETTER},
 	{0x001100, 0x001248, PG_U_OTHER_LETTER},
-	{0x001249, 0x001249, PG_U_UNASSIGNED},
 	{0x00124a, 0x00124d, PG_U_OTHER_LETTER},
-	{0x00124e, 0x00124f, PG_U_UNASSIGNED},
 	{0x001250, 0x001256, PG_U_OTHER_LETTER},
-	{0x001257, 0x001257, PG_U_UNASSIGNED},
 	{0x001258, 0x001258, PG_U_OTHER_LETTER},
-	{0x001259, 0x001259, PG_U_UNASSIGNED},
 	{0x00125a, 0x00125d, PG_U_OTHER_LETTER},
-	{0x00125e, 0x00125f, PG_U_UNASSIGNED},
 	{0x001260, 0x001288, PG_U_OTHER_LETTER},
-	{0x001289, 0x001289, PG_U_UNASSIGNED},
 	{0x00128a, 0x00128d, PG_U_OTHER_LETTER},
-	{0x00128e, 0x00128f, PG_U_UNASSIGNED},
 	{0x001290, 0x0012b0, PG_U_OTHER_LETTER},
-	{0x0012b1, 0x0012b1, PG_U_UNASSIGNED},
 	{0x0012b2, 0x0012b5, PG_U_OTHER_LETTER},
-	{0x0012b6, 0x0012b7, PG_U_UNASSIGNED},
 	{0x0012b8, 0x0012be, PG_U_OTHER_LETTER},
-	{0x0012bf, 0x0012bf, PG_U_UNASSIGNED},
 	{0x0012c0, 0x0012c0, PG_U_OTHER_LETTER},
-	{0x0012c1, 0x0012c1, PG_U_UNASSIGNED},
 	{0x0012c2, 0x0012c5, PG_U_OTHER_LETTER},
-	{0x0012c6, 0x0012c7, PG_U_UNASSIGNED},
 	{0x0012c8, 0x0012d6, PG_U_OTHER_LETTER},
-	{0x0012d7, 0x0012d7, PG_U_UNASSIGNED},
 	{0x0012d8, 0x001310, PG_U_OTHER_LETTER},
-	{0x001311, 0x001311, PG_U_UNASSIGNED},
 	{0x001312, 0x001315, PG_U_OTHER_LETTER},
-	{0x001316, 0x001317, PG_U_UNASSIGNED},
 	{0x001318, 0x00135a, PG_U_OTHER_LETTER},
-	{0x00135b, 0x00135c, PG_U_UNASSIGNED},
 	{0x00135d, 0x00135f, PG_U_NONSPACING_MARK},
 	{0x001360, 0x001368, PG_U_OTHER_PUNCTUATION},
 	{0x001369, 0x00137c, PG_U_OTHER_NUMBER},
-	{0x00137d, 0x00137f, PG_U_UNASSIGNED},
 	{0x001380, 0x00138f, PG_U_OTHER_LETTER},
 	{0x001390, 0x001399, PG_U_OTHER_SYMBOL},
-	{0x00139a, 0x00139f, PG_U_UNASSIGNED},
 	{0x0013a0, 0x0013f5, PG_U_UPPERCASE_LETTER},
-	{0x0013f6, 0x0013f7, PG_U_UNASSIGNED},
 	{0x0013f8, 0x0013fd, PG_U_LOWERCASE_LETTER},
-	{0x0013fe, 0x0013ff, PG_U_UNASSIGNED},
 	{0x001400, 0x001400, PG_U_DASH_PUNCTUATION},
 	{0x001401, 0x00166c, PG_U_OTHER_LETTER},
 	{0x00166d, 0x00166d, PG_U_OTHER_SYMBOL},
@@ -1295,30 +1112,22 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001681, 0x00169a, PG_U_OTHER_LETTER},
 	{0x00169b, 0x00169b, PG_U_OPEN_PUNCTUATION},
 	{0x00169c, 0x00169c, PG_U_CLOSE_PUNCTUATION},
-	{0x00169d, 0x00169f, PG_U_UNASSIGNED},
 	{0x0016a0, 0x0016ea, PG_U_OTHER_LETTER},
 	{0x0016eb, 0x0016ed, PG_U_OTHER_PUNCTUATION},
 	{0x0016ee, 0x0016f0, PG_U_LETTER_NUMBER},
 	{0x0016f1, 0x0016f8, PG_U_OTHER_LETTER},
-	{0x0016f9, 0x0016ff, PG_U_UNASSIGNED},
 	{0x001700, 0x001711, PG_U_OTHER_LETTER},
 	{0x001712, 0x001714, PG_U_NONSPACING_MARK},
 	{0x001715, 0x001715, PG_U_SPACING_MARK},
-	{0x001716, 0x00171e, PG_U_UNASSIGNED},
 	{0x00171f, 0x001731, PG_U_OTHER_LETTER},
 	{0x001732, 0x001733, PG_U_NONSPACING_MARK},
 	{0x001734, 0x001734, PG_U_SPACING_MARK},
 	{0x001735, 0x001736, PG_U_OTHER_PUNCTUATION},
-	{0x001737, 0x00173f, PG_U_UNASSIGNED},
 	{0x001740, 0x001751, PG_U_OTHER_LETTER},
 	{0x001752, 0x001753, PG_U_NONSPACING_MARK},
-	{0x001754, 0x00175f, PG_U_UNASSIGNED},
 	{0x001760, 0x00176c, PG_U_OTHER_LETTER},
-	{0x00176d, 0x00176d, PG_U_UNASSIGNED},
 	{0x00176e, 0x001770, PG_U_OTHER_LETTER},
-	{0x001771, 0x001771, PG_U_UNASSIGNED},
 	{0x001772, 0x001773, PG_U_NONSPACING_MARK},
-	{0x001774, 0x00177f, PG_U_UNASSIGNED},
 	{0x001780, 0x0017b3, PG_U_OTHER_LETTER},
 	{0x0017b4, 0x0017b5, PG_U_NONSPACING_MARK},
 	{0x0017b6, 0x0017b6, PG_U_SPACING_MARK},
@@ -1333,11 +1142,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0017db, 0x0017db, PG_U_CURRENCY_SYMBOL},
 	{0x0017dc, 0x0017dc, PG_U_OTHER_LETTER},
 	{0x0017dd, 0x0017dd, PG_U_NONSPACING_MARK},
-	{0x0017de, 0x0017df, PG_U_UNASSIGNED},
 	{0x0017e0, 0x0017e9, PG_U_DECIMAL_NUMBER},
-	{0x0017ea, 0x0017ef, PG_U_UNASSIGNED},
 	{0x0017f0, 0x0017f9, PG_U_OTHER_NUMBER},
-	{0x0017fa, 0x0017ff, PG_U_UNASSIGNED},
 	{0x001800, 0x001805, PG_U_OTHER_PUNCTUATION},
 	{0x001806, 0x001806, PG_U_DASH_PUNCTUATION},
 	{0x001807, 0x00180a, PG_U_OTHER_PUNCTUATION},
@@ -1345,59 +1151,44 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00180e, 0x00180e, PG_U_FORMAT},
 	{0x00180f, 0x00180f, PG_U_NONSPACING_MARK},
 	{0x001810, 0x001819, PG_U_DECIMAL_NUMBER},
-	{0x00181a, 0x00181f, PG_U_UNASSIGNED},
 	{0x001820, 0x001842, PG_U_OTHER_LETTER},
 	{0x001843, 0x001843, PG_U_MODIFIER_LETTER},
 	{0x001844, 0x001878, PG_U_OTHER_LETTER},
-	{0x001879, 0x00187f, PG_U_UNASSIGNED},
 	{0x001880, 0x001884, PG_U_OTHER_LETTER},
 	{0x001885, 0x001886, PG_U_NONSPACING_MARK},
 	{0x001887, 0x0018a8, PG_U_OTHER_LETTER},
 	{0x0018a9, 0x0018a9, PG_U_NONSPACING_MARK},
 	{0x0018aa, 0x0018aa, PG_U_OTHER_LETTER},
-	{0x0018ab, 0x0018af, PG_U_UNASSIGNED},
 	{0x0018b0, 0x0018f5, PG_U_OTHER_LETTER},
-	{0x0018f6, 0x0018ff, PG_U_UNASSIGNED},
 	{0x001900, 0x00191e, PG_U_OTHER_LETTER},
-	{0x00191f, 0x00191f, PG_U_UNASSIGNED},
 	{0x001920, 0x001922, PG_U_NONSPACING_MARK},
 	{0x001923, 0x001926, PG_U_SPACING_MARK},
 	{0x001927, 0x001928, PG_U_NONSPACING_MARK},
 	{0x001929, 0x00192b, PG_U_SPACING_MARK},
-	{0x00192c, 0x00192f, PG_U_UNASSIGNED},
 	{0x001930, 0x001931, PG_U_SPACING_MARK},
 	{0x001932, 0x001932, PG_U_NONSPACING_MARK},
 	{0x001933, 0x001938, PG_U_SPACING_MARK},
 	{0x001939, 0x00193b, PG_U_NONSPACING_MARK},
-	{0x00193c, 0x00193f, PG_U_UNASSIGNED},
 	{0x001940, 0x001940, PG_U_OTHER_SYMBOL},
-	{0x001941, 0x001943, PG_U_UNASSIGNED},
 	{0x001944, 0x001945, PG_U_OTHER_PUNCTUATION},
 	{0x001946, 0x00194f, PG_U_DECIMAL_NUMBER},
 	{0x001950, 0x00196d, PG_U_OTHER_LETTER},
-	{0x00196e, 0x00196f, PG_U_UNASSIGNED},
 	{0x001970, 0x001974, PG_U_OTHER_LETTER},
-	{0x001975, 0x00197f, PG_U_UNASSIGNED},
 	{0x001980, 0x0019ab, PG_U_OTHER_LETTER},
-	{0x0019ac, 0x0019af, PG_U_UNASSIGNED},
 	{0x0019b0, 0x0019c9, PG_U_OTHER_LETTER},
-	{0x0019ca, 0x0019cf, PG_U_UNASSIGNED},
 	{0x0019d0, 0x0019d9, PG_U_DECIMAL_NUMBER},
 	{0x0019da, 0x0019da, PG_U_OTHER_NUMBER},
-	{0x0019db, 0x0019dd, PG_U_UNASSIGNED},
 	{0x0019de, 0x0019ff, PG_U_OTHER_SYMBOL},
 	{0x001a00, 0x001a16, PG_U_OTHER_LETTER},
 	{0x001a17, 0x001a18, PG_U_NONSPACING_MARK},
 	{0x001a19, 0x001a1a, PG_U_SPACING_MARK},
 	{0x001a1b, 0x001a1b, PG_U_NONSPACING_MARK},
-	{0x001a1c, 0x001a1d, PG_U_UNASSIGNED},
 	{0x001a1e, 0x001a1f, PG_U_OTHER_PUNCTUATION},
 	{0x001a20, 0x001a54, PG_U_OTHER_LETTER},
 	{0x001a55, 0x001a55, PG_U_SPACING_MARK},
 	{0x001a56, 0x001a56, PG_U_NONSPACING_MARK},
 	{0x001a57, 0x001a57, PG_U_SPACING_MARK},
 	{0x001a58, 0x001a5e, PG_U_NONSPACING_MARK},
-	{0x001a5f, 0x001a5f, PG_U_UNASSIGNED},
 	{0x001a60, 0x001a60, PG_U_NONSPACING_MARK},
 	{0x001a61, 0x001a61, PG_U_SPACING_MARK},
 	{0x001a62, 0x001a62, PG_U_NONSPACING_MARK},
@@ -1405,20 +1196,15 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001a65, 0x001a6c, PG_U_NONSPACING_MARK},
 	{0x001a6d, 0x001a72, PG_U_SPACING_MARK},
 	{0x001a73, 0x001a7c, PG_U_NONSPACING_MARK},
-	{0x001a7d, 0x001a7e, PG_U_UNASSIGNED},
 	{0x001a7f, 0x001a7f, PG_U_NONSPACING_MARK},
 	{0x001a80, 0x001a89, PG_U_DECIMAL_NUMBER},
-	{0x001a8a, 0x001a8f, PG_U_UNASSIGNED},
 	{0x001a90, 0x001a99, PG_U_DECIMAL_NUMBER},
-	{0x001a9a, 0x001a9f, PG_U_UNASSIGNED},
 	{0x001aa0, 0x001aa6, PG_U_OTHER_PUNCTUATION},
 	{0x001aa7, 0x001aa7, PG_U_MODIFIER_LETTER},
 	{0x001aa8, 0x001aad, PG_U_OTHER_PUNCTUATION},
-	{0x001aae, 0x001aaf, PG_U_UNASSIGNED},
 	{0x001ab0, 0x001abd, PG_U_NONSPACING_MARK},
 	{0x001abe, 0x001abe, PG_U_ENCLOSING_MARK},
 	{0x001abf, 0x001ace, PG_U_NONSPACING_MARK},
-	{0x001acf, 0x001aff, PG_U_UNASSIGNED},
 	{0x001b00, 0x001b03, PG_U_NONSPACING_MARK},
 	{0x001b04, 0x001b04, PG_U_SPACING_MARK},
 	{0x001b05, 0x001b33, PG_U_OTHER_LETTER},
@@ -1431,14 +1217,12 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001b42, 0x001b42, PG_U_NONSPACING_MARK},
 	{0x001b43, 0x001b44, PG_U_SPACING_MARK},
 	{0x001b45, 0x001b4c, PG_U_OTHER_LETTER},
-	{0x001b4d, 0x001b4f, PG_U_UNASSIGNED},
 	{0x001b50, 0x001b59, PG_U_DECIMAL_NUMBER},
 	{0x001b5a, 0x001b60, PG_U_OTHER_PUNCTUATION},
 	{0x001b61, 0x001b6a, PG_U_OTHER_SYMBOL},
 	{0x001b6b, 0x001b73, PG_U_NONSPACING_MARK},
 	{0x001b74, 0x001b7c, PG_U_OTHER_SYMBOL},
 	{0x001b7d, 0x001b7e, PG_U_OTHER_PUNCTUATION},
-	{0x001b7f, 0x001b7f, PG_U_UNASSIGNED},
 	{0x001b80, 0x001b81, PG_U_NONSPACING_MARK},
 	{0x001b82, 0x001b82, PG_U_SPACING_MARK},
 	{0x001b83, 0x001ba0, PG_U_OTHER_LETTER},
@@ -1459,29 +1243,23 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001bee, 0x001bee, PG_U_SPACING_MARK},
 	{0x001bef, 0x001bf1, PG_U_NONSPACING_MARK},
 	{0x001bf2, 0x001bf3, PG_U_SPACING_MARK},
-	{0x001bf4, 0x001bfb, PG_U_UNASSIGNED},
 	{0x001bfc, 0x001bff, PG_U_OTHER_PUNCTUATION},
 	{0x001c00, 0x001c23, PG_U_OTHER_LETTER},
 	{0x001c24, 0x001c2b, PG_U_SPACING_MARK},
 	{0x001c2c, 0x001c33, PG_U_NONSPACING_MARK},
 	{0x001c34, 0x001c35, PG_U_SPACING_MARK},
 	{0x001c36, 0x001c37, PG_U_NONSPACING_MARK},
-	{0x001c38, 0x001c3a, PG_U_UNASSIGNED},
 	{0x001c3b, 0x001c3f, PG_U_OTHER_PUNCTUATION},
 	{0x001c40, 0x001c49, PG_U_DECIMAL_NUMBER},
-	{0x001c4a, 0x001c4c, PG_U_UNASSIGNED},
 	{0x001c4d, 0x001c4f, PG_U_OTHER_LETTER},
 	{0x001c50, 0x001c59, PG_U_DECIMAL_NUMBER},
 	{0x001c5a, 0x001c77, PG_U_OTHER_LETTER},
 	{0x001c78, 0x001c7d, PG_U_MODIFIER_LETTER},
 	{0x001c7e, 0x001c7f, PG_U_OTHER_PUNCTUATION},
 	{0x001c80, 0x001c88, PG_U_LOWERCASE_LETTER},
-	{0x001c89, 0x001c8f, PG_U_UNASSIGNED},
 	{0x001c90, 0x001cba, PG_U_UPPERCASE_LETTER},
-	{0x001cbb, 0x001cbc, PG_U_UNASSIGNED},
 	{0x001cbd, 0x001cbf, PG_U_UPPERCASE_LETTER},
 	{0x001cc0, 0x001cc7, PG_U_OTHER_PUNCTUATION},
-	{0x001cc8, 0x001ccf, PG_U_UNASSIGNED},
 	{0x001cd0, 0x001cd2, PG_U_NONSPACING_MARK},
 	{0x001cd3, 0x001cd3, PG_U_OTHER_PUNCTUATION},
 	{0x001cd4, 0x001ce0, PG_U_NONSPACING_MARK},
@@ -1495,7 +1273,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001cf7, 0x001cf7, PG_U_SPACING_MARK},
 	{0x001cf8, 0x001cf9, PG_U_NONSPACING_MARK},
 	{0x001cfa, 0x001cfa, PG_U_OTHER_LETTER},
-	{0x001cfb, 0x001cff, PG_U_UNASSIGNED},
 	{0x001d00, 0x001d2b, PG_U_LOWERCASE_LETTER},
 	{0x001d2c, 0x001d6a, PG_U_MODIFIER_LETTER},
 	{0x001d6b, 0x001d77, PG_U_LOWERCASE_LETTER},
@@ -1753,30 +1530,21 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001eff, 0x001f07, PG_U_LOWERCASE_LETTER},
 	{0x001f08, 0x001f0f, PG_U_UPPERCASE_LETTER},
 	{0x001f10, 0x001f15, PG_U_LOWERCASE_LETTER},
-	{0x001f16, 0x001f17, PG_U_UNASSIGNED},
 	{0x001f18, 0x001f1d, PG_U_UPPERCASE_LETTER},
-	{0x001f1e, 0x001f1f, PG_U_UNASSIGNED},
 	{0x001f20, 0x001f27, PG_U_LOWERCASE_LETTER},
 	{0x001f28, 0x001f2f, PG_U_UPPERCASE_LETTER},
 	{0x001f30, 0x001f37, PG_U_LOWERCASE_LETTER},
 	{0x001f38, 0x001f3f, PG_U_UPPERCASE_LETTER},
 	{0x001f40, 0x001f45, PG_U_LOWERCASE_LETTER},
-	{0x001f46, 0x001f47, PG_U_UNASSIGNED},
 	{0x001f48, 0x001f4d, PG_U_UPPERCASE_LETTER},
-	{0x001f4e, 0x001f4f, PG_U_UNASSIGNED},
 	{0x001f50, 0x001f57, PG_U_LOWERCASE_LETTER},
-	{0x001f58, 0x001f58, PG_U_UNASSIGNED},
 	{0x001f59, 0x001f59, PG_U_UPPERCASE_LETTER},
-	{0x001f5a, 0x001f5a, PG_U_UNASSIGNED},
 	{0x001f5b, 0x001f5b, PG_U_UPPERCASE_LETTER},
-	{0x001f5c, 0x001f5c, PG_U_UNASSIGNED},
 	{0x001f5d, 0x001f5d, PG_U_UPPERCASE_LETTER},
-	{0x001f5e, 0x001f5e, PG_U_UNASSIGNED},
 	{0x001f5f, 0x001f5f, PG_U_UPPERCASE_LETTER},
 	{0x001f60, 0x001f67, PG_U_LOWERCASE_LETTER},
 	{0x001f68, 0x001f6f, PG_U_UPPERCASE_LETTER},
 	{0x001f70, 0x001f7d, PG_U_LOWERCASE_LETTER},
-	{0x001f7e, 0x001f7f, PG_U_UNASSIGNED},
 	{0x001f80, 0x001f87, PG_U_LOWERCASE_LETTER},
 	{0x001f88, 0x001f8f, PG_U_TITLECASE_LETTER},
 	{0x001f90, 0x001f97, PG_U_LOWERCASE_LETTER},
@@ -1784,7 +1552,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001fa0, 0x001fa7, PG_U_LOWERCASE_LETTER},
 	{0x001fa8, 0x001faf, PG_U_TITLECASE_LETTER},
 	{0x001fb0, 0x001fb4, PG_U_LOWERCASE_LETTER},
-	{0x001fb5, 0x001fb5, PG_U_UNASSIGNED},
 	{0x001fb6, 0x001fb7, PG_U_LOWERCASE_LETTER},
 	{0x001fb8, 0x001fbb, PG_U_UPPERCASE_LETTER},
 	{0x001fbc, 0x001fbc, PG_U_TITLECASE_LETTER},
@@ -1792,28 +1559,22 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001fbe, 0x001fbe, PG_U_LOWERCASE_LETTER},
 	{0x001fbf, 0x001fc1, PG_U_MODIFIER_SYMBOL},
 	{0x001fc2, 0x001fc4, PG_U_LOWERCASE_LETTER},
-	{0x001fc5, 0x001fc5, PG_U_UNASSIGNED},
 	{0x001fc6, 0x001fc7, PG_U_LOWERCASE_LETTER},
 	{0x001fc8, 0x001fcb, PG_U_UPPERCASE_LETTER},
 	{0x001fcc, 0x001fcc, PG_U_TITLECASE_LETTER},
 	{0x001fcd, 0x001fcf, PG_U_MODIFIER_SYMBOL},
 	{0x001fd0, 0x001fd3, PG_U_LOWERCASE_LETTER},
-	{0x001fd4, 0x001fd5, PG_U_UNASSIGNED},
 	{0x001fd6, 0x001fd7, PG_U_LOWERCASE_LETTER},
 	{0x001fd8, 0x001fdb, PG_U_UPPERCASE_LETTER},
-	{0x001fdc, 0x001fdc, PG_U_UNASSIGNED},
 	{0x001fdd, 0x001fdf, PG_U_MODIFIER_SYMBOL},
 	{0x001fe0, 0x001fe7, PG_U_LOWERCASE_LETTER},
 	{0x001fe8, 0x001fec, PG_U_UPPERCASE_LETTER},
 	{0x001fed, 0x001fef, PG_U_MODIFIER_SYMBOL},
-	{0x001ff0, 0x001ff1, PG_U_UNASSIGNED},
 	{0x001ff2, 0x001ff4, PG_U_LOWERCASE_LETTER},
-	{0x001ff5, 0x001ff5, PG_U_UNASSIGNED},
 	{0x001ff6, 0x001ff7, PG_U_LOWERCASE_LETTER},
 	{0x001ff8, 0x001ffb, PG_U_UPPERCASE_LETTER},
 	{0x001ffc, 0x001ffc, PG_U_TITLECASE_LETTER},
 	{0x001ffd, 0x001ffe, PG_U_MODIFIER_SYMBOL},
-	{0x001fff, 0x001fff, PG_U_UNASSIGNED},
 	{0x002000, 0x00200a, PG_U_SPACE_SEPARATOR},
 	{0x00200b, 0x00200f, PG_U_FORMAT},
 	{0x002010, 0x002015, PG_U_DASH_PUNCTUATION},
@@ -1846,11 +1607,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002055, 0x00205e, PG_U_OTHER_PUNCTUATION},
 	{0x00205f, 0x00205f, PG_U_SPACE_SEPARATOR},
 	{0x002060, 0x002064, PG_U_FORMAT},
-	{0x002065, 0x002065, PG_U_UNASSIGNED},
 	{0x002066, 0x00206f, PG_U_FORMAT},
 	{0x002070, 0x002070, PG_U_OTHER_NUMBER},
 	{0x002071, 0x002071, PG_U_MODIFIER_LETTER},
-	{0x002072, 0x002073, PG_U_UNASSIGNED},
 	{0x002074, 0x002079, PG_U_OTHER_NUMBER},
 	{0x00207a, 0x00207c, PG_U_MATH_SYMBOL},
 	{0x00207d, 0x00207d, PG_U_OPEN_PUNCTUATION},
@@ -1860,17 +1619,13 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00208a, 0x00208c, PG_U_MATH_SYMBOL},
 	{0x00208d, 0x00208d, PG_U_OPEN_PUNCTUATION},
 	{0x00208e, 0x00208e, PG_U_CLOSE_PUNCTUATION},
-	{0x00208f, 0x00208f, PG_U_UNASSIGNED},
 	{0x002090, 0x00209c, PG_U_MODIFIER_LETTER},
-	{0x00209d, 0x00209f, PG_U_UNASSIGNED},
 	{0x0020a0, 0x0020c0, PG_U_CURRENCY_SYMBOL},
-	{0x0020c1, 0x0020cf, PG_U_UNASSIGNED},
 	{0x0020d0, 0x0020dc, PG_U_NONSPACING_MARK},
 	{0x0020dd, 0x0020e0, PG_U_ENCLOSING_MARK},
 	{0x0020e1, 0x0020e1, PG_U_NONSPACING_MARK},
 	{0x0020e2, 0x0020e4, PG_U_ENCLOSING_MARK},
 	{0x0020e5, 0x0020f0, PG_U_NONSPACING_MARK},
-	{0x0020f1, 0x0020ff, PG_U_UNASSIGNED},
 	{0x002100, 0x002101, PG_U_OTHER_SYMBOL},
 	{0x002102, 0x002102, PG_U_UPPERCASE_LETTER},
 	{0x002103, 0x002106, PG_U_OTHER_SYMBOL},
@@ -1918,7 +1673,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002185, 0x002188, PG_U_LETTER_NUMBER},
 	{0x002189, 0x002189, PG_U_OTHER_NUMBER},
 	{0x00218a, 0x00218b, PG_U_OTHER_SYMBOL},
-	{0x00218c, 0x00218f, PG_U_UNASSIGNED},
 	{0x002190, 0x002194, PG_U_MATH_SYMBOL},
 	{0x002195, 0x002199, PG_U_OTHER_SYMBOL},
 	{0x00219a, 0x00219b, PG_U_MATH_SYMBOL},
@@ -1955,9 +1709,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0023b4, 0x0023db, PG_U_OTHER_SYMBOL},
 	{0x0023dc, 0x0023e1, PG_U_MATH_SYMBOL},
 	{0x0023e2, 0x002426, PG_U_OTHER_SYMBOL},
-	{0x002427, 0x00243f, PG_U_UNASSIGNED},
 	{0x002440, 0x00244a, PG_U_OTHER_SYMBOL},
-	{0x00244b, 0x00245f, PG_U_UNASSIGNED},
 	{0x002460, 0x00249b, PG_U_OTHER_NUMBER},
 	{0x00249c, 0x0024e9, PG_U_OTHER_SYMBOL},
 	{0x0024ea, 0x0024ff, PG_U_OTHER_NUMBER},
@@ -2039,9 +1791,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002b45, 0x002b46, PG_U_OTHER_SYMBOL},
 	{0x002b47, 0x002b4c, PG_U_MATH_SYMBOL},
 	{0x002b4d, 0x002b73, PG_U_OTHER_SYMBOL},
-	{0x002b74, 0x002b75, PG_U_UNASSIGNED},
 	{0x002b76, 0x002b95, PG_U_OTHER_SYMBOL},
-	{0x002b96, 0x002b96, PG_U_UNASSIGNED},
 	{0x002b97, 0x002bff, PG_U_OTHER_SYMBOL},
 	{0x002c00, 0x002c2f, PG_U_UPPERCASE_LETTER},
 	{0x002c30, 0x002c5f, PG_U_LOWERCASE_LETTER},
@@ -2170,40 +1920,25 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002cef, 0x002cf1, PG_U_NONSPACING_MARK},
 	{0x002cf2, 0x002cf2, PG_U_UPPERCASE_LETTER},
 	{0x002cf3, 0x002cf3, PG_U_LOWERCASE_LETTER},
-	{0x002cf4, 0x002cf8, PG_U_UNASSIGNED},
 	{0x002cf9, 0x002cfc, PG_U_OTHER_PUNCTUATION},
 	{0x002cfd, 0x002cfd, PG_U_OTHER_NUMBER},
 	{0x002cfe, 0x002cff, PG_U_OTHER_PUNCTUATION},
 	{0x002d00, 0x002d25, PG_U_LOWERCASE_LETTER},
-	{0x002d26, 0x002d26, PG_U_UNASSIGNED},
 	{0x002d27, 0x002d27, PG_U_LOWERCASE_LETTER},
-	{0x002d28, 0x002d2c, PG_U_UNASSIGNED},
 	{0x002d2d, 0x002d2d, PG_U_LOWERCASE_LETTER},
-	{0x002d2e, 0x002d2f, PG_U_UNASSIGNED},
 	{0x002d30, 0x002d67, PG_U_OTHER_LETTER},
-	{0x002d68, 0x002d6e, PG_U_UNASSIGNED},
 	{0x002d6f, 0x002d6f, PG_U_MODIFIER_LETTER},
 	{0x002d70, 0x002d70, PG_U_OTHER_PUNCTUATION},
-	{0x002d71, 0x002d7e, PG_U_UNASSIGNED},
 	{0x002d7f, 0x002d7f, PG_U_NONSPACING_MARK},
 	{0x002d80, 0x002d96, PG_U_OTHER_LETTER},
-	{0x002d97, 0x002d9f, PG_U_UNASSIGNED},
 	{0x002da0, 0x002da6, PG_U_OTHER_LETTER},
-	{0x002da7, 0x002da7, PG_U_UNASSIGNED},
 	{0x002da8, 0x002dae, PG_U_OTHER_LETTER},
-	{0x002daf, 0x002daf, PG_U_UNASSIGNED},
 	{0x002db0, 0x002db6, PG_U_OTHER_LETTER},
-	{0x002db7, 0x002db7, PG_U_UNASSIGNED},
 	{0x002db8, 0x002dbe, PG_U_OTHER_LETTER},
-	{0x002dbf, 0x002dbf, PG_U_UNASSIGNED},
 	{0x002dc0, 0x002dc6, PG_U_OTHER_LETTER},
-	{0x002dc7, 0x002dc7, PG_U_UNASSIGNED},
 	{0x002dc8, 0x002dce, PG_U_OTHER_LETTER},
-	{0x002dcf, 0x002dcf, PG_U_UNASSIGNED},
 	{0x002dd0, 0x002dd6, PG_U_OTHER_LETTER},
-	{0x002dd7, 0x002dd7, PG_U_UNASSIGNED},
 	{0x002dd8, 0x002dde, PG_U_OTHER_LETTER},
-	{0x002ddf, 0x002ddf, PG_U_UNASSIGNED},
 	{0x002de0, 0x002dff, PG_U_NONSPACING_MARK},
 	{0x002e00, 0x002e01, PG_U_OTHER_PUNCTUATION},
 	{0x002e02, 0x002e02, PG_U_INITIAL_PUNCTUATION},
@@ -2254,13 +1989,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002e5b, 0x002e5b, PG_U_OPEN_PUNCTUATION},
 	{0x002e5c, 0x002e5c, PG_U_CLOSE_PUNCTUATION},
 	{0x002e5d, 0x002e5d, PG_U_DASH_PUNCTUATION},
-	{0x002e5e, 0x002e7f, PG_U_UNASSIGNED},
 	{0x002e80, 0x002e99, PG_U_OTHER_SYMBOL},
-	{0x002e9a, 0x002e9a, PG_U_UNASSIGNED},
 	{0x002e9b, 0x002ef3, PG_U_OTHER_SYMBOL},
-	{0x002ef4, 0x002eff, PG_U_UNASSIGNED},
 	{0x002f00, 0x002fd5, PG_U_OTHER_SYMBOL},
-	{0x002fd6, 0x002fef, PG_U_UNASSIGNED},
 	{0x002ff0, 0x002fff, PG_U_OTHER_SYMBOL},
 	{0x003000, 0x003000, PG_U_SPACE_SEPARATOR},
 	{0x003001, 0x003003, PG_U_OTHER_PUNCTUATION},
@@ -2302,9 +2033,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00303c, 0x00303c, PG_U_OTHER_LETTER},
 	{0x00303d, 0x00303d, PG_U_OTHER_PUNCTUATION},
 	{0x00303e, 0x00303f, PG_U_OTHER_SYMBOL},
-	{0x003040, 0x003040, PG_U_UNASSIGNED},
 	{0x003041, 0x003096, PG_U_OTHER_LETTER},
-	{0x003097, 0x003098, PG_U_UNASSIGNED},
 	{0x003099, 0x00309a, PG_U_NONSPACING_MARK},
 	{0x00309b, 0x00309c, PG_U_MODIFIER_SYMBOL},
 	{0x00309d, 0x00309e, PG_U_MODIFIER_LETTER},
@@ -2314,21 +2043,16 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0030fb, 0x0030fb, PG_U_OTHER_PUNCTUATION},
 	{0x0030fc, 0x0030fe, PG_U_MODIFIER_LETTER},
 	{0x0030ff, 0x0030ff, PG_U_OTHER_LETTER},
-	{0x003100, 0x003104, PG_U_UNASSIGNED},
 	{0x003105, 0x00312f, PG_U_OTHER_LETTER},
-	{0x003130, 0x003130, PG_U_UNASSIGNED},
 	{0x003131, 0x00318e, PG_U_OTHER_LETTER},
-	{0x00318f, 0x00318f, PG_U_UNASSIGNED},
 	{0x003190, 0x003191, PG_U_OTHER_SYMBOL},
 	{0x003192, 0x003195, PG_U_OTHER_NUMBER},
 	{0x003196, 0x00319f, PG_U_OTHER_SYMBOL},
 	{0x0031a0, 0x0031bf, PG_U_OTHER_LETTER},
 	{0x0031c0, 0x0031e3, PG_U_OTHER_SYMBOL},
-	{0x0031e4, 0x0031ee, PG_U_UNASSIGNED},
 	{0x0031ef, 0x0031ef, PG_U_OTHER_SYMBOL},
 	{0x0031f0, 0x0031ff, PG_U_OTHER_LETTER},
 	{0x003200, 0x00321e, PG_U_OTHER_SYMBOL},
-	{0x00321f, 0x00321f, PG_U_UNASSIGNED},
 	{0x003220, 0x003229, PG_U_OTHER_NUMBER},
 	{0x00322a, 0x003247, PG_U_OTHER_SYMBOL},
 	{0x003248, 0x00324f, PG_U_OTHER_NUMBER},
@@ -2344,9 +2068,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x004e00, 0x00a014, PG_U_OTHER_LETTER},
 	{0x00a015, 0x00a015, PG_U_MODIFIER_LETTER},
 	{0x00a016, 0x00a48c, PG_U_OTHER_LETTER},
-	{0x00a48d, 0x00a48f, PG_U_UNASSIGNED},
 	{0x00a490, 0x00a4c6, PG_U_OTHER_SYMBOL},
-	{0x00a4c7, 0x00a4cf, PG_U_UNASSIGNED},
 	{0x00a4d0, 0x00a4f7, PG_U_OTHER_LETTER},
 	{0x00a4f8, 0x00a4fd, PG_U_MODIFIER_LETTER},
 	{0x00a4fe, 0x00a4ff, PG_U_OTHER_PUNCTUATION},
@@ -2356,7 +2078,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a610, 0x00a61f, PG_U_OTHER_LETTER},
 	{0x00a620, 0x00a629, PG_U_DECIMAL_NUMBER},
 	{0x00a62a, 0x00a62b, PG_U_OTHER_LETTER},
-	{0x00a62c, 0x00a63f, PG_U_UNASSIGNED},
 	{0x00a640, 0x00a640, PG_U_UPPERCASE_LETTER},
 	{0x00a641, 0x00a641, PG_U_LOWERCASE_LETTER},
 	{0x00a642, 0x00a642, PG_U_UPPERCASE_LETTER},
@@ -2444,7 +2165,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a6e6, 0x00a6ef, PG_U_LETTER_NUMBER},
 	{0x00a6f0, 0x00a6f1, PG_U_NONSPACING_MARK},
 	{0x00a6f2, 0x00a6f7, PG_U_OTHER_PUNCTUATION},
-	{0x00a6f8, 0x00a6ff, PG_U_UNASSIGNED},
 	{0x00a700, 0x00a716, PG_U_MODIFIER_SYMBOL},
 	{0x00a717, 0x00a71f, PG_U_MODIFIER_LETTER},
 	{0x00a720, 0x00a721, PG_U_MODIFIER_SYMBOL},
@@ -2593,18 +2313,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a7c8, 0x00a7c8, PG_U_LOWERCASE_LETTER},
 	{0x00a7c9, 0x00a7c9, PG_U_UPPERCASE_LETTER},
 	{0x00a7ca, 0x00a7ca, PG_U_LOWERCASE_LETTER},
-	{0x00a7cb, 0x00a7cf, PG_U_UNASSIGNED},
 	{0x00a7d0, 0x00a7d0, PG_U_UPPERCASE_LETTER},
 	{0x00a7d1, 0x00a7d1, PG_U_LOWERCASE_LETTER},
-	{0x00a7d2, 0x00a7d2, PG_U_UNASSIGNED},
 	{0x00a7d3, 0x00a7d3, PG_U_LOWERCASE_LETTER},
-	{0x00a7d4, 0x00a7d4, PG_U_UNASSIGNED},
 	{0x00a7d5, 0x00a7d5, PG_U_LOWERCASE_LETTER},
 	{0x00a7d6, 0x00a7d6, PG_U_UPPERCASE_LETTER},
 	{0x00a7d7, 0x00a7d7, PG_U_LOWERCASE_LETTER},
 	{0x00a7d8, 0x00a7d8, PG_U_UPPERCASE_LETTER},
 	{0x00a7d9, 0x00a7d9, PG_U_LOWERCASE_LETTER},
-	{0x00a7da, 0x00a7f1, PG_U_UNASSIGNED},
 	{0x00a7f2, 0x00a7f4, PG_U_MODIFIER_LETTER},
 	{0x00a7f5, 0x00a7f5, PG_U_UPPERCASE_LETTER},
 	{0x00a7f6, 0x00a7f6, PG_U_LOWERCASE_LETTER},
@@ -2623,23 +2339,18 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a827, 0x00a827, PG_U_SPACING_MARK},
 	{0x00a828, 0x00a82b, PG_U_OTHER_SYMBOL},
 	{0x00a82c, 0x00a82c, PG_U_NONSPACING_MARK},
-	{0x00a82d, 0x00a82f, PG_U_UNASSIGNED},
 	{0x00a830, 0x00a835, PG_U_OTHER_NUMBER},
 	{0x00a836, 0x00a837, PG_U_OTHER_SYMBOL},
 	{0x00a838, 0x00a838, PG_U_CURRENCY_SYMBOL},
 	{0x00a839, 0x00a839, PG_U_OTHER_SYMBOL},
-	{0x00a83a, 0x00a83f, PG_U_UNASSIGNED},
 	{0x00a840, 0x00a873, PG_U_OTHER_LETTER},
 	{0x00a874, 0x00a877, PG_U_OTHER_PUNCTUATION},
-	{0x00a878, 0x00a87f, PG_U_UNASSIGNED},
 	{0x00a880, 0x00a881, PG_U_SPACING_MARK},
 	{0x00a882, 0x00a8b3, PG_U_OTHER_LETTER},
 	{0x00a8b4, 0x00a8c3, PG_U_SPACING_MARK},
 	{0x00a8c4, 0x00a8c5, PG_U_NONSPACING_MARK},
-	{0x00a8c6, 0x00a8cd, PG_U_UNASSIGNED},
 	{0x00a8ce, 0x00a8cf, PG_U_OTHER_PUNCTUATION},
 	{0x00a8d0, 0x00a8d9, PG_U_DECIMAL_NUMBER},
-	{0x00a8da, 0x00a8df, PG_U_UNASSIGNED},
 	{0x00a8e0, 0x00a8f1, PG_U_NONSPACING_MARK},
 	{0x00a8f2, 0x00a8f7, PG_U_OTHER_LETTER},
 	{0x00a8f8, 0x00a8fa, PG_U_OTHER_PUNCTUATION},
@@ -2654,10 +2365,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a930, 0x00a946, PG_U_OTHER_LETTER},
 	{0x00a947, 0x00a951, PG_U_NONSPACING_MARK},
 	{0x00a952, 0x00a953, PG_U_SPACING_MARK},
-	{0x00a954, 0x00a95e, PG_U_UNASSIGNED},
 	{0x00a95f, 0x00a95f, PG_U_OTHER_PUNCTUATION},
 	{0x00a960, 0x00a97c, PG_U_OTHER_LETTER},
-	{0x00a97d, 0x00a97f, PG_U_UNASSIGNED},
 	{0x00a980, 0x00a982, PG_U_NONSPACING_MARK},
 	{0x00a983, 0x00a983, PG_U_SPACING_MARK},
 	{0x00a984, 0x00a9b2, PG_U_OTHER_LETTER},
@@ -2668,10 +2377,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a9bc, 0x00a9bd, PG_U_NONSPACING_MARK},
 	{0x00a9be, 0x00a9c0, PG_U_SPACING_MARK},
 	{0x00a9c1, 0x00a9cd, PG_U_OTHER_PUNCTUATION},
-	{0x00a9ce, 0x00a9ce, PG_U_UNASSIGNED},
 	{0x00a9cf, 0x00a9cf, PG_U_MODIFIER_LETTER},
 	{0x00a9d0, 0x00a9d9, PG_U_DECIMAL_NUMBER},
-	{0x00a9da, 0x00a9dd, PG_U_UNASSIGNED},
 	{0x00a9de, 0x00a9df, PG_U_OTHER_PUNCTUATION},
 	{0x00a9e0, 0x00a9e4, PG_U_OTHER_LETTER},
 	{0x00a9e5, 0x00a9e5, PG_U_NONSPACING_MARK},
@@ -2679,22 +2386,18 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a9e7, 0x00a9ef, PG_U_OTHER_LETTER},
 	{0x00a9f0, 0x00a9f9, PG_U_DECIMAL_NUMBER},
 	{0x00a9fa, 0x00a9fe, PG_U_OTHER_LETTER},
-	{0x00a9ff, 0x00a9ff, PG_U_UNASSIGNED},
 	{0x00aa00, 0x00aa28, PG_U_OTHER_LETTER},
 	{0x00aa29, 0x00aa2e, PG_U_NONSPACING_MARK},
 	{0x00aa2f, 0x00aa30, PG_U_SPACING_MARK},
 	{0x00aa31, 0x00aa32, PG_U_NONSPACING_MARK},
 	{0x00aa33, 0x00aa34, PG_U_SPACING_MARK},
 	{0x00aa35, 0x00aa36, PG_U_NONSPACING_MARK},
-	{0x00aa37, 0x00aa3f, PG_U_UNASSIGNED},
 	{0x00aa40, 0x00aa42, PG_U_OTHER_LETTER},
 	{0x00aa43, 0x00aa43, PG_U_NONSPACING_MARK},
 	{0x00aa44, 0x00aa4b, PG_U_OTHER_LETTER},
 	{0x00aa4c, 0x00aa4c, PG_U_NONSPACING_MARK},
 	{0x00aa4d, 0x00aa4d, PG_U_SPACING_MARK},
-	{0x00aa4e, 0x00aa4f, PG_U_UNASSIGNED},
 	{0x00aa50, 0x00aa59, PG_U_DECIMAL_NUMBER},
-	{0x00aa5a, 0x00aa5b, PG_U_UNASSIGNED},
 	{0x00aa5c, 0x00aa5f, PG_U_OTHER_PUNCTUATION},
 	{0x00aa60, 0x00aa6f, PG_U_OTHER_LETTER},
 	{0x00aa70, 0x00aa70, PG_U_MODIFIER_LETTER},
@@ -2715,7 +2418,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00aac0, 0x00aac0, PG_U_OTHER_LETTER},
 	{0x00aac1, 0x00aac1, PG_U_NONSPACING_MARK},
 	{0x00aac2, 0x00aac2, PG_U_OTHER_LETTER},
-	{0x00aac3, 0x00aada, PG_U_UNASSIGNED},
 	{0x00aadb, 0x00aadc, PG_U_OTHER_LETTER},
 	{0x00aadd, 0x00aadd, PG_U_MODIFIER_LETTER},
 	{0x00aade, 0x00aadf, PG_U_OTHER_PUNCTUATION},
@@ -2728,24 +2430,17 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00aaf3, 0x00aaf4, PG_U_MODIFIER_LETTER},
 	{0x00aaf5, 0x00aaf5, PG_U_SPACING_MARK},
 	{0x00aaf6, 0x00aaf6, PG_U_NONSPACING_MARK},
-	{0x00aaf7, 0x00ab00, PG_U_UNASSIGNED},
 	{0x00ab01, 0x00ab06, PG_U_OTHER_LETTER},
-	{0x00ab07, 0x00ab08, PG_U_UNASSIGNED},
 	{0x00ab09, 0x00ab0e, PG_U_OTHER_LETTER},
-	{0x00ab0f, 0x00ab10, PG_U_UNASSIGNED},
 	{0x00ab11, 0x00ab16, PG_U_OTHER_LETTER},
-	{0x00ab17, 0x00ab1f, PG_U_UNASSIGNED},
 	{0x00ab20, 0x00ab26, PG_U_OTHER_LETTER},
-	{0x00ab27, 0x00ab27, PG_U_UNASSIGNED},
 	{0x00ab28, 0x00ab2e, PG_U_OTHER_LETTER},
-	{0x00ab2f, 0x00ab2f, PG_U_UNASSIGNED},
 	{0x00ab30, 0x00ab5a, PG_U_LOWERCASE_LETTER},
 	{0x00ab5b, 0x00ab5b, PG_U_MODIFIER_SYMBOL},
 	{0x00ab5c, 0x00ab5f, PG_U_MODIFIER_LETTER},
 	{0x00ab60, 0x00ab68, PG_U_LOWERCASE_LETTER},
 	{0x00ab69, 0x00ab69, PG_U_MODIFIER_LETTER},
 	{0x00ab6a, 0x00ab6b, PG_U_MODIFIER_SYMBOL},
-	{0x00ab6c, 0x00ab6f, PG_U_UNASSIGNED},
 	{0x00ab70, 0x00abbf, PG_U_LOWERCASE_LETTER},
 	{0x00abc0, 0x00abe2, PG_U_OTHER_LETTER},
 	{0x00abe3, 0x00abe4, PG_U_SPACING_MARK},
@@ -2756,52 +2451,34 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00abeb, 0x00abeb, PG_U_OTHER_PUNCTUATION},
 	{0x00abec, 0x00abec, PG_U_SPACING_MARK},
 	{0x00abed, 0x00abed, PG_U_NONSPACING_MARK},
-	{0x00abee, 0x00abef, PG_U_UNASSIGNED},
 	{0x00abf0, 0x00abf9, PG_U_DECIMAL_NUMBER},
-	{0x00abfa, 0x00abff, PG_U_UNASSIGNED},
 	{0x00ac00, 0x00d7a3, PG_U_OTHER_LETTER},
-	{0x00d7a4, 0x00d7af, PG_U_UNASSIGNED},
 	{0x00d7b0, 0x00d7c6, PG_U_OTHER_LETTER},
-	{0x00d7c7, 0x00d7ca, PG_U_UNASSIGNED},
 	{0x00d7cb, 0x00d7fb, PG_U_OTHER_LETTER},
-	{0x00d7fc, 0x00d7ff, PG_U_UNASSIGNED},
 	{0x00d800, 0x00dfff, PG_U_SURROGATE},
 	{0x00e000, 0x00f8ff, PG_U_PRIVATE_USE},
 	{0x00f900, 0x00fa6d, PG_U_OTHER_LETTER},
-	{0x00fa6e, 0x00fa6f, PG_U_UNASSIGNED},
 	{0x00fa70, 0x00fad9, PG_U_OTHER_LETTER},
-	{0x00fada, 0x00faff, PG_U_UNASSIGNED},
 	{0x00fb00, 0x00fb06, PG_U_LOWERCASE_LETTER},
-	{0x00fb07, 0x00fb12, PG_U_UNASSIGNED},
 	{0x00fb13, 0x00fb17, PG_U_LOWERCASE_LETTER},
-	{0x00fb18, 0x00fb1c, PG_U_UNASSIGNED},
 	{0x00fb1d, 0x00fb1d, PG_U_OTHER_LETTER},
 	{0x00fb1e, 0x00fb1e, PG_U_NONSPACING_MARK},
 	{0x00fb1f, 0x00fb28, PG_U_OTHER_LETTER},
 	{0x00fb29, 0x00fb29, PG_U_MATH_SYMBOL},
 	{0x00fb2a, 0x00fb36, PG_U_OTHER_LETTER},
-	{0x00fb37, 0x00fb37, PG_U_UNASSIGNED},
 	{0x00fb38, 0x00fb3c, PG_U_OTHER_LETTER},
-	{0x00fb3d, 0x00fb3d, PG_U_UNASSIGNED},
 	{0x00fb3e, 0x00fb3e, PG_U_OTHER_LETTER},
-	{0x00fb3f, 0x00fb3f, PG_U_UNASSIGNED},
 	{0x00fb40, 0x00fb41, PG_U_OTHER_LETTER},
-	{0x00fb42, 0x00fb42, PG_U_UNASSIGNED},
 	{0x00fb43, 0x00fb44, PG_U_OTHER_LETTER},
-	{0x00fb45, 0x00fb45, PG_U_UNASSIGNED},
 	{0x00fb46, 0x00fbb1, PG_U_OTHER_LETTER},
 	{0x00fbb2, 0x00fbc2, PG_U_MODIFIER_SYMBOL},
-	{0x00fbc3, 0x00fbd2, PG_U_UNASSIGNED},
 	{0x00fbd3, 0x00fd3d, PG_U_OTHER_LETTER},
 	{0x00fd3e, 0x00fd3e, PG_U_CLOSE_PUNCTUATION},
 	{0x00fd3f, 0x00fd3f, PG_U_OPEN_PUNCTUATION},
 	{0x00fd40, 0x00fd4f, PG_U_OTHER_SYMBOL},
 	{0x00fd50, 0x00fd8f, PG_U_OTHER_LETTER},
-	{0x00fd90, 0x00fd91, PG_U_UNASSIGNED},
 	{0x00fd92, 0x00fdc7, PG_U_OTHER_LETTER},
-	{0x00fdc8, 0x00fdce, PG_U_UNASSIGNED},
 	{0x00fdcf, 0x00fdcf, PG_U_OTHER_SYMBOL},
-	{0x00fdd0, 0x00fdef, PG_U_UNASSIGNED},
 	{0x00fdf0, 0x00fdfb, PG_U_OTHER_LETTER},
 	{0x00fdfc, 0x00fdfc, PG_U_CURRENCY_SYMBOL},
 	{0x00fdfd, 0x00fdff, PG_U_OTHER_SYMBOL},
@@ -2810,7 +2487,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00fe17, 0x00fe17, PG_U_OPEN_PUNCTUATION},
 	{0x00fe18, 0x00fe18, PG_U_CLOSE_PUNCTUATION},
 	{0x00fe19, 0x00fe19, PG_U_OTHER_PUNCTUATION},
-	{0x00fe1a, 0x00fe1f, PG_U_UNASSIGNED},
 	{0x00fe20, 0x00fe2f, PG_U_NONSPACING_MARK},
 	{0x00fe30, 0x00fe30, PG_U_OTHER_PUNCTUATION},
 	{0x00fe31, 0x00fe32, PG_U_DASH_PUNCTUATION},
@@ -2837,7 +2513,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00fe49, 0x00fe4c, PG_U_OTHER_PUNCTUATION},
 	{0x00fe4d, 0x00fe4f, PG_U_CONNECTOR_PUNCTUATION},
 	{0x00fe50, 0x00fe52, PG_U_OTHER_PUNCTUATION},
-	{0x00fe53, 0x00fe53, PG_U_UNASSIGNED},
 	{0x00fe54, 0x00fe57, PG_U_OTHER_PUNCTUATION},
 	{0x00fe58, 0x00fe58, PG_U_DASH_PUNCTUATION},
 	{0x00fe59, 0x00fe59, PG_U_OPEN_PUNCTUATION},
@@ -2850,17 +2525,12 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00fe62, 0x00fe62, PG_U_MATH_SYMBOL},
 	{0x00fe63, 0x00fe63, PG_U_DASH_PUNCTUATION},
 	{0x00fe64, 0x00fe66, PG_U_MATH_SYMBOL},
-	{0x00fe67, 0x00fe67, PG_U_UNASSIGNED},
 	{0x00fe68, 0x00fe68, PG_U_OTHER_PUNCTUATION},
 	{0x00fe69, 0x00fe69, PG_U_CURRENCY_SYMBOL},
 	{0x00fe6a, 0x00fe6b, PG_U_OTHER_PUNCTUATION},
-	{0x00fe6c, 0x00fe6f, PG_U_UNASSIGNED},
 	{0x00fe70, 0x00fe74, PG_U_OTHER_LETTER},
-	{0x00fe75, 0x00fe75, PG_U_UNASSIGNED},
 	{0x00fe76, 0x00fefc, PG_U_OTHER_LETTER},
-	{0x00fefd, 0x00fefe, PG_U_UNASSIGNED},
 	{0x00feff, 0x00feff, PG_U_FORMAT},
-	{0x00ff00, 0x00ff00, PG_U_UNASSIGNED},
 	{0x00ff01, 0x00ff03, PG_U_OTHER_PUNCTUATION},
 	{0x00ff04, 0x00ff04, PG_U_CURRENCY_SYMBOL},
 	{0x00ff05, 0x00ff07, PG_U_OTHER_PUNCTUATION},
@@ -2898,273 +2568,175 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00ff71, 0x00ff9d, PG_U_OTHER_LETTER},
 	{0x00ff9e, 0x00ff9f, PG_U_MODIFIER_LETTER},
 	{0x00ffa0, 0x00ffbe, PG_U_OTHER_LETTER},
-	{0x00ffbf, 0x00ffc1, PG_U_UNASSIGNED},
 	{0x00ffc2, 0x00ffc7, PG_U_OTHER_LETTER},
-	{0x00ffc8, 0x00ffc9, PG_U_UNASSIGNED},
 	{0x00ffca, 0x00ffcf, PG_U_OTHER_LETTER},
-	{0x00ffd0, 0x00ffd1, PG_U_UNASSIGNED},
 	{0x00ffd2, 0x00ffd7, PG_U_OTHER_LETTER},
-	{0x00ffd8, 0x00ffd9, PG_U_UNASSIGNED},
 	{0x00ffda, 0x00ffdc, PG_U_OTHER_LETTER},
-	{0x00ffdd, 0x00ffdf, PG_U_UNASSIGNED},
 	{0x00ffe0, 0x00ffe1, PG_U_CURRENCY_SYMBOL},
 	{0x00ffe2, 0x00ffe2, PG_U_MATH_SYMBOL},
 	{0x00ffe3, 0x00ffe3, PG_U_MODIFIER_SYMBOL},
 	{0x00ffe4, 0x00ffe4, PG_U_OTHER_SYMBOL},
 	{0x00ffe5, 0x00ffe6, PG_U_CURRENCY_SYMBOL},
-	{0x00ffe7, 0x00ffe7, PG_U_UNASSIGNED},
 	{0x00ffe8, 0x00ffe8, PG_U_OTHER_SYMBOL},
 	{0x00ffe9, 0x00ffec, PG_U_MATH_SYMBOL},
 	{0x00ffed, 0x00ffee, PG_U_OTHER_SYMBOL},
-	{0x00ffef, 0x00fff8, PG_U_UNASSIGNED},
 	{0x00fff9, 0x00fffb, PG_U_FORMAT},
 	{0x00fffc, 0x00fffd, PG_U_OTHER_SYMBOL},
-	{0x00fffe, 0x00ffff, PG_U_UNASSIGNED},
 	{0x010000, 0x01000b, PG_U_OTHER_LETTER},
-	{0x01000c, 0x01000c, PG_U_UNASSIGNED},
 	{0x01000d, 0x010026, PG_U_OTHER_LETTER},
-	{0x010027, 0x010027, PG_U_UNASSIGNED},
 	{0x010028, 0x01003a, PG_U_OTHER_LETTER},
-	{0x01003b, 0x01003b, PG_U_UNASSIGNED},
 	{0x01003c, 0x01003d, PG_U_OTHER_LETTER},
-	{0x01003e, 0x01003e, PG_U_UNASSIGNED},
 	{0x01003f, 0x01004d, PG_U_OTHER_LETTER},
-	{0x01004e, 0x01004f, PG_U_UNASSIGNED},
 	{0x010050, 0x01005d, PG_U_OTHER_LETTER},
-	{0x01005e, 0x01007f, PG_U_UNASSIGNED},
 	{0x010080, 0x0100fa, PG_U_OTHER_LETTER},
-	{0x0100fb, 0x0100ff, PG_U_UNASSIGNED},
 	{0x010100, 0x010102, PG_U_OTHER_PUNCTUATION},
-	{0x010103, 0x010106, PG_U_UNASSIGNED},
 	{0x010107, 0x010133, PG_U_OTHER_NUMBER},
-	{0x010134, 0x010136, PG_U_UNASSIGNED},
 	{0x010137, 0x01013f, PG_U_OTHER_SYMBOL},
 	{0x010140, 0x010174, PG_U_LETTER_NUMBER},
 	{0x010175, 0x010178, PG_U_OTHER_NUMBER},
 	{0x010179, 0x010189, PG_U_OTHER_SYMBOL},
 	{0x01018a, 0x01018b, PG_U_OTHER_NUMBER},
 	{0x01018c, 0x01018e, PG_U_OTHER_SYMBOL},
-	{0x01018f, 0x01018f, PG_U_UNASSIGNED},
 	{0x010190, 0x01019c, PG_U_OTHER_SYMBOL},
-	{0x01019d, 0x01019f, PG_U_UNASSIGNED},
 	{0x0101a0, 0x0101a0, PG_U_OTHER_SYMBOL},
-	{0x0101a1, 0x0101cf, PG_U_UNASSIGNED},
 	{0x0101d0, 0x0101fc, PG_U_OTHER_SYMBOL},
 	{0x0101fd, 0x0101fd, PG_U_NONSPACING_MARK},
-	{0x0101fe, 0x01027f, PG_U_UNASSIGNED},
 	{0x010280, 0x01029c, PG_U_OTHER_LETTER},
-	{0x01029d, 0x01029f, PG_U_UNASSIGNED},
 	{0x0102a0, 0x0102d0, PG_U_OTHER_LETTER},
-	{0x0102d1, 0x0102df, PG_U_UNASSIGNED},
 	{0x0102e0, 0x0102e0, PG_U_NONSPACING_MARK},
 	{0x0102e1, 0x0102fb, PG_U_OTHER_NUMBER},
-	{0x0102fc, 0x0102ff, PG_U_UNASSIGNED},
 	{0x010300, 0x01031f, PG_U_OTHER_LETTER},
 	{0x010320, 0x010323, PG_U_OTHER_NUMBER},
-	{0x010324, 0x01032c, PG_U_UNASSIGNED},
 	{0x01032d, 0x010340, PG_U_OTHER_LETTER},
 	{0x010341, 0x010341, PG_U_LETTER_NUMBER},
 	{0x010342, 0x010349, PG_U_OTHER_LETTER},
 	{0x01034a, 0x01034a, PG_U_LETTER_NUMBER},
-	{0x01034b, 0x01034f, PG_U_UNASSIGNED},
 	{0x010350, 0x010375, PG_U_OTHER_LETTER},
 	{0x010376, 0x01037a, PG_U_NONSPACING_MARK},
-	{0x01037b, 0x01037f, PG_U_UNASSIGNED},
 	{0x010380, 0x01039d, PG_U_OTHER_LETTER},
-	{0x01039e, 0x01039e, PG_U_UNASSIGNED},
 	{0x01039f, 0x01039f, PG_U_OTHER_PUNCTUATION},
 	{0x0103a0, 0x0103c3, PG_U_OTHER_LETTER},
-	{0x0103c4, 0x0103c7, PG_U_UNASSIGNED},
 	{0x0103c8, 0x0103cf, PG_U_OTHER_LETTER},
 	{0x0103d0, 0x0103d0, PG_U_OTHER_PUNCTUATION},
 	{0x0103d1, 0x0103d5, PG_U_LETTER_NUMBER},
-	{0x0103d6, 0x0103ff, PG_U_UNASSIGNED},
 	{0x010400, 0x010427, PG_U_UPPERCASE_LETTER},
 	{0x010428, 0x01044f, PG_U_LOWERCASE_LETTER},
 	{0x010450, 0x01049d, PG_U_OTHER_LETTER},
-	{0x01049e, 0x01049f, PG_U_UNASSIGNED},
 	{0x0104a0, 0x0104a9, PG_U_DECIMAL_NUMBER},
-	{0x0104aa, 0x0104af, PG_U_UNASSIGNED},
 	{0x0104b0, 0x0104d3, PG_U_UPPERCASE_LETTER},
-	{0x0104d4, 0x0104d7, PG_U_UNASSIGNED},
 	{0x0104d8, 0x0104fb, PG_U_LOWERCASE_LETTER},
-	{0x0104fc, 0x0104ff, PG_U_UNASSIGNED},
 	{0x010500, 0x010527, PG_U_OTHER_LETTER},
-	{0x010528, 0x01052f, PG_U_UNASSIGNED},
 	{0x010530, 0x010563, PG_U_OTHER_LETTER},
-	{0x010564, 0x01056e, PG_U_UNASSIGNED},
 	{0x01056f, 0x01056f, PG_U_OTHER_PUNCTUATION},
 	{0x010570, 0x01057a, PG_U_UPPERCASE_LETTER},
-	{0x01057b, 0x01057b, PG_U_UNASSIGNED},
 	{0x01057c, 0x01058a, PG_U_UPPERCASE_LETTER},
-	{0x01058b, 0x01058b, PG_U_UNASSIGNED},
 	{0x01058c, 0x010592, PG_U_UPPERCASE_LETTER},
-	{0x010593, 0x010593, PG_U_UNASSIGNED},
 	{0x010594, 0x010595, PG_U_UPPERCASE_LETTER},
-	{0x010596, 0x010596, PG_U_UNASSIGNED},
 	{0x010597, 0x0105a1, PG_U_LOWERCASE_LETTER},
-	{0x0105a2, 0x0105a2, PG_U_UNASSIGNED},
 	{0x0105a3, 0x0105b1, PG_U_LOWERCASE_LETTER},
-	{0x0105b2, 0x0105b2, PG_U_UNASSIGNED},
 	{0x0105b3, 0x0105b9, PG_U_LOWERCASE_LETTER},
-	{0x0105ba, 0x0105ba, PG_U_UNASSIGNED},
 	{0x0105bb, 0x0105bc, PG_U_LOWERCASE_LETTER},
-	{0x0105bd, 0x0105ff, PG_U_UNASSIGNED},
 	{0x010600, 0x010736, PG_U_OTHER_LETTER},
-	{0x010737, 0x01073f, PG_U_UNASSIGNED},
 	{0x010740, 0x010755, PG_U_OTHER_LETTER},
-	{0x010756, 0x01075f, PG_U_UNASSIGNED},
 	{0x010760, 0x010767, PG_U_OTHER_LETTER},
-	{0x010768, 0x01077f, PG_U_UNASSIGNED},
 	{0x010780, 0x010785, PG_U_MODIFIER_LETTER},
-	{0x010786, 0x010786, PG_U_UNASSIGNED},
 	{0x010787, 0x0107b0, PG_U_MODIFIER_LETTER},
-	{0x0107b1, 0x0107b1, PG_U_UNASSIGNED},
 	{0x0107b2, 0x0107ba, PG_U_MODIFIER_LETTER},
-	{0x0107bb, 0x0107ff, PG_U_UNASSIGNED},
 	{0x010800, 0x010805, PG_U_OTHER_LETTER},
-	{0x010806, 0x010807, PG_U_UNASSIGNED},
 	{0x010808, 0x010808, PG_U_OTHER_LETTER},
-	{0x010809, 0x010809, PG_U_UNASSIGNED},
 	{0x01080a, 0x010835, PG_U_OTHER_LETTER},
-	{0x010836, 0x010836, PG_U_UNASSIGNED},
 	{0x010837, 0x010838, PG_U_OTHER_LETTER},
-	{0x010839, 0x01083b, PG_U_UNASSIGNED},
 	{0x01083c, 0x01083c, PG_U_OTHER_LETTER},
-	{0x01083d, 0x01083e, PG_U_UNASSIGNED},
 	{0x01083f, 0x010855, PG_U_OTHER_LETTER},
-	{0x010856, 0x010856, PG_U_UNASSIGNED},
 	{0x010857, 0x010857, PG_U_OTHER_PUNCTUATION},
 	{0x010858, 0x01085f, PG_U_OTHER_NUMBER},
 	{0x010860, 0x010876, PG_U_OTHER_LETTER},
 	{0x010877, 0x010878, PG_U_OTHER_SYMBOL},
 	{0x010879, 0x01087f, PG_U_OTHER_NUMBER},
 	{0x010880, 0x01089e, PG_U_OTHER_LETTER},
-	{0x01089f, 0x0108a6, PG_U_UNASSIGNED},
 	{0x0108a7, 0x0108af, PG_U_OTHER_NUMBER},
-	{0x0108b0, 0x0108df, PG_U_UNASSIGNED},
 	{0x0108e0, 0x0108f2, PG_U_OTHER_LETTER},
-	{0x0108f3, 0x0108f3, PG_U_UNASSIGNED},
 	{0x0108f4, 0x0108f5, PG_U_OTHER_LETTER},
-	{0x0108f6, 0x0108fa, PG_U_UNASSIGNED},
 	{0x0108fb, 0x0108ff, PG_U_OTHER_NUMBER},
 	{0x010900, 0x010915, PG_U_OTHER_LETTER},
 	{0x010916, 0x01091b, PG_U_OTHER_NUMBER},
-	{0x01091c, 0x01091e, PG_U_UNASSIGNED},
 	{0x01091f, 0x01091f, PG_U_OTHER_PUNCTUATION},
 	{0x010920, 0x010939, PG_U_OTHER_LETTER},
-	{0x01093a, 0x01093e, PG_U_UNASSIGNED},
 	{0x01093f, 0x01093f, PG_U_OTHER_PUNCTUATION},
-	{0x010940, 0x01097f, PG_U_UNASSIGNED},
 	{0x010980, 0x0109b7, PG_U_OTHER_LETTER},
-	{0x0109b8, 0x0109bb, PG_U_UNASSIGNED},
 	{0x0109bc, 0x0109bd, PG_U_OTHER_NUMBER},
 	{0x0109be, 0x0109bf, PG_U_OTHER_LETTER},
 	{0x0109c0, 0x0109cf, PG_U_OTHER_NUMBER},
-	{0x0109d0, 0x0109d1, PG_U_UNASSIGNED},
 	{0x0109d2, 0x0109ff, PG_U_OTHER_NUMBER},
 	{0x010a00, 0x010a00, PG_U_OTHER_LETTER},
 	{0x010a01, 0x010a03, PG_U_NONSPACING_MARK},
-	{0x010a04, 0x010a04, PG_U_UNASSIGNED},
 	{0x010a05, 0x010a06, PG_U_NONSPACING_MARK},
-	{0x010a07, 0x010a0b, PG_U_UNASSIGNED},
 	{0x010a0c, 0x010a0f, PG_U_NONSPACING_MARK},
 	{0x010a10, 0x010a13, PG_U_OTHER_LETTER},
-	{0x010a14, 0x010a14, PG_U_UNASSIGNED},
 	{0x010a15, 0x010a17, PG_U_OTHER_LETTER},
-	{0x010a18, 0x010a18, PG_U_UNASSIGNED},
 	{0x010a19, 0x010a35, PG_U_OTHER_LETTER},
-	{0x010a36, 0x010a37, PG_U_UNASSIGNED},
 	{0x010a38, 0x010a3a, PG_U_NONSPACING_MARK},
-	{0x010a3b, 0x010a3e, PG_U_UNASSIGNED},
 	{0x010a3f, 0x010a3f, PG_U_NONSPACING_MARK},
 	{0x010a40, 0x010a48, PG_U_OTHER_NUMBER},
-	{0x010a49, 0x010a4f, PG_U_UNASSIGNED},
 	{0x010a50, 0x010a58, PG_U_OTHER_PUNCTUATION},
-	{0x010a59, 0x010a5f, PG_U_UNASSIGNED},
 	{0x010a60, 0x010a7c, PG_U_OTHER_LETTER},
 	{0x010a7d, 0x010a7e, PG_U_OTHER_NUMBER},
 	{0x010a7f, 0x010a7f, PG_U_OTHER_PUNCTUATION},
 	{0x010a80, 0x010a9c, PG_U_OTHER_LETTER},
 	{0x010a9d, 0x010a9f, PG_U_OTHER_NUMBER},
-	{0x010aa0, 0x010abf, PG_U_UNASSIGNED},
 	{0x010ac0, 0x010ac7, PG_U_OTHER_LETTER},
 	{0x010ac8, 0x010ac8, PG_U_OTHER_SYMBOL},
 	{0x010ac9, 0x010ae4, PG_U_OTHER_LETTER},
 	{0x010ae5, 0x010ae6, PG_U_NONSPACING_MARK},
-	{0x010ae7, 0x010aea, PG_U_UNASSIGNED},
 	{0x010aeb, 0x010aef, PG_U_OTHER_NUMBER},
 	{0x010af0, 0x010af6, PG_U_OTHER_PUNCTUATION},
-	{0x010af7, 0x010aff, PG_U_UNASSIGNED},
 	{0x010b00, 0x010b35, PG_U_OTHER_LETTER},
-	{0x010b36, 0x010b38, PG_U_UNASSIGNED},
 	{0x010b39, 0x010b3f, PG_U_OTHER_PUNCTUATION},
 	{0x010b40, 0x010b55, PG_U_OTHER_LETTER},
-	{0x010b56, 0x010b57, PG_U_UNASSIGNED},
 	{0x010b58, 0x010b5f, PG_U_OTHER_NUMBER},
 	{0x010b60, 0x010b72, PG_U_OTHER_LETTER},
-	{0x010b73, 0x010b77, PG_U_UNASSIGNED},
 	{0x010b78, 0x010b7f, PG_U_OTHER_NUMBER},
 	{0x010b80, 0x010b91, PG_U_OTHER_LETTER},
-	{0x010b92, 0x010b98, PG_U_UNASSIGNED},
 	{0x010b99, 0x010b9c, PG_U_OTHER_PUNCTUATION},
-	{0x010b9d, 0x010ba8, PG_U_UNASSIGNED},
 	{0x010ba9, 0x010baf, PG_U_OTHER_NUMBER},
-	{0x010bb0, 0x010bff, PG_U_UNASSIGNED},
 	{0x010c00, 0x010c48, PG_U_OTHER_LETTER},
-	{0x010c49, 0x010c7f, PG_U_UNASSIGNED},
 	{0x010c80, 0x010cb2, PG_U_UPPERCASE_LETTER},
-	{0x010cb3, 0x010cbf, PG_U_UNASSIGNED},
 	{0x010cc0, 0x010cf2, PG_U_LOWERCASE_LETTER},
-	{0x010cf3, 0x010cf9, PG_U_UNASSIGNED},
 	{0x010cfa, 0x010cff, PG_U_OTHER_NUMBER},
 	{0x010d00, 0x010d23, PG_U_OTHER_LETTER},
 	{0x010d24, 0x010d27, PG_U_NONSPACING_MARK},
-	{0x010d28, 0x010d2f, PG_U_UNASSIGNED},
 	{0x010d30, 0x010d39, PG_U_DECIMAL_NUMBER},
-	{0x010d3a, 0x010e5f, PG_U_UNASSIGNED},
 	{0x010e60, 0x010e7e, PG_U_OTHER_NUMBER},
-	{0x010e7f, 0x010e7f, PG_U_UNASSIGNED},
 	{0x010e80, 0x010ea9, PG_U_OTHER_LETTER},
-	{0x010eaa, 0x010eaa, PG_U_UNASSIGNED},
 	{0x010eab, 0x010eac, PG_U_NONSPACING_MARK},
 	{0x010ead, 0x010ead, PG_U_DASH_PUNCTUATION},
-	{0x010eae, 0x010eaf, PG_U_UNASSIGNED},
 	{0x010eb0, 0x010eb1, PG_U_OTHER_LETTER},
-	{0x010eb2, 0x010efc, PG_U_UNASSIGNED},
 	{0x010efd, 0x010eff, PG_U_NONSPACING_MARK},
 	{0x010f00, 0x010f1c, PG_U_OTHER_LETTER},
 	{0x010f1d, 0x010f26, PG_U_OTHER_NUMBER},
 	{0x010f27, 0x010f27, PG_U_OTHER_LETTER},
-	{0x010f28, 0x010f2f, PG_U_UNASSIGNED},
 	{0x010f30, 0x010f45, PG_U_OTHER_LETTER},
 	{0x010f46, 0x010f50, PG_U_NONSPACING_MARK},
 	{0x010f51, 0x010f54, PG_U_OTHER_NUMBER},
 	{0x010f55, 0x010f59, PG_U_OTHER_PUNCTUATION},
-	{0x010f5a, 0x010f6f, PG_U_UNASSIGNED},
 	{0x010f70, 0x010f81, PG_U_OTHER_LETTER},
 	{0x010f82, 0x010f85, PG_U_NONSPACING_MARK},
 	{0x010f86, 0x010f89, PG_U_OTHER_PUNCTUATION},
-	{0x010f8a, 0x010faf, PG_U_UNASSIGNED},
 	{0x010fb0, 0x010fc4, PG_U_OTHER_LETTER},
 	{0x010fc5, 0x010fcb, PG_U_OTHER_NUMBER},
-	{0x010fcc, 0x010fdf, PG_U_UNASSIGNED},
 	{0x010fe0, 0x010ff6, PG_U_OTHER_LETTER},
-	{0x010ff7, 0x010fff, PG_U_UNASSIGNED},
 	{0x011000, 0x011000, PG_U_SPACING_MARK},
 	{0x011001, 0x011001, PG_U_NONSPACING_MARK},
 	{0x011002, 0x011002, PG_U_SPACING_MARK},
 	{0x011003, 0x011037, PG_U_OTHER_LETTER},
 	{0x011038, 0x011046, PG_U_NONSPACING_MARK},
 	{0x011047, 0x01104d, PG_U_OTHER_PUNCTUATION},
-	{0x01104e, 0x011051, PG_U_UNASSIGNED},
 	{0x011052, 0x011065, PG_U_OTHER_NUMBER},
 	{0x011066, 0x01106f, PG_U_DECIMAL_NUMBER},
 	{0x011070, 0x011070, PG_U_NONSPACING_MARK},
 	{0x011071, 0x011072, PG_U_OTHER_LETTER},
 	{0x011073, 0x011074, PG_U_NONSPACING_MARK},
 	{0x011075, 0x011075, PG_U_OTHER_LETTER},
-	{0x011076, 0x01107e, PG_U_UNASSIGNED},
 	{0x01107f, 0x011081, PG_U_NONSPACING_MARK},
 	{0x011082, 0x011082, PG_U_SPACING_MARK},
 	{0x011083, 0x0110af, PG_U_OTHER_LETTER},
@@ -3176,30 +2748,23 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0110bd, 0x0110bd, PG_U_FORMAT},
 	{0x0110be, 0x0110c1, PG_U_OTHER_PUNCTUATION},
 	{0x0110c2, 0x0110c2, PG_U_NONSPACING_MARK},
-	{0x0110c3, 0x0110cc, PG_U_UNASSIGNED},
 	{0x0110cd, 0x0110cd, PG_U_FORMAT},
-	{0x0110ce, 0x0110cf, PG_U_UNASSIGNED},
 	{0x0110d0, 0x0110e8, PG_U_OTHER_LETTER},
-	{0x0110e9, 0x0110ef, PG_U_UNASSIGNED},
 	{0x0110f0, 0x0110f9, PG_U_DECIMAL_NUMBER},
-	{0x0110fa, 0x0110ff, PG_U_UNASSIGNED},
 	{0x011100, 0x011102, PG_U_NONSPACING_MARK},
 	{0x011103, 0x011126, PG_U_OTHER_LETTER},
 	{0x011127, 0x01112b, PG_U_NONSPACING_MARK},
 	{0x01112c, 0x01112c, PG_U_SPACING_MARK},
 	{0x01112d, 0x011134, PG_U_NONSPACING_MARK},
-	{0x011135, 0x011135, PG_U_UNASSIGNED},
 	{0x011136, 0x01113f, PG_U_DECIMAL_NUMBER},
 	{0x011140, 0x011143, PG_U_OTHER_PUNCTUATION},
 	{0x011144, 0x011144, PG_U_OTHER_LETTER},
 	{0x011145, 0x011146, PG_U_SPACING_MARK},
 	{0x011147, 0x011147, PG_U_OTHER_LETTER},
-	{0x011148, 0x01114f, PG_U_UNASSIGNED},
 	{0x011150, 0x011172, PG_U_OTHER_LETTER},
 	{0x011173, 0x011173, PG_U_NONSPACING_MARK},
 	{0x011174, 0x011175, PG_U_OTHER_PUNCTUATION},
 	{0x011176, 0x011176, PG_U_OTHER_LETTER},
-	{0x011177, 0x01117f, PG_U_UNASSIGNED},
 	{0x011180, 0x011181, PG_U_NONSPACING_MARK},
 	{0x011182, 0x011182, PG_U_SPACING_MARK},
 	{0x011183, 0x0111b2, PG_U_OTHER_LETTER},
@@ -3217,11 +2782,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0111db, 0x0111db, PG_U_OTHER_PUNCTUATION},
 	{0x0111dc, 0x0111dc, PG_U_OTHER_LETTER},
 	{0x0111dd, 0x0111df, PG_U_OTHER_PUNCTUATION},
-	{0x0111e0, 0x0111e0, PG_U_UNASSIGNED},
 	{0x0111e1, 0x0111f4, PG_U_OTHER_NUMBER},
-	{0x0111f5, 0x0111ff, PG_U_UNASSIGNED},
 	{0x011200, 0x011211, PG_U_OTHER_LETTER},
-	{0x011212, 0x011212, PG_U_UNASSIGNED},
 	{0x011213, 0x01122b, PG_U_OTHER_LETTER},
 	{0x01122c, 0x01122e, PG_U_SPACING_MARK},
 	{0x01122f, 0x011231, PG_U_NONSPACING_MARK},
@@ -3233,61 +2795,38 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01123e, 0x01123e, PG_U_NONSPACING_MARK},
 	{0x01123f, 0x011240, PG_U_OTHER_LETTER},
 	{0x011241, 0x011241, PG_U_NONSPACING_MARK},
-	{0x011242, 0x01127f, PG_U_UNASSIGNED},
 	{0x011280, 0x011286, PG_U_OTHER_LETTER},
-	{0x011287, 0x011287, PG_U_UNASSIGNED},
 	{0x011288, 0x011288, PG_U_OTHER_LETTER},
-	{0x011289, 0x011289, PG_U_UNASSIGNED},
 	{0x01128a, 0x01128d, PG_U_OTHER_LETTER},
-	{0x01128e, 0x01128e, PG_U_UNASSIGNED},
 	{0x01128f, 0x01129d, PG_U_OTHER_LETTER},
-	{0x01129e, 0x01129e, PG_U_UNASSIGNED},
 	{0x01129f, 0x0112a8, PG_U_OTHER_LETTER},
 	{0x0112a9, 0x0112a9, PG_U_OTHER_PUNCTUATION},
-	{0x0112aa, 0x0112af, PG_U_UNASSIGNED},
 	{0x0112b0, 0x0112de, PG_U_OTHER_LETTER},
 	{0x0112df, 0x0112df, PG_U_NONSPACING_MARK},
 	{0x0112e0, 0x0112e2, PG_U_SPACING_MARK},
 	{0x0112e3, 0x0112ea, PG_U_NONSPACING_MARK},
-	{0x0112eb, 0x0112ef, PG_U_UNASSIGNED},
 	{0x0112f0, 0x0112f9, PG_U_DECIMAL_NUMBER},
-	{0x0112fa, 0x0112ff, PG_U_UNASSIGNED},
 	{0x011300, 0x011301, PG_U_NONSPACING_MARK},
 	{0x011302, 0x011303, PG_U_SPACING_MARK},
-	{0x011304, 0x011304, PG_U_UNASSIGNED},
 	{0x011305, 0x01130c, PG_U_OTHER_LETTER},
-	{0x01130d, 0x01130e, PG_U_UNASSIGNED},
 	{0x01130f, 0x011310, PG_U_OTHER_LETTER},
-	{0x011311, 0x011312, PG_U_UNASSIGNED},
 	{0x011313, 0x011328, PG_U_OTHER_LETTER},
-	{0x011329, 0x011329, PG_U_UNASSIGNED},
 	{0x01132a, 0x011330, PG_U_OTHER_LETTER},
-	{0x011331, 0x011331, PG_U_UNASSIGNED},
 	{0x011332, 0x011333, PG_U_OTHER_LETTER},
-	{0x011334, 0x011334, PG_U_UNASSIGNED},
 	{0x011335, 0x011339, PG_U_OTHER_LETTER},
-	{0x01133a, 0x01133a, PG_U_UNASSIGNED},
 	{0x01133b, 0x01133c, PG_U_NONSPACING_MARK},
 	{0x01133d, 0x01133d, PG_U_OTHER_LETTER},
 	{0x01133e, 0x01133f, PG_U_SPACING_MARK},
 	{0x011340, 0x011340, PG_U_NONSPACING_MARK},
 	{0x011341, 0x011344, PG_U_SPACING_MARK},
-	{0x011345, 0x011346, PG_U_UNASSIGNED},
 	{0x011347, 0x011348, PG_U_SPACING_MARK},
-	{0x011349, 0x01134a, PG_U_UNASSIGNED},
 	{0x01134b, 0x01134d, PG_U_SPACING_MARK},
-	{0x01134e, 0x01134f, PG_U_UNASSIGNED},
 	{0x011350, 0x011350, PG_U_OTHER_LETTER},
-	{0x011351, 0x011356, PG_U_UNASSIGNED},
 	{0x011357, 0x011357, PG_U_SPACING_MARK},
-	{0x011358, 0x01135c, PG_U_UNASSIGNED},
 	{0x01135d, 0x011361, PG_U_OTHER_LETTER},
 	{0x011362, 0x011363, PG_U_SPACING_MARK},
-	{0x011364, 0x011365, PG_U_UNASSIGNED},
 	{0x011366, 0x01136c, PG_U_NONSPACING_MARK},
-	{0x01136d, 0x01136f, PG_U_UNASSIGNED},
 	{0x011370, 0x011374, PG_U_NONSPACING_MARK},
-	{0x011375, 0x0113ff, PG_U_UNASSIGNED},
 	{0x011400, 0x011434, PG_U_OTHER_LETTER},
 	{0x011435, 0x011437, PG_U_SPACING_MARK},
 	{0x011438, 0x01143f, PG_U_NONSPACING_MARK},
@@ -3299,11 +2838,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01144b, 0x01144f, PG_U_OTHER_PUNCTUATION},
 	{0x011450, 0x011459, PG_U_DECIMAL_NUMBER},
 	{0x01145a, 0x01145b, PG_U_OTHER_PUNCTUATION},
-	{0x01145c, 0x01145c, PG_U_UNASSIGNED},
 	{0x01145d, 0x01145d, PG_U_OTHER_PUNCTUATION},
 	{0x01145e, 0x01145e, PG_U_NONSPACING_MARK},
 	{0x01145f, 0x011461, PG_U_OTHER_LETTER},
-	{0x011462, 0x01147f, PG_U_UNASSIGNED},
 	{0x011480, 0x0114af, PG_U_OTHER_LETTER},
 	{0x0114b0, 0x0114b2, PG_U_SPACING_MARK},
 	{0x0114b3, 0x0114b8, PG_U_NONSPACING_MARK},
@@ -3316,13 +2853,10 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0114c4, 0x0114c5, PG_U_OTHER_LETTER},
 	{0x0114c6, 0x0114c6, PG_U_OTHER_PUNCTUATION},
 	{0x0114c7, 0x0114c7, PG_U_OTHER_LETTER},
-	{0x0114c8, 0x0114cf, PG_U_UNASSIGNED},
 	{0x0114d0, 0x0114d9, PG_U_DECIMAL_NUMBER},
-	{0x0114da, 0x01157f, PG_U_UNASSIGNED},
 	{0x011580, 0x0115ae, PG_U_OTHER_LETTER},
 	{0x0115af, 0x0115b1, PG_U_SPACING_MARK},
 	{0x0115b2, 0x0115b5, PG_U_NONSPACING_MARK},
-	{0x0115b6, 0x0115b7, PG_U_UNASSIGNED},
 	{0x0115b8, 0x0115bb, PG_U_SPACING_MARK},
 	{0x0115bc, 0x0115bd, PG_U_NONSPACING_MARK},
 	{0x0115be, 0x0115be, PG_U_SPACING_MARK},
@@ -3330,7 +2864,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0115c1, 0x0115d7, PG_U_OTHER_PUNCTUATION},
 	{0x0115d8, 0x0115db, PG_U_OTHER_LETTER},
 	{0x0115dc, 0x0115dd, PG_U_NONSPACING_MARK},
-	{0x0115de, 0x0115ff, PG_U_UNASSIGNED},
 	{0x011600, 0x01162f, PG_U_OTHER_LETTER},
 	{0x011630, 0x011632, PG_U_SPACING_MARK},
 	{0x011633, 0x01163a, PG_U_NONSPACING_MARK},
@@ -3340,11 +2873,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01163f, 0x011640, PG_U_NONSPACING_MARK},
 	{0x011641, 0x011643, PG_U_OTHER_PUNCTUATION},
 	{0x011644, 0x011644, PG_U_OTHER_LETTER},
-	{0x011645, 0x01164f, PG_U_UNASSIGNED},
 	{0x011650, 0x011659, PG_U_DECIMAL_NUMBER},
-	{0x01165a, 0x01165f, PG_U_UNASSIGNED},
 	{0x011660, 0x01166c, PG_U_OTHER_PUNCTUATION},
-	{0x01166d, 0x01167f, PG_U_UNASSIGNED},
 	{0x011680, 0x0116aa, PG_U_OTHER_LETTER},
 	{0x0116ab, 0x0116ab, PG_U_NONSPACING_MARK},
 	{0x0116ac, 0x0116ac, PG_U_SPACING_MARK},
@@ -3355,48 +2885,35 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0116b7, 0x0116b7, PG_U_NONSPACING_MARK},
 	{0x0116b8, 0x0116b8, PG_U_OTHER_LETTER},
 	{0x0116b9, 0x0116b9, PG_U_OTHER_PUNCTUATION},
-	{0x0116ba, 0x0116bf, PG_U_UNASSIGNED},
 	{0x0116c0, 0x0116c9, PG_U_DECIMAL_NUMBER},
-	{0x0116ca, 0x0116ff, PG_U_UNASSIGNED},
 	{0x011700, 0x01171a, PG_U_OTHER_LETTER},
-	{0x01171b, 0x01171c, PG_U_UNASSIGNED},
 	{0x01171d, 0x01171f, PG_U_NONSPACING_MARK},
 	{0x011720, 0x011721, PG_U_SPACING_MARK},
 	{0x011722, 0x011725, PG_U_NONSPACING_MARK},
 	{0x011726, 0x011726, PG_U_SPACING_MARK},
 	{0x011727, 0x01172b, PG_U_NONSPACING_MARK},
-	{0x01172c, 0x01172f, PG_U_UNASSIGNED},
 	{0x011730, 0x011739, PG_U_DECIMAL_NUMBER},
 	{0x01173a, 0x01173b, PG_U_OTHER_NUMBER},
 	{0x01173c, 0x01173e, PG_U_OTHER_PUNCTUATION},
 	{0x01173f, 0x01173f, PG_U_OTHER_SYMBOL},
 	{0x011740, 0x011746, PG_U_OTHER_LETTER},
-	{0x011747, 0x0117ff, PG_U_UNASSIGNED},
 	{0x011800, 0x01182b, PG_U_OTHER_LETTER},
 	{0x01182c, 0x01182e, PG_U_SPACING_MARK},
 	{0x01182f, 0x011837, PG_U_NONSPACING_MARK},
 	{0x011838, 0x011838, PG_U_SPACING_MARK},
 	{0x011839, 0x01183a, PG_U_NONSPACING_MARK},
 	{0x01183b, 0x01183b, PG_U_OTHER_PUNCTUATION},
-	{0x01183c, 0x01189f, PG_U_UNASSIGNED},
 	{0x0118a0, 0x0118bf, PG_U_UPPERCASE_LETTER},
 	{0x0118c0, 0x0118df, PG_U_LOWERCASE_LETTER},
 	{0x0118e0, 0x0118e9, PG_U_DECIMAL_NUMBER},
 	{0x0118ea, 0x0118f2, PG_U_OTHER_NUMBER},
-	{0x0118f3, 0x0118fe, PG_U_UNASSIGNED},
 	{0x0118ff, 0x011906, PG_U_OTHER_LETTER},
-	{0x011907, 0x011908, PG_U_UNASSIGNED},
 	{0x011909, 0x011909, PG_U_OTHER_LETTER},
-	{0x01190a, 0x01190b, PG_U_UNASSIGNED},
 	{0x01190c, 0x011913, PG_U_OTHER_LETTER},
-	{0x011914, 0x011914, PG_U_UNASSIGNED},
 	{0x011915, 0x011916, PG_U_OTHER_LETTER},
-	{0x011917, 0x011917, PG_U_UNASSIGNED},
 	{0x011918, 0x01192f, PG_U_OTHER_LETTER},
 	{0x011930, 0x011935, PG_U_SPACING_MARK},
-	{0x011936, 0x011936, PG_U_UNASSIGNED},
 	{0x011937, 0x011938, PG_U_SPACING_MARK},
-	{0x011939, 0x01193a, PG_U_UNASSIGNED},
 	{0x01193b, 0x01193c, PG_U_NONSPACING_MARK},
 	{0x01193d, 0x01193d, PG_U_SPACING_MARK},
 	{0x01193e, 0x01193e, PG_U_NONSPACING_MARK},
@@ -3406,15 +2923,11 @@ static const pg_category_range unicode_categories[4009] =
 	{0x011942, 0x011942, PG_U_SPACING_MARK},
 	{0x011943, 0x011943, PG_U_NONSPACING_MARK},
 	{0x011944, 0x011946, PG_U_OTHER_PUNCTUATION},
-	{0x011947, 0x01194f, PG_U_UNASSIGNED},
 	{0x011950, 0x011959, PG_U_DECIMAL_NUMBER},
-	{0x01195a, 0x01199f, PG_U_UNASSIGNED},
 	{0x0119a0, 0x0119a7, PG_U_OTHER_LETTER},
-	{0x0119a8, 0x0119a9, PG_U_UNASSIGNED},
 	{0x0119aa, 0x0119d0, PG_U_OTHER_LETTER},
 	{0x0119d1, 0x0119d3, PG_U_SPACING_MARK},
 	{0x0119d4, 0x0119d7, PG_U_NONSPACING_MARK},
-	{0x0119d8, 0x0119d9, PG_U_UNASSIGNED},
 	{0x0119da, 0x0119db, PG_U_NONSPACING_MARK},
 	{0x0119dc, 0x0119df, PG_U_SPACING_MARK},
 	{0x0119e0, 0x0119e0, PG_U_NONSPACING_MARK},
@@ -3422,7 +2935,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0119e2, 0x0119e2, PG_U_OTHER_PUNCTUATION},
 	{0x0119e3, 0x0119e3, PG_U_OTHER_LETTER},
 	{0x0119e4, 0x0119e4, PG_U_SPACING_MARK},
-	{0x0119e5, 0x0119ff, PG_U_UNASSIGNED},
 	{0x011a00, 0x011a00, PG_U_OTHER_LETTER},
 	{0x011a01, 0x011a0a, PG_U_NONSPACING_MARK},
 	{0x011a0b, 0x011a32, PG_U_OTHER_LETTER},
@@ -3432,7 +2944,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x011a3b, 0x011a3e, PG_U_NONSPACING_MARK},
 	{0x011a3f, 0x011a46, PG_U_OTHER_PUNCTUATION},
 	{0x011a47, 0x011a47, PG_U_NONSPACING_MARK},
-	{0x011a48, 0x011a4f, PG_U_UNASSIGNED},
 	{0x011a50, 0x011a50, PG_U_OTHER_LETTER},
 	{0x011a51, 0x011a56, PG_U_NONSPACING_MARK},
 	{0x011a57, 0x011a58, PG_U_SPACING_MARK},
@@ -3444,136 +2955,93 @@ static const pg_category_range unicode_categories[4009] =
 	{0x011a9a, 0x011a9c, PG_U_OTHER_PUNCTUATION},
 	{0x011a9d, 0x011a9d, PG_U_OTHER_LETTER},
 	{0x011a9e, 0x011aa2, PG_U_OTHER_PUNCTUATION},
-	{0x011aa3, 0x011aaf, PG_U_UNASSIGNED},
 	{0x011ab0, 0x011af8, PG_U_OTHER_LETTER},
-	{0x011af9, 0x011aff, PG_U_UNASSIGNED},
 	{0x011b00, 0x011b09, PG_U_OTHER_PUNCTUATION},
-	{0x011b0a, 0x011bff, PG_U_UNASSIGNED},
 	{0x011c00, 0x011c08, PG_U_OTHER_LETTER},
-	{0x011c09, 0x011c09, PG_U_UNASSIGNED},
 	{0x011c0a, 0x011c2e, PG_U_OTHER_LETTER},
 	{0x011c2f, 0x011c2f, PG_U_SPACING_MARK},
 	{0x011c30, 0x011c36, PG_U_NONSPACING_MARK},
-	{0x011c37, 0x011c37, PG_U_UNASSIGNED},
 	{0x011c38, 0x011c3d, PG_U_NONSPACING_MARK},
 	{0x011c3e, 0x011c3e, PG_U_SPACING_MARK},
 	{0x011c3f, 0x011c3f, PG_U_NONSPACING_MARK},
 	{0x011c40, 0x011c40, PG_U_OTHER_LETTER},
 	{0x011c41, 0x011c45, PG_U_OTHER_PUNCTUATION},
-	{0x011c46, 0x011c4f, PG_U_UNASSIGNED},
 	{0x011c50, 0x011c59, PG_U_DECIMAL_NUMBER},
 	{0x011c5a, 0x011c6c, PG_U_OTHER_NUMBER},
-	{0x011c6d, 0x011c6f, PG_U_UNASSIGNED},
 	{0x011c70, 0x011c71, PG_U_OTHER_PUNCTUATION},
 	{0x011c72, 0x011c8f, PG_U_OTHER_LETTER},
-	{0x011c90, 0x011c91, PG_U_UNASSIGNED},
 	{0x011c92, 0x011ca7, PG_U_NONSPACING_MARK},
-	{0x011ca8, 0x011ca8, PG_U_UNASSIGNED},
 	{0x011ca9, 0x011ca9, PG_U_SPACING_MARK},
 	{0x011caa, 0x011cb0, PG_U_NONSPACING_MARK},
 	{0x011cb1, 0x011cb1, PG_U_SPACING_MARK},
 	{0x011cb2, 0x011cb3, PG_U_NONSPACING_MARK},
 	{0x011cb4, 0x011cb4, PG_U_SPACING_MARK},
 	{0x011cb5, 0x011cb6, PG_U_NONSPACING_MARK},
-	{0x011cb7, 0x011cff, PG_U_UNASSIGNED},
 	{0x011d00, 0x011d06, PG_U_OTHER_LETTER},
-	{0x011d07, 0x011d07, PG_U_UNASSIGNED},
 	{0x011d08, 0x011d09, PG_U_OTHER_LETTER},
-	{0x011d0a, 0x011d0a, PG_U_UNASSIGNED},
 	{0x011d0b, 0x011d30, PG_U_OTHER_LETTER},
 	{0x011d31, 0x011d36, PG_U_NONSPACING_MARK},
-	{0x011d37, 0x011d39, PG_U_UNASSIGNED},
 	{0x011d3a, 0x011d3a, PG_U_NONSPACING_MARK},
-	{0x011d3b, 0x011d3b, PG_U_UNASSIGNED},
 	{0x011d3c, 0x011d3d, PG_U_NONSPACING_MARK},
-	{0x011d3e, 0x011d3e, PG_U_UNASSIGNED},
 	{0x011d3f, 0x011d45, PG_U_NONSPACING_MARK},
 	{0x011d46, 0x011d46, PG_U_OTHER_LETTER},
 	{0x011d47, 0x011d47, PG_U_NONSPACING_MARK},
-	{0x011d48, 0x011d4f, PG_U_UNASSIGNED},
 	{0x011d50, 0x011d59, PG_U_DECIMAL_NUMBER},
-	{0x011d5a, 0x011d5f, PG_U_UNASSIGNED},
 	{0x011d60, 0x011d65, PG_U_OTHER_LETTER},
-	{0x011d66, 0x011d66, PG_U_UNASSIGNED},
 	{0x011d67, 0x011d68, PG_U_OTHER_LETTER},
-	{0x011d69, 0x011d69, PG_U_UNASSIGNED},
 	{0x011d6a, 0x011d89, PG_U_OTHER_LETTER},
 	{0x011d8a, 0x011d8e, PG_U_SPACING_MARK},
-	{0x011d8f, 0x011d8f, PG_U_UNASSIGNED},
 	{0x011d90, 0x011d91, PG_U_NONSPACING_MARK},
-	{0x011d92, 0x011d92, PG_U_UNASSIGNED},
 	{0x011d93, 0x011d94, PG_U_SPACING_MARK},
 	{0x011d95, 0x011d95, PG_U_NONSPACING_MARK},
 	{0x011d96, 0x011d96, PG_U_SPACING_MARK},
 	{0x011d97, 0x011d97, PG_U_NONSPACING_MARK},
 	{0x011d98, 0x011d98, PG_U_OTHER_LETTER},
-	{0x011d99, 0x011d9f, PG_U_UNASSIGNED},
 	{0x011da0, 0x011da9, PG_U_DECIMAL_NUMBER},
-	{0x011daa, 0x011edf, PG_U_UNASSIGNED},
 	{0x011ee0, 0x011ef2, PG_U_OTHER_LETTER},
 	{0x011ef3, 0x011ef4, PG_U_NONSPACING_MARK},
 	{0x011ef5, 0x011ef6, PG_U_SPACING_MARK},
 	{0x011ef7, 0x011ef8, PG_U_OTHER_PUNCTUATION},
-	{0x011ef9, 0x011eff, PG_U_UNASSIGNED},
 	{0x011f00, 0x011f01, PG_U_NONSPACING_MARK},
 	{0x011f02, 0x011f02, PG_U_OTHER_LETTER},
 	{0x011f03, 0x011f03, PG_U_SPACING_MARK},
 	{0x011f04, 0x011f10, PG_U_OTHER_LETTER},
-	{0x011f11, 0x011f11, PG_U_UNASSIGNED},
 	{0x011f12, 0x011f33, PG_U_OTHER_LETTER},
 	{0x011f34, 0x011f35, PG_U_SPACING_MARK},
 	{0x011f36, 0x011f3a, PG_U_NONSPACING_MARK},
-	{0x011f3b, 0x011f3d, PG_U_UNASSIGNED},
 	{0x011f3e, 0x011f3f, PG_U_SPACING_MARK},
 	{0x011f40, 0x011f40, PG_U_NONSPACING_MARK},
 	{0x011f41, 0x011f41, PG_U_SPACING_MARK},
 	{0x011f42, 0x011f42, PG_U_NONSPACING_MARK},
 	{0x011f43, 0x011f4f, PG_U_OTHER_PUNCTUATION},
 	{0x011f50, 0x011f59, PG_U_DECIMAL_NUMBER},
-	{0x011f5a, 0x011faf, PG_U_UNASSIGNED},
 	{0x011fb0, 0x011fb0, PG_U_OTHER_LETTER},
-	{0x011fb1, 0x011fbf, PG_U_UNASSIGNED},
 	{0x011fc0, 0x011fd4, PG_U_OTHER_NUMBER},
 	{0x011fd5, 0x011fdc, PG_U_OTHER_SYMBOL},
 	{0x011fdd, 0x011fe0, PG_U_CURRENCY_SYMBOL},
 	{0x011fe1, 0x011ff1, PG_U_OTHER_SYMBOL},
-	{0x011ff2, 0x011ffe, PG_U_UNASSIGNED},
 	{0x011fff, 0x011fff, PG_U_OTHER_PUNCTUATION},
 	{0x012000, 0x012399, PG_U_OTHER_LETTER},
-	{0x01239a, 0x0123ff, PG_U_UNASSIGNED},
 	{0x012400, 0x01246e, PG_U_LETTER_NUMBER},
-	{0x01246f, 0x01246f, PG_U_UNASSIGNED},
 	{0x012470, 0x012474, PG_U_OTHER_PUNCTUATION},
-	{0x012475, 0x01247f, PG_U_UNASSIGNED},
 	{0x012480, 0x012543, PG_U_OTHER_LETTER},
-	{0x012544, 0x012f8f, PG_U_UNASSIGNED},
 	{0x012f90, 0x012ff0, PG_U_OTHER_LETTER},
 	{0x012ff1, 0x012ff2, PG_U_OTHER_PUNCTUATION},
-	{0x012ff3, 0x012fff, PG_U_UNASSIGNED},
 	{0x013000, 0x01342f, PG_U_OTHER_LETTER},
 	{0x013430, 0x01343f, PG_U_FORMAT},
 	{0x013440, 0x013440, PG_U_NONSPACING_MARK},
 	{0x013441, 0x013446, PG_U_OTHER_LETTER},
 	{0x013447, 0x013455, PG_U_NONSPACING_MARK},
-	{0x013456, 0x0143ff, PG_U_UNASSIGNED},
 	{0x014400, 0x014646, PG_U_OTHER_LETTER},
-	{0x014647, 0x0167ff, PG_U_UNASSIGNED},
 	{0x016800, 0x016a38, PG_U_OTHER_LETTER},
-	{0x016a39, 0x016a3f, PG_U_UNASSIGNED},
 	{0x016a40, 0x016a5e, PG_U_OTHER_LETTER},
-	{0x016a5f, 0x016a5f, PG_U_UNASSIGNED},
 	{0x016a60, 0x016a69, PG_U_DECIMAL_NUMBER},
-	{0x016a6a, 0x016a6d, PG_U_UNASSIGNED},
 	{0x016a6e, 0x016a6f, PG_U_OTHER_PUNCTUATION},
 	{0x016a70, 0x016abe, PG_U_OTHER_LETTER},
-	{0x016abf, 0x016abf, PG_U_UNASSIGNED},
 	{0x016ac0, 0x016ac9, PG_U_DECIMAL_NUMBER},
-	{0x016aca, 0x016acf, PG_U_UNASSIGNED},
 	{0x016ad0, 0x016aed, PG_U_OTHER_LETTER},
-	{0x016aee, 0x016aef, PG_U_UNASSIGNED},
 	{0x016af0, 0x016af4, PG_U_NONSPACING_MARK},
 	{0x016af5, 0x016af5, PG_U_OTHER_PUNCTUATION},
-	{0x016af6, 0x016aff, PG_U_UNASSIGNED},
 	{0x016b00, 0x016b2f, PG_U_OTHER_LETTER},
 	{0x016b30, 0x016b36, PG_U_NONSPACING_MARK},
 	{0x016b37, 0x016b3b, PG_U_OTHER_PUNCTUATION},
@@ -3581,83 +3049,50 @@ static const pg_category_range unicode_categories[4009] =
 	{0x016b40, 0x016b43, PG_U_MODIFIER_LETTER},
 	{0x016b44, 0x016b44, PG_U_OTHER_PUNCTUATION},
 	{0x016b45, 0x016b45, PG_U_OTHER_SYMBOL},
-	{0x016b46, 0x016b4f, PG_U_UNASSIGNED},
 	{0x016b50, 0x016b59, PG_U_DECIMAL_NUMBER},
-	{0x016b5a, 0x016b5a, PG_U_UNASSIGNED},
 	{0x016b5b, 0x016b61, PG_U_OTHER_NUMBER},
-	{0x016b62, 0x016b62, PG_U_UNASSIGNED},
 	{0x016b63, 0x016b77, PG_U_OTHER_LETTER},
-	{0x016b78, 0x016b7c, PG_U_UNASSIGNED},
 	{0x016b7d, 0x016b8f, PG_U_OTHER_LETTER},
-	{0x016b90, 0x016e3f, PG_U_UNASSIGNED},
 	{0x016e40, 0x016e5f, PG_U_UPPERCASE_LETTER},
 	{0x016e60, 0x016e7f, PG_U_LOWERCASE_LETTER},
 	{0x016e80, 0x016e96, PG_U_OTHER_NUMBER},
 	{0x016e97, 0x016e9a, PG_U_OTHER_PUNCTUATION},
-	{0x016e9b, 0x016eff, PG_U_UNASSIGNED},
 	{0x016f00, 0x016f4a, PG_U_OTHER_LETTER},
-	{0x016f4b, 0x016f4e, PG_U_UNASSIGNED},
 	{0x016f4f, 0x016f4f, PG_U_NONSPACING_MARK},
 	{0x016f50, 0x016f50, PG_U_OTHER_LETTER},
 	{0x016f51, 0x016f87, PG_U_SPACING_MARK},
-	{0x016f88, 0x016f8e, PG_U_UNASSIGNED},
 	{0x016f8f, 0x016f92, PG_U_NONSPACING_MARK},
 	{0x016f93, 0x016f9f, PG_U_MODIFIER_LETTER},
-	{0x016fa0, 0x016fdf, PG_U_UNASSIGNED},
 	{0x016fe0, 0x016fe1, PG_U_MODIFIER_LETTER},
 	{0x016fe2, 0x016fe2, PG_U_OTHER_PUNCTUATION},
 	{0x016fe3, 0x016fe3, PG_U_MODIFIER_LETTER},
 	{0x016fe4, 0x016fe4, PG_U_NONSPACING_MARK},
-	{0x016fe5, 0x016fef, PG_U_UNASSIGNED},
 	{0x016ff0, 0x016ff1, PG_U_SPACING_MARK},
-	{0x016ff2, 0x016fff, PG_U_UNASSIGNED},
 	{0x017000, 0x0187f7, PG_U_OTHER_LETTER},
-	{0x0187f8, 0x0187ff, PG_U_UNASSIGNED},
 	{0x018800, 0x018cd5, PG_U_OTHER_LETTER},
-	{0x018cd6, 0x018cff, PG_U_UNASSIGNED},
 	{0x018d00, 0x018d08, PG_U_OTHER_LETTER},
-	{0x018d09, 0x01afef, PG_U_UNASSIGNED},
 	{0x01aff0, 0x01aff3, PG_U_MODIFIER_LETTER},
-	{0x01aff4, 0x01aff4, PG_U_UNASSIGNED},
 	{0x01aff5, 0x01affb, PG_U_MODIFIER_LETTER},
-	{0x01affc, 0x01affc, PG_U_UNASSIGNED},
 	{0x01affd, 0x01affe, PG_U_MODIFIER_LETTER},
-	{0x01afff, 0x01afff, PG_U_UNASSIGNED},
 	{0x01b000, 0x01b122, PG_U_OTHER_LETTER},
-	{0x01b123, 0x01b131, PG_U_UNASSIGNED},
 	{0x01b132, 0x01b132, PG_U_OTHER_LETTER},
-	{0x01b133, 0x01b14f, PG_U_UNASSIGNED},
 	{0x01b150, 0x01b152, PG_U_OTHER_LETTER},
-	{0x01b153, 0x01b154, PG_U_UNASSIGNED},
 	{0x01b155, 0x01b155, PG_U_OTHER_LETTER},
-	{0x01b156, 0x01b163, PG_U_UNASSIGNED},
 	{0x01b164, 0x01b167, PG_U_OTHER_LETTER},
-	{0x01b168, 0x01b16f, PG_U_UNASSIGNED},
 	{0x01b170, 0x01b2fb, PG_U_OTHER_LETTER},
-	{0x01b2fc, 0x01bbff, PG_U_UNASSIGNED},
 	{0x01bc00, 0x01bc6a, PG_U_OTHER_LETTER},
-	{0x01bc6b, 0x01bc6f, PG_U_UNASSIGNED},
 	{0x01bc70, 0x01bc7c, PG_U_OTHER_LETTER},
-	{0x01bc7d, 0x01bc7f, PG_U_UNASSIGNED},
 	{0x01bc80, 0x01bc88, PG_U_OTHER_LETTER},
-	{0x01bc89, 0x01bc8f, PG_U_UNASSIGNED},
 	{0x01bc90, 0x01bc99, PG_U_OTHER_LETTER},
-	{0x01bc9a, 0x01bc9b, PG_U_UNASSIGNED},
 	{0x01bc9c, 0x01bc9c, PG_U_OTHER_SYMBOL},
 	{0x01bc9d, 0x01bc9e, PG_U_NONSPACING_MARK},
 	{0x01bc9f, 0x01bc9f, PG_U_OTHER_PUNCTUATION},
 	{0x01bca0, 0x01bca3, PG_U_FORMAT},
-	{0x01bca4, 0x01ceff, PG_U_UNASSIGNED},
 	{0x01cf00, 0x01cf2d, PG_U_NONSPACING_MARK},
-	{0x01cf2e, 0x01cf2f, PG_U_UNASSIGNED},
 	{0x01cf30, 0x01cf46, PG_U_NONSPACING_MARK},
-	{0x01cf47, 0x01cf4f, PG_U_UNASSIGNED},
 	{0x01cf50, 0x01cfc3, PG_U_OTHER_SYMBOL},
-	{0x01cfc4, 0x01cfff, PG_U_UNASSIGNED},
 	{0x01d000, 0x01d0f5, PG_U_OTHER_SYMBOL},
-	{0x01d0f6, 0x01d0ff, PG_U_UNASSIGNED},
 	{0x01d100, 0x01d126, PG_U_OTHER_SYMBOL},
-	{0x01d127, 0x01d128, PG_U_UNASSIGNED},
 	{0x01d129, 0x01d164, PG_U_OTHER_SYMBOL},
 	{0x01d165, 0x01d166, PG_U_SPACING_MARK},
 	{0x01d167, 0x01d169, PG_U_NONSPACING_MARK},
@@ -3670,66 +3105,42 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01d18c, 0x01d1a9, PG_U_OTHER_SYMBOL},
 	{0x01d1aa, 0x01d1ad, PG_U_NONSPACING_MARK},
 	{0x01d1ae, 0x01d1ea, PG_U_OTHER_SYMBOL},
-	{0x01d1eb, 0x01d1ff, PG_U_UNASSIGNED},
 	{0x01d200, 0x01d241, PG_U_OTHER_SYMBOL},
 	{0x01d242, 0x01d244, PG_U_NONSPACING_MARK},
 	{0x01d245, 0x01d245, PG_U_OTHER_SYMBOL},
-	{0x01d246, 0x01d2bf, PG_U_UNASSIGNED},
 	{0x01d2c0, 0x01d2d3, PG_U_OTHER_NUMBER},
-	{0x01d2d4, 0x01d2df, PG_U_UNASSIGNED},
 	{0x01d2e0, 0x01d2f3, PG_U_OTHER_NUMBER},
-	{0x01d2f4, 0x01d2ff, PG_U_UNASSIGNED},
 	{0x01d300, 0x01d356, PG_U_OTHER_SYMBOL},
-	{0x01d357, 0x01d35f, PG_U_UNASSIGNED},
 	{0x01d360, 0x01d378, PG_U_OTHER_NUMBER},
-	{0x01d379, 0x01d3ff, PG_U_UNASSIGNED},
 	{0x01d400, 0x01d419, PG_U_UPPERCASE_LETTER},
 	{0x01d41a, 0x01d433, PG_U_LOWERCASE_LETTER},
 	{0x01d434, 0x01d44d, PG_U_UPPERCASE_LETTER},
 	{0x01d44e, 0x01d454, PG_U_LOWERCASE_LETTER},
-	{0x01d455, 0x01d455, PG_U_UNASSIGNED},
 	{0x01d456, 0x01d467, PG_U_LOWERCASE_LETTER},
 	{0x01d468, 0x01d481, PG_U_UPPERCASE_LETTER},
 	{0x01d482, 0x01d49b, PG_U_LOWERCASE_LETTER},
 	{0x01d49c, 0x01d49c, PG_U_UPPERCASE_LETTER},
-	{0x01d49d, 0x01d49d, PG_U_UNASSIGNED},
 	{0x01d49e, 0x01d49f, PG_U_UPPERCASE_LETTER},
-	{0x01d4a0, 0x01d4a1, PG_U_UNASSIGNED},
 	{0x01d4a2, 0x01d4a2, PG_U_UPPERCASE_LETTER},
-	{0x01d4a3, 0x01d4a4, PG_U_UNASSIGNED},
 	{0x01d4a5, 0x01d4a6, PG_U_UPPERCASE_LETTER},
-	{0x01d4a7, 0x01d4a8, PG_U_UNASSIGNED},
 	{0x01d4a9, 0x01d4ac, PG_U_UPPERCASE_LETTER},
-	{0x01d4ad, 0x01d4ad, PG_U_UNASSIGNED},
 	{0x01d4ae, 0x01d4b5, PG_U_UPPERCASE_LETTER},
 	{0x01d4b6, 0x01d4b9, PG_U_LOWERCASE_LETTER},
-	{0x01d4ba, 0x01d4ba, PG_U_UNASSIGNED},
 	{0x01d4bb, 0x01d4bb, PG_U_LOWERCASE_LETTER},
-	{0x01d4bc, 0x01d4bc, PG_U_UNASSIGNED},
 	{0x01d4bd, 0x01d4c3, PG_U_LOWERCASE_LETTER},
-	{0x01d4c4, 0x01d4c4, PG_U_UNASSIGNED},
 	{0x01d4c5, 0x01d4cf, PG_U_LOWERCASE_LETTER},
 	{0x01d4d0, 0x01d4e9, PG_U_UPPERCASE_LETTER},
 	{0x01d4ea, 0x01d503, PG_U_LOWERCASE_LETTER},
 	{0x01d504, 0x01d505, PG_U_UPPERCASE_LETTER},
-	{0x01d506, 0x01d506, PG_U_UNASSIGNED},
 	{0x01d507, 0x01d50a, PG_U_UPPERCASE_LETTER},
-	{0x01d50b, 0x01d50c, PG_U_UNASSIGNED},
 	{0x01d50d, 0x01d514, PG_U_UPPERCASE_LETTER},
-	{0x01d515, 0x01d515, PG_U_UNASSIGNED},
 	{0x01d516, 0x01d51c, PG_U_UPPERCASE_LETTER},
-	{0x01d51d, 0x01d51d, PG_U_UNASSIGNED},
 	{0x01d51e, 0x01d537, PG_U_LOWERCASE_LETTER},
 	{0x01d538, 0x01d539, PG_U_UPPERCASE_LETTER},
-	{0x01d53a, 0x01d53a, PG_U_UNASSIGNED},
 	{0x01d53b, 0x01d53e, PG_U_UPPERCASE_LETTER},
-	{0x01d53f, 0x01d53f, PG_U_UNASSIGNED},
 	{0x01d540, 0x01d544, PG_U_UPPERCASE_LETTER},
-	{0x01d545, 0x01d545, PG_U_UNASSIGNED},
 	{0x01d546, 0x01d546, PG_U_UPPERCASE_LETTER},
-	{0x01d547, 0x01d549, PG_U_UNASSIGNED},
 	{0x01d54a, 0x01d550, PG_U_UPPERCASE_LETTER},
-	{0x01d551, 0x01d551, PG_U_UNASSIGNED},
 	{0x01d552, 0x01d56b, PG_U_LOWERCASE_LETTER},
 	{0x01d56c, 0x01d585, PG_U_UPPERCASE_LETTER},
 	{0x01d586, 0x01d59f, PG_U_LOWERCASE_LETTER},
@@ -3743,7 +3154,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01d656, 0x01d66f, PG_U_LOWERCASE_LETTER},
 	{0x01d670, 0x01d689, PG_U_UPPERCASE_LETTER},
 	{0x01d68a, 0x01d6a5, PG_U_LOWERCASE_LETTER},
-	{0x01d6a6, 0x01d6a7, PG_U_UNASSIGNED},
 	{0x01d6a8, 0x01d6c0, PG_U_UPPERCASE_LETTER},
 	{0x01d6c1, 0x01d6c1, PG_U_MATH_SYMBOL},
 	{0x01d6c2, 0x01d6da, PG_U_LOWERCASE_LETTER},
@@ -3771,7 +3181,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01d7c4, 0x01d7c9, PG_U_LOWERCASE_LETTER},
 	{0x01d7ca, 0x01d7ca, PG_U_UPPERCASE_LETTER},
 	{0x01d7cb, 0x01d7cb, PG_U_LOWERCASE_LETTER},
-	{0x01d7cc, 0x01d7cd, PG_U_UNASSIGNED},
 	{0x01d7ce, 0x01d7ff, PG_U_DECIMAL_NUMBER},
 	{0x01d800, 0x01d9ff, PG_U_OTHER_SYMBOL},
 	{0x01da00, 0x01da36, PG_U_NONSPACING_MARK},
@@ -3783,258 +3192,142 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01da84, 0x01da84, PG_U_NONSPACING_MARK},
 	{0x01da85, 0x01da86, PG_U_OTHER_SYMBOL},
 	{0x01da87, 0x01da8b, PG_U_OTHER_PUNCTUATION},
-	{0x01da8c, 0x01da9a, PG_U_UNASSIGNED},
 	{0x01da9b, 0x01da9f, PG_U_NONSPACING_MARK},
-	{0x01daa0, 0x01daa0, PG_U_UNASSIGNED},
 	{0x01daa1, 0x01daaf, PG_U_NONSPACING_MARK},
-	{0x01dab0, 0x01deff, PG_U_UNASSIGNED},
 	{0x01df00, 0x01df09, PG_U_LOWERCASE_LETTER},
 	{0x01df0a, 0x01df0a, PG_U_OTHER_LETTER},
 	{0x01df0b, 0x01df1e, PG_U_LOWERCASE_LETTER},
-	{0x01df1f, 0x01df24, PG_U_UNASSIGNED},
 	{0x01df25, 0x01df2a, PG_U_LOWERCASE_LETTER},
-	{0x01df2b, 0x01dfff, PG_U_UNASSIGNED},
 	{0x01e000, 0x01e006, PG_U_NONSPACING_MARK},
-	{0x01e007, 0x01e007, PG_U_UNASSIGNED},
 	{0x01e008, 0x01e018, PG_U_NONSPACING_MARK},
-	{0x01e019, 0x01e01a, PG_U_UNASSIGNED},
 	{0x01e01b, 0x01e021, PG_U_NONSPACING_MARK},
-	{0x01e022, 0x01e022, PG_U_UNASSIGNED},
 	{0x01e023, 0x01e024, PG_U_NONSPACING_MARK},
-	{0x01e025, 0x01e025, PG_U_UNASSIGNED},
 	{0x01e026, 0x01e02a, PG_U_NONSPACING_MARK},
-	{0x01e02b, 0x01e02f, PG_U_UNASSIGNED},
 	{0x01e030, 0x01e06d, PG_U_MODIFIER_LETTER},
-	{0x01e06e, 0x01e08e, PG_U_UNASSIGNED},
 	{0x01e08f, 0x01e08f, PG_U_NONSPACING_MARK},
-	{0x01e090, 0x01e0ff, PG_U_UNASSIGNED},
 	{0x01e100, 0x01e12c, PG_U_OTHER_LETTER},
-	{0x01e12d, 0x01e12f, PG_U_UNASSIGNED},
 	{0x01e130, 0x01e136, PG_U_NONSPACING_MARK},
 	{0x01e137, 0x01e13d, PG_U_MODIFIER_LETTER},
-	{0x01e13e, 0x01e13f, PG_U_UNASSIGNED},
 	{0x01e140, 0x01e149, PG_U_DECIMAL_NUMBER},
-	{0x01e14a, 0x01e14d, PG_U_UNASSIGNED},
 	{0x01e14e, 0x01e14e, PG_U_OTHER_LETTER},
 	{0x01e14f, 0x01e14f, PG_U_OTHER_SYMBOL},
-	{0x01e150, 0x01e28f, PG_U_UNASSIGNED},
 	{0x01e290, 0x01e2ad, PG_U_OTHER_LETTER},
 	{0x01e2ae, 0x01e2ae, PG_U_NONSPACING_MARK},
-	{0x01e2af, 0x01e2bf, PG_U_UNASSIGNED},
 	{0x01e2c0, 0x01e2eb, PG_U_OTHER_LETTER},
 	{0x01e2ec, 0x01e2ef, PG_U_NONSPACING_MARK},
 	{0x01e2f0, 0x01e2f9, PG_U_DECIMAL_NUMBER},
-	{0x01e2fa, 0x01e2fe, PG_U_UNASSIGNED},
 	{0x01e2ff, 0x01e2ff, PG_U_CURRENCY_SYMBOL},
-	{0x01e300, 0x01e4cf, PG_U_UNASSIGNED},
 	{0x01e4d0, 0x01e4ea, PG_U_OTHER_LETTER},
 	{0x01e4eb, 0x01e4eb, PG_U_MODIFIER_LETTER},
 	{0x01e4ec, 0x01e4ef, PG_U_NONSPACING_MARK},
 	{0x01e4f0, 0x01e4f9, PG_U_DECIMAL_NUMBER},
-	{0x01e4fa, 0x01e7df, PG_U_UNASSIGNED},
 	{0x01e7e0, 0x01e7e6, PG_U_OTHER_LETTER},
-	{0x01e7e7, 0x01e7e7, PG_U_UNASSIGNED},
 	{0x01e7e8, 0x01e7eb, PG_U_OTHER_LETTER},
-	{0x01e7ec, 0x01e7ec, PG_U_UNASSIGNED},
 	{0x01e7ed, 0x01e7ee, PG_U_OTHER_LETTER},
-	{0x01e7ef, 0x01e7ef, PG_U_UNASSIGNED},
 	{0x01e7f0, 0x01e7fe, PG_U_OTHER_LETTER},
-	{0x01e7ff, 0x01e7ff, PG_U_UNASSIGNED},
 	{0x01e800, 0x01e8c4, PG_U_OTHER_LETTER},
-	{0x01e8c5, 0x01e8c6, PG_U_UNASSIGNED},
 	{0x01e8c7, 0x01e8cf, PG_U_OTHER_NUMBER},
 	{0x01e8d0, 0x01e8d6, PG_U_NONSPACING_MARK},
-	{0x01e8d7, 0x01e8ff, PG_U_UNASSIGNED},
 	{0x01e900, 0x01e921, PG_U_UPPERCASE_LETTER},
 	{0x01e922, 0x01e943, PG_U_LOWERCASE_LETTER},
 	{0x01e944, 0x01e94a, PG_U_NONSPACING_MARK},
 	{0x01e94b, 0x01e94b, PG_U_MODIFIER_LETTER},
-	{0x01e94c, 0x01e94f, PG_U_UNASSIGNED},
 	{0x01e950, 0x01e959, PG_U_DECIMAL_NUMBER},
-	{0x01e95a, 0x01e95d, PG_U_UNASSIGNED},
 	{0x01e95e, 0x01e95f, PG_U_OTHER_PUNCTUATION},
-	{0x01e960, 0x01ec70, PG_U_UNASSIGNED},
 	{0x01ec71, 0x01ecab, PG_U_OTHER_NUMBER},
 	{0x01ecac, 0x01ecac, PG_U_OTHER_SYMBOL},
 	{0x01ecad, 0x01ecaf, PG_U_OTHER_NUMBER},
 	{0x01ecb0, 0x01ecb0, PG_U_CURRENCY_SYMBOL},
 	{0x01ecb1, 0x01ecb4, PG_U_OTHER_NUMBER},
-	{0x01ecb5, 0x01ed00, PG_U_UNASSIGNED},
 	{0x01ed01, 0x01ed2d, PG_U_OTHER_NUMBER},
 	{0x01ed2e, 0x01ed2e, PG_U_OTHER_SYMBOL},
 	{0x01ed2f, 0x01ed3d, PG_U_OTHER_NUMBER},
-	{0x01ed3e, 0x01edff, PG_U_UNASSIGNED},
 	{0x01ee00, 0x01ee03, PG_U_OTHER_LETTER},
-	{0x01ee04, 0x01ee04, PG_U_UNASSIGNED},
 	{0x01ee05, 0x01ee1f, PG_U_OTHER_LETTER},
-	{0x01ee20, 0x01ee20, PG_U_UNASSIGNED},
 	{0x01ee21, 0x01ee22, PG_U_OTHER_LETTER},
-	{0x01ee23, 0x01ee23, PG_U_UNASSIGNED},
 	{0x01ee24, 0x01ee24, PG_U_OTHER_LETTER},
-	{0x01ee25, 0x01ee26, PG_U_UNASSIGNED},
 	{0x01ee27, 0x01ee27, PG_U_OTHER_LETTER},
-	{0x01ee28, 0x01ee28, PG_U_UNASSIGNED},
 	{0x01ee29, 0x01ee32, PG_U_OTHER_LETTER},
-	{0x01ee33, 0x01ee33, PG_U_UNASSIGNED},
 	{0x01ee34, 0x01ee37, PG_U_OTHER_LETTER},
-	{0x01ee38, 0x01ee38, PG_U_UNASSIGNED},
 	{0x01ee39, 0x01ee39, PG_U_OTHER_LETTER},
-	{0x01ee3a, 0x01ee3a, PG_U_UNASSIGNED},
 	{0x01ee3b, 0x01ee3b, PG_U_OTHER_LETTER},
-	{0x01ee3c, 0x01ee41, PG_U_UNASSIGNED},
 	{0x01ee42, 0x01ee42, PG_U_OTHER_LETTER},
-	{0x01ee43, 0x01ee46, PG_U_UNASSIGNED},
 	{0x01ee47, 0x01ee47, PG_U_OTHER_LETTER},
-	{0x01ee48, 0x01ee48, PG_U_UNASSIGNED},
 	{0x01ee49, 0x01ee49, PG_U_OTHER_LETTER},
-	{0x01ee4a, 0x01ee4a, PG_U_UNASSIGNED},
 	{0x01ee4b, 0x01ee4b, PG_U_OTHER_LETTER},
-	{0x01ee4c, 0x01ee4c, PG_U_UNASSIGNED},
 	{0x01ee4d, 0x01ee4f, PG_U_OTHER_LETTER},
-	{0x01ee50, 0x01ee50, PG_U_UNASSIGNED},
 	{0x01ee51, 0x01ee52, PG_U_OTHER_LETTER},
-	{0x01ee53, 0x01ee53, PG_U_UNASSIGNED},
 	{0x01ee54, 0x01ee54, PG_U_OTHER_LETTER},
-	{0x01ee55, 0x01ee56, PG_U_UNASSIGNED},
 	{0x01ee57, 0x01ee57, PG_U_OTHER_LETTER},
-	{0x01ee58, 0x01ee58, PG_U_UNASSIGNED},
 	{0x01ee59, 0x01ee59, PG_U_OTHER_LETTER},
-	{0x01ee5a, 0x01ee5a, PG_U_UNASSIGNED},
 	{0x01ee5b, 0x01ee5b, PG_U_OTHER_LETTER},
-	{0x01ee5c, 0x01ee5c, PG_U_UNASSIGNED},
 	{0x01ee5d, 0x01ee5d, PG_U_OTHER_LETTER},
-	{0x01ee5e, 0x01ee5e, PG_U_UNASSIGNED},
 	{0x01ee5f, 0x01ee5f, PG_U_OTHER_LETTER},
-	{0x01ee60, 0x01ee60, PG_U_UNASSIGNED},
 	{0x01ee61, 0x01ee62, PG_U_OTHER_LETTER},
-	{0x01ee63, 0x01ee63, PG_U_UNASSIGNED},
 	{0x01ee64, 0x01ee64, PG_U_OTHER_LETTER},
-	{0x01ee65, 0x01ee66, PG_U_UNASSIGNED},
 	{0x01ee67, 0x01ee6a, PG_U_OTHER_LETTER},
-	{0x01ee6b, 0x01ee6b, PG_U_UNASSIGNED},
 	{0x01ee6c, 0x01ee72, PG_U_OTHER_LETTER},
-	{0x01ee73, 0x01ee73, PG_U_UNASSIGNED},
 	{0x01ee74, 0x01ee77, PG_U_OTHER_LETTER},
-	{0x01ee78, 0x01ee78, PG_U_UNASSIGNED},
 	{0x01ee79, 0x01ee7c, PG_U_OTHER_LETTER},
-	{0x01ee7d, 0x01ee7d, PG_U_UNASSIGNED},
 	{0x01ee7e, 0x01ee7e, PG_U_OTHER_LETTER},
-	{0x01ee7f, 0x01ee7f, PG_U_UNASSIGNED},
 	{0x01ee80, 0x01ee89, PG_U_OTHER_LETTER},
-	{0x01ee8a, 0x01ee8a, PG_U_UNASSIGNED},
 	{0x01ee8b, 0x01ee9b, PG_U_OTHER_LETTER},
-	{0x01ee9c, 0x01eea0, PG_U_UNASSIGNED},
 	{0x01eea1, 0x01eea3, PG_U_OTHER_LETTER},
-	{0x01eea4, 0x01eea4, PG_U_UNASSIGNED},
 	{0x01eea5, 0x01eea9, PG_U_OTHER_LETTER},
-	{0x01eeaa, 0x01eeaa, PG_U_UNASSIGNED},
 	{0x01eeab, 0x01eebb, PG_U_OTHER_LETTER},
-	{0x01eebc, 0x01eeef, PG_U_UNASSIGNED},
 	{0x01eef0, 0x01eef1, PG_U_MATH_SYMBOL},
-	{0x01eef2, 0x01efff, PG_U_UNASSIGNED},
 	{0x01f000, 0x01f02b, PG_U_OTHER_SYMBOL},
-	{0x01f02c, 0x01f02f, PG_U_UNASSIGNED},
 	{0x01f030, 0x01f093, PG_U_OTHER_SYMBOL},
-	{0x01f094, 0x01f09f, PG_U_UNASSIGNED},
 	{0x01f0a0, 0x01f0ae, PG_U_OTHER_SYMBOL},
-	{0x01f0af, 0x01f0b0, PG_U_UNASSIGNED},
 	{0x01f0b1, 0x01f0bf, PG_U_OTHER_SYMBOL},
-	{0x01f0c0, 0x01f0c0, PG_U_UNASSIGNED},
 	{0x01f0c1, 0x01f0cf, PG_U_OTHER_SYMBOL},
-	{0x01f0d0, 0x01f0d0, PG_U_UNASSIGNED},
 	{0x01f0d1, 0x01f0f5, PG_U_OTHER_SYMBOL},
-	{0x01f0f6, 0x01f0ff, PG_U_UNASSIGNED},
 	{0x01f100, 0x01f10c, PG_U_OTHER_NUMBER},
 	{0x01f10d, 0x01f1ad, PG_U_OTHER_SYMBOL},
-	{0x01f1ae, 0x01f1e5, PG_U_UNASSIGNED},
 	{0x01f1e6, 0x01f202, PG_U_OTHER_SYMBOL},
-	{0x01f203, 0x01f20f, PG_U_UNASSIGNED},
 	{0x01f210, 0x01f23b, PG_U_OTHER_SYMBOL},
-	{0x01f23c, 0x01f23f, PG_U_UNASSIGNED},
 	{0x01f240, 0x01f248, PG_U_OTHER_SYMBOL},
-	{0x01f249, 0x01f24f, PG_U_UNASSIGNED},
 	{0x01f250, 0x01f251, PG_U_OTHER_SYMBOL},
-	{0x01f252, 0x01f25f, PG_U_UNASSIGNED},
 	{0x01f260, 0x01f265, PG_U_OTHER_SYMBOL},
-	{0x01f266, 0x01f2ff, PG_U_UNASSIGNED},
 	{0x01f300, 0x01f3fa, PG_U_OTHER_SYMBOL},
 	{0x01f3fb, 0x01f3ff, PG_U_MODIFIER_SYMBOL},
 	{0x01f400, 0x01f6d7, PG_U_OTHER_SYMBOL},
-	{0x01f6d8, 0x01f6db, PG_U_UNASSIGNED},
 	{0x01f6dc, 0x01f6ec, PG_U_OTHER_SYMBOL},
-	{0x01f6ed, 0x01f6ef, PG_U_UNASSIGNED},
 	{0x01f6f0, 0x01f6fc, PG_U_OTHER_SYMBOL},
-	{0x01f6fd, 0x01f6ff, PG_U_UNASSIGNED},
 	{0x01f700, 0x01f776, PG_U_OTHER_SYMBOL},
-	{0x01f777, 0x01f77a, PG_U_UNASSIGNED},
 	{0x01f77b, 0x01f7d9, PG_U_OTHER_SYMBOL},
-	{0x01f7da, 0x01f7df, PG_U_UNASSIGNED},
 	{0x01f7e0, 0x01f7eb, PG_U_OTHER_SYMBOL},
-	{0x01f7ec, 0x01f7ef, PG_U_UNASSIGNED},
 	{0x01f7f0, 0x01f7f0, PG_U_OTHER_SYMBOL},
-	{0x01f7f1, 0x01f7ff, PG_U_UNASSIGNED},
 	{0x01f800, 0x01f80b, PG_U_OTHER_SYMBOL},
-	{0x01f80c, 0x01f80f, PG_U_UNASSIGNED},
 	{0x01f810, 0x01f847, PG_U_OTHER_SYMBOL},
-	{0x01f848, 0x01f84f, PG_U_UNASSIGNED},
 	{0x01f850, 0x01f859, PG_U_OTHER_SYMBOL},
-	{0x01f85a, 0x01f85f, PG_U_UNASSIGNED},
 	{0x01f860, 0x01f887, PG_U_OTHER_SYMBOL},
-	{0x01f888, 0x01f88f, PG_U_UNASSIGNED},
 	{0x01f890, 0x01f8ad, PG_U_OTHER_SYMBOL},
-	{0x01f8ae, 0x01f8af, PG_U_UNASSIGNED},
 	{0x01f8b0, 0x01f8b1, PG_U_OTHER_SYMBOL},
-	{0x01f8b2, 0x01f8ff, PG_U_UNASSIGNED},
 	{0x01f900, 0x01fa53, PG_U_OTHER_SYMBOL},
-	{0x01fa54, 0x01fa5f, PG_U_UNASSIGNED},
 	{0x01fa60, 0x01fa6d, PG_U_OTHER_SYMBOL},
-	{0x01fa6e, 0x01fa6f, PG_U_UNASSIGNED},
 	{0x01fa70, 0x01fa7c, PG_U_OTHER_SYMBOL},
-	{0x01fa7d, 0x01fa7f, PG_U_UNASSIGNED},
 	{0x01fa80, 0x01fa88, PG_U_OTHER_SYMBOL},
-	{0x01fa89, 0x01fa8f, PG_U_UNASSIGNED},
 	{0x01fa90, 0x01fabd, PG_U_OTHER_SYMBOL},
-	{0x01fabe, 0x01fabe, PG_U_UNASSIGNED},
 	{0x01fabf, 0x01fac5, PG_U_OTHER_SYMBOL},
-	{0x01fac6, 0x01facd, PG_U_UNASSIGNED},
 	{0x01face, 0x01fadb, PG_U_OTHER_SYMBOL},
-	{0x01fadc, 0x01fadf, PG_U_UNASSIGNED},
 	{0x01fae0, 0x01fae8, PG_U_OTHER_SYMBOL},
-	{0x01fae9, 0x01faef, PG_U_UNASSIGNED},
 	{0x01faf0, 0x01faf8, PG_U_OTHER_SYMBOL},
-	{0x01faf9, 0x01faff, PG_U_UNASSIGNED},
 	{0x01fb00, 0x01fb92, PG_U_OTHER_SYMBOL},
-	{0x01fb93, 0x01fb93, PG_U_UNASSIGNED},
 	{0x01fb94, 0x01fbca, PG_U_OTHER_SYMBOL},
-	{0x01fbcb, 0x01fbef, PG_U_UNASSIGNED},
 	{0x01fbf0, 0x01fbf9, PG_U_DECIMAL_NUMBER},
-	{0x01fbfa, 0x01ffff, PG_U_UNASSIGNED},
 	{0x020000, 0x02a6df, PG_U_OTHER_LETTER},
-	{0x02a6e0, 0x02a6ff, PG_U_UNASSIGNED},
 	{0x02a700, 0x02b739, PG_U_OTHER_LETTER},
-	{0x02b73a, 0x02b73f, PG_U_UNASSIGNED},
 	{0x02b740, 0x02b81d, PG_U_OTHER_LETTER},
-	{0x02b81e, 0x02b81f, PG_U_UNASSIGNED},
 	{0x02b820, 0x02cea1, PG_U_OTHER_LETTER},
-	{0x02cea2, 0x02ceaf, PG_U_UNASSIGNED},
 	{0x02ceb0, 0x02ebe0, PG_U_OTHER_LETTER},
-	{0x02ebe1, 0x02ebef, PG_U_UNASSIGNED},
 	{0x02ebf0, 0x02ee5d, PG_U_OTHER_LETTER},
-	{0x02ee5e, 0x02f7ff, PG_U_UNASSIGNED},
 	{0x02f800, 0x02fa1d, PG_U_OTHER_LETTER},
-	{0x02fa1e, 0x02ffff, PG_U_UNASSIGNED},
 	{0x030000, 0x03134a, PG_U_OTHER_LETTER},
-	{0x03134b, 0x03134f, PG_U_UNASSIGNED},
 	{0x031350, 0x0323af, PG_U_OTHER_LETTER},
-	{0x0323b0, 0x0e0000, PG_U_UNASSIGNED},
 	{0x0e0001, 0x0e0001, PG_U_FORMAT},
-	{0x0e0002, 0x0e001f, PG_U_UNASSIGNED},
 	{0x0e0020, 0x0e007f, PG_U_FORMAT},
-	{0x0e0080, 0x0e00ff, PG_U_UNASSIGNED},
 	{0x0e0100, 0x0e01ef, PG_U_NONSPACING_MARK},
-	{0x0e01f0, 0x0effff, PG_U_UNASSIGNED},
 	{0x0f0000, 0x0ffffd, PG_U_PRIVATE_USE},
-	{0x0ffffe, 0x0fffff, PG_U_UNASSIGNED},
-	{0x100000, 0x10fffd, PG_U_PRIVATE_USE},
-	{0x10fffe, 0x10ffff, PG_U_UNASSIGNED}
+	{0x100000, 0x10fffd, PG_U_PRIVATE_USE}
 };
-- 
2.34.1

v1-0001-Minor-cleanup-for-unicode-update-build-and-test.patch (text/x-patch; charset=UTF-8)
From 80ed701721b2bc91f2346f013d58930cd1d325f5 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 22 Nov 2023 10:38:46 -0800
Subject: [PATCH v1 1/3] Minor cleanup for unicode-update build and test.

---
 src/common/unicode/Makefile        |  6 ++--
 src/common/unicode/category_test.c | 18 ++++++------
 src/common/unicode/meson.build     | 44 +++++++++++++++---------------
 3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/src/common/unicode/Makefile b/src/common/unicode/Makefile
index 30cd75cc6a..04d81dd5cb 100644
--- a/src/common/unicode/Makefile
+++ b/src/common/unicode/Makefile
@@ -21,7 +21,7 @@ CPPFLAGS += $(ICU_CFLAGS)
 # By default, do nothing.
 all:
 
-update-unicode: unicode_category_table.h unicode_norm_table.h unicode_nonspacing_table.h unicode_east_asian_fw_table.h unicode_normprops_table.h unicode_norm_hashfunc.h unicode_version.h
+update-unicode: unicode_category_table.h unicode_east_asian_fw_table.h unicode_nonspacing_table.h unicode_norm_hashfunc.h unicode_norm_table.h unicode_normprops_table.h unicode_version.h
 	mv $^ $(top_srcdir)/src/include/common/
 	$(MAKE) category-check
 	$(MAKE) normalization-check
@@ -29,7 +29,7 @@ update-unicode: unicode_category_table.h unicode_norm_table.h unicode_nonspacing
 # These files are part of the Unicode Character Database. Download
 # them on demand.  The dependency on Makefile.global is for
 # UNICODE_VERSION.
-UnicodeData.txt EastAsianWidth.txt DerivedNormalizationProps.txt CompositionExclusions.txt NormalizationTest.txt: $(top_builddir)/src/Makefile.global
+CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt: $(top_builddir)/src/Makefile.global
 	$(DOWNLOAD) https://www.unicode.org/Public/$(UNICODE_VERSION)/ucd/$(@F)
 
 unicode_version.h: generate-unicode_version.pl
@@ -82,4 +82,4 @@ clean:
 	rm -f $(OBJS) category_test category_test.o norm_test norm_test.o
 
 distclean: clean
-	rm -f UnicodeData.txt EastAsianWidth.txt CompositionExclusions.txt NormalizationTest.txt norm_test_table.h unicode_norm_table.h
+	rm -f CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt norm_test_table.h unicode_category_table.h unicode_norm_table.h
diff --git a/src/common/unicode/category_test.c b/src/common/unicode/category_test.c
index ba62716d45..d9ea806eb8 100644
--- a/src/common/unicode/category_test.c
+++ b/src/common/unicode/category_test.c
@@ -54,8 +54,8 @@ main(int argc, char **argv)
 	int			pg_skipped_codepoints = 0;
 	int			icu_skipped_codepoints = 0;
 
-	printf("Postgres Unicode Version:\t%s\n", PG_UNICODE_VERSION);
-	printf("ICU Unicode Version:\t\t%s\n", U_UNICODE_VERSION);
+	printf("category_test: Postgres Unicode version:\t%s\n", PG_UNICODE_VERSION);
+	printf("category_test: ICU Unicode version:\t\t%s\n", U_UNICODE_VERSION);
 
 	for (UChar32 code = 0; code <= 0x10ffff; code++)
 	{
@@ -79,11 +79,11 @@ main(int argc, char **argv)
 				icu_skipped_codepoints++;
 			else
 			{
-				printf("FAILURE for codepoint %06x\n", code);
-				printf("Postgres category:	%02d %s %s\n", pg_category,
+				printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+				printf("category_test: Postgres category:	%02d %s %s\n", pg_category,
 					   unicode_category_abbrev(pg_category),
 					   unicode_category_string(pg_category));
-				printf("ICU category:		%02d %s %s\n", icu_category,
+				printf("category_test: ICU category:		%02d %s %s\n", icu_category,
 					   unicode_category_abbrev(icu_category),
 					   unicode_category_string(icu_category));
 				printf("\n");
@@ -93,16 +93,16 @@ main(int argc, char **argv)
 	}
 
 	if (pg_skipped_codepoints > 0)
-		printf("Skipped %d codepoints unassigned in Postgres due to Unicode version mismatch.\n",
+		printf("category_test: skipped %d codepoints unassigned in Postgres due to Unicode version mismatch\n",
 			   pg_skipped_codepoints);
 	if (icu_skipped_codepoints > 0)
-		printf("Skipped %d codepoints unassigned in ICU due to Unicode version mismatch.\n",
+		printf("category_test: skipped %d codepoints unassigned in ICU due to Unicode version mismatch\n",
 			   icu_skipped_codepoints);
 
-	printf("category_test: All tests successful!\n");
+	printf("category_test: success\n");
 	exit(0);
 #else
-	printf("ICU support required for test; skipping.\n");
+	printf("category_test: ICU support required for test; skipping\n");
 	exit(0);
 #endif
 }
diff --git a/src/common/unicode/meson.build b/src/common/unicode/meson.build
index 6af46122c4..e8cfdc1df4 100644
--- a/src/common/unicode/meson.build
+++ b/src/common/unicode/meson.build
@@ -11,7 +11,7 @@ endif
 
 # These files are part of the Unicode Character Database. Download them on
 # demand.
-foreach f : ['UnicodeData.txt', 'EastAsianWidth.txt', 'DerivedNormalizationProps.txt', 'CompositionExclusions.txt', 'NormalizationTest.txt']
+foreach f : ['CompositionExclusions.txt', 'DerivedNormalizationProps.txt', 'EastAsianWidth.txt', 'NormalizationTest.txt', 'UnicodeData.txt']
   url = unicode_baseurl.format(UNICODE_VERSION, f)
   target = custom_target(f,
     output: f,
@@ -24,15 +24,6 @@ endforeach
 
 update_unicode_targets = []
 
-update_unicode_targets += \
-  custom_target('unicode_version.h',
-    output: ['unicode_version.h'],
-    command: [
-      perl, files('generate-unicode_version.pl'),
-      '--outdir', '@OUTDIR@', '--version', UNICODE_VERSION],
-    build_by_default: false,
-  )
-
 update_unicode_targets += \
   custom_target('unicode_category_table.h',
     input: [unicode_data['UnicodeData.txt']],
@@ -44,14 +35,12 @@ update_unicode_targets += \
   )
 
 update_unicode_targets += \
-  custom_target('unicode_norm_table.h',
-    input: [unicode_data['UnicodeData.txt'], unicode_data['CompositionExclusions.txt']],
-    output: ['unicode_norm_table.h', 'unicode_norm_hashfunc.h'],
-    depend_files: perfect_hash_pm,
-    command: [
-      perl, files('generate-unicode_norm_table.pl'),
-      '--outdir', '@OUTDIR@', '@INPUT@'],
+  custom_target('unicode_east_asian_fw_table.h',
+    input: [unicode_data['EastAsianWidth.txt']],
+    output: ['unicode_east_asian_fw_table.h'],
+    command: [perl, files('generate-unicode_east_asian_fw_table.pl'), '@INPUT@'],
     build_by_default: false,
+    capture: true,
   )
 
 update_unicode_targets += \
@@ -65,12 +54,14 @@ update_unicode_targets += \
   )
 
 update_unicode_targets += \
-  custom_target('unicode_east_asian_fw_table.h',
-    input: [unicode_data['EastAsianWidth.txt']],
-    output: ['unicode_east_asian_fw_table.h'],
-    command: [perl, files('generate-unicode_east_asian_fw_table.pl'), '@INPUT@'],
+  custom_target('unicode_norm_table.h',
+    input: [unicode_data['UnicodeData.txt'], unicode_data['CompositionExclusions.txt']],
+    output: ['unicode_norm_table.h', 'unicode_norm_hashfunc.h'],
+    depend_files: perfect_hash_pm,
+    command: [
+      perl, files('generate-unicode_norm_table.pl'),
+      '--outdir', '@OUTDIR@', '@INPUT@'],
     build_by_default: false,
-    capture: true,
   )
 
 update_unicode_targets += \
@@ -83,6 +74,15 @@ update_unicode_targets += \
     capture: true,
   )
 
+update_unicode_targets += \
+  custom_target('unicode_version.h',
+    output: ['unicode_version.h'],
+    command: [
+      perl, files('generate-unicode_version.pl'),
+      '--outdir', '@OUTDIR@', '--version', UNICODE_VERSION],
+    build_by_default: false,
+  )
+
 norm_test_table = custom_target('norm_test_table.h',
     input: [unicode_data['NormalizationTest.txt']],
     output: ['norm_test_table.h'],
-- 
2.34.1

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#3)
Re: encoding affects ICU regex character classification

On Thu, Nov 30, 2023 at 1:23 PM Jeff Davis <pgsql@j-davis.com> wrote:

Character classification is not localized at all in libc or ICU as far
as I can tell.

Really? POSIX isalpha()/isalpha_l() and friends clearly depend on a
locale. See eg d522b05c for a case where that broke something.
Perhaps you mean glibc wouldn't do that to you because you know that,
as an unstandardised detail, it sucks in (some version of) Unicode's
data which shouldn't vary between locales. But you are allowed to
make your own locales, including putting whatever classifications you
want into the LC_CTYPE file using POSIX-standardised tools like
localedef. Perhaps that is a bit of a stretch, and no one really does
that in practice, but anyway it's still "localized".

Not knowing anything about how glibc generates its charmaps, Unicode
or pre-Unicode, I could take a wild guess that maybe in LATIN9 they
have an old hand-crafted table, but for UTF-8 encoding it's fully
outsourced to Unicode, and that's why you see a difference. Another
problem seen in a few parts of our tree is that we sometimes feed
individual UTF-8 bytes to the isXXX() functions which is about as well
defined as trying to pay for a pint with the left half of a $10 bill.
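
To illustrate why a lone UTF-8 byte is not a meaningful argument to the
isXXX() functions: the two bytes of U+017D (0xC5 0xBD) only have a
classification once they are decoded together. A minimal sketch of that
decoding step follows. decode_utf8() is a hypothetical helper written for
this note, not a PostgreSQL function, and it deliberately skips the
overlong-form and surrogate checks a real decoder needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s (len bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the input starts with a
 * continuation byte, has an invalid lead byte, or is truncated.
 * Sketch only: overlong encodings and surrogates are not rejected.
 */
int
decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
	int			need;
	uint32_t	v;

	assert(s != NULL && cp != NULL);

	if (len == 0)
		return 0;

	if (s[0] < 0x80)
	{
		*cp = s[0];
		return 1;
	}
	else if ((s[0] & 0xe0) == 0xc0)
	{
		need = 1;
		v = s[0] & 0x1f;
	}
	else if ((s[0] & 0xf0) == 0xe0)
	{
		need = 2;
		v = s[0] & 0x0f;
	}
	else if ((s[0] & 0xf8) == 0xf0)
	{
		need = 3;
		v = s[0] & 0x07;
	}
	else
		return 0;				/* continuation byte or invalid lead byte */

	if (len < (size_t) need + 1)
		return 0;				/* truncated sequence */

	for (int i = 1; i <= need; i++)
	{
		if ((s[i] & 0xc0) != 0x80)
			return 0;			/* not a continuation byte */
		v = (v << 6) | (s[i] & 0x3f);
	}

	*cp = v;
	return need + 1;
}
```

A caller would classify the decoded code point rather than the individual
bytes; a continuation byte such as 0xBD on its own is rejected.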

As for ICU, it's "not localized" only if there is only one ICU library
in the universe, but of course different versions of ICU might give
different answers because they correspond to different versions of
Unicode (as do glibc versions, FreeBSD libc versions, etc) and also
might disagree with tables built by PostgreSQL. Maybe irrelevant for
now, but I think with thus-far-imagined variants of the multi-version
ICU proposal, you have to choose whether to call u_isUAlphabetic() in
the library we're linked against, or via the dlsym() we look up in a
particular dlopen'd library. So I guess we'd have to access it via
our pg_locale_t, so again it'd be "localized" by some definitions.

Thinking about how to apply that thinking to libc, ... this is going
to sound far fetched and handwavy but here goes: we could even
imagine a multi-version system based on different base locale paths.
Instead of looking in the system-provided locales under /usr/share/locale
when we call newlocale(..., "en_NZ.UTF-8", ...), POSIX says
we're allowed to specify an absolute path eg newlocale(...,
"/foo/bar/unicode11/en_NZ.UTF-8", ...). If it is possible to use
$DISTRO's localedef to compile $OLD_DISTRO's locale sources to get
historical behaviour, that might provide a way to get them without
assuming the binary format is stable (it definitely isn't, but the
source format is nailed down by POSIX). One fly in the ointment is
that glibc failed to implement absolute path support, so you might
need to use versioned locale names instead, or see if the LOCPATH
environment variable can be swizzled around without confusing glibc's
locale cache. Then it wouldn't be fundamentally different than the
hypothesised multi-version ICU case: you could probably come up with
different isalpha_l() results for different locales because you have
different LC_CTYPE versions (for example Unicode 15.0 added new
extended Cyrillic characters 1E030..1E08F, they look alphabetical to
me but what would I know). That is an extremely hypothetical
pie-in-the-sky thought and I don't know if it'd really work very well,
but it is a concrete way that someone might finish up getting
different answers out of isalpha_l(), to observe that it really is
localised. And localized.

#5Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#4)
5 attachment(s)
Re: encoding affects ICU regex character classification

On Thu, 2023-11-30 at 15:10 +1300, Thomas Munro wrote:

On Thu, Nov 30, 2023 at 1:23 PM Jeff Davis <pgsql@j-davis.com>
wrote:

Character classification is not localized at all in libc or ICU as far
as I can tell.

Really?  POSIX isalpha()/isalpha_l() and friends clearly depend on a
locale.  See eg d522b05c for a case where that broke something.

I believe we're using different definitions of "localized". What I mean
is "varies from region to region or language to language". I think you
mean "varies for any reason at all [perhaps for no reason?]".

For instance, that commit indirectly links to:

https://github.com/evanj/isspace_locale

Which says "Mac OS X in a UTF-8 locale...". I don't see any fundamental
locale-based concern there.

I wrote a test program (attached) which compares any two given libc
locales using both the ordinary isalpha() family of functions, and also
using the iswalpha() family of functions. For the former, I only test
up to 0x7f. For the latter, I went to some effort to properly translate
the code point to a wchar_t (encode as UTF8, then mbstowcs using a UTF-8
locale), and I test all Unicode code points except the surrogate
range.
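
The translation step described above (encode as UTF-8, then mbstowcs)
can be sketched roughly as follows. This is not the attached
ctype_test.c; encode_utf8() and codepoint_to_wchar() are hypothetical
names, and the sketch assumes the caller has already selected a UTF-8
LC_CTYPE locale with setlocale().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Encode one code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and
 * values above 0x10ffff.
 */
int
encode_utf8(uint32_t cp, char *out)
{
	assert(out != NULL);

	if (cp <= 0x7f)
	{
		out[0] = (char) cp;
		return 1;
	}
	if (cp <= 0x7ff)
	{
		out[0] = (char) (0xc0 | (cp >> 6));
		out[1] = (char) (0x80 | (cp & 0x3f));
		return 2;
	}
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0;				/* surrogate range is skipped */
	if (cp <= 0xffff)
	{
		out[0] = (char) (0xe0 | (cp >> 12));
		out[1] = (char) (0x80 | ((cp >> 6) & 0x3f));
		out[2] = (char) (0x80 | (cp & 0x3f));
		return 3;
	}
	if (cp <= 0x10ffff)
	{
		out[0] = (char) (0xf0 | (cp >> 18));
		out[1] = (char) (0x80 | ((cp >> 12) & 0x3f));
		out[2] = (char) (0x80 | ((cp >> 6) & 0x3f));
		out[3] = (char) (0x80 | (cp & 0x3f));
		return 4;
	}
	return 0;
}

/*
 * Convert a code point to a single wchar_t via mbstowcs(). Returns 1 on
 * success, 0 on failure. Assumes LC_CTYPE is already a UTF-8 locale.
 * On platforms with a 16-bit wchar_t, code points above 0xffff become
 * two units and are reported as failure here.
 */
int
codepoint_to_wchar(uint32_t cp, wchar_t *wc)
{
	char		buf[5];
	wchar_t		wbuf[3];
	int			len;

	assert(wc != NULL);

	if (cp == 0)
	{
		*wc = L'\0';
		return 1;
	}

	len = encode_utf8(cp, buf);
	if (len == 0)
		return 0;
	buf[len] = '\0';

	if (mbstowcs(wbuf, buf, 3) != 1)
		return 0;

	*wc = wbuf[0];
	return 1;
}
```

The resulting wchar_t is what would then be passed to the iswalpha()
family; going through the multibyte form avoids assuming that wchar_t
holds Unicode code points directly.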

Using the test program, I compared the C.UTF-8 locale to every other
installed locale on my system (attached list for reference) and the
only ones that show any differences are "C" and "POSIX". That, combined
with the fact that ICU doesn't even accept a locale argument to the
character classification functions, gives me a high degree of
confidence that character classification is not localized on my system
according to my definition of "localized". If someone else wants to run
the test program on their system, I'd be interested to see the results
(some platform-specific modification may be required, e.g. handling
16-bit wchar_t, etc.).

Your definition is too wide in my opinion, because it mixes together
different sources of variation that are best left separate:
a. region/language
b. technical requirements
c. versioning
d. implementation variance

(a) is not a true source of variation (please correct me if I'm wrong)

(b) is perhaps interesting. The "C" locale is one example, and perhaps
there are others, but I doubt there are very many others that we want to support.

(c) is not a major concern in my opinion. The impact of Unicode changes
is usually not dramatic, and it only affects regexes so it's much more
contained than collation, for example. And if you really care, just use
the "C" locale.

(d) is mostly a bug. Most users would prefer standardization,
platform-independence, documentability, and testability. There are users who
might care a lot about compatibility, and I don't want to disrupt such
users, but long term I don't see a lot of value in bubbling up
semantics from libc into expressions when there's not a clear reason to
do so. (Note: getting semantics from libc is a bit dubious in the case
of collation, as well, but at least for collation there are regional
and linguistic differences that we can't handle internally.)

I think we only need 2 main character classification schemes: "C" and
Unicode (TR #18 Compatibility Properties[1], either the "Standard"
variant or the "POSIX Compatible" variant or both). The libc and ICU
ones should be there only for compatibility and discouraged and
hopefully eventually removed.

Not knowing anything about how glibc generates its charmaps, Unicode
or pre-Unicode, I could take a wild guess that maybe in LATIN9 they
have an old hand-crafted table, but for UTF-8 encoding it's fully
outsourced to Unicode, and that's why you see a difference.

No, the problem is that we're passing a pg_wchar to an ICU function
that expects a 32-bit code point. Those two things are equivalent in
the UTF8 encoding, but not in the LATIN9 encoding.
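
A small sketch of the mismatch, for illustration only. As far as I
understand, a pg_wchar in a single-byte encoding is just the byte value,
so the LATIN9 byte 0xB4 (which is U+017D) reaches ICU as U+00B4, a
non-alphabetic accent character. latin9_to_codepoint() below is a
hypothetical helper, not existing PostgreSQL infrastructure; it covers
the eight positions where ISO-8859-15 differs from ISO-8859-1, with
everything else mapping to itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a LATIN9 (ISO-8859-15) byte value to its Unicode code point.
 * Only eight positions differ from LATIN1, where byte value and code
 * point coincide.
 */
uint32_t
latin9_to_codepoint(uint32_t wc)
{
	assert(wc <= 0xff);

	switch (wc)
	{
		case 0xa4:
			return 0x20ac;		/* EURO SIGN */
		case 0xa6:
			return 0x0160;		/* S WITH CARON */
		case 0xa8:
			return 0x0161;		/* s WITH CARON */
		case 0xb4:
			return 0x017d;		/* Z WITH CARON */
		case 0xb8:
			return 0x017e;		/* z WITH CARON */
		case 0xbc:
			return 0x0152;		/* LIGATURE OE */
		case 0xbd:
			return 0x0153;		/* ligature oe */
		case 0xbe:
			return 0x0178;		/* Y WITH DIAERESIS */
		default:
			return wc;
	}
}
```

Classifying latin9_to_codepoint(0xB4) instead of 0xB4 itself would give
the same answer as the UTF8 case for the query at the top of the thread.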

See the comment at the top of regc_pg_locale.c, which should probably
be updated to describe what happens with ICU collations.

Another
problem seen in a few parts of our tree is that we sometimes feed
individual UTF-8 bytes to the isXXX() functions which is about as well
defined as trying to pay for a pint with the left half of a $10 bill.

If we have built-in character classification systems as I propose ("C"
and Unicode), then the callers can simply choose which well-defined one
to use.

also might disagree with tables built by PostgreSQL.

The patch I provided (new version attached) exhaustively tests all the
new Unicode property tables, and also the class assignments based on
[1] and [2]. The test will run whenever you "ninja update-unicode", so any
inconsistencies will be highly visible before release. Additionally,
because the tables are checked in, you'll be able to see (in the diff)
the impact from a Unicode version update and consider that impact when
writing the release notes.

You may be wondering about differences in the version of Unicode
between Postgres and ICU while the test is running. It only tests code
points that are assigned in both Unicode versions, and reports the
number of code points that are skipped due to this check. The person
running "update-unicode" may see a failing test or a large number of
skipped codepoints if the Unicode versions don't match, in which case
they should try running against a more closely-matching version of ICU.
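
The skip rule can be sketched as below. parse_unicode_version() follows
the major * 100 + minor scheme used in category_test.c; skip_codepoint()
is a hypothetical name that restates the two conditions from the patch
as a standalone function, with the "assigned" flags standing in for the
category comparisons against PG_U_UNASSIGNED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse "major.minor[...]" into an integer for easy comparison. */
int
parse_unicode_version(const char *version)
{
	int			major = 0;
	int			minor = 0;
	int			n = sscanf(version, "%d.%d", &major, &minor);

	assert(n == 2);
	assert(minor < 100);
	(void) n;					/* silence unused warning under NDEBUG */

	return major * 100 + minor;
}

/*
 * A code point is skipped only when it is unassigned on the side with
 * the older Unicode version and assigned on the side with the newer one.
 * Any other disagreement is left to the comparison and reported as a
 * failure.
 */
bool
skip_codepoint(bool pg_assigned, bool icu_assigned,
			   int pg_version, int icu_version)
{
	if (!pg_assigned && icu_assigned && pg_version < icu_version)
		return true;
	if (!icu_assigned && pg_assigned && icu_version < pg_version)
		return true;
	return false;
}
```

With equal versions nothing is skipped, so the test stays exhaustive; the
further apart the versions are, the more code points fall outside the
comparison.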

Regards,
Jeff Davis

[1]: http://www.unicode.org/reports/tr18/#Compatibility_Properties
[2]: https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uchar_8h.html#details

Attachments:

l.txt (text/plain; charset=UTF-8)
ctype_test.c (text/x-csrc; charset=UTF-8)
v2-0003-Add-Unicode-property-tables.patch (text/x-patch; charset=UTF-8)
From 31f8a02ad90d9d03ed39bb09f7a585e37e72b4e8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sat, 18 Nov 2023 15:34:24 -0800
Subject: [PATCH v2 3/3] Add Unicode property tables.

---
 src/common/unicode/Makefile                   |    6 +-
 src/common/unicode/category_test.c            |  296 +-
 .../generate-unicode_category_table.pl        |  203 +-
 src/common/unicode/meson.build                |    4 +-
 src/common/unicode_category.c                 |  266 +-
 src/include/common/unicode_category.h         |   29 +-
 src/include/common/unicode_category_table.h   | 2532 +++++++++++++++++
 7 files changed, 3273 insertions(+), 63 deletions(-)

diff --git a/src/common/unicode/Makefile b/src/common/unicode/Makefile
index 04d81dd5cb..27f0408d8b 100644
--- a/src/common/unicode/Makefile
+++ b/src/common/unicode/Makefile
@@ -29,13 +29,13 @@ update-unicode: unicode_category_table.h unicode_east_asian_fw_table.h unicode_n
 # These files are part of the Unicode Character Database. Download
 # them on demand.  The dependency on Makefile.global is for
 # UNICODE_VERSION.
-CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt: $(top_builddir)/src/Makefile.global
+CompositionExclusions.txt DerivedCoreProperties.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt PropList.txt UnicodeData.txt: $(top_builddir)/src/Makefile.global
 	$(DOWNLOAD) https://www.unicode.org/Public/$(UNICODE_VERSION)/ucd/$(@F)
 
 unicode_version.h: generate-unicode_version.pl
 	$(PERL) $< --version $(UNICODE_VERSION)
 
-unicode_category_table.h: generate-unicode_category_table.pl UnicodeData.txt
+unicode_category_table.h: generate-unicode_category_table.pl DerivedCoreProperties.txt PropList.txt UnicodeData.txt
 	$(PERL) $<
 
 # Generation of conversion tables used for string normalization with
@@ -82,4 +82,4 @@ clean:
 	rm -f $(OBJS) category_test category_test.o norm_test norm_test.o
 
 distclean: clean
-	rm -f CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt norm_test_table.h unicode_category_table.h unicode_norm_table.h
+	rm -f CompositionExclusions.txt DerivedCoreProperties.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt PropList.txt UnicodeData.txt norm_test_table.h unicode_category_table.h unicode_norm_table.h
diff --git a/src/common/unicode/category_test.c b/src/common/unicode/category_test.c
index d9ea806eb8..b769850bf7 100644
--- a/src/common/unicode/category_test.c
+++ b/src/common/unicode/category_test.c
@@ -1,6 +1,7 @@
 /*-------------------------------------------------------------------------
  * category_test.c
- *		Program to test Unicode general category functions.
+ *		Program to test Unicode general category and character class
+ *		functions.
  *
  * Portions Copyright (c) 2017-2023, PostgreSQL Global Development Group
  *
@@ -14,17 +15,24 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-
 #ifdef USE_ICU
 #include <unicode/uchar.h>
 #endif
-#include "common/unicode_category.h"
+#include <wctype.h>
+
 #include "common/unicode_version.h"
+#include "common/unicode_category.h"
+
+#define LIBC_LOCALE "C.UTF-8"
+
+static int	pg_unicode_version = 0;
+#ifdef USE_ICU
+static int	icu_unicode_version = 0;
+#endif
 
 /*
  * Parse version into integer for easy comparison.
  */
-#ifdef USE_ICU
 static int
 parse_unicode_version(const char *version)
 {
@@ -39,56 +47,160 @@ parse_unicode_version(const char *version)
 
 	return major * 100 + minor;
 }
-#endif
 
+#ifdef USE_ICU
 /*
- * Exhaustively test that the Unicode category for each codepoint matches that
- * returned by ICU.
+ * Test Postgres Unicode tables by comparing with ICU. Test the General
+ * Category, as well as the properties Alphabetic, Lowercase, Uppercase,
+ * White_Space, and Hex_Digit.
  */
-int
-main(int argc, char **argv)
+static void
+icu_test()
 {
-#ifdef USE_ICU
-	int			pg_unicode_version = parse_unicode_version(PG_UNICODE_VERSION);
-	int			icu_unicode_version = parse_unicode_version(U_UNICODE_VERSION);
 	int			pg_skipped_codepoints = 0;
 	int			icu_skipped_codepoints = 0;
 
-	printf("category_test: Postgres Unicode version:\t%s\n", PG_UNICODE_VERSION);
-	printf("category_test: ICU Unicode version:\t\t%s\n", U_UNICODE_VERSION);
-
-	for (UChar32 code = 0; code <= 0x10ffff; code++)
+	for (pg_wchar code = 0; code <= 0x10ffff; code++)
 	{
 		uint8_t		pg_category = unicode_category(code);
 		uint8_t		icu_category = u_charType(code);
 
+		/* Property tests */
+		bool		prop_alphabetic = pg_u_prop_alphabetic(code);
+		bool		prop_lowercase = pg_u_prop_lowercase(code);
+		bool		prop_uppercase = pg_u_prop_uppercase(code);
+		bool		prop_white_space = pg_u_prop_white_space(code);
+		bool		prop_hex_digit = pg_u_prop_hex_digit(code);
+		bool		prop_join_control = pg_u_prop_join_control(code);
+
+		bool		icu_prop_alphabetic = u_hasBinaryProperty(
+			code, UCHAR_ALPHABETIC);
+		bool		icu_prop_lowercase =  u_hasBinaryProperty(
+			code, UCHAR_LOWERCASE);
+		bool		icu_prop_uppercase =  u_hasBinaryProperty(
+			code, UCHAR_UPPERCASE);
+		bool		icu_prop_white_space =  u_hasBinaryProperty(
+			code, UCHAR_WHITE_SPACE);
+		bool		icu_prop_hex_digit =  u_hasBinaryProperty(
+			code, UCHAR_HEX_DIGIT);
+		bool		icu_prop_join_control =  u_hasBinaryProperty(
+			code, UCHAR_JOIN_CONTROL);
+
+		/*
+		 * Compare with ICU for TR #18 character classes using:
+		 *
+		 * https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uchar_8h.html#details
+		 *
+		 * which describes how to use ICU to test for membership in regex
+		 * character classes ("Standard", not "POSIX Compatible").
+		 *
+		 * NB: the document suggests testing for some properties such as
+		 * UCHAR_POSIX_ALNUM, but that doesn't mean that we're testing for the
+		 * "POSIX Compatible" character classes.
+		 */
+		bool		isalpha = pg_u_isalpha(code);
+		bool		islower = pg_u_islower(code);
+		bool		isupper = pg_u_isupper(code);
+		bool		ispunct = pg_u_ispunct(code);
+		bool		isdigit = pg_u_isdigit(code);
+		bool		isxdigit = pg_u_isxdigit(code);
+		bool		isalnum = pg_u_isalnum(code);
+		bool		isspace = pg_u_isspace(code);
+		bool		isblank = pg_u_isblank(code);
+		bool		iscntrl = pg_u_iscntrl(code);
+		bool		isgraph = pg_u_isgraph(code);
+		bool		isprint = pg_u_isprint(code);
+
+		bool		icu_isalpha = u_isUAlphabetic(code);
+		bool		icu_islower = u_isULowercase(code);
+		bool		icu_isupper = u_isUUppercase(code);
+		bool		icu_ispunct = u_ispunct(code);
+		bool		icu_isdigit = u_isdigit(code);
+		bool		icu_isxdigit = u_hasBinaryProperty(code,
+													   UCHAR_POSIX_XDIGIT);
+		bool		icu_isalnum = u_hasBinaryProperty(code,
+													  UCHAR_POSIX_ALNUM);
+		bool		icu_isspace = u_isUWhiteSpace(code);
+		bool		icu_isblank = u_isblank(code);
+		bool		icu_iscntrl = icu_category == PG_U_CONTROL;
+		bool		icu_isgraph = u_hasBinaryProperty(code,
+													  UCHAR_POSIX_GRAPH);
+		bool		icu_isprint = u_hasBinaryProperty(code,
+													  UCHAR_POSIX_PRINT);
+
+		/*
+		 * A version mismatch means that some assigned codepoints in the newer
+		 * version may be unassigned in the older version. That's OK, though
+		 * the test will not cover those codepoints marked unassigned in the
+		 * older version (that is, it will no longer be an exhaustive test).
+		 */
+		if (pg_category == PG_U_UNASSIGNED &&
+			icu_category != PG_U_UNASSIGNED &&
+			pg_unicode_version < icu_unicode_version)
+		{
+			pg_skipped_codepoints++;
+			continue;
+		}
+
+		if (icu_category == PG_U_UNASSIGNED &&
+			pg_category != PG_U_UNASSIGNED &&
+			icu_unicode_version < pg_unicode_version)
+		{
+			icu_skipped_codepoints++;
+			continue;
+		}
+
 		if (pg_category != icu_category)
 		{
-			/*
-			 * A version mismatch means that some assigned codepoints in the
-			 * newer version may be unassigned in the older version. That's
-			 * OK, though the test will not cover those codepoints marked
-			 * unassigned in the older version (that is, it will no longer be
-			 * an exhaustive test).
-			 */
-			if (pg_category == PG_U_UNASSIGNED &&
-				pg_unicode_version < icu_unicode_version)
-				pg_skipped_codepoints++;
-			else if (icu_category == PG_U_UNASSIGNED &&
-					 icu_unicode_version < pg_unicode_version)
-				icu_skipped_codepoints++;
-			else
-			{
-				printf("category_test: FAILURE for codepoint 0x%06x\n", code);
-				printf("category_test: Postgres category:	%02d %s %s\n", pg_category,
-					   unicode_category_abbrev(pg_category),
-					   unicode_category_string(pg_category));
-				printf("category_test: ICU category:		%02d %s %s\n", icu_category,
-					   unicode_category_abbrev(icu_category),
-					   unicode_category_string(icu_category));
-				printf("\n");
-				exit(1);
-			}
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres category:	%02d %s %s\n", pg_category,
+				   unicode_category_abbrev(pg_category),
+				   unicode_category_string(pg_category));
+			printf("category_test: ICU category:		%02d %s %s\n", icu_category,
+				   unicode_category_abbrev(icu_category),
+				   unicode_category_string(icu_category));
+			printf("\n");
+			exit(1);
+		}
+
+		if (prop_alphabetic != icu_prop_alphabetic ||
+			prop_lowercase != icu_prop_lowercase ||
+			prop_uppercase != icu_prop_uppercase ||
+			prop_white_space != icu_prop_white_space ||
+			prop_hex_digit != icu_prop_hex_digit ||
+			prop_join_control != icu_prop_join_control)
+		{
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres	property	alphabetic/lowercase/uppercase/white_space/hex_digit/join_control: %d/%d/%d/%d/%d/%d\n",
+				   prop_alphabetic, prop_lowercase, prop_uppercase,
+				   prop_white_space, prop_hex_digit, prop_join_control);
+			printf("category_test: ICU	property	alphabetic/lowercase/uppercase/white_space/hex_digit/join_control: %d/%d/%d/%d/%d/%d\n",
+				   icu_prop_alphabetic, icu_prop_lowercase, icu_prop_uppercase,
+				   icu_prop_white_space, icu_prop_hex_digit, icu_prop_join_control);
+			printf("\n");
+			exit(1);
+		}
+
+		if (isalpha != icu_isalpha ||
+			islower != icu_islower ||
+			isupper != icu_isupper ||
+			ispunct != icu_ispunct ||
+			isdigit != icu_isdigit ||
+			isxdigit != icu_isxdigit ||
+			isalnum != icu_isalnum ||
+			isspace != icu_isspace ||
+			isblank != icu_isblank ||
+			iscntrl != icu_iscntrl ||
+			isgraph != icu_isgraph ||
+			isprint != icu_isprint)
+		{
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres	class	alpha/lower/upper/punct/digit/xdigit/alnum/space/blank/cntrl/graph/print: %d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d\n",
+				   isalpha, islower, isupper, ispunct, isdigit, isxdigit, isalnum, isspace, isblank, iscntrl, isgraph, isprint);
+			printf("category_test: ICU class	alpha/lower/upper/punct/digit/xdigit/alnum/space/blank/cntrl/graph/print: %d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d/%d\n",
+				   icu_isalpha, icu_islower, icu_isupper, icu_ispunct, icu_isdigit, icu_isxdigit, icu_isalnum, icu_isspace, icu_isblank, icu_iscntrl, icu_isgraph, icu_isprint);
+			printf("\n");
+			exit(1);
 		}
 	}
 
@@ -99,10 +211,104 @@ main(int argc, char **argv)
 		printf("category_test: skipped %d codepoints unassigned in ICU due to Unicode version mismatch\n",
 			   icu_skipped_codepoints);
 
-	printf("category_test: success\n");
-	exit(0);
+	printf("category_test: ICU test successful\n");
+}
+#endif
+
+/*
+ * For libc, test only some characters for membership in the punctuation
+ * class. We have no guarantee that all characters will obey the same rules as
+ * pg_u_ispunct_posix(), though some coverage is still useful.
+ */
+static const unsigned char test_punct[] = {
+	',', '$', '"', 0x85, 0x00, 'b', '&', 'Z', ' ', '\t', '\n'
+};
+
+/*
+ * Test what we can for libc, which is limited but still useful to cover the
+ * _posix-variant functions.
+ */
+static void
+libc_test(void)
+{
+	char * libc_locale = setlocale(LC_CTYPE, LIBC_LOCALE);
+
+	if (!libc_locale)
+	{
+		printf("category_test: libc locale \"%s\" not available; skipping\n", LIBC_LOCALE);
+		return;
+	}
+
+	/* non-exhaustive test of pg_u_ispunct_posix() */
+	for (int i = 0; i < sizeof(test_punct)/sizeof(test_punct[0]); i++)
+	{
+		pg_wchar code = (pg_wchar) test_punct[i];
+		bool ispunct = pg_u_ispunct_posix(code);
+		bool libc_ispunct = iswpunct(code);
+
+		if (ispunct != libc_ispunct)
+		{
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres	ispunct_posix:	%d\n", ispunct);
+			printf("category_test: libc iswpunct:		%d\n", libc_ispunct);
+			printf("\n");
+			exit(1);
+		}
+	}
+
+	for (pg_wchar code = 0; code <= 0x10ffff; code++)
+	{
+		uint8_t		pg_category = unicode_category(code);
+
+		bool		isalpha = pg_u_isalpha(code);
+		bool		isdigit = pg_u_isdigit_posix(code);
+		bool		isxdigit = pg_u_isxdigit_posix(code);
+		bool		isalnum = pg_u_isalnum_posix(code);
+
+		bool		libc_isdigit = iswdigit(code);
+		bool		libc_isxdigit = iswxdigit(code);
+
+		if (pg_category == PG_U_UNASSIGNED)
+			continue;
+
+		/* check that alnum is the same as isdigit OR isalpha */
+		if (((isdigit || isalpha) && !isalnum) ||
+			(!(isdigit || isalpha) && isalnum))
+		{
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: isalnum inconsistent: isalpha/isdigit/isalnum: %d/%d/%d\n",
+				   isalpha, isdigit, isalnum);
+			exit(1);
+		}
+
+		if (isdigit != libc_isdigit ||
+			isxdigit != libc_isxdigit)
+		{
+			printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+			printf("category_test: Postgres	class	digit/xdigit: %d/%d\n",
+				   isdigit, isxdigit);
+			printf("category_test: libc class	digit/xdigit: %d/%d\n",
+				   libc_isdigit, libc_isxdigit);
+			printf("\n");
+			exit(1);
+		}
+	}
+}
+
+int
+main(int argc, char **argv)
+{
+	pg_unicode_version = parse_unicode_version(PG_UNICODE_VERSION);
+	printf("category_test: Postgres Unicode version:\t%s\n", PG_UNICODE_VERSION);
+
+	libc_test();
+
+#ifdef USE_ICU
+	icu_unicode_version = parse_unicode_version(U_UNICODE_VERSION);
+	printf("category_test: ICU Unicode version:\t\t%s\n", U_UNICODE_VERSION);
+
+	icu_test();
 #else
-	printf("category_test: ICU support required for test; skipping\n");
-	exit(0);
+	printf("category_test: ICU not available; skipping\n");
 #endif
 }
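
Reviewer's sketch, not part of the patch: libc_test() above compares the _posix variants against libc's wide-character classifiers. The stand-alone program fragment below shows the same idea for the digit class only. It is restricted to the ASCII range so the result does not depend on the locale or on the width of wchar_t. The demo_* names are hypothetical and do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical stand-in for pg_u_isdigit_posix(): ASCII digits only. */
static bool
demo_isdigit_posix(unsigned int code)
{
	return '0' <= code && code <= '9';
}

/*
 * Count the code points in [0, limit] for which the stand-in disagrees with
 * libc's iswdigit(). The limit is capped at 0x7f (an assumption made for this
 * sketch) so that every tested value is a valid wide character everywhere.
 */
static int
demo_count_digit_mismatches(unsigned int limit)
{
	int			mismatches = 0;

	assert(limit <= 0x7f);

	for (unsigned int code = 0; code <= limit; code++)
	{
		bool		libc_isdigit = iswdigit((wint_t) code) != 0;

		if (demo_isdigit_posix(code) != libc_isdigit)
			mismatches++;
	}

	return mismatches;
}
```

Calling demo_count_digit_mismatches(0x7f) is expected to return 0, since both sides accept exactly '0' through '9' in that range.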
diff --git a/src/common/unicode/generate-unicode_category_table.pl b/src/common/unicode/generate-unicode_category_table.pl
index 992b877ede..9545728443 100644
--- a/src/common/unicode/generate-unicode_category_table.pl
+++ b/src/common/unicode/generate-unicode_category_table.pl
@@ -120,8 +120,6 @@ if ($range_category ne $CATEGORY_UNASSIGNED) {
 							category => $range_category});
 }
 
-my $num_ranges = scalar @category_ranges;
-
 # See: https://www.unicode.org/reports/tr44/#General_Category_Values
 my $categories = {
 	Cn => 'PG_U_UNASSIGNED',
@@ -156,11 +154,98 @@ my $categories = {
 	Pf => 'PG_U_FINAL_PUNCTUATION'
 };
 
-# Start writing out the output files
+# Find White_Space, Hex_Digit, and Join_Control characters
+my @white_space = ();
+my @hex_digits = ();
+my @join_control = ();
+open($FH, '<', "$output_path/PropList.txt")
+  or die "Could not open $output_path/PropList.txt: $!.";
+while (my $line = <$FH>)
+{
+	my $pattern = qr/^([0-9A-F\.]+)\s*;\s*(\w+)\s*#.*$/s;
+	next unless $line =~ $pattern;
+
+	my $code = $line =~ s/$pattern/$1/rg;
+	my $property = $line =~ s/$pattern/$2/rg;
+	my $start;
+	my $end;
+
+	if ($code =~ /\.\./) {
+		# code range
+		my @sp = split /\.\./, $code;
+		$start = hex($sp[0]);
+		$end = hex($sp[1]);
+	} else {
+		# single code point
+		$start = hex($code);
+		$end = hex($code);
+	}
+
+	if ($property eq "White_Space") {
+		push @white_space, {start => $start, end => $end};
+	}
+	elsif ($property eq "Hex_Digit") {
+		push @hex_digits, {start => $start, end => $end};
+	}
+	elsif ($property eq "Join_Control") {
+		push @join_control, {start => $start, end => $end};
+	}
+}
+
+# Find Alphabetic, Lowercase, and Uppercase characters
+my @alphabetic = ();
+my @lowercase = ();
+my @uppercase = ();
+open($FH, '<', "$output_path/DerivedCoreProperties.txt")
+  or die "Could not open $output_path/DerivedCoreProperties.txt: $!.";
+while (my $line = <$FH>)
+{
+	my $pattern = qr/^([0-9A-F\.]+)\s*;\s*(\w+)\s*#.*$/s;
+	next unless $line =~ $pattern;
+
+	my $code = $line =~ s/$pattern/$1/rg;
+	my $property = $line =~ s/$pattern/$2/rg;
+	my $start;
+	my $end;
+
+	if ($code =~ /\.\./) {
+		# code range
+		my @sp = split /\.\./, $code;
+		die "line: {$line} code: {$code} sp[0] {$sp[0]} sp[1] {$sp[1]}"
+		  unless $sp[0] =~ /^[0-9A-F]+$/ && $sp[1] =~ /^[0-9A-F]+$/;
+		$start = hex($sp[0]);
+		$end = hex($sp[1]);
+	} else {
+		# single code point
+		die "line: {$line} code: {$code}" unless $code =~ /^[0-9A-F]+$/;
+		$start = hex($code);
+		$end = hex($code);
+	}
+
+	if ($property eq "Alphabetic") {
+		push @alphabetic, {start => $start, end => $end};
+	}
+	elsif ($property eq "Lowercase") {
+		push @lowercase, {start => $start, end => $end};
+	}
+	elsif ($property eq "Uppercase") {
+		push @uppercase, {start => $start, end => $end};
+	}
+}
+
+my $num_category_ranges = scalar @category_ranges;
+my $num_alphabetic_ranges = scalar @alphabetic;
+my $num_lowercase_ranges = scalar @lowercase;
+my $num_uppercase_ranges = scalar @uppercase;
+my $num_white_space_ranges = scalar @white_space;
+my $num_hex_digit_ranges = scalar @hex_digits;
+my $num_join_control_ranges = scalar @join_control;
+
+# Start writing out the output file
 open my $OT, '>', $output_table_file
   or die "Could not open output file $output_table_file: $!\n";
 
-print $OT <<HEADER;
+print $OT <<"HEADER";
 /*-------------------------------------------------------------------------
  *
  * unicode_category_table.h
@@ -188,11 +273,20 @@ typedef struct
 	uint8		category;		/* General Category */
 }			pg_category_range;
 
-/* table of Unicode codepoint ranges and their categories */
-static const pg_category_range unicode_categories[$num_ranges] =
+typedef struct
 {
+	uint32		first;			/* Unicode codepoint */
+	uint32		last;			/* Unicode codepoint */
+}			pg_unicode_range;
+
 HEADER
 
+print $OT <<"CATEGORY_TABLE";
+/* table of Unicode codepoint ranges and their categories */
+static const pg_category_range unicode_categories[$num_category_ranges] =
+{
+CATEGORY_TABLE
+
 my $firsttime = 1;
 foreach my $range (@category_ranges) {
 	printf $OT ",\n" unless $firsttime;
@@ -202,4 +296,101 @@ foreach my $range (@category_ranges) {
 	die "category missing: $range->{category}" unless $category;
 	printf $OT "\t{0x%06x, 0x%06x, %s}", $range->{start}, $range->{end}, $category;
 }
+
+print $OT "\n};\n\n";
+
+print $OT <<"ALPHABETIC_TABLE";
+/* table of Unicode codepoint ranges of Alphabetic characters */
+static const pg_unicode_range unicode_alphabetic[$num_alphabetic_ranges] =
+{
+ALPHABETIC_TABLE
+
+$firsttime = 1;
+foreach my $range (@alphabetic) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"LOWERCASE_TABLE";
+/* table of Unicode codepoint ranges of Lowercase characters */
+static const pg_unicode_range unicode_lowercase[$num_lowercase_ranges] =
+{
+LOWERCASE_TABLE
+
+$firsttime = 1;
+foreach my $range (@lowercase) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"UPPERCASE_TABLE";
+/* table of Unicode codepoint ranges of Uppercase characters */
+static const pg_unicode_range unicode_uppercase[$num_uppercase_ranges] =
+{
+UPPERCASE_TABLE
+
+$firsttime = 1;
+foreach my $range (@uppercase) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"WHITE_SPACE_TABLE";
+/* table of Unicode codepoint ranges of White_Space characters */
+static const pg_unicode_range unicode_white_space[$num_white_space_ranges] =
+{
+WHITE_SPACE_TABLE
+
+$firsttime = 1;
+foreach my $range (@white_space) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"HEX_DIGITS_TABLE";
+/* table of Unicode codepoint ranges of Hex_Digit characters */
+static const pg_unicode_range unicode_hex_digit[$num_hex_digit_ranges] =
+{
+HEX_DIGITS_TABLE
+
+$firsttime = 1;
+foreach my $range (@hex_digits) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
+print $OT "\n};\n\n";
+
+print $OT <<"JOIN_CONTROL_TABLE";
+/* table of Unicode codepoint ranges of Join_Control characters */
+static const pg_unicode_range unicode_join_control[$num_join_control_ranges] =
+{
+JOIN_CONTROL_TABLE
+
+$firsttime = 1;
+foreach my $range (@join_control) {
+	printf $OT ",\n" unless $firsttime;
+	$firsttime = 0;
+
+	printf $OT "\t{0x%06x, 0x%06x}", $range->{start}, $range->{end};
+}
+
 print $OT "\n};\n";
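
Reviewer's sketch, not part of the patch: the generator above reads lines of the form "code ; Property # comment" from PropList.txt and DerivedCoreProperties.txt, where code is either a single code point or a first..last range. The same parsing step is shown here in C, only to keep all illustrative snippets in this thread in one language. demo_parse_prop_line is a hypothetical name, and the sketch assumes a property name of at most 63 characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse one data line in the "code ; Property # comment" layout.
 *
 * Returns true and fills start, end and property when the line carries a
 * code point or a code point range; returns false for comment and blank
 * lines. The property buffer must have room for 64 bytes.
 */
static bool
demo_parse_prop_line(const char *line, unsigned int *start,
					 unsigned int *end, char property[64])
{
	assert(line != NULL);

	/* code range, e.g. "0009..000D    ; White_Space # ..." */
	if (sscanf(line, "%x..%x ; %63[A-Za-z0-9_]", start, end, property) == 3)
		return true;

	/* single code point, e.g. "0020          ; White_Space # ..." */
	if (sscanf(line, "%x ; %63[A-Za-z0-9_]", start, property) == 2)
	{
		*end = *start;
		return true;
	}

	return false;
}
```

A single code point is stored as a range whose first and last values are equal, which matches what the Perl script pushes onto its arrays.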
diff --git a/src/common/unicode/meson.build b/src/common/unicode/meson.build
index e8cfdc1df4..3526ddb846 100644
--- a/src/common/unicode/meson.build
+++ b/src/common/unicode/meson.build
@@ -11,7 +11,7 @@ endif
 
 # These files are part of the Unicode Character Database. Download them on
 # demand.
-foreach f : ['CompositionExclusions.txt', 'DerivedNormalizationProps.txt', 'EastAsianWidth.txt', 'NormalizationTest.txt', 'UnicodeData.txt']
+foreach f : ['CompositionExclusions.txt', 'DerivedCoreProperties.txt', 'DerivedNormalizationProps.txt', 'EastAsianWidth.txt', 'NormalizationTest.txt', 'PropList.txt', 'UnicodeData.txt']
   url = unicode_baseurl.format(UNICODE_VERSION, f)
   target = custom_target(f,
     output: f,
@@ -26,7 +26,7 @@ update_unicode_targets = []
 
 update_unicode_targets += \
   custom_target('unicode_category_table.h',
-    input: [unicode_data['UnicodeData.txt']],
+    input: [unicode_data['UnicodeData.txt'], unicode_data['DerivedCoreProperties.txt'], unicode_data['PropList.txt']],
     output: ['unicode_category_table.h'],
     command: [
       perl, files('generate-unicode_category_table.pl'),
diff --git a/src/common/unicode_category.c b/src/common/unicode_category.c
index 189cd6eca3..efe617d45b 100644
--- a/src/common/unicode_category.c
+++ b/src/common/unicode_category.c
@@ -1,6 +1,8 @@
 /*-------------------------------------------------------------------------
  * unicode_category.c
- *		Determine general category of Unicode characters.
+ *		Determine general category and character class of Unicode
+ *		characters. Encoding must be UTF8, where we assume that the pg_wchar
+ *		representation is a code point.
  *
  * Portions Copyright (c) 2017-2023, PostgreSQL Global Development Group
  *
@@ -18,24 +20,78 @@
 #include "common/unicode_category.h"
 #include "common/unicode_category_table.h"
 
+/*
+ * We use a mask word for convenience when testing for multiple categories at
+ * once. The number of Unicode General Categories should never grow, so a
+ * 32-bit mask is fine.
+ */
+#define PG_U_CATEGORY_MASK(X) ((uint32)(1 << (X)))
+
+#define PG_U_LU_MASK PG_U_CATEGORY_MASK(PG_U_UPPERCASE_LETTER)
+#define PG_U_LL_MASK PG_U_CATEGORY_MASK(PG_U_LOWERCASE_LETTER)
+#define PG_U_LT_MASK PG_U_CATEGORY_MASK(PG_U_TITLECASE_LETTER)
+#define PG_U_LC_MASK (PG_U_LU_MASK|PG_U_LL_MASK|PG_U_LT_MASK)
+#define PG_U_LM_MASK PG_U_CATEGORY_MASK(PG_U_MODIFIER_LETTER)
+#define PG_U_LO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_LETTER)
+#define PG_U_L_MASK (PG_U_LU_MASK|PG_U_LL_MASK|PG_U_LT_MASK|PG_U_LM_MASK|\
+					 PG_U_LO_MASK)
+#define PG_U_MN_MASK PG_U_CATEGORY_MASK(PG_U_NONSPACING_MARK)
+#define PG_U_ME_MASK PG_U_CATEGORY_MASK(PG_U_ENCLOSING_MARK)
+#define PG_U_MC_MASK PG_U_CATEGORY_MASK(PG_U_SPACING_MARK)
+#define PG_U_M_MASK (PG_U_MN_MASK|PG_U_MC_MASK|PG_U_ME_MASK)
+#define PG_U_ND_MASK PG_U_CATEGORY_MASK(PG_U_DECIMAL_NUMBER)
+#define PG_U_NL_MASK PG_U_CATEGORY_MASK(PG_U_LETTER_NUMBER)
+#define PG_U_NO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_NUMBER)
+#define PG_U_N_MASK (PG_U_ND_MASK|PG_U_NL_MASK|PG_U_NO_MASK)
+#define PG_U_PC_MASK PG_U_CATEGORY_MASK(PG_U_CONNECTOR_PUNCTUATION)
+#define PG_U_PD_MASK PG_U_CATEGORY_MASK(PG_U_DASH_PUNCTUATION)
+#define PG_U_PS_MASK PG_U_CATEGORY_MASK(PG_U_OPEN_PUNCTUATION)
+#define PG_U_PE_MASK PG_U_CATEGORY_MASK(PG_U_CLOSE_PUNCTUATION)
+#define PG_U_PI_MASK PG_U_CATEGORY_MASK(PG_U_INITIAL_PUNCTUATION)
+#define PG_U_PF_MASK PG_U_CATEGORY_MASK(PG_U_FINAL_PUNCTUATION)
+#define PG_U_PO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_PUNCTUATION)
+#define PG_U_P_MASK (PG_U_PC_MASK|PG_U_PD_MASK|PG_U_PS_MASK|PG_U_PE_MASK|\
+					 PG_U_PI_MASK|PG_U_PF_MASK|PG_U_PO_MASK)
+#define PG_U_SM_MASK PG_U_CATEGORY_MASK(PG_U_MATH_SYMBOL)
+#define PG_U_SC_MASK PG_U_CATEGORY_MASK(PG_U_CURRENCY_SYMBOL)
+#define PG_U_SK_MASK PG_U_CATEGORY_MASK(PG_U_MODIFIER_SYMBOL)
+#define PG_U_SO_MASK PG_U_CATEGORY_MASK(PG_U_OTHER_SYMBOL)
+#define PG_U_S_MASK (PG_U_SM_MASK|PG_U_SC_MASK|PG_U_SK_MASK|PG_U_SO_MASK)
+#define PG_U_ZS_MASK PG_U_CATEGORY_MASK(PG_U_SPACE_SEPARATOR)
+#define PG_U_ZL_MASK PG_U_CATEGORY_MASK(PG_U_LINE_SEPARATOR)
+#define PG_U_ZP_MASK PG_U_CATEGORY_MASK(PG_U_PARAGRAPH_SEPARATOR)
+#define PG_U_Z_MASK (PG_U_ZS_MASK|PG_U_ZL_MASK|PG_U_ZP_MASK)
+#define PG_U_CC_MASK PG_U_CATEGORY_MASK(PG_U_CONTROL)
+#define PG_U_CF_MASK PG_U_CATEGORY_MASK(PG_U_FORMAT)
+#define PG_U_CS_MASK PG_U_CATEGORY_MASK(PG_U_SURROGATE)
+#define PG_U_CO_MASK PG_U_CATEGORY_MASK(PG_U_PRIVATE_USE)
+#define PG_U_CN_MASK PG_U_CATEGORY_MASK(PG_U_UNASSIGNED)
+#define PG_U_C_MASK (PG_U_CC_MASK|PG_U_CF_MASK|PG_U_CS_MASK|PG_U_CO_MASK|\
+					 PG_U_CN_MASK)
+
+#define PG_U_CHARACTER_TAB	0x09
+
+static bool range_search(const pg_unicode_range * tbl, Size size,
+						 pg_wchar code);
+
 /*
  * Unicode general category for the given codepoint.
  */
 pg_unicode_category
-unicode_category(pg_wchar ucs)
+unicode_category(pg_wchar code)
 {
 	int			min = 0;
 	int			mid;
 	int			max = lengthof(unicode_categories) - 1;
 
-	Assert(ucs <= 0x10ffff);
+	Assert(code <= 0x10ffff);
 
 	while (max >= min)
 	{
 		mid = (min + max) / 2;
-		if (ucs > unicode_categories[mid].last)
+		if (code > unicode_categories[mid].last)
 			min = mid + 1;
-		else if (ucs < unicode_categories[mid].first)
+		else if (code < unicode_categories[mid].first)
 			max = mid - 1;
 		else
 			return unicode_categories[mid].category;
@@ -44,6 +100,179 @@ unicode_category(pg_wchar ucs)
 	return PG_U_UNASSIGNED;
 }
 
+bool
+pg_u_prop_alphabetic(pg_wchar code)
+{
+	return range_search(unicode_alphabetic, lengthof(unicode_alphabetic),
+						code);
+}
+
+bool
+pg_u_prop_lowercase(pg_wchar code)
+{
+	return range_search(unicode_lowercase, lengthof(unicode_lowercase), code);
+}
+
+bool
+pg_u_prop_uppercase(pg_wchar code)
+{
+	return range_search(unicode_uppercase, lengthof(unicode_uppercase), code);
+}
+
+bool
+pg_u_prop_white_space(pg_wchar code)
+{
+	return range_search(unicode_white_space, lengthof(unicode_white_space),
+						code);
+}
+
+bool
+pg_u_prop_hex_digit(pg_wchar code)
+{
+	return range_search(unicode_hex_digit, lengthof(unicode_hex_digit), code);
+}
+
+bool
+pg_u_prop_join_control(pg_wchar code)
+{
+	return range_search(unicode_join_control, lengthof(unicode_join_control),
+						code);
+}
+
+/*
+ * The following functions implement the regex character classification as
+ * described at: http://www.unicode.org/reports/tr18/#Compatibility_Properties
+ */
+
+bool
+pg_u_isdigit(pg_wchar code)
+{
+	return unicode_category(code) == PG_U_DECIMAL_NUMBER;
+}
+
+bool
+pg_u_isdigit_posix(pg_wchar code)
+{
+	return ('0' <= code && code <= '9');
+}
+
+bool
+pg_u_isalpha(pg_wchar code)
+{
+	return pg_u_prop_alphabetic(code);
+}
+
+bool
+pg_u_isalnum(pg_wchar code)
+{
+	return pg_u_isalpha(code) || pg_u_isdigit(code);
+}
+
+bool
+pg_u_isalnum_posix(pg_wchar code)
+{
+	return pg_u_isalpha(code) || pg_u_isdigit_posix(code);
+}
+
+bool
+pg_u_isword(pg_wchar code)
+{
+	uint32 category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+
+	return
+		category_mask & (PG_U_M_MASK|PG_U_ND_MASK|PG_U_PC_MASK) ||
+		pg_u_isalpha(code) ||
+		pg_u_prop_join_control(code);
+}
+
+bool
+pg_u_isupper(pg_wchar code)
+{
+	return pg_u_prop_uppercase(code);
+}
+
+bool
+pg_u_islower(pg_wchar code)
+{
+	return pg_u_prop_lowercase(code);
+}
+
+bool
+pg_u_isblank(pg_wchar code)
+{
+	return code == PG_U_CHARACTER_TAB ||
+		unicode_category(code) == PG_U_SPACE_SEPARATOR;
+}
+
+bool
+pg_u_iscntrl(pg_wchar code)
+{
+	return unicode_category(code) == PG_U_CONTROL;
+}
+
+bool
+pg_u_isgraph(pg_wchar code)
+{
+	uint32 category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+
+	if (category_mask & (PG_U_CC_MASK|PG_U_CS_MASK|PG_U_CN_MASK) ||
+		pg_u_isspace(code))
+		return false;
+	return true;
+}
+
+bool
+pg_u_isprint(pg_wchar code)
+{
+	pg_unicode_category category = unicode_category(code);
+
+	if (category == PG_U_CONTROL)
+		return false;
+
+	return pg_u_isgraph(code) || pg_u_isblank(code);
+}
+
+bool
+pg_u_ispunct(pg_wchar code)
+{
+	uint32 category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+
+	return category_mask & PG_U_P_MASK;
+}
+
+bool
+pg_u_ispunct_posix(pg_wchar code)
+{
+	uint32 category_mask;
+
+	if (pg_u_isalpha(code))
+		return false;
+
+	category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
+	return category_mask & (PG_U_P_MASK|PG_U_S_MASK);
+}
+
+bool
+pg_u_isspace(pg_wchar code)
+{
+	return pg_u_prop_white_space(code);
+}
+
+bool
+pg_u_isxdigit(pg_wchar code)
+{
+	return unicode_category(code) == PG_U_DECIMAL_NUMBER ||
+		pg_u_prop_hex_digit(code);
+}
+
+bool
+pg_u_isxdigit_posix(pg_wchar code)
+{
+	return (('0' <= code && code <= '9') ||
+			('A' <= code && code <= 'F') ||
+			('a' <= code && code <= 'f'));
+}
+
 /*
  * Description of Unicode general category.
  */
@@ -191,3 +420,30 @@ unicode_category_abbrev(pg_unicode_category category)
 	Assert(false);
 	return "??";				/* keep compiler quiet */
 }
+
+/*
+ * Binary search to test if given codepoint exists in one of the ranges in the
+ * given table.
+ */
+static bool
+range_search(const pg_unicode_range * tbl, Size size, pg_wchar code)
+{
+	int			min = 0;
+	int			mid;
+	int			max = size - 1;
+
+	Assert(code <= 0x10ffff);
+
+	while (max >= min)
+	{
+		mid = (min + max) / 2;
+		if (code > tbl[mid].last)
+			min = mid + 1;
+		else if (code < tbl[mid].first)
+			max = mid - 1;
+		else
+			return true;
+	}
+
+	return false;
+}
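
Reviewer's sketch, not part of the patch: range_search() above is a plain binary search over a sorted table of non-overlapping code point ranges. The fragment below reproduces that lookup outside the tree with a small hand-copied table of the Hex_Digit ranges. The demo_* names are hypothetical; the real tables are generated into unicode_category_table.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
	uint32_t	first;			/* first code point in the range */
	uint32_t	last;			/* last code point in the range */
} demo_range;

/* Hex_Digit ranges, hand-copied for illustration; sorted, non-overlapping */
static const demo_range demo_hex_digit[] = {
	{0x000030, 0x000039},
	{0x000041, 0x000046},
	{0x000061, 0x000066},
	{0x00ff10, 0x00ff19},
	{0x00ff21, 0x00ff26},
	{0x00ff41, 0x00ff46}
};

/* Binary search: is the code point inside any range of the table? */
static bool
demo_range_search(const demo_range *tbl, size_t size, uint32_t code)
{
	int			min = 0;
	int			max = (int) size - 1;

	assert(code <= 0x10ffff);

	while (max >= min)
	{
		int			mid = min + (max - min) / 2;

		if (code > tbl[mid].last)
			min = mid + 1;
		else if (code < tbl[mid].first)
			max = mid - 1;
		else
			return true;
	}

	return false;
}

static bool
demo_is_hex_digit(uint32_t code)
{
	return demo_range_search(demo_hex_digit,
							 sizeof(demo_hex_digit) / sizeof(demo_hex_digit[0]),
							 code);
}
```

The search only works if the table stays sorted by first code point, which the generator gets for free as long as the input files list ranges in ascending order per property.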
diff --git a/src/include/common/unicode_category.h b/src/include/common/unicode_category.h
index 81d38c7411..7d9ef2b496 100644
--- a/src/include/common/unicode_category.h
+++ b/src/include/common/unicode_category.h
@@ -62,7 +62,32 @@ typedef enum pg_unicode_category
 } pg_unicode_category;
 
 extern pg_unicode_category unicode_category(pg_wchar ucs);
-const char *unicode_category_string(pg_unicode_category category);
-const char *unicode_category_abbrev(pg_unicode_category category);
+extern const char *unicode_category_string(pg_unicode_category category);
+extern const char *unicode_category_abbrev(pg_unicode_category category);
+
+extern bool pg_u_prop_alphabetic(pg_wchar c);
+extern bool pg_u_prop_lowercase(pg_wchar c);
+extern bool pg_u_prop_uppercase(pg_wchar c);
+extern bool pg_u_prop_white_space(pg_wchar c);
+extern bool pg_u_prop_hex_digit(pg_wchar c);
+extern bool pg_u_prop_join_control(pg_wchar c);
+
+extern bool pg_u_isdigit(pg_wchar c);
+extern bool pg_u_isdigit_posix(pg_wchar c);
+extern bool pg_u_isalpha(pg_wchar c);
+extern bool pg_u_isalnum(pg_wchar c);
+extern bool pg_u_isalnum_posix(pg_wchar c);
+extern bool pg_u_isword(pg_wchar c);
+extern bool pg_u_isupper(pg_wchar c);
+extern bool pg_u_islower(pg_wchar c);
+extern bool pg_u_isblank(pg_wchar c);
+extern bool pg_u_iscntrl(pg_wchar c);
+extern bool pg_u_isgraph(pg_wchar c);
+extern bool pg_u_isprint(pg_wchar c);
+extern bool pg_u_ispunct(pg_wchar c);
+extern bool pg_u_ispunct_posix(pg_wchar c);
+extern bool pg_u_isspace(pg_wchar c);
+extern bool pg_u_isxdigit(pg_wchar c);
+extern bool pg_u_isxdigit_posix(pg_wchar c);
 
 #endif							/* UNICODE_CATEGORY_H */
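
Reviewer's sketch, not part of the patch: the classification functions declared above test membership in several general categories at once by turning the category number into a single bit and ANDing it with a precomputed mask. The fragment below shows that technique with a reduced, hypothetical category enum (the DEMO_* and demo_* names are not in the tree). It also shows why the _posix punctuation variant accepts '$', whose general category is Currency_Symbol, while the standard variant does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced, hypothetical subset of the general categories */
typedef enum
{
	DEMO_UNASSIGNED = 0,
	DEMO_UPPERCASE_LETTER,
	DEMO_DECIMAL_NUMBER,
	DEMO_DASH_PUNCTUATION,
	DEMO_OTHER_PUNCTUATION,
	DEMO_MATH_SYMBOL,
	DEMO_CURRENCY_SYMBOL
} demo_category;

/* One bit per category; fits in 32 bits while there are fewer than 32 */
#define DEMO_MASK(X) ((uint32_t) 1 << (X))

#define DEMO_P_MASK (DEMO_MASK(DEMO_DASH_PUNCTUATION) | \
					 DEMO_MASK(DEMO_OTHER_PUNCTUATION))
#define DEMO_S_MASK (DEMO_MASK(DEMO_MATH_SYMBOL) | \
					 DEMO_MASK(DEMO_CURRENCY_SYMBOL))

/* Standard variant: punctuation categories only */
static bool
demo_category_is_punct(demo_category category)
{
	assert((int) category >= 0 && (int) category < 32);

	return (DEMO_MASK(category) & DEMO_P_MASK) != 0;
}

/* POSIX-compatible variant: punctuation and symbol categories */
static bool
demo_category_is_punct_posix(demo_category category)
{
	assert((int) category >= 0 && (int) category < 32);

	return (DEMO_MASK(category) & (DEMO_P_MASK | DEMO_S_MASK)) != 0;
}
```

The sketch works on categories only. The patch's pg_u_ispunct_posix() additionally rejects any code point that has the Alphabetic property before it consults the mask.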
diff --git a/src/include/common/unicode_category_table.h b/src/include/common/unicode_category_table.h
index 14f1ea0677..86cdc9c0ed 100644
--- a/src/include/common/unicode_category_table.h
+++ b/src/include/common/unicode_category_table.h
@@ -25,6 +25,12 @@ typedef struct
 	uint8		category;		/* General Category */
 }			pg_category_range;
 
+typedef struct
+{
+	uint32		first;			/* Unicode codepoint */
+	uint32		last;			/* Unicode codepoint */
+}			pg_unicode_range;
+
 /* table of Unicode codepoint ranges and their categories */
 static const pg_category_range unicode_categories[3302] =
 {
@@ -3331,3 +3337,2529 @@ static const pg_category_range unicode_categories[3302] =
 	{0x0f0000, 0x0ffffd, PG_U_PRIVATE_USE},
 	{0x100000, 0x10fffd, PG_U_PRIVATE_USE}
 };
+
+/* table of Unicode codepoint ranges of Alphabetic characters */
+static const pg_unicode_range unicode_alphabetic[1141] =
+{
+	{0x000041, 0x00005a},
+	{0x000061, 0x00007a},
+	{0x0000aa, 0x0000aa},
+	{0x0000b5, 0x0000b5},
+	{0x0000ba, 0x0000ba},
+	{0x0000c0, 0x0000d6},
+	{0x0000d8, 0x0000f6},
+	{0x0000f8, 0x0001ba},
+	{0x0001bb, 0x0001bb},
+	{0x0001bc, 0x0001bf},
+	{0x0001c0, 0x0001c3},
+	{0x0001c4, 0x000293},
+	{0x000294, 0x000294},
+	{0x000295, 0x0002af},
+	{0x0002b0, 0x0002c1},
+	{0x0002c6, 0x0002d1},
+	{0x0002e0, 0x0002e4},
+	{0x0002ec, 0x0002ec},
+	{0x0002ee, 0x0002ee},
+	{0x000345, 0x000345},
+	{0x000370, 0x000373},
+	{0x000374, 0x000374},
+	{0x000376, 0x000377},
+	{0x00037a, 0x00037a},
+	{0x00037b, 0x00037d},
+	{0x00037f, 0x00037f},
+	{0x000386, 0x000386},
+	{0x000388, 0x00038a},
+	{0x00038c, 0x00038c},
+	{0x00038e, 0x0003a1},
+	{0x0003a3, 0x0003f5},
+	{0x0003f7, 0x000481},
+	{0x00048a, 0x00052f},
+	{0x000531, 0x000556},
+	{0x000559, 0x000559},
+	{0x000560, 0x000588},
+	{0x0005b0, 0x0005bd},
+	{0x0005bf, 0x0005bf},
+	{0x0005c1, 0x0005c2},
+	{0x0005c4, 0x0005c5},
+	{0x0005c7, 0x0005c7},
+	{0x0005d0, 0x0005ea},
+	{0x0005ef, 0x0005f2},
+	{0x000610, 0x00061a},
+	{0x000620, 0x00063f},
+	{0x000640, 0x000640},
+	{0x000641, 0x00064a},
+	{0x00064b, 0x000657},
+	{0x000659, 0x00065f},
+	{0x00066e, 0x00066f},
+	{0x000670, 0x000670},
+	{0x000671, 0x0006d3},
+	{0x0006d5, 0x0006d5},
+	{0x0006d6, 0x0006dc},
+	{0x0006e1, 0x0006e4},
+	{0x0006e5, 0x0006e6},
+	{0x0006e7, 0x0006e8},
+	{0x0006ed, 0x0006ed},
+	{0x0006ee, 0x0006ef},
+	{0x0006fa, 0x0006fc},
+	{0x0006ff, 0x0006ff},
+	{0x000710, 0x000710},
+	{0x000711, 0x000711},
+	{0x000712, 0x00072f},
+	{0x000730, 0x00073f},
+	{0x00074d, 0x0007a5},
+	{0x0007a6, 0x0007b0},
+	{0x0007b1, 0x0007b1},
+	{0x0007ca, 0x0007ea},
+	{0x0007f4, 0x0007f5},
+	{0x0007fa, 0x0007fa},
+	{0x000800, 0x000815},
+	{0x000816, 0x000817},
+	{0x00081a, 0x00081a},
+	{0x00081b, 0x000823},
+	{0x000824, 0x000824},
+	{0x000825, 0x000827},
+	{0x000828, 0x000828},
+	{0x000829, 0x00082c},
+	{0x000840, 0x000858},
+	{0x000860, 0x00086a},
+	{0x000870, 0x000887},
+	{0x000889, 0x00088e},
+	{0x0008a0, 0x0008c8},
+	{0x0008c9, 0x0008c9},
+	{0x0008d4, 0x0008df},
+	{0x0008e3, 0x0008e9},
+	{0x0008f0, 0x000902},
+	{0x000903, 0x000903},
+	{0x000904, 0x000939},
+	{0x00093a, 0x00093a},
+	{0x00093b, 0x00093b},
+	{0x00093d, 0x00093d},
+	{0x00093e, 0x000940},
+	{0x000941, 0x000948},
+	{0x000949, 0x00094c},
+	{0x00094e, 0x00094f},
+	{0x000950, 0x000950},
+	{0x000955, 0x000957},
+	{0x000958, 0x000961},
+	{0x000962, 0x000963},
+	{0x000971, 0x000971},
+	{0x000972, 0x000980},
+	{0x000981, 0x000981},
+	{0x000982, 0x000983},
+	{0x000985, 0x00098c},
+	{0x00098f, 0x000990},
+	{0x000993, 0x0009a8},
+	{0x0009aa, 0x0009b0},
+	{0x0009b2, 0x0009b2},
+	{0x0009b6, 0x0009b9},
+	{0x0009bd, 0x0009bd},
+	{0x0009be, 0x0009c0},
+	{0x0009c1, 0x0009c4},
+	{0x0009c7, 0x0009c8},
+	{0x0009cb, 0x0009cc},
+	{0x0009ce, 0x0009ce},
+	{0x0009d7, 0x0009d7},
+	{0x0009dc, 0x0009dd},
+	{0x0009df, 0x0009e1},
+	{0x0009e2, 0x0009e3},
+	{0x0009f0, 0x0009f1},
+	{0x0009fc, 0x0009fc},
+	{0x000a01, 0x000a02},
+	{0x000a03, 0x000a03},
+	{0x000a05, 0x000a0a},
+	{0x000a0f, 0x000a10},
+	{0x000a13, 0x000a28},
+	{0x000a2a, 0x000a30},
+	{0x000a32, 0x000a33},
+	{0x000a35, 0x000a36},
+	{0x000a38, 0x000a39},
+	{0x000a3e, 0x000a40},
+	{0x000a41, 0x000a42},
+	{0x000a47, 0x000a48},
+	{0x000a4b, 0x000a4c},
+	{0x000a51, 0x000a51},
+	{0x000a59, 0x000a5c},
+	{0x000a5e, 0x000a5e},
+	{0x000a70, 0x000a71},
+	{0x000a72, 0x000a74},
+	{0x000a75, 0x000a75},
+	{0x000a81, 0x000a82},
+	{0x000a83, 0x000a83},
+	{0x000a85, 0x000a8d},
+	{0x000a8f, 0x000a91},
+	{0x000a93, 0x000aa8},
+	{0x000aaa, 0x000ab0},
+	{0x000ab2, 0x000ab3},
+	{0x000ab5, 0x000ab9},
+	{0x000abd, 0x000abd},
+	{0x000abe, 0x000ac0},
+	{0x000ac1, 0x000ac5},
+	{0x000ac7, 0x000ac8},
+	{0x000ac9, 0x000ac9},
+	{0x000acb, 0x000acc},
+	{0x000ad0, 0x000ad0},
+	{0x000ae0, 0x000ae1},
+	{0x000ae2, 0x000ae3},
+	{0x000af9, 0x000af9},
+	{0x000afa, 0x000afc},
+	{0x000b01, 0x000b01},
+	{0x000b02, 0x000b03},
+	{0x000b05, 0x000b0c},
+	{0x000b0f, 0x000b10},
+	{0x000b13, 0x000b28},
+	{0x000b2a, 0x000b30},
+	{0x000b32, 0x000b33},
+	{0x000b35, 0x000b39},
+	{0x000b3d, 0x000b3d},
+	{0x000b3e, 0x000b3e},
+	{0x000b3f, 0x000b3f},
+	{0x000b40, 0x000b40},
+	{0x000b41, 0x000b44},
+	{0x000b47, 0x000b48},
+	{0x000b4b, 0x000b4c},
+	{0x000b56, 0x000b56},
+	{0x000b57, 0x000b57},
+	{0x000b5c, 0x000b5d},
+	{0x000b5f, 0x000b61},
+	{0x000b62, 0x000b63},
+	{0x000b71, 0x000b71},
+	{0x000b82, 0x000b82},
+	{0x000b83, 0x000b83},
+	{0x000b85, 0x000b8a},
+	{0x000b8e, 0x000b90},
+	{0x000b92, 0x000b95},
+	{0x000b99, 0x000b9a},
+	{0x000b9c, 0x000b9c},
+	{0x000b9e, 0x000b9f},
+	{0x000ba3, 0x000ba4},
+	{0x000ba8, 0x000baa},
+	{0x000bae, 0x000bb9},
+	{0x000bbe, 0x000bbf},
+	{0x000bc0, 0x000bc0},
+	{0x000bc1, 0x000bc2},
+	{0x000bc6, 0x000bc8},
+	{0x000bca, 0x000bcc},
+	{0x000bd0, 0x000bd0},
+	{0x000bd7, 0x000bd7},
+	{0x000c00, 0x000c00},
+	{0x000c01, 0x000c03},
+	{0x000c04, 0x000c04},
+	{0x000c05, 0x000c0c},
+	{0x000c0e, 0x000c10},
+	{0x000c12, 0x000c28},
+	{0x000c2a, 0x000c39},
+	{0x000c3d, 0x000c3d},
+	{0x000c3e, 0x000c40},
+	{0x000c41, 0x000c44},
+	{0x000c46, 0x000c48},
+	{0x000c4a, 0x000c4c},
+	{0x000c55, 0x000c56},
+	{0x000c58, 0x000c5a},
+	{0x000c5d, 0x000c5d},
+	{0x000c60, 0x000c61},
+	{0x000c62, 0x000c63},
+	{0x000c80, 0x000c80},
+	{0x000c81, 0x000c81},
+	{0x000c82, 0x000c83},
+	{0x000c85, 0x000c8c},
+	{0x000c8e, 0x000c90},
+	{0x000c92, 0x000ca8},
+	{0x000caa, 0x000cb3},
+	{0x000cb5, 0x000cb9},
+	{0x000cbd, 0x000cbd},
+	{0x000cbe, 0x000cbe},
+	{0x000cbf, 0x000cbf},
+	{0x000cc0, 0x000cc4},
+	{0x000cc6, 0x000cc6},
+	{0x000cc7, 0x000cc8},
+	{0x000cca, 0x000ccb},
+	{0x000ccc, 0x000ccc},
+	{0x000cd5, 0x000cd6},
+	{0x000cdd, 0x000cde},
+	{0x000ce0, 0x000ce1},
+	{0x000ce2, 0x000ce3},
+	{0x000cf1, 0x000cf2},
+	{0x000cf3, 0x000cf3},
+	{0x000d00, 0x000d01},
+	{0x000d02, 0x000d03},
+	{0x000d04, 0x000d0c},
+	{0x000d0e, 0x000d10},
+	{0x000d12, 0x000d3a},
+	{0x000d3d, 0x000d3d},
+	{0x000d3e, 0x000d40},
+	{0x000d41, 0x000d44},
+	{0x000d46, 0x000d48},
+	{0x000d4a, 0x000d4c},
+	{0x000d4e, 0x000d4e},
+	{0x000d54, 0x000d56},
+	{0x000d57, 0x000d57},
+	{0x000d5f, 0x000d61},
+	{0x000d62, 0x000d63},
+	{0x000d7a, 0x000d7f},
+	{0x000d81, 0x000d81},
+	{0x000d82, 0x000d83},
+	{0x000d85, 0x000d96},
+	{0x000d9a, 0x000db1},
+	{0x000db3, 0x000dbb},
+	{0x000dbd, 0x000dbd},
+	{0x000dc0, 0x000dc6},
+	{0x000dcf, 0x000dd1},
+	{0x000dd2, 0x000dd4},
+	{0x000dd6, 0x000dd6},
+	{0x000dd8, 0x000ddf},
+	{0x000df2, 0x000df3},
+	{0x000e01, 0x000e30},
+	{0x000e31, 0x000e31},
+	{0x000e32, 0x000e33},
+	{0x000e34, 0x000e3a},
+	{0x000e40, 0x000e45},
+	{0x000e46, 0x000e46},
+	{0x000e4d, 0x000e4d},
+	{0x000e81, 0x000e82},
+	{0x000e84, 0x000e84},
+	{0x000e86, 0x000e8a},
+	{0x000e8c, 0x000ea3},
+	{0x000ea5, 0x000ea5},
+	{0x000ea7, 0x000eb0},
+	{0x000eb1, 0x000eb1},
+	{0x000eb2, 0x000eb3},
+	{0x000eb4, 0x000eb9},
+	{0x000ebb, 0x000ebc},
+	{0x000ebd, 0x000ebd},
+	{0x000ec0, 0x000ec4},
+	{0x000ec6, 0x000ec6},
+	{0x000ecd, 0x000ecd},
+	{0x000edc, 0x000edf},
+	{0x000f00, 0x000f00},
+	{0x000f40, 0x000f47},
+	{0x000f49, 0x000f6c},
+	{0x000f71, 0x000f7e},
+	{0x000f7f, 0x000f7f},
+	{0x000f80, 0x000f83},
+	{0x000f88, 0x000f8c},
+	{0x000f8d, 0x000f97},
+	{0x000f99, 0x000fbc},
+	{0x001000, 0x00102a},
+	{0x00102b, 0x00102c},
+	{0x00102d, 0x001030},
+	{0x001031, 0x001031},
+	{0x001032, 0x001036},
+	{0x001038, 0x001038},
+	{0x00103b, 0x00103c},
+	{0x00103d, 0x00103e},
+	{0x00103f, 0x00103f},
+	{0x001050, 0x001055},
+	{0x001056, 0x001057},
+	{0x001058, 0x001059},
+	{0x00105a, 0x00105d},
+	{0x00105e, 0x001060},
+	{0x001061, 0x001061},
+	{0x001062, 0x001064},
+	{0x001065, 0x001066},
+	{0x001067, 0x00106d},
+	{0x00106e, 0x001070},
+	{0x001071, 0x001074},
+	{0x001075, 0x001081},
+	{0x001082, 0x001082},
+	{0x001083, 0x001084},
+	{0x001085, 0x001086},
+	{0x001087, 0x00108c},
+	{0x00108d, 0x00108d},
+	{0x00108e, 0x00108e},
+	{0x00108f, 0x00108f},
+	{0x00109a, 0x00109c},
+	{0x00109d, 0x00109d},
+	{0x0010a0, 0x0010c5},
+	{0x0010c7, 0x0010c7},
+	{0x0010cd, 0x0010cd},
+	{0x0010d0, 0x0010fa},
+	{0x0010fc, 0x0010fc},
+	{0x0010fd, 0x0010ff},
+	{0x001100, 0x001248},
+	{0x00124a, 0x00124d},
+	{0x001250, 0x001256},
+	{0x001258, 0x001258},
+	{0x00125a, 0x00125d},
+	{0x001260, 0x001288},
+	{0x00128a, 0x00128d},
+	{0x001290, 0x0012b0},
+	{0x0012b2, 0x0012b5},
+	{0x0012b8, 0x0012be},
+	{0x0012c0, 0x0012c0},
+	{0x0012c2, 0x0012c5},
+	{0x0012c8, 0x0012d6},
+	{0x0012d8, 0x001310},
+	{0x001312, 0x001315},
+	{0x001318, 0x00135a},
+	{0x001380, 0x00138f},
+	{0x0013a0, 0x0013f5},
+	{0x0013f8, 0x0013fd},
+	{0x001401, 0x00166c},
+	{0x00166f, 0x00167f},
+	{0x001681, 0x00169a},
+	{0x0016a0, 0x0016ea},
+	{0x0016ee, 0x0016f0},
+	{0x0016f1, 0x0016f8},
+	{0x001700, 0x001711},
+	{0x001712, 0x001713},
+	{0x00171f, 0x001731},
+	{0x001732, 0x001733},
+	{0x001740, 0x001751},
+	{0x001752, 0x001753},
+	{0x001760, 0x00176c},
+	{0x00176e, 0x001770},
+	{0x001772, 0x001773},
+	{0x001780, 0x0017b3},
+	{0x0017b6, 0x0017b6},
+	{0x0017b7, 0x0017bd},
+	{0x0017be, 0x0017c5},
+	{0x0017c6, 0x0017c6},
+	{0x0017c7, 0x0017c8},
+	{0x0017d7, 0x0017d7},
+	{0x0017dc, 0x0017dc},
+	{0x001820, 0x001842},
+	{0x001843, 0x001843},
+	{0x001844, 0x001878},
+	{0x001880, 0x001884},
+	{0x001885, 0x001886},
+	{0x001887, 0x0018a8},
+	{0x0018a9, 0x0018a9},
+	{0x0018aa, 0x0018aa},
+	{0x0018b0, 0x0018f5},
+	{0x001900, 0x00191e},
+	{0x001920, 0x001922},
+	{0x001923, 0x001926},
+	{0x001927, 0x001928},
+	{0x001929, 0x00192b},
+	{0x001930, 0x001931},
+	{0x001932, 0x001932},
+	{0x001933, 0x001938},
+	{0x001950, 0x00196d},
+	{0x001970, 0x001974},
+	{0x001980, 0x0019ab},
+	{0x0019b0, 0x0019c9},
+	{0x001a00, 0x001a16},
+	{0x001a17, 0x001a18},
+	{0x001a19, 0x001a1a},
+	{0x001a1b, 0x001a1b},
+	{0x001a20, 0x001a54},
+	{0x001a55, 0x001a55},
+	{0x001a56, 0x001a56},
+	{0x001a57, 0x001a57},
+	{0x001a58, 0x001a5e},
+	{0x001a61, 0x001a61},
+	{0x001a62, 0x001a62},
+	{0x001a63, 0x001a64},
+	{0x001a65, 0x001a6c},
+	{0x001a6d, 0x001a72},
+	{0x001a73, 0x001a74},
+	{0x001aa7, 0x001aa7},
+	{0x001abf, 0x001ac0},
+	{0x001acc, 0x001ace},
+	{0x001b00, 0x001b03},
+	{0x001b04, 0x001b04},
+	{0x001b05, 0x001b33},
+	{0x001b35, 0x001b35},
+	{0x001b36, 0x001b3a},
+	{0x001b3b, 0x001b3b},
+	{0x001b3c, 0x001b3c},
+	{0x001b3d, 0x001b41},
+	{0x001b42, 0x001b42},
+	{0x001b43, 0x001b43},
+	{0x001b45, 0x001b4c},
+	{0x001b80, 0x001b81},
+	{0x001b82, 0x001b82},
+	{0x001b83, 0x001ba0},
+	{0x001ba1, 0x001ba1},
+	{0x001ba2, 0x001ba5},
+	{0x001ba6, 0x001ba7},
+	{0x001ba8, 0x001ba9},
+	{0x001bac, 0x001bad},
+	{0x001bae, 0x001baf},
+	{0x001bba, 0x001be5},
+	{0x001be7, 0x001be7},
+	{0x001be8, 0x001be9},
+	{0x001bea, 0x001bec},
+	{0x001bed, 0x001bed},
+	{0x001bee, 0x001bee},
+	{0x001bef, 0x001bf1},
+	{0x001c00, 0x001c23},
+	{0x001c24, 0x001c2b},
+	{0x001c2c, 0x001c33},
+	{0x001c34, 0x001c35},
+	{0x001c36, 0x001c36},
+	{0x001c4d, 0x001c4f},
+	{0x001c5a, 0x001c77},
+	{0x001c78, 0x001c7d},
+	{0x001c80, 0x001c88},
+	{0x001c90, 0x001cba},
+	{0x001cbd, 0x001cbf},
+	{0x001ce9, 0x001cec},
+	{0x001cee, 0x001cf3},
+	{0x001cf5, 0x001cf6},
+	{0x001cfa, 0x001cfa},
+	{0x001d00, 0x001d2b},
+	{0x001d2c, 0x001d6a},
+	{0x001d6b, 0x001d77},
+	{0x001d78, 0x001d78},
+	{0x001d79, 0x001d9a},
+	{0x001d9b, 0x001dbf},
+	{0x001de7, 0x001df4},
+	{0x001e00, 0x001f15},
+	{0x001f18, 0x001f1d},
+	{0x001f20, 0x001f45},
+	{0x001f48, 0x001f4d},
+	{0x001f50, 0x001f57},
+	{0x001f59, 0x001f59},
+	{0x001f5b, 0x001f5b},
+	{0x001f5d, 0x001f5d},
+	{0x001f5f, 0x001f7d},
+	{0x001f80, 0x001fb4},
+	{0x001fb6, 0x001fbc},
+	{0x001fbe, 0x001fbe},
+	{0x001fc2, 0x001fc4},
+	{0x001fc6, 0x001fcc},
+	{0x001fd0, 0x001fd3},
+	{0x001fd6, 0x001fdb},
+	{0x001fe0, 0x001fec},
+	{0x001ff2, 0x001ff4},
+	{0x001ff6, 0x001ffc},
+	{0x002071, 0x002071},
+	{0x00207f, 0x00207f},
+	{0x002090, 0x00209c},
+	{0x002102, 0x002102},
+	{0x002107, 0x002107},
+	{0x00210a, 0x002113},
+	{0x002115, 0x002115},
+	{0x002119, 0x00211d},
+	{0x002124, 0x002124},
+	{0x002126, 0x002126},
+	{0x002128, 0x002128},
+	{0x00212a, 0x00212d},
+	{0x00212f, 0x002134},
+	{0x002135, 0x002138},
+	{0x002139, 0x002139},
+	{0x00213c, 0x00213f},
+	{0x002145, 0x002149},
+	{0x00214e, 0x00214e},
+	{0x002160, 0x002182},
+	{0x002183, 0x002184},
+	{0x002185, 0x002188},
+	{0x0024b6, 0x0024e9},
+	{0x002c00, 0x002c7b},
+	{0x002c7c, 0x002c7d},
+	{0x002c7e, 0x002ce4},
+	{0x002ceb, 0x002cee},
+	{0x002cf2, 0x002cf3},
+	{0x002d00, 0x002d25},
+	{0x002d27, 0x002d27},
+	{0x002d2d, 0x002d2d},
+	{0x002d30, 0x002d67},
+	{0x002d6f, 0x002d6f},
+	{0x002d80, 0x002d96},
+	{0x002da0, 0x002da6},
+	{0x002da8, 0x002dae},
+	{0x002db0, 0x002db6},
+	{0x002db8, 0x002dbe},
+	{0x002dc0, 0x002dc6},
+	{0x002dc8, 0x002dce},
+	{0x002dd0, 0x002dd6},
+	{0x002dd8, 0x002dde},
+	{0x002de0, 0x002dff},
+	{0x002e2f, 0x002e2f},
+	{0x003005, 0x003005},
+	{0x003006, 0x003006},
+	{0x003007, 0x003007},
+	{0x003021, 0x003029},
+	{0x003031, 0x003035},
+	{0x003038, 0x00303a},
+	{0x00303b, 0x00303b},
+	{0x00303c, 0x00303c},
+	{0x003041, 0x003096},
+	{0x00309d, 0x00309e},
+	{0x00309f, 0x00309f},
+	{0x0030a1, 0x0030fa},
+	{0x0030fc, 0x0030fe},
+	{0x0030ff, 0x0030ff},
+	{0x003105, 0x00312f},
+	{0x003131, 0x00318e},
+	{0x0031a0, 0x0031bf},
+	{0x0031f0, 0x0031ff},
+	{0x003400, 0x004dbf},
+	{0x004e00, 0x00a014},
+	{0x00a015, 0x00a015},
+	{0x00a016, 0x00a48c},
+	{0x00a4d0, 0x00a4f7},
+	{0x00a4f8, 0x00a4fd},
+	{0x00a500, 0x00a60b},
+	{0x00a60c, 0x00a60c},
+	{0x00a610, 0x00a61f},
+	{0x00a62a, 0x00a62b},
+	{0x00a640, 0x00a66d},
+	{0x00a66e, 0x00a66e},
+	{0x00a674, 0x00a67b},
+	{0x00a67f, 0x00a67f},
+	{0x00a680, 0x00a69b},
+	{0x00a69c, 0x00a69d},
+	{0x00a69e, 0x00a69f},
+	{0x00a6a0, 0x00a6e5},
+	{0x00a6e6, 0x00a6ef},
+	{0x00a717, 0x00a71f},
+	{0x00a722, 0x00a76f},
+	{0x00a770, 0x00a770},
+	{0x00a771, 0x00a787},
+	{0x00a788, 0x00a788},
+	{0x00a78b, 0x00a78e},
+	{0x00a78f, 0x00a78f},
+	{0x00a790, 0x00a7ca},
+	{0x00a7d0, 0x00a7d1},
+	{0x00a7d3, 0x00a7d3},
+	{0x00a7d5, 0x00a7d9},
+	{0x00a7f2, 0x00a7f4},
+	{0x00a7f5, 0x00a7f6},
+	{0x00a7f7, 0x00a7f7},
+	{0x00a7f8, 0x00a7f9},
+	{0x00a7fa, 0x00a7fa},
+	{0x00a7fb, 0x00a801},
+	{0x00a802, 0x00a802},
+	{0x00a803, 0x00a805},
+	{0x00a807, 0x00a80a},
+	{0x00a80b, 0x00a80b},
+	{0x00a80c, 0x00a822},
+	{0x00a823, 0x00a824},
+	{0x00a825, 0x00a826},
+	{0x00a827, 0x00a827},
+	{0x00a840, 0x00a873},
+	{0x00a880, 0x00a881},
+	{0x00a882, 0x00a8b3},
+	{0x00a8b4, 0x00a8c3},
+	{0x00a8c5, 0x00a8c5},
+	{0x00a8f2, 0x00a8f7},
+	{0x00a8fb, 0x00a8fb},
+	{0x00a8fd, 0x00a8fe},
+	{0x00a8ff, 0x00a8ff},
+	{0x00a90a, 0x00a925},
+	{0x00a926, 0x00a92a},
+	{0x00a930, 0x00a946},
+	{0x00a947, 0x00a951},
+	{0x00a952, 0x00a952},
+	{0x00a960, 0x00a97c},
+	{0x00a980, 0x00a982},
+	{0x00a983, 0x00a983},
+	{0x00a984, 0x00a9b2},
+	{0x00a9b4, 0x00a9b5},
+	{0x00a9b6, 0x00a9b9},
+	{0x00a9ba, 0x00a9bb},
+	{0x00a9bc, 0x00a9bd},
+	{0x00a9be, 0x00a9bf},
+	{0x00a9cf, 0x00a9cf},
+	{0x00a9e0, 0x00a9e4},
+	{0x00a9e5, 0x00a9e5},
+	{0x00a9e6, 0x00a9e6},
+	{0x00a9e7, 0x00a9ef},
+	{0x00a9fa, 0x00a9fe},
+	{0x00aa00, 0x00aa28},
+	{0x00aa29, 0x00aa2e},
+	{0x00aa2f, 0x00aa30},
+	{0x00aa31, 0x00aa32},
+	{0x00aa33, 0x00aa34},
+	{0x00aa35, 0x00aa36},
+	{0x00aa40, 0x00aa42},
+	{0x00aa43, 0x00aa43},
+	{0x00aa44, 0x00aa4b},
+	{0x00aa4c, 0x00aa4c},
+	{0x00aa4d, 0x00aa4d},
+	{0x00aa60, 0x00aa6f},
+	{0x00aa70, 0x00aa70},
+	{0x00aa71, 0x00aa76},
+	{0x00aa7a, 0x00aa7a},
+	{0x00aa7b, 0x00aa7b},
+	{0x00aa7c, 0x00aa7c},
+	{0x00aa7d, 0x00aa7d},
+	{0x00aa7e, 0x00aaaf},
+	{0x00aab0, 0x00aab0},
+	{0x00aab1, 0x00aab1},
+	{0x00aab2, 0x00aab4},
+	{0x00aab5, 0x00aab6},
+	{0x00aab7, 0x00aab8},
+	{0x00aab9, 0x00aabd},
+	{0x00aabe, 0x00aabe},
+	{0x00aac0, 0x00aac0},
+	{0x00aac2, 0x00aac2},
+	{0x00aadb, 0x00aadc},
+	{0x00aadd, 0x00aadd},
+	{0x00aae0, 0x00aaea},
+	{0x00aaeb, 0x00aaeb},
+	{0x00aaec, 0x00aaed},
+	{0x00aaee, 0x00aaef},
+	{0x00aaf2, 0x00aaf2},
+	{0x00aaf3, 0x00aaf4},
+	{0x00aaf5, 0x00aaf5},
+	{0x00ab01, 0x00ab06},
+	{0x00ab09, 0x00ab0e},
+	{0x00ab11, 0x00ab16},
+	{0x00ab20, 0x00ab26},
+	{0x00ab28, 0x00ab2e},
+	{0x00ab30, 0x00ab5a},
+	{0x00ab5c, 0x00ab5f},
+	{0x00ab60, 0x00ab68},
+	{0x00ab69, 0x00ab69},
+	{0x00ab70, 0x00abbf},
+	{0x00abc0, 0x00abe2},
+	{0x00abe3, 0x00abe4},
+	{0x00abe5, 0x00abe5},
+	{0x00abe6, 0x00abe7},
+	{0x00abe8, 0x00abe8},
+	{0x00abe9, 0x00abea},
+	{0x00ac00, 0x00d7a3},
+	{0x00d7b0, 0x00d7c6},
+	{0x00d7cb, 0x00d7fb},
+	{0x00f900, 0x00fa6d},
+	{0x00fa70, 0x00fad9},
+	{0x00fb00, 0x00fb06},
+	{0x00fb13, 0x00fb17},
+	{0x00fb1d, 0x00fb1d},
+	{0x00fb1e, 0x00fb1e},
+	{0x00fb1f, 0x00fb28},
+	{0x00fb2a, 0x00fb36},
+	{0x00fb38, 0x00fb3c},
+	{0x00fb3e, 0x00fb3e},
+	{0x00fb40, 0x00fb41},
+	{0x00fb43, 0x00fb44},
+	{0x00fb46, 0x00fbb1},
+	{0x00fbd3, 0x00fd3d},
+	{0x00fd50, 0x00fd8f},
+	{0x00fd92, 0x00fdc7},
+	{0x00fdf0, 0x00fdfb},
+	{0x00fe70, 0x00fe74},
+	{0x00fe76, 0x00fefc},
+	{0x00ff21, 0x00ff3a},
+	{0x00ff41, 0x00ff5a},
+	{0x00ff66, 0x00ff6f},
+	{0x00ff70, 0x00ff70},
+	{0x00ff71, 0x00ff9d},
+	{0x00ff9e, 0x00ff9f},
+	{0x00ffa0, 0x00ffbe},
+	{0x00ffc2, 0x00ffc7},
+	{0x00ffca, 0x00ffcf},
+	{0x00ffd2, 0x00ffd7},
+	{0x00ffda, 0x00ffdc},
+	{0x010000, 0x01000b},
+	{0x01000d, 0x010026},
+	{0x010028, 0x01003a},
+	{0x01003c, 0x01003d},
+	{0x01003f, 0x01004d},
+	{0x010050, 0x01005d},
+	{0x010080, 0x0100fa},
+	{0x010140, 0x010174},
+	{0x010280, 0x01029c},
+	{0x0102a0, 0x0102d0},
+	{0x010300, 0x01031f},
+	{0x01032d, 0x010340},
+	{0x010341, 0x010341},
+	{0x010342, 0x010349},
+	{0x01034a, 0x01034a},
+	{0x010350, 0x010375},
+	{0x010376, 0x01037a},
+	{0x010380, 0x01039d},
+	{0x0103a0, 0x0103c3},
+	{0x0103c8, 0x0103cf},
+	{0x0103d1, 0x0103d5},
+	{0x010400, 0x01044f},
+	{0x010450, 0x01049d},
+	{0x0104b0, 0x0104d3},
+	{0x0104d8, 0x0104fb},
+	{0x010500, 0x010527},
+	{0x010530, 0x010563},
+	{0x010570, 0x01057a},
+	{0x01057c, 0x01058a},
+	{0x01058c, 0x010592},
+	{0x010594, 0x010595},
+	{0x010597, 0x0105a1},
+	{0x0105a3, 0x0105b1},
+	{0x0105b3, 0x0105b9},
+	{0x0105bb, 0x0105bc},
+	{0x010600, 0x010736},
+	{0x010740, 0x010755},
+	{0x010760, 0x010767},
+	{0x010780, 0x010785},
+	{0x010787, 0x0107b0},
+	{0x0107b2, 0x0107ba},
+	{0x010800, 0x010805},
+	{0x010808, 0x010808},
+	{0x01080a, 0x010835},
+	{0x010837, 0x010838},
+	{0x01083c, 0x01083c},
+	{0x01083f, 0x010855},
+	{0x010860, 0x010876},
+	{0x010880, 0x01089e},
+	{0x0108e0, 0x0108f2},
+	{0x0108f4, 0x0108f5},
+	{0x010900, 0x010915},
+	{0x010920, 0x010939},
+	{0x010980, 0x0109b7},
+	{0x0109be, 0x0109bf},
+	{0x010a00, 0x010a00},
+	{0x010a01, 0x010a03},
+	{0x010a05, 0x010a06},
+	{0x010a0c, 0x010a0f},
+	{0x010a10, 0x010a13},
+	{0x010a15, 0x010a17},
+	{0x010a19, 0x010a35},
+	{0x010a60, 0x010a7c},
+	{0x010a80, 0x010a9c},
+	{0x010ac0, 0x010ac7},
+	{0x010ac9, 0x010ae4},
+	{0x010b00, 0x010b35},
+	{0x010b40, 0x010b55},
+	{0x010b60, 0x010b72},
+	{0x010b80, 0x010b91},
+	{0x010c00, 0x010c48},
+	{0x010c80, 0x010cb2},
+	{0x010cc0, 0x010cf2},
+	{0x010d00, 0x010d23},
+	{0x010d24, 0x010d27},
+	{0x010e80, 0x010ea9},
+	{0x010eab, 0x010eac},
+	{0x010eb0, 0x010eb1},
+	{0x010f00, 0x010f1c},
+	{0x010f27, 0x010f27},
+	{0x010f30, 0x010f45},
+	{0x010f70, 0x010f81},
+	{0x010fb0, 0x010fc4},
+	{0x010fe0, 0x010ff6},
+	{0x011000, 0x011000},
+	{0x011001, 0x011001},
+	{0x011002, 0x011002},
+	{0x011003, 0x011037},
+	{0x011038, 0x011045},
+	{0x011071, 0x011072},
+	{0x011073, 0x011074},
+	{0x011075, 0x011075},
+	{0x011080, 0x011081},
+	{0x011082, 0x011082},
+	{0x011083, 0x0110af},
+	{0x0110b0, 0x0110b2},
+	{0x0110b3, 0x0110b6},
+	{0x0110b7, 0x0110b8},
+	{0x0110c2, 0x0110c2},
+	{0x0110d0, 0x0110e8},
+	{0x011100, 0x011102},
+	{0x011103, 0x011126},
+	{0x011127, 0x01112b},
+	{0x01112c, 0x01112c},
+	{0x01112d, 0x011132},
+	{0x011144, 0x011144},
+	{0x011145, 0x011146},
+	{0x011147, 0x011147},
+	{0x011150, 0x011172},
+	{0x011176, 0x011176},
+	{0x011180, 0x011181},
+	{0x011182, 0x011182},
+	{0x011183, 0x0111b2},
+	{0x0111b3, 0x0111b5},
+	{0x0111b6, 0x0111be},
+	{0x0111bf, 0x0111bf},
+	{0x0111c1, 0x0111c4},
+	{0x0111ce, 0x0111ce},
+	{0x0111cf, 0x0111cf},
+	{0x0111da, 0x0111da},
+	{0x0111dc, 0x0111dc},
+	{0x011200, 0x011211},
+	{0x011213, 0x01122b},
+	{0x01122c, 0x01122e},
+	{0x01122f, 0x011231},
+	{0x011232, 0x011233},
+	{0x011234, 0x011234},
+	{0x011237, 0x011237},
+	{0x01123e, 0x01123e},
+	{0x01123f, 0x011240},
+	{0x011241, 0x011241},
+	{0x011280, 0x011286},
+	{0x011288, 0x011288},
+	{0x01128a, 0x01128d},
+	{0x01128f, 0x01129d},
+	{0x01129f, 0x0112a8},
+	{0x0112b0, 0x0112de},
+	{0x0112df, 0x0112df},
+	{0x0112e0, 0x0112e2},
+	{0x0112e3, 0x0112e8},
+	{0x011300, 0x011301},
+	{0x011302, 0x011303},
+	{0x011305, 0x01130c},
+	{0x01130f, 0x011310},
+	{0x011313, 0x011328},
+	{0x01132a, 0x011330},
+	{0x011332, 0x011333},
+	{0x011335, 0x011339},
+	{0x01133d, 0x01133d},
+	{0x01133e, 0x01133f},
+	{0x011340, 0x011340},
+	{0x011341, 0x011344},
+	{0x011347, 0x011348},
+	{0x01134b, 0x01134c},
+	{0x011350, 0x011350},
+	{0x011357, 0x011357},
+	{0x01135d, 0x011361},
+	{0x011362, 0x011363},
+	{0x011400, 0x011434},
+	{0x011435, 0x011437},
+	{0x011438, 0x01143f},
+	{0x011440, 0x011441},
+	{0x011443, 0x011444},
+	{0x011445, 0x011445},
+	{0x011447, 0x01144a},
+	{0x01145f, 0x011461},
+	{0x011480, 0x0114af},
+	{0x0114b0, 0x0114b2},
+	{0x0114b3, 0x0114b8},
+	{0x0114b9, 0x0114b9},
+	{0x0114ba, 0x0114ba},
+	{0x0114bb, 0x0114be},
+	{0x0114bf, 0x0114c0},
+	{0x0114c1, 0x0114c1},
+	{0x0114c4, 0x0114c5},
+	{0x0114c7, 0x0114c7},
+	{0x011580, 0x0115ae},
+	{0x0115af, 0x0115b1},
+	{0x0115b2, 0x0115b5},
+	{0x0115b8, 0x0115bb},
+	{0x0115bc, 0x0115bd},
+	{0x0115be, 0x0115be},
+	{0x0115d8, 0x0115db},
+	{0x0115dc, 0x0115dd},
+	{0x011600, 0x01162f},
+	{0x011630, 0x011632},
+	{0x011633, 0x01163a},
+	{0x01163b, 0x01163c},
+	{0x01163d, 0x01163d},
+	{0x01163e, 0x01163e},
+	{0x011640, 0x011640},
+	{0x011644, 0x011644},
+	{0x011680, 0x0116aa},
+	{0x0116ab, 0x0116ab},
+	{0x0116ac, 0x0116ac},
+	{0x0116ad, 0x0116ad},
+	{0x0116ae, 0x0116af},
+	{0x0116b0, 0x0116b5},
+	{0x0116b8, 0x0116b8},
+	{0x011700, 0x01171a},
+	{0x01171d, 0x01171f},
+	{0x011720, 0x011721},
+	{0x011722, 0x011725},
+	{0x011726, 0x011726},
+	{0x011727, 0x01172a},
+	{0x011740, 0x011746},
+	{0x011800, 0x01182b},
+	{0x01182c, 0x01182e},
+	{0x01182f, 0x011837},
+	{0x011838, 0x011838},
+	{0x0118a0, 0x0118df},
+	{0x0118ff, 0x011906},
+	{0x011909, 0x011909},
+	{0x01190c, 0x011913},
+	{0x011915, 0x011916},
+	{0x011918, 0x01192f},
+	{0x011930, 0x011935},
+	{0x011937, 0x011938},
+	{0x01193b, 0x01193c},
+	{0x01193f, 0x01193f},
+	{0x011940, 0x011940},
+	{0x011941, 0x011941},
+	{0x011942, 0x011942},
+	{0x0119a0, 0x0119a7},
+	{0x0119aa, 0x0119d0},
+	{0x0119d1, 0x0119d3},
+	{0x0119d4, 0x0119d7},
+	{0x0119da, 0x0119db},
+	{0x0119dc, 0x0119df},
+	{0x0119e1, 0x0119e1},
+	{0x0119e3, 0x0119e3},
+	{0x0119e4, 0x0119e4},
+	{0x011a00, 0x011a00},
+	{0x011a01, 0x011a0a},
+	{0x011a0b, 0x011a32},
+	{0x011a35, 0x011a38},
+	{0x011a39, 0x011a39},
+	{0x011a3a, 0x011a3a},
+	{0x011a3b, 0x011a3e},
+	{0x011a50, 0x011a50},
+	{0x011a51, 0x011a56},
+	{0x011a57, 0x011a58},
+	{0x011a59, 0x011a5b},
+	{0x011a5c, 0x011a89},
+	{0x011a8a, 0x011a96},
+	{0x011a97, 0x011a97},
+	{0x011a9d, 0x011a9d},
+	{0x011ab0, 0x011af8},
+	{0x011c00, 0x011c08},
+	{0x011c0a, 0x011c2e},
+	{0x011c2f, 0x011c2f},
+	{0x011c30, 0x011c36},
+	{0x011c38, 0x011c3d},
+	{0x011c3e, 0x011c3e},
+	{0x011c40, 0x011c40},
+	{0x011c72, 0x011c8f},
+	{0x011c92, 0x011ca7},
+	{0x011ca9, 0x011ca9},
+	{0x011caa, 0x011cb0},
+	{0x011cb1, 0x011cb1},
+	{0x011cb2, 0x011cb3},
+	{0x011cb4, 0x011cb4},
+	{0x011cb5, 0x011cb6},
+	{0x011d00, 0x011d06},
+	{0x011d08, 0x011d09},
+	{0x011d0b, 0x011d30},
+	{0x011d31, 0x011d36},
+	{0x011d3a, 0x011d3a},
+	{0x011d3c, 0x011d3d},
+	{0x011d3f, 0x011d41},
+	{0x011d43, 0x011d43},
+	{0x011d46, 0x011d46},
+	{0x011d47, 0x011d47},
+	{0x011d60, 0x011d65},
+	{0x011d67, 0x011d68},
+	{0x011d6a, 0x011d89},
+	{0x011d8a, 0x011d8e},
+	{0x011d90, 0x011d91},
+	{0x011d93, 0x011d94},
+	{0x011d95, 0x011d95},
+	{0x011d96, 0x011d96},
+	{0x011d98, 0x011d98},
+	{0x011ee0, 0x011ef2},
+	{0x011ef3, 0x011ef4},
+	{0x011ef5, 0x011ef6},
+	{0x011f00, 0x011f01},
+	{0x011f02, 0x011f02},
+	{0x011f03, 0x011f03},
+	{0x011f04, 0x011f10},
+	{0x011f12, 0x011f33},
+	{0x011f34, 0x011f35},
+	{0x011f36, 0x011f3a},
+	{0x011f3e, 0x011f3f},
+	{0x011f40, 0x011f40},
+	{0x011fb0, 0x011fb0},
+	{0x012000, 0x012399},
+	{0x012400, 0x01246e},
+	{0x012480, 0x012543},
+	{0x012f90, 0x012ff0},
+	{0x013000, 0x01342f},
+	{0x013441, 0x013446},
+	{0x014400, 0x014646},
+	{0x016800, 0x016a38},
+	{0x016a40, 0x016a5e},
+	{0x016a70, 0x016abe},
+	{0x016ad0, 0x016aed},
+	{0x016b00, 0x016b2f},
+	{0x016b40, 0x016b43},
+	{0x016b63, 0x016b77},
+	{0x016b7d, 0x016b8f},
+	{0x016e40, 0x016e7f},
+	{0x016f00, 0x016f4a},
+	{0x016f4f, 0x016f4f},
+	{0x016f50, 0x016f50},
+	{0x016f51, 0x016f87},
+	{0x016f8f, 0x016f92},
+	{0x016f93, 0x016f9f},
+	{0x016fe0, 0x016fe1},
+	{0x016fe3, 0x016fe3},
+	{0x016ff0, 0x016ff1},
+	{0x017000, 0x0187f7},
+	{0x018800, 0x018cd5},
+	{0x018d00, 0x018d08},
+	{0x01aff0, 0x01aff3},
+	{0x01aff5, 0x01affb},
+	{0x01affd, 0x01affe},
+	{0x01b000, 0x01b122},
+	{0x01b132, 0x01b132},
+	{0x01b150, 0x01b152},
+	{0x01b155, 0x01b155},
+	{0x01b164, 0x01b167},
+	{0x01b170, 0x01b2fb},
+	{0x01bc00, 0x01bc6a},
+	{0x01bc70, 0x01bc7c},
+	{0x01bc80, 0x01bc88},
+	{0x01bc90, 0x01bc99},
+	{0x01bc9e, 0x01bc9e},
+	{0x01d400, 0x01d454},
+	{0x01d456, 0x01d49c},
+	{0x01d49e, 0x01d49f},
+	{0x01d4a2, 0x01d4a2},
+	{0x01d4a5, 0x01d4a6},
+	{0x01d4a9, 0x01d4ac},
+	{0x01d4ae, 0x01d4b9},
+	{0x01d4bb, 0x01d4bb},
+	{0x01d4bd, 0x01d4c3},
+	{0x01d4c5, 0x01d505},
+	{0x01d507, 0x01d50a},
+	{0x01d50d, 0x01d514},
+	{0x01d516, 0x01d51c},
+	{0x01d51e, 0x01d539},
+	{0x01d53b, 0x01d53e},
+	{0x01d540, 0x01d544},
+	{0x01d546, 0x01d546},
+	{0x01d54a, 0x01d550},
+	{0x01d552, 0x01d6a5},
+	{0x01d6a8, 0x01d6c0},
+	{0x01d6c2, 0x01d6da},
+	{0x01d6dc, 0x01d6fa},
+	{0x01d6fc, 0x01d714},
+	{0x01d716, 0x01d734},
+	{0x01d736, 0x01d74e},
+	{0x01d750, 0x01d76e},
+	{0x01d770, 0x01d788},
+	{0x01d78a, 0x01d7a8},
+	{0x01d7aa, 0x01d7c2},
+	{0x01d7c4, 0x01d7cb},
+	{0x01df00, 0x01df09},
+	{0x01df0a, 0x01df0a},
+	{0x01df0b, 0x01df1e},
+	{0x01df25, 0x01df2a},
+	{0x01e000, 0x01e006},
+	{0x01e008, 0x01e018},
+	{0x01e01b, 0x01e021},
+	{0x01e023, 0x01e024},
+	{0x01e026, 0x01e02a},
+	{0x01e030, 0x01e06d},
+	{0x01e08f, 0x01e08f},
+	{0x01e100, 0x01e12c},
+	{0x01e137, 0x01e13d},
+	{0x01e14e, 0x01e14e},
+	{0x01e290, 0x01e2ad},
+	{0x01e2c0, 0x01e2eb},
+	{0x01e4d0, 0x01e4ea},
+	{0x01e4eb, 0x01e4eb},
+	{0x01e7e0, 0x01e7e6},
+	{0x01e7e8, 0x01e7eb},
+	{0x01e7ed, 0x01e7ee},
+	{0x01e7f0, 0x01e7fe},
+	{0x01e800, 0x01e8c4},
+	{0x01e900, 0x01e943},
+	{0x01e947, 0x01e947},
+	{0x01e94b, 0x01e94b},
+	{0x01ee00, 0x01ee03},
+	{0x01ee05, 0x01ee1f},
+	{0x01ee21, 0x01ee22},
+	{0x01ee24, 0x01ee24},
+	{0x01ee27, 0x01ee27},
+	{0x01ee29, 0x01ee32},
+	{0x01ee34, 0x01ee37},
+	{0x01ee39, 0x01ee39},
+	{0x01ee3b, 0x01ee3b},
+	{0x01ee42, 0x01ee42},
+	{0x01ee47, 0x01ee47},
+	{0x01ee49, 0x01ee49},
+	{0x01ee4b, 0x01ee4b},
+	{0x01ee4d, 0x01ee4f},
+	{0x01ee51, 0x01ee52},
+	{0x01ee54, 0x01ee54},
+	{0x01ee57, 0x01ee57},
+	{0x01ee59, 0x01ee59},
+	{0x01ee5b, 0x01ee5b},
+	{0x01ee5d, 0x01ee5d},
+	{0x01ee5f, 0x01ee5f},
+	{0x01ee61, 0x01ee62},
+	{0x01ee64, 0x01ee64},
+	{0x01ee67, 0x01ee6a},
+	{0x01ee6c, 0x01ee72},
+	{0x01ee74, 0x01ee77},
+	{0x01ee79, 0x01ee7c},
+	{0x01ee7e, 0x01ee7e},
+	{0x01ee80, 0x01ee89},
+	{0x01ee8b, 0x01ee9b},
+	{0x01eea1, 0x01eea3},
+	{0x01eea5, 0x01eea9},
+	{0x01eeab, 0x01eebb},
+	{0x01f130, 0x01f149},
+	{0x01f150, 0x01f169},
+	{0x01f170, 0x01f189},
+	{0x020000, 0x02a6df},
+	{0x02a700, 0x02b739},
+	{0x02b740, 0x02b81d},
+	{0x02b820, 0x02cea1},
+	{0x02ceb0, 0x02ebe0},
+	{0x02ebf0, 0x02ee5d},
+	{0x02f800, 0x02fa1d},
+	{0x030000, 0x03134a},
+	{0x031350, 0x0323af}
+};
+
+/* table of Unicode codepoint ranges of Lowercase characters */
+static const pg_unicode_range unicode_lowercase[686] =
+{
+	{0x000061, 0x00007a},
+	{0x0000aa, 0x0000aa},
+	{0x0000b5, 0x0000b5},
+	{0x0000ba, 0x0000ba},
+	{0x0000df, 0x0000f6},
+	{0x0000f8, 0x0000ff},
+	{0x000101, 0x000101},
+	{0x000103, 0x000103},
+	{0x000105, 0x000105},
+	{0x000107, 0x000107},
+	{0x000109, 0x000109},
+	{0x00010b, 0x00010b},
+	{0x00010d, 0x00010d},
+	{0x00010f, 0x00010f},
+	{0x000111, 0x000111},
+	{0x000113, 0x000113},
+	{0x000115, 0x000115},
+	{0x000117, 0x000117},
+	{0x000119, 0x000119},
+	{0x00011b, 0x00011b},
+	{0x00011d, 0x00011d},
+	{0x00011f, 0x00011f},
+	{0x000121, 0x000121},
+	{0x000123, 0x000123},
+	{0x000125, 0x000125},
+	{0x000127, 0x000127},
+	{0x000129, 0x000129},
+	{0x00012b, 0x00012b},
+	{0x00012d, 0x00012d},
+	{0x00012f, 0x00012f},
+	{0x000131, 0x000131},
+	{0x000133, 0x000133},
+	{0x000135, 0x000135},
+	{0x000137, 0x000138},
+	{0x00013a, 0x00013a},
+	{0x00013c, 0x00013c},
+	{0x00013e, 0x00013e},
+	{0x000140, 0x000140},
+	{0x000142, 0x000142},
+	{0x000144, 0x000144},
+	{0x000146, 0x000146},
+	{0x000148, 0x000149},
+	{0x00014b, 0x00014b},
+	{0x00014d, 0x00014d},
+	{0x00014f, 0x00014f},
+	{0x000151, 0x000151},
+	{0x000153, 0x000153},
+	{0x000155, 0x000155},
+	{0x000157, 0x000157},
+	{0x000159, 0x000159},
+	{0x00015b, 0x00015b},
+	{0x00015d, 0x00015d},
+	{0x00015f, 0x00015f},
+	{0x000161, 0x000161},
+	{0x000163, 0x000163},
+	{0x000165, 0x000165},
+	{0x000167, 0x000167},
+	{0x000169, 0x000169},
+	{0x00016b, 0x00016b},
+	{0x00016d, 0x00016d},
+	{0x00016f, 0x00016f},
+	{0x000171, 0x000171},
+	{0x000173, 0x000173},
+	{0x000175, 0x000175},
+	{0x000177, 0x000177},
+	{0x00017a, 0x00017a},
+	{0x00017c, 0x00017c},
+	{0x00017e, 0x000180},
+	{0x000183, 0x000183},
+	{0x000185, 0x000185},
+	{0x000188, 0x000188},
+	{0x00018c, 0x00018d},
+	{0x000192, 0x000192},
+	{0x000195, 0x000195},
+	{0x000199, 0x00019b},
+	{0x00019e, 0x00019e},
+	{0x0001a1, 0x0001a1},
+	{0x0001a3, 0x0001a3},
+	{0x0001a5, 0x0001a5},
+	{0x0001a8, 0x0001a8},
+	{0x0001aa, 0x0001ab},
+	{0x0001ad, 0x0001ad},
+	{0x0001b0, 0x0001b0},
+	{0x0001b4, 0x0001b4},
+	{0x0001b6, 0x0001b6},
+	{0x0001b9, 0x0001ba},
+	{0x0001bd, 0x0001bf},
+	{0x0001c6, 0x0001c6},
+	{0x0001c9, 0x0001c9},
+	{0x0001cc, 0x0001cc},
+	{0x0001ce, 0x0001ce},
+	{0x0001d0, 0x0001d0},
+	{0x0001d2, 0x0001d2},
+	{0x0001d4, 0x0001d4},
+	{0x0001d6, 0x0001d6},
+	{0x0001d8, 0x0001d8},
+	{0x0001da, 0x0001da},
+	{0x0001dc, 0x0001dd},
+	{0x0001df, 0x0001df},
+	{0x0001e1, 0x0001e1},
+	{0x0001e3, 0x0001e3},
+	{0x0001e5, 0x0001e5},
+	{0x0001e7, 0x0001e7},
+	{0x0001e9, 0x0001e9},
+	{0x0001eb, 0x0001eb},
+	{0x0001ed, 0x0001ed},
+	{0x0001ef, 0x0001f0},
+	{0x0001f3, 0x0001f3},
+	{0x0001f5, 0x0001f5},
+	{0x0001f9, 0x0001f9},
+	{0x0001fb, 0x0001fb},
+	{0x0001fd, 0x0001fd},
+	{0x0001ff, 0x0001ff},
+	{0x000201, 0x000201},
+	{0x000203, 0x000203},
+	{0x000205, 0x000205},
+	{0x000207, 0x000207},
+	{0x000209, 0x000209},
+	{0x00020b, 0x00020b},
+	{0x00020d, 0x00020d},
+	{0x00020f, 0x00020f},
+	{0x000211, 0x000211},
+	{0x000213, 0x000213},
+	{0x000215, 0x000215},
+	{0x000217, 0x000217},
+	{0x000219, 0x000219},
+	{0x00021b, 0x00021b},
+	{0x00021d, 0x00021d},
+	{0x00021f, 0x00021f},
+	{0x000221, 0x000221},
+	{0x000223, 0x000223},
+	{0x000225, 0x000225},
+	{0x000227, 0x000227},
+	{0x000229, 0x000229},
+	{0x00022b, 0x00022b},
+	{0x00022d, 0x00022d},
+	{0x00022f, 0x00022f},
+	{0x000231, 0x000231},
+	{0x000233, 0x000239},
+	{0x00023c, 0x00023c},
+	{0x00023f, 0x000240},
+	{0x000242, 0x000242},
+	{0x000247, 0x000247},
+	{0x000249, 0x000249},
+	{0x00024b, 0x00024b},
+	{0x00024d, 0x00024d},
+	{0x00024f, 0x000293},
+	{0x000295, 0x0002af},
+	{0x0002b0, 0x0002b8},
+	{0x0002c0, 0x0002c1},
+	{0x0002e0, 0x0002e4},
+	{0x000345, 0x000345},
+	{0x000371, 0x000371},
+	{0x000373, 0x000373},
+	{0x000377, 0x000377},
+	{0x00037a, 0x00037a},
+	{0x00037b, 0x00037d},
+	{0x000390, 0x000390},
+	{0x0003ac, 0x0003ce},
+	{0x0003d0, 0x0003d1},
+	{0x0003d5, 0x0003d7},
+	{0x0003d9, 0x0003d9},
+	{0x0003db, 0x0003db},
+	{0x0003dd, 0x0003dd},
+	{0x0003df, 0x0003df},
+	{0x0003e1, 0x0003e1},
+	{0x0003e3, 0x0003e3},
+	{0x0003e5, 0x0003e5},
+	{0x0003e7, 0x0003e7},
+	{0x0003e9, 0x0003e9},
+	{0x0003eb, 0x0003eb},
+	{0x0003ed, 0x0003ed},
+	{0x0003ef, 0x0003f3},
+	{0x0003f5, 0x0003f5},
+	{0x0003f8, 0x0003f8},
+	{0x0003fb, 0x0003fc},
+	{0x000430, 0x00045f},
+	{0x000461, 0x000461},
+	{0x000463, 0x000463},
+	{0x000465, 0x000465},
+	{0x000467, 0x000467},
+	{0x000469, 0x000469},
+	{0x00046b, 0x00046b},
+	{0x00046d, 0x00046d},
+	{0x00046f, 0x00046f},
+	{0x000471, 0x000471},
+	{0x000473, 0x000473},
+	{0x000475, 0x000475},
+	{0x000477, 0x000477},
+	{0x000479, 0x000479},
+	{0x00047b, 0x00047b},
+	{0x00047d, 0x00047d},
+	{0x00047f, 0x00047f},
+	{0x000481, 0x000481},
+	{0x00048b, 0x00048b},
+	{0x00048d, 0x00048d},
+	{0x00048f, 0x00048f},
+	{0x000491, 0x000491},
+	{0x000493, 0x000493},
+	{0x000495, 0x000495},
+	{0x000497, 0x000497},
+	{0x000499, 0x000499},
+	{0x00049b, 0x00049b},
+	{0x00049d, 0x00049d},
+	{0x00049f, 0x00049f},
+	{0x0004a1, 0x0004a1},
+	{0x0004a3, 0x0004a3},
+	{0x0004a5, 0x0004a5},
+	{0x0004a7, 0x0004a7},
+	{0x0004a9, 0x0004a9},
+	{0x0004ab, 0x0004ab},
+	{0x0004ad, 0x0004ad},
+	{0x0004af, 0x0004af},
+	{0x0004b1, 0x0004b1},
+	{0x0004b3, 0x0004b3},
+	{0x0004b5, 0x0004b5},
+	{0x0004b7, 0x0004b7},
+	{0x0004b9, 0x0004b9},
+	{0x0004bb, 0x0004bb},
+	{0x0004bd, 0x0004bd},
+	{0x0004bf, 0x0004bf},
+	{0x0004c2, 0x0004c2},
+	{0x0004c4, 0x0004c4},
+	{0x0004c6, 0x0004c6},
+	{0x0004c8, 0x0004c8},
+	{0x0004ca, 0x0004ca},
+	{0x0004cc, 0x0004cc},
+	{0x0004ce, 0x0004cf},
+	{0x0004d1, 0x0004d1},
+	{0x0004d3, 0x0004d3},
+	{0x0004d5, 0x0004d5},
+	{0x0004d7, 0x0004d7},
+	{0x0004d9, 0x0004d9},
+	{0x0004db, 0x0004db},
+	{0x0004dd, 0x0004dd},
+	{0x0004df, 0x0004df},
+	{0x0004e1, 0x0004e1},
+	{0x0004e3, 0x0004e3},
+	{0x0004e5, 0x0004e5},
+	{0x0004e7, 0x0004e7},
+	{0x0004e9, 0x0004e9},
+	{0x0004eb, 0x0004eb},
+	{0x0004ed, 0x0004ed},
+	{0x0004ef, 0x0004ef},
+	{0x0004f1, 0x0004f1},
+	{0x0004f3, 0x0004f3},
+	{0x0004f5, 0x0004f5},
+	{0x0004f7, 0x0004f7},
+	{0x0004f9, 0x0004f9},
+	{0x0004fb, 0x0004fb},
+	{0x0004fd, 0x0004fd},
+	{0x0004ff, 0x0004ff},
+	{0x000501, 0x000501},
+	{0x000503, 0x000503},
+	{0x000505, 0x000505},
+	{0x000507, 0x000507},
+	{0x000509, 0x000509},
+	{0x00050b, 0x00050b},
+	{0x00050d, 0x00050d},
+	{0x00050f, 0x00050f},
+	{0x000511, 0x000511},
+	{0x000513, 0x000513},
+	{0x000515, 0x000515},
+	{0x000517, 0x000517},
+	{0x000519, 0x000519},
+	{0x00051b, 0x00051b},
+	{0x00051d, 0x00051d},
+	{0x00051f, 0x00051f},
+	{0x000521, 0x000521},
+	{0x000523, 0x000523},
+	{0x000525, 0x000525},
+	{0x000527, 0x000527},
+	{0x000529, 0x000529},
+	{0x00052b, 0x00052b},
+	{0x00052d, 0x00052d},
+	{0x00052f, 0x00052f},
+	{0x000560, 0x000588},
+	{0x0010d0, 0x0010fa},
+	{0x0010fc, 0x0010fc},
+	{0x0010fd, 0x0010ff},
+	{0x0013f8, 0x0013fd},
+	{0x001c80, 0x001c88},
+	{0x001d00, 0x001d2b},
+	{0x001d2c, 0x001d6a},
+	{0x001d6b, 0x001d77},
+	{0x001d78, 0x001d78},
+	{0x001d79, 0x001d9a},
+	{0x001d9b, 0x001dbf},
+	{0x001e01, 0x001e01},
+	{0x001e03, 0x001e03},
+	{0x001e05, 0x001e05},
+	{0x001e07, 0x001e07},
+	{0x001e09, 0x001e09},
+	{0x001e0b, 0x001e0b},
+	{0x001e0d, 0x001e0d},
+	{0x001e0f, 0x001e0f},
+	{0x001e11, 0x001e11},
+	{0x001e13, 0x001e13},
+	{0x001e15, 0x001e15},
+	{0x001e17, 0x001e17},
+	{0x001e19, 0x001e19},
+	{0x001e1b, 0x001e1b},
+	{0x001e1d, 0x001e1d},
+	{0x001e1f, 0x001e1f},
+	{0x001e21, 0x001e21},
+	{0x001e23, 0x001e23},
+	{0x001e25, 0x001e25},
+	{0x001e27, 0x001e27},
+	{0x001e29, 0x001e29},
+	{0x001e2b, 0x001e2b},
+	{0x001e2d, 0x001e2d},
+	{0x001e2f, 0x001e2f},
+	{0x001e31, 0x001e31},
+	{0x001e33, 0x001e33},
+	{0x001e35, 0x001e35},
+	{0x001e37, 0x001e37},
+	{0x001e39, 0x001e39},
+	{0x001e3b, 0x001e3b},
+	{0x001e3d, 0x001e3d},
+	{0x001e3f, 0x001e3f},
+	{0x001e41, 0x001e41},
+	{0x001e43, 0x001e43},
+	{0x001e45, 0x001e45},
+	{0x001e47, 0x001e47},
+	{0x001e49, 0x001e49},
+	{0x001e4b, 0x001e4b},
+	{0x001e4d, 0x001e4d},
+	{0x001e4f, 0x001e4f},
+	{0x001e51, 0x001e51},
+	{0x001e53, 0x001e53},
+	{0x001e55, 0x001e55},
+	{0x001e57, 0x001e57},
+	{0x001e59, 0x001e59},
+	{0x001e5b, 0x001e5b},
+	{0x001e5d, 0x001e5d},
+	{0x001e5f, 0x001e5f},
+	{0x001e61, 0x001e61},
+	{0x001e63, 0x001e63},
+	{0x001e65, 0x001e65},
+	{0x001e67, 0x001e67},
+	{0x001e69, 0x001e69},
+	{0x001e6b, 0x001e6b},
+	{0x001e6d, 0x001e6d},
+	{0x001e6f, 0x001e6f},
+	{0x001e71, 0x001e71},
+	{0x001e73, 0x001e73},
+	{0x001e75, 0x001e75},
+	{0x001e77, 0x001e77},
+	{0x001e79, 0x001e79},
+	{0x001e7b, 0x001e7b},
+	{0x001e7d, 0x001e7d},
+	{0x001e7f, 0x001e7f},
+	{0x001e81, 0x001e81},
+	{0x001e83, 0x001e83},
+	{0x001e85, 0x001e85},
+	{0x001e87, 0x001e87},
+	{0x001e89, 0x001e89},
+	{0x001e8b, 0x001e8b},
+	{0x001e8d, 0x001e8d},
+	{0x001e8f, 0x001e8f},
+	{0x001e91, 0x001e91},
+	{0x001e93, 0x001e93},
+	{0x001e95, 0x001e9d},
+	{0x001e9f, 0x001e9f},
+	{0x001ea1, 0x001ea1},
+	{0x001ea3, 0x001ea3},
+	{0x001ea5, 0x001ea5},
+	{0x001ea7, 0x001ea7},
+	{0x001ea9, 0x001ea9},
+	{0x001eab, 0x001eab},
+	{0x001ead, 0x001ead},
+	{0x001eaf, 0x001eaf},
+	{0x001eb1, 0x001eb1},
+	{0x001eb3, 0x001eb3},
+	{0x001eb5, 0x001eb5},
+	{0x001eb7, 0x001eb7},
+	{0x001eb9, 0x001eb9},
+	{0x001ebb, 0x001ebb},
+	{0x001ebd, 0x001ebd},
+	{0x001ebf, 0x001ebf},
+	{0x001ec1, 0x001ec1},
+	{0x001ec3, 0x001ec3},
+	{0x001ec5, 0x001ec5},
+	{0x001ec7, 0x001ec7},
+	{0x001ec9, 0x001ec9},
+	{0x001ecb, 0x001ecb},
+	{0x001ecd, 0x001ecd},
+	{0x001ecf, 0x001ecf},
+	{0x001ed1, 0x001ed1},
+	{0x001ed3, 0x001ed3},
+	{0x001ed5, 0x001ed5},
+	{0x001ed7, 0x001ed7},
+	{0x001ed9, 0x001ed9},
+	{0x001edb, 0x001edb},
+	{0x001edd, 0x001edd},
+	{0x001edf, 0x001edf},
+	{0x001ee1, 0x001ee1},
+	{0x001ee3, 0x001ee3},
+	{0x001ee5, 0x001ee5},
+	{0x001ee7, 0x001ee7},
+	{0x001ee9, 0x001ee9},
+	{0x001eeb, 0x001eeb},
+	{0x001eed, 0x001eed},
+	{0x001eef, 0x001eef},
+	{0x001ef1, 0x001ef1},
+	{0x001ef3, 0x001ef3},
+	{0x001ef5, 0x001ef5},
+	{0x001ef7, 0x001ef7},
+	{0x001ef9, 0x001ef9},
+	{0x001efb, 0x001efb},
+	{0x001efd, 0x001efd},
+	{0x001eff, 0x001f07},
+	{0x001f10, 0x001f15},
+	{0x001f20, 0x001f27},
+	{0x001f30, 0x001f37},
+	{0x001f40, 0x001f45},
+	{0x001f50, 0x001f57},
+	{0x001f60, 0x001f67},
+	{0x001f70, 0x001f7d},
+	{0x001f80, 0x001f87},
+	{0x001f90, 0x001f97},
+	{0x001fa0, 0x001fa7},
+	{0x001fb0, 0x001fb4},
+	{0x001fb6, 0x001fb7},
+	{0x001fbe, 0x001fbe},
+	{0x001fc2, 0x001fc4},
+	{0x001fc6, 0x001fc7},
+	{0x001fd0, 0x001fd3},
+	{0x001fd6, 0x001fd7},
+	{0x001fe0, 0x001fe7},
+	{0x001ff2, 0x001ff4},
+	{0x001ff6, 0x001ff7},
+	{0x002071, 0x002071},
+	{0x00207f, 0x00207f},
+	{0x002090, 0x00209c},
+	{0x00210a, 0x00210a},
+	{0x00210e, 0x00210f},
+	{0x002113, 0x002113},
+	{0x00212f, 0x00212f},
+	{0x002134, 0x002134},
+	{0x002139, 0x002139},
+	{0x00213c, 0x00213d},
+	{0x002146, 0x002149},
+	{0x00214e, 0x00214e},
+	{0x002170, 0x00217f},
+	{0x002184, 0x002184},
+	{0x0024d0, 0x0024e9},
+	{0x002c30, 0x002c5f},
+	{0x002c61, 0x002c61},
+	{0x002c65, 0x002c66},
+	{0x002c68, 0x002c68},
+	{0x002c6a, 0x002c6a},
+	{0x002c6c, 0x002c6c},
+	{0x002c71, 0x002c71},
+	{0x002c73, 0x002c74},
+	{0x002c76, 0x002c7b},
+	{0x002c7c, 0x002c7d},
+	{0x002c81, 0x002c81},
+	{0x002c83, 0x002c83},
+	{0x002c85, 0x002c85},
+	{0x002c87, 0x002c87},
+	{0x002c89, 0x002c89},
+	{0x002c8b, 0x002c8b},
+	{0x002c8d, 0x002c8d},
+	{0x002c8f, 0x002c8f},
+	{0x002c91, 0x002c91},
+	{0x002c93, 0x002c93},
+	{0x002c95, 0x002c95},
+	{0x002c97, 0x002c97},
+	{0x002c99, 0x002c99},
+	{0x002c9b, 0x002c9b},
+	{0x002c9d, 0x002c9d},
+	{0x002c9f, 0x002c9f},
+	{0x002ca1, 0x002ca1},
+	{0x002ca3, 0x002ca3},
+	{0x002ca5, 0x002ca5},
+	{0x002ca7, 0x002ca7},
+	{0x002ca9, 0x002ca9},
+	{0x002cab, 0x002cab},
+	{0x002cad, 0x002cad},
+	{0x002caf, 0x002caf},
+	{0x002cb1, 0x002cb1},
+	{0x002cb3, 0x002cb3},
+	{0x002cb5, 0x002cb5},
+	{0x002cb7, 0x002cb7},
+	{0x002cb9, 0x002cb9},
+	{0x002cbb, 0x002cbb},
+	{0x002cbd, 0x002cbd},
+	{0x002cbf, 0x002cbf},
+	{0x002cc1, 0x002cc1},
+	{0x002cc3, 0x002cc3},
+	{0x002cc5, 0x002cc5},
+	{0x002cc7, 0x002cc7},
+	{0x002cc9, 0x002cc9},
+	{0x002ccb, 0x002ccb},
+	{0x002ccd, 0x002ccd},
+	{0x002ccf, 0x002ccf},
+	{0x002cd1, 0x002cd1},
+	{0x002cd3, 0x002cd3},
+	{0x002cd5, 0x002cd5},
+	{0x002cd7, 0x002cd7},
+	{0x002cd9, 0x002cd9},
+	{0x002cdb, 0x002cdb},
+	{0x002cdd, 0x002cdd},
+	{0x002cdf, 0x002cdf},
+	{0x002ce1, 0x002ce1},
+	{0x002ce3, 0x002ce4},
+	{0x002cec, 0x002cec},
+	{0x002cee, 0x002cee},
+	{0x002cf3, 0x002cf3},
+	{0x002d00, 0x002d25},
+	{0x002d27, 0x002d27},
+	{0x002d2d, 0x002d2d},
+	{0x00a641, 0x00a641},
+	{0x00a643, 0x00a643},
+	{0x00a645, 0x00a645},
+	{0x00a647, 0x00a647},
+	{0x00a649, 0x00a649},
+	{0x00a64b, 0x00a64b},
+	{0x00a64d, 0x00a64d},
+	{0x00a64f, 0x00a64f},
+	{0x00a651, 0x00a651},
+	{0x00a653, 0x00a653},
+	{0x00a655, 0x00a655},
+	{0x00a657, 0x00a657},
+	{0x00a659, 0x00a659},
+	{0x00a65b, 0x00a65b},
+	{0x00a65d, 0x00a65d},
+	{0x00a65f, 0x00a65f},
+	{0x00a661, 0x00a661},
+	{0x00a663, 0x00a663},
+	{0x00a665, 0x00a665},
+	{0x00a667, 0x00a667},
+	{0x00a669, 0x00a669},
+	{0x00a66b, 0x00a66b},
+	{0x00a66d, 0x00a66d},
+	{0x00a681, 0x00a681},
+	{0x00a683, 0x00a683},
+	{0x00a685, 0x00a685},
+	{0x00a687, 0x00a687},
+	{0x00a689, 0x00a689},
+	{0x00a68b, 0x00a68b},
+	{0x00a68d, 0x00a68d},
+	{0x00a68f, 0x00a68f},
+	{0x00a691, 0x00a691},
+	{0x00a693, 0x00a693},
+	{0x00a695, 0x00a695},
+	{0x00a697, 0x00a697},
+	{0x00a699, 0x00a699},
+	{0x00a69b, 0x00a69b},
+	{0x00a69c, 0x00a69d},
+	{0x00a723, 0x00a723},
+	{0x00a725, 0x00a725},
+	{0x00a727, 0x00a727},
+	{0x00a729, 0x00a729},
+	{0x00a72b, 0x00a72b},
+	{0x00a72d, 0x00a72d},
+	{0x00a72f, 0x00a731},
+	{0x00a733, 0x00a733},
+	{0x00a735, 0x00a735},
+	{0x00a737, 0x00a737},
+	{0x00a739, 0x00a739},
+	{0x00a73b, 0x00a73b},
+	{0x00a73d, 0x00a73d},
+	{0x00a73f, 0x00a73f},
+	{0x00a741, 0x00a741},
+	{0x00a743, 0x00a743},
+	{0x00a745, 0x00a745},
+	{0x00a747, 0x00a747},
+	{0x00a749, 0x00a749},
+	{0x00a74b, 0x00a74b},
+	{0x00a74d, 0x00a74d},
+	{0x00a74f, 0x00a74f},
+	{0x00a751, 0x00a751},
+	{0x00a753, 0x00a753},
+	{0x00a755, 0x00a755},
+	{0x00a757, 0x00a757},
+	{0x00a759, 0x00a759},
+	{0x00a75b, 0x00a75b},
+	{0x00a75d, 0x00a75d},
+	{0x00a75f, 0x00a75f},
+	{0x00a761, 0x00a761},
+	{0x00a763, 0x00a763},
+	{0x00a765, 0x00a765},
+	{0x00a767, 0x00a767},
+	{0x00a769, 0x00a769},
+	{0x00a76b, 0x00a76b},
+	{0x00a76d, 0x00a76d},
+	{0x00a76f, 0x00a76f},
+	{0x00a770, 0x00a770},
+	{0x00a771, 0x00a778},
+	{0x00a77a, 0x00a77a},
+	{0x00a77c, 0x00a77c},
+	{0x00a77f, 0x00a77f},
+	{0x00a781, 0x00a781},
+	{0x00a783, 0x00a783},
+	{0x00a785, 0x00a785},
+	{0x00a787, 0x00a787},
+	{0x00a78c, 0x00a78c},
+	{0x00a78e, 0x00a78e},
+	{0x00a791, 0x00a791},
+	{0x00a793, 0x00a795},
+	{0x00a797, 0x00a797},
+	{0x00a799, 0x00a799},
+	{0x00a79b, 0x00a79b},
+	{0x00a79d, 0x00a79d},
+	{0x00a79f, 0x00a79f},
+	{0x00a7a1, 0x00a7a1},
+	{0x00a7a3, 0x00a7a3},
+	{0x00a7a5, 0x00a7a5},
+	{0x00a7a7, 0x00a7a7},
+	{0x00a7a9, 0x00a7a9},
+	{0x00a7af, 0x00a7af},
+	{0x00a7b5, 0x00a7b5},
+	{0x00a7b7, 0x00a7b7},
+	{0x00a7b9, 0x00a7b9},
+	{0x00a7bb, 0x00a7bb},
+	{0x00a7bd, 0x00a7bd},
+	{0x00a7bf, 0x00a7bf},
+	{0x00a7c1, 0x00a7c1},
+	{0x00a7c3, 0x00a7c3},
+	{0x00a7c8, 0x00a7c8},
+	{0x00a7ca, 0x00a7ca},
+	{0x00a7d1, 0x00a7d1},
+	{0x00a7d3, 0x00a7d3},
+	{0x00a7d5, 0x00a7d5},
+	{0x00a7d7, 0x00a7d7},
+	{0x00a7d9, 0x00a7d9},
+	{0x00a7f2, 0x00a7f4},
+	{0x00a7f6, 0x00a7f6},
+	{0x00a7f8, 0x00a7f9},
+	{0x00a7fa, 0x00a7fa},
+	{0x00ab30, 0x00ab5a},
+	{0x00ab5c, 0x00ab5f},
+	{0x00ab60, 0x00ab68},
+	{0x00ab69, 0x00ab69},
+	{0x00ab70, 0x00abbf},
+	{0x00fb00, 0x00fb06},
+	{0x00fb13, 0x00fb17},
+	{0x00ff41, 0x00ff5a},
+	{0x010428, 0x01044f},
+	{0x0104d8, 0x0104fb},
+	{0x010597, 0x0105a1},
+	{0x0105a3, 0x0105b1},
+	{0x0105b3, 0x0105b9},
+	{0x0105bb, 0x0105bc},
+	{0x010780, 0x010780},
+	{0x010783, 0x010785},
+	{0x010787, 0x0107b0},
+	{0x0107b2, 0x0107ba},
+	{0x010cc0, 0x010cf2},
+	{0x0118c0, 0x0118df},
+	{0x016e60, 0x016e7f},
+	{0x01d41a, 0x01d433},
+	{0x01d44e, 0x01d454},
+	{0x01d456, 0x01d467},
+	{0x01d482, 0x01d49b},
+	{0x01d4b6, 0x01d4b9},
+	{0x01d4bb, 0x01d4bb},
+	{0x01d4bd, 0x01d4c3},
+	{0x01d4c5, 0x01d4cf},
+	{0x01d4ea, 0x01d503},
+	{0x01d51e, 0x01d537},
+	{0x01d552, 0x01d56b},
+	{0x01d586, 0x01d59f},
+	{0x01d5ba, 0x01d5d3},
+	{0x01d5ee, 0x01d607},
+	{0x01d622, 0x01d63b},
+	{0x01d656, 0x01d66f},
+	{0x01d68a, 0x01d6a5},
+	{0x01d6c2, 0x01d6da},
+	{0x01d6dc, 0x01d6e1},
+	{0x01d6fc, 0x01d714},
+	{0x01d716, 0x01d71b},
+	{0x01d736, 0x01d74e},
+	{0x01d750, 0x01d755},
+	{0x01d770, 0x01d788},
+	{0x01d78a, 0x01d78f},
+	{0x01d7aa, 0x01d7c2},
+	{0x01d7c4, 0x01d7c9},
+	{0x01d7cb, 0x01d7cb},
+	{0x01df00, 0x01df09},
+	{0x01df0b, 0x01df1e},
+	{0x01df25, 0x01df2a},
+	{0x01e030, 0x01e06d},
+	{0x01e922, 0x01e943}
+};
+
+/* table of Unicode codepoint ranges of Uppercase characters */
+static const pg_unicode_range unicode_uppercase[651] =
+{
+	{0x000041, 0x00005a},
+	{0x0000c0, 0x0000d6},
+	{0x0000d8, 0x0000de},
+	{0x000100, 0x000100},
+	{0x000102, 0x000102},
+	{0x000104, 0x000104},
+	{0x000106, 0x000106},
+	{0x000108, 0x000108},
+	{0x00010a, 0x00010a},
+	{0x00010c, 0x00010c},
+	{0x00010e, 0x00010e},
+	{0x000110, 0x000110},
+	{0x000112, 0x000112},
+	{0x000114, 0x000114},
+	{0x000116, 0x000116},
+	{0x000118, 0x000118},
+	{0x00011a, 0x00011a},
+	{0x00011c, 0x00011c},
+	{0x00011e, 0x00011e},
+	{0x000120, 0x000120},
+	{0x000122, 0x000122},
+	{0x000124, 0x000124},
+	{0x000126, 0x000126},
+	{0x000128, 0x000128},
+	{0x00012a, 0x00012a},
+	{0x00012c, 0x00012c},
+	{0x00012e, 0x00012e},
+	{0x000130, 0x000130},
+	{0x000132, 0x000132},
+	{0x000134, 0x000134},
+	{0x000136, 0x000136},
+	{0x000139, 0x000139},
+	{0x00013b, 0x00013b},
+	{0x00013d, 0x00013d},
+	{0x00013f, 0x00013f},
+	{0x000141, 0x000141},
+	{0x000143, 0x000143},
+	{0x000145, 0x000145},
+	{0x000147, 0x000147},
+	{0x00014a, 0x00014a},
+	{0x00014c, 0x00014c},
+	{0x00014e, 0x00014e},
+	{0x000150, 0x000150},
+	{0x000152, 0x000152},
+	{0x000154, 0x000154},
+	{0x000156, 0x000156},
+	{0x000158, 0x000158},
+	{0x00015a, 0x00015a},
+	{0x00015c, 0x00015c},
+	{0x00015e, 0x00015e},
+	{0x000160, 0x000160},
+	{0x000162, 0x000162},
+	{0x000164, 0x000164},
+	{0x000166, 0x000166},
+	{0x000168, 0x000168},
+	{0x00016a, 0x00016a},
+	{0x00016c, 0x00016c},
+	{0x00016e, 0x00016e},
+	{0x000170, 0x000170},
+	{0x000172, 0x000172},
+	{0x000174, 0x000174},
+	{0x000176, 0x000176},
+	{0x000178, 0x000179},
+	{0x00017b, 0x00017b},
+	{0x00017d, 0x00017d},
+	{0x000181, 0x000182},
+	{0x000184, 0x000184},
+	{0x000186, 0x000187},
+	{0x000189, 0x00018b},
+	{0x00018e, 0x000191},
+	{0x000193, 0x000194},
+	{0x000196, 0x000198},
+	{0x00019c, 0x00019d},
+	{0x00019f, 0x0001a0},
+	{0x0001a2, 0x0001a2},
+	{0x0001a4, 0x0001a4},
+	{0x0001a6, 0x0001a7},
+	{0x0001a9, 0x0001a9},
+	{0x0001ac, 0x0001ac},
+	{0x0001ae, 0x0001af},
+	{0x0001b1, 0x0001b3},
+	{0x0001b5, 0x0001b5},
+	{0x0001b7, 0x0001b8},
+	{0x0001bc, 0x0001bc},
+	{0x0001c4, 0x0001c4},
+	{0x0001c7, 0x0001c7},
+	{0x0001ca, 0x0001ca},
+	{0x0001cd, 0x0001cd},
+	{0x0001cf, 0x0001cf},
+	{0x0001d1, 0x0001d1},
+	{0x0001d3, 0x0001d3},
+	{0x0001d5, 0x0001d5},
+	{0x0001d7, 0x0001d7},
+	{0x0001d9, 0x0001d9},
+	{0x0001db, 0x0001db},
+	{0x0001de, 0x0001de},
+	{0x0001e0, 0x0001e0},
+	{0x0001e2, 0x0001e2},
+	{0x0001e4, 0x0001e4},
+	{0x0001e6, 0x0001e6},
+	{0x0001e8, 0x0001e8},
+	{0x0001ea, 0x0001ea},
+	{0x0001ec, 0x0001ec},
+	{0x0001ee, 0x0001ee},
+	{0x0001f1, 0x0001f1},
+	{0x0001f4, 0x0001f4},
+	{0x0001f6, 0x0001f8},
+	{0x0001fa, 0x0001fa},
+	{0x0001fc, 0x0001fc},
+	{0x0001fe, 0x0001fe},
+	{0x000200, 0x000200},
+	{0x000202, 0x000202},
+	{0x000204, 0x000204},
+	{0x000206, 0x000206},
+	{0x000208, 0x000208},
+	{0x00020a, 0x00020a},
+	{0x00020c, 0x00020c},
+	{0x00020e, 0x00020e},
+	{0x000210, 0x000210},
+	{0x000212, 0x000212},
+	{0x000214, 0x000214},
+	{0x000216, 0x000216},
+	{0x000218, 0x000218},
+	{0x00021a, 0x00021a},
+	{0x00021c, 0x00021c},
+	{0x00021e, 0x00021e},
+	{0x000220, 0x000220},
+	{0x000222, 0x000222},
+	{0x000224, 0x000224},
+	{0x000226, 0x000226},
+	{0x000228, 0x000228},
+	{0x00022a, 0x00022a},
+	{0x00022c, 0x00022c},
+	{0x00022e, 0x00022e},
+	{0x000230, 0x000230},
+	{0x000232, 0x000232},
+	{0x00023a, 0x00023b},
+	{0x00023d, 0x00023e},
+	{0x000241, 0x000241},
+	{0x000243, 0x000246},
+	{0x000248, 0x000248},
+	{0x00024a, 0x00024a},
+	{0x00024c, 0x00024c},
+	{0x00024e, 0x00024e},
+	{0x000370, 0x000370},
+	{0x000372, 0x000372},
+	{0x000376, 0x000376},
+	{0x00037f, 0x00037f},
+	{0x000386, 0x000386},
+	{0x000388, 0x00038a},
+	{0x00038c, 0x00038c},
+	{0x00038e, 0x00038f},
+	{0x000391, 0x0003a1},
+	{0x0003a3, 0x0003ab},
+	{0x0003cf, 0x0003cf},
+	{0x0003d2, 0x0003d4},
+	{0x0003d8, 0x0003d8},
+	{0x0003da, 0x0003da},
+	{0x0003dc, 0x0003dc},
+	{0x0003de, 0x0003de},
+	{0x0003e0, 0x0003e0},
+	{0x0003e2, 0x0003e2},
+	{0x0003e4, 0x0003e4},
+	{0x0003e6, 0x0003e6},
+	{0x0003e8, 0x0003e8},
+	{0x0003ea, 0x0003ea},
+	{0x0003ec, 0x0003ec},
+	{0x0003ee, 0x0003ee},
+	{0x0003f4, 0x0003f4},
+	{0x0003f7, 0x0003f7},
+	{0x0003f9, 0x0003fa},
+	{0x0003fd, 0x00042f},
+	{0x000460, 0x000460},
+	{0x000462, 0x000462},
+	{0x000464, 0x000464},
+	{0x000466, 0x000466},
+	{0x000468, 0x000468},
+	{0x00046a, 0x00046a},
+	{0x00046c, 0x00046c},
+	{0x00046e, 0x00046e},
+	{0x000470, 0x000470},
+	{0x000472, 0x000472},
+	{0x000474, 0x000474},
+	{0x000476, 0x000476},
+	{0x000478, 0x000478},
+	{0x00047a, 0x00047a},
+	{0x00047c, 0x00047c},
+	{0x00047e, 0x00047e},
+	{0x000480, 0x000480},
+	{0x00048a, 0x00048a},
+	{0x00048c, 0x00048c},
+	{0x00048e, 0x00048e},
+	{0x000490, 0x000490},
+	{0x000492, 0x000492},
+	{0x000494, 0x000494},
+	{0x000496, 0x000496},
+	{0x000498, 0x000498},
+	{0x00049a, 0x00049a},
+	{0x00049c, 0x00049c},
+	{0x00049e, 0x00049e},
+	{0x0004a0, 0x0004a0},
+	{0x0004a2, 0x0004a2},
+	{0x0004a4, 0x0004a4},
+	{0x0004a6, 0x0004a6},
+	{0x0004a8, 0x0004a8},
+	{0x0004aa, 0x0004aa},
+	{0x0004ac, 0x0004ac},
+	{0x0004ae, 0x0004ae},
+	{0x0004b0, 0x0004b0},
+	{0x0004b2, 0x0004b2},
+	{0x0004b4, 0x0004b4},
+	{0x0004b6, 0x0004b6},
+	{0x0004b8, 0x0004b8},
+	{0x0004ba, 0x0004ba},
+	{0x0004bc, 0x0004bc},
+	{0x0004be, 0x0004be},
+	{0x0004c0, 0x0004c1},
+	{0x0004c3, 0x0004c3},
+	{0x0004c5, 0x0004c5},
+	{0x0004c7, 0x0004c7},
+	{0x0004c9, 0x0004c9},
+	{0x0004cb, 0x0004cb},
+	{0x0004cd, 0x0004cd},
+	{0x0004d0, 0x0004d0},
+	{0x0004d2, 0x0004d2},
+	{0x0004d4, 0x0004d4},
+	{0x0004d6, 0x0004d6},
+	{0x0004d8, 0x0004d8},
+	{0x0004da, 0x0004da},
+	{0x0004dc, 0x0004dc},
+	{0x0004de, 0x0004de},
+	{0x0004e0, 0x0004e0},
+	{0x0004e2, 0x0004e2},
+	{0x0004e4, 0x0004e4},
+	{0x0004e6, 0x0004e6},
+	{0x0004e8, 0x0004e8},
+	{0x0004ea, 0x0004ea},
+	{0x0004ec, 0x0004ec},
+	{0x0004ee, 0x0004ee},
+	{0x0004f0, 0x0004f0},
+	{0x0004f2, 0x0004f2},
+	{0x0004f4, 0x0004f4},
+	{0x0004f6, 0x0004f6},
+	{0x0004f8, 0x0004f8},
+	{0x0004fa, 0x0004fa},
+	{0x0004fc, 0x0004fc},
+	{0x0004fe, 0x0004fe},
+	{0x000500, 0x000500},
+	{0x000502, 0x000502},
+	{0x000504, 0x000504},
+	{0x000506, 0x000506},
+	{0x000508, 0x000508},
+	{0x00050a, 0x00050a},
+	{0x00050c, 0x00050c},
+	{0x00050e, 0x00050e},
+	{0x000510, 0x000510},
+	{0x000512, 0x000512},
+	{0x000514, 0x000514},
+	{0x000516, 0x000516},
+	{0x000518, 0x000518},
+	{0x00051a, 0x00051a},
+	{0x00051c, 0x00051c},
+	{0x00051e, 0x00051e},
+	{0x000520, 0x000520},
+	{0x000522, 0x000522},
+	{0x000524, 0x000524},
+	{0x000526, 0x000526},
+	{0x000528, 0x000528},
+	{0x00052a, 0x00052a},
+	{0x00052c, 0x00052c},
+	{0x00052e, 0x00052e},
+	{0x000531, 0x000556},
+	{0x0010a0, 0x0010c5},
+	{0x0010c7, 0x0010c7},
+	{0x0010cd, 0x0010cd},
+	{0x0013a0, 0x0013f5},
+	{0x001c90, 0x001cba},
+	{0x001cbd, 0x001cbf},
+	{0x001e00, 0x001e00},
+	{0x001e02, 0x001e02},
+	{0x001e04, 0x001e04},
+	{0x001e06, 0x001e06},
+	{0x001e08, 0x001e08},
+	{0x001e0a, 0x001e0a},
+	{0x001e0c, 0x001e0c},
+	{0x001e0e, 0x001e0e},
+	{0x001e10, 0x001e10},
+	{0x001e12, 0x001e12},
+	{0x001e14, 0x001e14},
+	{0x001e16, 0x001e16},
+	{0x001e18, 0x001e18},
+	{0x001e1a, 0x001e1a},
+	{0x001e1c, 0x001e1c},
+	{0x001e1e, 0x001e1e},
+	{0x001e20, 0x001e20},
+	{0x001e22, 0x001e22},
+	{0x001e24, 0x001e24},
+	{0x001e26, 0x001e26},
+	{0x001e28, 0x001e28},
+	{0x001e2a, 0x001e2a},
+	{0x001e2c, 0x001e2c},
+	{0x001e2e, 0x001e2e},
+	{0x001e30, 0x001e30},
+	{0x001e32, 0x001e32},
+	{0x001e34, 0x001e34},
+	{0x001e36, 0x001e36},
+	{0x001e38, 0x001e38},
+	{0x001e3a, 0x001e3a},
+	{0x001e3c, 0x001e3c},
+	{0x001e3e, 0x001e3e},
+	{0x001e40, 0x001e40},
+	{0x001e42, 0x001e42},
+	{0x001e44, 0x001e44},
+	{0x001e46, 0x001e46},
+	{0x001e48, 0x001e48},
+	{0x001e4a, 0x001e4a},
+	{0x001e4c, 0x001e4c},
+	{0x001e4e, 0x001e4e},
+	{0x001e50, 0x001e50},
+	{0x001e52, 0x001e52},
+	{0x001e54, 0x001e54},
+	{0x001e56, 0x001e56},
+	{0x001e58, 0x001e58},
+	{0x001e5a, 0x001e5a},
+	{0x001e5c, 0x001e5c},
+	{0x001e5e, 0x001e5e},
+	{0x001e60, 0x001e60},
+	{0x001e62, 0x001e62},
+	{0x001e64, 0x001e64},
+	{0x001e66, 0x001e66},
+	{0x001e68, 0x001e68},
+	{0x001e6a, 0x001e6a},
+	{0x001e6c, 0x001e6c},
+	{0x001e6e, 0x001e6e},
+	{0x001e70, 0x001e70},
+	{0x001e72, 0x001e72},
+	{0x001e74, 0x001e74},
+	{0x001e76, 0x001e76},
+	{0x001e78, 0x001e78},
+	{0x001e7a, 0x001e7a},
+	{0x001e7c, 0x001e7c},
+	{0x001e7e, 0x001e7e},
+	{0x001e80, 0x001e80},
+	{0x001e82, 0x001e82},
+	{0x001e84, 0x001e84},
+	{0x001e86, 0x001e86},
+	{0x001e88, 0x001e88},
+	{0x001e8a, 0x001e8a},
+	{0x001e8c, 0x001e8c},
+	{0x001e8e, 0x001e8e},
+	{0x001e90, 0x001e90},
+	{0x001e92, 0x001e92},
+	{0x001e94, 0x001e94},
+	{0x001e9e, 0x001e9e},
+	{0x001ea0, 0x001ea0},
+	{0x001ea2, 0x001ea2},
+	{0x001ea4, 0x001ea4},
+	{0x001ea6, 0x001ea6},
+	{0x001ea8, 0x001ea8},
+	{0x001eaa, 0x001eaa},
+	{0x001eac, 0x001eac},
+	{0x001eae, 0x001eae},
+	{0x001eb0, 0x001eb0},
+	{0x001eb2, 0x001eb2},
+	{0x001eb4, 0x001eb4},
+	{0x001eb6, 0x001eb6},
+	{0x001eb8, 0x001eb8},
+	{0x001eba, 0x001eba},
+	{0x001ebc, 0x001ebc},
+	{0x001ebe, 0x001ebe},
+	{0x001ec0, 0x001ec0},
+	{0x001ec2, 0x001ec2},
+	{0x001ec4, 0x001ec4},
+	{0x001ec6, 0x001ec6},
+	{0x001ec8, 0x001ec8},
+	{0x001eca, 0x001eca},
+	{0x001ecc, 0x001ecc},
+	{0x001ece, 0x001ece},
+	{0x001ed0, 0x001ed0},
+	{0x001ed2, 0x001ed2},
+	{0x001ed4, 0x001ed4},
+	{0x001ed6, 0x001ed6},
+	{0x001ed8, 0x001ed8},
+	{0x001eda, 0x001eda},
+	{0x001edc, 0x001edc},
+	{0x001ede, 0x001ede},
+	{0x001ee0, 0x001ee0},
+	{0x001ee2, 0x001ee2},
+	{0x001ee4, 0x001ee4},
+	{0x001ee6, 0x001ee6},
+	{0x001ee8, 0x001ee8},
+	{0x001eea, 0x001eea},
+	{0x001eec, 0x001eec},
+	{0x001eee, 0x001eee},
+	{0x001ef0, 0x001ef0},
+	{0x001ef2, 0x001ef2},
+	{0x001ef4, 0x001ef4},
+	{0x001ef6, 0x001ef6},
+	{0x001ef8, 0x001ef8},
+	{0x001efa, 0x001efa},
+	{0x001efc, 0x001efc},
+	{0x001efe, 0x001efe},
+	{0x001f08, 0x001f0f},
+	{0x001f18, 0x001f1d},
+	{0x001f28, 0x001f2f},
+	{0x001f38, 0x001f3f},
+	{0x001f48, 0x001f4d},
+	{0x001f59, 0x001f59},
+	{0x001f5b, 0x001f5b},
+	{0x001f5d, 0x001f5d},
+	{0x001f5f, 0x001f5f},
+	{0x001f68, 0x001f6f},
+	{0x001fb8, 0x001fbb},
+	{0x001fc8, 0x001fcb},
+	{0x001fd8, 0x001fdb},
+	{0x001fe8, 0x001fec},
+	{0x001ff8, 0x001ffb},
+	{0x002102, 0x002102},
+	{0x002107, 0x002107},
+	{0x00210b, 0x00210d},
+	{0x002110, 0x002112},
+	{0x002115, 0x002115},
+	{0x002119, 0x00211d},
+	{0x002124, 0x002124},
+	{0x002126, 0x002126},
+	{0x002128, 0x002128},
+	{0x00212a, 0x00212d},
+	{0x002130, 0x002133},
+	{0x00213e, 0x00213f},
+	{0x002145, 0x002145},
+	{0x002160, 0x00216f},
+	{0x002183, 0x002183},
+	{0x0024b6, 0x0024cf},
+	{0x002c00, 0x002c2f},
+	{0x002c60, 0x002c60},
+	{0x002c62, 0x002c64},
+	{0x002c67, 0x002c67},
+	{0x002c69, 0x002c69},
+	{0x002c6b, 0x002c6b},
+	{0x002c6d, 0x002c70},
+	{0x002c72, 0x002c72},
+	{0x002c75, 0x002c75},
+	{0x002c7e, 0x002c80},
+	{0x002c82, 0x002c82},
+	{0x002c84, 0x002c84},
+	{0x002c86, 0x002c86},
+	{0x002c88, 0x002c88},
+	{0x002c8a, 0x002c8a},
+	{0x002c8c, 0x002c8c},
+	{0x002c8e, 0x002c8e},
+	{0x002c90, 0x002c90},
+	{0x002c92, 0x002c92},
+	{0x002c94, 0x002c94},
+	{0x002c96, 0x002c96},
+	{0x002c98, 0x002c98},
+	{0x002c9a, 0x002c9a},
+	{0x002c9c, 0x002c9c},
+	{0x002c9e, 0x002c9e},
+	{0x002ca0, 0x002ca0},
+	{0x002ca2, 0x002ca2},
+	{0x002ca4, 0x002ca4},
+	{0x002ca6, 0x002ca6},
+	{0x002ca8, 0x002ca8},
+	{0x002caa, 0x002caa},
+	{0x002cac, 0x002cac},
+	{0x002cae, 0x002cae},
+	{0x002cb0, 0x002cb0},
+	{0x002cb2, 0x002cb2},
+	{0x002cb4, 0x002cb4},
+	{0x002cb6, 0x002cb6},
+	{0x002cb8, 0x002cb8},
+	{0x002cba, 0x002cba},
+	{0x002cbc, 0x002cbc},
+	{0x002cbe, 0x002cbe},
+	{0x002cc0, 0x002cc0},
+	{0x002cc2, 0x002cc2},
+	{0x002cc4, 0x002cc4},
+	{0x002cc6, 0x002cc6},
+	{0x002cc8, 0x002cc8},
+	{0x002cca, 0x002cca},
+	{0x002ccc, 0x002ccc},
+	{0x002cce, 0x002cce},
+	{0x002cd0, 0x002cd0},
+	{0x002cd2, 0x002cd2},
+	{0x002cd4, 0x002cd4},
+	{0x002cd6, 0x002cd6},
+	{0x002cd8, 0x002cd8},
+	{0x002cda, 0x002cda},
+	{0x002cdc, 0x002cdc},
+	{0x002cde, 0x002cde},
+	{0x002ce0, 0x002ce0},
+	{0x002ce2, 0x002ce2},
+	{0x002ceb, 0x002ceb},
+	{0x002ced, 0x002ced},
+	{0x002cf2, 0x002cf2},
+	{0x00a640, 0x00a640},
+	{0x00a642, 0x00a642},
+	{0x00a644, 0x00a644},
+	{0x00a646, 0x00a646},
+	{0x00a648, 0x00a648},
+	{0x00a64a, 0x00a64a},
+	{0x00a64c, 0x00a64c},
+	{0x00a64e, 0x00a64e},
+	{0x00a650, 0x00a650},
+	{0x00a652, 0x00a652},
+	{0x00a654, 0x00a654},
+	{0x00a656, 0x00a656},
+	{0x00a658, 0x00a658},
+	{0x00a65a, 0x00a65a},
+	{0x00a65c, 0x00a65c},
+	{0x00a65e, 0x00a65e},
+	{0x00a660, 0x00a660},
+	{0x00a662, 0x00a662},
+	{0x00a664, 0x00a664},
+	{0x00a666, 0x00a666},
+	{0x00a668, 0x00a668},
+	{0x00a66a, 0x00a66a},
+	{0x00a66c, 0x00a66c},
+	{0x00a680, 0x00a680},
+	{0x00a682, 0x00a682},
+	{0x00a684, 0x00a684},
+	{0x00a686, 0x00a686},
+	{0x00a688, 0x00a688},
+	{0x00a68a, 0x00a68a},
+	{0x00a68c, 0x00a68c},
+	{0x00a68e, 0x00a68e},
+	{0x00a690, 0x00a690},
+	{0x00a692, 0x00a692},
+	{0x00a694, 0x00a694},
+	{0x00a696, 0x00a696},
+	{0x00a698, 0x00a698},
+	{0x00a69a, 0x00a69a},
+	{0x00a722, 0x00a722},
+	{0x00a724, 0x00a724},
+	{0x00a726, 0x00a726},
+	{0x00a728, 0x00a728},
+	{0x00a72a, 0x00a72a},
+	{0x00a72c, 0x00a72c},
+	{0x00a72e, 0x00a72e},
+	{0x00a732, 0x00a732},
+	{0x00a734, 0x00a734},
+	{0x00a736, 0x00a736},
+	{0x00a738, 0x00a738},
+	{0x00a73a, 0x00a73a},
+	{0x00a73c, 0x00a73c},
+	{0x00a73e, 0x00a73e},
+	{0x00a740, 0x00a740},
+	{0x00a742, 0x00a742},
+	{0x00a744, 0x00a744},
+	{0x00a746, 0x00a746},
+	{0x00a748, 0x00a748},
+	{0x00a74a, 0x00a74a},
+	{0x00a74c, 0x00a74c},
+	{0x00a74e, 0x00a74e},
+	{0x00a750, 0x00a750},
+	{0x00a752, 0x00a752},
+	{0x00a754, 0x00a754},
+	{0x00a756, 0x00a756},
+	{0x00a758, 0x00a758},
+	{0x00a75a, 0x00a75a},
+	{0x00a75c, 0x00a75c},
+	{0x00a75e, 0x00a75e},
+	{0x00a760, 0x00a760},
+	{0x00a762, 0x00a762},
+	{0x00a764, 0x00a764},
+	{0x00a766, 0x00a766},
+	{0x00a768, 0x00a768},
+	{0x00a76a, 0x00a76a},
+	{0x00a76c, 0x00a76c},
+	{0x00a76e, 0x00a76e},
+	{0x00a779, 0x00a779},
+	{0x00a77b, 0x00a77b},
+	{0x00a77d, 0x00a77e},
+	{0x00a780, 0x00a780},
+	{0x00a782, 0x00a782},
+	{0x00a784, 0x00a784},
+	{0x00a786, 0x00a786},
+	{0x00a78b, 0x00a78b},
+	{0x00a78d, 0x00a78d},
+	{0x00a790, 0x00a790},
+	{0x00a792, 0x00a792},
+	{0x00a796, 0x00a796},
+	{0x00a798, 0x00a798},
+	{0x00a79a, 0x00a79a},
+	{0x00a79c, 0x00a79c},
+	{0x00a79e, 0x00a79e},
+	{0x00a7a0, 0x00a7a0},
+	{0x00a7a2, 0x00a7a2},
+	{0x00a7a4, 0x00a7a4},
+	{0x00a7a6, 0x00a7a6},
+	{0x00a7a8, 0x00a7a8},
+	{0x00a7aa, 0x00a7ae},
+	{0x00a7b0, 0x00a7b4},
+	{0x00a7b6, 0x00a7b6},
+	{0x00a7b8, 0x00a7b8},
+	{0x00a7ba, 0x00a7ba},
+	{0x00a7bc, 0x00a7bc},
+	{0x00a7be, 0x00a7be},
+	{0x00a7c0, 0x00a7c0},
+	{0x00a7c2, 0x00a7c2},
+	{0x00a7c4, 0x00a7c7},
+	{0x00a7c9, 0x00a7c9},
+	{0x00a7d0, 0x00a7d0},
+	{0x00a7d6, 0x00a7d6},
+	{0x00a7d8, 0x00a7d8},
+	{0x00a7f5, 0x00a7f5},
+	{0x00ff21, 0x00ff3a},
+	{0x010400, 0x010427},
+	{0x0104b0, 0x0104d3},
+	{0x010570, 0x01057a},
+	{0x01057c, 0x01058a},
+	{0x01058c, 0x010592},
+	{0x010594, 0x010595},
+	{0x010c80, 0x010cb2},
+	{0x0118a0, 0x0118bf},
+	{0x016e40, 0x016e5f},
+	{0x01d400, 0x01d419},
+	{0x01d434, 0x01d44d},
+	{0x01d468, 0x01d481},
+	{0x01d49c, 0x01d49c},
+	{0x01d49e, 0x01d49f},
+	{0x01d4a2, 0x01d4a2},
+	{0x01d4a5, 0x01d4a6},
+	{0x01d4a9, 0x01d4ac},
+	{0x01d4ae, 0x01d4b5},
+	{0x01d4d0, 0x01d4e9},
+	{0x01d504, 0x01d505},
+	{0x01d507, 0x01d50a},
+	{0x01d50d, 0x01d514},
+	{0x01d516, 0x01d51c},
+	{0x01d538, 0x01d539},
+	{0x01d53b, 0x01d53e},
+	{0x01d540, 0x01d544},
+	{0x01d546, 0x01d546},
+	{0x01d54a, 0x01d550},
+	{0x01d56c, 0x01d585},
+	{0x01d5a0, 0x01d5b9},
+	{0x01d5d4, 0x01d5ed},
+	{0x01d608, 0x01d621},
+	{0x01d63c, 0x01d655},
+	{0x01d670, 0x01d689},
+	{0x01d6a8, 0x01d6c0},
+	{0x01d6e2, 0x01d6fa},
+	{0x01d71c, 0x01d734},
+	{0x01d756, 0x01d76e},
+	{0x01d790, 0x01d7a8},
+	{0x01d7ca, 0x01d7ca},
+	{0x01e900, 0x01e921},
+	{0x01f130, 0x01f149},
+	{0x01f150, 0x01f169},
+	{0x01f170, 0x01f189}
+};
+
+/* table of Unicode codepoint ranges of White_Space characters */
+static const pg_unicode_range unicode_white_space[11] =
+{
+	{0x000009, 0x00000d},
+	{0x000020, 0x000020},
+	{0x000085, 0x000085},
+	{0x0000a0, 0x0000a0},
+	{0x001680, 0x001680},
+	{0x002000, 0x00200a},
+	{0x002028, 0x002028},
+	{0x002029, 0x002029},
+	{0x00202f, 0x00202f},
+	{0x00205f, 0x00205f},
+	{0x003000, 0x003000}
+};
+
+/* table of Unicode codepoint ranges of Hex_Digit characters */
+static const pg_unicode_range unicode_hex_digit[6] =
+{
+	{0x000030, 0x000039},
+	{0x000041, 0x000046},
+	{0x000061, 0x000066},
+	{0x00ff10, 0x00ff19},
+	{0x00ff21, 0x00ff26},
+	{0x00ff41, 0x00ff46}
+};
+
+/* table of Unicode codepoint ranges of Join_Control characters */
+static const pg_unicode_range unicode_join_control[1] =
+{
+	{0x00200c, 0x00200d}
+};
-- 
2.34.1
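
The property tables added above (unicode_uppercase, unicode_white_space,
unicode_hex_digit, and so on) are sorted, non-overlapping code point ranges,
so a membership test can be a plain binary search. The following is a minimal
sketch of that idea, not the code from the patch: the type cp_range, its field
names, and the functions range_search(), is_white_space() and is_hex_digit()
are hypothetical names chosen for illustration. Only the table contents are
copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type; the field names are assumptions. */
typedef struct
{
	uint32_t	first;			/* first code point in the range */
	uint32_t	last;			/* last code point in the range, inclusive */
} cp_range;

/* Contents copied from the unicode_white_space table in the patch. */
static const cp_range white_space[11] =
{
	{0x000009, 0x00000d},
	{0x000020, 0x000020},
	{0x000085, 0x000085},
	{0x0000a0, 0x0000a0},
	{0x001680, 0x001680},
	{0x002000, 0x00200a},
	{0x002028, 0x002028},
	{0x002029, 0x002029},
	{0x00202f, 0x00202f},
	{0x00205f, 0x00205f},
	{0x003000, 0x003000}
};

/* Contents copied from the unicode_hex_digit table in the patch. */
static const cp_range hex_digit[6] =
{
	{0x000030, 0x000039},
	{0x000041, 0x000046},
	{0x000061, 0x000066},
	{0x00ff10, 0x00ff19},
	{0x00ff21, 0x00ff26},
	{0x00ff41, 0x00ff46}
};

/*
 * Binary search over a sorted, non-overlapping range table. Returns true
 * if cp lies inside one of the ranges, false if it falls in a gap.
 */
static bool
range_search(const cp_range *tbl, size_t size, uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = size;		/* search the half-open interval [lo, hi) */

	assert(cp <= 0x10ffff);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp > tbl[mid].last)
			lo = mid + 1;
		else if (cp < tbl[mid].first)
			hi = mid;
		else
			return true;
	}

	return false;
}

static bool
is_white_space(uint32_t cp)
{
	return range_search(white_space, sizeof(white_space) / sizeof(white_space[0]), cp);
}

static bool
is_hex_digit(uint32_t cp)
{
	return range_search(hex_digit, sizeof(hex_digit) / sizeof(hex_digit[0]), cp);
}
```

With these tables, is_white_space(0x200a) is true while is_white_space(0x200b)
is false, because U+200B falls in the gap between two ranges. The input is a
Unicode code point rather than an encoding-dependent wide character, which is
the property the thread is after.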

Attachment: v2-0002-Shrink-unicode-category-table.patch (text/x-patch; charset=UTF-8)
From 35cd57cd65205573be3a3eff91affe307da405d0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 22 Nov 2023 11:30:31 -0800
Subject: [PATCH v2 2/3] Shrink unicode category table.

Missing entries can implicitly be considered "unassigned".
---
 .../generate-unicode_category_table.pl        |  21 +-
 src/common/unicode_category.c                 |   6 +-
 src/include/common/unicode_category_table.h   | 711 +-----------------
 3 files changed, 15 insertions(+), 723 deletions(-)
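
The idea in this commit message can be sketched in isolation: if unassigned
ranges are left out of the table, a failed binary search simply means
"unassigned". The snippet below is an illustrative sketch under that
assumption, not the patched unicode_category() itself. The names cp_category,
category_range, categories and lookup_category() are hypothetical, and the
table is only a short excerpt (U+0375..U+0385, taken from the diff below), so
every code point outside that excerpt also reports as unassigned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down category enum for this sketch. */
typedef enum
{
	CAT_UNASSIGNED = 0,
	CAT_UPPERCASE_LETTER,
	CAT_LOWERCASE_LETTER,
	CAT_MODIFIER_LETTER,
	CAT_MODIFIER_SYMBOL,
	CAT_OTHER_PUNCTUATION
} cp_category;

typedef struct
{
	uint32_t	first;			/* first code point in the range */
	uint32_t	last;			/* last code point in the range, inclusive */
	cp_category category;
} category_range;

/* Excerpt of the shrunken table: the unassigned gaps are simply absent. */
static const category_range categories[] =
{
	{0x000375, 0x000375, CAT_MODIFIER_SYMBOL},
	{0x000376, 0x000376, CAT_UPPERCASE_LETTER},
	{0x000377, 0x000377, CAT_LOWERCASE_LETTER},
	/* 0x000378..0x000379 omitted: unassigned */
	{0x00037a, 0x00037a, CAT_MODIFIER_LETTER},
	{0x00037b, 0x00037d, CAT_LOWERCASE_LETTER},
	{0x00037e, 0x00037e, CAT_OTHER_PUNCTUATION},
	{0x00037f, 0x00037f, CAT_UPPERCASE_LETTER},
	/* 0x000380..0x000383 omitted: unassigned */
	{0x000384, 0x000385, CAT_MODIFIER_SYMBOL}
};

/*
 * Binary search for the range containing cp. Falling out of the loop means
 * cp is in a gap (or outside the table), which is reported as unassigned
 * instead of being treated as an error.
 */
static cp_category
lookup_category(uint32_t cp)
{
	int			min = 0;
	int			mid;
	int			max = (int) (sizeof(categories) / sizeof(categories[0])) - 1;

	assert(cp <= 0x10ffff);

	while (max >= min)
	{
		mid = min + (max - min) / 2;
		if (cp > categories[mid].last)
			min = mid + 1;
		else if (cp < categories[mid].first)
			max = mid - 1;
		else
			return categories[mid].category;
	}

	return CAT_UNASSIGNED;
}
```

For example, lookup_category(0x0376) returns CAT_UPPERCASE_LETTER, while
lookup_category(0x0378) returns CAT_UNASSIGNED even though no table entry
covers it. The trade-off is that the lookup can no longer tell a legitimately
unassigned code point from a generator bug that dropped a range, which is
presumably why the only remaining assertion is the upper bound on the input.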

diff --git a/src/common/unicode/generate-unicode_category_table.pl b/src/common/unicode/generate-unicode_category_table.pl
index 8f03425e0b..992b877ede 100644
--- a/src/common/unicode/generate-unicode_category_table.pl
+++ b/src/common/unicode/generate-unicode_category_table.pl
@@ -72,7 +72,10 @@ while (my $line = <$FH>)
 	# the current range, emit the current range and initialize a new
 	# range representing the gap.
 	if ($range_end + 1 != $code && $range_category ne $gap_category) {
-		push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+		if ($range_category ne $CATEGORY_UNASSIGNED) {
+			push(@category_ranges, {start => $range_start, end => $range_end,
+									category => $range_category});
+		}
 		$range_start = $range_end + 1;
 		$range_end = $code - 1;
 		$range_category = $gap_category;
@@ -80,7 +83,10 @@ while (my $line = <$FH>)
 
 	# different category; new range
 	if ($range_category ne $category) {
-		push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+		if ($range_category ne $CATEGORY_UNASSIGNED) {
+			push(@category_ranges, {start => $range_start, end => $range_end,
+									category => $range_category});
+		}
 		$range_start = $code;
 		$range_end = $code;
 		$range_category = $category;
@@ -109,14 +115,9 @@ die "<..., First> entry with no corresponding <..., Last> entry"
   if $gap_category ne $CATEGORY_UNASSIGNED;
 
 # emit final range
-push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
-
-# emit range for any unassigned code points after last entry
-if ($range_end < 0x10FFFF) {
-	$range_start = $range_end + 1;
-	$range_end = 0x10FFFF;
-	$range_category = $CATEGORY_UNASSIGNED;
-	push(@category_ranges, {start => $range_start, end => $range_end, category => $range_category});
+if ($range_category ne $CATEGORY_UNASSIGNED) {
+	push(@category_ranges, {start => $range_start, end => $range_end,
+							category => $range_category});
 }
 
 my $num_ranges = scalar @category_ranges;
diff --git a/src/common/unicode_category.c b/src/common/unicode_category.c
index cec9c0d998..189cd6eca3 100644
--- a/src/common/unicode_category.c
+++ b/src/common/unicode_category.c
@@ -28,8 +28,7 @@ unicode_category(pg_wchar ucs)
 	int			mid;
 	int			max = lengthof(unicode_categories) - 1;
 
-	Assert(ucs >= unicode_categories[0].first &&
-		   ucs <= unicode_categories[max].last);
+	Assert(ucs <= 0x10ffff);
 
 	while (max >= min)
 	{
@@ -42,8 +41,7 @@ unicode_category(pg_wchar ucs)
 			return unicode_categories[mid].category;
 	}
 
-	Assert(false);
-	return (pg_unicode_category) - 1;
+	return PG_U_UNASSIGNED;
 }
 
 /*
diff --git a/src/include/common/unicode_category_table.h b/src/include/common/unicode_category_table.h
index 06ad50d215..14f1ea0677 100644
--- a/src/include/common/unicode_category_table.h
+++ b/src/include/common/unicode_category_table.h
@@ -26,7 +26,7 @@ typedef struct
 }			pg_category_range;
 
 /* table of Unicode codepoint ranges and their categories */
-static const pg_category_range unicode_categories[4009] =
+static const pg_category_range unicode_categories[3302] =
 {
 	{0x000000, 0x00001f, PG_U_CONTROL},
 	{0x000020, 0x000020, PG_U_SPACE_SEPARATOR},
@@ -397,23 +397,18 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000375, 0x000375, PG_U_MODIFIER_SYMBOL},
 	{0x000376, 0x000376, PG_U_UPPERCASE_LETTER},
 	{0x000377, 0x000377, PG_U_LOWERCASE_LETTER},
-	{0x000378, 0x000379, PG_U_UNASSIGNED},
 	{0x00037a, 0x00037a, PG_U_MODIFIER_LETTER},
 	{0x00037b, 0x00037d, PG_U_LOWERCASE_LETTER},
 	{0x00037e, 0x00037e, PG_U_OTHER_PUNCTUATION},
 	{0x00037f, 0x00037f, PG_U_UPPERCASE_LETTER},
-	{0x000380, 0x000383, PG_U_UNASSIGNED},
 	{0x000384, 0x000385, PG_U_MODIFIER_SYMBOL},
 	{0x000386, 0x000386, PG_U_UPPERCASE_LETTER},
 	{0x000387, 0x000387, PG_U_OTHER_PUNCTUATION},
 	{0x000388, 0x00038a, PG_U_UPPERCASE_LETTER},
-	{0x00038b, 0x00038b, PG_U_UNASSIGNED},
 	{0x00038c, 0x00038c, PG_U_UPPERCASE_LETTER},
-	{0x00038d, 0x00038d, PG_U_UNASSIGNED},
 	{0x00038e, 0x00038f, PG_U_UPPERCASE_LETTER},
 	{0x000390, 0x000390, PG_U_LOWERCASE_LETTER},
 	{0x000391, 0x0003a1, PG_U_UPPERCASE_LETTER},
-	{0x0003a2, 0x0003a2, PG_U_UNASSIGNED},
 	{0x0003a3, 0x0003ab, PG_U_UPPERCASE_LETTER},
 	{0x0003ac, 0x0003ce, PG_U_LOWERCASE_LETTER},
 	{0x0003cf, 0x0003cf, PG_U_UPPERCASE_LETTER},
@@ -654,18 +649,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00052d, 0x00052d, PG_U_LOWERCASE_LETTER},
 	{0x00052e, 0x00052e, PG_U_UPPERCASE_LETTER},
 	{0x00052f, 0x00052f, PG_U_LOWERCASE_LETTER},
-	{0x000530, 0x000530, PG_U_UNASSIGNED},
 	{0x000531, 0x000556, PG_U_UPPERCASE_LETTER},
-	{0x000557, 0x000558, PG_U_UNASSIGNED},
 	{0x000559, 0x000559, PG_U_MODIFIER_LETTER},
 	{0x00055a, 0x00055f, PG_U_OTHER_PUNCTUATION},
 	{0x000560, 0x000588, PG_U_LOWERCASE_LETTER},
 	{0x000589, 0x000589, PG_U_OTHER_PUNCTUATION},
 	{0x00058a, 0x00058a, PG_U_DASH_PUNCTUATION},
-	{0x00058b, 0x00058c, PG_U_UNASSIGNED},
 	{0x00058d, 0x00058e, PG_U_OTHER_SYMBOL},
 	{0x00058f, 0x00058f, PG_U_CURRENCY_SYMBOL},
-	{0x000590, 0x000590, PG_U_UNASSIGNED},
 	{0x000591, 0x0005bd, PG_U_NONSPACING_MARK},
 	{0x0005be, 0x0005be, PG_U_DASH_PUNCTUATION},
 	{0x0005bf, 0x0005bf, PG_U_NONSPACING_MARK},
@@ -675,12 +666,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0005c4, 0x0005c5, PG_U_NONSPACING_MARK},
 	{0x0005c6, 0x0005c6, PG_U_OTHER_PUNCTUATION},
 	{0x0005c7, 0x0005c7, PG_U_NONSPACING_MARK},
-	{0x0005c8, 0x0005cf, PG_U_UNASSIGNED},
 	{0x0005d0, 0x0005ea, PG_U_OTHER_LETTER},
-	{0x0005eb, 0x0005ee, PG_U_UNASSIGNED},
 	{0x0005ef, 0x0005f2, PG_U_OTHER_LETTER},
 	{0x0005f3, 0x0005f4, PG_U_OTHER_PUNCTUATION},
-	{0x0005f5, 0x0005ff, PG_U_UNASSIGNED},
 	{0x000600, 0x000605, PG_U_FORMAT},
 	{0x000606, 0x000608, PG_U_MATH_SYMBOL},
 	{0x000609, 0x00060a, PG_U_OTHER_PUNCTUATION},
@@ -716,17 +704,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0006fd, 0x0006fe, PG_U_OTHER_SYMBOL},
 	{0x0006ff, 0x0006ff, PG_U_OTHER_LETTER},
 	{0x000700, 0x00070d, PG_U_OTHER_PUNCTUATION},
-	{0x00070e, 0x00070e, PG_U_UNASSIGNED},
 	{0x00070f, 0x00070f, PG_U_FORMAT},
 	{0x000710, 0x000710, PG_U_OTHER_LETTER},
 	{0x000711, 0x000711, PG_U_NONSPACING_MARK},
 	{0x000712, 0x00072f, PG_U_OTHER_LETTER},
 	{0x000730, 0x00074a, PG_U_NONSPACING_MARK},
-	{0x00074b, 0x00074c, PG_U_UNASSIGNED},
 	{0x00074d, 0x0007a5, PG_U_OTHER_LETTER},
 	{0x0007a6, 0x0007b0, PG_U_NONSPACING_MARK},
 	{0x0007b1, 0x0007b1, PG_U_OTHER_LETTER},
-	{0x0007b2, 0x0007bf, PG_U_UNASSIGNED},
 	{0x0007c0, 0x0007c9, PG_U_DECIMAL_NUMBER},
 	{0x0007ca, 0x0007ea, PG_U_OTHER_LETTER},
 	{0x0007eb, 0x0007f3, PG_U_NONSPACING_MARK},
@@ -734,7 +719,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0007f6, 0x0007f6, PG_U_OTHER_SYMBOL},
 	{0x0007f7, 0x0007f9, PG_U_OTHER_PUNCTUATION},
 	{0x0007fa, 0x0007fa, PG_U_MODIFIER_LETTER},
-	{0x0007fb, 0x0007fc, PG_U_UNASSIGNED},
 	{0x0007fd, 0x0007fd, PG_U_NONSPACING_MARK},
 	{0x0007fe, 0x0007ff, PG_U_CURRENCY_SYMBOL},
 	{0x000800, 0x000815, PG_U_OTHER_LETTER},
@@ -745,22 +729,15 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000825, 0x000827, PG_U_NONSPACING_MARK},
 	{0x000828, 0x000828, PG_U_MODIFIER_LETTER},
 	{0x000829, 0x00082d, PG_U_NONSPACING_MARK},
-	{0x00082e, 0x00082f, PG_U_UNASSIGNED},
 	{0x000830, 0x00083e, PG_U_OTHER_PUNCTUATION},
-	{0x00083f, 0x00083f, PG_U_UNASSIGNED},
 	{0x000840, 0x000858, PG_U_OTHER_LETTER},
 	{0x000859, 0x00085b, PG_U_NONSPACING_MARK},
-	{0x00085c, 0x00085d, PG_U_UNASSIGNED},
 	{0x00085e, 0x00085e, PG_U_OTHER_PUNCTUATION},
-	{0x00085f, 0x00085f, PG_U_UNASSIGNED},
 	{0x000860, 0x00086a, PG_U_OTHER_LETTER},
-	{0x00086b, 0x00086f, PG_U_UNASSIGNED},
 	{0x000870, 0x000887, PG_U_OTHER_LETTER},
 	{0x000888, 0x000888, PG_U_MODIFIER_SYMBOL},
 	{0x000889, 0x00088e, PG_U_OTHER_LETTER},
-	{0x00088f, 0x00088f, PG_U_UNASSIGNED},
 	{0x000890, 0x000891, PG_U_FORMAT},
-	{0x000892, 0x000897, PG_U_UNASSIGNED},
 	{0x000898, 0x00089f, PG_U_NONSPACING_MARK},
 	{0x0008a0, 0x0008c8, PG_U_OTHER_LETTER},
 	{0x0008c9, 0x0008c9, PG_U_MODIFIER_LETTER},
@@ -789,37 +766,24 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000972, 0x000980, PG_U_OTHER_LETTER},
 	{0x000981, 0x000981, PG_U_NONSPACING_MARK},
 	{0x000982, 0x000983, PG_U_SPACING_MARK},
-	{0x000984, 0x000984, PG_U_UNASSIGNED},
 	{0x000985, 0x00098c, PG_U_OTHER_LETTER},
-	{0x00098d, 0x00098e, PG_U_UNASSIGNED},
 	{0x00098f, 0x000990, PG_U_OTHER_LETTER},
-	{0x000991, 0x000992, PG_U_UNASSIGNED},
 	{0x000993, 0x0009a8, PG_U_OTHER_LETTER},
-	{0x0009a9, 0x0009a9, PG_U_UNASSIGNED},
 	{0x0009aa, 0x0009b0, PG_U_OTHER_LETTER},
-	{0x0009b1, 0x0009b1, PG_U_UNASSIGNED},
 	{0x0009b2, 0x0009b2, PG_U_OTHER_LETTER},
-	{0x0009b3, 0x0009b5, PG_U_UNASSIGNED},
 	{0x0009b6, 0x0009b9, PG_U_OTHER_LETTER},
-	{0x0009ba, 0x0009bb, PG_U_UNASSIGNED},
 	{0x0009bc, 0x0009bc, PG_U_NONSPACING_MARK},
 	{0x0009bd, 0x0009bd, PG_U_OTHER_LETTER},
 	{0x0009be, 0x0009c0, PG_U_SPACING_MARK},
 	{0x0009c1, 0x0009c4, PG_U_NONSPACING_MARK},
-	{0x0009c5, 0x0009c6, PG_U_UNASSIGNED},
 	{0x0009c7, 0x0009c8, PG_U_SPACING_MARK},
-	{0x0009c9, 0x0009ca, PG_U_UNASSIGNED},
 	{0x0009cb, 0x0009cc, PG_U_SPACING_MARK},
 	{0x0009cd, 0x0009cd, PG_U_NONSPACING_MARK},
 	{0x0009ce, 0x0009ce, PG_U_OTHER_LETTER},
-	{0x0009cf, 0x0009d6, PG_U_UNASSIGNED},
 	{0x0009d7, 0x0009d7, PG_U_SPACING_MARK},
-	{0x0009d8, 0x0009db, PG_U_UNASSIGNED},
 	{0x0009dc, 0x0009dd, PG_U_OTHER_LETTER},
-	{0x0009de, 0x0009de, PG_U_UNASSIGNED},
 	{0x0009df, 0x0009e1, PG_U_OTHER_LETTER},
 	{0x0009e2, 0x0009e3, PG_U_NONSPACING_MARK},
-	{0x0009e4, 0x0009e5, PG_U_UNASSIGNED},
 	{0x0009e6, 0x0009ef, PG_U_DECIMAL_NUMBER},
 	{0x0009f0, 0x0009f1, PG_U_OTHER_LETTER},
 	{0x0009f2, 0x0009f3, PG_U_CURRENCY_SYMBOL},
@@ -829,194 +793,121 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0009fc, 0x0009fc, PG_U_OTHER_LETTER},
 	{0x0009fd, 0x0009fd, PG_U_OTHER_PUNCTUATION},
 	{0x0009fe, 0x0009fe, PG_U_NONSPACING_MARK},
-	{0x0009ff, 0x000a00, PG_U_UNASSIGNED},
 	{0x000a01, 0x000a02, PG_U_NONSPACING_MARK},
 	{0x000a03, 0x000a03, PG_U_SPACING_MARK},
-	{0x000a04, 0x000a04, PG_U_UNASSIGNED},
 	{0x000a05, 0x000a0a, PG_U_OTHER_LETTER},
-	{0x000a0b, 0x000a0e, PG_U_UNASSIGNED},
 	{0x000a0f, 0x000a10, PG_U_OTHER_LETTER},
-	{0x000a11, 0x000a12, PG_U_UNASSIGNED},
 	{0x000a13, 0x000a28, PG_U_OTHER_LETTER},
-	{0x000a29, 0x000a29, PG_U_UNASSIGNED},
 	{0x000a2a, 0x000a30, PG_U_OTHER_LETTER},
-	{0x000a31, 0x000a31, PG_U_UNASSIGNED},
 	{0x000a32, 0x000a33, PG_U_OTHER_LETTER},
-	{0x000a34, 0x000a34, PG_U_UNASSIGNED},
 	{0x000a35, 0x000a36, PG_U_OTHER_LETTER},
-	{0x000a37, 0x000a37, PG_U_UNASSIGNED},
 	{0x000a38, 0x000a39, PG_U_OTHER_LETTER},
-	{0x000a3a, 0x000a3b, PG_U_UNASSIGNED},
 	{0x000a3c, 0x000a3c, PG_U_NONSPACING_MARK},
-	{0x000a3d, 0x000a3d, PG_U_UNASSIGNED},
 	{0x000a3e, 0x000a40, PG_U_SPACING_MARK},
 	{0x000a41, 0x000a42, PG_U_NONSPACING_MARK},
-	{0x000a43, 0x000a46, PG_U_UNASSIGNED},
 	{0x000a47, 0x000a48, PG_U_NONSPACING_MARK},
-	{0x000a49, 0x000a4a, PG_U_UNASSIGNED},
 	{0x000a4b, 0x000a4d, PG_U_NONSPACING_MARK},
-	{0x000a4e, 0x000a50, PG_U_UNASSIGNED},
 	{0x000a51, 0x000a51, PG_U_NONSPACING_MARK},
-	{0x000a52, 0x000a58, PG_U_UNASSIGNED},
 	{0x000a59, 0x000a5c, PG_U_OTHER_LETTER},
-	{0x000a5d, 0x000a5d, PG_U_UNASSIGNED},
 	{0x000a5e, 0x000a5e, PG_U_OTHER_LETTER},
-	{0x000a5f, 0x000a65, PG_U_UNASSIGNED},
 	{0x000a66, 0x000a6f, PG_U_DECIMAL_NUMBER},
 	{0x000a70, 0x000a71, PG_U_NONSPACING_MARK},
 	{0x000a72, 0x000a74, PG_U_OTHER_LETTER},
 	{0x000a75, 0x000a75, PG_U_NONSPACING_MARK},
 	{0x000a76, 0x000a76, PG_U_OTHER_PUNCTUATION},
-	{0x000a77, 0x000a80, PG_U_UNASSIGNED},
 	{0x000a81, 0x000a82, PG_U_NONSPACING_MARK},
 	{0x000a83, 0x000a83, PG_U_SPACING_MARK},
-	{0x000a84, 0x000a84, PG_U_UNASSIGNED},
 	{0x000a85, 0x000a8d, PG_U_OTHER_LETTER},
-	{0x000a8e, 0x000a8e, PG_U_UNASSIGNED},
 	{0x000a8f, 0x000a91, PG_U_OTHER_LETTER},
-	{0x000a92, 0x000a92, PG_U_UNASSIGNED},
 	{0x000a93, 0x000aa8, PG_U_OTHER_LETTER},
-	{0x000aa9, 0x000aa9, PG_U_UNASSIGNED},
 	{0x000aaa, 0x000ab0, PG_U_OTHER_LETTER},
-	{0x000ab1, 0x000ab1, PG_U_UNASSIGNED},
 	{0x000ab2, 0x000ab3, PG_U_OTHER_LETTER},
-	{0x000ab4, 0x000ab4, PG_U_UNASSIGNED},
 	{0x000ab5, 0x000ab9, PG_U_OTHER_LETTER},
-	{0x000aba, 0x000abb, PG_U_UNASSIGNED},
 	{0x000abc, 0x000abc, PG_U_NONSPACING_MARK},
 	{0x000abd, 0x000abd, PG_U_OTHER_LETTER},
 	{0x000abe, 0x000ac0, PG_U_SPACING_MARK},
 	{0x000ac1, 0x000ac5, PG_U_NONSPACING_MARK},
-	{0x000ac6, 0x000ac6, PG_U_UNASSIGNED},
 	{0x000ac7, 0x000ac8, PG_U_NONSPACING_MARK},
 	{0x000ac9, 0x000ac9, PG_U_SPACING_MARK},
-	{0x000aca, 0x000aca, PG_U_UNASSIGNED},
 	{0x000acb, 0x000acc, PG_U_SPACING_MARK},
 	{0x000acd, 0x000acd, PG_U_NONSPACING_MARK},
-	{0x000ace, 0x000acf, PG_U_UNASSIGNED},
 	{0x000ad0, 0x000ad0, PG_U_OTHER_LETTER},
-	{0x000ad1, 0x000adf, PG_U_UNASSIGNED},
 	{0x000ae0, 0x000ae1, PG_U_OTHER_LETTER},
 	{0x000ae2, 0x000ae3, PG_U_NONSPACING_MARK},
-	{0x000ae4, 0x000ae5, PG_U_UNASSIGNED},
 	{0x000ae6, 0x000aef, PG_U_DECIMAL_NUMBER},
 	{0x000af0, 0x000af0, PG_U_OTHER_PUNCTUATION},
 	{0x000af1, 0x000af1, PG_U_CURRENCY_SYMBOL},
-	{0x000af2, 0x000af8, PG_U_UNASSIGNED},
 	{0x000af9, 0x000af9, PG_U_OTHER_LETTER},
 	{0x000afa, 0x000aff, PG_U_NONSPACING_MARK},
-	{0x000b00, 0x000b00, PG_U_UNASSIGNED},
 	{0x000b01, 0x000b01, PG_U_NONSPACING_MARK},
 	{0x000b02, 0x000b03, PG_U_SPACING_MARK},
-	{0x000b04, 0x000b04, PG_U_UNASSIGNED},
 	{0x000b05, 0x000b0c, PG_U_OTHER_LETTER},
-	{0x000b0d, 0x000b0e, PG_U_UNASSIGNED},
 	{0x000b0f, 0x000b10, PG_U_OTHER_LETTER},
-	{0x000b11, 0x000b12, PG_U_UNASSIGNED},
 	{0x000b13, 0x000b28, PG_U_OTHER_LETTER},
-	{0x000b29, 0x000b29, PG_U_UNASSIGNED},
 	{0x000b2a, 0x000b30, PG_U_OTHER_LETTER},
-	{0x000b31, 0x000b31, PG_U_UNASSIGNED},
 	{0x000b32, 0x000b33, PG_U_OTHER_LETTER},
-	{0x000b34, 0x000b34, PG_U_UNASSIGNED},
 	{0x000b35, 0x000b39, PG_U_OTHER_LETTER},
-	{0x000b3a, 0x000b3b, PG_U_UNASSIGNED},
 	{0x000b3c, 0x000b3c, PG_U_NONSPACING_MARK},
 	{0x000b3d, 0x000b3d, PG_U_OTHER_LETTER},
 	{0x000b3e, 0x000b3e, PG_U_SPACING_MARK},
 	{0x000b3f, 0x000b3f, PG_U_NONSPACING_MARK},
 	{0x000b40, 0x000b40, PG_U_SPACING_MARK},
 	{0x000b41, 0x000b44, PG_U_NONSPACING_MARK},
-	{0x000b45, 0x000b46, PG_U_UNASSIGNED},
 	{0x000b47, 0x000b48, PG_U_SPACING_MARK},
-	{0x000b49, 0x000b4a, PG_U_UNASSIGNED},
 	{0x000b4b, 0x000b4c, PG_U_SPACING_MARK},
 	{0x000b4d, 0x000b4d, PG_U_NONSPACING_MARK},
-	{0x000b4e, 0x000b54, PG_U_UNASSIGNED},
 	{0x000b55, 0x000b56, PG_U_NONSPACING_MARK},
 	{0x000b57, 0x000b57, PG_U_SPACING_MARK},
-	{0x000b58, 0x000b5b, PG_U_UNASSIGNED},
 	{0x000b5c, 0x000b5d, PG_U_OTHER_LETTER},
-	{0x000b5e, 0x000b5e, PG_U_UNASSIGNED},
 	{0x000b5f, 0x000b61, PG_U_OTHER_LETTER},
 	{0x000b62, 0x000b63, PG_U_NONSPACING_MARK},
-	{0x000b64, 0x000b65, PG_U_UNASSIGNED},
 	{0x000b66, 0x000b6f, PG_U_DECIMAL_NUMBER},
 	{0x000b70, 0x000b70, PG_U_OTHER_SYMBOL},
 	{0x000b71, 0x000b71, PG_U_OTHER_LETTER},
 	{0x000b72, 0x000b77, PG_U_OTHER_NUMBER},
-	{0x000b78, 0x000b81, PG_U_UNASSIGNED},
 	{0x000b82, 0x000b82, PG_U_NONSPACING_MARK},
 	{0x000b83, 0x000b83, PG_U_OTHER_LETTER},
-	{0x000b84, 0x000b84, PG_U_UNASSIGNED},
 	{0x000b85, 0x000b8a, PG_U_OTHER_LETTER},
-	{0x000b8b, 0x000b8d, PG_U_UNASSIGNED},
 	{0x000b8e, 0x000b90, PG_U_OTHER_LETTER},
-	{0x000b91, 0x000b91, PG_U_UNASSIGNED},
 	{0x000b92, 0x000b95, PG_U_OTHER_LETTER},
-	{0x000b96, 0x000b98, PG_U_UNASSIGNED},
 	{0x000b99, 0x000b9a, PG_U_OTHER_LETTER},
-	{0x000b9b, 0x000b9b, PG_U_UNASSIGNED},
 	{0x000b9c, 0x000b9c, PG_U_OTHER_LETTER},
-	{0x000b9d, 0x000b9d, PG_U_UNASSIGNED},
 	{0x000b9e, 0x000b9f, PG_U_OTHER_LETTER},
-	{0x000ba0, 0x000ba2, PG_U_UNASSIGNED},
 	{0x000ba3, 0x000ba4, PG_U_OTHER_LETTER},
-	{0x000ba5, 0x000ba7, PG_U_UNASSIGNED},
 	{0x000ba8, 0x000baa, PG_U_OTHER_LETTER},
-	{0x000bab, 0x000bad, PG_U_UNASSIGNED},
 	{0x000bae, 0x000bb9, PG_U_OTHER_LETTER},
-	{0x000bba, 0x000bbd, PG_U_UNASSIGNED},
 	{0x000bbe, 0x000bbf, PG_U_SPACING_MARK},
 	{0x000bc0, 0x000bc0, PG_U_NONSPACING_MARK},
 	{0x000bc1, 0x000bc2, PG_U_SPACING_MARK},
-	{0x000bc3, 0x000bc5, PG_U_UNASSIGNED},
 	{0x000bc6, 0x000bc8, PG_U_SPACING_MARK},
-	{0x000bc9, 0x000bc9, PG_U_UNASSIGNED},
 	{0x000bca, 0x000bcc, PG_U_SPACING_MARK},
 	{0x000bcd, 0x000bcd, PG_U_NONSPACING_MARK},
-	{0x000bce, 0x000bcf, PG_U_UNASSIGNED},
 	{0x000bd0, 0x000bd0, PG_U_OTHER_LETTER},
-	{0x000bd1, 0x000bd6, PG_U_UNASSIGNED},
 	{0x000bd7, 0x000bd7, PG_U_SPACING_MARK},
-	{0x000bd8, 0x000be5, PG_U_UNASSIGNED},
 	{0x000be6, 0x000bef, PG_U_DECIMAL_NUMBER},
 	{0x000bf0, 0x000bf2, PG_U_OTHER_NUMBER},
 	{0x000bf3, 0x000bf8, PG_U_OTHER_SYMBOL},
 	{0x000bf9, 0x000bf9, PG_U_CURRENCY_SYMBOL},
 	{0x000bfa, 0x000bfa, PG_U_OTHER_SYMBOL},
-	{0x000bfb, 0x000bff, PG_U_UNASSIGNED},
 	{0x000c00, 0x000c00, PG_U_NONSPACING_MARK},
 	{0x000c01, 0x000c03, PG_U_SPACING_MARK},
 	{0x000c04, 0x000c04, PG_U_NONSPACING_MARK},
 	{0x000c05, 0x000c0c, PG_U_OTHER_LETTER},
-	{0x000c0d, 0x000c0d, PG_U_UNASSIGNED},
 	{0x000c0e, 0x000c10, PG_U_OTHER_LETTER},
-	{0x000c11, 0x000c11, PG_U_UNASSIGNED},
 	{0x000c12, 0x000c28, PG_U_OTHER_LETTER},
-	{0x000c29, 0x000c29, PG_U_UNASSIGNED},
 	{0x000c2a, 0x000c39, PG_U_OTHER_LETTER},
-	{0x000c3a, 0x000c3b, PG_U_UNASSIGNED},
 	{0x000c3c, 0x000c3c, PG_U_NONSPACING_MARK},
 	{0x000c3d, 0x000c3d, PG_U_OTHER_LETTER},
 	{0x000c3e, 0x000c40, PG_U_NONSPACING_MARK},
 	{0x000c41, 0x000c44, PG_U_SPACING_MARK},
-	{0x000c45, 0x000c45, PG_U_UNASSIGNED},
 	{0x000c46, 0x000c48, PG_U_NONSPACING_MARK},
-	{0x000c49, 0x000c49, PG_U_UNASSIGNED},
 	{0x000c4a, 0x000c4d, PG_U_NONSPACING_MARK},
-	{0x000c4e, 0x000c54, PG_U_UNASSIGNED},
 	{0x000c55, 0x000c56, PG_U_NONSPACING_MARK},
-	{0x000c57, 0x000c57, PG_U_UNASSIGNED},
 	{0x000c58, 0x000c5a, PG_U_OTHER_LETTER},
-	{0x000c5b, 0x000c5c, PG_U_UNASSIGNED},
 	{0x000c5d, 0x000c5d, PG_U_OTHER_LETTER},
-	{0x000c5e, 0x000c5f, PG_U_UNASSIGNED},
 	{0x000c60, 0x000c61, PG_U_OTHER_LETTER},
 	{0x000c62, 0x000c63, PG_U_NONSPACING_MARK},
-	{0x000c64, 0x000c65, PG_U_UNASSIGNED},
 	{0x000c66, 0x000c6f, PG_U_DECIMAL_NUMBER},
-	{0x000c70, 0x000c76, PG_U_UNASSIGNED},
 	{0x000c77, 0x000c77, PG_U_OTHER_PUNCTUATION},
 	{0x000c78, 0x000c7e, PG_U_OTHER_NUMBER},
 	{0x000c7f, 0x000c7f, PG_U_OTHER_SYMBOL},
@@ -1025,101 +916,68 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000c82, 0x000c83, PG_U_SPACING_MARK},
 	{0x000c84, 0x000c84, PG_U_OTHER_PUNCTUATION},
 	{0x000c85, 0x000c8c, PG_U_OTHER_LETTER},
-	{0x000c8d, 0x000c8d, PG_U_UNASSIGNED},
 	{0x000c8e, 0x000c90, PG_U_OTHER_LETTER},
-	{0x000c91, 0x000c91, PG_U_UNASSIGNED},
 	{0x000c92, 0x000ca8, PG_U_OTHER_LETTER},
-	{0x000ca9, 0x000ca9, PG_U_UNASSIGNED},
 	{0x000caa, 0x000cb3, PG_U_OTHER_LETTER},
-	{0x000cb4, 0x000cb4, PG_U_UNASSIGNED},
 	{0x000cb5, 0x000cb9, PG_U_OTHER_LETTER},
-	{0x000cba, 0x000cbb, PG_U_UNASSIGNED},
 	{0x000cbc, 0x000cbc, PG_U_NONSPACING_MARK},
 	{0x000cbd, 0x000cbd, PG_U_OTHER_LETTER},
 	{0x000cbe, 0x000cbe, PG_U_SPACING_MARK},
 	{0x000cbf, 0x000cbf, PG_U_NONSPACING_MARK},
 	{0x000cc0, 0x000cc4, PG_U_SPACING_MARK},
-	{0x000cc5, 0x000cc5, PG_U_UNASSIGNED},
 	{0x000cc6, 0x000cc6, PG_U_NONSPACING_MARK},
 	{0x000cc7, 0x000cc8, PG_U_SPACING_MARK},
-	{0x000cc9, 0x000cc9, PG_U_UNASSIGNED},
 	{0x000cca, 0x000ccb, PG_U_SPACING_MARK},
 	{0x000ccc, 0x000ccd, PG_U_NONSPACING_MARK},
-	{0x000cce, 0x000cd4, PG_U_UNASSIGNED},
 	{0x000cd5, 0x000cd6, PG_U_SPACING_MARK},
-	{0x000cd7, 0x000cdc, PG_U_UNASSIGNED},
 	{0x000cdd, 0x000cde, PG_U_OTHER_LETTER},
-	{0x000cdf, 0x000cdf, PG_U_UNASSIGNED},
 	{0x000ce0, 0x000ce1, PG_U_OTHER_LETTER},
 	{0x000ce2, 0x000ce3, PG_U_NONSPACING_MARK},
-	{0x000ce4, 0x000ce5, PG_U_UNASSIGNED},
 	{0x000ce6, 0x000cef, PG_U_DECIMAL_NUMBER},
-	{0x000cf0, 0x000cf0, PG_U_UNASSIGNED},
 	{0x000cf1, 0x000cf2, PG_U_OTHER_LETTER},
 	{0x000cf3, 0x000cf3, PG_U_SPACING_MARK},
-	{0x000cf4, 0x000cff, PG_U_UNASSIGNED},
 	{0x000d00, 0x000d01, PG_U_NONSPACING_MARK},
 	{0x000d02, 0x000d03, PG_U_SPACING_MARK},
 	{0x000d04, 0x000d0c, PG_U_OTHER_LETTER},
-	{0x000d0d, 0x000d0d, PG_U_UNASSIGNED},
 	{0x000d0e, 0x000d10, PG_U_OTHER_LETTER},
-	{0x000d11, 0x000d11, PG_U_UNASSIGNED},
 	{0x000d12, 0x000d3a, PG_U_OTHER_LETTER},
 	{0x000d3b, 0x000d3c, PG_U_NONSPACING_MARK},
 	{0x000d3d, 0x000d3d, PG_U_OTHER_LETTER},
 	{0x000d3e, 0x000d40, PG_U_SPACING_MARK},
 	{0x000d41, 0x000d44, PG_U_NONSPACING_MARK},
-	{0x000d45, 0x000d45, PG_U_UNASSIGNED},
 	{0x000d46, 0x000d48, PG_U_SPACING_MARK},
-	{0x000d49, 0x000d49, PG_U_UNASSIGNED},
 	{0x000d4a, 0x000d4c, PG_U_SPACING_MARK},
 	{0x000d4d, 0x000d4d, PG_U_NONSPACING_MARK},
 	{0x000d4e, 0x000d4e, PG_U_OTHER_LETTER},
 	{0x000d4f, 0x000d4f, PG_U_OTHER_SYMBOL},
-	{0x000d50, 0x000d53, PG_U_UNASSIGNED},
 	{0x000d54, 0x000d56, PG_U_OTHER_LETTER},
 	{0x000d57, 0x000d57, PG_U_SPACING_MARK},
 	{0x000d58, 0x000d5e, PG_U_OTHER_NUMBER},
 	{0x000d5f, 0x000d61, PG_U_OTHER_LETTER},
 	{0x000d62, 0x000d63, PG_U_NONSPACING_MARK},
-	{0x000d64, 0x000d65, PG_U_UNASSIGNED},
 	{0x000d66, 0x000d6f, PG_U_DECIMAL_NUMBER},
 	{0x000d70, 0x000d78, PG_U_OTHER_NUMBER},
 	{0x000d79, 0x000d79, PG_U_OTHER_SYMBOL},
 	{0x000d7a, 0x000d7f, PG_U_OTHER_LETTER},
-	{0x000d80, 0x000d80, PG_U_UNASSIGNED},
 	{0x000d81, 0x000d81, PG_U_NONSPACING_MARK},
 	{0x000d82, 0x000d83, PG_U_SPACING_MARK},
-	{0x000d84, 0x000d84, PG_U_UNASSIGNED},
 	{0x000d85, 0x000d96, PG_U_OTHER_LETTER},
-	{0x000d97, 0x000d99, PG_U_UNASSIGNED},
 	{0x000d9a, 0x000db1, PG_U_OTHER_LETTER},
-	{0x000db2, 0x000db2, PG_U_UNASSIGNED},
 	{0x000db3, 0x000dbb, PG_U_OTHER_LETTER},
-	{0x000dbc, 0x000dbc, PG_U_UNASSIGNED},
 	{0x000dbd, 0x000dbd, PG_U_OTHER_LETTER},
-	{0x000dbe, 0x000dbf, PG_U_UNASSIGNED},
 	{0x000dc0, 0x000dc6, PG_U_OTHER_LETTER},
-	{0x000dc7, 0x000dc9, PG_U_UNASSIGNED},
 	{0x000dca, 0x000dca, PG_U_NONSPACING_MARK},
-	{0x000dcb, 0x000dce, PG_U_UNASSIGNED},
 	{0x000dcf, 0x000dd1, PG_U_SPACING_MARK},
 	{0x000dd2, 0x000dd4, PG_U_NONSPACING_MARK},
-	{0x000dd5, 0x000dd5, PG_U_UNASSIGNED},
 	{0x000dd6, 0x000dd6, PG_U_NONSPACING_MARK},
-	{0x000dd7, 0x000dd7, PG_U_UNASSIGNED},
 	{0x000dd8, 0x000ddf, PG_U_SPACING_MARK},
-	{0x000de0, 0x000de5, PG_U_UNASSIGNED},
 	{0x000de6, 0x000def, PG_U_DECIMAL_NUMBER},
-	{0x000df0, 0x000df1, PG_U_UNASSIGNED},
 	{0x000df2, 0x000df3, PG_U_SPACING_MARK},
 	{0x000df4, 0x000df4, PG_U_OTHER_PUNCTUATION},
-	{0x000df5, 0x000e00, PG_U_UNASSIGNED},
 	{0x000e01, 0x000e30, PG_U_OTHER_LETTER},
 	{0x000e31, 0x000e31, PG_U_NONSPACING_MARK},
 	{0x000e32, 0x000e33, PG_U_OTHER_LETTER},
 	{0x000e34, 0x000e3a, PG_U_NONSPACING_MARK},
-	{0x000e3b, 0x000e3e, PG_U_UNASSIGNED},
 	{0x000e3f, 0x000e3f, PG_U_CURRENCY_SYMBOL},
 	{0x000e40, 0x000e45, PG_U_OTHER_LETTER},
 	{0x000e46, 0x000e46, PG_U_MODIFIER_LETTER},
@@ -1127,33 +985,21 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000e4f, 0x000e4f, PG_U_OTHER_PUNCTUATION},
 	{0x000e50, 0x000e59, PG_U_DECIMAL_NUMBER},
 	{0x000e5a, 0x000e5b, PG_U_OTHER_PUNCTUATION},
-	{0x000e5c, 0x000e80, PG_U_UNASSIGNED},
 	{0x000e81, 0x000e82, PG_U_OTHER_LETTER},
-	{0x000e83, 0x000e83, PG_U_UNASSIGNED},
 	{0x000e84, 0x000e84, PG_U_OTHER_LETTER},
-	{0x000e85, 0x000e85, PG_U_UNASSIGNED},
 	{0x000e86, 0x000e8a, PG_U_OTHER_LETTER},
-	{0x000e8b, 0x000e8b, PG_U_UNASSIGNED},
 	{0x000e8c, 0x000ea3, PG_U_OTHER_LETTER},
-	{0x000ea4, 0x000ea4, PG_U_UNASSIGNED},
 	{0x000ea5, 0x000ea5, PG_U_OTHER_LETTER},
-	{0x000ea6, 0x000ea6, PG_U_UNASSIGNED},
 	{0x000ea7, 0x000eb0, PG_U_OTHER_LETTER},
 	{0x000eb1, 0x000eb1, PG_U_NONSPACING_MARK},
 	{0x000eb2, 0x000eb3, PG_U_OTHER_LETTER},
 	{0x000eb4, 0x000ebc, PG_U_NONSPACING_MARK},
 	{0x000ebd, 0x000ebd, PG_U_OTHER_LETTER},
-	{0x000ebe, 0x000ebf, PG_U_UNASSIGNED},
 	{0x000ec0, 0x000ec4, PG_U_OTHER_LETTER},
-	{0x000ec5, 0x000ec5, PG_U_UNASSIGNED},
 	{0x000ec6, 0x000ec6, PG_U_MODIFIER_LETTER},
-	{0x000ec7, 0x000ec7, PG_U_UNASSIGNED},
 	{0x000ec8, 0x000ece, PG_U_NONSPACING_MARK},
-	{0x000ecf, 0x000ecf, PG_U_UNASSIGNED},
 	{0x000ed0, 0x000ed9, PG_U_DECIMAL_NUMBER},
-	{0x000eda, 0x000edb, PG_U_UNASSIGNED},
 	{0x000edc, 0x000edf, PG_U_OTHER_LETTER},
-	{0x000ee0, 0x000eff, PG_U_UNASSIGNED},
 	{0x000f00, 0x000f00, PG_U_OTHER_LETTER},
 	{0x000f01, 0x000f03, PG_U_OTHER_SYMBOL},
 	{0x000f04, 0x000f12, PG_U_OTHER_PUNCTUATION},
@@ -1176,9 +1022,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000f3d, 0x000f3d, PG_U_CLOSE_PUNCTUATION},
 	{0x000f3e, 0x000f3f, PG_U_SPACING_MARK},
 	{0x000f40, 0x000f47, PG_U_OTHER_LETTER},
-	{0x000f48, 0x000f48, PG_U_UNASSIGNED},
 	{0x000f49, 0x000f6c, PG_U_OTHER_LETTER},
-	{0x000f6d, 0x000f70, PG_U_UNASSIGNED},
 	{0x000f71, 0x000f7e, PG_U_NONSPACING_MARK},
 	{0x000f7f, 0x000f7f, PG_U_SPACING_MARK},
 	{0x000f80, 0x000f84, PG_U_NONSPACING_MARK},
@@ -1186,18 +1030,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x000f86, 0x000f87, PG_U_NONSPACING_MARK},
 	{0x000f88, 0x000f8c, PG_U_OTHER_LETTER},
 	{0x000f8d, 0x000f97, PG_U_NONSPACING_MARK},
-	{0x000f98, 0x000f98, PG_U_UNASSIGNED},
 	{0x000f99, 0x000fbc, PG_U_NONSPACING_MARK},
-	{0x000fbd, 0x000fbd, PG_U_UNASSIGNED},
 	{0x000fbe, 0x000fc5, PG_U_OTHER_SYMBOL},
 	{0x000fc6, 0x000fc6, PG_U_NONSPACING_MARK},
 	{0x000fc7, 0x000fcc, PG_U_OTHER_SYMBOL},
-	{0x000fcd, 0x000fcd, PG_U_UNASSIGNED},
 	{0x000fce, 0x000fcf, PG_U_OTHER_SYMBOL},
 	{0x000fd0, 0x000fd4, PG_U_OTHER_PUNCTUATION},
 	{0x000fd5, 0x000fd8, PG_U_OTHER_SYMBOL},
 	{0x000fd9, 0x000fda, PG_U_OTHER_PUNCTUATION},
-	{0x000fdb, 0x000fff, PG_U_UNASSIGNED},
 	{0x001000, 0x00102a, PG_U_OTHER_LETTER},
 	{0x00102b, 0x00102c, PG_U_SPACING_MARK},
 	{0x00102d, 0x001030, PG_U_NONSPACING_MARK},
@@ -1234,58 +1074,35 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00109d, 0x00109d, PG_U_NONSPACING_MARK},
 	{0x00109e, 0x00109f, PG_U_OTHER_SYMBOL},
 	{0x0010a0, 0x0010c5, PG_U_UPPERCASE_LETTER},
-	{0x0010c6, 0x0010c6, PG_U_UNASSIGNED},
 	{0x0010c7, 0x0010c7, PG_U_UPPERCASE_LETTER},
-	{0x0010c8, 0x0010cc, PG_U_UNASSIGNED},
 	{0x0010cd, 0x0010cd, PG_U_UPPERCASE_LETTER},
-	{0x0010ce, 0x0010cf, PG_U_UNASSIGNED},
 	{0x0010d0, 0x0010fa, PG_U_LOWERCASE_LETTER},
 	{0x0010fb, 0x0010fb, PG_U_OTHER_PUNCTUATION},
 	{0x0010fc, 0x0010fc, PG_U_MODIFIER_LETTER},
 	{0x0010fd, 0x0010ff, PG_U_LOWERCASE_LETTER},
 	{0x001100, 0x001248, PG_U_OTHER_LETTER},
-	{0x001249, 0x001249, PG_U_UNASSIGNED},
 	{0x00124a, 0x00124d, PG_U_OTHER_LETTER},
-	{0x00124e, 0x00124f, PG_U_UNASSIGNED},
 	{0x001250, 0x001256, PG_U_OTHER_LETTER},
-	{0x001257, 0x001257, PG_U_UNASSIGNED},
 	{0x001258, 0x001258, PG_U_OTHER_LETTER},
-	{0x001259, 0x001259, PG_U_UNASSIGNED},
 	{0x00125a, 0x00125d, PG_U_OTHER_LETTER},
-	{0x00125e, 0x00125f, PG_U_UNASSIGNED},
 	{0x001260, 0x001288, PG_U_OTHER_LETTER},
-	{0x001289, 0x001289, PG_U_UNASSIGNED},
 	{0x00128a, 0x00128d, PG_U_OTHER_LETTER},
-	{0x00128e, 0x00128f, PG_U_UNASSIGNED},
 	{0x001290, 0x0012b0, PG_U_OTHER_LETTER},
-	{0x0012b1, 0x0012b1, PG_U_UNASSIGNED},
 	{0x0012b2, 0x0012b5, PG_U_OTHER_LETTER},
-	{0x0012b6, 0x0012b7, PG_U_UNASSIGNED},
 	{0x0012b8, 0x0012be, PG_U_OTHER_LETTER},
-	{0x0012bf, 0x0012bf, PG_U_UNASSIGNED},
 	{0x0012c0, 0x0012c0, PG_U_OTHER_LETTER},
-	{0x0012c1, 0x0012c1, PG_U_UNASSIGNED},
 	{0x0012c2, 0x0012c5, PG_U_OTHER_LETTER},
-	{0x0012c6, 0x0012c7, PG_U_UNASSIGNED},
 	{0x0012c8, 0x0012d6, PG_U_OTHER_LETTER},
-	{0x0012d7, 0x0012d7, PG_U_UNASSIGNED},
 	{0x0012d8, 0x001310, PG_U_OTHER_LETTER},
-	{0x001311, 0x001311, PG_U_UNASSIGNED},
 	{0x001312, 0x001315, PG_U_OTHER_LETTER},
-	{0x001316, 0x001317, PG_U_UNASSIGNED},
 	{0x001318, 0x00135a, PG_U_OTHER_LETTER},
-	{0x00135b, 0x00135c, PG_U_UNASSIGNED},
 	{0x00135d, 0x00135f, PG_U_NONSPACING_MARK},
 	{0x001360, 0x001368, PG_U_OTHER_PUNCTUATION},
 	{0x001369, 0x00137c, PG_U_OTHER_NUMBER},
-	{0x00137d, 0x00137f, PG_U_UNASSIGNED},
 	{0x001380, 0x00138f, PG_U_OTHER_LETTER},
 	{0x001390, 0x001399, PG_U_OTHER_SYMBOL},
-	{0x00139a, 0x00139f, PG_U_UNASSIGNED},
 	{0x0013a0, 0x0013f5, PG_U_UPPERCASE_LETTER},
-	{0x0013f6, 0x0013f7, PG_U_UNASSIGNED},
 	{0x0013f8, 0x0013fd, PG_U_LOWERCASE_LETTER},
-	{0x0013fe, 0x0013ff, PG_U_UNASSIGNED},
 	{0x001400, 0x001400, PG_U_DASH_PUNCTUATION},
 	{0x001401, 0x00166c, PG_U_OTHER_LETTER},
 	{0x00166d, 0x00166d, PG_U_OTHER_SYMBOL},
@@ -1295,30 +1112,22 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001681, 0x00169a, PG_U_OTHER_LETTER},
 	{0x00169b, 0x00169b, PG_U_OPEN_PUNCTUATION},
 	{0x00169c, 0x00169c, PG_U_CLOSE_PUNCTUATION},
-	{0x00169d, 0x00169f, PG_U_UNASSIGNED},
 	{0x0016a0, 0x0016ea, PG_U_OTHER_LETTER},
 	{0x0016eb, 0x0016ed, PG_U_OTHER_PUNCTUATION},
 	{0x0016ee, 0x0016f0, PG_U_LETTER_NUMBER},
 	{0x0016f1, 0x0016f8, PG_U_OTHER_LETTER},
-	{0x0016f9, 0x0016ff, PG_U_UNASSIGNED},
 	{0x001700, 0x001711, PG_U_OTHER_LETTER},
 	{0x001712, 0x001714, PG_U_NONSPACING_MARK},
 	{0x001715, 0x001715, PG_U_SPACING_MARK},
-	{0x001716, 0x00171e, PG_U_UNASSIGNED},
 	{0x00171f, 0x001731, PG_U_OTHER_LETTER},
 	{0x001732, 0x001733, PG_U_NONSPACING_MARK},
 	{0x001734, 0x001734, PG_U_SPACING_MARK},
 	{0x001735, 0x001736, PG_U_OTHER_PUNCTUATION},
-	{0x001737, 0x00173f, PG_U_UNASSIGNED},
 	{0x001740, 0x001751, PG_U_OTHER_LETTER},
 	{0x001752, 0x001753, PG_U_NONSPACING_MARK},
-	{0x001754, 0x00175f, PG_U_UNASSIGNED},
 	{0x001760, 0x00176c, PG_U_OTHER_LETTER},
-	{0x00176d, 0x00176d, PG_U_UNASSIGNED},
 	{0x00176e, 0x001770, PG_U_OTHER_LETTER},
-	{0x001771, 0x001771, PG_U_UNASSIGNED},
 	{0x001772, 0x001773, PG_U_NONSPACING_MARK},
-	{0x001774, 0x00177f, PG_U_UNASSIGNED},
 	{0x001780, 0x0017b3, PG_U_OTHER_LETTER},
 	{0x0017b4, 0x0017b5, PG_U_NONSPACING_MARK},
 	{0x0017b6, 0x0017b6, PG_U_SPACING_MARK},
@@ -1333,11 +1142,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0017db, 0x0017db, PG_U_CURRENCY_SYMBOL},
 	{0x0017dc, 0x0017dc, PG_U_OTHER_LETTER},
 	{0x0017dd, 0x0017dd, PG_U_NONSPACING_MARK},
-	{0x0017de, 0x0017df, PG_U_UNASSIGNED},
 	{0x0017e0, 0x0017e9, PG_U_DECIMAL_NUMBER},
-	{0x0017ea, 0x0017ef, PG_U_UNASSIGNED},
 	{0x0017f0, 0x0017f9, PG_U_OTHER_NUMBER},
-	{0x0017fa, 0x0017ff, PG_U_UNASSIGNED},
 	{0x001800, 0x001805, PG_U_OTHER_PUNCTUATION},
 	{0x001806, 0x001806, PG_U_DASH_PUNCTUATION},
 	{0x001807, 0x00180a, PG_U_OTHER_PUNCTUATION},
@@ -1345,59 +1151,44 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00180e, 0x00180e, PG_U_FORMAT},
 	{0x00180f, 0x00180f, PG_U_NONSPACING_MARK},
 	{0x001810, 0x001819, PG_U_DECIMAL_NUMBER},
-	{0x00181a, 0x00181f, PG_U_UNASSIGNED},
 	{0x001820, 0x001842, PG_U_OTHER_LETTER},
 	{0x001843, 0x001843, PG_U_MODIFIER_LETTER},
 	{0x001844, 0x001878, PG_U_OTHER_LETTER},
-	{0x001879, 0x00187f, PG_U_UNASSIGNED},
 	{0x001880, 0x001884, PG_U_OTHER_LETTER},
 	{0x001885, 0x001886, PG_U_NONSPACING_MARK},
 	{0x001887, 0x0018a8, PG_U_OTHER_LETTER},
 	{0x0018a9, 0x0018a9, PG_U_NONSPACING_MARK},
 	{0x0018aa, 0x0018aa, PG_U_OTHER_LETTER},
-	{0x0018ab, 0x0018af, PG_U_UNASSIGNED},
 	{0x0018b0, 0x0018f5, PG_U_OTHER_LETTER},
-	{0x0018f6, 0x0018ff, PG_U_UNASSIGNED},
 	{0x001900, 0x00191e, PG_U_OTHER_LETTER},
-	{0x00191f, 0x00191f, PG_U_UNASSIGNED},
 	{0x001920, 0x001922, PG_U_NONSPACING_MARK},
 	{0x001923, 0x001926, PG_U_SPACING_MARK},
 	{0x001927, 0x001928, PG_U_NONSPACING_MARK},
 	{0x001929, 0x00192b, PG_U_SPACING_MARK},
-	{0x00192c, 0x00192f, PG_U_UNASSIGNED},
 	{0x001930, 0x001931, PG_U_SPACING_MARK},
 	{0x001932, 0x001932, PG_U_NONSPACING_MARK},
 	{0x001933, 0x001938, PG_U_SPACING_MARK},
 	{0x001939, 0x00193b, PG_U_NONSPACING_MARK},
-	{0x00193c, 0x00193f, PG_U_UNASSIGNED},
 	{0x001940, 0x001940, PG_U_OTHER_SYMBOL},
-	{0x001941, 0x001943, PG_U_UNASSIGNED},
 	{0x001944, 0x001945, PG_U_OTHER_PUNCTUATION},
 	{0x001946, 0x00194f, PG_U_DECIMAL_NUMBER},
 	{0x001950, 0x00196d, PG_U_OTHER_LETTER},
-	{0x00196e, 0x00196f, PG_U_UNASSIGNED},
 	{0x001970, 0x001974, PG_U_OTHER_LETTER},
-	{0x001975, 0x00197f, PG_U_UNASSIGNED},
 	{0x001980, 0x0019ab, PG_U_OTHER_LETTER},
-	{0x0019ac, 0x0019af, PG_U_UNASSIGNED},
 	{0x0019b0, 0x0019c9, PG_U_OTHER_LETTER},
-	{0x0019ca, 0x0019cf, PG_U_UNASSIGNED},
 	{0x0019d0, 0x0019d9, PG_U_DECIMAL_NUMBER},
 	{0x0019da, 0x0019da, PG_U_OTHER_NUMBER},
-	{0x0019db, 0x0019dd, PG_U_UNASSIGNED},
 	{0x0019de, 0x0019ff, PG_U_OTHER_SYMBOL},
 	{0x001a00, 0x001a16, PG_U_OTHER_LETTER},
 	{0x001a17, 0x001a18, PG_U_NONSPACING_MARK},
 	{0x001a19, 0x001a1a, PG_U_SPACING_MARK},
 	{0x001a1b, 0x001a1b, PG_U_NONSPACING_MARK},
-	{0x001a1c, 0x001a1d, PG_U_UNASSIGNED},
 	{0x001a1e, 0x001a1f, PG_U_OTHER_PUNCTUATION},
 	{0x001a20, 0x001a54, PG_U_OTHER_LETTER},
 	{0x001a55, 0x001a55, PG_U_SPACING_MARK},
 	{0x001a56, 0x001a56, PG_U_NONSPACING_MARK},
 	{0x001a57, 0x001a57, PG_U_SPACING_MARK},
 	{0x001a58, 0x001a5e, PG_U_NONSPACING_MARK},
-	{0x001a5f, 0x001a5f, PG_U_UNASSIGNED},
 	{0x001a60, 0x001a60, PG_U_NONSPACING_MARK},
 	{0x001a61, 0x001a61, PG_U_SPACING_MARK},
 	{0x001a62, 0x001a62, PG_U_NONSPACING_MARK},
@@ -1405,20 +1196,15 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001a65, 0x001a6c, PG_U_NONSPACING_MARK},
 	{0x001a6d, 0x001a72, PG_U_SPACING_MARK},
 	{0x001a73, 0x001a7c, PG_U_NONSPACING_MARK},
-	{0x001a7d, 0x001a7e, PG_U_UNASSIGNED},
 	{0x001a7f, 0x001a7f, PG_U_NONSPACING_MARK},
 	{0x001a80, 0x001a89, PG_U_DECIMAL_NUMBER},
-	{0x001a8a, 0x001a8f, PG_U_UNASSIGNED},
 	{0x001a90, 0x001a99, PG_U_DECIMAL_NUMBER},
-	{0x001a9a, 0x001a9f, PG_U_UNASSIGNED},
 	{0x001aa0, 0x001aa6, PG_U_OTHER_PUNCTUATION},
 	{0x001aa7, 0x001aa7, PG_U_MODIFIER_LETTER},
 	{0x001aa8, 0x001aad, PG_U_OTHER_PUNCTUATION},
-	{0x001aae, 0x001aaf, PG_U_UNASSIGNED},
 	{0x001ab0, 0x001abd, PG_U_NONSPACING_MARK},
 	{0x001abe, 0x001abe, PG_U_ENCLOSING_MARK},
 	{0x001abf, 0x001ace, PG_U_NONSPACING_MARK},
-	{0x001acf, 0x001aff, PG_U_UNASSIGNED},
 	{0x001b00, 0x001b03, PG_U_NONSPACING_MARK},
 	{0x001b04, 0x001b04, PG_U_SPACING_MARK},
 	{0x001b05, 0x001b33, PG_U_OTHER_LETTER},
@@ -1431,14 +1217,12 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001b42, 0x001b42, PG_U_NONSPACING_MARK},
 	{0x001b43, 0x001b44, PG_U_SPACING_MARK},
 	{0x001b45, 0x001b4c, PG_U_OTHER_LETTER},
-	{0x001b4d, 0x001b4f, PG_U_UNASSIGNED},
 	{0x001b50, 0x001b59, PG_U_DECIMAL_NUMBER},
 	{0x001b5a, 0x001b60, PG_U_OTHER_PUNCTUATION},
 	{0x001b61, 0x001b6a, PG_U_OTHER_SYMBOL},
 	{0x001b6b, 0x001b73, PG_U_NONSPACING_MARK},
 	{0x001b74, 0x001b7c, PG_U_OTHER_SYMBOL},
 	{0x001b7d, 0x001b7e, PG_U_OTHER_PUNCTUATION},
-	{0x001b7f, 0x001b7f, PG_U_UNASSIGNED},
 	{0x001b80, 0x001b81, PG_U_NONSPACING_MARK},
 	{0x001b82, 0x001b82, PG_U_SPACING_MARK},
 	{0x001b83, 0x001ba0, PG_U_OTHER_LETTER},
@@ -1459,29 +1243,23 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001bee, 0x001bee, PG_U_SPACING_MARK},
 	{0x001bef, 0x001bf1, PG_U_NONSPACING_MARK},
 	{0x001bf2, 0x001bf3, PG_U_SPACING_MARK},
-	{0x001bf4, 0x001bfb, PG_U_UNASSIGNED},
 	{0x001bfc, 0x001bff, PG_U_OTHER_PUNCTUATION},
 	{0x001c00, 0x001c23, PG_U_OTHER_LETTER},
 	{0x001c24, 0x001c2b, PG_U_SPACING_MARK},
 	{0x001c2c, 0x001c33, PG_U_NONSPACING_MARK},
 	{0x001c34, 0x001c35, PG_U_SPACING_MARK},
 	{0x001c36, 0x001c37, PG_U_NONSPACING_MARK},
-	{0x001c38, 0x001c3a, PG_U_UNASSIGNED},
 	{0x001c3b, 0x001c3f, PG_U_OTHER_PUNCTUATION},
 	{0x001c40, 0x001c49, PG_U_DECIMAL_NUMBER},
-	{0x001c4a, 0x001c4c, PG_U_UNASSIGNED},
 	{0x001c4d, 0x001c4f, PG_U_OTHER_LETTER},
 	{0x001c50, 0x001c59, PG_U_DECIMAL_NUMBER},
 	{0x001c5a, 0x001c77, PG_U_OTHER_LETTER},
 	{0x001c78, 0x001c7d, PG_U_MODIFIER_LETTER},
 	{0x001c7e, 0x001c7f, PG_U_OTHER_PUNCTUATION},
 	{0x001c80, 0x001c88, PG_U_LOWERCASE_LETTER},
-	{0x001c89, 0x001c8f, PG_U_UNASSIGNED},
 	{0x001c90, 0x001cba, PG_U_UPPERCASE_LETTER},
-	{0x001cbb, 0x001cbc, PG_U_UNASSIGNED},
 	{0x001cbd, 0x001cbf, PG_U_UPPERCASE_LETTER},
 	{0x001cc0, 0x001cc7, PG_U_OTHER_PUNCTUATION},
-	{0x001cc8, 0x001ccf, PG_U_UNASSIGNED},
 	{0x001cd0, 0x001cd2, PG_U_NONSPACING_MARK},
 	{0x001cd3, 0x001cd3, PG_U_OTHER_PUNCTUATION},
 	{0x001cd4, 0x001ce0, PG_U_NONSPACING_MARK},
@@ -1495,7 +1273,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001cf7, 0x001cf7, PG_U_SPACING_MARK},
 	{0x001cf8, 0x001cf9, PG_U_NONSPACING_MARK},
 	{0x001cfa, 0x001cfa, PG_U_OTHER_LETTER},
-	{0x001cfb, 0x001cff, PG_U_UNASSIGNED},
 	{0x001d00, 0x001d2b, PG_U_LOWERCASE_LETTER},
 	{0x001d2c, 0x001d6a, PG_U_MODIFIER_LETTER},
 	{0x001d6b, 0x001d77, PG_U_LOWERCASE_LETTER},
@@ -1753,30 +1530,21 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001eff, 0x001f07, PG_U_LOWERCASE_LETTER},
 	{0x001f08, 0x001f0f, PG_U_UPPERCASE_LETTER},
 	{0x001f10, 0x001f15, PG_U_LOWERCASE_LETTER},
-	{0x001f16, 0x001f17, PG_U_UNASSIGNED},
 	{0x001f18, 0x001f1d, PG_U_UPPERCASE_LETTER},
-	{0x001f1e, 0x001f1f, PG_U_UNASSIGNED},
 	{0x001f20, 0x001f27, PG_U_LOWERCASE_LETTER},
 	{0x001f28, 0x001f2f, PG_U_UPPERCASE_LETTER},
 	{0x001f30, 0x001f37, PG_U_LOWERCASE_LETTER},
 	{0x001f38, 0x001f3f, PG_U_UPPERCASE_LETTER},
 	{0x001f40, 0x001f45, PG_U_LOWERCASE_LETTER},
-	{0x001f46, 0x001f47, PG_U_UNASSIGNED},
 	{0x001f48, 0x001f4d, PG_U_UPPERCASE_LETTER},
-	{0x001f4e, 0x001f4f, PG_U_UNASSIGNED},
 	{0x001f50, 0x001f57, PG_U_LOWERCASE_LETTER},
-	{0x001f58, 0x001f58, PG_U_UNASSIGNED},
 	{0x001f59, 0x001f59, PG_U_UPPERCASE_LETTER},
-	{0x001f5a, 0x001f5a, PG_U_UNASSIGNED},
 	{0x001f5b, 0x001f5b, PG_U_UPPERCASE_LETTER},
-	{0x001f5c, 0x001f5c, PG_U_UNASSIGNED},
 	{0x001f5d, 0x001f5d, PG_U_UPPERCASE_LETTER},
-	{0x001f5e, 0x001f5e, PG_U_UNASSIGNED},
 	{0x001f5f, 0x001f5f, PG_U_UPPERCASE_LETTER},
 	{0x001f60, 0x001f67, PG_U_LOWERCASE_LETTER},
 	{0x001f68, 0x001f6f, PG_U_UPPERCASE_LETTER},
 	{0x001f70, 0x001f7d, PG_U_LOWERCASE_LETTER},
-	{0x001f7e, 0x001f7f, PG_U_UNASSIGNED},
 	{0x001f80, 0x001f87, PG_U_LOWERCASE_LETTER},
 	{0x001f88, 0x001f8f, PG_U_TITLECASE_LETTER},
 	{0x001f90, 0x001f97, PG_U_LOWERCASE_LETTER},
@@ -1784,7 +1552,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001fa0, 0x001fa7, PG_U_LOWERCASE_LETTER},
 	{0x001fa8, 0x001faf, PG_U_TITLECASE_LETTER},
 	{0x001fb0, 0x001fb4, PG_U_LOWERCASE_LETTER},
-	{0x001fb5, 0x001fb5, PG_U_UNASSIGNED},
 	{0x001fb6, 0x001fb7, PG_U_LOWERCASE_LETTER},
 	{0x001fb8, 0x001fbb, PG_U_UPPERCASE_LETTER},
 	{0x001fbc, 0x001fbc, PG_U_TITLECASE_LETTER},
@@ -1792,28 +1559,22 @@ static const pg_category_range unicode_categories[4009] =
 	{0x001fbe, 0x001fbe, PG_U_LOWERCASE_LETTER},
 	{0x001fbf, 0x001fc1, PG_U_MODIFIER_SYMBOL},
 	{0x001fc2, 0x001fc4, PG_U_LOWERCASE_LETTER},
-	{0x001fc5, 0x001fc5, PG_U_UNASSIGNED},
 	{0x001fc6, 0x001fc7, PG_U_LOWERCASE_LETTER},
 	{0x001fc8, 0x001fcb, PG_U_UPPERCASE_LETTER},
 	{0x001fcc, 0x001fcc, PG_U_TITLECASE_LETTER},
 	{0x001fcd, 0x001fcf, PG_U_MODIFIER_SYMBOL},
 	{0x001fd0, 0x001fd3, PG_U_LOWERCASE_LETTER},
-	{0x001fd4, 0x001fd5, PG_U_UNASSIGNED},
 	{0x001fd6, 0x001fd7, PG_U_LOWERCASE_LETTER},
 	{0x001fd8, 0x001fdb, PG_U_UPPERCASE_LETTER},
-	{0x001fdc, 0x001fdc, PG_U_UNASSIGNED},
 	{0x001fdd, 0x001fdf, PG_U_MODIFIER_SYMBOL},
 	{0x001fe0, 0x001fe7, PG_U_LOWERCASE_LETTER},
 	{0x001fe8, 0x001fec, PG_U_UPPERCASE_LETTER},
 	{0x001fed, 0x001fef, PG_U_MODIFIER_SYMBOL},
-	{0x001ff0, 0x001ff1, PG_U_UNASSIGNED},
 	{0x001ff2, 0x001ff4, PG_U_LOWERCASE_LETTER},
-	{0x001ff5, 0x001ff5, PG_U_UNASSIGNED},
 	{0x001ff6, 0x001ff7, PG_U_LOWERCASE_LETTER},
 	{0x001ff8, 0x001ffb, PG_U_UPPERCASE_LETTER},
 	{0x001ffc, 0x001ffc, PG_U_TITLECASE_LETTER},
 	{0x001ffd, 0x001ffe, PG_U_MODIFIER_SYMBOL},
-	{0x001fff, 0x001fff, PG_U_UNASSIGNED},
 	{0x002000, 0x00200a, PG_U_SPACE_SEPARATOR},
 	{0x00200b, 0x00200f, PG_U_FORMAT},
 	{0x002010, 0x002015, PG_U_DASH_PUNCTUATION},
@@ -1846,11 +1607,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002055, 0x00205e, PG_U_OTHER_PUNCTUATION},
 	{0x00205f, 0x00205f, PG_U_SPACE_SEPARATOR},
 	{0x002060, 0x002064, PG_U_FORMAT},
-	{0x002065, 0x002065, PG_U_UNASSIGNED},
 	{0x002066, 0x00206f, PG_U_FORMAT},
 	{0x002070, 0x002070, PG_U_OTHER_NUMBER},
 	{0x002071, 0x002071, PG_U_MODIFIER_LETTER},
-	{0x002072, 0x002073, PG_U_UNASSIGNED},
 	{0x002074, 0x002079, PG_U_OTHER_NUMBER},
 	{0x00207a, 0x00207c, PG_U_MATH_SYMBOL},
 	{0x00207d, 0x00207d, PG_U_OPEN_PUNCTUATION},
@@ -1860,17 +1619,13 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00208a, 0x00208c, PG_U_MATH_SYMBOL},
 	{0x00208d, 0x00208d, PG_U_OPEN_PUNCTUATION},
 	{0x00208e, 0x00208e, PG_U_CLOSE_PUNCTUATION},
-	{0x00208f, 0x00208f, PG_U_UNASSIGNED},
 	{0x002090, 0x00209c, PG_U_MODIFIER_LETTER},
-	{0x00209d, 0x00209f, PG_U_UNASSIGNED},
 	{0x0020a0, 0x0020c0, PG_U_CURRENCY_SYMBOL},
-	{0x0020c1, 0x0020cf, PG_U_UNASSIGNED},
 	{0x0020d0, 0x0020dc, PG_U_NONSPACING_MARK},
 	{0x0020dd, 0x0020e0, PG_U_ENCLOSING_MARK},
 	{0x0020e1, 0x0020e1, PG_U_NONSPACING_MARK},
 	{0x0020e2, 0x0020e4, PG_U_ENCLOSING_MARK},
 	{0x0020e5, 0x0020f0, PG_U_NONSPACING_MARK},
-	{0x0020f1, 0x0020ff, PG_U_UNASSIGNED},
 	{0x002100, 0x002101, PG_U_OTHER_SYMBOL},
 	{0x002102, 0x002102, PG_U_UPPERCASE_LETTER},
 	{0x002103, 0x002106, PG_U_OTHER_SYMBOL},
@@ -1918,7 +1673,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002185, 0x002188, PG_U_LETTER_NUMBER},
 	{0x002189, 0x002189, PG_U_OTHER_NUMBER},
 	{0x00218a, 0x00218b, PG_U_OTHER_SYMBOL},
-	{0x00218c, 0x00218f, PG_U_UNASSIGNED},
 	{0x002190, 0x002194, PG_U_MATH_SYMBOL},
 	{0x002195, 0x002199, PG_U_OTHER_SYMBOL},
 	{0x00219a, 0x00219b, PG_U_MATH_SYMBOL},
@@ -1955,9 +1709,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0023b4, 0x0023db, PG_U_OTHER_SYMBOL},
 	{0x0023dc, 0x0023e1, PG_U_MATH_SYMBOL},
 	{0x0023e2, 0x002426, PG_U_OTHER_SYMBOL},
-	{0x002427, 0x00243f, PG_U_UNASSIGNED},
 	{0x002440, 0x00244a, PG_U_OTHER_SYMBOL},
-	{0x00244b, 0x00245f, PG_U_UNASSIGNED},
 	{0x002460, 0x00249b, PG_U_OTHER_NUMBER},
 	{0x00249c, 0x0024e9, PG_U_OTHER_SYMBOL},
 	{0x0024ea, 0x0024ff, PG_U_OTHER_NUMBER},
@@ -2039,9 +1791,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002b45, 0x002b46, PG_U_OTHER_SYMBOL},
 	{0x002b47, 0x002b4c, PG_U_MATH_SYMBOL},
 	{0x002b4d, 0x002b73, PG_U_OTHER_SYMBOL},
-	{0x002b74, 0x002b75, PG_U_UNASSIGNED},
 	{0x002b76, 0x002b95, PG_U_OTHER_SYMBOL},
-	{0x002b96, 0x002b96, PG_U_UNASSIGNED},
 	{0x002b97, 0x002bff, PG_U_OTHER_SYMBOL},
 	{0x002c00, 0x002c2f, PG_U_UPPERCASE_LETTER},
 	{0x002c30, 0x002c5f, PG_U_LOWERCASE_LETTER},
@@ -2170,40 +1920,25 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002cef, 0x002cf1, PG_U_NONSPACING_MARK},
 	{0x002cf2, 0x002cf2, PG_U_UPPERCASE_LETTER},
 	{0x002cf3, 0x002cf3, PG_U_LOWERCASE_LETTER},
-	{0x002cf4, 0x002cf8, PG_U_UNASSIGNED},
 	{0x002cf9, 0x002cfc, PG_U_OTHER_PUNCTUATION},
 	{0x002cfd, 0x002cfd, PG_U_OTHER_NUMBER},
 	{0x002cfe, 0x002cff, PG_U_OTHER_PUNCTUATION},
 	{0x002d00, 0x002d25, PG_U_LOWERCASE_LETTER},
-	{0x002d26, 0x002d26, PG_U_UNASSIGNED},
 	{0x002d27, 0x002d27, PG_U_LOWERCASE_LETTER},
-	{0x002d28, 0x002d2c, PG_U_UNASSIGNED},
 	{0x002d2d, 0x002d2d, PG_U_LOWERCASE_LETTER},
-	{0x002d2e, 0x002d2f, PG_U_UNASSIGNED},
 	{0x002d30, 0x002d67, PG_U_OTHER_LETTER},
-	{0x002d68, 0x002d6e, PG_U_UNASSIGNED},
 	{0x002d6f, 0x002d6f, PG_U_MODIFIER_LETTER},
 	{0x002d70, 0x002d70, PG_U_OTHER_PUNCTUATION},
-	{0x002d71, 0x002d7e, PG_U_UNASSIGNED},
 	{0x002d7f, 0x002d7f, PG_U_NONSPACING_MARK},
 	{0x002d80, 0x002d96, PG_U_OTHER_LETTER},
-	{0x002d97, 0x002d9f, PG_U_UNASSIGNED},
 	{0x002da0, 0x002da6, PG_U_OTHER_LETTER},
-	{0x002da7, 0x002da7, PG_U_UNASSIGNED},
 	{0x002da8, 0x002dae, PG_U_OTHER_LETTER},
-	{0x002daf, 0x002daf, PG_U_UNASSIGNED},
 	{0x002db0, 0x002db6, PG_U_OTHER_LETTER},
-	{0x002db7, 0x002db7, PG_U_UNASSIGNED},
 	{0x002db8, 0x002dbe, PG_U_OTHER_LETTER},
-	{0x002dbf, 0x002dbf, PG_U_UNASSIGNED},
 	{0x002dc0, 0x002dc6, PG_U_OTHER_LETTER},
-	{0x002dc7, 0x002dc7, PG_U_UNASSIGNED},
 	{0x002dc8, 0x002dce, PG_U_OTHER_LETTER},
-	{0x002dcf, 0x002dcf, PG_U_UNASSIGNED},
 	{0x002dd0, 0x002dd6, PG_U_OTHER_LETTER},
-	{0x002dd7, 0x002dd7, PG_U_UNASSIGNED},
 	{0x002dd8, 0x002dde, PG_U_OTHER_LETTER},
-	{0x002ddf, 0x002ddf, PG_U_UNASSIGNED},
 	{0x002de0, 0x002dff, PG_U_NONSPACING_MARK},
 	{0x002e00, 0x002e01, PG_U_OTHER_PUNCTUATION},
 	{0x002e02, 0x002e02, PG_U_INITIAL_PUNCTUATION},
@@ -2254,13 +1989,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x002e5b, 0x002e5b, PG_U_OPEN_PUNCTUATION},
 	{0x002e5c, 0x002e5c, PG_U_CLOSE_PUNCTUATION},
 	{0x002e5d, 0x002e5d, PG_U_DASH_PUNCTUATION},
-	{0x002e5e, 0x002e7f, PG_U_UNASSIGNED},
 	{0x002e80, 0x002e99, PG_U_OTHER_SYMBOL},
-	{0x002e9a, 0x002e9a, PG_U_UNASSIGNED},
 	{0x002e9b, 0x002ef3, PG_U_OTHER_SYMBOL},
-	{0x002ef4, 0x002eff, PG_U_UNASSIGNED},
 	{0x002f00, 0x002fd5, PG_U_OTHER_SYMBOL},
-	{0x002fd6, 0x002fef, PG_U_UNASSIGNED},
 	{0x002ff0, 0x002fff, PG_U_OTHER_SYMBOL},
 	{0x003000, 0x003000, PG_U_SPACE_SEPARATOR},
 	{0x003001, 0x003003, PG_U_OTHER_PUNCTUATION},
@@ -2302,9 +2033,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00303c, 0x00303c, PG_U_OTHER_LETTER},
 	{0x00303d, 0x00303d, PG_U_OTHER_PUNCTUATION},
 	{0x00303e, 0x00303f, PG_U_OTHER_SYMBOL},
-	{0x003040, 0x003040, PG_U_UNASSIGNED},
 	{0x003041, 0x003096, PG_U_OTHER_LETTER},
-	{0x003097, 0x003098, PG_U_UNASSIGNED},
 	{0x003099, 0x00309a, PG_U_NONSPACING_MARK},
 	{0x00309b, 0x00309c, PG_U_MODIFIER_SYMBOL},
 	{0x00309d, 0x00309e, PG_U_MODIFIER_LETTER},
@@ -2314,21 +2043,16 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0030fb, 0x0030fb, PG_U_OTHER_PUNCTUATION},
 	{0x0030fc, 0x0030fe, PG_U_MODIFIER_LETTER},
 	{0x0030ff, 0x0030ff, PG_U_OTHER_LETTER},
-	{0x003100, 0x003104, PG_U_UNASSIGNED},
 	{0x003105, 0x00312f, PG_U_OTHER_LETTER},
-	{0x003130, 0x003130, PG_U_UNASSIGNED},
 	{0x003131, 0x00318e, PG_U_OTHER_LETTER},
-	{0x00318f, 0x00318f, PG_U_UNASSIGNED},
 	{0x003190, 0x003191, PG_U_OTHER_SYMBOL},
 	{0x003192, 0x003195, PG_U_OTHER_NUMBER},
 	{0x003196, 0x00319f, PG_U_OTHER_SYMBOL},
 	{0x0031a0, 0x0031bf, PG_U_OTHER_LETTER},
 	{0x0031c0, 0x0031e3, PG_U_OTHER_SYMBOL},
-	{0x0031e4, 0x0031ee, PG_U_UNASSIGNED},
 	{0x0031ef, 0x0031ef, PG_U_OTHER_SYMBOL},
 	{0x0031f0, 0x0031ff, PG_U_OTHER_LETTER},
 	{0x003200, 0x00321e, PG_U_OTHER_SYMBOL},
-	{0x00321f, 0x00321f, PG_U_UNASSIGNED},
 	{0x003220, 0x003229, PG_U_OTHER_NUMBER},
 	{0x00322a, 0x003247, PG_U_OTHER_SYMBOL},
 	{0x003248, 0x00324f, PG_U_OTHER_NUMBER},
@@ -2344,9 +2068,7 @@ static const pg_category_range unicode_categories[4009] =
 	{0x004e00, 0x00a014, PG_U_OTHER_LETTER},
 	{0x00a015, 0x00a015, PG_U_MODIFIER_LETTER},
 	{0x00a016, 0x00a48c, PG_U_OTHER_LETTER},
-	{0x00a48d, 0x00a48f, PG_U_UNASSIGNED},
 	{0x00a490, 0x00a4c6, PG_U_OTHER_SYMBOL},
-	{0x00a4c7, 0x00a4cf, PG_U_UNASSIGNED},
 	{0x00a4d0, 0x00a4f7, PG_U_OTHER_LETTER},
 	{0x00a4f8, 0x00a4fd, PG_U_MODIFIER_LETTER},
 	{0x00a4fe, 0x00a4ff, PG_U_OTHER_PUNCTUATION},
@@ -2356,7 +2078,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a610, 0x00a61f, PG_U_OTHER_LETTER},
 	{0x00a620, 0x00a629, PG_U_DECIMAL_NUMBER},
 	{0x00a62a, 0x00a62b, PG_U_OTHER_LETTER},
-	{0x00a62c, 0x00a63f, PG_U_UNASSIGNED},
 	{0x00a640, 0x00a640, PG_U_UPPERCASE_LETTER},
 	{0x00a641, 0x00a641, PG_U_LOWERCASE_LETTER},
 	{0x00a642, 0x00a642, PG_U_UPPERCASE_LETTER},
@@ -2444,7 +2165,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a6e6, 0x00a6ef, PG_U_LETTER_NUMBER},
 	{0x00a6f0, 0x00a6f1, PG_U_NONSPACING_MARK},
 	{0x00a6f2, 0x00a6f7, PG_U_OTHER_PUNCTUATION},
-	{0x00a6f8, 0x00a6ff, PG_U_UNASSIGNED},
 	{0x00a700, 0x00a716, PG_U_MODIFIER_SYMBOL},
 	{0x00a717, 0x00a71f, PG_U_MODIFIER_LETTER},
 	{0x00a720, 0x00a721, PG_U_MODIFIER_SYMBOL},
@@ -2593,18 +2313,14 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a7c8, 0x00a7c8, PG_U_LOWERCASE_LETTER},
 	{0x00a7c9, 0x00a7c9, PG_U_UPPERCASE_LETTER},
 	{0x00a7ca, 0x00a7ca, PG_U_LOWERCASE_LETTER},
-	{0x00a7cb, 0x00a7cf, PG_U_UNASSIGNED},
 	{0x00a7d0, 0x00a7d0, PG_U_UPPERCASE_LETTER},
 	{0x00a7d1, 0x00a7d1, PG_U_LOWERCASE_LETTER},
-	{0x00a7d2, 0x00a7d2, PG_U_UNASSIGNED},
 	{0x00a7d3, 0x00a7d3, PG_U_LOWERCASE_LETTER},
-	{0x00a7d4, 0x00a7d4, PG_U_UNASSIGNED},
 	{0x00a7d5, 0x00a7d5, PG_U_LOWERCASE_LETTER},
 	{0x00a7d6, 0x00a7d6, PG_U_UPPERCASE_LETTER},
 	{0x00a7d7, 0x00a7d7, PG_U_LOWERCASE_LETTER},
 	{0x00a7d8, 0x00a7d8, PG_U_UPPERCASE_LETTER},
 	{0x00a7d9, 0x00a7d9, PG_U_LOWERCASE_LETTER},
-	{0x00a7da, 0x00a7f1, PG_U_UNASSIGNED},
 	{0x00a7f2, 0x00a7f4, PG_U_MODIFIER_LETTER},
 	{0x00a7f5, 0x00a7f5, PG_U_UPPERCASE_LETTER},
 	{0x00a7f6, 0x00a7f6, PG_U_LOWERCASE_LETTER},
@@ -2623,23 +2339,18 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a827, 0x00a827, PG_U_SPACING_MARK},
 	{0x00a828, 0x00a82b, PG_U_OTHER_SYMBOL},
 	{0x00a82c, 0x00a82c, PG_U_NONSPACING_MARK},
-	{0x00a82d, 0x00a82f, PG_U_UNASSIGNED},
 	{0x00a830, 0x00a835, PG_U_OTHER_NUMBER},
 	{0x00a836, 0x00a837, PG_U_OTHER_SYMBOL},
 	{0x00a838, 0x00a838, PG_U_CURRENCY_SYMBOL},
 	{0x00a839, 0x00a839, PG_U_OTHER_SYMBOL},
-	{0x00a83a, 0x00a83f, PG_U_UNASSIGNED},
 	{0x00a840, 0x00a873, PG_U_OTHER_LETTER},
 	{0x00a874, 0x00a877, PG_U_OTHER_PUNCTUATION},
-	{0x00a878, 0x00a87f, PG_U_UNASSIGNED},
 	{0x00a880, 0x00a881, PG_U_SPACING_MARK},
 	{0x00a882, 0x00a8b3, PG_U_OTHER_LETTER},
 	{0x00a8b4, 0x00a8c3, PG_U_SPACING_MARK},
 	{0x00a8c4, 0x00a8c5, PG_U_NONSPACING_MARK},
-	{0x00a8c6, 0x00a8cd, PG_U_UNASSIGNED},
 	{0x00a8ce, 0x00a8cf, PG_U_OTHER_PUNCTUATION},
 	{0x00a8d0, 0x00a8d9, PG_U_DECIMAL_NUMBER},
-	{0x00a8da, 0x00a8df, PG_U_UNASSIGNED},
 	{0x00a8e0, 0x00a8f1, PG_U_NONSPACING_MARK},
 	{0x00a8f2, 0x00a8f7, PG_U_OTHER_LETTER},
 	{0x00a8f8, 0x00a8fa, PG_U_OTHER_PUNCTUATION},
@@ -2654,10 +2365,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a930, 0x00a946, PG_U_OTHER_LETTER},
 	{0x00a947, 0x00a951, PG_U_NONSPACING_MARK},
 	{0x00a952, 0x00a953, PG_U_SPACING_MARK},
-	{0x00a954, 0x00a95e, PG_U_UNASSIGNED},
 	{0x00a95f, 0x00a95f, PG_U_OTHER_PUNCTUATION},
 	{0x00a960, 0x00a97c, PG_U_OTHER_LETTER},
-	{0x00a97d, 0x00a97f, PG_U_UNASSIGNED},
 	{0x00a980, 0x00a982, PG_U_NONSPACING_MARK},
 	{0x00a983, 0x00a983, PG_U_SPACING_MARK},
 	{0x00a984, 0x00a9b2, PG_U_OTHER_LETTER},
@@ -2668,10 +2377,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a9bc, 0x00a9bd, PG_U_NONSPACING_MARK},
 	{0x00a9be, 0x00a9c0, PG_U_SPACING_MARK},
 	{0x00a9c1, 0x00a9cd, PG_U_OTHER_PUNCTUATION},
-	{0x00a9ce, 0x00a9ce, PG_U_UNASSIGNED},
 	{0x00a9cf, 0x00a9cf, PG_U_MODIFIER_LETTER},
 	{0x00a9d0, 0x00a9d9, PG_U_DECIMAL_NUMBER},
-	{0x00a9da, 0x00a9dd, PG_U_UNASSIGNED},
 	{0x00a9de, 0x00a9df, PG_U_OTHER_PUNCTUATION},
 	{0x00a9e0, 0x00a9e4, PG_U_OTHER_LETTER},
 	{0x00a9e5, 0x00a9e5, PG_U_NONSPACING_MARK},
@@ -2679,22 +2386,18 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00a9e7, 0x00a9ef, PG_U_OTHER_LETTER},
 	{0x00a9f0, 0x00a9f9, PG_U_DECIMAL_NUMBER},
 	{0x00a9fa, 0x00a9fe, PG_U_OTHER_LETTER},
-	{0x00a9ff, 0x00a9ff, PG_U_UNASSIGNED},
 	{0x00aa00, 0x00aa28, PG_U_OTHER_LETTER},
 	{0x00aa29, 0x00aa2e, PG_U_NONSPACING_MARK},
 	{0x00aa2f, 0x00aa30, PG_U_SPACING_MARK},
 	{0x00aa31, 0x00aa32, PG_U_NONSPACING_MARK},
 	{0x00aa33, 0x00aa34, PG_U_SPACING_MARK},
 	{0x00aa35, 0x00aa36, PG_U_NONSPACING_MARK},
-	{0x00aa37, 0x00aa3f, PG_U_UNASSIGNED},
 	{0x00aa40, 0x00aa42, PG_U_OTHER_LETTER},
 	{0x00aa43, 0x00aa43, PG_U_NONSPACING_MARK},
 	{0x00aa44, 0x00aa4b, PG_U_OTHER_LETTER},
 	{0x00aa4c, 0x00aa4c, PG_U_NONSPACING_MARK},
 	{0x00aa4d, 0x00aa4d, PG_U_SPACING_MARK},
-	{0x00aa4e, 0x00aa4f, PG_U_UNASSIGNED},
 	{0x00aa50, 0x00aa59, PG_U_DECIMAL_NUMBER},
-	{0x00aa5a, 0x00aa5b, PG_U_UNASSIGNED},
 	{0x00aa5c, 0x00aa5f, PG_U_OTHER_PUNCTUATION},
 	{0x00aa60, 0x00aa6f, PG_U_OTHER_LETTER},
 	{0x00aa70, 0x00aa70, PG_U_MODIFIER_LETTER},
@@ -2715,7 +2418,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00aac0, 0x00aac0, PG_U_OTHER_LETTER},
 	{0x00aac1, 0x00aac1, PG_U_NONSPACING_MARK},
 	{0x00aac2, 0x00aac2, PG_U_OTHER_LETTER},
-	{0x00aac3, 0x00aada, PG_U_UNASSIGNED},
 	{0x00aadb, 0x00aadc, PG_U_OTHER_LETTER},
 	{0x00aadd, 0x00aadd, PG_U_MODIFIER_LETTER},
 	{0x00aade, 0x00aadf, PG_U_OTHER_PUNCTUATION},
@@ -2728,24 +2430,17 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00aaf3, 0x00aaf4, PG_U_MODIFIER_LETTER},
 	{0x00aaf5, 0x00aaf5, PG_U_SPACING_MARK},
 	{0x00aaf6, 0x00aaf6, PG_U_NONSPACING_MARK},
-	{0x00aaf7, 0x00ab00, PG_U_UNASSIGNED},
 	{0x00ab01, 0x00ab06, PG_U_OTHER_LETTER},
-	{0x00ab07, 0x00ab08, PG_U_UNASSIGNED},
 	{0x00ab09, 0x00ab0e, PG_U_OTHER_LETTER},
-	{0x00ab0f, 0x00ab10, PG_U_UNASSIGNED},
 	{0x00ab11, 0x00ab16, PG_U_OTHER_LETTER},
-	{0x00ab17, 0x00ab1f, PG_U_UNASSIGNED},
 	{0x00ab20, 0x00ab26, PG_U_OTHER_LETTER},
-	{0x00ab27, 0x00ab27, PG_U_UNASSIGNED},
 	{0x00ab28, 0x00ab2e, PG_U_OTHER_LETTER},
-	{0x00ab2f, 0x00ab2f, PG_U_UNASSIGNED},
 	{0x00ab30, 0x00ab5a, PG_U_LOWERCASE_LETTER},
 	{0x00ab5b, 0x00ab5b, PG_U_MODIFIER_SYMBOL},
 	{0x00ab5c, 0x00ab5f, PG_U_MODIFIER_LETTER},
 	{0x00ab60, 0x00ab68, PG_U_LOWERCASE_LETTER},
 	{0x00ab69, 0x00ab69, PG_U_MODIFIER_LETTER},
 	{0x00ab6a, 0x00ab6b, PG_U_MODIFIER_SYMBOL},
-	{0x00ab6c, 0x00ab6f, PG_U_UNASSIGNED},
 	{0x00ab70, 0x00abbf, PG_U_LOWERCASE_LETTER},
 	{0x00abc0, 0x00abe2, PG_U_OTHER_LETTER},
 	{0x00abe3, 0x00abe4, PG_U_SPACING_MARK},
@@ -2756,52 +2451,34 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00abeb, 0x00abeb, PG_U_OTHER_PUNCTUATION},
 	{0x00abec, 0x00abec, PG_U_SPACING_MARK},
 	{0x00abed, 0x00abed, PG_U_NONSPACING_MARK},
-	{0x00abee, 0x00abef, PG_U_UNASSIGNED},
 	{0x00abf0, 0x00abf9, PG_U_DECIMAL_NUMBER},
-	{0x00abfa, 0x00abff, PG_U_UNASSIGNED},
 	{0x00ac00, 0x00d7a3, PG_U_OTHER_LETTER},
-	{0x00d7a4, 0x00d7af, PG_U_UNASSIGNED},
 	{0x00d7b0, 0x00d7c6, PG_U_OTHER_LETTER},
-	{0x00d7c7, 0x00d7ca, PG_U_UNASSIGNED},
 	{0x00d7cb, 0x00d7fb, PG_U_OTHER_LETTER},
-	{0x00d7fc, 0x00d7ff, PG_U_UNASSIGNED},
 	{0x00d800, 0x00dfff, PG_U_SURROGATE},
 	{0x00e000, 0x00f8ff, PG_U_PRIVATE_USE},
 	{0x00f900, 0x00fa6d, PG_U_OTHER_LETTER},
-	{0x00fa6e, 0x00fa6f, PG_U_UNASSIGNED},
 	{0x00fa70, 0x00fad9, PG_U_OTHER_LETTER},
-	{0x00fada, 0x00faff, PG_U_UNASSIGNED},
 	{0x00fb00, 0x00fb06, PG_U_LOWERCASE_LETTER},
-	{0x00fb07, 0x00fb12, PG_U_UNASSIGNED},
 	{0x00fb13, 0x00fb17, PG_U_LOWERCASE_LETTER},
-	{0x00fb18, 0x00fb1c, PG_U_UNASSIGNED},
 	{0x00fb1d, 0x00fb1d, PG_U_OTHER_LETTER},
 	{0x00fb1e, 0x00fb1e, PG_U_NONSPACING_MARK},
 	{0x00fb1f, 0x00fb28, PG_U_OTHER_LETTER},
 	{0x00fb29, 0x00fb29, PG_U_MATH_SYMBOL},
 	{0x00fb2a, 0x00fb36, PG_U_OTHER_LETTER},
-	{0x00fb37, 0x00fb37, PG_U_UNASSIGNED},
 	{0x00fb38, 0x00fb3c, PG_U_OTHER_LETTER},
-	{0x00fb3d, 0x00fb3d, PG_U_UNASSIGNED},
 	{0x00fb3e, 0x00fb3e, PG_U_OTHER_LETTER},
-	{0x00fb3f, 0x00fb3f, PG_U_UNASSIGNED},
 	{0x00fb40, 0x00fb41, PG_U_OTHER_LETTER},
-	{0x00fb42, 0x00fb42, PG_U_UNASSIGNED},
 	{0x00fb43, 0x00fb44, PG_U_OTHER_LETTER},
-	{0x00fb45, 0x00fb45, PG_U_UNASSIGNED},
 	{0x00fb46, 0x00fbb1, PG_U_OTHER_LETTER},
 	{0x00fbb2, 0x00fbc2, PG_U_MODIFIER_SYMBOL},
-	{0x00fbc3, 0x00fbd2, PG_U_UNASSIGNED},
 	{0x00fbd3, 0x00fd3d, PG_U_OTHER_LETTER},
 	{0x00fd3e, 0x00fd3e, PG_U_CLOSE_PUNCTUATION},
 	{0x00fd3f, 0x00fd3f, PG_U_OPEN_PUNCTUATION},
 	{0x00fd40, 0x00fd4f, PG_U_OTHER_SYMBOL},
 	{0x00fd50, 0x00fd8f, PG_U_OTHER_LETTER},
-	{0x00fd90, 0x00fd91, PG_U_UNASSIGNED},
 	{0x00fd92, 0x00fdc7, PG_U_OTHER_LETTER},
-	{0x00fdc8, 0x00fdce, PG_U_UNASSIGNED},
 	{0x00fdcf, 0x00fdcf, PG_U_OTHER_SYMBOL},
-	{0x00fdd0, 0x00fdef, PG_U_UNASSIGNED},
 	{0x00fdf0, 0x00fdfb, PG_U_OTHER_LETTER},
 	{0x00fdfc, 0x00fdfc, PG_U_CURRENCY_SYMBOL},
 	{0x00fdfd, 0x00fdff, PG_U_OTHER_SYMBOL},
@@ -2810,7 +2487,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00fe17, 0x00fe17, PG_U_OPEN_PUNCTUATION},
 	{0x00fe18, 0x00fe18, PG_U_CLOSE_PUNCTUATION},
 	{0x00fe19, 0x00fe19, PG_U_OTHER_PUNCTUATION},
-	{0x00fe1a, 0x00fe1f, PG_U_UNASSIGNED},
 	{0x00fe20, 0x00fe2f, PG_U_NONSPACING_MARK},
 	{0x00fe30, 0x00fe30, PG_U_OTHER_PUNCTUATION},
 	{0x00fe31, 0x00fe32, PG_U_DASH_PUNCTUATION},
@@ -2837,7 +2513,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00fe49, 0x00fe4c, PG_U_OTHER_PUNCTUATION},
 	{0x00fe4d, 0x00fe4f, PG_U_CONNECTOR_PUNCTUATION},
 	{0x00fe50, 0x00fe52, PG_U_OTHER_PUNCTUATION},
-	{0x00fe53, 0x00fe53, PG_U_UNASSIGNED},
 	{0x00fe54, 0x00fe57, PG_U_OTHER_PUNCTUATION},
 	{0x00fe58, 0x00fe58, PG_U_DASH_PUNCTUATION},
 	{0x00fe59, 0x00fe59, PG_U_OPEN_PUNCTUATION},
@@ -2850,17 +2525,12 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00fe62, 0x00fe62, PG_U_MATH_SYMBOL},
 	{0x00fe63, 0x00fe63, PG_U_DASH_PUNCTUATION},
 	{0x00fe64, 0x00fe66, PG_U_MATH_SYMBOL},
-	{0x00fe67, 0x00fe67, PG_U_UNASSIGNED},
 	{0x00fe68, 0x00fe68, PG_U_OTHER_PUNCTUATION},
 	{0x00fe69, 0x00fe69, PG_U_CURRENCY_SYMBOL},
 	{0x00fe6a, 0x00fe6b, PG_U_OTHER_PUNCTUATION},
-	{0x00fe6c, 0x00fe6f, PG_U_UNASSIGNED},
 	{0x00fe70, 0x00fe74, PG_U_OTHER_LETTER},
-	{0x00fe75, 0x00fe75, PG_U_UNASSIGNED},
 	{0x00fe76, 0x00fefc, PG_U_OTHER_LETTER},
-	{0x00fefd, 0x00fefe, PG_U_UNASSIGNED},
 	{0x00feff, 0x00feff, PG_U_FORMAT},
-	{0x00ff00, 0x00ff00, PG_U_UNASSIGNED},
 	{0x00ff01, 0x00ff03, PG_U_OTHER_PUNCTUATION},
 	{0x00ff04, 0x00ff04, PG_U_CURRENCY_SYMBOL},
 	{0x00ff05, 0x00ff07, PG_U_OTHER_PUNCTUATION},
@@ -2898,273 +2568,175 @@ static const pg_category_range unicode_categories[4009] =
 	{0x00ff71, 0x00ff9d, PG_U_OTHER_LETTER},
 	{0x00ff9e, 0x00ff9f, PG_U_MODIFIER_LETTER},
 	{0x00ffa0, 0x00ffbe, PG_U_OTHER_LETTER},
-	{0x00ffbf, 0x00ffc1, PG_U_UNASSIGNED},
 	{0x00ffc2, 0x00ffc7, PG_U_OTHER_LETTER},
-	{0x00ffc8, 0x00ffc9, PG_U_UNASSIGNED},
 	{0x00ffca, 0x00ffcf, PG_U_OTHER_LETTER},
-	{0x00ffd0, 0x00ffd1, PG_U_UNASSIGNED},
 	{0x00ffd2, 0x00ffd7, PG_U_OTHER_LETTER},
-	{0x00ffd8, 0x00ffd9, PG_U_UNASSIGNED},
 	{0x00ffda, 0x00ffdc, PG_U_OTHER_LETTER},
-	{0x00ffdd, 0x00ffdf, PG_U_UNASSIGNED},
 	{0x00ffe0, 0x00ffe1, PG_U_CURRENCY_SYMBOL},
 	{0x00ffe2, 0x00ffe2, PG_U_MATH_SYMBOL},
 	{0x00ffe3, 0x00ffe3, PG_U_MODIFIER_SYMBOL},
 	{0x00ffe4, 0x00ffe4, PG_U_OTHER_SYMBOL},
 	{0x00ffe5, 0x00ffe6, PG_U_CURRENCY_SYMBOL},
-	{0x00ffe7, 0x00ffe7, PG_U_UNASSIGNED},
 	{0x00ffe8, 0x00ffe8, PG_U_OTHER_SYMBOL},
 	{0x00ffe9, 0x00ffec, PG_U_MATH_SYMBOL},
 	{0x00ffed, 0x00ffee, PG_U_OTHER_SYMBOL},
-	{0x00ffef, 0x00fff8, PG_U_UNASSIGNED},
 	{0x00fff9, 0x00fffb, PG_U_FORMAT},
 	{0x00fffc, 0x00fffd, PG_U_OTHER_SYMBOL},
-	{0x00fffe, 0x00ffff, PG_U_UNASSIGNED},
 	{0x010000, 0x01000b, PG_U_OTHER_LETTER},
-	{0x01000c, 0x01000c, PG_U_UNASSIGNED},
 	{0x01000d, 0x010026, PG_U_OTHER_LETTER},
-	{0x010027, 0x010027, PG_U_UNASSIGNED},
 	{0x010028, 0x01003a, PG_U_OTHER_LETTER},
-	{0x01003b, 0x01003b, PG_U_UNASSIGNED},
 	{0x01003c, 0x01003d, PG_U_OTHER_LETTER},
-	{0x01003e, 0x01003e, PG_U_UNASSIGNED},
 	{0x01003f, 0x01004d, PG_U_OTHER_LETTER},
-	{0x01004e, 0x01004f, PG_U_UNASSIGNED},
 	{0x010050, 0x01005d, PG_U_OTHER_LETTER},
-	{0x01005e, 0x01007f, PG_U_UNASSIGNED},
 	{0x010080, 0x0100fa, PG_U_OTHER_LETTER},
-	{0x0100fb, 0x0100ff, PG_U_UNASSIGNED},
 	{0x010100, 0x010102, PG_U_OTHER_PUNCTUATION},
-	{0x010103, 0x010106, PG_U_UNASSIGNED},
 	{0x010107, 0x010133, PG_U_OTHER_NUMBER},
-	{0x010134, 0x010136, PG_U_UNASSIGNED},
 	{0x010137, 0x01013f, PG_U_OTHER_SYMBOL},
 	{0x010140, 0x010174, PG_U_LETTER_NUMBER},
 	{0x010175, 0x010178, PG_U_OTHER_NUMBER},
 	{0x010179, 0x010189, PG_U_OTHER_SYMBOL},
 	{0x01018a, 0x01018b, PG_U_OTHER_NUMBER},
 	{0x01018c, 0x01018e, PG_U_OTHER_SYMBOL},
-	{0x01018f, 0x01018f, PG_U_UNASSIGNED},
 	{0x010190, 0x01019c, PG_U_OTHER_SYMBOL},
-	{0x01019d, 0x01019f, PG_U_UNASSIGNED},
 	{0x0101a0, 0x0101a0, PG_U_OTHER_SYMBOL},
-	{0x0101a1, 0x0101cf, PG_U_UNASSIGNED},
 	{0x0101d0, 0x0101fc, PG_U_OTHER_SYMBOL},
 	{0x0101fd, 0x0101fd, PG_U_NONSPACING_MARK},
-	{0x0101fe, 0x01027f, PG_U_UNASSIGNED},
 	{0x010280, 0x01029c, PG_U_OTHER_LETTER},
-	{0x01029d, 0x01029f, PG_U_UNASSIGNED},
 	{0x0102a0, 0x0102d0, PG_U_OTHER_LETTER},
-	{0x0102d1, 0x0102df, PG_U_UNASSIGNED},
 	{0x0102e0, 0x0102e0, PG_U_NONSPACING_MARK},
 	{0x0102e1, 0x0102fb, PG_U_OTHER_NUMBER},
-	{0x0102fc, 0x0102ff, PG_U_UNASSIGNED},
 	{0x010300, 0x01031f, PG_U_OTHER_LETTER},
 	{0x010320, 0x010323, PG_U_OTHER_NUMBER},
-	{0x010324, 0x01032c, PG_U_UNASSIGNED},
 	{0x01032d, 0x010340, PG_U_OTHER_LETTER},
 	{0x010341, 0x010341, PG_U_LETTER_NUMBER},
 	{0x010342, 0x010349, PG_U_OTHER_LETTER},
 	{0x01034a, 0x01034a, PG_U_LETTER_NUMBER},
-	{0x01034b, 0x01034f, PG_U_UNASSIGNED},
 	{0x010350, 0x010375, PG_U_OTHER_LETTER},
 	{0x010376, 0x01037a, PG_U_NONSPACING_MARK},
-	{0x01037b, 0x01037f, PG_U_UNASSIGNED},
 	{0x010380, 0x01039d, PG_U_OTHER_LETTER},
-	{0x01039e, 0x01039e, PG_U_UNASSIGNED},
 	{0x01039f, 0x01039f, PG_U_OTHER_PUNCTUATION},
 	{0x0103a0, 0x0103c3, PG_U_OTHER_LETTER},
-	{0x0103c4, 0x0103c7, PG_U_UNASSIGNED},
 	{0x0103c8, 0x0103cf, PG_U_OTHER_LETTER},
 	{0x0103d0, 0x0103d0, PG_U_OTHER_PUNCTUATION},
 	{0x0103d1, 0x0103d5, PG_U_LETTER_NUMBER},
-	{0x0103d6, 0x0103ff, PG_U_UNASSIGNED},
 	{0x010400, 0x010427, PG_U_UPPERCASE_LETTER},
 	{0x010428, 0x01044f, PG_U_LOWERCASE_LETTER},
 	{0x010450, 0x01049d, PG_U_OTHER_LETTER},
-	{0x01049e, 0x01049f, PG_U_UNASSIGNED},
 	{0x0104a0, 0x0104a9, PG_U_DECIMAL_NUMBER},
-	{0x0104aa, 0x0104af, PG_U_UNASSIGNED},
 	{0x0104b0, 0x0104d3, PG_U_UPPERCASE_LETTER},
-	{0x0104d4, 0x0104d7, PG_U_UNASSIGNED},
 	{0x0104d8, 0x0104fb, PG_U_LOWERCASE_LETTER},
-	{0x0104fc, 0x0104ff, PG_U_UNASSIGNED},
 	{0x010500, 0x010527, PG_U_OTHER_LETTER},
-	{0x010528, 0x01052f, PG_U_UNASSIGNED},
 	{0x010530, 0x010563, PG_U_OTHER_LETTER},
-	{0x010564, 0x01056e, PG_U_UNASSIGNED},
 	{0x01056f, 0x01056f, PG_U_OTHER_PUNCTUATION},
 	{0x010570, 0x01057a, PG_U_UPPERCASE_LETTER},
-	{0x01057b, 0x01057b, PG_U_UNASSIGNED},
 	{0x01057c, 0x01058a, PG_U_UPPERCASE_LETTER},
-	{0x01058b, 0x01058b, PG_U_UNASSIGNED},
 	{0x01058c, 0x010592, PG_U_UPPERCASE_LETTER},
-	{0x010593, 0x010593, PG_U_UNASSIGNED},
 	{0x010594, 0x010595, PG_U_UPPERCASE_LETTER},
-	{0x010596, 0x010596, PG_U_UNASSIGNED},
 	{0x010597, 0x0105a1, PG_U_LOWERCASE_LETTER},
-	{0x0105a2, 0x0105a2, PG_U_UNASSIGNED},
 	{0x0105a3, 0x0105b1, PG_U_LOWERCASE_LETTER},
-	{0x0105b2, 0x0105b2, PG_U_UNASSIGNED},
 	{0x0105b3, 0x0105b9, PG_U_LOWERCASE_LETTER},
-	{0x0105ba, 0x0105ba, PG_U_UNASSIGNED},
 	{0x0105bb, 0x0105bc, PG_U_LOWERCASE_LETTER},
-	{0x0105bd, 0x0105ff, PG_U_UNASSIGNED},
 	{0x010600, 0x010736, PG_U_OTHER_LETTER},
-	{0x010737, 0x01073f, PG_U_UNASSIGNED},
 	{0x010740, 0x010755, PG_U_OTHER_LETTER},
-	{0x010756, 0x01075f, PG_U_UNASSIGNED},
 	{0x010760, 0x010767, PG_U_OTHER_LETTER},
-	{0x010768, 0x01077f, PG_U_UNASSIGNED},
 	{0x010780, 0x010785, PG_U_MODIFIER_LETTER},
-	{0x010786, 0x010786, PG_U_UNASSIGNED},
 	{0x010787, 0x0107b0, PG_U_MODIFIER_LETTER},
-	{0x0107b1, 0x0107b1, PG_U_UNASSIGNED},
 	{0x0107b2, 0x0107ba, PG_U_MODIFIER_LETTER},
-	{0x0107bb, 0x0107ff, PG_U_UNASSIGNED},
 	{0x010800, 0x010805, PG_U_OTHER_LETTER},
-	{0x010806, 0x010807, PG_U_UNASSIGNED},
 	{0x010808, 0x010808, PG_U_OTHER_LETTER},
-	{0x010809, 0x010809, PG_U_UNASSIGNED},
 	{0x01080a, 0x010835, PG_U_OTHER_LETTER},
-	{0x010836, 0x010836, PG_U_UNASSIGNED},
 	{0x010837, 0x010838, PG_U_OTHER_LETTER},
-	{0x010839, 0x01083b, PG_U_UNASSIGNED},
 	{0x01083c, 0x01083c, PG_U_OTHER_LETTER},
-	{0x01083d, 0x01083e, PG_U_UNASSIGNED},
 	{0x01083f, 0x010855, PG_U_OTHER_LETTER},
-	{0x010856, 0x010856, PG_U_UNASSIGNED},
 	{0x010857, 0x010857, PG_U_OTHER_PUNCTUATION},
 	{0x010858, 0x01085f, PG_U_OTHER_NUMBER},
 	{0x010860, 0x010876, PG_U_OTHER_LETTER},
 	{0x010877, 0x010878, PG_U_OTHER_SYMBOL},
 	{0x010879, 0x01087f, PG_U_OTHER_NUMBER},
 	{0x010880, 0x01089e, PG_U_OTHER_LETTER},
-	{0x01089f, 0x0108a6, PG_U_UNASSIGNED},
 	{0x0108a7, 0x0108af, PG_U_OTHER_NUMBER},
-	{0x0108b0, 0x0108df, PG_U_UNASSIGNED},
 	{0x0108e0, 0x0108f2, PG_U_OTHER_LETTER},
-	{0x0108f3, 0x0108f3, PG_U_UNASSIGNED},
 	{0x0108f4, 0x0108f5, PG_U_OTHER_LETTER},
-	{0x0108f6, 0x0108fa, PG_U_UNASSIGNED},
 	{0x0108fb, 0x0108ff, PG_U_OTHER_NUMBER},
 	{0x010900, 0x010915, PG_U_OTHER_LETTER},
 	{0x010916, 0x01091b, PG_U_OTHER_NUMBER},
-	{0x01091c, 0x01091e, PG_U_UNASSIGNED},
 	{0x01091f, 0x01091f, PG_U_OTHER_PUNCTUATION},
 	{0x010920, 0x010939, PG_U_OTHER_LETTER},
-	{0x01093a, 0x01093e, PG_U_UNASSIGNED},
 	{0x01093f, 0x01093f, PG_U_OTHER_PUNCTUATION},
-	{0x010940, 0x01097f, PG_U_UNASSIGNED},
 	{0x010980, 0x0109b7, PG_U_OTHER_LETTER},
-	{0x0109b8, 0x0109bb, PG_U_UNASSIGNED},
 	{0x0109bc, 0x0109bd, PG_U_OTHER_NUMBER},
 	{0x0109be, 0x0109bf, PG_U_OTHER_LETTER},
 	{0x0109c0, 0x0109cf, PG_U_OTHER_NUMBER},
-	{0x0109d0, 0x0109d1, PG_U_UNASSIGNED},
 	{0x0109d2, 0x0109ff, PG_U_OTHER_NUMBER},
 	{0x010a00, 0x010a00, PG_U_OTHER_LETTER},
 	{0x010a01, 0x010a03, PG_U_NONSPACING_MARK},
-	{0x010a04, 0x010a04, PG_U_UNASSIGNED},
 	{0x010a05, 0x010a06, PG_U_NONSPACING_MARK},
-	{0x010a07, 0x010a0b, PG_U_UNASSIGNED},
 	{0x010a0c, 0x010a0f, PG_U_NONSPACING_MARK},
 	{0x010a10, 0x010a13, PG_U_OTHER_LETTER},
-	{0x010a14, 0x010a14, PG_U_UNASSIGNED},
 	{0x010a15, 0x010a17, PG_U_OTHER_LETTER},
-	{0x010a18, 0x010a18, PG_U_UNASSIGNED},
 	{0x010a19, 0x010a35, PG_U_OTHER_LETTER},
-	{0x010a36, 0x010a37, PG_U_UNASSIGNED},
 	{0x010a38, 0x010a3a, PG_U_NONSPACING_MARK},
-	{0x010a3b, 0x010a3e, PG_U_UNASSIGNED},
 	{0x010a3f, 0x010a3f, PG_U_NONSPACING_MARK},
 	{0x010a40, 0x010a48, PG_U_OTHER_NUMBER},
-	{0x010a49, 0x010a4f, PG_U_UNASSIGNED},
 	{0x010a50, 0x010a58, PG_U_OTHER_PUNCTUATION},
-	{0x010a59, 0x010a5f, PG_U_UNASSIGNED},
 	{0x010a60, 0x010a7c, PG_U_OTHER_LETTER},
 	{0x010a7d, 0x010a7e, PG_U_OTHER_NUMBER},
 	{0x010a7f, 0x010a7f, PG_U_OTHER_PUNCTUATION},
 	{0x010a80, 0x010a9c, PG_U_OTHER_LETTER},
 	{0x010a9d, 0x010a9f, PG_U_OTHER_NUMBER},
-	{0x010aa0, 0x010abf, PG_U_UNASSIGNED},
 	{0x010ac0, 0x010ac7, PG_U_OTHER_LETTER},
 	{0x010ac8, 0x010ac8, PG_U_OTHER_SYMBOL},
 	{0x010ac9, 0x010ae4, PG_U_OTHER_LETTER},
 	{0x010ae5, 0x010ae6, PG_U_NONSPACING_MARK},
-	{0x010ae7, 0x010aea, PG_U_UNASSIGNED},
 	{0x010aeb, 0x010aef, PG_U_OTHER_NUMBER},
 	{0x010af0, 0x010af6, PG_U_OTHER_PUNCTUATION},
-	{0x010af7, 0x010aff, PG_U_UNASSIGNED},
 	{0x010b00, 0x010b35, PG_U_OTHER_LETTER},
-	{0x010b36, 0x010b38, PG_U_UNASSIGNED},
 	{0x010b39, 0x010b3f, PG_U_OTHER_PUNCTUATION},
 	{0x010b40, 0x010b55, PG_U_OTHER_LETTER},
-	{0x010b56, 0x010b57, PG_U_UNASSIGNED},
 	{0x010b58, 0x010b5f, PG_U_OTHER_NUMBER},
 	{0x010b60, 0x010b72, PG_U_OTHER_LETTER},
-	{0x010b73, 0x010b77, PG_U_UNASSIGNED},
 	{0x010b78, 0x010b7f, PG_U_OTHER_NUMBER},
 	{0x010b80, 0x010b91, PG_U_OTHER_LETTER},
-	{0x010b92, 0x010b98, PG_U_UNASSIGNED},
 	{0x010b99, 0x010b9c, PG_U_OTHER_PUNCTUATION},
-	{0x010b9d, 0x010ba8, PG_U_UNASSIGNED},
 	{0x010ba9, 0x010baf, PG_U_OTHER_NUMBER},
-	{0x010bb0, 0x010bff, PG_U_UNASSIGNED},
 	{0x010c00, 0x010c48, PG_U_OTHER_LETTER},
-	{0x010c49, 0x010c7f, PG_U_UNASSIGNED},
 	{0x010c80, 0x010cb2, PG_U_UPPERCASE_LETTER},
-	{0x010cb3, 0x010cbf, PG_U_UNASSIGNED},
 	{0x010cc0, 0x010cf2, PG_U_LOWERCASE_LETTER},
-	{0x010cf3, 0x010cf9, PG_U_UNASSIGNED},
 	{0x010cfa, 0x010cff, PG_U_OTHER_NUMBER},
 	{0x010d00, 0x010d23, PG_U_OTHER_LETTER},
 	{0x010d24, 0x010d27, PG_U_NONSPACING_MARK},
-	{0x010d28, 0x010d2f, PG_U_UNASSIGNED},
 	{0x010d30, 0x010d39, PG_U_DECIMAL_NUMBER},
-	{0x010d3a, 0x010e5f, PG_U_UNASSIGNED},
 	{0x010e60, 0x010e7e, PG_U_OTHER_NUMBER},
-	{0x010e7f, 0x010e7f, PG_U_UNASSIGNED},
 	{0x010e80, 0x010ea9, PG_U_OTHER_LETTER},
-	{0x010eaa, 0x010eaa, PG_U_UNASSIGNED},
 	{0x010eab, 0x010eac, PG_U_NONSPACING_MARK},
 	{0x010ead, 0x010ead, PG_U_DASH_PUNCTUATION},
-	{0x010eae, 0x010eaf, PG_U_UNASSIGNED},
 	{0x010eb0, 0x010eb1, PG_U_OTHER_LETTER},
-	{0x010eb2, 0x010efc, PG_U_UNASSIGNED},
 	{0x010efd, 0x010eff, PG_U_NONSPACING_MARK},
 	{0x010f00, 0x010f1c, PG_U_OTHER_LETTER},
 	{0x010f1d, 0x010f26, PG_U_OTHER_NUMBER},
 	{0x010f27, 0x010f27, PG_U_OTHER_LETTER},
-	{0x010f28, 0x010f2f, PG_U_UNASSIGNED},
 	{0x010f30, 0x010f45, PG_U_OTHER_LETTER},
 	{0x010f46, 0x010f50, PG_U_NONSPACING_MARK},
 	{0x010f51, 0x010f54, PG_U_OTHER_NUMBER},
 	{0x010f55, 0x010f59, PG_U_OTHER_PUNCTUATION},
-	{0x010f5a, 0x010f6f, PG_U_UNASSIGNED},
 	{0x010f70, 0x010f81, PG_U_OTHER_LETTER},
 	{0x010f82, 0x010f85, PG_U_NONSPACING_MARK},
 	{0x010f86, 0x010f89, PG_U_OTHER_PUNCTUATION},
-	{0x010f8a, 0x010faf, PG_U_UNASSIGNED},
 	{0x010fb0, 0x010fc4, PG_U_OTHER_LETTER},
 	{0x010fc5, 0x010fcb, PG_U_OTHER_NUMBER},
-	{0x010fcc, 0x010fdf, PG_U_UNASSIGNED},
 	{0x010fe0, 0x010ff6, PG_U_OTHER_LETTER},
-	{0x010ff7, 0x010fff, PG_U_UNASSIGNED},
 	{0x011000, 0x011000, PG_U_SPACING_MARK},
 	{0x011001, 0x011001, PG_U_NONSPACING_MARK},
 	{0x011002, 0x011002, PG_U_SPACING_MARK},
 	{0x011003, 0x011037, PG_U_OTHER_LETTER},
 	{0x011038, 0x011046, PG_U_NONSPACING_MARK},
 	{0x011047, 0x01104d, PG_U_OTHER_PUNCTUATION},
-	{0x01104e, 0x011051, PG_U_UNASSIGNED},
 	{0x011052, 0x011065, PG_U_OTHER_NUMBER},
 	{0x011066, 0x01106f, PG_U_DECIMAL_NUMBER},
 	{0x011070, 0x011070, PG_U_NONSPACING_MARK},
 	{0x011071, 0x011072, PG_U_OTHER_LETTER},
 	{0x011073, 0x011074, PG_U_NONSPACING_MARK},
 	{0x011075, 0x011075, PG_U_OTHER_LETTER},
-	{0x011076, 0x01107e, PG_U_UNASSIGNED},
 	{0x01107f, 0x011081, PG_U_NONSPACING_MARK},
 	{0x011082, 0x011082, PG_U_SPACING_MARK},
 	{0x011083, 0x0110af, PG_U_OTHER_LETTER},
@@ -3176,30 +2748,23 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0110bd, 0x0110bd, PG_U_FORMAT},
 	{0x0110be, 0x0110c1, PG_U_OTHER_PUNCTUATION},
 	{0x0110c2, 0x0110c2, PG_U_NONSPACING_MARK},
-	{0x0110c3, 0x0110cc, PG_U_UNASSIGNED},
 	{0x0110cd, 0x0110cd, PG_U_FORMAT},
-	{0x0110ce, 0x0110cf, PG_U_UNASSIGNED},
 	{0x0110d0, 0x0110e8, PG_U_OTHER_LETTER},
-	{0x0110e9, 0x0110ef, PG_U_UNASSIGNED},
 	{0x0110f0, 0x0110f9, PG_U_DECIMAL_NUMBER},
-	{0x0110fa, 0x0110ff, PG_U_UNASSIGNED},
 	{0x011100, 0x011102, PG_U_NONSPACING_MARK},
 	{0x011103, 0x011126, PG_U_OTHER_LETTER},
 	{0x011127, 0x01112b, PG_U_NONSPACING_MARK},
 	{0x01112c, 0x01112c, PG_U_SPACING_MARK},
 	{0x01112d, 0x011134, PG_U_NONSPACING_MARK},
-	{0x011135, 0x011135, PG_U_UNASSIGNED},
 	{0x011136, 0x01113f, PG_U_DECIMAL_NUMBER},
 	{0x011140, 0x011143, PG_U_OTHER_PUNCTUATION},
 	{0x011144, 0x011144, PG_U_OTHER_LETTER},
 	{0x011145, 0x011146, PG_U_SPACING_MARK},
 	{0x011147, 0x011147, PG_U_OTHER_LETTER},
-	{0x011148, 0x01114f, PG_U_UNASSIGNED},
 	{0x011150, 0x011172, PG_U_OTHER_LETTER},
 	{0x011173, 0x011173, PG_U_NONSPACING_MARK},
 	{0x011174, 0x011175, PG_U_OTHER_PUNCTUATION},
 	{0x011176, 0x011176, PG_U_OTHER_LETTER},
-	{0x011177, 0x01117f, PG_U_UNASSIGNED},
 	{0x011180, 0x011181, PG_U_NONSPACING_MARK},
 	{0x011182, 0x011182, PG_U_SPACING_MARK},
 	{0x011183, 0x0111b2, PG_U_OTHER_LETTER},
@@ -3217,11 +2782,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0111db, 0x0111db, PG_U_OTHER_PUNCTUATION},
 	{0x0111dc, 0x0111dc, PG_U_OTHER_LETTER},
 	{0x0111dd, 0x0111df, PG_U_OTHER_PUNCTUATION},
-	{0x0111e0, 0x0111e0, PG_U_UNASSIGNED},
 	{0x0111e1, 0x0111f4, PG_U_OTHER_NUMBER},
-	{0x0111f5, 0x0111ff, PG_U_UNASSIGNED},
 	{0x011200, 0x011211, PG_U_OTHER_LETTER},
-	{0x011212, 0x011212, PG_U_UNASSIGNED},
 	{0x011213, 0x01122b, PG_U_OTHER_LETTER},
 	{0x01122c, 0x01122e, PG_U_SPACING_MARK},
 	{0x01122f, 0x011231, PG_U_NONSPACING_MARK},
@@ -3233,61 +2795,38 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01123e, 0x01123e, PG_U_NONSPACING_MARK},
 	{0x01123f, 0x011240, PG_U_OTHER_LETTER},
 	{0x011241, 0x011241, PG_U_NONSPACING_MARK},
-	{0x011242, 0x01127f, PG_U_UNASSIGNED},
 	{0x011280, 0x011286, PG_U_OTHER_LETTER},
-	{0x011287, 0x011287, PG_U_UNASSIGNED},
 	{0x011288, 0x011288, PG_U_OTHER_LETTER},
-	{0x011289, 0x011289, PG_U_UNASSIGNED},
 	{0x01128a, 0x01128d, PG_U_OTHER_LETTER},
-	{0x01128e, 0x01128e, PG_U_UNASSIGNED},
 	{0x01128f, 0x01129d, PG_U_OTHER_LETTER},
-	{0x01129e, 0x01129e, PG_U_UNASSIGNED},
 	{0x01129f, 0x0112a8, PG_U_OTHER_LETTER},
 	{0x0112a9, 0x0112a9, PG_U_OTHER_PUNCTUATION},
-	{0x0112aa, 0x0112af, PG_U_UNASSIGNED},
 	{0x0112b0, 0x0112de, PG_U_OTHER_LETTER},
 	{0x0112df, 0x0112df, PG_U_NONSPACING_MARK},
 	{0x0112e0, 0x0112e2, PG_U_SPACING_MARK},
 	{0x0112e3, 0x0112ea, PG_U_NONSPACING_MARK},
-	{0x0112eb, 0x0112ef, PG_U_UNASSIGNED},
 	{0x0112f0, 0x0112f9, PG_U_DECIMAL_NUMBER},
-	{0x0112fa, 0x0112ff, PG_U_UNASSIGNED},
 	{0x011300, 0x011301, PG_U_NONSPACING_MARK},
 	{0x011302, 0x011303, PG_U_SPACING_MARK},
-	{0x011304, 0x011304, PG_U_UNASSIGNED},
 	{0x011305, 0x01130c, PG_U_OTHER_LETTER},
-	{0x01130d, 0x01130e, PG_U_UNASSIGNED},
 	{0x01130f, 0x011310, PG_U_OTHER_LETTER},
-	{0x011311, 0x011312, PG_U_UNASSIGNED},
 	{0x011313, 0x011328, PG_U_OTHER_LETTER},
-	{0x011329, 0x011329, PG_U_UNASSIGNED},
 	{0x01132a, 0x011330, PG_U_OTHER_LETTER},
-	{0x011331, 0x011331, PG_U_UNASSIGNED},
 	{0x011332, 0x011333, PG_U_OTHER_LETTER},
-	{0x011334, 0x011334, PG_U_UNASSIGNED},
 	{0x011335, 0x011339, PG_U_OTHER_LETTER},
-	{0x01133a, 0x01133a, PG_U_UNASSIGNED},
 	{0x01133b, 0x01133c, PG_U_NONSPACING_MARK},
 	{0x01133d, 0x01133d, PG_U_OTHER_LETTER},
 	{0x01133e, 0x01133f, PG_U_SPACING_MARK},
 	{0x011340, 0x011340, PG_U_NONSPACING_MARK},
 	{0x011341, 0x011344, PG_U_SPACING_MARK},
-	{0x011345, 0x011346, PG_U_UNASSIGNED},
 	{0x011347, 0x011348, PG_U_SPACING_MARK},
-	{0x011349, 0x01134a, PG_U_UNASSIGNED},
 	{0x01134b, 0x01134d, PG_U_SPACING_MARK},
-	{0x01134e, 0x01134f, PG_U_UNASSIGNED},
 	{0x011350, 0x011350, PG_U_OTHER_LETTER},
-	{0x011351, 0x011356, PG_U_UNASSIGNED},
 	{0x011357, 0x011357, PG_U_SPACING_MARK},
-	{0x011358, 0x01135c, PG_U_UNASSIGNED},
 	{0x01135d, 0x011361, PG_U_OTHER_LETTER},
 	{0x011362, 0x011363, PG_U_SPACING_MARK},
-	{0x011364, 0x011365, PG_U_UNASSIGNED},
 	{0x011366, 0x01136c, PG_U_NONSPACING_MARK},
-	{0x01136d, 0x01136f, PG_U_UNASSIGNED},
 	{0x011370, 0x011374, PG_U_NONSPACING_MARK},
-	{0x011375, 0x0113ff, PG_U_UNASSIGNED},
 	{0x011400, 0x011434, PG_U_OTHER_LETTER},
 	{0x011435, 0x011437, PG_U_SPACING_MARK},
 	{0x011438, 0x01143f, PG_U_NONSPACING_MARK},
@@ -3299,11 +2838,9 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01144b, 0x01144f, PG_U_OTHER_PUNCTUATION},
 	{0x011450, 0x011459, PG_U_DECIMAL_NUMBER},
 	{0x01145a, 0x01145b, PG_U_OTHER_PUNCTUATION},
-	{0x01145c, 0x01145c, PG_U_UNASSIGNED},
 	{0x01145d, 0x01145d, PG_U_OTHER_PUNCTUATION},
 	{0x01145e, 0x01145e, PG_U_NONSPACING_MARK},
 	{0x01145f, 0x011461, PG_U_OTHER_LETTER},
-	{0x011462, 0x01147f, PG_U_UNASSIGNED},
 	{0x011480, 0x0114af, PG_U_OTHER_LETTER},
 	{0x0114b0, 0x0114b2, PG_U_SPACING_MARK},
 	{0x0114b3, 0x0114b8, PG_U_NONSPACING_MARK},
@@ -3316,13 +2853,10 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0114c4, 0x0114c5, PG_U_OTHER_LETTER},
 	{0x0114c6, 0x0114c6, PG_U_OTHER_PUNCTUATION},
 	{0x0114c7, 0x0114c7, PG_U_OTHER_LETTER},
-	{0x0114c8, 0x0114cf, PG_U_UNASSIGNED},
 	{0x0114d0, 0x0114d9, PG_U_DECIMAL_NUMBER},
-	{0x0114da, 0x01157f, PG_U_UNASSIGNED},
 	{0x011580, 0x0115ae, PG_U_OTHER_LETTER},
 	{0x0115af, 0x0115b1, PG_U_SPACING_MARK},
 	{0x0115b2, 0x0115b5, PG_U_NONSPACING_MARK},
-	{0x0115b6, 0x0115b7, PG_U_UNASSIGNED},
 	{0x0115b8, 0x0115bb, PG_U_SPACING_MARK},
 	{0x0115bc, 0x0115bd, PG_U_NONSPACING_MARK},
 	{0x0115be, 0x0115be, PG_U_SPACING_MARK},
@@ -3330,7 +2864,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0115c1, 0x0115d7, PG_U_OTHER_PUNCTUATION},
 	{0x0115d8, 0x0115db, PG_U_OTHER_LETTER},
 	{0x0115dc, 0x0115dd, PG_U_NONSPACING_MARK},
-	{0x0115de, 0x0115ff, PG_U_UNASSIGNED},
 	{0x011600, 0x01162f, PG_U_OTHER_LETTER},
 	{0x011630, 0x011632, PG_U_SPACING_MARK},
 	{0x011633, 0x01163a, PG_U_NONSPACING_MARK},
@@ -3340,11 +2873,8 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01163f, 0x011640, PG_U_NONSPACING_MARK},
 	{0x011641, 0x011643, PG_U_OTHER_PUNCTUATION},
 	{0x011644, 0x011644, PG_U_OTHER_LETTER},
-	{0x011645, 0x01164f, PG_U_UNASSIGNED},
 	{0x011650, 0x011659, PG_U_DECIMAL_NUMBER},
-	{0x01165a, 0x01165f, PG_U_UNASSIGNED},
 	{0x011660, 0x01166c, PG_U_OTHER_PUNCTUATION},
-	{0x01166d, 0x01167f, PG_U_UNASSIGNED},
 	{0x011680, 0x0116aa, PG_U_OTHER_LETTER},
 	{0x0116ab, 0x0116ab, PG_U_NONSPACING_MARK},
 	{0x0116ac, 0x0116ac, PG_U_SPACING_MARK},
@@ -3355,48 +2885,35 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0116b7, 0x0116b7, PG_U_NONSPACING_MARK},
 	{0x0116b8, 0x0116b8, PG_U_OTHER_LETTER},
 	{0x0116b9, 0x0116b9, PG_U_OTHER_PUNCTUATION},
-	{0x0116ba, 0x0116bf, PG_U_UNASSIGNED},
 	{0x0116c0, 0x0116c9, PG_U_DECIMAL_NUMBER},
-	{0x0116ca, 0x0116ff, PG_U_UNASSIGNED},
 	{0x011700, 0x01171a, PG_U_OTHER_LETTER},
-	{0x01171b, 0x01171c, PG_U_UNASSIGNED},
 	{0x01171d, 0x01171f, PG_U_NONSPACING_MARK},
 	{0x011720, 0x011721, PG_U_SPACING_MARK},
 	{0x011722, 0x011725, PG_U_NONSPACING_MARK},
 	{0x011726, 0x011726, PG_U_SPACING_MARK},
 	{0x011727, 0x01172b, PG_U_NONSPACING_MARK},
-	{0x01172c, 0x01172f, PG_U_UNASSIGNED},
 	{0x011730, 0x011739, PG_U_DECIMAL_NUMBER},
 	{0x01173a, 0x01173b, PG_U_OTHER_NUMBER},
 	{0x01173c, 0x01173e, PG_U_OTHER_PUNCTUATION},
 	{0x01173f, 0x01173f, PG_U_OTHER_SYMBOL},
 	{0x011740, 0x011746, PG_U_OTHER_LETTER},
-	{0x011747, 0x0117ff, PG_U_UNASSIGNED},
 	{0x011800, 0x01182b, PG_U_OTHER_LETTER},
 	{0x01182c, 0x01182e, PG_U_SPACING_MARK},
 	{0x01182f, 0x011837, PG_U_NONSPACING_MARK},
 	{0x011838, 0x011838, PG_U_SPACING_MARK},
 	{0x011839, 0x01183a, PG_U_NONSPACING_MARK},
 	{0x01183b, 0x01183b, PG_U_OTHER_PUNCTUATION},
-	{0x01183c, 0x01189f, PG_U_UNASSIGNED},
 	{0x0118a0, 0x0118bf, PG_U_UPPERCASE_LETTER},
 	{0x0118c0, 0x0118df, PG_U_LOWERCASE_LETTER},
 	{0x0118e0, 0x0118e9, PG_U_DECIMAL_NUMBER},
 	{0x0118ea, 0x0118f2, PG_U_OTHER_NUMBER},
-	{0x0118f3, 0x0118fe, PG_U_UNASSIGNED},
 	{0x0118ff, 0x011906, PG_U_OTHER_LETTER},
-	{0x011907, 0x011908, PG_U_UNASSIGNED},
 	{0x011909, 0x011909, PG_U_OTHER_LETTER},
-	{0x01190a, 0x01190b, PG_U_UNASSIGNED},
 	{0x01190c, 0x011913, PG_U_OTHER_LETTER},
-	{0x011914, 0x011914, PG_U_UNASSIGNED},
 	{0x011915, 0x011916, PG_U_OTHER_LETTER},
-	{0x011917, 0x011917, PG_U_UNASSIGNED},
 	{0x011918, 0x01192f, PG_U_OTHER_LETTER},
 	{0x011930, 0x011935, PG_U_SPACING_MARK},
-	{0x011936, 0x011936, PG_U_UNASSIGNED},
 	{0x011937, 0x011938, PG_U_SPACING_MARK},
-	{0x011939, 0x01193a, PG_U_UNASSIGNED},
 	{0x01193b, 0x01193c, PG_U_NONSPACING_MARK},
 	{0x01193d, 0x01193d, PG_U_SPACING_MARK},
 	{0x01193e, 0x01193e, PG_U_NONSPACING_MARK},
@@ -3406,15 +2923,11 @@ static const pg_category_range unicode_categories[4009] =
 	{0x011942, 0x011942, PG_U_SPACING_MARK},
 	{0x011943, 0x011943, PG_U_NONSPACING_MARK},
 	{0x011944, 0x011946, PG_U_OTHER_PUNCTUATION},
-	{0x011947, 0x01194f, PG_U_UNASSIGNED},
 	{0x011950, 0x011959, PG_U_DECIMAL_NUMBER},
-	{0x01195a, 0x01199f, PG_U_UNASSIGNED},
 	{0x0119a0, 0x0119a7, PG_U_OTHER_LETTER},
-	{0x0119a8, 0x0119a9, PG_U_UNASSIGNED},
 	{0x0119aa, 0x0119d0, PG_U_OTHER_LETTER},
 	{0x0119d1, 0x0119d3, PG_U_SPACING_MARK},
 	{0x0119d4, 0x0119d7, PG_U_NONSPACING_MARK},
-	{0x0119d8, 0x0119d9, PG_U_UNASSIGNED},
 	{0x0119da, 0x0119db, PG_U_NONSPACING_MARK},
 	{0x0119dc, 0x0119df, PG_U_SPACING_MARK},
 	{0x0119e0, 0x0119e0, PG_U_NONSPACING_MARK},
@@ -3422,7 +2935,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x0119e2, 0x0119e2, PG_U_OTHER_PUNCTUATION},
 	{0x0119e3, 0x0119e3, PG_U_OTHER_LETTER},
 	{0x0119e4, 0x0119e4, PG_U_SPACING_MARK},
-	{0x0119e5, 0x0119ff, PG_U_UNASSIGNED},
 	{0x011a00, 0x011a00, PG_U_OTHER_LETTER},
 	{0x011a01, 0x011a0a, PG_U_NONSPACING_MARK},
 	{0x011a0b, 0x011a32, PG_U_OTHER_LETTER},
@@ -3432,7 +2944,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x011a3b, 0x011a3e, PG_U_NONSPACING_MARK},
 	{0x011a3f, 0x011a46, PG_U_OTHER_PUNCTUATION},
 	{0x011a47, 0x011a47, PG_U_NONSPACING_MARK},
-	{0x011a48, 0x011a4f, PG_U_UNASSIGNED},
 	{0x011a50, 0x011a50, PG_U_OTHER_LETTER},
 	{0x011a51, 0x011a56, PG_U_NONSPACING_MARK},
 	{0x011a57, 0x011a58, PG_U_SPACING_MARK},
@@ -3444,136 +2955,93 @@ static const pg_category_range unicode_categories[4009] =
 	{0x011a9a, 0x011a9c, PG_U_OTHER_PUNCTUATION},
 	{0x011a9d, 0x011a9d, PG_U_OTHER_LETTER},
 	{0x011a9e, 0x011aa2, PG_U_OTHER_PUNCTUATION},
-	{0x011aa3, 0x011aaf, PG_U_UNASSIGNED},
 	{0x011ab0, 0x011af8, PG_U_OTHER_LETTER},
-	{0x011af9, 0x011aff, PG_U_UNASSIGNED},
 	{0x011b00, 0x011b09, PG_U_OTHER_PUNCTUATION},
-	{0x011b0a, 0x011bff, PG_U_UNASSIGNED},
 	{0x011c00, 0x011c08, PG_U_OTHER_LETTER},
-	{0x011c09, 0x011c09, PG_U_UNASSIGNED},
 	{0x011c0a, 0x011c2e, PG_U_OTHER_LETTER},
 	{0x011c2f, 0x011c2f, PG_U_SPACING_MARK},
 	{0x011c30, 0x011c36, PG_U_NONSPACING_MARK},
-	{0x011c37, 0x011c37, PG_U_UNASSIGNED},
 	{0x011c38, 0x011c3d, PG_U_NONSPACING_MARK},
 	{0x011c3e, 0x011c3e, PG_U_SPACING_MARK},
 	{0x011c3f, 0x011c3f, PG_U_NONSPACING_MARK},
 	{0x011c40, 0x011c40, PG_U_OTHER_LETTER},
 	{0x011c41, 0x011c45, PG_U_OTHER_PUNCTUATION},
-	{0x011c46, 0x011c4f, PG_U_UNASSIGNED},
 	{0x011c50, 0x011c59, PG_U_DECIMAL_NUMBER},
 	{0x011c5a, 0x011c6c, PG_U_OTHER_NUMBER},
-	{0x011c6d, 0x011c6f, PG_U_UNASSIGNED},
 	{0x011c70, 0x011c71, PG_U_OTHER_PUNCTUATION},
 	{0x011c72, 0x011c8f, PG_U_OTHER_LETTER},
-	{0x011c90, 0x011c91, PG_U_UNASSIGNED},
 	{0x011c92, 0x011ca7, PG_U_NONSPACING_MARK},
-	{0x011ca8, 0x011ca8, PG_U_UNASSIGNED},
 	{0x011ca9, 0x011ca9, PG_U_SPACING_MARK},
 	{0x011caa, 0x011cb0, PG_U_NONSPACING_MARK},
 	{0x011cb1, 0x011cb1, PG_U_SPACING_MARK},
 	{0x011cb2, 0x011cb3, PG_U_NONSPACING_MARK},
 	{0x011cb4, 0x011cb4, PG_U_SPACING_MARK},
 	{0x011cb5, 0x011cb6, PG_U_NONSPACING_MARK},
-	{0x011cb7, 0x011cff, PG_U_UNASSIGNED},
 	{0x011d00, 0x011d06, PG_U_OTHER_LETTER},
-	{0x011d07, 0x011d07, PG_U_UNASSIGNED},
 	{0x011d08, 0x011d09, PG_U_OTHER_LETTER},
-	{0x011d0a, 0x011d0a, PG_U_UNASSIGNED},
 	{0x011d0b, 0x011d30, PG_U_OTHER_LETTER},
 	{0x011d31, 0x011d36, PG_U_NONSPACING_MARK},
-	{0x011d37, 0x011d39, PG_U_UNASSIGNED},
 	{0x011d3a, 0x011d3a, PG_U_NONSPACING_MARK},
-	{0x011d3b, 0x011d3b, PG_U_UNASSIGNED},
 	{0x011d3c, 0x011d3d, PG_U_NONSPACING_MARK},
-	{0x011d3e, 0x011d3e, PG_U_UNASSIGNED},
 	{0x011d3f, 0x011d45, PG_U_NONSPACING_MARK},
 	{0x011d46, 0x011d46, PG_U_OTHER_LETTER},
 	{0x011d47, 0x011d47, PG_U_NONSPACING_MARK},
-	{0x011d48, 0x011d4f, PG_U_UNASSIGNED},
 	{0x011d50, 0x011d59, PG_U_DECIMAL_NUMBER},
-	{0x011d5a, 0x011d5f, PG_U_UNASSIGNED},
 	{0x011d60, 0x011d65, PG_U_OTHER_LETTER},
-	{0x011d66, 0x011d66, PG_U_UNASSIGNED},
 	{0x011d67, 0x011d68, PG_U_OTHER_LETTER},
-	{0x011d69, 0x011d69, PG_U_UNASSIGNED},
 	{0x011d6a, 0x011d89, PG_U_OTHER_LETTER},
 	{0x011d8a, 0x011d8e, PG_U_SPACING_MARK},
-	{0x011d8f, 0x011d8f, PG_U_UNASSIGNED},
 	{0x011d90, 0x011d91, PG_U_NONSPACING_MARK},
-	{0x011d92, 0x011d92, PG_U_UNASSIGNED},
 	{0x011d93, 0x011d94, PG_U_SPACING_MARK},
 	{0x011d95, 0x011d95, PG_U_NONSPACING_MARK},
 	{0x011d96, 0x011d96, PG_U_SPACING_MARK},
 	{0x011d97, 0x011d97, PG_U_NONSPACING_MARK},
 	{0x011d98, 0x011d98, PG_U_OTHER_LETTER},
-	{0x011d99, 0x011d9f, PG_U_UNASSIGNED},
 	{0x011da0, 0x011da9, PG_U_DECIMAL_NUMBER},
-	{0x011daa, 0x011edf, PG_U_UNASSIGNED},
 	{0x011ee0, 0x011ef2, PG_U_OTHER_LETTER},
 	{0x011ef3, 0x011ef4, PG_U_NONSPACING_MARK},
 	{0x011ef5, 0x011ef6, PG_U_SPACING_MARK},
 	{0x011ef7, 0x011ef8, PG_U_OTHER_PUNCTUATION},
-	{0x011ef9, 0x011eff, PG_U_UNASSIGNED},
 	{0x011f00, 0x011f01, PG_U_NONSPACING_MARK},
 	{0x011f02, 0x011f02, PG_U_OTHER_LETTER},
 	{0x011f03, 0x011f03, PG_U_SPACING_MARK},
 	{0x011f04, 0x011f10, PG_U_OTHER_LETTER},
-	{0x011f11, 0x011f11, PG_U_UNASSIGNED},
 	{0x011f12, 0x011f33, PG_U_OTHER_LETTER},
 	{0x011f34, 0x011f35, PG_U_SPACING_MARK},
 	{0x011f36, 0x011f3a, PG_U_NONSPACING_MARK},
-	{0x011f3b, 0x011f3d, PG_U_UNASSIGNED},
 	{0x011f3e, 0x011f3f, PG_U_SPACING_MARK},
 	{0x011f40, 0x011f40, PG_U_NONSPACING_MARK},
 	{0x011f41, 0x011f41, PG_U_SPACING_MARK},
 	{0x011f42, 0x011f42, PG_U_NONSPACING_MARK},
 	{0x011f43, 0x011f4f, PG_U_OTHER_PUNCTUATION},
 	{0x011f50, 0x011f59, PG_U_DECIMAL_NUMBER},
-	{0x011f5a, 0x011faf, PG_U_UNASSIGNED},
 	{0x011fb0, 0x011fb0, PG_U_OTHER_LETTER},
-	{0x011fb1, 0x011fbf, PG_U_UNASSIGNED},
 	{0x011fc0, 0x011fd4, PG_U_OTHER_NUMBER},
 	{0x011fd5, 0x011fdc, PG_U_OTHER_SYMBOL},
 	{0x011fdd, 0x011fe0, PG_U_CURRENCY_SYMBOL},
 	{0x011fe1, 0x011ff1, PG_U_OTHER_SYMBOL},
-	{0x011ff2, 0x011ffe, PG_U_UNASSIGNED},
 	{0x011fff, 0x011fff, PG_U_OTHER_PUNCTUATION},
 	{0x012000, 0x012399, PG_U_OTHER_LETTER},
-	{0x01239a, 0x0123ff, PG_U_UNASSIGNED},
 	{0x012400, 0x01246e, PG_U_LETTER_NUMBER},
-	{0x01246f, 0x01246f, PG_U_UNASSIGNED},
 	{0x012470, 0x012474, PG_U_OTHER_PUNCTUATION},
-	{0x012475, 0x01247f, PG_U_UNASSIGNED},
 	{0x012480, 0x012543, PG_U_OTHER_LETTER},
-	{0x012544, 0x012f8f, PG_U_UNASSIGNED},
 	{0x012f90, 0x012ff0, PG_U_OTHER_LETTER},
 	{0x012ff1, 0x012ff2, PG_U_OTHER_PUNCTUATION},
-	{0x012ff3, 0x012fff, PG_U_UNASSIGNED},
 	{0x013000, 0x01342f, PG_U_OTHER_LETTER},
 	{0x013430, 0x01343f, PG_U_FORMAT},
 	{0x013440, 0x013440, PG_U_NONSPACING_MARK},
 	{0x013441, 0x013446, PG_U_OTHER_LETTER},
 	{0x013447, 0x013455, PG_U_NONSPACING_MARK},
-	{0x013456, 0x0143ff, PG_U_UNASSIGNED},
 	{0x014400, 0x014646, PG_U_OTHER_LETTER},
-	{0x014647, 0x0167ff, PG_U_UNASSIGNED},
 	{0x016800, 0x016a38, PG_U_OTHER_LETTER},
-	{0x016a39, 0x016a3f, PG_U_UNASSIGNED},
 	{0x016a40, 0x016a5e, PG_U_OTHER_LETTER},
-	{0x016a5f, 0x016a5f, PG_U_UNASSIGNED},
 	{0x016a60, 0x016a69, PG_U_DECIMAL_NUMBER},
-	{0x016a6a, 0x016a6d, PG_U_UNASSIGNED},
 	{0x016a6e, 0x016a6f, PG_U_OTHER_PUNCTUATION},
 	{0x016a70, 0x016abe, PG_U_OTHER_LETTER},
-	{0x016abf, 0x016abf, PG_U_UNASSIGNED},
 	{0x016ac0, 0x016ac9, PG_U_DECIMAL_NUMBER},
-	{0x016aca, 0x016acf, PG_U_UNASSIGNED},
 	{0x016ad0, 0x016aed, PG_U_OTHER_LETTER},
-	{0x016aee, 0x016aef, PG_U_UNASSIGNED},
 	{0x016af0, 0x016af4, PG_U_NONSPACING_MARK},
 	{0x016af5, 0x016af5, PG_U_OTHER_PUNCTUATION},
-	{0x016af6, 0x016aff, PG_U_UNASSIGNED},
 	{0x016b00, 0x016b2f, PG_U_OTHER_LETTER},
 	{0x016b30, 0x016b36, PG_U_NONSPACING_MARK},
 	{0x016b37, 0x016b3b, PG_U_OTHER_PUNCTUATION},
@@ -3581,83 +3049,50 @@ static const pg_category_range unicode_categories[4009] =
 	{0x016b40, 0x016b43, PG_U_MODIFIER_LETTER},
 	{0x016b44, 0x016b44, PG_U_OTHER_PUNCTUATION},
 	{0x016b45, 0x016b45, PG_U_OTHER_SYMBOL},
-	{0x016b46, 0x016b4f, PG_U_UNASSIGNED},
 	{0x016b50, 0x016b59, PG_U_DECIMAL_NUMBER},
-	{0x016b5a, 0x016b5a, PG_U_UNASSIGNED},
 	{0x016b5b, 0x016b61, PG_U_OTHER_NUMBER},
-	{0x016b62, 0x016b62, PG_U_UNASSIGNED},
 	{0x016b63, 0x016b77, PG_U_OTHER_LETTER},
-	{0x016b78, 0x016b7c, PG_U_UNASSIGNED},
 	{0x016b7d, 0x016b8f, PG_U_OTHER_LETTER},
-	{0x016b90, 0x016e3f, PG_U_UNASSIGNED},
 	{0x016e40, 0x016e5f, PG_U_UPPERCASE_LETTER},
 	{0x016e60, 0x016e7f, PG_U_LOWERCASE_LETTER},
 	{0x016e80, 0x016e96, PG_U_OTHER_NUMBER},
 	{0x016e97, 0x016e9a, PG_U_OTHER_PUNCTUATION},
-	{0x016e9b, 0x016eff, PG_U_UNASSIGNED},
 	{0x016f00, 0x016f4a, PG_U_OTHER_LETTER},
-	{0x016f4b, 0x016f4e, PG_U_UNASSIGNED},
 	{0x016f4f, 0x016f4f, PG_U_NONSPACING_MARK},
 	{0x016f50, 0x016f50, PG_U_OTHER_LETTER},
 	{0x016f51, 0x016f87, PG_U_SPACING_MARK},
-	{0x016f88, 0x016f8e, PG_U_UNASSIGNED},
 	{0x016f8f, 0x016f92, PG_U_NONSPACING_MARK},
 	{0x016f93, 0x016f9f, PG_U_MODIFIER_LETTER},
-	{0x016fa0, 0x016fdf, PG_U_UNASSIGNED},
 	{0x016fe0, 0x016fe1, PG_U_MODIFIER_LETTER},
 	{0x016fe2, 0x016fe2, PG_U_OTHER_PUNCTUATION},
 	{0x016fe3, 0x016fe3, PG_U_MODIFIER_LETTER},
 	{0x016fe4, 0x016fe4, PG_U_NONSPACING_MARK},
-	{0x016fe5, 0x016fef, PG_U_UNASSIGNED},
 	{0x016ff0, 0x016ff1, PG_U_SPACING_MARK},
-	{0x016ff2, 0x016fff, PG_U_UNASSIGNED},
 	{0x017000, 0x0187f7, PG_U_OTHER_LETTER},
-	{0x0187f8, 0x0187ff, PG_U_UNASSIGNED},
 	{0x018800, 0x018cd5, PG_U_OTHER_LETTER},
-	{0x018cd6, 0x018cff, PG_U_UNASSIGNED},
 	{0x018d00, 0x018d08, PG_U_OTHER_LETTER},
-	{0x018d09, 0x01afef, PG_U_UNASSIGNED},
 	{0x01aff0, 0x01aff3, PG_U_MODIFIER_LETTER},
-	{0x01aff4, 0x01aff4, PG_U_UNASSIGNED},
 	{0x01aff5, 0x01affb, PG_U_MODIFIER_LETTER},
-	{0x01affc, 0x01affc, PG_U_UNASSIGNED},
 	{0x01affd, 0x01affe, PG_U_MODIFIER_LETTER},
-	{0x01afff, 0x01afff, PG_U_UNASSIGNED},
 	{0x01b000, 0x01b122, PG_U_OTHER_LETTER},
-	{0x01b123, 0x01b131, PG_U_UNASSIGNED},
 	{0x01b132, 0x01b132, PG_U_OTHER_LETTER},
-	{0x01b133, 0x01b14f, PG_U_UNASSIGNED},
 	{0x01b150, 0x01b152, PG_U_OTHER_LETTER},
-	{0x01b153, 0x01b154, PG_U_UNASSIGNED},
 	{0x01b155, 0x01b155, PG_U_OTHER_LETTER},
-	{0x01b156, 0x01b163, PG_U_UNASSIGNED},
 	{0x01b164, 0x01b167, PG_U_OTHER_LETTER},
-	{0x01b168, 0x01b16f, PG_U_UNASSIGNED},
 	{0x01b170, 0x01b2fb, PG_U_OTHER_LETTER},
-	{0x01b2fc, 0x01bbff, PG_U_UNASSIGNED},
 	{0x01bc00, 0x01bc6a, PG_U_OTHER_LETTER},
-	{0x01bc6b, 0x01bc6f, PG_U_UNASSIGNED},
 	{0x01bc70, 0x01bc7c, PG_U_OTHER_LETTER},
-	{0x01bc7d, 0x01bc7f, PG_U_UNASSIGNED},
 	{0x01bc80, 0x01bc88, PG_U_OTHER_LETTER},
-	{0x01bc89, 0x01bc8f, PG_U_UNASSIGNED},
 	{0x01bc90, 0x01bc99, PG_U_OTHER_LETTER},
-	{0x01bc9a, 0x01bc9b, PG_U_UNASSIGNED},
 	{0x01bc9c, 0x01bc9c, PG_U_OTHER_SYMBOL},
 	{0x01bc9d, 0x01bc9e, PG_U_NONSPACING_MARK},
 	{0x01bc9f, 0x01bc9f, PG_U_OTHER_PUNCTUATION},
 	{0x01bca0, 0x01bca3, PG_U_FORMAT},
-	{0x01bca4, 0x01ceff, PG_U_UNASSIGNED},
 	{0x01cf00, 0x01cf2d, PG_U_NONSPACING_MARK},
-	{0x01cf2e, 0x01cf2f, PG_U_UNASSIGNED},
 	{0x01cf30, 0x01cf46, PG_U_NONSPACING_MARK},
-	{0x01cf47, 0x01cf4f, PG_U_UNASSIGNED},
 	{0x01cf50, 0x01cfc3, PG_U_OTHER_SYMBOL},
-	{0x01cfc4, 0x01cfff, PG_U_UNASSIGNED},
 	{0x01d000, 0x01d0f5, PG_U_OTHER_SYMBOL},
-	{0x01d0f6, 0x01d0ff, PG_U_UNASSIGNED},
 	{0x01d100, 0x01d126, PG_U_OTHER_SYMBOL},
-	{0x01d127, 0x01d128, PG_U_UNASSIGNED},
 	{0x01d129, 0x01d164, PG_U_OTHER_SYMBOL},
 	{0x01d165, 0x01d166, PG_U_SPACING_MARK},
 	{0x01d167, 0x01d169, PG_U_NONSPACING_MARK},
@@ -3670,66 +3105,42 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01d18c, 0x01d1a9, PG_U_OTHER_SYMBOL},
 	{0x01d1aa, 0x01d1ad, PG_U_NONSPACING_MARK},
 	{0x01d1ae, 0x01d1ea, PG_U_OTHER_SYMBOL},
-	{0x01d1eb, 0x01d1ff, PG_U_UNASSIGNED},
 	{0x01d200, 0x01d241, PG_U_OTHER_SYMBOL},
 	{0x01d242, 0x01d244, PG_U_NONSPACING_MARK},
 	{0x01d245, 0x01d245, PG_U_OTHER_SYMBOL},
-	{0x01d246, 0x01d2bf, PG_U_UNASSIGNED},
 	{0x01d2c0, 0x01d2d3, PG_U_OTHER_NUMBER},
-	{0x01d2d4, 0x01d2df, PG_U_UNASSIGNED},
 	{0x01d2e0, 0x01d2f3, PG_U_OTHER_NUMBER},
-	{0x01d2f4, 0x01d2ff, PG_U_UNASSIGNED},
 	{0x01d300, 0x01d356, PG_U_OTHER_SYMBOL},
-	{0x01d357, 0x01d35f, PG_U_UNASSIGNED},
 	{0x01d360, 0x01d378, PG_U_OTHER_NUMBER},
-	{0x01d379, 0x01d3ff, PG_U_UNASSIGNED},
 	{0x01d400, 0x01d419, PG_U_UPPERCASE_LETTER},
 	{0x01d41a, 0x01d433, PG_U_LOWERCASE_LETTER},
 	{0x01d434, 0x01d44d, PG_U_UPPERCASE_LETTER},
 	{0x01d44e, 0x01d454, PG_U_LOWERCASE_LETTER},
-	{0x01d455, 0x01d455, PG_U_UNASSIGNED},
 	{0x01d456, 0x01d467, PG_U_LOWERCASE_LETTER},
 	{0x01d468, 0x01d481, PG_U_UPPERCASE_LETTER},
 	{0x01d482, 0x01d49b, PG_U_LOWERCASE_LETTER},
 	{0x01d49c, 0x01d49c, PG_U_UPPERCASE_LETTER},
-	{0x01d49d, 0x01d49d, PG_U_UNASSIGNED},
 	{0x01d49e, 0x01d49f, PG_U_UPPERCASE_LETTER},
-	{0x01d4a0, 0x01d4a1, PG_U_UNASSIGNED},
 	{0x01d4a2, 0x01d4a2, PG_U_UPPERCASE_LETTER},
-	{0x01d4a3, 0x01d4a4, PG_U_UNASSIGNED},
 	{0x01d4a5, 0x01d4a6, PG_U_UPPERCASE_LETTER},
-	{0x01d4a7, 0x01d4a8, PG_U_UNASSIGNED},
 	{0x01d4a9, 0x01d4ac, PG_U_UPPERCASE_LETTER},
-	{0x01d4ad, 0x01d4ad, PG_U_UNASSIGNED},
 	{0x01d4ae, 0x01d4b5, PG_U_UPPERCASE_LETTER},
 	{0x01d4b6, 0x01d4b9, PG_U_LOWERCASE_LETTER},
-	{0x01d4ba, 0x01d4ba, PG_U_UNASSIGNED},
 	{0x01d4bb, 0x01d4bb, PG_U_LOWERCASE_LETTER},
-	{0x01d4bc, 0x01d4bc, PG_U_UNASSIGNED},
 	{0x01d4bd, 0x01d4c3, PG_U_LOWERCASE_LETTER},
-	{0x01d4c4, 0x01d4c4, PG_U_UNASSIGNED},
 	{0x01d4c5, 0x01d4cf, PG_U_LOWERCASE_LETTER},
 	{0x01d4d0, 0x01d4e9, PG_U_UPPERCASE_LETTER},
 	{0x01d4ea, 0x01d503, PG_U_LOWERCASE_LETTER},
 	{0x01d504, 0x01d505, PG_U_UPPERCASE_LETTER},
-	{0x01d506, 0x01d506, PG_U_UNASSIGNED},
 	{0x01d507, 0x01d50a, PG_U_UPPERCASE_LETTER},
-	{0x01d50b, 0x01d50c, PG_U_UNASSIGNED},
 	{0x01d50d, 0x01d514, PG_U_UPPERCASE_LETTER},
-	{0x01d515, 0x01d515, PG_U_UNASSIGNED},
 	{0x01d516, 0x01d51c, PG_U_UPPERCASE_LETTER},
-	{0x01d51d, 0x01d51d, PG_U_UNASSIGNED},
 	{0x01d51e, 0x01d537, PG_U_LOWERCASE_LETTER},
 	{0x01d538, 0x01d539, PG_U_UPPERCASE_LETTER},
-	{0x01d53a, 0x01d53a, PG_U_UNASSIGNED},
 	{0x01d53b, 0x01d53e, PG_U_UPPERCASE_LETTER},
-	{0x01d53f, 0x01d53f, PG_U_UNASSIGNED},
 	{0x01d540, 0x01d544, PG_U_UPPERCASE_LETTER},
-	{0x01d545, 0x01d545, PG_U_UNASSIGNED},
 	{0x01d546, 0x01d546, PG_U_UPPERCASE_LETTER},
-	{0x01d547, 0x01d549, PG_U_UNASSIGNED},
 	{0x01d54a, 0x01d550, PG_U_UPPERCASE_LETTER},
-	{0x01d551, 0x01d551, PG_U_UNASSIGNED},
 	{0x01d552, 0x01d56b, PG_U_LOWERCASE_LETTER},
 	{0x01d56c, 0x01d585, PG_U_UPPERCASE_LETTER},
 	{0x01d586, 0x01d59f, PG_U_LOWERCASE_LETTER},
@@ -3743,7 +3154,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01d656, 0x01d66f, PG_U_LOWERCASE_LETTER},
 	{0x01d670, 0x01d689, PG_U_UPPERCASE_LETTER},
 	{0x01d68a, 0x01d6a5, PG_U_LOWERCASE_LETTER},
-	{0x01d6a6, 0x01d6a7, PG_U_UNASSIGNED},
 	{0x01d6a8, 0x01d6c0, PG_U_UPPERCASE_LETTER},
 	{0x01d6c1, 0x01d6c1, PG_U_MATH_SYMBOL},
 	{0x01d6c2, 0x01d6da, PG_U_LOWERCASE_LETTER},
@@ -3771,7 +3181,6 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01d7c4, 0x01d7c9, PG_U_LOWERCASE_LETTER},
 	{0x01d7ca, 0x01d7ca, PG_U_UPPERCASE_LETTER},
 	{0x01d7cb, 0x01d7cb, PG_U_LOWERCASE_LETTER},
-	{0x01d7cc, 0x01d7cd, PG_U_UNASSIGNED},
 	{0x01d7ce, 0x01d7ff, PG_U_DECIMAL_NUMBER},
 	{0x01d800, 0x01d9ff, PG_U_OTHER_SYMBOL},
 	{0x01da00, 0x01da36, PG_U_NONSPACING_MARK},
@@ -3783,258 +3192,142 @@ static const pg_category_range unicode_categories[4009] =
 	{0x01da84, 0x01da84, PG_U_NONSPACING_MARK},
 	{0x01da85, 0x01da86, PG_U_OTHER_SYMBOL},
 	{0x01da87, 0x01da8b, PG_U_OTHER_PUNCTUATION},
-	{0x01da8c, 0x01da9a, PG_U_UNASSIGNED},
 	{0x01da9b, 0x01da9f, PG_U_NONSPACING_MARK},
-	{0x01daa0, 0x01daa0, PG_U_UNASSIGNED},
 	{0x01daa1, 0x01daaf, PG_U_NONSPACING_MARK},
-	{0x01dab0, 0x01deff, PG_U_UNASSIGNED},
 	{0x01df00, 0x01df09, PG_U_LOWERCASE_LETTER},
 	{0x01df0a, 0x01df0a, PG_U_OTHER_LETTER},
 	{0x01df0b, 0x01df1e, PG_U_LOWERCASE_LETTER},
-	{0x01df1f, 0x01df24, PG_U_UNASSIGNED},
 	{0x01df25, 0x01df2a, PG_U_LOWERCASE_LETTER},
-	{0x01df2b, 0x01dfff, PG_U_UNASSIGNED},
 	{0x01e000, 0x01e006, PG_U_NONSPACING_MARK},
-	{0x01e007, 0x01e007, PG_U_UNASSIGNED},
 	{0x01e008, 0x01e018, PG_U_NONSPACING_MARK},
-	{0x01e019, 0x01e01a, PG_U_UNASSIGNED},
 	{0x01e01b, 0x01e021, PG_U_NONSPACING_MARK},
-	{0x01e022, 0x01e022, PG_U_UNASSIGNED},
 	{0x01e023, 0x01e024, PG_U_NONSPACING_MARK},
-	{0x01e025, 0x01e025, PG_U_UNASSIGNED},
 	{0x01e026, 0x01e02a, PG_U_NONSPACING_MARK},
-	{0x01e02b, 0x01e02f, PG_U_UNASSIGNED},
 	{0x01e030, 0x01e06d, PG_U_MODIFIER_LETTER},
-	{0x01e06e, 0x01e08e, PG_U_UNASSIGNED},
 	{0x01e08f, 0x01e08f, PG_U_NONSPACING_MARK},
-	{0x01e090, 0x01e0ff, PG_U_UNASSIGNED},
 	{0x01e100, 0x01e12c, PG_U_OTHER_LETTER},
-	{0x01e12d, 0x01e12f, PG_U_UNASSIGNED},
 	{0x01e130, 0x01e136, PG_U_NONSPACING_MARK},
 	{0x01e137, 0x01e13d, PG_U_MODIFIER_LETTER},
-	{0x01e13e, 0x01e13f, PG_U_UNASSIGNED},
 	{0x01e140, 0x01e149, PG_U_DECIMAL_NUMBER},
-	{0x01e14a, 0x01e14d, PG_U_UNASSIGNED},
 	{0x01e14e, 0x01e14e, PG_U_OTHER_LETTER},
 	{0x01e14f, 0x01e14f, PG_U_OTHER_SYMBOL},
-	{0x01e150, 0x01e28f, PG_U_UNASSIGNED},
 	{0x01e290, 0x01e2ad, PG_U_OTHER_LETTER},
 	{0x01e2ae, 0x01e2ae, PG_U_NONSPACING_MARK},
-	{0x01e2af, 0x01e2bf, PG_U_UNASSIGNED},
 	{0x01e2c0, 0x01e2eb, PG_U_OTHER_LETTER},
 	{0x01e2ec, 0x01e2ef, PG_U_NONSPACING_MARK},
 	{0x01e2f0, 0x01e2f9, PG_U_DECIMAL_NUMBER},
-	{0x01e2fa, 0x01e2fe, PG_U_UNASSIGNED},
 	{0x01e2ff, 0x01e2ff, PG_U_CURRENCY_SYMBOL},
-	{0x01e300, 0x01e4cf, PG_U_UNASSIGNED},
 	{0x01e4d0, 0x01e4ea, PG_U_OTHER_LETTER},
 	{0x01e4eb, 0x01e4eb, PG_U_MODIFIER_LETTER},
 	{0x01e4ec, 0x01e4ef, PG_U_NONSPACING_MARK},
 	{0x01e4f0, 0x01e4f9, PG_U_DECIMAL_NUMBER},
-	{0x01e4fa, 0x01e7df, PG_U_UNASSIGNED},
 	{0x01e7e0, 0x01e7e6, PG_U_OTHER_LETTER},
-	{0x01e7e7, 0x01e7e7, PG_U_UNASSIGNED},
 	{0x01e7e8, 0x01e7eb, PG_U_OTHER_LETTER},
-	{0x01e7ec, 0x01e7ec, PG_U_UNASSIGNED},
 	{0x01e7ed, 0x01e7ee, PG_U_OTHER_LETTER},
-	{0x01e7ef, 0x01e7ef, PG_U_UNASSIGNED},
 	{0x01e7f0, 0x01e7fe, PG_U_OTHER_LETTER},
-	{0x01e7ff, 0x01e7ff, PG_U_UNASSIGNED},
 	{0x01e800, 0x01e8c4, PG_U_OTHER_LETTER},
-	{0x01e8c5, 0x01e8c6, PG_U_UNASSIGNED},
 	{0x01e8c7, 0x01e8cf, PG_U_OTHER_NUMBER},
 	{0x01e8d0, 0x01e8d6, PG_U_NONSPACING_MARK},
-	{0x01e8d7, 0x01e8ff, PG_U_UNASSIGNED},
 	{0x01e900, 0x01e921, PG_U_UPPERCASE_LETTER},
 	{0x01e922, 0x01e943, PG_U_LOWERCASE_LETTER},
 	{0x01e944, 0x01e94a, PG_U_NONSPACING_MARK},
 	{0x01e94b, 0x01e94b, PG_U_MODIFIER_LETTER},
-	{0x01e94c, 0x01e94f, PG_U_UNASSIGNED},
 	{0x01e950, 0x01e959, PG_U_DECIMAL_NUMBER},
-	{0x01e95a, 0x01e95d, PG_U_UNASSIGNED},
 	{0x01e95e, 0x01e95f, PG_U_OTHER_PUNCTUATION},
-	{0x01e960, 0x01ec70, PG_U_UNASSIGNED},
 	{0x01ec71, 0x01ecab, PG_U_OTHER_NUMBER},
 	{0x01ecac, 0x01ecac, PG_U_OTHER_SYMBOL},
 	{0x01ecad, 0x01ecaf, PG_U_OTHER_NUMBER},
 	{0x01ecb0, 0x01ecb0, PG_U_CURRENCY_SYMBOL},
 	{0x01ecb1, 0x01ecb4, PG_U_OTHER_NUMBER},
-	{0x01ecb5, 0x01ed00, PG_U_UNASSIGNED},
 	{0x01ed01, 0x01ed2d, PG_U_OTHER_NUMBER},
 	{0x01ed2e, 0x01ed2e, PG_U_OTHER_SYMBOL},
 	{0x01ed2f, 0x01ed3d, PG_U_OTHER_NUMBER},
-	{0x01ed3e, 0x01edff, PG_U_UNASSIGNED},
 	{0x01ee00, 0x01ee03, PG_U_OTHER_LETTER},
-	{0x01ee04, 0x01ee04, PG_U_UNASSIGNED},
 	{0x01ee05, 0x01ee1f, PG_U_OTHER_LETTER},
-	{0x01ee20, 0x01ee20, PG_U_UNASSIGNED},
 	{0x01ee21, 0x01ee22, PG_U_OTHER_LETTER},
-	{0x01ee23, 0x01ee23, PG_U_UNASSIGNED},
 	{0x01ee24, 0x01ee24, PG_U_OTHER_LETTER},
-	{0x01ee25, 0x01ee26, PG_U_UNASSIGNED},
 	{0x01ee27, 0x01ee27, PG_U_OTHER_LETTER},
-	{0x01ee28, 0x01ee28, PG_U_UNASSIGNED},
 	{0x01ee29, 0x01ee32, PG_U_OTHER_LETTER},
-	{0x01ee33, 0x01ee33, PG_U_UNASSIGNED},
 	{0x01ee34, 0x01ee37, PG_U_OTHER_LETTER},
-	{0x01ee38, 0x01ee38, PG_U_UNASSIGNED},
 	{0x01ee39, 0x01ee39, PG_U_OTHER_LETTER},
-	{0x01ee3a, 0x01ee3a, PG_U_UNASSIGNED},
 	{0x01ee3b, 0x01ee3b, PG_U_OTHER_LETTER},
-	{0x01ee3c, 0x01ee41, PG_U_UNASSIGNED},
 	{0x01ee42, 0x01ee42, PG_U_OTHER_LETTER},
-	{0x01ee43, 0x01ee46, PG_U_UNASSIGNED},
 	{0x01ee47, 0x01ee47, PG_U_OTHER_LETTER},
-	{0x01ee48, 0x01ee48, PG_U_UNASSIGNED},
 	{0x01ee49, 0x01ee49, PG_U_OTHER_LETTER},
-	{0x01ee4a, 0x01ee4a, PG_U_UNASSIGNED},
 	{0x01ee4b, 0x01ee4b, PG_U_OTHER_LETTER},
-	{0x01ee4c, 0x01ee4c, PG_U_UNASSIGNED},
 	{0x01ee4d, 0x01ee4f, PG_U_OTHER_LETTER},
-	{0x01ee50, 0x01ee50, PG_U_UNASSIGNED},
 	{0x01ee51, 0x01ee52, PG_U_OTHER_LETTER},
-	{0x01ee53, 0x01ee53, PG_U_UNASSIGNED},
 	{0x01ee54, 0x01ee54, PG_U_OTHER_LETTER},
-	{0x01ee55, 0x01ee56, PG_U_UNASSIGNED},
 	{0x01ee57, 0x01ee57, PG_U_OTHER_LETTER},
-	{0x01ee58, 0x01ee58, PG_U_UNASSIGNED},
 	{0x01ee59, 0x01ee59, PG_U_OTHER_LETTER},
-	{0x01ee5a, 0x01ee5a, PG_U_UNASSIGNED},
 	{0x01ee5b, 0x01ee5b, PG_U_OTHER_LETTER},
-	{0x01ee5c, 0x01ee5c, PG_U_UNASSIGNED},
 	{0x01ee5d, 0x01ee5d, PG_U_OTHER_LETTER},
-	{0x01ee5e, 0x01ee5e, PG_U_UNASSIGNED},
 	{0x01ee5f, 0x01ee5f, PG_U_OTHER_LETTER},
-	{0x01ee60, 0x01ee60, PG_U_UNASSIGNED},
 	{0x01ee61, 0x01ee62, PG_U_OTHER_LETTER},
-	{0x01ee63, 0x01ee63, PG_U_UNASSIGNED},
 	{0x01ee64, 0x01ee64, PG_U_OTHER_LETTER},
-	{0x01ee65, 0x01ee66, PG_U_UNASSIGNED},
 	{0x01ee67, 0x01ee6a, PG_U_OTHER_LETTER},
-	{0x01ee6b, 0x01ee6b, PG_U_UNASSIGNED},
 	{0x01ee6c, 0x01ee72, PG_U_OTHER_LETTER},
-	{0x01ee73, 0x01ee73, PG_U_UNASSIGNED},
 	{0x01ee74, 0x01ee77, PG_U_OTHER_LETTER},
-	{0x01ee78, 0x01ee78, PG_U_UNASSIGNED},
 	{0x01ee79, 0x01ee7c, PG_U_OTHER_LETTER},
-	{0x01ee7d, 0x01ee7d, PG_U_UNASSIGNED},
 	{0x01ee7e, 0x01ee7e, PG_U_OTHER_LETTER},
-	{0x01ee7f, 0x01ee7f, PG_U_UNASSIGNED},
 	{0x01ee80, 0x01ee89, PG_U_OTHER_LETTER},
-	{0x01ee8a, 0x01ee8a, PG_U_UNASSIGNED},
 	{0x01ee8b, 0x01ee9b, PG_U_OTHER_LETTER},
-	{0x01ee9c, 0x01eea0, PG_U_UNASSIGNED},
 	{0x01eea1, 0x01eea3, PG_U_OTHER_LETTER},
-	{0x01eea4, 0x01eea4, PG_U_UNASSIGNED},
 	{0x01eea5, 0x01eea9, PG_U_OTHER_LETTER},
-	{0x01eeaa, 0x01eeaa, PG_U_UNASSIGNED},
 	{0x01eeab, 0x01eebb, PG_U_OTHER_LETTER},
-	{0x01eebc, 0x01eeef, PG_U_UNASSIGNED},
 	{0x01eef0, 0x01eef1, PG_U_MATH_SYMBOL},
-	{0x01eef2, 0x01efff, PG_U_UNASSIGNED},
 	{0x01f000, 0x01f02b, PG_U_OTHER_SYMBOL},
-	{0x01f02c, 0x01f02f, PG_U_UNASSIGNED},
 	{0x01f030, 0x01f093, PG_U_OTHER_SYMBOL},
-	{0x01f094, 0x01f09f, PG_U_UNASSIGNED},
 	{0x01f0a0, 0x01f0ae, PG_U_OTHER_SYMBOL},
-	{0x01f0af, 0x01f0b0, PG_U_UNASSIGNED},
 	{0x01f0b1, 0x01f0bf, PG_U_OTHER_SYMBOL},
-	{0x01f0c0, 0x01f0c0, PG_U_UNASSIGNED},
 	{0x01f0c1, 0x01f0cf, PG_U_OTHER_SYMBOL},
-	{0x01f0d0, 0x01f0d0, PG_U_UNASSIGNED},
 	{0x01f0d1, 0x01f0f5, PG_U_OTHER_SYMBOL},
-	{0x01f0f6, 0x01f0ff, PG_U_UNASSIGNED},
 	{0x01f100, 0x01f10c, PG_U_OTHER_NUMBER},
 	{0x01f10d, 0x01f1ad, PG_U_OTHER_SYMBOL},
-	{0x01f1ae, 0x01f1e5, PG_U_UNASSIGNED},
 	{0x01f1e6, 0x01f202, PG_U_OTHER_SYMBOL},
-	{0x01f203, 0x01f20f, PG_U_UNASSIGNED},
 	{0x01f210, 0x01f23b, PG_U_OTHER_SYMBOL},
-	{0x01f23c, 0x01f23f, PG_U_UNASSIGNED},
 	{0x01f240, 0x01f248, PG_U_OTHER_SYMBOL},
-	{0x01f249, 0x01f24f, PG_U_UNASSIGNED},
 	{0x01f250, 0x01f251, PG_U_OTHER_SYMBOL},
-	{0x01f252, 0x01f25f, PG_U_UNASSIGNED},
 	{0x01f260, 0x01f265, PG_U_OTHER_SYMBOL},
-	{0x01f266, 0x01f2ff, PG_U_UNASSIGNED},
 	{0x01f300, 0x01f3fa, PG_U_OTHER_SYMBOL},
 	{0x01f3fb, 0x01f3ff, PG_U_MODIFIER_SYMBOL},
 	{0x01f400, 0x01f6d7, PG_U_OTHER_SYMBOL},
-	{0x01f6d8, 0x01f6db, PG_U_UNASSIGNED},
 	{0x01f6dc, 0x01f6ec, PG_U_OTHER_SYMBOL},
-	{0x01f6ed, 0x01f6ef, PG_U_UNASSIGNED},
 	{0x01f6f0, 0x01f6fc, PG_U_OTHER_SYMBOL},
-	{0x01f6fd, 0x01f6ff, PG_U_UNASSIGNED},
 	{0x01f700, 0x01f776, PG_U_OTHER_SYMBOL},
-	{0x01f777, 0x01f77a, PG_U_UNASSIGNED},
 	{0x01f77b, 0x01f7d9, PG_U_OTHER_SYMBOL},
-	{0x01f7da, 0x01f7df, PG_U_UNASSIGNED},
 	{0x01f7e0, 0x01f7eb, PG_U_OTHER_SYMBOL},
-	{0x01f7ec, 0x01f7ef, PG_U_UNASSIGNED},
 	{0x01f7f0, 0x01f7f0, PG_U_OTHER_SYMBOL},
-	{0x01f7f1, 0x01f7ff, PG_U_UNASSIGNED},
 	{0x01f800, 0x01f80b, PG_U_OTHER_SYMBOL},
-	{0x01f80c, 0x01f80f, PG_U_UNASSIGNED},
 	{0x01f810, 0x01f847, PG_U_OTHER_SYMBOL},
-	{0x01f848, 0x01f84f, PG_U_UNASSIGNED},
 	{0x01f850, 0x01f859, PG_U_OTHER_SYMBOL},
-	{0x01f85a, 0x01f85f, PG_U_UNASSIGNED},
 	{0x01f860, 0x01f887, PG_U_OTHER_SYMBOL},
-	{0x01f888, 0x01f88f, PG_U_UNASSIGNED},
 	{0x01f890, 0x01f8ad, PG_U_OTHER_SYMBOL},
-	{0x01f8ae, 0x01f8af, PG_U_UNASSIGNED},
 	{0x01f8b0, 0x01f8b1, PG_U_OTHER_SYMBOL},
-	{0x01f8b2, 0x01f8ff, PG_U_UNASSIGNED},
 	{0x01f900, 0x01fa53, PG_U_OTHER_SYMBOL},
-	{0x01fa54, 0x01fa5f, PG_U_UNASSIGNED},
 	{0x01fa60, 0x01fa6d, PG_U_OTHER_SYMBOL},
-	{0x01fa6e, 0x01fa6f, PG_U_UNASSIGNED},
 	{0x01fa70, 0x01fa7c, PG_U_OTHER_SYMBOL},
-	{0x01fa7d, 0x01fa7f, PG_U_UNASSIGNED},
 	{0x01fa80, 0x01fa88, PG_U_OTHER_SYMBOL},
-	{0x01fa89, 0x01fa8f, PG_U_UNASSIGNED},
 	{0x01fa90, 0x01fabd, PG_U_OTHER_SYMBOL},
-	{0x01fabe, 0x01fabe, PG_U_UNASSIGNED},
 	{0x01fabf, 0x01fac5, PG_U_OTHER_SYMBOL},
-	{0x01fac6, 0x01facd, PG_U_UNASSIGNED},
 	{0x01face, 0x01fadb, PG_U_OTHER_SYMBOL},
-	{0x01fadc, 0x01fadf, PG_U_UNASSIGNED},
 	{0x01fae0, 0x01fae8, PG_U_OTHER_SYMBOL},
-	{0x01fae9, 0x01faef, PG_U_UNASSIGNED},
 	{0x01faf0, 0x01faf8, PG_U_OTHER_SYMBOL},
-	{0x01faf9, 0x01faff, PG_U_UNASSIGNED},
 	{0x01fb00, 0x01fb92, PG_U_OTHER_SYMBOL},
-	{0x01fb93, 0x01fb93, PG_U_UNASSIGNED},
 	{0x01fb94, 0x01fbca, PG_U_OTHER_SYMBOL},
-	{0x01fbcb, 0x01fbef, PG_U_UNASSIGNED},
 	{0x01fbf0, 0x01fbf9, PG_U_DECIMAL_NUMBER},
-	{0x01fbfa, 0x01ffff, PG_U_UNASSIGNED},
 	{0x020000, 0x02a6df, PG_U_OTHER_LETTER},
-	{0x02a6e0, 0x02a6ff, PG_U_UNASSIGNED},
 	{0x02a700, 0x02b739, PG_U_OTHER_LETTER},
-	{0x02b73a, 0x02b73f, PG_U_UNASSIGNED},
 	{0x02b740, 0x02b81d, PG_U_OTHER_LETTER},
-	{0x02b81e, 0x02b81f, PG_U_UNASSIGNED},
 	{0x02b820, 0x02cea1, PG_U_OTHER_LETTER},
-	{0x02cea2, 0x02ceaf, PG_U_UNASSIGNED},
 	{0x02ceb0, 0x02ebe0, PG_U_OTHER_LETTER},
-	{0x02ebe1, 0x02ebef, PG_U_UNASSIGNED},
 	{0x02ebf0, 0x02ee5d, PG_U_OTHER_LETTER},
-	{0x02ee5e, 0x02f7ff, PG_U_UNASSIGNED},
 	{0x02f800, 0x02fa1d, PG_U_OTHER_LETTER},
-	{0x02fa1e, 0x02ffff, PG_U_UNASSIGNED},
 	{0x030000, 0x03134a, PG_U_OTHER_LETTER},
-	{0x03134b, 0x03134f, PG_U_UNASSIGNED},
 	{0x031350, 0x0323af, PG_U_OTHER_LETTER},
-	{0x0323b0, 0x0e0000, PG_U_UNASSIGNED},
 	{0x0e0001, 0x0e0001, PG_U_FORMAT},
-	{0x0e0002, 0x0e001f, PG_U_UNASSIGNED},
 	{0x0e0020, 0x0e007f, PG_U_FORMAT},
-	{0x0e0080, 0x0e00ff, PG_U_UNASSIGNED},
 	{0x0e0100, 0x0e01ef, PG_U_NONSPACING_MARK},
-	{0x0e01f0, 0x0effff, PG_U_UNASSIGNED},
 	{0x0f0000, 0x0ffffd, PG_U_PRIVATE_USE},
-	{0x0ffffe, 0x0fffff, PG_U_UNASSIGNED},
-	{0x100000, 0x10fffd, PG_U_PRIVATE_USE},
-	{0x10fffe, 0x10ffff, PG_U_UNASSIGNED}
+	{0x100000, 0x10fffd, PG_U_PRIVATE_USE}
 };
-- 
2.34.1
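
Note on the table change above: the patch drops every PG_U_UNASSIGNED range, so the table only lists assigned ranges and a lookup that finds no matching range presumably has to report "unassigned" itself. Below is a minimal sketch of that idea, not the code from the patch. All names prefixed with demo_ are hypothetical stand-ins (the real struct is pg_category_range and the real enum values are PG_U_*), and the four ranges are copied from the table excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PG_U_* category values. */
typedef enum
{
	DEMO_UNASSIGNED = 0,
	DEMO_OTHER_LETTER,
	DEMO_DECIMAL_NUMBER,
	DEMO_OTHER_PUNCTUATION
} demo_category;

/* Hypothetical stand-in for pg_category_range. */
typedef struct
{
	uint32_t	first;			/* first code point in range (inclusive) */
	uint32_t	last;			/* last code point in range (inclusive) */
	demo_category category;
} demo_category_range;

/*
 * Sorted, non-overlapping ranges taken from the excerpt above.  The gaps
 * between them (e.g. 0x011645..0x01164f) are simply not listed.
 */
static const demo_category_range demo_ranges[] = {
	{0x011644, 0x011644, DEMO_OTHER_LETTER},
	{0x011650, 0x011659, DEMO_DECIMAL_NUMBER},
	{0x011660, 0x01166c, DEMO_OTHER_PUNCTUATION},
	{0x011680, 0x0116aa, DEMO_OTHER_LETTER},
};

/*
 * Binary search over the range table.  A code point that falls into a gap,
 * or outside the table entirely, is reported as unassigned.
 */
static demo_category
demo_lookup(uint32_t code)
{
	size_t		lo = 0;
	size_t		hi = sizeof(demo_ranges) / sizeof(demo_ranges[0]);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (code < demo_ranges[mid].first)
			hi = mid;
		else if (code > demo_ranges[mid].last)
			lo = mid + 1;
		else
			return demo_ranges[mid].category;
	}

	return DEMO_UNASSIGNED;
}

/* Small self-check: one hit and one gap. */
static void
demo_self_check(void)
{
	assert(demo_lookup(0x011655) == DEMO_DECIMAL_NUMBER);
	assert(demo_lookup(0x011645) == DEMO_UNASSIGNED);
}
```

The trade-off, as far as the diff shows, is a smaller table (the hunk headers still say 4009 entries for the old array) in exchange for treating a failed search as a valid answer rather than an error.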

Attachment: v2-0001-Minor-cleanup-for-unicode-update-build-and-test.patch (text/x-patch; charset=UTF-8)
From 80ed701721b2bc91f2346f013d58930cd1d325f5 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 22 Nov 2023 10:38:46 -0800
Subject: [PATCH v2 1/3] Minor cleanup for unicode-update build and test.

---
 src/common/unicode/Makefile        |  6 ++--
 src/common/unicode/category_test.c | 18 ++++++------
 src/common/unicode/meson.build     | 44 +++++++++++++++---------------
 3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/src/common/unicode/Makefile b/src/common/unicode/Makefile
index 30cd75cc6a..04d81dd5cb 100644
--- a/src/common/unicode/Makefile
+++ b/src/common/unicode/Makefile
@@ -21,7 +21,7 @@ CPPFLAGS += $(ICU_CFLAGS)
 # By default, do nothing.
 all:
 
-update-unicode: unicode_category_table.h unicode_norm_table.h unicode_nonspacing_table.h unicode_east_asian_fw_table.h unicode_normprops_table.h unicode_norm_hashfunc.h unicode_version.h
+update-unicode: unicode_category_table.h unicode_east_asian_fw_table.h unicode_nonspacing_table.h unicode_norm_hashfunc.h unicode_norm_table.h unicode_normprops_table.h unicode_version.h
 	mv $^ $(top_srcdir)/src/include/common/
 	$(MAKE) category-check
 	$(MAKE) normalization-check
@@ -29,7 +29,7 @@ update-unicode: unicode_category_table.h unicode_norm_table.h unicode_nonspacing
 # These files are part of the Unicode Character Database. Download
 # them on demand.  The dependency on Makefile.global is for
 # UNICODE_VERSION.
-UnicodeData.txt EastAsianWidth.txt DerivedNormalizationProps.txt CompositionExclusions.txt NormalizationTest.txt: $(top_builddir)/src/Makefile.global
+CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt: $(top_builddir)/src/Makefile.global
 	$(DOWNLOAD) https://www.unicode.org/Public/$(UNICODE_VERSION)/ucd/$(@F)
 
 unicode_version.h: generate-unicode_version.pl
@@ -82,4 +82,4 @@ clean:
 	rm -f $(OBJS) category_test category_test.o norm_test norm_test.o
 
 distclean: clean
-	rm -f UnicodeData.txt EastAsianWidth.txt CompositionExclusions.txt NormalizationTest.txt norm_test_table.h unicode_norm_table.h
+	rm -f CompositionExclusions.txt DerivedNormalizationProps.txt EastAsianWidth.txt NormalizationTest.txt UnicodeData.txt norm_test_table.h unicode_category_table.h unicode_norm_table.h
diff --git a/src/common/unicode/category_test.c b/src/common/unicode/category_test.c
index ba62716d45..d9ea806eb8 100644
--- a/src/common/unicode/category_test.c
+++ b/src/common/unicode/category_test.c
@@ -54,8 +54,8 @@ main(int argc, char **argv)
 	int			pg_skipped_codepoints = 0;
 	int			icu_skipped_codepoints = 0;
 
-	printf("Postgres Unicode Version:\t%s\n", PG_UNICODE_VERSION);
-	printf("ICU Unicode Version:\t\t%s\n", U_UNICODE_VERSION);
+	printf("category_test: Postgres Unicode version:\t%s\n", PG_UNICODE_VERSION);
+	printf("category_test: ICU Unicode version:\t\t%s\n", U_UNICODE_VERSION);
 
 	for (UChar32 code = 0; code <= 0x10ffff; code++)
 	{
@@ -79,11 +79,11 @@ main(int argc, char **argv)
 				icu_skipped_codepoints++;
 			else
 			{
-				printf("FAILURE for codepoint %06x\n", code);
-				printf("Postgres category:	%02d %s %s\n", pg_category,
+				printf("category_test: FAILURE for codepoint 0x%06x\n", code);
+				printf("category_test: Postgres category:	%02d %s %s\n", pg_category,
 					   unicode_category_abbrev(pg_category),
 					   unicode_category_string(pg_category));
-				printf("ICU category:		%02d %s %s\n", icu_category,
+				printf("category_test: ICU category:		%02d %s %s\n", icu_category,
 					   unicode_category_abbrev(icu_category),
 					   unicode_category_string(icu_category));
 				printf("\n");
@@ -93,16 +93,16 @@ main(int argc, char **argv)
 	}
 
 	if (pg_skipped_codepoints > 0)
-		printf("Skipped %d codepoints unassigned in Postgres due to Unicode version mismatch.\n",
+		printf("category_test: skipped %d codepoints unassigned in Postgres due to Unicode version mismatch\n",
 			   pg_skipped_codepoints);
 	if (icu_skipped_codepoints > 0)
-		printf("Skipped %d codepoints unassigned in ICU due to Unicode version mismatch.\n",
+		printf("category_test: skipped %d codepoints unassigned in ICU due to Unicode version mismatch\n",
 			   icu_skipped_codepoints);
 
-	printf("category_test: All tests successful!\n");
+	printf("category_test: success\n");
 	exit(0);
 #else
-	printf("ICU support required for test; skipping.\n");
+	printf("category_test: ICU support required for test; skipping\n");
 	exit(0);
 #endif
 }
diff --git a/src/common/unicode/meson.build b/src/common/unicode/meson.build
index 6af46122c4..e8cfdc1df4 100644
--- a/src/common/unicode/meson.build
+++ b/src/common/unicode/meson.build
@@ -11,7 +11,7 @@ endif
 
 # These files are part of the Unicode Character Database. Download them on
 # demand.
-foreach f : ['UnicodeData.txt', 'EastAsianWidth.txt', 'DerivedNormalizationProps.txt', 'CompositionExclusions.txt', 'NormalizationTest.txt']
+foreach f : ['CompositionExclusions.txt', 'DerivedNormalizationProps.txt', 'EastAsianWidth.txt', 'NormalizationTest.txt', 'UnicodeData.txt']
   url = unicode_baseurl.format(UNICODE_VERSION, f)
   target = custom_target(f,
     output: f,
@@ -24,15 +24,6 @@ endforeach
 
 update_unicode_targets = []
 
-update_unicode_targets += \
-  custom_target('unicode_version.h',
-    output: ['unicode_version.h'],
-    command: [
-      perl, files('generate-unicode_version.pl'),
-      '--outdir', '@OUTDIR@', '--version', UNICODE_VERSION],
-    build_by_default: false,
-  )
-
 update_unicode_targets += \
   custom_target('unicode_category_table.h',
     input: [unicode_data['UnicodeData.txt']],
@@ -44,14 +35,12 @@ update_unicode_targets += \
   )
 
 update_unicode_targets += \
-  custom_target('unicode_norm_table.h',
-    input: [unicode_data['UnicodeData.txt'], unicode_data['CompositionExclusions.txt']],
-    output: ['unicode_norm_table.h', 'unicode_norm_hashfunc.h'],
-    depend_files: perfect_hash_pm,
-    command: [
-      perl, files('generate-unicode_norm_table.pl'),
-      '--outdir', '@OUTDIR@', '@INPUT@'],
+  custom_target('unicode_east_asian_fw_table.h',
+    input: [unicode_data['EastAsianWidth.txt']],
+    output: ['unicode_east_asian_fw_table.h'],
+    command: [perl, files('generate-unicode_east_asian_fw_table.pl'), '@INPUT@'],
     build_by_default: false,
+    capture: true,
   )
 
 update_unicode_targets += \
@@ -65,12 +54,14 @@ update_unicode_targets += \
   )
 
 update_unicode_targets += \
-  custom_target('unicode_east_asian_fw_table.h',
-    input: [unicode_data['EastAsianWidth.txt']],
-    output: ['unicode_east_asian_fw_table.h'],
-    command: [perl, files('generate-unicode_east_asian_fw_table.pl'), '@INPUT@'],
+  custom_target('unicode_norm_table.h',
+    input: [unicode_data['UnicodeData.txt'], unicode_data['CompositionExclusions.txt']],
+    output: ['unicode_norm_table.h', 'unicode_norm_hashfunc.h'],
+    depend_files: perfect_hash_pm,
+    command: [
+      perl, files('generate-unicode_norm_table.pl'),
+      '--outdir', '@OUTDIR@', '@INPUT@'],
     build_by_default: false,
-    capture: true,
   )
 
 update_unicode_targets += \
@@ -83,6 +74,15 @@ update_unicode_targets += \
     capture: true,
   )
 
+update_unicode_targets += \
+  custom_target('unicode_version.h',
+    output: ['unicode_version.h'],
+    command: [
+      perl, files('generate-unicode_version.pl'),
+      '--outdir', '@OUTDIR@', '--version', UNICODE_VERSION],
+    build_by_default: false,
+  )
+
 norm_test_table = custom_target('norm_test_table.h',
     input: [unicode_data['NormalizationTest.txt']],
     output: ['norm_test_table.h'],
-- 
2.34.1

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#5)
Re: encoding affects ICU regex character classification

On Sat, Dec 2, 2023 at 9:49 AM Jeff Davis <pgsql@j-davis.com> wrote:

Your definition is too wide in my opinion, because it mixes together
different sources of variation that are best left separate:
a. region/language
b. technical requirements
c. versioning
d. implementation variance

(a) is not a true source of variation (please correct me if I'm wrong)

(b) is perhaps interesting. The "C" locale is one example, and perhaps
there are others, but I doubt very many others that we want to support.

(c) is not a major concern in my opinion. The impact of Unicode changes
is usually not dramatic, and it only affects regexes so it's much more
contained than collation, for example. And if you really care, just use
the "C" locale.

(d) is mostly a bug

I get you. I was mainly commenting on what POSIX APIs allow, which is
much wider than what you might observe on <your local libc>, and also
end-user-customisable. But I agree that Unicode is all-pervasive and
authoritative in practice, to the point that if your libc disagrees
with it, it's probably just wrong. (I guess site-local locales were
essential for bootstrapping in the early days of computers in a
language/territory but I can't find much discussion of the tools being
used by non-libc-maintainers today.)

I think we only need 2 main character classification schemes: "C" and
Unicode (TR #18 Compatibility Properties[1], either the "Standard"
variant or the "POSIX Compatible" variant or both). The libc and ICU
ones should be there only for compatibility and discouraged and
hopefully eventually removed.

How would you specify what you want? As with collating, I like the
idea of keeping support for libc even if it is terrible (some libcs
more than others) and eventually not the default, because I think
optional agreement with other software on the same host is a feature.

In the regex code we see not only class membership tests eg
iswlower_l(), but also conversions eg towlower_l(). Unless you also
implement built-in case mapping, you'd still have to call libc or ICU
for that, right? It seems a bit strange to use different systems for
classification and mapping. If you do implement mapping too, you have
to decide if you believe it is language-dependent or not, I think?

Hmm, let's see what we're doing now... for ICU the regex code is using
"simple" case mapping functions like u_toupper(c) that don't take a
locale, so no Turkish i/İ conversion for you, unlike our SQL
upper()/lower(), which this is supposed to agree with according to the
comments at the top. I see why: POSIX can only do one-by-one
character mappings (which cannot handle Greek's context-sensitive
Σ->σ/ς or German's multi-character ß->SS), while ICU offers only
language-aware "full" string conversion (which does not guarantee
1:1 mapping for each character in a string) OR non-language-aware
"simple" character conversion (which does not handle Turkish's i->İ).
ICU has no middle ground for language-aware mapping with just the 1:1
cases only, probably because that doesn't really make total sense as a
concept (as I assume Greek speakers would agree).
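
To make the "simple" versus "full" distinction concrete, here is a minimal sketch in C. It is not taken from PostgreSQL or ICU: the function names simple_upper() and full_upper() are hypothetical, and the mappings cover only ASCII letters and U+00DF (ß). It shows why an interface that returns exactly one code point cannot express ß->SS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only: handles ASCII letters and U+00DF.
 * A real implementation would use tables generated from Unicode data.
 */

/* "Simple" mapping: exactly one code point in, one code point out. */
uint32_t
simple_upper(uint32_t cp)
{
	if (cp >= 'a' && cp <= 'z')
		return cp - ('a' - 'A');
	/* U+00DF has no single-code-point uppercase, so it maps to itself. */
	return cp;
}

/*
 * "Full" mapping: may expand to more than one code point.
 * Writes the result into out[] and returns the number of code points.
 */
size_t
full_upper(uint32_t cp, uint32_t out[2])
{
	assert(out != NULL);
	if (cp == 0x00DF)			/* ß uppercases to "SS" */
	{
		out[0] = 'S';
		out[1] = 'S';
		return 2;
	}
	out[0] = simple_upper(cp);
	return 1;
}
```

With these definitions, simple_upper(0x00DF) returns 0x00DF unchanged, while full_upper(0x00DF, out) returns 2 and fills out[] with 'S', 'S'.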

Not knowing anything about how glibc generates its charmaps, Unicode
or pre-Unicode, I could take a wild guess that maybe in LATIN9 they
have an old hand-crafted table, but for UTF-8 encoding it's fully
outsourced to Unicode, and that's why you see a difference.

No, the problem is that we're passing a pg_wchar to an ICU function
that expects a 32-bit code point. Those two things are equivalent in
the UTF8 encoding, but not in the LATIN9 encoding.

Ah right, I get that now (sorry, I confused myself by forgetting we
were talking about ICU).
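
The mismatch described above can be sketched in C. This is an illustration under stated assumptions, not PostgreSQL code: it assumes that for a single-byte encoding such as LATIN9 the pg_wchar value is just the byte value, and the function name latin9_to_codepoint() is hypothetical. LATIN9 (ISO-8859-15) agrees with ISO-8859-1, whose byte values equal Unicode code points, except at eight positions, and one of those is the character from the original report (U+017D).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a LATIN9 (ISO-8859-15) byte value to a Unicode code point.
 * Hypothetical helper for illustration; the input is assumed to be the
 * raw byte value (0x00..0xFF).
 */
uint32_t
latin9_to_codepoint(uint32_t wc)
{
	assert(wc <= 0xFF);
	switch (wc)
	{
		case 0xA4: return 0x20AC;	/* EURO SIGN */
		case 0xA6: return 0x0160;	/* LATIN CAPITAL LETTER S WITH CARON */
		case 0xA8: return 0x0161;	/* LATIN SMALL LETTER S WITH CARON */
		case 0xB4: return 0x017D;	/* LATIN CAPITAL LETTER Z WITH CARON */
		case 0xB8: return 0x017E;	/* LATIN SMALL LETTER Z WITH CARON */
		case 0xBC: return 0x0152;	/* LATIN CAPITAL LIGATURE OE */
		case 0xBD: return 0x0153;	/* LATIN SMALL LIGATURE OE */
		case 0xBE: return 0x0178;	/* LATIN CAPITAL LETTER Y WITH DIAERESIS */
		default:   return wc;		/* same value as in ISO-8859-1 */
	}
}
```

Under that assumption, passing the raw value 0xB4 to a function that expects a code point classifies U+00B4 (ACUTE ACCENT, not a letter) instead of U+017D, which would explain the false result under LATIN9.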

#7Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#6)
Re: encoding affects ICU regex character classification

On Sun, 2023-12-10 at 10:39 +1300, Thomas Munro wrote:

How would you specify what you want?

One proposal would be to have a builtin collation provider:

/messages/by-id/9d63548c4d86b0f820e1ff15a83f93ed9ded4543.camel@j-davis.com

I don't think there are very many ctype options, but they could be
specified as part of the locale, or perhaps even as some provider-
specific options specified at CREATE COLLATION time.

As with collating, I like the
idea of keeping support for libc even if it is terrible (some libcs
more than others) and eventually not the default, because I think
optional agreement with other software on the same host is a feature.

Of course we should keep the libc support around. I'm not sure how
relevant such a feature is, but I don't think we actually have to
remove it.

Unless you also
implement built-in case mapping, you'd still have to call libc or ICU
for that, right?

We can do built-in case mapping, see:

/messages/by-id/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com

It seems a bit strange to use different systems for
classification and mapping. If you do implement mapping too, you have
to decide if you believe it is language-dependent or not, I think?

A complete solution would need to do the language-dependent case
mapping. But that seems to only be 3 locales ("az", "lt", and "tr"),
and only a handful of mapping changes, so we can handle that with the
builtin provider as well.

Hmm, let's see what we're doing now... for ICU the regex code is using
"simple" case mapping functions like u_toupper(c) that don't take a
locale, so no Turkish i/İ conversion for you, unlike our SQL
upper()/lower(), which this is supposed to agree with according to the
comments at the top. I see why: POSIX can only do one-by-one
character mappings (which cannot handle Greek's context-sensitive
Σ->σ/ς or German's multi-character ß->SS)

Regexes are inherently character-by-character, so transformations like
ß->SS are not going to work for case-insensitive regex matching
regardless of the provider.

Σ->σ/ς does make sense, and what we have seems to be just broken:

select 'ς' ~* 'Σ'; -- false in both libc and ICU
select 'Σ' ~* 'ς'; -- true in both libc and ICU

Similarly for titlecase variants:

select 'Dž' ~* 'dž'; -- false in libc and ICU
select 'dž' ~* 'Dž'; -- true in libc and ICU

If we do the case mapping ourselves, we can make those work. We'd just
have to modify the APIs a bit so that allcases() can actually get all
of the case variants, rather than relying on just towupper/towlower.
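
As a rough sketch of that idea in C (hypothetical names and a hand-written two-entry table, not the actual regex code or a proposed API): if the lookup returns the whole case-equivalence class of a code point, rather than only its upper and lower mappings, matching becomes symmetric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical case-equivalence classes, three code points each.
 * Only two classes are listed here; a real table would be generated
 * from Unicode data and would cover all cased characters.
 */
static const uint32_t case_classes[][3] = {
	{0x03A3, 0x03C3, 0x03C2},	/* Σ, σ, ς (final sigma) */
	{0x01C4, 0x01C5, 0x01C6},	/* Ǆ, ǅ (titlecase), ǆ */
};

#define NUM_CLASSES (sizeof(case_classes) / sizeof(case_classes[0]))

/* Fill out[] with every case variant of cp; return how many were written. */
size_t
all_case_variants(uint32_t cp, uint32_t out[3])
{
	assert(out != NULL);
	for (size_t i = 0; i < NUM_CLASSES; i++)
	{
		for (size_t j = 0; j < 3; j++)
		{
			if (case_classes[i][j] == cp)
			{
				for (size_t k = 0; k < 3; k++)
					out[k] = case_classes[i][k];
				return 3;
			}
		}
	}
	out[0] = cp;				/* not in the table: only itself */
	return 1;
}

/* Case-insensitive match of one pattern code point against one text code point. */
int
matches_ignoring_case(uint32_t pattern_cp, uint32_t text_cp)
{
	uint32_t	variants[3];
	size_t		n = all_case_variants(pattern_cp, variants);

	for (size_t i = 0; i < n; i++)
		if (variants[i] == text_cp)
			return 1;
	return 0;
}
```

In this sketch, matches_ignoring_case(0x03A3, 0x03C2) and matches_ignoring_case(0x03C2, 0x03A3) both succeed, unlike the asymmetric results shown in the queries above.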

Regards,
Jeff Davis

#8Jeremy Schneider
schneider@ardentperf.com
In reply to: Jeff Davis (#7)
Re: encoding affects ICU regex character classification

On 12/12/23 1:39 PM, Jeff Davis wrote:

On Sun, 2023-12-10 at 10:39 +1300, Thomas Munro wrote:

Unless you also
implement built-in case mapping, you'd still have to call libc or ICU
for that, right?

We can do built-in case mapping, see:

/messages/by-id/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com

It seems a bit strange to use different systems for
classification and mapping. If you do implement mapping too, you have
to decide if you believe it is language-dependent or not, I think?

A complete solution would need to do the language-dependent case
mapping. But that seems to only be 3 locales ("az", "lt", and "tr"),
and only a handful of mapping changes, so we can handle that with the
builtin provider as well.

This thread has me second-guessing the reply I just sent on the other
thread.

Is someone able to test out upper & lower functions on U+A7BA ... U+A7BF
across a few libs/versions? Theoretically the upper/lower behavior
should change in ICU between Ubuntu 18.04 LTS and Ubuntu 20.04 LTS
(specifically in ICU 64 / Unicode 12). And I have no idea if or when
glibc might have picked up the new unicode characters.

-Jeremy

--
http://about.me/jeremy_schneider

#9Jeff Davis
pgsql@j-davis.com
In reply to: Jeremy Schneider (#8)
Re: encoding affects ICU regex character classification

On Tue, 2023-12-12 at 14:35 -0800, Jeremy Schneider wrote:

Is someone able to test out upper & lower functions on U+A7BA ... U+A7BF
across a few libs/versions?

Those code points are unassigned in Unicode 11.0 and assigned in
Unicode 12.0.

In ICU 63-2 (based on Unicode 11.0), they just get mapped to
themselves. In ICU 64-2 (based on Unicode 12.1) they get mapped the
same way the builtin CTYPE maps them (based on Unicode 15.1).
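
A minimal C sketch of that version dependence, limited to the six code points in question. The function name lower_for_unicode_version() and its unicode_major parameter are hypothetical. It assumes the block is laid out as capital/small pairs, with each capital letter at an even code point and its small counterpart at the following odd one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lowercase mapping for U+A7BA..U+A7BF as a provider built against the
 * given Unicode major version would see it. Illustration only: every
 * other code point is returned unchanged.
 */
uint32_t
lower_for_unicode_version(uint32_t cp, int unicode_major)
{
	assert(unicode_major > 0);
	if (cp >= 0xA7BA && cp <= 0xA7BF)
	{
		/* Unassigned before Unicode 12.0: maps to itself. */
		if (unicode_major < 12)
			return cp;
		/* Assumed layout: capital at even code point, small at next odd. */
		return (cp % 2 == 0) ? cp + 1 : cp;
	}
	return cp;
}
```

Here lower_for_unicode_version(0xA7BA, 11) leaves the code point unchanged, while the same call with 12 or later yields 0xA7BB, which is the kind of change an expression index depending on lower() would be exposed to across a provider upgrade.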

The concern over unassigned code points is misplaced. The application
may be aware of newly-assigned code points, and there's no way they
will be mapped correctly in Postgres if the provider is not aware of
those code points. The user can either proceed in using unassigned code
points and accept the risk of future changes, or wait for the provider
to be upgraded.

If the user doesn't have many expression indexes dependent on ctype
behavior, it doesn't matter much. If they do have such indexes, the
best we can offer is a controlled process, and the builtin provider
allows the most visibility and control.

(Aside: case mapping has very strong compatibility guarantees, but not
perfect. For better compatibility guarantees, we should support case
folding.)

And I have no idea if or when
glibc might have picked up the new unicode characters.

That's a strong argument in favor of a builtin provider.

Regards,
Jeff Davis

#10Jeremy Schneider
schneider@ardentperf.com
In reply to: Jeff Davis (#9)
Re: encoding affects ICU regex character classification

On 12/14/23 7:12 AM, Jeff Davis wrote:

The concern over unassigned code points is misplaced. The application
may be aware of newly-assigned code points, and there's no way they
will be mapped correctly in Postgres if the provider is not aware of
those code points. The user can either proceed in using unassigned code
points and accept the risk of future changes, or wait for the provider
to be upgraded.

This does not seem to me like a good way to view the situation.

Earlier this summer, a day or two after writing a document, I was
completely surprised to open it on my work computer and see "unknown
character" boxes. When I had previously written the document on my home
computer and when I had viewed it from my cell phone, everything was
fine. Apple does a very good job of always keeping iPhones and macOS
versions up-to-date with the latest versions of Unicode and latest
characters. iPhone keyboards make it very easy to access any character.
Emojis are the canonical example here. My work computer was one major
version of macOS behind my home computer.

And I'm probably one of a few people on this hackers email list who even
understands what the words "unassigned code point" mean. Generally DBAs,
sysadmins, architects and developers who are all part of the tangled web
of building and maintaining systems which use PostgreSQL on their
backend are never going to think about unicode characters proactively.

This goes back to my other thread (which sadly got very little
discussion): PostgreSQL really needs to be safe by /default/ ... having
GUCs is fine though; we can put explanation in the docs about what users
should consider if they change a setting.

-Jeremy

--
http://about.me/jeremy_schneider

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Jeremy Schneider (#10)
Re: encoding affects ICU regex character classification

On Sat, Dec 16, 2023 at 1:48 PM Jeremy Schneider
<schneider@ardentperf.com> wrote:

On 12/14/23 7:12 AM, Jeff Davis wrote:

The concern over unassigned code points is misplaced. The application
may be aware of newly-assigned code points, and there's no way they
will be mapped correctly in Postgres if the provider is not aware of
those code points. The user can either proceed in using unassigned code
points and accept the risk of future changes, or wait for the provider
to be upgraded.

This does not seem to me like a good way to view the situation.

Earlier this summer, a day or two after writing a document, I was
completely surprised to open it on my work computer and see "unknown
character" boxes. When I had previously written the document on my home
computer and when I had viewed it from my cell phone, everything was
fine. Apple does a very good job of always keeping iPhones and macOS
versions up-to-date with the latest versions of Unicode and latest
characters. iPhone keyboards make it very easy to access any character.
Emojis are the canonical example here. My work computer was one major
version of macOS behind my home computer.

That "SQUARE ERA NAME REIWA" codepoint we talked about in one of the
multi-version ICU threads was an interesting case study. It's not an
emoji, it entered real/serious use suddenly, landed in a quickly
wrapped minor release of Unicode, and then arrived in locale
definitions via regular package upgrades on various OSes AFAICT (ie
didn't require a major version upgrade of the OS).

https://en.wikipedia.org/wiki/Reiwa_era#Announcement
https://en.wikipedia.org/wiki/Reiwa_era#Technology
https://unicode.org/versions/Unicode12.1.0/

#12Jeff Davis
pgsql@j-davis.com
In reply to: Jeremy Schneider (#10)
Re: encoding affects ICU regex character classification

On Fri, 2023-12-15 at 16:48 -0800, Jeremy Schneider wrote:

This goes back to my other thread (which sadly got very little
discussion): PostgreSQL really needs to be safe by /default/

Doesn't a built-in provider help create a safer option?

The built-in provider's version of Unicode will be consistent with
unicode_assigned(), which is a first step toward rejecting code points
that the provider doesn't understand. And by rejecting unassigned code
points, we get all kinds of Unicode compatibility guarantees that avoid
the kinds of change risks that you are worried about.

Regards,
Jeff Davis