json/jsonb and unicode escapes
Started by Andrew Dunstan over 11 years ago, 1 message
Here is a draft patch for some of the issues to do with unicode escapes
that Teodor raised the other day.
I think it does the right thing, although I want to add a few more
regression cases before committing it.
Comments welcome.
cheers
andrew
Attachments:
json-escape.patch (text/x-patch)
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7364f3..47ab9be 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -2274,7 +2274,27 @@ escape_json(StringInfo buf, const char *str)
appendStringInfoString(buf, "\\\"");
break;
case '\\':
- appendStringInfoString(buf, "\\\\");
+ /*
+ * Unicode escapes are passed through as is. There is no
+ * requirement that they denote a valid character in the
+ * server encoding - indeed that is a big part of their
+ * usefulness.
+ *
+ * All we require is that they consist of \uXXXX where
+ * the Xs are hexadecimal digits. It is the responsibility
+ * of the caller of, say, to_json() to make sure that the
+ * unicode escape is valid.
+ *
+ * In the case of a jsonb string value being escaped, the
+ * only unicode escape that should be present is \u0000,
+ * all the other unicode escapes will have been resolved.
+ *
+ */
+ if (p[1] == 'u' && isxdigit((unsigned char) p[2]) && isxdigit((unsigned char) p[3])
+ && isxdigit((unsigned char) p[4]) && isxdigit((unsigned char) p[5]))
+ appendStringInfoCharMacro(buf, *p);
+ else
+ appendStringInfoString(buf, "\\\\");
break;
default:
if ((unsigned char) *p < ' ')
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index ae7c506..1e46939 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
DETAIL: "\u" must be followed by four hexadecimal digits.
CONTEXT: JSON data, line 1: "\u000g...
SELECT '"\u0000"'::jsonb; -- OK, legal escape
- jsonb
------------
- "\\u0000"
+ jsonb
+----------
+ "\u0000"
(1 row)
-- use octet_length here so we don't get an odd unicode char in the
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 38a95b4..955dc42 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
DETAIL: "\u" must be followed by four hexadecimal digits.
CONTEXT: JSON data, line 1: "\u000g...
SELECT '"\u0000"'::jsonb; -- OK, legal escape
- jsonb
------------
- "\\u0000"
+ jsonb
+----------
+ "\u0000"
(1 row)
-- use octet_length here so we don't get an odd unicode char in the
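
For anyone who wants to try the pass-through rule outside the backend, here is a minimal standalone sketch of the check added in the escape_json() hunk above. It is not the patch itself: is_unicode_escape() and escape_json_sketch() are hypothetical names, the output goes to a plain char buffer rather than a StringInfo, and only the quote and backslash cases are handled (control characters are copied as is, unlike the real function).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns nonzero if p points
 * at a backslash followed by 'u' and four hexadecimal digits. The && chain
 * stops at the terminating NUL (isxdigit(0) is false), so it never reads
 * past the end of the string.
 */
static int
is_unicode_escape(const char *p)
{
    return p[0] == '\\' && p[1] == 'u'
        && isxdigit((unsigned char) p[2]) && isxdigit((unsigned char) p[3])
        && isxdigit((unsigned char) p[4]) && isxdigit((unsigned char) p[5]);
}

/*
 * Simplified stand-in for escape_json(). The caller must supply a buffer of
 * at least 2 * strlen(str) + 3 bytes: every input byte may double, plus the
 * two surrounding quotes and the terminating NUL.
 */
static void
escape_json_sketch(char *out, size_t outlen, const char *str)
{
    const char *p;

    assert(out != NULL && str != NULL);
    assert(outlen >= 2 * strlen(str) + 3);

    *out++ = '"';
    for (p = str; *p; p++)
    {
        switch (*p)
        {
            case '"':
                *out++ = '\\';
                *out++ = '"';
                break;
            case '\\':
                if (is_unicode_escape(p))
                    *out++ = *p;    /* keep one backslash; "uXXXX" is copied
                                     * by the following iterations */
                else
                {
                    *out++ = '\\';
                    *out++ = '\\';
                }
                break;
            default:
                *out++ = *p;
                break;
        }
    }
    *out++ = '"';
    *out = '\0';
}
```

Given the six-character input \u0000, the sketch emits "\u0000" rather than "\\u0000", which is the change visible in the jsonb.out and jsonb_1.out hunks. A backslash that does not start a well-formed \uXXXX sequence (for example \u000g) is still doubled.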