making the backend's json parser work in frontend code

Started by Robert Haasalmost 6 years ago101 messages

robertmhaas@gmail.com

almost 6 years ago

6 attachment(s)

The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread. However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to be have added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

As far as I know all three of the above patches are committable as-is;
review and contrary opinions welcome.

On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

Suggestions welcome.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Move-wchar.c-to-src-common.patchapplication/octet-stream; name=0001-Move-wchar.c-to-src-common.patchDownload

From 3236eb91d9dec6177c26c91467f1e5655b70e610 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 12:51:59 -0500
Subject: [PATCH 1/6] Move wchar.c to src/common.

Previously, it was compiled as part of src/backend/utils/mb and
separately as part of src/interfaces/libpq, and separately linked
into both the backend and libpq. However, as noted by commit
60f11b87a2349985230c08616fa8a34ffde934c8, this is not an optimal
arrangement, and it makes more sense to make wchar.c part of
src/common.
---
 src/backend/utils/mb/Makefile            | 1 -
 src/common/Makefile                      | 3 ++-
 src/{backend/utils/mb => common}/wchar.c | 0
 src/interfaces/libpq/Makefile            | 3 +--
 4 files changed, 3 insertions(+), 4 deletions(-)
 rename src/{backend/utils/mb => common}/wchar.c (100%)

diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index cd4a016449..766a033f39 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -17,7 +17,6 @@ OBJS = \
 	encnames.o \
 	mbutils.o \
 	stringinfo_mb.o \
-	wchar.o \
 	wstrcmp.o \
 	wstrncmp.o
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ffb0f6edff..cb00bcbbba 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -70,7 +70,8 @@ OBJS_COMMON = \
 	stringinfo.o \
 	unicode_norm.o \
 	username.o \
-	wait_error.o
+	wait_error.o \
+	wchar.o
 
 ifeq ($(with_openssl),yes)
 OBJS_COMMON += sha2_openssl.o
diff --git a/src/backend/utils/mb/wchar.c b/src/common/wchar.c
similarity index 100%
rename from src/backend/utils/mb/wchar.c
rename to src/common/wchar.c
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index 773ef2723d..009fe4d2bf 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -44,8 +44,7 @@ OBJS = \
 
 # src/backend/utils/mb
 OBJS += \
-	encnames.o \
-	wchar.o
+	encnames.o
 
 ifeq ($(with_openssl),yes)
 OBJS += \
-- 
2.17.2 (Apple Git-113)

0002-Adjust-src-include-utils-jsonapi.h-so-it-s-not-backe.patchapplication/octet-stream; name=0002-Adjust-src-include-utils-jsonapi.h-so-it-s-not-backe.patchDownload

From 8c58c5bad80626f60da836cedac4644d686cf4ce Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 09:26:04 -0500
Subject: [PATCH 2/6] Adjust src/include/utils/jsonapi.h so it's not
 backend-only.

The major change here is that we no longer include jsonb.h into
jsonapi.h. The reason that was necessary is that jsonapi.h included
several prototypes functions in jsonfuncs.c that depend on the Jsonb
type. Move those prototypes to a new header, jsonfuncs.h, and include
it where needed.

The other change is that JsonEncodeDateTime is now declared in
json.h rather than jsonapi.h.

Taken together, these steps eliminate all dependencies of jsonapi.h
on backend-only data types and header files, so that it can
potentially be included in frontend code.
---
 src/backend/tsearch/to_tsany.c     |  1 +
 src/backend/tsearch/wparser.c      |  1 +
 src/backend/utils/adt/jsonb_util.c |  1 +
 src/backend/utils/adt/jsonfuncs.c  |  1 +
 src/include/utils/json.h           |  2 ++
 src/include/utils/jsonapi.h        | 33 --------------------
 src/include/utils/jsonfuncs.h      | 49 ++++++++++++++++++++++++++++++
 7 files changed, 55 insertions(+), 33 deletions(-)
 create mode 100644 src/include/utils/jsonfuncs.h

diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index cc694cda8c..adf181c191 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -17,6 +17,7 @@
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 
 
 typedef struct MorphOpaque
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index 6b5960ecc1..c7499a94ac 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -21,6 +21,7 @@
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
 /******sql-level interface******/
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 7c9da701dd..b33c3ef43c 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -19,6 +19,7 @@
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
+#include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index ab5a24a858..a08d3a2027 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -29,6 +29,7 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
+#include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 20b5294491..4345fbdc31 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -18,5 +18,7 @@
 
 /* functions in json.c */
 extern void escape_json(StringInfo buf, const char *str);
+extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
+								const int *tzp);
 
 #endif							/* JSON_H */
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index f72f1cefd5..1190947476 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -14,7 +14,6 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
-#include "jsonb.h"
 #include "lib/stringinfo.h"
 
 typedef enum
@@ -132,36 +131,4 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
  */
 extern bool IsValidJsonNumber(const char *str, int len);
 
-/*
- * Flag types for iterate_json(b)_values to specify what elements from a
- * json(b) document we want to iterate.
- */
-typedef enum JsonToIndex
-{
-	jtiKey = 0x01,
-	jtiString = 0x02,
-	jtiNumeric = 0x04,
-	jtiBool = 0x08,
-	jtiAll = jtiKey | jtiString | jtiNumeric | jtiBool
-} JsonToIndex;
-
-/* an action that will be applied to each value in iterate_json(b)_values functions */
-typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-/* an action that will be applied to each value in transform_json(b)_values functions */
-typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-extern uint32 parse_jsonb_index_flags(Jsonb *jb);
-extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
-								 JsonIterateStringValuesAction action);
-extern void iterate_json_values(text *json, uint32 flags, void *action_state,
-								JsonIterateStringValuesAction action);
-extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
-											JsonTransformStringValuesAction transform_action);
-extern text *transform_json_string_values(text *json, void *action_state,
-										  JsonTransformStringValuesAction transform_action);
-
-extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
-								const int *tzp);
-
 #endif							/* JSONAPI_H */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
new file mode 100644
index 0000000000..19f087ccae
--- /dev/null
+++ b/src/include/utils/jsonfuncs.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonfuncs.h
+ *	  Functions to process JSON data types.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/jsonapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef JSONFUNCS_H
+#define JSONFUNCS_H
+
+#include "utils/jsonapi.h"
+#include "utils/jsonb.h"
+
+/*
+ * Flag types for iterate_json(b)_values to specify what elements from a
+ * json(b) document we want to iterate.
+ */
+typedef enum JsonToIndex
+{
+	jtiKey = 0x01,
+	jtiString = 0x02,
+	jtiNumeric = 0x04,
+	jtiBool = 0x08,
+	jtiAll = jtiKey | jtiString | jtiNumeric | jtiBool
+} JsonToIndex;
+
+/* an action that will be applied to each value in iterate_json(b)_values functions */
+typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+/* an action that will be applied to each value in transform_json(b)_values functions */
+typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+extern uint32 parse_jsonb_index_flags(Jsonb *jb);
+extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
+								 JsonIterateStringValuesAction action);
+extern void iterate_json_values(text *json, uint32 flags, void *action_state,
+								JsonIterateStringValuesAction action);
+extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
+											JsonTransformStringValuesAction transform_action);
+extern text *transform_json_string_values(text *json, void *action_state,
+										  JsonTransformStringValuesAction transform_action);
+
+#endif
-- 
2.17.2 (Apple Git-113)

0003-Split-JSON-lexer-parser-from-json-data-type-support.patchapplication/octet-stream; name=0003-Split-JSON-lexer-parser-from-json-data-type-support.patchDownload

From d8326f93883dd50e296a7282e03f123589559166 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 10:03:39 -0500
Subject: [PATCH 3/6] Split JSON lexer/parser from 'json' data type support.

Keep the code that pertains to the 'json' data type in json.c, but
move the lexing and parsing code to a new file jsonapi.c, a name
I chose because the corresponding prototypes are in jsonapi.h.

This seems like a logical division, because the JSON lexer and parser
are also used by the 'jsonb' data type, but the SQL-callable functions
in json.c are a separate thing. Also, the new jsonapi.c file needs to
include far fewer header files than json.c, which seems like a good
sign that this is an appropriate place to insert an abstraction
boundary. I took the opportunity to remove a few apparently-unneeded
includes from json.c at the same time.
---
 src/backend/utils/adt/Makefile  |    1 +
 src/backend/utils/adt/json.c    | 1206 +-----------------------------
 src/backend/utils/adt/jsonapi.c | 1216 +++++++++++++++++++++++++++++++
 src/include/utils/jsonapi.h     |    6 +
 4 files changed, 1224 insertions(+), 1205 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonapi.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 13efa9338c..790d7a24fb 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,6 +44,7 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
+	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 458505abfd..4be16b5c20 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -13,14 +13,9 @@
  */
 #include "postgres.h"
 
-#include "access/htup_details.h"
-#include "access/transam.h"
 #include "catalog/pg_type.h"
-#include "executor/spi.h"
 #include "funcapi.h"
-#include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "utils/array.h"
@@ -30,27 +25,8 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/lsyscache.h"
-#include "utils/syscache.h"
 #include "utils/typcache.h"
 
-/*
- * The context of the parser is maintained by the recursive descent
- * mechanism, but is passed explicitly to the error reporting routine
- * for better diagnostics.
- */
-typedef enum					/* contexts of JSON parser */
-{
-	JSON_PARSE_VALUE,			/* expecting a value */
-	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
-	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
-	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
-	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
-	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
-	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
-	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
-	JSON_PARSE_END				/* saw the end of a document, expect nothing */
-} JsonParseContext;
-
 typedef enum					/* type categories for datum_to_json */
 {
 	JSONTYPE_NULL,				/* null, so we didn't bother to identify */
@@ -75,19 +51,6 @@ typedef struct JsonAggState
 	Oid			val_output_func;
 } JsonAggState;
 
-static inline void json_lex(JsonLexContext *lex);
-static inline void json_lex_string(JsonLexContext *lex);
-static inline void json_lex_number(JsonLexContext *lex, char *s,
-								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
-static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
-static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
-static int	report_json_context(JsonLexContext *lex);
-static char *extract_mb_char(char *s);
 static void composite_to_json(Datum composite, StringInfo result,
 							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
@@ -106,121 +69,6 @@ static void add_json(Datum val, bool is_null, StringInfo result,
 					 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
 
-/* the null action object used for pure validation */
-static JsonSemAction nullSemAction =
-{
-	NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL
-};
-
-/* Recursive Descent parser support routines */
-
-/*
- * lex_peek
- *
- * what is the current look_ahead token?
-*/
-static inline JsonTokenType
-lex_peek(JsonLexContext *lex)
-{
-	return lex->token_type;
-}
-
-/*
- * lex_accept
- *
- * accept the look_ahead token and move the lexer to the next token if the
- * look_ahead token matches the token parameter. In that case, and if required,
- * also hand back the de-escaped lexeme.
- *
- * returns true if the token matched, false otherwise.
- */
-static inline bool
-lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
-{
-	if (lex->token_type == token)
-	{
-		if (lexeme != NULL)
-		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-				if (lex->strval != NULL)
-					*lexeme = pstrdup(lex->strval->data);
-			}
-			else
-			{
-				int			len = (lex->token_terminator - lex->token_start);
-				char	   *tokstr = palloc(len + 1);
-
-				memcpy(tokstr, lex->token_start, len);
-				tokstr[len] = '\0';
-				*lexeme = tokstr;
-			}
-		}
-		json_lex(lex);
-		return true;
-	}
-	return false;
-}
-
-/*
- * lex_accept
- *
- * move the lexer to the next token if the current look_ahead token matches
- * the parameter token. Otherwise, report an error.
- */
-static inline void
-lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
-{
-	if (!lex_accept(lex, token, NULL))
-		report_parse_error(ctx, lex);
-}
-
-/* chars to consider as part of an alphanumeric token */
-#define JSON_ALPHANUMERIC_CHAR(c)  \
-	(((c) >= 'a' && (c) <= 'z') || \
-	 ((c) >= 'A' && (c) <= 'Z') || \
-	 ((c) >= '0' && (c) <= '9') || \
-	 (c) == '_' || \
-	 IS_HIGHBIT_SET(c))
-
-/*
- * Utility function to check if a string is a valid JSON number.
- *
- * str is of length len, and need not be null-terminated.
- */
-bool
-IsValidJsonNumber(const char *str, int len)
-{
-	bool		numeric_error;
-	int			total_len;
-	JsonLexContext dummy_lex;
-
-	if (len <= 0)
-		return false;
-
-	/*
-	 * json_lex_number expects a leading  '-' to have been eaten already.
-	 *
-	 * having to cast away the constness of str is ugly, but there's not much
-	 * easy alternative.
-	 */
-	if (*str == '-')
-	{
-		dummy_lex.input = unconstify(char *, str) +1;
-		dummy_lex.input_length = len - 1;
-	}
-	else
-	{
-		dummy_lex.input = unconstify(char *, str);
-		dummy_lex.input_length = len;
-	}
-
-	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
-
-	return (!numeric_error) && (total_len == dummy_lex.input_length);
-}
-
 /*
  * Input.
  */
@@ -285,1058 +133,6 @@ json_recv(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
 
-/*
- * makeJsonLexContext
- *
- * lex constructor, with or without StringInfo object
- * for de-escaped lexemes.
- *
- * Without is better as it makes the processing faster, so only make one
- * if really required.
- *
- * If you already have the json as a text* value, use the first of these
- * functions, otherwise use  makeJsonLexContextCstringLen().
- */
-JsonLexContext *
-makeJsonLexContext(text *json, bool need_escapes)
-{
-	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
-										VARSIZE_ANY_EXHDR(json),
-										need_escapes);
-}
-
-JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
-{
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
-
-	lex->input = lex->token_terminator = lex->line_start = json;
-	lex->line_number = 1;
-	lex->input_length = len;
-	if (need_escapes)
-		lex->strval = makeStringInfo();
-	return lex;
-}
-
-/*
- * pg_parse_json
- *
- * Publicly visible entry point for the JSON parser.
- *
- * lex is a lexing context, set up for the json to be processed by calling
- * makeJsonLexContext(). sem is a structure of function pointers to semantic
- * action routines to be called at appropriate spots during parsing, and a
- * pointer to a state object to be passed to those routines.
- */
-void
-pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
-{
-	JsonTokenType tok;
-
-	/* get the initial token */
-	json_lex(lex);
-
-	tok = lex_peek(lex);
-
-	/* parse by recursive descent */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem); /* json can be a bare scalar */
-	}
-
-	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
-
-}
-
-/*
- * json_count_array_elements
- *
- * Returns number of array elements in lex context at start of array token
- * until end of array token at same nesting level.
- *
- * Designed to be called from array_start routines.
- */
-int
-json_count_array_elements(JsonLexContext *lex)
-{
-	JsonLexContext copylex;
-	int			count;
-
-	/*
-	 * It's safe to do this with a shallow copy because the lexical routines
-	 * don't scribble on the input. They do scribble on the other pointers
-	 * etc, so doing this with a copy makes that safe.
-	 */
-	memcpy(&copylex, lex, sizeof(JsonLexContext));
-	copylex.strval = NULL;		/* not interested in values here */
-	copylex.lex_level++;
-
-	count = 0;
-	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
-	{
-		do
-		{
-			count++;
-			parse_array_element(&copylex, &nullSemAction);
-		}
-		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
-	}
-	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
-
-	return count;
-}
-
-/*
- *	Recursive Descent parse routines. There is one for each structural
- *	element in a json document:
- *	  - scalar (string, number, true, false, null)
- *	  - array  ( [ ] )
- *	  - array element
- *	  - object ( { } )
- *	  - object field
- */
-static inline void
-parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
-{
-	char	   *val = NULL;
-	json_scalar_action sfunc = sem->scalar;
-	char	  **valaddr;
-	JsonTokenType tok = lex_peek(lex);
-
-	valaddr = sfunc == NULL ? NULL : &val;
-
-	/* a scalar must be a string, a number, true, false, or null */
-	switch (tok)
-	{
-		case JSON_TOKEN_TRUE:
-			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
-			break;
-		case JSON_TOKEN_FALSE:
-			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
-			break;
-		case JSON_TOKEN_NULL:
-			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
-			break;
-		case JSON_TOKEN_NUMBER:
-			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
-			break;
-		case JSON_TOKEN_STRING:
-			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
-			break;
-		default:
-			report_parse_error(JSON_PARSE_VALUE, lex);
-	}
-
-	if (sfunc != NULL)
-		(*sfunc) (sem->semstate, val, tok);
-}
-
-static void
-parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * An object field is "fieldname" : value where value can be a scalar,
-	 * object or array.  Note: in user-facing docs and error messages, we
-	 * generally call a field name a "key".
-	 */
-
-	char	   *fname = NULL;	/* keep compiler quiet */
-	json_ofield_action ostart = sem->object_field_start;
-	json_ofield_action oend = sem->object_field_end;
-	bool		isnull;
-	char	  **fnameaddr = NULL;
-	JsonTokenType tok;
-
-	if (ostart != NULL || oend != NULL)
-		fnameaddr = &fname;
-
-	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
-		report_parse_error(JSON_PARSE_STRING, lex);
-
-	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
-
-	tok = lex_peek(lex);
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate, fname, isnull);
-
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem);
-	}
-
-	if (oend != NULL)
-		(*oend) (sem->semstate, fname, isnull);
-}
-
-static void
-parse_object(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * an object is a possibly empty sequence of object fields, separated by
-	 * commas and surrounded by curly braces.
-	 */
-	json_struct_action ostart = sem->object_start;
-	json_struct_action oend = sem->object_end;
-	JsonTokenType tok;
-
-	check_stack_depth();
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate);
-
-	/*
-	 * Data inside an object is at a higher nesting level than the object
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the object start and restore it before we call the routine for the
-	 * object end.
-	 */
-	lex->lex_level++;
-
-	/* we know this will succeed, just clearing the token */
-	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
-
-	tok = lex_peek(lex);
-	switch (tok)
-	{
-		case JSON_TOKEN_STRING:
-			parse_object_field(lex, sem);
-			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
-				parse_object_field(lex, sem);
-			break;
-		case JSON_TOKEN_OBJECT_END:
-			break;
-		default:
-			/* case of an invalid initial token inside the object */
-			report_parse_error(JSON_PARSE_OBJECT_START, lex);
-	}
-
-	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
-
-	lex->lex_level--;
-
-	if (oend != NULL)
-		(*oend) (sem->semstate);
-}
-
-static void
-parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
-{
-	json_aelem_action astart = sem->array_element_start;
-	json_aelem_action aend = sem->array_element_end;
-	JsonTokenType tok = lex_peek(lex);
-
-	bool		isnull;
-
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (astart != NULL)
-		(*astart) (sem->semstate, isnull);
-
-	/* an array element is any object, array or scalar */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem);
-	}
-
-	if (aend != NULL)
-		(*aend) (sem->semstate, isnull);
-}
-
-static void
-parse_array(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * an array is a possibly empty sequence of array elements, separated by
-	 * commas and surrounded by square brackets.
-	 */
-	json_struct_action astart = sem->array_start;
-	json_struct_action aend = sem->array_end;
-
-	check_stack_depth();
-
-	if (astart != NULL)
-		(*astart) (sem->semstate);
-
-	/*
-	 * Data inside an array is at a higher nesting level than the array
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the array start and restore it before we call the routine for the
-	 * array end.
-	 */
-	lex->lex_level++;
-
-	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
-	{
-
-		parse_array_element(lex, sem);
-
-		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
-			parse_array_element(lex, sem);
-	}
-
-	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-
-	lex->lex_level--;
-
-	if (aend != NULL)
-		(*aend) (sem->semstate);
-}
-
-/*
- * Lex one token from the input stream.
- */
-static inline void
-json_lex(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-
-	/* Skip leading whitespace. */
-	s = lex->token_terminator;
-	len = s - lex->input;
-	while (len < lex->input_length &&
-		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
-	{
-		if (*s == '\n')
-			++lex->line_number;
-		++s;
-		++len;
-	}
-	lex->token_start = s;
-
-	/* Determine token type. */
-	if (len >= lex->input_length)
-	{
-		lex->token_start = NULL;
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		lex->token_type = JSON_TOKEN_END;
-	}
-	else
-		switch (*s)
-		{
-				/* Single-character token, some kind of punctuation mark. */
-			case '{':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_START;
-				break;
-			case '}':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_END;
-				break;
-			case '[':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_START;
-				break;
-			case ']':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_END;
-				break;
-			case ',':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COMMA;
-				break;
-			case ':':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COLON;
-				break;
-			case '"':
-				/* string */
-				json_lex_string(lex);
-				lex->token_type = JSON_TOKEN_STRING;
-				break;
-			case '-':
-				/* Negative number. */
-				json_lex_number(lex, s + 1, NULL, NULL);
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			case '0':
-			case '1':
-			case '2':
-			case '3':
-			case '4':
-			case '5':
-			case '6':
-			case '7':
-			case '8':
-			case '9':
-				/* Positive number. */
-				json_lex_number(lex, s, NULL, NULL);
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			default:
-				{
-					char	   *p;
-
-					/*
-					 * We're not dealing with a string, number, legal
-					 * punctuation mark, or end of string.  The only legal
-					 * tokens we might find here are true, false, and null,
-					 * but for error reporting purposes we scan until we see a
-					 * non-alphanumeric character.  That way, we can report
-					 * the whole word as an unexpected token, rather than just
-					 * some unintuitive prefix thereof.
-					 */
-					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
-						 /* skip */ ;
-
-					/*
-					 * We got some sort of unexpected punctuation or an
-					 * otherwise unexpected character, so just complain about
-					 * that one character.
-					 */
-					if (p == s)
-					{
-						lex->prev_token_terminator = lex->token_terminator;
-						lex->token_terminator = s + 1;
-						report_invalid_token(lex);
-					}
-
-					/*
-					 * We've got a real alphanumeric token here.  If it
-					 * happens to be true, false, or null, all is well.  If
-					 * not, error out.
-					 */
-					lex->prev_token_terminator = lex->token_terminator;
-					lex->token_terminator = p;
-					if (p - s == 4)
-					{
-						if (memcmp(s, "true", 4) == 0)
-							lex->token_type = JSON_TOKEN_TRUE;
-						else if (memcmp(s, "null", 4) == 0)
-							lex->token_type = JSON_TOKEN_NULL;
-						else
-							report_invalid_token(lex);
-					}
-					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
-						lex->token_type = JSON_TOKEN_FALSE;
-					else
-						report_invalid_token(lex);
-
-				}
-		}						/* end of switch */
-}
-
-/*
- * The next token in the input stream is known to be a string; lex it.
- */
-static inline void
-json_lex_string(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-	int			hi_surrogate = -1;
-
-	if (lex->strval != NULL)
-		resetStringInfo(lex->strval);
-
-	Assert(lex->input_length > 0);
-	s = lex->token_start;
-	len = lex->token_start - lex->input;
-	for (;;)
-	{
-		s++;
-		len++;
-		/* Premature end of the string. */
-		if (len >= lex->input_length)
-		{
-			lex->token_terminator = s;
-			report_invalid_token(lex);
-		}
-		else if (*s == '"')
-			break;
-		else if ((unsigned char) *s < 32)
-		{
-			/* Per RFC4627, these characters MUST be escaped. */
-			/* Since *s isn't printable, exclude it from the context string */
-			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
-		}
-		else if (*s == '\\')
-		{
-			/* OK, we have an escape character. */
-			s++;
-			len++;
-			if (len >= lex->input_length)
-			{
-				lex->token_terminator = s;
-				report_invalid_token(lex);
-			}
-			else if (*s == 'u')
-			{
-				int			i;
-				int			ch = 0;
-
-				for (i = 1; i <= 4; i++)
-				{
-					s++;
-					len++;
-					if (len >= lex->input_length)
-					{
-						lex->token_terminator = s;
-						report_invalid_token(lex);
-					}
-					else if (*s >= '0' && *s <= '9')
-						ch = (ch * 16) + (*s - '0');
-					else if (*s >= 'a' && *s <= 'f')
-						ch = (ch * 16) + (*s - 'a') + 10;
-					else if (*s >= 'A' && *s <= 'F')
-						ch = (ch * 16) + (*s - 'A') + 10;
-					else
-					{
-						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
-					}
-				}
-				if (lex->strval != NULL)
-				{
-					char		utf8str[5];
-					int			utf8len;
-
-					if (ch >= 0xd800 && ch <= 0xdbff)
-					{
-						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
-						hi_surrogate = (ch & 0x3ff) << 10;
-						continue;
-					}
-					else if (ch >= 0xdc00 && ch <= 0xdfff)
-					{
-						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
-						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
-						hi_surrogate = -1;
-					}
-
-					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
-
-					/*
-					 * For UTF8, replace the escape sequence by the actual
-					 * utf8 character in lex->strval. Do this also for other
-					 * encodings if the escape designates an ASCII character,
-					 * otherwise raise an error.
-					 */
-
-					if (ch == 0)
-					{
-						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
-					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
-					{
-						unicode_to_utf8(ch, (unsigned char *) utf8str);
-						utf8len = pg_utf_mblen((unsigned char *) utf8str);
-						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
-					}
-					else if (ch <= 0x007f)
-					{
-						/*
-						 * This is the only way to designate things like a
-						 * form feed character in JSON, so it's useful in all
-						 * encodings.
-						 */
-						appendStringInfoChar(lex->strval, (char) ch);
-					}
-					else
-					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
-					}
-
-				}
-			}
-			else if (lex->strval != NULL)
-			{
-				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
-
-				switch (*s)
-				{
-					case '"':
-					case '\\':
-					case '/':
-						appendStringInfoChar(lex->strval, *s);
-						break;
-					case 'b':
-						appendStringInfoChar(lex->strval, '\b');
-						break;
-					case 'f':
-						appendStringInfoChar(lex->strval, '\f');
-						break;
-					case 'n':
-						appendStringInfoChar(lex->strval, '\n');
-						break;
-					case 'r':
-						appendStringInfoChar(lex->strval, '\r');
-						break;
-					case 't':
-						appendStringInfoChar(lex->strval, '\t');
-						break;
-					default:
-						/* Not a valid string escape, so error out. */
-						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
-				}
-			}
-			else if (strchr("\"\\/bfnrt", *s) == NULL)
-			{
-				/*
-				 * Simpler processing if we're not bothered about de-escaping
-				 *
-				 * It's very tempting to remove the strchr() call here and
-				 * replace it with a switch statement, but testing so far has
-				 * shown it's not a performance win.
-				 */
-				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
-			}
-
-		}
-		else if (lex->strval != NULL)
-		{
-			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
-
-			appendStringInfoChar(lex->strval, *s);
-		}
-
-	}
-
-	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
-
-	/* Hooray, we found the end of the string! */
-	lex->prev_token_terminator = lex->token_terminator;
-	lex->token_terminator = s + 1;
-}
-
-/*
- * The next token in the input stream is known to be a number; lex it.
- *
- * In JSON, a number consists of four parts:
- *
- * (1) An optional minus sign ('-').
- *
- * (2) Either a single '0', or a string of one or more digits that does not
- *	   begin with a '0'.
- *
- * (3) An optional decimal part, consisting of a period ('.') followed by
- *	   one or more digits.  (Note: While this part can be omitted
- *	   completely, it's not OK to have only the decimal point without
- *	   any digits afterwards.)
- *
- * (4) An optional exponent part, consisting of 'e' or 'E', optionally
- *	   followed by '+' or '-', followed by one or more digits.  (Note:
- *	   As with the decimal part, if 'e' or 'E' is present, it must be
- *	   followed by at least one digit.)
- *
- * The 's' argument to this function points to the ostensible beginning
- * of part 2 - i.e. the character after any optional minus sign, or the
- * first character of the string if there is none.
- *
- * If num_err is not NULL, we return an error flag to *num_err rather than
- * raising an error for a badly-formed number.  Also, if total_len is not NULL
- * the distance from lex->input to the token end+1 is returned to *total_len.
- */
-static inline void
-json_lex_number(JsonLexContext *lex, char *s,
-				bool *num_err, int *total_len)
-{
-	bool		error = false;
-	int			len = s - lex->input;
-
-	/* Part (1): leading sign indicator. */
-	/* Caller already did this for us; so do nothing. */
-
-	/* Part (2): parse main digit string. */
-	if (len < lex->input_length && *s == '0')
-	{
-		s++;
-		len++;
-	}
-	else if (len < lex->input_length && *s >= '1' && *s <= '9')
-	{
-		do
-		{
-			s++;
-			len++;
-		} while (len < lex->input_length && *s >= '0' && *s <= '9');
-	}
-	else
-		error = true;
-
-	/* Part (3): parse optional decimal portion. */
-	if (len < lex->input_length && *s == '.')
-	{
-		s++;
-		len++;
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/* Part (4): parse optional exponent. */
-	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
-	{
-		s++;
-		len++;
-		if (len < lex->input_length && (*s == '+' || *s == '-'))
-		{
-			s++;
-			len++;
-		}
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/*
-	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
-	 * here should be considered part of the token for error-reporting
-	 * purposes.
-	 */
-	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
-		error = true;
-
-	if (total_len != NULL)
-		*total_len = len;
-
-	if (num_err != NULL)
-	{
-		/* let the caller handle any error */
-		*num_err = error;
-	}
-	else
-	{
-		/* return token endpoint */
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		/* handle error if any */
-		if (error)
-			report_invalid_token(lex);
-	}
-}
-
-/*
- * Report a parse error.
- *
- * lex->token_start and lex->token_terminator must identify the current token.
- */
-static void
-report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Handle case where the input ended prematurely. */
-	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
-
-	/* Separate out the current token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	/* Complain, with the appropriate detail message. */
-	if (ctx == JSON_PARSE_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
-				 report_json_context(lex)));
-	else
-	{
-		switch (ctx)
-		{
-			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			default:
-				elog(ERROR, "unexpected json parse state: %d", ctx);
-		}
-	}
-}
-
-/*
- * Report an invalid input token.
- *
- * lex->token_start and lex->token_terminator must identify the token.
- */
-static void
-report_invalid_token(JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Separate out the offending token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
-}
-
-/*
- * Report a CONTEXT line for bogus JSON input.
- *
- * lex->token_terminator must be set to identify the spot where we detected
- * the error.  Note that lex->token_start might be NULL, in case we recognized
- * error at EOF.
- *
- * The return value isn't meaningful, but we make it non-void so that this
- * can be invoked inside ereport().
- */
-static int
-report_json_context(JsonLexContext *lex)
-{
-	const char *context_start;
-	const char *context_end;
-	const char *line_start;
-	int			line_number;
-	char	   *ctxt;
-	int			ctxtlen;
-	const char *prefix;
-	const char *suffix;
-
-	/* Choose boundaries for the part of the input we will display */
-	context_start = lex->input;
-	context_end = lex->token_terminator;
-	line_start = context_start;
-	line_number = 1;
-	for (;;)
-	{
-		/* Always advance over newlines */
-		if (context_start < context_end && *context_start == '\n')
-		{
-			context_start++;
-			line_start = context_start;
-			line_number++;
-			continue;
-		}
-		/* Otherwise, done as soon as we are close enough to context_end */
-		if (context_end - context_start < 50)
-			break;
-		/* Advance to next multibyte character */
-		if (IS_HIGHBIT_SET(*context_start))
-			context_start += pg_mblen(context_start);
-		else
-			context_start++;
-	}
-
-	/*
-	 * We add "..." to indicate that the excerpt doesn't start at the
-	 * beginning of the line ... but if we're within 3 characters of the
-	 * beginning of the line, we might as well just show the whole line.
-	 */
-	if (context_start - line_start <= 3)
-		context_start = line_start;
-
-	/* Get a null-terminated copy of the data to present */
-	ctxtlen = context_end - context_start;
-	ctxt = palloc(ctxtlen + 1);
-	memcpy(ctxt, context_start, ctxtlen);
-	ctxt[ctxtlen] = '\0';
-
-	/*
-	 * Show the context, prefixing "..." if not starting at start of line, and
-	 * suffixing "..." if not ending at end of line.
-	 */
-	prefix = (context_start > line_start) ? "..." : "";
-	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
-
-	return errcontext("JSON data, line %d: %s%s%s",
-					  line_number, prefix, ctxt, suffix);
-}
-
-/*
- * Extract a single, possibly multi-byte char from the input string.
- */
-static char *
-extract_mb_char(char *s)
-{
-	char	   *res;
-	int			len;
-
-	len = pg_mblen(s);
-	res = palloc(len + 1);
-	memcpy(res, s, len);
-	res[len] = '\0';
-
-	return res;
-}
-
 /*
  * Determine how we want to print values of a given type in datum_to_json.
  *
@@ -2547,7 +1343,7 @@ json_typeof(PG_FUNCTION_ARGS)
 
 	/* Lex exactly one token from the input and check its type. */
 	json_lex(lex);
-	tok = lex_peek(lex);
+	tok = lex->token_type;
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
new file mode 100644
index 0000000000..fc8af9f861
--- /dev/null
+++ b/src/backend/utils/adt/jsonapi.c
@@ -0,0 +1,1216 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonapi.c
+ *		JSON parser and lexer interfaces
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/jsonapi.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/jsonapi.h"
+
+/*
+ * The context of the parser is maintained by the recursive descent
+ * mechanism, but is passed explicitly to the error reporting routine
+ * for better diagnostics.
+ */
+typedef enum					/* contexts of JSON parser */
+{
+	JSON_PARSE_VALUE,			/* expecting a value */
+	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
+	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
+	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
+	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
+	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
+	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
+	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
+	JSON_PARSE_END				/* saw the end of a document, expect nothing */
+} JsonParseContext;
+
+static inline void json_lex_string(JsonLexContext *lex);
+static inline void json_lex_number(JsonLexContext *lex, char *s,
+								   bool *num_err, int *total_len);
+static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
+static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
+static int	report_json_context(JsonLexContext *lex);
+static char *extract_mb_char(char *s);
+
+/* the null action object used for pure validation */
+JsonSemAction nullSemAction =
+{
+	NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL
+};
+
+/* Recursive Descent parser support routines */
+
+/*
+ * lex_peek
+ *
+ * what is the current look_ahead token?
+*/
+static inline JsonTokenType
+lex_peek(JsonLexContext *lex)
+{
+	return lex->token_type;
+}
+
+/*
+ * lex_accept
+ *
+ * accept the look_ahead token and move the lexer to the next token if the
+ * look_ahead token matches the token parameter. In that case, and if required,
+ * also hand back the de-escaped lexeme.
+ *
+ * returns true if the token matched, false otherwise.
+ */
+static inline bool
+lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
+{
+	if (lex->token_type == token)
+	{
+		if (lexeme != NULL)
+		{
+			if (lex->token_type == JSON_TOKEN_STRING)
+			{
+				if (lex->strval != NULL)
+					*lexeme = pstrdup(lex->strval->data);
+			}
+			else
+			{
+				int			len = (lex->token_terminator - lex->token_start);
+				char	   *tokstr = palloc(len + 1);
+
+				memcpy(tokstr, lex->token_start, len);
+				tokstr[len] = '\0';
+				*lexeme = tokstr;
+			}
+		}
+		json_lex(lex);
+		return true;
+	}
+	return false;
+}
+
+/*
+ * lex_accept
+ *
+ * move the lexer to the next token if the current look_ahead token matches
+ * the parameter token. Otherwise, report an error.
+ */
+static inline void
+lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
+{
+	if (!lex_accept(lex, token, NULL))
+		report_parse_error(ctx, lex);
+}
+
+/* chars to consider as part of an alphanumeric token */
+#define JSON_ALPHANUMERIC_CHAR(c)  \
+	(((c) >= 'a' && (c) <= 'z') || \
+	 ((c) >= 'A' && (c) <= 'Z') || \
+	 ((c) >= '0' && (c) <= '9') || \
+	 (c) == '_' || \
+	 IS_HIGHBIT_SET(c))
+
+/*
+ * Utility function to check if a string is a valid JSON number.
+ *
+ * str is of length len, and need not be null-terminated.
+ */
+bool
+IsValidJsonNumber(const char *str, int len)
+{
+	bool		numeric_error;
+	int			total_len;
+	JsonLexContext dummy_lex;
+
+	if (len <= 0)
+		return false;
+
+	/*
+	 * json_lex_number expects a leading  '-' to have been eaten already.
+	 *
+	 * having to cast away the constness of str is ugly, but there's not much
+	 * easy alternative.
+	 */
+	if (*str == '-')
+	{
+		dummy_lex.input = unconstify(char *, str) +1;
+		dummy_lex.input_length = len - 1;
+	}
+	else
+	{
+		dummy_lex.input = unconstify(char *, str);
+		dummy_lex.input_length = len;
+	}
+
+	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
+
+	return (!numeric_error) && (total_len == dummy_lex.input_length);
+}
+
+/*
+ * makeJsonLexContext
+ *
+ * lex constructor, with or without StringInfo object
+ * for de-escaped lexemes.
+ *
+ * Without is better as it makes the processing faster, so only make one
+ * if really required.
+ *
+ * If you already have the json as a text* value, use the first of these
+ * functions, otherwise use  makeJsonLexContextCstringLen().
+ */
+JsonLexContext *
+makeJsonLexContext(text *json, bool need_escapes)
+{
+	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
+										VARSIZE_ANY_EXHDR(json),
+										need_escapes);
+}
+
+JsonLexContext *
+makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+{
+	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+
+	lex->input = lex->token_terminator = lex->line_start = json;
+	lex->line_number = 1;
+	lex->input_length = len;
+	if (need_escapes)
+		lex->strval = makeStringInfo();
+	return lex;
+}
+
+/*
+ * pg_parse_json
+ *
+ * Publicly visible entry point for the JSON parser.
+ *
+ * lex is a lexing context, set up for the json to be processed by calling
+ * makeJsonLexContext(). sem is a structure of function pointers to semantic
+ * action routines to be called at appropriate spots during parsing, and a
+ * pointer to a state object to be passed to those routines.
+ */
+void
+pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
+{
+	JsonTokenType tok;
+
+	/* get the initial token */
+	json_lex(lex);
+
+	tok = lex_peek(lex);
+
+	/* parse by recursive descent */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem); /* json can be a bare scalar */
+	}
+
+	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
+
+}
+
+/*
+ * json_count_array_elements
+ *
+ * Returns number of array elements in lex context at start of array token
+ * until end of array token at same nesting level.
+ *
+ * Designed to be called from array_start routines.
+ */
+int
+json_count_array_elements(JsonLexContext *lex)
+{
+	JsonLexContext copylex;
+	int			count;
+
+	/*
+	 * It's safe to do this with a shallow copy because the lexical routines
+	 * don't scribble on the input. They do scribble on the other pointers
+	 * etc, so doing this with a copy makes that safe.
+	 */
+	memcpy(&copylex, lex, sizeof(JsonLexContext));
+	copylex.strval = NULL;		/* not interested in values here */
+	copylex.lex_level++;
+
+	count = 0;
+	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
+	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
+	{
+		do
+		{
+			count++;
+			parse_array_element(&copylex, &nullSemAction);
+		}
+		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
+	}
+	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+
+	return count;
+}
+
+/*
+ *	Recursive Descent parse routines. There is one for each structural
+ *	element in a json document:
+ *	  - scalar (string, number, true, false, null)
+ *	  - array  ( [ ] )
+ *	  - array element
+ *	  - object ( { } )
+ *	  - object field
+ */
+static inline void
+parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
+{
+	char	   *val = NULL;
+	json_scalar_action sfunc = sem->scalar;
+	char	  **valaddr;
+	JsonTokenType tok = lex_peek(lex);
+
+	valaddr = sfunc == NULL ? NULL : &val;
+
+	/* a scalar must be a string, a number, true, false, or null */
+	switch (tok)
+	{
+		case JSON_TOKEN_TRUE:
+			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
+			break;
+		case JSON_TOKEN_FALSE:
+			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
+			break;
+		case JSON_TOKEN_NULL:
+			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
+			break;
+		case JSON_TOKEN_NUMBER:
+			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
+			break;
+		case JSON_TOKEN_STRING:
+			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
+			break;
+		default:
+			report_parse_error(JSON_PARSE_VALUE, lex);
+	}
+
+	if (sfunc != NULL)
+		(*sfunc) (sem->semstate, val, tok);
+}
+
+static void
+parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * An object field is "fieldname" : value where value can be a scalar,
+	 * object or array.  Note: in user-facing docs and error messages, we
+	 * generally call a field name a "key".
+	 */
+
+	char	   *fname = NULL;	/* keep compiler quiet */
+	json_ofield_action ostart = sem->object_field_start;
+	json_ofield_action oend = sem->object_field_end;
+	bool		isnull;
+	char	  **fnameaddr = NULL;
+	JsonTokenType tok;
+
+	if (ostart != NULL || oend != NULL)
+		fnameaddr = &fname;
+
+	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
+		report_parse_error(JSON_PARSE_STRING, lex);
+
+	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+
+	tok = lex_peek(lex);
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate, fname, isnull);
+
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem);
+	}
+
+	if (oend != NULL)
+		(*oend) (sem->semstate, fname, isnull);
+}
+
+static void
+parse_object(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * an object is a possibly empty sequence of object fields, separated by
+	 * commas and surrounded by curly braces.
+	 */
+	json_struct_action ostart = sem->object_start;
+	json_struct_action oend = sem->object_end;
+	JsonTokenType tok;
+
+	check_stack_depth();
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate);
+
+	/*
+	 * Data inside an object is at a higher nesting level than the object
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the object start and restore it before we call the routine for the
+	 * object end.
+	 */
+	lex->lex_level++;
+
+	/* we know this will succeed, just clearing the token */
+	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
+
+	tok = lex_peek(lex);
+	switch (tok)
+	{
+		case JSON_TOKEN_STRING:
+			parse_object_field(lex, sem);
+			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+				parse_object_field(lex, sem);
+			break;
+		case JSON_TOKEN_OBJECT_END:
+			break;
+		default:
+			/* case of an invalid initial token inside the object */
+			report_parse_error(JSON_PARSE_OBJECT_START, lex);
+	}
+
+	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+
+	lex->lex_level--;
+
+	if (oend != NULL)
+		(*oend) (sem->semstate);
+}
+
+static void
+parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
+{
+	json_aelem_action astart = sem->array_element_start;
+	json_aelem_action aend = sem->array_element_end;
+	JsonTokenType tok = lex_peek(lex);
+
+	bool		isnull;
+
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (astart != NULL)
+		(*astart) (sem->semstate, isnull);
+
+	/* an array element is any object, array or scalar */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem);
+	}
+
+	if (aend != NULL)
+		(*aend) (sem->semstate, isnull);
+}
+
+static void
+parse_array(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * an array is a possibly empty sequence of array elements, separated by
+	 * commas and surrounded by square brackets.
+	 */
+	json_struct_action astart = sem->array_start;
+	json_struct_action aend = sem->array_end;
+
+	check_stack_depth();
+
+	if (astart != NULL)
+		(*astart) (sem->semstate);
+
+	/*
+	 * Data inside an array is at a higher nesting level than the array
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the array start and restore it before we call the routine for the
+	 * array end.
+	 */
+	lex->lex_level++;
+
+	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
+	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	{
+
+		parse_array_element(lex, sem);
+
+		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+			parse_array_element(lex, sem);
+	}
+
+	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+
+	lex->lex_level--;
+
+	if (aend != NULL)
+		(*aend) (sem->semstate);
+}
+
+/*
+ * Lex one token from the input stream.
+ */
+void
+json_lex(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+
+	/* Skip leading whitespace. */
+	s = lex->token_terminator;
+	len = s - lex->input;
+	while (len < lex->input_length &&
+		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
+	{
+		if (*s == '\n')
+			++lex->line_number;
+		++s;
+		++len;
+	}
+	lex->token_start = s;
+
+	/* Determine token type. */
+	if (len >= lex->input_length)
+	{
+		lex->token_start = NULL;
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		lex->token_type = JSON_TOKEN_END;
+	}
+	else
+		switch (*s)
+		{
+				/* Single-character token, some kind of punctuation mark. */
+			case '{':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_START;
+				break;
+			case '}':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_END;
+				break;
+			case '[':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_START;
+				break;
+			case ']':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_END;
+				break;
+			case ',':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COMMA;
+				break;
+			case ':':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COLON;
+				break;
+			case '"':
+				/* string */
+				json_lex_string(lex);
+				lex->token_type = JSON_TOKEN_STRING;
+				break;
+			case '-':
+				/* Negative number. */
+				json_lex_number(lex, s + 1, NULL, NULL);
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			case '0':
+			case '1':
+			case '2':
+			case '3':
+			case '4':
+			case '5':
+			case '6':
+			case '7':
+			case '8':
+			case '9':
+				/* Positive number. */
+				json_lex_number(lex, s, NULL, NULL);
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			default:
+				{
+					char	   *p;
+
+					/*
+					 * We're not dealing with a string, number, legal
+					 * punctuation mark, or end of string.  The only legal
+					 * tokens we might find here are true, false, and null,
+					 * but for error reporting purposes we scan until we see a
+					 * non-alphanumeric character.  That way, we can report
+					 * the whole word as an unexpected token, rather than just
+					 * some unintuitive prefix thereof.
+					 */
+					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
+						 /* skip */ ;
+
+					/*
+					 * We got some sort of unexpected punctuation or an
+					 * otherwise unexpected character, so just complain about
+					 * that one character.
+					 */
+					if (p == s)
+					{
+						lex->prev_token_terminator = lex->token_terminator;
+						lex->token_terminator = s + 1;
+						report_invalid_token(lex);
+					}
+
+					/*
+					 * We've got a real alphanumeric token here.  If it
+					 * happens to be true, false, or null, all is well.  If
+					 * not, error out.
+					 */
+					lex->prev_token_terminator = lex->token_terminator;
+					lex->token_terminator = p;
+					if (p - s == 4)
+					{
+						if (memcmp(s, "true", 4) == 0)
+							lex->token_type = JSON_TOKEN_TRUE;
+						else if (memcmp(s, "null", 4) == 0)
+							lex->token_type = JSON_TOKEN_NULL;
+						else
+							report_invalid_token(lex);
+					}
+					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
+						lex->token_type = JSON_TOKEN_FALSE;
+					else
+						report_invalid_token(lex);
+
+				}
+		}						/* end of switch */
+}
+
+/*
+ * The next token in the input stream is known to be a string; lex it.
+ */
+static inline void
+json_lex_string(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+	int			hi_surrogate = -1;
+
+	if (lex->strval != NULL)
+		resetStringInfo(lex->strval);
+
+	Assert(lex->input_length > 0);
+	s = lex->token_start;
+	len = lex->token_start - lex->input;
+	for (;;)
+	{
+		s++;
+		len++;
+		/* Premature end of the string. */
+		if (len >= lex->input_length)
+		{
+			lex->token_terminator = s;
+			report_invalid_token(lex);
+		}
+		else if (*s == '"')
+			break;
+		else if ((unsigned char) *s < 32)
+		{
+			/* Per RFC4627, these characters MUST be escaped. */
+			/* Since *s isn't printable, exclude it from the context string */
+			lex->token_terminator = s;
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Character with value 0x%02x must be escaped.",
+							   (unsigned char) *s),
+					 report_json_context(lex)));
+		}
+		else if (*s == '\\')
+		{
+			/* OK, we have an escape character. */
+			s++;
+			len++;
+			if (len >= lex->input_length)
+			{
+				lex->token_terminator = s;
+				report_invalid_token(lex);
+			}
+			else if (*s == 'u')
+			{
+				int			i;
+				int			ch = 0;
+
+				for (i = 1; i <= 4; i++)
+				{
+					s++;
+					len++;
+					if (len >= lex->input_length)
+					{
+						lex->token_terminator = s;
+						report_invalid_token(lex);
+					}
+					else if (*s >= '0' && *s <= '9')
+						ch = (ch * 16) + (*s - '0');
+					else if (*s >= 'a' && *s <= 'f')
+						ch = (ch * 16) + (*s - 'a') + 10;
+					else if (*s >= 'A' && *s <= 'F')
+						ch = (ch * 16) + (*s - 'A') + 10;
+					else
+					{
+						lex->token_terminator = s + pg_mblen(s);
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s",
+										"json"),
+								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
+								 report_json_context(lex)));
+					}
+				}
+				if (lex->strval != NULL)
+				{
+					char		utf8str[5];
+					int			utf8len;
+
+					if (ch >= 0xd800 && ch <= 0xdbff)
+					{
+						if (hi_surrogate != -1)
+							ereport(ERROR,
+									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									 errmsg("invalid input syntax for type %s",
+											"json"),
+									 errdetail("Unicode high surrogate must not follow a high surrogate."),
+									 report_json_context(lex)));
+						hi_surrogate = (ch & 0x3ff) << 10;
+						continue;
+					}
+					else if (ch >= 0xdc00 && ch <= 0xdfff)
+					{
+						if (hi_surrogate == -1)
+							ereport(ERROR,
+									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									 errmsg("invalid input syntax for type %s", "json"),
+									 errdetail("Unicode low surrogate must follow a high surrogate."),
+									 report_json_context(lex)));
+						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
+						hi_surrogate = -1;
+					}
+
+					if (hi_surrogate != -1)
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s", "json"),
+								 errdetail("Unicode low surrogate must follow a high surrogate."),
+								 report_json_context(lex)));
+
+					/*
+					 * For UTF8, replace the escape sequence by the actual
+					 * utf8 character in lex->strval. Do this also for other
+					 * encodings if the escape designates an ASCII character,
+					 * otherwise raise an error.
+					 */
+
+					if (ch == 0)
+					{
+						/* We can't allow this, since our TEXT type doesn't */
+						ereport(ERROR,
+								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+								 errmsg("unsupported Unicode escape sequence"),
+								 errdetail("\\u0000 cannot be converted to text."),
+								 report_json_context(lex)));
+					}
+					else if (GetDatabaseEncoding() == PG_UTF8)
+					{
+						unicode_to_utf8(ch, (unsigned char *) utf8str);
+						utf8len = pg_utf_mblen((unsigned char *) utf8str);
+						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
+					}
+					else if (ch <= 0x007f)
+					{
+						/*
+						 * This is the only way to designate things like a
+						 * form feed character in JSON, so it's useful in all
+						 * encodings.
+						 */
+						appendStringInfoChar(lex->strval, (char) ch);
+					}
+					else
+					{
+						ereport(ERROR,
+								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+								 errmsg("unsupported Unicode escape sequence"),
+								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
+								 report_json_context(lex)));
+					}
+
+				}
+			}
+			else if (lex->strval != NULL)
+			{
+				if (hi_surrogate != -1)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							 errmsg("invalid input syntax for type %s",
+									"json"),
+							 errdetail("Unicode low surrogate must follow a high surrogate."),
+							 report_json_context(lex)));
+
+				switch (*s)
+				{
+					case '"':
+					case '\\':
+					case '/':
+						appendStringInfoChar(lex->strval, *s);
+						break;
+					case 'b':
+						appendStringInfoChar(lex->strval, '\b');
+						break;
+					case 'f':
+						appendStringInfoChar(lex->strval, '\f');
+						break;
+					case 'n':
+						appendStringInfoChar(lex->strval, '\n');
+						break;
+					case 'r':
+						appendStringInfoChar(lex->strval, '\r');
+						break;
+					case 't':
+						appendStringInfoChar(lex->strval, '\t');
+						break;
+					default:
+						/* Not a valid string escape, so error out. */
+						lex->token_terminator = s + pg_mblen(s);
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s",
+										"json"),
+								 errdetail("Escape sequence \"\\%s\" is invalid.",
+										   extract_mb_char(s)),
+								 report_json_context(lex)));
+				}
+			}
+			else if (strchr("\"\\/bfnrt", *s) == NULL)
+			{
+				/*
+				 * Simpler processing if we're not bothered about de-escaping
+				 *
+				 * It's very tempting to remove the strchr() call here and
+				 * replace it with a switch statement, but testing so far has
+				 * shown it's not a performance win.
+				 */
+				lex->token_terminator = s + pg_mblen(s);
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Escape sequence \"\\%s\" is invalid.",
+								   extract_mb_char(s)),
+						 report_json_context(lex)));
+			}
+
+		}
+		else if (lex->strval != NULL)
+		{
+			if (hi_surrogate != -1)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Unicode low surrogate must follow a high surrogate."),
+						 report_json_context(lex)));
+
+			appendStringInfoChar(lex->strval, *s);
+		}
+
+	}
+
+	if (hi_surrogate != -1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("Unicode low surrogate must follow a high surrogate."),
+				 report_json_context(lex)));
+
+	/* Hooray, we found the end of the string! */
+	lex->prev_token_terminator = lex->token_terminator;
+	lex->token_terminator = s + 1;
+}
+
+/*
+ * The next token in the input stream is known to be a number; lex it.
+ *
+ * In JSON, a number consists of four parts:
+ *
+ * (1) An optional minus sign ('-').
+ *
+ * (2) Either a single '0', or a string of one or more digits that does not
+ *	   begin with a '0'.
+ *
+ * (3) An optional decimal part, consisting of a period ('.') followed by
+ *	   one or more digits.  (Note: While this part can be omitted
+ *	   completely, it's not OK to have only the decimal point without
+ *	   any digits afterwards.)
+ *
+ * (4) An optional exponent part, consisting of 'e' or 'E', optionally
+ *	   followed by '+' or '-', followed by one or more digits.  (Note:
+ *	   As with the decimal part, if 'e' or 'E' is present, it must be
+ *	   followed by at least one digit.)
+ *
+ * The 's' argument to this function points to the ostensible beginning
+ * of part 2 - i.e. the character after any optional minus sign, or the
+ * first character of the string if there is none.
+ *
+ * If num_err is not NULL, we return an error flag to *num_err rather than
+ * raising an error for a badly-formed number.  Also, if total_len is not NULL
+ * the distance from lex->input to the token end+1 is returned to *total_len.
+ */
+static inline void
+json_lex_number(JsonLexContext *lex, char *s,
+				bool *num_err, int *total_len)
+{
+	bool		error = false;
+	int			len = s - lex->input;
+
+	/* Part (1): leading sign indicator. */
+	/* Caller already did this for us; so do nothing. */
+
+	/* Part (2): parse main digit string. */
+	if (len < lex->input_length && *s == '0')
+	{
+		s++;
+		len++;
+	}
+	else if (len < lex->input_length && *s >= '1' && *s <= '9')
+	{
+		do
+		{
+			s++;
+			len++;
+		} while (len < lex->input_length && *s >= '0' && *s <= '9');
+	}
+	else
+		error = true;
+
+	/* Part (3): parse optional decimal portion. */
+	if (len < lex->input_length && *s == '.')
+	{
+		s++;
+		len++;
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/* Part (4): parse optional exponent. */
+	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
+	{
+		s++;
+		len++;
+		if (len < lex->input_length && (*s == '+' || *s == '-'))
+		{
+			s++;
+			len++;
+		}
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/*
+	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
+	 * here should be considered part of the token for error-reporting
+	 * purposes.
+	 */
+	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
+		error = true;
+
+	if (total_len != NULL)
+		*total_len = len;
+
+	if (num_err != NULL)
+	{
+		/* let the caller handle any error */
+		*num_err = error;
+	}
+	else
+	{
+		/* return token endpoint */
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		/* handle error if any */
+		if (error)
+			report_invalid_token(lex);
+	}
+}
+
+/*
+ * Report a parse error.
+ *
+ * lex->token_start and lex->token_terminator must identify the current token.
+ */
+static void
+report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
+{
+	char	   *token;
+	int			toklen;
+
+	/* Handle case where the input ended prematurely. */
+	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("The input string ended unexpectedly."),
+				 report_json_context(lex)));
+
+	/* Separate out the current token. */
+	toklen = lex->token_terminator - lex->token_start;
+	token = palloc(toklen + 1);
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+
+	/* Complain, with the appropriate detail message. */
+	if (ctx == JSON_PARSE_END)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("Expected end of input, but found \"%s\".",
+						   token),
+				 report_json_context(lex)));
+	else
+	{
+		switch (ctx)
+		{
+			case JSON_PARSE_VALUE:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected JSON value, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_STRING:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_ARRAY_START:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected array element or \"]\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_ARRAY_NEXT:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_START:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string or \"}\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_LABEL:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \":\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_NEXT:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_COMMA:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			default:
+				elog(ERROR, "unexpected json parse state: %d", ctx);
+		}
+	}
+}
+
+/*
+ * Report an invalid input token.
+ *
+ * lex->token_start and lex->token_terminator must identify the token.
+ */
+static void
+report_invalid_token(JsonLexContext *lex)
+{
+	char	   *token;
+	int			toklen;
+
+	/* Separate out the offending token. */
+	toklen = lex->token_terminator - lex->token_start;
+	token = palloc(toklen + 1);
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s", "json"),
+			 errdetail("Token \"%s\" is invalid.", token),
+			 report_json_context(lex)));
+}
+
+/*
+ * Report a CONTEXT line for bogus JSON input.
+ *
+ * lex->token_terminator must be set to identify the spot where we detected
+ * the error.  Note that lex->token_start might be NULL, in case we recognized
+ * error at EOF.
+ *
+ * The return value isn't meaningful, but we make it non-void so that this
+ * can be invoked inside ereport().
+ */
+static int
+report_json_context(JsonLexContext *lex)
+{
+	const char *context_start;
+	const char *context_end;
+	const char *line_start;
+	int			line_number;
+	char	   *ctxt;
+	int			ctxtlen;
+	const char *prefix;
+	const char *suffix;
+
+	/* Choose boundaries for the part of the input we will display */
+	context_start = lex->input;
+	context_end = lex->token_terminator;
+	line_start = context_start;
+	line_number = 1;
+	for (;;)
+	{
+		/* Always advance over newlines */
+		if (context_start < context_end && *context_start == '\n')
+		{
+			context_start++;
+			line_start = context_start;
+			line_number++;
+			continue;
+		}
+		/* Otherwise, done as soon as we are close enough to context_end */
+		if (context_end - context_start < 50)
+			break;
+		/* Advance to next multibyte character */
+		if (IS_HIGHBIT_SET(*context_start))
+			context_start += pg_mblen(context_start);
+		else
+			context_start++;
+	}
+
+	/*
+	 * We add "..." to indicate that the excerpt doesn't start at the
+	 * beginning of the line ... but if we're within 3 characters of the
+	 * beginning of the line, we might as well just show the whole line.
+	 */
+	if (context_start - line_start <= 3)
+		context_start = line_start;
+
+	/* Get a null-terminated copy of the data to present */
+	ctxtlen = context_end - context_start;
+	ctxt = palloc(ctxtlen + 1);
+	memcpy(ctxt, context_start, ctxtlen);
+	ctxt[ctxtlen] = '\0';
+
+	/*
+	 * Show the context, prefixing "..." if not starting at start of line, and
+	 * suffixing "..." if not ending at end of line.
+	 */
+	prefix = (context_start > line_start) ? "..." : "";
+	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
+
+	return errcontext("JSON data, line %d: %s%s%s",
+					  line_number, prefix, ctxt, suffix);
+}
+
+/*
+ * Extract a single, possibly multi-byte char from the input string.
+ */
+static char *
+extract_mb_char(char *s)
+{
+	char	   *res;
+	int			len;
+
+	len = pg_mblen(s);
+	res = palloc(len + 1);
+	memcpy(res, s, len);
+	res[len] = '\0';
+
+	return res;
+}
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index 1190947476..bbca121bb7 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -103,6 +103,9 @@ typedef struct JsonSemAction
  */
 extern void pg_parse_json(JsonLexContext *lex, JsonSemAction *sem);
 
+/* the null action object used for pure validation */
+extern JsonSemAction nullSemAction;
+
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
  * number of elements in passed array lex context. It should be called from an
@@ -124,6 +127,9 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
 													bool need_escapes);
 
+/* lex one token */
+extern void json_lex(JsonLexContext *lex);
+
 /*
  * Utility function to check if a string is a valid JSON number.
  *
-- 
2.17.2 (Apple Git-113)

0004-Introduce-json_error-macro.patchapplication/octet-stream; name=0004-Introduce-json_error-macro.patchDownload

From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 10:36:52 -0500
Subject: [PATCH 4/6] Introduce json_error() macro.

---
 src/backend/utils/adt/jsonapi.c | 221 +++++++++++++-------------------
 1 file changed, 90 insertions(+), 131 deletions(-)

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index fc8af9f861..20f7f0f7ac 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -17,6 +17,9 @@
 #include "miscadmin.h"
 #include "utils/jsonapi.h"
 
+#define json_error(rest) \
+	ereport(ERROR, (rest, report_json_context(lex)))
+
 /*
  * The context of the parser is maintained by the recursive descent
  * mechanism, but is passed explicitly to the error reporting routine
@@ -163,6 +166,7 @@ IsValidJsonNumber(const char *str, int len)
 	return (!numeric_error) && (total_len == dummy_lex.input_length);
 }
 
+#ifndef FRONTEND
 /*
  * makeJsonLexContext
  *
@@ -182,6 +186,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 										VARSIZE_ANY_EXHDR(json),
 										need_escapes);
 }
+#endif
 
 JsonLexContext *
 makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
@@ -659,12 +664,10 @@ json_lex_string(JsonLexContext *lex)
 			/* Per RFC4627, these characters MUST be escaped. */
 			/* Since *s isn't printable, exclude it from the context string */
 			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
+			json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						errmsg("invalid input syntax for type %s", "json"),
+						errdetail("Character with value 0x%02x must be escaped.",
+								  (unsigned char) *s)));
 		}
 		else if (*s == '\\')
 		{
@@ -699,12 +702,10 @@ json_lex_string(JsonLexContext *lex)
 					else
 					{
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
+						json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									errmsg("invalid input syntax for type %s",
+										   "json"),
+									errdetail("\"\\u\" must be followed by four hexadecimal digits.")));
 					}
 				}
 				if (lex->strval != NULL)
@@ -715,33 +716,27 @@ json_lex_string(JsonLexContext *lex)
 					if (ch >= 0xd800 && ch <= 0xdbff)
 					{
 						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
+							json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+										errmsg("invalid input syntax for type %s",
+											   "json"),
+										errdetail("Unicode high surrogate must not follow a high surrogate.")));
 						hi_surrogate = (ch & 0x3ff) << 10;
 						continue;
 					}
 					else if (ch >= 0xdc00 && ch <= 0xdfff)
 					{
 						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
+							json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+										errmsg("invalid input syntax for type %s", "json"),
+										errdetail("Unicode low surrogate must follow a high surrogate.")));
 						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
 						hi_surrogate = -1;
 					}
 
 					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
+						json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									errmsg("invalid input syntax for type %s", "json"),
+									errdetail("Unicode low surrogate must follow a high surrogate.")));
 
 					/*
 					 * For UTF8, replace the escape sequence by the actual
@@ -753,11 +748,9 @@ json_lex_string(JsonLexContext *lex)
 					if (ch == 0)
 					{
 						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
+						json_error((errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+									errmsg("unsupported Unicode escape sequence"),
+									errdetail("\\u0000 cannot be converted to text.")));
 					}
 					else if (GetDatabaseEncoding() == PG_UTF8)
 					{
@@ -776,11 +769,9 @@ json_lex_string(JsonLexContext *lex)
 					}
 					else
 					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
+						json_error((errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+									errmsg("unsupported Unicode escape sequence"),
+									errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.")));
 					}
 
 				}
@@ -788,12 +779,10 @@ json_lex_string(JsonLexContext *lex)
 			else if (lex->strval != NULL)
 			{
 				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
+					json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								errmsg("invalid input syntax for type %s",
+									   "json"),
+								errdetail("Unicode low surrogate must follow a high surrogate.")));
 
 				switch (*s)
 				{
@@ -820,13 +809,11 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so error out. */
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
+						json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									errmsg("invalid input syntax for type %s",
+										   "json"),
+									errdetail("Escape sequence \"\\%s\" is invalid.",
+											  extract_mb_char(s))));
 				}
 			}
 			else if (strchr("\"\\/bfnrt", *s) == NULL)
@@ -839,23 +826,19 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Escape sequence \"\\%s\" is invalid.",
+									  extract_mb_char(s))));
 			}
 
 		}
 		else if (lex->strval != NULL)
 		{
 			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Unicode low surrogate must follow a high surrogate.")));
 
 			appendStringInfoChar(lex->strval, *s);
 		}
@@ -863,11 +846,9 @@ json_lex_string(JsonLexContext *lex)
 	}
 
 	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
+		json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					errmsg("invalid input syntax for type %s", "json"),
+					errdetail("Unicode low surrogate must follow a high surrogate.")));
 
 	/* Hooray, we found the end of the string! */
 	lex->prev_token_terminator = lex->token_terminator;
@@ -1008,11 +989,9 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 
 	/* Handle case where the input ended prematurely. */
 	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
+		json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					errmsg("invalid input syntax for type %s", "json"),
+					errdetail("The input string ended unexpectedly.")));
 
 	/* Separate out the current token. */
 	toklen = lex->token_terminator - lex->token_start;
@@ -1022,79 +1001,61 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 
 	/* Complain, with the appropriate detail message. */
 	if (ctx == JSON_PARSE_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
-				 report_json_context(lex)));
+		json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					errmsg("invalid input syntax for type %s", "json"),
+					errdetail("Expected end of input, but found \"%s\".",
+							  token)));
 	else
 	{
 		switch (ctx)
 		{
 			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected JSON value, but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected string, but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected array element or \"]\", but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected \",\" or \"]\", but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected string or \"}\", but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected \":\", but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected \",\" or \"}\", but found \"%s\".",
+									  token)));
 				break;
 			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
+				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input syntax for type %s", "json"),
+							errdetail("Expected string, but found \"%s\".",
+									  token)));
 				break;
 			default:
 				elog(ERROR, "unexpected json parse state: %d", ctx);
@@ -1119,11 +1080,9 @@ report_invalid_token(JsonLexContext *lex)
 	memcpy(token, lex->token_start, toklen);
 	token[toklen] = '\0';
 
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
+	json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				errmsg("invalid input syntax for type %s", "json"),
+				errdetail("Token \"%s\" is invalid.", token)));
 }
 
 /*
-- 
2.17.2 (Apple Git-113)

0005-Frontendify-jsonapi.c.patchapplication/octet-stream; name=0005-Frontendify-jsonapi.c.patchDownload

From 92ef37e68a9e95dbfff10e1bb1b369a957987fa4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 11:08:13 -0500
Subject: [PATCH 5/6] Frontendify jsonapi.c

---
 src/backend/utils/adt/jsonapi.c | 71 +++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 20f7f0f7ac..f355bd9de1 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -13,12 +13,44 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#ifdef FRONTEND
+#include "common/logging.h"
+#include "libpq-fe.h"
+#else
 #include "miscadmin.h"
+#endif
+
+#include "mb/pg_wchar.h"
 #include "utils/jsonapi.h"
 
+#ifdef FRONTEND
+static void json_frontend_error(int dummy) pg_attribute_noreturn();
+
+static int dummy;
+#define errcode(something) (dummy = 1)
+static int errmsg(const char *fmt, ...) pg_attribute_printf(1,2);
+static int errdetail(const char *fmt, ...) pg_attribute_printf(1, 2);
+
+static void json_frontend_error(int dummy)
+{
+	exit(1);
+}
+static int errmsg(const char *fmt, ...)
+{
+	return 1;
+}
+static int errdetail(const char *fmt, ...)
+{
+	return 1;
+}
+
+#define json_error(rest) \
+	json_frontend_error(rest)
+#define check_stack_depth()
+#else
 #define json_error(rest) \
 	ereport(ERROR, (rest, report_json_context(lex)))
+#endif
 
 /*
  * The context of the parser is maintained by the recursive descent
@@ -48,7 +80,9 @@ static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
 static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
 static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
 static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
+#ifndef FRONTEND
 static int	report_json_context(JsonLexContext *lex);
+#endif
 static char *extract_mb_char(char *s);
 
 /* the null action object used for pure validation */
@@ -701,7 +735,11 @@ json_lex_string(JsonLexContext *lex)
 						ch = (ch * 16) + (*s - 'A') + 10;
 					else
 					{
+#ifdef FRONTEND
+						lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
 						lex->token_terminator = s + pg_mblen(s);
+#endif
 						json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 									errmsg("invalid input syntax for type %s",
 										   "json"),
@@ -752,6 +790,14 @@ json_lex_string(JsonLexContext *lex)
 									errmsg("unsupported Unicode escape sequence"),
 									errdetail("\\u0000 cannot be converted to text.")));
 					}
+#ifdef FRONTEND
+					else
+					{
+						unicode_to_utf8(ch, (unsigned char *) utf8str);
+						utf8len = PQmblen(utf8str, PG_UTF8);
+						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
+					}
+#else
 					else if (GetDatabaseEncoding() == PG_UTF8)
 					{
 						unicode_to_utf8(ch, (unsigned char *) utf8str);
@@ -773,7 +819,7 @@ json_lex_string(JsonLexContext *lex)
 									errmsg("unsupported Unicode escape sequence"),
 									errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.")));
 					}
-
+#endif
 				}
 			}
 			else if (lex->strval != NULL)
@@ -808,7 +854,11 @@ json_lex_string(JsonLexContext *lex)
 						break;
 					default:
 						/* Not a valid string escape, so error out. */
+#ifdef FRONTEND
+						lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
 						lex->token_terminator = s + pg_mblen(s);
+#endif
 						json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 									errmsg("invalid input syntax for type %s",
 										   "json"),
@@ -825,7 +875,11 @@ json_lex_string(JsonLexContext *lex)
 				 * replace it with a switch statement, but testing so far has
 				 * shown it's not a performance win.
 				 */
-				lex->token_terminator = s + pg_mblen(s);
+#ifdef FRONTEND
+						lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
+						lex->token_terminator = s + pg_mblen(s);
+#endif
 				json_error((errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 							errmsg("invalid input syntax for type %s", "json"),
 							errdetail("Escape sequence \"\\%s\" is invalid.",
@@ -1058,7 +1112,12 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 									  token)));
 				break;
 			default:
+#ifdef FRONTEND
+				pg_log_fatal("unexpected json parse state: %d", ctx);
+				exit(1);
+#else
 				elog(ERROR, "unexpected json parse state: %d", ctx);
+#endif
 		}
 	}
 }
@@ -1085,6 +1144,7 @@ report_invalid_token(JsonLexContext *lex)
 				errdetail("Token \"%s\" is invalid.", token)));
 }
 
+#ifndef FRONTEND
 /*
  * Report a CONTEXT line for bogus JSON input.
  *
@@ -1156,6 +1216,7 @@ report_json_context(JsonLexContext *lex)
 	return errcontext("JSON data, line %d: %s%s%s",
 					  line_number, prefix, ctxt, suffix);
 }
+#endif
 
 /*
  * Extract a single, possibly multi-byte char from the input string.
@@ -1166,7 +1227,11 @@ extract_mb_char(char *s)
 	char	   *res;
 	int			len;
 
+#ifdef FRONTEND
+	len = PQmblen(s, PG_UTF8);
+#else
 	len = pg_mblen(s);
+#endif
 	res = palloc(len + 1);
 	memcpy(res, s, len);
 	res[len] = '\0';
-- 
2.17.2 (Apple Git-113)

0006-Move-jsonapi.c-to-src-common.patchapplication/octet-stream; name=0006-Move-jsonapi.c-to-src-common.patchDownload

From c4cf7378460ba07dd9d5e30ca0277a2fc67f44eb Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 13:45:11 -0500
Subject: [PATCH 6/6] Move jsonapi.c to src/common

---
 src/backend/utils/adt/Makefile              | 1 -
 src/common/Makefile                         | 1 +
 src/{backend/utils/adt => common}/jsonapi.c | 6 +++++-
 3 files changed, 6 insertions(+), 2 deletions(-)
 rename src/{backend/utils/adt => common}/jsonapi.c (99%)

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 790d7a24fb..13efa9338c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,7 +44,6 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
-	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/common/Makefile b/src/common/Makefile
index cb00bcbbba..9ee87650a8 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -57,6 +57,7 @@ OBJS_COMMON = \
 	ip.o \
 	keywords.o \
 	kwlookup.o \
+	jsonapi.o \
 	link-canary.o \
 	md5.o \
 	pg_lzcompress.o \
diff --git a/src/backend/utils/adt/jsonapi.c b/src/common/jsonapi.c
similarity index 99%
rename from src/backend/utils/adt/jsonapi.c
rename to src/common/jsonapi.c
index f355bd9de1..c42af7647d 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -11,11 +11,15 @@
  *
  *-------------------------------------------------------------------------
  */
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
 #include "postgres.h"
+#endif
 
 #ifdef FRONTEND
 #include "common/logging.h"
-#include "libpq-fe.h"
+#include "../interfaces/libpq/libpq-fe.h"
 #else
 #include "miscadmin.h"
 #endif
-- 
2.17.2 (Apple Git-113)

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#1)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

... However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.

In general, if we can possibly get to having one JSON parser in
src/common, that seems like an obviously better place to be than
having two JSON parsers. So I'm encouraged that it might be
feasible after all.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup.

FWIW, I've been wanting to do that for awhile. I've not studied
your patch, but +1 for the idea. We might also need to take a
hard look at mbutils.c to see if any of that code can/should move.

Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

The patch I just posted at <2863.1579127649@sss.pgh.pa.us> probably
affects this in small ways, but not anything major.

regards, tom lane

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Robert Haas (#1)

Re: making the backend's json parser work in frontend code

Hi,

On 2020-01-15 16:02:49 -0500, Robert Haas wrote:

The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread.

I'm not sure where I come down between using json and a simple ad-hoc
format, when the dependency for the former is making the existing json
parser work in the frontend. But if the alternative is to add a second
json parser, it very clearly shifts towards using an ad-hoc
format. Having to maintain a simple ad-hoc parser is a lot less
technical debt than having a second full blown json parser. Imo even
when an external project or three also has to have that simple parser.

If the alternative were to use that newly proposed json parser to
*replace* the backend one too, the story would again be different.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?

Cool.

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to be have added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

Yea, this seems like a necessary cleanup (or well, maybe the start of
it).

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 10:36:52 -0500
Subject: [PATCH 4/6] Introduce json_error() macro.

---
src/backend/utils/adt/jsonapi.c | 221 +++++++++++++-------------------
1 file changed, 90 insertions(+), 131 deletions(-)
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index fc8af9f861..20f7f0f7ac 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -17,6 +17,9 @@
#include "miscadmin.h"
#include "utils/jsonapi.h"
+#define json_error(rest) \
+	ereport(ERROR, (rest, report_json_context(lex)))
+

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

/* the null action object used for pure validation */
@@ -701,7 +735,11 @@ json_lex_string(JsonLexContext *lex)
ch = (ch * 16) + (*s - 'A') + 10;
else
{
+#ifdef FRONTEND
+						lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
lex->token_terminator = s + pg_mblen(s);
+#endif

If we were to go this way, it seems like the ifdef should rather be in a
helper function, rather than all over. It seems like it should be
unproblematic to have a common interface for both frontend/backend?

Greetings,

Andres Freund

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Andres Freund (#3)

Re: making the backend's json parser work in frontend code

On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres@anarazel.de> wrote:

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go do down this
whole path in the first place.

+#ifdef FRONTEND
+                                             lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
lex->token_terminator = s + pg_mblen(s);
+#endif
If we were to go this way, it seems like the ifdef should rather be in a
helper function, rather than all over.

Sure... like I said, this is just to illustrate the problem.

It seems like it should be
unproblematic to have a common interface for both frontend/backend?

Not sure how. pg_mblen() and PQmblen() are both existing interfaces,
and they're not compatible with each other. I guess we could make
PQmblen() available to backend code, but given that the function name
implies an origin in libpq, that seems wicked confusing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Michael Paquier

michael@paquier.xyz

almost 6 years ago

In reply to: Robert Haas (#4)

Re: making the backend's json parser work in frontend code

On Wed, Jan 15, 2020 at 09:39:13PM -0500, Robert Haas wrote:

On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres@anarazel.de> wrote:

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go do down this
whole path in the first place.

The error handling is a well defined concept in the backend. If
connected to a database, you know that a session has to rollback any
existing activity, etc. The clients have to be more flexible because
an error depends a lot of how the tools is designed and how it should
react on a error. So the backend code in charge of logging an error
does the best it can: it throws an error, then lets the caller decide
what to do with it. I agree with the feeling that having a simple
replacement for ereport() in the frontend would be nice, that would be
less code churn in parts shared by backend/frontend.

Not sure how. pg_mblen() and PQmblen() are both existing interfaces,
and they're not compatible with each other. I guess we could make
PQmblen() available to backend code, but given that the function name
implies an origin in libpq, that seems wicked confusing.

Well, the problem here is the encoding part, and the code looks at the
same table pg_wchar_table[] at the end, so this needs some thoughts.
On top of that, we don't know exactly on the client what kind of
encoding is available (this led for example to several
assumptions/hiccups behind the implementation of SCRAM as it requires
UTF-8 per its RFC when working on the libpq part).
--
Michael

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Robert Haas (#1)

Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/15/20 2:02 PM, Robert Haas wrote:

The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread. However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.

I was starting to wonder if it wouldn't be simpler to go back to the
Postgres JSON parser and see if we can adapt it. I'm not sure that it
*is* simpler, but it would almost certainly be more acceptable.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?

This looks like an obvious improvement to me.

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to be have added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

Agreed that these header files are fairly disorganized. In general the
names json, jsonapi, jsonfuncs don't tell me a whole lot. I feel like
I'd want to include json.h to get a json parser but it only contains one
utility function before these patches. I can see that json.c primarily
contains SQL functions so that's why.

So the idea here is that json.c will have the JSON SQL functions,
jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
jsonfuncs.c the utility functions?

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

This seems like a good first step. I wonder if the remainder of the SQL
json/jsonb functions should be moved to json.c/jsonb.c respectively?

That does represent a lot of code churn though, so perhaps not worth it.

As far as I know all three of the above patches are committable as-is;
review and contrary opinions welcome.

Agreed, with some questions as above.

On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

Well, with the caveat that it doesn't work, it's less than I expected.

Obviously ereport() is a pretty big deal and I agree with Michael
downthread that we should port this to the frontend code.

It would also be nice to unify functions like PQmblen() and pg_mblen()
if possible.

The next question in my mind is given the caveat that the error handing
is questionable in the front end, can we at least render/parse valid
JSON with the code?

Regards,
--
-David
david@pgmasters.net

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: David Steele (#6)

Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/16/20 11:37 AM, David Steele wrote:

The next question in my mind is given the caveat that the error handing
is questionable in the front end, can we at least render/parse valid
JSON with the code?

Hrm, this bit was from an earlier edit. I meant:

The next question in my mind is what will it take to get this working in
a limited form so we can at least prototype it with pg_basebackup. I
can hack on this with some static strings in front end code tomorrow to
see what works and what doesn't if that makes sense.

Regards,
--
-David
david@pgmasters.net

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: David Steele (#6)

Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 1:37 PM David Steele <david@pgmasters.net> wrote:

I was starting to wonder if it wouldn't be simpler to go back to the
Postgres JSON parser and see if we can adapt it. I'm not sure that it
*is* simpler, but it would almost certainly be more acceptable.

That is my feeling also.

So the idea here is that json.c will have the JSON SQL functions,
jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
jsonfuncs.c the utility functions?

Uh, I think roughly that, yes. Although I can't claim to fully
understand everything that's here.

This seems like a good first step. I wonder if the remainder of the SQL
json/jsonb functions should be moved to json.c/jsonb.c respectively?

That does represent a lot of code churn though, so perhaps not worth it.

I don't have an opinion on this right now.

Well, with the caveat that it doesn't work, it's less than I expected.

Obviously ereport() is a pretty big deal and I agree with Michael
downthread that we should port this to the frontend code.

Another possibly-attractive option would be to defer throwing the
error: i.e. set some flags in the lex or parse state or something, and
then just return. The caller notices the flags and has enough
information to throw an error or whatever it wants to do. The reason I
think this might be attractive is that it dodges the whole question of
what exactly throwing an error is supposed to do in a world without
transactions, memory contexts, resource owners, etc. However, it has
some pitfalls of its own, like maybe being too much code churn or
hurting performance in non-error cases.

It would also be nice to unify functions like PQmblen() and pg_mblen()
if possible.

I don't see how to do that at the moment, but I agree that it would be
nice if we can figure it out.

The next question in my mind is given the caveat that the error handing
is questionable in the front end, can we at least render/parse valid
JSON with the code?

That's a real good question. Thanks for offering to test it; I think
that would be very helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Andres Freund (#3)

Re: making the backend's json parser work in frontend code

On 1/15/20 4:40 PM, Andres Freund wrote:

I'm not sure where I come down between using json and a simple ad-hoc
format, when the dependency for the former is making the existing json
parser work in the frontend. But if the alternative is to add a second
json parser, it very clearly shifts towards using an ad-hoc
format. Having to maintain a simple ad-hoc parser is a lot less
technical debt than having a second full blown json parser.

Maybe at first, but it will grow and become more complex as new features
are added. This has been our experience with pgBackRest, at least.

Imo even
when an external project or three also has to have that simple parser.

I don't agree here. Especially if we outgrow the format and they need
two parsers, depending on the version of PostgreSQL.

To do page-level incrementals (which this feature is intended to enable)
the user will need to be able to associate full and incremental backups
and the only way I see to do that (currently) is to read the manifests,
since the prior backup should be stored there. I think this means that
parsing the manifest is not really optional -- it will be required to do
any kind of automation with incrementals.

It's easy enough for a tool like pgBackRest to do something like that,
much harder for a user hacking together a tool in bash based on
pg_basebackup.

If the alternative were to use that newly proposed json parser to
*replace* the backend one too, the story would again be different.

That was certainly not my intention.

Regards,
--
-David
david@pgmasters.net

#10

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Robert Haas (#4)

Re: making the backend's json parser work in frontend code

On 1/15/20 7:39 PM, Robert Haas wrote:

On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres@anarazel.de> wrote:

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go do down this
whole path in the first place.

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

Regards,
--
-David
david@pgmasters.net

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: David Steele (#10)

Re: making the backend's json parser work in frontend code

David Steele <david@pgmasters.net> writes:

On 1/15/20 7:39 PM, Robert Haas wrote:

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go do down this
whole path in the first place.

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).

The question comes down to whether there are use-cases where a frontend
application would really want to recover and continue processing after
a JSON syntax problem. I'm not seeing that that's a near-term
requirement, so maybe we could leave it for somebody to solve when
and if they want to do it.

regards, tom lane

#12

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Tom Lane (#11)

Re: making the backend's json parser work in frontend code

Hi,

On 2020-01-16 14:20:28 -0500, Tom Lane wrote:

David Steele <david@pgmasters.net> writes:

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).

Well, the process might still want to do some cleanup of half-finished
work. You'd not need to be resistant against memory leaks to do so, if
followed by an exit. Obviously you can also do all the necessarily
cleanup from within the ereport(ERROR) itself, but that doesn't seem
appealing to me (not composable, harder to reuse for other programs,
etc).

Greetings,

Andres Freund

#13

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Andres Freund (#12)

Re: making the backend's json parser work in frontend code

On 1/16/20 12:26 PM, Andres Freund wrote:

Hi,

On 2020-01-16 14:20:28 -0500, Tom Lane wrote:

David Steele <david@pgmasters.net> writes:

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).

Well, the process might still want to do some cleanup of half-finished
work. You'd not need to be resistant against memory leaks to do so, if
followed by an exit. Obviously you can also do all the necessarily
cleanup from within the ereport(ERROR) itself, but that doesn't seem
appealing to me (not composable, harder to reuse for other programs,
etc).

In pgBackRest we have a default handler that just logs the message to
stderr and exits (though we consider it a coding error if it gets
called). Seems like we could do the same here. Default message and
exit if no handler, but optionally allow a handler (which could RETHROW
to get to the default handler afterwards).

It seems like we've been wanting a front end version of ereport() for a
while so I'll take a look at that and see what it involves.

Regards,
--
-David
david@pgmasters.net

#14

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#1)

1 attachment(s)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup.

Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Also, it seemed to me that if we are going to move wchar.c, we should
also move encnames.c, so that libpq can get fully out of the
symlinking-source-files business. It makes initdb less weird too.

I took the liberty of sticking proper copyright headers onto these
two files, too. (This makes the diff a lot more bulky :-(. Would
it help to add the headers in a separate commit?)

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

regards, tom lane

Attachments:

0001-move-wchar-and-encnames-to-src-common.patchtext/x-diff; charset=us-ascii; name=0001-move-wchar-and-encnames-to-src-common.patchDownload

diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index cd4a016..b19a125 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -14,10 +14,8 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = \
 	conv.o \
-	encnames.o \
 	mbutils.o \
 	stringinfo_mb.o \
-	wchar.o \
 	wstrcmp.o \
 	wstrncmp.o
 
diff --git a/src/backend/utils/mb/README b/src/backend/utils/mb/README
index 7495ca5..ef36626 100644
--- a/src/backend/utils/mb/README
+++ b/src/backend/utils/mb/README
@@ -3,12 +3,8 @@ src/backend/utils/mb/README
 Encodings
 =========
 
-encnames.c:	public functions for both the backend and the frontend.
 conv.c:		static functions and a public table for code conversion
-wchar.c:	mostly static functions and a public table for mb string and
-		multibyte conversion
 mbutils.c:	public functions for the backend only.
-		requires conv.c and wchar.c
 stringinfo_mb.c: public backend-only multibyte-aware stringinfo functions
 wstrcmp.c:	strcmp for mb
 wstrncmp.c:	strncmp for mb
@@ -16,6 +12,12 @@ win866.c:	a tool to generate KOI8 <--> CP866 conversion table
 iso.c:		a tool to generate KOI8 <--> ISO8859-5 conversion table
 win1251.c:	a tool to generate KOI8 <--> CP1251 conversion table
 
+See also in src/common/:
+
+encnames.c:	public functions for encoding names
+wchar.c:	mostly static functions and a public table for mb string and
+		multibyte conversion
+
 Introduction
 ------------
 	http://www.cprogramming.com/tutorial/unicode.html
diff --git a/src/backend/utils/mb/encnames.c b/src/backend/utils/mb/encnames.c
deleted file mode 100644
index 12b61cd..0000000
--- a/src/backend/utils/mb/encnames.c
+++ /dev/null
@@ -1,629 +0,0 @@
-/*
- * Encoding names and routines for work with it. All
- * in this file is shared between FE and BE.
- *
- * src/backend/utils/mb/encnames.c
- */
-#ifdef FRONTEND
-#include "postgres_fe.h"
-#else
-#include "postgres.h"
-#include "utils/builtins.h"
-#endif
-
-#include <ctype.h>
-#include <unistd.h>
-
-#include "mb/pg_wchar.h"
-
-
-/* ----------
- * All encoding names, sorted:		 *** A L P H A B E T I C ***
- *
- * All names must be without irrelevant chars, search routines use
- * isalnum() chars only. It means ISO-8859-1, iso_8859-1 and Iso8859_1
- * are always converted to 'iso88591'. All must be lower case.
- *
- * The table doesn't contain 'cs' aliases (like csISOLatin1). It's needed?
- *
- * Karel Zak, Aug 2001
- * ----------
- */
-typedef struct pg_encname
-{
-	const char *name;
-	pg_enc		encoding;
-} pg_encname;
-
-static const pg_encname pg_encname_tbl[] =
-{
-	{
-		"abc", PG_WIN1258
-	},							/* alias for WIN1258 */
-	{
-		"alt", PG_WIN866
-	},							/* IBM866 */
-	{
-		"big5", PG_BIG5
-	},							/* Big5; Chinese for Taiwan multibyte set */
-	{
-		"euccn", PG_EUC_CN
-	},							/* EUC-CN; Extended Unix Code for simplified
-								 * Chinese */
-	{
-		"eucjis2004", PG_EUC_JIS_2004
-	},							/* EUC-JIS-2004; Extended UNIX Code fixed
-								 * Width for Japanese, standard JIS X 0213 */
-	{
-		"eucjp", PG_EUC_JP
-	},							/* EUC-JP; Extended UNIX Code fixed Width for
-								 * Japanese, standard OSF */
-	{
-		"euckr", PG_EUC_KR
-	},							/* EUC-KR; Extended Unix Code for Korean , KS
-								 * X 1001 standard */
-	{
-		"euctw", PG_EUC_TW
-	},							/* EUC-TW; Extended Unix Code for
-								 *
-								 * traditional Chinese */
-	{
-		"gb18030", PG_GB18030
-	},							/* GB18030;GB18030 */
-	{
-		"gbk", PG_GBK
-	},							/* GBK; Chinese Windows CodePage 936
-								 * simplified Chinese */
-	{
-		"iso88591", PG_LATIN1
-	},							/* ISO-8859-1; RFC1345,KXS2 */
-	{
-		"iso885910", PG_LATIN6
-	},							/* ISO-8859-10; RFC1345,KXS2 */
-	{
-		"iso885913", PG_LATIN7
-	},							/* ISO-8859-13; RFC1345,KXS2 */
-	{
-		"iso885914", PG_LATIN8
-	},							/* ISO-8859-14; RFC1345,KXS2 */
-	{
-		"iso885915", PG_LATIN9
-	},							/* ISO-8859-15; RFC1345,KXS2 */
-	{
-		"iso885916", PG_LATIN10
-	},							/* ISO-8859-16; RFC1345,KXS2 */
-	{
-		"iso88592", PG_LATIN2
-	},							/* ISO-8859-2; RFC1345,KXS2 */
-	{
-		"iso88593", PG_LATIN3
-	},							/* ISO-8859-3; RFC1345,KXS2 */
-	{
-		"iso88594", PG_LATIN4
-	},							/* ISO-8859-4; RFC1345,KXS2 */
-	{
-		"iso88595", PG_ISO_8859_5
-	},							/* ISO-8859-5; RFC1345,KXS2 */
-	{
-		"iso88596", PG_ISO_8859_6
-	},							/* ISO-8859-6; RFC1345,KXS2 */
-	{
-		"iso88597", PG_ISO_8859_7
-	},							/* ISO-8859-7; RFC1345,KXS2 */
-	{
-		"iso88598", PG_ISO_8859_8
-	},							/* ISO-8859-8; RFC1345,KXS2 */
-	{
-		"iso88599", PG_LATIN5
-	},							/* ISO-8859-9; RFC1345,KXS2 */
-	{
-		"johab", PG_JOHAB
-	},							/* JOHAB; Extended Unix Code for simplified
-								 * Chinese */
-	{
-		"koi8", PG_KOI8R
-	},							/* _dirty_ alias for KOI8-R (backward
-								 * compatibility) */
-	{
-		"koi8r", PG_KOI8R
-	},							/* KOI8-R; RFC1489 */
-	{
-		"koi8u", PG_KOI8U
-	},							/* KOI8-U; RFC2319 */
-	{
-		"latin1", PG_LATIN1
-	},							/* alias for ISO-8859-1 */
-	{
-		"latin10", PG_LATIN10
-	},							/* alias for ISO-8859-16 */
-	{
-		"latin2", PG_LATIN2
-	},							/* alias for ISO-8859-2 */
-	{
-		"latin3", PG_LATIN3
-	},							/* alias for ISO-8859-3 */
-	{
-		"latin4", PG_LATIN4
-	},							/* alias for ISO-8859-4 */
-	{
-		"latin5", PG_LATIN5
-	},							/* alias for ISO-8859-9 */
-	{
-		"latin6", PG_LATIN6
-	},							/* alias for ISO-8859-10 */
-	{
-		"latin7", PG_LATIN7
-	},							/* alias for ISO-8859-13 */
-	{
-		"latin8", PG_LATIN8
-	},							/* alias for ISO-8859-14 */
-	{
-		"latin9", PG_LATIN9
-	},							/* alias for ISO-8859-15 */
-	{
-		"mskanji", PG_SJIS
-	},							/* alias for Shift_JIS */
-	{
-		"muleinternal", PG_MULE_INTERNAL
-	},
-	{
-		"shiftjis", PG_SJIS
-	},							/* Shift_JIS; JIS X 0202-1991 */
-
-	{
-		"shiftjis2004", PG_SHIFT_JIS_2004
-	},							/* SHIFT-JIS-2004; Shift JIS for Japanese,
-								 * standard JIS X 0213 */
-	{
-		"sjis", PG_SJIS
-	},							/* alias for Shift_JIS */
-	{
-		"sqlascii", PG_SQL_ASCII
-	},
-	{
-		"tcvn", PG_WIN1258
-	},							/* alias for WIN1258 */
-	{
-		"tcvn5712", PG_WIN1258
-	},							/* alias for WIN1258 */
-	{
-		"uhc", PG_UHC
-	},							/* UHC; Korean Windows CodePage 949 */
-	{
-		"unicode", PG_UTF8
-	},							/* alias for UTF8 */
-	{
-		"utf8", PG_UTF8
-	},							/* alias for UTF8 */
-	{
-		"vscii", PG_WIN1258
-	},							/* alias for WIN1258 */
-	{
-		"win", PG_WIN1251
-	},							/* _dirty_ alias for windows-1251 (backward
-								 * compatibility) */
-	{
-		"win1250", PG_WIN1250
-	},							/* alias for Windows-1250 */
-	{
-		"win1251", PG_WIN1251
-	},							/* alias for Windows-1251 */
-	{
-		"win1252", PG_WIN1252
-	},							/* alias for Windows-1252 */
-	{
-		"win1253", PG_WIN1253
-	},							/* alias for Windows-1253 */
-	{
-		"win1254", PG_WIN1254
-	},							/* alias for Windows-1254 */
-	{
-		"win1255", PG_WIN1255
-	},							/* alias for Windows-1255 */
-	{
-		"win1256", PG_WIN1256
-	},							/* alias for Windows-1256 */
-	{
-		"win1257", PG_WIN1257
-	},							/* alias for Windows-1257 */
-	{
-		"win1258", PG_WIN1258
-	},							/* alias for Windows-1258 */
-	{
-		"win866", PG_WIN866
-	},							/* IBM866 */
-	{
-		"win874", PG_WIN874
-	},							/* alias for Windows-874 */
-	{
-		"win932", PG_SJIS
-	},							/* alias for Shift_JIS */
-	{
-		"win936", PG_GBK
-	},							/* alias for GBK */
-	{
-		"win949", PG_UHC
-	},							/* alias for UHC */
-	{
-		"win950", PG_BIG5
-	},							/* alias for BIG5 */
-	{
-		"windows1250", PG_WIN1250
-	},							/* Windows-1251; Microsoft */
-	{
-		"windows1251", PG_WIN1251
-	},							/* Windows-1251; Microsoft */
-	{
-		"windows1252", PG_WIN1252
-	},							/* Windows-1252; Microsoft */
-	{
-		"windows1253", PG_WIN1253
-	},							/* Windows-1253; Microsoft */
-	{
-		"windows1254", PG_WIN1254
-	},							/* Windows-1254; Microsoft */
-	{
-		"windows1255", PG_WIN1255
-	},							/* Windows-1255; Microsoft */
-	{
-		"windows1256", PG_WIN1256
-	},							/* Windows-1256; Microsoft */
-	{
-		"windows1257", PG_WIN1257
-	},							/* Windows-1257; Microsoft */
-	{
-		"windows1258", PG_WIN1258
-	},							/* Windows-1258; Microsoft */
-	{
-		"windows866", PG_WIN866
-	},							/* IBM866 */
-	{
-		"windows874", PG_WIN874
-	},							/* Windows-874; Microsoft */
-	{
-		"windows932", PG_SJIS
-	},							/* alias for Shift_JIS */
-	{
-		"windows936", PG_GBK
-	},							/* alias for GBK */
-	{
-		"windows949", PG_UHC
-	},							/* alias for UHC */
-	{
-		"windows950", PG_BIG5
-	}							/* alias for BIG5 */
-};
-
-/* ----------
- * These are "official" encoding names.
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
- * ----------
- */
-#ifndef WIN32
-#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
-#else
-#define DEF_ENC2NAME(name, codepage) { #name, PG_##name, codepage }
-#endif
-const pg_enc2name pg_enc2name_tbl[] =
-{
-	DEF_ENC2NAME(SQL_ASCII, 0),
-	DEF_ENC2NAME(EUC_JP, 20932),
-	DEF_ENC2NAME(EUC_CN, 20936),
-	DEF_ENC2NAME(EUC_KR, 51949),
-	DEF_ENC2NAME(EUC_TW, 0),
-	DEF_ENC2NAME(EUC_JIS_2004, 20932),
-	DEF_ENC2NAME(UTF8, 65001),
-	DEF_ENC2NAME(MULE_INTERNAL, 0),
-	DEF_ENC2NAME(LATIN1, 28591),
-	DEF_ENC2NAME(LATIN2, 28592),
-	DEF_ENC2NAME(LATIN3, 28593),
-	DEF_ENC2NAME(LATIN4, 28594),
-	DEF_ENC2NAME(LATIN5, 28599),
-	DEF_ENC2NAME(LATIN6, 0),
-	DEF_ENC2NAME(LATIN7, 0),
-	DEF_ENC2NAME(LATIN8, 0),
-	DEF_ENC2NAME(LATIN9, 28605),
-	DEF_ENC2NAME(LATIN10, 0),
-	DEF_ENC2NAME(WIN1256, 1256),
-	DEF_ENC2NAME(WIN1258, 1258),
-	DEF_ENC2NAME(WIN866, 866),
-	DEF_ENC2NAME(WIN874, 874),
-	DEF_ENC2NAME(KOI8R, 20866),
-	DEF_ENC2NAME(WIN1251, 1251),
-	DEF_ENC2NAME(WIN1252, 1252),
-	DEF_ENC2NAME(ISO_8859_5, 28595),
-	DEF_ENC2NAME(ISO_8859_6, 28596),
-	DEF_ENC2NAME(ISO_8859_7, 28597),
-	DEF_ENC2NAME(ISO_8859_8, 28598),
-	DEF_ENC2NAME(WIN1250, 1250),
-	DEF_ENC2NAME(WIN1253, 1253),
-	DEF_ENC2NAME(WIN1254, 1254),
-	DEF_ENC2NAME(WIN1255, 1255),
-	DEF_ENC2NAME(WIN1257, 1257),
-	DEF_ENC2NAME(KOI8U, 21866),
-	DEF_ENC2NAME(SJIS, 932),
-	DEF_ENC2NAME(BIG5, 950),
-	DEF_ENC2NAME(GBK, 936),
-	DEF_ENC2NAME(UHC, 949),
-	DEF_ENC2NAME(GB18030, 54936),
-	DEF_ENC2NAME(JOHAB, 0),
-	DEF_ENC2NAME(SHIFT_JIS_2004, 932)
-};
-
-/* ----------
- * These are encoding names for gettext.
- *
- * This covers all encodings except MULE_INTERNAL, which is alien to gettext.
- * ----------
- */
-const pg_enc2gettext pg_enc2gettext_tbl[] =
-{
-	{PG_SQL_ASCII, "US-ASCII"},
-	{PG_UTF8, "UTF-8"},
-	{PG_LATIN1, "LATIN1"},
-	{PG_LATIN2, "LATIN2"},
-	{PG_LATIN3, "LATIN3"},
-	{PG_LATIN4, "LATIN4"},
-	{PG_ISO_8859_5, "ISO-8859-5"},
-	{PG_ISO_8859_6, "ISO_8859-6"},
-	{PG_ISO_8859_7, "ISO-8859-7"},
-	{PG_ISO_8859_8, "ISO-8859-8"},
-	{PG_LATIN5, "LATIN5"},
-	{PG_LATIN6, "LATIN6"},
-	{PG_LATIN7, "LATIN7"},
-	{PG_LATIN8, "LATIN8"},
-	{PG_LATIN9, "LATIN-9"},
-	{PG_LATIN10, "LATIN10"},
-	{PG_KOI8R, "KOI8-R"},
-	{PG_KOI8U, "KOI8-U"},
-	{PG_WIN1250, "CP1250"},
-	{PG_WIN1251, "CP1251"},
-	{PG_WIN1252, "CP1252"},
-	{PG_WIN1253, "CP1253"},
-	{PG_WIN1254, "CP1254"},
-	{PG_WIN1255, "CP1255"},
-	{PG_WIN1256, "CP1256"},
-	{PG_WIN1257, "CP1257"},
-	{PG_WIN1258, "CP1258"},
-	{PG_WIN866, "CP866"},
-	{PG_WIN874, "CP874"},
-	{PG_EUC_CN, "EUC-CN"},
-	{PG_EUC_JP, "EUC-JP"},
-	{PG_EUC_KR, "EUC-KR"},
-	{PG_EUC_TW, "EUC-TW"},
-	{PG_EUC_JIS_2004, "EUC-JP"},
-	{PG_SJIS, "SHIFT-JIS"},
-	{PG_BIG5, "BIG5"},
-	{PG_GBK, "GBK"},
-	{PG_UHC, "UHC"},
-	{PG_GB18030, "GB18030"},
-	{PG_JOHAB, "JOHAB"},
-	{PG_SHIFT_JIS_2004, "SHIFT_JISX0213"},
-	{0, NULL}
-};
-
-
-#ifndef FRONTEND
-
-/*
- * Table of encoding names for ICU
- *
- * Reference: <https://ssl.icu-project.org/icu-bin/convexp>
- *
- * NULL entries are not supported by ICU, or their mapping is unclear.
- */
-static const char *const pg_enc2icu_tbl[] =
-{
-	NULL,						/* PG_SQL_ASCII */
-	"EUC-JP",					/* PG_EUC_JP */
-	"EUC-CN",					/* PG_EUC_CN */
-	"EUC-KR",					/* PG_EUC_KR */
-	"EUC-TW",					/* PG_EUC_TW */
-	NULL,						/* PG_EUC_JIS_2004 */
-	"UTF-8",					/* PG_UTF8 */
-	NULL,						/* PG_MULE_INTERNAL */
-	"ISO-8859-1",				/* PG_LATIN1 */
-	"ISO-8859-2",				/* PG_LATIN2 */
-	"ISO-8859-3",				/* PG_LATIN3 */
-	"ISO-8859-4",				/* PG_LATIN4 */
-	"ISO-8859-9",				/* PG_LATIN5 */
-	"ISO-8859-10",				/* PG_LATIN6 */
-	"ISO-8859-13",				/* PG_LATIN7 */
-	"ISO-8859-14",				/* PG_LATIN8 */
-	"ISO-8859-15",				/* PG_LATIN9 */
-	NULL,						/* PG_LATIN10 */
-	"CP1256",					/* PG_WIN1256 */
-	"CP1258",					/* PG_WIN1258 */
-	"CP866",					/* PG_WIN866 */
-	NULL,						/* PG_WIN874 */
-	"KOI8-R",					/* PG_KOI8R */
-	"CP1251",					/* PG_WIN1251 */
-	"CP1252",					/* PG_WIN1252 */
-	"ISO-8859-5",				/* PG_ISO_8859_5 */
-	"ISO-8859-6",				/* PG_ISO_8859_6 */
-	"ISO-8859-7",				/* PG_ISO_8859_7 */
-	"ISO-8859-8",				/* PG_ISO_8859_8 */
-	"CP1250",					/* PG_WIN1250 */
-	"CP1253",					/* PG_WIN1253 */
-	"CP1254",					/* PG_WIN1254 */
-	"CP1255",					/* PG_WIN1255 */
-	"CP1257",					/* PG_WIN1257 */
-	"KOI8-U",					/* PG_KOI8U */
-};
-
-bool
-is_encoding_supported_by_icu(int encoding)
-{
-	return (pg_enc2icu_tbl[encoding] != NULL);
-}
-
-const char *
-get_encoding_name_for_icu(int encoding)
-{
-	const char *icu_encoding_name;
-
-	StaticAssertStmt(lengthof(pg_enc2icu_tbl) == PG_ENCODING_BE_LAST + 1,
-					 "pg_enc2icu_tbl incomplete");
-
-	icu_encoding_name = pg_enc2icu_tbl[encoding];
-
-	if (!icu_encoding_name)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("encoding \"%s\" not supported by ICU",
-						pg_encoding_to_char(encoding))));
-
-	return icu_encoding_name;
-}
-
-#endif							/* not FRONTEND */
-
-
-/* ----------
- * Encoding checks, for error returns -1 else encoding id
- * ----------
- */
-int
-pg_valid_client_encoding(const char *name)
-{
-	int			enc;
-
-	if ((enc = pg_char_to_encoding(name)) < 0)
-		return -1;
-
-	if (!PG_VALID_FE_ENCODING(enc))
-		return -1;
-
-	return enc;
-}
-
-int
-pg_valid_server_encoding(const char *name)
-{
-	int			enc;
-
-	if ((enc = pg_char_to_encoding(name)) < 0)
-		return -1;
-
-	if (!PG_VALID_BE_ENCODING(enc))
-		return -1;
-
-	return enc;
-}
-
-int
-pg_valid_server_encoding_id(int encoding)
-{
-	return PG_VALID_BE_ENCODING(encoding);
-}
-
-/* ----------
- * Remove irrelevant chars from encoding name
- * ----------
- */
-static char *
-clean_encoding_name(const char *key, char *newkey)
-{
-	const char *p;
-	char	   *np;
-
-	for (p = key, np = newkey; *p != '\0'; p++)
-	{
-		if (isalnum((unsigned char) *p))
-		{
-			if (*p >= 'A' && *p <= 'Z')
-				*np++ = *p + 'a' - 'A';
-			else
-				*np++ = *p;
-		}
-	}
-	*np = '\0';
-	return newkey;
-}
-
-/* ----------
- * Search encoding by encoding name
- *
- * Returns encoding ID, or -1 for error
- * ----------
- */
-int
-pg_char_to_encoding(const char *name)
-{
-	unsigned int nel = lengthof(pg_encname_tbl);
-	const pg_encname *base = pg_encname_tbl,
-			   *last = base + nel - 1,
-			   *position;
-	int			result;
-	char		buff[NAMEDATALEN],
-			   *key;
-
-	if (name == NULL || *name == '\0')
-		return -1;
-
-	if (strlen(name) >= NAMEDATALEN)
-	{
-#ifdef FRONTEND
-		fprintf(stderr, "encoding name too long\n");
-		return -1;
-#else
-		ereport(ERROR,
-				(errcode(ERRCODE_NAME_TOO_LONG),
-				 errmsg("encoding name too long")));
-#endif
-	}
-	key = clean_encoding_name(name, buff);
-
-	while (last >= base)
-	{
-		position = base + ((last - base) >> 1);
-		result = key[0] - position->name[0];
-
-		if (result == 0)
-		{
-			result = strcmp(key, position->name);
-			if (result == 0)
-				return position->encoding;
-		}
-		if (result < 0)
-			last = position - 1;
-		else
-			base = position + 1;
-	}
-	return -1;
-}
-
-#ifndef FRONTEND
-Datum
-PG_char_to_encoding(PG_FUNCTION_ARGS)
-{
-	Name		s = PG_GETARG_NAME(0);
-
-	PG_RETURN_INT32(pg_char_to_encoding(NameStr(*s)));
-}
-#endif
-
-const char *
-pg_encoding_to_char(int encoding)
-{
-	if (PG_VALID_ENCODING(encoding))
-	{
-		const pg_enc2name *p = &pg_enc2name_tbl[encoding];
-
-		Assert(encoding == p->encoding);
-		return p->name;
-	}
-	return "";
-}
-
-#ifndef FRONTEND
-Datum
-PG_encoding_to_char(PG_FUNCTION_ARGS)
-{
-	int32		encoding = PG_GETARG_INT32(0);
-	const char *encoding_name = pg_encoding_to_char(encoding);
-
-	return DirectFunctionCall1(namein, CStringGetDatum(encoding_name));
-}
-
-#endif
diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
deleted file mode 100644
index 02e2588..0000000
--- a/src/backend/utils/mb/wchar.c
+++ /dev/null
@@ -1,2036 +0,0 @@
-/*
- * conversion functions between pg_wchar and multibyte streams.
- * Tatsuo Ishii
- * src/backend/utils/mb/wchar.c
- *
- */
-/* can be used in either frontend or backend */
-#ifdef FRONTEND
-#include "postgres_fe.h"
-#else
-#include "postgres.h"
-#endif
-
-#include "mb/pg_wchar.h"
-
-
-/*
- * Operations on multi-byte encodings are driven by a table of helper
- * functions.
- *
- * To add an encoding support, define mblen(), dsplen() and verifier() for
- * the encoding.  For server-encodings, also define mb2wchar() and wchar2mb()
- * conversion functions.
- *
- * These functions generally assume that their input is validly formed.
- * The "verifier" functions, further down in the file, have to be more
- * paranoid.
- *
- * We expect that mblen() does not need to examine more than the first byte
- * of the character to discover the correct length.  GB18030 is an exception
- * to that rule, though, as it also looks at second byte.  But even that
- * behaves in a predictable way, if you only pass the first byte: it will
- * treat 4-byte encoded characters as two 2-byte encoded characters, which is
- * good enough for all current uses.
- *
- * Note: for the display output of psql to work properly, the return values
- * of the dsplen functions must conform to the Unicode standard. In particular
- * the NUL character is zero width and control characters are generally
- * width -1. It is recommended that non-ASCII encodings refer their ASCII
- * subset to the ASCII routines to ensure consistency.
- */
-
-/*
- * SQL/ASCII
- */
-static int
-pg_ascii2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		*to++ = *from++;
-		len--;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-static int
-pg_ascii_mblen(const unsigned char *s)
-{
-	return 1;
-}
-
-static int
-pg_ascii_dsplen(const unsigned char *s)
-{
-	if (*s == '\0')
-		return 0;
-	if (*s < 0x20 || *s == 0x7f)
-		return -1;
-
-	return 1;
-}
-
-/*
- * EUC
- */
-static int
-pg_euc2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		if (*from == SS2 && len >= 2)	/* JIS X 0201 (so called "1 byte
-										 * KANA") */
-		{
-			from++;
-			*to = (SS2 << 8) | *from++;
-			len -= 2;
-		}
-		else if (*from == SS3 && len >= 3)	/* JIS X 0212 KANJI */
-		{
-			from++;
-			*to = (SS3 << 16) | (*from++ << 8);
-			*to |= *from++;
-			len -= 3;
-		}
-		else if (IS_HIGHBIT_SET(*from) && len >= 2) /* JIS X 0208 KANJI */
-		{
-			*to = *from++ << 8;
-			*to |= *from++;
-			len -= 2;
-		}
-		else					/* must be ASCII */
-		{
-			*to = *from++;
-			len--;
-		}
-		to++;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-static inline int
-pg_euc_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s == SS2)
-		len = 2;
-	else if (*s == SS3)
-		len = 3;
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = 1;
-	return len;
-}
-
-static inline int
-pg_euc_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s == SS2)
-		len = 2;
-	else if (*s == SS3)
-		len = 2;
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = pg_ascii_dsplen(s);
-	return len;
-}
-
-/*
- * EUC_JP
- */
-static int
-pg_eucjp2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	return pg_euc2wchar_with_len(from, to, len);
-}
-
-static int
-pg_eucjp_mblen(const unsigned char *s)
-{
-	return pg_euc_mblen(s);
-}
-
-static int
-pg_eucjp_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s == SS2)
-		len = 1;
-	else if (*s == SS3)
-		len = 2;
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = pg_ascii_dsplen(s);
-	return len;
-}
-
-/*
- * EUC_KR
- */
-static int
-pg_euckr2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	return pg_euc2wchar_with_len(from, to, len);
-}
-
-static int
-pg_euckr_mblen(const unsigned char *s)
-{
-	return pg_euc_mblen(s);
-}
-
-static int
-pg_euckr_dsplen(const unsigned char *s)
-{
-	return pg_euc_dsplen(s);
-}
-
-/*
- * EUC_CN
- *
- */
-static int
-pg_euccn2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		if (*from == SS2 && len >= 3)	/* code set 2 (unused?) */
-		{
-			from++;
-			*to = (SS2 << 16) | (*from++ << 8);
-			*to |= *from++;
-			len -= 3;
-		}
-		else if (*from == SS3 && len >= 3)	/* code set 3 (unused ?) */
-		{
-			from++;
-			*to = (SS3 << 16) | (*from++ << 8);
-			*to |= *from++;
-			len -= 3;
-		}
-		else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 1 */
-		{
-			*to = *from++ << 8;
-			*to |= *from++;
-			len -= 2;
-		}
-		else
-		{
-			*to = *from++;
-			len--;
-		}
-		to++;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-static int
-pg_euccn_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = 1;
-	return len;
-}
-
-static int
-pg_euccn_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = pg_ascii_dsplen(s);
-	return len;
-}
-
-/*
- * EUC_TW
- *
- */
-static int
-pg_euctw2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		if (*from == SS2 && len >= 4)	/* code set 2 */
-		{
-			from++;
-			*to = (((uint32) SS2) << 24) | (*from++ << 16);
-			*to |= *from++ << 8;
-			*to |= *from++;
-			len -= 4;
-		}
-		else if (*from == SS3 && len >= 3)	/* code set 3 (unused?) */
-		{
-			from++;
-			*to = (SS3 << 16) | (*from++ << 8);
-			*to |= *from++;
-			len -= 3;
-		}
-		else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 2 */
-		{
-			*to = *from++ << 8;
-			*to |= *from++;
-			len -= 2;
-		}
-		else
-		{
-			*to = *from++;
-			len--;
-		}
-		to++;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-static int
-pg_euctw_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s == SS2)
-		len = 4;
-	else if (*s == SS3)
-		len = 3;
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = 1;
-	return len;
-}
-
-static int
-pg_euctw_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s == SS2)
-		len = 2;
-	else if (*s == SS3)
-		len = 2;
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = pg_ascii_dsplen(s);
-	return len;
-}
-
-/*
- * Convert pg_wchar to EUC_* encoding.
- * caller must allocate enough space for "to", including a trailing zero!
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2euc_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		unsigned char c;
-
-		if ((c = (*from >> 24)))
-		{
-			*to++ = c;
-			*to++ = (*from >> 16) & 0xff;
-			*to++ = (*from >> 8) & 0xff;
-			*to++ = *from & 0xff;
-			cnt += 4;
-		}
-		else if ((c = (*from >> 16)))
-		{
-			*to++ = c;
-			*to++ = (*from >> 8) & 0xff;
-			*to++ = *from & 0xff;
-			cnt += 3;
-		}
-		else if ((c = (*from >> 8)))
-		{
-			*to++ = c;
-			*to++ = *from & 0xff;
-			cnt += 2;
-		}
-		else
-		{
-			*to++ = *from;
-			cnt++;
-		}
-		from++;
-		len--;
-	}
-	*to = 0;
-	return cnt;
-}
-
-
-/*
- * JOHAB
- */
-static int
-pg_johab_mblen(const unsigned char *s)
-{
-	return pg_euc_mblen(s);
-}
-
-static int
-pg_johab_dsplen(const unsigned char *s)
-{
-	return pg_euc_dsplen(s);
-}
-
-/*
- * convert UTF8 string to pg_wchar (UCS-4)
- * caller must allocate enough space for "to", including a trailing zero!
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_utf2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-	uint32		c1,
-				c2,
-				c3,
-				c4;
-
-	while (len > 0 && *from)
-	{
-		if ((*from & 0x80) == 0)
-		{
-			*to = *from++;
-			len--;
-		}
-		else if ((*from & 0xe0) == 0xc0)
-		{
-			if (len < 2)
-				break;			/* drop trailing incomplete char */
-			c1 = *from++ & 0x1f;
-			c2 = *from++ & 0x3f;
-			*to = (c1 << 6) | c2;
-			len -= 2;
-		}
-		else if ((*from & 0xf0) == 0xe0)
-		{
-			if (len < 3)
-				break;			/* drop trailing incomplete char */
-			c1 = *from++ & 0x0f;
-			c2 = *from++ & 0x3f;
-			c3 = *from++ & 0x3f;
-			*to = (c1 << 12) | (c2 << 6) | c3;
-			len -= 3;
-		}
-		else if ((*from & 0xf8) == 0xf0)
-		{
-			if (len < 4)
-				break;			/* drop trailing incomplete char */
-			c1 = *from++ & 0x07;
-			c2 = *from++ & 0x3f;
-			c3 = *from++ & 0x3f;
-			c4 = *from++ & 0x3f;
-			*to = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
-			len -= 4;
-		}
-		else
-		{
-			/* treat a bogus char as length 1; not ours to raise error */
-			*to = *from++;
-			len--;
-		}
-		to++;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-
-/*
- * Map a Unicode code point to UTF-8.  utf8string must have 4 bytes of
- * space allocated.
- */
-unsigned char *
-unicode_to_utf8(pg_wchar c, unsigned char *utf8string)
-{
-	if (c <= 0x7F)
-	{
-		utf8string[0] = c;
-	}
-	else if (c <= 0x7FF)
-	{
-		utf8string[0] = 0xC0 | ((c >> 6) & 0x1F);
-		utf8string[1] = 0x80 | (c & 0x3F);
-	}
-	else if (c <= 0xFFFF)
-	{
-		utf8string[0] = 0xE0 | ((c >> 12) & 0x0F);
-		utf8string[1] = 0x80 | ((c >> 6) & 0x3F);
-		utf8string[2] = 0x80 | (c & 0x3F);
-	}
-	else
-	{
-		utf8string[0] = 0xF0 | ((c >> 18) & 0x07);
-		utf8string[1] = 0x80 | ((c >> 12) & 0x3F);
-		utf8string[2] = 0x80 | ((c >> 6) & 0x3F);
-		utf8string[3] = 0x80 | (c & 0x3F);
-	}
-
-	return utf8string;
-}
-
-/*
- * Trivial conversion from pg_wchar to UTF-8.
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2utf_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		int			char_len;
-
-		unicode_to_utf8(*from, to);
-		char_len = pg_utf_mblen(to);
-		cnt += char_len;
-		to += char_len;
-		from++;
-		len--;
-	}
-	*to = 0;
-	return cnt;
-}
-
-/*
- * Return the byte length of a UTF8 character pointed to by s
- *
- * Note: in the current implementation we do not support UTF8 sequences
- * of more than 4 bytes; hence do NOT return a value larger than 4.
- * We return "1" for any leading byte that is either flat-out illegal or
- * indicates a length larger than we support.
- *
- * pg_utf2wchar_with_len(), utf8_to_unicode(), pg_utf8_islegal(), and perhaps
- * other places would need to be fixed to change this.
- */
-int
-pg_utf_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if ((*s & 0x80) == 0)
-		len = 1;
-	else if ((*s & 0xe0) == 0xc0)
-		len = 2;
-	else if ((*s & 0xf0) == 0xe0)
-		len = 3;
-	else if ((*s & 0xf8) == 0xf0)
-		len = 4;
-#ifdef NOT_USED
-	else if ((*s & 0xfc) == 0xf8)
-		len = 5;
-	else if ((*s & 0xfe) == 0xfc)
-		len = 6;
-#endif
-	else
-		len = 1;
-	return len;
-}
-
-/*
- * This is an implementation of wcwidth() and wcswidth() as defined in
- * "The Single UNIX Specification, Version 2, The Open Group, 1997"
- * <http://www.unix.org/online.html>
- *
- * Markus Kuhn -- 2001-09-08 -- public domain
- *
- * customised for PostgreSQL
- *
- * original available at : http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- */
-
-struct mbinterval
-{
-	unsigned short first;
-	unsigned short last;
-};
-
-/* auxiliary function for binary search in interval table */
-static int
-mbbisearch(pg_wchar ucs, const struct mbinterval *table, int max)
-{
-	int			min = 0;
-	int			mid;
-
-	if (ucs < table[0].first || ucs > table[max].last)
-		return 0;
-	while (max >= min)
-	{
-		mid = (min + max) / 2;
-		if (ucs > table[mid].last)
-			min = mid + 1;
-		else if (ucs < table[mid].first)
-			max = mid - 1;
-		else
-			return 1;
-	}
-
-	return 0;
-}
-
-
-/* The following functions define the column width of an ISO 10646
- * character as follows:
- *
- *	  - The null character (U+0000) has a column width of 0.
- *
- *	  - Other C0/C1 control characters and DEL will lead to a return
- *		value of -1.
- *
- *	  - Non-spacing and enclosing combining characters (general
- *		category code Mn or Me in the Unicode database) have a
- *		column width of 0.
- *
- *	  - Other format characters (general category code Cf in the Unicode
- *		database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
- *
- *	  - Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
- *		have a column width of 0.
- *
- *	  - Spacing characters in the East Asian Wide (W) or East Asian
- *		FullWidth (F) category as defined in Unicode Technical
- *		Report #11 have a column width of 2.
- *
- *	  - All remaining characters (including all printable
- *		ISO 8859-1 and WGL4 characters, Unicode control characters,
- *		etc.) have a column width of 1.
- *
- * This implementation assumes that wchar_t characters are encoded
- * in ISO 10646.
- */
-
-static int
-ucs_wcwidth(pg_wchar ucs)
-{
-#include "common/unicode_combining_table.h"
-
-	/* test for 8-bit control characters */
-	if (ucs == 0)
-		return 0;
-
-	if (ucs < 0x20 || (ucs >= 0x7f && ucs < 0xa0) || ucs > 0x0010ffff)
-		return -1;
-
-	/* binary search in table of non-spacing characters */
-	if (mbbisearch(ucs, combining,
-				   sizeof(combining) / sizeof(struct mbinterval) - 1))
-		return 0;
-
-	/*
-	 * if we arrive here, ucs is not a combining or C0/C1 control character
-	 */
-
-	return 1 +
-		(ucs >= 0x1100 &&
-		 (ucs <= 0x115f ||		/* Hangul Jamo init. consonants */
-		  (ucs >= 0x2e80 && ucs <= 0xa4cf && (ucs & ~0x0011) != 0x300a &&
-		   ucs != 0x303f) ||	/* CJK ... Yi */
-		  (ucs >= 0xac00 && ucs <= 0xd7a3) ||	/* Hangul Syllables */
-		  (ucs >= 0xf900 && ucs <= 0xfaff) ||	/* CJK Compatibility
-												 * Ideographs */
-		  (ucs >= 0xfe30 && ucs <= 0xfe6f) ||	/* CJK Compatibility Forms */
-		  (ucs >= 0xff00 && ucs <= 0xff5f) ||	/* Fullwidth Forms */
-		  (ucs >= 0xffe0 && ucs <= 0xffe6) ||
-		  (ucs >= 0x20000 && ucs <= 0x2ffff)));
-}
-
-/*
- * Convert a UTF-8 character to a Unicode code point.
- * This is a one-character version of pg_utf2wchar_with_len.
- *
- * No error checks here, c must point to a long-enough string.
- */
-pg_wchar
-utf8_to_unicode(const unsigned char *c)
-{
-	if ((*c & 0x80) == 0)
-		return (pg_wchar) c[0];
-	else if ((*c & 0xe0) == 0xc0)
-		return (pg_wchar) (((c[0] & 0x1f) << 6) |
-						   (c[1] & 0x3f));
-	else if ((*c & 0xf0) == 0xe0)
-		return (pg_wchar) (((c[0] & 0x0f) << 12) |
-						   ((c[1] & 0x3f) << 6) |
-						   (c[2] & 0x3f));
-	else if ((*c & 0xf8) == 0xf0)
-		return (pg_wchar) (((c[0] & 0x07) << 18) |
-						   ((c[1] & 0x3f) << 12) |
-						   ((c[2] & 0x3f) << 6) |
-						   (c[3] & 0x3f));
-	else
-		/* that is an invalid code on purpose */
-		return 0xffffffff;
-}
-
-static int
-pg_utf_dsplen(const unsigned char *s)
-{
-	return ucs_wcwidth(utf8_to_unicode(s));
-}
-
-/*
- * convert mule internal code to pg_wchar
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		if (IS_LC1(*from) && len >= 2)
-		{
-			*to = *from++ << 16;
-			*to |= *from++;
-			len -= 2;
-		}
-		else if (IS_LCPRV1(*from) && len >= 3)
-		{
-			from++;
-			*to = *from++ << 16;
-			*to |= *from++;
-			len -= 3;
-		}
-		else if (IS_LC2(*from) && len >= 3)
-		{
-			*to = *from++ << 16;
-			*to |= *from++ << 8;
-			*to |= *from++;
-			len -= 3;
-		}
-		else if (IS_LCPRV2(*from) && len >= 4)
-		{
-			from++;
-			*to = *from++ << 16;
-			*to |= *from++ << 8;
-			*to |= *from++;
-			len -= 4;
-		}
-		else
-		{						/* assume ASCII */
-			*to = (unsigned char) *from++;
-			len--;
-		}
-		to++;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-/*
- * convert pg_wchar to mule internal code
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2mule_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		unsigned char lb;
-
-		lb = (*from >> 16) & 0xff;
-		if (IS_LC1(lb))
-		{
-			*to++ = lb;
-			*to++ = *from & 0xff;
-			cnt += 2;
-		}
-		else if (IS_LC2(lb))
-		{
-			*to++ = lb;
-			*to++ = (*from >> 8) & 0xff;
-			*to++ = *from & 0xff;
-			cnt += 3;
-		}
-		else if (IS_LCPRV1_A_RANGE(lb))
-		{
-			*to++ = LCPRV1_A;
-			*to++ = lb;
-			*to++ = *from & 0xff;
-			cnt += 3;
-		}
-		else if (IS_LCPRV1_B_RANGE(lb))
-		{
-			*to++ = LCPRV1_B;
-			*to++ = lb;
-			*to++ = *from & 0xff;
-			cnt += 3;
-		}
-		else if (IS_LCPRV2_A_RANGE(lb))
-		{
-			*to++ = LCPRV2_A;
-			*to++ = lb;
-			*to++ = (*from >> 8) & 0xff;
-			*to++ = *from & 0xff;
-			cnt += 4;
-		}
-		else if (IS_LCPRV2_B_RANGE(lb))
-		{
-			*to++ = LCPRV2_B;
-			*to++ = lb;
-			*to++ = (*from >> 8) & 0xff;
-			*to++ = *from & 0xff;
-			cnt += 4;
-		}
-		else
-		{
-			*to++ = *from & 0xff;
-			cnt += 1;
-		}
-		from++;
-		len--;
-	}
-	*to = 0;
-	return cnt;
-}
-
-int
-pg_mule_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_LC1(*s))
-		len = 2;
-	else if (IS_LCPRV1(*s))
-		len = 3;
-	else if (IS_LC2(*s))
-		len = 3;
-	else if (IS_LCPRV2(*s))
-		len = 4;
-	else
-		len = 1;				/* assume ASCII */
-	return len;
-}
-
-static int
-pg_mule_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	/*
-	 * Note: it's not really appropriate to assume that all multibyte charsets
-	 * are double-wide on screen.  But this seems an okay approximation for
-	 * the MULE charsets we currently support.
-	 */
-
-	if (IS_LC1(*s))
-		len = 1;
-	else if (IS_LCPRV1(*s))
-		len = 1;
-	else if (IS_LC2(*s))
-		len = 2;
-	else if (IS_LCPRV2(*s))
-		len = 2;
-	else
-		len = 1;				/* assume ASCII */
-
-	return len;
-}
-
-/*
- * ISO8859-1
- */
-static int
-pg_latin12wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		*to++ = *from++;
-		len--;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-/*
- * Trivial conversion from pg_wchar to single byte encoding. Just ignores
- * high bits.
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2single_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-	int			cnt = 0;
-
-	while (len > 0 && *from)
-	{
-		*to++ = *from++;
-		len--;
-		cnt++;
-	}
-	*to = 0;
-	return cnt;
-}
-
-static int
-pg_latin1_mblen(const unsigned char *s)
-{
-	return 1;
-}
-
-static int
-pg_latin1_dsplen(const unsigned char *s)
-{
-	return pg_ascii_dsplen(s);
-}
-
-/*
- * SJIS
- */
-static int
-pg_sjis_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s >= 0xa1 && *s <= 0xdf)
-		len = 1;				/* 1 byte kana? */
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* kanji? */
-	else
-		len = 1;				/* should be ASCII */
-	return len;
-}
-
-static int
-pg_sjis_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (*s >= 0xa1 && *s <= 0xdf)
-		len = 1;				/* 1 byte kana? */
-	else if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* kanji? */
-	else
-		len = pg_ascii_dsplen(s);	/* should be ASCII */
-	return len;
-}
-
-/*
- * Big5
- */
-static int
-pg_big5_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* kanji? */
-	else
-		len = 1;				/* should be ASCII */
-	return len;
-}
-
-static int
-pg_big5_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* kanji? */
-	else
-		len = pg_ascii_dsplen(s);	/* should be ASCII */
-	return len;
-}
-
-/*
- * GBK
- */
-static int
-pg_gbk_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* kanji? */
-	else
-		len = 1;				/* should be ASCII */
-	return len;
-}
-
-static int
-pg_gbk_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* kanji? */
-	else
-		len = pg_ascii_dsplen(s);	/* should be ASCII */
-	return len;
-}
-
-/*
- * UHC
- */
-static int
-pg_uhc_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* 2byte? */
-	else
-		len = 1;				/* should be ASCII */
-	return len;
-}
-
-static int
-pg_uhc_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;				/* 2byte? */
-	else
-		len = pg_ascii_dsplen(s);	/* should be ASCII */
-	return len;
-}
-
-/*
- * GB18030
- *	Added by Bill Huang <bhuang@redhat.com>,<bill_huanghb@ybb.ne.jp>
- */
-
-/*
- * Unlike all other mblen() functions, this also looks at the second byte of
- * the input.  However, if you only pass the first byte of a multi-byte
- * string, and \0 as the second byte, this still works in a predictable way:
- * a 4-byte character will be reported as two 2-byte characters.  That's
- * enough for all current uses, as a client-only encoding.  It works that
- * way, because in any valid 4-byte GB18030-encoded character, the third and
- * fourth byte look like a 2-byte encoded character, when looked at
- * separately.
- */
-static int
-pg_gb18030_mblen(const unsigned char *s)
-{
-	int			len;
-
-	if (!IS_HIGHBIT_SET(*s))
-		len = 1;				/* ASCII */
-	else if (*(s + 1) >= 0x30 && *(s + 1) <= 0x39)
-		len = 4;
-	else
-		len = 2;
-	return len;
-}
-
-static int
-pg_gb18030_dsplen(const unsigned char *s)
-{
-	int			len;
-
-	if (IS_HIGHBIT_SET(*s))
-		len = 2;
-	else
-		len = pg_ascii_dsplen(s);	/* ASCII */
-	return len;
-}
-
-/*
- *-------------------------------------------------------------------
- * multibyte sequence validators
- *
- * These functions accept "s", a pointer to the first byte of a string,
- * and "len", the remaining length of the string.  If there is a validly
- * encoded character beginning at *s, return its length in bytes; else
- * return -1.
- *
- * The functions can assume that len > 0 and that *s != '\0', but they must
- * test for and reject zeroes in any additional bytes of a multibyte character.
- *
- * Note that this definition allows the function for a single-byte
- * encoding to be just "return 1".
- *-------------------------------------------------------------------
- */
-
-static int
-pg_ascii_verifier(const unsigned char *s, int len)
-{
-	return 1;
-}
-
-#define IS_EUC_RANGE_VALID(c)	((c) >= 0xa1 && (c) <= 0xfe)
-
-static int
-pg_eucjp_verifier(const unsigned char *s, int len)
-{
-	int			l;
-	unsigned char c1,
-				c2;
-
-	c1 = *s++;
-
-	switch (c1)
-	{
-		case SS2:				/* JIS X 0201 */
-			l = 2;
-			if (l > len)
-				return -1;
-			c2 = *s++;
-			if (c2 < 0xa1 || c2 > 0xdf)
-				return -1;
-			break;
-
-		case SS3:				/* JIS X 0212 */
-			l = 3;
-			if (l > len)
-				return -1;
-			c2 = *s++;
-			if (!IS_EUC_RANGE_VALID(c2))
-				return -1;
-			c2 = *s++;
-			if (!IS_EUC_RANGE_VALID(c2))
-				return -1;
-			break;
-
-		default:
-			if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
-			{
-				l = 2;
-				if (l > len)
-					return -1;
-				if (!IS_EUC_RANGE_VALID(c1))
-					return -1;
-				c2 = *s++;
-				if (!IS_EUC_RANGE_VALID(c2))
-					return -1;
-			}
-			else
-				/* must be ASCII */
-			{
-				l = 1;
-			}
-			break;
-	}
-
-	return l;
-}
-
-static int
-pg_euckr_verifier(const unsigned char *s, int len)
-{
-	int			l;
-	unsigned char c1,
-				c2;
-
-	c1 = *s++;
-
-	if (IS_HIGHBIT_SET(c1))
-	{
-		l = 2;
-		if (l > len)
-			return -1;
-		if (!IS_EUC_RANGE_VALID(c1))
-			return -1;
-		c2 = *s++;
-		if (!IS_EUC_RANGE_VALID(c2))
-			return -1;
-	}
-	else
-		/* must be ASCII */
-	{
-		l = 1;
-	}
-
-	return l;
-}
-
-/* EUC-CN byte sequences are exactly same as EUC-KR */
-#define pg_euccn_verifier	pg_euckr_verifier
-
-static int
-pg_euctw_verifier(const unsigned char *s, int len)
-{
-	int			l;
-	unsigned char c1,
-				c2;
-
-	c1 = *s++;
-
-	switch (c1)
-	{
-		case SS2:				/* CNS 11643 Plane 1-7 */
-			l = 4;
-			if (l > len)
-				return -1;
-			c2 = *s++;
-			if (c2 < 0xa1 || c2 > 0xa7)
-				return -1;
-			c2 = *s++;
-			if (!IS_EUC_RANGE_VALID(c2))
-				return -1;
-			c2 = *s++;
-			if (!IS_EUC_RANGE_VALID(c2))
-				return -1;
-			break;
-
-		case SS3:				/* unused */
-			return -1;
-
-		default:
-			if (IS_HIGHBIT_SET(c1)) /* CNS 11643 Plane 1 */
-			{
-				l = 2;
-				if (l > len)
-					return -1;
-				/* no further range check on c1? */
-				c2 = *s++;
-				if (!IS_EUC_RANGE_VALID(c2))
-					return -1;
-			}
-			else
-				/* must be ASCII */
-			{
-				l = 1;
-			}
-			break;
-	}
-	return l;
-}
-
-static int
-pg_johab_verifier(const unsigned char *s, int len)
-{
-	int			l,
-				mbl;
-	unsigned char c;
-
-	l = mbl = pg_johab_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	if (!IS_HIGHBIT_SET(*s))
-		return mbl;
-
-	while (--l > 0)
-	{
-		c = *++s;
-		if (!IS_EUC_RANGE_VALID(c))
-			return -1;
-	}
-	return mbl;
-}
-
-static int
-pg_mule_verifier(const unsigned char *s, int len)
-{
-	int			l,
-				mbl;
-	unsigned char c;
-
-	l = mbl = pg_mule_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	while (--l > 0)
-	{
-		c = *++s;
-		if (!IS_HIGHBIT_SET(c))
-			return -1;
-	}
-	return mbl;
-}
-
-static int
-pg_latin1_verifier(const unsigned char *s, int len)
-{
-	return 1;
-}
-
-static int
-pg_sjis_verifier(const unsigned char *s, int len)
-{
-	int			l,
-				mbl;
-	unsigned char c1,
-				c2;
-
-	l = mbl = pg_sjis_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	if (l == 1)					/* pg_sjis_mblen already verified it */
-		return mbl;
-
-	c1 = *s++;
-	c2 = *s;
-	if (!ISSJISHEAD(c1) || !ISSJISTAIL(c2))
-		return -1;
-	return mbl;
-}
-
-static int
-pg_big5_verifier(const unsigned char *s, int len)
-{
-	int			l,
-				mbl;
-
-	l = mbl = pg_big5_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	while (--l > 0)
-	{
-		if (*++s == '\0')
-			return -1;
-	}
-
-	return mbl;
-}
-
-static int
-pg_gbk_verifier(const unsigned char *s, int len)
-{
-	int			l,
-				mbl;
-
-	l = mbl = pg_gbk_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	while (--l > 0)
-	{
-		if (*++s == '\0')
-			return -1;
-	}
-
-	return mbl;
-}
-
-static int
-pg_uhc_verifier(const unsigned char *s, int len)
-{
-	int			l,
-				mbl;
-
-	l = mbl = pg_uhc_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	while (--l > 0)
-	{
-		if (*++s == '\0')
-			return -1;
-	}
-
-	return mbl;
-}
-
-static int
-pg_gb18030_verifier(const unsigned char *s, int len)
-{
-	int			l;
-
-	if (!IS_HIGHBIT_SET(*s))
-		l = 1;					/* ASCII */
-	else if (len >= 4 && *(s + 1) >= 0x30 && *(s + 1) <= 0x39)
-	{
-		/* Should be 4-byte, validate remaining bytes */
-		if (*s >= 0x81 && *s <= 0xfe &&
-			*(s + 2) >= 0x81 && *(s + 2) <= 0xfe &&
-			*(s + 3) >= 0x30 && *(s + 3) <= 0x39)
-			l = 4;
-		else
-			l = -1;
-	}
-	else if (len >= 2 && *s >= 0x81 && *s <= 0xfe)
-	{
-		/* Should be 2-byte, validate */
-		if ((*(s + 1) >= 0x40 && *(s + 1) <= 0x7e) ||
-			(*(s + 1) >= 0x80 && *(s + 1) <= 0xfe))
-			l = 2;
-		else
-			l = -1;
-	}
-	else
-		l = -1;
-	return l;
-}
-
-static int
-pg_utf8_verifier(const unsigned char *s, int len)
-{
-	int			l = pg_utf_mblen(s);
-
-	if (len < l)
-		return -1;
-
-	if (!pg_utf8_islegal(s, l))
-		return -1;
-
-	return l;
-}
-
-/*
- * Check for validity of a single UTF-8 encoded character
- *
- * This directly implements the rules in RFC3629.  The bizarre-looking
- * restrictions on the second byte are meant to ensure that there isn't
- * more than one encoding of a given Unicode character point; that is,
- * you may not use a longer-than-necessary byte sequence with high order
- * zero bits to represent a character that would fit in fewer bytes.
- * To do otherwise is to create security hazards (eg, create an apparent
- * non-ASCII character that decodes to plain ASCII).
- *
- * length is assumed to have been obtained by pg_utf_mblen(), and the
- * caller must have checked that that many bytes are present in the buffer.
- */
-bool
-pg_utf8_islegal(const unsigned char *source, int length)
-{
-	unsigned char a;
-
-	switch (length)
-	{
-		default:
-			/* reject lengths 5 and 6 for now */
-			return false;
-		case 4:
-			a = source[3];
-			if (a < 0x80 || a > 0xBF)
-				return false;
-			/* FALL THRU */
-		case 3:
-			a = source[2];
-			if (a < 0x80 || a > 0xBF)
-				return false;
-			/* FALL THRU */
-		case 2:
-			a = source[1];
-			switch (*source)
-			{
-				case 0xE0:
-					if (a < 0xA0 || a > 0xBF)
-						return false;
-					break;
-				case 0xED:
-					if (a < 0x80 || a > 0x9F)
-						return false;
-					break;
-				case 0xF0:
-					if (a < 0x90 || a > 0xBF)
-						return false;
-					break;
-				case 0xF4:
-					if (a < 0x80 || a > 0x8F)
-						return false;
-					break;
-				default:
-					if (a < 0x80 || a > 0xBF)
-						return false;
-					break;
-			}
-			/* FALL THRU */
-		case 1:
-			a = *source;
-			if (a >= 0x80 && a < 0xC2)
-				return false;
-			if (a > 0xF4)
-				return false;
-			break;
-	}
-	return true;
-}
-
-#ifndef FRONTEND
-
-/*
- * Generic character incrementer function.
- *
- * Not knowing anything about the properties of the encoding in use, we just
- * keep incrementing the last byte until we get a validly-encoded result,
- * or we run out of values to try.  We don't bother to try incrementing
- * higher-order bytes, so there's no growth in runtime for wider characters.
- * (If we did try to do that, we'd need to consider the likelihood that 255
- * is not a valid final byte in the encoding.)
- */
-static bool
-pg_generic_charinc(unsigned char *charptr, int len)
-{
-	unsigned char *lastbyte = charptr + len - 1;
-	mbverifier	mbverify;
-
-	/* We can just invoke the character verifier directly. */
-	mbverify = pg_wchar_table[GetDatabaseEncoding()].mbverify;
-
-	while (*lastbyte < (unsigned char) 255)
-	{
-		(*lastbyte)++;
-		if ((*mbverify) (charptr, len) == len)
-			return true;
-	}
-
-	return false;
-}
-
-/*
- * UTF-8 character incrementer function.
- *
- * For a one-byte character less than 0x7F, we just increment the byte.
- *
- * For a multibyte character, every byte but the first must fall between 0x80
- * and 0xBF; and the first byte must be between 0xC0 and 0xF4.  We increment
- * the last byte that's not already at its maximum value.  If we can't find a
- * byte that's less than the maximum allowable value, we simply fail.  We also
- * need some special-case logic to skip regions used for surrogate pair
- * handling, as those should not occur in valid UTF-8.
- *
- * Note that we don't reset lower-order bytes back to their minimums, since
- * we can't afford to make an exhaustive search (see make_greater_string).
- */
-static bool
-pg_utf8_increment(unsigned char *charptr, int length)
-{
-	unsigned char a;
-	unsigned char limit;
-
-	switch (length)
-	{
-		default:
-			/* reject lengths 5 and 6 for now */
-			return false;
-		case 4:
-			a = charptr[3];
-			if (a < 0xBF)
-			{
-				charptr[3]++;
-				break;
-			}
-			/* FALL THRU */
-		case 3:
-			a = charptr[2];
-			if (a < 0xBF)
-			{
-				charptr[2]++;
-				break;
-			}
-			/* FALL THRU */
-		case 2:
-			a = charptr[1];
-			switch (*charptr)
-			{
-				case 0xED:
-					limit = 0x9F;
-					break;
-				case 0xF4:
-					limit = 0x8F;
-					break;
-				default:
-					limit = 0xBF;
-					break;
-			}
-			if (a < limit)
-			{
-				charptr[1]++;
-				break;
-			}
-			/* FALL THRU */
-		case 1:
-			a = *charptr;
-			if (a == 0x7F || a == 0xDF || a == 0xEF || a == 0xF4)
-				return false;
-			charptr[0]++;
-			break;
-	}
-
-	return true;
-}
-
-/*
- * EUC-JP character incrementer function.
- *
- * If the sequence starts with SS2 (0x8e), it must be a two-byte sequence
- * representing JIS X 0201 characters with the second byte ranging between
- * 0xa1 and 0xdf.  We just increment the last byte if it's less than 0xdf,
- * and otherwise rewrite the whole sequence to 0xa1 0xa1.
- *
- * If the sequence starts with SS3 (0x8f), it must be a three-byte sequence
- * in which the last two bytes range between 0xa1 and 0xfe.  The last byte
- * is incremented if possible, otherwise the second-to-last byte.
- *
- * If the sequence starts with a value other than the above and its MSB
- * is set, it must be a two-byte sequence representing JIS X 0208 characters
- * with both bytes ranging between 0xa1 and 0xfe.  The last byte is
- * incremented if possible, otherwise the second-to-last byte.
- *
- * Otherwise, the sequence is a single-byte ASCII character. It is
- * incremented up to 0x7f.
- */
-static bool
-pg_eucjp_increment(unsigned char *charptr, int length)
-{
-	unsigned char c1,
-				c2;
-	int			i;
-
-	c1 = *charptr;
-
-	switch (c1)
-	{
-		case SS2:				/* JIS X 0201 */
-			if (length != 2)
-				return false;
-
-			c2 = charptr[1];
-
-			if (c2 >= 0xdf)
-				charptr[0] = charptr[1] = 0xa1;
-			else if (c2 < 0xa1)
-				charptr[1] = 0xa1;
-			else
-				charptr[1]++;
-			break;
-
-		case SS3:				/* JIS X 0212 */
-			if (length != 3)
-				return false;
-
-			for (i = 2; i > 0; i--)
-			{
-				c2 = charptr[i];
-				if (c2 < 0xa1)
-				{
-					charptr[i] = 0xa1;
-					return true;
-				}
-				else if (c2 < 0xfe)
-				{
-					charptr[i]++;
-					return true;
-				}
-			}
-
-			/* Out of 3-byte code region */
-			return false;
-
-		default:
-			if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
-			{
-				if (length != 2)
-					return false;
-
-				for (i = 1; i >= 0; i--)
-				{
-					c2 = charptr[i];
-					if (c2 < 0xa1)
-					{
-						charptr[i] = 0xa1;
-						return true;
-					}
-					else if (c2 < 0xfe)
-					{
-						charptr[i]++;
-						return true;
-					}
-				}
-
-				/* Out of 2 byte code region */
-				return false;
-			}
-			else
-			{					/* ASCII, single byte */
-				if (c1 > 0x7e)
-					return false;
-				(*charptr)++;
-			}
-			break;
-	}
-
-	return true;
-}
-#endif							/* !FRONTEND */
-
-
-/*
- *-------------------------------------------------------------------
- * encoding info table
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
- *-------------------------------------------------------------------
- */
-const pg_wchar_tbl pg_wchar_table[] = {
-	{pg_ascii2wchar_with_len, pg_wchar2single_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
-	{pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},	/* PG_EUC_JP */
-	{pg_euccn2wchar_with_len, pg_wchar2euc_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2},	/* PG_EUC_CN */
-	{pg_euckr2wchar_with_len, pg_wchar2euc_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3},	/* PG_EUC_KR */
-	{pg_euctw2wchar_with_len, pg_wchar2euc_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4},	/* PG_EUC_TW */
-	{pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},	/* PG_EUC_JIS_2004 */
-	{pg_utf2wchar_with_len, pg_wchar2utf_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4},	/* PG_UTF8 */
-	{pg_mule2wchar_with_len, pg_wchar2mule_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4},	/* PG_MULE_INTERNAL */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN1 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN2 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN3 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN4 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN5 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN6 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN7 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN8 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN9 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN10 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1256 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1258 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN866 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN874 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8R */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1251 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1252 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-5 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-6 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-7 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-8 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1250 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1253 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1254 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1255 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1257 */
-	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8U */
-	{0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
-	{0, 0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
-	{0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2},	/* PG_GBK */
-	{0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2},	/* PG_UHC */
-	{0, 0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4},	/* PG_GB18030 */
-	{0, 0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3},	/* PG_JOHAB */
-	{0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}	/* PG_SHIFT_JIS_2004 */
-};
-
-/* returns the byte length of a word for mule internal code */
-int
-pg_mic_mblen(const unsigned char *mbstr)
-{
-	return pg_mule_mblen(mbstr);
-}
-
-/*
- * Returns the byte length of a multibyte character.
- */
-int
-pg_encoding_mblen(int encoding, const char *mbstr)
-{
-	return (PG_VALID_ENCODING(encoding) ?
-			pg_wchar_table[encoding].mblen((const unsigned char *) mbstr) :
-			pg_wchar_table[PG_SQL_ASCII].mblen((const unsigned char *) mbstr));
-}
-
-/*
- * Returns the display length of a multibyte character.
- */
-int
-pg_encoding_dsplen(int encoding, const char *mbstr)
-{
-	return (PG_VALID_ENCODING(encoding) ?
-			pg_wchar_table[encoding].dsplen((const unsigned char *) mbstr) :
-			pg_wchar_table[PG_SQL_ASCII].dsplen((const unsigned char *) mbstr));
-}
-
-/*
- * Verify the first multibyte character of the given string.
- * Return its byte length if good, -1 if bad.  (See comments above for
- * full details of the mbverify API.)
- */
-int
-pg_encoding_verifymb(int encoding, const char *mbstr, int len)
-{
-	return (PG_VALID_ENCODING(encoding) ?
-			pg_wchar_table[encoding].mbverify((const unsigned char *) mbstr, len) :
-			pg_wchar_table[PG_SQL_ASCII].mbverify((const unsigned char *) mbstr, len));
-}
-
-/*
- * fetch maximum length of a given encoding
- */
-int
-pg_encoding_max_length(int encoding)
-{
-	Assert(PG_VALID_ENCODING(encoding));
-
-	return pg_wchar_table[encoding].maxmblen;
-}
-
-#ifndef FRONTEND
-
-/*
- * fetch maximum length of the encoding for the current database
- */
-int
-pg_database_encoding_max_length(void)
-{
-	return pg_wchar_table[GetDatabaseEncoding()].maxmblen;
-}
-
-/*
- * get the character incrementer for the encoding for the current database
- */
-mbcharacter_incrementer
-pg_database_encoding_character_incrementer(void)
-{
-	/*
-	 * Eventually it might be best to add a field to pg_wchar_table[], but for
-	 * now we just use a switch.
-	 */
-	switch (GetDatabaseEncoding())
-	{
-		case PG_UTF8:
-			return pg_utf8_increment;
-
-		case PG_EUC_JP:
-			return pg_eucjp_increment;
-
-		default:
-			return pg_generic_charinc;
-	}
-}
-
-/*
- * Verify mbstr to make sure that it is validly encoded in the current
- * database encoding.  Otherwise same as pg_verify_mbstr().
- */
-bool
-pg_verifymbstr(const char *mbstr, int len, bool noError)
-{
-	return
-		pg_verify_mbstr_len(GetDatabaseEncoding(), mbstr, len, noError) >= 0;
-}
-
-/*
- * Verify mbstr to make sure that it is validly encoded in the specified
- * encoding.
- */
-bool
-pg_verify_mbstr(int encoding, const char *mbstr, int len, bool noError)
-{
-	return pg_verify_mbstr_len(encoding, mbstr, len, noError) >= 0;
-}
-
-/*
- * Verify mbstr to make sure that it is validly encoded in the specified
- * encoding.
- *
- * mbstr is not necessarily zero terminated; length of mbstr is
- * specified by len.
- *
- * If OK, return length of string in the encoding.
- * If a problem is found, return -1 when noError is
- * true; when noError is false, ereport() a descriptive message.
- */
-int
-pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
-{
-	mbverifier	mbverify;
-	int			mb_len;
-
-	Assert(PG_VALID_ENCODING(encoding));
-
-	/*
-	 * In single-byte encodings, we need only reject nulls (\0).
-	 */
-	if (pg_encoding_max_length(encoding) <= 1)
-	{
-		const char *nullpos = memchr(mbstr, 0, len);
-
-		if (nullpos == NULL)
-			return len;
-		if (noError)
-			return -1;
-		report_invalid_encoding(encoding, nullpos, 1);
-	}
-
-	/* fetch function pointer just once */
-	mbverify = pg_wchar_table[encoding].mbverify;
-
-	mb_len = 0;
-
-	while (len > 0)
-	{
-		int			l;
-
-		/* fast path for ASCII-subset characters */
-		if (!IS_HIGHBIT_SET(*mbstr))
-		{
-			if (*mbstr != '\0')
-			{
-				mb_len++;
-				mbstr++;
-				len--;
-				continue;
-			}
-			if (noError)
-				return -1;
-			report_invalid_encoding(encoding, mbstr, len);
-		}
-
-		l = (*mbverify) ((const unsigned char *) mbstr, len);
-
-		if (l < 0)
-		{
-			if (noError)
-				return -1;
-			report_invalid_encoding(encoding, mbstr, len);
-		}
-
-		mbstr += l;
-		len -= l;
-		mb_len++;
-	}
-	return mb_len;
-}
-
-/*
- * check_encoding_conversion_args: check arguments of a conversion function
- *
- * "expected" arguments can be either an encoding ID or -1 to indicate that
- * the caller will check whether it accepts the ID.
- *
- * Note: the errors here are not really user-facing, so elog instead of
- * ereport seems sufficient.  Also, we trust that the "expected" encoding
- * arguments are valid encoding IDs, but we don't trust the actuals.
- */
-void
-check_encoding_conversion_args(int src_encoding,
-							   int dest_encoding,
-							   int len,
-							   int expected_src_encoding,
-							   int expected_dest_encoding)
-{
-	if (!PG_VALID_ENCODING(src_encoding))
-		elog(ERROR, "invalid source encoding ID: %d", src_encoding);
-	if (src_encoding != expected_src_encoding && expected_src_encoding >= 0)
-		elog(ERROR, "expected source encoding \"%s\", but got \"%s\"",
-			 pg_enc2name_tbl[expected_src_encoding].name,
-			 pg_enc2name_tbl[src_encoding].name);
-	if (!PG_VALID_ENCODING(dest_encoding))
-		elog(ERROR, "invalid destination encoding ID: %d", dest_encoding);
-	if (dest_encoding != expected_dest_encoding && expected_dest_encoding >= 0)
-		elog(ERROR, "expected destination encoding \"%s\", but got \"%s\"",
-			 pg_enc2name_tbl[expected_dest_encoding].name,
-			 pg_enc2name_tbl[dest_encoding].name);
-	if (len < 0)
-		elog(ERROR, "encoding conversion length must not be negative");
-}
-
-/*
- * report_invalid_encoding: complain about invalid multibyte character
- *
- * note: len is remaining length of string, not length of character;
- * len must be greater than zero, as we always examine the first byte.
- */
-void
-report_invalid_encoding(int encoding, const char *mbstr, int len)
-{
-	int			l = pg_encoding_mblen(encoding, mbstr);
-	char		buf[8 * 5 + 1];
-	char	   *p = buf;
-	int			j,
-				jlimit;
-
-	jlimit = Min(l, len);
-	jlimit = Min(jlimit, 8);	/* prevent buffer overrun */
-
-	for (j = 0; j < jlimit; j++)
-	{
-		p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
-		if (j < jlimit - 1)
-			p += sprintf(p, " ");
-	}
-
-	ereport(ERROR,
-			(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
-			 errmsg("invalid byte sequence for encoding \"%s\": %s",
-					pg_enc2name_tbl[encoding].name,
-					buf)));
-}
-
-/*
- * report_untranslatable_char: complain about untranslatable character
- *
- * note: len is remaining length of string, not length of character;
- * len must be greater than zero, as we always examine the first byte.
- */
-void
-report_untranslatable_char(int src_encoding, int dest_encoding,
-						   const char *mbstr, int len)
-{
-	int			l = pg_encoding_mblen(src_encoding, mbstr);
-	char		buf[8 * 5 + 1];
-	char	   *p = buf;
-	int			j,
-				jlimit;
-
-	jlimit = Min(l, len);
-	jlimit = Min(jlimit, 8);	/* prevent buffer overrun */
-
-	for (j = 0; j < jlimit; j++)
-	{
-		p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
-		if (j < jlimit - 1)
-			p += sprintf(p, " ");
-	}
-
-	ereport(ERROR,
-			(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-			 errmsg("character with byte sequence %s in encoding \"%s\" has no equivalent in encoding \"%s\"",
-					buf,
-					pg_enc2name_tbl[src_encoding].name,
-					pg_enc2name_tbl[dest_encoding].name)));
-}
-
-#endif							/* !FRONTEND */
diff --git a/src/bin/initdb/.gitignore b/src/bin/initdb/.gitignore
index 71a899f..b3167c4 100644
--- a/src/bin/initdb/.gitignore
+++ b/src/bin/initdb/.gitignore
@@ -1,4 +1,3 @@
-/encnames.c
 /localtime.c
 
 /initdb
diff --git a/src/bin/initdb/Makefile b/src/bin/initdb/Makefile
index f587a86..7e23754 100644
--- a/src/bin/initdb/Makefile
+++ b/src/bin/initdb/Makefile
@@ -18,7 +18,12 @@ include $(top_builddir)/src/Makefile.global
 
 override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) -I$(top_srcdir)/src/timezone $(CPPFLAGS)
 
-# note: we need libpq only because fe_utils does
+# Note: it's important that we link to encnames.o from libpgcommon, not
+# from libpq, else we have risks of version skew if we run with a libpq
+# shared library from a different PG version.  The libpq_pgport macro
+# should ensure that that happens.
+#
+# We need libpq only because fe_utils does.
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
 # use system timezone data?
@@ -28,7 +33,6 @@ endif
 
 OBJS = \
 	$(WIN32RES) \
-	encnames.o \
 	findtimezone.o \
 	initdb.o \
 	localtime.o
@@ -38,15 +42,7 @@ all: initdb
 initdb: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
-# We used to pull in all of libpq to get encnames.c, but that
-# exposes us to risks of version skew if we link to a shared library.
-# Do it the hard way, instead, so that we're statically linked.
-
-encnames.c: % : $(top_srcdir)/src/backend/utils/mb/%
-	rm -f $@ && $(LN_S) $< .
-
-# Likewise, pull in localtime.c from src/timezones
-
+# We must pull in localtime.c from src/timezones
 localtime.c: % : $(top_srcdir)/src/timezone/%
 	rm -f $@ && $(LN_S) $< .
 
@@ -60,7 +56,7 @@ uninstall:
 	rm -f '$(DESTDIR)$(bindir)/initdb$(X)'
 
 clean distclean maintainer-clean:
-	rm -f initdb$(X) $(OBJS) encnames.c localtime.c
+	rm -f initdb$(X) $(OBJS) localtime.c
 	rm -rf tmp_check
 
 # ensure that changes in datadir propagate into object file
diff --git a/src/common/Makefile b/src/common/Makefile
index ffb0f6e..5b44340 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -51,6 +51,7 @@ OBJS_COMMON = \
 	config_info.o \
 	controldata_utils.o \
 	d2s.o \
+	encnames.o \
 	exec.o \
 	f2s.o \
 	file_perm.o \
@@ -70,7 +71,8 @@ OBJS_COMMON = \
 	stringinfo.o \
 	unicode_norm.o \
 	username.o \
-	wait_error.o
+	wait_error.o \
+	wchar.o
 
 ifeq ($(with_openssl),yes)
 OBJS_COMMON += sha2_openssl.o
diff --git a/src/common/encnames.c b/src/common/encnames.c
new file mode 100644
index 0000000..2086e00
--- /dev/null
+++ b/src/common/encnames.c
@@ -0,0 +1,635 @@
+/*-------------------------------------------------------------------------
+ *
+ * encnames.c
+ *	  Encoding names and routines for working with them.
+ *
+ * Portions Copyright (c) 2001-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/encnames.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#include "utils/builtins.h"
+#endif
+
+#include <ctype.h>
+#include <unistd.h>
+
+#include "mb/pg_wchar.h"
+
+
+/* ----------
+ * All encoding names, sorted:		 *** A L P H A B E T I C ***
+ *
+ * All names must be without irrelevant chars, search routines use
+ * isalnum() chars only. It means ISO-8859-1, iso_8859-1 and Iso8859_1
+ * are always converted to 'iso88591'. All must be lower case.
+ *
+ * The table doesn't contain 'cs' aliases (like csISOLatin1). It's needed?
+ *
+ * Karel Zak, Aug 2001
+ * ----------
+ */
+typedef struct pg_encname
+{
+	const char *name;
+	pg_enc		encoding;
+} pg_encname;
+
+static const pg_encname pg_encname_tbl[] =
+{
+	{
+		"abc", PG_WIN1258
+	},							/* alias for WIN1258 */
+	{
+		"alt", PG_WIN866
+	},							/* IBM866 */
+	{
+		"big5", PG_BIG5
+	},							/* Big5; Chinese for Taiwan multibyte set */
+	{
+		"euccn", PG_EUC_CN
+	},							/* EUC-CN; Extended Unix Code for simplified
+								 * Chinese */
+	{
+		"eucjis2004", PG_EUC_JIS_2004
+	},							/* EUC-JIS-2004; Extended UNIX Code fixed
+								 * Width for Japanese, standard JIS X 0213 */
+	{
+		"eucjp", PG_EUC_JP
+	},							/* EUC-JP; Extended UNIX Code fixed Width for
+								 * Japanese, standard OSF */
+	{
+		"euckr", PG_EUC_KR
+	},							/* EUC-KR; Extended Unix Code for Korean , KS
+								 * X 1001 standard */
+	{
+		"euctw", PG_EUC_TW
+	},							/* EUC-TW; Extended Unix Code for
+								 *
+								 * traditional Chinese */
+	{
+		"gb18030", PG_GB18030
+	},							/* GB18030;GB18030 */
+	{
+		"gbk", PG_GBK
+	},							/* GBK; Chinese Windows CodePage 936
+								 * simplified Chinese */
+	{
+		"iso88591", PG_LATIN1
+	},							/* ISO-8859-1; RFC1345,KXS2 */
+	{
+		"iso885910", PG_LATIN6
+	},							/* ISO-8859-10; RFC1345,KXS2 */
+	{
+		"iso885913", PG_LATIN7
+	},							/* ISO-8859-13; RFC1345,KXS2 */
+	{
+		"iso885914", PG_LATIN8
+	},							/* ISO-8859-14; RFC1345,KXS2 */
+	{
+		"iso885915", PG_LATIN9
+	},							/* ISO-8859-15; RFC1345,KXS2 */
+	{
+		"iso885916", PG_LATIN10
+	},							/* ISO-8859-16; RFC1345,KXS2 */
+	{
+		"iso88592", PG_LATIN2
+	},							/* ISO-8859-2; RFC1345,KXS2 */
+	{
+		"iso88593", PG_LATIN3
+	},							/* ISO-8859-3; RFC1345,KXS2 */
+	{
+		"iso88594", PG_LATIN4
+	},							/* ISO-8859-4; RFC1345,KXS2 */
+	{
+		"iso88595", PG_ISO_8859_5
+	},							/* ISO-8859-5; RFC1345,KXS2 */
+	{
+		"iso88596", PG_ISO_8859_6
+	},							/* ISO-8859-6; RFC1345,KXS2 */
+	{
+		"iso88597", PG_ISO_8859_7
+	},							/* ISO-8859-7; RFC1345,KXS2 */
+	{
+		"iso88598", PG_ISO_8859_8
+	},							/* ISO-8859-8; RFC1345,KXS2 */
+	{
+		"iso88599", PG_LATIN5
+	},							/* ISO-8859-9; RFC1345,KXS2 */
+	{
+		"johab", PG_JOHAB
+	},							/* JOHAB; Extended Unix Code for simplified
+								 * Chinese */
+	{
+		"koi8", PG_KOI8R
+	},							/* _dirty_ alias for KOI8-R (backward
+								 * compatibility) */
+	{
+		"koi8r", PG_KOI8R
+	},							/* KOI8-R; RFC1489 */
+	{
+		"koi8u", PG_KOI8U
+	},							/* KOI8-U; RFC2319 */
+	{
+		"latin1", PG_LATIN1
+	},							/* alias for ISO-8859-1 */
+	{
+		"latin10", PG_LATIN10
+	},							/* alias for ISO-8859-16 */
+	{
+		"latin2", PG_LATIN2
+	},							/* alias for ISO-8859-2 */
+	{
+		"latin3", PG_LATIN3
+	},							/* alias for ISO-8859-3 */
+	{
+		"latin4", PG_LATIN4
+	},							/* alias for ISO-8859-4 */
+	{
+		"latin5", PG_LATIN5
+	},							/* alias for ISO-8859-9 */
+	{
+		"latin6", PG_LATIN6
+	},							/* alias for ISO-8859-10 */
+	{
+		"latin7", PG_LATIN7
+	},							/* alias for ISO-8859-13 */
+	{
+		"latin8", PG_LATIN8
+	},							/* alias for ISO-8859-14 */
+	{
+		"latin9", PG_LATIN9
+	},							/* alias for ISO-8859-15 */
+	{
+		"mskanji", PG_SJIS
+	},							/* alias for Shift_JIS */
+	{
+		"muleinternal", PG_MULE_INTERNAL
+	},
+	{
+		"shiftjis", PG_SJIS
+	},							/* Shift_JIS; JIS X 0202-1991 */
+
+	{
+		"shiftjis2004", PG_SHIFT_JIS_2004
+	},							/* SHIFT-JIS-2004; Shift JIS for Japanese,
+								 * standard JIS X 0213 */
+	{
+		"sjis", PG_SJIS
+	},							/* alias for Shift_JIS */
+	{
+		"sqlascii", PG_SQL_ASCII
+	},
+	{
+		"tcvn", PG_WIN1258
+	},							/* alias for WIN1258 */
+	{
+		"tcvn5712", PG_WIN1258
+	},							/* alias for WIN1258 */
+	{
+		"uhc", PG_UHC
+	},							/* UHC; Korean Windows CodePage 949 */
+	{
+		"unicode", PG_UTF8
+	},							/* alias for UTF8 */
+	{
+		"utf8", PG_UTF8
+	},							/* alias for UTF8 */
+	{
+		"vscii", PG_WIN1258
+	},							/* alias for WIN1258 */
+	{
+		"win", PG_WIN1251
+	},							/* _dirty_ alias for windows-1251 (backward
+								 * compatibility) */
+	{
+		"win1250", PG_WIN1250
+	},							/* alias for Windows-1250 */
+	{
+		"win1251", PG_WIN1251
+	},							/* alias for Windows-1251 */
+	{
+		"win1252", PG_WIN1252
+	},							/* alias for Windows-1252 */
+	{
+		"win1253", PG_WIN1253
+	},							/* alias for Windows-1253 */
+	{
+		"win1254", PG_WIN1254
+	},							/* alias for Windows-1254 */
+	{
+		"win1255", PG_WIN1255
+	},							/* alias for Windows-1255 */
+	{
+		"win1256", PG_WIN1256
+	},							/* alias for Windows-1256 */
+	{
+		"win1257", PG_WIN1257
+	},							/* alias for Windows-1257 */
+	{
+		"win1258", PG_WIN1258
+	},							/* alias for Windows-1258 */
+	{
+		"win866", PG_WIN866
+	},							/* IBM866 */
+	{
+		"win874", PG_WIN874
+	},							/* alias for Windows-874 */
+	{
+		"win932", PG_SJIS
+	},							/* alias for Shift_JIS */
+	{
+		"win936", PG_GBK
+	},							/* alias for GBK */
+	{
+		"win949", PG_UHC
+	},							/* alias for UHC */
+	{
+		"win950", PG_BIG5
+	},							/* alias for BIG5 */
+	{
+		"windows1250", PG_WIN1250
+	},							/* Windows-1251; Microsoft */
+	{
+		"windows1251", PG_WIN1251
+	},							/* Windows-1251; Microsoft */
+	{
+		"windows1252", PG_WIN1252
+	},							/* Windows-1252; Microsoft */
+	{
+		"windows1253", PG_WIN1253
+	},							/* Windows-1253; Microsoft */
+	{
+		"windows1254", PG_WIN1254
+	},							/* Windows-1254; Microsoft */
+	{
+		"windows1255", PG_WIN1255
+	},							/* Windows-1255; Microsoft */
+	{
+		"windows1256", PG_WIN1256
+	},							/* Windows-1256; Microsoft */
+	{
+		"windows1257", PG_WIN1257
+	},							/* Windows-1257; Microsoft */
+	{
+		"windows1258", PG_WIN1258
+	},							/* Windows-1258; Microsoft */
+	{
+		"windows866", PG_WIN866
+	},							/* IBM866 */
+	{
+		"windows874", PG_WIN874
+	},							/* Windows-874; Microsoft */
+	{
+		"windows932", PG_SJIS
+	},							/* alias for Shift_JIS */
+	{
+		"windows936", PG_GBK
+	},							/* alias for GBK */
+	{
+		"windows949", PG_UHC
+	},							/* alias for UHC */
+	{
+		"windows950", PG_BIG5
+	}							/* alias for BIG5 */
+};
+
+/* ----------
+ * These are "official" encoding names.
+ * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * ----------
+ */
+#ifndef WIN32
+#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
+#else
+#define DEF_ENC2NAME(name, codepage) { #name, PG_##name, codepage }
+#endif
+const pg_enc2name pg_enc2name_tbl[] =
+{
+	DEF_ENC2NAME(SQL_ASCII, 0),
+	DEF_ENC2NAME(EUC_JP, 20932),
+	DEF_ENC2NAME(EUC_CN, 20936),
+	DEF_ENC2NAME(EUC_KR, 51949),
+	DEF_ENC2NAME(EUC_TW, 0),
+	DEF_ENC2NAME(EUC_JIS_2004, 20932),
+	DEF_ENC2NAME(UTF8, 65001),
+	DEF_ENC2NAME(MULE_INTERNAL, 0),
+	DEF_ENC2NAME(LATIN1, 28591),
+	DEF_ENC2NAME(LATIN2, 28592),
+	DEF_ENC2NAME(LATIN3, 28593),
+	DEF_ENC2NAME(LATIN4, 28594),
+	DEF_ENC2NAME(LATIN5, 28599),
+	DEF_ENC2NAME(LATIN6, 0),
+	DEF_ENC2NAME(LATIN7, 0),
+	DEF_ENC2NAME(LATIN8, 0),
+	DEF_ENC2NAME(LATIN9, 28605),
+	DEF_ENC2NAME(LATIN10, 0),
+	DEF_ENC2NAME(WIN1256, 1256),
+	DEF_ENC2NAME(WIN1258, 1258),
+	DEF_ENC2NAME(WIN866, 866),
+	DEF_ENC2NAME(WIN874, 874),
+	DEF_ENC2NAME(KOI8R, 20866),
+	DEF_ENC2NAME(WIN1251, 1251),
+	DEF_ENC2NAME(WIN1252, 1252),
+	DEF_ENC2NAME(ISO_8859_5, 28595),
+	DEF_ENC2NAME(ISO_8859_6, 28596),
+	DEF_ENC2NAME(ISO_8859_7, 28597),
+	DEF_ENC2NAME(ISO_8859_8, 28598),
+	DEF_ENC2NAME(WIN1250, 1250),
+	DEF_ENC2NAME(WIN1253, 1253),
+	DEF_ENC2NAME(WIN1254, 1254),
+	DEF_ENC2NAME(WIN1255, 1255),
+	DEF_ENC2NAME(WIN1257, 1257),
+	DEF_ENC2NAME(KOI8U, 21866),
+	DEF_ENC2NAME(SJIS, 932),
+	DEF_ENC2NAME(BIG5, 950),
+	DEF_ENC2NAME(GBK, 936),
+	DEF_ENC2NAME(UHC, 949),
+	DEF_ENC2NAME(GB18030, 54936),
+	DEF_ENC2NAME(JOHAB, 0),
+	DEF_ENC2NAME(SHIFT_JIS_2004, 932)
+};
+
+/* ----------
+ * These are encoding names for gettext.
+ *
+ * This covers all encodings except MULE_INTERNAL, which is alien to gettext.
+ * ----------
+ */
+const pg_enc2gettext pg_enc2gettext_tbl[] =
+{
+	{PG_SQL_ASCII, "US-ASCII"},
+	{PG_UTF8, "UTF-8"},
+	{PG_LATIN1, "LATIN1"},
+	{PG_LATIN2, "LATIN2"},
+	{PG_LATIN3, "LATIN3"},
+	{PG_LATIN4, "LATIN4"},
+	{PG_ISO_8859_5, "ISO-8859-5"},
+	{PG_ISO_8859_6, "ISO_8859-6"},
+	{PG_ISO_8859_7, "ISO-8859-7"},
+	{PG_ISO_8859_8, "ISO-8859-8"},
+	{PG_LATIN5, "LATIN5"},
+	{PG_LATIN6, "LATIN6"},
+	{PG_LATIN7, "LATIN7"},
+	{PG_LATIN8, "LATIN8"},
+	{PG_LATIN9, "LATIN-9"},
+	{PG_LATIN10, "LATIN10"},
+	{PG_KOI8R, "KOI8-R"},
+	{PG_KOI8U, "KOI8-U"},
+	{PG_WIN1250, "CP1250"},
+	{PG_WIN1251, "CP1251"},
+	{PG_WIN1252, "CP1252"},
+	{PG_WIN1253, "CP1253"},
+	{PG_WIN1254, "CP1254"},
+	{PG_WIN1255, "CP1255"},
+	{PG_WIN1256, "CP1256"},
+	{PG_WIN1257, "CP1257"},
+	{PG_WIN1258, "CP1258"},
+	{PG_WIN866, "CP866"},
+	{PG_WIN874, "CP874"},
+	{PG_EUC_CN, "EUC-CN"},
+	{PG_EUC_JP, "EUC-JP"},
+	{PG_EUC_KR, "EUC-KR"},
+	{PG_EUC_TW, "EUC-TW"},
+	{PG_EUC_JIS_2004, "EUC-JP"},
+	{PG_SJIS, "SHIFT-JIS"},
+	{PG_BIG5, "BIG5"},
+	{PG_GBK, "GBK"},
+	{PG_UHC, "UHC"},
+	{PG_GB18030, "GB18030"},
+	{PG_JOHAB, "JOHAB"},
+	{PG_SHIFT_JIS_2004, "SHIFT_JISX0213"},
+	{0, NULL}
+};
+
+
+#ifndef FRONTEND
+
+/*
+ * Table of encoding names for ICU
+ *
+ * Reference: <https://ssl.icu-project.org/icu-bin/convexp>
+ *
+ * NULL entries are not supported by ICU, or their mapping is unclear.
+ */
+static const char *const pg_enc2icu_tbl[] =
+{
+	NULL,						/* PG_SQL_ASCII */
+	"EUC-JP",					/* PG_EUC_JP */
+	"EUC-CN",					/* PG_EUC_CN */
+	"EUC-KR",					/* PG_EUC_KR */
+	"EUC-TW",					/* PG_EUC_TW */
+	NULL,						/* PG_EUC_JIS_2004 */
+	"UTF-8",					/* PG_UTF8 */
+	NULL,						/* PG_MULE_INTERNAL */
+	"ISO-8859-1",				/* PG_LATIN1 */
+	"ISO-8859-2",				/* PG_LATIN2 */
+	"ISO-8859-3",				/* PG_LATIN3 */
+	"ISO-8859-4",				/* PG_LATIN4 */
+	"ISO-8859-9",				/* PG_LATIN5 */
+	"ISO-8859-10",				/* PG_LATIN6 */
+	"ISO-8859-13",				/* PG_LATIN7 */
+	"ISO-8859-14",				/* PG_LATIN8 */
+	"ISO-8859-15",				/* PG_LATIN9 */
+	NULL,						/* PG_LATIN10 */
+	"CP1256",					/* PG_WIN1256 */
+	"CP1258",					/* PG_WIN1258 */
+	"CP866",					/* PG_WIN866 */
+	NULL,						/* PG_WIN874 */
+	"KOI8-R",					/* PG_KOI8R */
+	"CP1251",					/* PG_WIN1251 */
+	"CP1252",					/* PG_WIN1252 */
+	"ISO-8859-5",				/* PG_ISO_8859_5 */
+	"ISO-8859-6",				/* PG_ISO_8859_6 */
+	"ISO-8859-7",				/* PG_ISO_8859_7 */
+	"ISO-8859-8",				/* PG_ISO_8859_8 */
+	"CP1250",					/* PG_WIN1250 */
+	"CP1253",					/* PG_WIN1253 */
+	"CP1254",					/* PG_WIN1254 */
+	"CP1255",					/* PG_WIN1255 */
+	"CP1257",					/* PG_WIN1257 */
+	"KOI8-U",					/* PG_KOI8U */
+};
+
+bool
+is_encoding_supported_by_icu(int encoding)
+{
+	return (pg_enc2icu_tbl[encoding] != NULL);
+}
+
+const char *
+get_encoding_name_for_icu(int encoding)
+{
+	const char *icu_encoding_name;
+
+	StaticAssertStmt(lengthof(pg_enc2icu_tbl) == PG_ENCODING_BE_LAST + 1,
+					 "pg_enc2icu_tbl incomplete");
+
+	icu_encoding_name = pg_enc2icu_tbl[encoding];
+
+	if (!icu_encoding_name)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("encoding \"%s\" not supported by ICU",
+						pg_encoding_to_char(encoding))));
+
+	return icu_encoding_name;
+}
+
+#endif							/* not FRONTEND */
+
+
+/* ----------
+ * Encoding checks, for error returns -1 else encoding id
+ * ----------
+ */
+int
+pg_valid_client_encoding(const char *name)
+{
+	int			enc;
+
+	if ((enc = pg_char_to_encoding(name)) < 0)
+		return -1;
+
+	if (!PG_VALID_FE_ENCODING(enc))
+		return -1;
+
+	return enc;
+}
+
+int
+pg_valid_server_encoding(const char *name)
+{
+	int			enc;
+
+	if ((enc = pg_char_to_encoding(name)) < 0)
+		return -1;
+
+	if (!PG_VALID_BE_ENCODING(enc))
+		return -1;
+
+	return enc;
+}
+
+int
+pg_valid_server_encoding_id(int encoding)
+{
+	return PG_VALID_BE_ENCODING(encoding);
+}
+
+/* ----------
+ * Remove irrelevant chars from encoding name
+ * ----------
+ */
+static char *
+clean_encoding_name(const char *key, char *newkey)
+{
+	const char *p;
+	char	   *np;
+
+	for (p = key, np = newkey; *p != '\0'; p++)
+	{
+		if (isalnum((unsigned char) *p))
+		{
+			if (*p >= 'A' && *p <= 'Z')
+				*np++ = *p + 'a' - 'A';
+			else
+				*np++ = *p;
+		}
+	}
+	*np = '\0';
+	return newkey;
+}
+
+/* ----------
+ * Search encoding by encoding name
+ *
+ * Returns encoding ID, or -1 for error
+ * ----------
+ */
+int
+pg_char_to_encoding(const char *name)
+{
+	unsigned int nel = lengthof(pg_encname_tbl);
+	const pg_encname *base = pg_encname_tbl,
+			   *last = base + nel - 1,
+			   *position;
+	int			result;
+	char		buff[NAMEDATALEN],
+			   *key;
+
+	if (name == NULL || *name == '\0')
+		return -1;
+
+	if (strlen(name) >= NAMEDATALEN)
+	{
+#ifdef FRONTEND
+		fprintf(stderr, "encoding name too long\n");
+		return -1;
+#else
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("encoding name too long")));
+#endif
+	}
+	key = clean_encoding_name(name, buff);
+
+	while (last >= base)
+	{
+		position = base + ((last - base) >> 1);
+		result = key[0] - position->name[0];
+
+		if (result == 0)
+		{
+			result = strcmp(key, position->name);
+			if (result == 0)
+				return position->encoding;
+		}
+		if (result < 0)
+			last = position - 1;
+		else
+			base = position + 1;
+	}
+	return -1;
+}
+
+#ifndef FRONTEND
+Datum
+PG_char_to_encoding(PG_FUNCTION_ARGS)
+{
+	Name		s = PG_GETARG_NAME(0);
+
+	PG_RETURN_INT32(pg_char_to_encoding(NameStr(*s)));
+}
+#endif
+
+const char *
+pg_encoding_to_char(int encoding)
+{
+	if (PG_VALID_ENCODING(encoding))
+	{
+		const pg_enc2name *p = &pg_enc2name_tbl[encoding];
+
+		Assert(encoding == p->encoding);
+		return p->name;
+	}
+	return "";
+}
+
+#ifndef FRONTEND
+Datum
+PG_encoding_to_char(PG_FUNCTION_ARGS)
+{
+	int32		encoding = PG_GETARG_INT32(0);
+	const char *encoding_name = pg_encoding_to_char(encoding);
+
+	return DirectFunctionCall1(namein, CStringGetDatum(encoding_name));
+}
+
+#endif
diff --git a/src/common/saslprep.c b/src/common/saslprep.c
index 2a2449e..7739b81 100644
--- a/src/common/saslprep.c
+++ b/src/common/saslprep.c
@@ -27,12 +27,6 @@
 
 #include "common/saslprep.h"
 #include "common/unicode_norm.h"
-
-/*
- * Note: The functions in this file depend on functions from
- * src/backend/utils/mb/wchar.c, so in order to use this in frontend
- * code, you will need to link that in, too.
- */
 #include "mb/pg_wchar.h"
 
 /*
diff --git a/src/common/wchar.c b/src/common/wchar.c
new file mode 100644
index 0000000..74a8823
--- /dev/null
+++ b/src/common/wchar.c
@@ -0,0 +1,2041 @@
+/*-------------------------------------------------------------------------
+ *
+ * wchar.c
+ *	  Functions for working with multibyte characters in various encodings.
+ *
+ * Portions Copyright (c) 1998-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/wchar.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#include "mb/pg_wchar.h"
+
+
+/*
+ * Operations on multi-byte encodings are driven by a table of helper
+ * functions.
+ *
+ * To add an encoding support, define mblen(), dsplen() and verifier() for
+ * the encoding.  For server-encodings, also define mb2wchar() and wchar2mb()
+ * conversion functions.
+ *
+ * These functions generally assume that their input is validly formed.
+ * The "verifier" functions, further down in the file, have to be more
+ * paranoid.
+ *
+ * We expect that mblen() does not need to examine more than the first byte
+ * of the character to discover the correct length.  GB18030 is an exception
+ * to that rule, though, as it also looks at second byte.  But even that
+ * behaves in a predictable way, if you only pass the first byte: it will
+ * treat 4-byte encoded characters as two 2-byte encoded characters, which is
+ * good enough for all current uses.
+ *
+ * Note: for the display output of psql to work properly, the return values
+ * of the dsplen functions must conform to the Unicode standard. In particular
+ * the NUL character is zero width and control characters are generally
+ * width -1. It is recommended that non-ASCII encodings refer their ASCII
+ * subset to the ASCII routines to ensure consistency.
+ */
+
+/*
+ * SQL/ASCII
+ */
+static int
+pg_ascii2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		*to++ = *from++;
+		len--;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+static int
+pg_ascii_mblen(const unsigned char *s)
+{
+	return 1;
+}
+
+static int
+pg_ascii_dsplen(const unsigned char *s)
+{
+	if (*s == '\0')
+		return 0;
+	if (*s < 0x20 || *s == 0x7f)
+		return -1;
+
+	return 1;
+}
+
+/*
+ * EUC
+ */
+static int
+pg_euc2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		if (*from == SS2 && len >= 2)	/* JIS X 0201 (so called "1 byte
+										 * KANA") */
+		{
+			from++;
+			*to = (SS2 << 8) | *from++;
+			len -= 2;
+		}
+		else if (*from == SS3 && len >= 3)	/* JIS X 0212 KANJI */
+		{
+			from++;
+			*to = (SS3 << 16) | (*from++ << 8);
+			*to |= *from++;
+			len -= 3;
+		}
+		else if (IS_HIGHBIT_SET(*from) && len >= 2) /* JIS X 0208 KANJI */
+		{
+			*to = *from++ << 8;
+			*to |= *from++;
+			len -= 2;
+		}
+		else					/* must be ASCII */
+		{
+			*to = *from++;
+			len--;
+		}
+		to++;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+static inline int
+pg_euc_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s == SS2)
+		len = 2;
+	else if (*s == SS3)
+		len = 3;
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = 1;
+	return len;
+}
+
+static inline int
+pg_euc_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s == SS2)
+		len = 2;
+	else if (*s == SS3)
+		len = 2;
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = pg_ascii_dsplen(s);
+	return len;
+}
+
+/*
+ * EUC_JP
+ */
+static int
+pg_eucjp2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	return pg_euc2wchar_with_len(from, to, len);
+}
+
+static int
+pg_eucjp_mblen(const unsigned char *s)
+{
+	return pg_euc_mblen(s);
+}
+
+static int
+pg_eucjp_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s == SS2)
+		len = 1;
+	else if (*s == SS3)
+		len = 2;
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = pg_ascii_dsplen(s);
+	return len;
+}
+
+/*
+ * EUC_KR
+ */
+static int
+pg_euckr2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	return pg_euc2wchar_with_len(from, to, len);
+}
+
+static int
+pg_euckr_mblen(const unsigned char *s)
+{
+	return pg_euc_mblen(s);
+}
+
+static int
+pg_euckr_dsplen(const unsigned char *s)
+{
+	return pg_euc_dsplen(s);
+}
+
+/*
+ * EUC_CN
+ *
+ */
+static int
+pg_euccn2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		if (*from == SS2 && len >= 3)	/* code set 2 (unused?) */
+		{
+			from++;
+			*to = (SS2 << 16) | (*from++ << 8);
+			*to |= *from++;
+			len -= 3;
+		}
+		else if (*from == SS3 && len >= 3)	/* code set 3 (unused ?) */
+		{
+			from++;
+			*to = (SS3 << 16) | (*from++ << 8);
+			*to |= *from++;
+			len -= 3;
+		}
+		else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 1 */
+		{
+			*to = *from++ << 8;
+			*to |= *from++;
+			len -= 2;
+		}
+		else
+		{
+			*to = *from++;
+			len--;
+		}
+		to++;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+static int
+pg_euccn_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = 1;
+	return len;
+}
+
+static int
+pg_euccn_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = pg_ascii_dsplen(s);
+	return len;
+}
+
+/*
+ * EUC_TW
+ *
+ */
+static int
+pg_euctw2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		if (*from == SS2 && len >= 4)	/* code set 2 */
+		{
+			from++;
+			*to = (((uint32) SS2) << 24) | (*from++ << 16);
+			*to |= *from++ << 8;
+			*to |= *from++;
+			len -= 4;
+		}
+		else if (*from == SS3 && len >= 3)	/* code set 3 (unused?) */
+		{
+			from++;
+			*to = (SS3 << 16) | (*from++ << 8);
+			*to |= *from++;
+			len -= 3;
+		}
+		else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 2 */
+		{
+			*to = *from++ << 8;
+			*to |= *from++;
+			len -= 2;
+		}
+		else
+		{
+			*to = *from++;
+			len--;
+		}
+		to++;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+static int
+pg_euctw_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s == SS2)
+		len = 4;
+	else if (*s == SS3)
+		len = 3;
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = 1;
+	return len;
+}
+
+static int
+pg_euctw_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s == SS2)
+		len = 2;
+	else if (*s == SS3)
+		len = 2;
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = pg_ascii_dsplen(s);
+	return len;
+}
+
+/*
+ * Convert pg_wchar to EUC_* encoding.
+ * caller must allocate enough space for "to", including a trailing zero!
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2euc_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		unsigned char c;
+
+		if ((c = (*from >> 24)))
+		{
+			*to++ = c;
+			*to++ = (*from >> 16) & 0xff;
+			*to++ = (*from >> 8) & 0xff;
+			*to++ = *from & 0xff;
+			cnt += 4;
+		}
+		else if ((c = (*from >> 16)))
+		{
+			*to++ = c;
+			*to++ = (*from >> 8) & 0xff;
+			*to++ = *from & 0xff;
+			cnt += 3;
+		}
+		else if ((c = (*from >> 8)))
+		{
+			*to++ = c;
+			*to++ = *from & 0xff;
+			cnt += 2;
+		}
+		else
+		{
+			*to++ = *from;
+			cnt++;
+		}
+		from++;
+		len--;
+	}
+	*to = 0;
+	return cnt;
+}
+
+
+/*
+ * JOHAB
+ */
+static int
+pg_johab_mblen(const unsigned char *s)
+{
+	return pg_euc_mblen(s);
+}
+
+static int
+pg_johab_dsplen(const unsigned char *s)
+{
+	return pg_euc_dsplen(s);
+}
+
+/*
+ * convert UTF8 string to pg_wchar (UCS-4)
+ * caller must allocate enough space for "to", including a trailing zero!
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_utf2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+	uint32		c1,
+				c2,
+				c3,
+				c4;
+
+	while (len > 0 && *from)
+	{
+		if ((*from & 0x80) == 0)
+		{
+			*to = *from++;
+			len--;
+		}
+		else if ((*from & 0xe0) == 0xc0)
+		{
+			if (len < 2)
+				break;			/* drop trailing incomplete char */
+			c1 = *from++ & 0x1f;
+			c2 = *from++ & 0x3f;
+			*to = (c1 << 6) | c2;
+			len -= 2;
+		}
+		else if ((*from & 0xf0) == 0xe0)
+		{
+			if (len < 3)
+				break;			/* drop trailing incomplete char */
+			c1 = *from++ & 0x0f;
+			c2 = *from++ & 0x3f;
+			c3 = *from++ & 0x3f;
+			*to = (c1 << 12) | (c2 << 6) | c3;
+			len -= 3;
+		}
+		else if ((*from & 0xf8) == 0xf0)
+		{
+			if (len < 4)
+				break;			/* drop trailing incomplete char */
+			c1 = *from++ & 0x07;
+			c2 = *from++ & 0x3f;
+			c3 = *from++ & 0x3f;
+			c4 = *from++ & 0x3f;
+			*to = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
+			len -= 4;
+		}
+		else
+		{
+			/* treat a bogus char as length 1; not ours to raise error */
+			*to = *from++;
+			len--;
+		}
+		to++;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+
+/*
+ * Map a Unicode code point to UTF-8.  utf8string must have 4 bytes of
+ * space allocated.
+ */
+unsigned char *
+unicode_to_utf8(pg_wchar c, unsigned char *utf8string)
+{
+	if (c <= 0x7F)
+	{
+		utf8string[0] = c;
+	}
+	else if (c <= 0x7FF)
+	{
+		utf8string[0] = 0xC0 | ((c >> 6) & 0x1F);
+		utf8string[1] = 0x80 | (c & 0x3F);
+	}
+	else if (c <= 0xFFFF)
+	{
+		utf8string[0] = 0xE0 | ((c >> 12) & 0x0F);
+		utf8string[1] = 0x80 | ((c >> 6) & 0x3F);
+		utf8string[2] = 0x80 | (c & 0x3F);
+	}
+	else
+	{
+		utf8string[0] = 0xF0 | ((c >> 18) & 0x07);
+		utf8string[1] = 0x80 | ((c >> 12) & 0x3F);
+		utf8string[2] = 0x80 | ((c >> 6) & 0x3F);
+		utf8string[3] = 0x80 | (c & 0x3F);
+	}
+
+	return utf8string;
+}
+
+/*
+ * Trivial conversion from pg_wchar to UTF-8.
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2utf_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		int			char_len;
+
+		unicode_to_utf8(*from, to);
+		char_len = pg_utf_mblen(to);
+		cnt += char_len;
+		to += char_len;
+		from++;
+		len--;
+	}
+	*to = 0;
+	return cnt;
+}
+
+/*
+ * Return the byte length of a UTF8 character pointed to by s
+ *
+ * Note: in the current implementation we do not support UTF8 sequences
+ * of more than 4 bytes; hence do NOT return a value larger than 4.
+ * We return "1" for any leading byte that is either flat-out illegal or
+ * indicates a length larger than we support.
+ *
+ * pg_utf2wchar_with_len(), utf8_to_unicode(), pg_utf8_islegal(), and perhaps
+ * other places would need to be fixed to change this.
+ */
+int
+pg_utf_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if ((*s & 0x80) == 0)
+		len = 1;
+	else if ((*s & 0xe0) == 0xc0)
+		len = 2;
+	else if ((*s & 0xf0) == 0xe0)
+		len = 3;
+	else if ((*s & 0xf8) == 0xf0)
+		len = 4;
+#ifdef NOT_USED
+	else if ((*s & 0xfc) == 0xf8)
+		len = 5;
+	else if ((*s & 0xfe) == 0xfc)
+		len = 6;
+#endif
+	else
+		len = 1;
+	return len;
+}
+
+/*
+ * This is an implementation of wcwidth() and wcswidth() as defined in
+ * "The Single UNIX Specification, Version 2, The Open Group, 1997"
+ * <http://www.unix.org/online.html>
+ *
+ * Markus Kuhn -- 2001-09-08 -- public domain
+ *
+ * customised for PostgreSQL
+ *
+ * original available at : http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
+ */
+
+struct mbinterval
+{
+	unsigned short first;
+	unsigned short last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int
+mbbisearch(pg_wchar ucs, const struct mbinterval *table, int max)
+{
+	int			min = 0;
+	int			mid;
+
+	if (ucs < table[0].first || ucs > table[max].last)
+		return 0;
+	while (max >= min)
+	{
+		mid = (min + max) / 2;
+		if (ucs > table[mid].last)
+			min = mid + 1;
+		else if (ucs < table[mid].first)
+			max = mid - 1;
+		else
+			return 1;
+	}
+
+	return 0;
+}
+
+
+/* The following functions define the column width of an ISO 10646
+ * character as follows:
+ *
+ *	  - The null character (U+0000) has a column width of 0.
+ *
+ *	  - Other C0/C1 control characters and DEL will lead to a return
+ *		value of -1.
+ *
+ *	  - Non-spacing and enclosing combining characters (general
+ *		category code Mn or Me in the Unicode database) have a
+ *		column width of 0.
+ *
+ *	  - Other format characters (general category code Cf in the Unicode
+ *		database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
+ *
+ *	  - Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
+ *		have a column width of 0.
+ *
+ *	  - Spacing characters in the East Asian Wide (W) or East Asian
+ *		FullWidth (F) category as defined in Unicode Technical
+ *		Report #11 have a column width of 2.
+ *
+ *	  - All remaining characters (including all printable
+ *		ISO 8859-1 and WGL4 characters, Unicode control characters,
+ *		etc.) have a column width of 1.
+ *
+ * This implementation assumes that wchar_t characters are encoded
+ * in ISO 10646.
+ */
+
+static int
+ucs_wcwidth(pg_wchar ucs)
+{
+#include "common/unicode_combining_table.h"
+
+	/* test for 8-bit control characters */
+	if (ucs == 0)
+		return 0;
+
+	if (ucs < 0x20 || (ucs >= 0x7f && ucs < 0xa0) || ucs > 0x0010ffff)
+		return -1;
+
+	/* binary search in table of non-spacing characters */
+	if (mbbisearch(ucs, combining,
+				   sizeof(combining) / sizeof(struct mbinterval) - 1))
+		return 0;
+
+	/*
+	 * if we arrive here, ucs is not a combining or C0/C1 control character
+	 */
+
+	return 1 +
+		(ucs >= 0x1100 &&
+		 (ucs <= 0x115f ||		/* Hangul Jamo init. consonants */
+		  (ucs >= 0x2e80 && ucs <= 0xa4cf && (ucs & ~0x0011) != 0x300a &&
+		   ucs != 0x303f) ||	/* CJK ... Yi */
+		  (ucs >= 0xac00 && ucs <= 0xd7a3) ||	/* Hangul Syllables */
+		  (ucs >= 0xf900 && ucs <= 0xfaff) ||	/* CJK Compatibility
+												 * Ideographs */
+		  (ucs >= 0xfe30 && ucs <= 0xfe6f) ||	/* CJK Compatibility Forms */
+		  (ucs >= 0xff00 && ucs <= 0xff5f) ||	/* Fullwidth Forms */
+		  (ucs >= 0xffe0 && ucs <= 0xffe6) ||
+		  (ucs >= 0x20000 && ucs <= 0x2ffff)));
+}
+
+/*
+ * Convert a UTF-8 character to a Unicode code point.
+ * This is a one-character version of pg_utf2wchar_with_len.
+ *
+ * No error checks here, c must point to a long-enough string.
+ */
+pg_wchar
+utf8_to_unicode(const unsigned char *c)
+{
+	if ((*c & 0x80) == 0)
+		return (pg_wchar) c[0];
+	else if ((*c & 0xe0) == 0xc0)
+		return (pg_wchar) (((c[0] & 0x1f) << 6) |
+						   (c[1] & 0x3f));
+	else if ((*c & 0xf0) == 0xe0)
+		return (pg_wchar) (((c[0] & 0x0f) << 12) |
+						   ((c[1] & 0x3f) << 6) |
+						   (c[2] & 0x3f));
+	else if ((*c & 0xf8) == 0xf0)
+		return (pg_wchar) (((c[0] & 0x07) << 18) |
+						   ((c[1] & 0x3f) << 12) |
+						   ((c[2] & 0x3f) << 6) |
+						   (c[3] & 0x3f));
+	else
+		/* that is an invalid code on purpose */
+		return 0xffffffff;
+}
+
+static int
+pg_utf_dsplen(const unsigned char *s)
+{
+	return ucs_wcwidth(utf8_to_unicode(s));
+}
+
+/*
+ * convert mule internal code to pg_wchar
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		if (IS_LC1(*from) && len >= 2)
+		{
+			*to = *from++ << 16;
+			*to |= *from++;
+			len -= 2;
+		}
+		else if (IS_LCPRV1(*from) && len >= 3)
+		{
+			from++;
+			*to = *from++ << 16;
+			*to |= *from++;
+			len -= 3;
+		}
+		else if (IS_LC2(*from) && len >= 3)
+		{
+			*to = *from++ << 16;
+			*to |= *from++ << 8;
+			*to |= *from++;
+			len -= 3;
+		}
+		else if (IS_LCPRV2(*from) && len >= 4)
+		{
+			from++;
+			*to = *from++ << 16;
+			*to |= *from++ << 8;
+			*to |= *from++;
+			len -= 4;
+		}
+		else
+		{						/* assume ASCII */
+			*to = (unsigned char) *from++;
+			len--;
+		}
+		to++;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+/*
+ * convert pg_wchar to mule internal code
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2mule_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		unsigned char lb;
+
+		lb = (*from >> 16) & 0xff;
+		if (IS_LC1(lb))
+		{
+			*to++ = lb;
+			*to++ = *from & 0xff;
+			cnt += 2;
+		}
+		else if (IS_LC2(lb))
+		{
+			*to++ = lb;
+			*to++ = (*from >> 8) & 0xff;
+			*to++ = *from & 0xff;
+			cnt += 3;
+		}
+		else if (IS_LCPRV1_A_RANGE(lb))
+		{
+			*to++ = LCPRV1_A;
+			*to++ = lb;
+			*to++ = *from & 0xff;
+			cnt += 3;
+		}
+		else if (IS_LCPRV1_B_RANGE(lb))
+		{
+			*to++ = LCPRV1_B;
+			*to++ = lb;
+			*to++ = *from & 0xff;
+			cnt += 3;
+		}
+		else if (IS_LCPRV2_A_RANGE(lb))
+		{
+			*to++ = LCPRV2_A;
+			*to++ = lb;
+			*to++ = (*from >> 8) & 0xff;
+			*to++ = *from & 0xff;
+			cnt += 4;
+		}
+		else if (IS_LCPRV2_B_RANGE(lb))
+		{
+			*to++ = LCPRV2_B;
+			*to++ = lb;
+			*to++ = (*from >> 8) & 0xff;
+			*to++ = *from & 0xff;
+			cnt += 4;
+		}
+		else
+		{
+			*to++ = *from & 0xff;
+			cnt += 1;
+		}
+		from++;
+		len--;
+	}
+	*to = 0;
+	return cnt;
+}
+
+int
+pg_mule_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_LC1(*s))
+		len = 2;
+	else if (IS_LCPRV1(*s))
+		len = 3;
+	else if (IS_LC2(*s))
+		len = 3;
+	else if (IS_LCPRV2(*s))
+		len = 4;
+	else
+		len = 1;				/* assume ASCII */
+	return len;
+}
+
+static int
+pg_mule_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	/*
+	 * Note: it's not really appropriate to assume that all multibyte charsets
+	 * are double-wide on screen.  But this seems an okay approximation for
+	 * the MULE charsets we currently support.
+	 */
+
+	if (IS_LC1(*s))
+		len = 1;
+	else if (IS_LCPRV1(*s))
+		len = 1;
+	else if (IS_LC2(*s))
+		len = 2;
+	else if (IS_LCPRV2(*s))
+		len = 2;
+	else
+		len = 1;				/* assume ASCII */
+
+	return len;
+}
+
+/*
+ * ISO8859-1
+ */
+static int
+pg_latin12wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		*to++ = *from++;
+		len--;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+/*
+ * Trivial conversion from pg_wchar to single byte encoding. Just ignores
+ * high bits.
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2single_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+	int			cnt = 0;
+
+	while (len > 0 && *from)
+	{
+		*to++ = *from++;
+		len--;
+		cnt++;
+	}
+	*to = 0;
+	return cnt;
+}
+
+static int
+pg_latin1_mblen(const unsigned char *s)
+{
+	return 1;
+}
+
+static int
+pg_latin1_dsplen(const unsigned char *s)
+{
+	return pg_ascii_dsplen(s);
+}
+
+/*
+ * SJIS
+ */
+static int
+pg_sjis_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s >= 0xa1 && *s <= 0xdf)
+		len = 1;				/* 1 byte kana? */
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* kanji? */
+	else
+		len = 1;				/* should be ASCII */
+	return len;
+}
+
+static int
+pg_sjis_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (*s >= 0xa1 && *s <= 0xdf)
+		len = 1;				/* 1 byte kana? */
+	else if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* kanji? */
+	else
+		len = pg_ascii_dsplen(s);	/* should be ASCII */
+	return len;
+}
+
+/*
+ * Big5
+ */
+static int
+pg_big5_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* kanji? */
+	else
+		len = 1;				/* should be ASCII */
+	return len;
+}
+
+static int
+pg_big5_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* kanji? */
+	else
+		len = pg_ascii_dsplen(s);	/* should be ASCII */
+	return len;
+}
+
+/*
+ * GBK
+ */
+static int
+pg_gbk_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* kanji? */
+	else
+		len = 1;				/* should be ASCII */
+	return len;
+}
+
+static int
+pg_gbk_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* kanji? */
+	else
+		len = pg_ascii_dsplen(s);	/* should be ASCII */
+	return len;
+}
+
+/*
+ * UHC
+ */
+static int
+pg_uhc_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* 2byte? */
+	else
+		len = 1;				/* should be ASCII */
+	return len;
+}
+
+static int
+pg_uhc_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;				/* 2byte? */
+	else
+		len = pg_ascii_dsplen(s);	/* should be ASCII */
+	return len;
+}
+
+/*
+ * GB18030
+ *	Added by Bill Huang <bhuang@redhat.com>,<bill_huanghb@ybb.ne.jp>
+ */
+
+/*
+ * Unlike all other mblen() functions, this also looks at the second byte of
+ * the input.  However, if you only pass the first byte of a multi-byte
+ * string, and \0 as the second byte, this still works in a predictable way:
+ * a 4-byte character will be reported as two 2-byte characters.  That's
+ * enough for all current uses, as a client-only encoding.  It works that
+ * way, because in any valid 4-byte GB18030-encoded character, the third and
+ * fourth byte look like a 2-byte encoded character, when looked at
+ * separately.
+ */
+static int
+pg_gb18030_mblen(const unsigned char *s)
+{
+	int			len;
+
+	if (!IS_HIGHBIT_SET(*s))
+		len = 1;				/* ASCII */
+	else if (*(s + 1) >= 0x30 && *(s + 1) <= 0x39)
+		len = 4;
+	else
+		len = 2;
+	return len;
+}
+
+static int
+pg_gb18030_dsplen(const unsigned char *s)
+{
+	int			len;
+
+	if (IS_HIGHBIT_SET(*s))
+		len = 2;
+	else
+		len = pg_ascii_dsplen(s);	/* ASCII */
+	return len;
+}
+
+/*
+ *-------------------------------------------------------------------
+ * multibyte sequence validators
+ *
+ * These functions accept "s", a pointer to the first byte of a string,
+ * and "len", the remaining length of the string.  If there is a validly
+ * encoded character beginning at *s, return its length in bytes; else
+ * return -1.
+ *
+ * The functions can assume that len > 0 and that *s != '\0', but they must
+ * test for and reject zeroes in any additional bytes of a multibyte character.
+ *
+ * Note that this definition allows the function for a single-byte
+ * encoding to be just "return 1".
+ *-------------------------------------------------------------------
+ */
+
+static int
+pg_ascii_verifier(const unsigned char *s, int len)
+{
+	return 1;
+}
+
+#define IS_EUC_RANGE_VALID(c)	((c) >= 0xa1 && (c) <= 0xfe)
+
+static int
+pg_eucjp_verifier(const unsigned char *s, int len)
+{
+	int			l;
+	unsigned char c1,
+				c2;
+
+	c1 = *s++;
+
+	switch (c1)
+	{
+		case SS2:				/* JIS X 0201 */
+			l = 2;
+			if (l > len)
+				return -1;
+			c2 = *s++;
+			if (c2 < 0xa1 || c2 > 0xdf)
+				return -1;
+			break;
+
+		case SS3:				/* JIS X 0212 */
+			l = 3;
+			if (l > len)
+				return -1;
+			c2 = *s++;
+			if (!IS_EUC_RANGE_VALID(c2))
+				return -1;
+			c2 = *s++;
+			if (!IS_EUC_RANGE_VALID(c2))
+				return -1;
+			break;
+
+		default:
+			if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
+			{
+				l = 2;
+				if (l > len)
+					return -1;
+				if (!IS_EUC_RANGE_VALID(c1))
+					return -1;
+				c2 = *s++;
+				if (!IS_EUC_RANGE_VALID(c2))
+					return -1;
+			}
+			else
+				/* must be ASCII */
+			{
+				l = 1;
+			}
+			break;
+	}
+
+	return l;
+}
+
+static int
+pg_euckr_verifier(const unsigned char *s, int len)
+{
+	int			l;
+	unsigned char c1,
+				c2;
+
+	c1 = *s++;
+
+	if (IS_HIGHBIT_SET(c1))
+	{
+		l = 2;
+		if (l > len)
+			return -1;
+		if (!IS_EUC_RANGE_VALID(c1))
+			return -1;
+		c2 = *s++;
+		if (!IS_EUC_RANGE_VALID(c2))
+			return -1;
+	}
+	else
+		/* must be ASCII */
+	{
+		l = 1;
+	}
+
+	return l;
+}
+
+/* EUC-CN byte sequences are exactly same as EUC-KR */
+#define pg_euccn_verifier	pg_euckr_verifier
+
+static int
+pg_euctw_verifier(const unsigned char *s, int len)
+{
+	int			l;
+	unsigned char c1,
+				c2;
+
+	c1 = *s++;
+
+	switch (c1)
+	{
+		case SS2:				/* CNS 11643 Plane 1-7 */
+			l = 4;
+			if (l > len)
+				return -1;
+			c2 = *s++;
+			if (c2 < 0xa1 || c2 > 0xa7)
+				return -1;
+			c2 = *s++;
+			if (!IS_EUC_RANGE_VALID(c2))
+				return -1;
+			c2 = *s++;
+			if (!IS_EUC_RANGE_VALID(c2))
+				return -1;
+			break;
+
+		case SS3:				/* unused */
+			return -1;
+
+		default:
+			if (IS_HIGHBIT_SET(c1)) /* CNS 11643 Plane 1 */
+			{
+				l = 2;
+				if (l > len)
+					return -1;
+				/* no further range check on c1? */
+				c2 = *s++;
+				if (!IS_EUC_RANGE_VALID(c2))
+					return -1;
+			}
+			else
+				/* must be ASCII */
+			{
+				l = 1;
+			}
+			break;
+	}
+	return l;
+}
+
+static int
+pg_johab_verifier(const unsigned char *s, int len)
+{
+	int			l,
+				mbl;
+	unsigned char c;
+
+	l = mbl = pg_johab_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	if (!IS_HIGHBIT_SET(*s))
+		return mbl;
+
+	while (--l > 0)
+	{
+		c = *++s;
+		if (!IS_EUC_RANGE_VALID(c))
+			return -1;
+	}
+	return mbl;
+}
+
+static int
+pg_mule_verifier(const unsigned char *s, int len)
+{
+	int			l,
+				mbl;
+	unsigned char c;
+
+	l = mbl = pg_mule_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	while (--l > 0)
+	{
+		c = *++s;
+		if (!IS_HIGHBIT_SET(c))
+			return -1;
+	}
+	return mbl;
+}
+
+static int
+pg_latin1_verifier(const unsigned char *s, int len)
+{
+	return 1;
+}
+
+static int
+pg_sjis_verifier(const unsigned char *s, int len)
+{
+	int			l,
+				mbl;
+	unsigned char c1,
+				c2;
+
+	l = mbl = pg_sjis_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	if (l == 1)					/* pg_sjis_mblen already verified it */
+		return mbl;
+
+	c1 = *s++;
+	c2 = *s;
+	if (!ISSJISHEAD(c1) || !ISSJISTAIL(c2))
+		return -1;
+	return mbl;
+}
+
+static int
+pg_big5_verifier(const unsigned char *s, int len)
+{
+	int			l,
+				mbl;
+
+	l = mbl = pg_big5_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	while (--l > 0)
+	{
+		if (*++s == '\0')
+			return -1;
+	}
+
+	return mbl;
+}
+
+static int
+pg_gbk_verifier(const unsigned char *s, int len)
+{
+	int			l,
+				mbl;
+
+	l = mbl = pg_gbk_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	while (--l > 0)
+	{
+		if (*++s == '\0')
+			return -1;
+	}
+
+	return mbl;
+}
+
+static int
+pg_uhc_verifier(const unsigned char *s, int len)
+{
+	int			l,
+				mbl;
+
+	l = mbl = pg_uhc_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	while (--l > 0)
+	{
+		if (*++s == '\0')
+			return -1;
+	}
+
+	return mbl;
+}
+
+static int
+pg_gb18030_verifier(const unsigned char *s, int len)
+{
+	int			l;
+
+	if (!IS_HIGHBIT_SET(*s))
+		l = 1;					/* ASCII */
+	else if (len >= 4 && *(s + 1) >= 0x30 && *(s + 1) <= 0x39)
+	{
+		/* Should be 4-byte, validate remaining bytes */
+		if (*s >= 0x81 && *s <= 0xfe &&
+			*(s + 2) >= 0x81 && *(s + 2) <= 0xfe &&
+			*(s + 3) >= 0x30 && *(s + 3) <= 0x39)
+			l = 4;
+		else
+			l = -1;
+	}
+	else if (len >= 2 && *s >= 0x81 && *s <= 0xfe)
+	{
+		/* Should be 2-byte, validate */
+		if ((*(s + 1) >= 0x40 && *(s + 1) <= 0x7e) ||
+			(*(s + 1) >= 0x80 && *(s + 1) <= 0xfe))
+			l = 2;
+		else
+			l = -1;
+	}
+	else
+		l = -1;
+	return l;
+}
+
+static int
+pg_utf8_verifier(const unsigned char *s, int len)
+{
+	int			l = pg_utf_mblen(s);
+
+	if (len < l)
+		return -1;
+
+	if (!pg_utf8_islegal(s, l))
+		return -1;
+
+	return l;
+}
+
+/*
+ * Check for validity of a single UTF-8 encoded character
+ *
+ * This directly implements the rules in RFC3629.  The bizarre-looking
+ * restrictions on the second byte are meant to ensure that there isn't
+ * more than one encoding of a given Unicode character point; that is,
+ * you may not use a longer-than-necessary byte sequence with high order
+ * zero bits to represent a character that would fit in fewer bytes.
+ * To do otherwise is to create security hazards (eg, create an apparent
+ * non-ASCII character that decodes to plain ASCII).
+ *
+ * length is assumed to have been obtained by pg_utf_mblen(), and the
+ * caller must have checked that that many bytes are present in the buffer.
+ */
+bool
+pg_utf8_islegal(const unsigned char *source, int length)
+{
+	unsigned char a;
+
+	switch (length)
+	{
+		default:
+			/* reject lengths 5 and 6 for now */
+			return false;
+		case 4:
+			a = source[3];
+			if (a < 0x80 || a > 0xBF)
+				return false;
+			/* FALL THRU */
+		case 3:
+			a = source[2];
+			if (a < 0x80 || a > 0xBF)
+				return false;
+			/* FALL THRU */
+		case 2:
+			a = source[1];
+			switch (*source)
+			{
+				case 0xE0:
+					if (a < 0xA0 || a > 0xBF)
+						return false;
+					break;
+				case 0xED:
+					if (a < 0x80 || a > 0x9F)
+						return false;
+					break;
+				case 0xF0:
+					if (a < 0x90 || a > 0xBF)
+						return false;
+					break;
+				case 0xF4:
+					if (a < 0x80 || a > 0x8F)
+						return false;
+					break;
+				default:
+					if (a < 0x80 || a > 0xBF)
+						return false;
+					break;
+			}
+			/* FALL THRU */
+		case 1:
+			a = *source;
+			if (a >= 0x80 && a < 0xC2)
+				return false;
+			if (a > 0xF4)
+				return false;
+			break;
+	}
+	return true;
+}
+
+#ifndef FRONTEND
+
+/*
+ * Generic character incrementer function.
+ *
+ * Not knowing anything about the properties of the encoding in use, we just
+ * keep incrementing the last byte until we get a validly-encoded result,
+ * or we run out of values to try.  We don't bother to try incrementing
+ * higher-order bytes, so there's no growth in runtime for wider characters.
+ * (If we did try to do that, we'd need to consider the likelihood that 255
+ * is not a valid final byte in the encoding.)
+ */
+static bool
+pg_generic_charinc(unsigned char *charptr, int len)
+{
+	unsigned char *lastbyte = charptr + len - 1;
+	mbverifier	mbverify;
+
+	/* We can just invoke the character verifier directly. */
+	mbverify = pg_wchar_table[GetDatabaseEncoding()].mbverify;
+
+	while (*lastbyte < (unsigned char) 255)
+	{
+		(*lastbyte)++;
+		if ((*mbverify) (charptr, len) == len)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * UTF-8 character incrementer function.
+ *
+ * For a one-byte character less than 0x7F, we just increment the byte.
+ *
+ * For a multibyte character, every byte but the first must fall between 0x80
+ * and 0xBF; and the first byte must be between 0xC0 and 0xF4.  We increment
+ * the last byte that's not already at its maximum value.  If we can't find a
+ * byte that's less than the maximum allowable value, we simply fail.  We also
+ * need some special-case logic to skip regions used for surrogate pair
+ * handling, as those should not occur in valid UTF-8.
+ *
+ * Note that we don't reset lower-order bytes back to their minimums, since
+ * we can't afford to make an exhaustive search (see make_greater_string).
+ */
+static bool
+pg_utf8_increment(unsigned char *charptr, int length)
+{
+	unsigned char a;
+	unsigned char limit;
+
+	switch (length)
+	{
+		default:
+			/* reject lengths 5 and 6 for now */
+			return false;
+		case 4:
+			a = charptr[3];
+			if (a < 0xBF)
+			{
+				charptr[3]++;
+				break;
+			}
+			/* FALL THRU */
+		case 3:
+			a = charptr[2];
+			if (a < 0xBF)
+			{
+				charptr[2]++;
+				break;
+			}
+			/* FALL THRU */
+		case 2:
+			a = charptr[1];
+			switch (*charptr)
+			{
+				case 0xED:
+					limit = 0x9F;
+					break;
+				case 0xF4:
+					limit = 0x8F;
+					break;
+				default:
+					limit = 0xBF;
+					break;
+			}
+			if (a < limit)
+			{
+				charptr[1]++;
+				break;
+			}
+			/* FALL THRU */
+		case 1:
+			a = *charptr;
+			if (a == 0x7F || a == 0xDF || a == 0xEF || a == 0xF4)
+				return false;
+			charptr[0]++;
+			break;
+	}
+
+	return true;
+}
+
+/*
+ * EUC-JP character incrementer function.
+ *
+ * If the sequence starts with SS2 (0x8e), it must be a two-byte sequence
+ * representing JIS X 0201 characters with the second byte ranging between
+ * 0xa1 and 0xdf.  We just increment the last byte if it's less than 0xdf,
+ * and otherwise rewrite the whole sequence to 0xa1 0xa1.
+ *
+ * If the sequence starts with SS3 (0x8f), it must be a three-byte sequence
+ * in which the last two bytes range between 0xa1 and 0xfe.  The last byte
+ * is incremented if possible, otherwise the second-to-last byte.
+ *
+ * If the sequence starts with a value other than the above and its MSB
+ * is set, it must be a two-byte sequence representing JIS X 0208 characters
+ * with both bytes ranging between 0xa1 and 0xfe.  The last byte is
+ * incremented if possible, otherwise the second-to-last byte.
+ *
+ * Otherwise, the sequence is a single-byte ASCII character. It is
+ * incremented up to 0x7f.
+ */
+static bool
+pg_eucjp_increment(unsigned char *charptr, int length)
+{
+	unsigned char c1,
+				c2;
+	int			i;
+
+	c1 = *charptr;
+
+	switch (c1)
+	{
+		case SS2:				/* JIS X 0201 */
+			if (length != 2)
+				return false;
+
+			c2 = charptr[1];
+
+			if (c2 >= 0xdf)
+				charptr[0] = charptr[1] = 0xa1;
+			else if (c2 < 0xa1)
+				charptr[1] = 0xa1;
+			else
+				charptr[1]++;
+			break;
+
+		case SS3:				/* JIS X 0212 */
+			if (length != 3)
+				return false;
+
+			for (i = 2; i > 0; i--)
+			{
+				c2 = charptr[i];
+				if (c2 < 0xa1)
+				{
+					charptr[i] = 0xa1;
+					return true;
+				}
+				else if (c2 < 0xfe)
+				{
+					charptr[i]++;
+					return true;
+				}
+			}
+
+			/* Out of 3-byte code region */
+			return false;
+
+		default:
+			if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
+			{
+				if (length != 2)
+					return false;
+
+				for (i = 1; i >= 0; i--)
+				{
+					c2 = charptr[i];
+					if (c2 < 0xa1)
+					{
+						charptr[i] = 0xa1;
+						return true;
+					}
+					else if (c2 < 0xfe)
+					{
+						charptr[i]++;
+						return true;
+					}
+				}
+
+				/* Out of 2 byte code region */
+				return false;
+			}
+			else
+			{					/* ASCII, single byte */
+				if (c1 > 0x7e)
+					return false;
+				(*charptr)++;
+			}
+			break;
+	}
+
+	return true;
+}
+#endif							/* !FRONTEND */
+
+
+/*
+ *-------------------------------------------------------------------
+ * encoding info table
+ * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ *-------------------------------------------------------------------
+ */
+const pg_wchar_tbl pg_wchar_table[] = {
+	{pg_ascii2wchar_with_len, pg_wchar2single_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
+	{pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},	/* PG_EUC_JP */
+	{pg_euccn2wchar_with_len, pg_wchar2euc_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2},	/* PG_EUC_CN */
+	{pg_euckr2wchar_with_len, pg_wchar2euc_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3},	/* PG_EUC_KR */
+	{pg_euctw2wchar_with_len, pg_wchar2euc_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4},	/* PG_EUC_TW */
+	{pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},	/* PG_EUC_JIS_2004 */
+	{pg_utf2wchar_with_len, pg_wchar2utf_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4},	/* PG_UTF8 */
+	{pg_mule2wchar_with_len, pg_wchar2mule_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4},	/* PG_MULE_INTERNAL */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN1 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN2 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN3 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN4 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN5 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN6 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN7 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN8 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN9 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN10 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1256 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1258 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN866 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN874 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8R */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1251 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1252 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-5 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-6 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-7 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-8 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1250 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1253 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1254 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1255 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1257 */
+	{pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8U */
+	{0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
+	{0, 0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
+	{0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2},	/* PG_GBK */
+	{0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2},	/* PG_UHC */
+	{0, 0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4},	/* PG_GB18030 */
+	{0, 0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3},	/* PG_JOHAB */
+	{0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}	/* PG_SHIFT_JIS_2004 */
+};
+
+/* returns the byte length of a word for mule internal code */
+int
+pg_mic_mblen(const unsigned char *mbstr)
+{
+	return pg_mule_mblen(mbstr);
+}
+
+/*
+ * Returns the byte length of a multibyte character.
+ */
+int
+pg_encoding_mblen(int encoding, const char *mbstr)
+{
+	return (PG_VALID_ENCODING(encoding) ?
+			pg_wchar_table[encoding].mblen((const unsigned char *) mbstr) :
+			pg_wchar_table[PG_SQL_ASCII].mblen((const unsigned char *) mbstr));
+}
+
+/*
+ * Returns the display length of a multibyte character.
+ */
+int
+pg_encoding_dsplen(int encoding, const char *mbstr)
+{
+	return (PG_VALID_ENCODING(encoding) ?
+			pg_wchar_table[encoding].dsplen((const unsigned char *) mbstr) :
+			pg_wchar_table[PG_SQL_ASCII].dsplen((const unsigned char *) mbstr));
+}
+
+/*
+ * Verify the first multibyte character of the given string.
+ * Return its byte length if good, -1 if bad.  (See comments above for
+ * full details of the mbverify API.)
+ */
+int
+pg_encoding_verifymb(int encoding, const char *mbstr, int len)
+{
+	return (PG_VALID_ENCODING(encoding) ?
+			pg_wchar_table[encoding].mbverify((const unsigned char *) mbstr, len) :
+			pg_wchar_table[PG_SQL_ASCII].mbverify((const unsigned char *) mbstr, len));
+}
+
+/*
+ * fetch maximum length of a given encoding
+ */
+int
+pg_encoding_max_length(int encoding)
+{
+	Assert(PG_VALID_ENCODING(encoding));
+
+	return pg_wchar_table[encoding].maxmblen;
+}
+
+#ifndef FRONTEND
+
+/*
+ * fetch maximum length of the encoding for the current database
+ */
+int
+pg_database_encoding_max_length(void)
+{
+	return pg_wchar_table[GetDatabaseEncoding()].maxmblen;
+}
+
+/*
+ * get the character incrementer for the encoding for the current database
+ */
+mbcharacter_incrementer
+pg_database_encoding_character_incrementer(void)
+{
+	/*
+	 * Eventually it might be best to add a field to pg_wchar_table[], but for
+	 * now we just use a switch.
+	 */
+	switch (GetDatabaseEncoding())
+	{
+		case PG_UTF8:
+			return pg_utf8_increment;
+
+		case PG_EUC_JP:
+			return pg_eucjp_increment;
+
+		default:
+			return pg_generic_charinc;
+	}
+}
+
+/*
+ * Verify mbstr to make sure that it is validly encoded in the current
+ * database encoding.  Otherwise same as pg_verify_mbstr().
+ */
+bool
+pg_verifymbstr(const char *mbstr, int len, bool noError)
+{
+	return
+		pg_verify_mbstr_len(GetDatabaseEncoding(), mbstr, len, noError) >= 0;
+}
+
+/*
+ * Verify mbstr to make sure that it is validly encoded in the specified
+ * encoding.
+ */
+bool
+pg_verify_mbstr(int encoding, const char *mbstr, int len, bool noError)
+{
+	return pg_verify_mbstr_len(encoding, mbstr, len, noError) >= 0;
+}
+
+/*
+ * Verify mbstr to make sure that it is validly encoded in the specified
+ * encoding.
+ *
+ * mbstr is not necessarily zero terminated; length of mbstr is
+ * specified by len.
+ *
+ * If OK, return length of string in the encoding.
+ * If a problem is found, return -1 when noError is
+ * true; when noError is false, ereport() a descriptive message.
+ */
+int
+pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
+{
+	mbverifier	mbverify;
+	int			mb_len;
+
+	Assert(PG_VALID_ENCODING(encoding));
+
+	/*
+	 * In single-byte encodings, we need only reject nulls (\0).
+	 */
+	if (pg_encoding_max_length(encoding) <= 1)
+	{
+		const char *nullpos = memchr(mbstr, 0, len);
+
+		if (nullpos == NULL)
+			return len;
+		if (noError)
+			return -1;
+		report_invalid_encoding(encoding, nullpos, 1);
+	}
+
+	/* fetch function pointer just once */
+	mbverify = pg_wchar_table[encoding].mbverify;
+
+	mb_len = 0;
+
+	while (len > 0)
+	{
+		int			l;
+
+		/* fast path for ASCII-subset characters */
+		if (!IS_HIGHBIT_SET(*mbstr))
+		{
+			if (*mbstr != '\0')
+			{
+				mb_len++;
+				mbstr++;
+				len--;
+				continue;
+			}
+			if (noError)
+				return -1;
+			report_invalid_encoding(encoding, mbstr, len);
+		}
+
+		l = (*mbverify) ((const unsigned char *) mbstr, len);
+
+		if (l < 0)
+		{
+			if (noError)
+				return -1;
+			report_invalid_encoding(encoding, mbstr, len);
+		}
+
+		mbstr += l;
+		len -= l;
+		mb_len++;
+	}
+	return mb_len;
+}
+
+/*
+ * check_encoding_conversion_args: check arguments of a conversion function
+ *
+ * "expected" arguments can be either an encoding ID or -1 to indicate that
+ * the caller will check whether it accepts the ID.
+ *
+ * Note: the errors here are not really user-facing, so elog instead of
+ * ereport seems sufficient.  Also, we trust that the "expected" encoding
+ * arguments are valid encoding IDs, but we don't trust the actuals.
+ */
+void
+check_encoding_conversion_args(int src_encoding,
+							   int dest_encoding,
+							   int len,
+							   int expected_src_encoding,
+							   int expected_dest_encoding)
+{
+	if (!PG_VALID_ENCODING(src_encoding))
+		elog(ERROR, "invalid source encoding ID: %d", src_encoding);
+	if (src_encoding != expected_src_encoding && expected_src_encoding >= 0)
+		elog(ERROR, "expected source encoding \"%s\", but got \"%s\"",
+			 pg_enc2name_tbl[expected_src_encoding].name,
+			 pg_enc2name_tbl[src_encoding].name);
+	if (!PG_VALID_ENCODING(dest_encoding))
+		elog(ERROR, "invalid destination encoding ID: %d", dest_encoding);
+	if (dest_encoding != expected_dest_encoding && expected_dest_encoding >= 0)
+		elog(ERROR, "expected destination encoding \"%s\", but got \"%s\"",
+			 pg_enc2name_tbl[expected_dest_encoding].name,
+			 pg_enc2name_tbl[dest_encoding].name);
+	if (len < 0)
+		elog(ERROR, "encoding conversion length must not be negative");
+}
+
+/*
+ * report_invalid_encoding: complain about invalid multibyte character
+ *
+ * note: len is remaining length of string, not length of character;
+ * len must be greater than zero, as we always examine the first byte.
+ */
+void
+report_invalid_encoding(int encoding, const char *mbstr, int len)
+{
+	int			l = pg_encoding_mblen(encoding, mbstr);
+	char		buf[8 * 5 + 1];
+	char	   *p = buf;
+	int			j,
+				jlimit;
+
+	jlimit = Min(l, len);
+	jlimit = Min(jlimit, 8);	/* prevent buffer overrun */
+
+	for (j = 0; j < jlimit; j++)
+	{
+		p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
+		if (j < jlimit - 1)
+			p += sprintf(p, " ");
+	}
+
+	ereport(ERROR,
+			(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
+			 errmsg("invalid byte sequence for encoding \"%s\": %s",
+					pg_enc2name_tbl[encoding].name,
+					buf)));
+}
+
+/*
+ * report_untranslatable_char: complain about untranslatable character
+ *
+ * note: len is remaining length of string, not length of character;
+ * len must be greater than zero, as we always examine the first byte.
+ */
+void
+report_untranslatable_char(int src_encoding, int dest_encoding,
+						   const char *mbstr, int len)
+{
+	int			l = pg_encoding_mblen(src_encoding, mbstr);
+	char		buf[8 * 5 + 1];
+	char	   *p = buf;
+	int			j,
+				jlimit;
+
+	jlimit = Min(l, len);
+	jlimit = Min(jlimit, 8);	/* prevent buffer overrun */
+
+	for (j = 0; j < jlimit; j++)
+	{
+		p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
+		if (j < jlimit - 1)
+			p += sprintf(p, " ");
+	}
+
+	ereport(ERROR,
+			(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+			 errmsg("character with byte sequence %s in encoding \"%s\" has no equivalent in encoding \"%s\"",
+					buf,
+					pg_enc2name_tbl[src_encoding].name,
+					pg_enc2name_tbl[dest_encoding].name)));
+}
+
+#endif							/* !FRONTEND */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 7fb5fa4..026f64f 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -222,8 +222,8 @@ typedef unsigned int pg_wchar;
  * PostgreSQL encoding identifiers
  *
  * WARNING: the order of this enum must be same as order of entries
- *			in the pg_enc2name_tbl[] array (in mb/encnames.c), and
- *			in the pg_wchar_table[] array (in mb/wchar.c)!
+ *			in the pg_enc2name_tbl[] array (in src/common/encnames.c), and
+ *			in the pg_wchar_table[] array (in src/common/wchar.c)!
  *
  *			If you add some encoding don't forget to check
  *			PG_ENCODING_BE_LAST macro.
diff --git a/src/interfaces/libpq/.gitignore b/src/interfaces/libpq/.gitignore
index 7b438f3..a4afe7c 100644
--- a/src/interfaces/libpq/.gitignore
+++ b/src/interfaces/libpq/.gitignore
@@ -1,4 +1 @@
 /exports.list
-# .c files that are symlinked in from elsewhere
-/encnames.c
-/wchar.c
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index f5f1c0c..a068826 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -45,11 +45,6 @@ OBJS = \
 	pqexpbuffer.o \
 	fe-auth.o
 
-# src/backend/utils/mb
-OBJS += \
-	encnames.o \
-	wchar.o
-
 ifeq ($(with_openssl),yes)
 OBJS += \
 	fe-secure-common.o \
@@ -102,17 +97,7 @@ include $(top_srcdir)/src/Makefile.shlib
 backend_src = $(top_srcdir)/src/backend
 
 
-# We use a few backend modules verbatim, but since we need
-# to compile with appropriate options to build a shared lib, we can't
-# use the same object files built for the backend.
-# Instead, symlink the source files in here and build our own object files.
-# When you add a file here, remember to add it in the "clean" target below.
-
-encnames.c wchar.c: % : $(backend_src)/utils/mb/%
-	rm -f $@ && $(LN_S) $< .
-
-
-# Make dependencies on pg_config_paths.h visible, too.
+# Make dependencies on pg_config_paths.h visible in all builds.
 fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h
 fe-misc.o: fe-misc.c $(top_builddir)/src/port/pg_config_paths.h
 
@@ -144,8 +129,6 @@ clean distclean: clean-lib
 	rm -f $(OBJS) pthread.h
 # Might be left over from a Win32 client-only build
 	rm -f pg_config_paths.h
-# Remove files we (may have) symlinked in from other places
-	rm -f encnames.c wchar.c
 
 maintainer-clean: distclean
 	$(MAKE) -C test $@
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index f6ab0d5..67b9f23 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -120,11 +120,12 @@ sub mkvcbuild
 	}
 
 	our @pgcommonallfiles = qw(
-	  base64.c config_info.c controldata_utils.c d2s.c exec.c f2s.c file_perm.c ip.c
+	  base64.c config_info.c controldata_utils.c d2s.c encnames.c exec.c
+	  f2s.c file_perm.c ip.c
 	  keywords.c kwlookup.c link-canary.c md5.c
 	  pg_lzcompress.c pgfnames.c psprintf.c relpath.c rmtree.c
 	  saslprep.c scram-common.c string.c stringinfo.c unicode_norm.c username.c
-	  wait_error.c);
+	  wait_error.c wchar.c);
 
 	if ($solution->{options}->{openssl})
 	{

#15

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#14)

Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup.

Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Wow, thanks.

Also, it seemed to me that if we are going to move wchar.c, we should
also move encnames.c, so that libpq can get fully out of the
symlinking-source-files business. It makes initdb less weird too.

OK.

I took the liberty of sticking proper copyright headers onto these
two files, too. (This makes the diff a lot more bulky :-(. Would
it help to add the headers in a separate commit?)

I wouldn't bother making it a separate commit, but please do whatever you like.

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.

+1 for moving that stuff to a separate backend-only file.

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

I don't have a view on this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: David Steele (#9)

Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 1:58 PM David Steele <david@pgmasters.net> wrote:

To do page-level incrementals (which this feature is intended to enable)
the user will need to be able to associate full and incremental backups
and the only way I see to do that (currently) is to read the manifests,
since the prior backup should be stored there. I think this means that
parsing the manifest is not really optional -- it will be required to do
any kind of automation with incrementals.

My current belief is that enabling incremental backup will require
extending the manifest format either not at all or by adding one
additional line with some LSN info.

If we could foresee a need to store a bunch of additional *per-file*
details, I'd be a lot more receptive to the argument that we ought to
be using a more structured format like JSON. And it doesn't seem
impossible that such a thing could happen, but I don't think it's at
all clear that it actually will happen, or that it will happen soon
enough that we ought to be worrying about it now.

It's possible that we're chasing a real problem here, and if there's
something we can agree on and get done I'd rather do that than argue,
but I am still quite suspicious that there's no actually serious
technical problem here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#15)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 16, 2020 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Wow, thanks.

Pushed that.

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.

+1 for moving that stuff to a separate backend-only file.

After a brief look, I propose the following:

* I think we should just shove the "#ifndef FRONTEND" stuff in
wchar.c into mbutils.c. It doesn't seem worth inventing a whole
new file for that code, especially when it's arguably within the
remit of mbutils.c anyway.

* Let's remove the "#ifndef FRONTEND" restriction on the ICU-related
stuff in encnames.c. Even if we don't need that stuff in frontend
today, it's hardly unlikely that we will need it tomorrow. And there's
not that much bulk there anyway.

* The one positive reason for that restriction is the ereport() in
get_encoding_name_for_icu. We could change that to be the usual
#ifdef-ereport-or-printf dance, but I think there's a better way: put
the ereport at the caller, by redefining that function to return NULL
for an unsupported encoding. There's only one caller today anyhow.

* PG_char_to_encoding() and PG_encoding_to_char() can be moved to
mbutils.c; they'd fit reasonably well beside getdatabaseencoding and
pg_client_encoding. (I also thought about utils/adt/misc.c, but
that's not obviously better.)

Barring objections I'll go make this happen shortly.

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

I don't have a view on this.

If anyone is hot to do this part, please have at it. I'm not.

regards, tom lane

#18

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#16)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

It's possible that we're chasing a real problem here, and if there's
something we can agree on and get done I'd rather do that than argue,
but I am still quite suspicious that there's no actually serious
technical problem here.

It's entirely possible that you're right. But if this is a file format
that is meant to be exposed to user tools, we need to take a very long
view of the requirements for it. Five or ten years down the road, we
might be darn glad we spent extra time now.

regards, tom lane

#19

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Robert Haas (#1)

Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 7:33 AM Robert Haas <robertmhaas@gmail.com> wrote:

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to be have added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

I'm probably responsible for a good deal of the mess, so let me say Thankyou.

I'll have a good look at these.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Robert Haas (#8)

1 attachment(s)

Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/16/20 11:51 AM, Robert Haas wrote:

On Thu, Jan 16, 2020 at 1:37 PM David Steele <david@pgmasters.net> wrote:

The next question in my mind is given the caveat that the error handing
is questionable in the front end, can we at least render/parse valid
JSON with the code?

That's a real good question. Thanks for offering to test it; I think
that would be very helpful.

It seems to work just fine. I didn't stress it too hard but I did put
in one escape and a multi-byte character and check the various data types.

Attached is a test hack on pg_basebackup which produces this output:

START
FIELD "number", null 0
SCALAR TYPE 2: 123
FIELD "string", null 0
SCALAR TYPE 1: val ue-丏
FIELD "bool", null 0
SCALAR TYPE 9: true
FIELD "null", null 1
SCALAR TYPE 11: null
END

I used the callbacks because that's the first method I found but it
seems like json_lex() might be easier to use in practice.

I think it's an issue that the entire string must be passed to the lexer
at once. That will not be great for large manifests. However, I don't
think it will be all that hard to implement an optional "want more"
callback in the lexer so JSON data can be fed in from the file in chunks.

So, that just leaves ereport() as the largest remaining issue? I'll
look at that today and Tuesday and see what I can work up.

Regards,
--
-David
david@pgmasters.net

Attachments:

json-api-client-test.difftext/plain; charset=UTF-8; name=json-api-client-test.diff; x-mac-creator=0; x-mac-type=0Download

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 238b671f7a..00b74118fb 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -41,6 +41,7 @@
 #include "receivelog.h"
 #include "replication/basebackup.h"
 #include "streamutil.h"
+#include "utils/jsonapi.h"
 
 #define ERRCODE_DATA_CORRUPTED	"XX001"
 
@@ -2019,10 +2020,40 @@ BaseBackup(void)
 		pg_log_info("base backup completed");
 }
 
+void
+jsonOFieldAction(void *state, char *fname, bool isnull)
+{
+	fprintf(stderr, "    FIELD \"%s\", null %d\n", fname, isnull); fflush(stderr);
+	pfree(fname);
+}
+
+void
+jsonScalarAction(void *state, char *token, JsonTokenType tokentype)
+{
+	fprintf(stderr, "    SCALAR TYPE %u: %s\n", tokentype, token); fflush(stderr);
+	pfree(token);
+}
 
 int
 main(int argc, char **argv)
 {
+	if (true)
+	{
+		char json[] = "{\"number\": 123, \"string\": \"val\\tue-丏\", \"bool\": true, \"null\": null}";
+		JsonSemAction sem = {.semstate = NULL, .scalar = jsonScalarAction, .object_field_start = jsonOFieldAction};
+		JsonLexContext *lex;
+
+		fprintf(stderr, "START\n"); fflush(stderr);
+
+		lex = makeJsonLexContextCstringLen(json, strlen(json), true);
+
+		pg_parse_json(lex, &sem);
+
+		fprintf(stderr, "END\n"); fflush(stderr);
+
+		exit(0);
+	}
+
 	static struct option long_options[] = {
 		{"help", no_argument, NULL, '?'},
 		{"version", no_argument, NULL, 'V'},

#21

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Robert Haas (#8)

Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/16/20 11:51 AM, Robert Haas wrote:

On Thu, Jan 16, 2020 at 1:37 PM David Steele <david@pgmasters.net> wrote:

So the idea here is that json.c will have the JSON SQL functions,
jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
jsonfuncs.c the utility functions?

Uh, I think roughly that, yes. Although I can't claim to fully
understand everything that's here.

Now that I've spent some time with the code I see your intent was just
to isolate the JSON lexer code with 0002 and 0003. As such, I now think
they are commit-able as is.

Regards,
--
-David
david@pgmasters.net

#22

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Andrew Dunstan (#19)

Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 8:55 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:

I'm probably responsible for a good deal of the mess, so let me say Thankyou.

I'll have a good look at these.

Thanks, appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: David Steele (#20)

4 attachment(s)

Re: making the backend's json parser work in frontend code

On Fri, Jan 17, 2020 at 12:36 PM David Steele <david@pgmasters.net> wrote:

It seems to work just fine. I didn't stress it too hard but I did put
in one escape and a multi-byte character and check the various data types.

Cool.

I used the callbacks because that's the first method I found but it
seems like json_lex() might be easier to use in practice.

Ugh, really? That doesn't seem like it would be nice at all.

I think it's an issue that the entire string must be passed to the lexer
at once. That will not be great for large manifests. However, I don't
think it will be all that hard to implement an optional "want more"
callback in the lexer so JSON data can be fed in from the file in chunks.

I thought so initially, but now I'm not so sure. The thing is, you
actually need all the manifest data in memory at once anyway, or so I
think. You're essentially doing a "full join" between the contents of
the manifest and the contents of the file system, so you've got to
scan one (probably the filesystem) and then mark entries in the other
(probably the manifest) used as you go.

But this might need more thought. The details probably depend on
exactly how you design it all.

So, that just leaves ereport() as the largest remaining issue? I'll
look at that today and Tuesday and see what I can work up.

PFA my work on that topic. As compared with my previous patch series,
the previous 0001 is dropped and what are now 0001 and 0002 are the
same as patches from the previous series. 0003 and 0004 are aiming
toward getting rid of ereport() and, I believe, show a plausible
strategy for so doing. There are, possibly, things not to like here,
and it's certainly incomplete, but I think I kinda like this
direction. Comments appreciated.

0003 nukes lex_accept(), inlining the logic into callers. I found that
the refactoring I wanted to do in 0004 was pretty hard without this,
and it turns out to save code, so I think this is a good idea
independently of anything else.

0004 adjusts many functions in jsonapi.c to return a new enumerated
type, JsonParseErrorType, instead of directly doing ereport(). It adds
a new function that takes this value and a lexing context and throws
an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
hack, but that's fixable with another field in the lexing context.
More work is needed to really bring this up to scratch, but the idea
is to make this code have a soft dependency on ereport() rather than a
hard one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v2-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patchapplication/octet-stream; name=v2-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patchDownload

From 436ab321a8ed9e2053b778c869428ab10388c13d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 09:26:04 -0500
Subject: [PATCH v2 1/4] Adjust src/include/utils/jsonapi.h so it's not
 backend-only.

The major change here is that we no longer include jsonb.h into
jsonapi.h. The reason that was necessary is that jsonapi.h included
several prototypes functions in jsonfuncs.c that depend on the Jsonb
type. Move those prototypes to a new header, jsonfuncs.h, and include
it where needed.

The other change is that JsonEncodeDateTime is now declared in
json.h rather than jsonapi.h.

Taken together, these steps eliminate all dependencies of jsonapi.h
on backend-only data types and header files, so that it can
potentially be included in frontend code.
---
 src/backend/tsearch/to_tsany.c     |  1 +
 src/backend/tsearch/wparser.c      |  1 +
 src/backend/utils/adt/jsonb_util.c |  1 +
 src/backend/utils/adt/jsonfuncs.c  |  1 +
 src/include/utils/json.h           |  2 ++
 src/include/utils/jsonapi.h        | 33 --------------------
 src/include/utils/jsonfuncs.h      | 49 ++++++++++++++++++++++++++++++
 7 files changed, 55 insertions(+), 33 deletions(-)
 create mode 100644 src/include/utils/jsonfuncs.h

diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index cc694cda8c..adf181c191 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -17,6 +17,7 @@
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 
 
 typedef struct MorphOpaque
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index 6b5960ecc1..c7499a94ac 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -21,6 +21,7 @@
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
 /******sql-level interface******/
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 7c9da701dd..b33c3ef43c 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -19,6 +19,7 @@
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
+#include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 4b5a0214dc..165e6a83ec 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -29,6 +29,7 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
+#include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 20b5294491..4345fbdc31 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -18,5 +18,7 @@
 
 /* functions in json.c */
 extern void escape_json(StringInfo buf, const char *str);
+extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
+								const int *tzp);
 
 #endif							/* JSON_H */
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index f72f1cefd5..1190947476 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -14,7 +14,6 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
-#include "jsonb.h"
 #include "lib/stringinfo.h"
 
 typedef enum
@@ -132,36 +131,4 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
  */
 extern bool IsValidJsonNumber(const char *str, int len);
 
-/*
- * Flag types for iterate_json(b)_values to specify what elements from a
- * json(b) document we want to iterate.
- */
-typedef enum JsonToIndex
-{
-	jtiKey = 0x01,
-	jtiString = 0x02,
-	jtiNumeric = 0x04,
-	jtiBool = 0x08,
-	jtiAll = jtiKey | jtiString | jtiNumeric | jtiBool
-} JsonToIndex;
-
-/* an action that will be applied to each value in iterate_json(b)_values functions */
-typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-/* an action that will be applied to each value in transform_json(b)_values functions */
-typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-extern uint32 parse_jsonb_index_flags(Jsonb *jb);
-extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
-								 JsonIterateStringValuesAction action);
-extern void iterate_json_values(text *json, uint32 flags, void *action_state,
-								JsonIterateStringValuesAction action);
-extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
-											JsonTransformStringValuesAction transform_action);
-extern text *transform_json_string_values(text *json, void *action_state,
-										  JsonTransformStringValuesAction transform_action);
-
-extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
-								const int *tzp);
-
 #endif							/* JSONAPI_H */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
new file mode 100644
index 0000000000..19f087ccae
--- /dev/null
+++ b/src/include/utils/jsonfuncs.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonfuncs.h
+ *	  Functions to process JSON data types.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/jsonapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef JSONFUNCS_H
+#define JSONFUNCS_H
+
+#include "utils/jsonapi.h"
+#include "utils/jsonb.h"
+
+/*
+ * Flag types for iterate_json(b)_values to specify what elements from a
+ * json(b) document we want to iterate.
+ */
+typedef enum JsonToIndex
+{
+	jtiKey = 0x01,
+	jtiString = 0x02,
+	jtiNumeric = 0x04,
+	jtiBool = 0x08,
+	jtiAll = jtiKey | jtiString | jtiNumeric | jtiBool
+} JsonToIndex;
+
+/* an action that will be applied to each value in iterate_json(b)_values functions */
+typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+/* an action that will be applied to each value in transform_json(b)_values functions */
+typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+extern uint32 parse_jsonb_index_flags(Jsonb *jb);
+extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
+								 JsonIterateStringValuesAction action);
+extern void iterate_json_values(text *json, uint32 flags, void *action_state,
+								JsonIterateStringValuesAction action);
+extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
+											JsonTransformStringValuesAction transform_action);
+extern text *transform_json_string_values(text *json, void *action_state,
+										  JsonTransformStringValuesAction transform_action);
+
+#endif
-- 
2.17.2 (Apple Git-113)

v2-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patchapplication/octet-stream; name=v2-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patchDownload

From d4a2af5c6ea7cbd541ad83f0d1e999f9a65c52ff Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 10:03:39 -0500
Subject: [PATCH v2 2/4] Split JSON lexer/parser from 'json' data type support.

Keep the code that pertains to the 'json' data type in json.c, but
move the lexing and parsing code to a new file jsonapi.c, a name
I chose because the corresponding prototypes are in jsonapi.h.

This seems like a logical division, because the JSON lexer and parser
are also used by the 'jsonb' data type, but the SQL-callable functions
in json.c are a separate thing. Also, the new jsonapi.c file needs to
include far fewer header files than json.c, which seems like a good
sign that this is an appropriate place to insert an abstraction
boundary. I took the opportunity to remove a few apparently-unneeded
includes from json.c at the same time.
---
 src/backend/utils/adt/Makefile  |    1 +
 src/backend/utils/adt/json.c    | 1206 +-----------------------------
 src/backend/utils/adt/jsonapi.c | 1216 +++++++++++++++++++++++++++++++
 src/include/utils/jsonapi.h     |    6 +
 4 files changed, 1224 insertions(+), 1205 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonapi.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 13efa9338c..790d7a24fb 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,6 +44,7 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
+	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 458505abfd..4be16b5c20 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -13,14 +13,9 @@
  */
 #include "postgres.h"
 
-#include "access/htup_details.h"
-#include "access/transam.h"
 #include "catalog/pg_type.h"
-#include "executor/spi.h"
 #include "funcapi.h"
-#include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "utils/array.h"
@@ -30,27 +25,8 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/lsyscache.h"
-#include "utils/syscache.h"
 #include "utils/typcache.h"
 
-/*
- * The context of the parser is maintained by the recursive descent
- * mechanism, but is passed explicitly to the error reporting routine
- * for better diagnostics.
- */
-typedef enum					/* contexts of JSON parser */
-{
-	JSON_PARSE_VALUE,			/* expecting a value */
-	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
-	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
-	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
-	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
-	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
-	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
-	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
-	JSON_PARSE_END				/* saw the end of a document, expect nothing */
-} JsonParseContext;
-
 typedef enum					/* type categories for datum_to_json */
 {
 	JSONTYPE_NULL,				/* null, so we didn't bother to identify */
@@ -75,19 +51,6 @@ typedef struct JsonAggState
 	Oid			val_output_func;
 } JsonAggState;
 
-static inline void json_lex(JsonLexContext *lex);
-static inline void json_lex_string(JsonLexContext *lex);
-static inline void json_lex_number(JsonLexContext *lex, char *s,
-								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
-static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
-static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
-static int	report_json_context(JsonLexContext *lex);
-static char *extract_mb_char(char *s);
 static void composite_to_json(Datum composite, StringInfo result,
 							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
@@ -106,121 +69,6 @@ static void add_json(Datum val, bool is_null, StringInfo result,
 					 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
 
-/* the null action object used for pure validation */
-static JsonSemAction nullSemAction =
-{
-	NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL
-};
-
-/* Recursive Descent parser support routines */
-
-/*
- * lex_peek
- *
- * what is the current look_ahead token?
-*/
-static inline JsonTokenType
-lex_peek(JsonLexContext *lex)
-{
-	return lex->token_type;
-}
-
-/*
- * lex_accept
- *
- * accept the look_ahead token and move the lexer to the next token if the
- * look_ahead token matches the token parameter. In that case, and if required,
- * also hand back the de-escaped lexeme.
- *
- * returns true if the token matched, false otherwise.
- */
-static inline bool
-lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
-{
-	if (lex->token_type == token)
-	{
-		if (lexeme != NULL)
-		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-				if (lex->strval != NULL)
-					*lexeme = pstrdup(lex->strval->data);
-			}
-			else
-			{
-				int			len = (lex->token_terminator - lex->token_start);
-				char	   *tokstr = palloc(len + 1);
-
-				memcpy(tokstr, lex->token_start, len);
-				tokstr[len] = '\0';
-				*lexeme = tokstr;
-			}
-		}
-		json_lex(lex);
-		return true;
-	}
-	return false;
-}
-
-/*
- * lex_accept
- *
- * move the lexer to the next token if the current look_ahead token matches
- * the parameter token. Otherwise, report an error.
- */
-static inline void
-lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
-{
-	if (!lex_accept(lex, token, NULL))
-		report_parse_error(ctx, lex);
-}
-
-/* chars to consider as part of an alphanumeric token */
-#define JSON_ALPHANUMERIC_CHAR(c)  \
-	(((c) >= 'a' && (c) <= 'z') || \
-	 ((c) >= 'A' && (c) <= 'Z') || \
-	 ((c) >= '0' && (c) <= '9') || \
-	 (c) == '_' || \
-	 IS_HIGHBIT_SET(c))
-
-/*
- * Utility function to check if a string is a valid JSON number.
- *
- * str is of length len, and need not be null-terminated.
- */
-bool
-IsValidJsonNumber(const char *str, int len)
-{
-	bool		numeric_error;
-	int			total_len;
-	JsonLexContext dummy_lex;
-
-	if (len <= 0)
-		return false;
-
-	/*
-	 * json_lex_number expects a leading  '-' to have been eaten already.
-	 *
-	 * having to cast away the constness of str is ugly, but there's not much
-	 * easy alternative.
-	 */
-	if (*str == '-')
-	{
-		dummy_lex.input = unconstify(char *, str) +1;
-		dummy_lex.input_length = len - 1;
-	}
-	else
-	{
-		dummy_lex.input = unconstify(char *, str);
-		dummy_lex.input_length = len;
-	}
-
-	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
-
-	return (!numeric_error) && (total_len == dummy_lex.input_length);
-}
-
 /*
  * Input.
  */
@@ -285,1058 +133,6 @@ json_recv(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
 
-/*
- * makeJsonLexContext
- *
- * lex constructor, with or without StringInfo object
- * for de-escaped lexemes.
- *
- * Without is better as it makes the processing faster, so only make one
- * if really required.
- *
- * If you already have the json as a text* value, use the first of these
- * functions, otherwise use  makeJsonLexContextCstringLen().
- */
-JsonLexContext *
-makeJsonLexContext(text *json, bool need_escapes)
-{
-	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
-										VARSIZE_ANY_EXHDR(json),
-										need_escapes);
-}
-
-JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
-{
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
-
-	lex->input = lex->token_terminator = lex->line_start = json;
-	lex->line_number = 1;
-	lex->input_length = len;
-	if (need_escapes)
-		lex->strval = makeStringInfo();
-	return lex;
-}
-
-/*
- * pg_parse_json
- *
- * Publicly visible entry point for the JSON parser.
- *
- * lex is a lexing context, set up for the json to be processed by calling
- * makeJsonLexContext(). sem is a structure of function pointers to semantic
- * action routines to be called at appropriate spots during parsing, and a
- * pointer to a state object to be passed to those routines.
- */
-void
-pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
-{
-	JsonTokenType tok;
-
-	/* get the initial token */
-	json_lex(lex);
-
-	tok = lex_peek(lex);
-
-	/* parse by recursive descent */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem); /* json can be a bare scalar */
-	}
-
-	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
-
-}
-
-/*
- * json_count_array_elements
- *
- * Returns number of array elements in lex context at start of array token
- * until end of array token at same nesting level.
- *
- * Designed to be called from array_start routines.
- */
-int
-json_count_array_elements(JsonLexContext *lex)
-{
-	JsonLexContext copylex;
-	int			count;
-
-	/*
-	 * It's safe to do this with a shallow copy because the lexical routines
-	 * don't scribble on the input. They do scribble on the other pointers
-	 * etc, so doing this with a copy makes that safe.
-	 */
-	memcpy(&copylex, lex, sizeof(JsonLexContext));
-	copylex.strval = NULL;		/* not interested in values here */
-	copylex.lex_level++;
-
-	count = 0;
-	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
-	{
-		do
-		{
-			count++;
-			parse_array_element(&copylex, &nullSemAction);
-		}
-		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
-	}
-	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
-
-	return count;
-}
-
-/*
- *	Recursive Descent parse routines. There is one for each structural
- *	element in a json document:
- *	  - scalar (string, number, true, false, null)
- *	  - array  ( [ ] )
- *	  - array element
- *	  - object ( { } )
- *	  - object field
- */
-static inline void
-parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
-{
-	char	   *val = NULL;
-	json_scalar_action sfunc = sem->scalar;
-	char	  **valaddr;
-	JsonTokenType tok = lex_peek(lex);
-
-	valaddr = sfunc == NULL ? NULL : &val;
-
-	/* a scalar must be a string, a number, true, false, or null */
-	switch (tok)
-	{
-		case JSON_TOKEN_TRUE:
-			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
-			break;
-		case JSON_TOKEN_FALSE:
-			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
-			break;
-		case JSON_TOKEN_NULL:
-			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
-			break;
-		case JSON_TOKEN_NUMBER:
-			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
-			break;
-		case JSON_TOKEN_STRING:
-			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
-			break;
-		default:
-			report_parse_error(JSON_PARSE_VALUE, lex);
-	}
-
-	if (sfunc != NULL)
-		(*sfunc) (sem->semstate, val, tok);
-}
-
-static void
-parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * An object field is "fieldname" : value where value can be a scalar,
-	 * object or array.  Note: in user-facing docs and error messages, we
-	 * generally call a field name a "key".
-	 */
-
-	char	   *fname = NULL;	/* keep compiler quiet */
-	json_ofield_action ostart = sem->object_field_start;
-	json_ofield_action oend = sem->object_field_end;
-	bool		isnull;
-	char	  **fnameaddr = NULL;
-	JsonTokenType tok;
-
-	if (ostart != NULL || oend != NULL)
-		fnameaddr = &fname;
-
-	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
-		report_parse_error(JSON_PARSE_STRING, lex);
-
-	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
-
-	tok = lex_peek(lex);
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate, fname, isnull);
-
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem);
-	}
-
-	if (oend != NULL)
-		(*oend) (sem->semstate, fname, isnull);
-}
-
-static void
-parse_object(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * an object is a possibly empty sequence of object fields, separated by
-	 * commas and surrounded by curly braces.
-	 */
-	json_struct_action ostart = sem->object_start;
-	json_struct_action oend = sem->object_end;
-	JsonTokenType tok;
-
-	check_stack_depth();
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate);
-
-	/*
-	 * Data inside an object is at a higher nesting level than the object
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the object start and restore it before we call the routine for the
-	 * object end.
-	 */
-	lex->lex_level++;
-
-	/* we know this will succeed, just clearing the token */
-	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
-
-	tok = lex_peek(lex);
-	switch (tok)
-	{
-		case JSON_TOKEN_STRING:
-			parse_object_field(lex, sem);
-			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
-				parse_object_field(lex, sem);
-			break;
-		case JSON_TOKEN_OBJECT_END:
-			break;
-		default:
-			/* case of an invalid initial token inside the object */
-			report_parse_error(JSON_PARSE_OBJECT_START, lex);
-	}
-
-	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
-
-	lex->lex_level--;
-
-	if (oend != NULL)
-		(*oend) (sem->semstate);
-}
-
-static void
-parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
-{
-	json_aelem_action astart = sem->array_element_start;
-	json_aelem_action aend = sem->array_element_end;
-	JsonTokenType tok = lex_peek(lex);
-
-	bool		isnull;
-
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (astart != NULL)
-		(*astart) (sem->semstate, isnull);
-
-	/* an array element is any object, array or scalar */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem);
-	}
-
-	if (aend != NULL)
-		(*aend) (sem->semstate, isnull);
-}
-
-static void
-parse_array(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * an array is a possibly empty sequence of array elements, separated by
-	 * commas and surrounded by square brackets.
-	 */
-	json_struct_action astart = sem->array_start;
-	json_struct_action aend = sem->array_end;
-
-	check_stack_depth();
-
-	if (astart != NULL)
-		(*astart) (sem->semstate);
-
-	/*
-	 * Data inside an array is at a higher nesting level than the array
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the array start and restore it before we call the routine for the
-	 * array end.
-	 */
-	lex->lex_level++;
-
-	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
-	{
-
-		parse_array_element(lex, sem);
-
-		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
-			parse_array_element(lex, sem);
-	}
-
-	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-
-	lex->lex_level--;
-
-	if (aend != NULL)
-		(*aend) (sem->semstate);
-}
-
-/*
- * Lex one token from the input stream.
- */
-static inline void
-json_lex(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-
-	/* Skip leading whitespace. */
-	s = lex->token_terminator;
-	len = s - lex->input;
-	while (len < lex->input_length &&
-		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
-	{
-		if (*s == '\n')
-			++lex->line_number;
-		++s;
-		++len;
-	}
-	lex->token_start = s;
-
-	/* Determine token type. */
-	if (len >= lex->input_length)
-	{
-		lex->token_start = NULL;
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		lex->token_type = JSON_TOKEN_END;
-	}
-	else
-		switch (*s)
-		{
-				/* Single-character token, some kind of punctuation mark. */
-			case '{':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_START;
-				break;
-			case '}':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_END;
-				break;
-			case '[':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_START;
-				break;
-			case ']':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_END;
-				break;
-			case ',':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COMMA;
-				break;
-			case ':':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COLON;
-				break;
-			case '"':
-				/* string */
-				json_lex_string(lex);
-				lex->token_type = JSON_TOKEN_STRING;
-				break;
-			case '-':
-				/* Negative number. */
-				json_lex_number(lex, s + 1, NULL, NULL);
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			case '0':
-			case '1':
-			case '2':
-			case '3':
-			case '4':
-			case '5':
-			case '6':
-			case '7':
-			case '8':
-			case '9':
-				/* Positive number. */
-				json_lex_number(lex, s, NULL, NULL);
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			default:
-				{
-					char	   *p;
-
-					/*
-					 * We're not dealing with a string, number, legal
-					 * punctuation mark, or end of string.  The only legal
-					 * tokens we might find here are true, false, and null,
-					 * but for error reporting purposes we scan until we see a
-					 * non-alphanumeric character.  That way, we can report
-					 * the whole word as an unexpected token, rather than just
-					 * some unintuitive prefix thereof.
-					 */
-					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
-						 /* skip */ ;
-
-					/*
-					 * We got some sort of unexpected punctuation or an
-					 * otherwise unexpected character, so just complain about
-					 * that one character.
-					 */
-					if (p == s)
-					{
-						lex->prev_token_terminator = lex->token_terminator;
-						lex->token_terminator = s + 1;
-						report_invalid_token(lex);
-					}
-
-					/*
-					 * We've got a real alphanumeric token here.  If it
-					 * happens to be true, false, or null, all is well.  If
-					 * not, error out.
-					 */
-					lex->prev_token_terminator = lex->token_terminator;
-					lex->token_terminator = p;
-					if (p - s == 4)
-					{
-						if (memcmp(s, "true", 4) == 0)
-							lex->token_type = JSON_TOKEN_TRUE;
-						else if (memcmp(s, "null", 4) == 0)
-							lex->token_type = JSON_TOKEN_NULL;
-						else
-							report_invalid_token(lex);
-					}
-					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
-						lex->token_type = JSON_TOKEN_FALSE;
-					else
-						report_invalid_token(lex);
-
-				}
-		}						/* end of switch */
-}
-
-/*
- * The next token in the input stream is known to be a string; lex it.
- */
-static inline void
-json_lex_string(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-	int			hi_surrogate = -1;
-
-	if (lex->strval != NULL)
-		resetStringInfo(lex->strval);
-
-	Assert(lex->input_length > 0);
-	s = lex->token_start;
-	len = lex->token_start - lex->input;
-	for (;;)
-	{
-		s++;
-		len++;
-		/* Premature end of the string. */
-		if (len >= lex->input_length)
-		{
-			lex->token_terminator = s;
-			report_invalid_token(lex);
-		}
-		else if (*s == '"')
-			break;
-		else if ((unsigned char) *s < 32)
-		{
-			/* Per RFC4627, these characters MUST be escaped. */
-			/* Since *s isn't printable, exclude it from the context string */
-			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
-		}
-		else if (*s == '\\')
-		{
-			/* OK, we have an escape character. */
-			s++;
-			len++;
-			if (len >= lex->input_length)
-			{
-				lex->token_terminator = s;
-				report_invalid_token(lex);
-			}
-			else if (*s == 'u')
-			{
-				int			i;
-				int			ch = 0;
-
-				for (i = 1; i <= 4; i++)
-				{
-					s++;
-					len++;
-					if (len >= lex->input_length)
-					{
-						lex->token_terminator = s;
-						report_invalid_token(lex);
-					}
-					else if (*s >= '0' && *s <= '9')
-						ch = (ch * 16) + (*s - '0');
-					else if (*s >= 'a' && *s <= 'f')
-						ch = (ch * 16) + (*s - 'a') + 10;
-					else if (*s >= 'A' && *s <= 'F')
-						ch = (ch * 16) + (*s - 'A') + 10;
-					else
-					{
-						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
-					}
-				}
-				if (lex->strval != NULL)
-				{
-					char		utf8str[5];
-					int			utf8len;
-
-					if (ch >= 0xd800 && ch <= 0xdbff)
-					{
-						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
-						hi_surrogate = (ch & 0x3ff) << 10;
-						continue;
-					}
-					else if (ch >= 0xdc00 && ch <= 0xdfff)
-					{
-						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
-						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
-						hi_surrogate = -1;
-					}
-
-					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
-
-					/*
-					 * For UTF8, replace the escape sequence by the actual
-					 * utf8 character in lex->strval. Do this also for other
-					 * encodings if the escape designates an ASCII character,
-					 * otherwise raise an error.
-					 */
-
-					if (ch == 0)
-					{
-						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
-					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
-					{
-						unicode_to_utf8(ch, (unsigned char *) utf8str);
-						utf8len = pg_utf_mblen((unsigned char *) utf8str);
-						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
-					}
-					else if (ch <= 0x007f)
-					{
-						/*
-						 * This is the only way to designate things like a
-						 * form feed character in JSON, so it's useful in all
-						 * encodings.
-						 */
-						appendStringInfoChar(lex->strval, (char) ch);
-					}
-					else
-					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
-					}
-
-				}
-			}
-			else if (lex->strval != NULL)
-			{
-				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
-
-				switch (*s)
-				{
-					case '"':
-					case '\\':
-					case '/':
-						appendStringInfoChar(lex->strval, *s);
-						break;
-					case 'b':
-						appendStringInfoChar(lex->strval, '\b');
-						break;
-					case 'f':
-						appendStringInfoChar(lex->strval, '\f');
-						break;
-					case 'n':
-						appendStringInfoChar(lex->strval, '\n');
-						break;
-					case 'r':
-						appendStringInfoChar(lex->strval, '\r');
-						break;
-					case 't':
-						appendStringInfoChar(lex->strval, '\t');
-						break;
-					default:
-						/* Not a valid string escape, so error out. */
-						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
-				}
-			}
-			else if (strchr("\"\\/bfnrt", *s) == NULL)
-			{
-				/*
-				 * Simpler processing if we're not bothered about de-escaping
-				 *
-				 * It's very tempting to remove the strchr() call here and
-				 * replace it with a switch statement, but testing so far has
-				 * shown it's not a performance win.
-				 */
-				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
-			}
-
-		}
-		else if (lex->strval != NULL)
-		{
-			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
-
-			appendStringInfoChar(lex->strval, *s);
-		}
-
-	}
-
-	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
-
-	/* Hooray, we found the end of the string! */
-	lex->prev_token_terminator = lex->token_terminator;
-	lex->token_terminator = s + 1;
-}
-
-/*
- * The next token in the input stream is known to be a number; lex it.
- *
- * In JSON, a number consists of four parts:
- *
- * (1) An optional minus sign ('-').
- *
- * (2) Either a single '0', or a string of one or more digits that does not
- *	   begin with a '0'.
- *
- * (3) An optional decimal part, consisting of a period ('.') followed by
- *	   one or more digits.  (Note: While this part can be omitted
- *	   completely, it's not OK to have only the decimal point without
- *	   any digits afterwards.)
- *
- * (4) An optional exponent part, consisting of 'e' or 'E', optionally
- *	   followed by '+' or '-', followed by one or more digits.  (Note:
- *	   As with the decimal part, if 'e' or 'E' is present, it must be
- *	   followed by at least one digit.)
- *
- * The 's' argument to this function points to the ostensible beginning
- * of part 2 - i.e. the character after any optional minus sign, or the
- * first character of the string if there is none.
- *
- * If num_err is not NULL, we return an error flag to *num_err rather than
- * raising an error for a badly-formed number.  Also, if total_len is not NULL
- * the distance from lex->input to the token end+1 is returned to *total_len.
- */
-static inline void
-json_lex_number(JsonLexContext *lex, char *s,
-				bool *num_err, int *total_len)
-{
-	bool		error = false;
-	int			len = s - lex->input;
-
-	/* Part (1): leading sign indicator. */
-	/* Caller already did this for us; so do nothing. */
-
-	/* Part (2): parse main digit string. */
-	if (len < lex->input_length && *s == '0')
-	{
-		s++;
-		len++;
-	}
-	else if (len < lex->input_length && *s >= '1' && *s <= '9')
-	{
-		do
-		{
-			s++;
-			len++;
-		} while (len < lex->input_length && *s >= '0' && *s <= '9');
-	}
-	else
-		error = true;
-
-	/* Part (3): parse optional decimal portion. */
-	if (len < lex->input_length && *s == '.')
-	{
-		s++;
-		len++;
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/* Part (4): parse optional exponent. */
-	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
-	{
-		s++;
-		len++;
-		if (len < lex->input_length && (*s == '+' || *s == '-'))
-		{
-			s++;
-			len++;
-		}
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/*
-	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
-	 * here should be considered part of the token for error-reporting
-	 * purposes.
-	 */
-	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
-		error = true;
-
-	if (total_len != NULL)
-		*total_len = len;
-
-	if (num_err != NULL)
-	{
-		/* let the caller handle any error */
-		*num_err = error;
-	}
-	else
-	{
-		/* return token endpoint */
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		/* handle error if any */
-		if (error)
-			report_invalid_token(lex);
-	}
-}
-
-/*
- * Report a parse error.
- *
- * lex->token_start and lex->token_terminator must identify the current token.
- */
-static void
-report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Handle case where the input ended prematurely. */
-	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
-
-	/* Separate out the current token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	/* Complain, with the appropriate detail message. */
-	if (ctx == JSON_PARSE_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
-				 report_json_context(lex)));
-	else
-	{
-		switch (ctx)
-		{
-			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			default:
-				elog(ERROR, "unexpected json parse state: %d", ctx);
-		}
-	}
-}
-
-/*
- * Report an invalid input token.
- *
- * lex->token_start and lex->token_terminator must identify the token.
- */
-static void
-report_invalid_token(JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Separate out the offending token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
-}
-
-/*
- * Report a CONTEXT line for bogus JSON input.
- *
- * lex->token_terminator must be set to identify the spot where we detected
- * the error.  Note that lex->token_start might be NULL, in case we recognized
- * error at EOF.
- *
- * The return value isn't meaningful, but we make it non-void so that this
- * can be invoked inside ereport().
- */
-static int
-report_json_context(JsonLexContext *lex)
-{
-	const char *context_start;
-	const char *context_end;
-	const char *line_start;
-	int			line_number;
-	char	   *ctxt;
-	int			ctxtlen;
-	const char *prefix;
-	const char *suffix;
-
-	/* Choose boundaries for the part of the input we will display */
-	context_start = lex->input;
-	context_end = lex->token_terminator;
-	line_start = context_start;
-	line_number = 1;
-	for (;;)
-	{
-		/* Always advance over newlines */
-		if (context_start < context_end && *context_start == '\n')
-		{
-			context_start++;
-			line_start = context_start;
-			line_number++;
-			continue;
-		}
-		/* Otherwise, done as soon as we are close enough to context_end */
-		if (context_end - context_start < 50)
-			break;
-		/* Advance to next multibyte character */
-		if (IS_HIGHBIT_SET(*context_start))
-			context_start += pg_mblen(context_start);
-		else
-			context_start++;
-	}
-
-	/*
-	 * We add "..." to indicate that the excerpt doesn't start at the
-	 * beginning of the line ... but if we're within 3 characters of the
-	 * beginning of the line, we might as well just show the whole line.
-	 */
-	if (context_start - line_start <= 3)
-		context_start = line_start;
-
-	/* Get a null-terminated copy of the data to present */
-	ctxtlen = context_end - context_start;
-	ctxt = palloc(ctxtlen + 1);
-	memcpy(ctxt, context_start, ctxtlen);
-	ctxt[ctxtlen] = '\0';
-
-	/*
-	 * Show the context, prefixing "..." if not starting at start of line, and
-	 * suffixing "..." if not ending at end of line.
-	 */
-	prefix = (context_start > line_start) ? "..." : "";
-	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
-
-	return errcontext("JSON data, line %d: %s%s%s",
-					  line_number, prefix, ctxt, suffix);
-}
-
-/*
- * Extract a single, possibly multi-byte char from the input string.
- */
-static char *
-extract_mb_char(char *s)
-{
-	char	   *res;
-	int			len;
-
-	len = pg_mblen(s);
-	res = palloc(len + 1);
-	memcpy(res, s, len);
-	res[len] = '\0';
-
-	return res;
-}
-
 /*
  * Determine how we want to print values of a given type in datum_to_json.
  *
@@ -2547,7 +1343,7 @@ json_typeof(PG_FUNCTION_ARGS)
 
 	/* Lex exactly one token from the input and check its type. */
 	json_lex(lex);
-	tok = lex_peek(lex);
+	tok = lex->token_type;
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
new file mode 100644
index 0000000000..fc8af9f861
--- /dev/null
+++ b/src/backend/utils/adt/jsonapi.c
@@ -0,0 +1,1216 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonapi.c
+ *		JSON parser and lexer interfaces
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/jsonapi.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/jsonapi.h"
+
+/*
+ * The context of the parser is maintained by the recursive descent
+ * mechanism, but is passed explicitly to the error reporting routine
+ * for better diagnostics.
+ */
+typedef enum					/* contexts of JSON parser */
+{
+	JSON_PARSE_VALUE,			/* expecting a value */
+	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
+	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
+	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
+	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
+	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
+	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
+	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
+	JSON_PARSE_END				/* saw the end of a document, expect nothing */
+} JsonParseContext;
+
+static inline void json_lex_string(JsonLexContext *lex);
+static inline void json_lex_number(JsonLexContext *lex, char *s,
+								   bool *num_err, int *total_len);
+static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
+static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
+static int	report_json_context(JsonLexContext *lex);
+static char *extract_mb_char(char *s);
+
+/* the null action object used for pure validation */
+JsonSemAction nullSemAction =
+{
+	NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL
+};
+
+/* Recursive Descent parser support routines */
+
+/*
+ * lex_peek
+ *
+ * what is the current look_ahead token?
+*/
+static inline JsonTokenType
+lex_peek(JsonLexContext *lex)
+{
+	return lex->token_type;
+}
+
+/*
+ * lex_accept
+ *
+ * accept the look_ahead token and move the lexer to the next token if the
+ * look_ahead token matches the token parameter. In that case, and if required,
+ * also hand back the de-escaped lexeme.
+ *
+ * returns true if the token matched, false otherwise.
+ */
+static inline bool
+lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
+{
+	if (lex->token_type == token)
+	{
+		if (lexeme != NULL)
+		{
+			if (lex->token_type == JSON_TOKEN_STRING)
+			{
+				if (lex->strval != NULL)
+					*lexeme = pstrdup(lex->strval->data);
+			}
+			else
+			{
+				int			len = (lex->token_terminator - lex->token_start);
+				char	   *tokstr = palloc(len + 1);
+
+				memcpy(tokstr, lex->token_start, len);
+				tokstr[len] = '\0';
+				*lexeme = tokstr;
+			}
+		}
+		json_lex(lex);
+		return true;
+	}
+	return false;
+}
+
+/*
+ * lex_accept
+ *
+ * move the lexer to the next token if the current look_ahead token matches
+ * the parameter token. Otherwise, report an error.
+ */
+static inline void
+lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
+{
+	if (!lex_accept(lex, token, NULL))
+		report_parse_error(ctx, lex);
+}
+
+/* chars to consider as part of an alphanumeric token */
+#define JSON_ALPHANUMERIC_CHAR(c)  \
+	(((c) >= 'a' && (c) <= 'z') || \
+	 ((c) >= 'A' && (c) <= 'Z') || \
+	 ((c) >= '0' && (c) <= '9') || \
+	 (c) == '_' || \
+	 IS_HIGHBIT_SET(c))
+
+/*
+ * Utility function to check if a string is a valid JSON number.
+ *
+ * str is of length len, and need not be null-terminated.
+ */
+bool
+IsValidJsonNumber(const char *str, int len)
+{
+	bool		numeric_error;
+	int			total_len;
+	JsonLexContext dummy_lex;
+
+	if (len <= 0)
+		return false;
+
+	/*
+	 * json_lex_number expects a leading  '-' to have been eaten already.
+	 *
+	 * having to cast away the constness of str is ugly, but there's not much
+	 * easy alternative.
+	 */
+	if (*str == '-')
+	{
+		dummy_lex.input = unconstify(char *, str) +1;
+		dummy_lex.input_length = len - 1;
+	}
+	else
+	{
+		dummy_lex.input = unconstify(char *, str);
+		dummy_lex.input_length = len;
+	}
+
+	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
+
+	return (!numeric_error) && (total_len == dummy_lex.input_length);
+}
+
+/*
+ * makeJsonLexContext
+ *
+ * lex constructor, with or without StringInfo object
+ * for de-escaped lexemes.
+ *
+ * Without is better as it makes the processing faster, so only make one
+ * if really required.
+ *
+ * If you already have the json as a text* value, use the first of these
+ * functions, otherwise use  makeJsonLexContextCstringLen().
+ */
+JsonLexContext *
+makeJsonLexContext(text *json, bool need_escapes)
+{
+	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
+										VARSIZE_ANY_EXHDR(json),
+										need_escapes);
+}
+
+JsonLexContext *
+makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+{
+	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+
+	lex->input = lex->token_terminator = lex->line_start = json;
+	lex->line_number = 1;
+	lex->input_length = len;
+	if (need_escapes)
+		lex->strval = makeStringInfo();
+	return lex;
+}
+
+/*
+ * pg_parse_json
+ *
+ * Publicly visible entry point for the JSON parser.
+ *
+ * lex is a lexing context, set up for the json to be processed by calling
+ * makeJsonLexContext(). sem is a structure of function pointers to semantic
+ * action routines to be called at appropriate spots during parsing, and a
+ * pointer to a state object to be passed to those routines.
+ */
+void
+pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
+{
+	JsonTokenType tok;
+
+	/* get the initial token */
+	json_lex(lex);
+
+	tok = lex_peek(lex);
+
+	/* parse by recursive descent */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem); /* json can be a bare scalar */
+	}
+
+	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
+
+}
+
+/*
+ * json_count_array_elements
+ *
+ * Returns number of array elements in lex context at start of array token
+ * until end of array token at same nesting level.
+ *
+ * Designed to be called from array_start routines.
+ */
+int
+json_count_array_elements(JsonLexContext *lex)
+{
+	JsonLexContext copylex;
+	int			count;
+
+	/*
+	 * It's safe to do this with a shallow copy because the lexical routines
+	 * don't scribble on the input. They do scribble on the other pointers
+	 * etc, so doing this with a copy makes that safe.
+	 */
+	memcpy(&copylex, lex, sizeof(JsonLexContext));
+	copylex.strval = NULL;		/* not interested in values here */
+	copylex.lex_level++;
+
+	count = 0;
+	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
+	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
+	{
+		do
+		{
+			count++;
+			parse_array_element(&copylex, &nullSemAction);
+		}
+		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
+	}
+	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+
+	return count;
+}
+
+/*
+ *	Recursive Descent parse routines. There is one for each structural
+ *	element in a json document:
+ *	  - scalar (string, number, true, false, null)
+ *	  - array  ( [ ] )
+ *	  - array element
+ *	  - object ( { } )
+ *	  - object field
+ */
+static inline void
+parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
+{
+	char	   *val = NULL;
+	json_scalar_action sfunc = sem->scalar;
+	char	  **valaddr;
+	JsonTokenType tok = lex_peek(lex);
+
+	valaddr = sfunc == NULL ? NULL : &val;
+
+	/* a scalar must be a string, a number, true, false, or null */
+	switch (tok)
+	{
+		case JSON_TOKEN_TRUE:
+			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
+			break;
+		case JSON_TOKEN_FALSE:
+			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
+			break;
+		case JSON_TOKEN_NULL:
+			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
+			break;
+		case JSON_TOKEN_NUMBER:
+			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
+			break;
+		case JSON_TOKEN_STRING:
+			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
+			break;
+		default:
+			report_parse_error(JSON_PARSE_VALUE, lex);
+	}
+
+	if (sfunc != NULL)
+		(*sfunc) (sem->semstate, val, tok);
+}
+
+static void
+parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * An object field is "fieldname" : value where value can be a scalar,
+	 * object or array.  Note: in user-facing docs and error messages, we
+	 * generally call a field name a "key".
+	 */
+
+	char	   *fname = NULL;	/* keep compiler quiet */
+	json_ofield_action ostart = sem->object_field_start;
+	json_ofield_action oend = sem->object_field_end;
+	bool		isnull;
+	char	  **fnameaddr = NULL;
+	JsonTokenType tok;
+
+	if (ostart != NULL || oend != NULL)
+		fnameaddr = &fname;
+
+	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
+		report_parse_error(JSON_PARSE_STRING, lex);
+
+	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+
+	tok = lex_peek(lex);
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate, fname, isnull);
+
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem);
+	}
+
+	if (oend != NULL)
+		(*oend) (sem->semstate, fname, isnull);
+}
+
+static void
+parse_object(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * an object is a possibly empty sequence of object fields, separated by
+	 * commas and surrounded by curly braces.
+	 */
+	json_struct_action ostart = sem->object_start;
+	json_struct_action oend = sem->object_end;
+	JsonTokenType tok;
+
+	check_stack_depth();
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate);
+
+	/*
+	 * Data inside an object is at a higher nesting level than the object
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the object start and restore it before we call the routine for the
+	 * object end.
+	 */
+	lex->lex_level++;
+
+	/* we know this will succeed, just clearing the token */
+	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
+
+	tok = lex_peek(lex);
+	switch (tok)
+	{
+		case JSON_TOKEN_STRING:
+			parse_object_field(lex, sem);
+			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+				parse_object_field(lex, sem);
+			break;
+		case JSON_TOKEN_OBJECT_END:
+			break;
+		default:
+			/* case of an invalid initial token inside the object */
+			report_parse_error(JSON_PARSE_OBJECT_START, lex);
+	}
+
+	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+
+	lex->lex_level--;
+
+	if (oend != NULL)
+		(*oend) (sem->semstate);
+}
+
+static void
+parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
+{
+	json_aelem_action astart = sem->array_element_start;
+	json_aelem_action aend = sem->array_element_end;
+	JsonTokenType tok = lex_peek(lex);
+
+	bool		isnull;
+
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (astart != NULL)
+		(*astart) (sem->semstate, isnull);
+
+	/* an array element is any object, array or scalar */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem);
+	}
+
+	if (aend != NULL)
+		(*aend) (sem->semstate, isnull);
+}
+
+static void
+parse_array(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * an array is a possibly empty sequence of array elements, separated by
+	 * commas and surrounded by square brackets.
+	 */
+	json_struct_action astart = sem->array_start;
+	json_struct_action aend = sem->array_end;
+
+	check_stack_depth();
+
+	if (astart != NULL)
+		(*astart) (sem->semstate);
+
+	/*
+	 * Data inside an array is at a higher nesting level than the array
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the array start and restore it before we call the routine for the
+	 * array end.
+	 */
+	lex->lex_level++;
+
+	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
+	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	{
+
+		parse_array_element(lex, sem);
+
+		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+			parse_array_element(lex, sem);
+	}
+
+	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+
+	lex->lex_level--;
+
+	if (aend != NULL)
+		(*aend) (sem->semstate);
+}
+
+/*
+ * Lex one token from the input stream.
+ */
+void
+json_lex(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+
+	/* Skip leading whitespace. */
+	s = lex->token_terminator;
+	len = s - lex->input;
+	while (len < lex->input_length &&
+		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
+	{
+		if (*s == '\n')
+			++lex->line_number;
+		++s;
+		++len;
+	}
+	lex->token_start = s;
+
+	/* Determine token type. */
+	if (len >= lex->input_length)
+	{
+		lex->token_start = NULL;
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		lex->token_type = JSON_TOKEN_END;
+	}
+	else
+		switch (*s)
+		{
+				/* Single-character token, some kind of punctuation mark. */
+			case '{':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_START;
+				break;
+			case '}':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_END;
+				break;
+			case '[':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_START;
+				break;
+			case ']':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_END;
+				break;
+			case ',':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COMMA;
+				break;
+			case ':':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COLON;
+				break;
+			case '"':
+				/* string */
+				json_lex_string(lex);
+				lex->token_type = JSON_TOKEN_STRING;
+				break;
+			case '-':
+				/* Negative number. */
+				json_lex_number(lex, s + 1, NULL, NULL);
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			case '0':
+			case '1':
+			case '2':
+			case '3':
+			case '4':
+			case '5':
+			case '6':
+			case '7':
+			case '8':
+			case '9':
+				/* Positive number. */
+				json_lex_number(lex, s, NULL, NULL);
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			default:
+				{
+					char	   *p;
+
+					/*
+					 * We're not dealing with a string, number, legal
+					 * punctuation mark, or end of string.  The only legal
+					 * tokens we might find here are true, false, and null,
+					 * but for error reporting purposes we scan until we see a
+					 * non-alphanumeric character.  That way, we can report
+					 * the whole word as an unexpected token, rather than just
+					 * some unintuitive prefix thereof.
+					 */
+					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
+						 /* skip */ ;
+
+					/*
+					 * We got some sort of unexpected punctuation or an
+					 * otherwise unexpected character, so just complain about
+					 * that one character.
+					 */
+					if (p == s)
+					{
+						lex->prev_token_terminator = lex->token_terminator;
+						lex->token_terminator = s + 1;
+						report_invalid_token(lex);
+					}
+
+					/*
+					 * We've got a real alphanumeric token here.  If it
+					 * happens to be true, false, or null, all is well.  If
+					 * not, error out.
+					 */
+					lex->prev_token_terminator = lex->token_terminator;
+					lex->token_terminator = p;
+					if (p - s == 4)
+					{
+						if (memcmp(s, "true", 4) == 0)
+							lex->token_type = JSON_TOKEN_TRUE;
+						else if (memcmp(s, "null", 4) == 0)
+							lex->token_type = JSON_TOKEN_NULL;
+						else
+							report_invalid_token(lex);
+					}
+					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
+						lex->token_type = JSON_TOKEN_FALSE;
+					else
+						report_invalid_token(lex);
+
+				}
+		}						/* end of switch */
+}
+
+/*
+ * The next token in the input stream is known to be a string; lex it.
+ */
+static inline void
+json_lex_string(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+	int			hi_surrogate = -1;
+
+	if (lex->strval != NULL)
+		resetStringInfo(lex->strval);
+
+	Assert(lex->input_length > 0);
+	s = lex->token_start;
+	len = lex->token_start - lex->input;
+	for (;;)
+	{
+		s++;
+		len++;
+		/* Premature end of the string. */
+		if (len >= lex->input_length)
+		{
+			lex->token_terminator = s;
+			report_invalid_token(lex);
+		}
+		else if (*s == '"')
+			break;
+		else if ((unsigned char) *s < 32)
+		{
+			/* Per RFC4627, these characters MUST be escaped. */
+			/* Since *s isn't printable, exclude it from the context string */
+			lex->token_terminator = s;
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Character with value 0x%02x must be escaped.",
+							   (unsigned char) *s),
+					 report_json_context(lex)));
+		}
+		else if (*s == '\\')
+		{
+			/* OK, we have an escape character. */
+			s++;
+			len++;
+			if (len >= lex->input_length)
+			{
+				lex->token_terminator = s;
+				report_invalid_token(lex);
+			}
+			else if (*s == 'u')
+			{
+				int			i;
+				int			ch = 0;
+
+				for (i = 1; i <= 4; i++)
+				{
+					s++;
+					len++;
+					if (len >= lex->input_length)
+					{
+						lex->token_terminator = s;
+						report_invalid_token(lex);
+					}
+					else if (*s >= '0' && *s <= '9')
+						ch = (ch * 16) + (*s - '0');
+					else if (*s >= 'a' && *s <= 'f')
+						ch = (ch * 16) + (*s - 'a') + 10;
+					else if (*s >= 'A' && *s <= 'F')
+						ch = (ch * 16) + (*s - 'A') + 10;
+					else
+					{
+						lex->token_terminator = s + pg_mblen(s);
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s",
+										"json"),
+								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
+								 report_json_context(lex)));
+					}
+				}
+				if (lex->strval != NULL)
+				{
+					char		utf8str[5];
+					int			utf8len;
+
+					if (ch >= 0xd800 && ch <= 0xdbff)
+					{
+						if (hi_surrogate != -1)
+							ereport(ERROR,
+									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									 errmsg("invalid input syntax for type %s",
+											"json"),
+									 errdetail("Unicode high surrogate must not follow a high surrogate."),
+									 report_json_context(lex)));
+						hi_surrogate = (ch & 0x3ff) << 10;
+						continue;
+					}
+					else if (ch >= 0xdc00 && ch <= 0xdfff)
+					{
+						if (hi_surrogate == -1)
+							ereport(ERROR,
+									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									 errmsg("invalid input syntax for type %s", "json"),
+									 errdetail("Unicode low surrogate must follow a high surrogate."),
+									 report_json_context(lex)));
+						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
+						hi_surrogate = -1;
+					}
+
+					if (hi_surrogate != -1)
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s", "json"),
+								 errdetail("Unicode low surrogate must follow a high surrogate."),
+								 report_json_context(lex)));
+
+					/*
+					 * For UTF8, replace the escape sequence by the actual
+					 * utf8 character in lex->strval. Do this also for other
+					 * encodings if the escape designates an ASCII character,
+					 * otherwise raise an error.
+					 */
+
+					if (ch == 0)
+					{
+						/* We can't allow this, since our TEXT type doesn't */
+						ereport(ERROR,
+								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+								 errmsg("unsupported Unicode escape sequence"),
+								 errdetail("\\u0000 cannot be converted to text."),
+								 report_json_context(lex)));
+					}
+					else if (GetDatabaseEncoding() == PG_UTF8)
+					{
+						unicode_to_utf8(ch, (unsigned char *) utf8str);
+						utf8len = pg_utf_mblen((unsigned char *) utf8str);
+						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
+					}
+					else if (ch <= 0x007f)
+					{
+						/*
+						 * This is the only way to designate things like a
+						 * form feed character in JSON, so it's useful in all
+						 * encodings.
+						 */
+						appendStringInfoChar(lex->strval, (char) ch);
+					}
+					else
+					{
+						ereport(ERROR,
+								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+								 errmsg("unsupported Unicode escape sequence"),
+								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
+								 report_json_context(lex)));
+					}
+
+				}
+			}
+			else if (lex->strval != NULL)
+			{
+				if (hi_surrogate != -1)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							 errmsg("invalid input syntax for type %s",
+									"json"),
+							 errdetail("Unicode low surrogate must follow a high surrogate."),
+							 report_json_context(lex)));
+
+				switch (*s)
+				{
+					case '"':
+					case '\\':
+					case '/':
+						appendStringInfoChar(lex->strval, *s);
+						break;
+					case 'b':
+						appendStringInfoChar(lex->strval, '\b');
+						break;
+					case 'f':
+						appendStringInfoChar(lex->strval, '\f');
+						break;
+					case 'n':
+						appendStringInfoChar(lex->strval, '\n');
+						break;
+					case 'r':
+						appendStringInfoChar(lex->strval, '\r');
+						break;
+					case 't':
+						appendStringInfoChar(lex->strval, '\t');
+						break;
+					default:
+						/* Not a valid string escape, so error out. */
+						lex->token_terminator = s + pg_mblen(s);
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s",
+										"json"),
+								 errdetail("Escape sequence \"\\%s\" is invalid.",
+										   extract_mb_char(s)),
+								 report_json_context(lex)));
+				}
+			}
+			else if (strchr("\"\\/bfnrt", *s) == NULL)
+			{
+				/*
+				 * Simpler processing if we're not bothered about de-escaping
+				 *
+				 * It's very tempting to remove the strchr() call here and
+				 * replace it with a switch statement, but testing so far has
+				 * shown it's not a performance win.
+				 */
+				lex->token_terminator = s + pg_mblen(s);
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Escape sequence \"\\%s\" is invalid.",
+								   extract_mb_char(s)),
+						 report_json_context(lex)));
+			}
+
+		}
+		else if (lex->strval != NULL)
+		{
+			if (hi_surrogate != -1)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Unicode low surrogate must follow a high surrogate."),
+						 report_json_context(lex)));
+
+			appendStringInfoChar(lex->strval, *s);
+		}
+
+	}
+
+	if (hi_surrogate != -1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("Unicode low surrogate must follow a high surrogate."),
+				 report_json_context(lex)));
+
+	/* Hooray, we found the end of the string! */
+	lex->prev_token_terminator = lex->token_terminator;
+	lex->token_terminator = s + 1;
+}
+
+/*
+ * The next token in the input stream is known to be a number; lex it.
+ *
+ * In JSON, a number consists of four parts:
+ *
+ * (1) An optional minus sign ('-').
+ *
+ * (2) Either a single '0', or a string of one or more digits that does not
+ *	   begin with a '0'.
+ *
+ * (3) An optional decimal part, consisting of a period ('.') followed by
+ *	   one or more digits.  (Note: While this part can be omitted
+ *	   completely, it's not OK to have only the decimal point without
+ *	   any digits afterwards.)
+ *
+ * (4) An optional exponent part, consisting of 'e' or 'E', optionally
+ *	   followed by '+' or '-', followed by one or more digits.  (Note:
+ *	   As with the decimal part, if 'e' or 'E' is present, it must be
+ *	   followed by at least one digit.)
+ *
+ * The 's' argument to this function points to the ostensible beginning
+ * of part 2 - i.e. the character after any optional minus sign, or the
+ * first character of the string if there is none.
+ *
+ * If num_err is not NULL, we return an error flag to *num_err rather than
+ * raising an error for a badly-formed number.  Also, if total_len is not NULL
+ * the distance from lex->input to the token end+1 is returned to *total_len.
+ */
+static inline void
+json_lex_number(JsonLexContext *lex, char *s,
+				bool *num_err, int *total_len)
+{
+	bool		error = false;
+	int			len = s - lex->input;
+
+	/* Part (1): leading sign indicator. */
+	/* Caller already did this for us; so do nothing. */
+
+	/* Part (2): parse main digit string. */
+	if (len < lex->input_length && *s == '0')
+	{
+		s++;
+		len++;
+	}
+	else if (len < lex->input_length && *s >= '1' && *s <= '9')
+	{
+		do
+		{
+			s++;
+			len++;
+		} while (len < lex->input_length && *s >= '0' && *s <= '9');
+	}
+	else
+		error = true;
+
+	/* Part (3): parse optional decimal portion. */
+	if (len < lex->input_length && *s == '.')
+	{
+		s++;
+		len++;
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/* Part (4): parse optional exponent. */
+	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
+	{
+		s++;
+		len++;
+		if (len < lex->input_length && (*s == '+' || *s == '-'))
+		{
+			s++;
+			len++;
+		}
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/*
+	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
+	 * here should be considered part of the token for error-reporting
+	 * purposes.
+	 */
+	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
+		error = true;
+
+	if (total_len != NULL)
+		*total_len = len;
+
+	if (num_err != NULL)
+	{
+		/* let the caller handle any error */
+		*num_err = error;
+	}
+	else
+	{
+		/* return token endpoint */
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		/* handle error if any */
+		if (error)
+			report_invalid_token(lex);
+	}
+}
+
+/*
+ * Report a parse error.
+ *
+ * lex->token_start and lex->token_terminator must identify the current token.
+ */
+static void
+report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
+{
+	char	   *token;
+	int			toklen;
+
+	/* Handle case where the input ended prematurely. */
+	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("The input string ended unexpectedly."),
+				 report_json_context(lex)));
+
+	/* Separate out the current token. */
+	toklen = lex->token_terminator - lex->token_start;
+	token = palloc(toklen + 1);
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+
+	/* Complain, with the appropriate detail message. */
+	if (ctx == JSON_PARSE_END)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("Expected end of input, but found \"%s\".",
+						   token),
+				 report_json_context(lex)));
+	else
+	{
+		switch (ctx)
+		{
+			case JSON_PARSE_VALUE:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected JSON value, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_STRING:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_ARRAY_START:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected array element or \"]\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_ARRAY_NEXT:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_START:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string or \"}\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_LABEL:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \":\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_NEXT:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_COMMA:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			default:
+				elog(ERROR, "unexpected json parse state: %d", ctx);
+		}
+	}
+}
+
+/*
+ * Report an invalid input token.
+ *
+ * lex->token_start and lex->token_terminator must identify the token.
+ */
+static void
+report_invalid_token(JsonLexContext *lex)
+{
+	char	   *token;
+	int			toklen;
+
+	/* Separate out the offending token. */
+	toklen = lex->token_terminator - lex->token_start;
+	token = palloc(toklen + 1);
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s", "json"),
+			 errdetail("Token \"%s\" is invalid.", token),
+			 report_json_context(lex)));
+}
+
+/*
+ * Report a CONTEXT line for bogus JSON input.
+ *
+ * lex->token_terminator must be set to identify the spot where we detected
+ * the error.  Note that lex->token_start might be NULL, in case we recognized
+ * error at EOF.
+ *
+ * The return value isn't meaningful, but we make it non-void so that this
+ * can be invoked inside ereport().
+ */
+static int
+report_json_context(JsonLexContext *lex)
+{
+	const char *context_start;
+	const char *context_end;
+	const char *line_start;
+	int			line_number;
+	char	   *ctxt;
+	int			ctxtlen;
+	const char *prefix;
+	const char *suffix;
+
+	/* Choose boundaries for the part of the input we will display */
+	context_start = lex->input;
+	context_end = lex->token_terminator;
+	line_start = context_start;
+	line_number = 1;
+	for (;;)
+	{
+		/* Always advance over newlines */
+		if (context_start < context_end && *context_start == '\n')
+		{
+			context_start++;
+			line_start = context_start;
+			line_number++;
+			continue;
+		}
+		/* Otherwise, done as soon as we are close enough to context_end */
+		if (context_end - context_start < 50)
+			break;
+		/* Advance to next multibyte character */
+		if (IS_HIGHBIT_SET(*context_start))
+			context_start += pg_mblen(context_start);
+		else
+			context_start++;
+	}
+
+	/*
+	 * We add "..." to indicate that the excerpt doesn't start at the
+	 * beginning of the line ... but if we're within 3 characters of the
+	 * beginning of the line, we might as well just show the whole line.
+	 */
+	if (context_start - line_start <= 3)
+		context_start = line_start;
+
+	/* Get a null-terminated copy of the data to present */
+	ctxtlen = context_end - context_start;
+	ctxt = palloc(ctxtlen + 1);
+	memcpy(ctxt, context_start, ctxtlen);
+	ctxt[ctxtlen] = '\0';
+
+	/*
+	 * Show the context, prefixing "..." if not starting at start of line, and
+	 * suffixing "..." if not ending at end of line.
+	 */
+	prefix = (context_start > line_start) ? "..." : "";
+	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
+
+	return errcontext("JSON data, line %d: %s%s%s",
+					  line_number, prefix, ctxt, suffix);
+}
+
+/*
+ * Extract a single, possibly multi-byte char from the input string.
+ */
+static char *
+extract_mb_char(char *s)
+{
+	char	   *res;
+	int			len;
+
+	len = pg_mblen(s);
+	res = palloc(len + 1);
+	memcpy(res, s, len);
+	res[len] = '\0';
+
+	return res;
+}
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index 1190947476..bbca121bb7 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -103,6 +103,9 @@ typedef struct JsonSemAction
  */
 extern void pg_parse_json(JsonLexContext *lex, JsonSemAction *sem);
 
+/* the null action object used for pure validation */
+extern JsonSemAction nullSemAction;
+
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
  * number of elements in passed array lex context. It should be called from an
@@ -124,6 +127,9 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
 													bool need_escapes);
 
+/* lex one token */
+extern void json_lex(JsonLexContext *lex);
+
 /*
  * Utility function to check if a string is a valid JSON number.
  *
-- 
2.17.2 (Apple Git-113)

v2-0004-WIP-Return-errors-rather-than-using-ereport.patchapplication/octet-stream; name=v2-0004-WIP-Return-errors-rather-than-using-ereport.patchDownload

From a523b6ec9e4d78fc2d81ae15193cdd6d7885fe37 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 17 Jan 2020 16:22:21 -0500
Subject: [PATCH v2 4/4] WIP: Return errors rather than using ereport().

This might be a useful step towards allowing the frontend to use
this code without needing ereport().
---
 src/backend/utils/adt/jsonapi.c | 578 ++++++++++++++++++--------------
 src/include/utils/jsonapi.h     |  22 ++
 2 files changed, 341 insertions(+), 259 deletions(-)

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 9e14306b6f..0b5f1664a3 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -35,16 +35,17 @@ typedef enum					/* contexts of JSON parser */
 	JSON_PARSE_END				/* saw the end of a document, expect nothing */
 } JsonParseContext;
 
-static inline void json_lex_string(JsonLexContext *lex);
-static inline void json_lex_number(JsonLexContext *lex, char *s,
+static JsonParseErrorType json_lex_internal(JsonLexContext *lex);
+static inline JsonParseErrorType json_lex_string(JsonLexContext *lex);
+static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
 								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
-static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
-static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
+static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_object(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
+static void throw_a_json_error(JsonParseErrorType error, JsonLexContext *lex);
 static int	report_json_context(JsonLexContext *lex);
 static char *extract_mb_char(char *s);
 
@@ -74,13 +75,13 @@ lex_peek(JsonLexContext *lex)
  * move the lexer to the next token if the current look_ahead token matches
  * the parameter token. Otherwise, report an error.
  */
-static inline void
+static inline JsonParseErrorType
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
 	if (lex_peek(lex) == token)
-		json_lex(lex);
+		return json_lex_internal(lex);
 	else
-		report_parse_error(ctx, lex);
+		return report_parse_error(ctx, lex);
 }
 
 /* chars to consider as part of an alphanumeric token */
@@ -175,9 +176,12 @@ void
 pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 {
 	JsonTokenType tok;
+	JsonParseErrorType	result;
 
 	/* get the initial token */
-	json_lex(lex);
+	result = json_lex_internal(lex);
+	if (result != JSON_SUCCESS)
+		throw_a_json_error(result, lex);
 
 	tok = lex_peek(lex);
 
@@ -185,17 +189,20 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			result = parse_object(lex, sem);
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			result = parse_array(lex, sem);
 			break;
 		default:
-			parse_scalar(lex, sem); /* json can be a bare scalar */
+			result = parse_scalar(lex, sem); /* json can be a bare scalar */
 	}
 
-	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
+	if (result == JSON_SUCCESS)
+		result = lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
 
+	if (result != JSON_SUCCESS)
+		throw_a_json_error(result, lex);
 }
 
 /*
@@ -211,6 +218,7 @@ json_count_array_elements(JsonLexContext *lex)
 {
 	JsonLexContext copylex;
 	int			count;
+	JsonParseErrorType	result;
 
 	/*
 	 * It's safe to do this with a shallow copy because the lexical routines
@@ -222,19 +230,28 @@ json_count_array_elements(JsonLexContext *lex)
 	copylex.lex_level++;
 
 	count = 0;
-	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
+	result = lex_expect(JSON_PARSE_ARRAY_START, &copylex,
+						JSON_TOKEN_ARRAY_START);
+	if (result == JSON_SUCCESS && lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
 		while (1)
 		{
 			count++;
-			parse_array_element(&copylex, &nullSemAction);
+			result = parse_array_element(&copylex, &nullSemAction);
+			if (result != JSON_SUCCESS)
+				break;
 			if (copylex.token_type != JSON_TOKEN_COMMA)
 				break;
-			json_lex(&copylex);
+			result = json_lex_internal(&copylex);
+			if (result != JSON_SUCCESS)
+				break;
 		}
 	}
-	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+	if (result == JSON_SUCCESS)
+		result = lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex,
+							JSON_TOKEN_ARRAY_END);
+	if (result != JSON_SUCCESS)
+		throw_a_json_error(result, lex);
 
 	return count;
 }
@@ -248,25 +265,23 @@ json_count_array_elements(JsonLexContext *lex)
  *	  - object ( { } )
  *	  - object field
  */
-static inline void
+static inline JsonParseErrorType
 parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
 	JsonTokenType tok = lex_peek(lex);
+	JsonParseErrorType result;
 
 	/* a scalar must be a string, a number, true, false, or null */
 	if (tok != JSON_TOKEN_STRING && tok != JSON_TOKEN_NUMBER &&
 		tok != JSON_TOKEN_TRUE && tok != JSON_TOKEN_FALSE &&
 		tok != JSON_TOKEN_NULL)
-		report_parse_error(JSON_PARSE_VALUE, lex);
+		return report_parse_error(JSON_PARSE_VALUE, lex);
 
 	/* if no semantic function, just consume the token */
 	if (sfunc == NULL)
-	{
-		json_lex(lex);
-		return;
-	}
+		return json_lex_internal(lex);
 
 	/* extract the de-escaped string value, or the raw lexeme */
 	if (lex_peek(lex) == JSON_TOKEN_STRING)
@@ -284,13 +299,17 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 	}
 
 	/* consume the token */
-	json_lex(lex);
+	result = json_lex_internal(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	/* invoke the callback */
 	(*sfunc) (sem->semstate, val, tok);
+
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
@@ -304,14 +323,19 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
 	JsonTokenType tok;
+	JsonParseErrorType result;
 
 	if (lex_peek(lex) != JSON_TOKEN_STRING)
-		report_parse_error(JSON_PARSE_STRING, lex);
+		return report_parse_error(JSON_PARSE_STRING, lex);
 	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
 		fname = pstrdup(lex->strval->data);
-	json_lex(lex);
+	result = json_lex_internal(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
-	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+	result = lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -322,20 +346,23 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			result = parse_object(lex, sem);
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			result = parse_array(lex, sem);
 			break;
 		default:
-			parse_scalar(lex, sem);
+			result = parse_scalar(lex, sem);
 	}
+	if (result != JSON_SUCCESS)
+		return result;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate, fname, isnull);
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_object(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
@@ -345,6 +372,7 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	json_struct_action ostart = sem->object_start;
 	json_struct_action oend = sem->object_end;
 	JsonTokenType tok;
+	JsonParseErrorType result;
 
 	check_stack_depth();
 
@@ -360,40 +388,51 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	lex->lex_level++;
 
 	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
-	json_lex(lex);
+	result = json_lex_internal(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
-			parse_object_field(lex, sem);
-			while (lex_peek(lex) == JSON_TOKEN_COMMA)
+			result = parse_object_field(lex, sem);
+			while (result == JSON_SUCCESS && lex_peek(lex) == JSON_TOKEN_COMMA)
 			{
-				json_lex(lex);
-				parse_object_field(lex, sem);
+				result = json_lex_internal(lex);
+				if (result != JSON_SUCCESS)
+					break;
+				result = parse_object_field(lex, sem);
 			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
 		default:
 			/* case of an invalid initial token inside the object */
-			report_parse_error(JSON_PARSE_OBJECT_START, lex);
+			result = report_parse_error(JSON_PARSE_OBJECT_START, lex);
 	}
+	if (result != JSON_SUCCESS)
+		return result;
 
-	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+	result = lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	lex->lex_level--;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate);
+
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 {
 	json_aelem_action astart = sem->array_element_start;
 	json_aelem_action aend = sem->array_element_end;
 	JsonTokenType tok = lex_peek(lex);
+	JsonParseErrorType result;
 
 	bool		isnull;
 
@@ -406,20 +445,25 @@ parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			result = parse_object(lex, sem);
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			result = parse_array(lex, sem);
 			break;
 		default:
-			parse_scalar(lex, sem);
+			result = parse_scalar(lex, sem);
 	}
 
+	if (result != JSON_SUCCESS)
+		return result;
+
 	if (aend != NULL)
 		(*aend) (sem->semstate, isnull);
+
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_array(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
@@ -428,6 +472,7 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	json_struct_action astart = sem->array_start;
 	json_struct_action aend = sem->array_end;
+	JsonParseErrorType result;
 
 	check_stack_depth();
 
@@ -442,35 +487,53 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	result = lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
+	if (result == JSON_SUCCESS && lex_peek(lex) != JSON_TOKEN_ARRAY_END)
 	{
+		result = parse_array_element(lex, sem);
 
-		parse_array_element(lex, sem);
-
-		while (lex_peek(lex) == JSON_TOKEN_COMMA)
+		while (result == JSON_SUCCESS && lex_peek(lex) == JSON_TOKEN_COMMA)
 		{
-			json_lex(lex);
-			parse_array_element(lex, sem);
+			result = json_lex_internal(lex);
+			if (result != JSON_SUCCESS)
+				break;
+			result = parse_array_element(lex, sem);
 		}
 	}
+	if (result != JSON_SUCCESS)
+		return result;
 
-	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+	result = lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	lex->lex_level--;
 
 	if (aend != NULL)
 		(*aend) (sem->semstate);
+
+	return JSON_SUCCESS;
+}
+
+void
+json_lex(JsonLexContext *lex)
+{
+	JsonParseErrorType	result;
+
+	result = json_lex_internal(lex);
+	if (result != JSON_SUCCESS)
+		throw_a_json_error(result, lex);
 }
 
 /*
  * Lex one token from the input stream.
  */
-void
-json_lex(JsonLexContext *lex)
+static JsonParseErrorType
+json_lex_internal(JsonLexContext *lex)
 {
 	char	   *s;
 	int			len;
+	JsonParseErrorType	result;
 
 	/* Skip leading whitespace. */
 	s = lex->token_terminator;
@@ -494,6 +557,7 @@ json_lex(JsonLexContext *lex)
 		lex->token_type = JSON_TOKEN_END;
 	}
 	else
+	{
 		switch (*s)
 		{
 				/* Single-character token, some kind of punctuation mark. */
@@ -529,12 +593,16 @@ json_lex(JsonLexContext *lex)
 				break;
 			case '"':
 				/* string */
-				json_lex_string(lex);
+				result = json_lex_string(lex);
+				if (result != JSON_SUCCESS)
+					return result;
 				lex->token_type = JSON_TOKEN_STRING;
 				break;
 			case '-':
 				/* Negative number. */
-				json_lex_number(lex, s + 1, NULL, NULL);
+				result = json_lex_number(lex, s + 1, NULL, NULL);
+				if (result != JSON_SUCCESS)
+					return result;
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			case '0':
@@ -548,7 +616,9 @@ json_lex(JsonLexContext *lex)
 			case '8':
 			case '9':
 				/* Positive number. */
-				json_lex_number(lex, s, NULL, NULL);
+				result = json_lex_number(lex, s, NULL, NULL);
+				if (result != JSON_SUCCESS)
+					return result;
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			default:
@@ -576,7 +646,7 @@ json_lex(JsonLexContext *lex)
 					{
 						lex->prev_token_terminator = lex->token_terminator;
 						lex->token_terminator = s + 1;
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 					}
 
 					/*
@@ -593,21 +663,24 @@ json_lex(JsonLexContext *lex)
 						else if (memcmp(s, "null", 4) == 0)
 							lex->token_type = JSON_TOKEN_NULL;
 						else
-							report_invalid_token(lex);
+							return JSON_INVALID_TOKEN;
 					}
 					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
 						lex->token_type = JSON_TOKEN_FALSE;
 					else
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 
 				}
 		}						/* end of switch */
+	}
+
+	return JSON_SUCCESS;
 }
 
 /*
  * The next token in the input stream is known to be a string; lex it.
  */
-static inline void
+static inline JsonParseErrorType
 json_lex_string(JsonLexContext *lex)
 {
 	char	   *s;
@@ -628,7 +701,7 @@ json_lex_string(JsonLexContext *lex)
 		if (len >= lex->input_length)
 		{
 			lex->token_terminator = s;
-			report_invalid_token(lex);
+			return JSON_INVALID_TOKEN;
 		}
 		else if (*s == '"')
 			break;
@@ -637,12 +710,7 @@ json_lex_string(JsonLexContext *lex)
 			/* Per RFC4627, these characters MUST be escaped. */
 			/* Since *s isn't printable, exclude it from the context string */
 			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
+			return JSON_ESCAPING_REQUIRED;
 		}
 		else if (*s == '\\')
 		{
@@ -652,7 +720,7 @@ json_lex_string(JsonLexContext *lex)
 			if (len >= lex->input_length)
 			{
 				lex->token_terminator = s;
-				report_invalid_token(lex);
+				return JSON_INVALID_TOKEN;
 			}
 			else if (*s == 'u')
 			{
@@ -666,7 +734,7 @@ json_lex_string(JsonLexContext *lex)
 					if (len >= lex->input_length)
 					{
 						lex->token_terminator = s;
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 					}
 					else if (*s >= '0' && *s <= '9')
 						ch = (ch * 16) + (*s - '0');
@@ -677,12 +745,7 @@ json_lex_string(JsonLexContext *lex)
 					else
 					{
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
 				if (lex->strval != NULL)
@@ -693,33 +756,20 @@ json_lex_string(JsonLexContext *lex)
 					if (ch >= 0xd800 && ch <= 0xdbff)
 					{
 						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
+							return JSON_UNICODE_HIGH_SURROGATE;
 						hi_surrogate = (ch & 0x3ff) << 10;
 						continue;
 					}
 					else if (ch >= 0xdc00 && ch <= 0xdfff)
 					{
 						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
+							return JSON_UNICODE_LOW_SURROGATE;
 						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
 						hi_surrogate = -1;
 					}
 
 					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_LOW_SURROGATE;
 
 					/*
 					 * For UTF8, replace the escape sequence by the actual
@@ -731,11 +781,7 @@ json_lex_string(JsonLexContext *lex)
 					if (ch == 0)
 					{
 						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
 					else if (GetDatabaseEncoding() == PG_UTF8)
 					{
@@ -753,25 +799,14 @@ json_lex_string(JsonLexContext *lex)
 						appendStringInfoChar(lex->strval, (char) ch);
 					}
 					else
-					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
-					}
+						return JSON_UNICODE_HIGH_ESCAPE;
 
 				}
 			}
 			else if (lex->strval != NULL)
 			{
 				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
+					return JSON_UNICODE_LOW_SURROGATE;
 
 				switch (*s)
 				{
@@ -798,13 +833,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so error out. */
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
+						return JSON_ESCAPING_INVALID;
 				}
 			}
 			else if (strchr("\"\\/bfnrt", *s) == NULL)
@@ -817,23 +846,14 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
+				return JSON_ESCAPING_INVALID;
 			}
 
 		}
 		else if (lex->strval != NULL)
 		{
 			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
+				return JSON_UNICODE_LOW_SURROGATE;
 
 			appendStringInfoChar(lex->strval, *s);
 		}
@@ -841,15 +861,12 @@ json_lex_string(JsonLexContext *lex)
 	}
 
 	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
+		return JSON_UNICODE_LOW_SURROGATE;
 
 	/* Hooray, we found the end of the string! */
 	lex->prev_token_terminator = lex->token_terminator;
 	lex->token_terminator = s + 1;
+	return JSON_SUCCESS;
 }
 
 /*
@@ -880,7 +897,7 @@ json_lex_string(JsonLexContext *lex)
  * raising an error for a badly-formed number.  Also, if total_len is not NULL
  * the distance from lex->input to the token end+1 is returned to *total_len.
  */
-static inline void
+static inline JsonParseErrorType
 json_lex_number(JsonLexContext *lex, char *s,
 				bool *num_err, int *total_len)
 {
@@ -947,8 +964,8 @@ json_lex_number(JsonLexContext *lex, char *s,
 	}
 
 	/*
-	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
-	 * here should be considered part of the token for error-reporting
+	 * Check for trailing garbage.  As in json_lex_internal(), any alphanumeric
+	 * stuff here should be considered part of the token for error-reporting
 	 * purposes.
 	 */
 	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
@@ -969,8 +986,10 @@ json_lex_number(JsonLexContext *lex, char *s,
 		lex->token_terminator = s;
 		/* handle error if any */
 		if (error)
-			report_invalid_token(lex);
+			return JSON_INVALID_TOKEN;
 	}
+
+	return JSON_SUCCESS;
 }
 
 /*
@@ -978,132 +997,39 @@ json_lex_number(JsonLexContext *lex, char *s,
  *
  * lex->token_start and lex->token_terminator must identify the current token.
  */
-static void
+static JsonParseErrorType
 report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 {
-	char	   *token;
-	int			toklen;
-
 	/* Handle case where the input ended prematurely. */
 	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
-
-	/* Separate out the current token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
+		return JSON_EXPECTED_MORE;
 
-	/* Complain, with the appropriate detail message. */
-	if (ctx == JSON_PARSE_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
-				 report_json_context(lex)));
-	else
+	/* Otherwise choose the error type based on the parsing context. */
+	switch (ctx)
 	{
-		switch (ctx)
-		{
-			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			default:
-				elog(ERROR, "unexpected json parse state: %d", ctx);
-		}
+		case JSON_PARSE_END:
+			return JSON_EXPECTED_END;
+		case JSON_PARSE_VALUE:
+			return JSON_EXPECTED_JSON;
+		case JSON_PARSE_STRING:
+			return JSON_EXPECTED_STRING;
+		case JSON_PARSE_ARRAY_START:
+			return JSON_EXPECTED_ARRAY_FIRST;
+		case JSON_PARSE_ARRAY_NEXT:
+			return JSON_EXPECTED_ARRAY_NEXT;
+		case JSON_PARSE_OBJECT_START:
+			return JSON_EXPECTED_OBJECT_FIRST;
+		case JSON_PARSE_OBJECT_LABEL:
+			return JSON_EXPECTED_COLON;
+		case JSON_PARSE_OBJECT_NEXT:
+			return JSON_EXPECTED_OBJECT_NEXT;
+		case JSON_PARSE_OBJECT_COMMA:
+			return JSON_EXPECTED_STRING;
+		default:
+			elog(ERROR, "unexpected json parse state: %d", ctx);
 	}
 }
 
-/*
- * Report an invalid input token.
- *
- * lex->token_start and lex->token_terminator must identify the token.
- */
-static void
-report_invalid_token(JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Separate out the offending token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
-}
-
 /*
  * Report a CONTEXT line for bogus JSON input.
  *
@@ -1192,3 +1118,137 @@ extract_mb_char(char *s)
 
 	return res;
 }
+
+static char *
+extract_token(JsonLexContext *lex)
+{
+	int toklen = lex->token_terminator - lex->token_start;
+	char *token = palloc(toklen + 1);
+
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+	return token;
+}
+
+static void
+throw_a_json_error(JsonParseErrorType error, JsonLexContext *lex)
+{
+	switch (error)
+	{
+		case JSON_SUCCESS:
+			elog(ERROR, "internal error in json parser");
+			break;
+		case JSON_ESCAPING_INVALID:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Escape sequence \"\\%s\" is invalid.",
+							   extract_mb_char(lex->token_terminator - 1)), // XXX WRONG AND BUSTED
+					 report_json_context(lex)));
+		case JSON_ESCAPING_REQUIRED:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Character with value 0x%02x must be escaped.",
+							   (unsigned char) *(lex->token_terminator)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_END:
+			ereport(ERROR,
+					 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					  errmsg("invalid input syntax for type %s", "json"),
+					  errdetail("Expected end of input, but found \"%s\".",
+								extract_token(lex)),
+					  report_json_context(lex)));
+		case JSON_EXPECTED_ARRAY_FIRST:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected array element or \"]\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_ARRAY_NEXT:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected \",\" or \"]\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_COLON:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected \":\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_JSON:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected JSON value, but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_MORE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("The input string ended unexpectedly."),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_OBJECT_FIRST:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected string or \"}\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_OBJECT_NEXT:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected \",\" or \"}\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_STRING:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected string, but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_INVALID_TOKEN:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Token \"%s\" is invalid.", extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_UNICODE_CODE_POINT_ZERO:
+			ereport(ERROR,
+					(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+					 errmsg("unsupported Unicode escape sequence"),
+					 errdetail("\\u0000 cannot be converted to text."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_ESCAPE_FORMAT:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_HIGH_ESCAPE:
+			ereport(ERROR,
+					(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+					 errmsg("unsupported Unicode escape sequence"),
+					 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_HIGH_SURROGATE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Unicode high surrogate must not follow a high surrogate."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_LOW_SURROGATE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Unicode low surrogate must follow a high surrogate."),
+					 report_json_context(lex)));
+	}
+}
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index bbca121bb7..b18ead7545 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -33,6 +33,28 @@ typedef enum
 	JSON_TOKEN_END
 } JsonTokenType;
 
+typedef enum
+{
+	JSON_SUCCESS,
+	JSON_ESCAPING_INVALID,
+	JSON_ESCAPING_REQUIRED,
+	JSON_EXPECTED_ARRAY_FIRST,
+	JSON_EXPECTED_ARRAY_NEXT,
+	JSON_EXPECTED_COLON,
+	JSON_EXPECTED_END,
+	JSON_EXPECTED_JSON,
+	JSON_EXPECTED_MORE,
+	JSON_EXPECTED_OBJECT_FIRST,
+	JSON_EXPECTED_OBJECT_NEXT,
+	JSON_EXPECTED_STRING,
+	JSON_INVALID_TOKEN,
+	JSON_UNICODE_CODE_POINT_ZERO,
+	JSON_UNICODE_ESCAPE_FORMAT,
+	JSON_UNICODE_HIGH_ESCAPE,
+	JSON_UNICODE_HIGH_SURROGATE,
+	JSON_UNICODE_LOW_SURROGATE
+} JsonParseErrorType;
+
 
 /*
  * All the fields in this structure should be treated as read-only.
-- 
2.17.2 (Apple Git-113)

v2-0003-Remove-jsonapi.c-s-lex_accept.patchapplication/octet-stream; name=v2-0003-Remove-jsonapi.c-s-lex_accept.patchDownload

From 9c34fec11a9fc478b5333bb220b923695325a212 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 17 Jan 2020 14:06:41 -0500
Subject: [PATCH v2 3/4] Remove jsonapi.c's lex_accept().

At first glance, this function seems useful, but it actually increases
the amount of code required rather than decreasing it. Inline the
logic into the callers instead; most callers don't use the 'lexeme'
argument for anything and as a result considerable simplification is
possible.
---
 src/backend/utils/adt/jsonapi.c | 124 +++++++++++++-------------------
 1 file changed, 51 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index fc8af9f861..9e14306b6f 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -69,44 +69,7 @@ lex_peek(JsonLexContext *lex)
 }
 
 /*
- * lex_accept
- *
- * accept the look_ahead token and move the lexer to the next token if the
- * look_ahead token matches the token parameter. In that case, and if required,
- * also hand back the de-escaped lexeme.
- *
- * returns true if the token matched, false otherwise.
- */
-static inline bool
-lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
-{
-	if (lex->token_type == token)
-	{
-		if (lexeme != NULL)
-		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-				if (lex->strval != NULL)
-					*lexeme = pstrdup(lex->strval->data);
-			}
-			else
-			{
-				int			len = (lex->token_terminator - lex->token_start);
-				char	   *tokstr = palloc(len + 1);
-
-				memcpy(tokstr, lex->token_start, len);
-				tokstr[len] = '\0';
-				*lexeme = tokstr;
-			}
-		}
-		json_lex(lex);
-		return true;
-	}
-	return false;
-}
-
-/*
- * lex_accept
+ * lex_expect
  *
  * move the lexer to the next token if the current look_ahead token matches
  * the parameter token. Otherwise, report an error.
@@ -114,7 +77,9 @@ lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
 static inline void
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
-	if (!lex_accept(lex, token, NULL))
+	if (lex_peek(lex) == token)
+		json_lex(lex);
+	else
 		report_parse_error(ctx, lex);
 }
 
@@ -260,12 +225,14 @@ json_count_array_elements(JsonLexContext *lex)
 	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
 	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
-		do
+		while (1)
 		{
 			count++;
 			parse_array_element(&copylex, &nullSemAction);
+			if (copylex.token_type != JSON_TOKEN_COMMA)
+				break;
+			json_lex(&copylex);
 		}
-		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
 	}
 	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
 
@@ -286,35 +253,41 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
-	char	  **valaddr;
 	JsonTokenType tok = lex_peek(lex);
 
-	valaddr = sfunc == NULL ? NULL : &val;
-
 	/* a scalar must be a string, a number, true, false, or null */
-	switch (tok)
+	if (tok != JSON_TOKEN_STRING && tok != JSON_TOKEN_NUMBER &&
+		tok != JSON_TOKEN_TRUE && tok != JSON_TOKEN_FALSE &&
+		tok != JSON_TOKEN_NULL)
+		report_parse_error(JSON_PARSE_VALUE, lex);
+
+	/* if no semantic function, just consume the token */
+	if (sfunc == NULL)
 	{
-		case JSON_TOKEN_TRUE:
-			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
-			break;
-		case JSON_TOKEN_FALSE:
-			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
-			break;
-		case JSON_TOKEN_NULL:
-			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
-			break;
-		case JSON_TOKEN_NUMBER:
-			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
-			break;
-		case JSON_TOKEN_STRING:
-			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
-			break;
-		default:
-			report_parse_error(JSON_PARSE_VALUE, lex);
+		json_lex(lex);
+		return;
+	}
+
+	/* extract the de-escaped string value, or the raw lexeme */
+	if (lex_peek(lex) == JSON_TOKEN_STRING)
+	{
+		if (lex->strval != NULL)
+			val = pstrdup(lex->strval->data);
+	}
+	else
+	{
+		int			len = (lex->token_terminator - lex->token_start);
+
+		val = palloc(len + 1);
+		memcpy(val, lex->token_start, len);
+		val[len] = '\0';
 	}
 
-	if (sfunc != NULL)
-		(*sfunc) (sem->semstate, val, tok);
+	/* consume the token */
+	json_lex(lex);
+
+	/* invoke the callback */
+	(*sfunc) (sem->semstate, val, tok);
 }
 
 static void
@@ -330,14 +303,13 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	json_ofield_action ostart = sem->object_field_start;
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
-	char	  **fnameaddr = NULL;
 	JsonTokenType tok;
 
-	if (ostart != NULL || oend != NULL)
-		fnameaddr = &fname;
-
-	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
+	if (lex_peek(lex) != JSON_TOKEN_STRING)
 		report_parse_error(JSON_PARSE_STRING, lex);
+	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
+		fname = pstrdup(lex->strval->data);
+	json_lex(lex);
 
 	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
 
@@ -387,16 +359,19 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	/* we know this will succeed, just clearing the token */
-	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
+	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
+	json_lex(lex);
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
 			parse_object_field(lex, sem);
-			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+			while (lex_peek(lex) == JSON_TOKEN_COMMA)
+			{
+				json_lex(lex);
 				parse_object_field(lex, sem);
+			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
@@ -473,8 +448,11 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 
 		parse_array_element(lex, sem);
 
-		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+		while (lex_peek(lex) == JSON_TOKEN_COMMA)
+		{
+			json_lex(lex);
 			parse_array_element(lex, sem);
+		}
 	}
 
 	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-- 
2.17.2 (Apple Git-113)

#24

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Tom Lane (#17)

1 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 16, 2020, at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

I don't have a view on this.

If anyone is hot to do this part, please have at it. I'm not.

I moved the file pg_wchar.h into src/include/common and split out
most of the functions you marked as being suitable for the
backend only into a new file src/include/utils/mbutils.h. That
resulted in the need to include this new “utils/mbutils.h” from
a number of .c files in the source tree.

One issue that came up was libpq/pqformat.h uses a couple
of those functions from within static inline functions, preventing
me from moving those to a backend-only include file without
making pqformat.h a backend-only include file.

I think the right thing to do here is to move references to these
functions into pqformat.c by un-inlining these functions. I have
not done that yet.

There are whitespace cleanup issues I’m not going to fix just
yet, since I’ll be making more changes anyway. What do you
think of the direction I’m taking in the attached?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

pg_wchar.cleanup.patch.1application/octet-stream; name=pg_wchar.cleanup.patch.1; x-unix-mode=0644Download

diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c
index 8019d11281..985848cd46 100644
--- a/contrib/btree_gist/btree_text.c
+++ b/contrib/btree_gist/btree_text.c
@@ -6,6 +6,7 @@
 #include "btree_gist.h"
 #include "btree_utils_var.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 /*
 ** Text ops
diff --git a/contrib/btree_gist/btree_utils_var.c b/contrib/btree_gist/btree_utils_var.c
index 452241f697..9ff97d404c 100644
--- a/contrib/btree_gist/btree_utils_var.c
+++ b/contrib/btree_gist/btree_utils_var.c
@@ -10,6 +10,7 @@
 #include "btree_gist.h"
 #include "btree_utils_var.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 
diff --git a/contrib/btree_gist/btree_utils_var.h b/contrib/btree_gist/btree_utils_var.h
index 2f8def655c..4ae273d767 100644
--- a/contrib/btree_gist/btree_utils_var.h
+++ b/contrib/btree_gist/btree_utils_var.h
@@ -6,7 +6,7 @@
 
 #include "access/gist.h"
 #include "btree_gist.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /* Variable length key */
 typedef bytea GBT_VARKEY;
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 1dddf02779..e26bdc55c4 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -49,7 +49,7 @@
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/scansup.h"
 #include "utils/acl.h"
@@ -57,6 +57,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/varlena.h"
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index 1065d64ccb..872710096e 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -17,6 +17,7 @@
 #include "commands/defrem.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index ccbb84b481..aa0a209dab 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -40,7 +40,7 @@
 
 #include <ctype.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/varlena.h"
 
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index fa47710439..cb486f873a 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -10,6 +10,7 @@
 #include "catalog/pg_collation.h"
 #include "ltree.h"
 #include "utils/formatting.h"
+#include "utils/mbutils.h"
 
 PG_FUNCTION_INFO_V1(ltq_regex);
 PG_FUNCTION_INFO_V1(ltq_rregex);
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 900a46a9e7..195a948981 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -9,6 +9,7 @@
 
 #include "crc32.h"
 #include "ltree.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 PG_FUNCTION_INFO_V1(ltree_in);
diff --git a/contrib/ltree/ltxtquery_io.c b/contrib/ltree/ltxtquery_io.c
index db347f7772..b7dba1d9fc 100644
--- a/contrib/ltree/ltxtquery_io.c
+++ b/contrib/ltree/ltxtquery_io.c
@@ -7,6 +7,8 @@
 
 #include <ctype.h>
 
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "crc32.h"
 #include "ltree.h"
 #include "miscadmin.h"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..a2b42a68f5 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -64,7 +64,7 @@
 #include "catalog/pg_authid.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
@@ -78,6 +78,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index c9c8cbc734..328b5b4e41 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -10,6 +10,7 @@
 #include "trgm.h"
 #include "tsearch/ts_locale.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_crc.h"
 
diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 1330b615b8..d148eeee91 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -195,6 +195,7 @@
 #include "trgm.h"
 #include "tsearch/ts_locale.h"
 #include "utils/hsearch.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 /*
diff --git a/contrib/pgcrypto/pgp-pgsql.c b/contrib/pgcrypto/pgp-pgsql.c
index 8be895df80..7057076f48 100644
--- a/contrib/pgcrypto/pgp-pgsql.c
+++ b/contrib/pgcrypto/pgp-pgsql.c
@@ -34,12 +34,13 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "mbuf.h"
 #include "pgp.h"
 #include "px.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 /*
  * public functions
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 29c811a80b..6c0b13a80c 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,13 +16,14 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postgres_fdw.h"
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index 0047efc075..04377736b3 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -22,6 +22,7 @@
 #include "tsearch/ts_public.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index b5ec81937c..f56424ef5c 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -41,9 +41,10 @@
 
 #include "access/spgist.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index f3e2254954..a551cbba35 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -38,6 +38,7 @@
 #include "utils/combocid.h"
 #include "utils/guc.h"
 #include "utils/inval.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/relmapper.h"
 #include "utils/snapmgr.h"
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 887fb28578..49d01a3485 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -308,7 +308,7 @@ foreach my $row (@{ $catalog_data{pg_type} })
 # as for OIDs, but we have to dig the values out of pg_wchar.h.
 my %encids;
 
-my $encfile = $include_path . 'mb/pg_wchar.h';
+my $encfile = $include_path . 'common/pg_wchar.h';
 open(my $ef, '<', $encfile) || die "$encfile: $!";
 
 # We're parsing an enum, so start with 0 and increment
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index e70243a008..ca77289e27 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -41,7 +41,7 @@
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
@@ -54,6 +54,7 @@
 #include "utils/guc.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 8559779a4f..2f07f45fdc 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -24,9 +24,10 @@
 #include "catalog/objectaccess.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_namespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index b38df4f696..1163aa709a 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -25,7 +25,7 @@
 #include "catalog/pg_conversion.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_proc.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 5194dcaac0..3f9d036a70 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -29,7 +29,7 @@
 #include "commands/defrem.h"
 #include "executor/functions.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_type.h"
@@ -38,6 +38,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 34c75e8b56..2869cb68a4 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -27,10 +27,11 @@
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/conversioncmds.c b/src/backend/commands/conversioncmds.c
index f974478b26..87b4cc1090 100644
--- a/src/backend/commands/conversioncmds.c
+++ b/src/backend/commands/conversioncmds.c
@@ -21,7 +21,7 @@
 #include "catalog/pg_type.h"
 #include "commands/alter.h"
 #include "commands/conversioncmds.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
 #include "utils/builtins.h"
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..cd86236955 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -37,7 +37,7 @@
 #include "foreign/fdwapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/optimizer.h"
@@ -51,6 +51,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/portal.h"
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 367c30adb0..0a6e6c6c6b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -46,7 +46,7 @@
 #include "commands/defrem.h"
 #include "commands/seclabel.h"
 #include "commands/tablespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 01de398dcb..bdba95b0cc 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -51,7 +51,7 @@
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "storage/fd.h"
@@ -60,6 +60,7 @@
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 52ce02f898..365d610e39 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -39,7 +39,7 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/tablespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -59,6 +59,7 @@
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/pg_rusage.h"
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 484f7ea2c0..e27957bc5b 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -24,10 +24,11 @@
 #include "access/xlog.h"
 #include "catalog/pg_authid.h"
 #include "commands/variable.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b03e02ae6c..f3237067af 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -51,7 +51,7 @@
 #include "executor/nodeSubplan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
 #include "storage/bufmgr.h"
@@ -59,6 +59,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/rls.h"
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c13b1d3501..986c545af3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -21,7 +21,7 @@
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "partitioning/partbounds.h"
@@ -29,6 +29,7 @@
 #include "partitioning/partprune.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/partcache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc5177cc2b..751d6aca14 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -52,12 +52,13 @@
 #include "access/transam.h"
 #include "executor/executor.h"
 #include "jit/jit.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/typcache.h"
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index d2a63e9e56..93794d0096 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -43,6 +43,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
+#include "utils/mbutils.h"
 
 #ifdef USE_LDAP
 #ifdef WIN32
diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index a6f990c2d2..f52e2ff5c0 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -75,7 +75,7 @@
 
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 6e98fe55fc..c5993d01c5 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -17,7 +17,7 @@
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_coerce.h"
@@ -27,6 +27,7 @@
 #include "utils/builtins.h"
 #include "utils/int8.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
 
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 1bf1144c4f..6d27d1c5b7 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -21,10 +21,11 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 
 static bool check_uescapechar(unsigned char escape);
 static char *str_udeescape(const char *str, char escape,
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 84c73914a8..05b6f17bbd 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -39,7 +39,8 @@
 #include "parser/gramparse.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 }
 
 %{
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 18169ec4f4..6fc2b59fa3 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -17,8 +17,9 @@
 
 #include <ctype.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 
 /* ----------------
  *		scanstr
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..751f017c7d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -43,7 +43,7 @@
 #include "common/ip.h"
 #include "libpq/libpq.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -63,6 +63,7 @@
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/rel.h"
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 3cc2d4d362..5db0d1b749 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -16,6 +16,7 @@
  */
 
 #include "catalog/pg_collation.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 
 /*
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e4fd1f9bb6..cff8d5b17d 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -23,12 +23,13 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "libpq-fe.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "pqexpbuffer.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/tuplestore.h"
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 7693c98949..a286c4ab82 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -23,7 +23,7 @@
 #include "catalog/pg_type.h"
 #include "fmgr.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "replication/decode.h"
@@ -35,6 +35,7 @@
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/regproc.h"
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7a5471f95c..c661ac4acf 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -38,7 +38,7 @@
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/optimizer.h"
diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 144120a564..03a41c80ec 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -15,6 +15,7 @@
 #include "commands/defrem.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
+#include "utils/mbutils.h"
 
 /* Some platforms define MAXINT and/or MININT, causing conflicts */
 #ifdef MAXINT
diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index e793984a9f..7ba382aa3a 100644
--- a/src/backend/tcop/fastpath.c
+++ b/src/backend/tcop/fastpath.c
@@ -23,7 +23,7 @@
 #include "catalog/pg_proc.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "port/pg_bswap.h"
 #include "tcop/fastpath.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0a6f80963b..a5d1e199a0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -47,7 +47,7 @@
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/print.h"
diff --git a/src/backend/tsearch/dict_synonym.c b/src/backend/tsearch/dict_synonym.c
index e732e66dac..343e82b1bf 100644
--- a/src/backend/tsearch/dict_synonym.c
+++ b/src/backend/tsearch/dict_synonym.c
@@ -17,6 +17,7 @@
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 typedef struct
 {
diff --git a/src/backend/tsearch/dict_thesaurus.c b/src/backend/tsearch/dict_thesaurus.c
index fb91a34b02..a6b6e0880c 100644
--- a/src/backend/tsearch/dict_thesaurus.c
+++ b/src/backend/tsearch/dict_thesaurus.c
@@ -19,6 +19,7 @@
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 
 
diff --git a/src/backend/tsearch/regis.c b/src/backend/tsearch/regis.c
index 2edd4faa8e..46b8434a29 100644
--- a/src/backend/tsearch/regis.c
+++ b/src/backend/tsearch/regis.c
@@ -16,6 +16,7 @@
 
 #include "tsearch/dicts/regis.h"
 #include "tsearch/ts_locale.h"
+#include "utils/mbutils.h"
 
 #define RS_IN_ONEOF 1
 #define RS_IN_ONEOF_IN	2
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index 8aab96d3b0..150f7a9c0f 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -66,6 +66,7 @@
 #include "tsearch/dicts/spell.h"
 #include "tsearch/ts_locale.h"
 #include "utils/memutils.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index a916dd6cb6..165b194246 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -17,6 +17,7 @@
 #include "storage/fd.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_public.h"
+#include "utils/mbutils.h"
 
 static void tsearch_readline_callback(void *arg);
 
diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c
index 3bc6b32095..280ba72eab 100644
--- a/src/backend/tsearch/ts_utils.c
+++ b/src/backend/tsearch/ts_utils.c
@@ -19,6 +19,7 @@
 #include "miscadmin.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 898466fcef..4fbb985c38 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -23,6 +23,7 @@
 #include "tsearch/ts_type.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 
 /* Define me to enable tracing of parser behavior */
diff --git a/src/backend/utils/adt/ascii.c b/src/backend/utils/adt/ascii.c
index 3aa8a5e7d2..b692aed409 100644
--- a/src/backend/utils/adt/ascii.c
+++ b/src/backend/utils/adt/ascii.c
@@ -11,9 +11,10 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/ascii.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 static void pg_to_ascii(unsigned char *src, unsigned char *src_end,
 						unsigned char *dest, int enc);
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 92ee77ac5c..1e8f1a2189 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -20,9 +20,10 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ca3c48d024..cadee997a1 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -87,13 +87,14 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/float.h"
 #include "utils/formatting.h"
 #include "utils/int8.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 0d75928e7f..6bfca411b7 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -26,11 +26,12 @@
 #include "catalog/pg_tablespace_d.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 458505abfd..c0a7afb9a6 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -20,7 +20,7 @@
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "utils/array.h"
@@ -30,6 +30,7 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 4b5a0214dc..ee75fc34c4 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -21,7 +21,7 @@
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index 6b9bd00bb7..abdc0149b2 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -74,6 +74,7 @@
 #include "utils/guc.h"
 #include "utils/json.h"
 #include "utils/jsonpath.h"
+#include "utils/mbutils.h"
 #include "utils/timestamp.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 70681b789d..d522b18097 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -17,8 +17,9 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/pg_list.h"
+#include "utils/mbutils.h"
 
 static JsonPathString scanstring;
 
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 5bf94628c3..ef80cdbf15 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -20,9 +20,10 @@
 #include <ctype.h>
 
 #include "catalog/pg_collation.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 
 
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index ee30170fbb..b05255761c 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -57,6 +57,7 @@
  * - thomas 2000-08-06
  */
 
+#include "utils/mbutils.h"
 
 /*--------------------
  *	Match text and pattern, return LIKE_TRUE, LIKE_FALSE, or LIKE_ABORT.
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 286e000d4e..837b7dc153 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -43,13 +43,14 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/selfuncs.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index 6749e75c89..89d485889a 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -24,11 +24,12 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/varlena.h"
 
 
diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index 0d56dc898a..7ff845e735 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -16,9 +16,10 @@
 #include "postgres.h"
 
 #include "common/int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/mbutils.h"
 
 static text *dotrim(const char *string, int stringlen,
 					const char *set, int setlen,
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 25fb7e2ebf..c7174e33ad 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -57,11 +57,12 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_control.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 6c76e89c9a..ca29918441 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -35,6 +35,7 @@
 #include "regex/regex.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 116e00bce4..bd563da92a 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -45,7 +45,7 @@
 #include "common/keywords.h"
 #include "executor/spi.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -66,6 +66,7 @@
 #include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
 #include "utils/ruleutils.h"
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index 092e8a130b..c4635e0ee3 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -20,6 +20,7 @@
 #include "tsearch/ts_type.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_crc.h"
 
diff --git a/src/backend/utils/adt/tsvector.c b/src/backend/utils/adt/tsvector.c
index cd3bb9b63e..7106457431 100644
--- a/src/backend/utils/adt/tsvector.c
+++ b/src/backend/utils/adt/tsvector.c
@@ -18,6 +18,7 @@
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 typedef struct
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index e33ca5abe7..fbbf3bdbae 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -22,13 +22,14 @@
 #include "executor/spi.h"
 #include "funcapi.h"
 #include "lib/qunique.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "tsearch/ts_utils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/utils/adt/tsvector_parser.c b/src/backend/utils/adt/tsvector_parser.c
index cfc181c20d..11defd95cd 100644
--- a/src/backend/utils/adt/tsvector_parser.c
+++ b/src/backend/utils/adt/tsvector_parser.c
@@ -16,6 +16,7 @@
 
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 1e1239a1ba..fb80a08f36 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -18,13 +18,14 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 1b351cbc68..e2029ade81 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -31,6 +31,7 @@
 #include "utils/bytea.h"
 #include "utils/hashutils.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/sortsupport.h"
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 3808c307f6..855843a843 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -78,7 +78,7 @@
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
 #include "nodes/nodeFuncs.h"
@@ -87,6 +87,7 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5b0211f66..5db5520504 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -70,7 +70,7 @@
 #include "access/xact.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -78,6 +78,7 @@
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index c4b2946986..503175fbf3 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -34,7 +34,7 @@
 #include "catalog/pg_authid.h"
 #include "common/file_perm.h"
 #include "libpq/libpq.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 8a47dcdcb1..98195f1ea5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -36,7 +36,7 @@
 #include "catalog/pg_tablespace.h"
 #include "libpq/auth.h"
 #include "libpq/libpq-be.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -56,6 +56,7 @@
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/portal.h"
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 54dcf71fb7..3b68dd4821 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -11,7 +11,8 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mb/conversion_procs/README.euc_jp b/src/backend/utils/mb/conversion_procs/README.euc_jp
index 6e59b7bd7f..97cc63f1bd 100644
--- a/src/backend/utils/mb/conversion_procs/README.euc_jp
+++ b/src/backend/utils/mb/conversion_procs/README.euc_jp
@@ -35,7 +35,7 @@ o C
   ������������������(5��������������������������������NULL������������
   ����������������������)������������������������
 
-  ����������������ID��include/mb/pg_wchar.h��typedef enum pg_enc������
+  ����������������ID��include/common/pg_wchar.h��typedef enum pg_enc������
   ��������������
 
 o ����������������������
diff --git a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
index 376b48ca61..0c951c6901 100644
--- a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 9ba6bd3040..26d0f21140 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -12,7 +12,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
index 59c6c3bb12..c3c324c9f5 100644
--- a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index 4ca8e2126e..b24148c90b 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 /*
  * SJIS alternative code.
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 4d7876a666..273fce16f7 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
index 68f76aa8cb..7601bd65f2 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
@@ -13,7 +13,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 typedef struct
 {
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index 82a22b9beb..fe40e2429f 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 #define ENCODING_GROWTH_RATE 4
 
diff --git a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
index f424f88145..21a82ffd88 100644
--- a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
+++ b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
index a358a707c1..bb40d91922 100644
--- a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
index 75ed49ac54..f12d96e263 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/big5_to_utf8.map"
 #include "../../Unicode/utf8_to_big5.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
index 90ad316111..49317d6417 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/utf8_to_koi8r.map"
 #include "../../Unicode/koi8r_to_utf8.map"
 #include "../../Unicode/utf8_to_koi8u.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
index 018312489c..808ccf3b1d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
index 62182a9ba8..c4eb2eb879 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_cn_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_cn.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
index dc5abb5dfd..717106b2a1 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_jp_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jp.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
index 088a38d839..1885b1d1ef 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_kr_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_kr.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
index a9fe94f88b..3f1b8b36cc 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_tw_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_tw.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
index 96909b5885..8a5ac5db27 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/gb18030_to_utf8.map"
 #include "../../Unicode/utf8_to_gb18030.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
index 78bbcd3ce7..bb61be10ad 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/gbk_to_utf8.map"
 #include "../../Unicode/utf8_to_gbk.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
index 348524f4a2..706b072993 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/iso8859_10_to_utf8.map"
 #include "../../Unicode/iso8859_13_to_utf8.map"
 #include "../../Unicode/iso8859_14_to_utf8.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
index 2cdca9f780..2e7a64fa4a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
index e09a7c8e41..29534f81a4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/johab_to_utf8.map"
 #include "../../Unicode/utf8_to_johab.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
index c56fa80a4b..a6f526d3cb 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/sjis_to_utf8.map"
 #include "../../Unicode/utf8_to_sjis.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
index 458500998d..a21f7a72c8 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/shift_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_shift_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
index 3226ed0325..0ad012282f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/uhc_to_utf8.map"
 #include "../../Unicode/utf8_to_uhc.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
index 1a0074d063..b348f29646 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
@@ -13,7 +13,8 @@
 
 #include "postgres.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/utf8_to_win1250.map"
 #include "../../Unicode/utf8_to_win1251.map"
 #include "../../Unicode/utf8_to_win1252.map"
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 86787bcb31..bd3c5b9442 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -36,8 +36,9 @@
 
 #include "access/xact.h"
 #include "catalog/namespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
index c153b77007..ce8c181474 100644
--- a/src/backend/utils/mb/stringinfo_mb.c
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -20,7 +20,8 @@
 #include "postgres.h"
 
 #include "mb/stringinfo_mb.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mb/wstrcmp.c b/src/backend/utils/mb/wstrcmp.c
index dad3ae023a..e5f57d717d 100644
--- a/src/backend/utils/mb/wstrcmp.c
+++ b/src/backend/utils/mb/wstrcmp.c
@@ -35,7 +35,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 int
 pg_char_and_wchar_strcmp(const char *s1, const pg_wchar *s2)
diff --git a/src/backend/utils/mb/wstrncmp.c b/src/backend/utils/mb/wstrncmp.c
index ea4823fc6f..cce0c6c5cf 100644
--- a/src/backend/utils/mb/wstrncmp.c
+++ b/src/backend/utils/mb/wstrncmp.c
@@ -34,7 +34,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 int
 pg_wchar_strncmp(const pg_wchar *s1, const pg_wchar *s2, size_t n)
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 268b745528..0738360aa7 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -14,10 +14,11 @@
 #include <ctype.h>
 #include <unistd.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9e24fec72d..b47439c52c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,8 +21,9 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
+#include "utils/mbutils.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ec6d0bdf8e..d3b598e0a5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -71,7 +71,7 @@
 #include "fe_utils/string_utils.h"
 #include "getaddrinfo.h"
 #include "getopt_long.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 5f9a102a74..355723b161 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -11,7 +11,7 @@
 
 #include "catalog/pg_authid_d.h"
 #include "fe_utils/string_utils.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "pg_upgrade.h"
 
 static void check_new_cluster_is_empty(void);
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index bdf803a053..bf05cc688d 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -12,7 +12,7 @@
 #include "common/logging.h"
 #include "input.h"
 #include "mainloop.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "prompt.h"
 #include "settings.h"
 
diff --git a/src/common/encnames.c b/src/common/encnames.c
index 14cf1b39e9..f06221e1d5 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -15,7 +15,7 @@
 #include <ctype.h>
 #include <unistd.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /* ----------
@@ -297,7 +297,7 @@ static const pg_encname pg_encname_tbl[] =
 
 /* ----------
  * These are "official" encoding names.
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * XXX must be sorted by the same order as enum pg_enc (in common/pg_wchar.h)
  * ----------
  */
 #ifndef WIN32
diff --git a/src/common/saslprep.c b/src/common/saslprep.c
index 7739b81807..a8b2c42949 100644
--- a/src/common/saslprep.c
+++ b/src/common/saslprep.c
@@ -27,7 +27,7 @@
 
 #include "common/saslprep.h"
 #include "common/unicode_norm.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /*
  * Limit on how large password's we will try to process.  A password
diff --git a/src/common/wchar.c b/src/common/wchar.c
index efaf1c155b..53006115d2 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -12,7 +12,7 @@
  */
 #include "c.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
@@ -1499,7 +1499,7 @@ pg_utf8_islegal(const unsigned char *source, int length)
 /*
  *-------------------------------------------------------------------
  * encoding info table
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * XXX must be sorted by the same order as enum pg_enc (in common/pg_wchar.h)
  *-------------------------------------------------------------------
  */
 const pg_wchar_tbl pg_wchar_table[] = {
diff --git a/src/include/catalog/pg_conversion.dat b/src/include/catalog/pg_conversion.dat
index d7120f2fb0..9913d2e370 100644
--- a/src/include/catalog/pg_conversion.dat
+++ b/src/include/catalog/pg_conversion.dat
@@ -11,7 +11,7 @@
 #----------------------------------------------------------------------
 
 # Note: conforencoding and contoencoding must match the spelling of
-# the labels used in the enum pg_enc in mb/pg_wchar.h.
+# the labels used in the enum pg_enc in common/pg_wchar.h.
 
 [
 
diff --git a/src/include/mb/pg_wchar.h b/src/include/common/pg_wchar.h
similarity index 85%
rename from src/include/mb/pg_wchar.h
rename to src/include/common/pg_wchar.h
index b8892ef730..e23cd11df6 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/common/pg_wchar.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/mb/pg_wchar.h
+ * src/include/common/pg_wchar.h
  *
  *	NOTES
  *		This is used both by the backend and by frontends, but should not be
@@ -552,104 +552,25 @@ extern bool pg_utf8_islegal(const unsigned char *source, int length);
 extern int	pg_utf_mblen(const unsigned char *s);
 extern int	pg_mule_mblen(const unsigned char *s);
 
-/*
- * The remaining functions are backend-only.
- */
-extern int	pg_mb2wchar(const char *from, pg_wchar *to);
-extern int	pg_mb2wchar_with_len(const char *from, pg_wchar *to, int len);
 extern int	pg_encoding_mb2wchar_with_len(int encoding,
 										  const char *from, pg_wchar *to, int len);
-extern int	pg_wchar2mb(const pg_wchar *from, char *to);
-extern int	pg_wchar2mb_with_len(const pg_wchar *from, char *to, int len);
 extern int	pg_encoding_wchar2mb_with_len(int encoding,
 										  const pg_wchar *from, char *to, int len);
 extern int	pg_char_and_wchar_strcmp(const char *s1, const pg_wchar *s2);
 extern int	pg_wchar_strncmp(const pg_wchar *s1, const pg_wchar *s2, size_t n);
 extern int	pg_char_and_wchar_strncmp(const char *s1, const pg_wchar *s2, size_t n);
-extern size_t pg_wchar_strlen(const pg_wchar *wstr);
-extern int	pg_mblen(const char *mbstr);
-extern int	pg_dsplen(const char *mbstr);
-extern int	pg_mbstrlen(const char *mbstr);
-extern int	pg_mbstrlen_with_len(const char *mbstr, int len);
-extern int	pg_mbcliplen(const char *mbstr, int len, int limit);
+extern size_t	pg_wchar_strlen(const pg_wchar *wstr);
 extern int	pg_encoding_mbcliplen(int encoding, const char *mbstr,
 								  int len, int limit);
-extern int	pg_mbcharcliplen(const char *mbstr, int len, int limit);
-extern int	pg_database_encoding_max_length(void);
-extern mbcharacter_incrementer pg_database_encoding_character_incrementer(void);
-
-extern int	PrepareClientEncoding(int encoding);
-extern int	SetClientEncoding(int encoding);
-extern void InitializeClientEncoding(void);
-extern int	pg_get_client_encoding(void);
-extern const char *pg_get_client_encoding_name(void);
-
-extern void SetDatabaseEncoding(int encoding);
-extern int	GetDatabaseEncoding(void);
-extern const char *GetDatabaseEncodingName(void);
-extern void SetMessageEncoding(int encoding);
-extern int	GetMessageEncoding(void);
-
-#ifdef ENABLE_NLS
-extern int	pg_bind_textdomain_codeset(const char *domainname);
-#endif
-
-extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
-												int src_encoding,
-												int dest_encoding);
-
-extern char *pg_client_to_server(const char *s, int len);
+/*
+ * These functions are used from pqformat.h and pqformat.c and therefore need
+ * to be retained here rather than being moved to mbutils.h.  This situation
+ * should be rectified.
+ */
 extern char *pg_server_to_client(const char *s, int len);
-extern char *pg_any_to_server(const char *s, int len, int encoding);
-extern char *pg_server_to_any(const char *s, int len, int encoding);
+extern char *pg_client_to_server(const char *s, int len);
 
 extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
 extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
 
-extern void UtfToLocal(const unsigned char *utf, int len,
-					   unsigned char *iso,
-					   const pg_mb_radix_tree *map,
-					   const pg_utf_to_local_combined *cmap, int cmapsize,
-					   utf_local_conversion_func conv_func,
-					   int encoding);
-extern void LocalToUtf(const unsigned char *iso, int len,
-					   unsigned char *utf,
-					   const pg_mb_radix_tree *map,
-					   const pg_local_to_utf_combined *cmap, int cmapsize,
-					   utf_local_conversion_func conv_func,
-					   int encoding);
-
-extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
-extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
-							bool noError);
-extern int	pg_verify_mbstr_len(int encoding, const char *mbstr, int len,
-								bool noError);
-
-extern void check_encoding_conversion_args(int src_encoding,
-										   int dest_encoding,
-										   int len,
-										   int expected_src_encoding,
-										   int expected_dest_encoding);
-
-extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg_attribute_noreturn();
-extern void report_untranslatable_char(int src_encoding, int dest_encoding,
-									   const char *mbstr, int len) pg_attribute_noreturn();
-
-extern void local2local(const unsigned char *l, unsigned char *p, int len,
-						int src_encoding, int dest_encoding, const unsigned char *tab);
-extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
-					  int lc, int encoding);
-extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
-					  int lc, int encoding);
-extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
-								 int len, int lc, int encoding,
-								 const unsigned char *tab);
-extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
-								 int len, int lc, int encoding,
-								 const unsigned char *tab);
-
-#ifdef WIN32
-extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
-#endif
-
 #endif							/* PG_WCHAR_H */
diff --git a/src/include/common/unicode_norm.h b/src/include/common/unicode_norm.h
index f1b7ef1aa4..e09c1162eb 100644
--- a/src/include/common/unicode_norm.h
+++ b/src/include/common/unicode_norm.h
@@ -14,7 +14,7 @@
 #ifndef UNICODE_NORM_H
 #define UNICODE_NORM_H
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 extern pg_wchar *unicode_normalize_kc(const pg_wchar *input);
 
diff --git a/src/include/libpq/pqformat.h b/src/include/libpq/pqformat.h
index af31e9caba..584e141ac9 100644
--- a/src/include/libpq/pqformat.h
+++ b/src/include/libpq/pqformat.h
@@ -14,7 +14,7 @@
 #define PQFORMAT_H
 
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 extern void pq_beginmessage(StringInfo buf, char msgtype);
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 82c9e2fad8..2506ac8268 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -52,7 +52,7 @@
 #include <wctype.h>
 #endif
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 #include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
 
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index dc31899aa4..739cc7d14b 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -35,7 +35,7 @@
 /*
  * Add your own defines, if needed, here.
  */
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /*
  * interface types etc.
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index 17536babfe..f77eb23c8a 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -15,7 +15,7 @@
 #include <ctype.h>
 #include <limits.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/pg_locale.h"
 
 /*
diff --git a/src/include/utils/mbutils.h b/src/include/utils/mbutils.h
new file mode 100644
index 0000000000..737e9d9a59
--- /dev/null
+++ b/src/include/utils/mbutils.h
@@ -0,0 +1,108 @@
+/*-------------------------------------------------------------------------
+ *
+ * mbutils.h
+ *	  backend-only multibyte-character support
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mbutils.h
+ *
+ *	NOTES
+ *		TODO: write some notes
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MBUTILS_H
+#define MBUTILS_H
+
+/*
+ * For front-end portability, use pg_encoding_mb2wchar_with_len instead
+*/
+extern int	pg_mb2wchar(const char *from, pg_wchar *to);
+extern int	pg_mb2wchar_with_len(const char *from, pg_wchar *to, int len);
+
+extern int	pg_wchar2mb(const pg_wchar *from, char *to);
+extern int	pg_wchar2mb_with_len(const pg_wchar *from, char *to, int len);
+
+extern int	pg_mblen(const char *mbstr);
+extern int	pg_dsplen(const char *mbstr);
+extern int	pg_mbstrlen(const char *mbstr);
+extern int	pg_mbstrlen_with_len(const char *mbstr, int len);
+extern int	pg_mbcliplen(const char *mbstr, int len, int limit);
+
+extern int	pg_mbcharcliplen(const char *mbstr, int len, int limit);
+extern int	pg_database_encoding_max_length(void);
+extern mbcharacter_incrementer pg_database_encoding_character_incrementer(void);
+
+extern int	PrepareClientEncoding(int encoding);
+extern int	SetClientEncoding(int encoding);
+extern void	InitializeClientEncoding(void);
+extern int	pg_get_client_encoding(void);
+extern const char *pg_get_client_encoding_name(void);
+
+extern void SetDatabaseEncoding(int encoding);
+extern int	GetDatabaseEncoding(void);
+extern const char *GetDatabaseEncodingName(void);
+extern void SetMessageEncoding(int encoding);
+extern int	GetMessageEncoding(void);
+
+#ifdef ENABLE_NLS
+extern int	pg_bind_textdomain_codeset(const char *domainname);
+#endif
+
+extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
+												int src_encoding,
+												int dest_encoding);
+
+extern char *pg_any_to_server(const char *s, int len, int encoding);
+extern char *pg_server_to_any(const char *s, int len, int encoding);
+
+extern void UtfToLocal(const unsigned char *utf, int len,
+					   unsigned char *iso,
+					   const pg_mb_radix_tree *map,
+					   const pg_utf_to_local_combined *cmap, int cmapsize,
+					   utf_local_conversion_func conv_func,
+					   int encoding);
+extern void LocalToUtf(const unsigned char *iso, int len,
+					   unsigned char *utf,
+					   const pg_mb_radix_tree *map,
+					   const pg_local_to_utf_combined *cmap, int cmapsize,
+					   utf_local_conversion_func conv_func,
+					   int encoding);
+
+extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
+
+extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
+							bool noError);
+extern int	pg_verify_mbstr_len(int encoding, const char *mbstr, int len,
+								bool noError);
+
+extern void check_encoding_conversion_args(int src_encoding,
+										   int dest_encoding,
+										   int len,
+										   int expected_src_encoding,
+										   int expected_dest_encoding);
+
+extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg_attribute_noreturn();
+extern void report_untranslatable_char(int src_encoding, int dest_encoding,
+									   const char *mbstr, int len) pg_attribute_noreturn();
+
+extern void local2local(const unsigned char *l, unsigned char *p, int len,
+						int src_encoding, int dest_encoding, const unsigned char *tab);
+extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
+					  int lc, int encoding);
+extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
+					  int lc, int encoding);
+extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
+								 int len, int lc, int encoding,
+								 const unsigned char *tab);
+extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
+								 int len, int lc, int encoding,
+								 const unsigned char *tab);
+
+#ifdef WIN32
+extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
+#endif
+
+#endif							/* MBUTILS_H */
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 80b54bc92b..a58619a562 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -28,7 +28,7 @@
 #include "fe-auth.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index eea0237c3a..5940ed6705 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -26,7 +26,7 @@
 
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /* keep this in same order as ExecStatusType in libpq-fe.h */
 char	   *const pgresStatus[] = {
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index a9074d2f29..ce3b4ce41e 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -49,7 +49,7 @@
 
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 850bf84c96..e8461cd0df 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -28,7 +28,7 @@
 
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 /*
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index c78891868a..75d6c67560 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -23,7 +23,7 @@
 #include "commands/trigger.h"
 #include "executor/spi.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_type.h"
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1e318b6dc8..cec942c280 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -1,7 +1,7 @@
 #ifndef PL_PERL_HELPERS_H
 #define PL_PERL_HELPERS_H
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 #include "plperl.h"
 
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index 9cea2e42ac..a875c01430 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -15,8 +15,9 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scanner.h"
+#include "utils/mbutils.h"
 
 #include "plpgsql.h"
 #include "pl_gram.h"			/* must be after parser/scanner.h */
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 4c37ff898c..01ad7cf3bc 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -10,7 +10,7 @@
 
 #include "access/xact.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_cursorobject.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index e308c61d50..b079afc000 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -7,7 +7,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_cursorobject.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 99c1b4f28f..9857114d4c 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -12,7 +12,7 @@
 #include "access/xact.h"
 #include "catalog/pg_type.h"
 #include "executor/spi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/parse_type.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index e734b0d130..e75f55d622 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -9,7 +9,7 @@
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
index 4a7d7264d7..873afcfbb8 100644
--- a/src/pl/plpython/plpy_util.c
+++ b/src/pl/plpython/plpy_util.c
@@ -6,7 +6,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_elog.h"
 #include "plpy_util.h"
 #include "plpython.h"
diff --git a/src/pl/tcl/pltcl.c b/src/pl/tcl/pltcl.c
index e7640008fd..b163017ee6 100644
--- a/src/pl/tcl/pltcl.c
+++ b/src/pl/tcl/pltcl.c
@@ -23,7 +23,7 @@
 #include "executor/spi.h"
 #include "fmgr.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f0b3..4ebc11bab8 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -23,7 +23,7 @@
 #include <langinfo.h>
 #endif
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*

#25

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Robert Haas (#23)

Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/17/20 2:33 PM, Robert Haas wrote:

On Fri, Jan 17, 2020 at 12:36 PM David Steele <david@pgmasters.net>

wrote:

I used the callbacks because that's the first method I found but it
seems like json_lex() might be easier to use in practice.

Ugh, really? That doesn't seem like it would be nice at all.

I guess it's a matter of how you want to structure the code.

I think it's an issue that the entire string must be passed to the lexer
at once. That will not be great for large manifests. However, I don't
think it will be all that hard to implement an optional "want more"
callback in the lexer so JSON data can be fed in from the file in

chunks.

I thought so initially, but now I'm not so sure. The thing is, you
actually need all the manifest data in memory at once anyway, or so I
think. You're essentially doing a "full join" between the contents of
the manifest and the contents of the file system, so you've got to
scan one (probably the filesystem) and then mark entries in the other
(probably the manifest) used as you go.

Yeah, having a copy of the manifest in memory is the easiest way to do
validation, but I think you'd want it in a structured format.

We parse the file part of the manifest into a sorted struct array which
we can then do binary searches on by filename.

So, that just leaves ereport() as the largest remaining issue? I'll
look at that today and Tuesday and see what I can work up.

PFA my work on that topic. As compared with my previous patch series,
the previous 0001 is dropped and what are now 0001 and 0002 are the
same as patches from the previous series. 0003 and 0004 are aiming
toward getting rid of ereport() and, I believe, show a plausible
strategy for so doing. There are, possibly, things not to like here,
and it's certainly incomplete, but I think I kinda like this
direction. Comments appreciated.

0003 nukes lex_accept(), inlining the logic into callers. I found that
the refactoring I wanted to do in 0004 was pretty hard without this,
and it turns out to save code, so I think this is a good idea
independently of anything else.

No arguments here.

0004 adjusts many functions in jsonapi.c to return a new enumerated
type, JsonParseErrorType, instead of directly doing ereport(). It adds
a new function that takes this value and a lexing context and throws
an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
hack, but that's fixable with another field in the lexing context.
More work is needed to really bring this up to scratch, but the idea
is to make this code have a soft dependency on ereport() rather than a
hard one.

My first reaction was that if we migrated ereport() first it would make
this all so much easier. Now I'm no so sure.

Having a general json parser in libcommon that is not tied into a
specific error handling/logging system actually sounds like a really
nice thing to have. If we do migrate ereport() the user would always
have the choice to call throw_a_json_error() if they wanted to.

There's also a bit of de-duplication of error messages, which is nice,
especially in the case JSON_ESCAPING_INVALID. And I agree that this
case can be fixed with another field in the lexer -- or at least so it
seems to me.

Though, throw_a_json_error() is not my favorite name. Perhaps
json_ereport()?

Regards,
--
-David
david@pgmasters.net

#26

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: David Steele (#25)

Re: making the backend's json parser work in frontend code

On Tue, Jan 21, 2020 at 5:34 PM David Steele <david@pgmasters.net> wrote:

Though, throw_a_json_error() is not my favorite name. Perhaps
json_ereport()?

That name was deliberately chosen to be dumb, with the thought that
readers would understand it was to be replaced at some point before
this was final. It sounds like it wasn't quite dumb enough to make
that totally clear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Robert Haas (#26)

5 attachment(s)

Re: making the backend's json parser work in frontend code

On Tue, Jan 21, 2020 at 7:23 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 21, 2020 at 5:34 PM David Steele <david@pgmasters.net> wrote:

Though, throw_a_json_error() is not my favorite name. Perhaps
json_ereport()?

That name was deliberately chosen to be dumb, with the thought that
readers would understand it was to be replaced at some point before
this was final. It sounds like it wasn't quite dumb enough to make
that totally clear.

Here is a new version that is, I think, much closer what I would
consider a final form. 0001 through 0003 are as before, and unless
somebody says pretty soon that they see a problem with those or want
more time to review them, I'm going to commit them; David Steele has
endorsed all three, and they seem like independently sensible
cleanups.

0004 is a substantially cleaned up version of the patch to make the
JSON parser return a result code rather than throwing errors. Names
have been fixed, interfaces have been tidied up, and the thing is
better integrated with the surrounding code. I would really like
comments, if anyone has them, on whether this approach is acceptable.

0005 builds on 0004 by moving three functions from jsonapi.c to
jsonfuncs.c. With that done, jsonapi.c has minimal remaining
dependencies on the backend environment. It would still need a
substitute for elog(ERROR, "some internal thing is broken"); I'm
thinking of using pg_log_fatal() for that case. It would also need a
fix for the problem that pg_mblen() is not available in the front-end
environment. I don't know what to do about that yet exactly, but it
doesn't seem unsolvable. The frontend environment just needs to know
which encoding to use, and needs a way to call PQmblen() rather than
pg_mblen().

One problem with this whole thing that I just realized is that the
backup manifest file needs to store filenames, and we don't know that
the filenames we get from the filesystem are going to be valid in
UTF-8 (or, for that matter, any other encoding we might want to
choose). So, just deciding that the backup manifest is always UTF-8
doesn't seem like an option, unless we stick another level of escaping
in there somehow. Strictly as a theoretical matter, someone might
consider this a reason why using JSON for the backup manifest is not
necessarily the best fit, but since other arguments to that effect
have gotten me nowhere until now, I will instead request that someone
suggest to me how I ought to handle that problem.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v3-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patchapplication/octet-stream; name=v3-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patchDownload

From 517b709f204a05549d30be0bfc5b4e010f62c7ce Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 09:26:04 -0500
Subject: [PATCH v3 1/5] Adjust src/include/utils/jsonapi.h so it's not
 backend-only.

The major change here is that we no longer include jsonb.h into
jsonapi.h. The reason that was necessary is that jsonapi.h included
several prototypes functions in jsonfuncs.c that depend on the Jsonb
type. Move those prototypes to a new header, jsonfuncs.h, and include
it where needed.

The other change is that JsonEncodeDateTime is now declared in
json.h rather than jsonapi.h.

Taken together, these steps eliminate all dependencies of jsonapi.h
on backend-only data types and header files, so that it can
potentially be included in frontend code.
---
 src/backend/tsearch/to_tsany.c     |  1 +
 src/backend/tsearch/wparser.c      |  1 +
 src/backend/utils/adt/jsonb_util.c |  1 +
 src/backend/utils/adt/jsonfuncs.c  |  1 +
 src/include/utils/json.h           |  2 ++
 src/include/utils/jsonapi.h        | 33 --------------------
 src/include/utils/jsonfuncs.h      | 49 ++++++++++++++++++++++++++++++
 7 files changed, 55 insertions(+), 33 deletions(-)
 create mode 100644 src/include/utils/jsonfuncs.h

diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index cc694cda8c..adf181c191 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -17,6 +17,7 @@
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 
 
 typedef struct MorphOpaque
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index 6b5960ecc1..c7499a94ac 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -21,6 +21,7 @@
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
 /******sql-level interface******/
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 7c9da701dd..b33c3ef43c 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -19,6 +19,7 @@
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
+#include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 38758a626b..2f9955d665 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -29,6 +29,7 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
+#include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 20b5294491..4345fbdc31 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -18,5 +18,7 @@
 
 /* functions in json.c */
 extern void escape_json(StringInfo buf, const char *str);
+extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
+								const int *tzp);
 
 #endif							/* JSON_H */
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index f72f1cefd5..1190947476 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -14,7 +14,6 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
-#include "jsonb.h"
 #include "lib/stringinfo.h"
 
 typedef enum
@@ -132,36 +131,4 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
  */
 extern bool IsValidJsonNumber(const char *str, int len);
 
-/*
- * Flag types for iterate_json(b)_values to specify what elements from a
- * json(b) document we want to iterate.
- */
-typedef enum JsonToIndex
-{
-	jtiKey = 0x01,
-	jtiString = 0x02,
-	jtiNumeric = 0x04,
-	jtiBool = 0x08,
-	jtiAll = jtiKey | jtiString | jtiNumeric | jtiBool
-} JsonToIndex;
-
-/* an action that will be applied to each value in iterate_json(b)_values functions */
-typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-/* an action that will be applied to each value in transform_json(b)_values functions */
-typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-extern uint32 parse_jsonb_index_flags(Jsonb *jb);
-extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
-								 JsonIterateStringValuesAction action);
-extern void iterate_json_values(text *json, uint32 flags, void *action_state,
-								JsonIterateStringValuesAction action);
-extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
-											JsonTransformStringValuesAction transform_action);
-extern text *transform_json_string_values(text *json, void *action_state,
-										  JsonTransformStringValuesAction transform_action);
-
-extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
-								const int *tzp);
-
 #endif							/* JSONAPI_H */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
new file mode 100644
index 0000000000..19f087ccae
--- /dev/null
+++ b/src/include/utils/jsonfuncs.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonfuncs.h
+ *	  Functions to process JSON data types.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/jsonapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef JSONFUNCS_H
+#define JSONFUNCS_H
+
+#include "utils/jsonapi.h"
+#include "utils/jsonb.h"
+
+/*
+ * Flag types for iterate_json(b)_values to specify what elements from a
+ * json(b) document we want to iterate.
+ */
+typedef enum JsonToIndex
+{
+	jtiKey = 0x01,
+	jtiString = 0x02,
+	jtiNumeric = 0x04,
+	jtiBool = 0x08,
+	jtiAll = jtiKey | jtiString | jtiNumeric | jtiBool
+} JsonToIndex;
+
+/* an action that will be applied to each value in iterate_json(b)_values functions */
+typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+/* an action that will be applied to each value in transform_json(b)_values functions */
+typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+extern uint32 parse_jsonb_index_flags(Jsonb *jb);
+extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
+								 JsonIterateStringValuesAction action);
+extern void iterate_json_values(text *json, uint32 flags, void *action_state,
+								JsonIterateStringValuesAction action);
+extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
+											JsonTransformStringValuesAction transform_action);
+extern text *transform_json_string_values(text *json, void *action_state,
+										  JsonTransformStringValuesAction transform_action);
+
+#endif
-- 
2.17.2 (Apple Git-113)

v3-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patchapplication/octet-stream; name=v3-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patchDownload

From b8fb0d591c1cd2770c0e76b0206a8277bf5ca709 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 10:03:39 -0500
Subject: [PATCH v3 2/5] Split JSON lexer/parser from 'json' data type support.

Keep the code that pertains to the 'json' data type in json.c, but
move the lexing and parsing code to a new file jsonapi.c, a name
I chose because the corresponding prototypes are in jsonapi.h.

This seems like a logical division, because the JSON lexer and parser
are also used by the 'jsonb' data type, but the SQL-callable functions
in json.c are a separate thing. Also, the new jsonapi.c file needs to
include far fewer header files than json.c, which seems like a good
sign that this is an appropriate place to insert an abstraction
boundary. I took the opportunity to remove a few apparently-unneeded
includes from json.c at the same time.
---
 src/backend/utils/adt/Makefile  |    1 +
 src/backend/utils/adt/json.c    | 1206 +-----------------------------
 src/backend/utils/adt/jsonapi.c | 1216 +++++++++++++++++++++++++++++++
 src/include/utils/jsonapi.h     |    6 +
 4 files changed, 1224 insertions(+), 1205 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonapi.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 13efa9338c..790d7a24fb 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,6 +44,7 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
+	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 458505abfd..4be16b5c20 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -13,14 +13,9 @@
  */
 #include "postgres.h"
 
-#include "access/htup_details.h"
-#include "access/transam.h"
 #include "catalog/pg_type.h"
-#include "executor/spi.h"
 #include "funcapi.h"
-#include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "utils/array.h"
@@ -30,27 +25,8 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/lsyscache.h"
-#include "utils/syscache.h"
 #include "utils/typcache.h"
 
-/*
- * The context of the parser is maintained by the recursive descent
- * mechanism, but is passed explicitly to the error reporting routine
- * for better diagnostics.
- */
-typedef enum					/* contexts of JSON parser */
-{
-	JSON_PARSE_VALUE,			/* expecting a value */
-	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
-	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
-	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
-	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
-	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
-	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
-	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
-	JSON_PARSE_END				/* saw the end of a document, expect nothing */
-} JsonParseContext;
-
 typedef enum					/* type categories for datum_to_json */
 {
 	JSONTYPE_NULL,				/* null, so we didn't bother to identify */
@@ -75,19 +51,6 @@ typedef struct JsonAggState
 	Oid			val_output_func;
 } JsonAggState;
 
-static inline void json_lex(JsonLexContext *lex);
-static inline void json_lex_string(JsonLexContext *lex);
-static inline void json_lex_number(JsonLexContext *lex, char *s,
-								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
-static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
-static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
-static int	report_json_context(JsonLexContext *lex);
-static char *extract_mb_char(char *s);
 static void composite_to_json(Datum composite, StringInfo result,
 							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
@@ -106,121 +69,6 @@ static void add_json(Datum val, bool is_null, StringInfo result,
 					 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
 
-/* the null action object used for pure validation */
-static JsonSemAction nullSemAction =
-{
-	NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL
-};
-
-/* Recursive Descent parser support routines */
-
-/*
- * lex_peek
- *
- * what is the current look_ahead token?
-*/
-static inline JsonTokenType
-lex_peek(JsonLexContext *lex)
-{
-	return lex->token_type;
-}
-
-/*
- * lex_accept
- *
- * accept the look_ahead token and move the lexer to the next token if the
- * look_ahead token matches the token parameter. In that case, and if required,
- * also hand back the de-escaped lexeme.
- *
- * returns true if the token matched, false otherwise.
- */
-static inline bool
-lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
-{
-	if (lex->token_type == token)
-	{
-		if (lexeme != NULL)
-		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-				if (lex->strval != NULL)
-					*lexeme = pstrdup(lex->strval->data);
-			}
-			else
-			{
-				int			len = (lex->token_terminator - lex->token_start);
-				char	   *tokstr = palloc(len + 1);
-
-				memcpy(tokstr, lex->token_start, len);
-				tokstr[len] = '\0';
-				*lexeme = tokstr;
-			}
-		}
-		json_lex(lex);
-		return true;
-	}
-	return false;
-}
-
-/*
- * lex_accept
- *
- * move the lexer to the next token if the current look_ahead token matches
- * the parameter token. Otherwise, report an error.
- */
-static inline void
-lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
-{
-	if (!lex_accept(lex, token, NULL))
-		report_parse_error(ctx, lex);
-}
-
-/* chars to consider as part of an alphanumeric token */
-#define JSON_ALPHANUMERIC_CHAR(c)  \
-	(((c) >= 'a' && (c) <= 'z') || \
-	 ((c) >= 'A' && (c) <= 'Z') || \
-	 ((c) >= '0' && (c) <= '9') || \
-	 (c) == '_' || \
-	 IS_HIGHBIT_SET(c))
-
-/*
- * Utility function to check if a string is a valid JSON number.
- *
- * str is of length len, and need not be null-terminated.
- */
-bool
-IsValidJsonNumber(const char *str, int len)
-{
-	bool		numeric_error;
-	int			total_len;
-	JsonLexContext dummy_lex;
-
-	if (len <= 0)
-		return false;
-
-	/*
-	 * json_lex_number expects a leading  '-' to have been eaten already.
-	 *
-	 * having to cast away the constness of str is ugly, but there's not much
-	 * easy alternative.
-	 */
-	if (*str == '-')
-	{
-		dummy_lex.input = unconstify(char *, str) +1;
-		dummy_lex.input_length = len - 1;
-	}
-	else
-	{
-		dummy_lex.input = unconstify(char *, str);
-		dummy_lex.input_length = len;
-	}
-
-	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
-
-	return (!numeric_error) && (total_len == dummy_lex.input_length);
-}
-
 /*
  * Input.
  */
@@ -285,1058 +133,6 @@ json_recv(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
 
-/*
- * makeJsonLexContext
- *
- * lex constructor, with or without StringInfo object
- * for de-escaped lexemes.
- *
- * Without is better as it makes the processing faster, so only make one
- * if really required.
- *
- * If you already have the json as a text* value, use the first of these
- * functions, otherwise use  makeJsonLexContextCstringLen().
- */
-JsonLexContext *
-makeJsonLexContext(text *json, bool need_escapes)
-{
-	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
-										VARSIZE_ANY_EXHDR(json),
-										need_escapes);
-}
-
-JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
-{
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
-
-	lex->input = lex->token_terminator = lex->line_start = json;
-	lex->line_number = 1;
-	lex->input_length = len;
-	if (need_escapes)
-		lex->strval = makeStringInfo();
-	return lex;
-}
-
-/*
- * pg_parse_json
- *
- * Publicly visible entry point for the JSON parser.
- *
- * lex is a lexing context, set up for the json to be processed by calling
- * makeJsonLexContext(). sem is a structure of function pointers to semantic
- * action routines to be called at appropriate spots during parsing, and a
- * pointer to a state object to be passed to those routines.
- */
-void
-pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
-{
-	JsonTokenType tok;
-
-	/* get the initial token */
-	json_lex(lex);
-
-	tok = lex_peek(lex);
-
-	/* parse by recursive descent */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem); /* json can be a bare scalar */
-	}
-
-	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
-
-}
-
-/*
- * json_count_array_elements
- *
- * Returns number of array elements in lex context at start of array token
- * until end of array token at same nesting level.
- *
- * Designed to be called from array_start routines.
- */
-int
-json_count_array_elements(JsonLexContext *lex)
-{
-	JsonLexContext copylex;
-	int			count;
-
-	/*
-	 * It's safe to do this with a shallow copy because the lexical routines
-	 * don't scribble on the input. They do scribble on the other pointers
-	 * etc, so doing this with a copy makes that safe.
-	 */
-	memcpy(&copylex, lex, sizeof(JsonLexContext));
-	copylex.strval = NULL;		/* not interested in values here */
-	copylex.lex_level++;
-
-	count = 0;
-	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
-	{
-		do
-		{
-			count++;
-			parse_array_element(&copylex, &nullSemAction);
-		}
-		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
-	}
-	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
-
-	return count;
-}
-
-/*
- *	Recursive Descent parse routines. There is one for each structural
- *	element in a json document:
- *	  - scalar (string, number, true, false, null)
- *	  - array  ( [ ] )
- *	  - array element
- *	  - object ( { } )
- *	  - object field
- */
-static inline void
-parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
-{
-	char	   *val = NULL;
-	json_scalar_action sfunc = sem->scalar;
-	char	  **valaddr;
-	JsonTokenType tok = lex_peek(lex);
-
-	valaddr = sfunc == NULL ? NULL : &val;
-
-	/* a scalar must be a string, a number, true, false, or null */
-	switch (tok)
-	{
-		case JSON_TOKEN_TRUE:
-			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
-			break;
-		case JSON_TOKEN_FALSE:
-			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
-			break;
-		case JSON_TOKEN_NULL:
-			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
-			break;
-		case JSON_TOKEN_NUMBER:
-			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
-			break;
-		case JSON_TOKEN_STRING:
-			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
-			break;
-		default:
-			report_parse_error(JSON_PARSE_VALUE, lex);
-	}
-
-	if (sfunc != NULL)
-		(*sfunc) (sem->semstate, val, tok);
-}
-
-static void
-parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * An object field is "fieldname" : value where value can be a scalar,
-	 * object or array.  Note: in user-facing docs and error messages, we
-	 * generally call a field name a "key".
-	 */
-
-	char	   *fname = NULL;	/* keep compiler quiet */
-	json_ofield_action ostart = sem->object_field_start;
-	json_ofield_action oend = sem->object_field_end;
-	bool		isnull;
-	char	  **fnameaddr = NULL;
-	JsonTokenType tok;
-
-	if (ostart != NULL || oend != NULL)
-		fnameaddr = &fname;
-
-	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
-		report_parse_error(JSON_PARSE_STRING, lex);
-
-	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
-
-	tok = lex_peek(lex);
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate, fname, isnull);
-
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem);
-	}
-
-	if (oend != NULL)
-		(*oend) (sem->semstate, fname, isnull);
-}
-
-static void
-parse_object(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * an object is a possibly empty sequence of object fields, separated by
-	 * commas and surrounded by curly braces.
-	 */
-	json_struct_action ostart = sem->object_start;
-	json_struct_action oend = sem->object_end;
-	JsonTokenType tok;
-
-	check_stack_depth();
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate);
-
-	/*
-	 * Data inside an object is at a higher nesting level than the object
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the object start and restore it before we call the routine for the
-	 * object end.
-	 */
-	lex->lex_level++;
-
-	/* we know this will succeed, just clearing the token */
-	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
-
-	tok = lex_peek(lex);
-	switch (tok)
-	{
-		case JSON_TOKEN_STRING:
-			parse_object_field(lex, sem);
-			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
-				parse_object_field(lex, sem);
-			break;
-		case JSON_TOKEN_OBJECT_END:
-			break;
-		default:
-			/* case of an invalid initial token inside the object */
-			report_parse_error(JSON_PARSE_OBJECT_START, lex);
-	}
-
-	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
-
-	lex->lex_level--;
-
-	if (oend != NULL)
-		(*oend) (sem->semstate);
-}
-
-static void
-parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
-{
-	json_aelem_action astart = sem->array_element_start;
-	json_aelem_action aend = sem->array_element_end;
-	JsonTokenType tok = lex_peek(lex);
-
-	bool		isnull;
-
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (astart != NULL)
-		(*astart) (sem->semstate, isnull);
-
-	/* an array element is any object, array or scalar */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
-			break;
-		default:
-			parse_scalar(lex, sem);
-	}
-
-	if (aend != NULL)
-		(*aend) (sem->semstate, isnull);
-}
-
-static void
-parse_array(JsonLexContext *lex, JsonSemAction *sem)
-{
-	/*
-	 * an array is a possibly empty sequence of array elements, separated by
-	 * commas and surrounded by square brackets.
-	 */
-	json_struct_action astart = sem->array_start;
-	json_struct_action aend = sem->array_end;
-
-	check_stack_depth();
-
-	if (astart != NULL)
-		(*astart) (sem->semstate);
-
-	/*
-	 * Data inside an array is at a higher nesting level than the array
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the array start and restore it before we call the routine for the
-	 * array end.
-	 */
-	lex->lex_level++;
-
-	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
-	{
-
-		parse_array_element(lex, sem);
-
-		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
-			parse_array_element(lex, sem);
-	}
-
-	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-
-	lex->lex_level--;
-
-	if (aend != NULL)
-		(*aend) (sem->semstate);
-}
-
-/*
- * Lex one token from the input stream.
- */
-static inline void
-json_lex(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-
-	/* Skip leading whitespace. */
-	s = lex->token_terminator;
-	len = s - lex->input;
-	while (len < lex->input_length &&
-		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
-	{
-		if (*s == '\n')
-			++lex->line_number;
-		++s;
-		++len;
-	}
-	lex->token_start = s;
-
-	/* Determine token type. */
-	if (len >= lex->input_length)
-	{
-		lex->token_start = NULL;
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		lex->token_type = JSON_TOKEN_END;
-	}
-	else
-		switch (*s)
-		{
-				/* Single-character token, some kind of punctuation mark. */
-			case '{':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_START;
-				break;
-			case '}':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_END;
-				break;
-			case '[':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_START;
-				break;
-			case ']':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_END;
-				break;
-			case ',':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COMMA;
-				break;
-			case ':':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COLON;
-				break;
-			case '"':
-				/* string */
-				json_lex_string(lex);
-				lex->token_type = JSON_TOKEN_STRING;
-				break;
-			case '-':
-				/* Negative number. */
-				json_lex_number(lex, s + 1, NULL, NULL);
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			case '0':
-			case '1':
-			case '2':
-			case '3':
-			case '4':
-			case '5':
-			case '6':
-			case '7':
-			case '8':
-			case '9':
-				/* Positive number. */
-				json_lex_number(lex, s, NULL, NULL);
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			default:
-				{
-					char	   *p;
-
-					/*
-					 * We're not dealing with a string, number, legal
-					 * punctuation mark, or end of string.  The only legal
-					 * tokens we might find here are true, false, and null,
-					 * but for error reporting purposes we scan until we see a
-					 * non-alphanumeric character.  That way, we can report
-					 * the whole word as an unexpected token, rather than just
-					 * some unintuitive prefix thereof.
-					 */
-					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
-						 /* skip */ ;
-
-					/*
-					 * We got some sort of unexpected punctuation or an
-					 * otherwise unexpected character, so just complain about
-					 * that one character.
-					 */
-					if (p == s)
-					{
-						lex->prev_token_terminator = lex->token_terminator;
-						lex->token_terminator = s + 1;
-						report_invalid_token(lex);
-					}
-
-					/*
-					 * We've got a real alphanumeric token here.  If it
-					 * happens to be true, false, or null, all is well.  If
-					 * not, error out.
-					 */
-					lex->prev_token_terminator = lex->token_terminator;
-					lex->token_terminator = p;
-					if (p - s == 4)
-					{
-						if (memcmp(s, "true", 4) == 0)
-							lex->token_type = JSON_TOKEN_TRUE;
-						else if (memcmp(s, "null", 4) == 0)
-							lex->token_type = JSON_TOKEN_NULL;
-						else
-							report_invalid_token(lex);
-					}
-					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
-						lex->token_type = JSON_TOKEN_FALSE;
-					else
-						report_invalid_token(lex);
-
-				}
-		}						/* end of switch */
-}
-
-/*
- * The next token in the input stream is known to be a string; lex it.
- */
-static inline void
-json_lex_string(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-	int			hi_surrogate = -1;
-
-	if (lex->strval != NULL)
-		resetStringInfo(lex->strval);
-
-	Assert(lex->input_length > 0);
-	s = lex->token_start;
-	len = lex->token_start - lex->input;
-	for (;;)
-	{
-		s++;
-		len++;
-		/* Premature end of the string. */
-		if (len >= lex->input_length)
-		{
-			lex->token_terminator = s;
-			report_invalid_token(lex);
-		}
-		else if (*s == '"')
-			break;
-		else if ((unsigned char) *s < 32)
-		{
-			/* Per RFC4627, these characters MUST be escaped. */
-			/* Since *s isn't printable, exclude it from the context string */
-			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
-		}
-		else if (*s == '\\')
-		{
-			/* OK, we have an escape character. */
-			s++;
-			len++;
-			if (len >= lex->input_length)
-			{
-				lex->token_terminator = s;
-				report_invalid_token(lex);
-			}
-			else if (*s == 'u')
-			{
-				int			i;
-				int			ch = 0;
-
-				for (i = 1; i <= 4; i++)
-				{
-					s++;
-					len++;
-					if (len >= lex->input_length)
-					{
-						lex->token_terminator = s;
-						report_invalid_token(lex);
-					}
-					else if (*s >= '0' && *s <= '9')
-						ch = (ch * 16) + (*s - '0');
-					else if (*s >= 'a' && *s <= 'f')
-						ch = (ch * 16) + (*s - 'a') + 10;
-					else if (*s >= 'A' && *s <= 'F')
-						ch = (ch * 16) + (*s - 'A') + 10;
-					else
-					{
-						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
-					}
-				}
-				if (lex->strval != NULL)
-				{
-					char		utf8str[5];
-					int			utf8len;
-
-					if (ch >= 0xd800 && ch <= 0xdbff)
-					{
-						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
-						hi_surrogate = (ch & 0x3ff) << 10;
-						continue;
-					}
-					else if (ch >= 0xdc00 && ch <= 0xdfff)
-					{
-						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
-						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
-						hi_surrogate = -1;
-					}
-
-					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
-
-					/*
-					 * For UTF8, replace the escape sequence by the actual
-					 * utf8 character in lex->strval. Do this also for other
-					 * encodings if the escape designates an ASCII character,
-					 * otherwise raise an error.
-					 */
-
-					if (ch == 0)
-					{
-						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
-					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
-					{
-						unicode_to_utf8(ch, (unsigned char *) utf8str);
-						utf8len = pg_utf_mblen((unsigned char *) utf8str);
-						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
-					}
-					else if (ch <= 0x007f)
-					{
-						/*
-						 * This is the only way to designate things like a
-						 * form feed character in JSON, so it's useful in all
-						 * encodings.
-						 */
-						appendStringInfoChar(lex->strval, (char) ch);
-					}
-					else
-					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
-					}
-
-				}
-			}
-			else if (lex->strval != NULL)
-			{
-				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
-
-				switch (*s)
-				{
-					case '"':
-					case '\\':
-					case '/':
-						appendStringInfoChar(lex->strval, *s);
-						break;
-					case 'b':
-						appendStringInfoChar(lex->strval, '\b');
-						break;
-					case 'f':
-						appendStringInfoChar(lex->strval, '\f');
-						break;
-					case 'n':
-						appendStringInfoChar(lex->strval, '\n');
-						break;
-					case 'r':
-						appendStringInfoChar(lex->strval, '\r');
-						break;
-					case 't':
-						appendStringInfoChar(lex->strval, '\t');
-						break;
-					default:
-						/* Not a valid string escape, so error out. */
-						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
-				}
-			}
-			else if (strchr("\"\\/bfnrt", *s) == NULL)
-			{
-				/*
-				 * Simpler processing if we're not bothered about de-escaping
-				 *
-				 * It's very tempting to remove the strchr() call here and
-				 * replace it with a switch statement, but testing so far has
-				 * shown it's not a performance win.
-				 */
-				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
-			}
-
-		}
-		else if (lex->strval != NULL)
-		{
-			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
-
-			appendStringInfoChar(lex->strval, *s);
-		}
-
-	}
-
-	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
-
-	/* Hooray, we found the end of the string! */
-	lex->prev_token_terminator = lex->token_terminator;
-	lex->token_terminator = s + 1;
-}
-
-/*
- * The next token in the input stream is known to be a number; lex it.
- *
- * In JSON, a number consists of four parts:
- *
- * (1) An optional minus sign ('-').
- *
- * (2) Either a single '0', or a string of one or more digits that does not
- *	   begin with a '0'.
- *
- * (3) An optional decimal part, consisting of a period ('.') followed by
- *	   one or more digits.  (Note: While this part can be omitted
- *	   completely, it's not OK to have only the decimal point without
- *	   any digits afterwards.)
- *
- * (4) An optional exponent part, consisting of 'e' or 'E', optionally
- *	   followed by '+' or '-', followed by one or more digits.  (Note:
- *	   As with the decimal part, if 'e' or 'E' is present, it must be
- *	   followed by at least one digit.)
- *
- * The 's' argument to this function points to the ostensible beginning
- * of part 2 - i.e. the character after any optional minus sign, or the
- * first character of the string if there is none.
- *
- * If num_err is not NULL, we return an error flag to *num_err rather than
- * raising an error for a badly-formed number.  Also, if total_len is not NULL
- * the distance from lex->input to the token end+1 is returned to *total_len.
- */
-static inline void
-json_lex_number(JsonLexContext *lex, char *s,
-				bool *num_err, int *total_len)
-{
-	bool		error = false;
-	int			len = s - lex->input;
-
-	/* Part (1): leading sign indicator. */
-	/* Caller already did this for us; so do nothing. */
-
-	/* Part (2): parse main digit string. */
-	if (len < lex->input_length && *s == '0')
-	{
-		s++;
-		len++;
-	}
-	else if (len < lex->input_length && *s >= '1' && *s <= '9')
-	{
-		do
-		{
-			s++;
-			len++;
-		} while (len < lex->input_length && *s >= '0' && *s <= '9');
-	}
-	else
-		error = true;
-
-	/* Part (3): parse optional decimal portion. */
-	if (len < lex->input_length && *s == '.')
-	{
-		s++;
-		len++;
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/* Part (4): parse optional exponent. */
-	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
-	{
-		s++;
-		len++;
-		if (len < lex->input_length && (*s == '+' || *s == '-'))
-		{
-			s++;
-			len++;
-		}
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/*
-	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
-	 * here should be considered part of the token for error-reporting
-	 * purposes.
-	 */
-	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
-		error = true;
-
-	if (total_len != NULL)
-		*total_len = len;
-
-	if (num_err != NULL)
-	{
-		/* let the caller handle any error */
-		*num_err = error;
-	}
-	else
-	{
-		/* return token endpoint */
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		/* handle error if any */
-		if (error)
-			report_invalid_token(lex);
-	}
-}
-
-/*
- * Report a parse error.
- *
- * lex->token_start and lex->token_terminator must identify the current token.
- */
-static void
-report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Handle case where the input ended prematurely. */
-	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
-
-	/* Separate out the current token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	/* Complain, with the appropriate detail message. */
-	if (ctx == JSON_PARSE_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
-				 report_json_context(lex)));
-	else
-	{
-		switch (ctx)
-		{
-			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			default:
-				elog(ERROR, "unexpected json parse state: %d", ctx);
-		}
-	}
-}
-
-/*
- * Report an invalid input token.
- *
- * lex->token_start and lex->token_terminator must identify the token.
- */
-static void
-report_invalid_token(JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Separate out the offending token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
-}
-
-/*
- * Report a CONTEXT line for bogus JSON input.
- *
- * lex->token_terminator must be set to identify the spot where we detected
- * the error.  Note that lex->token_start might be NULL, in case we recognized
- * error at EOF.
- *
- * The return value isn't meaningful, but we make it non-void so that this
- * can be invoked inside ereport().
- */
-static int
-report_json_context(JsonLexContext *lex)
-{
-	const char *context_start;
-	const char *context_end;
-	const char *line_start;
-	int			line_number;
-	char	   *ctxt;
-	int			ctxtlen;
-	const char *prefix;
-	const char *suffix;
-
-	/* Choose boundaries for the part of the input we will display */
-	context_start = lex->input;
-	context_end = lex->token_terminator;
-	line_start = context_start;
-	line_number = 1;
-	for (;;)
-	{
-		/* Always advance over newlines */
-		if (context_start < context_end && *context_start == '\n')
-		{
-			context_start++;
-			line_start = context_start;
-			line_number++;
-			continue;
-		}
-		/* Otherwise, done as soon as we are close enough to context_end */
-		if (context_end - context_start < 50)
-			break;
-		/* Advance to next multibyte character */
-		if (IS_HIGHBIT_SET(*context_start))
-			context_start += pg_mblen(context_start);
-		else
-			context_start++;
-	}
-
-	/*
-	 * We add "..." to indicate that the excerpt doesn't start at the
-	 * beginning of the line ... but if we're within 3 characters of the
-	 * beginning of the line, we might as well just show the whole line.
-	 */
-	if (context_start - line_start <= 3)
-		context_start = line_start;
-
-	/* Get a null-terminated copy of the data to present */
-	ctxtlen = context_end - context_start;
-	ctxt = palloc(ctxtlen + 1);
-	memcpy(ctxt, context_start, ctxtlen);
-	ctxt[ctxtlen] = '\0';
-
-	/*
-	 * Show the context, prefixing "..." if not starting at start of line, and
-	 * suffixing "..." if not ending at end of line.
-	 */
-	prefix = (context_start > line_start) ? "..." : "";
-	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
-
-	return errcontext("JSON data, line %d: %s%s%s",
-					  line_number, prefix, ctxt, suffix);
-}
-
-/*
- * Extract a single, possibly multi-byte char from the input string.
- */
-static char *
-extract_mb_char(char *s)
-{
-	char	   *res;
-	int			len;
-
-	len = pg_mblen(s);
-	res = palloc(len + 1);
-	memcpy(res, s, len);
-	res[len] = '\0';
-
-	return res;
-}
-
 /*
  * Determine how we want to print values of a given type in datum_to_json.
  *
@@ -2547,7 +1343,7 @@ json_typeof(PG_FUNCTION_ARGS)
 
 	/* Lex exactly one token from the input and check its type. */
 	json_lex(lex);
-	tok = lex_peek(lex);
+	tok = lex->token_type;
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
new file mode 100644
index 0000000000..fc8af9f861
--- /dev/null
+++ b/src/backend/utils/adt/jsonapi.c
@@ -0,0 +1,1216 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonapi.c
+ *		JSON parser and lexer interfaces
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/jsonapi.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/jsonapi.h"
+
+/*
+ * The context of the parser is maintained by the recursive descent
+ * mechanism, but is passed explicitly to the error reporting routine
+ * for better diagnostics.
+ */
+typedef enum					/* contexts of JSON parser */
+{
+	JSON_PARSE_VALUE,			/* expecting a value */
+	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
+	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
+	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
+	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
+	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
+	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
+	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
+	JSON_PARSE_END				/* saw the end of a document, expect nothing */
+} JsonParseContext;
+
+static inline void json_lex_string(JsonLexContext *lex);
+static inline void json_lex_number(JsonLexContext *lex, char *s,
+								   bool *num_err, int *total_len);
+static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
+static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
+static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
+static int	report_json_context(JsonLexContext *lex);
+static char *extract_mb_char(char *s);
+
+/* the null action object used for pure validation */
+JsonSemAction nullSemAction =
+{
+	NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL
+};
+
+/* Recursive Descent parser support routines */
+
+/*
+ * lex_peek
+ *
+ * what is the current look_ahead token?
+*/
+static inline JsonTokenType
+lex_peek(JsonLexContext *lex)
+{
+	return lex->token_type;
+}
+
+/*
+ * lex_accept
+ *
+ * accept the look_ahead token and move the lexer to the next token if the
+ * look_ahead token matches the token parameter. In that case, and if required,
+ * also hand back the de-escaped lexeme.
+ *
+ * returns true if the token matched, false otherwise.
+ */
+static inline bool
+lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
+{
+	if (lex->token_type == token)
+	{
+		if (lexeme != NULL)
+		{
+			if (lex->token_type == JSON_TOKEN_STRING)
+			{
+				if (lex->strval != NULL)
+					*lexeme = pstrdup(lex->strval->data);
+			}
+			else
+			{
+				int			len = (lex->token_terminator - lex->token_start);
+				char	   *tokstr = palloc(len + 1);
+
+				memcpy(tokstr, lex->token_start, len);
+				tokstr[len] = '\0';
+				*lexeme = tokstr;
+			}
+		}
+		json_lex(lex);
+		return true;
+	}
+	return false;
+}
+
+/*
+ * lex_accept
+ *
+ * move the lexer to the next token if the current look_ahead token matches
+ * the parameter token. Otherwise, report an error.
+ */
+static inline void
+lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
+{
+	if (!lex_accept(lex, token, NULL))
+		report_parse_error(ctx, lex);
+}
+
+/* chars to consider as part of an alphanumeric token */
+#define JSON_ALPHANUMERIC_CHAR(c)  \
+	(((c) >= 'a' && (c) <= 'z') || \
+	 ((c) >= 'A' && (c) <= 'Z') || \
+	 ((c) >= '0' && (c) <= '9') || \
+	 (c) == '_' || \
+	 IS_HIGHBIT_SET(c))
+
+/*
+ * Utility function to check if a string is a valid JSON number.
+ *
+ * str is of length len, and need not be null-terminated.
+ */
+bool
+IsValidJsonNumber(const char *str, int len)
+{
+	bool		numeric_error;
+	int			total_len;
+	JsonLexContext dummy_lex;
+
+	if (len <= 0)
+		return false;
+
+	/*
+	 * json_lex_number expects a leading  '-' to have been eaten already.
+	 *
+	 * having to cast away the constness of str is ugly, but there's not much
+	 * easy alternative.
+	 */
+	if (*str == '-')
+	{
+		dummy_lex.input = unconstify(char *, str) +1;
+		dummy_lex.input_length = len - 1;
+	}
+	else
+	{
+		dummy_lex.input = unconstify(char *, str);
+		dummy_lex.input_length = len;
+	}
+
+	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
+
+	return (!numeric_error) && (total_len == dummy_lex.input_length);
+}
+
+/*
+ * makeJsonLexContext
+ *
+ * lex constructor, with or without StringInfo object
+ * for de-escaped lexemes.
+ *
+ * Without is better as it makes the processing faster, so only make one
+ * if really required.
+ *
+ * If you already have the json as a text* value, use the first of these
+ * functions, otherwise use  makeJsonLexContextCstringLen().
+ */
+JsonLexContext *
+makeJsonLexContext(text *json, bool need_escapes)
+{
+	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
+										VARSIZE_ANY_EXHDR(json),
+										need_escapes);
+}
+
+JsonLexContext *
+makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+{
+	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+
+	lex->input = lex->token_terminator = lex->line_start = json;
+	lex->line_number = 1;
+	lex->input_length = len;
+	if (need_escapes)
+		lex->strval = makeStringInfo();
+	return lex;
+}
+
+/*
+ * pg_parse_json
+ *
+ * Publicly visible entry point for the JSON parser.
+ *
+ * lex is a lexing context, set up for the json to be processed by calling
+ * makeJsonLexContext(). sem is a structure of function pointers to semantic
+ * action routines to be called at appropriate spots during parsing, and a
+ * pointer to a state object to be passed to those routines.
+ */
+void
+pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
+{
+	JsonTokenType tok;
+
+	/* get the initial token */
+	json_lex(lex);
+
+	tok = lex_peek(lex);
+
+	/* parse by recursive descent */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem); /* json can be a bare scalar */
+	}
+
+	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
+
+}
+
+/*
+ * json_count_array_elements
+ *
+ * Returns number of array elements in lex context at start of array token
+ * until end of array token at same nesting level.
+ *
+ * Designed to be called from array_start routines.
+ */
+int
+json_count_array_elements(JsonLexContext *lex)
+{
+	JsonLexContext copylex;
+	int			count;
+
+	/*
+	 * It's safe to do this with a shallow copy because the lexical routines
+	 * don't scribble on the input. They do scribble on the other pointers
+	 * etc, so doing this with a copy makes that safe.
+	 */
+	memcpy(&copylex, lex, sizeof(JsonLexContext));
+	copylex.strval = NULL;		/* not interested in values here */
+	copylex.lex_level++;
+
+	count = 0;
+	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
+	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
+	{
+		do
+		{
+			count++;
+			parse_array_element(&copylex, &nullSemAction);
+		}
+		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
+	}
+	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+
+	return count;
+}
+
+/*
+ *	Recursive Descent parse routines. There is one for each structural
+ *	element in a json document:
+ *	  - scalar (string, number, true, false, null)
+ *	  - array  ( [ ] )
+ *	  - array element
+ *	  - object ( { } )
+ *	  - object field
+ */
+static inline void
+parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
+{
+	char	   *val = NULL;
+	json_scalar_action sfunc = sem->scalar;
+	char	  **valaddr;
+	JsonTokenType tok = lex_peek(lex);
+
+	valaddr = sfunc == NULL ? NULL : &val;
+
+	/* a scalar must be a string, a number, true, false, or null */
+	switch (tok)
+	{
+		case JSON_TOKEN_TRUE:
+			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
+			break;
+		case JSON_TOKEN_FALSE:
+			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
+			break;
+		case JSON_TOKEN_NULL:
+			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
+			break;
+		case JSON_TOKEN_NUMBER:
+			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
+			break;
+		case JSON_TOKEN_STRING:
+			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
+			break;
+		default:
+			report_parse_error(JSON_PARSE_VALUE, lex);
+	}
+
+	if (sfunc != NULL)
+		(*sfunc) (sem->semstate, val, tok);
+}
+
+static void
+parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * An object field is "fieldname" : value where value can be a scalar,
+	 * object or array.  Note: in user-facing docs and error messages, we
+	 * generally call a field name a "key".
+	 */
+
+	char	   *fname = NULL;	/* keep compiler quiet */
+	json_ofield_action ostart = sem->object_field_start;
+	json_ofield_action oend = sem->object_field_end;
+	bool		isnull;
+	char	  **fnameaddr = NULL;
+	JsonTokenType tok;
+
+	if (ostart != NULL || oend != NULL)
+		fnameaddr = &fname;
+
+	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
+		report_parse_error(JSON_PARSE_STRING, lex);
+
+	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+
+	tok = lex_peek(lex);
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate, fname, isnull);
+
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem);
+	}
+
+	if (oend != NULL)
+		(*oend) (sem->semstate, fname, isnull);
+}
+
+static void
+parse_object(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * an object is a possibly empty sequence of object fields, separated by
+	 * commas and surrounded by curly braces.
+	 */
+	json_struct_action ostart = sem->object_start;
+	json_struct_action oend = sem->object_end;
+	JsonTokenType tok;
+
+	check_stack_depth();
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate);
+
+	/*
+	 * Data inside an object is at a higher nesting level than the object
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the object start and restore it before we call the routine for the
+	 * object end.
+	 */
+	lex->lex_level++;
+
+	/* we know this will succeed, just clearing the token */
+	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
+
+	tok = lex_peek(lex);
+	switch (tok)
+	{
+		case JSON_TOKEN_STRING:
+			parse_object_field(lex, sem);
+			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+				parse_object_field(lex, sem);
+			break;
+		case JSON_TOKEN_OBJECT_END:
+			break;
+		default:
+			/* case of an invalid initial token inside the object */
+			report_parse_error(JSON_PARSE_OBJECT_START, lex);
+	}
+
+	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+
+	lex->lex_level--;
+
+	if (oend != NULL)
+		(*oend) (sem->semstate);
+}
+
+static void
+parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
+{
+	json_aelem_action astart = sem->array_element_start;
+	json_aelem_action aend = sem->array_element_end;
+	JsonTokenType tok = lex_peek(lex);
+
+	bool		isnull;
+
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (astart != NULL)
+		(*astart) (sem->semstate, isnull);
+
+	/* an array element is any object, array or scalar */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			parse_object(lex, sem);
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			parse_array(lex, sem);
+			break;
+		default:
+			parse_scalar(lex, sem);
+	}
+
+	if (aend != NULL)
+		(*aend) (sem->semstate, isnull);
+}
+
+static void
+parse_array(JsonLexContext *lex, JsonSemAction *sem)
+{
+	/*
+	 * an array is a possibly empty sequence of array elements, separated by
+	 * commas and surrounded by square brackets.
+	 */
+	json_struct_action astart = sem->array_start;
+	json_struct_action aend = sem->array_end;
+
+	check_stack_depth();
+
+	if (astart != NULL)
+		(*astart) (sem->semstate);
+
+	/*
+	 * Data inside an array is at a higher nesting level than the array
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the array start and restore it before we call the routine for the
+	 * array end.
+	 */
+	lex->lex_level++;
+
+	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
+	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	{
+
+		parse_array_element(lex, sem);
+
+		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+			parse_array_element(lex, sem);
+	}
+
+	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+
+	lex->lex_level--;
+
+	if (aend != NULL)
+		(*aend) (sem->semstate);
+}
+
+/*
+ * Lex one token from the input stream.
+ */
+void
+json_lex(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+
+	/* Skip leading whitespace. */
+	s = lex->token_terminator;
+	len = s - lex->input;
+	while (len < lex->input_length &&
+		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
+	{
+		if (*s == '\n')
+			++lex->line_number;
+		++s;
+		++len;
+	}
+	lex->token_start = s;
+
+	/* Determine token type. */
+	if (len >= lex->input_length)
+	{
+		lex->token_start = NULL;
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		lex->token_type = JSON_TOKEN_END;
+	}
+	else
+		switch (*s)
+		{
+				/* Single-character token, some kind of punctuation mark. */
+			case '{':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_START;
+				break;
+			case '}':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_END;
+				break;
+			case '[':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_START;
+				break;
+			case ']':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_END;
+				break;
+			case ',':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COMMA;
+				break;
+			case ':':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COLON;
+				break;
+			case '"':
+				/* string */
+				json_lex_string(lex);
+				lex->token_type = JSON_TOKEN_STRING;
+				break;
+			case '-':
+				/* Negative number. */
+				json_lex_number(lex, s + 1, NULL, NULL);
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			case '0':
+			case '1':
+			case '2':
+			case '3':
+			case '4':
+			case '5':
+			case '6':
+			case '7':
+			case '8':
+			case '9':
+				/* Positive number. */
+				json_lex_number(lex, s, NULL, NULL);
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			default:
+				{
+					char	   *p;
+
+					/*
+					 * We're not dealing with a string, number, legal
+					 * punctuation mark, or end of string.  The only legal
+					 * tokens we might find here are true, false, and null,
+					 * but for error reporting purposes we scan until we see a
+					 * non-alphanumeric character.  That way, we can report
+					 * the whole word as an unexpected token, rather than just
+					 * some unintuitive prefix thereof.
+					 */
+					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
+						 /* skip */ ;
+
+					/*
+					 * We got some sort of unexpected punctuation or an
+					 * otherwise unexpected character, so just complain about
+					 * that one character.
+					 */
+					if (p == s)
+					{
+						lex->prev_token_terminator = lex->token_terminator;
+						lex->token_terminator = s + 1;
+						report_invalid_token(lex);
+					}
+
+					/*
+					 * We've got a real alphanumeric token here.  If it
+					 * happens to be true, false, or null, all is well.  If
+					 * not, error out.
+					 */
+					lex->prev_token_terminator = lex->token_terminator;
+					lex->token_terminator = p;
+					if (p - s == 4)
+					{
+						if (memcmp(s, "true", 4) == 0)
+							lex->token_type = JSON_TOKEN_TRUE;
+						else if (memcmp(s, "null", 4) == 0)
+							lex->token_type = JSON_TOKEN_NULL;
+						else
+							report_invalid_token(lex);
+					}
+					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
+						lex->token_type = JSON_TOKEN_FALSE;
+					else
+						report_invalid_token(lex);
+
+				}
+		}						/* end of switch */
+}
+
+/*
+ * The next token in the input stream is known to be a string; lex it.
+ */
+static inline void
+json_lex_string(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+	int			hi_surrogate = -1;
+
+	if (lex->strval != NULL)
+		resetStringInfo(lex->strval);
+
+	Assert(lex->input_length > 0);
+	s = lex->token_start;
+	len = lex->token_start - lex->input;
+	for (;;)
+	{
+		s++;
+		len++;
+		/* Premature end of the string. */
+		if (len >= lex->input_length)
+		{
+			lex->token_terminator = s;
+			report_invalid_token(lex);
+		}
+		else if (*s == '"')
+			break;
+		else if ((unsigned char) *s < 32)
+		{
+			/* Per RFC4627, these characters MUST be escaped. */
+			/* Since *s isn't printable, exclude it from the context string */
+			lex->token_terminator = s;
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Character with value 0x%02x must be escaped.",
+							   (unsigned char) *s),
+					 report_json_context(lex)));
+		}
+		else if (*s == '\\')
+		{
+			/* OK, we have an escape character. */
+			s++;
+			len++;
+			if (len >= lex->input_length)
+			{
+				lex->token_terminator = s;
+				report_invalid_token(lex);
+			}
+			else if (*s == 'u')
+			{
+				int			i;
+				int			ch = 0;
+
+				for (i = 1; i <= 4; i++)
+				{
+					s++;
+					len++;
+					if (len >= lex->input_length)
+					{
+						lex->token_terminator = s;
+						report_invalid_token(lex);
+					}
+					else if (*s >= '0' && *s <= '9')
+						ch = (ch * 16) + (*s - '0');
+					else if (*s >= 'a' && *s <= 'f')
+						ch = (ch * 16) + (*s - 'a') + 10;
+					else if (*s >= 'A' && *s <= 'F')
+						ch = (ch * 16) + (*s - 'A') + 10;
+					else
+					{
+						lex->token_terminator = s + pg_mblen(s);
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s",
+										"json"),
+								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
+								 report_json_context(lex)));
+					}
+				}
+				if (lex->strval != NULL)
+				{
+					char		utf8str[5];
+					int			utf8len;
+
+					if (ch >= 0xd800 && ch <= 0xdbff)
+					{
+						if (hi_surrogate != -1)
+							ereport(ERROR,
+									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									 errmsg("invalid input syntax for type %s",
+											"json"),
+									 errdetail("Unicode high surrogate must not follow a high surrogate."),
+									 report_json_context(lex)));
+						hi_surrogate = (ch & 0x3ff) << 10;
+						continue;
+					}
+					else if (ch >= 0xdc00 && ch <= 0xdfff)
+					{
+						if (hi_surrogate == -1)
+							ereport(ERROR,
+									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+									 errmsg("invalid input syntax for type %s", "json"),
+									 errdetail("Unicode low surrogate must follow a high surrogate."),
+									 report_json_context(lex)));
+						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
+						hi_surrogate = -1;
+					}
+
+					if (hi_surrogate != -1)
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s", "json"),
+								 errdetail("Unicode low surrogate must follow a high surrogate."),
+								 report_json_context(lex)));
+
+					/*
+					 * For UTF8, replace the escape sequence by the actual
+					 * utf8 character in lex->strval. Do this also for other
+					 * encodings if the escape designates an ASCII character,
+					 * otherwise raise an error.
+					 */
+
+					if (ch == 0)
+					{
+						/* We can't allow this, since our TEXT type doesn't */
+						ereport(ERROR,
+								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+								 errmsg("unsupported Unicode escape sequence"),
+								 errdetail("\\u0000 cannot be converted to text."),
+								 report_json_context(lex)));
+					}
+					else if (GetDatabaseEncoding() == PG_UTF8)
+					{
+						unicode_to_utf8(ch, (unsigned char *) utf8str);
+						utf8len = pg_utf_mblen((unsigned char *) utf8str);
+						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
+					}
+					else if (ch <= 0x007f)
+					{
+						/*
+						 * This is the only way to designate things like a
+						 * form feed character in JSON, so it's useful in all
+						 * encodings.
+						 */
+						appendStringInfoChar(lex->strval, (char) ch);
+					}
+					else
+					{
+						ereport(ERROR,
+								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+								 errmsg("unsupported Unicode escape sequence"),
+								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
+								 report_json_context(lex)));
+					}
+
+				}
+			}
+			else if (lex->strval != NULL)
+			{
+				if (hi_surrogate != -1)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							 errmsg("invalid input syntax for type %s",
+									"json"),
+							 errdetail("Unicode low surrogate must follow a high surrogate."),
+							 report_json_context(lex)));
+
+				switch (*s)
+				{
+					case '"':
+					case '\\':
+					case '/':
+						appendStringInfoChar(lex->strval, *s);
+						break;
+					case 'b':
+						appendStringInfoChar(lex->strval, '\b');
+						break;
+					case 'f':
+						appendStringInfoChar(lex->strval, '\f');
+						break;
+					case 'n':
+						appendStringInfoChar(lex->strval, '\n');
+						break;
+					case 'r':
+						appendStringInfoChar(lex->strval, '\r');
+						break;
+					case 't':
+						appendStringInfoChar(lex->strval, '\t');
+						break;
+					default:
+						/* Not a valid string escape, so error out. */
+						lex->token_terminator = s + pg_mblen(s);
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+								 errmsg("invalid input syntax for type %s",
+										"json"),
+								 errdetail("Escape sequence \"\\%s\" is invalid.",
+										   extract_mb_char(s)),
+								 report_json_context(lex)));
+				}
+			}
+			else if (strchr("\"\\/bfnrt", *s) == NULL)
+			{
+				/*
+				 * Simpler processing if we're not bothered about de-escaping
+				 *
+				 * It's very tempting to remove the strchr() call here and
+				 * replace it with a switch statement, but testing so far has
+				 * shown it's not a performance win.
+				 */
+				lex->token_terminator = s + pg_mblen(s);
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Escape sequence \"\\%s\" is invalid.",
+								   extract_mb_char(s)),
+						 report_json_context(lex)));
+			}
+
+		}
+		else if (lex->strval != NULL)
+		{
+			if (hi_surrogate != -1)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Unicode low surrogate must follow a high surrogate."),
+						 report_json_context(lex)));
+
+			appendStringInfoChar(lex->strval, *s);
+		}
+
+	}
+
+	if (hi_surrogate != -1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("Unicode low surrogate must follow a high surrogate."),
+				 report_json_context(lex)));
+
+	/* Hooray, we found the end of the string! */
+	lex->prev_token_terminator = lex->token_terminator;
+	lex->token_terminator = s + 1;
+}
+
+/*
+ * The next token in the input stream is known to be a number; lex it.
+ *
+ * In JSON, a number consists of four parts:
+ *
+ * (1) An optional minus sign ('-').
+ *
+ * (2) Either a single '0', or a string of one or more digits that does not
+ *	   begin with a '0'.
+ *
+ * (3) An optional decimal part, consisting of a period ('.') followed by
+ *	   one or more digits.  (Note: While this part can be omitted
+ *	   completely, it's not OK to have only the decimal point without
+ *	   any digits afterwards.)
+ *
+ * (4) An optional exponent part, consisting of 'e' or 'E', optionally
+ *	   followed by '+' or '-', followed by one or more digits.  (Note:
+ *	   As with the decimal part, if 'e' or 'E' is present, it must be
+ *	   followed by at least one digit.)
+ *
+ * The 's' argument to this function points to the ostensible beginning
+ * of part 2 - i.e. the character after any optional minus sign, or the
+ * first character of the string if there is none.
+ *
+ * If num_err is not NULL, we return an error flag to *num_err rather than
+ * raising an error for a badly-formed number.  Also, if total_len is not NULL
+ * the distance from lex->input to the token end+1 is returned to *total_len.
+ */
+static inline void
+json_lex_number(JsonLexContext *lex, char *s,
+				bool *num_err, int *total_len)
+{
+	bool		error = false;
+	int			len = s - lex->input;
+
+	/* Part (1): leading sign indicator. */
+	/* Caller already did this for us; so do nothing. */
+
+	/* Part (2): parse main digit string. */
+	if (len < lex->input_length && *s == '0')
+	{
+		s++;
+		len++;
+	}
+	else if (len < lex->input_length && *s >= '1' && *s <= '9')
+	{
+		do
+		{
+			s++;
+			len++;
+		} while (len < lex->input_length && *s >= '0' && *s <= '9');
+	}
+	else
+		error = true;
+
+	/* Part (3): parse optional decimal portion. */
+	if (len < lex->input_length && *s == '.')
+	{
+		s++;
+		len++;
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/* Part (4): parse optional exponent. */
+	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
+	{
+		s++;
+		len++;
+		if (len < lex->input_length && (*s == '+' || *s == '-'))
+		{
+			s++;
+			len++;
+		}
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/*
+	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
+	 * here should be considered part of the token for error-reporting
+	 * purposes.
+	 */
+	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
+		error = true;
+
+	if (total_len != NULL)
+		*total_len = len;
+
+	if (num_err != NULL)
+	{
+		/* let the caller handle any error */
+		*num_err = error;
+	}
+	else
+	{
+		/* return token endpoint */
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		/* handle error if any */
+		if (error)
+			report_invalid_token(lex);
+	}
+}
+
+/*
+ * Report a parse error.
+ *
+ * lex->token_start and lex->token_terminator must identify the current token.
+ */
+static void
+report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
+{
+	char	   *token;
+	int			toklen;
+
+	/* Handle case where the input ended prematurely. */
+	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("The input string ended unexpectedly."),
+				 report_json_context(lex)));
+
+	/* Separate out the current token. */
+	toklen = lex->token_terminator - lex->token_start;
+	token = palloc(toklen + 1);
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+
+	/* Complain, with the appropriate detail message. */
+	if (ctx == JSON_PARSE_END)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("Expected end of input, but found \"%s\".",
+						   token),
+				 report_json_context(lex)));
+	else
+	{
+		switch (ctx)
+		{
+			case JSON_PARSE_VALUE:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected JSON value, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_STRING:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_ARRAY_START:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected array element or \"]\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_ARRAY_NEXT:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_START:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string or \"}\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_LABEL:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \":\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_NEXT:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			case JSON_PARSE_OBJECT_COMMA:
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("invalid input syntax for type %s", "json"),
+						 errdetail("Expected string, but found \"%s\".",
+								   token),
+						 report_json_context(lex)));
+				break;
+			default:
+				elog(ERROR, "unexpected json parse state: %d", ctx);
+		}
+	}
+}
+
+/*
+ * Report an invalid input token.
+ *
+ * lex->token_start and lex->token_terminator must identify the token.
+ */
+static void
+report_invalid_token(JsonLexContext *lex)
+{
+	char	   *token;
+	int			toklen;
+
+	/* Separate out the offending token. */
+	toklen = lex->token_terminator - lex->token_start;
+	token = palloc(toklen + 1);
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s", "json"),
+			 errdetail("Token \"%s\" is invalid.", token),
+			 report_json_context(lex)));
+}
+
+/*
+ * Report a CONTEXT line for bogus JSON input.
+ *
+ * lex->token_terminator must be set to identify the spot where we detected
+ * the error.  Note that lex->token_start might be NULL, in case we recognized
+ * error at EOF.
+ *
+ * The return value isn't meaningful, but we make it non-void so that this
+ * can be invoked inside ereport().
+ */
+static int
+report_json_context(JsonLexContext *lex)
+{
+	const char *context_start;
+	const char *context_end;
+	const char *line_start;
+	int			line_number;
+	char	   *ctxt;
+	int			ctxtlen;
+	const char *prefix;
+	const char *suffix;
+
+	/* Choose boundaries for the part of the input we will display */
+	context_start = lex->input;
+	context_end = lex->token_terminator;
+	line_start = context_start;
+	line_number = 1;
+	for (;;)
+	{
+		/* Always advance over newlines */
+		if (context_start < context_end && *context_start == '\n')
+		{
+			context_start++;
+			line_start = context_start;
+			line_number++;
+			continue;
+		}
+		/* Otherwise, done as soon as we are close enough to context_end */
+		if (context_end - context_start < 50)
+			break;
+		/* Advance to next multibyte character */
+		if (IS_HIGHBIT_SET(*context_start))
+			context_start += pg_mblen(context_start);
+		else
+			context_start++;
+	}
+
+	/*
+	 * We add "..." to indicate that the excerpt doesn't start at the
+	 * beginning of the line ... but if we're within 3 characters of the
+	 * beginning of the line, we might as well just show the whole line.
+	 */
+	if (context_start - line_start <= 3)
+		context_start = line_start;
+
+	/* Get a null-terminated copy of the data to present */
+	ctxtlen = context_end - context_start;
+	ctxt = palloc(ctxtlen + 1);
+	memcpy(ctxt, context_start, ctxtlen);
+	ctxt[ctxtlen] = '\0';
+
+	/*
+	 * Show the context, prefixing "..." if not starting at start of line, and
+	 * suffixing "..." if not ending at end of line.
+	 */
+	prefix = (context_start > line_start) ? "..." : "";
+	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
+
+	return errcontext("JSON data, line %d: %s%s%s",
+					  line_number, prefix, ctxt, suffix);
+}
+
+/*
+ * Extract a single, possibly multi-byte char from the input string.
+ */
+static char *
+extract_mb_char(char *s)
+{
+	char	   *res;
+	int			len;
+
+	len = pg_mblen(s);
+	res = palloc(len + 1);
+	memcpy(res, s, len);
+	res[len] = '\0';
+
+	return res;
+}
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index 1190947476..bbca121bb7 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -103,6 +103,9 @@ typedef struct JsonSemAction
  */
 extern void pg_parse_json(JsonLexContext *lex, JsonSemAction *sem);
 
+/* the null action object used for pure validation */
+extern JsonSemAction nullSemAction;
+
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
  * number of elements in passed array lex context. It should be called from an
@@ -124,6 +127,9 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
 													bool need_escapes);
 
+/* lex one token */
+extern void json_lex(JsonLexContext *lex);
+
 /*
  * Utility function to check if a string is a valid JSON number.
  *
-- 
2.17.2 (Apple Git-113)

v3-0005-Move-some-code-from-jsonapi.c-to-jsonfuncs.c.patchapplication/octet-stream; name=v3-0005-Move-some-code-from-jsonapi.c-to-jsonfuncs.c.patchDownload

From 910a60c7167b9d9c086a2183063cf0605ec31f8d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 22 Jan 2020 13:28:03 -0500
Subject: [PATCH v3 5/5] Move some code from jsonapi.c to jsonfuncs.c.

Specifically, move all of the functions that depend on the backend's
error-reporting infrastructure from jsonapi.c to jsonfuncs.c, in
preparation for allowing jsonapi.c to be used from frontend code.

Well, not quite all of them. A few cases where elog(ERROR, ...)
is used for can't-happen conditions are left alone; we can handle
those in some other way in frontend code.
---
 src/backend/utils/adt/json.c      |   2 +-
 src/backend/utils/adt/jsonapi.c   | 127 +-----------------------------
 src/backend/utils/adt/jsonb.c     |   2 +-
 src/backend/utils/adt/jsonfuncs.c | 126 +++++++++++++++++++++++++++++
 src/include/utils/jsonapi.h       |  15 +---
 src/include/utils/jsonfuncs.h     |   9 +++
 6 files changed, 140 insertions(+), 141 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e73a60ece8..f6cd2b9911 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -23,7 +23,7 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
+#include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
 
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 129fbd65d5..1ac3b7beda 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -44,7 +44,6 @@ static JsonParseErrorType parse_object(JsonLexContext *lex, JsonSemAction *sem);
 static JsonParseErrorType parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
 static JsonParseErrorType parse_array(JsonLexContext *lex, JsonSemAction *sem);
 static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
-static int	report_json_context(JsonLexContext *lex);
 static char *extract_token(JsonLexContext *lex);
 
 /* the null action object used for pure validation */
@@ -128,25 +127,13 @@ IsValidJsonNumber(const char *str, int len)
 }
 
 /*
- * makeJsonLexContext
+ * makeJsonLexContextCstringLen
  *
- * lex constructor, with or without StringInfo object
- * for de-escaped lexemes.
+ * lex constructor, with or without StringInfo object for de-escaped lexemes.
  *
  * Without is better as it makes the processing faster, so only make one
  * if really required.
- *
- * If you already have the json as a text* value, use the first of these
- * functions, otherwise use  makeJsonLexContextCstringLen().
  */
-JsonLexContext *
-makeJsonLexContext(text *json, bool need_escapes)
-{
-	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
-										VARSIZE_ANY_EXHDR(json),
-										need_escapes);
-}
-
 JsonLexContext *
 makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
 {
@@ -202,23 +189,6 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 	return result;
 }
 
-/*
- * pg_parse_json_or_ereport
- *
- * This fuction is like pg_parse_json, except that it does not return a
- * JsonParseErrorType. Instead, in case of any failure, this function will
- * ereport(ERROR).
- */
-void
-pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem)
-{
-	JsonParseErrorType	result;
-
-	result = pg_parse_json(lex, sem);
-	if (result != JSON_SUCCESS)
-		json_ereport_error(result, lex);
-}
-
 /*
  * json_count_array_elements
  *
@@ -1038,27 +1008,6 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 	}
 }
 
-/*
- * Report a JSON error.
- */
-void
-json_ereport_error(JsonParseErrorType error, JsonLexContext *lex)
-{
-	if (error == JSON_UNICODE_HIGH_ESCAPE ||
-		error == JSON_UNICODE_CODE_POINT_ZERO)
-		ereport(ERROR,
-				(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-				 errmsg("unsupported Unicode escape sequence"),
-				 errdetail("%s", json_errdetail(error, lex)),
-				 report_json_context(lex)));
-	else
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("%s", json_errdetail(error, lex)),
-				 report_json_context(lex)));
-}
-
 /*
  * Construct a detail message for a JSON error.
  */
@@ -1118,78 +1067,6 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 	}
 }
 
-/*
- * Report a CONTEXT line for bogus JSON input.
- *
- * lex->token_terminator must be set to identify the spot where we detected
- * the error.  Note that lex->token_start might be NULL, in case we recognized
- * error at EOF.
- *
- * The return value isn't meaningful, but we make it non-void so that this
- * can be invoked inside ereport().
- */
-static int
-report_json_context(JsonLexContext *lex)
-{
-	const char *context_start;
-	const char *context_end;
-	const char *line_start;
-	int			line_number;
-	char	   *ctxt;
-	int			ctxtlen;
-	const char *prefix;
-	const char *suffix;
-
-	/* Choose boundaries for the part of the input we will display */
-	context_start = lex->input;
-	context_end = lex->token_terminator;
-	line_start = context_start;
-	line_number = 1;
-	for (;;)
-	{
-		/* Always advance over newlines */
-		if (context_start < context_end && *context_start == '\n')
-		{
-			context_start++;
-			line_start = context_start;
-			line_number++;
-			continue;
-		}
-		/* Otherwise, done as soon as we are close enough to context_end */
-		if (context_end - context_start < 50)
-			break;
-		/* Advance to next multibyte character */
-		if (IS_HIGHBIT_SET(*context_start))
-			context_start += pg_mblen(context_start);
-		else
-			context_start++;
-	}
-
-	/*
-	 * We add "..." to indicate that the excerpt doesn't start at the
-	 * beginning of the line ... but if we're within 3 characters of the
-	 * beginning of the line, we might as well just show the whole line.
-	 */
-	if (context_start - line_start <= 3)
-		context_start = line_start;
-
-	/* Get a null-terminated copy of the data to present */
-	ctxtlen = context_end - context_start;
-	ctxt = palloc(ctxtlen + 1);
-	memcpy(ctxt, context_start, ctxtlen);
-	ctxt[ctxtlen] = '\0';
-
-	/*
-	 * Show the context, prefixing "..." if not starting at start of line, and
-	 * suffixing "..." if not ending at end of line.
-	 */
-	prefix = (context_start > line_start) ? "..." : "";
-	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
-
-	return errcontext("JSON data, line %d: %s%s%s",
-					  line_number, prefix, ctxt, suffix);
-}
-
 /*
  * Extract the current token from a lexing context, for error reporting.
  */
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 83d7f68b82..c912f8932d 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -23,8 +23,8 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
+#include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 9eff506855..66ea11b971 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -329,6 +329,8 @@ typedef struct JsObject
 			hash_destroy((jso)->val.json_hash); \
 	} while (0)
 
+static int	report_json_context(JsonLexContext *lex);
+
 /* semantic action functions for json_object_keys */
 static void okeys_object_field_start(void *state, char *fname, bool isnull);
 static void okeys_array_start(void *state);
@@ -484,6 +486,37 @@ static void transform_string_values_object_field_start(void *state, char *fname,
 static void transform_string_values_array_element_start(void *state, bool isnull);
 static void transform_string_values_scalar(void *state, char *token, JsonTokenType tokentype);
 
+/*
+ * pg_parse_json_or_ereport
+ *
+ * This fuction is like pg_parse_json, except that it does not return a
+ * JsonParseErrorType. Instead, in case of any failure, this function will
+ * ereport(ERROR).
+ */
+void
+pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem)
+{
+	JsonParseErrorType	result;
+
+	result = pg_parse_json(lex, sem);
+	if (result != JSON_SUCCESS)
+		json_ereport_error(result, lex);
+}
+
+/*
+ * makeJsonLexContext
+ *
+ * This is like makeJsonLexContextCstringLen, but it accepts a text value
+ * directly.
+ */
+JsonLexContext *
+makeJsonLexContext(text *json, bool need_escapes)
+{
+	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
+										VARSIZE_ANY_EXHDR(json),
+										need_escapes);
+}
+
 /*
  * SQL function json_object_keys
  *
@@ -573,6 +606,99 @@ jsonb_object_keys(PG_FUNCTION_ARGS)
 	SRF_RETURN_DONE(funcctx);
 }
 
+/*
+ * Report a JSON error.
+ */
+void
+json_ereport_error(JsonParseErrorType error, JsonLexContext *lex)
+{
+	if (error == JSON_UNICODE_HIGH_ESCAPE ||
+		error == JSON_UNICODE_CODE_POINT_ZERO)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+				 errmsg("unsupported Unicode escape sequence"),
+				 errdetail("%s", json_errdetail(error, lex)),
+				 report_json_context(lex)));
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for type %s", "json"),
+				 errdetail("%s", json_errdetail(error, lex)),
+				 report_json_context(lex)));
+}
+
+/*
+ * Report a CONTEXT line for bogus JSON input.
+ *
+ * lex->token_terminator must be set to identify the spot where we detected
+ * the error.  Note that lex->token_start might be NULL, in case we recognized
+ * error at EOF.
+ *
+ * The return value isn't meaningful, but we make it non-void so that this
+ * can be invoked inside ereport().
+ */
+static int
+report_json_context(JsonLexContext *lex)
+{
+	const char *context_start;
+	const char *context_end;
+	const char *line_start;
+	int			line_number;
+	char	   *ctxt;
+	int			ctxtlen;
+	const char *prefix;
+	const char *suffix;
+
+	/* Choose boundaries for the part of the input we will display */
+	context_start = lex->input;
+	context_end = lex->token_terminator;
+	line_start = context_start;
+	line_number = 1;
+	for (;;)
+	{
+		/* Always advance over newlines */
+		if (context_start < context_end && *context_start == '\n')
+		{
+			context_start++;
+			line_start = context_start;
+			line_number++;
+			continue;
+		}
+		/* Otherwise, done as soon as we are close enough to context_end */
+		if (context_end - context_start < 50)
+			break;
+		/* Advance to next multibyte character */
+		if (IS_HIGHBIT_SET(*context_start))
+			context_start += pg_mblen(context_start);
+		else
+			context_start++;
+	}
+
+	/*
+	 * We add "..." to indicate that the excerpt doesn't start at the
+	 * beginning of the line ... but if we're within 3 characters of the
+	 * beginning of the line, we might as well just show the whole line.
+	 */
+	if (context_start - line_start <= 3)
+		context_start = line_start;
+
+	/* Get a null-terminated copy of the data to present */
+	ctxtlen = context_end - context_start;
+	ctxt = palloc(ctxtlen + 1);
+	memcpy(ctxt, context_start, ctxtlen);
+	ctxt[ctxtlen] = '\0';
+
+	/*
+	 * Show the context, prefixing "..." if not starting at start of line, and
+	 * suffixing "..." if not ending at end of line.
+	 */
+	prefix = (context_start > line_start) ? "..." : "";
+	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
+
+	return errcontext("JSON data, line %d: %s%s%s",
+					  line_number, prefix, ctxt, suffix);
+}
+
 
 Datum
 json_object_keys(PG_FUNCTION_ARGS)
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index 74dc35c41c..4d69b18495 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -126,12 +126,6 @@ typedef struct JsonSemAction
 extern JsonParseErrorType pg_parse_json(JsonLexContext *lex,
 										JsonSemAction *sem);
 
-/*
- * Same thing, but signal errors via ereport(ERROR) instead of returning
- * a result code.
- */
-extern void pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem);
-
 /* the null action object used for pure validation */
 extern JsonSemAction nullSemAction;
 
@@ -148,15 +142,11 @@ extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
 													int *elements);
 
 /*
- * constructors for JsonLexContext, with or without strval element.
+ * constructor for JsonLexContext, with or without strval element.
  * If supplied, the strval element will contain a de-escaped version of
  * the lexeme. However, doing this imposes a performance penalty, so
  * it should be avoided if the de-escaped lexeme is not required.
- *
- * If you already have the json as a text* value, use the first of these
- * functions, otherwise use  makeJsonLexContextCstringLen().
  */
-extern JsonLexContext *makeJsonLexContext(text *json, bool need_escapes);
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
 													bool need_escapes);
@@ -164,9 +154,6 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 /* lex one token */
 extern JsonParseErrorType json_lex(JsonLexContext *lex);
 
-/* report an error during json lexing or parsing */
-extern void json_ereport_error(JsonParseErrorType error, JsonLexContext *lex);
-
 /* construct an error detail string for a json error */
 extern char *json_errdetail(JsonParseErrorType error, JsonLexContext *lex);
 
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 19f087ccae..b993f38409 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -36,6 +36,15 @@ typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, in
 /* an action that will be applied to each value in transform_json(b)_values functions */
 typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
 
+/* build a JsonLexContext from a text datum */
+extern JsonLexContext *makeJsonLexContext(text *json, bool need_escapes);
+
+/* try to parse json, and ereport(ERROR) on failure */
+extern void pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem);
+
+/* report an error during json lexing or parsing */
+extern void json_ereport_error(JsonParseErrorType error, JsonLexContext *lex);
+
 extern uint32 parse_jsonb_index_flags(Jsonb *jb);
 extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
 								 JsonIterateStringValuesAction action);
-- 
2.17.2 (Apple Git-113)

v3-0004-Adjust-pg_parse_error-so-that-it-does-not-depend-.patchapplication/octet-stream; name=v3-0004-Adjust-pg_parse_error-so-that-it-does-not-depend-.patchDownload

From c781796fdeba1e5f8d551aada8231db8af6bd66f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 17 Jan 2020 16:22:21 -0500
Subject: [PATCH v3 4/5] Adjust pg_parse_error() so that it does not depend on
 ereport().

Instead, it now returns a value indicating either success or the
type of error which occurred. The old behavior is still available
by calling pg_parse_json_or_ereport(). If the new interface is
used, an error can be thrown by passing the return value of
pg_parse_json() to json_ereport_error().

Adjust json_lex() and json_count_array_elements() to return an
error code, too.

This is all in preparation for making the backend's json parser
available to frontend code.
---
 src/backend/utils/adt/json.c      |   9 +-
 src/backend/utils/adt/jsonapi.c   | 537 +++++++++++++++---------------
 src/backend/utils/adt/jsonb.c     |   4 +-
 src/backend/utils/adt/jsonfuncs.c |  29 +-
 src/include/utils/jsonapi.h       |  46 ++-
 5 files changed, 342 insertions(+), 283 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 4be16b5c20..e73a60ece8 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -81,7 +81,7 @@ json_in(PG_FUNCTION_ARGS)
 
 	/* validate it */
 	lex = makeJsonLexContext(result, false);
-	pg_parse_json(lex, &nullSemAction);
+	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	/* Internal representation is the same as text, for now */
 	PG_RETURN_TEXT_P(result);
@@ -128,7 +128,7 @@ json_recv(PG_FUNCTION_ARGS)
 
 	/* Validate it. */
 	lex = makeJsonLexContextCstringLen(str, nbytes, false);
-	pg_parse_json(lex, &nullSemAction);
+	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
@@ -1337,12 +1337,15 @@ json_typeof(PG_FUNCTION_ARGS)
 	JsonLexContext *lex;
 	JsonTokenType tok;
 	char	   *type;
+	JsonParseErrorType	result;
 
 	json = PG_GETARG_TEXT_PP(0);
 	lex = makeJsonLexContext(json, false);
 
 	/* Lex exactly one token from the input and check its type. */
-	json_lex(lex);
+	result = json_lex(lex);
+	if (result != JSON_SUCCESS)
+		json_ereport_error(result, lex);
 	tok = lex->token_type;
 	switch (tok)
 	{
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 9e14306b6f..129fbd65d5 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -35,18 +35,17 @@ typedef enum					/* contexts of JSON parser */
 	JSON_PARSE_END				/* saw the end of a document, expect nothing */
 } JsonParseContext;
 
-static inline void json_lex_string(JsonLexContext *lex);
-static inline void json_lex_number(JsonLexContext *lex, char *s,
+static inline JsonParseErrorType json_lex_string(JsonLexContext *lex);
+static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
 								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
-static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
-static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
+static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_object(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static int	report_json_context(JsonLexContext *lex);
-static char *extract_mb_char(char *s);
+static char *extract_token(JsonLexContext *lex);
 
 /* the null action object used for pure validation */
 JsonSemAction nullSemAction =
@@ -74,13 +73,13 @@ lex_peek(JsonLexContext *lex)
  * move the lexer to the next token if the current look_ahead token matches
  * the parameter token. Otherwise, report an error.
  */
-static inline void
+static inline JsonParseErrorType
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
 	if (lex_peek(lex) == token)
-		json_lex(lex);
+		return json_lex(lex);
 	else
-		report_parse_error(ctx, lex);
+		return report_parse_error(ctx, lex);
 }
 
 /* chars to consider as part of an alphanumeric token */
@@ -171,13 +170,16 @@ makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
  * action routines to be called at appropriate spots during parsing, and a
  * pointer to a state object to be passed to those routines.
  */
-void
+JsonParseErrorType
 pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 {
 	JsonTokenType tok;
+	JsonParseErrorType	result;
 
 	/* get the initial token */
-	json_lex(lex);
+	result = json_lex(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	tok = lex_peek(lex);
 
@@ -185,17 +187,36 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			result = parse_object(lex, sem);
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			result = parse_array(lex, sem);
 			break;
 		default:
-			parse_scalar(lex, sem); /* json can be a bare scalar */
+			result = parse_scalar(lex, sem); /* json can be a bare scalar */
 	}
 
-	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
+	if (result == JSON_SUCCESS)
+		result = lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
+
+	return result;
+}
+
+/*
+ * pg_parse_json_or_ereport
+ *
+ * This fuction is like pg_parse_json, except that it does not return a
+ * JsonParseErrorType. Instead, in case of any failure, this function will
+ * ereport(ERROR).
+ */
+void
+pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem)
+{
+	JsonParseErrorType	result;
 
+	result = pg_parse_json(lex, sem);
+	if (result != JSON_SUCCESS)
+		json_ereport_error(result, lex);
 }
 
 /*
@@ -206,11 +227,12 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
  *
  * Designed to be called from array_start routines.
  */
-int
-json_count_array_elements(JsonLexContext *lex)
+JsonParseErrorType
+json_count_array_elements(JsonLexContext *lex, int *elements)
 {
 	JsonLexContext copylex;
 	int			count;
+	JsonParseErrorType	result;
 
 	/*
 	 * It's safe to do this with a shallow copy because the lexical routines
@@ -222,21 +244,32 @@ json_count_array_elements(JsonLexContext *lex)
 	copylex.lex_level++;
 
 	count = 0;
-	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
+	result = lex_expect(JSON_PARSE_ARRAY_START, &copylex,
+						JSON_TOKEN_ARRAY_START);
+	if (result != JSON_SUCCESS)
+		return result;
 	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
 		while (1)
 		{
 			count++;
-			parse_array_element(&copylex, &nullSemAction);
+			result = parse_array_element(&copylex, &nullSemAction);
+			if (result != JSON_SUCCESS)
+				return result;
 			if (copylex.token_type != JSON_TOKEN_COMMA)
 				break;
-			json_lex(&copylex);
+			result = json_lex(&copylex);
+			if (result != JSON_SUCCESS)
+				return result;
 		}
 	}
-	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+	result = lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex,
+							JSON_TOKEN_ARRAY_END);
+	if (result != JSON_SUCCESS)
+		return result;
 
-	return count;
+	*elements = count;
+	return JSON_SUCCESS;
 }
 
 /*
@@ -248,25 +281,23 @@ json_count_array_elements(JsonLexContext *lex)
  *	  - object ( { } )
  *	  - object field
  */
-static inline void
+static inline JsonParseErrorType
 parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
 	JsonTokenType tok = lex_peek(lex);
+	JsonParseErrorType result;
 
 	/* a scalar must be a string, a number, true, false, or null */
 	if (tok != JSON_TOKEN_STRING && tok != JSON_TOKEN_NUMBER &&
 		tok != JSON_TOKEN_TRUE && tok != JSON_TOKEN_FALSE &&
 		tok != JSON_TOKEN_NULL)
-		report_parse_error(JSON_PARSE_VALUE, lex);
+		return report_parse_error(JSON_PARSE_VALUE, lex);
 
 	/* if no semantic function, just consume the token */
 	if (sfunc == NULL)
-	{
-		json_lex(lex);
-		return;
-	}
+		return json_lex(lex);
 
 	/* extract the de-escaped string value, or the raw lexeme */
 	if (lex_peek(lex) == JSON_TOKEN_STRING)
@@ -284,13 +315,17 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 	}
 
 	/* consume the token */
-	json_lex(lex);
+	result = json_lex(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	/* invoke the callback */
 	(*sfunc) (sem->semstate, val, tok);
+
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
@@ -304,14 +339,19 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
 	JsonTokenType tok;
+	JsonParseErrorType result;
 
 	if (lex_peek(lex) != JSON_TOKEN_STRING)
-		report_parse_error(JSON_PARSE_STRING, lex);
+		return report_parse_error(JSON_PARSE_STRING, lex);
 	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
 		fname = pstrdup(lex->strval->data);
-	json_lex(lex);
+	result = json_lex(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
-	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+	result = lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -322,20 +362,23 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			result = parse_object(lex, sem);
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			result = parse_array(lex, sem);
 			break;
 		default:
-			parse_scalar(lex, sem);
+			result = parse_scalar(lex, sem);
 	}
+	if (result != JSON_SUCCESS)
+		return result;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate, fname, isnull);
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_object(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
@@ -345,6 +388,7 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	json_struct_action ostart = sem->object_start;
 	json_struct_action oend = sem->object_end;
 	JsonTokenType tok;
+	JsonParseErrorType result;
 
 	check_stack_depth();
 
@@ -360,40 +404,51 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	lex->lex_level++;
 
 	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
-	json_lex(lex);
+	result = json_lex(lex);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
-			parse_object_field(lex, sem);
-			while (lex_peek(lex) == JSON_TOKEN_COMMA)
+			result = parse_object_field(lex, sem);
+			while (result == JSON_SUCCESS && lex_peek(lex) == JSON_TOKEN_COMMA)
 			{
-				json_lex(lex);
-				parse_object_field(lex, sem);
+				result = json_lex(lex);
+				if (result != JSON_SUCCESS)
+					break;
+				result = parse_object_field(lex, sem);
 			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
 		default:
 			/* case of an invalid initial token inside the object */
-			report_parse_error(JSON_PARSE_OBJECT_START, lex);
+			result = report_parse_error(JSON_PARSE_OBJECT_START, lex);
 	}
+	if (result != JSON_SUCCESS)
+		return result;
 
-	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+	result = lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	lex->lex_level--;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate);
+
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 {
 	json_aelem_action astart = sem->array_element_start;
 	json_aelem_action aend = sem->array_element_end;
 	JsonTokenType tok = lex_peek(lex);
+	JsonParseErrorType result;
 
 	bool		isnull;
 
@@ -406,20 +461,25 @@ parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			result = parse_object(lex, sem);
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			result = parse_array(lex, sem);
 			break;
 		default:
-			parse_scalar(lex, sem);
+			result = parse_scalar(lex, sem);
 	}
 
+	if (result != JSON_SUCCESS)
+		return result;
+
 	if (aend != NULL)
 		(*aend) (sem->semstate, isnull);
+
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_array(JsonLexContext *lex, JsonSemAction *sem)
 {
 	/*
@@ -428,6 +488,7 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	json_struct_action astart = sem->array_start;
 	json_struct_action aend = sem->array_end;
+	JsonParseErrorType result;
 
 	check_stack_depth();
 
@@ -442,35 +503,43 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
-	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	result = lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
+	if (result == JSON_SUCCESS && lex_peek(lex) != JSON_TOKEN_ARRAY_END)
 	{
+		result = parse_array_element(lex, sem);
 
-		parse_array_element(lex, sem);
-
-		while (lex_peek(lex) == JSON_TOKEN_COMMA)
+		while (result == JSON_SUCCESS && lex_peek(lex) == JSON_TOKEN_COMMA)
 		{
-			json_lex(lex);
-			parse_array_element(lex, sem);
+			result = json_lex(lex);
+			if (result != JSON_SUCCESS)
+				break;
+			result = parse_array_element(lex, sem);
 		}
 	}
+	if (result != JSON_SUCCESS)
+		return result;
 
-	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+	result = lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+	if (result != JSON_SUCCESS)
+		return result;
 
 	lex->lex_level--;
 
 	if (aend != NULL)
 		(*aend) (sem->semstate);
+
+	return JSON_SUCCESS;
 }
 
 /*
  * Lex one token from the input stream.
  */
-void
+JsonParseErrorType
 json_lex(JsonLexContext *lex)
 {
 	char	   *s;
 	int			len;
+	JsonParseErrorType	result;
 
 	/* Skip leading whitespace. */
 	s = lex->token_terminator;
@@ -494,6 +563,7 @@ json_lex(JsonLexContext *lex)
 		lex->token_type = JSON_TOKEN_END;
 	}
 	else
+	{
 		switch (*s)
 		{
 				/* Single-character token, some kind of punctuation mark. */
@@ -529,12 +599,16 @@ json_lex(JsonLexContext *lex)
 				break;
 			case '"':
 				/* string */
-				json_lex_string(lex);
+				result = json_lex_string(lex);
+				if (result != JSON_SUCCESS)
+					return result;
 				lex->token_type = JSON_TOKEN_STRING;
 				break;
 			case '-':
 				/* Negative number. */
-				json_lex_number(lex, s + 1, NULL, NULL);
+				result = json_lex_number(lex, s + 1, NULL, NULL);
+				if (result != JSON_SUCCESS)
+					return result;
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			case '0':
@@ -548,7 +622,9 @@ json_lex(JsonLexContext *lex)
 			case '8':
 			case '9':
 				/* Positive number. */
-				json_lex_number(lex, s, NULL, NULL);
+				result = json_lex_number(lex, s, NULL, NULL);
+				if (result != JSON_SUCCESS)
+					return result;
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			default:
@@ -576,7 +652,7 @@ json_lex(JsonLexContext *lex)
 					{
 						lex->prev_token_terminator = lex->token_terminator;
 						lex->token_terminator = s + 1;
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 					}
 
 					/*
@@ -593,21 +669,24 @@ json_lex(JsonLexContext *lex)
 						else if (memcmp(s, "null", 4) == 0)
 							lex->token_type = JSON_TOKEN_NULL;
 						else
-							report_invalid_token(lex);
+							return JSON_INVALID_TOKEN;
 					}
 					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
 						lex->token_type = JSON_TOKEN_FALSE;
 					else
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 
 				}
 		}						/* end of switch */
+	}
+
+	return JSON_SUCCESS;
 }
 
 /*
  * The next token in the input stream is known to be a string; lex it.
  */
-static inline void
+static inline JsonParseErrorType
 json_lex_string(JsonLexContext *lex)
 {
 	char	   *s;
@@ -628,7 +707,7 @@ json_lex_string(JsonLexContext *lex)
 		if (len >= lex->input_length)
 		{
 			lex->token_terminator = s;
-			report_invalid_token(lex);
+			return JSON_INVALID_TOKEN;
 		}
 		else if (*s == '"')
 			break;
@@ -637,12 +716,7 @@ json_lex_string(JsonLexContext *lex)
 			/* Per RFC4627, these characters MUST be escaped. */
 			/* Since *s isn't printable, exclude it from the context string */
 			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
+			return JSON_ESCAPING_REQUIRED;
 		}
 		else if (*s == '\\')
 		{
@@ -652,7 +726,7 @@ json_lex_string(JsonLexContext *lex)
 			if (len >= lex->input_length)
 			{
 				lex->token_terminator = s;
-				report_invalid_token(lex);
+				return JSON_INVALID_TOKEN;
 			}
 			else if (*s == 'u')
 			{
@@ -666,7 +740,7 @@ json_lex_string(JsonLexContext *lex)
 					if (len >= lex->input_length)
 					{
 						lex->token_terminator = s;
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 					}
 					else if (*s >= '0' && *s <= '9')
 						ch = (ch * 16) + (*s - '0');
@@ -677,12 +751,7 @@ json_lex_string(JsonLexContext *lex)
 					else
 					{
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
 				if (lex->strval != NULL)
@@ -693,33 +762,20 @@ json_lex_string(JsonLexContext *lex)
 					if (ch >= 0xd800 && ch <= 0xdbff)
 					{
 						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
+							return JSON_UNICODE_HIGH_SURROGATE;
 						hi_surrogate = (ch & 0x3ff) << 10;
 						continue;
 					}
 					else if (ch >= 0xdc00 && ch <= 0xdfff)
 					{
 						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
+							return JSON_UNICODE_LOW_SURROGATE;
 						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
 						hi_surrogate = -1;
 					}
 
 					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_LOW_SURROGATE;
 
 					/*
 					 * For UTF8, replace the escape sequence by the actual
@@ -731,11 +787,7 @@ json_lex_string(JsonLexContext *lex)
 					if (ch == 0)
 					{
 						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
 					else if (GetDatabaseEncoding() == PG_UTF8)
 					{
@@ -753,25 +805,14 @@ json_lex_string(JsonLexContext *lex)
 						appendStringInfoChar(lex->strval, (char) ch);
 					}
 					else
-					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
-					}
+						return JSON_UNICODE_HIGH_ESCAPE;
 
 				}
 			}
 			else if (lex->strval != NULL)
 			{
 				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
+					return JSON_UNICODE_LOW_SURROGATE;
 
 				switch (*s)
 				{
@@ -796,15 +837,10 @@ json_lex_string(JsonLexContext *lex)
 						appendStringInfoChar(lex->strval, '\t');
 						break;
 					default:
-						/* Not a valid string escape, so error out. */
+						/* Not a valid string escape, so signal error. */
+						lex->token_start = s;
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
+						return JSON_ESCAPING_INVALID;
 				}
 			}
 			else if (strchr("\"\\/bfnrt", *s) == NULL)
@@ -816,24 +852,16 @@ json_lex_string(JsonLexContext *lex)
 				 * replace it with a switch statement, but testing so far has
 				 * shown it's not a performance win.
 				 */
+				lex->token_start = s;
 				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
+				return JSON_ESCAPING_INVALID;
 			}
 
 		}
 		else if (lex->strval != NULL)
 		{
 			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
+				return JSON_UNICODE_LOW_SURROGATE;
 
 			appendStringInfoChar(lex->strval, *s);
 		}
@@ -841,15 +869,12 @@ json_lex_string(JsonLexContext *lex)
 	}
 
 	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
+		return JSON_UNICODE_LOW_SURROGATE;
 
 	/* Hooray, we found the end of the string! */
 	lex->prev_token_terminator = lex->token_terminator;
 	lex->token_terminator = s + 1;
+	return JSON_SUCCESS;
 }
 
 /*
@@ -880,7 +905,7 @@ json_lex_string(JsonLexContext *lex)
  * raising an error for a badly-formed number.  Also, if total_len is not NULL
  * the distance from lex->input to the token end+1 is returned to *total_len.
  */
-static inline void
+static inline JsonParseErrorType
 json_lex_number(JsonLexContext *lex, char *s,
 				bool *num_err, int *total_len)
 {
@@ -969,8 +994,10 @@ json_lex_number(JsonLexContext *lex, char *s,
 		lex->token_terminator = s;
 		/* handle error if any */
 		if (error)
-			report_invalid_token(lex);
+			return JSON_INVALID_TOKEN;
 	}
+
+	return JSON_SUCCESS;
 }
 
 /*
@@ -978,130 +1005,117 @@ json_lex_number(JsonLexContext *lex, char *s,
  *
  * lex->token_start and lex->token_terminator must identify the current token.
  */
-static void
+static JsonParseErrorType
 report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 {
-	char	   *token;
-	int			toklen;
-
 	/* Handle case where the input ended prematurely. */
 	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
+		return JSON_EXPECTED_MORE;
 
-	/* Separate out the current token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
+	/* Otherwise choose the error type based on the parsing context. */
+	switch (ctx)
+	{
+		case JSON_PARSE_END:
+			return JSON_EXPECTED_END;
+		case JSON_PARSE_VALUE:
+			return JSON_EXPECTED_JSON;
+		case JSON_PARSE_STRING:
+			return JSON_EXPECTED_STRING;
+		case JSON_PARSE_ARRAY_START:
+			return JSON_EXPECTED_ARRAY_FIRST;
+		case JSON_PARSE_ARRAY_NEXT:
+			return JSON_EXPECTED_ARRAY_NEXT;
+		case JSON_PARSE_OBJECT_START:
+			return JSON_EXPECTED_OBJECT_FIRST;
+		case JSON_PARSE_OBJECT_LABEL:
+			return JSON_EXPECTED_COLON;
+		case JSON_PARSE_OBJECT_NEXT:
+			return JSON_EXPECTED_OBJECT_NEXT;
+		case JSON_PARSE_OBJECT_COMMA:
+			return JSON_EXPECTED_STRING;
+		default:
+			elog(ERROR, "unexpected json parse state: %d", ctx);
+	}
+}
 
-	/* Complain, with the appropriate detail message. */
-	if (ctx == JSON_PARSE_END)
+/*
+ * Report a JSON error.
+ */
+void
+json_ereport_error(JsonParseErrorType error, JsonLexContext *lex)
+{
+	if (error == JSON_UNICODE_HIGH_ESCAPE ||
+		error == JSON_UNICODE_CODE_POINT_ZERO)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+				 errmsg("unsupported Unicode escape sequence"),
+				 errdetail("%s", json_errdetail(error, lex)),
+				 report_json_context(lex)));
+	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
+				 errdetail("%s", json_errdetail(error, lex)),
 				 report_json_context(lex)));
-	else
-	{
-		switch (ctx)
-		{
-			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			default:
-				elog(ERROR, "unexpected json parse state: %d", ctx);
-		}
-	}
 }
 
 /*
- * Report an invalid input token.
- *
- * lex->token_start and lex->token_terminator must identify the token.
+ * Construct a detail message for a JSON error.
  */
-static void
-report_invalid_token(JsonLexContext *lex)
+char *
+json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 {
-	char	   *token;
-	int			toklen;
-
-	/* Separate out the offending token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
+	switch (error)
+	{
+		case JSON_SUCCESS:
+			elog(ERROR, "internal error in json parser");
+			break;
+		case JSON_ESCAPING_INVALID:
+			return psprintf(_("Escape sequence \"\\%s\" is invalid."),
+							extract_token(lex));
+		case JSON_ESCAPING_REQUIRED:
+			return psprintf(_("Character with value 0x%02x must be escaped."),
+							(unsigned char) *(lex->token_terminator));
+		case JSON_EXPECTED_END:
+			return psprintf(_("Expected end of input, but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_ARRAY_FIRST:
+			return psprintf(_("Expected array element or \"]\", but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_ARRAY_NEXT:
+			return psprintf(_("Expected \",\" or \"]\", but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_COLON:
+			return psprintf(_("Expected \":\", but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_JSON:
+			return psprintf(_("Expected JSON value, but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_MORE:
+			return _("The input string ended unexpectedly.");
+		case JSON_EXPECTED_OBJECT_FIRST:
+			return psprintf(_("Expected string or \"}\", but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_OBJECT_NEXT:
+			return psprintf(_("Expected \",\" or \"}\", but found \"%s\"."),
+							extract_token(lex));
+		case JSON_EXPECTED_STRING:
+			return psprintf(_("Expected string, but found \"%s\"."),
+							extract_token(lex));
+		case JSON_INVALID_TOKEN:
+			return psprintf(_("Token \"%s\" is invalid."),
+							extract_token(lex));
+		case JSON_UNICODE_CODE_POINT_ZERO:
+			return _("\\u0000 cannot be converted to text.");
+		case JSON_UNICODE_ESCAPE_FORMAT:
+			return _("\"\\u\" must be followed by four hexadecimal digits.");
+		case JSON_UNICODE_HIGH_ESCAPE:
+			return _("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.");
+		case JSON_UNICODE_HIGH_SURROGATE:
+			return _("Unicode high surrogate must not follow a high surrogate.");
+		case JSON_UNICODE_LOW_SURROGATE:
+			return _("Unicode low surrogate must follow a high surrogate.");
+	}
 }
 
 /*
@@ -1177,18 +1191,15 @@ report_json_context(JsonLexContext *lex)
 }
 
 /*
- * Extract a single, possibly multi-byte char from the input string.
+ * Extract the current token from a lexing context, for error reporting.
  */
 static char *
-extract_mb_char(char *s)
+extract_token(JsonLexContext *lex)
 {
-	char	   *res;
-	int			len;
-
-	len = pg_mblen(s);
-	res = palloc(len + 1);
-	memcpy(res, s, len);
-	res[len] = '\0';
+	int toklen = lex->token_terminator - lex->token_start;
+	char *token = palloc(toklen + 1);
 
-	return res;
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+	return token;
 }
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c4a4ec78b0..83d7f68b82 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -272,7 +272,7 @@ jsonb_from_cstring(char *json, int len)
 	sem.scalar = jsonb_in_scalar;
 	sem.object_field_start = jsonb_in_object_field_start;
 
-	pg_parse_json(lex, &sem);
+	pg_parse_json_or_ereport(lex, &sem);
 
 	/* after parsing, the item member has the composed jsonb structure */
 	PG_RETURN_POINTER(JsonbValueToJsonb(state.res));
@@ -860,7 +860,7 @@ datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 					sem.scalar = jsonb_in_scalar;
 					sem.object_field_start = jsonb_in_object_field_start;
 
-					pg_parse_json(lex, &sem);
+					pg_parse_json_or_ereport(lex, &sem);
 
 				}
 				break;
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 2f9955d665..9eff506855 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -606,7 +606,7 @@ json_object_keys(PG_FUNCTION_ARGS)
 		sem->object_field_start = okeys_object_field_start;
 		/* remainder are all NULL, courtesy of palloc0 above */
 
-		pg_parse_json(lex, sem);
+		pg_parse_json_or_ereport(lex, sem);
 		/* keys are now in state->result */
 
 		pfree(lex->strval->data);
@@ -1000,7 +1000,7 @@ get_worker(text *json,
 		sem->array_element_end = get_array_element_end;
 	}
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	return state->tresult;
 }
@@ -1148,7 +1148,12 @@ get_array_start(void *state)
 			_state->path_indexes[lex_level] != INT_MIN)
 		{
 			/* Negative subscript -- convert to positive-wise subscript */
-			int			nelements = json_count_array_elements(_state->lex);
+			JsonParseErrorType error;
+			int			nelements;
+
+			error = json_count_array_elements(_state->lex, &nelements);
+			if (error != JSON_SUCCESS)
+				json_ereport_error(error, _state->lex);
 
 			if (-_state->path_indexes[lex_level] <= nelements)
 				_state->path_indexes[lex_level] += nelements;
@@ -1548,7 +1553,7 @@ json_array_length(PG_FUNCTION_ARGS)
 	sem->scalar = alen_scalar;
 	sem->array_element_start = alen_array_element_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	PG_RETURN_INT32(state->count);
 }
@@ -1814,7 +1819,7 @@ each_worker(FunctionCallInfo fcinfo, bool as_text)
 										   "json_each temporary cxt",
 										   ALLOCSET_DEFAULT_SIZES);
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	MemoryContextDelete(state->tmp_cxt);
 
@@ -2113,7 +2118,7 @@ elements_worker(FunctionCallInfo fcinfo, const char *funcname, bool as_text)
 										   "json_array_elements temporary cxt",
 										   ALLOCSET_DEFAULT_SIZES);
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	MemoryContextDelete(state->tmp_cxt);
 
@@ -2485,7 +2490,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	sem.array_element_end = populate_array_element_end;
 	sem.scalar = populate_array_scalar;
 
-	pg_parse_json(state.lex, &sem);
+	pg_parse_json_or_ereport(state.lex, &sem);
 
 	/* number of dimensions should be already known */
 	Assert(ctx->ndims > 0 && ctx->dims);
@@ -3342,7 +3347,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	sem->object_field_start = hash_object_field_start;
 	sem->object_field_end = hash_object_field_end;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	return tab;
 }
@@ -3641,7 +3646,7 @@ populate_recordset_worker(FunctionCallInfo fcinfo, const char *funcname,
 
 		state->lex = lex;
 
-		pg_parse_json(lex, sem);
+		pg_parse_json_or_ereport(lex, sem);
 	}
 	else
 	{
@@ -3971,7 +3976,7 @@ json_strip_nulls(PG_FUNCTION_ARGS)
 	sem->array_element_start = sn_array_element_start;
 	sem->object_field_start = sn_object_field_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(state->strval->data,
 											  state->strval->len));
@@ -5110,7 +5115,7 @@ iterate_json_values(text *json, uint32 flags, void *action_state,
 	sem->scalar = iterate_values_scalar;
 	sem->object_field_start = iterate_values_object_field_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 }
 
 /*
@@ -5230,7 +5235,7 @@ transform_json_string_values(text *json, void *action_state,
 	sem->array_element_start = transform_string_values_array_element_start;
 	sem->object_field_start = transform_string_values_object_field_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_ereport(lex, sem);
 
 	return cstring_to_text_with_len(state->strval->data, state->strval->len);
 }
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index bbca121bb7..74dc35c41c 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -33,6 +33,28 @@ typedef enum
 	JSON_TOKEN_END
 } JsonTokenType;
 
+typedef enum
+{
+	JSON_SUCCESS,
+	JSON_ESCAPING_INVALID,
+	JSON_ESCAPING_REQUIRED,
+	JSON_EXPECTED_ARRAY_FIRST,
+	JSON_EXPECTED_ARRAY_NEXT,
+	JSON_EXPECTED_COLON,
+	JSON_EXPECTED_END,
+	JSON_EXPECTED_JSON,
+	JSON_EXPECTED_MORE,
+	JSON_EXPECTED_OBJECT_FIRST,
+	JSON_EXPECTED_OBJECT_NEXT,
+	JSON_EXPECTED_STRING,
+	JSON_INVALID_TOKEN,
+	JSON_UNICODE_CODE_POINT_ZERO,
+	JSON_UNICODE_ESCAPE_FORMAT,
+	JSON_UNICODE_HIGH_ESCAPE,
+	JSON_UNICODE_HIGH_SURROGATE,
+	JSON_UNICODE_LOW_SURROGATE
+} JsonParseErrorType;
+
 
 /*
  * All the fields in this structure should be treated as read-only.
@@ -101,7 +123,14 @@ typedef struct JsonSemAction
  * points to. If the action pointers are NULL the parser
  * does nothing and just continues.
  */
-extern void pg_parse_json(JsonLexContext *lex, JsonSemAction *sem);
+extern JsonParseErrorType pg_parse_json(JsonLexContext *lex,
+										JsonSemAction *sem);
+
+/*
+ * Same thing, but signal errors via ereport(ERROR) instead of returning
+ * a result code.
+ */
+extern void pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem);
 
 /* the null action object used for pure validation */
 extern JsonSemAction nullSemAction;
@@ -110,8 +139,13 @@ extern JsonSemAction nullSemAction;
  * json_count_array_elements performs a fast secondary parse to determine the
  * number of elements in passed array lex context. It should be called from an
  * array_start action.
+ *
+ * The return value indicates whether any error occurred, while the number
+ * of elements is stored into *elements (but only if the return value is
+ * JSON_SUCCESS).
  */
-extern int	json_count_array_elements(JsonLexContext *lex);
+extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
+													int *elements);
 
 /*
  * constructors for JsonLexContext, with or without strval element.
@@ -128,7 +162,13 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													bool need_escapes);
 
 /* lex one token */
-extern void json_lex(JsonLexContext *lex);
+extern JsonParseErrorType json_lex(JsonLexContext *lex);
+
+/* report an error during json lexing or parsing */
+extern void json_ereport_error(JsonParseErrorType error, JsonLexContext *lex);
+
+/* construct an error detail string for a json error */
+extern char *json_errdetail(JsonParseErrorType error, JsonLexContext *lex);
 
 /*
  * Utility function to check if a string is a valid JSON number.
-- 
2.17.2 (Apple Git-113)

v3-0003-Remove-jsonapi.c-s-lex_accept.patchapplication/octet-stream; name=v3-0003-Remove-jsonapi.c-s-lex_accept.patchDownload

From c44dbe22ef9d165fc24d07b4dd4e5b4938bdcce5 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 17 Jan 2020 14:06:41 -0500
Subject: [PATCH v3 3/5] Remove jsonapi.c's lex_accept().

At first glance, this function seems useful, but it actually increases
the amount of code required rather than decreasing it. Inline the
logic into the callers instead; most callers don't use the 'lexeme'
argument for anything and as a result considerable simplification is
possible.
---
 src/backend/utils/adt/jsonapi.c | 124 +++++++++++++-------------------
 1 file changed, 51 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index fc8af9f861..9e14306b6f 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -69,44 +69,7 @@ lex_peek(JsonLexContext *lex)
 }
 
 /*
- * lex_accept
- *
- * accept the look_ahead token and move the lexer to the next token if the
- * look_ahead token matches the token parameter. In that case, and if required,
- * also hand back the de-escaped lexeme.
- *
- * returns true if the token matched, false otherwise.
- */
-static inline bool
-lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
-{
-	if (lex->token_type == token)
-	{
-		if (lexeme != NULL)
-		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-				if (lex->strval != NULL)
-					*lexeme = pstrdup(lex->strval->data);
-			}
-			else
-			{
-				int			len = (lex->token_terminator - lex->token_start);
-				char	   *tokstr = palloc(len + 1);
-
-				memcpy(tokstr, lex->token_start, len);
-				tokstr[len] = '\0';
-				*lexeme = tokstr;
-			}
-		}
-		json_lex(lex);
-		return true;
-	}
-	return false;
-}
-
-/*
- * lex_accept
+ * lex_expect
  *
  * move the lexer to the next token if the current look_ahead token matches
  * the parameter token. Otherwise, report an error.
@@ -114,7 +77,9 @@ lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
 static inline void
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
-	if (!lex_accept(lex, token, NULL))
+	if (lex_peek(lex) == token)
+		json_lex(lex);
+	else
 		report_parse_error(ctx, lex);
 }
 
@@ -260,12 +225,14 @@ json_count_array_elements(JsonLexContext *lex)
 	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
 	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
-		do
+		while (1)
 		{
 			count++;
 			parse_array_element(&copylex, &nullSemAction);
+			if (copylex.token_type != JSON_TOKEN_COMMA)
+				break;
+			json_lex(&copylex);
 		}
-		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
 	}
 	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
 
@@ -286,35 +253,41 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
-	char	  **valaddr;
 	JsonTokenType tok = lex_peek(lex);
 
-	valaddr = sfunc == NULL ? NULL : &val;
-
 	/* a scalar must be a string, a number, true, false, or null */
-	switch (tok)
+	if (tok != JSON_TOKEN_STRING && tok != JSON_TOKEN_NUMBER &&
+		tok != JSON_TOKEN_TRUE && tok != JSON_TOKEN_FALSE &&
+		tok != JSON_TOKEN_NULL)
+		report_parse_error(JSON_PARSE_VALUE, lex);
+
+	/* if no semantic function, just consume the token */
+	if (sfunc == NULL)
 	{
-		case JSON_TOKEN_TRUE:
-			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
-			break;
-		case JSON_TOKEN_FALSE:
-			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
-			break;
-		case JSON_TOKEN_NULL:
-			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
-			break;
-		case JSON_TOKEN_NUMBER:
-			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
-			break;
-		case JSON_TOKEN_STRING:
-			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
-			break;
-		default:
-			report_parse_error(JSON_PARSE_VALUE, lex);
+		json_lex(lex);
+		return;
+	}
+
+	/* extract the de-escaped string value, or the raw lexeme */
+	if (lex_peek(lex) == JSON_TOKEN_STRING)
+	{
+		if (lex->strval != NULL)
+			val = pstrdup(lex->strval->data);
+	}
+	else
+	{
+		int			len = (lex->token_terminator - lex->token_start);
+
+		val = palloc(len + 1);
+		memcpy(val, lex->token_start, len);
+		val[len] = '\0';
 	}
 
-	if (sfunc != NULL)
-		(*sfunc) (sem->semstate, val, tok);
+	/* consume the token */
+	json_lex(lex);
+
+	/* invoke the callback */
+	(*sfunc) (sem->semstate, val, tok);
 }
 
 static void
@@ -330,14 +303,13 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	json_ofield_action ostart = sem->object_field_start;
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
-	char	  **fnameaddr = NULL;
 	JsonTokenType tok;
 
-	if (ostart != NULL || oend != NULL)
-		fnameaddr = &fname;
-
-	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
+	if (lex_peek(lex) != JSON_TOKEN_STRING)
 		report_parse_error(JSON_PARSE_STRING, lex);
+	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
+		fname = pstrdup(lex->strval->data);
+	json_lex(lex);
 
 	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
 
@@ -387,16 +359,19 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	/* we know this will succeed, just clearing the token */
-	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
+	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
+	json_lex(lex);
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
 			parse_object_field(lex, sem);
-			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+			while (lex_peek(lex) == JSON_TOKEN_COMMA)
+			{
+				json_lex(lex);
 				parse_object_field(lex, sem);
+			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
@@ -473,8 +448,11 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 
 		parse_array_element(lex, sem);
 
-		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+		while (lex_peek(lex) == JSON_TOKEN_COMMA)
+		{
+			json_lex(lex);
 			parse_array_element(lex, sem);
+		}
 	}
 
 	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-- 
2.17.2 (Apple Git-113)

#28

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Robert Haas (#27)

Re: making the backend's json parser work in frontend code

On 2020-Jan-22, Robert Haas wrote:

Here is a new version that is, I think, much closer what I would
consider a final form. 0001 through 0003 are as before, and unless
somebody says pretty soon that they see a problem with those or want
more time to review them, I'm going to commit them; David Steele has
endorsed all three, and they seem like independently sensible
cleanups.

I'm not sure I see the point of keeping json.h split from jsonapi.h. It
seems to me that you could move back all the contents from jsonapi.h
into json.h, and everything would work just as well. (Evidently the
Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
that prototype in jsonfuncs.h would work.)

I don't really object to your 0001 patch as posted, though.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#29

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Alvaro Herrera (#28)

Re: making the backend's json parser work in frontend code

On Wed, Jan 22, 2020 at 2:26 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I'm not sure I see the point of keeping json.h split from jsonapi.h. It
seems to me that you could move back all the contents from jsonapi.h
into json.h, and everything would work just as well. (Evidently the
Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
that prototype in jsonfuncs.h would work.)

I don't really object to your 0001 patch as posted, though.

The goal is to make it possible to use the JSON parser in the
frontend, and we can't do that if the header files that would have to
be included on the frontend side rely on things that only work in the
backend. As written, the patch series leaves json.h with a dependency
on Datum, so the stuff that it leaves in jsonapi.h (which is intended
to be the header that gets moved to src/common and included by
frontend code) can't be merged with it.

Now, we could obviously rearrange that. I don't think any of the file
naming here is great. But I think we probably want, as far as
possible, for the code in FOO.c to correspond to the prototypes in
FOO.h. What I'm thinking we should work towards is:

json.c/h - support for the 'json' data type
jsonb.c/h - support for the 'jsonb' data type
jsonfuncs.c/h - backend code that doesn't fit in either of the above
jsonapi.c/h - lexing/parsing code that can be used in either the
frontend or the backend

I'm not wedded to that. It just looks like the most natural thing from
where we are now.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Robert Haas (#29)

Re: making the backend's json parser work in frontend code

On 2020-Jan-22, Robert Haas wrote:

On Wed, Jan 22, 2020 at 2:26 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I'm not sure I see the point of keeping json.h split from jsonapi.h. It
seems to me that you could move back all the contents from jsonapi.h
into json.h, and everything would work just as well. (Evidently the
Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
that prototype in jsonfuncs.h would work.)

I don't really object to your 0001 patch as posted, though.

The goal is to make it possible to use the JSON parser in the
frontend, and we can't do that if the header files that would have to
be included on the frontend side rely on things that only work in the
backend. As written, the patch series leaves json.h with a dependency
on Datum, so the stuff that it leaves in jsonapi.h (which is intended
to be the header that gets moved to src/common and included by
frontend code) can't be merged with it.

Right, I agree with that goal, and as I said, I don't object to your
patch as posted.

Now, we could obviously rearrange that. I don't think any of the file
naming here is great. But I think we probably want, as far as
possible, for the code in FOO.c to correspond to the prototypes in
FOO.h. What I'm thinking we should work towards is:

json.c/h - support for the 'json' data type
jsonb.c/h - support for the 'jsonb' data type
jsonfuncs.c/h - backend code that doesn't fit in either of the above
jsonapi.c/h - lexing/parsing code that can be used in either the
frontend or the backend

... it would probably require more work to make this 100% attainable,
but I don't really care all that much.

I'm not wedded to that. It just looks like the most natural thing from
where we are now.

Let's go with it.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#31

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Alvaro Herrera (#30)

11 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 22, 2020, at 12:11 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Jan-22, Robert Haas wrote:

On Wed, Jan 22, 2020 at 2:26 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I'm not sure I see the point of keeping json.h split from jsonapi.h. It
seems to me that you could move back all the contents from jsonapi.h
into json.h, and everything would work just as well. (Evidently the
Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
that prototype in jsonfuncs.h would work.)

I don't really object to your 0001 patch as posted, though.

The goal is to make it possible to use the JSON parser in the
frontend, and we can't do that if the header files that would have to
be included on the frontend side rely on things that only work in the
backend. As written, the patch series leaves json.h with a dependency
on Datum, so the stuff that it leaves in jsonapi.h (which is intended
to be the header that gets moved to src/common and included by
frontend code) can't be merged with it.

Right, I agree with that goal, and as I said, I don't object to your
patch as posted.

Now, we could obviously rearrange that. I don't think any of the file
naming here is great. But I think we probably want, as far as
possible, for the code in FOO.c to correspond to the prototypes in
FOO.h. What I'm thinking we should work towards is:

json.c/h - support for the 'json' data type
jsonb.c/h - support for the 'jsonb' data type
jsonfuncs.c/h - backend code that doesn't fit in either of the above
jsonapi.c/h - lexing/parsing code that can be used in either the
frontend or the backend

... it would probably require more work to make this 100% attainable,
but I don't really care all that much.

I'm not wedded to that. It just looks like the most natural thing from
where we are now.

Let's go with it.

I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.

The main issue remaining is that my repo has a lot of stuff organized differently than Robert’s patches, so I’m trying to turn my code into a simple extension of his work rather than having my implementation compete against his.

For the curious, the code as Robert left it still relies on the DatabaseEncoding though the use of GetDatabaseEncoding, pg_mblen, and similar, and that has been changed in my patches to only rely on the database encoding in the backend, with the code in src/common taking an explicit encoding, which the backend gets in the usual way and the frontend might get with PQenv2encoding() or whatever the frontend programmer finds appropriate. Hopefully, this addresses Robert’s concern upthread about the filesystem name not necessarily being in utf8 format, though I might be misunderstanding the exact thrust of his concern. I can think of other possible interpretations of his concern as he expressed it, so I’ll wait for him to clarify.

For those who want a sneak peak, I’m attaching WIP patches to this email with all my changes, with Robert’s changes partially manually cherry-picked and the rest still unmerged. *THESE ARE NOT MEANT FOR COMMIT. THIS IS FOR ADVISORY PURPOSES ONLY.*. I have some debugging cruft left in here, too, like gcc __attribute__ stuff that won’t be in the patches I submit.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Move-pg_wchar-from-src-include-mb-to-src-include-com.patch.WIPapplication/octet-stream; name=0001-Move-pg_wchar-from-src-include-mb-to-src-include-com.patch.WIP; x-unix-mode=0644Download

From 49f269c27e689d2cd1fd7b83f159e608b5ad6c89 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jan 2020 12:55:01 -0800
Subject: [PATCH 01/11] Move pg_wchar from src/include/mb to
 src/include/common.

Moving pg_wchar header file to src/include/common and
updating all files that #include it to use this new location.
---
 contrib/btree_gist/btree_utils_var.h                          | 2 +-
 contrib/dblink/dblink.c                                       | 2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c                         | 2 +-
 contrib/pg_stat_statements/pg_stat_statements.c               | 2 +-
 contrib/pgcrypto/pgp-pgsql.c                                  | 2 +-
 contrib/postgres_fdw/connection.c                             | 2 +-
 src/backend/access/spgist/spgtextproc.c                       | 2 +-
 src/backend/catalog/genbki.pl                                 | 2 +-
 src/backend/catalog/namespace.c                               | 2 +-
 src/backend/catalog/pg_collation.c                            | 2 +-
 src/backend/catalog/pg_conversion.c                           | 2 +-
 src/backend/catalog/pg_proc.c                                 | 2 +-
 src/backend/commands/collationcmds.c                          | 2 +-
 src/backend/commands/conversioncmds.c                         | 2 +-
 src/backend/commands/copy.c                                   | 2 +-
 src/backend/commands/dbcommands.c                             | 2 +-
 src/backend/commands/extension.c                              | 2 +-
 src/backend/commands/indexcmds.c                              | 2 +-
 src/backend/commands/variable.c                               | 2 +-
 src/backend/executor/execMain.c                               | 2 +-
 src/backend/executor/execPartition.c                          | 2 +-
 src/backend/executor/execUtils.c                              | 2 +-
 src/backend/libpq/pqformat.c                                  | 2 +-
 src/backend/parser/parse_node.c                               | 2 +-
 src/backend/parser/parser.c                                   | 2 +-
 src/backend/parser/scan.l                                     | 2 +-
 src/backend/parser/scansup.c                                  | 2 +-
 src/backend/postmaster/pgstat.c                               | 2 +-
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c   | 2 +-
 src/backend/replication/logical/logicalfuncs.c                | 2 +-
 src/backend/replication/logical/worker.c                      | 2 +-
 src/backend/tcop/fastpath.c                                   | 2 +-
 src/backend/tcop/postgres.c                                   | 2 +-
 src/backend/utils/adt/ascii.c                                 | 2 +-
 src/backend/utils/adt/format_type.c                           | 2 +-
 src/backend/utils/adt/formatting.c                            | 2 +-
 src/backend/utils/adt/genfile.c                               | 2 +-
 src/backend/utils/adt/json.c                                  | 2 +-
 src/backend/utils/adt/jsonfuncs.c                             | 2 +-
 src/backend/utils/adt/jsonpath_scan.l                         | 2 +-
 src/backend/utils/adt/like.c                                  | 2 +-
 src/backend/utils/adt/like_support.c                          | 2 +-
 src/backend/utils/adt/name.c                                  | 2 +-
 src/backend/utils/adt/oracle_compat.c                         | 2 +-
 src/backend/utils/adt/pg_locale.c                             | 2 +-
 src/backend/utils/adt/ruleutils.c                             | 2 +-
 src/backend/utils/adt/tsvector_op.c                           | 2 +-
 src/backend/utils/adt/varchar.c                               | 2 +-
 src/backend/utils/adt/xml.c                                   | 2 +-
 src/backend/utils/error/elog.c                                | 2 +-
 src/backend/utils/init/miscinit.c                             | 2 +-
 src/backend/utils/init/postinit.c                             | 2 +-
 src/backend/utils/mb/conv.c                                   | 2 +-
 src/backend/utils/mb/conversion_procs/README.euc_jp           | 2 +-
 .../mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c   | 2 +-
 .../mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c   | 2 +-
 .../utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c | 2 +-
 .../mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c     | 2 +-
 .../utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c | 2 +-
 src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c  | 2 +-
 .../mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c     | 2 +-
 .../conversion_procs/latin2_and_win1250/latin2_and_win1250.c  | 2 +-
 .../utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c   | 2 +-
 .../utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c   | 2 +-
 .../mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c | 2 +-
 .../mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c   | 2 +-
 .../mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c     | 2 +-
 .../mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c     | 2 +-
 .../mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c     | 2 +-
 .../mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c     | 2 +-
 .../mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c   | 2 +-
 .../utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c     | 2 +-
 .../mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c   | 2 +-
 .../conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c  | 2 +-
 .../utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c | 2 +-
 .../utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c   | 2 +-
 .../mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c | 2 +-
 .../utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c     | 2 +-
 .../utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c     | 2 +-
 src/backend/utils/mb/mbutils.c                                | 2 +-
 src/backend/utils/mb/stringinfo_mb.c                          | 2 +-
 src/backend/utils/mb/wstrcmp.c                                | 2 +-
 src/backend/utils/mb/wstrncmp.c                               | 2 +-
 src/backend/utils/misc/guc-file.l                             | 2 +-
 src/backend/utils/mmgr/mcxt.c                                 | 2 +-
 src/bin/initdb/initdb.c                                       | 2 +-
 src/bin/pg_upgrade/check.c                                    | 2 +-
 src/bin/psql/mainloop.c                                       | 2 +-
 src/common/encnames.c                                         | 4 ++--
 src/common/saslprep.c                                         | 2 +-
 src/common/wchar.c                                            | 4 ++--
 src/include/catalog/pg_conversion.dat                         | 2 +-
 src/include/{mb => common}/pg_wchar.h                         | 2 +-
 src/include/common/unicode_norm.h                             | 2 +-
 src/include/libpq/pqformat.h                                  | 2 +-
 src/include/regex/regcustom.h                                 | 2 +-
 src/include/regex/regex.h                                     | 2 +-
 src/include/tsearch/ts_locale.h                               | 2 +-
 src/interfaces/libpq/fe-connect.c                             | 2 +-
 src/interfaces/libpq/fe-exec.c                                | 2 +-
 src/interfaces/libpq/fe-misc.c                                | 2 +-
 src/interfaces/libpq/fe-protocol3.c                           | 2 +-
 src/pl/plperl/plperl.c                                        | 2 +-
 src/pl/plperl/plperl_helpers.h                                | 2 +-
 src/pl/plpgsql/src/pl_scanner.c                               | 2 +-
 src/pl/plpython/plpy_cursorobject.c                           | 2 +-
 src/pl/plpython/plpy_plpymodule.c                             | 2 +-
 src/pl/plpython/plpy_spi.c                                    | 2 +-
 src/pl/plpython/plpy_typeio.c                                 | 2 +-
 src/pl/plpython/plpy_util.c                                   | 2 +-
 src/pl/tcl/pltcl.c                                            | 2 +-
 src/port/chklocale.c                                          | 2 +-
 112 files changed, 114 insertions(+), 114 deletions(-)
 rename src/include/{mb => common}/pg_wchar.h (99%)

diff --git a/contrib/btree_gist/btree_utils_var.h b/contrib/btree_gist/btree_utils_var.h
index 2f8def655c..4ae273d767 100644
--- a/contrib/btree_gist/btree_utils_var.h
+++ b/contrib/btree_gist/btree_utils_var.h
@@ -6,7 +6,7 @@
 
 #include "access/gist.h"
 #include "btree_gist.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /* Variable length key */
 typedef bytea GBT_VARKEY;
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 1dddf02779..a202772fad 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -44,12 +44,12 @@
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_type.h"
 #include "catalog/pg_user_mapping.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "foreign/foreign.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/scansup.h"
 #include "utils/acl.h"
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index ccbb84b481..aa0a209dab 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -40,7 +40,7 @@
 
 #include <ctype.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/varlena.h"
 
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..53fa917e95 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -62,9 +62,9 @@
 #include <unistd.h>
 
 #include "catalog/pg_authid.h"
+#include "common/pg_wchar.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
diff --git a/contrib/pgcrypto/pgp-pgsql.c b/contrib/pgcrypto/pgp-pgsql.c
index 8be895df80..960fa1c3e0 100644
--- a/contrib/pgcrypto/pgp-pgsql.c
+++ b/contrib/pgcrypto/pgp-pgsql.c
@@ -32,9 +32,9 @@
 #include "postgres.h"
 
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
 #include "mbuf.h"
 #include "pgp.h"
 #include "px.h"
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 29c811a80b..db897e7819 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,7 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postgres_fdw.h"
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index b5ec81937c..d166a6352c 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -41,7 +41,7 @@
 
 #include "access/spgist.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 803251207b..b8c4c22fad 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -327,7 +327,7 @@ foreach my $row (@{ $catalog_data{pg_type} })
 # as for OIDs, but we have to dig the values out of pg_wchar.h.
 my %encids;
 
-my $encfile = $include_path . 'mb/pg_wchar.h';
+my $encfile = $include_path . 'common/pg_wchar.h';
 open(my $ef, '<', $encfile) || die "$encfile: $!";
 
 # We're parsing an enum, so start with 0 and increment
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index e70243a008..f7fd77ed1a 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -40,8 +40,8 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 8559779a4f..a5dd02d2ee 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -24,7 +24,7 @@
 #include "catalog/objectaccess.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_namespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index b38df4f696..1163aa709a 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -25,7 +25,7 @@
 #include "catalog/pg_conversion.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_proc.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 5194dcaac0..d9bb3ebc2a 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -27,9 +27,9 @@
 #include "catalog/pg_transform.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/pg_wchar.h"
 #include "executor/functions.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_type.h"
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 34c75e8b56..59c0abeea5 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -27,7 +27,7 @@
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
diff --git a/src/backend/commands/conversioncmds.c b/src/backend/commands/conversioncmds.c
index f974478b26..87b4cc1090 100644
--- a/src/backend/commands/conversioncmds.c
+++ b/src/backend/commands/conversioncmds.c
@@ -21,7 +21,7 @@
 #include "catalog/pg_type.h"
 #include "commands/alter.h"
 #include "commands/conversioncmds.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
 #include "utils/builtins.h"
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..02cb2d9718 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
@@ -37,7 +38,6 @@
 #include "foreign/fdwapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/optimizer.h"
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 367c30adb0..0a6e6c6c6b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -46,7 +46,7 @@
 #include "commands/defrem.h"
 #include "commands/seclabel.h"
 #include "commands/tablespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 01de398dcb..b17bc69a65 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -50,8 +50,8 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "storage/fd.h"
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 52ce02f898..e27ead40bd 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -39,7 +39,7 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/tablespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 484f7ea2c0..a197cebb84 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -24,7 +24,7 @@
 #include "access/xlog.h"
 #include "catalog/pg_authid.h"
 #include "commands/variable.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b03e02ae6c..5275f439c0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -47,11 +47,11 @@
 #include "catalog/pg_publication.h"
 #include "commands/matview.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/execdebug.h"
 #include "executor/nodeSubplan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
 #include "storage/bufmgr.h"
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c13b1d3501..169ff50407 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -18,10 +18,10 @@
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "partitioning/partbounds.h"
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc5177cc2b..673f2db641 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -50,9 +50,9 @@
 #include "access/table.h"
 #include "access/tableam.h"
 #include "access/transam.h"
+#include "common/pg_wchar.h"
 #include "executor/executor.h"
 #include "jit/jit.h"
-#include "mb/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index a6f990c2d2..82ee1f26ba 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -73,9 +73,9 @@
 
 #include <sys/param.h>
 
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 6e98fe55fc..a075354805 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -17,7 +17,7 @@
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_coerce.h"
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 1bf1144c4f..0922e9436f 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -21,7 +21,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "parser/scansup.h"
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 84c73914a8..6bade98a30 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -35,11 +35,11 @@
 #include <ctype.h>
 #include <unistd.h>
 
+#include "common/pg_wchar.h"
 #include "common/string.h"
 #include "parser/gramparse.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
-#include "mb/pg_wchar.h"
 }
 
 %{
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 18169ec4f4..ff1e6ab8d5 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -17,7 +17,7 @@
 
 #include <ctype.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scansup.h"
 
 /* ----------------
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..7913fa67f0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -41,9 +41,9 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
 #include "common/ip.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e4fd1f9bb6..c8f8dd178c 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -21,9 +21,9 @@
 
 #include "access/xlog.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
 #include "libpq-fe.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "pqexpbuffer.h"
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 7693c98949..9bd38a936e 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -21,9 +21,9 @@
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "replication/decode.h"
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7a5471f95c..a09c8bd092 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -33,12 +33,12 @@
 #include "catalog/pg_subscription_rel.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/optimizer.h"
diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index e793984a9f..ef730acc5b 100644
--- a/src/backend/tcop/fastpath.c
+++ b/src/backend/tcop/fastpath.c
@@ -21,9 +21,9 @@
 #include "access/xact.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_proc.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "port/pg_bswap.h"
 #include "tcop/fastpath.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0a6f80963b..7f03d0acda 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -42,12 +42,12 @@
 #include "catalog/pg_type.h"
 #include "commands/async.h"
 #include "commands/prepare.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "jit/jit.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/print.h"
diff --git a/src/backend/utils/adt/ascii.c b/src/backend/utils/adt/ascii.c
index 3aa8a5e7d2..7ac3536705 100644
--- a/src/backend/utils/adt/ascii.c
+++ b/src/backend/utils/adt/ascii.c
@@ -11,7 +11,7 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/ascii.h"
 #include "utils/builtins.h"
 
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 92ee77ac5c..edbded16d5 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -20,7 +20,7 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/numeric.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ca3c48d024..ff026f410c 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -87,7 +87,7 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 0d75928e7f..3da50444ab 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -25,8 +25,8 @@
 #include "catalog/pg_authid.h"
 #include "catalog/pg_tablespace_d.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 458505abfd..364cadbaab 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -16,11 +16,11 @@
 #include "access/htup_details.h"
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "utils/array.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 38758a626b..14aefd8fe2 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,10 +18,10 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 70681b789d..bde5539aed 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -17,7 +17,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/pg_list.h"
 
 static JsonPathString scanstring;
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 5bf94628c3..1314fce1b4 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -20,7 +20,7 @@
 #include <ctype.h>
 
 #include "catalog/pg_collation.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 286e000d4e..7d20abc9be 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -43,7 +43,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index 6749e75c89..9e81df701c 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,8 +23,8 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index 0d56dc898a..9c4131e6f2 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -16,7 +16,7 @@
 #include "postgres.h"
 
 #include "common/int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 25fb7e2ebf..5900443039 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -57,7 +57,7 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_control.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "utils/hsearch.h"
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 116e00bce4..5ae53968f1 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -43,9 +43,9 @@
 #include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "common/keywords.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index e33ca5abe7..f68d87cb35 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -19,10 +19,10 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
 #include "lib/qunique.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "tsearch/ts_utils.h"
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 1e1239a1ba..69f6d9d8c1 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -17,8 +17,8 @@
 #include "access/detoast.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/array.h"
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 3808c307f6..74c8a268f2 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -73,12 +73,12 @@
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "executor/tablefunc.h"
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
 #include "nodes/nodeFuncs.h"
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5b0211f66..61ad9d8f89 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -68,9 +68,9 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index c4b2946986..15f79a6d41 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -33,8 +33,8 @@
 #include "access/htup_details.h"
 #include "catalog/pg_authid.h"
 #include "common/file_perm.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 8a47dcdcb1..1a9bb9b98d 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -34,9 +34,9 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_db_role_setting.h"
 #include "catalog/pg_tablespace.h"
+#include "common/pg_wchar.h"
 #include "libpq/auth.h"
 #include "libpq/libpq-be.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 54dcf71fb7..4b0fc23285 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -11,7 +11,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
diff --git a/src/backend/utils/mb/conversion_procs/README.euc_jp b/src/backend/utils/mb/conversion_procs/README.euc_jp
index 6e59b7bd7f..97cc63f1bd 100644
--- a/src/backend/utils/mb/conversion_procs/README.euc_jp
+++ b/src/backend/utils/mb/conversion_procs/README.euc_jp
@@ -35,7 +35,7 @@ o C
   ������������������(5��������������������������������NULL������������
   ����������������������)������������������������
 
-  ����������������ID��include/mb/pg_wchar.h��typedef enum pg_enc������
+  ����������������ID��include/common/pg_wchar.h��typedef enum pg_enc������
   ��������������
 
 o ����������������������
diff --git a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
index 376b48ca61..5500dd4fed 100644
--- a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 9ba6bd3040..26051530b7 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -11,8 +11,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
index 59c6c3bb12..d1cb25ea92 100644
--- a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index 4ca8e2126e..ffc3896c9d 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 /*
  * SJIS alternative code.
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 4d7876a666..2183cf183a 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
index 68f76aa8cb..7601bd65f2 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
@@ -13,7 +13,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 typedef struct
 {
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index 82a22b9beb..e209e9a545 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 #define ENCODING_GROWTH_RATE 4
 
diff --git a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
index f424f88145..72ab071d87 100644
--- a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
+++ b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
index a358a707c1..5868b66715 100644
--- a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
index 75ed49ac54..542e055ea2 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/big5_to_utf8.map"
 #include "../../Unicode/utf8_to_big5.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
index 90ad316111..d145da093f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/utf8_to_koi8r.map"
 #include "../../Unicode/koi8r_to_utf8.map"
 #include "../../Unicode/utf8_to_koi8u.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
index 018312489c..8d56d2343c 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
index 62182a9ba8..41f2f9a5ca 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_cn_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_cn.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
index dc5abb5dfd..5a9f8ecfd4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_jp_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jp.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
index 088a38d839..d09f9dc78a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_kr_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_kr.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
index a9fe94f88b..8e0250e603 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_tw_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_tw.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
index 96909b5885..6b5b326c6f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/gb18030_to_utf8.map"
 #include "../../Unicode/utf8_to_gb18030.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
index 78bbcd3ce7..4e2246da7a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/gbk_to_utf8.map"
 #include "../../Unicode/utf8_to_gbk.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
index 348524f4a2..8051eefed9 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/iso8859_10_to_utf8.map"
 #include "../../Unicode/iso8859_13_to_utf8.map"
 #include "../../Unicode/iso8859_14_to_utf8.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
index 2cdca9f780..b9e127cc5e 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
index e09a7c8e41..bece993e1d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/johab_to_utf8.map"
 #include "../../Unicode/utf8_to_johab.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
index c56fa80a4b..8307d4ceb4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/sjis_to_utf8.map"
 #include "../../Unicode/utf8_to_sjis.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
index 458500998d..c913ba0866 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/shift_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_shift_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
index 3226ed0325..87ae61ee85 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/uhc_to_utf8.map"
 #include "../../Unicode/utf8_to_uhc.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
index 1a0074d063..200fd7f23a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/utf8_to_win1250.map"
 #include "../../Unicode/utf8_to_win1251.map"
 #include "../../Unicode/utf8_to_win1252.map"
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 86787bcb31..4cec7ec73a 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -36,7 +36,7 @@
 
 #include "access/xact.h"
 #include "catalog/namespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
index c153b77007..e7462a09b5 100644
--- a/src/backend/utils/mb/stringinfo_mb.c
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -19,8 +19,8 @@
  */
 #include "postgres.h"
 
+#include "common/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
-#include "mb/pg_wchar.h"
 
 
 /*
diff --git a/src/backend/utils/mb/wstrcmp.c b/src/backend/utils/mb/wstrcmp.c
index dad3ae023a..e5f57d717d 100644
--- a/src/backend/utils/mb/wstrcmp.c
+++ b/src/backend/utils/mb/wstrcmp.c
@@ -35,7 +35,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 int
 pg_char_and_wchar_strcmp(const char *s1, const pg_wchar *s2)
diff --git a/src/backend/utils/mb/wstrncmp.c b/src/backend/utils/mb/wstrncmp.c
index ea4823fc6f..cce0c6c5cf 100644
--- a/src/backend/utils/mb/wstrncmp.c
+++ b/src/backend/utils/mb/wstrncmp.c
@@ -34,7 +34,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 int
 pg_wchar_strncmp(const pg_wchar *s1, const pg_wchar *s2, size_t n)
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 268b745528..6c9e60ef64 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -14,7 +14,7 @@
 #include <ctype.h>
 #include <unistd.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9e24fec72d..71701de091 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,7 +21,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7f1534aebb..0f1758d8e4 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -66,12 +66,12 @@
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/pg_wchar.h"
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "fe_utils/string_utils.h"
 #include "getaddrinfo.h"
 #include "getopt_long.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 5f9a102a74..faad97fb8b 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,8 +10,8 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "common/pg_wchar.h"
 #include "fe_utils/string_utils.h"
-#include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
 
 static void check_new_cluster_is_empty(void);
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index bdf803a053..3ea44e0f78 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -10,9 +10,9 @@
 #include "command.h"
 #include "common.h"
 #include "common/logging.h"
+#include "common/pg_wchar.h"
 #include "input.h"
 #include "mainloop.h"
-#include "mb/pg_wchar.h"
 #include "prompt.h"
 #include "settings.h"
 
diff --git a/src/common/encnames.c b/src/common/encnames.c
index 14cf1b39e9..f06221e1d5 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -15,7 +15,7 @@
 #include <ctype.h>
 #include <unistd.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /* ----------
@@ -297,7 +297,7 @@ static const pg_encname pg_encname_tbl[] =
 
 /* ----------
  * These are "official" encoding names.
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * XXX must be sorted by the same order as enum pg_enc (in common/pg_wchar.h)
  * ----------
  */
 #ifndef WIN32
diff --git a/src/common/saslprep.c b/src/common/saslprep.c
index 7739b81807..ac49f32eab 100644
--- a/src/common/saslprep.c
+++ b/src/common/saslprep.c
@@ -25,9 +25,9 @@
 #include "postgres_fe.h"
 #endif
 
+#include "common/pg_wchar.h"
 #include "common/saslprep.h"
 #include "common/unicode_norm.h"
-#include "mb/pg_wchar.h"
 
 /*
  * Limit on how large password's we will try to process.  A password
diff --git a/src/common/wchar.c b/src/common/wchar.c
index efaf1c155b..53006115d2 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -12,7 +12,7 @@
  */
 #include "c.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
@@ -1499,7 +1499,7 @@ pg_utf8_islegal(const unsigned char *source, int length)
 /*
  *-------------------------------------------------------------------
  * encoding info table
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * XXX must be sorted by the same order as enum pg_enc (in common/pg_wchar.h)
  *-------------------------------------------------------------------
  */
 const pg_wchar_tbl pg_wchar_table[] = {
diff --git a/src/include/catalog/pg_conversion.dat b/src/include/catalog/pg_conversion.dat
index d7120f2fb0..9913d2e370 100644
--- a/src/include/catalog/pg_conversion.dat
+++ b/src/include/catalog/pg_conversion.dat
@@ -11,7 +11,7 @@
 #----------------------------------------------------------------------
 
 # Note: conforencoding and contoencoding must match the spelling of
-# the labels used in the enum pg_enc in mb/pg_wchar.h.
+# the labels used in the enum pg_enc in common/pg_wchar.h.
 
 [
 
diff --git a/src/include/mb/pg_wchar.h b/src/include/common/pg_wchar.h
similarity index 99%
rename from src/include/mb/pg_wchar.h
rename to src/include/common/pg_wchar.h
index b8892ef730..a1b3a27b9f 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/common/pg_wchar.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/mb/pg_wchar.h
+ * src/include/common/pg_wchar.h
  *
  *	NOTES
  *		This is used both by the backend and by frontends, but should not be
diff --git a/src/include/common/unicode_norm.h b/src/include/common/unicode_norm.h
index f1b7ef1aa4..e09c1162eb 100644
--- a/src/include/common/unicode_norm.h
+++ b/src/include/common/unicode_norm.h
@@ -14,7 +14,7 @@
 #ifndef UNICODE_NORM_H
 #define UNICODE_NORM_H
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 extern pg_wchar *unicode_normalize_kc(const pg_wchar *input);
 
diff --git a/src/include/libpq/pqformat.h b/src/include/libpq/pqformat.h
index af31e9caba..daeba6d5c9 100644
--- a/src/include/libpq/pqformat.h
+++ b/src/include/libpq/pqformat.h
@@ -13,8 +13,8 @@
 #ifndef PQFORMAT_H
 #define PQFORMAT_H
 
+#include "common/pg_wchar.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 extern void pq_beginmessage(StringInfo buf, char msgtype);
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 82c9e2fad8..2506ac8268 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -52,7 +52,7 @@
 #include <wctype.h>
 #endif
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 #include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
 
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index dc31899aa4..739cc7d14b 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -35,7 +35,7 @@
 /*
  * Add your own defines, if needed, here.
  */
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /*
  * interface types etc.
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index 17536babfe..f77eb23c8a 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -15,7 +15,7 @@
 #include <ctype.h>
 #include <limits.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/pg_locale.h"
 
 /*
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 80b54bc92b..f4db884735 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -23,12 +23,12 @@
 
 #include "common/ip.h"
 #include "common/link-canary.h"
+#include "common/pg_wchar.h"
 #include "common/scram-common.h"
 #include "common/string.h"
 #include "fe-auth.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index eea0237c3a..454c272807 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -24,9 +24,9 @@
 #include <unistd.h>
 #endif
 
+#include "common/pg_wchar.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 
 /* keep this in same order as ExecStatusType in libpq-fe.h */
 char	   *const pgresStatus[] = {
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index a9074d2f29..261d64e7cd 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -47,9 +47,9 @@
 #include <sys/select.h>
 #endif
 
+#include "common/pg_wchar.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 850bf84c96..799a1538da 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -26,9 +26,9 @@
 #endif
 #endif
 
+#include "common/pg_wchar.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 /*
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index c78891868a..ac7144e99f 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -21,9 +21,9 @@
 #include "catalog/pg_type.h"
 #include "commands/event_trigger.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_type.h"
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1e318b6dc8..cec942c280 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -1,7 +1,7 @@
 #ifndef PL_PERL_HELPERS_H
 #define PL_PERL_HELPERS_H
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 #include "plperl.h"
 
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index 9cea2e42ac..a5bb7c474c 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -15,7 +15,7 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scanner.h"
 
 #include "plpgsql.h"
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 4c37ff898c..01ad7cf3bc 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -10,7 +10,7 @@
 
 #include "access/xact.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_cursorobject.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index e308c61d50..b079afc000 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -7,7 +7,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_cursorobject.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 99c1b4f28f..8b7cfbf18c 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -11,8 +11,8 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
-#include "mb/pg_wchar.h"
 #include "parser/parse_type.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index e734b0d130..baf45526c6 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -8,8 +8,8 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
index 4a7d7264d7..873afcfbb8 100644
--- a/src/pl/plpython/plpy_util.c
+++ b/src/pl/plpython/plpy_util.c
@@ -6,7 +6,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_elog.h"
 #include "plpy_util.h"
 #include "plpython.h"
diff --git a/src/pl/tcl/pltcl.c b/src/pl/tcl/pltcl.c
index e7640008fd..47a328ba29 100644
--- a/src/pl/tcl/pltcl.c
+++ b/src/pl/tcl/pltcl.c
@@ -20,10 +20,10 @@
 #include "catalog/pg_type.h"
 #include "commands/event_trigger.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "fmgr.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f0b3..4ebc11bab8 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -23,7 +23,7 @@
 #include <langinfo.h>
 #endif
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
-- 
2.21.1 (Apple Git-122.3)

0002-Moving-backend-only-functions-from-pg_wchar.h-into-m.patch.WIPapplication/octet-stream; name=0002-Moving-backend-only-functions-from-pg_wchar.h-into-m.patch.WIP; x-unix-mode=0644Download

From a2235873f032360059d900fc5c616ab1dd1b1511 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jan 2020 15:48:06 -0800
Subject: [PATCH 02/11] Moving backend-only functions from pg_wchar.h into
 mbutils.h

The functions declared in pg_wchar.h which are not suitable
for frontend code now are moved into a new file,
src/include/utils/mbutils.h.
---
 contrib/btree_gist/btree_utils_var.h          |   1 +
 contrib/dblink/dblink.c                       |   1 +
 .../pg_stat_statements/pg_stat_statements.c   |   1 +
 contrib/pgcrypto/pgp-pgsql.c                  |   1 +
 contrib/postgres_fdw/connection.c             |   1 +
 src/backend/access/spgist/spgtextproc.c       |   1 +
 src/backend/catalog/namespace.c               |   1 +
 src/backend/catalog/pg_collation.c            |   1 +
 src/backend/catalog/pg_proc.c                 |   1 +
 src/backend/commands/collationcmds.c          |   1 +
 src/backend/commands/copy.c                   |   1 +
 src/backend/commands/extension.c              |   1 +
 src/backend/commands/indexcmds.c              |   1 +
 src/backend/commands/variable.c               |   1 +
 src/backend/executor/execMain.c               |   1 +
 src/backend/executor/execPartition.c          |   1 +
 src/backend/executor/execUtils.c              |   1 +
 src/backend/parser/parse_node.c               |   1 +
 src/backend/parser/parser.c                   |   1 +
 src/backend/parser/scan.l                     |   1 +
 src/backend/parser/scansup.c                  |   1 +
 src/backend/postmaster/pgstat.c               |   1 +
 .../libpqwalreceiver/libpqwalreceiver.c       |   1 +
 .../replication/logical/logicalfuncs.c        |   1 +
 src/backend/utils/adt/ascii.c                 |   1 +
 src/backend/utils/adt/format_type.c           |   1 +
 src/backend/utils/adt/formatting.c            |   1 +
 src/backend/utils/adt/genfile.c               |   1 +
 src/backend/utils/adt/json.c                  |   1 +
 src/backend/utils/adt/jsonpath_scan.l         |   1 +
 src/backend/utils/adt/like.c                  |   1 +
 src/backend/utils/adt/like_support.c          |   1 +
 src/backend/utils/adt/name.c                  |   1 +
 src/backend/utils/adt/oracle_compat.c         |   1 +
 src/backend/utils/adt/pg_locale.c             |   1 +
 src/backend/utils/adt/ruleutils.c             |   1 +
 src/backend/utils/adt/tsvector_op.c           |   1 +
 src/backend/utils/adt/varchar.c               |   1 +
 src/backend/utils/adt/xml.c                   |   1 +
 src/backend/utils/error/elog.c                |   1 +
 src/backend/utils/init/postinit.c             |   1 +
 src/backend/utils/mb/conv.c                   |   1 +
 .../cyrillic_and_mic/cyrillic_and_mic.c       |   1 +
 .../euc2004_sjis2004/euc2004_sjis2004.c       |   1 +
 .../euc_cn_and_mic/euc_cn_and_mic.c           |   1 +
 .../euc_jp_and_sjis/euc_jp_and_sjis.c         |   1 +
 .../euc_kr_and_mic/euc_kr_and_mic.c           |   1 +
 .../conversion_procs/euc_tw_and_big5/big5.c   |   1 +
 .../euc_tw_and_big5/euc_tw_and_big5.c         |   1 +
 .../latin2_and_win1250/latin2_and_win1250.c   |   1 +
 .../latin_and_mic/latin_and_mic.c             |   1 +
 .../utf8_and_big5/utf8_and_big5.c             |   1 +
 .../utf8_and_cyrillic/utf8_and_cyrillic.c     |   1 +
 .../utf8_and_euc2004/utf8_and_euc2004.c       |   1 +
 .../utf8_and_euc_cn/utf8_and_euc_cn.c         |   1 +
 .../utf8_and_euc_jp/utf8_and_euc_jp.c         |   1 +
 .../utf8_and_euc_kr/utf8_and_euc_kr.c         |   1 +
 .../utf8_and_euc_tw/utf8_and_euc_tw.c         |   1 +
 .../utf8_and_gb18030/utf8_and_gb18030.c       |   1 +
 .../utf8_and_gbk/utf8_and_gbk.c               |   1 +
 .../utf8_and_iso8859/utf8_and_iso8859.c       |   1 +
 .../utf8_and_iso8859_1/utf8_and_iso8859_1.c   |   1 +
 .../utf8_and_johab/utf8_and_johab.c           |   1 +
 .../utf8_and_sjis/utf8_and_sjis.c             |   1 +
 .../utf8_and_sjis2004/utf8_and_sjis2004.c     |   1 +
 .../utf8_and_uhc/utf8_and_uhc.c               |   1 +
 .../utf8_and_win/utf8_and_win.c               |   1 +
 src/backend/utils/mb/mbutils.c                |   1 +
 src/backend/utils/mb/stringinfo_mb.c          |   1 +
 src/backend/utils/misc/guc-file.l             |   1 +
 src/backend/utils/mmgr/mcxt.c                 |   1 +
 src/common/wchar.c                            |   1 +
 src/include/common/pg_wchar.h                 | 109 ---------------
 src/include/libpq/pqformat.h                  |   1 +
 src/include/regex/regcustom.h                 |   2 +
 src/include/regex/regex.h                     |   1 +
 src/include/tsearch/ts_locale.h               |   1 +
 src/include/utils/mbutils.h                   | 124 ++++++++++++++++++
 src/interfaces/libpq/fe-protocol3.c           |   1 +
 src/pl/plperl/plperl.c                        |   1 +
 src/pl/plperl/plperl_helpers.h                |   2 +-
 src/pl/plpgsql/src/pl_scanner.c               |   1 +
 src/pl/plpython/plpy_cursorobject.c           |   1 +
 src/pl/plpython/plpy_plpymodule.c             |   1 +
 src/pl/plpython/plpy_spi.c                    |   1 +
 src/pl/plpython/plpy_typeio.c                 |   1 +
 src/pl/plpython/plpy_util.c                   |   1 +
 87 files changed, 210 insertions(+), 110 deletions(-)
 create mode 100644 src/include/utils/mbutils.h

diff --git a/contrib/btree_gist/btree_utils_var.h b/contrib/btree_gist/btree_utils_var.h
index 4ae273d767..a63e6ace2f 100644
--- a/contrib/btree_gist/btree_utils_var.h
+++ b/contrib/btree_gist/btree_utils_var.h
@@ -7,6 +7,7 @@
 #include "access/gist.h"
 #include "btree_gist.h"
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 /* Variable length key */
 typedef bytea GBT_VARKEY;
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a202772fad..1eda3c63de 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -57,6 +57,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/varlena.h"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 53fa917e95..37f1516922 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -78,6 +78,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
diff --git a/contrib/pgcrypto/pgp-pgsql.c b/contrib/pgcrypto/pgp-pgsql.c
index 960fa1c3e0..3ac659e4f0 100644
--- a/contrib/pgcrypto/pgp-pgsql.c
+++ b/contrib/pgcrypto/pgp-pgsql.c
@@ -40,6 +40,7 @@
 #include "px.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 /*
  * public functions
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index db897e7819..6c0b13a80c 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -23,6 +23,7 @@
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index d166a6352c..f56424ef5c 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -44,6 +44,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index f7fd77ed1a..8d08a60e56 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -54,6 +54,7 @@
 #include "utils/guc.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index a5dd02d2ee..2f07f45fdc 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -27,6 +27,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index d9bb3ebc2a..b7def351fa 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -38,6 +38,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 59c0abeea5..2869cb68a4 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -31,6 +31,7 @@
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 02cb2d9718..58d177829f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -51,6 +51,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/portal.h"
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b17bc69a65..cf82cd49d8 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -60,6 +60,7 @@
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index e27ead40bd..365d610e39 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -59,6 +59,7 @@
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/pg_rusage.h"
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a197cebb84..e27957bc5b 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -28,6 +28,7 @@
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5275f439c0..0ee1163d21 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -59,6 +59,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/rls.h"
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 169ff50407..e67fb8698a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -29,6 +29,7 @@
 #include "partitioning/partprune.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/partcache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 673f2db641..eddb6c7560 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -58,6 +58,7 @@
 #include "partitioning/partdesc.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/typcache.h"
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index a075354805..c5993d01c5 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -27,6 +27,7 @@
 #include "utils/builtins.h"
 #include "utils/int8.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
 
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 0922e9436f..6d27d1c5b7 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -25,6 +25,7 @@
 #include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 
 static bool check_uescapechar(unsigned char escape);
 static char *str_udeescape(const char *str, char escape,
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6bade98a30..d25ed1ddc4 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -40,6 +40,7 @@
 #include "parser/gramparse.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 }
 
 %{
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index ff1e6ab8d5..6fc2b59fa3 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -19,6 +19,7 @@
 
 #include "common/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 
 /* ----------------
  *		scanstr
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7913fa67f0..041bf18365 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -63,6 +63,7 @@
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/rel.h"
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index c8f8dd178c..535c551629 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -29,6 +29,7 @@
 #include "pqexpbuffer.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/tuplestore.h"
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 9bd38a936e..0a89a353bc 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -35,6 +35,7 @@
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/regproc.h"
diff --git a/src/backend/utils/adt/ascii.c b/src/backend/utils/adt/ascii.c
index 7ac3536705..b692aed409 100644
--- a/src/backend/utils/adt/ascii.c
+++ b/src/backend/utils/adt/ascii.c
@@ -14,6 +14,7 @@
 #include "common/pg_wchar.h"
 #include "utils/ascii.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 static void pg_to_ascii(unsigned char *src, unsigned char *src_end,
 						unsigned char *dest, int enc);
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index edbded16d5..1e8f1a2189 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -23,6 +23,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ff026f410c..cadee997a1 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -94,6 +94,7 @@
 #include "utils/float.h"
 #include "utils/formatting.h"
 #include "utils/int8.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 3da50444ab..338cf7437b 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -31,6 +31,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 364cadbaab..b3f7bf1d32 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -30,6 +30,7 @@
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index bde5539aed..d522b18097 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -19,6 +19,7 @@
 
 #include "common/pg_wchar.h"
 #include "nodes/pg_list.h"
+#include "utils/mbutils.h"
 
 static JsonPathString scanstring;
 
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 1314fce1b4..ef80cdbf15 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -23,6 +23,7 @@
 #include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 
 
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 7d20abc9be..837b7dc153 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -50,6 +50,7 @@
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/selfuncs.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index 9e81df701c..07811bdf2b 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -29,6 +29,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/varlena.h"
 
 
diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index 9c4131e6f2..7ff845e735 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -19,6 +19,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/mbutils.h"
 
 static text *dotrim(const char *string, int stringlen,
 					const char *set, int setlen,
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5900443039..c7174e33ad 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -62,6 +62,7 @@
 #include "utils/formatting.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 5ae53968f1..71beb499a4 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -66,6 +66,7 @@
 #include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
 #include "utils/ruleutils.h"
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index f68d87cb35..14809b4f82 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -29,6 +29,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 69f6d9d8c1..7b8186243b 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -25,6 +25,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 74c8a268f2..3545f1f6fa 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -87,6 +87,7 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 61ad9d8f89..c9aa4f207d 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -78,6 +78,7 @@
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1a9bb9b98d..44e0be3817 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -56,6 +56,7 @@
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/portal.h"
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 4b0fc23285..3b68dd4821 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
index 5500dd4fed..35204ae530 100644
--- a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 26051530b7..5b9ca9d3c0 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -13,6 +13,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
index d1cb25ea92..4703c4ef0a 100644
--- a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index ffc3896c9d..929b5d78b0 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 /*
  * SJIS alternative code.
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 2183cf183a..bb42246c15 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
index 7601bd65f2..33079708bf 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
@@ -14,6 +14,7 @@
 #include "postgres_fe.h"
 
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 typedef struct
 {
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index e209e9a545..ef887961f2 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 #define ENCODING_GROWTH_RATE 4
 
diff --git a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
index 72ab071d87..5e38c186e0 100644
--- a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
+++ b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
index 5868b66715..20898d9040 100644
--- a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
index 542e055ea2..13b9e2e881 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/big5_to_utf8.map"
 #include "../../Unicode/utf8_to_big5.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
index d145da093f..d738275c9d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/utf8_to_koi8r.map"
 #include "../../Unicode/koi8r_to_utf8.map"
 #include "../../Unicode/utf8_to_koi8u.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
index 8d56d2343c..20b2f9ab1f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
index 41f2f9a5ca..3e98899ec5 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_cn_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_cn.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
index 5a9f8ecfd4..84c6f5ed69 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_jp_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jp.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
index d09f9dc78a..4462554199 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_kr_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_kr.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
index 8e0250e603..b3b7ff87e4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_tw_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_tw.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
index 6b5b326c6f..49062c5b0d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/gb18030_to_utf8.map"
 #include "../../Unicode/utf8_to_gb18030.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
index 4e2246da7a..8e60cc1fa6 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/gbk_to_utf8.map"
 #include "../../Unicode/utf8_to_gbk.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
index 8051eefed9..673e079060 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/iso8859_10_to_utf8.map"
 #include "../../Unicode/iso8859_13_to_utf8.map"
 #include "../../Unicode/iso8859_14_to_utf8.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
index b9e127cc5e..62d5d5feee 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
index bece993e1d..ef2879150b 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/johab_to_utf8.map"
 #include "../../Unicode/utf8_to_johab.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
index 8307d4ceb4..428bdc5c27 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/sjis_to_utf8.map"
 #include "../../Unicode/utf8_to_sjis.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
index c913ba0866..ed97196d37 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/shift_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_shift_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
index 87ae61ee85..d52107ed7f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/uhc_to_utf8.map"
 #include "../../Unicode/utf8_to_uhc.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
index 200fd7f23a..a29c45e746 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/utf8_to_win1250.map"
 #include "../../Unicode/utf8_to_win1251.map"
 #include "../../Unicode/utf8_to_win1252.map"
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 4cec7ec73a..bd3c5b9442 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -38,6 +38,7 @@
 #include "catalog/namespace.h"
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
index e7462a09b5..26cf8c715f 100644
--- a/src/backend/utils/mb/stringinfo_mb.c
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -21,6 +21,7 @@
 
 #include "common/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 6c9e60ef64..0738360aa7 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -18,6 +18,7 @@
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 71701de091..b47439c52c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,7 @@
 
 #include "common/pg_wchar.h"
 #include "miscadmin.h"
+#include "utils/mbutils.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 53006115d2..c9c07a14ba 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -13,6 +13,7 @@
 #include "c.h"
 
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/include/common/pg_wchar.h b/src/include/common/pg_wchar.h
index a1b3a27b9f..7b98453b7b 100644
--- a/src/include/common/pg_wchar.h
+++ b/src/include/common/pg_wchar.h
@@ -488,20 +488,6 @@ typedef struct
  */
 typedef uint32 (*utf_local_conversion_func) (uint32 code);
 
-/*
- * Support macro for encoding conversion functions to validate their
- * arguments.  (This could be made more compact if we included fmgr.h
- * here, but we don't want to do that because this header file is also
- * used by frontends.)
- */
-#define CHECK_ENCODING_CONVERSION_ARGS(srcencoding,destencoding) \
-	check_encoding_conversion_args(PG_GETARG_INT32(0), \
-								   PG_GETARG_INT32(1), \
-								   PG_GETARG_INT32(4), \
-								   (srcencoding), \
-								   (destencoding))
-
-
 /*
  * Some handy functions for Unicode-specific tests.
  */
@@ -552,104 +538,9 @@ extern bool pg_utf8_islegal(const unsigned char *source, int length);
 extern int	pg_utf_mblen(const unsigned char *s);
 extern int	pg_mule_mblen(const unsigned char *s);
 
-/*
- * The remaining functions are backend-only.
- */
-extern int	pg_mb2wchar(const char *from, pg_wchar *to);
-extern int	pg_mb2wchar_with_len(const char *from, pg_wchar *to, int len);
-extern int	pg_encoding_mb2wchar_with_len(int encoding,
-										  const char *from, pg_wchar *to, int len);
-extern int	pg_wchar2mb(const pg_wchar *from, char *to);
-extern int	pg_wchar2mb_with_len(const pg_wchar *from, char *to, int len);
-extern int	pg_encoding_wchar2mb_with_len(int encoding,
-										  const pg_wchar *from, char *to, int len);
 extern int	pg_char_and_wchar_strcmp(const char *s1, const pg_wchar *s2);
 extern int	pg_wchar_strncmp(const pg_wchar *s1, const pg_wchar *s2, size_t n);
 extern int	pg_char_and_wchar_strncmp(const char *s1, const pg_wchar *s2, size_t n);
 extern size_t pg_wchar_strlen(const pg_wchar *wstr);
-extern int	pg_mblen(const char *mbstr);
-extern int	pg_dsplen(const char *mbstr);
-extern int	pg_mbstrlen(const char *mbstr);
-extern int	pg_mbstrlen_with_len(const char *mbstr, int len);
-extern int	pg_mbcliplen(const char *mbstr, int len, int limit);
-extern int	pg_encoding_mbcliplen(int encoding, const char *mbstr,
-								  int len, int limit);
-extern int	pg_mbcharcliplen(const char *mbstr, int len, int limit);
-extern int	pg_database_encoding_max_length(void);
-extern mbcharacter_incrementer pg_database_encoding_character_incrementer(void);
-
-extern int	PrepareClientEncoding(int encoding);
-extern int	SetClientEncoding(int encoding);
-extern void InitializeClientEncoding(void);
-extern int	pg_get_client_encoding(void);
-extern const char *pg_get_client_encoding_name(void);
-
-extern void SetDatabaseEncoding(int encoding);
-extern int	GetDatabaseEncoding(void);
-extern const char *GetDatabaseEncodingName(void);
-extern void SetMessageEncoding(int encoding);
-extern int	GetMessageEncoding(void);
-
-#ifdef ENABLE_NLS
-extern int	pg_bind_textdomain_codeset(const char *domainname);
-#endif
-
-extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
-												int src_encoding,
-												int dest_encoding);
-
-extern char *pg_client_to_server(const char *s, int len);
-extern char *pg_server_to_client(const char *s, int len);
-extern char *pg_any_to_server(const char *s, int len, int encoding);
-extern char *pg_server_to_any(const char *s, int len, int encoding);
-
-extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
-extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
-
-extern void UtfToLocal(const unsigned char *utf, int len,
-					   unsigned char *iso,
-					   const pg_mb_radix_tree *map,
-					   const pg_utf_to_local_combined *cmap, int cmapsize,
-					   utf_local_conversion_func conv_func,
-					   int encoding);
-extern void LocalToUtf(const unsigned char *iso, int len,
-					   unsigned char *utf,
-					   const pg_mb_radix_tree *map,
-					   const pg_local_to_utf_combined *cmap, int cmapsize,
-					   utf_local_conversion_func conv_func,
-					   int encoding);
-
-extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
-extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
-							bool noError);
-extern int	pg_verify_mbstr_len(int encoding, const char *mbstr, int len,
-								bool noError);
-
-extern void check_encoding_conversion_args(int src_encoding,
-										   int dest_encoding,
-										   int len,
-										   int expected_src_encoding,
-										   int expected_dest_encoding);
-
-extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg_attribute_noreturn();
-extern void report_untranslatable_char(int src_encoding, int dest_encoding,
-									   const char *mbstr, int len) pg_attribute_noreturn();
-
-extern void local2local(const unsigned char *l, unsigned char *p, int len,
-						int src_encoding, int dest_encoding, const unsigned char *tab);
-extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
-					  int lc, int encoding);
-extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
-					  int lc, int encoding);
-extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
-								 int len, int lc, int encoding,
-								 const unsigned char *tab);
-extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
-								 int len, int lc, int encoding,
-								 const unsigned char *tab);
-
-#ifdef WIN32
-extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
-#endif
 
 #endif							/* PG_WCHAR_H */
diff --git a/src/include/libpq/pqformat.h b/src/include/libpq/pqformat.h
index daeba6d5c9..9672e654bf 100644
--- a/src/include/libpq/pqformat.h
+++ b/src/include/libpq/pqformat.h
@@ -16,6 +16,7 @@
 #include "common/pg_wchar.h"
 #include "lib/stringinfo.h"
 #include "port/pg_bswap.h"
+#include "utils/mbutils.h"
 
 extern void pq_beginmessage(StringInfo buf, char msgtype);
 extern void pq_beginmessage_reuse(StringInfo buf, char msgtype);
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 2506ac8268..ce0700c7b1 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -56,6 +56,8 @@
 
 #include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
 
+#include "utils/mbutils.h"
+
 
 /* overrides for regguts.h definitions, if any */
 #define FUNCPTR(name, args) (*name) args
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 739cc7d14b..3c53687c1b 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -36,6 +36,7 @@
  * Add your own defines, if needed, here.
  */
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 /*
  * interface types etc.
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index f77eb23c8a..00dcef3f80 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -16,6 +16,7 @@
 #include <limits.h>
 
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 
 /*
diff --git a/src/include/utils/mbutils.h b/src/include/utils/mbutils.h
new file mode 100644
index 0000000000..b8da6ce0a9
--- /dev/null
+++ b/src/include/utils/mbutils.h
@@ -0,0 +1,124 @@
+/*-------------------------------------------------------------------------
+ *
+ * mbutils.h
+ *	  backend-only multibyte-character support
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mbutils.h
+ *
+ *	NOTES
+ *		TODO: write some notes
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MBUTILS_H
+#define MBUTILS_H
+
+extern int	pg_mb2wchar(const char *from, pg_wchar *to);
+extern int	pg_mb2wchar_with_len(const char *from, pg_wchar *to, int len);
+extern int	pg_encoding_mb2wchar_with_len(int encoding,
+										  const char *from, pg_wchar *to, int len);
+extern int	pg_wchar2mb(const pg_wchar *from, char *to);
+extern int	pg_wchar2mb_with_len(const pg_wchar *from, char *to, int len);
+extern int	pg_encoding_wchar2mb_with_len(int encoding,
+										  const pg_wchar *from, char *to, int len);
+extern int	pg_mblen(const char *mbstr);
+extern int	pg_dsplen(const char *mbstr);
+extern int	pg_mbstrlen(const char *mbstr);
+extern int	pg_mbstrlen_with_len(const char *mbstr, int len);
+extern int	pg_mbcliplen(const char *mbstr, int len, int limit);
+extern int	pg_encoding_mbcliplen(int encoding, const char *mbstr,
+								  int len, int limit);
+extern int	pg_mbcharcliplen(const char *mbstr, int len, int limit);
+extern int	pg_database_encoding_max_length(void);
+extern mbcharacter_incrementer pg_database_encoding_character_incrementer(void);
+
+extern int	PrepareClientEncoding(int encoding);
+extern int	SetClientEncoding(int encoding);
+extern void InitializeClientEncoding(void);
+extern int	pg_get_client_encoding(void);
+extern const char *pg_get_client_encoding_name(void);
+
+extern void SetDatabaseEncoding(int encoding);
+extern int	GetDatabaseEncoding(void);
+extern const char *GetDatabaseEncodingName(void);
+extern void SetMessageEncoding(int encoding);
+extern int	GetMessageEncoding(void);
+
+#ifdef ENABLE_NLS
+extern int	pg_bind_textdomain_codeset(const char *domainname);
+#endif
+
+extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
+												int src_encoding,
+												int dest_encoding);
+
+extern char *pg_client_to_server(const char *s, int len);
+extern char *pg_server_to_client(const char *s, int len);
+extern char *pg_any_to_server(const char *s, int len, int encoding);
+extern char *pg_server_to_any(const char *s, int len, int encoding);
+
+extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
+extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
+
+extern void UtfToLocal(const unsigned char *utf, int len,
+					   unsigned char *iso,
+					   const pg_mb_radix_tree *map,
+					   const pg_utf_to_local_combined *cmap, int cmapsize,
+					   utf_local_conversion_func conv_func,
+					   int encoding);
+extern void LocalToUtf(const unsigned char *iso, int len,
+					   unsigned char *utf,
+					   const pg_mb_radix_tree *map,
+					   const pg_local_to_utf_combined *cmap, int cmapsize,
+					   utf_local_conversion_func conv_func,
+					   int encoding);
+
+extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
+extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
+							bool noError);
+extern int	pg_verify_mbstr_len(int encoding, const char *mbstr, int len,
+								bool noError);
+
+extern void check_encoding_conversion_args(int src_encoding,
+										   int dest_encoding,
+										   int len,
+										   int expected_src_encoding,
+										   int expected_dest_encoding);
+
+extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg_attribute_noreturn();
+extern void report_untranslatable_char(int src_encoding, int dest_encoding,
+									   const char *mbstr, int len) pg_attribute_noreturn();
+
+extern void local2local(const unsigned char *l, unsigned char *p, int len,
+						int src_encoding, int dest_encoding, const unsigned char *tab);
+extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
+					  int lc, int encoding);
+extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
+					  int lc, int encoding);
+extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
+								 int len, int lc, int encoding,
+								 const unsigned char *tab);
+extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
+								 int len, int lc, int encoding,
+								 const unsigned char *tab);
+
+#ifdef WIN32
+extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
+#endif
+
+/*
+ * Support macro for encoding conversion functions to validate their
+ * arguments.
+ */
+#define CHECK_ENCODING_CONVERSION_ARGS(srcencoding,destencoding) \
+	check_encoding_conversion_args(PG_GETARG_INT32(0), \
+								   PG_GETARG_INT32(1), \
+								   PG_GETARG_INT32(4), \
+								   (srcencoding), \
+								   (destencoding))
+
+
+#endif							/* MBUTILS_H */
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 799a1538da..9ca04d9781 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -30,6 +30,7 @@
 #include "libpq-fe.h"
 #include "libpq-int.h"
 #include "port/pg_bswap.h"
+#include "utils/mbutils.h"
 
 /*
  * This macro lists the backend message types that could be "long" (more
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index ac7144e99f..4293f58360 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -34,6 +34,7 @@
 #include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index cec942c280..2359930805 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -2,8 +2,8 @@
 #define PL_PERL_HELPERS_H
 
 #include "common/pg_wchar.h"
-
 #include "plperl.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index a5bb7c474c..0b758db69c 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -20,6 +20,7 @@
 
 #include "plpgsql.h"
 #include "pl_gram.h"			/* must be after parser/scanner.h */
+#include "utils/mbutils.h"
 
 
 /* Klugy flag to tell scanner how to look up identifiers */
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 01ad7cf3bc..5dd1079002 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -19,6 +19,7 @@
 #include "plpy_resultobject.h"
 #include "plpy_spi.h"
 #include "plpython.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 static PyObject *PLy_cursor_query(const char *query);
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index b079afc000..d691def5bb 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -18,6 +18,7 @@
 #include "plpy_subxactobject.h"
 #include "plpython.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/snapmgr.h"
 
 HTAB	   *PLy_spi_exceptions = NULL;
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 8b7cfbf18c..a6cf592974 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -22,6 +22,7 @@
 #include "plpy_resultobject.h"
 #include "plpy_spi.h"
 #include "plpython.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index baf45526c6..e4fd090228 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -19,6 +19,7 @@
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 /* conversion from Datums to Python objects */
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
index 873afcfbb8..b9ad3f7000 100644
--- a/src/pl/plpython/plpy_util.c
+++ b/src/pl/plpython/plpy_util.c
@@ -10,6 +10,7 @@
 #include "plpy_elog.h"
 #include "plpy_util.h"
 #include "plpython.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

0003-Moving-pg_wchar.h-declared-functions-from-backend-to.patch.WIPapplication/octet-stream; name=0003-Moving-pg_wchar.h-declared-functions-from-backend-to.patch.WIP; x-unix-mode=0644Download

From 10af2aa37ad5f529a06cf700b66e3fcd353b9b69 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jan 2020 17:40:40 -0800
Subject: [PATCH 03/11] Moving pg_wchar.h declared functions from backend to
 src/common

The functions defined in wstrcmp.c and wstrncmp.c are declared
in src/include/common/pg_wchar.h, so moving those two files
into src/common/ where they belong.
---
 src/backend/utils/mb/Makefile               | 4 +---
 src/common/Makefile                         | 4 +++-
 src/{backend/utils/mb => common}/wstrcmp.c  | 2 +-
 src/{backend/utils/mb => common}/wstrncmp.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)
 rename src/{backend/utils/mb => common}/wstrcmp.c (98%)
 rename src/{backend/utils/mb => common}/wstrncmp.c (98%)

diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index b19a125fa2..3e6f19cd44 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -15,9 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	conv.o \
 	mbutils.o \
-	stringinfo_mb.o \
-	wstrcmp.o \
-	wstrncmp.o
+	stringinfo_mb.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/common/Makefile b/src/common/Makefile
index 44ca68fa6c..3882cd7a3d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -72,7 +72,9 @@ OBJS_COMMON = \
 	unicode_norm.o \
 	username.o \
 	wait_error.o \
-	wchar.o
+	wchar.o \
+	wstrcmp.o \
+	wstrncmp.o
 
 ifeq ($(with_openssl),yes)
 OBJS_COMMON += \
diff --git a/src/backend/utils/mb/wstrcmp.c b/src/common/wstrcmp.c
similarity index 98%
rename from src/backend/utils/mb/wstrcmp.c
rename to src/common/wstrcmp.c
index e5f57d717d..4d0a66952b 100644
--- a/src/backend/utils/mb/wstrcmp.c
+++ b/src/common/wstrcmp.c
@@ -1,5 +1,5 @@
 /*
- * src/backend/utils/mb/wstrcmp.c
+ * src/common/wstrcmp.c
  *
  *-
  * Copyright (c) 1990, 1993
diff --git a/src/backend/utils/mb/wstrncmp.c b/src/common/wstrncmp.c
similarity index 98%
rename from src/backend/utils/mb/wstrncmp.c
rename to src/common/wstrncmp.c
index cce0c6c5cf..7fb8319161 100644
--- a/src/backend/utils/mb/wstrncmp.c
+++ b/src/common/wstrncmp.c
@@ -1,5 +1,5 @@
 /*
- * src/backend/utils/mb/wstrncmp.c
+ * src/common/wstrncmp.c
  *
  *
  * Copyright (c) 1989, 1993
-- 
2.21.1 (Apple Git-122.3)

0004-Moving-jsonapi.h-to-src-include-common.patch.WIPapplication/octet-stream; name=0004-Moving-jsonapi.h-to-src-include-common.patch.WIP; x-unix-mode=0644Download

From 1f265b0b68a2c9d75ef6e0764746de8d6a7b4879 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jan 2020 19:31:06 -0800
Subject: [PATCH 04/11] Moving jsonapi.h to src/include/common

In preparation for exposing this header for both frontend
and backend use, moving it to src/include/common, and
changing the #includes in other files to match.
---
 contrib/hstore/hstore_io.c              | 2 +-
 src/backend/tsearch/to_tsany.c          | 2 +-
 src/backend/tsearch/wparser.c           | 2 +-
 src/backend/utils/adt/json.c            | 2 +-
 src/backend/utils/adt/jsonb.c           | 2 +-
 src/backend/utils/adt/jsonb_util.c      | 2 +-
 src/backend/utils/adt/jsonfuncs.c       | 2 +-
 src/include/{utils => common}/jsonapi.h | 4 ++--
 8 files changed, 9 insertions(+), 9 deletions(-)
 rename src/include/{utils => common}/jsonapi.h (98%)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 10ec392775..f3174f2995 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -7,13 +7,13 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "hstore.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index cc694cda8c..9c356cbfd5 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -13,10 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 
 
 typedef struct MorphOpaque
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index 6b5960ecc1..25ef79ac6d 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -16,11 +16,11 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/varlena.h"
 
 /******sql-level interface******/
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index b3f7bf1d32..6175fba447 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -16,6 +16,7 @@
 #include "access/htup_details.h"
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
@@ -28,7 +29,6 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/lsyscache.h"
 #include "utils/mbutils.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c4a4ec78b0..c95e112184 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -15,6 +15,7 @@
 #include "access/htup_details.h"
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
@@ -23,7 +24,6 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 7c9da701dd..a63f06fdae 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -15,11 +15,11 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 14aefd8fe2..e59e99c0d0 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
 #include "funcapi.h"
@@ -27,7 +28,6 @@
 #include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/include/utils/jsonapi.h b/src/include/common/jsonapi.h
similarity index 98%
rename from src/include/utils/jsonapi.h
rename to src/include/common/jsonapi.h
index f72f1cefd5..6f7a810172 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/common/jsonapi.h
  *
  *-------------------------------------------------------------------------
  */
@@ -14,7 +14,7 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
-#include "jsonb.h"
+#include "utils/jsonb.h"
 #include "lib/stringinfo.h"
 
 typedef enum
-- 
2.21.1 (Apple Git-122.3)

0005-Moving-backend-functions-out-of-src-include-common-j.patch.WIPapplication/octet-stream; name=0005-Moving-backend-functions-out-of-src-include-common-j.patch.WIP; x-unix-mode=0644Download

From c1d5a2a118bf57d7d4a6758107802ba774486dbf Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 21 Jan 2020 07:44:52 -0800
Subject: [PATCH 05/11] Moving backend functions out of
 src/include/common/jsonapi.h

Functions that rely on backend datatypes (text, Datum, etc)
are now moved out of jsonapi.h into either json.h or jsonb.h,
except for functions which work for both json and jsonb which
are moved into a new header named jsonfuncs.h.
---
 src/backend/tsearch/to_tsany.c |  2 ++
 src/backend/tsearch/wparser.c  |  1 +
 src/include/common/jsonapi.h   | 17 -----------------
 src/include/utils/json.h       |  5 +++++
 src/include/utils/jsonb.h      |  7 +++++++
 src/include/utils/jsonfuncs.h  | 28 ++++++++++++++++++++++++++++
 6 files changed, 43 insertions(+), 17 deletions(-)
 create mode 100644 src/include/utils/jsonfuncs.h

diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 9c356cbfd5..824df69cd1 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -17,6 +17,8 @@
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/json.h"
+#include "utils/jsonb.h"
 
 
 typedef struct MorphOpaque
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index 25ef79ac6d..ef8ec445c6 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -21,6 +21,7 @@
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/jsonb.h"
 #include "utils/varlena.h"
 
 /******sql-level interface******/
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index 6f7a810172..2aa35dbb3c 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -14,7 +14,6 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
-#include "utils/jsonb.h"
 #include "lib/stringinfo.h"
 
 typedef enum
@@ -148,20 +147,4 @@ typedef enum JsonToIndex
 /* an action that will be applied to each value in iterate_json(b)_values functions */
 typedef void (*JsonIterateStringValuesAction) (void *state, char *elem_value, int elem_len);
 
-/* an action that will be applied to each value in transform_json(b)_values functions */
-typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
-
-extern uint32 parse_jsonb_index_flags(Jsonb *jb);
-extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
-								 JsonIterateStringValuesAction action);
-extern void iterate_json_values(text *json, uint32 flags, void *action_state,
-								JsonIterateStringValuesAction action);
-extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
-											JsonTransformStringValuesAction transform_action);
-extern text *transform_json_string_values(text *json, void *action_state,
-										  JsonTransformStringValuesAction transform_action);
-
-extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
-								const int *tzp);
-
 #endif							/* JSONAPI_H */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 20b5294491..b12f5ae049 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -14,9 +14,14 @@
 #ifndef JSON_H
 #define JSON_H
 
+#include "common/jsonapi.h"
 #include "lib/stringinfo.h"
+#include "utils/jsonfuncs.h"
 
 /* functions in json.c */
 extern void escape_json(StringInfo buf, const char *str);
 
+extern void iterate_json_values(text *json, uint32 flags, void *action_state,
+								JsonIterateStringValuesAction action);
+
 #endif							/* JSON_H */
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 5860011693..39bc734a5e 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -12,8 +12,10 @@
 #ifndef __JSONB_H__
 #define __JSONB_H__
 
+#include "common/jsonapi.h"
 #include "lib/stringinfo.h"
 #include "utils/array.h"
+#include "utils/jsonfuncs.h"
 #include "utils/numeric.h"
 
 /* Tokens used when sequentially processing a jsonb value */
@@ -374,6 +376,11 @@ typedef struct JsonbIterator
 	struct JsonbIterator *parent;
 } JsonbIterator;
 
+extern uint32 parse_jsonb_index_flags(Jsonb *jb);
+extern void iterate_jsonb_values(Jsonb *jb, uint32 flags, void *state,
+								 JsonIterateStringValuesAction action);
+extern Jsonb *transform_jsonb_string_values(Jsonb *jsonb, void *action_state,
+											JsonTransformStringValuesAction transform_action);
 
 /* Support functions */
 extern uint32 getJsonbOffset(const JsonbContainer *jc, int index);
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
new file mode 100644
index 0000000000..8d7abe73e6
--- /dev/null
+++ b/src/include/utils/jsonfuncs.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonfuncs.h
+ *	  Declarations for JSON and JSONB functional support.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/jsonfuncs.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef JSONFUNCS_H
+#define JSONFUNCS_H
+
+#include "common/jsonapi.h"
+#include "lib/stringinfo.h"
+
+/* an action that will be applied to each value in transform_json(b)_values functions */
+typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value, int elem_len);
+
+extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
+								const int *tzp);
+extern text *transform_json_string_values(text *json, void *action_state,
+										  JsonTransformStringValuesAction transform_action);
+
+#endif							/* JSONFUNCS_H */
-- 
2.21.1 (Apple Git-122.3)

0006-Moving-common-functions-into-jsonapi.c.patch.WIPapplication/octet-stream; name=0006-Moving-common-functions-into-jsonapi.c.patch.WIP; x-unix-mode=0644Download

From 59cc79e5f7984ab07d3ef724d2ed272539225ff0 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 21 Jan 2020 12:47:29 -0800
Subject: [PATCH 06/11] Moving common functions into jsonapi.c

Moving functions common to frontend and backend code
out of src/backend/utils and into new file
src/common/jsonapi.c
---
 src/backend/utils/adt/json.c      | 33 -------------------------
 src/backend/utils/adt/jsonfuncs.c | 20 +++++++++++++++
 src/common/Makefile               |  1 +
 src/common/jsonapi.c              | 41 +++++++++++++++++++++++++++++++
 src/include/common/jsonapi.h      |  1 -
 src/include/utils/jsonfuncs.h     |  1 +
 6 files changed, 63 insertions(+), 34 deletions(-)
 create mode 100644 src/common/jsonapi.c

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 6175fba447..dc71909a3c 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -286,39 +286,6 @@ json_recv(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
 
-/*
- * makeJsonLexContext
- *
- * lex constructor, with or without StringInfo object
- * for de-escaped lexemes.
- *
- * Without is better as it makes the processing faster, so only make one
- * if really required.
- *
- * If you already have the json as a text* value, use the first of these
- * functions, otherwise use  makeJsonLexContextCstringLen().
- */
-JsonLexContext *
-makeJsonLexContext(text *json, bool need_escapes)
-{
-	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
-										VARSIZE_ANY_EXHDR(json),
-										need_escapes);
-}
-
-JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
-{
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
-
-	lex->input = lex->token_terminator = lex->line_start = json;
-	lex->line_number = 1;
-	lex->input_length = len;
-	if (need_escapes)
-		lex->strval = makeStringInfo();
-	return lex;
-}
-
 /*
  * pg_parse_json
  *
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index e59e99c0d0..3979145ecc 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -483,6 +483,26 @@ static void transform_string_values_object_field_start(void *state, char *fname,
 static void transform_string_values_array_element_start(void *state, bool isnull);
 static void transform_string_values_scalar(void *state, char *token, JsonTokenType tokentype);
 
+/*
+ * makeJsonLexContext
+ *
+ * lex constructor, with or without StringInfo object
+ * for de-escaped lexemes.
+ *
+ * Without is better as it makes the processing faster, so only make one
+ * if really required.
+ *
+ * If you already have the json as a text* value, use the first of these
+ * functions, otherwise use  makeJsonLexContextCstringLen().
+ */
+JsonLexContext *
+makeJsonLexContext(text *json, bool need_escapes)
+{
+	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
+										VARSIZE_ANY_EXHDR(json),
+										need_escapes);
+}
+
 /*
  * SQL function json_object_keys
  *
diff --git a/src/common/Makefile b/src/common/Makefile
index 3882cd7a3d..e9caa08418 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	f2s.o \
 	file_perm.o \
 	ip.o \
+	jsonapi.o \
 	keywords.o \
 	kwlookup.o \
 	link-canary.o \
diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
new file mode 100644
index 0000000000..69fa42705f
--- /dev/null
+++ b/src/common/jsonapi.c
@@ -0,0 +1,41 @@
+/*-------------------------------------------------------------------------
+ *
+ * jsonapi.c
+ *	  Functions for working with json formatted data.
+ *
+ * Portions Copyright (c) 1998-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/jsonapi.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "c.h"
+
+#include "common/jsonapi.h"
+
+#ifndef FRONTEND
+#include "utils/palloc.h"
+#endif
+
+JsonLexContext *
+makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+{
+	JsonLexContext *lex;
+
+#ifndef FRONTEND
+	lex = (JsonLexContext*) palloc0fast(sizeof(JsonLexContext));
+#else
+	lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+	memset(lex, 0, sizeof(JsonLexContext));
+#endif
+
+	lex->input = lex->token_terminator = lex->line_start = json;
+	lex->line_number = 1;
+	lex->input_length = len;
+	if (need_escapes)
+		lex->strval = makeStringInfo();
+	return lex;
+}
+
+
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index 2aa35dbb3c..fd1d94d1d7 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -119,7 +119,6 @@ extern int	json_count_array_elements(JsonLexContext *lex);
  * If you already have the json as a text* value, use the first of these
  * functions, otherwise use  makeJsonLexContextCstringLen().
  */
-extern JsonLexContext *makeJsonLexContext(text *json, bool need_escapes);
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
 													bool need_escapes);
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 8d7abe73e6..bade7248f9 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -24,5 +24,6 @@ extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
 extern text *transform_json_string_values(text *json, void *action_state,
 										  JsonTransformStringValuesAction transform_action);
+extern JsonLexContext *makeJsonLexContext(text *json, bool need_escapes);
 
 #endif							/* JSONFUNCS_H */
-- 
2.21.1 (Apple Git-122.3)

0007-Moving-json-parsing-logic-into-common.patch.WIPapplication/octet-stream; name=0007-Moving-json-parsing-logic-into-common.patch.WIP; x-unix-mode=0644Download

From fdf45e14e66f9dc7299142a9b0e7a5dc6d95a034 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 21 Jan 2020 13:05:48 -0800
Subject: [PATCH 07/11] Moving json parsing logic into common.

Parsing logic for json was in src/backend/utils/adt, but
to be able to parse json from frontend code, this needs
to be moved to common.
---
 src/backend/utils/adt/json.c | 29 +++++++++++------------------
 src/common/jsonapi.c         |  7 +++++++
 src/include/common/jsonapi.h |  5 ++++-
 3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index dc71909a3c..f86d5042a7 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -80,11 +80,11 @@ static inline void json_lex(JsonLexContext *lex);
 static inline void json_lex_string(JsonLexContext *lex);
 static inline void json_lex_number(JsonLexContext *lex, char *s,
 								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static inline void parse_scalar(JsonLexContext *lex, const JsonSemAction *sem);
+static void parse_object_field(JsonLexContext *lex, const JsonSemAction *sem);
+static void parse_object(JsonLexContext *lex, const JsonSemAction *sem);
+static void parse_array_element(JsonLexContext *lex, const JsonSemAction *sem);
+static void parse_array(JsonLexContext *lex, const JsonSemAction *sem);
 static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
 static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
 static int	report_json_context(JsonLexContext *lex);
@@ -107,13 +107,6 @@ static void add_json(Datum val, bool is_null, StringInfo result,
 					 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
 
-/* the null action object used for pure validation */
-static JsonSemAction nullSemAction =
-{
-	NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL
-};
-
 /* Recursive Descent parser support routines */
 
 /*
@@ -297,7 +290,7 @@ json_recv(PG_FUNCTION_ARGS)
  * pointer to a state object to be passed to those routines.
  */
 void
-pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
+pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	JsonTokenType tok;
 
@@ -372,7 +365,7 @@ json_count_array_elements(JsonLexContext *lex)
  *	  - object field
  */
 static inline void
-parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
+parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
@@ -408,7 +401,7 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
+parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
 	 * An object field is "fieldname" : value where value can be a scalar,
@@ -454,7 +447,7 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_object(JsonLexContext *lex, JsonSemAction *sem)
+parse_object(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
 	 * an object is a possibly empty sequence of object fields, separated by
@@ -504,7 +497,7 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
+parse_array_element(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	json_aelem_action astart = sem->array_element_start;
 	json_aelem_action aend = sem->array_element_end;
@@ -535,7 +528,7 @@ parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_array(JsonLexContext *lex, JsonSemAction *sem)
+parse_array(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
 	 * an array is a possibly empty sequence of array elements, separated by
diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 69fa42705f..c14d6ff4f2 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -18,6 +18,13 @@
 #include "utils/palloc.h"
 #endif
 
+/* the null action object used for pure validation */
+const JsonSemAction nullSemAction =
+{
+	NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL
+};
+
 JsonLexContext *
 makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
 {
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index fd1d94d1d7..581fd48036 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -92,6 +92,9 @@ typedef struct JsonSemAction
 	json_scalar_action scalar;
 } JsonSemAction;
 
+/* the null action object used for pure validation */
+extern const JsonSemAction nullSemAction;
+
 /*
  * pg_parse_json will parse the string in the lex calling the
  * action functions in sem at the appropriate points. It is
@@ -101,7 +104,7 @@ typedef struct JsonSemAction
  * points to. If the action pointers are NULL the parser
  * does nothing and just continues.
  */
-extern void pg_parse_json(JsonLexContext *lex, JsonSemAction *sem);
+extern void pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem);
 
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
-- 
2.21.1 (Apple Git-122.3)

0008-Remove-json.c-s-lex_accept.patch.WIPapplication/octet-stream; name=0008-Remove-json.c-s-lex_accept.patch.WIP; x-unix-mode=0644Download

From 896b27c1220d16c0eda5cb6796340713ef939b33 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 08:19:18 -0800
Subject: [PATCH 08/11] Remove json.c's lex_accept().

This was Robert's v2-0003-Remove-jsonapi.c-s-lex_accept.patch,
but I'm applying it to json.c rather than jsonapi.c since I have
not applied his move of that code as yet.  From his patch:

"At first glance, this function seems useful, but it actually increases
the amount of code required rather than decreasing it. Inline the
logic into the callers instead; most callers don't use the 'lexeme'
argument for anything and as a result considerable simplification is
possible."

Author: Robert Haas
---
 src/backend/utils/adt/json.c | 117 +++++++++++++++--------------------
 1 file changed, 51 insertions(+), 66 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f86d5042a7..ff0764dbc5 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -121,44 +121,7 @@ lex_peek(JsonLexContext *lex)
 }
 
 /*
- * lex_accept
- *
- * accept the look_ahead token and move the lexer to the next token if the
- * look_ahead token matches the token parameter. In that case, and if required,
- * also hand back the de-escaped lexeme.
- *
- * returns true if the token matched, false otherwise.
- */
-static inline bool
-lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
-{
-	if (lex->token_type == token)
-	{
-		if (lexeme != NULL)
-		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-				if (lex->strval != NULL)
-					*lexeme = pstrdup(lex->strval->data);
-			}
-			else
-			{
-				int			len = (lex->token_terminator - lex->token_start);
-				char	   *tokstr = palloc(len + 1);
-
-				memcpy(tokstr, lex->token_start, len);
-				tokstr[len] = '\0';
-				*lexeme = tokstr;
-			}
-		}
-		json_lex(lex);
-		return true;
-	}
-	return false;
-}
-
-/*
- * lex_accept
+ * lex_except
  *
  * move the lexer to the next token if the current look_ahead token matches
  * the parameter token. Otherwise, report an error.
@@ -166,7 +129,9 @@ lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
 static inline void
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
-	if (!lex_accept(lex, token, NULL))
+	if (lex_peek(lex) == token)
+		json_lex(lex);
+	else
 		report_parse_error(ctx, lex);
 }
 
@@ -343,12 +308,14 @@ json_count_array_elements(JsonLexContext *lex)
 	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
 	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
-		do
+		while (1)
 		{
 			count++;
 			parse_array_element(&copylex, &nullSemAction);
+			if (copylex.token_type != JSON_TOKEN_COMMA)
+				break;
+			json_lex(&copylex);
 		}
-		while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
 	}
 	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
 
@@ -369,35 +336,48 @@ parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
-	char	  **valaddr;
 	JsonTokenType tok = lex_peek(lex);
 
-	valaddr = sfunc == NULL ? NULL : &val;
-
 	/* a scalar must be a string, a number, true, false, or null */
 	switch (tok)
 	{
+		case JSON_TOKEN_STRING:
+		case JSON_TOKEN_NUMBER:
 		case JSON_TOKEN_TRUE:
-			lex_accept(lex, JSON_TOKEN_TRUE, valaddr);
-			break;
 		case JSON_TOKEN_FALSE:
-			lex_accept(lex, JSON_TOKEN_FALSE, valaddr);
-			break;
 		case JSON_TOKEN_NULL:
-			lex_accept(lex, JSON_TOKEN_NULL, valaddr);
-			break;
-		case JSON_TOKEN_NUMBER:
-			lex_accept(lex, JSON_TOKEN_NUMBER, valaddr);
-			break;
-		case JSON_TOKEN_STRING:
-			lex_accept(lex, JSON_TOKEN_STRING, valaddr);
 			break;
 		default:
 			report_parse_error(JSON_PARSE_VALUE, lex);
 	}
+	
+	/* if no semantic function, just consume the token */
+	if (sfunc == NULL)
+	{
+		json_lex(lex);
+		return;
+	}
+
+	/* extract the de-escaped string value, or the raw lexeme */
+	if (lex_peek(lex) == JSON_TOKEN_STRING)
+	{
+		if (lex->strval != NULL)
+			val = pstrdup(lex->strval->data);
+	}
+	else
+	{
+		int         len = (lex->token_terminator - lex->token_start);
+	
+		val = palloc(len + 1);
+		memcpy(val, lex->token_start, len);
+		val[len] = '\0';
+    }
+
+	/* consume the token */
+	json_lex(lex);
 
-	if (sfunc != NULL)
-		(*sfunc) (sem->semstate, val, tok);
+	/* invoke the callback */
+	(*sfunc) (sem->semstate, val, tok);
 }
 
 static void
@@ -413,14 +393,13 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	json_ofield_action ostart = sem->object_field_start;
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
-	char	  **fnameaddr = NULL;
 	JsonTokenType tok;
 
-	if (ostart != NULL || oend != NULL)
-		fnameaddr = &fname;
-
-	if (!lex_accept(lex, JSON_TOKEN_STRING, fnameaddr))
+	if (lex_peek(lex) != JSON_TOKEN_STRING)
 		report_parse_error(JSON_PARSE_STRING, lex);
+	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
+		fname = pstrdup(lex->strval->data);
+	json_lex(lex);
 
 	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
 
@@ -470,16 +449,19 @@ parse_object(JsonLexContext *lex, const JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	/* we know this will succeed, just clearing the token */
-	lex_expect(JSON_PARSE_OBJECT_START, lex, JSON_TOKEN_OBJECT_START);
+	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
+	json_lex(lex);
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
 			parse_object_field(lex, sem);
-			while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+			while (lex_peek(lex) == JSON_TOKEN_COMMA)
+			{
+				json_lex(lex);
 				parse_object_field(lex, sem);
+			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
@@ -556,8 +538,11 @@ parse_array(JsonLexContext *lex, const JsonSemAction *sem)
 
 		parse_array_element(lex, sem);
 
-		while (lex_accept(lex, JSON_TOKEN_COMMA, NULL))
+		while (lex_peek(lex) == JSON_TOKEN_COMMA)
+		{
+			json_lex(lex);
 			parse_array_element(lex, sem);
+		}
 	}
 
 	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-- 
2.21.1 (Apple Git-122.3)

0009-Making-json-parsing-work-without-throwing-exceptions.patch.WIPapplication/octet-stream; name=0009-Making-json-parsing-work-without-throwing-exceptions.patch.WIP; x-unix-mode=0644Download

From d69c6e1dcfd96adb9dfe012b80d3922b89be6dbf Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 13:28:52 -0800
Subject: [PATCH 09/11] Making json parsing work without throwing exceptions.

This is largely based on Robert Haas's patch
v2-0004-WIP-Return-errors-rather-than-using-ereport.patch
---
 src/backend/utils/adt/json.c      | 473 ++++++++----------------------
 src/backend/utils/adt/jsonb.c     |   5 +-
 src/backend/utils/adt/jsonfuncs.c | 254 +++++++++++++++-
 src/include/common/jsonapi.h      |  30 +-
 src/include/utils/jsonfuncs.h     |  34 +++
 5 files changed, 432 insertions(+), 364 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index ff0764dbc5..54075d07e3 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -34,6 +34,13 @@
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
+#define INSIST(x) \
+do { \
+	JsonParseErrorType	parse_result; \
+	if((parse_result = (x)) != JSON_SUCCESS) \
+		return parse_result; \
+} while (0)
+
 /*
  * The context of the parser is maintained by the recursive descent
  * mechanism, but is passed explicitly to the error reporting routine
@@ -76,19 +83,17 @@ typedef struct JsonAggState
 	Oid			val_output_func;
 } JsonAggState;
 
-static inline void json_lex(JsonLexContext *lex);
-static inline void json_lex_string(JsonLexContext *lex);
-static inline void json_lex_number(JsonLexContext *lex, char *s,
-								   bool *num_err, int *total_len);
-static inline void parse_scalar(JsonLexContext *lex, const JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, const JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, const JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, const JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, const JsonSemAction *sem);
-static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex) pg_attribute_noreturn();
-static void report_invalid_token(JsonLexContext *lex) pg_attribute_noreturn();
-static int	report_json_context(JsonLexContext *lex);
-static char *extract_mb_char(char *s);
+static inline JsonTokenType lex_peek(JsonLexContext *lex) __attribute__((warn_unused_result));
+static inline JsonParseErrorType lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token) __attribute__((warn_unused_result));
+static inline JsonParseErrorType json_lex_string(JsonLexContext *lex) __attribute__((warn_unused_result));
+static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
+												 bool *num_err, int *total_len) __attribute__((warn_unused_result));
+static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_object_field(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_object(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex) __attribute__((warn_unused_result));
 static void composite_to_json(Datum composite, StringInfo result,
 							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
@@ -126,13 +131,14 @@ lex_peek(JsonLexContext *lex)
  * move the lexer to the next token if the current look_ahead token matches
  * the parameter token. Otherwise, report an error.
  */
-static inline void
+static inline JsonParseErrorType
 lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 {
 	if (lex_peek(lex) == token)
-		json_lex(lex);
+		INSIST(json_lex(lex));
 	else
-		report_parse_error(ctx, lex);
+		return report_parse_error(ctx, lex);
+	return JSON_SUCCESS;
 }
 
 /* chars to consider as part of an alphanumeric token */
@@ -175,7 +181,8 @@ IsValidJsonNumber(const char *str, int len)
 		dummy_lex.input_length = len;
 	}
 
-	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
+	if (JSON_SUCCESS != json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len))
+		return false;
 
 	return (!numeric_error) && (total_len == dummy_lex.input_length);
 }
@@ -192,7 +199,7 @@ json_in(PG_FUNCTION_ARGS)
 
 	/* validate it */
 	lex = makeJsonLexContext(result, false);
-	pg_parse_json(lex, &nullSemAction);
+	pg_parse_json_or_throw(lex, &nullSemAction);
 
 	/* Internal representation is the same as text, for now */
 	PG_RETURN_TEXT_P(result);
@@ -239,7 +246,7 @@ json_recv(PG_FUNCTION_ARGS)
 
 	/* Validate it. */
 	lex = makeJsonLexContextCstringLen(str, nbytes, false);
-	pg_parse_json(lex, &nullSemAction);
+	pg_parse_json_or_throw(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
@@ -254,13 +261,13 @@ json_recv(PG_FUNCTION_ARGS)
  * action routines to be called at appropriate spots during parsing, and a
  * pointer to a state object to be passed to those routines.
  */
-void
+JsonParseErrorType
 pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	JsonTokenType tok;
 
 	/* get the initial token */
-	json_lex(lex);
+	INSIST(json_lex(lex));
 
 	tok = lex_peek(lex);
 
@@ -268,17 +275,17 @@ pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			INSIST(parse_object(lex, sem));
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			INSIST(parse_array(lex, sem));
 			break;
 		default:
-			parse_scalar(lex, sem); /* json can be a bare scalar */
+			INSIST(parse_scalar(lex, sem)); /* json can be a bare scalar */
 	}
 
-	lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
-
+	INSIST(lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END));
+	return JSON_SUCCESS;
 }
 
 /*
@@ -305,19 +312,20 @@ json_count_array_elements(JsonLexContext *lex)
 	copylex.lex_level++;
 
 	count = 0;
-	lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
+	INSIST(lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START));
 	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
 		while (1)
 		{
 			count++;
-			parse_array_element(&copylex, &nullSemAction);
+			if (JSON_SUCCESS != parse_array_element(&copylex, &nullSemAction))
+				break;
 			if (copylex.token_type != JSON_TOKEN_COMMA)
 				break;
-			json_lex(&copylex);
+			INSIST(json_lex(&copylex));
 		}
 	}
-	lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END));
 
 	return count;
 }
@@ -331,7 +339,7 @@ json_count_array_elements(JsonLexContext *lex)
  *	  - object ( { } )
  *	  - object field
  */
-static inline void
+static inline JsonParseErrorType
 parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	char	   *val = NULL;
@@ -348,14 +356,14 @@ parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 		case JSON_TOKEN_NULL:
 			break;
 		default:
-			report_parse_error(JSON_PARSE_VALUE, lex);
+			return report_parse_error(JSON_PARSE_VALUE, lex);
 	}
 	
 	/* if no semantic function, just consume the token */
 	if (sfunc == NULL)
 	{
-		json_lex(lex);
-		return;
+		INSIST(json_lex(lex));
+		return JSON_SUCCESS;
 	}
 
 	/* extract the de-escaped string value, or the raw lexeme */
@@ -374,13 +382,14 @@ parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
     }
 
 	/* consume the token */
-	json_lex(lex);
+	INSIST(json_lex(lex));
 
 	/* invoke the callback */
 	(*sfunc) (sem->semstate, val, tok);
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
@@ -396,12 +405,12 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	JsonTokenType tok;
 
 	if (lex_peek(lex) != JSON_TOKEN_STRING)
-		report_parse_error(JSON_PARSE_STRING, lex);
+		return report_parse_error(JSON_PARSE_STRING, lex);
 	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
 		fname = pstrdup(lex->strval->data);
-	json_lex(lex);
+	INSIST(json_lex(lex));
 
-	lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
+	INSIST(lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON));
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -412,20 +421,21 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			INSIST(parse_object(lex, sem));
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			INSIST(parse_array(lex, sem));
 			break;
 		default:
-			parse_scalar(lex, sem);
+			INSIST(parse_scalar(lex, sem));
 	}
 
 	if (oend != NULL)
 		(*oend) (sem->semstate, fname, isnull);
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_object(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
@@ -450,35 +460,36 @@ parse_object(JsonLexContext *lex, const JsonSemAction *sem)
 	lex->lex_level++;
 
 	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
-	json_lex(lex);
+	INSIST(json_lex(lex));
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
-			parse_object_field(lex, sem);
+			INSIST(parse_object_field(lex, sem));
 			while (lex_peek(lex) == JSON_TOKEN_COMMA)
 			{
-				json_lex(lex);
-				parse_object_field(lex, sem);
+				INSIST(json_lex(lex));
+				INSIST(parse_object_field(lex, sem));
 			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
 		default:
 			/* case of an invalid initial token inside the object */
-			report_parse_error(JSON_PARSE_OBJECT_START, lex);
+			return report_parse_error(JSON_PARSE_OBJECT_START, lex);
 	}
 
-	lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
+	INSIST(lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END));
 
 	lex->lex_level--;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate);
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_array_element(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	json_aelem_action astart = sem->array_element_start;
@@ -496,20 +507,21 @@ parse_array_element(JsonLexContext *lex, const JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			parse_object(lex, sem);
+			INSIST(parse_object(lex, sem));
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			parse_array(lex, sem);
+			INSIST(parse_array(lex, sem));
 			break;
 		default:
-			parse_scalar(lex, sem);
+			INSIST(parse_scalar(lex, sem));
 	}
 
 	if (aend != NULL)
 		(*aend) (sem->semstate, isnull);
+	return JSON_SUCCESS;
 }
 
-static void
+static JsonParseErrorType
 parse_array(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
@@ -532,31 +544,32 @@ parse_array(JsonLexContext *lex, const JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
+	INSIST(lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START));
 	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
 	{
 
-		parse_array_element(lex, sem);
+		INSIST(parse_array_element(lex, sem));
 
 		while (lex_peek(lex) == JSON_TOKEN_COMMA)
 		{
-			json_lex(lex);
-			parse_array_element(lex, sem);
+			INSIST(json_lex(lex));
+			INSIST(parse_array_element(lex, sem));
 		}
 	}
 
-	lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
+	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END));
 
 	lex->lex_level--;
 
 	if (aend != NULL)
 		(*aend) (sem->semstate);
+	return JSON_SUCCESS;
 }
 
 /*
  * Lex one token from the input stream.
  */
-static inline void
+JsonParseErrorType
 json_lex(JsonLexContext *lex)
 {
 	char	   *s;
@@ -619,12 +632,12 @@ json_lex(JsonLexContext *lex)
 				break;
 			case '"':
 				/* string */
-				json_lex_string(lex);
+				INSIST(json_lex_string(lex));
 				lex->token_type = JSON_TOKEN_STRING;
 				break;
 			case '-':
 				/* Negative number. */
-				json_lex_number(lex, s + 1, NULL, NULL);
+				INSIST(json_lex_number(lex, s + 1, NULL, NULL));
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			case '0':
@@ -638,7 +651,7 @@ json_lex(JsonLexContext *lex)
 			case '8':
 			case '9':
 				/* Positive number. */
-				json_lex_number(lex, s, NULL, NULL);
+				INSIST(json_lex_number(lex, s, NULL, NULL));
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			default:
@@ -666,7 +679,7 @@ json_lex(JsonLexContext *lex)
 					{
 						lex->prev_token_terminator = lex->token_terminator;
 						lex->token_terminator = s + 1;
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 					}
 
 					/*
@@ -683,21 +696,22 @@ json_lex(JsonLexContext *lex)
 						else if (memcmp(s, "null", 4) == 0)
 							lex->token_type = JSON_TOKEN_NULL;
 						else
-							report_invalid_token(lex);
+							return JSON_INVALID_TOKEN;
 					}
 					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
 						lex->token_type = JSON_TOKEN_FALSE;
 					else
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 
 				}
 		}						/* end of switch */
+	return JSON_SUCCESS;
 }
 
 /*
  * The next token in the input stream is known to be a string; lex it.
  */
-static inline void
+static inline JsonParseErrorType
 json_lex_string(JsonLexContext *lex)
 {
 	char	   *s;
@@ -718,7 +732,7 @@ json_lex_string(JsonLexContext *lex)
 		if (len >= lex->input_length)
 		{
 			lex->token_terminator = s;
-			report_invalid_token(lex);
+			return JSON_INVALID_TOKEN;
 		}
 		else if (*s == '"')
 			break;
@@ -727,12 +741,7 @@ json_lex_string(JsonLexContext *lex)
 			/* Per RFC4627, these characters MUST be escaped. */
 			/* Since *s isn't printable, exclude it from the context string */
 			lex->token_terminator = s;
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s", "json"),
-					 errdetail("Character with value 0x%02x must be escaped.",
-							   (unsigned char) *s),
-					 report_json_context(lex)));
+			return JSON_ESCAPING_REQUIRED;
 		}
 		else if (*s == '\\')
 		{
@@ -742,7 +751,7 @@ json_lex_string(JsonLexContext *lex)
 			if (len >= lex->input_length)
 			{
 				lex->token_terminator = s;
-				report_invalid_token(lex);
+				return JSON_INVALID_TOKEN;
 			}
 			else if (*s == 'u')
 			{
@@ -756,7 +765,7 @@ json_lex_string(JsonLexContext *lex)
 					if (len >= lex->input_length)
 					{
 						lex->token_terminator = s;
-						report_invalid_token(lex);
+						return JSON_INVALID_TOKEN;
 					}
 					else if (*s >= '0' && *s <= '9')
 						ch = (ch * 16) + (*s - '0');
@@ -767,12 +776,7 @@ json_lex_string(JsonLexContext *lex)
 					else
 					{
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
 				if (lex->strval != NULL)
@@ -783,33 +787,20 @@ json_lex_string(JsonLexContext *lex)
 					if (ch >= 0xd800 && ch <= 0xdbff)
 					{
 						if (hi_surrogate != -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s",
-											"json"),
-									 errdetail("Unicode high surrogate must not follow a high surrogate."),
-									 report_json_context(lex)));
+							return JSON_UNICODE_HIGH_SURROGATE;
 						hi_surrogate = (ch & 0x3ff) << 10;
 						continue;
 					}
 					else if (ch >= 0xdc00 && ch <= 0xdfff)
 					{
 						if (hi_surrogate == -1)
-							ereport(ERROR,
-									(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-									 errmsg("invalid input syntax for type %s", "json"),
-									 errdetail("Unicode low surrogate must follow a high surrogate."),
-									 report_json_context(lex)));
+							return JSON_UNICODE_LOW_SURROGATE;
 						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
 						hi_surrogate = -1;
 					}
 
 					if (hi_surrogate != -1)
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s", "json"),
-								 errdetail("Unicode low surrogate must follow a high surrogate."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_LOW_SURROGATE;
 
 					/*
 					 * For UTF8, replace the escape sequence by the actual
@@ -821,11 +812,7 @@ json_lex_string(JsonLexContext *lex)
 					if (ch == 0)
 					{
 						/* We can't allow this, since our TEXT type doesn't */
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("\\u0000 cannot be converted to text."),
-								 report_json_context(lex)));
+						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
 					else if (GetDatabaseEncoding() == PG_UTF8)
 					{
@@ -843,25 +830,14 @@ json_lex_string(JsonLexContext *lex)
 						appendStringInfoChar(lex->strval, (char) ch);
 					}
 					else
-					{
-						ereport(ERROR,
-								(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-								 errmsg("unsupported Unicode escape sequence"),
-								 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
-								 report_json_context(lex)));
-					}
+						return JSON_UNICODE_HIGH_ESCAPE;
 
 				}
 			}
 			else if (lex->strval != NULL)
 			{
 				if (hi_surrogate != -1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-							 errmsg("invalid input syntax for type %s",
-									"json"),
-							 errdetail("Unicode low surrogate must follow a high surrogate."),
-							 report_json_context(lex)));
+					return JSON_UNICODE_LOW_SURROGATE;
 
 				switch (*s)
 				{
@@ -888,13 +864,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so error out. */
 						lex->token_terminator = s + pg_mblen(s);
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-								 errmsg("invalid input syntax for type %s",
-										"json"),
-								 errdetail("Escape sequence \"\\%s\" is invalid.",
-										   extract_mb_char(s)),
-								 report_json_context(lex)));
+						return JSON_ESCAPING_INVALID;
 				}
 			}
 			else if (strchr("\"\\/bfnrt", *s) == NULL)
@@ -907,39 +877,26 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_terminator = s + pg_mblen(s);
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Escape sequence \"\\%s\" is invalid.",
-								   extract_mb_char(s)),
-						 report_json_context(lex)));
+				return JSON_ESCAPING_INVALID;
 			}
 
 		}
 		else if (lex->strval != NULL)
 		{
 			if (hi_surrogate != -1)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Unicode low surrogate must follow a high surrogate."),
-						 report_json_context(lex)));
-
+				return JSON_UNICODE_LOW_SURROGATE;
 			appendStringInfoChar(lex->strval, *s);
 		}
 
 	}
 
 	if (hi_surrogate != -1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Unicode low surrogate must follow a high surrogate."),
-				 report_json_context(lex)));
+		return JSON_UNICODE_LOW_SURROGATE;
 
 	/* Hooray, we found the end of the string! */
 	lex->prev_token_terminator = lex->token_terminator;
 	lex->token_terminator = s + 1;
+	return JSON_SUCCESS;
 }
 
 /*
@@ -970,7 +927,7 @@ json_lex_string(JsonLexContext *lex)
  * raising an error for a badly-formed number.  Also, if total_len is not NULL
  * the distance from lex->input to the token end+1 is returned to *total_len.
  */
-static inline void
+static inline JsonParseErrorType
 json_lex_number(JsonLexContext *lex, char *s,
 				bool *num_err, int *total_len)
 {
@@ -1059,8 +1016,9 @@ json_lex_number(JsonLexContext *lex, char *s,
 		lex->token_terminator = s;
 		/* handle error if any */
 		if (error)
-			report_invalid_token(lex);
+			return JSON_INVALID_TOKEN;
 	}
+	return JSON_SUCCESS;
 }
 
 /*
@@ -1068,219 +1026,36 @@ json_lex_number(JsonLexContext *lex, char *s,
  *
  * lex->token_start and lex->token_terminator must identify the current token.
  */
-static void
+static JsonParseErrorType
 report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 {
-	char	   *token;
-	int			toklen;
-
 	/* Handle case where the input ended prematurely. */
 	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("The input string ended unexpectedly."),
-				 report_json_context(lex)));
-
-	/* Separate out the current token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	/* Complain, with the appropriate detail message. */
-	if (ctx == JSON_PARSE_END)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s", "json"),
-				 errdetail("Expected end of input, but found \"%s\".",
-						   token),
-				 report_json_context(lex)));
-	else
-	{
-		switch (ctx)
-		{
-			case JSON_PARSE_VALUE:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected JSON value, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_STRING:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected array element or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_ARRAY_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_START:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_LABEL:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \":\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_NEXT:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			case JSON_PARSE_OBJECT_COMMA:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s", "json"),
-						 errdetail("Expected string, but found \"%s\".",
-								   token),
-						 report_json_context(lex)));
-				break;
-			default:
-				elog(ERROR, "unexpected json parse state: %d", ctx);
-		}
-	}
-}
-
-/*
- * Report an invalid input token.
- *
- * lex->token_start and lex->token_terminator must identify the token.
- */
-static void
-report_invalid_token(JsonLexContext *lex)
-{
-	char	   *token;
-	int			toklen;
-
-	/* Separate out the offending token. */
-	toklen = lex->token_terminator - lex->token_start;
-	token = palloc(toklen + 1);
-	memcpy(token, lex->token_start, toklen);
-	token[toklen] = '\0';
-
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("invalid input syntax for type %s", "json"),
-			 errdetail("Token \"%s\" is invalid.", token),
-			 report_json_context(lex)));
-}
+		return JSON_EXPECTED_MORE;
 
-/*
- * Report a CONTEXT line for bogus JSON input.
- *
- * lex->token_terminator must be set to identify the spot where we detected
- * the error.  Note that lex->token_start might be NULL, in case we recognized
- * error at EOF.
- *
- * The return value isn't meaningful, but we make it non-void so that this
- * can be invoked inside ereport().
- */
-static int
-report_json_context(JsonLexContext *lex)
-{
-	const char *context_start;
-	const char *context_end;
-	const char *line_start;
-	int			line_number;
-	char	   *ctxt;
-	int			ctxtlen;
-	const char *prefix;
-	const char *suffix;
-
-	/* Choose boundaries for the part of the input we will display */
-	context_start = lex->input;
-	context_end = lex->token_terminator;
-	line_start = context_start;
-	line_number = 1;
-	for (;;)
+	switch (ctx)
 	{
-		/* Always advance over newlines */
-		if (context_start < context_end && *context_start == '\n')
-		{
-			context_start++;
-			line_start = context_start;
-			line_number++;
-			continue;
-		}
-		/* Otherwise, done as soon as we are close enough to context_end */
-		if (context_end - context_start < 50)
-			break;
-		/* Advance to next multibyte character */
-		if (IS_HIGHBIT_SET(*context_start))
-			context_start += pg_mblen(context_start);
-		else
-			context_start++;
+		case JSON_PARSE_END:
+			return JSON_EXPECTED_END;
+		case JSON_PARSE_VALUE:
+			return JSON_EXPECTED_JSON;
+		case JSON_PARSE_STRING:
+			return JSON_EXPECTED_STRING;
+		case JSON_PARSE_ARRAY_START:
+			return JSON_EXPECTED_ARRAY_FIRST;
+		case JSON_PARSE_ARRAY_NEXT:
+			return JSON_EXPECTED_ARRAY_NEXT;
+		case JSON_PARSE_OBJECT_START:
+			return JSON_EXPECTED_OBJECT_FIRST;
+		case JSON_PARSE_OBJECT_LABEL:
+			return JSON_EXPECTED_COLON;
+		case JSON_PARSE_OBJECT_NEXT:
+			return JSON_EXPECTED_OBJECT_NEXT;
+		case JSON_PARSE_OBJECT_COMMA:
+			return JSON_EXPECTED_STRING;
+		default:
+			return JSON_BAD_PARSER_STATE;;
 	}
-
-	/*
-	 * We add "..." to indicate that the excerpt doesn't start at the
-	 * beginning of the line ... but if we're within 3 characters of the
-	 * beginning of the line, we might as well just show the whole line.
-	 */
-	if (context_start - line_start <= 3)
-		context_start = line_start;
-
-	/* Get a null-terminated copy of the data to present */
-	ctxtlen = context_end - context_start;
-	ctxt = palloc(ctxtlen + 1);
-	memcpy(ctxt, context_start, ctxtlen);
-	ctxt[ctxtlen] = '\0';
-
-	/*
-	 * Show the context, prefixing "..." if not starting at start of line, and
-	 * suffixing "..." if not ending at end of line.
-	 */
-	prefix = (context_start > line_start) ? "..." : "";
-	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
-
-	return errcontext("JSON data, line %d: %s%s%s",
-					  line_number, prefix, ctxt, suffix);
-}
-
-/*
- * Extract a single, possibly multi-byte char from the input string.
- */
-static char *
-extract_mb_char(char *s)
-{
-	char	   *res;
-	int			len;
-
-	len = pg_mblen(s);
-	res = palloc(len + 1);
-	memcpy(res, s, len);
-	res[len] = '\0';
-
-	return res;
 }
 
 /*
@@ -2492,7 +2267,7 @@ json_typeof(PG_FUNCTION_ARGS)
 	lex = makeJsonLexContext(json, false);
 
 	/* Lex exactly one token from the input and check its type. */
-	json_lex(lex);
+	json_lex_or_throw(lex);
 	tok = lex_peek(lex);
 	switch (tok)
 	{
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c95e112184..63072f616e 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -272,7 +272,7 @@ jsonb_from_cstring(char *json, int len)
 	sem.scalar = jsonb_in_scalar;
 	sem.object_field_start = jsonb_in_object_field_start;
 
-	pg_parse_json(lex, &sem);
+	pg_parse_json_or_throw(lex, &sem);
 
 	/* after parsing, the item member has the composed jsonb structure */
 	PG_RETURN_POINTER(JsonbValueToJsonb(state.res));
@@ -860,8 +860,7 @@ datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 					sem.scalar = jsonb_in_scalar;
 					sem.object_field_start = jsonb_in_object_field_start;
 
-					pg_parse_json(lex, &sem);
-
+					pg_parse_json_or_throw(lex, &sem);
 				}
 				break;
 			case JSONBTYPE_JSONB:
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 3979145ecc..be5d30239d 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -30,6 +30,7 @@
 #include "utils/json.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
@@ -328,6 +329,9 @@ typedef struct JsObject
 			hash_destroy((jso)->val.json_hash); \
 	} while (0)
 
+/* functions for json parsing */
+static char *extract_mb_char(char *s);
+
 /* semantic action functions for json_object_keys */
 static void okeys_object_field_start(void *state, char *fname, bool isnull);
 static void okeys_array_start(void *state);
@@ -483,6 +487,23 @@ static void transform_string_values_object_field_start(void *state, char *fname,
 static void transform_string_values_array_element_start(void *state, bool isnull);
 static void transform_string_values_scalar(void *state, char *token, JsonTokenType tokentype);
 
+/*
+ * Extract a single, possibly multi-byte char from the input string.
+ */
+static char *
+extract_mb_char(char *s)
+{
+	char	   *res;
+	int			len;
+
+	len = pg_mblen(s);
+	res = palloc(len + 1);
+	memcpy(res, s, len);
+	res[len] = '\0';
+
+	return res;
+}
+
 /*
  * makeJsonLexContext
  *
@@ -625,7 +646,7 @@ json_object_keys(PG_FUNCTION_ARGS)
 		sem->object_field_start = okeys_object_field_start;
 		/* remainder are all NULL, courtesy of palloc0 above */
 
-		pg_parse_json(lex, sem);
+		pg_parse_json_or_throw(lex, sem);
 		/* keys are now in state->result */
 
 		pfree(lex->strval->data);
@@ -656,6 +677,78 @@ json_object_keys(PG_FUNCTION_ARGS)
 	SRF_RETURN_DONE(funcctx);
 }
 
+/*
+ * Report a CONTEXT line for bogus JSON input.
+ *
+ * lex->token_terminator must be set to identify the spot where we detected
+ * the error.  Note that lex->token_start might be NULL, in case we recognized
+ * error at EOF.
+ *
+ * The return value isn't meaningful, but we make it non-void so that this
+ * can be invoked inside ereport().
+ */
+int
+report_json_context(JsonLexContext *lex)
+{
+	const char *context_start;
+	const char *context_end;
+	const char *line_start;
+	int			line_number;
+	char	   *ctxt;
+	int			ctxtlen;
+	const char *prefix;
+	const char *suffix;
+
+	/* Choose boundaries for the part of the input we will display */
+	context_start = lex->input;
+	context_end = lex->token_terminator;
+	line_start = context_start;
+	line_number = 1;
+	for (;;)
+	{
+		/* Always advance over newlines */
+		if (context_start < context_end && *context_start == '\n')
+		{
+			context_start++;
+			line_start = context_start;
+			line_number++;
+			continue;
+		}
+		/* Otherwise, done as soon as we are close enough to context_end */
+		if (context_end - context_start < 50)
+			break;
+		/* Advance to next multibyte character */
+		if (IS_HIGHBIT_SET(*context_start))
+			context_start += pg_mblen(context_start);
+		else
+			context_start++;
+	}
+
+	/*
+	 * We add "..." to indicate that the excerpt doesn't start at the
+	 * beginning of the line ... but if we're within 3 characters of the
+	 * beginning of the line, we might as well just show the whole line.
+	 */
+	if (context_start - line_start <= 3)
+		context_start = line_start;
+
+	/* Get a null-terminated copy of the data to present */
+	ctxtlen = context_end - context_start;
+	ctxt = palloc(ctxtlen + 1);
+	memcpy(ctxt, context_start, ctxtlen);
+	ctxt[ctxtlen] = '\0';
+
+	/*
+	 * Show the context, prefixing "..." if not starting at start of line, and
+	 * suffixing "..." if not ending at end of line.
+	 */
+	prefix = (context_start > line_start) ? "..." : "";
+	suffix = (lex->token_type != JSON_TOKEN_END && context_end - lex->input < lex->input_length && *context_end != '\n' && *context_end != '\r') ? "..." : "";
+
+	return errcontext("JSON data, line %d: %s%s%s",
+					  line_number, prefix, ctxt, suffix);
+}
+
 static void
 okeys_object_field_start(void *state, char *fname, bool isnull)
 {
@@ -1019,7 +1112,7 @@ get_worker(text *json,
 		sem->array_element_end = get_array_element_end;
 	}
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	return state->tresult;
 }
@@ -1567,7 +1660,7 @@ json_array_length(PG_FUNCTION_ARGS)
 	sem->scalar = alen_scalar;
 	sem->array_element_start = alen_array_element_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	PG_RETURN_INT32(state->count);
 }
@@ -1662,6 +1755,145 @@ jsonb_each_text(PG_FUNCTION_ARGS)
 	return each_worker_jsonb(fcinfo, "jsonb_each_text", true);
 }
 
+static char *
+extract_token(JsonLexContext *lex)
+{
+	int toklen = lex->token_terminator - lex->token_start;
+	char *token = palloc(toklen + 1);
+
+	memcpy(token, lex->token_start, toklen);
+	token[toklen] = '\0';
+	return token;
+}
+
+void
+throw_json_parse_error(JsonParseErrorType error, JsonLexContext *lex)
+{
+	switch (error)
+	{
+		case JSON_SUCCESS:
+			elog(ERROR, "internal error in json parser");
+			break;
+		case JSON_ESCAPING_INVALID:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Escape sequence \"\\%s\" is invalid.",
+							   extract_mb_char(lex->token_terminator - 1)), // XXX WRONG AND BUSTED
+					 report_json_context(lex)));
+		case JSON_ESCAPING_REQUIRED:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Character with value 0x%02x must be escaped.",
+							   (unsigned char) *(lex->token_terminator)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_END:
+			ereport(ERROR,
+					 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					  errmsg("invalid input syntax for type %s", "json"),
+					  errdetail("Expected end of input, but found \"%s\".",
+								extract_token(lex)),
+					  report_json_context(lex)));
+		case JSON_EXPECTED_ARRAY_FIRST:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected array element or \"]\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_ARRAY_NEXT:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected \",\" or \"]\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_COLON:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected \":\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_JSON:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected JSON value, but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_MORE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("The input string ended unexpectedly."),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_OBJECT_FIRST:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected string or \"}\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_OBJECT_NEXT:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected \",\" or \"}\", but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_EXPECTED_STRING:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Expected string, but found \"%s\".",
+							   extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_INVALID_TOKEN:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Token \"%s\" is invalid.", extract_token(lex)),
+					 report_json_context(lex)));
+		case JSON_UNICODE_CODE_POINT_ZERO:
+			ereport(ERROR,
+					(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+					 errmsg("unsupported Unicode escape sequence"),
+					 errdetail("\\u0000 cannot be converted to text."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_ESCAPE_FORMAT:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_HIGH_ESCAPE:
+			ereport(ERROR,
+					(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+					 errmsg("unsupported Unicode escape sequence"),
+					 errdetail("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_HIGH_SURROGATE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Unicode high surrogate must not follow a high surrogate."),
+					 report_json_context(lex)));
+		case JSON_UNICODE_LOW_SURROGATE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s", "json"),
+					 errdetail("Unicode low surrogate must follow a high surrogate."),
+					 report_json_context(lex)));
+		case JSON_BAD_PARSER_STATE:
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("json parser encountered an internally inconsistent state"),
+					 report_json_context(lex)));	/* TODO: Is it safe to call report_json_context here?  Perhaps the bad parser state causes problems? */
+	}
+}
+
 static Datum
 each_worker_jsonb(FunctionCallInfo fcinfo, const char *funcname, bool as_text)
 {
@@ -1833,7 +2065,7 @@ each_worker(FunctionCallInfo fcinfo, bool as_text)
 										   "json_each temporary cxt",
 										   ALLOCSET_DEFAULT_SIZES);
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	MemoryContextDelete(state->tmp_cxt);
 
@@ -2132,7 +2364,7 @@ elements_worker(FunctionCallInfo fcinfo, const char *funcname, bool as_text)
 										   "json_array_elements temporary cxt",
 										   ALLOCSET_DEFAULT_SIZES);
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	MemoryContextDelete(state->tmp_cxt);
 
@@ -2504,7 +2736,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	sem.array_element_end = populate_array_element_end;
 	sem.scalar = populate_array_scalar;
 
-	pg_parse_json(state.lex, &sem);
+	pg_parse_json_or_throw(state.lex, &sem);
 
 	/* number of dimensions should be already known */
 	Assert(ctx->ndims > 0 && ctx->dims);
@@ -3361,7 +3593,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	sem->object_field_start = hash_object_field_start;
 	sem->object_field_end = hash_object_field_end;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	return tab;
 }
@@ -3660,7 +3892,7 @@ populate_recordset_worker(FunctionCallInfo fcinfo, const char *funcname,
 
 		state->lex = lex;
 
-		pg_parse_json(lex, sem);
+		pg_parse_json_or_throw(lex, sem);
 	}
 	else
 	{
@@ -3990,7 +4222,7 @@ json_strip_nulls(PG_FUNCTION_ARGS)
 	sem->array_element_start = sn_array_element_start;
 	sem->object_field_start = sn_object_field_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(state->strval->data,
 											  state->strval->len));
@@ -5129,7 +5361,7 @@ iterate_json_values(text *json, uint32 flags, void *action_state,
 	sem->scalar = iterate_values_scalar;
 	sem->object_field_start = iterate_values_object_field_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 }
 
 /*
@@ -5249,7 +5481,7 @@ transform_json_string_values(text *json, void *action_state,
 	sem->array_element_start = transform_string_values_array_element_start;
 	sem->object_field_start = transform_string_values_object_field_start;
 
-	pg_parse_json(lex, sem);
+	pg_parse_json_or_throw(lex, sem);
 
 	return cstring_to_text_with_len(state->strval->data, state->strval->len);
 }
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index 581fd48036..162437193a 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -33,6 +33,28 @@ typedef enum
 	JSON_TOKEN_END
 } JsonTokenType;
 
+typedef enum
+{
+	JSON_SUCCESS = 0,
+	JSON_ESCAPING_INVALID,
+	JSON_ESCAPING_REQUIRED,
+	JSON_EXPECTED_ARRAY_FIRST,
+	JSON_EXPECTED_ARRAY_NEXT,
+	JSON_EXPECTED_COLON,
+	JSON_EXPECTED_END,
+	JSON_EXPECTED_JSON,
+	JSON_EXPECTED_MORE,
+	JSON_EXPECTED_OBJECT_FIRST,
+	JSON_EXPECTED_OBJECT_NEXT,
+	JSON_EXPECTED_STRING,
+	JSON_INVALID_TOKEN,
+	JSON_UNICODE_CODE_POINT_ZERO,
+	JSON_UNICODE_ESCAPE_FORMAT,
+	JSON_UNICODE_HIGH_ESCAPE,
+	JSON_UNICODE_HIGH_SURROGATE,
+	JSON_UNICODE_LOW_SURROGATE,
+	JSON_BAD_PARSER_STATE
+} JsonParseErrorType;
 
 /*
  * All the fields in this structure should be treated as read-only.
@@ -104,7 +126,13 @@ extern const JsonSemAction nullSemAction;
  * points to. If the action pointers are NULL the parser
  * does nothing and just continues.
  */
-extern void pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem);
+extern JsonParseErrorType pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+
+/*
+ * Lex one token from the input stream.
+ */
+extern JsonParseErrorType json_lex(JsonLexContext *lex) __attribute__((warn_unused_result));
+
 
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index bade7248f9..82a56eaf06 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -25,5 +25,39 @@ extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 extern text *transform_json_string_values(text *json, void *action_state,
 										  JsonTransformStringValuesAction transform_action);
 extern JsonLexContext *makeJsonLexContext(text *json, bool need_escapes);
+extern int	report_json_context(JsonLexContext *lex);
+
+extern void throw_json_parse_error(JsonParseErrorType error, JsonLexContext *lex) pg_attribute_noreturn();
+
+static inline void pg_parse_json_or_throw(JsonLexContext *lex, const JsonSemAction *sem);
+static inline void json_lex_or_throw(JsonLexContext *lex);
+
+#define PARSE_OR_THROW(x, lex) \
+do { \
+	JsonParseErrorType	parse_result; \
+	if ((parse_result = (x)) != JSON_SUCCESS) \
+		throw_json_parse_error(parse_result, (lex)); \
+} while (0)
+
+/*
+ * pg_parse_json will parse the string in the lex calling the
+ * action functions in sem at the appropriate points. It is
+ * up to them to keep what state they need	in semstate. If they
+ * need access to the state of the lexer, then its pointer
+ * should be passed to them as a member of whatever semstate
+ * points to. If the action pointers are NULL the parser
+ * does nothing and just continues.
+ */
+static inline void
+pg_parse_json_or_throw(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	PARSE_OR_THROW(pg_parse_json(lex, sem), lex);
+}
+
+static inline void
+json_lex_or_throw(JsonLexContext *lex)
+{
+	PARSE_OR_THROW(json_lex(lex), lex);
+}
 
 #endif							/* JSONFUNCS_H */
-- 
2.21.1 (Apple Git-122.3)

0010-Moving-json-parsing-into-src-common.patch.WIPapplication/octet-stream; name=0010-Moving-json-parsing-into-src-common.patch.WIP; x-unix-mode=0644Download

From caa4056cd113c3488c107ffc5086ded94fde378d Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 16:33:04 -0800
Subject: [PATCH 10/11] Moving json parsing into src/common

This commit moves the code for parsing json out of
src/backend/utils and also removes the inherent
dependency on the database encoding.
---
 src/backend/utils/adt/json.c      | 920 +----------------------------
 src/backend/utils/adt/jsonb.c     |   2 +-
 src/backend/utils/adt/jsonfuncs.c |   9 +-
 src/common/jsonapi.c              | 938 +++++++++++++++++++++++++++++-
 src/include/common/jsonapi.h      |   9 +-
 5 files changed, 951 insertions(+), 927 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 54075d07e3..1e514cb02c 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -34,31 +34,6 @@
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
-#define INSIST(x) \
-do { \
-	JsonParseErrorType	parse_result; \
-	if((parse_result = (x)) != JSON_SUCCESS) \
-		return parse_result; \
-} while (0)
-
-/*
- * The context of the parser is maintained by the recursive descent
- * mechanism, but is passed explicitly to the error reporting routine
- * for better diagnostics.
- */
-typedef enum					/* contexts of JSON parser */
-{
-	JSON_PARSE_VALUE,			/* expecting a value */
-	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
-	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
-	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
-	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
-	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
-	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
-	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
-	JSON_PARSE_END				/* saw the end of a document, expect nothing */
-} JsonParseContext;
-
 typedef enum					/* type categories for datum_to_json */
 {
 	JSONTYPE_NULL,				/* null, so we didn't bother to identify */
@@ -83,17 +58,6 @@ typedef struct JsonAggState
 	Oid			val_output_func;
 } JsonAggState;
 
-static inline JsonTokenType lex_peek(JsonLexContext *lex) __attribute__((warn_unused_result));
-static inline JsonParseErrorType lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token) __attribute__((warn_unused_result));
-static inline JsonParseErrorType json_lex_string(JsonLexContext *lex) __attribute__((warn_unused_result));
-static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
-												 bool *num_err, int *total_len) __attribute__((warn_unused_result));
-static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
-static JsonParseErrorType parse_object_field(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
-static JsonParseErrorType parse_object(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
-static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
-static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
-static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex) __attribute__((warn_unused_result));
 static void composite_to_json(Datum composite, StringInfo result,
 							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
@@ -112,81 +76,6 @@ static void add_json(Datum val, bool is_null, StringInfo result,
 					 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
 
-/* Recursive Descent parser support routines */
-
-/*
- * lex_peek
- *
- * what is the current look_ahead token?
-*/
-static inline JsonTokenType
-lex_peek(JsonLexContext *lex)
-{
-	return lex->token_type;
-}
-
-/*
- * lex_except
- *
- * move the lexer to the next token if the current look_ahead token matches
- * the parameter token. Otherwise, report an error.
- */
-static inline JsonParseErrorType
-lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
-{
-	if (lex_peek(lex) == token)
-		INSIST(json_lex(lex));
-	else
-		return report_parse_error(ctx, lex);
-	return JSON_SUCCESS;
-}
-
-/* chars to consider as part of an alphanumeric token */
-#define JSON_ALPHANUMERIC_CHAR(c)  \
-	(((c) >= 'a' && (c) <= 'z') || \
-	 ((c) >= 'A' && (c) <= 'Z') || \
-	 ((c) >= '0' && (c) <= '9') || \
-	 (c) == '_' || \
-	 IS_HIGHBIT_SET(c))
-
-/*
- * Utility function to check if a string is a valid JSON number.
- *
- * str is of length len, and need not be null-terminated.
- */
-bool
-IsValidJsonNumber(const char *str, int len)
-{
-	bool		numeric_error;
-	int			total_len;
-	JsonLexContext dummy_lex;
-
-	if (len <= 0)
-		return false;
-
-	/*
-	 * json_lex_number expects a leading  '-' to have been eaten already.
-	 *
-	 * having to cast away the constness of str is ugly, but there's not much
-	 * easy alternative.
-	 */
-	if (*str == '-')
-	{
-		dummy_lex.input = unconstify(char *, str) +1;
-		dummy_lex.input_length = len - 1;
-	}
-	else
-	{
-		dummy_lex.input = unconstify(char *, str);
-		dummy_lex.input_length = len;
-	}
-
-	if (JSON_SUCCESS != json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len))
-		return false;
-
-	return (!numeric_error) && (total_len == dummy_lex.input_length);
-}
-
 /*
  * Input.
  */
@@ -245,819 +134,12 @@ json_recv(PG_FUNCTION_ARGS)
 	str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 
 	/* Validate it. */
-	lex = makeJsonLexContextCstringLen(str, nbytes, false);
+	lex = makeJsonLexContextCstringLen(str, nbytes, GetDatabaseEncoding(), false);
 	pg_parse_json_or_throw(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
 }
 
-/*
- * pg_parse_json
- *
- * Publicly visible entry point for the JSON parser.
- *
- * lex is a lexing context, set up for the json to be processed by calling
- * makeJsonLexContext(). sem is a structure of function pointers to semantic
- * action routines to be called at appropriate spots during parsing, and a
- * pointer to a state object to be passed to those routines.
- */
-JsonParseErrorType
-pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem)
-{
-	JsonTokenType tok;
-
-	/* get the initial token */
-	INSIST(json_lex(lex));
-
-	tok = lex_peek(lex);
-
-	/* parse by recursive descent */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			INSIST(parse_object(lex, sem));
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			INSIST(parse_array(lex, sem));
-			break;
-		default:
-			INSIST(parse_scalar(lex, sem)); /* json can be a bare scalar */
-	}
-
-	INSIST(lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END));
-	return JSON_SUCCESS;
-}
-
-/*
- * json_count_array_elements
- *
- * Returns number of array elements in lex context at start of array token
- * until end of array token at same nesting level.
- *
- * Designed to be called from array_start routines.
- */
-int
-json_count_array_elements(JsonLexContext *lex)
-{
-	JsonLexContext copylex;
-	int			count;
-
-	/*
-	 * It's safe to do this with a shallow copy because the lexical routines
-	 * don't scribble on the input. They do scribble on the other pointers
-	 * etc, so doing this with a copy makes that safe.
-	 */
-	memcpy(&copylex, lex, sizeof(JsonLexContext));
-	copylex.strval = NULL;		/* not interested in values here */
-	copylex.lex_level++;
-
-	count = 0;
-	INSIST(lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START));
-	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
-	{
-		while (1)
-		{
-			count++;
-			if (JSON_SUCCESS != parse_array_element(&copylex, &nullSemAction))
-				break;
-			if (copylex.token_type != JSON_TOKEN_COMMA)
-				break;
-			INSIST(json_lex(&copylex));
-		}
-	}
-	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END));
-
-	return count;
-}
-
-/*
- *	Recursive Descent parse routines. There is one for each structural
- *	element in a json document:
- *	  - scalar (string, number, true, false, null)
- *	  - array  ( [ ] )
- *	  - array element
- *	  - object ( { } )
- *	  - object field
- */
-static inline JsonParseErrorType
-parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
-{
-	char	   *val = NULL;
-	json_scalar_action sfunc = sem->scalar;
-	JsonTokenType tok = lex_peek(lex);
-
-	/* a scalar must be a string, a number, true, false, or null */
-	switch (tok)
-	{
-		case JSON_TOKEN_STRING:
-		case JSON_TOKEN_NUMBER:
-		case JSON_TOKEN_TRUE:
-		case JSON_TOKEN_FALSE:
-		case JSON_TOKEN_NULL:
-			break;
-		default:
-			return report_parse_error(JSON_PARSE_VALUE, lex);
-	}
-	
-	/* if no semantic function, just consume the token */
-	if (sfunc == NULL)
-	{
-		INSIST(json_lex(lex));
-		return JSON_SUCCESS;
-	}
-
-	/* extract the de-escaped string value, or the raw lexeme */
-	if (lex_peek(lex) == JSON_TOKEN_STRING)
-	{
-		if (lex->strval != NULL)
-			val = pstrdup(lex->strval->data);
-	}
-	else
-	{
-		int         len = (lex->token_terminator - lex->token_start);
-	
-		val = palloc(len + 1);
-		memcpy(val, lex->token_start, len);
-		val[len] = '\0';
-    }
-
-	/* consume the token */
-	INSIST(json_lex(lex));
-
-	/* invoke the callback */
-	(*sfunc) (sem->semstate, val, tok);
-	return JSON_SUCCESS;
-}
-
-static JsonParseErrorType
-parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
-{
-	/*
-	 * An object field is "fieldname" : value where value can be a scalar,
-	 * object or array.  Note: in user-facing docs and error messages, we
-	 * generally call a field name a "key".
-	 */
-
-	char	   *fname = NULL;	/* keep compiler quiet */
-	json_ofield_action ostart = sem->object_field_start;
-	json_ofield_action oend = sem->object_field_end;
-	bool		isnull;
-	JsonTokenType tok;
-
-	if (lex_peek(lex) != JSON_TOKEN_STRING)
-		return report_parse_error(JSON_PARSE_STRING, lex);
-	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
-		fname = pstrdup(lex->strval->data);
-	INSIST(json_lex(lex));
-
-	INSIST(lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON));
-
-	tok = lex_peek(lex);
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate, fname, isnull);
-
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			INSIST(parse_object(lex, sem));
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			INSIST(parse_array(lex, sem));
-			break;
-		default:
-			INSIST(parse_scalar(lex, sem));
-	}
-
-	if (oend != NULL)
-		(*oend) (sem->semstate, fname, isnull);
-	return JSON_SUCCESS;
-}
-
-static JsonParseErrorType
-parse_object(JsonLexContext *lex, const JsonSemAction *sem)
-{
-	/*
-	 * an object is a possibly empty sequence of object fields, separated by
-	 * commas and surrounded by curly braces.
-	 */
-	json_struct_action ostart = sem->object_start;
-	json_struct_action oend = sem->object_end;
-	JsonTokenType tok;
-
-	check_stack_depth();
-
-	if (ostart != NULL)
-		(*ostart) (sem->semstate);
-
-	/*
-	 * Data inside an object is at a higher nesting level than the object
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the object start and restore it before we call the routine for the
-	 * object end.
-	 */
-	lex->lex_level++;
-
-	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
-	INSIST(json_lex(lex));
-
-	tok = lex_peek(lex);
-	switch (tok)
-	{
-		case JSON_TOKEN_STRING:
-			INSIST(parse_object_field(lex, sem));
-			while (lex_peek(lex) == JSON_TOKEN_COMMA)
-			{
-				INSIST(json_lex(lex));
-				INSIST(parse_object_field(lex, sem));
-			}
-			break;
-		case JSON_TOKEN_OBJECT_END:
-			break;
-		default:
-			/* case of an invalid initial token inside the object */
-			return report_parse_error(JSON_PARSE_OBJECT_START, lex);
-	}
-
-	INSIST(lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END));
-
-	lex->lex_level--;
-
-	if (oend != NULL)
-		(*oend) (sem->semstate);
-	return JSON_SUCCESS;
-}
-
-static JsonParseErrorType
-parse_array_element(JsonLexContext *lex, const JsonSemAction *sem)
-{
-	json_aelem_action astart = sem->array_element_start;
-	json_aelem_action aend = sem->array_element_end;
-	JsonTokenType tok = lex_peek(lex);
-
-	bool		isnull;
-
-	isnull = tok == JSON_TOKEN_NULL;
-
-	if (astart != NULL)
-		(*astart) (sem->semstate, isnull);
-
-	/* an array element is any object, array or scalar */
-	switch (tok)
-	{
-		case JSON_TOKEN_OBJECT_START:
-			INSIST(parse_object(lex, sem));
-			break;
-		case JSON_TOKEN_ARRAY_START:
-			INSIST(parse_array(lex, sem));
-			break;
-		default:
-			INSIST(parse_scalar(lex, sem));
-	}
-
-	if (aend != NULL)
-		(*aend) (sem->semstate, isnull);
-	return JSON_SUCCESS;
-}
-
-static JsonParseErrorType
-parse_array(JsonLexContext *lex, const JsonSemAction *sem)
-{
-	/*
-	 * an array is a possibly empty sequence of array elements, separated by
-	 * commas and surrounded by square brackets.
-	 */
-	json_struct_action astart = sem->array_start;
-	json_struct_action aend = sem->array_end;
-
-	check_stack_depth();
-
-	if (astart != NULL)
-		(*astart) (sem->semstate);
-
-	/*
-	 * Data inside an array is at a higher nesting level than the array
-	 * itself. Note that we increment this after we call the semantic routine
-	 * for the array start and restore it before we call the routine for the
-	 * array end.
-	 */
-	lex->lex_level++;
-
-	INSIST(lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START));
-	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
-	{
-
-		INSIST(parse_array_element(lex, sem));
-
-		while (lex_peek(lex) == JSON_TOKEN_COMMA)
-		{
-			INSIST(json_lex(lex));
-			INSIST(parse_array_element(lex, sem));
-		}
-	}
-
-	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END));
-
-	lex->lex_level--;
-
-	if (aend != NULL)
-		(*aend) (sem->semstate);
-	return JSON_SUCCESS;
-}
-
-/*
- * Lex one token from the input stream.
- */
-JsonParseErrorType
-json_lex(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-
-	/* Skip leading whitespace. */
-	s = lex->token_terminator;
-	len = s - lex->input;
-	while (len < lex->input_length &&
-		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
-	{
-		if (*s == '\n')
-			++lex->line_number;
-		++s;
-		++len;
-	}
-	lex->token_start = s;
-
-	/* Determine token type. */
-	if (len >= lex->input_length)
-	{
-		lex->token_start = NULL;
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		lex->token_type = JSON_TOKEN_END;
-	}
-	else
-		switch (*s)
-		{
-				/* Single-character token, some kind of punctuation mark. */
-			case '{':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_START;
-				break;
-			case '}':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_OBJECT_END;
-				break;
-			case '[':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_START;
-				break;
-			case ']':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_ARRAY_END;
-				break;
-			case ',':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COMMA;
-				break;
-			case ':':
-				lex->prev_token_terminator = lex->token_terminator;
-				lex->token_terminator = s + 1;
-				lex->token_type = JSON_TOKEN_COLON;
-				break;
-			case '"':
-				/* string */
-				INSIST(json_lex_string(lex));
-				lex->token_type = JSON_TOKEN_STRING;
-				break;
-			case '-':
-				/* Negative number. */
-				INSIST(json_lex_number(lex, s + 1, NULL, NULL));
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			case '0':
-			case '1':
-			case '2':
-			case '3':
-			case '4':
-			case '5':
-			case '6':
-			case '7':
-			case '8':
-			case '9':
-				/* Positive number. */
-				INSIST(json_lex_number(lex, s, NULL, NULL));
-				lex->token_type = JSON_TOKEN_NUMBER;
-				break;
-			default:
-				{
-					char	   *p;
-
-					/*
-					 * We're not dealing with a string, number, legal
-					 * punctuation mark, or end of string.  The only legal
-					 * tokens we might find here are true, false, and null,
-					 * but for error reporting purposes we scan until we see a
-					 * non-alphanumeric character.  That way, we can report
-					 * the whole word as an unexpected token, rather than just
-					 * some unintuitive prefix thereof.
-					 */
-					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
-						 /* skip */ ;
-
-					/*
-					 * We got some sort of unexpected punctuation or an
-					 * otherwise unexpected character, so just complain about
-					 * that one character.
-					 */
-					if (p == s)
-					{
-						lex->prev_token_terminator = lex->token_terminator;
-						lex->token_terminator = s + 1;
-						return JSON_INVALID_TOKEN;
-					}
-
-					/*
-					 * We've got a real alphanumeric token here.  If it
-					 * happens to be true, false, or null, all is well.  If
-					 * not, error out.
-					 */
-					lex->prev_token_terminator = lex->token_terminator;
-					lex->token_terminator = p;
-					if (p - s == 4)
-					{
-						if (memcmp(s, "true", 4) == 0)
-							lex->token_type = JSON_TOKEN_TRUE;
-						else if (memcmp(s, "null", 4) == 0)
-							lex->token_type = JSON_TOKEN_NULL;
-						else
-							return JSON_INVALID_TOKEN;
-					}
-					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
-						lex->token_type = JSON_TOKEN_FALSE;
-					else
-						return JSON_INVALID_TOKEN;
-
-				}
-		}						/* end of switch */
-	return JSON_SUCCESS;
-}
-
-/*
- * The next token in the input stream is known to be a string; lex it.
- */
-static inline JsonParseErrorType
-json_lex_string(JsonLexContext *lex)
-{
-	char	   *s;
-	int			len;
-	int			hi_surrogate = -1;
-
-	if (lex->strval != NULL)
-		resetStringInfo(lex->strval);
-
-	Assert(lex->input_length > 0);
-	s = lex->token_start;
-	len = lex->token_start - lex->input;
-	for (;;)
-	{
-		s++;
-		len++;
-		/* Premature end of the string. */
-		if (len >= lex->input_length)
-		{
-			lex->token_terminator = s;
-			return JSON_INVALID_TOKEN;
-		}
-		else if (*s == '"')
-			break;
-		else if ((unsigned char) *s < 32)
-		{
-			/* Per RFC4627, these characters MUST be escaped. */
-			/* Since *s isn't printable, exclude it from the context string */
-			lex->token_terminator = s;
-			return JSON_ESCAPING_REQUIRED;
-		}
-		else if (*s == '\\')
-		{
-			/* OK, we have an escape character. */
-			s++;
-			len++;
-			if (len >= lex->input_length)
-			{
-				lex->token_terminator = s;
-				return JSON_INVALID_TOKEN;
-			}
-			else if (*s == 'u')
-			{
-				int			i;
-				int			ch = 0;
-
-				for (i = 1; i <= 4; i++)
-				{
-					s++;
-					len++;
-					if (len >= lex->input_length)
-					{
-						lex->token_terminator = s;
-						return JSON_INVALID_TOKEN;
-					}
-					else if (*s >= '0' && *s <= '9')
-						ch = (ch * 16) + (*s - '0');
-					else if (*s >= 'a' && *s <= 'f')
-						ch = (ch * 16) + (*s - 'a') + 10;
-					else if (*s >= 'A' && *s <= 'F')
-						ch = (ch * 16) + (*s - 'A') + 10;
-					else
-					{
-						lex->token_terminator = s + pg_mblen(s);
-						return JSON_UNICODE_ESCAPE_FORMAT;
-					}
-				}
-				if (lex->strval != NULL)
-				{
-					char		utf8str[5];
-					int			utf8len;
-
-					if (ch >= 0xd800 && ch <= 0xdbff)
-					{
-						if (hi_surrogate != -1)
-							return JSON_UNICODE_HIGH_SURROGATE;
-						hi_surrogate = (ch & 0x3ff) << 10;
-						continue;
-					}
-					else if (ch >= 0xdc00 && ch <= 0xdfff)
-					{
-						if (hi_surrogate == -1)
-							return JSON_UNICODE_LOW_SURROGATE;
-						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
-						hi_surrogate = -1;
-					}
-
-					if (hi_surrogate != -1)
-						return JSON_UNICODE_LOW_SURROGATE;
-
-					/*
-					 * For UTF8, replace the escape sequence by the actual
-					 * utf8 character in lex->strval. Do this also for other
-					 * encodings if the escape designates an ASCII character,
-					 * otherwise raise an error.
-					 */
-
-					if (ch == 0)
-					{
-						/* We can't allow this, since our TEXT type doesn't */
-						return JSON_UNICODE_CODE_POINT_ZERO;
-					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
-					{
-						unicode_to_utf8(ch, (unsigned char *) utf8str);
-						utf8len = pg_utf_mblen((unsigned char *) utf8str);
-						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
-					}
-					else if (ch <= 0x007f)
-					{
-						/*
-						 * This is the only way to designate things like a
-						 * form feed character in JSON, so it's useful in all
-						 * encodings.
-						 */
-						appendStringInfoChar(lex->strval, (char) ch);
-					}
-					else
-						return JSON_UNICODE_HIGH_ESCAPE;
-
-				}
-			}
-			else if (lex->strval != NULL)
-			{
-				if (hi_surrogate != -1)
-					return JSON_UNICODE_LOW_SURROGATE;
-
-				switch (*s)
-				{
-					case '"':
-					case '\\':
-					case '/':
-						appendStringInfoChar(lex->strval, *s);
-						break;
-					case 'b':
-						appendStringInfoChar(lex->strval, '\b');
-						break;
-					case 'f':
-						appendStringInfoChar(lex->strval, '\f');
-						break;
-					case 'n':
-						appendStringInfoChar(lex->strval, '\n');
-						break;
-					case 'r':
-						appendStringInfoChar(lex->strval, '\r');
-						break;
-					case 't':
-						appendStringInfoChar(lex->strval, '\t');
-						break;
-					default:
-						/* Not a valid string escape, so error out. */
-						lex->token_terminator = s + pg_mblen(s);
-						return JSON_ESCAPING_INVALID;
-				}
-			}
-			else if (strchr("\"\\/bfnrt", *s) == NULL)
-			{
-				/*
-				 * Simpler processing if we're not bothered about de-escaping
-				 *
-				 * It's very tempting to remove the strchr() call here and
-				 * replace it with a switch statement, but testing so far has
-				 * shown it's not a performance win.
-				 */
-				lex->token_terminator = s + pg_mblen(s);
-				return JSON_ESCAPING_INVALID;
-			}
-
-		}
-		else if (lex->strval != NULL)
-		{
-			if (hi_surrogate != -1)
-				return JSON_UNICODE_LOW_SURROGATE;
-			appendStringInfoChar(lex->strval, *s);
-		}
-
-	}
-
-	if (hi_surrogate != -1)
-		return JSON_UNICODE_LOW_SURROGATE;
-
-	/* Hooray, we found the end of the string! */
-	lex->prev_token_terminator = lex->token_terminator;
-	lex->token_terminator = s + 1;
-	return JSON_SUCCESS;
-}
-
-/*
- * The next token in the input stream is known to be a number; lex it.
- *
- * In JSON, a number consists of four parts:
- *
- * (1) An optional minus sign ('-').
- *
- * (2) Either a single '0', or a string of one or more digits that does not
- *	   begin with a '0'.
- *
- * (3) An optional decimal part, consisting of a period ('.') followed by
- *	   one or more digits.  (Note: While this part can be omitted
- *	   completely, it's not OK to have only the decimal point without
- *	   any digits afterwards.)
- *
- * (4) An optional exponent part, consisting of 'e' or 'E', optionally
- *	   followed by '+' or '-', followed by one or more digits.  (Note:
- *	   As with the decimal part, if 'e' or 'E' is present, it must be
- *	   followed by at least one digit.)
- *
- * The 's' argument to this function points to the ostensible beginning
- * of part 2 - i.e. the character after any optional minus sign, or the
- * first character of the string if there is none.
- *
- * If num_err is not NULL, we return an error flag to *num_err rather than
- * raising an error for a badly-formed number.  Also, if total_len is not NULL
- * the distance from lex->input to the token end+1 is returned to *total_len.
- */
-static inline JsonParseErrorType
-json_lex_number(JsonLexContext *lex, char *s,
-				bool *num_err, int *total_len)
-{
-	bool		error = false;
-	int			len = s - lex->input;
-
-	/* Part (1): leading sign indicator. */
-	/* Caller already did this for us; so do nothing. */
-
-	/* Part (2): parse main digit string. */
-	if (len < lex->input_length && *s == '0')
-	{
-		s++;
-		len++;
-	}
-	else if (len < lex->input_length && *s >= '1' && *s <= '9')
-	{
-		do
-		{
-			s++;
-			len++;
-		} while (len < lex->input_length && *s >= '0' && *s <= '9');
-	}
-	else
-		error = true;
-
-	/* Part (3): parse optional decimal portion. */
-	if (len < lex->input_length && *s == '.')
-	{
-		s++;
-		len++;
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/* Part (4): parse optional exponent. */
-	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
-	{
-		s++;
-		len++;
-		if (len < lex->input_length && (*s == '+' || *s == '-'))
-		{
-			s++;
-			len++;
-		}
-		if (len == lex->input_length || *s < '0' || *s > '9')
-			error = true;
-		else
-		{
-			do
-			{
-				s++;
-				len++;
-			} while (len < lex->input_length && *s >= '0' && *s <= '9');
-		}
-	}
-
-	/*
-	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
-	 * here should be considered part of the token for error-reporting
-	 * purposes.
-	 */
-	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
-		error = true;
-
-	if (total_len != NULL)
-		*total_len = len;
-
-	if (num_err != NULL)
-	{
-		/* let the caller handle any error */
-		*num_err = error;
-	}
-	else
-	{
-		/* return token endpoint */
-		lex->prev_token_terminator = lex->token_terminator;
-		lex->token_terminator = s;
-		/* handle error if any */
-		if (error)
-			return JSON_INVALID_TOKEN;
-	}
-	return JSON_SUCCESS;
-}
-
-/*
- * Report a parse error.
- *
- * lex->token_start and lex->token_terminator must identify the current token.
- */
-static JsonParseErrorType
-report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
-{
-	/* Handle case where the input ended prematurely. */
-	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
-		return JSON_EXPECTED_MORE;
-
-	switch (ctx)
-	{
-		case JSON_PARSE_END:
-			return JSON_EXPECTED_END;
-		case JSON_PARSE_VALUE:
-			return JSON_EXPECTED_JSON;
-		case JSON_PARSE_STRING:
-			return JSON_EXPECTED_STRING;
-		case JSON_PARSE_ARRAY_START:
-			return JSON_EXPECTED_ARRAY_FIRST;
-		case JSON_PARSE_ARRAY_NEXT:
-			return JSON_EXPECTED_ARRAY_NEXT;
-		case JSON_PARSE_OBJECT_START:
-			return JSON_EXPECTED_OBJECT_FIRST;
-		case JSON_PARSE_OBJECT_LABEL:
-			return JSON_EXPECTED_COLON;
-		case JSON_PARSE_OBJECT_NEXT:
-			return JSON_EXPECTED_OBJECT_NEXT;
-		case JSON_PARSE_OBJECT_COMMA:
-			return JSON_EXPECTED_STRING;
-		default:
-			return JSON_BAD_PARSER_STATE;;
-	}
-}
-
 /*
  * Determine how we want to print values of a given type in datum_to_json.
  *
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 63072f616e..a5f2f3eeca 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -261,7 +261,7 @@ jsonb_from_cstring(char *json, int len)
 
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
-	lex = makeJsonLexContextCstringLen(json, len, true);
+	lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 
 	sem.semstate = (void *) &state;
 
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index be5d30239d..bb22fd23ea 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -521,6 +521,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 {
 	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
 										VARSIZE_ANY_EXHDR(json),
+										GetDatabaseEncoding(),
 										need_escapes);
 }
 
@@ -1260,7 +1261,9 @@ get_array_start(void *state)
 			_state->path_indexes[lex_level] != INT_MIN)
 		{
 			/* Negative subscript -- convert to positive-wise subscript */
-			int			nelements = json_count_array_elements(_state->lex);
+			int			nelements;
+
+			PARSE_OR_THROW(json_count_array_elements(_state->lex, &nelements), _state->lex);
 
 			if (-_state->path_indexes[lex_level] <= nelements)
 				_state->path_indexes[lex_level] += nelements;
@@ -2725,7 +2728,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	PopulateArrayState state;
 	JsonSemAction sem;
 
-	state.lex = makeJsonLexContextCstringLen(json, len, true);
+	state.lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	state.ctx = ctx;
 
 	memset(&sem, 0, sizeof(sem));
@@ -3568,7 +3571,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	HASHCTL		ctl;
 	HTAB	   *tab;
 	JHashState *state;
-	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, true);
+	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	JsonSemAction *sem;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index c14d6ff4f2..585d4842b2 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -10,14 +10,44 @@
  *
  *-------------------------------------------------------------------------
  */
-#include "c.h"
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
+#include "c.h"
 #include "common/jsonapi.h"
 
 #ifndef FRONTEND
-#include "utils/palloc.h"
+#include "miscadmin.h"
 #endif
 
+#define INSIST(x) \
+do { \
+	JsonParseErrorType	parse_result; \
+	if((parse_result = (x)) != JSON_SUCCESS) \
+		return parse_result; \
+} while (0)
+
+/*
+ * The context of the parser is maintained by the recursive descent
+ * mechanism, but is passed explicitly to the error reporting routine
+ * for better diagnostics.
+ */
+typedef enum					/* contexts of JSON parser */
+{
+	JSON_PARSE_VALUE,			/* expecting a value */
+	JSON_PARSE_STRING,			/* expecting a string (for a field name) */
+	JSON_PARSE_ARRAY_START,		/* saw '[', expecting value or ']' */
+	JSON_PARSE_ARRAY_NEXT,		/* saw array element, expecting ',' or ']' */
+	JSON_PARSE_OBJECT_START,	/* saw '{', expecting label or '}' */
+	JSON_PARSE_OBJECT_LABEL,	/* saw object label, expecting ':' */
+	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
+	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
+	JSON_PARSE_END				/* saw the end of a document, expect nothing */
+} JsonParseContext;
+
 /* the null action object used for pure validation */
 const JsonSemAction nullSemAction =
 {
@@ -25,8 +55,19 @@ const JsonSemAction nullSemAction =
 	NULL, NULL, NULL, NULL, NULL
 };
 
+static inline JsonParseErrorType lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token) __attribute__((warn_unused_result));
+static inline JsonParseErrorType json_lex_string(JsonLexContext *lex) __attribute__((warn_unused_result));
+static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
+												 bool *num_err, int *total_len) __attribute__((warn_unused_result));
+static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_object_field(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_object(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem) __attribute__((warn_unused_result));
+static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex) __attribute__((warn_unused_result));
+
 JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+makeJsonLexContextCstringLen(char *json, int len, pg_enc encoding, bool need_escapes)
 {
 	JsonLexContext *lex;
 
@@ -40,9 +81,900 @@ makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->line_number = 1;
 	lex->input_length = len;
+	lex->input_encoding = encoding;
 	if (need_escapes)
 		lex->strval = makeStringInfo();
 	return lex;
 }
 
+/*
+ * lex_peek
+ *
+ * what is the current look_ahead token?
+*/
+JsonTokenType
+lex_peek(JsonLexContext *lex)
+{
+	return lex->token_type;
+}
+
+/*
+ * lex_except
+ *
+ * move the lexer to the next token if the current look_ahead token matches
+ * the parameter token. Otherwise, report an error.
+ */
+static inline JsonParseErrorType
+lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
+{
+	if (lex_peek(lex) == token)
+		INSIST(json_lex(lex));
+	else
+		return report_parse_error(ctx, lex);
+	return JSON_SUCCESS;
+}
+
+/* chars to consider as part of an alphanumeric token */
+#define JSON_ALPHANUMERIC_CHAR(c)  \
+	(((c) >= 'a' && (c) <= 'z') || \
+	 ((c) >= 'A' && (c) <= 'Z') || \
+	 ((c) >= '0' && (c) <= '9') || \
+	 (c) == '_' || \
+	 IS_HIGHBIT_SET(c))
+
+/*
+ * Utility function to check if a string is a valid JSON number.
+ *
+ * str is of length len, and need not be null-terminated.
+ */
+bool
+IsValidJsonNumber(const char *str, int len)
+{
+	bool		numeric_error;
+	int			total_len;
+	JsonLexContext dummy_lex;
+
+	if (len <= 0)
+		return false;
+
+	/*
+	 * json_lex_number expects a leading  '-' to have been eaten already.
+	 *
+	 * having to cast away the constness of str is ugly, but there's not much
+	 * easy alternative.
+	 */
+	if (*str == '-')
+	{
+		dummy_lex.input = unconstify(char *, str) +1;
+		dummy_lex.input_length = len - 1;
+	}
+	else
+	{
+		dummy_lex.input = unconstify(char *, str);
+		dummy_lex.input_length = len;
+	}
+
+	if (JSON_SUCCESS != json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len))
+		return false;
+
+	return (!numeric_error) && (total_len == dummy_lex.input_length);
+}
+
+/* Recursive Descent parser support routines */
+
+/*
+ * pg_parse_json
+ *
+ * Publicly visible entry point for the JSON parser.
+ *
+ * lex is a lexing context, set up for the json to be processed by calling
+ * makeJsonLexContext(). sem is a structure of function pointers to semantic
+ * action routines to be called at appropriate spots during parsing, and a
+ * pointer to a state object to be passed to those routines.
+ */
+JsonParseErrorType
+pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	JsonTokenType tok;
+
+	/* get the initial token */
+	INSIST(json_lex(lex));
+
+	tok = lex_peek(lex);
+
+	/* parse by recursive descent */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			INSIST(parse_object(lex, sem));
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			INSIST(parse_array(lex, sem));
+			break;
+		default:
+			INSIST(parse_scalar(lex, sem)); /* json can be a bare scalar */
+	}
+
+	INSIST(lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END));
+	return JSON_SUCCESS;
+}
+
+/*
+ * json_count_array_elements
+ *
+ * Returns number of array elements in lex context at start of array token
+ * until end of array token at same nesting level.
+ *
+ * Designed to be called from array_start routines.
+ */
+JsonParseErrorType
+json_count_array_elements(JsonLexContext *lex, int *elements)
+{
+	JsonLexContext copylex;
+	int			count;
+
+	/*
+	 * It's safe to do this with a shallow copy because the lexical routines
+	 * don't scribble on the input. They do scribble on the other pointers
+	 * etc, so doing this with a copy makes that safe.
+	 */
+	memcpy(&copylex, lex, sizeof(JsonLexContext));
+	copylex.strval = NULL;		/* not interested in values here */
+	copylex.lex_level++;
+
+	count = 0;
+	INSIST(lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START));
+	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
+	{
+		while (1)
+		{
+			count++;
+			if (JSON_SUCCESS != parse_array_element(&copylex, &nullSemAction))
+				break;
+			if (copylex.token_type != JSON_TOKEN_COMMA)
+				break;
+			INSIST(json_lex(&copylex));
+		}
+	}
+	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END));
+
+	*elements = count;
+	return JSON_SUCCESS;
+}
+
+/*
+ *	Recursive Descent parse routines. There is one for each structural
+ *	element in a json document:
+ *	  - scalar (string, number, true, false, null)
+ *	  - array  ( [ ] )
+ *	  - array element
+ *	  - object ( { } )
+ *	  - object field
+ */
+static inline JsonParseErrorType
+parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	char	   *val = NULL;
+	json_scalar_action sfunc = sem->scalar;
+	JsonTokenType tok = lex_peek(lex);
+
+	/* a scalar must be a string, a number, true, false, or null */
+	switch (tok)
+	{
+		case JSON_TOKEN_STRING:
+		case JSON_TOKEN_NUMBER:
+		case JSON_TOKEN_TRUE:
+		case JSON_TOKEN_FALSE:
+		case JSON_TOKEN_NULL:
+			break;
+		default:
+			return report_parse_error(JSON_PARSE_VALUE, lex);
+	}
+
+	/* if no semantic function, just consume the token */
+	if (sfunc == NULL)
+	{
+		INSIST(json_lex(lex));
+		return JSON_SUCCESS;
+	}
+
+	/* extract the de-escaped string value, or the raw lexeme */
+	if (lex_peek(lex) == JSON_TOKEN_STRING)
+	{
+		if (lex->strval != NULL)
+			val = pstrdup(lex->strval->data);
+	}
+	else
+	{
+		int		len = (lex->token_terminator - lex->token_start);
+
+		val = palloc(len + 1);
+		memcpy(val, lex->token_start, len);
+		val[len] = '\0';
+	}
+
+	/* consume the token */
+	INSIST(json_lex(lex));
+
+	/* invoke the callback */
+	(*sfunc) (sem->semstate, val, tok);
+	return JSON_SUCCESS;
+}
+
+static JsonParseErrorType
+parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	/*
+	 * An object field is "fieldname" : value where value can be a scalar,
+	 * object or array.  Note: in user-facing docs and error messages, we
+	 * generally call a field name a "key".
+	 */
+
+	char	   *fname = NULL;	/* keep compiler quiet */
+	json_ofield_action ostart = sem->object_field_start;
+	json_ofield_action oend = sem->object_field_end;
+	bool		isnull;
+	JsonTokenType tok;
+
+	if (lex_peek(lex) != JSON_TOKEN_STRING)
+		return report_parse_error(JSON_PARSE_STRING, lex);
+	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
+		fname = pstrdup(lex->strval->data);
+	INSIST(json_lex(lex));
+
+	INSIST(lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON));
+
+	tok = lex_peek(lex);
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate, fname, isnull);
+
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			INSIST(parse_object(lex, sem));
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			INSIST(parse_array(lex, sem));
+			break;
+		default:
+			INSIST(parse_scalar(lex, sem));
+	}
+
+	if (oend != NULL)
+		(*oend) (sem->semstate, fname, isnull);
+	return JSON_SUCCESS;
+}
+
+static JsonParseErrorType
+parse_object(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	/*
+	 * an object is a possibly empty sequence of object fields, separated by
+	 * commas and surrounded by curly braces.
+	 */
+	json_struct_action ostart = sem->object_start;
+	json_struct_action oend = sem->object_end;
+	JsonTokenType tok;
+
+#ifndef FRONTEND
+	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
+
+	if (ostart != NULL)
+		(*ostart) (sem->semstate);
+
+	/*
+	 * Data inside an object is at a higher nesting level than the object
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the object start and restore it before we call the routine for the
+	 * object end.
+	 */
+	lex->lex_level++;
+
+	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
+	INSIST(json_lex(lex));
+
+	tok = lex_peek(lex);
+	switch (tok)
+	{
+		case JSON_TOKEN_STRING:
+			INSIST(parse_object_field(lex, sem));
+			while (lex_peek(lex) == JSON_TOKEN_COMMA)
+			{
+				INSIST(json_lex(lex));
+				INSIST(parse_object_field(lex, sem));
+			}
+			break;
+		case JSON_TOKEN_OBJECT_END:
+			break;
+		default:
+			/* case of an invalid initial token inside the object */
+			return report_parse_error(JSON_PARSE_OBJECT_START, lex);
+	}
+
+	INSIST(lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END));
+
+	lex->lex_level--;
+
+	if (oend != NULL)
+		(*oend) (sem->semstate);
+	return JSON_SUCCESS;
+}
+
+static JsonParseErrorType
+parse_array_element(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	json_aelem_action astart = sem->array_element_start;
+	json_aelem_action aend = sem->array_element_end;
+	JsonTokenType tok = lex_peek(lex);
+
+	bool		isnull;
+
+	isnull = tok == JSON_TOKEN_NULL;
+
+	if (astart != NULL)
+		(*astart) (sem->semstate, isnull);
+
+	/* an array element is any object, array or scalar */
+	switch (tok)
+	{
+		case JSON_TOKEN_OBJECT_START:
+			INSIST(parse_object(lex, sem));
+			break;
+		case JSON_TOKEN_ARRAY_START:
+			INSIST(parse_array(lex, sem));
+			break;
+		default:
+			INSIST(parse_scalar(lex, sem));
+	}
+
+	if (aend != NULL)
+		(*aend) (sem->semstate, isnull);
+	return JSON_SUCCESS;
+}
+
+static JsonParseErrorType
+parse_array(JsonLexContext *lex, const JsonSemAction *sem)
+{
+	/*
+	 * an array is a possibly empty sequence of array elements, separated by
+	 * commas and surrounded by square brackets.
+	 */
+	json_struct_action astart = sem->array_start;
+	json_struct_action aend = sem->array_end;
+
+#ifndef FRONTEND
+	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
+
+	if (astart != NULL)
+		(*astart) (sem->semstate);
+
+	/*
+	 * Data inside an array is at a higher nesting level than the array
+	 * itself. Note that we increment this after we call the semantic routine
+	 * for the array start and restore it before we call the routine for the
+	 * array end.
+	 */
+	lex->lex_level++;
+
+	INSIST(lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START));
+	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	{
+
+		INSIST(parse_array_element(lex, sem));
+
+		while (lex_peek(lex) == JSON_TOKEN_COMMA)
+		{
+			INSIST(json_lex(lex));
+			INSIST(parse_array_element(lex, sem));
+		}
+	}
+
+	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END));
+
+	lex->lex_level--;
+
+	if (aend != NULL)
+		(*aend) (sem->semstate);
+	return JSON_SUCCESS;
+}
+
+/*
+ * Lex one token from the input stream.
+ */
+JsonParseErrorType
+json_lex(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+
+	/* Skip leading whitespace. */
+	s = lex->token_terminator;
+	len = s - lex->input;
+	while (len < lex->input_length &&
+		   (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
+	{
+		if (*s == '\n')
+			++lex->line_number;
+		++s;
+		++len;
+	}
+	lex->token_start = s;
+
+	/* Determine token type. */
+	if (len >= lex->input_length)
+	{
+		lex->token_start = NULL;
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		lex->token_type = JSON_TOKEN_END;
+	}
+	else
+		switch (*s)
+		{
+				/* Single-character token, some kind of punctuation mark. */
+			case '{':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_START;
+				break;
+			case '}':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_OBJECT_END;
+				break;
+			case '[':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_START;
+				break;
+			case ']':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_ARRAY_END;
+				break;
+			case ',':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COMMA;
+				break;
+			case ':':
+				lex->prev_token_terminator = lex->token_terminator;
+				lex->token_terminator = s + 1;
+				lex->token_type = JSON_TOKEN_COLON;
+				break;
+			case '"':
+				/* string */
+				INSIST(json_lex_string(lex));
+				lex->token_type = JSON_TOKEN_STRING;
+				break;
+			case '-':
+				/* Negative number. */
+				INSIST(json_lex_number(lex, s + 1, NULL, NULL));
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			case '0':
+			case '1':
+			case '2':
+			case '3':
+			case '4':
+			case '5':
+			case '6':
+			case '7':
+			case '8':
+			case '9':
+				/* Positive number. */
+				INSIST(json_lex_number(lex, s, NULL, NULL));
+				lex->token_type = JSON_TOKEN_NUMBER;
+				break;
+			default:
+				{
+					char	   *p;
+
+					/*
+					 * We're not dealing with a string, number, legal
+					 * punctuation mark, or end of string.  The only legal
+					 * tokens we might find here are true, false, and null,
+					 * but for error reporting purposes we scan until we see a
+					 * non-alphanumeric character.  That way, we can report
+					 * the whole word as an unexpected token, rather than just
+					 * some unintuitive prefix thereof.
+					 */
+					for (p = s; p - s < lex->input_length - len && JSON_ALPHANUMERIC_CHAR(*p); p++)
+						 /* skip */ ;
+
+					/*
+					 * We got some sort of unexpected punctuation or an
+					 * otherwise unexpected character, so just complain about
+					 * that one character.
+					 */
+					if (p == s)
+					{
+						lex->prev_token_terminator = lex->token_terminator;
+						lex->token_terminator = s + 1;
+						return JSON_INVALID_TOKEN;
+					}
+
+					/*
+					 * We've got a real alphanumeric token here.  If it
+					 * happens to be true, false, or null, all is well.  If
+					 * not, error out.
+					 */
+					lex->prev_token_terminator = lex->token_terminator;
+					lex->token_terminator = p;
+					if (p - s == 4)
+					{
+						if (memcmp(s, "true", 4) == 0)
+							lex->token_type = JSON_TOKEN_TRUE;
+						else if (memcmp(s, "null", 4) == 0)
+							lex->token_type = JSON_TOKEN_NULL;
+						else
+							return JSON_INVALID_TOKEN;
+					}
+					else if (p - s == 5 && memcmp(s, "false", 5) == 0)
+						lex->token_type = JSON_TOKEN_FALSE;
+					else
+						return JSON_INVALID_TOKEN;
+
+				}
+		}						/* end of switch */
+	return JSON_SUCCESS;
+}
+
+/*
+ * The next token in the input stream is known to be a string; lex it.
+ */
+static inline JsonParseErrorType
+json_lex_string(JsonLexContext *lex)
+{
+	char	   *s;
+	int			len;
+	int			hi_surrogate = -1;
+
+	if (lex->strval != NULL)
+		resetStringInfo(lex->strval);
+
+	Assert(lex->input_length > 0);
+	s = lex->token_start;
+	len = lex->token_start - lex->input;
+	for (;;)
+	{
+		s++;
+		len++;
+		/* Premature end of the string. */
+		if (len >= lex->input_length)
+		{
+			lex->token_terminator = s;
+			return JSON_INVALID_TOKEN;
+		}
+		else if (*s == '"')
+			break;
+		else if ((unsigned char) *s < 32)
+		{
+			/* Per RFC4627, these characters MUST be escaped. */
+			/* Since *s isn't printable, exclude it from the context string */
+			lex->token_terminator = s;
+			return JSON_ESCAPING_REQUIRED;
+		}
+		else if (*s == '\\')
+		{
+			/* OK, we have an escape character. */
+			s++;
+			len++;
+			if (len >= lex->input_length)
+			{
+				lex->token_terminator = s;
+				return JSON_INVALID_TOKEN;
+			}
+			else if (*s == 'u')
+			{
+				int			i;
+				int			ch = 0;
+
+				for (i = 1; i <= 4; i++)
+				{
+					s++;
+					len++;
+					if (len >= lex->input_length)
+					{
+						lex->token_terminator = s;
+						return JSON_INVALID_TOKEN;
+					}
+					else if (*s >= '0' && *s <= '9')
+						ch = (ch * 16) + (*s - '0');
+					else if (*s >= 'a' && *s <= 'f')
+						ch = (ch * 16) + (*s - 'a') + 10;
+					else if (*s >= 'A' && *s <= 'F')
+						ch = (ch * 16) + (*s - 'A') + 10;
+					else
+					{
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
+						return JSON_UNICODE_ESCAPE_FORMAT;
+					}
+				}
+				if (lex->strval != NULL)
+				{
+					char		utf8str[5];
+					int			utf8len;
+
+					if (ch >= 0xd800 && ch <= 0xdbff)
+					{
+						if (hi_surrogate != -1)
+							return JSON_UNICODE_HIGH_SURROGATE;
+						hi_surrogate = (ch & 0x3ff) << 10;
+						continue;
+					}
+					else if (ch >= 0xdc00 && ch <= 0xdfff)
+					{
+						if (hi_surrogate == -1)
+							return JSON_UNICODE_LOW_SURROGATE;
+						ch = 0x10000 + hi_surrogate + (ch & 0x3ff);
+						hi_surrogate = -1;
+					}
+
+					if (hi_surrogate != -1)
+						return JSON_UNICODE_LOW_SURROGATE;
+
+					/*
+					 * For UTF8, replace the escape sequence by the actual
+					 * utf8 character in lex->strval. Do this also for other
+					 * encodings if the escape designates an ASCII character,
+					 * otherwise raise an error.
+					 */
+
+					if (ch == 0)
+					{
+						/* We can't allow this, since our TEXT type doesn't */
+						return JSON_UNICODE_CODE_POINT_ZERO;
+					}
+					else if (lex->input_encoding == PG_UTF8)
+					{
+						unicode_to_utf8(ch, (unsigned char *) utf8str);
+						utf8len = pg_utf_mblen((unsigned char *) utf8str);
+						appendBinaryStringInfo(lex->strval, utf8str, utf8len);
+					}
+					else if (ch <= 0x007f)
+					{
+						/*
+						 * This is the only way to designate things like a
+						 * form feed character in JSON, so it's useful in all
+						 * encodings.
+						 */
+						appendStringInfoChar(lex->strval, (char) ch);
+					}
+					else
+						return JSON_UNICODE_HIGH_ESCAPE;
+
+				}
+			}
+			else if (lex->strval != NULL)
+			{
+				if (hi_surrogate != -1)
+					return JSON_UNICODE_LOW_SURROGATE;
+
+				switch (*s)
+				{
+					case '"':
+					case '\\':
+					case '/':
+						appendStringInfoChar(lex->strval, *s);
+						break;
+					case 'b':
+						appendStringInfoChar(lex->strval, '\b');
+						break;
+					case 'f':
+						appendStringInfoChar(lex->strval, '\f');
+						break;
+					case 'n':
+						appendStringInfoChar(lex->strval, '\n');
+						break;
+					case 'r':
+						appendStringInfoChar(lex->strval, '\r');
+						break;
+					case 't':
+						appendStringInfoChar(lex->strval, '\t');
+						break;
+					default:
+						/* Not a valid string escape, so error out. */
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
+						return JSON_ESCAPING_INVALID;
+				}
+			}
+			else if (strchr("\"\\/bfnrt", *s) == NULL)
+			{
+				/*
+				 * Simpler processing if we're not bothered about de-escaping
+				 *
+				 * It's very tempting to remove the strchr() call here and
+				 * replace it with a switch statement, but testing so far has
+				 * shown it's not a performance win.
+				 */
+				lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
+				return JSON_ESCAPING_INVALID;
+			}
+
+		}
+		else if (lex->strval != NULL)
+		{
+			if (hi_surrogate != -1)
+				return JSON_UNICODE_LOW_SURROGATE;
+			appendStringInfoChar(lex->strval, *s);
+		}
+
+	}
+
+	if (hi_surrogate != -1)
+		return JSON_UNICODE_LOW_SURROGATE;
+
+	/* Hooray, we found the end of the string! */
+	lex->prev_token_terminator = lex->token_terminator;
+	lex->token_terminator = s + 1;
+	return JSON_SUCCESS;
+}
+
+/*
+ * The next token in the input stream is known to be a number; lex it.
+ *
+ * In JSON, a number consists of four parts:
+ *
+ * (1) An optional minus sign ('-').
+ *
+ * (2) Either a single '0', or a string of one or more digits that does not
+ *	   begin with a '0'.
+ *
+ * (3) An optional decimal part, consisting of a period ('.') followed by
+ *	   one or more digits.  (Note: While this part can be omitted
+ *	   completely, it's not OK to have only the decimal point without
+ *	   any digits afterwards.)
+ *
+ * (4) An optional exponent part, consisting of 'e' or 'E', optionally
+ *	   followed by '+' or '-', followed by one or more digits.  (Note:
+ *	   As with the decimal part, if 'e' or 'E' is present, it must be
+ *	   followed by at least one digit.)
+ *
+ * The 's' argument to this function points to the ostensible beginning
+ * of part 2 - i.e. the character after any optional minus sign, or the
+ * first character of the string if there is none.
+ *
+ * If num_err is not NULL, we return an error flag to *num_err rather than
+ * raising an error for a badly-formed number.  Also, if total_len is not NULL
+ * the distance from lex->input to the token end+1 is returned to *total_len.
+ */
+static inline JsonParseErrorType
+json_lex_number(JsonLexContext *lex, char *s,
+				bool *num_err, int *total_len)
+{
+	bool		error = false;
+	int			len = s - lex->input;
+
+	/* Part (1): leading sign indicator. */
+	/* Caller already did this for us; so do nothing. */
+
+	/* Part (2): parse main digit string. */
+	if (len < lex->input_length && *s == '0')
+	{
+		s++;
+		len++;
+	}
+	else if (len < lex->input_length && *s >= '1' && *s <= '9')
+	{
+		do
+		{
+			s++;
+			len++;
+		} while (len < lex->input_length && *s >= '0' && *s <= '9');
+	}
+	else
+		error = true;
+
+	/* Part (3): parse optional decimal portion. */
+	if (len < lex->input_length && *s == '.')
+	{
+		s++;
+		len++;
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/* Part (4): parse optional exponent. */
+	if (len < lex->input_length && (*s == 'e' || *s == 'E'))
+	{
+		s++;
+		len++;
+		if (len < lex->input_length && (*s == '+' || *s == '-'))
+		{
+			s++;
+			len++;
+		}
+		if (len == lex->input_length || *s < '0' || *s > '9')
+			error = true;
+		else
+		{
+			do
+			{
+				s++;
+				len++;
+			} while (len < lex->input_length && *s >= '0' && *s <= '9');
+		}
+	}
+
+	/*
+	 * Check for trailing garbage.  As in json_lex(), any alphanumeric stuff
+	 * here should be considered part of the token for error-reporting
+	 * purposes.
+	 */
+	for (; len < lex->input_length && JSON_ALPHANUMERIC_CHAR(*s); s++, len++)
+		error = true;
+
+	if (total_len != NULL)
+		*total_len = len;
+
+	if (num_err != NULL)
+	{
+		/* let the caller handle any error */
+		*num_err = error;
+	}
+	else
+	{
+		/* return token endpoint */
+		lex->prev_token_terminator = lex->token_terminator;
+		lex->token_terminator = s;
+		/* handle error if any */
+		if (error)
+			return JSON_INVALID_TOKEN;
+	}
+	return JSON_SUCCESS;
+}
+
+/*
+ * Report a parse error.
+ *
+ * lex->token_start and lex->token_terminator must identify the current token.
+ */
+static JsonParseErrorType
+report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
+{
+	/* Handle case where the input ended prematurely. */
+	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
+		return JSON_EXPECTED_MORE;
+
+	switch (ctx)
+	{
+		case JSON_PARSE_END:
+			return JSON_EXPECTED_END;
+		case JSON_PARSE_VALUE:
+			return JSON_EXPECTED_JSON;
+		case JSON_PARSE_STRING:
+			return JSON_EXPECTED_STRING;
+		case JSON_PARSE_ARRAY_START:
+			return JSON_EXPECTED_ARRAY_FIRST;
+		case JSON_PARSE_ARRAY_NEXT:
+			return JSON_EXPECTED_ARRAY_NEXT;
+		case JSON_PARSE_OBJECT_START:
+			return JSON_EXPECTED_OBJECT_FIRST;
+		case JSON_PARSE_OBJECT_LABEL:
+			return JSON_EXPECTED_COLON;
+		case JSON_PARSE_OBJECT_NEXT:
+			return JSON_EXPECTED_OBJECT_NEXT;
+		case JSON_PARSE_OBJECT_COMMA:
+			return JSON_EXPECTED_STRING;
+		default:
+			return JSON_BAD_PARSER_STATE;;
+	}
+}
 
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index 162437193a..614764bcca 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -14,6 +14,7 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
+#include "common/pg_wchar.h"
 #include "lib/stringinfo.h"
 
 typedef enum
@@ -73,6 +74,7 @@ typedef struct JsonLexContext
 {
 	char	   *input;
 	int			input_length;
+	pg_enc		input_encoding;
 	char	   *token_start;
 	char	   *token_terminator;
 	char	   *prev_token_terminator;
@@ -133,13 +135,17 @@ extern JsonParseErrorType pg_parse_json(JsonLexContext *lex, const JsonSemAction
  */
 extern JsonParseErrorType json_lex(JsonLexContext *lex) __attribute__((warn_unused_result));
 
+/*
+ * Get the current look_ahead token.
+*/
+extern JsonTokenType lex_peek(JsonLexContext *lex) __attribute__((warn_unused_result));
 
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
  * number of elements in passed array lex context. It should be called from an
  * array_start action.
  */
-extern int	json_count_array_elements(JsonLexContext *lex);
+extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex, int *elements);
 
 /*
  * constructors for JsonLexContext, with or without strval element.
@@ -152,6 +158,7 @@ extern int	json_count_array_elements(JsonLexContext *lex);
  */
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
+													pg_enc encoding,
 													bool need_escapes);
 
 /*
-- 
2.21.1 (Apple Git-122.3)

0011-Adding-src-bin-pg_test_json.patch.WIPapplication/octet-stream; name=0011-Adding-src-bin-pg_test_json.patch.WIP; x-unix-mode=0644Download

From 3d27738e4e5ba12c45322d7d66dd33b8b89a06c6 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 16:59:52 -0800
Subject: [PATCH 11/11] Adding src/bin/pg_test_json

This is a command line tool for testing whether
a string is valid json.  It is not that useful
in itself, but it proves that the json parser in
src/common can be used from frontend code.
---
 src/bin/Makefile                    |  1 +
 src/bin/pg_test_json/Makefile       | 46 +++++++++++++++++
 src/bin/pg_test_json/nls.mk         |  4 ++
 src/bin/pg_test_json/pg_test_json.c | 80 +++++++++++++++++++++++++++++
 4 files changed, 131 insertions(+)
 create mode 100644 src/bin/pg_test_json/Makefile
 create mode 100644 src/bin/pg_test_json/nls.mk
 create mode 100644 src/bin/pg_test_json/pg_test_json.c

diff --git a/src/bin/Makefile b/src/bin/Makefile
index 7f4120a34f..7a9a7980b4 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -25,6 +25,7 @@ SUBDIRS = \
 	pg_resetwal \
 	pg_rewind \
 	pg_test_fsync \
+	pg_test_json \
 	pg_test_timing \
 	pg_upgrade \
 	pg_waldump \
diff --git a/src/bin/pg_test_json/Makefile b/src/bin/pg_test_json/Makefile
new file mode 100644
index 0000000000..3e0458db99
--- /dev/null
+++ b/src/bin/pg_test_json/Makefile
@@ -0,0 +1,46 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_test_json
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/pg_test_json/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_test_json - the PostgreSQL interactive json syntax tester"
+PGAPPICON=win32
+
+subdir = src/bin/pg_test_json
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	pg_test_json.o
+
+all: pg_test_json
+
+pg_test_json: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_test_json$(X) '$(DESTDIR)$(bindir)/pg_test_json$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_test_json$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_test_json$(X) $(OBJS)
diff --git a/src/bin/pg_test_json/nls.mk b/src/bin/pg_test_json/nls.mk
new file mode 100644
index 0000000000..72de0728db
--- /dev/null
+++ b/src/bin/pg_test_json/nls.mk
@@ -0,0 +1,4 @@
+# src/bin/pg_test_json/nls.mk
+CATALOG_NAME     = pg_test_json
+AVAIL_LANGUAGES  =
+GETTEXT_FILES    = pg_test_json.c
diff --git a/src/bin/pg_test_json/pg_test_json.c b/src/bin/pg_test_json/pg_test_json.c
new file mode 100644
index 0000000000..dd8bebafb9
--- /dev/null
+++ b/src/bin/pg_test_json/pg_test_json.c
@@ -0,0 +1,80 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void handle_args(int argc, char *argv[]);
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
+	progname = get_progname(argv[0]);
+
+	handle_args(argc, argv);
+
+	return 0;
+}
+
+static void
+handle_args(int argc, char *argv[])
+{
+	int			argidx;			/* Command line argument position */
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	unsigned int json_len;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	json_len = (unsigned int) strlen(str);
+	client_encoding = PQenv2encoding();
+
+#if 0
+	fprintf(stdout, _("%s: preparing for parse of string of length %u....\n"),
+				progname, json_len);
+#endif
+	json = strdup(str);
+#if 0
+	fprintf(stdout, _("%s: duplicated string of length %u.\n"),
+				progname, json_len);
+#endif
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+#if 0
+	fprintf(stdout, _("%s: constructed JsonLexContext for utf8 json string of length %u.\n"),
+				progname, json_len);
+#endif
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	fprintf(stdout, _("%s: %s: %s\n"),
+					progname, str, (JSON_SUCCESS == parse_result ? "VALID" : "INVALID"));
+	return;
+}
-- 
2.21.1 (Apple Git-122.3)

#32

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mark Dilger (#31)

Re: making the backend's json parser work in frontend code

On Wed, Jan 22, 2020 at 10:00 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Hopefully, this addresses Robert’s concern upthread about the filesystem name not necessarily being in utf8 format, though I might be misunderstanding the exact thrust of his concern. I can think of other possible interpretations of his concern as he expressed it, so I’ll wait for him to clarify.

No, that's not it. Suppose that Álvaro Herrera has some custom
settings he likes to put on all the PostgreSQL clusters that he uses,
so he creates a file álvaro.conf and uses an "include" directive in
postgresql.conf to suck in those settings. If he also likes UTF-8,
then the file name will be stored in the file system as a 12-byte
value of which the first two bytes will be 0xc3 0xa1. In that case,
everything will be fine, because JSON is supposed to always be UTF-8,
and the file name is UTF-8, and it's all good. But suppose he instead
likes LATIN-1. Then the file name will be stored as an 11-byte value
and the first byte will be 0xe1. The second byte, representing a
lower-case 'l', will be 0x6c. But we can't put a byte sequence that
goes 0xe1 0x6c into a JSON manifest stored as UTF-8, because that's
not valid in UTF-8. UTF-8 requires that every byte from 0xc0-0xff be
followed by one or more bytes in the range 0x80-0xbf, and our
hypothetical file name that starts with 0xe1 0x6c does not meet that
criteria.

Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had *any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid UTF-8.

The degree to which there is a practical problem here is limited by
the fact that most filenames within the data directory are chosen by
the system, e.g. base/16384/16385, and those file names are only going
to contain ASCII characters (i.e. code points 0-127) and those are
valid in UTF-8 and lots of other encodings. Moreover, most people who
create additional files in the data directory will probably use ASCII
characters for those as well, at least if they are from an
English-speaking country, and if they're not, they're likely going to
use UTF-8, and then they'll still be fine. But there is no rule that
says people have to do that, and if somebody wants to use file names
based around SJIS or whatever, the backup manifest functionality
should not for that reason break.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Robert Haas (#32)

Re: making the backend's json parser work in frontend code

On 2020-Jan-23, Robert Haas wrote:

No, that's not it. Suppose that ï¿½lvaro Herrera has some custom
settings he likes to put on all the PostgreSQL clusters that he uses,
so he creates a file ï¿½lvaro.conf and uses an "include" directive in
postgresql.conf to suck in those settings. If he also likes UTF-8,
then the file name will be stored in the file system as a 12-byte
value of which the first two bytes will be 0xc3 0xa1. In that case,
everything will be fine, because JSON is supposed to always be UTF-8,
and the file name is UTF-8, and it's all good. But suppose he instead
likes LATIN-1.

I do have files with Latin-1-encoded names in my filesystem, even though
my system is UTF-8, so I understand the problem. I was wondering if it
would work to encode any non-UTF8-valid name using something like
base64; the encoded name will be plain ASCII and can be put in the
manifest, probably using a different field of the JSON object -- so for
a normal file you'd have { path => '1234/2345' } but for a
Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
Then it's the job of the tool to ensure it decodes the name to its
original form when creating/querying for the file.

A problem I have with this idea is that this is very corner-casey, so
most tool implementors will never realize that there's a need to decode
certain file names.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#34

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Alvaro Herrera (#33)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 02:23:14PM -0300, Alvaro Herrera wrote:

On 2020-Jan-23, Robert Haas wrote:

No, that's not it. Suppose that ï¿½lvaro Herrera has some custom
settings he likes to put on all the PostgreSQL clusters that he uses,
so he creates a file ï¿½lvaro.conf and uses an "include" directive in
postgresql.conf to suck in those settings. If he also likes UTF-8,
then the file name will be stored in the file system as a 12-byte
value of which the first two bytes will be 0xc3 0xa1. In that case,
everything will be fine, because JSON is supposed to always be UTF-8,
and the file name is UTF-8, and it's all good. But suppose he instead
likes LATIN-1.

I do have files with Latin-1-encoded names in my filesystem, even though
my system is UTF-8, so I understand the problem. I was wondering if it
would work to encode any non-UTF8-valid name using something like
base64; the encoded name will be plain ASCII and can be put in the
manifest, probably using a different field of the JSON object -- so for
a normal file you'd have { path => '1234/2345' } but for a
Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
Then it's the job of the tool to ensure it decodes the name to its
original form when creating/querying for the file.

A problem I have with this idea is that this is very corner-casey, so
most tool implementors will never realize that there's a need to decode
certain file names.

Another idea is to use base64 for all non-ASCII file names, so we don't
need to check if the file name is valid UTF8 before outputting --- we
just need to check for non-ASCII, which is much easier. Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#35

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Alvaro Herrera (#33)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 12:24 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I do have files with Latin-1-encoded names in my filesystem, even though
my system is UTF-8, so I understand the problem. I was wondering if it
would work to encode any non-UTF8-valid name using something like
base64; the encoded name will be plain ASCII and can be put in the
manifest, probably using a different field of the JSON object -- so for
a normal file you'd have { path => '1234/2345' } but for a
Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
Then it's the job of the tool to ensure it decodes the name to its
original form when creating/querying for the file.

Right. That's what I meant, a couple of messages back, when I
mentioned an extra layer of escaping, but your explanation here is
better because it's more detailed.

A problem I have with this idea is that this is very corner-casey, so
most tool implementors will never realize that there's a need to decode
certain file names.

That's a valid concern. I would not necessarily have thought that
out-of-core tools would find a lot of use in reading them, provided
PostgreSQL itself both knows how to generate them and how to validate
them, but the interest in this topic suggests that people do care
about that.

Mostly, I think this issue shows the folly of imagining that putting
everything into JSON is a good idea because it gets rid of escaping
problems. Actually, what it does is create multiple kinds of escaping
problems. With the format I proposed, you only have to worry that the
file name might contain a tab character, because in that format, tab
is the delimiter. But, if we use JSON, then we've got the same problem
with JSON's delimiter, namely a double quote, which the JSON parser
will solve for you. We then have this additional and somewhat obscure
problem with invalidly encoded data, to which JSON itself provides no
solution. We have to invent our own, probably along the lines of what
you have just proposed. I think one can reasonably wonder whether this
is really an improvement over just inventing a way to escape tabs.

That said, there are other reasons to want to go with JSON, most of
all the fact that it's easy to see how to extend the format to
additional fields. Once you decide that each file will have an object,
you can add any keys that you like to that object and things should
scale up nicely. It has been my contention that we probably will not
find the need to add much more here, but such arguments are always
suspect and have a good chance of being wrong. Also, such prophecies
can be self-fulfilling: if the format is easy to extend, then people
may extend it, whereas if it is hard to extend, they may not try, or
they may try and then give up.

At the end of the day, I'm willing to make this work either way. I do
not think that my original proposal was bad, but there were things not
to like about it. There are also things not to like about using a
JSON-based format, and this seems to me to be a fairly significant
one. However, both sets of problems are solvable, and neither design
is awful. It's just a question of which kind of warts we like better.
To be blunt, I've already spent a lot more effort on this problem than
I would have liked, and more than 90% of it has been spent on the
issue of how to format a file that only PostgreSQL needs to read and
write. While I do not think that good file formats are unimportant, I
remain unconvinced that switching to JSON is making things better. It
seems like it's just making them different, while inflating the amount
of coding required by a fairly significant multiple.

That being said, unless somebody objects in the next few days, I am
going to assume that the people who preferred JSON over a
tab-separated file are also happy with the idea of using base-64
encoding as proposed by you above to represent files whose names are
not valid UTF-8 sequences; and I will then go rewrite the patch that
generates the manifests to use that format, rewrite the validator
patch to parse that format using this infrastructure, and hopefully
end up with something that can be reviewed and committed before we run
out of time to get things done for this release. If anybody wants to
vote for another plan, please vote soon.

In the meantime, any review of the new patches I posted here yesterday
would be warmly appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#36

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Bruce Momjian (#34)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 12:49 PM Bruce Momjian <bruce@momjian.us> wrote:

Another idea is to use base64 for all non-ASCII file names, so we don't
need to check if the file name is valid UTF8 before outputting --- we
just need to check for non-ASCII, which is much easier.

I think that we have the infrastructure available to check in a
convenient way whether it's valid as UTF-8, so this might not be
necessary, but I will look into it further unless there is a consensus
to go another direction entirely.

Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?

Alvaro's proposed solution in the message to which you replied was to
call the field either 'path' or 'path_base64' depending on whether
base-64 escaping was used. That seems better to me than having a field
called 'path' and a separate field called 'is_path_base64' or
whatever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Robert Haas (#36)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 01:05:50PM -0500, Robert Haas wrote:

Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?

Alvaro's proposed solution in the message to which you replied was to
call the field either 'path' or 'path_base64' depending on whether
base-64 escaping was used. That seems better to me than having a field
called 'path' and a separate field called 'is_path_base64' or
whatever.

Hmm, so the JSON key name is the flag --- interesting.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#38

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Bruce Momjian (#37)

Re: making the backend's json parser work in frontend code

On 2020-Jan-23, Bruce Momjian wrote:

On Thu, Jan 23, 2020 at 01:05:50PM -0500, Robert Haas wrote:

Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?

Alvaro's proposed solution in the message to which you replied was to
call the field either 'path' or 'path_base64' depending on whether
base-64 escaping was used. That seems better to me than having a field
called 'path' and a separate field called 'is_path_base64' or
whatever.

Hmm, so the JSON key name is the flag --- interesting.

Yes, because if you use the same key name, you risk a dumb tool writing
the file name as the encoded name. That's worse because it's harder to
figure out that it's wrong.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#39

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Alvaro Herrera (#38)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 03:20:27PM -0300, Alvaro Herrera wrote:

On 2020-Jan-23, Bruce Momjian wrote:

On Thu, Jan 23, 2020 at 01:05:50PM -0500, Robert Haas wrote:

Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?

Alvaro's proposed solution in the message to which you replied was to
call the field either 'path' or 'path_base64' depending on whether
base-64 escaping was used. That seems better to me than having a field
called 'path' and a separate field called 'is_path_base64' or
whatever.

Hmm, so the JSON key name is the flag --- interesting.

Yes, because if you use the same key name, you risk a dumb tool writing
the file name as the encoded name. That's worse because it's harder to
figure out that it's wrong.

Yes, good point. I think my one concern is that someone might specify
both keys in the JSON, which would be very odd.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#40

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Bruce Momjian (#39)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 1:22 PM Bruce Momjian <bruce@momjian.us> wrote:

Yes, good point. I think my one concern is that someone might specify
both keys in the JSON, which would be very odd.

I think that if a tool other than PostgreSQL chooses to generate a
PostreSQL backup manifest, it must take care to do it in a manner that
is compatible with what PostgreSQL would generate. If it doesn't,
well, that sucks for them, but we can't prevent other people from
writing bad code. On a very good day, we can prevent ourselves from
writing bad code.

There is in general the question of how rigorous PostgreSQL ought to
be when validating a backup manifest. The proposal on the table is to
store four (4) fields per file: name, size, last modification time,
and checksum. So a JSON object representing a file should have four
keys, say "path", "size", "mtime", and "checksum". The "checksum" key
could perhaps be optional, in case the user disables checksums, or we
could represent that case in some other way, like "checksum" => null,
"checksum" => "", or "checksum" => "NONE". There is an almost
unlimited scope for bike-shedding here, but let's leave that to one
side for the moment.

Suppose that someone asks PostgreSQL's backup manifest verification
tool to validate a backup manifest where there's an extra key. Say, in
addition to the four keys listed in the previous paragraph, there is
an additional key, "momjian". On the one hand, our backup manifest
verification tool could take this as a sign that the manifest is
invalid, and accordingly throw an error. Or, it could assume that some
third-party backup tool generated the backup manifest and that the
"momjian" field is there to track something which is of interest to
that tool but not to PostgreSQL core, in which case it should just be
ignored.

Incidentally, some research seems to suggest that the problem of
filenames which don't form a valid UTF-8 sequence cannot occur on
Windows. This blog post may be helpful in understanding the issues:

http://beets.io/blog/paths.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#41

Bruce Momjian

bruce@momjian.us

almost 6 years ago

In reply to: Robert Haas (#40)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 02:00:23PM -0500, Robert Haas wrote:

Incidentally, some research seems to suggest that the problem of
filenames which don't form a valid UTF-8 sequence cannot occur on
Windows. This blog post may be helpful in understanding the issues:

http://beets.io/blog/paths.html

Is there any danger of assuming a non-UTF8 sequence to be UTF8 even when
it isn't, except that it displays oddly? I am thinking of a non-UTF8
sequence that is value UTF8.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#42

Daniel Verite

daniel@manitou-mail.org

almost 6 years ago

In reply to: Robert Haas (#35)

Re: making the backend's json parser work in frontend code

Robert Haas wrote:

With the format I proposed, you only have to worry that the
file name might contain a tab character, because in that format, tab
is the delimiter

It could be CSV, which has this problem already solved,
is easier to parse than JSON, certainly no less popular,
and is not bound to a specific encoding.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#43

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Daniel Verite (#42)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 2:08 PM Daniel Verite <daniel@manitou-mail.org> wrote:

It could be CSV, which has this problem already solved,
is easier to parse than JSON, certainly no less popular,
and is not bound to a specific encoding.

Sure. I don't think that would look quite as nice visually as what I
proposed when inspected by humans, and our default COPY output format
is tab-separated rather than comma-separated. However, if CSV would be
more acceptable, great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#44

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Bruce Momjian (#39)

Re: making the backend's json parser work in frontend code

On 2020-Jan-23, Bruce Momjian wrote:

Yes, good point. I think my one concern is that someone might specify
both keys in the JSON, which would be very odd.

Just make that a reason to raise an error. I think it's even possible
to specify that as a JSON Schema constraint, using a "oneOf" predicate.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#45

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Mark Dilger (#31)

6 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 22, 2020, at 7:00 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.

Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to by applied after his.

I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.

0011 is just for verifying the linking works ok and the json parser can be invoked from a frontend tool without error — I don’t really see the point in committing it.

I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime. The setup for the benchmark creates an unlogged table with a single text column and loads rows of json formatted text:

CREATE UNLOGGED TABLE benchmark (
j text
);
COPY benchmark (j) FROM '/Users/mark.dilger/bench/json.csv’;

FYI:

wc ~/bench/json.csv
107 34465023 503364244 /Users/mark.dilger/bench/json.csv

The benchmark itself casts the text column to jsonb, as follows:

SELECT jsonb_typeof(j::jsonb) typ, COUNT(*) FROM benchmark GROUP BY typ;

In summary, the times are:

pristine patched
————— —————
11.985 12.237
12.200 11.992
11.691 11.896
11.847 11.833
11.722 11.936

The full output for the runtimes without the patch over five iterations:

CREATE TABLE
COPY 107
typ | count
--------+-------
object | 107
(1 row)

real 0m11.985s
user 0m0.002s
sys 0m0.003s
typ | count
--------+-------
object | 107
(1 row)

real 0m12.200s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.691s
user 0m0.002s
sys 0m0.003s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.847s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.722s
user 0m0.002s
sys 0m0.003s

An with the patch, also five iterations:

CREATE TABLE
COPY 107
typ | count
--------+-------
object | 107
(1 row)

real 0m12.237s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.992s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.896s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.833s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)

real 0m11.936s
user 0m0.002s
sys 0m0.004s

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0006-Relocating-pg_wchar.h.patchapplication/octet-stream; name=0006-Relocating-pg_wchar.h.patch; x-unix-mode=0644Download

From e1f6f9d04498490b7828187c42201b0da853feec Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 19:01:16 -0800
Subject: [PATCH 06/11] Relocating pg_wchar.h

Moving pg_wchar.h from src/include/mb to src/include/common.
Updating #include directives as necessary.
---
 contrib/btree_gist/btree_utils_var.h                          | 2 +-
 contrib/dblink/dblink.c                                       | 2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c                         | 2 +-
 contrib/pg_stat_statements/pg_stat_statements.c               | 2 +-
 contrib/pgcrypto/pgp-pgsql.c                                  | 2 +-
 contrib/postgres_fdw/connection.c                             | 2 +-
 src/backend/access/spgist/spgtextproc.c                       | 2 +-
 src/backend/catalog/genbki.pl                                 | 2 +-
 src/backend/catalog/namespace.c                               | 2 +-
 src/backend/catalog/pg_collation.c                            | 2 +-
 src/backend/catalog/pg_conversion.c                           | 2 +-
 src/backend/catalog/pg_proc.c                                 | 2 +-
 src/backend/commands/collationcmds.c                          | 2 +-
 src/backend/commands/conversioncmds.c                         | 2 +-
 src/backend/commands/copy.c                                   | 2 +-
 src/backend/commands/dbcommands.c                             | 2 +-
 src/backend/commands/extension.c                              | 2 +-
 src/backend/commands/indexcmds.c                              | 2 +-
 src/backend/commands/variable.c                               | 2 +-
 src/backend/executor/execMain.c                               | 2 +-
 src/backend/executor/execPartition.c                          | 2 +-
 src/backend/executor/execUtils.c                              | 2 +-
 src/backend/libpq/pqformat.c                                  | 2 +-
 src/backend/parser/parse_node.c                               | 2 +-
 src/backend/parser/parser.c                                   | 2 +-
 src/backend/parser/scan.l                                     | 2 +-
 src/backend/parser/scansup.c                                  | 2 +-
 src/backend/postmaster/pgstat.c                               | 2 +-
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c   | 2 +-
 src/backend/replication/logical/logicalfuncs.c                | 2 +-
 src/backend/replication/logical/worker.c                      | 2 +-
 src/backend/tcop/fastpath.c                                   | 2 +-
 src/backend/tcop/postgres.c                                   | 2 +-
 src/backend/utils/adt/ascii.c                                 | 2 +-
 src/backend/utils/adt/format_type.c                           | 2 +-
 src/backend/utils/adt/formatting.c                            | 2 +-
 src/backend/utils/adt/genfile.c                               | 2 +-
 src/backend/utils/adt/jsonapi.c                               | 2 +-
 src/backend/utils/adt/jsonfuncs.c                             | 2 +-
 src/backend/utils/adt/jsonpath_scan.l                         | 2 +-
 src/backend/utils/adt/like.c                                  | 2 +-
 src/backend/utils/adt/like_support.c                          | 2 +-
 src/backend/utils/adt/name.c                                  | 2 +-
 src/backend/utils/adt/oracle_compat.c                         | 2 +-
 src/backend/utils/adt/pg_locale.c                             | 2 +-
 src/backend/utils/adt/ruleutils.c                             | 2 +-
 src/backend/utils/adt/tsvector_op.c                           | 2 +-
 src/backend/utils/adt/varchar.c                               | 2 +-
 src/backend/utils/adt/xml.c                                   | 2 +-
 src/backend/utils/error/elog.c                                | 2 +-
 src/backend/utils/init/miscinit.c                             | 2 +-
 src/backend/utils/init/postinit.c                             | 2 +-
 src/backend/utils/mb/conv.c                                   | 2 +-
 src/backend/utils/mb/conversion_procs/README.euc_jp           | 2 +-
 .../mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c   | 2 +-
 .../mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c   | 2 +-
 .../utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c | 2 +-
 .../mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c     | 2 +-
 .../utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c | 2 +-
 src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c  | 2 +-
 .../mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c     | 2 +-
 .../conversion_procs/latin2_and_win1250/latin2_and_win1250.c  | 2 +-
 .../utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c   | 2 +-
 .../utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c   | 2 +-
 .../mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c | 2 +-
 .../mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c   | 2 +-
 .../mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c     | 2 +-
 .../mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c     | 2 +-
 .../mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c     | 2 +-
 .../mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c     | 2 +-
 .../mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c   | 2 +-
 .../utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c     | 2 +-
 .../mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c   | 2 +-
 .../conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c  | 2 +-
 .../utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c | 2 +-
 .../utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c   | 2 +-
 .../mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c | 2 +-
 .../utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c     | 2 +-
 .../utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c     | 2 +-
 src/backend/utils/mb/mbutils.c                                | 2 +-
 src/backend/utils/mb/stringinfo_mb.c                          | 2 +-
 src/backend/utils/mb/wstrcmp.c                                | 2 +-
 src/backend/utils/mb/wstrncmp.c                               | 2 +-
 src/backend/utils/misc/guc-file.l                             | 2 +-
 src/backend/utils/mmgr/mcxt.c                                 | 2 +-
 src/bin/initdb/initdb.c                                       | 2 +-
 src/bin/pg_upgrade/check.c                                    | 2 +-
 src/bin/psql/mainloop.c                                       | 2 +-
 src/common/encnames.c                                         | 4 ++--
 src/common/saslprep.c                                         | 2 +-
 src/common/wchar.c                                            | 4 ++--
 src/include/catalog/pg_conversion.dat                         | 2 +-
 src/include/{mb => common}/pg_wchar.h                         | 2 +-
 src/include/common/unicode_norm.h                             | 2 +-
 src/include/libpq/pqformat.h                                  | 2 +-
 src/include/regex/regcustom.h                                 | 2 +-
 src/include/regex/regex.h                                     | 2 +-
 src/include/tsearch/ts_locale.h                               | 2 +-
 src/interfaces/libpq/fe-connect.c                             | 2 +-
 src/interfaces/libpq/fe-exec.c                                | 2 +-
 src/interfaces/libpq/fe-misc.c                                | 2 +-
 src/interfaces/libpq/fe-protocol3.c                           | 2 +-
 src/pl/plperl/plperl.c                                        | 2 +-
 src/pl/plperl/plperl_helpers.h                                | 2 +-
 src/pl/plpgsql/src/pl_scanner.c                               | 2 +-
 src/pl/plpython/plpy_cursorobject.c                           | 2 +-
 src/pl/plpython/plpy_plpymodule.c                             | 2 +-
 src/pl/plpython/plpy_spi.c                                    | 2 +-
 src/pl/plpython/plpy_typeio.c                                 | 2 +-
 src/pl/plpython/plpy_util.c                                   | 2 +-
 src/pl/tcl/pltcl.c                                            | 2 +-
 src/port/chklocale.c                                          | 2 +-
 112 files changed, 114 insertions(+), 114 deletions(-)
 rename src/include/{mb => common}/pg_wchar.h (99%)

diff --git a/contrib/btree_gist/btree_utils_var.h b/contrib/btree_gist/btree_utils_var.h
index 2f8def655c..4ae273d767 100644
--- a/contrib/btree_gist/btree_utils_var.h
+++ b/contrib/btree_gist/btree_utils_var.h
@@ -6,7 +6,7 @@
 
 #include "access/gist.h"
 #include "btree_gist.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /* Variable length key */
 typedef bytea GBT_VARKEY;
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 1dddf02779..a202772fad 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -44,12 +44,12 @@
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_type.h"
 #include "catalog/pg_user_mapping.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "foreign/foreign.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/scansup.h"
 #include "utils/acl.h"
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index ccbb84b481..aa0a209dab 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -40,7 +40,7 @@
 
 #include <ctype.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/varlena.h"
 
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..53fa917e95 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -62,9 +62,9 @@
 #include <unistd.h>
 
 #include "catalog/pg_authid.h"
+#include "common/pg_wchar.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
diff --git a/contrib/pgcrypto/pgp-pgsql.c b/contrib/pgcrypto/pgp-pgsql.c
index 8be895df80..960fa1c3e0 100644
--- a/contrib/pgcrypto/pgp-pgsql.c
+++ b/contrib/pgcrypto/pgp-pgsql.c
@@ -32,9 +32,9 @@
 #include "postgres.h"
 
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
 #include "mbuf.h"
 #include "pgp.h"
 #include "px.h"
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 29c811a80b..db897e7819 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,7 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postgres_fdw.h"
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index b5ec81937c..d166a6352c 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -41,7 +41,7 @@
 
 #include "access/spgist.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 803251207b..b8c4c22fad 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -327,7 +327,7 @@ foreach my $row (@{ $catalog_data{pg_type} })
 # as for OIDs, but we have to dig the values out of pg_wchar.h.
 my %encids;
 
-my $encfile = $include_path . 'mb/pg_wchar.h';
+my $encfile = $include_path . 'common/pg_wchar.h';
 open(my $ef, '<', $encfile) || die "$encfile: $!";
 
 # We're parsing an enum, so start with 0 and increment
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index e70243a008..f7fd77ed1a 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -40,8 +40,8 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 8559779a4f..a5dd02d2ee 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -24,7 +24,7 @@
 #include "catalog/objectaccess.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_namespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index b38df4f696..1163aa709a 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -25,7 +25,7 @@
 #include "catalog/pg_conversion.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_proc.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 5194dcaac0..d9bb3ebc2a 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -27,9 +27,9 @@
 #include "catalog/pg_transform.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/pg_wchar.h"
 #include "executor/functions.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_type.h"
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 34c75e8b56..59c0abeea5 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -27,7 +27,7 @@
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
diff --git a/src/backend/commands/conversioncmds.c b/src/backend/commands/conversioncmds.c
index f974478b26..87b4cc1090 100644
--- a/src/backend/commands/conversioncmds.c
+++ b/src/backend/commands/conversioncmds.c
@@ -21,7 +21,7 @@
 #include "catalog/pg_type.h"
 #include "commands/alter.h"
 #include "commands/conversioncmds.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
 #include "utils/builtins.h"
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..02cb2d9718 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
@@ -37,7 +38,6 @@
 #include "foreign/fdwapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/optimizer.h"
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 367c30adb0..0a6e6c6c6b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -46,7 +46,7 @@
 #include "commands/defrem.h"
 #include "commands/seclabel.h"
 #include "commands/tablespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 01de398dcb..b17bc69a65 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -50,8 +50,8 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "storage/fd.h"
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 52ce02f898..e27ead40bd 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -39,7 +39,7 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/tablespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 484f7ea2c0..a197cebb84 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -24,7 +24,7 @@
 #include "access/xlog.h"
 #include "catalog/pg_authid.h"
 #include "commands/variable.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b03e02ae6c..5275f439c0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -47,11 +47,11 @@
 #include "catalog/pg_publication.h"
 #include "commands/matview.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/execdebug.h"
 #include "executor/nodeSubplan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parsetree.h"
 #include "storage/bufmgr.h"
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c13b1d3501..169ff50407 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -18,10 +18,10 @@
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "partitioning/partbounds.h"
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc5177cc2b..673f2db641 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -50,9 +50,9 @@
 #include "access/table.h"
 #include "access/tableam.h"
 #include "access/transam.h"
+#include "common/pg_wchar.h"
 #include "executor/executor.h"
 #include "jit/jit.h"
-#include "mb/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index a6f990c2d2..82ee1f26ba 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -73,9 +73,9 @@
 
 #include <sys/param.h>
 
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 6e98fe55fc..a075354805 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -17,7 +17,7 @@
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_coerce.h"
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 1bf1144c4f..0922e9436f 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -21,7 +21,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "parser/scansup.h"
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 84c73914a8..6bade98a30 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -35,11 +35,11 @@
 #include <ctype.h>
 #include <unistd.h>
 
+#include "common/pg_wchar.h"
 #include "common/string.h"
 #include "parser/gramparse.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
-#include "mb/pg_wchar.h"
 }
 
 %{
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 18169ec4f4..ff1e6ab8d5 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -17,7 +17,7 @@
 
 #include <ctype.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scansup.h"
 
 /* ----------------
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..7913fa67f0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -41,9 +41,9 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
 #include "common/ip.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e4fd1f9bb6..c8f8dd178c 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -21,9 +21,9 @@
 
 #include "access/xlog.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
 #include "libpq-fe.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "pqexpbuffer.h"
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 7693c98949..9bd38a936e 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -21,9 +21,9 @@
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "replication/decode.h"
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7a5471f95c..a09c8bd092 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -33,12 +33,12 @@
 #include "catalog/pg_subscription_rel.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/optimizer.h"
diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index e793984a9f..ef730acc5b 100644
--- a/src/backend/tcop/fastpath.c
+++ b/src/backend/tcop/fastpath.c
@@ -21,9 +21,9 @@
 #include "access/xact.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_proc.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "port/pg_bswap.h"
 #include "tcop/fastpath.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0a6f80963b..7f03d0acda 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -42,12 +42,12 @@
 #include "catalog/pg_type.h"
 #include "commands/async.h"
 #include "commands/prepare.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "jit/jit.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
-#include "mb/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/print.h"
diff --git a/src/backend/utils/adt/ascii.c b/src/backend/utils/adt/ascii.c
index 3aa8a5e7d2..7ac3536705 100644
--- a/src/backend/utils/adt/ascii.c
+++ b/src/backend/utils/adt/ascii.c
@@ -11,7 +11,7 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/ascii.h"
 #include "utils/builtins.h"
 
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 92ee77ac5c..edbded16d5 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -20,7 +20,7 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/numeric.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ca3c48d024..ff026f410c 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -87,7 +87,7 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scansup.h"
 #include "utils/builtins.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 0d75928e7f..3da50444ab 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -25,8 +25,8 @@
 #include "catalog/pg_authid.h"
 #include "catalog/pg_tablespace_d.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 1ac3b7beda..75e71ba376 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -13,7 +13,7 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/jsonapi.h"
 
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 66ea11b971..971ef6e630 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,10 +18,10 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 70681b789d..bde5539aed 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -17,7 +17,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/pg_list.h"
 
 static JsonPathString scanstring;
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 5bf94628c3..1314fce1b4 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -20,7 +20,7 @@
 #include <ctype.h>
 
 #include "catalog/pg_collation.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 286e000d4e..7d20abc9be 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -43,7 +43,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index 6749e75c89..9e81df701c 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,8 +23,8 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index 0d56dc898a..9c4131e6f2 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -16,7 +16,7 @@
 #include "postgres.h"
 
 #include "common/int.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 25fb7e2ebf..5900443039 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -57,7 +57,7 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_control.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "utils/hsearch.h"
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 116e00bce4..5ae53968f1 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -43,9 +43,9 @@
 #include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "common/keywords.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index e33ca5abe7..f68d87cb35 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -19,10 +19,10 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
 #include "lib/qunique.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "parser/parse_coerce.h"
 #include "tsearch/ts_utils.h"
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 1e1239a1ba..69f6d9d8c1 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -17,8 +17,8 @@
 #include "access/detoast.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "utils/array.h"
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 3808c307f6..74c8a268f2 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -73,12 +73,12 @@
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "executor/tablefunc.h"
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
 #include "nodes/nodeFuncs.h"
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5b0211f66..61ad9d8f89 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -68,9 +68,9 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index c4b2946986..15f79a6d41 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -33,8 +33,8 @@
 #include "access/htup_details.h"
 #include "catalog/pg_authid.h"
 #include "common/file_perm.h"
+#include "common/pg_wchar.h"
 #include "libpq/libpq.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 8a47dcdcb1..1a9bb9b98d 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -34,9 +34,9 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_db_role_setting.h"
 #include "catalog/pg_tablespace.h"
+#include "common/pg_wchar.h"
 #include "libpq/auth.h"
 #include "libpq/libpq-be.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 54dcf71fb7..4b0fc23285 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -11,7 +11,7 @@
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
diff --git a/src/backend/utils/mb/conversion_procs/README.euc_jp b/src/backend/utils/mb/conversion_procs/README.euc_jp
index 6e59b7bd7f..97cc63f1bd 100644
--- a/src/backend/utils/mb/conversion_procs/README.euc_jp
+++ b/src/backend/utils/mb/conversion_procs/README.euc_jp
@@ -35,7 +35,7 @@ o C
   ������������������(5��������������������������������NULL������������
   ����������������������)������������������������
 
-  ����������������ID��include/mb/pg_wchar.h��typedef enum pg_enc������
+  ����������������ID��include/common/pg_wchar.h��typedef enum pg_enc������
   ��������������
 
 o ����������������������
diff --git a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
index 376b48ca61..5500dd4fed 100644
--- a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 9ba6bd3040..26051530b7 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -11,8 +11,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
index 59c6c3bb12..d1cb25ea92 100644
--- a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index 4ca8e2126e..ffc3896c9d 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 /*
  * SJIS alternative code.
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 4d7876a666..2183cf183a 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
index 68f76aa8cb..7601bd65f2 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
@@ -13,7 +13,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 typedef struct
 {
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index 82a22b9beb..e209e9a545 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 #define ENCODING_GROWTH_RATE 4
 
diff --git a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
index f424f88145..72ab071d87 100644
--- a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
+++ b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
index a358a707c1..5868b66715 100644
--- a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
index 75ed49ac54..542e055ea2 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/big5_to_utf8.map"
 #include "../../Unicode/utf8_to_big5.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
index 90ad316111..d145da093f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/utf8_to_koi8r.map"
 #include "../../Unicode/koi8r_to_utf8.map"
 #include "../../Unicode/utf8_to_koi8u.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
index 018312489c..8d56d2343c 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
index 62182a9ba8..41f2f9a5ca 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_cn_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_cn.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
index dc5abb5dfd..5a9f8ecfd4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_jp_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jp.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
index 088a38d839..d09f9dc78a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_kr_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_kr.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
index a9fe94f88b..8e0250e603 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/euc_tw_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_tw.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
index 96909b5885..6b5b326c6f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/gb18030_to_utf8.map"
 #include "../../Unicode/utf8_to_gb18030.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
index 78bbcd3ce7..4e2246da7a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/gbk_to_utf8.map"
 #include "../../Unicode/utf8_to_gbk.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
index 348524f4a2..8051eefed9 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/iso8859_10_to_utf8.map"
 #include "../../Unicode/iso8859_13_to_utf8.map"
 #include "../../Unicode/iso8859_14_to_utf8.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
index 2cdca9f780..b9e127cc5e 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
index e09a7c8e41..bece993e1d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/johab_to_utf8.map"
 #include "../../Unicode/utf8_to_johab.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
index c56fa80a4b..8307d4ceb4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/sjis_to_utf8.map"
 #include "../../Unicode/utf8_to_sjis.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
index 458500998d..c913ba0866 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/shift_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_shift_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
index 3226ed0325..87ae61ee85 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/uhc_to_utf8.map"
 #include "../../Unicode/utf8_to_uhc.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
index 1a0074d063..200fd7f23a 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
@@ -12,8 +12,8 @@
  */
 
 #include "postgres.h"
+#include "common/pg_wchar.h"
 #include "fmgr.h"
-#include "mb/pg_wchar.h"
 #include "../../Unicode/utf8_to_win1250.map"
 #include "../../Unicode/utf8_to_win1251.map"
 #include "../../Unicode/utf8_to_win1252.map"
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 86787bcb31..4cec7ec73a 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -36,7 +36,7 @@
 
 #include "access/xact.h"
 #include "catalog/namespace.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
index c153b77007..e7462a09b5 100644
--- a/src/backend/utils/mb/stringinfo_mb.c
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -19,8 +19,8 @@
  */
 #include "postgres.h"
 
+#include "common/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
-#include "mb/pg_wchar.h"
 
 
 /*
diff --git a/src/backend/utils/mb/wstrcmp.c b/src/backend/utils/mb/wstrcmp.c
index dad3ae023a..e5f57d717d 100644
--- a/src/backend/utils/mb/wstrcmp.c
+++ b/src/backend/utils/mb/wstrcmp.c
@@ -35,7 +35,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 int
 pg_char_and_wchar_strcmp(const char *s1, const pg_wchar *s2)
diff --git a/src/backend/utils/mb/wstrncmp.c b/src/backend/utils/mb/wstrncmp.c
index ea4823fc6f..cce0c6c5cf 100644
--- a/src/backend/utils/mb/wstrncmp.c
+++ b/src/backend/utils/mb/wstrncmp.c
@@ -34,7 +34,7 @@
 /* can be used in either frontend or backend */
 #include "postgres_fe.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 int
 pg_wchar_strncmp(const pg_wchar *s1, const pg_wchar *s2, size_t n)
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 268b745528..6c9e60ef64 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -14,7 +14,7 @@
 #include <ctype.h>
 #include <unistd.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9e24fec72d..71701de091 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,7 +21,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7f1534aebb..0f1758d8e4 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -66,12 +66,12 @@
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/pg_wchar.h"
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "fe_utils/string_utils.h"
 #include "getaddrinfo.h"
 #include "getopt_long.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 5f9a102a74..faad97fb8b 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,8 +10,8 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "common/pg_wchar.h"
 #include "fe_utils/string_utils.h"
-#include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
 
 static void check_new_cluster_is_empty(void);
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index bdf803a053..3ea44e0f78 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -10,9 +10,9 @@
 #include "command.h"
 #include "common.h"
 #include "common/logging.h"
+#include "common/pg_wchar.h"
 #include "input.h"
 #include "mainloop.h"
-#include "mb/pg_wchar.h"
 #include "prompt.h"
 #include "settings.h"
 
diff --git a/src/common/encnames.c b/src/common/encnames.c
index 14cf1b39e9..f06221e1d5 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -15,7 +15,7 @@
 #include <ctype.h>
 #include <unistd.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /* ----------
@@ -297,7 +297,7 @@ static const pg_encname pg_encname_tbl[] =
 
 /* ----------
  * These are "official" encoding names.
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * XXX must be sorted by the same order as enum pg_enc (in common/pg_wchar.h)
  * ----------
  */
 #ifndef WIN32
diff --git a/src/common/saslprep.c b/src/common/saslprep.c
index 7739b81807..ac49f32eab 100644
--- a/src/common/saslprep.c
+++ b/src/common/saslprep.c
@@ -25,9 +25,9 @@
 #include "postgres_fe.h"
 #endif
 
+#include "common/pg_wchar.h"
 #include "common/saslprep.h"
 #include "common/unicode_norm.h"
-#include "mb/pg_wchar.h"
 
 /*
  * Limit on how large password's we will try to process.  A password
diff --git a/src/common/wchar.c b/src/common/wchar.c
index efaf1c155b..53006115d2 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -12,7 +12,7 @@
  */
 #include "c.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
@@ -1499,7 +1499,7 @@ pg_utf8_islegal(const unsigned char *source, int length)
 /*
  *-------------------------------------------------------------------
  * encoding info table
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * XXX must be sorted by the same order as enum pg_enc (in common/pg_wchar.h)
  *-------------------------------------------------------------------
  */
 const pg_wchar_tbl pg_wchar_table[] = {
diff --git a/src/include/catalog/pg_conversion.dat b/src/include/catalog/pg_conversion.dat
index d7120f2fb0..9913d2e370 100644
--- a/src/include/catalog/pg_conversion.dat
+++ b/src/include/catalog/pg_conversion.dat
@@ -11,7 +11,7 @@
 #----------------------------------------------------------------------
 
 # Note: conforencoding and contoencoding must match the spelling of
-# the labels used in the enum pg_enc in mb/pg_wchar.h.
+# the labels used in the enum pg_enc in common/pg_wchar.h.
 
 [
 
diff --git a/src/include/mb/pg_wchar.h b/src/include/common/pg_wchar.h
similarity index 99%
rename from src/include/mb/pg_wchar.h
rename to src/include/common/pg_wchar.h
index b8892ef730..a1b3a27b9f 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/common/pg_wchar.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/mb/pg_wchar.h
+ * src/include/common/pg_wchar.h
  *
  *	NOTES
  *		This is used both by the backend and by frontends, but should not be
diff --git a/src/include/common/unicode_norm.h b/src/include/common/unicode_norm.h
index f1b7ef1aa4..e09c1162eb 100644
--- a/src/include/common/unicode_norm.h
+++ b/src/include/common/unicode_norm.h
@@ -14,7 +14,7 @@
 #ifndef UNICODE_NORM_H
 #define UNICODE_NORM_H
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 extern pg_wchar *unicode_normalize_kc(const pg_wchar *input);
 
diff --git a/src/include/libpq/pqformat.h b/src/include/libpq/pqformat.h
index af31e9caba..daeba6d5c9 100644
--- a/src/include/libpq/pqformat.h
+++ b/src/include/libpq/pqformat.h
@@ -13,8 +13,8 @@
 #ifndef PQFORMAT_H
 #define PQFORMAT_H
 
+#include "common/pg_wchar.h"
 #include "lib/stringinfo.h"
-#include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 extern void pq_beginmessage(StringInfo buf, char msgtype);
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 82c9e2fad8..2506ac8268 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -52,7 +52,7 @@
 #include <wctype.h>
 #endif
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 #include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
 
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index dc31899aa4..739cc7d14b 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -35,7 +35,7 @@
 /*
  * Add your own defines, if needed, here.
  */
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 /*
  * interface types etc.
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index 17536babfe..f77eb23c8a 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -15,7 +15,7 @@
 #include <ctype.h>
 #include <limits.h>
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "utils/pg_locale.h"
 
 /*
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 80b54bc92b..f4db884735 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -23,12 +23,12 @@
 
 #include "common/ip.h"
 #include "common/link-canary.h"
+#include "common/pg_wchar.h"
 #include "common/scram-common.h"
 #include "common/string.h"
 #include "fe-auth.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index eea0237c3a..454c272807 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -24,9 +24,9 @@
 #include <unistd.h>
 #endif
 
+#include "common/pg_wchar.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 
 /* keep this in same order as ExecStatusType in libpq-fe.h */
 char	   *const pgresStatus[] = {
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index a9074d2f29..261d64e7cd 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -47,9 +47,9 @@
 #include <sys/select.h>
 #endif
 
+#include "common/pg_wchar.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 #include "pg_config_paths.h"
 #include "port/pg_bswap.h"
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 850bf84c96..799a1538da 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -26,9 +26,9 @@
 #endif
 #endif
 
+#include "common/pg_wchar.h"
 #include "libpq-fe.h"
 #include "libpq-int.h"
-#include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
 
 /*
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index c78891868a..ac7144e99f 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -21,9 +21,9 @@
 #include "catalog/pg_type.h"
 #include "commands/event_trigger.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_type.h"
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1e318b6dc8..cec942c280 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -1,7 +1,7 @@
 #ifndef PL_PERL_HELPERS_H
 #define PL_PERL_HELPERS_H
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 #include "plperl.h"
 
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index 9cea2e42ac..a5bb7c474c 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -15,7 +15,7 @@
  */
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "parser/scanner.h"
 
 #include "plpgsql.h"
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 4c37ff898c..01ad7cf3bc 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -10,7 +10,7 @@
 
 #include "access/xact.h"
 #include "catalog/pg_type.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_cursorobject.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index e308c61d50..b079afc000 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -7,7 +7,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_cursorobject.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 99c1b4f28f..8b7cfbf18c 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -11,8 +11,8 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
-#include "mb/pg_wchar.h"
 #include "parser/parse_type.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index e734b0d130..baf45526c6 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -8,8 +8,8 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/pg_wchar.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "plpy_elog.h"
 #include "plpy_main.h"
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
index 4a7d7264d7..873afcfbb8 100644
--- a/src/pl/plpython/plpy_util.c
+++ b/src/pl/plpython/plpy_util.c
@@ -6,7 +6,7 @@
 
 #include "postgres.h"
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 #include "plpy_elog.h"
 #include "plpy_util.h"
 #include "plpython.h"
diff --git a/src/pl/tcl/pltcl.c b/src/pl/tcl/pltcl.c
index e7640008fd..47a328ba29 100644
--- a/src/pl/tcl/pltcl.c
+++ b/src/pl/tcl/pltcl.c
@@ -20,10 +20,10 @@
 #include "catalog/pg_type.h"
 #include "commands/event_trigger.h"
 #include "commands/trigger.h"
+#include "common/pg_wchar.h"
 #include "executor/spi.h"
 #include "fmgr.h"
 #include "funcapi.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f0b3..4ebc11bab8 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -23,7 +23,7 @@
 #include <langinfo.h>
 #endif
 
-#include "mb/pg_wchar.h"
+#include "common/pg_wchar.h"
 
 
 /*
-- 
2.21.1 (Apple Git-122.3)

0007-Adding-utils-mbutils.h.patchapplication/octet-stream; name=0007-Adding-utils-mbutils.h.patch; x-unix-mode=0644Download

From af4c59d5beabfda166d8a17f917a5f7d7281bfbf Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 19:19:23 -0800
Subject: [PATCH 07/11] Adding utils/mbutils.h

Moving functions not suitable for frontend code out of common/pg_wchar.h into
new file utils/mbutils.h.
---
 contrib/btree_gist/btree_utils_var.h          |   1 +
 contrib/dblink/dblink.c                       |   1 +
 .../pg_stat_statements/pg_stat_statements.c   |   1 +
 contrib/pgcrypto/pgp-pgsql.c                  |   1 +
 contrib/postgres_fdw/connection.c             |   1 +
 src/backend/access/spgist/spgtextproc.c       |   1 +
 src/backend/catalog/namespace.c               |   1 +
 src/backend/catalog/pg_collation.c            |   1 +
 src/backend/catalog/pg_proc.c                 |   1 +
 src/backend/commands/collationcmds.c          |   1 +
 src/backend/commands/copy.c                   |   1 +
 src/backend/commands/extension.c              |   1 +
 src/backend/commands/indexcmds.c              |   1 +
 src/backend/commands/variable.c               |   1 +
 src/backend/executor/execMain.c               |   1 +
 src/backend/executor/execPartition.c          |   1 +
 src/backend/executor/execUtils.c              |   1 +
 src/backend/parser/parse_node.c               |   1 +
 src/backend/parser/parser.c                   |   1 +
 src/backend/parser/scan.l                     |   1 +
 src/backend/parser/scansup.c                  |   1 +
 src/backend/postmaster/pgstat.c               |   1 +
 .../libpqwalreceiver/libpqwalreceiver.c       |   1 +
 .../replication/logical/logicalfuncs.c        |   1 +
 src/backend/utils/adt/ascii.c                 |   1 +
 src/backend/utils/adt/format_type.c           |   1 +
 src/backend/utils/adt/formatting.c            |   1 +
 src/backend/utils/adt/genfile.c               |   1 +
 src/backend/utils/adt/json.c                  |   1 +
 src/backend/utils/adt/jsonpath_scan.l         |   1 +
 src/backend/utils/adt/like.c                  |   1 +
 src/backend/utils/adt/like_support.c          |   1 +
 src/backend/utils/adt/name.c                  |   1 +
 src/backend/utils/adt/oracle_compat.c         |   1 +
 src/backend/utils/adt/pg_locale.c             |   1 +
 src/backend/utils/adt/ruleutils.c             |   1 +
 src/backend/utils/adt/tsvector_op.c           |   1 +
 src/backend/utils/adt/varchar.c               |   1 +
 src/backend/utils/adt/xml.c                   |   1 +
 src/backend/utils/error/elog.c                |   1 +
 src/backend/utils/init/postinit.c             |   1 +
 src/backend/utils/mb/conv.c                   |   1 +
 .../cyrillic_and_mic/cyrillic_and_mic.c       |   1 +
 .../euc2004_sjis2004/euc2004_sjis2004.c       |   1 +
 .../euc_cn_and_mic/euc_cn_and_mic.c           |   1 +
 .../euc_jp_and_sjis/euc_jp_and_sjis.c         |   1 +
 .../euc_kr_and_mic/euc_kr_and_mic.c           |   1 +
 .../conversion_procs/euc_tw_and_big5/big5.c   |   1 +
 .../euc_tw_and_big5/euc_tw_and_big5.c         |   1 +
 .../latin2_and_win1250/latin2_and_win1250.c   |   1 +
 .../latin_and_mic/latin_and_mic.c             |   1 +
 .../utf8_and_big5/utf8_and_big5.c             |   1 +
 .../utf8_and_cyrillic/utf8_and_cyrillic.c     |   1 +
 .../utf8_and_euc2004/utf8_and_euc2004.c       |   1 +
 .../utf8_and_euc_cn/utf8_and_euc_cn.c         |   1 +
 .../utf8_and_euc_jp/utf8_and_euc_jp.c         |   1 +
 .../utf8_and_euc_kr/utf8_and_euc_kr.c         |   1 +
 .../utf8_and_euc_tw/utf8_and_euc_tw.c         |   1 +
 .../utf8_and_gb18030/utf8_and_gb18030.c       |   1 +
 .../utf8_and_gbk/utf8_and_gbk.c               |   1 +
 .../utf8_and_iso8859/utf8_and_iso8859.c       |   1 +
 .../utf8_and_iso8859_1/utf8_and_iso8859_1.c   |   1 +
 .../utf8_and_johab/utf8_and_johab.c           |   1 +
 .../utf8_and_sjis/utf8_and_sjis.c             |   1 +
 .../utf8_and_sjis2004/utf8_and_sjis2004.c     |   1 +
 .../utf8_and_uhc/utf8_and_uhc.c               |   1 +
 .../utf8_and_win/utf8_and_win.c               |   1 +
 src/backend/utils/mb/mbutils.c                |   1 +
 src/backend/utils/mb/stringinfo_mb.c          |   1 +
 src/backend/utils/misc/guc-file.l             |   1 +
 src/backend/utils/mmgr/mcxt.c                 |   1 +
 src/common/wchar.c                            |   1 +
 src/include/common/pg_wchar.h                 | 109 ---------------
 src/include/libpq/pqformat.h                  |   1 +
 src/include/regex/regcustom.h                 |   2 +
 src/include/regex/regex.h                     |   1 +
 src/include/tsearch/ts_locale.h               |   1 +
 src/include/utils/mbutils.h                   | 124 ++++++++++++++++++
 src/interfaces/libpq/fe-protocol3.c           |   1 +
 src/pl/plperl/plperl.c                        |   1 +
 src/pl/plperl/plperl_helpers.h                |   2 +-
 src/pl/plpgsql/src/pl_scanner.c               |   1 +
 src/pl/plpython/plpy_cursorobject.c           |   1 +
 src/pl/plpython/plpy_plpymodule.c             |   1 +
 src/pl/plpython/plpy_spi.c                    |   1 +
 src/pl/plpython/plpy_typeio.c                 |   1 +
 src/pl/plpython/plpy_util.c                   |   1 +
 87 files changed, 210 insertions(+), 110 deletions(-)
 create mode 100644 src/include/utils/mbutils.h

diff --git a/contrib/btree_gist/btree_utils_var.h b/contrib/btree_gist/btree_utils_var.h
index 4ae273d767..a63e6ace2f 100644
--- a/contrib/btree_gist/btree_utils_var.h
+++ b/contrib/btree_gist/btree_utils_var.h
@@ -7,6 +7,7 @@
 #include "access/gist.h"
 #include "btree_gist.h"
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 /* Variable length key */
 typedef bytea GBT_VARKEY;
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a202772fad..1eda3c63de 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -57,6 +57,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/varlena.h"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 53fa917e95..37f1516922 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -78,6 +78,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
diff --git a/contrib/pgcrypto/pgp-pgsql.c b/contrib/pgcrypto/pgp-pgsql.c
index 960fa1c3e0..3ac659e4f0 100644
--- a/contrib/pgcrypto/pgp-pgsql.c
+++ b/contrib/pgcrypto/pgp-pgsql.c
@@ -40,6 +40,7 @@
 #include "px.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 /*
  * public functions
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index db897e7819..6c0b13a80c 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -23,6 +23,7 @@
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index d166a6352c..f56424ef5c 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -44,6 +44,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index f7fd77ed1a..8d08a60e56 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -54,6 +54,7 @@
 #include "utils/guc.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index a5dd02d2ee..2f07f45fdc 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -27,6 +27,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index d9bb3ebc2a..b7def351fa 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -38,6 +38,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 59c0abeea5..2869cb68a4 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -31,6 +31,7 @@
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 02cb2d9718..58d177829f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -51,6 +51,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/portal.h"
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b17bc69a65..cf82cd49d8 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -60,6 +60,7 @@
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index e27ead40bd..365d610e39 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -59,6 +59,7 @@
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/pg_rusage.h"
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a197cebb84..e27957bc5b 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -28,6 +28,7 @@
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5275f439c0..0ee1163d21 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -59,6 +59,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
 #include "utils/rls.h"
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 169ff50407..e67fb8698a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -29,6 +29,7 @@
 #include "partitioning/partprune.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/partcache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 673f2db641..eddb6c7560 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -58,6 +58,7 @@
 #include "partitioning/partdesc.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/typcache.h"
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index a075354805..c5993d01c5 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -27,6 +27,7 @@
 #include "utils/builtins.h"
 #include "utils/int8.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
 
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 0922e9436f..6d27d1c5b7 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -25,6 +25,7 @@
 #include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 
 static bool check_uescapechar(unsigned char escape);
 static char *str_udeescape(const char *str, char escape,
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6bade98a30..d25ed1ddc4 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -40,6 +40,7 @@
 #include "parser/gramparse.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 }
 
 %{
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index ff1e6ab8d5..6fc2b59fa3 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -19,6 +19,7 @@
 
 #include "common/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/mbutils.h"
 
 /* ----------------
  *		scanstr
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7913fa67f0..041bf18365 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -63,6 +63,7 @@
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/rel.h"
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index c8f8dd178c..535c551629 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -29,6 +29,7 @@
 #include "pqexpbuffer.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/tuplestore.h"
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 9bd38a936e..0a89a353bc 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -35,6 +35,7 @@
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/regproc.h"
diff --git a/src/backend/utils/adt/ascii.c b/src/backend/utils/adt/ascii.c
index 7ac3536705..b692aed409 100644
--- a/src/backend/utils/adt/ascii.c
+++ b/src/backend/utils/adt/ascii.c
@@ -14,6 +14,7 @@
 #include "common/pg_wchar.h"
 #include "utils/ascii.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 
 static void pg_to_ascii(unsigned char *src, unsigned char *src_end,
 						unsigned char *dest, int enc);
diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index edbded16d5..1e8f1a2189 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -23,6 +23,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/numeric.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ff026f410c..cadee997a1 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -94,6 +94,7 @@
 #include "utils/float.h"
 #include "utils/formatting.h"
 #include "utils/int8.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 3da50444ab..338cf7437b 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -31,6 +31,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f6cd2b9911..743dafea4a 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -25,6 +25,7 @@
 #include "utils/json.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/typcache.h"
 
 typedef enum					/* type categories for datum_to_json */
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index bde5539aed..d522b18097 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -19,6 +19,7 @@
 
 #include "common/pg_wchar.h"
 #include "nodes/pg_list.h"
+#include "utils/mbutils.h"
 
 static JsonPathString scanstring;
 
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 1314fce1b4..ef80cdbf15 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -23,6 +23,7 @@
 #include "common/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 
 
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 7d20abc9be..837b7dc153 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -50,6 +50,7 @@
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/selfuncs.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index 9e81df701c..07811bdf2b 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -29,6 +29,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/varlena.h"
 
 
diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index 9c4131e6f2..7ff845e735 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -19,6 +19,7 @@
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/mbutils.h"
 
 static text *dotrim(const char *string, int stringlen,
 					const char *set, int setlen,
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5900443039..c7174e33ad 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -62,6 +62,7 @@
 #include "utils/formatting.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 5ae53968f1..71beb499a4 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -66,6 +66,7 @@
 #include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
 #include "utils/ruleutils.h"
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index f68d87cb35..14809b4f82 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -29,6 +29,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 69f6d9d8c1..7b8186243b 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -25,6 +25,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 74c8a268f2..3545f1f6fa 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -87,6 +87,7 @@
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 61ad9d8f89..c9aa4f207d 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -78,6 +78,7 @@
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1a9bb9b98d..44e0be3817 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -56,6 +56,7 @@
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/portal.h"
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 4b0fc23285..3b68dd4821 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
index 5500dd4fed..35204ae530 100644
--- a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 26051530b7..5b9ca9d3c0 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -13,6 +13,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
index d1cb25ea92..4703c4ef0a 100644
--- a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index ffc3896c9d..929b5d78b0 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 /*
  * SJIS alternative code.
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 2183cf183a..bb42246c15 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
index 7601bd65f2..33079708bf 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/big5.c
@@ -14,6 +14,7 @@
 #include "postgres_fe.h"
 
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 typedef struct
 {
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index e209e9a545..ef887961f2 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 #define ENCODING_GROWTH_RATE 4
 
diff --git a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
index 72ab071d87..5e38c186e0 100644
--- a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
+++ b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
index 5868b66715..20898d9040 100644
--- a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
index 542e055ea2..13b9e2e881 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/big5_to_utf8.map"
 #include "../../Unicode/utf8_to_big5.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
index d145da093f..d738275c9d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/utf8_to_koi8r.map"
 #include "../../Unicode/koi8r_to_utf8.map"
 #include "../../Unicode/utf8_to_koi8u.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
index 8d56d2343c..20b2f9ab1f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
index 41f2f9a5ca..3e98899ec5 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_cn_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_cn.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
index 5a9f8ecfd4..84c6f5ed69 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_jp_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_jp.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
index d09f9dc78a..4462554199 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_kr_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_kr.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
index 8e0250e603..b3b7ff87e4 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/euc_tw_to_utf8.map"
 #include "../../Unicode/utf8_to_euc_tw.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
index 6b5b326c6f..49062c5b0d 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/gb18030_to_utf8.map"
 #include "../../Unicode/utf8_to_gb18030.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
index 4e2246da7a..8e60cc1fa6 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/gbk_to_utf8.map"
 #include "../../Unicode/utf8_to_gbk.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
index 8051eefed9..673e079060 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/iso8859_10_to_utf8.map"
 #include "../../Unicode/iso8859_13_to_utf8.map"
 #include "../../Unicode/iso8859_14_to_utf8.map"
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
index b9e127cc5e..62d5d5feee 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
index bece993e1d..ef2879150b 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/johab_to_utf8.map"
 #include "../../Unicode/utf8_to_johab.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
index 8307d4ceb4..428bdc5c27 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/sjis_to_utf8.map"
 #include "../../Unicode/utf8_to_sjis.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
index c913ba0866..ed97196d37 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/shift_jis_2004_to_utf8.map"
 #include "../../Unicode/utf8_to_shift_jis_2004.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
index 87ae61ee85..d52107ed7f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/uhc_to_utf8.map"
 #include "../../Unicode/utf8_to_uhc.map"
 
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
index 200fd7f23a..a29c45e746 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
+#include "utils/mbutils.h"
 #include "../../Unicode/utf8_to_win1250.map"
 #include "../../Unicode/utf8_to_win1251.map"
 #include "../../Unicode/utf8_to_win1252.map"
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 4cec7ec73a..bd3c5b9442 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -38,6 +38,7 @@
 #include "catalog/namespace.h"
 #include "common/pg_wchar.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
index e7462a09b5..26cf8c715f 100644
--- a/src/backend/utils/mb/stringinfo_mb.c
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -21,6 +21,7 @@
 
 #include "common/pg_wchar.h"
 #include "mb/stringinfo_mb.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 6c9e60ef64..0738360aa7 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -18,6 +18,7 @@
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 71701de091..b47439c52c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,7 @@
 
 #include "common/pg_wchar.h"
 #include "miscadmin.h"
+#include "utils/mbutils.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 53006115d2..c9c07a14ba 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -13,6 +13,7 @@
 #include "c.h"
 
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/include/common/pg_wchar.h b/src/include/common/pg_wchar.h
index a1b3a27b9f..7b98453b7b 100644
--- a/src/include/common/pg_wchar.h
+++ b/src/include/common/pg_wchar.h
@@ -488,20 +488,6 @@ typedef struct
  */
 typedef uint32 (*utf_local_conversion_func) (uint32 code);
 
-/*
- * Support macro for encoding conversion functions to validate their
- * arguments.  (This could be made more compact if we included fmgr.h
- * here, but we don't want to do that because this header file is also
- * used by frontends.)
- */
-#define CHECK_ENCODING_CONVERSION_ARGS(srcencoding,destencoding) \
-	check_encoding_conversion_args(PG_GETARG_INT32(0), \
-								   PG_GETARG_INT32(1), \
-								   PG_GETARG_INT32(4), \
-								   (srcencoding), \
-								   (destencoding))
-
-
 /*
  * Some handy functions for Unicode-specific tests.
  */
@@ -552,104 +538,9 @@ extern bool pg_utf8_islegal(const unsigned char *source, int length);
 extern int	pg_utf_mblen(const unsigned char *s);
 extern int	pg_mule_mblen(const unsigned char *s);
 
-/*
- * The remaining functions are backend-only.
- */
-extern int	pg_mb2wchar(const char *from, pg_wchar *to);
-extern int	pg_mb2wchar_with_len(const char *from, pg_wchar *to, int len);
-extern int	pg_encoding_mb2wchar_with_len(int encoding,
-										  const char *from, pg_wchar *to, int len);
-extern int	pg_wchar2mb(const pg_wchar *from, char *to);
-extern int	pg_wchar2mb_with_len(const pg_wchar *from, char *to, int len);
-extern int	pg_encoding_wchar2mb_with_len(int encoding,
-										  const pg_wchar *from, char *to, int len);
 extern int	pg_char_and_wchar_strcmp(const char *s1, const pg_wchar *s2);
 extern int	pg_wchar_strncmp(const pg_wchar *s1, const pg_wchar *s2, size_t n);
 extern int	pg_char_and_wchar_strncmp(const char *s1, const pg_wchar *s2, size_t n);
 extern size_t pg_wchar_strlen(const pg_wchar *wstr);
-extern int	pg_mblen(const char *mbstr);
-extern int	pg_dsplen(const char *mbstr);
-extern int	pg_mbstrlen(const char *mbstr);
-extern int	pg_mbstrlen_with_len(const char *mbstr, int len);
-extern int	pg_mbcliplen(const char *mbstr, int len, int limit);
-extern int	pg_encoding_mbcliplen(int encoding, const char *mbstr,
-								  int len, int limit);
-extern int	pg_mbcharcliplen(const char *mbstr, int len, int limit);
-extern int	pg_database_encoding_max_length(void);
-extern mbcharacter_incrementer pg_database_encoding_character_incrementer(void);
-
-extern int	PrepareClientEncoding(int encoding);
-extern int	SetClientEncoding(int encoding);
-extern void InitializeClientEncoding(void);
-extern int	pg_get_client_encoding(void);
-extern const char *pg_get_client_encoding_name(void);
-
-extern void SetDatabaseEncoding(int encoding);
-extern int	GetDatabaseEncoding(void);
-extern const char *GetDatabaseEncodingName(void);
-extern void SetMessageEncoding(int encoding);
-extern int	GetMessageEncoding(void);
-
-#ifdef ENABLE_NLS
-extern int	pg_bind_textdomain_codeset(const char *domainname);
-#endif
-
-extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
-												int src_encoding,
-												int dest_encoding);
-
-extern char *pg_client_to_server(const char *s, int len);
-extern char *pg_server_to_client(const char *s, int len);
-extern char *pg_any_to_server(const char *s, int len, int encoding);
-extern char *pg_server_to_any(const char *s, int len, int encoding);
-
-extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
-extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
-
-extern void UtfToLocal(const unsigned char *utf, int len,
-					   unsigned char *iso,
-					   const pg_mb_radix_tree *map,
-					   const pg_utf_to_local_combined *cmap, int cmapsize,
-					   utf_local_conversion_func conv_func,
-					   int encoding);
-extern void LocalToUtf(const unsigned char *iso, int len,
-					   unsigned char *utf,
-					   const pg_mb_radix_tree *map,
-					   const pg_local_to_utf_combined *cmap, int cmapsize,
-					   utf_local_conversion_func conv_func,
-					   int encoding);
-
-extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
-extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
-							bool noError);
-extern int	pg_verify_mbstr_len(int encoding, const char *mbstr, int len,
-								bool noError);
-
-extern void check_encoding_conversion_args(int src_encoding,
-										   int dest_encoding,
-										   int len,
-										   int expected_src_encoding,
-										   int expected_dest_encoding);
-
-extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg_attribute_noreturn();
-extern void report_untranslatable_char(int src_encoding, int dest_encoding,
-									   const char *mbstr, int len) pg_attribute_noreturn();
-
-extern void local2local(const unsigned char *l, unsigned char *p, int len,
-						int src_encoding, int dest_encoding, const unsigned char *tab);
-extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
-					  int lc, int encoding);
-extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
-					  int lc, int encoding);
-extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
-								 int len, int lc, int encoding,
-								 const unsigned char *tab);
-extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
-								 int len, int lc, int encoding,
-								 const unsigned char *tab);
-
-#ifdef WIN32
-extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
-#endif
 
 #endif							/* PG_WCHAR_H */
diff --git a/src/include/libpq/pqformat.h b/src/include/libpq/pqformat.h
index daeba6d5c9..9672e654bf 100644
--- a/src/include/libpq/pqformat.h
+++ b/src/include/libpq/pqformat.h
@@ -16,6 +16,7 @@
 #include "common/pg_wchar.h"
 #include "lib/stringinfo.h"
 #include "port/pg_bswap.h"
+#include "utils/mbutils.h"
 
 extern void pq_beginmessage(StringInfo buf, char msgtype);
 extern void pq_beginmessage_reuse(StringInfo buf, char msgtype);
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 2506ac8268..ce0700c7b1 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -56,6 +56,8 @@
 
 #include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
 
+#include "utils/mbutils.h"
+
 
 /* overrides for regguts.h definitions, if any */
 #define FUNCPTR(name, args) (*name) args
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 739cc7d14b..3c53687c1b 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -36,6 +36,7 @@
  * Add your own defines, if needed, here.
  */
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 
 /*
  * interface types etc.
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index f77eb23c8a..00dcef3f80 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -16,6 +16,7 @@
 #include <limits.h>
 
 #include "common/pg_wchar.h"
+#include "utils/mbutils.h"
 #include "utils/pg_locale.h"
 
 /*
diff --git a/src/include/utils/mbutils.h b/src/include/utils/mbutils.h
new file mode 100644
index 0000000000..b8da6ce0a9
--- /dev/null
+++ b/src/include/utils/mbutils.h
@@ -0,0 +1,124 @@
+/*-------------------------------------------------------------------------
+ *
+ * mbutils.h
+ *	  backend-only multibyte-character support
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mbutils.h
+ *
+ *	NOTES
+ *		TODO: write some notes
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MBUTILS_H
+#define MBUTILS_H
+
+extern int	pg_mb2wchar(const char *from, pg_wchar *to);
+extern int	pg_mb2wchar_with_len(const char *from, pg_wchar *to, int len);
+extern int	pg_encoding_mb2wchar_with_len(int encoding,
+										  const char *from, pg_wchar *to, int len);
+extern int	pg_wchar2mb(const pg_wchar *from, char *to);
+extern int	pg_wchar2mb_with_len(const pg_wchar *from, char *to, int len);
+extern int	pg_encoding_wchar2mb_with_len(int encoding,
+										  const pg_wchar *from, char *to, int len);
+extern int	pg_mblen(const char *mbstr);
+extern int	pg_dsplen(const char *mbstr);
+extern int	pg_mbstrlen(const char *mbstr);
+extern int	pg_mbstrlen_with_len(const char *mbstr, int len);
+extern int	pg_mbcliplen(const char *mbstr, int len, int limit);
+extern int	pg_encoding_mbcliplen(int encoding, const char *mbstr,
+								  int len, int limit);
+extern int	pg_mbcharcliplen(const char *mbstr, int len, int limit);
+extern int	pg_database_encoding_max_length(void);
+extern mbcharacter_incrementer pg_database_encoding_character_incrementer(void);
+
+extern int	PrepareClientEncoding(int encoding);
+extern int	SetClientEncoding(int encoding);
+extern void InitializeClientEncoding(void);
+extern int	pg_get_client_encoding(void);
+extern const char *pg_get_client_encoding_name(void);
+
+extern void SetDatabaseEncoding(int encoding);
+extern int	GetDatabaseEncoding(void);
+extern const char *GetDatabaseEncodingName(void);
+extern void SetMessageEncoding(int encoding);
+extern int	GetMessageEncoding(void);
+
+#ifdef ENABLE_NLS
+extern int	pg_bind_textdomain_codeset(const char *domainname);
+#endif
+
+extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
+												int src_encoding,
+												int dest_encoding);
+
+extern char *pg_client_to_server(const char *s, int len);
+extern char *pg_server_to_client(const char *s, int len);
+extern char *pg_any_to_server(const char *s, int len, int encoding);
+extern char *pg_server_to_any(const char *s, int len, int encoding);
+
+extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
+extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
+
+extern void UtfToLocal(const unsigned char *utf, int len,
+					   unsigned char *iso,
+					   const pg_mb_radix_tree *map,
+					   const pg_utf_to_local_combined *cmap, int cmapsize,
+					   utf_local_conversion_func conv_func,
+					   int encoding);
+extern void LocalToUtf(const unsigned char *iso, int len,
+					   unsigned char *utf,
+					   const pg_mb_radix_tree *map,
+					   const pg_local_to_utf_combined *cmap, int cmapsize,
+					   utf_local_conversion_func conv_func,
+					   int encoding);
+
+extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
+extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
+							bool noError);
+extern int	pg_verify_mbstr_len(int encoding, const char *mbstr, int len,
+								bool noError);
+
+extern void check_encoding_conversion_args(int src_encoding,
+										   int dest_encoding,
+										   int len,
+										   int expected_src_encoding,
+										   int expected_dest_encoding);
+
+extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg_attribute_noreturn();
+extern void report_untranslatable_char(int src_encoding, int dest_encoding,
+									   const char *mbstr, int len) pg_attribute_noreturn();
+
+extern void local2local(const unsigned char *l, unsigned char *p, int len,
+						int src_encoding, int dest_encoding, const unsigned char *tab);
+extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
+					  int lc, int encoding);
+extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
+					  int lc, int encoding);
+extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
+								 int len, int lc, int encoding,
+								 const unsigned char *tab);
+extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
+								 int len, int lc, int encoding,
+								 const unsigned char *tab);
+
+#ifdef WIN32
+extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
+#endif
+
+/*
+ * Support macro for encoding conversion functions to validate their
+ * arguments.
+ */
+#define CHECK_ENCODING_CONVERSION_ARGS(srcencoding,destencoding) \
+	check_encoding_conversion_args(PG_GETARG_INT32(0), \
+								   PG_GETARG_INT32(1), \
+								   PG_GETARG_INT32(4), \
+								   (srcencoding), \
+								   (destencoding))
+
+
+#endif							/* MBUTILS_H */
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 799a1538da..9ca04d9781 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -30,6 +30,7 @@
 #include "libpq-fe.h"
 #include "libpq-int.h"
 #include "port/pg_bswap.h"
+#include "utils/mbutils.h"
 
 /*
  * This macro lists the backend message types that could be "long" (more
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index ac7144e99f..4293f58360 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -34,6 +34,7 @@
 #include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index cec942c280..2359930805 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -2,8 +2,8 @@
 #define PL_PERL_HELPERS_H
 
 #include "common/pg_wchar.h"
-
 #include "plperl.h"
+#include "utils/mbutils.h"
 
 
 /*
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index a5bb7c474c..0b758db69c 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -20,6 +20,7 @@
 
 #include "plpgsql.h"
 #include "pl_gram.h"			/* must be after parser/scanner.h */
+#include "utils/mbutils.h"
 
 
 /* Klugy flag to tell scanner how to look up identifiers */
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 01ad7cf3bc..5dd1079002 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -19,6 +19,7 @@
 #include "plpy_resultobject.h"
 #include "plpy_spi.h"
 #include "plpython.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 static PyObject *PLy_cursor_query(const char *query);
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index b079afc000..d691def5bb 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -18,6 +18,7 @@
 #include "plpy_subxactobject.h"
 #include "plpython.h"
 #include "utils/builtins.h"
+#include "utils/mbutils.h"
 #include "utils/snapmgr.h"
 
 HTAB	   *PLy_spi_exceptions = NULL;
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 8b7cfbf18c..a6cf592974 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -22,6 +22,7 @@
 #include "plpy_resultobject.h"
 #include "plpy_spi.h"
 #include "plpython.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index baf45526c6..e4fd090228 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -19,6 +19,7 @@
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 /* conversion from Datums to Python objects */
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
index 873afcfbb8..b9ad3f7000 100644
--- a/src/pl/plpython/plpy_util.c
+++ b/src/pl/plpython/plpy_util.c
@@ -10,6 +10,7 @@
 #include "plpy_elog.h"
 #include "plpy_util.h"
 #include "plpython.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

0008-Moving-wstrcmp-and-wstrncmp-implementations.patchapplication/octet-stream; name=0008-Moving-wstrcmp-and-wstrncmp-implementations.patch; x-unix-mode=0644Download

From c4a38c31d9be99dda51fe6f66c39c584521b26cc Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 22 Jan 2020 20:19:34 -0800
Subject: [PATCH 08/11] Moving wstrcmp and wstrncmp implementations.

The functions defined in wstrcmp.c and wstrncmp.c are declared in
src/include/common/pg_wchar.h and contain no backend-specific code, so moving
those two files into src/common/ where they belong.
---
 src/backend/utils/mb/Makefile               | 4 +---
 src/common/Makefile                         | 4 +++-
 src/{backend/utils/mb => common}/wstrcmp.c  | 2 +-
 src/{backend/utils/mb => common}/wstrncmp.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)
 rename src/{backend/utils/mb => common}/wstrcmp.c (98%)
 rename src/{backend/utils/mb => common}/wstrncmp.c (98%)

diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index b19a125fa2..3e6f19cd44 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -15,9 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	conv.o \
 	mbutils.o \
-	stringinfo_mb.o \
-	wstrcmp.o \
-	wstrncmp.o
+	stringinfo_mb.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/common/Makefile b/src/common/Makefile
index 44ca68fa6c..3882cd7a3d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -72,7 +72,9 @@ OBJS_COMMON = \
 	unicode_norm.o \
 	username.o \
 	wait_error.o \
-	wchar.o
+	wchar.o \
+	wstrcmp.o \
+	wstrncmp.o
 
 ifeq ($(with_openssl),yes)
 OBJS_COMMON += \
diff --git a/src/backend/utils/mb/wstrcmp.c b/src/common/wstrcmp.c
similarity index 98%
rename from src/backend/utils/mb/wstrcmp.c
rename to src/common/wstrcmp.c
index e5f57d717d..4d0a66952b 100644
--- a/src/backend/utils/mb/wstrcmp.c
+++ b/src/common/wstrcmp.c
@@ -1,5 +1,5 @@
 /*
- * src/backend/utils/mb/wstrcmp.c
+ * src/common/wstrcmp.c
  *
  *-
  * Copyright (c) 1990, 1993
diff --git a/src/backend/utils/mb/wstrncmp.c b/src/common/wstrncmp.c
similarity index 98%
rename from src/backend/utils/mb/wstrncmp.c
rename to src/common/wstrncmp.c
index cce0c6c5cf..7fb8319161 100644
--- a/src/backend/utils/mb/wstrncmp.c
+++ b/src/common/wstrncmp.c
@@ -1,5 +1,5 @@
 /*
- * src/backend/utils/mb/wstrncmp.c
+ * src/common/wstrncmp.c
  *
  *
  * Copyright (c) 1989, 1993
-- 
2.21.1 (Apple Git-122.3)

0009-Relocating-jsonapi.h.patchapplication/octet-stream; name=0009-Relocating-jsonapi.h.patch; x-unix-mode=0644Download

From 78e387bae384a7addf851e40fd45198df10121d3 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Thu, 23 Jan 2020 08:59:46 -0800
Subject: [PATCH 09/11] Relocating jsonapi.h

Moving jsonapi.h to src/include/common

In preparation for exposing this header for both frontend
and backend use, moving it to src/include/common, and
changing the #includes in other files to match.
---
 contrib/hstore/hstore_io.c              | 2 +-
 src/backend/tsearch/to_tsany.c          | 2 +-
 src/backend/tsearch/wparser.c           | 2 +-
 src/backend/utils/adt/json.c            | 1 +
 src/backend/utils/adt/jsonapi.c         | 3 ++-
 src/backend/utils/adt/jsonb.c           | 1 +
 src/backend/utils/adt/jsonb_util.c      | 2 +-
 src/backend/utils/adt/jsonfuncs.c       | 3 ++-
 src/include/{utils => common}/jsonapi.h | 2 +-
 src/include/utils/jsonfuncs.h           | 2 +-
 10 files changed, 12 insertions(+), 8 deletions(-)
 rename src/include/{utils => common}/jsonapi.h (99%)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 10ec392775..f3174f2995 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -7,13 +7,13 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "hstore.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index adf181c191..1fe67c4c99 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -13,10 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 
 
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index c7499a94ac..88005c0519 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -16,11 +16,11 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 743dafea4a..8333711988 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 75e71ba376..ee26dc324f 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -13,9 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "common/pg_wchar.h"
 #include "miscadmin.h"
-#include "utils/jsonapi.h"
+#include "utils/mbutils.h"
 
 /*
  * The context of the parser is maintained by the recursive descent
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c912f8932d..d1d4542f74 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -15,6 +15,7 @@
 #include "access/htup_details.h"
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index b33c3ef43c..edec657cd3 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -15,12 +15,12 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 971ef6e630..d5e2c56f14 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "common/pg_wchar.h"
 #include "fmgr.h"
 #include "funcapi.h"
@@ -27,10 +28,10 @@
 #include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
+#include "utils/mbutils.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
diff --git a/src/include/utils/jsonapi.h b/src/include/common/jsonapi.h
similarity index 99%
rename from src/include/utils/jsonapi.h
rename to src/include/common/jsonapi.h
index 4d69b18495..da20ffce64 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/common/jsonapi.h
  *
  *-------------------------------------------------------------------------
  */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index b993f38409..1f1b4029cb 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -14,7 +14,7 @@
 #ifndef JSONFUNCS_H
 #define JSONFUNCS_H
 
-#include "utils/jsonapi.h"
+#include "common/jsonapi.h"
 #include "utils/jsonb.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

0010-Relocating-jsonapi.c.patchapplication/octet-stream; name=0010-Relocating-jsonapi.c.patch; x-unix-mode=0644Download

From a2c7d0e798004fc5c22ef8eef204a8eefc19e4a5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Thu, 23 Jan 2020 11:39:55 -0800
Subject: [PATCH 10/11] Relocating jsonapi.c

Moving src/backend/utils/adt/jsonapi.c to src/common/jsonapi.c
and reworking the code to not include elog, pg_mblen and similar
backend-only functions.

Along the way, simplifying Robert's implementation with macros
to make the json parsing code more compact and (to my eyes)
easier to read.
---
 src/backend/utils/adt/Makefile              |   1 -
 src/backend/utils/adt/json.c                |   2 +-
 src/backend/utils/adt/jsonb.c               |   2 +-
 src/backend/utils/adt/jsonfuncs.c           |   7 +-
 src/common/Makefile                         |   1 +
 src/{backend/utils/adt => common}/jsonapi.c | 273 +++++++++-----------
 src/include/common/jsonapi.h                |  19 +-
 src/include/utils/jsonfuncs.h               |   2 +-
 8 files changed, 146 insertions(+), 161 deletions(-)
 rename src/{backend/utils/adt => common}/jsonapi.c (84%)

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 790d7a24fb..13efa9338c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,7 +44,6 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
-	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 8333711988..3f9a3c9e6e 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -129,7 +129,7 @@ json_recv(PG_FUNCTION_ARGS)
 	str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 
 	/* Validate it. */
-	lex = makeJsonLexContextCstringLen(str, nbytes, false);
+	lex = makeJsonLexContextCstringLen(str, nbytes, GetDatabaseEncoding(), false);
 	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index d1d4542f74..1f85f68e30 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -262,7 +262,7 @@ jsonb_from_cstring(char *json, int len)
 
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
-	lex = makeJsonLexContextCstringLen(json, len, true);
+	lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 
 	sem.semstate = (void *) &state;
 
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index d5e2c56f14..2b403ea7da 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -495,7 +495,7 @@ static void transform_string_values_scalar(void *state, char *token, JsonTokenTy
  * ereport(ERROR).
  */
 void
-pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem)
+pg_parse_json_or_ereport(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	JsonParseErrorType	result;
 
@@ -515,6 +515,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 {
 	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
 										VARSIZE_ANY_EXHDR(json),
+										GetDatabaseEncoding(),
 										need_escapes);
 }
 
@@ -2606,7 +2607,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	PopulateArrayState state;
 	JsonSemAction sem;
 
-	state.lex = makeJsonLexContextCstringLen(json, len, true);
+	state.lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	state.ctx = ctx;
 
 	memset(&sem, 0, sizeof(sem));
@@ -3449,7 +3450,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	HASHCTL		ctl;
 	HTAB	   *tab;
 	JHashState *state;
-	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, true);
+	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	JsonSemAction *sem;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/common/Makefile b/src/common/Makefile
index 3882cd7a3d..e9caa08418 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	f2s.o \
 	file_perm.o \
 	ip.o \
+	jsonapi.o \
 	keywords.o \
 	kwlookup.o \
 	link-canary.o \
diff --git a/src/backend/utils/adt/jsonapi.c b/src/common/jsonapi.c
similarity index 84%
rename from src/backend/utils/adt/jsonapi.c
rename to src/common/jsonapi.c
index ee26dc324f..49e1bb44c9 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -11,12 +11,25 @@
  *
  *-------------------------------------------------------------------------
  */
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #include "common/jsonapi.h"
 #include "common/pg_wchar.h"
+
+#ifndef FRONTEND
 #include "miscadmin.h"
-#include "utils/mbutils.h"
+#endif
+
+#define INSIST(x) \
+do { \
+	JsonParseErrorType	parse_result; \
+	if((parse_result = (x)) != JSON_SUCCESS) \
+		return parse_result; \
+} while (0)
 
 /*
  * The context of the parser is maintained by the recursive descent
@@ -36,19 +49,20 @@ typedef enum					/* contexts of JSON parser */
 	JSON_PARSE_END				/* saw the end of a document, expect nothing */
 } JsonParseContext;
 
+static inline JsonParseErrorType lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token);
 static inline JsonParseErrorType json_lex_string(JsonLexContext *lex);
 static inline JsonParseErrorType json_lex_number(JsonLexContext *lex, char *s,
 								   bool *num_err, int *total_len);
-static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static JsonParseErrorType parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static JsonParseErrorType parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static JsonParseErrorType parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static JsonParseErrorType parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static inline JsonParseErrorType parse_scalar(JsonLexContext *lex, const JsonSemAction *sem);
+static JsonParseErrorType parse_object_field(JsonLexContext *lex, const JsonSemAction *sem);
+static JsonParseErrorType parse_object(JsonLexContext *lex, const JsonSemAction *sem);
+static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSemAction *sem);
+static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem);
 static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static char *extract_token(JsonLexContext *lex);
 
 /* the null action object used for pure validation */
-JsonSemAction nullSemAction =
+const JsonSemAction nullSemAction =
 {
 	NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL
@@ -56,32 +70,6 @@ JsonSemAction nullSemAction =
 
 /* Recursive Descent parser support routines */
 
-/*
- * lex_peek
- *
- * what is the current look_ahead token?
-*/
-static inline JsonTokenType
-lex_peek(JsonLexContext *lex)
-{
-	return lex->token_type;
-}
-
-/*
- * lex_expect
- *
- * move the lexer to the next token if the current look_ahead token matches
- * the parameter token. Otherwise, report an error.
- */
-static inline JsonParseErrorType
-lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
-{
-	if (lex_peek(lex) == token)
-		return json_lex(lex);
-	else
-		return report_parse_error(ctx, lex);
-}
-
 /* chars to consider as part of an alphanumeric token */
 #define JSON_ALPHANUMERIC_CHAR(c)  \
 	(((c) >= 'a' && (c) <= 'z') || \
@@ -122,7 +110,8 @@ IsValidJsonNumber(const char *str, int len)
 		dummy_lex.input_length = len;
 	}
 
-	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len);
+	if (JSON_SUCCESS != json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error, &total_len))
+		return false;
 
 	return (!numeric_error) && (total_len == dummy_lex.input_length);
 }
@@ -136,13 +125,21 @@ IsValidJsonNumber(const char *str, int len)
  * if really required.
  */
 JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+makeJsonLexContextCstringLen(char *json, int len, pg_enc encoding, bool need_escapes)
 {
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+	JsonLexContext *lex;
+
+#ifndef FRONTEND
+	lex = (JsonLexContext*) palloc0fast(sizeof(JsonLexContext));
+#else
+	lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+	memset(lex, 0, sizeof(JsonLexContext));
+#endif
 
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->line_number = 1;
 	lex->input_length = len;
+	lex->input_encoding = encoding;
 	if (need_escapes)
 		lex->strval = makeStringInfo();
 	return lex;
@@ -159,15 +156,12 @@ makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
  * pointer to a state object to be passed to those routines.
  */
 JsonParseErrorType
-pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
+pg_parse_json(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	JsonTokenType tok;
-	JsonParseErrorType	result;
 
 	/* get the initial token */
-	result = json_lex(lex);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(json_lex(lex));
 
 	tok = lex_peek(lex);
 
@@ -175,19 +169,16 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			result = parse_object(lex, sem);
+			INSIST(parse_object(lex, sem));
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			result = parse_array(lex, sem);
+			INSIST(parse_array(lex, sem));
 			break;
 		default:
-			result = parse_scalar(lex, sem); /* json can be a bare scalar */
+			INSIST(parse_scalar(lex, sem)); /* json can be a bare scalar */
 	}
 
-	if (result == JSON_SUCCESS)
-		result = lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
-
-	return result;
+	return lex_expect(JSON_PARSE_END, lex, JSON_TOKEN_END);
 }
 
 /*
@@ -203,7 +194,6 @@ json_count_array_elements(JsonLexContext *lex, int *elements)
 {
 	JsonLexContext copylex;
 	int			count;
-	JsonParseErrorType	result;
 
 	/*
 	 * It's safe to do this with a shallow copy because the lexical routines
@@ -215,29 +205,20 @@ json_count_array_elements(JsonLexContext *lex, int *elements)
 	copylex.lex_level++;
 
 	count = 0;
-	result = lex_expect(JSON_PARSE_ARRAY_START, &copylex,
-						JSON_TOKEN_ARRAY_START);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START));
 	if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
 	{
 		while (1)
 		{
 			count++;
-			result = parse_array_element(&copylex, &nullSemAction);
-			if (result != JSON_SUCCESS)
-				return result;
+			if (JSON_SUCCESS != parse_array_element(&copylex, &nullSemAction))
+				break;
 			if (copylex.token_type != JSON_TOKEN_COMMA)
 				break;
-			result = json_lex(&copylex);
-			if (result != JSON_SUCCESS)
-				return result;
+			INSIST(json_lex(&copylex));
 		}
 	}
-	result = lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex,
-							JSON_TOKEN_ARRAY_END);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END));
 
 	*elements = count;
 	return JSON_SUCCESS;
@@ -253,22 +234,30 @@ json_count_array_elements(JsonLexContext *lex, int *elements)
  *	  - object field
  */
 static inline JsonParseErrorType
-parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
+parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
 	JsonTokenType tok = lex_peek(lex);
-	JsonParseErrorType result;
 
 	/* a scalar must be a string, a number, true, false, or null */
-	if (tok != JSON_TOKEN_STRING && tok != JSON_TOKEN_NUMBER &&
-		tok != JSON_TOKEN_TRUE && tok != JSON_TOKEN_FALSE &&
-		tok != JSON_TOKEN_NULL)
-		return report_parse_error(JSON_PARSE_VALUE, lex);
+	switch (tok)
+	{
+		case JSON_TOKEN_STRING:
+		case JSON_TOKEN_NUMBER:
+		case JSON_TOKEN_TRUE:
+		case JSON_TOKEN_FALSE:
+		case JSON_TOKEN_NULL:
+			break;
+		default:
+			return report_parse_error(JSON_PARSE_VALUE, lex);
+	}
 
 	/* if no semantic function, just consume the token */
 	if (sfunc == NULL)
+	{
 		return json_lex(lex);
+	}
 
 	/* extract the de-escaped string value, or the raw lexeme */
 	if (lex_peek(lex) == JSON_TOKEN_STRING)
@@ -278,7 +267,7 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 	}
 	else
 	{
-		int			len = (lex->token_terminator - lex->token_start);
+		int		len = (lex->token_terminator - lex->token_start);
 
 		val = palloc(len + 1);
 		memcpy(val, lex->token_start, len);
@@ -286,18 +275,15 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 	}
 
 	/* consume the token */
-	result = json_lex(lex);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(json_lex(lex));
 
 	/* invoke the callback */
 	(*sfunc) (sem->semstate, val, tok);
-
 	return JSON_SUCCESS;
 }
 
 static JsonParseErrorType
-parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
+parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
 	 * An object field is "fieldname" : value where value can be a scalar,
@@ -310,19 +296,14 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
 	JsonTokenType tok;
-	JsonParseErrorType result;
 
 	if (lex_peek(lex) != JSON_TOKEN_STRING)
 		return report_parse_error(JSON_PARSE_STRING, lex);
 	if ((ostart != NULL || oend != NULL) && lex->strval != NULL)
 		fname = pstrdup(lex->strval->data);
-	result = json_lex(lex);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(json_lex(lex));
 
-	result = lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON));
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -333,16 +314,14 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			result = parse_object(lex, sem);
+			INSIST(parse_object(lex, sem));
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			result = parse_array(lex, sem);
+			INSIST(parse_array(lex, sem));
 			break;
 		default:
-			result = parse_scalar(lex, sem);
+			INSIST(parse_scalar(lex, sem));
 	}
-	if (result != JSON_SUCCESS)
-		return result;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate, fname, isnull);
@@ -350,7 +329,7 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static JsonParseErrorType
-parse_object(JsonLexContext *lex, JsonSemAction *sem)
+parse_object(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
 	 * an object is a possibly empty sequence of object fields, separated by
@@ -359,9 +338,12 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	json_struct_action ostart = sem->object_start;
 	json_struct_action oend = sem->object_end;
 	JsonTokenType tok;
-	JsonParseErrorType result;
 
+#ifndef FRONTEND
 	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
 
 	if (ostart != NULL)
 		(*ostart) (sem->semstate);
@@ -375,51 +357,41 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	lex->lex_level++;
 
 	Assert(lex_peek(lex) == JSON_TOKEN_OBJECT_START);
-	result = json_lex(lex);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(json_lex(lex));
 
 	tok = lex_peek(lex);
 	switch (tok)
 	{
 		case JSON_TOKEN_STRING:
-			result = parse_object_field(lex, sem);
-			while (result == JSON_SUCCESS && lex_peek(lex) == JSON_TOKEN_COMMA)
+			INSIST(parse_object_field(lex, sem));
+			while (lex_peek(lex) == JSON_TOKEN_COMMA)
 			{
-				result = json_lex(lex);
-				if (result != JSON_SUCCESS)
-					break;
-				result = parse_object_field(lex, sem);
+				INSIST(json_lex(lex));
+				INSIST(parse_object_field(lex, sem));
 			}
 			break;
 		case JSON_TOKEN_OBJECT_END:
 			break;
 		default:
 			/* case of an invalid initial token inside the object */
-			result = report_parse_error(JSON_PARSE_OBJECT_START, lex);
+			return report_parse_error(JSON_PARSE_OBJECT_START, lex);
 	}
-	if (result != JSON_SUCCESS)
-		return result;
 
-	result = lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(lex_expect(JSON_PARSE_OBJECT_NEXT, lex, JSON_TOKEN_OBJECT_END));
 
 	lex->lex_level--;
 
 	if (oend != NULL)
 		(*oend) (sem->semstate);
-
 	return JSON_SUCCESS;
 }
 
 static JsonParseErrorType
-parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
+parse_array_element(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	json_aelem_action astart = sem->array_element_start;
 	json_aelem_action aend = sem->array_element_end;
 	JsonTokenType tok = lex_peek(lex);
-	JsonParseErrorType result;
 
 	bool		isnull;
 
@@ -432,26 +404,22 @@ parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 	switch (tok)
 	{
 		case JSON_TOKEN_OBJECT_START:
-			result = parse_object(lex, sem);
+			INSIST(parse_object(lex, sem));
 			break;
 		case JSON_TOKEN_ARRAY_START:
-			result = parse_array(lex, sem);
+			INSIST(parse_array(lex, sem));
 			break;
 		default:
-			result = parse_scalar(lex, sem);
+			INSIST(parse_scalar(lex, sem));
 	}
 
-	if (result != JSON_SUCCESS)
-		return result;
-
 	if (aend != NULL)
 		(*aend) (sem->semstate, isnull);
-
 	return JSON_SUCCESS;
 }
 
 static JsonParseErrorType
-parse_array(JsonLexContext *lex, JsonSemAction *sem)
+parse_array(JsonLexContext *lex, const JsonSemAction *sem)
 {
 	/*
 	 * an array is a possibly empty sequence of array elements, separated by
@@ -459,9 +427,12 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	json_struct_action astart = sem->array_start;
 	json_struct_action aend = sem->array_end;
-	JsonParseErrorType result;
 
+#ifndef FRONTEND
 	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
 
 	if (astart != NULL)
 		(*astart) (sem->semstate);
@@ -474,31 +445,25 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	 */
 	lex->lex_level++;
 
-	result = lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START);
-	if (result == JSON_SUCCESS && lex_peek(lex) != JSON_TOKEN_ARRAY_END)
+	INSIST(lex_expect(JSON_PARSE_ARRAY_START, lex, JSON_TOKEN_ARRAY_START));
+	if (lex_peek(lex) != JSON_TOKEN_ARRAY_END)
 	{
-		result = parse_array_element(lex, sem);
 
-		while (result == JSON_SUCCESS && lex_peek(lex) == JSON_TOKEN_COMMA)
+		INSIST(parse_array_element(lex, sem));
+
+		while (lex_peek(lex) == JSON_TOKEN_COMMA)
 		{
-			result = json_lex(lex);
-			if (result != JSON_SUCCESS)
-				break;
-			result = parse_array_element(lex, sem);
+			INSIST(json_lex(lex));
+			INSIST(parse_array_element(lex, sem));
 		}
 	}
-	if (result != JSON_SUCCESS)
-		return result;
 
-	result = lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END);
-	if (result != JSON_SUCCESS)
-		return result;
+	INSIST(lex_expect(JSON_PARSE_ARRAY_NEXT, lex, JSON_TOKEN_ARRAY_END));
 
 	lex->lex_level--;
 
 	if (aend != NULL)
 		(*aend) (sem->semstate);
-
 	return JSON_SUCCESS;
 }
 
@@ -510,7 +475,6 @@ json_lex(JsonLexContext *lex)
 {
 	char	   *s;
 	int			len;
-	JsonParseErrorType	result;
 
 	/* Skip leading whitespace. */
 	s = lex->token_terminator;
@@ -534,7 +498,6 @@ json_lex(JsonLexContext *lex)
 		lex->token_type = JSON_TOKEN_END;
 	}
 	else
-	{
 		switch (*s)
 		{
 				/* Single-character token, some kind of punctuation mark. */
@@ -570,16 +533,12 @@ json_lex(JsonLexContext *lex)
 				break;
 			case '"':
 				/* string */
-				result = json_lex_string(lex);
-				if (result != JSON_SUCCESS)
-					return result;
+				INSIST(json_lex_string(lex));
 				lex->token_type = JSON_TOKEN_STRING;
 				break;
 			case '-':
 				/* Negative number. */
-				result = json_lex_number(lex, s + 1, NULL, NULL);
-				if (result != JSON_SUCCESS)
-					return result;
+				INSIST(json_lex_number(lex, s + 1, NULL, NULL));
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			case '0':
@@ -593,9 +552,7 @@ json_lex(JsonLexContext *lex)
 			case '8':
 			case '9':
 				/* Positive number. */
-				result = json_lex_number(lex, s, NULL, NULL);
-				if (result != JSON_SUCCESS)
-					return result;
+				INSIST(json_lex_number(lex, s, NULL, NULL));
 				lex->token_type = JSON_TOKEN_NUMBER;
 				break;
 			default:
@@ -649,11 +606,23 @@ json_lex(JsonLexContext *lex)
 
 				}
 		}						/* end of switch */
-	}
-
 	return JSON_SUCCESS;
 }
 
+/*
+ * lex_expect
+ *
+ * move the lexer to the next token if the current look_ahead token matches
+ * the parameter token. Otherwise, report an error.
+ */
+static inline JsonParseErrorType
+lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
+{
+	if (lex_peek(lex) == token)
+		return json_lex(lex);
+	return report_parse_error(ctx, lex);
+}
+
 /*
  * The next token in the input stream is known to be a string; lex it.
  */
@@ -721,7 +690,7 @@ json_lex_string(JsonLexContext *lex)
 						ch = (ch * 16) + (*s - 'A') + 10;
 					else
 					{
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
@@ -760,7 +729,7 @@ json_lex_string(JsonLexContext *lex)
 						/* We can't allow this, since our TEXT type doesn't */
 						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
+					else if (lex->input_encoding == PG_UTF8)
 					{
 						unicode_to_utf8(ch, (unsigned char *) utf8str);
 						utf8len = pg_utf_mblen((unsigned char *) utf8str);
@@ -810,7 +779,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so signal error. */
 						lex->token_start = s;
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 						return JSON_ESCAPING_INVALID;
 				}
 			}
@@ -824,7 +793,7 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_start = s;
-				lex->token_terminator = s + pg_mblen(s);
+				lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 				return JSON_ESCAPING_INVALID;
 			}
 
@@ -833,7 +802,6 @@ json_lex_string(JsonLexContext *lex)
 		{
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
-
 			appendStringInfoChar(lex->strval, *s);
 		}
 
@@ -967,7 +935,6 @@ json_lex_number(JsonLexContext *lex, char *s,
 		if (error)
 			return JSON_INVALID_TOKEN;
 	}
-
 	return JSON_SUCCESS;
 }
 
@@ -983,7 +950,6 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 	if (lex->token_start == NULL || lex->token_type == JSON_TOKEN_END)
 		return JSON_EXPECTED_MORE;
 
-	/* Otherwise choose the error type based on the parsing context. */
 	switch (ctx)
 	{
 		case JSON_PARSE_END:
@@ -1005,7 +971,7 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 		case JSON_PARSE_OBJECT_COMMA:
 			return JSON_EXPECTED_STRING;
 		default:
-			elog(ERROR, "unexpected json parse state: %d", ctx);
+			return JSON_BAD_PARSER_STATE;;
 	}
 }
 
@@ -1018,7 +984,12 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 	switch (error)
 	{
 		case JSON_SUCCESS:
+		case JSON_BAD_PARSER_STATE:
+#ifdef FRONTEND
+			return _("internal error in json parser");
+#else
 			elog(ERROR, "internal error in json parser");
+#endif
 			break;
 		case JSON_ESCAPING_INVALID:
 			return psprintf(_("Escape sequence \"\\%s\" is invalid."),
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index da20ffce64..1836b5e591 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -14,6 +14,7 @@
 #ifndef JSONAPI_H
 #define JSONAPI_H
 
+#include "common/pg_wchar.h"
 #include "lib/stringinfo.h"
 
 typedef enum
@@ -52,7 +53,8 @@ typedef enum
 	JSON_UNICODE_ESCAPE_FORMAT,
 	JSON_UNICODE_HIGH_ESCAPE,
 	JSON_UNICODE_HIGH_SURROGATE,
-	JSON_UNICODE_LOW_SURROGATE
+	JSON_UNICODE_LOW_SURROGATE,
+	JSON_BAD_PARSER_STATE
 } JsonParseErrorType;
 
 
@@ -73,6 +75,7 @@ typedef struct JsonLexContext
 {
 	char	   *input;
 	int			input_length;
+	pg_enc		input_encoding;
 	char	   *token_start;
 	char	   *token_terminator;
 	char	   *prev_token_terminator;
@@ -114,6 +117,9 @@ typedef struct JsonSemAction
 	json_scalar_action scalar;
 } JsonSemAction;
 
+/* the null action object used for pure validation */
+extern const JsonSemAction nullSemAction;
+
 /*
  * pg_parse_json will parse the string in the lex calling the
  * action functions in sem at the appropriate points. It is
@@ -124,10 +130,10 @@ typedef struct JsonSemAction
  * does nothing and just continues.
  */
 extern JsonParseErrorType pg_parse_json(JsonLexContext *lex,
-										JsonSemAction *sem);
+										const JsonSemAction *sem);
 
 /* the null action object used for pure validation */
-extern JsonSemAction nullSemAction;
+extern const JsonSemAction nullSemAction;
 
 /*
  * json_count_array_elements performs a fast secondary parse to determine the
@@ -149,11 +155,18 @@ extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
  */
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
+													pg_enc encoding,
 													bool need_escapes);
 
 /* lex one token */
 extern JsonParseErrorType json_lex(JsonLexContext *lex);
 
+/* Get the current look_ahead token. */
+static inline JsonTokenType lex_peek(JsonLexContext *lex)
+{
+	return lex->token_type;
+}
+
 /* construct an error detail string for a json error */
 extern char *json_errdetail(JsonParseErrorType error, JsonLexContext *lex);
 
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 1f1b4029cb..f29437d29e 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -40,7 +40,7 @@ typedef text *(*JsonTransformStringValuesAction) (void *state, char *elem_value,
 extern JsonLexContext *makeJsonLexContext(text *json, bool need_escapes);
 
 /* try to parse json, and ereport(ERROR) on failure */
-extern void pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem);
+extern void pg_parse_json_or_ereport(JsonLexContext *lex, const JsonSemAction *sem);
 
 /* report an error during json lexing or parsing */
 extern void json_ereport_error(JsonParseErrorType error, JsonLexContext *lex);
-- 
2.21.1 (Apple Git-122.3)

0011-Adding-src-bin-pg_test_json.patchapplication/octet-stream; name=0011-Adding-src-bin-pg_test_json.patch; x-unix-mode=0644Download

From ddb5cfd79a3acd943b82cb3c9a6105404e601dcf Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Thu, 23 Jan 2020 12:03:29 -0800
Subject: [PATCH 11/11] Adding src/bin/pg_test_json

This is a command line tool for testing whether
a string is valid json.  It is not that useful
in itself, but it proves that the json parser in
src/common can be used from frontend code.
---
 src/bin/Makefile                    |  1 +
 src/bin/pg_test_json/Makefile       | 46 +++++++++++++++++
 src/bin/pg_test_json/nls.mk         |  4 ++
 src/bin/pg_test_json/pg_test_json.c | 80 +++++++++++++++++++++++++++++
 4 files changed, 131 insertions(+)
 create mode 100644 src/bin/pg_test_json/Makefile
 create mode 100644 src/bin/pg_test_json/nls.mk
 create mode 100644 src/bin/pg_test_json/pg_test_json.c

diff --git a/src/bin/Makefile b/src/bin/Makefile
index 7f4120a34f..7a9a7980b4 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -25,6 +25,7 @@ SUBDIRS = \
 	pg_resetwal \
 	pg_rewind \
 	pg_test_fsync \
+	pg_test_json \
 	pg_test_timing \
 	pg_upgrade \
 	pg_waldump \
diff --git a/src/bin/pg_test_json/Makefile b/src/bin/pg_test_json/Makefile
new file mode 100644
index 0000000000..3e0458db99
--- /dev/null
+++ b/src/bin/pg_test_json/Makefile
@@ -0,0 +1,46 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_test_json
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/pg_test_json/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_test_json - the PostgreSQL interactive json syntax tester"
+PGAPPICON=win32
+
+subdir = src/bin/pg_test_json
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	pg_test_json.o
+
+all: pg_test_json
+
+pg_test_json: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_test_json$(X) '$(DESTDIR)$(bindir)/pg_test_json$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_test_json$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_test_json$(X) $(OBJS)
diff --git a/src/bin/pg_test_json/nls.mk b/src/bin/pg_test_json/nls.mk
new file mode 100644
index 0000000000..72de0728db
--- /dev/null
+++ b/src/bin/pg_test_json/nls.mk
@@ -0,0 +1,4 @@
+# src/bin/pg_test_json/nls.mk
+CATALOG_NAME     = pg_test_json
+AVAIL_LANGUAGES  =
+GETTEXT_FILES    = pg_test_json.c
diff --git a/src/bin/pg_test_json/pg_test_json.c b/src/bin/pg_test_json/pg_test_json.c
new file mode 100644
index 0000000000..dd8bebafb9
--- /dev/null
+++ b/src/bin/pg_test_json/pg_test_json.c
@@ -0,0 +1,80 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void handle_args(int argc, char *argv[]);
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
+	progname = get_progname(argv[0]);
+
+	handle_args(argc, argv);
+
+	return 0;
+}
+
+static void
+handle_args(int argc, char *argv[])
+{
+	int			argidx;			/* Command line argument position */
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	unsigned int json_len;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	json_len = (unsigned int) strlen(str);
+	client_encoding = PQenv2encoding();
+
+#if 0
+	fprintf(stdout, _("%s: preparing for parse of string of length %u....\n"),
+				progname, json_len);
+#endif
+	json = strdup(str);
+#if 0
+	fprintf(stdout, _("%s: duplicated string of length %u.\n"),
+				progname, json_len);
+#endif
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+#if 0
+	fprintf(stdout, _("%s: constructed JsonLexContext for utf8 json string of length %u.\n"),
+				progname, json_len);
+#endif
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	fprintf(stdout, _("%s: %s: %s\n"),
+					progname, str, (JSON_SUCCESS == parse_result ? "VALID" : "INVALID"));
+	return;
+}
-- 
2.21.1 (Apple Git-122.3)

#46

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Mark Dilger (#45)

Re: making the backend's json parser work in frontend code

On Fri, Jan 24, 2020 at 7:35 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Jan 22, 2020, at 7:00 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.

Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to by applied after his.

I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.

0011 is just for verifying the linking works ok and the json parser can be invoked from a frontend tool without error — I don’t really see the point in committing it.

I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime. The setup for the benchmark creates an unlogged table with a single text column and loads rows of json formatted text:

CREATE UNLOGGED TABLE benchmark (
j text
);
COPY benchmark (j) FROM '/Users/mark.dilger/bench/json.csv’;

FYI:

wc ~/bench/json.csv
107 34465023 503364244 /Users/mark.dilger/bench/json.csv

The benchmark itself casts the text column to jsonb, as follows:

SELECT jsonb_typeof(j::jsonb) typ, COUNT(*) FROM benchmark GROUP BY typ;

In summary, the times are:

pristine patched
————— —————
11.985 12.237
12.200 11.992
11.691 11.896
11.847 11.833
11.722 11.936

OK, nothing noticeable there.

"accept" is a common utility I've used in the past with parsers of
this kind, but inlining it seems perfectly reasonable.

I've reviewed these patches and Robert's, and they seem basically good to me.

But I don't think src/bin is the right place for the test program. I
assume we're not going to ship this program, so it really belongs in
src/test somewhere, I think. It should also have a TAP test.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#47

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Andrew Dunstan (#46)

Re: making the backend's json parser work in frontend code

On Jan 23, 2020, at 4:27 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On Fri, Jan 24, 2020 at 7:35 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Jan 22, 2020, at 7:00 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.

Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to by applied after his.

I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.

0011 is just for verifying the linking works ok and the json parser can be invoked from a frontend tool without error — I don’t really see the point in committing it.

I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime. The setup for the benchmark creates an unlogged table with a single text column and loads rows of json formatted text:

CREATE UNLOGGED TABLE benchmark (
j text
);
COPY benchmark (j) FROM '/Users/mark.dilger/bench/json.csv’;

FYI:

wc ~/bench/json.csv
107 34465023 503364244 /Users/mark.dilger/bench/json.csv

The benchmark itself casts the text column to jsonb, as follows:

SELECT jsonb_typeof(j::jsonb) typ, COUNT(*) FROM benchmark GROUP BY typ;

In summary, the times are:

pristine patched
————— —————
11.985 12.237
12.200 11.992
11.691 11.896
11.847 11.833
11.722 11.936

OK, nothing noticeable there.

"accept" is a common utility I've used in the past with parsers of
this kind, but inlining it seems perfectly reasonable.

I've reviewed these patches and Robert's, and they seem basically good to me.

Thanks for the review!

But I don't think src/bin is the right place for the test program. I
assume we're not going to ship this program, so it really belongs in
src/test somewhere, I think. It should also have a TAP test.

Ok, I’ll go do that; thanks for the suggestion.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#48

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 6 years ago

In reply to: Robert Haas (#32)

Re: making the backend's json parser work in frontend code

On 2020-01-23 18:04, Robert Haas wrote:

Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had*any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid UTF-8.

I think it wouldn't be unreasonable to require that file names in the
database directory be consistently encoded (as defined by pg_control,
probably). After all, this information is sometimes also shown in
system views, so it's already difficult to process total junk. In
practice, this shouldn't be an onerous requirement.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#49

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Peter Eisentraut (#48)

Re: making the backend's json parser work in frontend code

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2020-01-23 18:04, Robert Haas wrote:

Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had*any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid UTF-8.

I think it wouldn't be unreasonable to require that file names in the
database directory be consistently encoded (as defined by pg_control,
probably). After all, this information is sometimes also shown in
system views, so it's already difficult to process total junk. In
practice, this shouldn't be an onerous requirement.

I don't entirely follow why we're discussing this at all, if the
requirement is backing up a PG data directory. There are not, and
are never likely to be, any legitimate files with non-ASCII names
in that context. Why can't we just skip any such files?

regards, tom lane

#50

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Robert Haas (#36)

Re: making the backend's json parser work in frontend code

On 1/23/20 11:05 AM, Robert Haas wrote:

On Thu, Jan 23, 2020 at 12:49 PM Bruce Momjian <bruce@momjian.us> wrote:

Another idea is to use base64 for all non-ASCII file names, so we don't
need to check if the file name is valid UTF8 before outputting --- we
just need to check for non-ASCII, which is much easier.

I think that we have the infrastructure available to check in a
convenient way whether it's valid as UTF-8, so this might not be
necessary, but I will look into it further unless there is a consensus
to go another direction entirely.

Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?

Alvaro's proposed solution in the message to which you replied was to
call the field either 'path' or 'path_base64' depending on whether
base-64 escaping was used. That seems better to me than having a field
called 'path' and a separate field called 'is_path_base64' or
whatever.

+1. I'm not excited about this solution but don't have a better idea.

It might be nice to have a strict mode where non-ASCII/UTF8 characters
will error instead, but that can be added on later.

Regards,
--
-David
david@pgmasters.net

#51

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Tom Lane (#49)

Re: making the backend's json parser work in frontend code

On 1/24/20 9:27 AM, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2020-01-23 18:04, Robert Haas wrote:

Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had*any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid UTF-8.

I think it wouldn't be unreasonable to require that file names in the
database directory be consistently encoded (as defined by pg_control,
probably). After all, this information is sometimes also shown in
system views, so it's already difficult to process total junk. In
practice, this shouldn't be an onerous requirement.

I don't entirely follow why we're discussing this at all, if the
requirement is backing up a PG data directory. There are not, and
are never likely to be, any legitimate files with non-ASCII names
in that context. Why can't we just skip any such files?

It's not uncommon in my experience for users to drop odd files into
PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree
that it should be discouraged. Even so, I don't recall ever seeing any
non-ASCII filenames.

Skipping files sounds scary, I'd prefer an error or a warning (and then
base64 encode the filename).

Regards,
--
-David
david@pgmasters.net

#52

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: David Steele (#50)

Re: making the backend's json parser work in frontend code

On 2020-Jan-24, David Steele wrote:

It might be nice to have a strict mode where non-ASCII/UTF8 characters will
error instead, but that can be added on later.

"your backup failed because you have a file we don't like" is not great
behavior. IIRC we already fail when a file is owned by root (or maybe
unreadable and owned by root), and it messes up severely when people
edit postgresql.conf as root. Let's not add more cases of that sort.

Maybe we can get away with *ignoring* such files, perhaps after emitting
a warning.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#53

David Steele

david@pgmasters.net

almost 6 years ago

In reply to: Alvaro Herrera (#52)

Re: making the backend's json parser work in frontend code

On 1/24/20 10:00 AM, Alvaro Herrera wrote:

On 2020-Jan-24, David Steele wrote:

It might be nice to have a strict mode where non-ASCII/UTF8 characters will
error instead, but that can be added on later.

"your backup failed because you have a file we don't like" is not great
behavior. IIRC we already fail when a file is owned by root (or maybe
unreadable and owned by root), and it messes up severely when people
edit postgresql.conf as root. Let's not add more cases of that sort.

My intention was that the strict mode would not be the default, so I
don't see why it would be a big issue.

Maybe we can get away with *ignoring* such files, perhaps after emitting
a warning.

I'd prefer an an error (or base64 encoding) rather than just skipping a
file. The latter sounds scary.

Regards,
--
-David
david@pgmasters.net

#54

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: David Steele (#51)

Re: making the backend's json parser work in frontend code

On Jan 24, 2020, at 8:36 AM, David Steele <david@pgmasters.net> wrote:

I don't entirely follow why we're discussing this at all, if the
requirement is backing up a PG data directory. There are not, and
are never likely to be, any legitimate files with non-ASCII names
in that context. Why can't we just skip any such files?

It's not uncommon in my experience for users to drop odd files into PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree that it should be discouraged. Even so, I don't recall ever seeing any non-ASCII filenames.

Skipping files sounds scary, I'd prefer an error or a warning (and then base64 encode the filename).

I tend to agree with Tom. We know that postgres doesn’t write any such files now, and if we ever decided to change that, we could change this, too. So for now, we can assume any such files are not ours. Either the user manually scribbled in this directory, or had a tool (antivirus checksum file, vim .WHATEVER.swp file, etc) that did so. Raising an error would break any automated backup process that hit this issue, and base64 encoding the file name and backing up the file contents could grab data that the user would not reasonably expect in the backup. But this argument applies equally well to such files regardless of filename encoding. It would be odd to back them up when they happen to be valid UTF-8/ASCII/whatever, but not do so when they are not valid. I would expect, therefore, that we only back up files which match our expected file name pattern and ignore (perhaps with a warning) everything else.

Quoting from Robert’s email about why we want a backup manifest seems to support this idea, at least as I see it:

So, let's suppose we invent a backup manifest. What should it contain?
I imagine that it would consist of a list of files, and the lengths of
those files, and a checksum for each file. I think you should have a
choice of what kind of checksums to use, because algorithms that used
to seem like good choices (e.g. MD5) no longer do; this trend can
probably be expected to continue. Even if we initially support only
one kind of checksum -- presumably SHA-something since we have code
for that already for SCRAM -- I think that it would also be a good
idea to allow for future changes. And maybe it's best to just allow a
choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
gate, so that we can avoid bikeshedding over which one is secure
enough. I guess we'll still have to argue about the default. I also
think that it should be possible to build a manifest with no
checksums, so that one need not pay the overhead of computing
checksums if one does not wish. Of course, such a manifest is of much
less utility for checking backup integrity, but you can still check
that you've got the right files, which is noticeably better than
nothing. The manifest should probably also contain a checksum of its
own contents so that the integrity of the manifest itself can be
verified. And maybe a few other bits of metadata, but I'm not sure
exactly what. Ideas?

Once we invent the concept of a backup manifest, what do we need to do
with them? I think we'd want three things initially:

(1) When taking a backup, have the option (perhaps enabled by default)
to include a backup manifest.
(2) Given an existing backup that has not got a manifest, construct one.
(3) Cross-check a manifest against a backup and complain about extra
files, missing files, size differences, or checksum mismatches.

Nothing in there sounds to me like it needs to include random cruft.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: David Steele (#53)

Re: making the backend's json parser work in frontend code

On 2020-Jan-24, David Steele wrote:

On 1/24/20 10:00 AM, Alvaro Herrera wrote:

Maybe we can get away with *ignoring* such files, perhaps after emitting
a warning.

I'd prefer an an error (or base64 encoding) rather than just skipping a
file. The latter sounds scary.

Well, if the file is "invalid" then evidently Postgres cannot possibly
care about it, so why would it care if it's missing from the backup?

I prefer the encoding scheme myself. I don't see the point of the
error.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#56

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Alvaro Herrera (#55)

Re: making the backend's json parser work in frontend code

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I prefer the encoding scheme myself. I don't see the point of the
error.

Yeah, if we don't want to skip such files, then storing them using
a base64-encoded name (with a different key than regular names)
seems plausible. But I don't really see why we'd go to that much
trouble, nor why we'd think it's likely that tools would correctly
handle a case that is going to have 0.00% usage in the field.

regards, tom lane

#57

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Mark Dilger (#54)

Re: making the backend's json parser work in frontend code

On 2020-Jan-24, Mark Dilger wrote:

I would expect, therefore, that we only back up files which match our
expected file name pattern and ignore (perhaps with a warning)
everything else.

That risks missing files placed in the datadir by extensions; see
discussion about pg_checksums using a whitelist[1]/messages/by-id/20181019171747.4uithw2sjkt6msne@alap3.anarazel.de, which does not
translate directly to this problem, because omitting to checksum a file
is not the same as failing to copy a file into the backups.
(Essentially, the backups would be incomplete.)

[1]: /messages/by-id/20181019171747.4uithw2sjkt6msne@alap3.anarazel.de

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#58

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Alvaro Herrera (#57)

Re: making the backend's json parser work in frontend code

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2020-Jan-24, Mark Dilger wrote:

I would expect, therefore, that we only back up files which match our
expected file name pattern and ignore (perhaps with a warning)
everything else.

That risks missing files placed in the datadir by extensions;

I agree that assuming we know everything that will appear in the
data directory is a pretty unsafe assumption. But no rational
extension is going to use a non-ASCII file name, either, if only
because it can't predict what the filesystem encoding will be.

regards, tom lane

#59

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#56)

Re: making the backend's json parser work in frontend code

On Fri, Jan 24, 2020 at 9:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I prefer the encoding scheme myself. I don't see the point of the
error.

Yeah, if we don't want to skip such files, then storing them using
a base64-encoded name (with a different key than regular names)
seems plausible. But I don't really see why we'd go to that much
trouble, nor why we'd think it's likely that tools would correctly
handle a case that is going to have 0.00% usage in the field.

I mean, I gave a not-totally-unrealistic example of how this could
happen upthread. I agree it's going to be rare, but it's not usually
OK to decide that if a user does something a little unusual,
not-obviously-related features subtly break.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Tom Lane (#58)

Re: making the backend's json parser work in frontend code

On 2020-Jan-24, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2020-Jan-24, Mark Dilger wrote:

I would expect, therefore, that we only back up files which match our
expected file name pattern and ignore (perhaps with a warning)
everything else.

That risks missing files placed in the datadir by extensions;

I agree that assuming we know everything that will appear in the
data directory is a pretty unsafe assumption. But no rational
extension is going to use a non-ASCII file name, either, if only
because it can't predict what the filesystem encoding will be.

I see two different arguments. One is about the file encoding. Those
files are rare and would be placed by the user manually. We can fix
that by encoding the name. We can have a debug mode that encodes all
names that way, just to ensure the tools are prepared for it.

The other is Mark's point about "expected file pattern", which seems a
slippery slope to me. If the pattern is /^[a-zA-Z0-9_.]*$/ then I'm
okay with it (maybe add a few other punctuation chars); as you say no
sane extension would use names much weirder than that. But we should
not be stricter, such as counting the number of periods/underscores
allowed or where are alpha chars expected (except maybe disallow period
at start of filename), or anything too specific like that.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#61

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 6 years ago

In reply to: Robert Haas (#59)

Re: making the backend's json parser work in frontend code

On 2020-01-24 18:56, Robert Haas wrote:

On Fri, Jan 24, 2020 at 9:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I prefer the encoding scheme myself. I don't see the point of the
error.

Yeah, if we don't want to skip such files, then storing them using
a base64-encoded name (with a different key than regular names)
seems plausible. But I don't really see why we'd go to that much
trouble, nor why we'd think it's likely that tools would correctly
handle a case that is going to have 0.00% usage in the field.

I mean, I gave a not-totally-unrealistic example of how this could
happen upthread. I agree it's going to be rare, but it's not usually
OK to decide that if a user does something a little unusual,
not-obviously-related features subtly break.

Another example might be log files under pg_log with localized weekday
or month names. (Maybe we're not planning to back up log files, but the
routines that deal with file names should probably be prepared to at
least look at the name and decide that they don't care about it rather
than freaking out right away.)

I'm not fond of the base64 idea btw., because it seems to sort of
penalize using non-ASCII characters by making the result completely not
human readable. Something along the lines of MIME would be better in
that way. There are existing solutions to storing data with metadata
around it.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#62

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Alvaro Herrera (#60)

Re: making the backend's json parser work in frontend code

On Jan 24, 2020, at 10:03 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

The other is Mark's point about "expected file pattern", which seems a
slippery slope to me. If the pattern is /^[a-zA-Z0-9_.]*$/ then I'm
okay with it (maybe add a few other punctuation chars); as you say no
sane extension would use names much weirder than that. But we should
not be stricter, such as counting the number of periods/underscores
allowed or where are alpha chars expected (except maybe disallow period
at start of filename), or anything too specific like that.

What bothered me about skipping files based only on encoding is that it creates hard to anticipate bugs. If extensions embed something, like a customer name, into a filename, and that something is usually ASCII, or usually valid UTF-8, and gets backed up, but then some day they embed something that is not ASCII/UTF-8, then it does not get backed up, and maybe nobody notices until they actually *need* the backup, and it’s too late.

We either need to be really strict about what gets backed up, so that nobody gets a false sense of security about what gets included in that list, or we need to be completely permissive, which would include files named in arbitrary encodings. I don’t see how it does anybody any favors to make the system appear to back up everything until you hit this unanticipated case and then it fails.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#63

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mark Dilger (#45)

Re: making the backend's json parser work in frontend code

On Thu, Jan 23, 2020 at 1:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to by applied after his.

I think it'd be a good idea to move the pg_wchar.h stuff into a new
thread. This thread is getting a bit complicated, because we've got
(1) the patches need to do $SUBJECT plus (2) additional patches that
clean up the multibyte stuff more plus (3) discussion of issues that
pertain to the backup manifest thread. To my knowledge, $SUBJECT
doesn't strictly require the pg_wchar.h changes, so I suggest we try
to segregate those.

I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.

Hmm, I generally prefer to avoid these kinds of macro tricks because I
think they can be confusing to the reader. It's worth it in a case
like equalfuncs.c where so much boilerplate code is saved that the
gain in readability more than makes up for having to go check what the
macros do -- but I don't feel that's the case here. There aren't
*that* many call sites, and I think the code will be easier to
understand without "return" statements concealed within macros...

I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime.

Cool, thanks.

Since 0001-0003 have been reviewed by multiple people and nobody's
objected, I have committed those. But I made a hash of it: the first
one, I failed to credit any reviewers, or include a Discussion link,
and I just realized that I should have listed Alvaro's name as a
reviewer also. Sorry about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#64

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Peter Eisentraut (#61)

Re: making the backend's json parser work in frontend code

On 2020-Jan-24, Peter Eisentraut wrote:

I'm not fond of the base64 idea btw., because it seems to sort of penalize
using non-ASCII characters by making the result completely not human
readable. Something along the lines of MIME would be better in that way.
There are existing solutions to storing data with metadata around it.

You mean quoted-printable? That works for me.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#65

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Robert Haas (#63)

Re: making the backend's json parser work in frontend code

On Jan 24, 2020, at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Since 0001-0003 have been reviewed by multiple people and nobody's
objected, I have committed those.

I think 0004-0005 have been reviewed and accepted by both me and Andrew, if I understood him correctly:

I've reviewed these patches and Robert's, and they seem basically good to me.

Certainly, nothing in those two patches caused me any concern. I’m going to modify my patches as you suggested, get rid of the INSIST macro, and move the pg_wchar changes to their own thread. None of that should require changes in your 0004 or 0005. It won’t bother me if you commit those two. Andrew?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#66

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Robert Haas (#27)

2 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 22, 2020, at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

0004 is a substantially cleaned up version of the patch to make the
JSON parser return a result code rather than throwing errors. Names
have been fixed, interfaces have been tidied up, and the thing is
better integrated with the surrounding code. I would really like
comments, if anyone has them, on whether this approach is acceptable.

0005 builds on 0004 by moving three functions from jsonapi.c to
jsonfuncs.c. With that done, jsonapi.c has minimal remaining
dependencies on the backend environment. It would still need a
substitute for elog(ERROR, "some internal thing is broken"); I'm
thinking of using pg_log_fatal() for that case. It would also need a
fix for the problem that pg_mblen() is not available in the front-end
environment. I don't know what to do about that yet exactly, but it
doesn't seem unsolvable. The frontend environment just needs to know
which encoding to use, and needs a way to call PQmblen() rather than
pg_mblen().

I have completed the work in the attached 0006 and 0007 patches.
These are intended to apply after your 0004 and 0005; they won’t
work directly on master which, as of this writing, only contains your
0001-0003 patches.

0006 finishes moving the json parser to src/include/common and src/common.

0007 adds testing.

I would appreciate somebody looking at the portability issues for 0007
on Windows.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v4-0006-Relocating-jsonapi-to-common.patchapplication/octet-stream; name=v4-0006-Relocating-jsonapi-to-common.patch; x-unix-mode=0644Download

From 04a5423f53a0ca7f655dcdaca6b22c0af713570f Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Fri, 24 Jan 2020 14:30:35 -0800
Subject: [PATCH 1/2] Relocating jsonapi to common.

Moving jsonapi.c and jsonapi.h to src/common and src/include/common.
Reworking the code to not include elog, ereport, pg_mblen, and
similar backend-only functionality.
---
 contrib/hstore/hstore_io.c                  |  2 +-
 src/backend/tsearch/to_tsany.c              |  2 +-
 src/backend/tsearch/wparser.c               |  2 +-
 src/backend/utils/adt/Makefile              |  1 -
 src/backend/utils/adt/json.c                |  2 +-
 src/backend/utils/adt/jsonb.c               |  2 +-
 src/backend/utils/adt/jsonb_util.c          |  2 +-
 src/backend/utils/adt/jsonfuncs.c           |  7 ++--
 src/common/Makefile                         |  1 +
 src/{backend/utils/adt => common}/jsonapi.c | 42 ++++++++++++++++-----
 src/include/{utils => common}/jsonapi.h     |  5 ++-
 src/include/utils/jsonfuncs.h               |  2 +-
 12 files changed, 49 insertions(+), 21 deletions(-)
 rename src/{backend/utils/adt => common}/jsonapi.c (96%)
 rename src/include/{utils => common}/jsonapi.h (98%)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 10ec392775..f3174f2995 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -7,13 +7,13 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "hstore.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index adf181c191..1fe67c4c99 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -13,10 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 
 
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index c7499a94ac..88005c0519 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -16,11 +16,11 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 790d7a24fb..13efa9338c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,7 +44,6 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
-	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f6cd2b9911..567eab1e01 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -127,7 +127,7 @@ json_recv(PG_FUNCTION_ARGS)
 	str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 
 	/* Validate it. */
-	lex = makeJsonLexContextCstringLen(str, nbytes, false);
+	lex = makeJsonLexContextCstringLen(str, nbytes, GetDatabaseEncoding(), false);
 	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c912f8932d..fea4335951 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -261,7 +261,7 @@ jsonb_from_cstring(char *json, int len)
 
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
-	lex = makeJsonLexContextCstringLen(json, len, true);
+	lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 
 	sem.semstate = (void *) &state;
 
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index b33c3ef43c..edec657cd3 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -15,12 +15,12 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 66ea11b971..4f6fd0de02 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
@@ -27,7 +28,6 @@
 #include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
@@ -514,6 +514,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 {
 	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
 										VARSIZE_ANY_EXHDR(json),
+										GetDatabaseEncoding(),
 										need_escapes);
 }
 
@@ -2605,7 +2606,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	PopulateArrayState state;
 	JsonSemAction sem;
 
-	state.lex = makeJsonLexContextCstringLen(json, len, true);
+	state.lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	state.ctx = ctx;
 
 	memset(&sem, 0, sizeof(sem));
@@ -3448,7 +3449,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	HASHCTL		ctl;
 	HTAB	   *tab;
 	JHashState *state;
-	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, true);
+	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	JsonSemAction *sem;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/common/Makefile b/src/common/Makefile
index 44ca68fa6c..e757fb7399 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	f2s.o \
 	file_perm.o \
 	ip.o \
+	jsonapi.o \
 	keywords.o \
 	kwlookup.o \
 	link-canary.o \
diff --git a/src/backend/utils/adt/jsonapi.c b/src/common/jsonapi.c
similarity index 96%
rename from src/backend/utils/adt/jsonapi.c
rename to src/common/jsonapi.c
index 1ac3b7beda..f0e6a63e4d 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -11,11 +11,18 @@
  *
  *-------------------------------------------------------------------------
  */
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
+#include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+
+#ifndef FRONTEND
 #include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
 
 /*
  * The context of the parser is maintained by the recursive descent
@@ -135,13 +142,21 @@ IsValidJsonNumber(const char *str, int len)
  * if really required.
  */
 JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+makeJsonLexContextCstringLen(char *json, int len, int encoding, bool need_escapes)
 {
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+	JsonLexContext *lex;
+
+#ifndef FRONTEND
+	lex = palloc0(sizeof(JsonLexContext));
+#else
+	lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+	memset(lex, 0, sizeof(JsonLexContext));
+#endif
 
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->line_number = 1;
 	lex->input_length = len;
+	lex->input_encoding = encoding;
 	if (need_escapes)
 		lex->strval = makeStringInfo();
 	return lex;
@@ -360,7 +375,11 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	JsonTokenType tok;
 	JsonParseErrorType result;
 
+#ifndef FRONTEND
 	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
 
 	if (ostart != NULL)
 		(*ostart) (sem->semstate);
@@ -460,7 +479,11 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	json_struct_action aend = sem->array_end;
 	JsonParseErrorType result;
 
+#ifndef FRONTEND
 	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
 
 	if (astart != NULL)
 		(*astart) (sem->semstate);
@@ -720,7 +743,7 @@ json_lex_string(JsonLexContext *lex)
 						ch = (ch * 16) + (*s - 'A') + 10;
 					else
 					{
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
@@ -759,7 +782,7 @@ json_lex_string(JsonLexContext *lex)
 						/* We can't allow this, since our TEXT type doesn't */
 						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
+					else if (lex->input_encoding == PG_UTF8)
 					{
 						unicode_to_utf8(ch, (unsigned char *) utf8str);
 						utf8len = pg_utf_mblen((unsigned char *) utf8str);
@@ -809,7 +832,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so signal error. */
 						lex->token_start = s;
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 						return JSON_ESCAPING_INVALID;
 				}
 			}
@@ -823,7 +846,7 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_start = s;
-				lex->token_terminator = s + pg_mblen(s);
+				lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 				return JSON_ESCAPING_INVALID;
 			}
 
@@ -1004,7 +1027,7 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 		case JSON_PARSE_OBJECT_COMMA:
 			return JSON_EXPECTED_STRING;
 		default:
-			elog(ERROR, "unexpected json parse state: %d", ctx);
+			return JSON_BAD_PARSER_STATE;
 	}
 }
 
@@ -1017,7 +1040,8 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 	switch (error)
 	{
 		case JSON_SUCCESS:
-			elog(ERROR, "internal error in json parser");
+		case JSON_BAD_PARSER_STATE:
+			return _("internal error in json parser");
 			break;
 		case JSON_ESCAPING_INVALID:
 			return psprintf(_("Escape sequence \"\\%s\" is invalid."),
diff --git a/src/include/utils/jsonapi.h b/src/include/common/jsonapi.h
similarity index 98%
rename from src/include/utils/jsonapi.h
rename to src/include/common/jsonapi.h
index 4d69b18495..375b6d6639 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -52,7 +52,8 @@ typedef enum
 	JSON_UNICODE_ESCAPE_FORMAT,
 	JSON_UNICODE_HIGH_ESCAPE,
 	JSON_UNICODE_HIGH_SURROGATE,
-	JSON_UNICODE_LOW_SURROGATE
+	JSON_UNICODE_LOW_SURROGATE,
+	JSON_BAD_PARSER_STATE
 } JsonParseErrorType;
 
 
@@ -73,6 +74,7 @@ typedef struct JsonLexContext
 {
 	char	   *input;
 	int			input_length;
+	int			input_encoding;
 	char	   *token_start;
 	char	   *token_terminator;
 	char	   *prev_token_terminator;
@@ -149,6 +151,7 @@ extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
  */
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
+													int encoding,
 													bool need_escapes);
 
 /* lex one token */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index b993f38409..1f1b4029cb 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -14,7 +14,7 @@
 #ifndef JSONFUNCS_H
 #define JSONFUNCS_H
 
-#include "utils/jsonapi.h"
+#include "common/jsonapi.h"
 #include "utils/jsonb.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

v4-0007-Adding-frontend-tests-for-json-parser.patchapplication/octet-stream; name=v4-0007-Adding-frontend-tests-for-json-parser.patch; x-unix-mode=0644Download

From f2f2792ec244a59a3773ad654d1478a1d48ac81c Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 26 Jan 2020 10:41:37 -0800
Subject: [PATCH 2/2] Adding frontend tests for json parser.

Creating tests in new directory src/test/bin for testing
that the json parser can be included and used from within
a frontend standalone binary.
---
 src/Makefile                    |  4 +-
 src/test/Makefile               |  7 ++--
 src/test/bin/.gitignore         |  1 +
 src/test/bin/Makefile           | 41 ++++++++++++++++++++
 src/test/bin/README             | 15 ++++++++
 src/test/bin/t/001_test_json.pl | 48 +++++++++++++++++++++++
 src/test/bin/test_json.c        | 67 +++++++++++++++++++++++++++++++++
 7 files changed, 179 insertions(+), 4 deletions(-)
 create mode 100644 src/test/bin/.gitignore
 create mode 100644 src/test/bin/Makefile
 create mode 100644 src/test/bin/README
 create mode 100644 src/test/bin/t/001_test_json.pl
 create mode 100644 src/test/bin/test_json.c

diff --git a/src/Makefile b/src/Makefile
index bcdbd9588a..ccd4bab0de 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -2,7 +2,8 @@
 #
 # Makefile for src
 #
-# Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 #
 # src/Makefile
 #
@@ -27,6 +28,7 @@ SUBDIRS = \
 	bin \
 	pl \
 	makefiles \
+	test/bin \
 	test/regress \
 	test/isolation \
 	test/perl
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..e24732f190 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = bin perl regress isolation modules authentication recovery subscription
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
@@ -40,10 +40,11 @@ endif
 ALWAYS_SUBDIRS = $(filter-out $(SUBDIRS),examples kerberos ldap locale thread ssl)
 
 # We want to recurse to all subdirs for all standard targets, except that
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".
 
 recurse_alldirs_targets := $(filter-out installcheck install, $(standard_targets))
-installable_dirs := $(filter-out modules, $(SUBDIRS))
+installable_dirs := $(filter-out modules bin, $(SUBDIRS))
 
 $(call recurse,$(recurse_alldirs_targets))
 $(call recurse,installcheck, $(installable_dirs))
diff --git a/src/test/bin/.gitignore b/src/test/bin/.gitignore
new file mode 100644
index 0000000000..6709c749d8
--- /dev/null
+++ b/src/test/bin/.gitignore
@@ -0,0 +1 @@
+test_json
diff --git a/src/test/bin/Makefile b/src/test/bin/Makefile
new file mode 100644
index 0000000000..3eee9091bc
--- /dev/null
+++ b/src/test/bin/Makefile
@@ -0,0 +1,41 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/bin
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/bin/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "bin - the PostgreSQL standalone code binaries for testing"
+PGAPPICON=win32
+
+subdir = src/test/bin
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	test_json.o
+
+all: test_json
+
+test_json: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+check:
+	PATH="$(abs_top_builddir)/src/test/bin:$$PATH" $(prove_check)
+
+clean distclean maintainer-clean:
+	rm -f test_json$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/test/bin/README b/src/test/bin/README
new file mode 100644
index 0000000000..214671af88
--- /dev/null
+++ b/src/test/bin/README
@@ -0,0 +1,15 @@
+src/test/bin/README
+
+Binary executable tests
+=======================
+
+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common.  The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code.  The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.
+
+These test programs are part of the  tap-test suite.  Configure with tap tests
+enabled or these tests will be skipped.
diff --git a/src/test/bin/t/001_test_json.pl b/src/test/bin/t/001_test_json.pl
new file mode 100644
index 0000000000..08e15c956d
--- /dev/null
+++ b/src/test/bin/t/001_test_json.pl
@@ -0,0 +1,48 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use TestLib ();
+use Cwd;
+
+use Test::More tests => 83;
+
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
+
+ok(-f $test_json, "test_json file exists");
+ok(-x $test_json, "test_json file is executable");
+
+# Verify some valid JSON is accepted by our parser
+TestLib::command_like( [$test_json, q/null/       ], qr{\bVALID\b}, "null");
+TestLib::command_like( [$test_json, q/{}/         ], qr{\bVALID\b}, "empty object");
+TestLib::command_like( [$test_json, q/[]/         ], qr{\bVALID\b}, "empty array");
+TestLib::command_like( [$test_json, q/-12345/     ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/-1/         ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/0/          ], qr{\bVALID\b}, "zero");
+TestLib::command_like( [$test_json, q/1/          ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/12345/      ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/-1.23456789/], qr{\bVALID\b}, "negative float");
+TestLib::command_like( [$test_json, q/1.23456789/ ], qr{\bVALID\b}, "positive float");
+TestLib::command_like( [$test_json, q/{"a": "b"}/ ], qr{\bVALID\b}, "object");
+TestLib::command_like( [$test_json, q/["a", "b"]/ ], qr{\bVALID\b}, "array");
+TestLib::command_like( [$test_json, q/"pigs feet"/], qr{\bVALID\b}, 'text string');
+
+# Verify some invalid JSON is rejected by our parser
+TestLib::command_like( [$test_json, q/{/          ], qr{\bINVALID\b}, 'unclosed object');
+TestLib::command_like( [$test_json, q/[/          ], qr{\bINVALID\b}, 'unclosed array');
+TestLib::command_like( [$test_json, q/(/          ], qr{\bINVALID\b}, 'unclosed parenthesis');
+TestLib::command_like( [$test_json, q/}/          ], qr{\bINVALID\b}, 'unopened object');
+TestLib::command_like( [$test_json, q/]/          ], qr{\bINVALID\b}, 'unopened array');
+TestLib::command_like( [$test_json, q/)/          ], qr{\bINVALID\b}, 'unopened parenthesis');
+TestLib::command_like( [$test_json, q/{{{}}/      ], qr{\bINVALID\b}, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/{{}}}/      ], qr{\bINVALID\b}, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/[[[]]/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/[[]]]/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/((())/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/(()))/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/1 7 13/     ], qr{\bINVALID\b}, 'integer sequence');
+TestLib::command_like( [$test_json, q/{"a", "b"}/ ], qr{\bINVALID\b}, 'mixed object and array syntax');
diff --git a/src/test/bin/test_json.c b/src/test/bin/test_json.c
new file mode 100644
index 0000000000..567a35f3b0
--- /dev/null
+++ b/src/test/bin/test_json.c
@@ -0,0 +1,67 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	int			argidx;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
+	progname = get_progname(argv[0]);
+
+	/*
+	 * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+	 * too, which it should already be everywhere except sometimes in Windows.
+	 */
+	setbuf(stdout, NULL);
+	setbuf(stderr, NULL);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+
+	return 0;
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	unsigned int json_len;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	json_len = (unsigned int) strlen(str);
+	client_encoding = PQenv2encoding();
+
+	json = strdup(str);
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" : "INVALID"));
+	return;
+}
-- 
2.21.1 (Apple Git-122.3)

#67

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Mark Dilger (#65)

Re: making the backend's json parser work in frontend code

On Sat, Jan 25, 2020 at 6:20 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Jan 24, 2020, at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Since 0001-0003 have been reviewed by multiple people and nobody's
objected, I have committed those.

I think 0004-0005 have been reviewed and accepted by both me and Andrew, if I understood him correctly:

I've reviewed these patches and Robert's, and they seem basically good to me.

Certainly, nothing in those two patches caused me any concern. I’m going to modify my patches as you suggested, get rid of the INSIST macro, and move the pg_wchar changes to their own thread. None of that should require changes in your 0004 or 0005. It won’t bother me if you commit those two. Andrew?

Just reviewed the latest versions of 4 and 5, they look good to me.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#68

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Mark Dilger (#66)

Re: making the backend's json parser work in frontend code

On Mon, Jan 27, 2020 at 5:54 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Jan 22, 2020, at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

0004 is a substantially cleaned up version of the patch to make the
JSON parser return a result code rather than throwing errors. Names
have been fixed, interfaces have been tidied up, and the thing is
better integrated with the surrounding code. I would really like
comments, if anyone has them, on whether this approach is acceptable.

0005 builds on 0004 by moving three functions from jsonapi.c to
jsonfuncs.c. With that done, jsonapi.c has minimal remaining
dependencies on the backend environment. It would still need a
substitute for elog(ERROR, "some internal thing is broken"); I'm
thinking of using pg_log_fatal() for that case. It would also need a
fix for the problem that pg_mblen() is not available in the front-end
environment. I don't know what to do about that yet exactly, but it
doesn't seem unsolvable. The frontend environment just needs to know
which encoding to use, and needs a way to call PQmblen() rather than
pg_mblen().

I have completed the work in the attached 0006 and 0007 patches.
These are intended to apply after your 0004 and 0005; they won’t
work directly on master which, as of this writing, only contains your
0001-0003 patches.

0006 finishes moving the json parser to src/include/common and src/common.

0007 adds testing.

I would appreciate somebody looking at the portability issues for 0007
on Windows.

We'll need at a minimum something added to src/tools/msvc to build the
test program, maybe some other stuff too. I'll take a look.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#69

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Andrew Dunstan (#68)

Re: making the backend's json parser work in frontend code

On Jan 26, 2020, at 5:09 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

We'll need at a minimum something added to src/tools/msvc to build the
test program, maybe some other stuff too. I'll take a look.

Thanks, much appreciated.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#70

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Andrew Dunstan (#68)

Re: making the backend's json parser work in frontend code

0007 adds testing.

I would appreciate somebody looking at the portability issues for 0007
on Windows.

We'll need at a minimum something added to src/tools/msvc to build the
test program, maybe some other stuff too. I'll take a look.

Patch complains that the 0007 patch is malformed:

andrew@ariana:pg_head (master)*$ patch -p 1 <
~/Downloads/v4-0007-Adding-frontend-tests-for-json-parser.patch
patching file src/Makefile
patching file src/test/Makefile
patching file src/test/bin/.gitignore
patching file src/test/bin/Makefile
patching file src/test/bin/README
patching file src/test/bin/t/001_test_json.pl
patch: **** malformed patch at line 201: diff --git
a/src/test/bin/test_json.c b/src/test/bin/test_json.c

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#71

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Andrew Dunstan (#70)

2 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 26, 2020, at 5:51 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

0007 adds testing.

I would appreciate somebody looking at the portability issues for 0007
on Windows.

We'll need at a minimum something added to src/tools/msvc to build the
test program, maybe some other stuff too. I'll take a look.

Patch complains that the 0007 patch is malformed:

andrew@ariana:pg_head (master)*$ patch -p 1 <
~/Downloads/v4-0007-Adding-frontend-tests-for-json-parser.patch
patching file src/Makefile
patching file src/test/Makefile
patching file src/test/bin/.gitignore
patching file src/test/bin/Makefile
patching file src/test/bin/README
patching file src/test/bin/t/001_test_json.pl
patch: **** malformed patch at line 201: diff --git
a/src/test/bin/test_json.c b/src/test/bin/test_json.c

I manually removed a stray newline in the patch file. I shouldn’t have done that. I’ve removed the stray newline in the sources, committed (with git commit —amend) and am testing again, which is what I should have done the first time….

Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v5-0006-Relocating-jsonapi-to-common.patchapplication/octet-stream; name=v5-0006-Relocating-jsonapi-to-common.patch; x-unix-mode=0644Download

From 6f9159935737451df311eb25a24c080b0814af0b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Fri, 24 Jan 2020 14:30:35 -0800
Subject: [PATCH 1/2] Relocating jsonapi to common.

Moving jsonapi.c and jsonapi.h to src/common and src/include/common.
Reworking the code to not include elog, ereport, pg_mblen, and
similar backend-only functionality.
---
 contrib/hstore/hstore_io.c                  |  2 +-
 src/backend/tsearch/to_tsany.c              |  2 +-
 src/backend/tsearch/wparser.c               |  2 +-
 src/backend/utils/adt/Makefile              |  1 -
 src/backend/utils/adt/json.c                |  2 +-
 src/backend/utils/adt/jsonb.c               |  2 +-
 src/backend/utils/adt/jsonb_util.c          |  2 +-
 src/backend/utils/adt/jsonfuncs.c           |  7 ++--
 src/common/Makefile                         |  1 +
 src/{backend/utils/adt => common}/jsonapi.c | 42 ++++++++++++++++-----
 src/include/{utils => common}/jsonapi.h     |  5 ++-
 src/include/utils/jsonfuncs.h               |  2 +-
 12 files changed, 49 insertions(+), 21 deletions(-)
 rename src/{backend/utils/adt => common}/jsonapi.c (96%)
 rename src/include/{utils => common}/jsonapi.h (98%)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 10ec392775..f3174f2995 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -7,13 +7,13 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "hstore.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index adf181c191..1fe67c4c99 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -13,10 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 
 
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index c7499a94ac..88005c0519 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -16,11 +16,11 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 790d7a24fb..13efa9338c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,7 +44,6 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
-	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f6cd2b9911..567eab1e01 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -127,7 +127,7 @@ json_recv(PG_FUNCTION_ARGS)
 	str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 
 	/* Validate it. */
-	lex = makeJsonLexContextCstringLen(str, nbytes, false);
+	lex = makeJsonLexContextCstringLen(str, nbytes, GetDatabaseEncoding(), false);
 	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c912f8932d..fea4335951 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -261,7 +261,7 @@ jsonb_from_cstring(char *json, int len)
 
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
-	lex = makeJsonLexContextCstringLen(json, len, true);
+	lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 
 	sem.semstate = (void *) &state;
 
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index b33c3ef43c..edec657cd3 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -15,12 +15,12 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 66ea11b971..4f6fd0de02 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
@@ -27,7 +28,6 @@
 #include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
@@ -514,6 +514,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 {
 	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
 										VARSIZE_ANY_EXHDR(json),
+										GetDatabaseEncoding(),
 										need_escapes);
 }
 
@@ -2605,7 +2606,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	PopulateArrayState state;
 	JsonSemAction sem;
 
-	state.lex = makeJsonLexContextCstringLen(json, len, true);
+	state.lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	state.ctx = ctx;
 
 	memset(&sem, 0, sizeof(sem));
@@ -3448,7 +3449,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	HASHCTL		ctl;
 	HTAB	   *tab;
 	JHashState *state;
-	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, true);
+	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	JsonSemAction *sem;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/common/Makefile b/src/common/Makefile
index 44ca68fa6c..e757fb7399 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	f2s.o \
 	file_perm.o \
 	ip.o \
+	jsonapi.o \
 	keywords.o \
 	kwlookup.o \
 	link-canary.o \
diff --git a/src/backend/utils/adt/jsonapi.c b/src/common/jsonapi.c
similarity index 96%
rename from src/backend/utils/adt/jsonapi.c
rename to src/common/jsonapi.c
index 1ac3b7beda..f0e6a63e4d 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -11,11 +11,18 @@
  *
  *-------------------------------------------------------------------------
  */
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
+#include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+
+#ifndef FRONTEND
 #include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
 
 /*
  * The context of the parser is maintained by the recursive descent
@@ -135,13 +142,21 @@ IsValidJsonNumber(const char *str, int len)
  * if really required.
  */
 JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+makeJsonLexContextCstringLen(char *json, int len, int encoding, bool need_escapes)
 {
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+	JsonLexContext *lex;
+
+#ifndef FRONTEND
+	lex = palloc0(sizeof(JsonLexContext));
+#else
+	lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+	memset(lex, 0, sizeof(JsonLexContext));
+#endif
 
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->line_number = 1;
 	lex->input_length = len;
+	lex->input_encoding = encoding;
 	if (need_escapes)
 		lex->strval = makeStringInfo();
 	return lex;
@@ -360,7 +375,11 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 	JsonTokenType tok;
 	JsonParseErrorType result;
 
+#ifndef FRONTEND
 	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
 
 	if (ostart != NULL)
 		(*ostart) (sem->semstate);
@@ -460,7 +479,11 @@ parse_array(JsonLexContext *lex, JsonSemAction *sem)
 	json_struct_action aend = sem->array_end;
 	JsonParseErrorType result;
 
+#ifndef FRONTEND
 	check_stack_depth();
+#else
+	/* TODO: What do we do in frontend code? */
+#endif
 
 	if (astart != NULL)
 		(*astart) (sem->semstate);
@@ -720,7 +743,7 @@ json_lex_string(JsonLexContext *lex)
 						ch = (ch * 16) + (*s - 'A') + 10;
 					else
 					{
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
@@ -759,7 +782,7 @@ json_lex_string(JsonLexContext *lex)
 						/* We can't allow this, since our TEXT type doesn't */
 						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
+					else if (lex->input_encoding == PG_UTF8)
 					{
 						unicode_to_utf8(ch, (unsigned char *) utf8str);
 						utf8len = pg_utf_mblen((unsigned char *) utf8str);
@@ -809,7 +832,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so signal error. */
 						lex->token_start = s;
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 						return JSON_ESCAPING_INVALID;
 				}
 			}
@@ -823,7 +846,7 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_start = s;
-				lex->token_terminator = s + pg_mblen(s);
+				lex->token_terminator = s + pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
 				return JSON_ESCAPING_INVALID;
 			}
 
@@ -1004,7 +1027,7 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 		case JSON_PARSE_OBJECT_COMMA:
 			return JSON_EXPECTED_STRING;
 		default:
-			elog(ERROR, "unexpected json parse state: %d", ctx);
+			return JSON_BAD_PARSER_STATE;
 	}
 }
 
@@ -1017,7 +1040,8 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 	switch (error)
 	{
 		case JSON_SUCCESS:
-			elog(ERROR, "internal error in json parser");
+		case JSON_BAD_PARSER_STATE:
+			return _("internal error in json parser");
 			break;
 		case JSON_ESCAPING_INVALID:
 			return psprintf(_("Escape sequence \"\\%s\" is invalid."),
diff --git a/src/include/utils/jsonapi.h b/src/include/common/jsonapi.h
similarity index 98%
rename from src/include/utils/jsonapi.h
rename to src/include/common/jsonapi.h
index 4d69b18495..375b6d6639 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -52,7 +52,8 @@ typedef enum
 	JSON_UNICODE_ESCAPE_FORMAT,
 	JSON_UNICODE_HIGH_ESCAPE,
 	JSON_UNICODE_HIGH_SURROGATE,
-	JSON_UNICODE_LOW_SURROGATE
+	JSON_UNICODE_LOW_SURROGATE,
+	JSON_BAD_PARSER_STATE
 } JsonParseErrorType;
 
 
@@ -73,6 +74,7 @@ typedef struct JsonLexContext
 {
 	char	   *input;
 	int			input_length;
+	int			input_encoding;
 	char	   *token_start;
 	char	   *token_terminator;
 	char	   *prev_token_terminator;
@@ -149,6 +151,7 @@ extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
  */
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
+													int encoding,
 													bool need_escapes);
 
 /* lex one token */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index b993f38409..1f1b4029cb 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -14,7 +14,7 @@
 #ifndef JSONFUNCS_H
 #define JSONFUNCS_H
 
-#include "utils/jsonapi.h"
+#include "common/jsonapi.h"
 #include "utils/jsonb.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

v5-0007-Adding-frontend-tests-for-json-parser.patchapplication/octet-stream; name=v5-0007-Adding-frontend-tests-for-json-parser.patch; x-unix-mode=0644Download

From c0b68089adf8ea81e63885fe80b0f0ef44cdfbeb Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 26 Jan 2020 10:41:37 -0800
Subject: [PATCH 2/2] Adding frontend tests for json parser.

Creating tests in new directory src/test/bin for testing
that the json parser can be included and used from within
a frontend standalone binary.
---
 src/Makefile                    |  4 +-
 src/test/Makefile               |  7 ++--
 src/test/bin/.gitignore         |  1 +
 src/test/bin/Makefile           | 41 ++++++++++++++++++++
 src/test/bin/README             | 15 ++++++++
 src/test/bin/t/001_test_json.pl | 47 +++++++++++++++++++++++
 src/test/bin/test_json.c        | 67 +++++++++++++++++++++++++++++++++
 7 files changed, 178 insertions(+), 4 deletions(-)
 create mode 100644 src/test/bin/.gitignore
 create mode 100644 src/test/bin/Makefile
 create mode 100644 src/test/bin/README
 create mode 100644 src/test/bin/t/001_test_json.pl
 create mode 100644 src/test/bin/test_json.c

diff --git a/src/Makefile b/src/Makefile
index bcdbd9588a..ccd4bab0de 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -2,7 +2,8 @@
 #
 # Makefile for src
 #
-# Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 #
 # src/Makefile
 #
@@ -27,6 +28,7 @@ SUBDIRS = \
 	bin \
 	pl \
 	makefiles \
+	test/bin \
 	test/regress \
 	test/isolation \
 	test/perl
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..e24732f190 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = bin perl regress isolation modules authentication recovery subscription
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
@@ -40,10 +40,11 @@ endif
 ALWAYS_SUBDIRS = $(filter-out $(SUBDIRS),examples kerberos ldap locale thread ssl)
 
 # We want to recurse to all subdirs for all standard targets, except that
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".
 
 recurse_alldirs_targets := $(filter-out installcheck install, $(standard_targets))
-installable_dirs := $(filter-out modules, $(SUBDIRS))
+installable_dirs := $(filter-out modules bin, $(SUBDIRS))
 
 $(call recurse,$(recurse_alldirs_targets))
 $(call recurse,installcheck, $(installable_dirs))
diff --git a/src/test/bin/.gitignore b/src/test/bin/.gitignore
new file mode 100644
index 0000000000..6709c749d8
--- /dev/null
+++ b/src/test/bin/.gitignore
@@ -0,0 +1 @@
+test_json
diff --git a/src/test/bin/Makefile b/src/test/bin/Makefile
new file mode 100644
index 0000000000..3eee9091bc
--- /dev/null
+++ b/src/test/bin/Makefile
@@ -0,0 +1,41 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/bin
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/bin/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "bin - the PostgreSQL standalone code binaries for testing"
+PGAPPICON=win32
+
+subdir = src/test/bin
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	test_json.o
+
+all: test_json
+
+test_json: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+check:
+	PATH="$(abs_top_builddir)/src/test/bin:$$PATH" $(prove_check)
+
+clean distclean maintainer-clean:
+	rm -f test_json$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/test/bin/README b/src/test/bin/README
new file mode 100644
index 0000000000..214671af88
--- /dev/null
+++ b/src/test/bin/README
@@ -0,0 +1,15 @@
+src/test/bin/README
+
+Binary executable tests
+=======================
+
+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common.  The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code.  The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.
+
+These test programs are part of the  tap-test suite.  Configure with tap tests
+enabled or these tests will be skipped.
diff --git a/src/test/bin/t/001_test_json.pl b/src/test/bin/t/001_test_json.pl
new file mode 100644
index 0000000000..eb85e1a07b
--- /dev/null
+++ b/src/test/bin/t/001_test_json.pl
@@ -0,0 +1,47 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use TestLib ();
+use Cwd;
+
+use Test::More tests => 83;
+
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
+
+ok(-f $test_json, "test_json file exists");
+ok(-x $test_json, "test_json file is executable");
+
+# Verify some valid JSON is accepted by our parser
+TestLib::command_like( [$test_json, q/null/       ], qr{\bVALID\b}, "null");
+TestLib::command_like( [$test_json, q/{}/         ], qr{\bVALID\b}, "empty object");
+TestLib::command_like( [$test_json, q/[]/         ], qr{\bVALID\b}, "empty array");
+TestLib::command_like( [$test_json, q/-12345/     ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/-1/         ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/0/          ], qr{\bVALID\b}, "zero");
+TestLib::command_like( [$test_json, q/1/          ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/12345/      ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/-1.23456789/], qr{\bVALID\b}, "negative float");
+TestLib::command_like( [$test_json, q/1.23456789/ ], qr{\bVALID\b}, "positive float");
+TestLib::command_like( [$test_json, q/{"a": "b"}/ ], qr{\bVALID\b}, "object");
+TestLib::command_like( [$test_json, q/["a", "b"]/ ], qr{\bVALID\b}, "array");
+TestLib::command_like( [$test_json, q/"pigs feet"/], qr{\bVALID\b}, 'text string');
+
+# Verify some invalid JSON is rejected by our parser
+TestLib::command_like( [$test_json, q/{/          ], qr{\bINVALID\b}, 'unclosed object');
+TestLib::command_like( [$test_json, q/[/          ], qr{\bINVALID\b}, 'unclosed array');
+TestLib::command_like( [$test_json, q/(/          ], qr{\bINVALID\b}, 'unclosed parenthesis');
+TestLib::command_like( [$test_json, q/}/          ], qr{\bINVALID\b}, 'unopened object');
+TestLib::command_like( [$test_json, q/]/          ], qr{\bINVALID\b}, 'unopened array');
+TestLib::command_like( [$test_json, q/)/          ], qr{\bINVALID\b}, 'unopened parenthesis');
+TestLib::command_like( [$test_json, q/{{{}}/      ], qr{\bINVALID\b}, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/{{}}}/      ], qr{\bINVALID\b}, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/[[[]]/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/[[]]]/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/((())/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/(()))/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/1 7 13/     ], qr{\bINVALID\b}, 'integer sequence');
+TestLib::command_like( [$test_json, q/{"a", "b"}/ ], qr{\bINVALID\b}, 'mixed object and array syntax');
diff --git a/src/test/bin/test_json.c b/src/test/bin/test_json.c
new file mode 100644
index 0000000000..567a35f3b0
--- /dev/null
+++ b/src/test/bin/test_json.c
@@ -0,0 +1,67 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	int			argidx;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
+	progname = get_progname(argv[0]);
+
+	/*
+	 * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+	 * too, which it should already be everywhere except sometimes in Windows.
+	 */
+	setbuf(stdout, NULL);
+	setbuf(stderr, NULL);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+
+	return 0;
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	unsigned int json_len;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	json_len = (unsigned int) strlen(str);
+	client_encoding = PQenv2encoding();
+
+	json = strdup(str);
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" : "INVALID"));
+	return;
+}
-- 
2.21.1 (Apple Git-122.3)

#72

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mark Dilger (#71)

Re: making the backend's json parser work in frontend code

On Sun, Jan 26, 2020 at 9:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.

Regarding 0006:

+#ifndef FRONTEND
 #include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif

I suggest

#ifdef FRONTEND
#define check_stack_depth()
#else
#include "miscadmin.h"
#endif

- lex->token_terminator = s + pg_mblen(s);
+ lex->token_terminator = s +
pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);

Can we use pq_encoding_mblen() here? Regardless, it doesn't seem great
to add more direct references to pg_wchar_table. I think we should
avoid that.

+ return JSON_BAD_PARSER_STATE;

I don't like this, either. I'm thinking about adding some
variable-argument macros that either elog() in backend code or else
pg_log_fatal() and exit(1) in frontend code. There are some existing
precedents already (e.g. rmtree.c, pgfnames.c) which could perhaps be
generalized. I think I'll start a new thread about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#73

Mahendra Singh Thalor

mahi6run@gmail.com

almost 6 years ago

In reply to: Robert Haas (#72)

1 attachment(s)

Re: making the backend's json parser work in frontend code

On Mon, 27 Jan 2020 at 19:00, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jan 26, 2020 at 9:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.

Regarding 0006:
+#ifndef FRONTEND
#include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
I suggest

#ifdef FRONTEND
#define check_stack_depth()
#else
#include "miscadmin.h"
#endif
- lex->token_terminator = s + pg_mblen(s);
+ lex->token_terminator = s +
pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
Can we use pq_encoding_mblen() here? Regardless, it doesn't seem great
to add more direct references to pg_wchar_table. I think we should
avoid that.

+ return JSON_BAD_PARSER_STATE;

I don't like this, either. I'm thinking about adding some
variable-argument macros that either elog() in backend code or else
pg_log_fatal() and exit(1) in frontend code. There are some existing
precedents already (e.g. rmtree.c, pgfnames.c) which could perhaps be
generalized. I think I'll start a new thread about that.

Hi,
I can see one warning on HEAD.

jsonapi.c: In function ‘json_errdetail’:
jsonapi.c:1068:1: warning: control reaches end of non-void function
[-Wreturn-type]
}
^

Attaching a patch to fix warning.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachments:

Fixed_compiler_warning_json_errdetail.patchapplication/octet-stream; name=Fixed_compiler_warning_json_errdetail.patchDownload

commit 34b2fb4875e1c0043ab0acde8a9f1417b322d3ea
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Date:   Tue Jan 28 00:28:24 2020 +0530

    Fixed warning for json_errdetail

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index 1ac3b7b..b750b68 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -1065,6 +1065,9 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 		case JSON_UNICODE_LOW_SURROGATE:
 			return _("Unicode low surrogate must follow a high surrogate.");
 	}
+
+	/* To silence the compiler. */
+	return NULL;
 }
 
 /*

#74

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Robert Haas (#72)

3 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 27, 2020, at 5:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jan 26, 2020 at 9:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.

Regarding 0006:
+#ifndef FRONTEND
#include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
I suggest

#ifdef FRONTEND
#define check_stack_depth()
#else
#include "miscadmin.h"
#endif

Sure, we can do it that way.

- lex->token_terminator = s + pg_mblen(s);
+ lex->token_terminator = s +
pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
Can we use pq_encoding_mblen() here? Regardless, it doesn't seem great
to add more direct references to pg_wchar_table. I think we should
avoid that.

Yes, that looks a lot cleaner.

+ return JSON_BAD_PARSER_STATE;

I don't like this, either. I'm thinking about adding some
variable-argument macros that either elog() in backend code or else
pg_log_fatal() and exit(1) in frontend code. There are some existing
precedents already (e.g. rmtree.c, pgfnames.c) which could perhaps be
generalized. I think I'll start a new thread about that.

Right, you started the "pg_croak, or something like it?” thread, which already looks like it might not be resolved quickly. Can we use the

#ifndef FRONTEND
#define pg_log_warning(...) elog(WARNING, __VA_ARGS__)
#else
#include "common/logging.h"
#endif

pattern here as a place holder, and revisit it along with the other couple instances of this pattern if/when the “pg_croak, or something like it?” thread is ready for commit? I’m calling it json_log_and_abort(…) for now, as I can’t hope to guess what the final name will be.

I’m attaching a new patch set with these three changes including Mahendra’s patch posted elsewhere on this thread.

Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v6-0001-Relocating-jsonapi-to-common.patchapplication/octet-stream; name=v6-0001-Relocating-jsonapi-to-common.patch; x-unix-mode=0644Download

From 0cee475d0ab71445f976ccd74b8b87c9ce1a5f55 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Fri, 24 Jan 2020 14:30:35 -0800
Subject: [PATCH 1/3] Relocating jsonapi to common.

Moving jsonapi.c and jsonapi.h to src/common and src/include/common.
Reworking the code to not include elog, ereport, pg_mblen, and
similar backend-only functionality.
---
 contrib/hstore/hstore_io.c                  |  2 +-
 src/backend/tsearch/to_tsany.c              |  2 +-
 src/backend/tsearch/wparser.c               |  2 +-
 src/backend/utils/adt/Makefile              |  1 -
 src/backend/utils/adt/json.c                |  2 +-
 src/backend/utils/adt/jsonb.c               |  2 +-
 src/backend/utils/adt/jsonb_util.c          |  2 +-
 src/backend/utils/adt/jsonfuncs.c           |  7 +--
 src/common/Makefile                         |  1 +
 src/{backend/utils/adt => common}/jsonapi.c | 49 ++++++++++++++++-----
 src/include/{utils => common}/jsonapi.h     |  2 +
 src/include/utils/jsonfuncs.h               |  2 +-
 12 files changed, 53 insertions(+), 21 deletions(-)
 rename src/{backend/utils/adt => common}/jsonapi.c (96%)
 rename src/include/{utils => common}/jsonapi.h (99%)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 10ec392775..f3174f2995 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -7,13 +7,13 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "hstore.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index adf181c191..1fe67c4c99 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -13,10 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 
 
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index c7499a94ac..88005c0519 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -16,11 +16,11 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 790d7a24fb..13efa9338c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,7 +44,6 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
-	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f6cd2b9911..567eab1e01 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -127,7 +127,7 @@ json_recv(PG_FUNCTION_ARGS)
 	str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 
 	/* Validate it. */
-	lex = makeJsonLexContextCstringLen(str, nbytes, false);
+	lex = makeJsonLexContextCstringLen(str, nbytes, GetDatabaseEncoding(), false);
 	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c912f8932d..fea4335951 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -261,7 +261,7 @@ jsonb_from_cstring(char *json, int len)
 
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
-	lex = makeJsonLexContextCstringLen(json, len, true);
+	lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 
 	sem.semstate = (void *) &state;
 
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index b33c3ef43c..edec657cd3 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -15,12 +15,12 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 66ea11b971..4f6fd0de02 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
@@ -27,7 +28,6 @@
 #include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
@@ -514,6 +514,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 {
 	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
 										VARSIZE_ANY_EXHDR(json),
+										GetDatabaseEncoding(),
 										need_escapes);
 }
 
@@ -2605,7 +2606,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	PopulateArrayState state;
 	JsonSemAction sem;
 
-	state.lex = makeJsonLexContextCstringLen(json, len, true);
+	state.lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	state.ctx = ctx;
 
 	memset(&sem, 0, sizeof(sem));
@@ -3448,7 +3449,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	HASHCTL		ctl;
 	HTAB	   *tab;
 	JHashState *state;
-	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, true);
+	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	JsonSemAction *sem;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/common/Makefile b/src/common/Makefile
index 44ca68fa6c..e757fb7399 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	f2s.o \
 	file_perm.o \
 	ip.o \
+	jsonapi.o \
 	keywords.o \
 	kwlookup.o \
 	link-canary.o \
diff --git a/src/backend/utils/adt/jsonapi.c b/src/common/jsonapi.c
similarity index 96%
rename from src/backend/utils/adt/jsonapi.c
rename to src/common/jsonapi.c
index 1ac3b7beda..a048b3e87b 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -11,11 +11,30 @@
  *
  *-------------------------------------------------------------------------
  */
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/jsonapi.h"
+
+#ifdef FRONTEND
+#include "common/logging.h"
+#endif
 
 #include "mb/pg_wchar.h"
+
+#ifndef FRONTEND
 #include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
+
+#ifdef FRONTEND
+#define check_stack_depth()
+#define json_log_and_abort(...) pg_log_fatal(__VA_ARGS__); exit(1);
+#else
+#define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
+#endif
 
 /*
  * The context of the parser is maintained by the recursive descent
@@ -135,13 +154,21 @@ IsValidJsonNumber(const char *str, int len)
  * if really required.
  */
 JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+makeJsonLexContextCstringLen(char *json, int len, int encoding, bool need_escapes)
 {
-	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+	JsonLexContext *lex;
+
+#ifndef FRONTEND
+	lex = palloc0(sizeof(JsonLexContext));
+#else
+	lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+	memset(lex, 0, sizeof(JsonLexContext));
+#endif
 
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->line_number = 1;
 	lex->input_length = len;
+	lex->input_encoding = encoding;
 	if (need_escapes)
 		lex->strval = makeStringInfo();
 	return lex;
@@ -720,7 +747,7 @@ json_lex_string(JsonLexContext *lex)
 						ch = (ch * 16) + (*s - 'A') + 10;
 					else
 					{
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_encoding_mblen(lex->input_encoding, s);
 						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
@@ -759,7 +786,7 @@ json_lex_string(JsonLexContext *lex)
 						/* We can't allow this, since our TEXT type doesn't */
 						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
+					else if (lex->input_encoding == PG_UTF8)
 					{
 						unicode_to_utf8(ch, (unsigned char *) utf8str);
 						utf8len = pg_utf_mblen((unsigned char *) utf8str);
@@ -809,7 +836,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so signal error. */
 						lex->token_start = s;
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_encoding_mblen(lex->input_encoding, s);
 						return JSON_ESCAPING_INVALID;
 				}
 			}
@@ -823,7 +850,7 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_start = s;
-				lex->token_terminator = s + pg_mblen(s);
+				lex->token_terminator = s + pg_encoding_mblen(lex->input_encoding, s);
 				return JSON_ESCAPING_INVALID;
 			}
 
@@ -1003,9 +1030,10 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 			return JSON_EXPECTED_OBJECT_NEXT;
 		case JSON_PARSE_OBJECT_COMMA:
 			return JSON_EXPECTED_STRING;
-		default:
-			elog(ERROR, "unexpected json parse state: %d", ctx);
 	}
+
+	/* Not reached */
+	json_log_and_abort("unexpected json parse state: %d", ctx);
 }
 
 /*
@@ -1017,7 +1045,8 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 	switch (error)
 	{
 		case JSON_SUCCESS:
-			elog(ERROR, "internal error in json parser");
+			/* If success, why were we called? */
+			json_log_and_abort("internal error in json parser");
 			break;
 		case JSON_ESCAPING_INVALID:
 			return psprintf(_("Escape sequence \"\\%s\" is invalid."),
diff --git a/src/include/utils/jsonapi.h b/src/include/common/jsonapi.h
similarity index 99%
rename from src/include/utils/jsonapi.h
rename to src/include/common/jsonapi.h
index 4d69b18495..bcfd57cc53 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -73,6 +73,7 @@ typedef struct JsonLexContext
 {
 	char	   *input;
 	int			input_length;
+	int			input_encoding;
 	char	   *token_start;
 	char	   *token_terminator;
 	char	   *prev_token_terminator;
@@ -149,6 +150,7 @@ extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
  */
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
+													int encoding,
 													bool need_escapes);
 
 /* lex one token */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index b993f38409..1f1b4029cb 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -14,7 +14,7 @@
 #ifndef JSONFUNCS_H
 #define JSONFUNCS_H
 
-#include "utils/jsonapi.h"
+#include "common/jsonapi.h"
 #include "utils/jsonb.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

v6-0002-Fixed-warning-for-json_errdetail.patchapplication/octet-stream; name=v6-0002-Fixed-warning-for-json_errdetail.patch; x-unix-mode=0644Download

From 43dd6111ce584c32aab1375bc7ad5b1a560c8960 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 27 Jan 2020 11:42:07 -0800
Subject: [PATCH 2/3] Fixed warning for json_errdetail

To keep the compiler happy, adding a not reached
return at the end of json_errdetail.

Author: Mahendra Singh Thalor <mahi6run@gmail.com>
---
 src/common/jsonapi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index a048b3e87b..3d21ae1b6a 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -1094,6 +1094,9 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 		case JSON_UNICODE_LOW_SURROGATE:
 			return _("Unicode low surrogate must follow a high surrogate.");
 	}
+
+	/* To silence the compiler. */
+	return NULL;
 }
 
 /*
-- 
2.21.1 (Apple Git-122.3)

v6-0003-Adding-frontend-tests-for-json-parser.patchapplication/octet-stream; name=v6-0003-Adding-frontend-tests-for-json-parser.patch; x-unix-mode=0644Download

From 57a0140b49ec230ec6fadc161d839c0c6d0fc0e1 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 26 Jan 2020 10:41:37 -0800
Subject: [PATCH 3/3] Adding frontend tests for json parser.

Creating tests in new directory src/test/bin for testing
that the json parser can be included and used from within
a frontend standalone binary.
---
 src/Makefile                    |  4 +-
 src/test/Makefile               |  7 ++--
 src/test/bin/.gitignore         |  1 +
 src/test/bin/Makefile           | 41 ++++++++++++++++++++
 src/test/bin/README             | 15 ++++++++
 src/test/bin/t/001_test_json.pl | 47 +++++++++++++++++++++++
 src/test/bin/test_json.c        | 67 +++++++++++++++++++++++++++++++++
 7 files changed, 178 insertions(+), 4 deletions(-)
 create mode 100644 src/test/bin/.gitignore
 create mode 100644 src/test/bin/Makefile
 create mode 100644 src/test/bin/README
 create mode 100644 src/test/bin/t/001_test_json.pl
 create mode 100644 src/test/bin/test_json.c

diff --git a/src/Makefile b/src/Makefile
index bcdbd9588a..ccd4bab0de 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -2,7 +2,8 @@
 #
 # Makefile for src
 #
-# Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 #
 # src/Makefile
 #
@@ -27,6 +28,7 @@ SUBDIRS = \
 	bin \
 	pl \
 	makefiles \
+	test/bin \
 	test/regress \
 	test/isolation \
 	test/perl
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..e24732f190 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = bin perl regress isolation modules authentication recovery subscription
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
@@ -40,10 +40,11 @@ endif
 ALWAYS_SUBDIRS = $(filter-out $(SUBDIRS),examples kerberos ldap locale thread ssl)
 
 # We want to recurse to all subdirs for all standard targets, except that
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".
 
 recurse_alldirs_targets := $(filter-out installcheck install, $(standard_targets))
-installable_dirs := $(filter-out modules, $(SUBDIRS))
+installable_dirs := $(filter-out modules bin, $(SUBDIRS))
 
 $(call recurse,$(recurse_alldirs_targets))
 $(call recurse,installcheck, $(installable_dirs))
diff --git a/src/test/bin/.gitignore b/src/test/bin/.gitignore
new file mode 100644
index 0000000000..6709c749d8
--- /dev/null
+++ b/src/test/bin/.gitignore
@@ -0,0 +1 @@
+test_json
diff --git a/src/test/bin/Makefile b/src/test/bin/Makefile
new file mode 100644
index 0000000000..3eee9091bc
--- /dev/null
+++ b/src/test/bin/Makefile
@@ -0,0 +1,41 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/bin
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/bin/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "bin - the PostgreSQL standalone code binaries for testing"
+PGAPPICON=win32
+
+subdir = src/test/bin
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	test_json.o
+
+all: test_json
+
+test_json: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+check:
+	PATH="$(abs_top_builddir)/src/test/bin:$$PATH" $(prove_check)
+
+clean distclean maintainer-clean:
+	rm -f test_json$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/test/bin/README b/src/test/bin/README
new file mode 100644
index 0000000000..214671af88
--- /dev/null
+++ b/src/test/bin/README
@@ -0,0 +1,15 @@
+src/test/bin/README
+
+Binary executable tests
+=======================
+
+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common.  The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code.  The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.
+
+These test programs are part of the  tap-test suite.  Configure with tap tests
+enabled or these tests will be skipped.
diff --git a/src/test/bin/t/001_test_json.pl b/src/test/bin/t/001_test_json.pl
new file mode 100644
index 0000000000..eb85e1a07b
--- /dev/null
+++ b/src/test/bin/t/001_test_json.pl
@@ -0,0 +1,47 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use TestLib ();
+use Cwd;
+
+use Test::More tests => 83;
+
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
+
+ok(-f $test_json, "test_json file exists");
+ok(-x $test_json, "test_json file is executable");
+
+# Verify some valid JSON is accepted by our parser
+TestLib::command_like( [$test_json, q/null/       ], qr{\bVALID\b}, "null");
+TestLib::command_like( [$test_json, q/{}/         ], qr{\bVALID\b}, "empty object");
+TestLib::command_like( [$test_json, q/[]/         ], qr{\bVALID\b}, "empty array");
+TestLib::command_like( [$test_json, q/-12345/     ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/-1/         ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/0/          ], qr{\bVALID\b}, "zero");
+TestLib::command_like( [$test_json, q/1/          ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/12345/      ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/-1.23456789/], qr{\bVALID\b}, "negative float");
+TestLib::command_like( [$test_json, q/1.23456789/ ], qr{\bVALID\b}, "positive float");
+TestLib::command_like( [$test_json, q/{"a": "b"}/ ], qr{\bVALID\b}, "object");
+TestLib::command_like( [$test_json, q/["a", "b"]/ ], qr{\bVALID\b}, "array");
+TestLib::command_like( [$test_json, q/"pigs feet"/], qr{\bVALID\b}, 'text string');
+
+# Verify some invalid JSON is rejected by our parser
+TestLib::command_like( [$test_json, q/{/          ], qr{\bINVALID\b}, 'unclosed object');
+TestLib::command_like( [$test_json, q/[/          ], qr{\bINVALID\b}, 'unclosed array');
+TestLib::command_like( [$test_json, q/(/          ], qr{\bINVALID\b}, 'unclosed parenthesis');
+TestLib::command_like( [$test_json, q/}/          ], qr{\bINVALID\b}, 'unopened object');
+TestLib::command_like( [$test_json, q/]/          ], qr{\bINVALID\b}, 'unopened array');
+TestLib::command_like( [$test_json, q/)/          ], qr{\bINVALID\b}, 'unopened parenthesis');
+TestLib::command_like( [$test_json, q/{{{}}/      ], qr{\bINVALID\b}, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/{{}}}/      ], qr{\bINVALID\b}, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/[[[]]/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/[[]]]/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/((())/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/(()))/      ], qr{\bINVALID\b}, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/1 7 13/     ], qr{\bINVALID\b}, 'integer sequence');
+TestLib::command_like( [$test_json, q/{"a", "b"}/ ], qr{\bINVALID\b}, 'mixed object and array syntax');
diff --git a/src/test/bin/test_json.c b/src/test/bin/test_json.c
new file mode 100644
index 0000000000..567a35f3b0
--- /dev/null
+++ b/src/test/bin/test_json.c
@@ -0,0 +1,67 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	int			argidx;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
+	progname = get_progname(argv[0]);
+
+	/*
+	 * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+	 * too, which it should already be everywhere except sometimes in Windows.
+	 */
+	setbuf(stdout, NULL);
+	setbuf(stderr, NULL);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+
+	return 0;
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	unsigned int json_len;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	json_len = (unsigned int) strlen(str);
+	client_encoding = PQenv2encoding();
+
+	json = strdup(str);
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" : "INVALID"));
+	return;
+}
-- 
2.21.1 (Apple Git-122.3)

#75

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mahendra Singh Thalor (#73)

Re: making the backend's json parser work in frontend code

On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

I can see one warning on HEAD.

jsonapi.c: In function ‘json_errdetail’:
jsonapi.c:1068:1: warning: control reaches end of non-void function
[-Wreturn-type]
}
^

Attaching a patch to fix warning.

Hmm, I don't get a warning there. This function is a switch over an
enum type with a case for every value of the enum, and every branch
either does a "return" or an "elog," so any code after the switch
should be unreachable. It's possible your compiler is too dumb to know
that, but I thought there were other places in the code base where we
assumed that if we handled every defined value of enum, that was good
enough.

But maybe not. I found similar coding in CreateDestReceiver(), and
that ends with:

/* should never get here */
pg_unreachable();

So perhaps we need the same thing here. Does adding that fix it for you?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#76

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mark Dilger (#74)

Re: making the backend's json parser work in frontend code

On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I’m attaching a new patch set with these three changes including Mahendra’s patch posted elsewhere on this thread.

Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.

OK, so I think this is getting close.

What is now 0001 manages to have four (4) conditionals on FRONTEND at
the top of the file. This seems like at least one two many. I am OK
with this being separate:

+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif

postgres(_fe).h has pride of place among includes, so it's reasonable
to put this in its own section like this.

+#ifdef FRONTEND
+#define check_stack_depth()
+#define json_log_and_abort(...) pg_log_fatal(__VA_ARGS__); exit(1);
+#else
+#define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
+#endif

OK, so here we have a section entirely devoted to our own file-local
macros. Also reasonable. But in between, you have both an #ifdef
FRONTEND and an #ifndef FRONTEND for other includes, and I really
think that should be like #ifdef FRONTEND .. #else .. #endif.

Also, the preprocessor police are on their way to your house now to
arrest you for that first one. You need to write it like this:

#define json_log_and_abort(...) \
do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)

Otherwise, hilarity ensues if somebody writes if (my_code_is_buggy)
json_log_and_abort("oops").

 {
- JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+ JsonLexContext *lex;
+
+#ifndef FRONTEND
+ lex = palloc0(sizeof(JsonLexContext));
+#else
+ lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+ memset(lex, 0, sizeof(JsonLexContext));
+#endif

Instead of this, how making no change at all here?

- default:
- elog(ERROR, "unexpected json parse state: %d", ctx);
  }
+
+ /* Not reached */
+ json_log_and_abort("unexpected json parse state: %d", ctx);

This, too, seems unnecessary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#77

Julien Rouhaud

rjuju123@gmail.com

almost 6 years ago

In reply to: Robert Haas (#75)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 4:06 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

I can see one warning on HEAD.

jsonapi.c: In function ‘json_errdetail’:
jsonapi.c:1068:1: warning: control reaches end of non-void function
[-Wreturn-type]
}
^

Attaching a patch to fix warning.

Hmm, I don't get a warning there. This function is a switch over an
enum type with a case for every value of the enum, and every branch
either does a "return" or an "elog," so any code after the switch
should be unreachable. It's possible your compiler is too dumb to know
that, but I thought there were other places in the code base where we
assumed that if we handled every defined value of enum, that was good
enough.

But maybe not. I found similar coding in CreateDestReceiver(), and
that ends with:

/* should never get here */
pg_unreachable();

So perhaps we need the same thing here. Does adding that fix it for you?

FTR this has unfortunately the same result on Thomas' automatic patch
tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968

#78

Mahendra Singh Thalor

mahi6run@gmail.com

almost 6 years ago

In reply to: Robert Haas (#75)

Re: making the backend's json parser work in frontend code

On Tue, 28 Jan 2020 at 20:36, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

I can see one warning on HEAD.

jsonapi.c: In function ‘json_errdetail’:
jsonapi.c:1068:1: warning: control reaches end of non-void function
[-Wreturn-type]
}
^

Attaching a patch to fix warning.

Hmm, I don't get a warning there. This function is a switch over an
enum type with a case for every value of the enum, and every branch
either does a "return" or an "elog," so any code after the switch
should be unreachable. It's possible your compiler is too dumb to know
that, but I thought there were other places in the code base where we
assumed that if we handled every defined value of enum, that was good
enough.

But maybe not. I found similar coding in CreateDestReceiver(), and
that ends with:

/* should never get here */
pg_unreachable();

So perhaps we need the same thing here. Does adding that fix it for you?

Hi Robert,
Tom Lane already fixed this and committed yesterday(4589c6a2a30faba53d0655a8e).

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

#79

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mahendra Singh Thalor (#78)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 10:30 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

Tom Lane already fixed this and committed yesterday(4589c6a2a30faba53d0655a8e).

Oops. OK, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#80

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Julien Rouhaud (#77)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

FTR this has unfortunately the same result on Thomas' automatic patch
tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968

That's unfortunate ... but presumably Tom's changes took care of this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#81

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#80)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

FTR this has unfortunately the same result on Thomas' automatic patch
tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968

That's unfortunate ... but presumably Tom's changes took care of this?

Probably the cfbot just hasn't retried this build since that fix.
I don't know what its algorithm is for retrying failed builds, but it
does seem to do so after awhile.

regards, tom lane

#82

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mark Dilger (#74)

Re: making the backend's json parser work in frontend code

On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.

I think the first question for 0005 is whether want this at all.
Initially, you proposed NOT committing it, but then Andrew reviewed it
as if it were for commit. I'm not sure whether he was actually saying
that it ought to be committed, though, or whether he just missed your
remarks on the topic. Nobody else has really taken a position. I'm not
100% convinced that it's necessary to include this, but I'm also not
particularly opposed to it. It's a fairly small amount of code, which
is nice, and perhaps useful as a demonstration of how to use the JSON
parser in a frontend application, which someone also might find nice.
Anyone else want to express an opinion?

Meanwhile, here is a round of nitp^H^H^H^Hreview:

-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".

I would probably have just changed this to:

# installcheck and install should not recurse into "modules" or "bin"

The details are arguable, but you definitely shouldn't say "the
subdirectory" and then list two of them.

+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common.  The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code.  The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.

"exercide" is not spelled correctly, but I also disagree with giving
the directory so narrow a charter. I think you should just say
something like:

This directory contains programs that are built and executed for
testing purposes,
but never installed. It may be used, for example, to test that code in
src/common
works in frontend environments.

+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');

I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.

+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));

Do we need this? I guess we're not likely to bother with translations
for a test program.

+ /*
+ * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+ * too, which it should already be everywhere except sometimes in Windows.
+ */
+ setbuf(stdout, NULL);
+ setbuf(stderr, NULL);

Do we need this? If so, why?

+ char *json;
+ unsigned int json_len;
+ JsonLexContext *lex;
+ int client_encoding;
+ JsonParseErrorType parse_result;
+
+ json_len = (unsigned int) strlen(str);
+ client_encoding = PQenv2encoding();
+
+ json = strdup(str);
+ lex = makeJsonLexContextCstringLen(json, strlen(json),
client_encoding, true /* need_escapes */);
+ parse_result = pg_parse_json(lex, &nullSemAction);
+ fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" :
"INVALID"));
+ return;

json_len is set but not used.

Not entirely sure why we are using PQenv2encoding() here.

The trailing return is unnecessary.

I think it would be a good idea to use json_errdetail() in the failure
case, print the error, and have the tests check that we got the
expected error.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#83

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#79)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 10:30 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

Tom Lane already fixed this and committed yesterday(4589c6a2a30faba53d0655a8e).

Oops. OK, thanks.

Yeah, there were multiple issues here:

1. If a switch is expected to cover all values of an enum type,
we now prefer not to have a default: case, so that we'll get
compiler warnings if somebody adds an enum value and fails to
update the switch.

2. Without a default:, though, you need to have after-the-switch
code to catch the possibility that the runtime value was not a
legal enum element. Some compilers are trusting and assume that
that's not a possible case, but some are not (and Coverity will
complain about it too).

3. Some compilers still don't understand that elog(ERROR) doesn't
return, so you need a dummy return. Perhaps pg_unreachable()
would do as well, but project style has been the dummy return for
a long time ... and I'm not entirely convinced by the assumption
that every compiler understands pg_unreachable(), anyway.

(I know Robert knows all this stuff, even if he momentarily
forgot. Just summarizing for onlookers.)

regards, tom lane

#84

Julien Rouhaud

rjuju123@gmail.com

almost 6 years ago

In reply to: Tom Lane (#81)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 5:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

FTR this has unfortunately the same result on Thomas' automatic patch
tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968

That's unfortunate ... but presumably Tom's changes took care of this?

Probably the cfbot just hasn't retried this build since that fix.
I don't know what its algorithm is for retrying failed builds, but it
does seem to do so after awhile.

Yes, I think to remember that Thomas put some rules to avoid
rebuilding everything all the time. Patches that was rebuilt since
indeed starting to get back to green, so it's all good!

#85

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#83)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 11:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

3. Some compilers still don't understand that elog(ERROR) doesn't
return, so you need a dummy return. Perhaps pg_unreachable()
would do as well, but project style has been the dummy return for
a long time ... and I'm not entirely convinced by the assumption
that every compiler understands pg_unreachable(), anyway.

Is the example of CreateDestReceiver() sufficient to show that this is
not a problem in practice?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#86

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#85)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 11:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

3. Some compilers still don't understand that elog(ERROR) doesn't
return, so you need a dummy return. Perhaps pg_unreachable()
would do as well, but project style has been the dummy return for
a long time ... and I'm not entirely convinced by the assumption
that every compiler understands pg_unreachable(), anyway.

Is the example of CreateDestReceiver() sufficient to show that this is
not a problem in practice?

Dunno. I don't see any warnings about that in the buildfarm, but
that's not a very large sample of non-gcc compilers.

Another angle here is that on non-gcc compilers, pg_unreachable()
is going to expand to an abort() call, which is likely to eat more
code space than a dummy "return 0".

regards, tom lane

#87

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Tom Lane (#86)

Re: making the backend's json parser work in frontend code

I wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Is the example of CreateDestReceiver() sufficient to show that this is
not a problem in practice?

Dunno. I don't see any warnings about that in the buildfarm, but
that's not a very large sample of non-gcc compilers.

BTW, now that I think about it, CreateDestReceiver is not up to project
standards anyway, in that it fails to provide reasonable behavior in
the case where what's passed is not a legal value of the enum.
What you'll get, if you're lucky, is a SIGABRT crash with no
indication of the cause --- or if you're not lucky, some very
hard-to-debug crash elsewhere as a result of the function returning
a garbage pointer. So independently of whether the current coding
suppresses compiler warnings reliably, I think we ought to replace it
with elog()-and-return-NULL. Admittedly, that's wasting a few bytes
on a case that should never happen ... but we haven't ever hesitated
to do that elsewhere, if it'd make the problem more diagnosable.

IOW, there's a good reason why there are exactly no other uses
of that coding pattern.

regards, tom lane

#88

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#87)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 1:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, now that I think about it, CreateDestReceiver is not up to project
standards anyway, in that it fails to provide reasonable behavior in
the case where what's passed is not a legal value of the enum.
What you'll get, if you're lucky, is a SIGABRT crash with no
indication of the cause --- or if you're not lucky, some very
hard-to-debug crash elsewhere as a result of the function returning
a garbage pointer. So independently of whether the current coding
suppresses compiler warnings reliably, I think we ought to replace it
with elog()-and-return-NULL. Admittedly, that's wasting a few bytes
on a case that should never happen ... but we haven't ever hesitated
to do that elsewhere, if it'd make the problem more diagnosable.

Well, I might be responsible for the CreateDestReceiver thing -- or I
might not, I haven't checked -- but I do think that style is a bit
cleaner and more elegant. I think it's VERY unlikely that anyone would
ever manage to call it with something that's not a legal value of the
enum, and if they do, I think the chances of surviving are basically
nil, and frankly, I'd rather die. If you asked me where you want me to
store my output and I tell you to store it in the sdklgjsdjgslkdg, you
really should refuse to do anything at all, not just stick my output
someplace-or-other and hope for the best.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#89

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#88)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 1:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, now that I think about it, CreateDestReceiver is not up to project
standards anyway, in that it fails to provide reasonable behavior in
the case where what's passed is not a legal value of the enum.

Well, I might be responsible for the CreateDestReceiver thing -- or I
might not, I haven't checked -- but I do think that style is a bit
cleaner and more elegant. I think it's VERY unlikely that anyone would
ever manage to call it with something that's not a legal value of the
enum, and if they do, I think the chances of surviving are basically
nil, and frankly, I'd rather die. If you asked me where you want me to
store my output and I tell you to store it in the sdklgjsdjgslkdg, you
really should refuse to do anything at all, not just stick my output
someplace-or-other and hope for the best.

Well, yeah, that's exactly my point. But in my book, "refuse to do
anything" should be "elog(ERROR)", not "invoke undefined behavior".
An actual abort() call might be all right here, in that at least
we'd know what would happen and we could debug it once we got hold
of a stack trace. But pg_unreachable() is not that. Basically, if
there's *any* circumstance, broken code or not, where control could
reach a pg_unreachable() call, you did it wrong.

regards, tom lane

#90

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#89)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 2:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, yeah, that's exactly my point. But in my book, "refuse to do
anything" should be "elog(ERROR)", not "invoke undefined behavior".
An actual abort() call might be all right here, in that at least
we'd know what would happen and we could debug it once we got hold
of a stack trace. But pg_unreachable() is not that. Basically, if
there's *any* circumstance, broken code or not, where control could
reach a pg_unreachable() call, you did it wrong.

I don't really agree. I think such defensive coding is more than
justified when the input is coming from a file on disk or some other
external source where it might have been corrupted. For instance, I
think the fact that the code which deforms heap tuples will cheerfully
sail off the end of the buffer or seg fault if the tuple descriptor
doesn't match the tuple is a seriously bad thing. It results in actual
production crashes that could be avoided with more defensive coding.
Admittedly, there would likely be a performance cost, which might not
be a reason to do it, but if that cost is small I would probably vote
for paying it, because this is something that actually happens to
users on a pretty regular basis.

In the case at hand, though, there are no constants of type
CommandDest that come from any place other than a constant in the
program text, and it seems unlikely that this will ever be different
in the future. So, how could we ever end up with a value that's not in
the enum? I guess the program text itself could be corrupted, but we
cannot defend against that.

Mind you, I'm not going to put up a huge stink if you're bound and
determined to go change this. I prefer it the way that it is, and I
think that preference is well-justified by facts on the ground, but I
don't think it's worth fighting about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#91

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#90)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 2:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, yeah, that's exactly my point. But in my book, "refuse to do
anything" should be "elog(ERROR)", not "invoke undefined behavior".
An actual abort() call might be all right here, in that at least
we'd know what would happen and we could debug it once we got hold
of a stack trace. But pg_unreachable() is not that. Basically, if
there's *any* circumstance, broken code or not, where control could
reach a pg_unreachable() call, you did it wrong.

I don't really agree. I think such defensive coding is more than
justified when the input is coming from a file on disk or some other
external source where it might have been corrupted.

There's certainly an argument to be made that an elog() call is an
unjustified expenditure of code space and we should just do an abort()
(but still not pg_unreachable(), IMO). However, what I'm really on about
here is that CreateDestReceiver is out of step with nigh-universal project
practice. If it's not worth having an elog() here, then there are
literally hundreds of other elog() calls that we ought to be nuking on
the same grounds. I don't really want to run around and have a bunch
of debates about exactly which extant elog() calls are effectively
unreachable and which are not. That's not always very clear, and even
if it is clear today it might not be tomorrow. The minute somebody calls
CreateDestReceiver with a non-constant argument, the issue becomes open
again. And I'd rather not have to stop and think hard about the tradeoff
between elog() and abort() when I write such functions in future.

So basically, my problem with this is that I don't think it's a coding
style we want to encourage, because it's too fragile. And there's no
good argument (like performance) to leave it that way. I quite agree
with you that there are places like tuple deforming where we're taking
more chances than I'd like --- but there is a noticeable performance
cost to being paranoid there.

regards, tom lane

#92

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Robert Haas (#82)

2 attachment(s)

Re: making the backend's json parser work in frontend code

On Jan 28, 2020, at 8:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.

I think the first question for 0005 is whether want this at all.
Initially, you proposed NOT committing it, but then Andrew reviewed it
as if it were for commit. I'm not sure whether he was actually saying
that it ought to be committed, though, or whether he just missed your
remarks on the topic. Nobody else has really taken a position. I'm not
100% convinced that it's necessary to include this, but I'm also not
particularly opposed to it. It's a fairly small amount of code, which
is nice, and perhaps useful as a demonstration of how to use the JSON
parser in a frontend application, which someone also might find nice.

Once Andrew reviewed it, I started thinking about it as something that might get committed. In that context, I think there should be a lot more tests in this new src/test/bin directory for other common code, but adding those as part of this patch just seems to confuse this patch.

In addition to adding frontend tests for code already in src/common, the conversation in another thread about adding frontend versions of elog and ereport seem like candidates for tests in this location. Sure, you can add an elog into a real frontend tool, such as pg_ctl, and update the tests for that program to expect that elog’s output, but what if you just want to exhaustively test the elog infrastructure in the frontend spanning multiple locales, encodings, whatever? You’ve also recently mentioned the possibility of having memory contexts in frontend code. Testing those seems like a good fit, too.

I decided to leave this in the next version of the patch set, v7. v6 had three files, the second being something that already got committed in a different form, so this is now in v7-0002 whereas it had been in v6-0003. v6-0002 has no equivalent in v7.

Anyone else want to express an opinion?

Meanwhile, here is a round of nitp^H^H^H^Hreview:
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".
I would probably have just changed this to:

# installcheck and install should not recurse into "modules" or "bin"

The details are arguable, but you definitely shouldn't say "the
subdirectory" and then list two of them.

I read that as “nor [the subdirectory] bin” with the [the subdirectory] portion elided, and it doesn’t sound anomalous to me, but your formulation is more compact. I have used it in v7 of the patch set. Thanks.

+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common.  The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code.  The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.
"exercide" is not spelled correctly, but I also disagree with giving
the directory so narrow a charter. I think you should just say
something like:

This directory contains programs that are built and executed for
testing purposes,
but never installed. It may be used, for example, to test that code in
src/common
works in frontend environments.

Your formulation sounds fine, and I’ve used it in v7.

+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');

I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.

Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.

+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));

Do we need this? I guess we're not likely to bother with translations
for a test program.

Removed.

+ /*
+ * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+ * too, which it should already be everywhere except sometimes in Windows.
+ */
+ setbuf(stdout, NULL);
+ setbuf(stderr, NULL);

Do we need this? If so, why?

For the current test setup, it is not needed. The tap test executes this program (test_json) once per json string, and exits after printing a single line. Surely the tap test wouldn’t have problems hanging on an unflushed buffer for a program that has exited. I was imagining this code might grow more complex, with the tap test communicating repeatedly with the same instance of test_json, such as if we extend the json parser to iterate over chunks of the input json string.

I’ve removed this for v7, since we don’t need it yet.

+ char *json;
+ unsigned int json_len;
+ JsonLexContext *lex;
+ int client_encoding;
+ JsonParseErrorType parse_result;
+
+ json_len = (unsigned int) strlen(str);
+ client_encoding = PQenv2encoding();
+
+ json = strdup(str);
+ lex = makeJsonLexContextCstringLen(json, strlen(json),
client_encoding, true /* need_escapes */);
+ parse_result = pg_parse_json(lex, &nullSemAction);
+ fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" :
"INVALID"));
+ return;

json_len is set but not used.

You’re right. I’ve removed it.

Not entirely sure why we are using PQenv2encoding() here.

This program, which passes possibly json formatted strings into the parser, gets those strings from perl through the shell. If locale settings on the machine where this runs might break something about that for a real client application, then our test should break in the same way. Hard-coding “C” or “POSIX” or whatever for the locale side-steps part of the issue we’re trying to test. No?

I’m leaving it as is for v7, but if you still disagree, I’ll change it. Let me know what you want me to change it *to*, though, as there is no obvious choice that I can see.

The trailing return is unnecessary.

Ok, I’ve removed it.

I think it would be a good idea to use json_errdetail() in the failure
case, print the error, and have the tests check that we got the
expected error.

Oh, yeah, I like that idea. That works, and is included in v7.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v7-0001-Relocating-jsonapi-to-common.patchapplication/octet-stream; name=v7-0001-Relocating-jsonapi-to-common.patch; x-unix-mode=0644Download

From a995c0ddc3077879199bf0b2bc73c2611e77d593 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 28 Jan 2020 09:17:55 -0800
Subject: [PATCH 1/2] Relocating jsonapi to common.

Moving jsonapi.c and jsonapi.h to src/common and src/include/common.
Reworking the code to not include elog, ereport, pg_mblen, and
similar backend-only functionality.
---
 contrib/hstore/hstore_io.c                  |  2 +-
 src/backend/tsearch/to_tsany.c              |  2 +-
 src/backend/tsearch/wparser.c               |  2 +-
 src/backend/utils/adt/Makefile              |  1 -
 src/backend/utils/adt/json.c                |  2 +-
 src/backend/utils/adt/jsonb.c               |  2 +-
 src/backend/utils/adt/jsonb_util.c          |  2 +-
 src/backend/utils/adt/jsonfuncs.c           |  7 ++--
 src/common/Makefile                         |  1 +
 src/{backend/utils/adt => common}/jsonapi.c | 36 +++++++++++++++------
 src/include/{utils => common}/jsonapi.h     |  2 ++
 src/include/utils/jsonfuncs.h               |  2 +-
 12 files changed, 41 insertions(+), 20 deletions(-)
 rename src/{backend/utils/adt => common}/jsonapi.c (96%)
 rename src/include/{utils => common}/jsonapi.h (99%)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 10ec392775..f3174f2995 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -7,13 +7,13 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "hstore.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index adf181c191..1fe67c4c99 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -13,10 +13,10 @@
  */
 #include "postgres.h"
 
+#include "common/jsonapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 
 
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index c7499a94ac..88005c0519 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -16,11 +16,11 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "common/jsonapi.h"
 #include "funcapi.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonfuncs.h"
 #include "utils/varlena.h"
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 790d7a24fb..13efa9338c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -44,7 +44,6 @@ OBJS = \
 	int.o \
 	int8.o \
 	json.o \
-	jsonapi.o \
 	jsonb.o \
 	jsonb_gin.o \
 	jsonb_op.o \
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f6cd2b9911..567eab1e01 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -127,7 +127,7 @@ json_recv(PG_FUNCTION_ARGS)
 	str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 
 	/* Validate it. */
-	lex = makeJsonLexContextCstringLen(str, nbytes, false);
+	lex = makeJsonLexContextCstringLen(str, nbytes, GetDatabaseEncoding(), false);
 	pg_parse_json_or_ereport(lex, &nullSemAction);
 
 	PG_RETURN_TEXT_P(cstring_to_text_with_len(str, nbytes));
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index c912f8932d..fea4335951 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -261,7 +261,7 @@ jsonb_from_cstring(char *json, int len)
 
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
-	lex = makeJsonLexContextCstringLen(json, len, true);
+	lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 
 	sem.semstate = (void *) &state;
 
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index b33c3ef43c..edec657cd3 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -15,12 +15,12 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hashutils.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/memutils.h"
 #include "utils/varlena.h"
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 66ea11b971..4f6fd0de02 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -18,6 +18,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/jsonapi.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
@@ -27,7 +28,6 @@
 #include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/json.h"
-#include "utils/jsonapi.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
@@ -514,6 +514,7 @@ makeJsonLexContext(text *json, bool need_escapes)
 {
 	return makeJsonLexContextCstringLen(VARDATA_ANY(json),
 										VARSIZE_ANY_EXHDR(json),
+										GetDatabaseEncoding(),
 										need_escapes);
 }
 
@@ -2605,7 +2606,7 @@ populate_array_json(PopulateArrayContext *ctx, char *json, int len)
 	PopulateArrayState state;
 	JsonSemAction sem;
 
-	state.lex = makeJsonLexContextCstringLen(json, len, true);
+	state.lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	state.ctx = ctx;
 
 	memset(&sem, 0, sizeof(sem));
@@ -3448,7 +3449,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
 	HASHCTL		ctl;
 	HTAB	   *tab;
 	JHashState *state;
-	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, true);
+	JsonLexContext *lex = makeJsonLexContextCstringLen(json, len, GetDatabaseEncoding(), true);
 	JsonSemAction *sem;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/common/Makefile b/src/common/Makefile
index 44ca68fa6c..e757fb7399 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -56,6 +56,7 @@ OBJS_COMMON = \
 	f2s.o \
 	file_perm.o \
 	ip.o \
+	jsonapi.o \
 	keywords.o \
 	kwlookup.o \
 	link-canary.o \
diff --git a/src/backend/utils/adt/jsonapi.c b/src/common/jsonapi.c
similarity index 96%
rename from src/backend/utils/adt/jsonapi.c
rename to src/common/jsonapi.c
index 230a55b101..7628fe52e1 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -7,15 +7,32 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  src/backend/utils/adt/jsonapi.c
+ *	  src/common/jsonapi.c
  *
  *-------------------------------------------------------------------------
  */
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
+#include "common/jsonapi.h"
+
+#ifdef FRONTEND
+#include "common/logging.h"
+#include "mb/pg_wchar.h"
+#else
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
+
+#ifdef FRONTEND
+#define check_stack_depth()
+#define json_log_and_abort(...) do { pg_log_fatal(__VA_ARGS__); exit(1); } while(0)
+#else
+#define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
+#endif
 
 /*
  * The context of the parser is maintained by the recursive descent
@@ -135,13 +152,14 @@ IsValidJsonNumber(const char *str, int len)
  * if really required.
  */
 JsonLexContext *
-makeJsonLexContextCstringLen(char *json, int len, bool need_escapes)
+makeJsonLexContextCstringLen(char *json, int len, int encoding, bool need_escapes)
 {
 	JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
 
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->line_number = 1;
 	lex->input_length = len;
+	lex->input_encoding = encoding;
 	if (need_escapes)
 		lex->strval = makeStringInfo();
 	return lex;
@@ -720,7 +738,7 @@ json_lex_string(JsonLexContext *lex)
 						ch = (ch * 16) + (*s - 'A') + 10;
 					else
 					{
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_encoding_mblen(lex->input_encoding, s);
 						return JSON_UNICODE_ESCAPE_FORMAT;
 					}
 				}
@@ -759,7 +777,7 @@ json_lex_string(JsonLexContext *lex)
 						/* We can't allow this, since our TEXT type doesn't */
 						return JSON_UNICODE_CODE_POINT_ZERO;
 					}
-					else if (GetDatabaseEncoding() == PG_UTF8)
+					else if (lex->input_encoding == PG_UTF8)
 					{
 						unicode_to_utf8(ch, (unsigned char *) utf8str);
 						utf8len = pg_utf_mblen((unsigned char *) utf8str);
@@ -809,7 +827,7 @@ json_lex_string(JsonLexContext *lex)
 					default:
 						/* Not a valid string escape, so signal error. */
 						lex->token_start = s;
-						lex->token_terminator = s + pg_mblen(s);
+						lex->token_terminator = s + pg_encoding_mblen(lex->input_encoding, s);
 						return JSON_ESCAPING_INVALID;
 				}
 			}
@@ -823,7 +841,7 @@ json_lex_string(JsonLexContext *lex)
 				 * shown it's not a performance win.
 				 */
 				lex->token_start = s;
-				lex->token_terminator = s + pg_mblen(s);
+				lex->token_terminator = s + pg_encoding_mblen(lex->input_encoding, s);
 				return JSON_ESCAPING_INVALID;
 			}
 
@@ -1010,7 +1028,7 @@ report_parse_error(JsonParseContext ctx, JsonLexContext *lex)
 	 * unhandled enum values.  But this needs to be here anyway to cover the
 	 * possibility of an incorrect input.
 	 */
-	elog(ERROR, "unexpected json parse state: %d", (int) ctx);
+	json_log_and_abort("unexpected json parse state: %d", (int) ctx);
 	return JSON_SUCCESS;		/* silence stupider compilers */
 }
 
@@ -1077,7 +1095,7 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 	 * unhandled enum values.  But this needs to be here anyway to cover the
 	 * possibility of an incorrect input.
 	 */
-	elog(ERROR, "unexpected json parse error type: %d", (int) error);
+	json_log_and_abort("unexpected json parse error type: %d", (int) error);
 	return NULL;				/* silence stupider compilers */
 }
 
diff --git a/src/include/utils/jsonapi.h b/src/include/common/jsonapi.h
similarity index 99%
rename from src/include/utils/jsonapi.h
rename to src/include/common/jsonapi.h
index 4d69b18495..bcfd57cc53 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -73,6 +73,7 @@ typedef struct JsonLexContext
 {
 	char	   *input;
 	int			input_length;
+	int			input_encoding;
 	char	   *token_start;
 	char	   *token_terminator;
 	char	   *prev_token_terminator;
@@ -149,6 +150,7 @@ extern JsonParseErrorType json_count_array_elements(JsonLexContext *lex,
  */
 extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 													int len,
+													int encoding,
 													bool need_escapes);
 
 /* lex one token */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index b993f38409..1f1b4029cb 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -14,7 +14,7 @@
 #ifndef JSONFUNCS_H
 #define JSONFUNCS_H
 
-#include "utils/jsonapi.h"
+#include "common/jsonapi.h"
 #include "utils/jsonb.h"
 
 /*
-- 
2.21.1 (Apple Git-122.3)

v7-0002-Adding-frontend-tests-for-json-parser.patchapplication/octet-stream; name=v7-0002-Adding-frontend-tests-for-json-parser.patch; x-unix-mode=0644Download

From e48fb9c68ec898711a0a3bc7fb765406e814caa0 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 26 Jan 2020 10:41:37 -0800
Subject: [PATCH 2/2] Adding frontend tests for json parser.

Creating tests in new directory src/test/bin for testing
that the json parser can be included and used from within
a frontend standalone binary.
---
 src/Makefile                    |  4 ++-
 src/test/Makefile               |  6 ++--
 src/test/bin/.gitignore         |  1 +
 src/test/bin/Makefile           | 41 ++++++++++++++++++++++
 src/test/bin/README             | 11 ++++++
 src/test/bin/t/001_test_json.pl | 47 +++++++++++++++++++++++++
 src/test/bin/test_json.c        | 62 +++++++++++++++++++++++++++++++++
 7 files changed, 168 insertions(+), 4 deletions(-)
 create mode 100644 src/test/bin/.gitignore
 create mode 100644 src/test/bin/Makefile
 create mode 100644 src/test/bin/README
 create mode 100644 src/test/bin/t/001_test_json.pl
 create mode 100644 src/test/bin/test_json.c

diff --git a/src/Makefile b/src/Makefile
index bcdbd9588a..ccd4bab0de 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -2,7 +2,8 @@
 #
 # Makefile for src
 #
-# Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 #
 # src/Makefile
 #
@@ -27,6 +28,7 @@ SUBDIRS = \
 	bin \
 	pl \
 	makefiles \
+	test/bin \
 	test/regress \
 	test/isolation \
 	test/perl
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6c3a1d4c27 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = bin perl regress isolation modules authentication recovery subscription
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
@@ -40,10 +40,10 @@ endif
 ALWAYS_SUBDIRS = $(filter-out $(SUBDIRS),examples kerberos ldap locale thread ssl)
 
 # We want to recurse to all subdirs for all standard targets, except that
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into "modules" or "bin"
 
 recurse_alldirs_targets := $(filter-out installcheck install, $(standard_targets))
-installable_dirs := $(filter-out modules, $(SUBDIRS))
+installable_dirs := $(filter-out modules bin, $(SUBDIRS))
 
 $(call recurse,$(recurse_alldirs_targets))
 $(call recurse,installcheck, $(installable_dirs))
diff --git a/src/test/bin/.gitignore b/src/test/bin/.gitignore
new file mode 100644
index 0000000000..6709c749d8
--- /dev/null
+++ b/src/test/bin/.gitignore
@@ -0,0 +1 @@
+test_json
diff --git a/src/test/bin/Makefile b/src/test/bin/Makefile
new file mode 100644
index 0000000000..3eee9091bc
--- /dev/null
+++ b/src/test/bin/Makefile
@@ -0,0 +1,41 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/bin
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/bin/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "bin - the PostgreSQL standalone code binaries for testing"
+PGAPPICON=win32
+
+subdir = src/test/bin
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	test_json.o
+
+all: test_json
+
+test_json: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+check:
+	PATH="$(abs_top_builddir)/src/test/bin:$$PATH" $(prove_check)
+
+clean distclean maintainer-clean:
+	rm -f test_json$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/test/bin/README b/src/test/bin/README
new file mode 100644
index 0000000000..239bd8456f
--- /dev/null
+++ b/src/test/bin/README
@@ -0,0 +1,11 @@
+src/test/bin/README
+
+Binary executable tests
+=======================
+
+This directory contains programs that are built and executed for testing
+purposes, but never installed. It may be used, for example, to test that code
+in src/common works in frontend environments.
+
+These test programs are part of the tap-test suite.  Configure with tap tests
+enabled or these tests will be skipped.
diff --git a/src/test/bin/t/001_test_json.pl b/src/test/bin/t/001_test_json.pl
new file mode 100644
index 0000000000..a848fd66cc
--- /dev/null
+++ b/src/test/bin/t/001_test_json.pl
@@ -0,0 +1,47 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use TestLib ();
+use Cwd;
+
+use Test::More tests => 83;
+
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
+
+ok(-f $test_json, "test_json file exists");
+ok(-x $test_json, "test_json file is executable");
+
+# Verify some valid JSON is accepted by our parser
+TestLib::command_like( [$test_json, q/null/       ], qr{\bVALID\b}, "null");
+TestLib::command_like( [$test_json, q/{}/         ], qr{\bVALID\b}, "empty object");
+TestLib::command_like( [$test_json, q/[]/         ], qr{\bVALID\b}, "empty array");
+TestLib::command_like( [$test_json, q/-12345/     ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/-1/         ], qr{\bVALID\b}, "negative integer");
+TestLib::command_like( [$test_json, q/0/          ], qr{\bVALID\b}, "zero");
+TestLib::command_like( [$test_json, q/1/          ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/12345/      ], qr{\bVALID\b}, "positive integer");
+TestLib::command_like( [$test_json, q/-1.23456789/], qr{\bVALID\b}, "negative float");
+TestLib::command_like( [$test_json, q/1.23456789/ ], qr{\bVALID\b}, "positive float");
+TestLib::command_like( [$test_json, q/{"a": "b"}/ ], qr{\bVALID\b}, "object");
+TestLib::command_like( [$test_json, q/["a", "b"]/ ], qr{\bVALID\b}, "array");
+TestLib::command_like( [$test_json, q/"pigs feet"/], qr{\bVALID\b}, 'text string');
+
+# Verify some invalid JSON is rejected by our parser
+TestLib::command_like( [$test_json, q/{/          ], qr{^\s*The input string ended unexpectedly\.\s*$}ms, 'unclosed object');
+TestLib::command_like( [$test_json, q/[/          ], qr{^\s*The input string ended unexpectedly\.\s*$}ms, 'unclosed array');
+TestLib::command_like( [$test_json, q/(/          ], qr{^\s*Token "\(" is invalid\.\s*$}ms, 'unclosed parenthesis');
+TestLib::command_like( [$test_json, q/}/          ], qr{^\s*Expected JSON value, but found "\}"\.\s*$}ms, 'unopened object');
+TestLib::command_like( [$test_json, q/]/          ], qr{^\s*Expected JSON value, but found "\]"\.\s*$}ms, 'unopened array');
+TestLib::command_like( [$test_json, q/)/          ], qr{^\s*Token "\)" is invalid\.\s*$}ms, 'unopened parenthesis');
+TestLib::command_like( [$test_json, q/{{{}}/      ], qr{^\s*Expected string or "\}", but found "\{"\.\s*$}ms, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/{{}}}/      ], qr{^\s*Expected string or "\}", but found "\{"\.\s*$}ms, 'unbalanced object curlies');
+TestLib::command_like( [$test_json, q/[[[]]/      ], qr{^\s*The input string ended unexpectedly\.\s*$}ms, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/[[]]]/      ], qr{^\s*Expected end of input, but found "\]"\.\s*$}ms, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/((())/      ], qr{^\s*Token "\(" is invalid\.\s*$}ms, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/(()))/      ], qr{^\s*Token "\(" is invalid\.\s*$}ms, 'unbalanced array braces');
+TestLib::command_like( [$test_json, q/1 7 13/     ], qr{^\s*Expected end of input, but found "7"\.\s*$}ms, 'integer sequence');
+TestLib::command_like( [$test_json, q/{"a", "b"}/ ], qr{^\s*Expected ":", but found ","\.\s*$}ms, 'mixed object and array syntax');
diff --git a/src/test/bin/test_json.c b/src/test/bin/test_json.c
new file mode 100644
index 0000000000..86ad9e7e2b
--- /dev/null
+++ b/src/test/bin/test_json.c
@@ -0,0 +1,62 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	int			argidx;
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+
+	return 0;
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	client_encoding = PQenv2encoding();
+
+	json = strdup(str);
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	if (JSON_SUCCESS == parse_result)
+		fprintf(stdout, _("VALID\n"));
+	else
+	{
+		const char *errstr = json_errdetail(parse_result, lex);
+		fprintf(stdout, _("%s\n"), errstr);
+	}
+}
-- 
2.21.1 (Apple Git-122.3)

#93

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Robert Haas (#76)

Re: making the backend's json parser work in frontend code

Thanks for your review. I considered all of this along with your review comments in another email prior to sending v7 in response to that other email a few minutes ago.

On Jan 28, 2020, at 7:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I’m attaching a new patch set with these three changes including Mahendra’s patch posted elsewhere on this thread.

Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.

OK, so I think this is getting close.

What is now 0001 manages to have four (4) conditionals on FRONTEND at
the top of the file. This seems like at least one two many.

You are referencing this section, copied here from the patch:

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

#include "common/jsonapi.h"

#ifdef FRONTEND
#include "common/logging.h"
#endif

#include "mb/pg_wchar.h"

#ifndef FRONTEND
#include "miscadmin.h"
#endif

I merged these a bit. See v7-0001 for details.

Also, the preprocessor police are on their way to your house now to
arrest you for that first one. You need to write it like this:

#define json_log_and_abort(...) \
do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)

Yes, right, I had done that and somehow didn’t get it into the patch. I’ll have coffee and donuts waiting.

{
- JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+ JsonLexContext *lex;
+
+#ifndef FRONTEND
+ lex = palloc0(sizeof(JsonLexContext));
+#else
+ lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+ memset(lex, 0, sizeof(JsonLexContext));
+#endif

Instead of this, how making no change at all here?

Yes, good point. I had split that into frontend vs backend because I was using palloc0fast for the backend, which seems to me the preferred function when the size is compile-time known, like it is here, and there is no palloc0fast in fe_memutils.h for frontend use. I then changed back to palloc0 when I noticed that pretty much nowhere else similar to this in the project uses palloc0fast. I neglected to change back completely, which left what you are quoting.

Out of curiousity, why is palloc0fast not used in more places?

- default:
- elog(ERROR, "unexpected json parse state: %d", ctx);
}
+
+ /* Not reached */
+ json_log_and_abort("unexpected json parse state: %d", ctx);

This, too, seems unnecessary.

This was in response to Mahendra’s report of a compiler warning, which I didn’t get on my platform. The code in master changed a bit since v6 was written, so v7 just goes with how the newer committed code does this.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#94

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Mark Dilger (#92)

2 attachment(s)

Re: making the backend's json parser work in frontend code

On 1/28/20 5:28 PM, Mark Dilger wrote:

+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.
Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.

I think using TESTDIR is Ok, but we do need a little more on Windows,
because the executable name will be different. See attached revised
version of the test script.

We also need some extra stuff for MSVC. Something like the attached
change to src/tools/msvc/Mkvcbuild.pm. Also, the Makefile will need a
line like:

PROGRAM = test_json

I'm still not 100% on the location of the test. I think the way the msvc
suite works this should be in its own dedicated directory e.g.
src/test/json_parse.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

jsonapi-msvc.patchtext/x-patch; charset=UTF-8; name=jsonapi-msvc.patchDownload

diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index c735d529ca..3c7be61e74 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -32,9 +32,9 @@ my @unlink_on_exit;
 
 # Set of variables for modules in contrib/ and src/test/modules/
 my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpq = ('test_json', 'dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
 my @contrib_uselibpgport   = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('test_json','oid2name', 'pg_standby', 'vacuumlo');
 my $contrib_extralibs      = undef;
 my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
 my $contrib_extrasource = {
@@ -121,7 +121,7 @@ sub mkvcbuild
 
 	our @pgcommonallfiles = qw(
 	  base64.c config_info.c controldata_utils.c d2s.c encnames.c exec.c
-	  f2s.c file_perm.c ip.c
+	  f2s.c file_perm.c ip.c jsonapi.c
 	  keywords.c kwlookup.c link-canary.c md5.c
 	  pg_lzcompress.c pgfnames.c psprintf.c relpath.c rmtree.c
 	  saslprep.c scram-common.c string.c stringinfo.c unicode_norm.c username.c
@@ -486,6 +486,11 @@ sub mkvcbuild
 		closedir($D);
 	}
 
+	if (-d "src/test/bin")
+	{
+		AddContrib("src/test","bin");
+	}
+
 	# Build Perl and Python modules after contrib/ modules to satisfy some
 	# dependencies with transform contrib modules, like hstore_plpython
 	# ltree_plpython and hstore_plperl.

001_test_json.plapplication/x-perl; name=001_test_json.plDownload

#95

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Mark Dilger (#93)

Re: making the backend's json parser work in frontend code

On Tue, Jan 28, 2020 at 5:35 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I merged these a bit. See v7-0001 for details.

I jiggered that a bit more and committed this. I couldn't see the
point of having both the FRONTEND and non-FRONTEND code include
pg_wchar.h.

I'll wait to see what you make of Andrew's latest comments before
doing anything further.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#96

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Robert Haas (#95)

Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 5:35 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I merged these a bit. See v7-0001 for details.

I jiggered that a bit more and committed this. I couldn't see the
point of having both the FRONTEND and non-FRONTEND code include
pg_wchar.h.

First buildfarm report is not positive:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-01-29%2015%3A30%3A26

json.obj : error LNK2019: unresolved external symbol makeJsonLexContextCstringLen referenced in function json_recv [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonb.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2019: unresolved external symbol json_lex referenced in function json_typeof [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2019: unresolved external symbol IsValidJsonNumber referenced in function datum_to_json [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2001: unresolved external symbol nullSemAction [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol pg_parse_json referenced in function json_strip_nulls [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol json_count_array_elements referenced in function get_array_start [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol json_errdetail referenced in function json_ereport_error [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
.\Release\postgres\postgres.exe : fatal error LNK1120: 7 unresolved externals [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]

regards, tom lane

#97

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#96)

Re: making the backend's json parser work in frontend code

On Wed, Jan 29, 2020 at 10:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 28, 2020 at 5:35 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I merged these a bit. See v7-0001 for details.

I jiggered that a bit more and committed this. I couldn't see the
point of having both the FRONTEND and non-FRONTEND code include
pg_wchar.h.

First buildfarm report is not positive:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-01-29%2015%3A30%3A26

json.obj : error LNK2019: unresolved external symbol makeJsonLexContextCstringLen referenced in function json_recv [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonb.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2019: unresolved external symbol json_lex referenced in function json_typeof [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2019: unresolved external symbol IsValidJsonNumber referenced in function datum_to_json [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2001: unresolved external symbol nullSemAction [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol pg_parse_json referenced in function json_strip_nulls [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol json_count_array_elements referenced in function get_array_start [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol json_errdetail referenced in function json_ereport_error [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
.\Release\postgres\postgres.exe : fatal error LNK1120: 7 unresolved externals [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]

Hrrm, OK. I think it must need a sprinkling of Windows-specific magic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#98

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Robert Haas (#97)

Re: making the backend's json parser work in frontend code

On Wed, Jan 29, 2020 at 10:48 AM Robert Haas <robertmhaas@gmail.com> wrote:

Hrrm, OK. I think it must need a sprinkling of Windows-specific magic.

I see that the patch Andrew posted earlier adjusts Mkvcbuild.pm's
@pgcommonallfiles, so I pushed that fix. The other hunks there should
go into the patch to add a test_json utility, I think. Hopefully that
will fix it, but I guess we'll see.

I was under the impression that the MSVC build gets the list of files
to build by parsing the Makefiles, but I guess that's not true at
least in the case of src/common.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#99

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Andrew Dunstan (#94)

Re: making the backend's json parser work in frontend code

On Wed, Jan 29, 2020 at 4:32 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:

On 1/28/20 5:28 PM, Mark Dilger wrote:
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.
Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
I think using TESTDIR is Ok,

I've changed my mind, I don't think that will work for MSVC, the
executable gets built elsewhere for that. I'll try to come up with
something portable.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#100

Mark Dilger

mark.dilger@enterprisedb.com

almost 6 years ago

In reply to: Andrew Dunstan (#99)

Re: making the backend's json parser work in frontend code

On Jan 29, 2020, at 1:02 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On Wed, Jan 29, 2020 at 4:32 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On 1/28/20 5:28 PM, Mark Dilger wrote:
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.
Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
I think using TESTDIR is Ok,
I've changed my mind, I don't think that will work for MSVC, the
executable gets built elsewhere for that. I'll try to come up with
something portable.

I’m just now working on getting my Windows VMs set up with Visual Studio and whatnot, per the wiki instructions, so I don’t need to burden you with this sort of Windows task in the future. If there are any gotchas not mentioned on the wiki, I’d appreciate pointers about how to avoid them. I’ll try to help devise a solution, or test what you come up with, once I’m properly set up for that.

For no particular reason, I chose Windows Server 2019 and Windows 10 Pro.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#101

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 6 years ago

In reply to: Mark Dilger (#100)

1 attachment(s)

Re: making the backend's json parser work in frontend code

On Thu, Jan 30, 2020 at 7:36 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Jan 29, 2020, at 1:02 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On Wed, Jan 29, 2020 at 4:32 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On 1/28/20 5:28 PM, Mark Dilger wrote:
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.
Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
I think using TESTDIR is Ok,
I've changed my mind, I don't think that will work for MSVC, the
executable gets built elsewhere for that. I'll try to come up with
something portable.
I’m just now working on getting my Windows VMs set up with Visual Studio and whatnot, per the wiki instructions, so I don’t need to burden you with this sort of Windows task in the future. If there are any gotchas not mentioned on the wiki, I’d appreciate pointers about how to avoid them. I’ll try to help devise a solution, or test what you come up with, once I’m properly set up for that.

For no particular reason, I chose Windows Server 2019 and Windows 10 Pro.

One VM should be sufficient. Either W10Pro os WS2019 would be fine. I
have buildfarm animals running on both.

Here's what I got working after a lot of trial and error. (This will
require a tiny change in the buildfarm script to make the animals test
it). Note that there is one test that I couldn't get working, so I
skipped it. If you can find out why it fails so much the better ... it
seems to be related to how the command processor handles quotes.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v8-0002-Adding-frontend-tests-for-json-parser.patchtext/x-patch; charset=US-ASCII; name=v8-0002-Adding-frontend-tests-for-json-parser.patchDownload

diff --git a/src/Makefile b/src/Makefile
index bcdbd9588a..ccd4bab0de 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -2,7 +2,8 @@
 #
 # Makefile for src
 #
-# Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1994, Regents of the University of California
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 #
 # src/Makefile
 #
@@ -27,6 +28,7 @@ SUBDIRS = \
 	bin \
 	pl \
 	makefiles \
+	test/bin \
 	test/regress \
 	test/isolation \
 	test/perl
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6c3a1d4c27 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = bin perl regress isolation modules authentication recovery subscription
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
@@ -40,10 +40,10 @@ endif
 ALWAYS_SUBDIRS = $(filter-out $(SUBDIRS),examples kerberos ldap locale thread ssl)
 
 # We want to recurse to all subdirs for all standard targets, except that
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into "modules" or "bin"
 
 recurse_alldirs_targets := $(filter-out installcheck install, $(standard_targets))
-installable_dirs := $(filter-out modules, $(SUBDIRS))
+installable_dirs := $(filter-out modules bin, $(SUBDIRS))
 
 $(call recurse,$(recurse_alldirs_targets))
 $(call recurse,installcheck, $(installable_dirs))
diff --git a/src/test/bin/.gitignore b/src/test/bin/.gitignore
new file mode 100644
index 0000000000..6709c749d8
--- /dev/null
+++ b/src/test/bin/.gitignore
@@ -0,0 +1 @@
+test_json
diff --git a/src/test/bin/Makefile b/src/test/bin/Makefile
new file mode 100644
index 0000000000..113ce04cba
--- /dev/null
+++ b/src/test/bin/Makefile
@@ -0,0 +1,43 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/bin
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/bin/bin/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "bin - the PostgreSQL standalone code binaries for testing"
+PGAPPICON=win32
+
+subdir = src/test/bin
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+# make this available to TAP test scripts
+export with_readline
+
+REFDOCDIR= $(top_srcdir)/doc/src/sgml/ref
+
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+OBJS = \
+	$(WIN32RES) \
+	test_json.o
+
+PROGRAMS = test_json
+
+all: test_json
+
+test_json: test_json.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+check:
+	PATH="$(abs_top_builddir)/src/test/bin:$$PATH" $(prove_check)
+
+clean distclean maintainer-clean:
+	rm -f test_json$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/test/bin/README b/src/test/bin/README
new file mode 100644
index 0000000000..239bd8456f
--- /dev/null
+++ b/src/test/bin/README
@@ -0,0 +1,11 @@
+src/test/bin/README
+
+Binary executable tests
+=======================
+
+This directory contains programs that are built and executed for testing
+purposes, but never installed. It may be used, for example, to test that code
+in src/common works in frontend environments.
+
+These test programs are part of the tap-test suite.  Configure with tap tests
+enabled or these tests will be skipped.
diff --git a/src/test/bin/t/001_test_json.pl b/src/test/bin/t/001_test_json.pl
new file mode 100644
index 0000000000..50b06e0726
--- /dev/null
+++ b/src/test/bin/t/001_test_json.pl
@@ -0,0 +1,72 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use TestLib qw(command_like $windows_os);
+
+use Test::More tests => 83;
+
+my $exe = $windows_os ? ".exe" : "";
+
+# We don't install test_json, so we pick it up from the
+# place it  gets built, $ENV{TESTDIR} except for MSVC
+
+my $binloc = ".";
+my $using_msvc = 0;
+
+if (-e "../../../test_json.vcxproj") # MSVC
+{
+	$using_msvc = 1;
+	if (-d "../../../Debug/test_json")
+	{
+		$binloc = "../../../Debug/test_json";
+	}
+	else
+	{
+		$binloc = "../../../Release/test_json";
+	}
+}
+elsif (exists $ENV{TESTDIR})
+{
+	$binloc = $ENV{TESTDIR};
+}
+
+my $test_json = "$binloc/test_json$exe";
+
+ok(-f $test_json, "test_json file exists");
+ok(-x $test_json, "test_json file is executable");
+
+# Verify some valid JSON is accepted by our parser
+command_like( [$test_json, q/null/       ], qr{\bVALID\b}, "null");
+command_like( [$test_json, q/{}/         ], qr{\bVALID\b}, "empty object");
+command_like( [$test_json, q/[]/         ], qr{\bVALID\b}, "empty array");
+command_like( [$test_json, q/-12345/     ], qr{\bVALID\b}, "negative integer");
+command_like( [$test_json, q/-1/         ], qr{\bVALID\b}, "negative integer");
+command_like( [$test_json, q/0/          ], qr{\bVALID\b}, "zero");
+command_like( [$test_json, q/1/          ], qr{\bVALID\b}, "positive integer");
+command_like( [$test_json, q/12345/      ], qr{\bVALID\b}, "positive integer");
+command_like( [$test_json, q/-1.23456789/], qr{\bVALID\b}, "negative float");
+command_like( [$test_json, q/1.23456789/ ], qr{\bVALID\b}, "positive float");
+command_like( [$test_json, q/{"a": "b"}/ ], qr{\bVALID\b}, "object");
+command_like( [$test_json, q/["a", "b"]/ ], qr{\bVALID\b}, "array");
+SKIP:
+{
+	skip "text string test confuses processor on MSVC", 3 if $using_msvc;
+
+	command_like( [$test_json, q/"pigs feet"/], qr{\bVALID\b}, 'text string');
+}
+
+# Verify some invalid JSON is rejected by our parser
+command_like( [$test_json, q/{/          ], qr{^\s*The input string ended unexpectedly\.\s*$}ms, 'unclosed object');
+command_like( [$test_json, q/[/          ], qr{^\s*The input string ended unexpectedly\.\s*$}ms, 'unclosed array');
+command_like( [$test_json, q/(/          ], qr{^\s*Token "\(" is invalid\.\s*$}ms, 'unclosed parenthesis');
+command_like( [$test_json, q/}/          ], qr{^\s*Expected JSON value, but found "\}"\.\s*$}ms, 'unopened object');
+command_like( [$test_json, q/]/          ], qr{^\s*Expected JSON value, but found "\]"\.\s*$}ms, 'unopened array');
+command_like( [$test_json, q/)/          ], qr{^\s*Token "\)" is invalid\.\s*$}ms, 'unopened parenthesis');
+command_like( [$test_json, q/{{{}}/      ], qr{^\s*Expected string or "\}", but found "\{"\.\s*$}ms, 'unbalanced object curlies');
+command_like( [$test_json, q/{{}}}/      ], qr{^\s*Expected string or "\}", but found "\{"\.\s*$}ms, 'unbalanced object curlies');
+command_like( [$test_json, q/[[[]]/      ], qr{^\s*The input string ended unexpectedly\.\s*$}ms, 'unbalanced array braces');
+command_like( [$test_json, q/[[]]]/      ], qr{^\s*Expected end of input, but found "\]"\.\s*$}ms, 'unbalanced array braces');
+command_like( [$test_json, q/((())/      ], qr{^\s*Token "\(" is invalid\.\s*$}ms, 'unbalanced array braces');
+command_like( [$test_json, q/(()))/      ], qr{^\s*Token "\(" is invalid\.\s*$}ms, 'unbalanced array braces');
+command_like( [$test_json, q/1 7 13/     ], qr{^\s*Expected end of input, but found "7"\.\s*$}ms, 'integer sequence');
+command_like( [$test_json, q/{"a", "b"}/ ], qr{^\s*Expected ":", but found ","\.\s*$}ms, 'mixed object and array syntax');
diff --git a/src/test/bin/test_json.c b/src/test/bin/test_json.c
new file mode 100644
index 0000000000..86ad9e7e2b
--- /dev/null
+++ b/src/test/bin/test_json.c
@@ -0,0 +1,62 @@
+/*
+ *	pg_test_json.c
+ *		tests validity of json strings against parser implementation.
+ */
+
+#include "postgres_fe.h"
+
+#include "common/jsonapi.h"
+#include "libpq-fe.h"
+
+static const char *progname;
+
+static void parse_json(const char *str);
+
+int
+main(int argc, char *argv[])
+{
+	int			argidx;
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			printf(_("Usage: %s jsonstr [, ...]\n"), progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_json (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	for (argidx = 1; argidx < argc; argidx++)
+		parse_json(argv[argidx]);
+
+	return 0;
+}
+
+static void
+parse_json(const char *str)
+{
+	char *json;
+	JsonLexContext *lex;
+	int client_encoding;
+	JsonParseErrorType parse_result;
+
+	client_encoding = PQenv2encoding();
+
+	json = strdup(str);
+	lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+	parse_result = pg_parse_json(lex, &nullSemAction);
+	if (JSON_SUCCESS == parse_result)
+		fprintf(stdout, _("VALID\n"));
+	else
+	{
+		const char *errstr = json_errdetail(parse_result, lex);
+		fprintf(stdout, _("%s\n"), errstr);
+	}
+}
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index a43e31c60e..12204b2b5c 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -814,6 +814,30 @@ sub mkvcbuild
 		$proj->AddLibrary('ws2_32.lib');
 	}
 
+	# test programs in src/test/bin
+	$mf = Project::read_file('src/test/bin/Makefile');
+	$mf =~ s{\\\r?\n}{}g;
+	$mf =~ m{PROGRAMS\s*=\s*(.*)$}m
+	  || die 'Could not match in src/test/bin/Makefile' . "\n";
+	foreach my $prg (split /\s+/, $1)
+	{
+		my $proj = $solution->AddProject($prg, 'exe', 'test/bin');
+		$mf =~ m{$prg\s*:\s*(.*)$}m
+		  || die 'Could not find test define for $prg' . "\n";
+		my @files = split /\s+/, $1;
+		foreach my $f (@files)
+		{
+			$f =~ s/\.o$/\.c/;
+			if ($f =~ /\.c$/)
+			{
+				$proj->AddFile('src/test/bin/' . $f);
+			}
+		}
+		$proj->AddIncludeDir('src/interfaces/libpq');
+		$proj->AddReference($libpq, $libpgfeutils, $libpgcommon, $libpgport);
+		$proj->AddDirResourceFile('src/test/bin');
+	}
+
 	# Regression DLL and EXE
 	my $regress = $solution->AddProject('regress', 'dll', 'misc');
 	$regress->AddFile('src/test/regress/regress.c');
diff --git a/src/tools/msvc/clean.bat b/src/tools/msvc/clean.bat
index 672bb2d650..c7b25d4bd8 100755
--- a/src/tools/msvc/clean.bat
+++ b/src/tools/msvc/clean.bat
@@ -27,6 +27,7 @@ if exist src\pl\plperl\win32ver.rc del /q src\pl\plperl\win32ver.rc
 if exist src\pl\plpgsql\src\win32ver.rc del /q src\pl\plpgsql\src\win32ver.rc
 if exist src\pl\plpython\win32ver.rc del /q src\pl\plpython\win32ver.rc
 if exist src\pl\tcl\win32ver.rc del /q src\pl\tcl\win32ver.rc
+if exist src\test\bin\win32ver.rc del /q src\test\bin\win32ver.rc
 if exist src\test\isolation\win32ver.rc del /q src\test\isolation\win32ver.rc
 if exist src\test\regress\win32ver.rc del /q src\test\regress\win32ver.rc
 if exist src\timezone\win32ver.rc del /q src\timezone\win32ver.rc