BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

Started by Nonameabout 11 years ago12 messages
#1Noname
bouda@edookit.com

The following bug has been logged on the website:

Bug reference: 12070
Logged by: Ondřej Bouda
Email address: bouda@edookit.com
PostgreSQL version: 9.4beta2
Operating system: Windows 7 64bit
Description:

The hstore_to_json_loose(hstore) produces an invalid JSON in the following
case:

SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
[]))

Output: {"name": 1.}

Expected:
either {"name": "1."}
or {"name": 1}
(the latter being the preferred one so that it produces the same JSON as
hstore_to_jsonb_loose(hstore))

The actual output is indeed incorrect as JSON does not permit `1.` - it must
be a string.

Tested with PostgreSQL 9.4 RC1 and still wrong.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

bouda@edookit.com writes:

The hstore_to_json_loose(hstore) produces an invalid JSON in the following
case:

SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
[]))

Output: {"name": 1.}

The actual output is indeed incorrect as JSON does not permit `1.` - it must
be a string.

Yeah. The problem seems to be the ad-hoc (I'm being polite) code in
hstore_to_json_loose to decide whether a string should be treated as a
number. It does much more work than it needs to, and fails to have any
tight connection to the JSON syntax rules for numbers.

Offhand, it seems like the nicest fix would be if the core json code
exposed a function that would say whether a string matches the JSON
number syntax. Does that functionality already exist someplace,
or is it too deeply buried in the JSON parser guts?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 11/26/2014 11:19 AM, Tom Lane wrote:

bouda@edookit.com writes:

The hstore_to_json_loose(hstore) produces an invalid JSON in the following
case:
SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
[]))
Output: {"name": 1.}
The actual output is indeed incorrect as JSON does not permit `1.` - it must
be a string.

Yeah. The problem seems to be the ad-hoc (I'm being polite) code in
hstore_to_json_loose to decide whether a string should be treated as a
number. It does much more work than it needs to, and fails to have any
tight connection to the JSON syntax rules for numbers.

Offhand, it seems like the nicest fix would be if the core json code
exposed a function that would say whether a string matches the JSON
number syntax. Does that functionality already exist someplace,
or is it too deeply buried in the JSON parser guts?

regards, tom lane

In json.c we now check numbers like this:

JsonLexContext dummy_lex;
bool numeric_error;
...
dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
dummy_lex.input_length = strlen(dummy_lex.input);
json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);

numeric_error is true IFF outputstr is a legal json number.

Exposing a function to do this should be trivial.

cheers

andrew

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#3)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 11/26/2014 11:48 AM, Andrew Dunstan wrote:

In json.c we now check numbers like this:

JsonLexContext dummy_lex;
bool numeric_error;
...
dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
dummy_lex.input_length = strlen(dummy_lex.input);
json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);

numeric_error is true IFF outputstr is a legal json number.

er is NOT a legal json number

cheers

andrew

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#3)
Re: [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 11/26/2014 11:48 AM, Andrew Dunstan wrote:

On 11/26/2014 11:19 AM, Tom Lane wrote:

bouda@edookit.com writes:

The hstore_to_json_loose(hstore) produces an invalid JSON in the
following
case:
SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
[]))
Output: {"name": 1.}
The actual output is indeed incorrect as JSON does not permit `1.` -
it must
be a string.

Yeah. The problem seems to be the ad-hoc (I'm being polite) code in
hstore_to_json_loose to decide whether a string should be treated as a
number. It does much more work than it needs to, and fails to have any
tight connection to the JSON syntax rules for numbers.

Offhand, it seems like the nicest fix would be if the core json code
exposed a function that would say whether a string matches the JSON
number syntax. Does that functionality already exist someplace,
or is it too deeply buried in the JSON parser guts?

regards, tom lane

In json.c we now check numbers like this:

JsonLexContext dummy_lex;
bool numeric_error;
...
dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
dummy_lex.input_length = strlen(dummy_lex.input);
json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);

numeric_error is true IFF outputstr is a legal json number.

Exposing a function to do this should be trivial.

Tom,

what do you want to do about this? In the back branches, exposing a
function like this would be an API change, wouldn't it? Perhaps there we
just need to pick up the 100 lines or so involved from json.c and copy
them into hstore_io.c, suitably modified. In the development branch I
thing adding the function to the API is the best way.

I don't mind doing the work once we agree what is to be done - in the
development branch it will be trivial.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

Andrew Dunstan <andrew@dunslane.net> writes:

what do you want to do about this? In the back branches, exposing a
function like this would be an API change, wouldn't it? Perhaps there we
just need to pick up the 100 lines or so involved from json.c and copy
them into hstore_io.c, suitably modified. In the development branch I
thing adding the function to the API is the best way.

If we're going to do it by calling some newly-exposed function, I'd be
inclined to fix it the same way in the back branches. Otherwise the
discrepancy between the branches is a big back-patching hazard.
(For instance, if we realize we need to fix a bug in the numeric-parsing
code, what are the odds that we remember to fix hstore's additional copy
in the back branches?)

The "API break" isn't a big issue imo. The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs. If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
1 attachment(s)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 11/30/2014 11:45 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

what do you want to do about this? In the back branches, exposing a
function like this would be an API change, wouldn't it? Perhaps there we
just need to pick up the 100 lines or so involved from json.c and copy
them into hstore_io.c, suitably modified. In the development branch I
thing adding the function to the API is the best way.

If we're going to do it by calling some newly-exposed function, I'd be
inclined to fix it the same way in the back branches. Otherwise the
discrepancy between the branches is a big back-patching hazard.
(For instance, if we realize we need to fix a bug in the numeric-parsing
code, what are the odds that we remember to fix hstore's additional copy
in the back branches?)

The "API break" isn't a big issue imo. The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs. If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.

OK, here's the patch.

cheers

andrew

Attachments:

jsonnumberfix.patchtext/x-diff; name=jsonnumberfix.patchDownload
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 6ce3047..ee3d696 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -1240,7 +1240,6 @@ hstore_to_json_loose(PG_FUNCTION_ARGS)
 	int			count = HS_COUNT(in);
 	char	   *base = STRPTR(in);
 	HEntry	   *entries = ARRPTR(in);
-	bool		is_number;
 	StringInfoData tmp,
 				dst;
 
@@ -1267,48 +1266,9 @@ hstore_to_json_loose(PG_FUNCTION_ARGS)
 			appendStringInfoString(&dst, "false");
 		else
 		{
-			is_number = false;
 			resetStringInfo(&tmp);
 			appendBinaryStringInfo(&tmp, HS_VAL(entries, base, i), HS_VALLEN(entries, i));
-
-			/*
-			 * don't treat something with a leading zero followed by another
-			 * digit as numeric - could be a zip code or similar
-			 */
-			if (tmp.len > 0 &&
-				!(tmp.data[0] == '0' &&
-				  isdigit((unsigned char) tmp.data[1])) &&
-				strspn(tmp.data, "+-0123456789Ee.") == tmp.len)
-			{
-				/*
-				 * might be a number. See if we can input it as a numeric
-				 * value. Ignore any actual parsed value.
-				 */
-				char	   *endptr = "junk";
-				long		lval;
-
-				lval = strtol(tmp.data, &endptr, 10);
-				(void) lval;
-				if (*endptr == '\0')
-				{
-					/*
-					 * strol man page says this means the whole string is
-					 * valid
-					 */
-					is_number = true;
-				}
-				else
-				{
-					/* not an int - try a double */
-					double		dval;
-
-					dval = strtod(tmp.data, &endptr);
-					(void) dval;
-					if (*endptr == '\0')
-						is_number = true;
-				}
-			}
-			if (is_number)
+			if (IsValidJsonNumber(tmp.data, tmp.len))
 				appendBinaryStringInfo(&dst, tmp.data, tmp.len);
 			else
 				escape_json(&dst, tmp.data);
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d2bf640..271a2a4 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -173,6 +173,31 @@ lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 	 (c) == '_' || \
 	 IS_HIGHBIT_SET(c))
 
+/* utility function to check if a string is a valid JSON number */
+extern bool
+IsValidJsonNumber(char * str, int len)
+{
+	bool		numeric_error;
+	JsonLexContext dummy_lex;
+
+
+	/* json_lex_number expects a leading - to have been eaten already */
+	if (*str == '-')
+	{
+		dummy_lex.input = str + 1;
+		dummy_lex.input_length = len - 1;
+	}
+	else
+	{
+		dummy_lex.input = str;
+		dummy_lex.input_length = len;
+	}
+
+	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
+
+	return ! numeric_error;
+}
+
 /*
  * Input.
  */
@@ -1338,8 +1363,6 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 {
 	char	   *outputstr;
 	text	   *jsontext;
-	bool		numeric_error;
-	JsonLexContext dummy_lex;
 
 	/* callers are expected to ensure that null keys are not passed in */
 	Assert( ! (key_scalar && is_null));
@@ -1376,25 +1399,14 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 			break;
 		case JSONTYPE_NUMERIC:
 			outputstr = OidOutputFunctionCall(outfuncoid, val);
-			if (key_scalar)
-			{
-				/* always quote keys */
-				escape_json(result, outputstr);
-			}
+			/*
+			 * Don't call escape_json for a non-key if it's a valid JSON
+			 * number.
+			 */
+			if (!key_scalar && IsValidJsonNumber(outputstr, strlen(outputstr)))
+				appendStringInfoString(result, outputstr);
 			else
-			{
-				/*
-				 * Don't call escape_json for a non-key if it's a valid JSON
-				 * number.
-				 */
-				dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
-				dummy_lex.input_length = strlen(dummy_lex.input);
-				json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
-				if (!numeric_error)
-					appendStringInfoString(result, outputstr);
-				else
-					escape_json(result, outputstr);
-			}
+				escape_json(result, outputstr);
 			pfree(outputstr);
 			break;
 		case JSONTYPE_DATE:
diff --git a/src/include/utils/jsonapi.h b/src/include/utils/jsonapi.h
index df05712..b3a579b 100644
--- a/src/include/utils/jsonapi.h
+++ b/src/include/utils/jsonapi.h
@@ -117,4 +117,7 @@ extern JsonLexContext *makeJsonLexContextCstringLen(char *json,
 							 int len,
 							 bool need_escapes);
 
+/* utility function to check if a string is a valid JSON number */
+extern bool IsValidJsonNumber(char * str, int len);
+
 #endif   /* JSONAPI_H */
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

Andrew Dunstan <andrew@dunslane.net> writes:

OK, here's the patch.

Can we make IsValidJsonNumber() take a "const char *"? Also its
comment should specify that it doesn't require nul-terminated
input, if indeed it doesn't. Otherwise +1.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 11/30/2014 04:31 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

OK, here's the patch.

Can we make IsValidJsonNumber() take a "const char *"? Also its
comment should specify that it doesn't require nul-terminated
input, if indeed it doesn't. Otherwise +1.

Then I have to cast away the const-ness when I assign it inside the
function. Constifying the whole API would be a rather larger project, I
suspect, assuming it's possible.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#6)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 11/30/14 11:45 AM, Tom Lane wrote:

The "API break" isn't a big issue imo. The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs. If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.

This sort of things is actually a bit of an annoyance, because it means
that for minor-version upgrades, you need to stop the server before
unpacking the new version, otherwise the old running server will try to
load the new hstore module and fail with a symbol lookup. This can
increase the downtime significantly.

Yes, we've done this before, and people have gotten bitten by it before.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#11Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#10)
Re: [HACKERS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote:

On 11/30/14 11:45 AM, Tom Lane wrote:

The "API break" isn't a big issue imo. The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs. If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.

This sort of things is actually a bit of an annoyance, because it means
that for minor-version upgrades, you need to stop the server before
unpacking the new version, otherwise the old running server will try to
load the new hstore module and fail with a symbol lookup. This can
increase the downtime significantly.

Yes, we've done this before, and people have gotten bitten by it before.

Uh, do we ever support installing new binaries while the server is
running? I would hope not.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#11)
Re: [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

On 1/15/15 2:29 PM, Bruce Momjian wrote:

On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote:

On 11/30/14 11:45 AM, Tom Lane wrote:

The "API break" isn't a big issue imo. The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs. If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.

This sort of things is actually a bit of an annoyance, because it means
that for minor-version upgrades, you need to stop the server before
unpacking the new version, otherwise the old running server will try to
load the new hstore module and fail with a symbol lookup. This can
increase the downtime significantly.

Yes, we've done this before, and people have gotten bitten by it before.

Uh, do we ever support installing new binaries while the server is
running? I would hope not.

Effectively, we don't, but it's not unreasonable to expect it. Check
how your operating system upgrades other server packages such as apache
or openssh.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers