json casts

Started by Andrew Dunstanover 11 years ago15 messages
#1Andrew Dunstan
andrew@dunslane.net

I've been on the receiving end of a couple of mumbles about the fact
that the JSON rendering code ignores casts of builtin types to JSON.
This was originally done as an optimization to avoid doing cache lookups
for casts for things we knew quite well how to turn into JSON values
(unlike, say, hstore). However, there is at least one concrete case
where this has some possibly undesirable consequences, namely
timestamps. Many JSON processors, especially JavaScript/ECMAScript
processors, require timestamp values to be in ISO 8601 format, with a
'T' between the date part and the time part, and thus they barf on the
output we produce for such values.

I'm therefore wondering if we should just remove this optimization. I
don't have any performance figures, but it seems unlikely to be terribly
expensive. Perhaps in 9.5 we should look at caching a lot of the info
the json rendering code gathers, but that's something we can't look at now.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andrew Dunstan (#1)
Re: json casts

On 05/27/2014 10:53 PM, Andrew Dunstan wrote:

I've been on the receiving end of a couple of mumbles about the fact
that the JSON rendering code ignores casts of builtin types to JSON.
This was originally done as an optimization to avoid doing cache lookups
for casts for things we knew quite well how to turn into JSON values
(unlike, say, hstore). However, there is at least one concrete case
where this has some possibly undesirable consequences, namely
timestamps. Many JSON processors, especially JavaScript/ECMAScript
processors, require timestamp values to be in ISO 8601 format, with a
'T' between the date part and the time part, and thus they barf on the
output we produce for such values.

I don't understand what ignoring casts of builtin types to JSON means.
Can you give an example?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Heikki Linnakangas (#2)
Re: json casts

On 05/27/2014 03:57 PM, Heikki Linnakangas wrote:

On 05/27/2014 10:53 PM, Andrew Dunstan wrote:

I've been on the receiving end of a couple of mumbles about the fact
that the JSON rendering code ignores casts of builtin types to JSON.
This was originally done as an optimization to avoid doing cache lookups
for casts for things we knew quite well how to turn into JSON values
(unlike, say, hstore). However, there is at least one concrete case
where this has some possibly undesirable consequences, namely
timestamps. Many JSON processors, especially JavaScript/ECMAScript
processors, require timestamp values to be in ISO 8601 format, with a
'T' between the date part and the time part, and thus they barf on the
output we produce for such values.

I don't understand what ignoring casts of builtin types to JSON means.
Can you give an example?

See src/backend/utils/adt/json.c:json_categorize_type() lines 1280-1300.

When rendering some value as part of a json string, if a cast exists
from the data type to json, then the cast function is used to render the
json instead of the type's normal output function, but only if it's not
a builtin type.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: json casts

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/27/2014 03:57 PM, Heikki Linnakangas wrote:

On 05/27/2014 10:53 PM, Andrew Dunstan wrote:

I've been on the receiving end of a couple of mumbles about the fact
that the JSON rendering code ignores casts of builtin types to JSON.
This was originally done as an optimization to avoid doing cache lookups
for casts for things we knew quite well how to turn into JSON values
(unlike, say, hstore). However, there is at least one concrete case
where this has some possibly undesirable consequences, namely
timestamps. Many JSON processors, especially JavaScript/ECMAScript
processors, require timestamp values to be in ISO 8601 format, with a
'T' between the date part and the time part, and thus they barf on the
output we produce for such values.

I don't understand what ignoring casts of builtin types to JSON means.
Can you give an example?

See src/backend/utils/adt/json.c:json_categorize_type() lines 1280-1300.

When rendering some value as part of a json string, if a cast exists
from the data type to json, then the cast function is used to render the
json instead of the type's normal output function, but only if it's not
a builtin type.

How exactly would disabling that code have any effect on timestamp
rendering? There's no cast to json from timestamps (nor any other
builtin type, except jsonb).

I'd be inclined to think a more useful answer to this issue would be to
make json.c special-case timestamps, as it already does for numerics.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Tom Lane (#4)
Re: json casts

On 05/27/2014 11:00 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/27/2014 03:57 PM, Heikki Linnakangas wrote:

On 05/27/2014 10:53 PM, Andrew Dunstan wrote:

I've been on the receiving end of a couple of mumbles about the fact
that the JSON rendering code ignores casts of builtin types to JSON.
This was originally done as an optimization to avoid doing cache lookups
for casts for things we knew quite well how to turn into JSON values
(unlike, say, hstore). However, there is at least one concrete case
where this has some possibly undesirable consequences, namely
timestamps. Many JSON processors, especially JavaScript/ECMAScript
processors, require timestamp values to be in ISO 8601 format, with a
'T' between the date part and the time part, and thus they barf on the
output we produce for such values.

I don't understand what ignoring casts of builtin types to JSON means.
Can you give an example?

See src/backend/utils/adt/json.c:json_categorize_type() lines 1280-1300.
When rendering some value as part of a json string, if a cast exists
from the data type to json, then the cast function is used to render the
json instead of the type's normal output function, but only if it's not
a builtin type.

How exactly would disabling that code have any effect on timestamp
rendering? There's no cast to json from timestamps (nor any other
builtin type, except jsonb).

I think Andrews idea was, that if cast were used, one could fix the above
problem by defining a correct cast.

I'd be inclined to think a more useful answer to this issue would be to
make json.c special-case timestamps, as it already does for numerics.

regards, tom lane

But I agree that special-casing the code to use the de-facto json standard
of using ISO 8601 date representation is a better solution.

Just make sure you get the TZ part right - this is another place where
PostgreSQL often differs from other systems' understanding of ISO
timestamps.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic O�

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#5)
Re: json casts

On 05/27/2014 05:43 PM, Hannu Krosing wrote:

On 05/27/2014 11:00 PM, Tom Lane wrote:

See src/backend/utils/adt/json.c:json_categorize_type() lines 1280-1300.
When rendering some value as part of a json string, if a cast exists
from the data type to json, then the cast function is used to render the
json instead of the type's normal output function, but only if it's not
a builtin type.

How exactly would disabling that code have any effect on timestamp
rendering? There's no cast to json from timestamps (nor any other
builtin type, except jsonb).

I think Andrews idea was, that if cast were used, one could fix the above
problem by defining a correct cast.

Right.

I'd be inclined to think a more useful answer to this issue would be to
make json.c special-case timestamps, as it already does for numerics.

OK, that's another approach.

But I agree that special-casing the code to use the de-facto json standard
of using ISO 8601 date representation is a better solution.

Just make sure you get the TZ part right - this is another place where
PostgreSQL often differs from other systems' understanding of ISO
timestamps.

The target would be something like:

to_char($1,'\"YYYY-MM-DD"T"HH:MI:SS.USOF\"')

AIUI that should be legal ISO 8601. But I'm happy to defer to experts.

Given that this would be a hard coded behaviour change, is it too late
to do this for 9.4?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Stephen Frost
sfrost@snowman.net
In reply to: Andrew Dunstan (#6)
Re: json casts

* Andrew Dunstan (andrew@dunslane.net) wrote:

I'd be inclined to think a more useful answer to this issue would be to
make json.c special-case timestamps, as it already does for numerics.

OK, that's another approach.

I'm all for doing this for JSON, but it'd sure be nice to have the
option to get actual ISO 8601 from the built-ins too...

Given that this would be a hard coded behaviour change, is it too
late to do this for 9.4?

No, for my 2c.

Thanks,

Stephen

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#7)
Re: json casts

Stephen Frost <sfrost@snowman.net> writes:

* Andrew Dunstan (andrew@dunslane.net) wrote:

Given that this would be a hard coded behaviour change, is it too
late to do this for 9.4?

No, for my 2c.

If we do it by adding casts then it'd require an initdb, so I'd vote
against that for 9.4. If we just change behavior in json.c then that
objection doesn't apply, so I wouldn't complain.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: json casts

On 05/27/2014 07:17 PM, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Andrew Dunstan (andrew@dunslane.net) wrote:

Given that this would be a hard coded behaviour change, is it too
late to do this for 9.4?

No, for my 2c.

If we do it by adding casts then it'd require an initdb, so I'd vote
against that for 9.4. If we just change behavior in json.c then that
objection doesn't apply, so I wouldn't complain.

I wasn't proposing to add a cast, just to allow users to add one if they
wanted. But I'm quite happy to go with special-casing timestamps, and
leave the larger question for another time.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: json casts

On Tue, May 27, 2014 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd be inclined to think a more useful answer to this issue would be to
make json.c special-case timestamps, as it already does for numerics.

I wonder if anyone besides me is nervous about changing the semantics
here. It seems like the sort of backward-compatibility break we
normally avoid. If we do make such a compatibility break, it should
certainly be release-noted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#10)
Re: json casts

On 05/27/2014 11:55 PM, Robert Haas wrote:

On Tue, May 27, 2014 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd be inclined to think a more useful answer to this issue would be to
make json.c special-case timestamps, as it already does for numerics.

I wonder if anyone besides me is nervous about changing the semantics
here. It seems like the sort of backward-compatibility break we
normally avoid. If we do make such a compatibility break, it should
certainly be release-noted.

Yes, certainly it should be.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#9)
1 attachment(s)
Re: json casts

On 05/27/2014 07:25 PM, Andrew Dunstan wrote:

On 05/27/2014 07:17 PM, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Andrew Dunstan (andrew@dunslane.net) wrote:

Given that this would be a hard coded behaviour change, is it too
late to do this for 9.4?

No, for my 2c.

If we do it by adding casts then it'd require an initdb, so I'd vote
against that for 9.4. If we just change behavior in json.c then that
objection doesn't apply, so I wouldn't complain.

I wasn't proposing to add a cast, just to allow users to add one if
they wanted. But I'm quite happy to go with special-casing timestamps,
and leave the larger question for another time.

Here's a draft patch. I'm still checking to see if there are other
places that need to be fixed, but I think this has the main one.

cheers

andrew

Attachments:

jsonts.patchtext/x-patch; name=jsonts.patchDownload
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7364f3..d262bda 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -24,6 +24,7 @@
 #include "parser/parse_coerce.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/formatting.h"
 #include "utils/lsyscache.h"
 #include "utils/json.h"
 #include "utils/jsonapi.h"
@@ -53,6 +54,8 @@ typedef enum					/* type categories for datum_to_json */
 	JSONTYPE_NULL,				/* null, so we didn't bother to identify */
 	JSONTYPE_BOOL,				/* boolean (built-in types only) */
 	JSONTYPE_NUMERIC,			/* numeric (ditto) */
+	JSONTYPE_TIMESTAMP,         /* we use special formatting for timestamp */
+	JSONTYPE_TIMESTAMPTZ,       /* ... and timestamptz */
 	JSONTYPE_JSON,				/* JSON itself (and JSONB) */
 	JSONTYPE_ARRAY,				/* array */
 	JSONTYPE_COMPOSITE,			/* composite */
@@ -60,6 +63,13 @@ typedef enum					/* type categories for datum_to_json */
 	JSONTYPE_OTHER				/* all else */
 } JsonTypeCategory;
 
+/*
+ * to_char formats to turn timestamps and timpstamptzs into json strings
+ * that are ISO 8601 compliant
+ */
+#define TS_ISO8601_FMT "\\\"YYYY-MM-DD\"T\"HH24:MI:SS.US\\\""
+#define TSTZ_ISO8601_FMT "\\\"YYYY-MM-DD\"T\"HH24:MI:SS.USOF\\\""
+
 static inline void json_lex(JsonLexContext *lex);
 static inline void json_lex_string(JsonLexContext *lex);
 static inline void json_lex_number(JsonLexContext *lex, char *s, bool *num_err);
@@ -1262,6 +1272,14 @@ json_categorize_type(Oid typoid,
 			*tcategory = JSONTYPE_NUMERIC;
 			break;
 
+		case TIMESTAMPOID:
+			*tcategory = JSONTYPE_TIMESTAMP;
+			break;
+
+		case TIMESTAMPTZOID:
+			*tcategory = JSONTYPE_TIMESTAMPTZ;
+			break;
+
 		case JSONOID:
 		case JSONBOID:
 			*tcategory = JSONTYPE_JSON;
@@ -1375,6 +1393,29 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 			}
 			pfree(outputstr);
 			break;
+		case JSONTYPE_TIMESTAMP:
+			/* 
+			 * The timestamp format used here provides for quoting the string, 
+			 * so no escaping is required.
+			 */
+			jsontext = DatumGetTextP(
+				DirectFunctionCall2(timestamp_to_char, val, 
+									CStringGetTextDatum(TS_ISO8601_FMT)));
+			outputstr = text_to_cstring(jsontext);
+			appendStringInfoString(result, outputstr);
+			pfree(outputstr);
+			pfree(jsontext);
+			break;
+		case JSONTYPE_TIMESTAMPTZ:
+			/* same comment as for timestamp above */
+			jsontext = DatumGetTextP(
+				DirectFunctionCall2(timestamptz_to_char, val, 
+									CStringGetTextDatum(TSTZ_ISO8601_FMT)));
+			outputstr = text_to_cstring(jsontext);
+			appendStringInfoString(result, outputstr);
+			pfree(outputstr);
+			pfree(jsontext);
+			break;
 		case JSONTYPE_JSON:
 			/* JSON and JSONB output will already be escaped */
 			outputstr = OidOutputFunctionCall(outfuncoid, val);
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index 9f08676..c4dc8b0 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -403,6 +403,29 @@ SELECT row_to_json(row((select array_agg(x) as d from generate_series(5,10) x)),
  {"f1":[5,6,7,8,9,10]}
 (1 row)
 
+-- to_json, timestamps
+select to_json(timestamp '2014-05-28 12:22:35.614298');
+           to_json            
+------------------------------
+ "2014-05-28T12:22:35.614298"
+(1 row)
+
+BEGIN;
+SET LOCAL TIME ZONE 10.5;
+select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
+              to_json               
+------------------------------------
+ "2014-05-29T02:52:35.614298+10:30"
+(1 row)
+
+SET LOCAL TIME ZONE -8;
+select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
+             to_json             
+---------------------------------
+ "2014-05-28T08:22:35.614298-08"
+(1 row)
+
+COMMIT;
 --json_agg
 SELECT json_agg(q)
   FROM ( SELECT $$a$$ || x AS b, y AS c,
diff --git a/src/test/regress/expected/json_1.out b/src/test/regress/expected/json_1.out
index 13f7687..629e98e 100644
--- a/src/test/regress/expected/json_1.out
+++ b/src/test/regress/expected/json_1.out
@@ -403,6 +403,29 @@ SELECT row_to_json(row((select array_agg(x) as d from generate_series(5,10) x)),
  {"f1":[5,6,7,8,9,10]}
 (1 row)
 
+-- to_json, timestamps
+select to_json(timestamp '2014-05-28 12:22:35.614298');
+           to_json            
+------------------------------
+ "2014-05-28T12:22:35.614298"
+(1 row)
+
+BEGIN;
+SET LOCAL TIME ZONE 10.5;
+select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
+              to_json               
+------------------------------------
+ "2014-05-29T02:52:35.614298+10:30"
+(1 row)
+
+SET LOCAL TIME ZONE -8;
+select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
+             to_json             
+---------------------------------
+ "2014-05-28T08:22:35.614298-08"
+(1 row)
+
+COMMIT;
 --json_agg
 SELECT json_agg(q)
   FROM ( SELECT $$a$$ || x AS b, y AS c,
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index 2ae5b82..6c2faec 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -100,6 +100,17 @@ FROM rows q;
 
 SELECT row_to_json(row((select array_agg(x) as d from generate_series(5,10) x)),false);
 
+-- to_json, timestamps
+
+select to_json(timestamp '2014-05-28 12:22:35.614298');
+
+BEGIN;
+SET LOCAL TIME ZONE 10.5;
+select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
+SET LOCAL TIME ZONE -8;
+select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
+COMMIT;
+
 --json_agg
 
 SELECT json_agg(q)
#13Peter Eisentraut
peter_e@gmx.net
In reply to: Andrew Dunstan (#12)
Re: json casts

On 5/28/14, 6:48 PM, Andrew Dunstan wrote:

On 05/27/2014 07:25 PM, Andrew Dunstan wrote:

On 05/27/2014 07:17 PM, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Andrew Dunstan (andrew@dunslane.net) wrote:

Given that this would be a hard coded behaviour change, is it too
late to do this for 9.4?

No, for my 2c.

If we do it by adding casts then it'd require an initdb, so I'd vote
against that for 9.4. If we just change behavior in json.c then that
objection doesn't apply, so I wouldn't complain.

I wasn't proposing to add a cast, just to allow users to add one if
they wanted. But I'm quite happy to go with special-casing timestamps,
and leave the larger question for another time.

Here's a draft patch. I'm still checking to see if there are other
places that need to be fixed, but I think this has the main one.

This was solved back in the day for the xml type, which has essentially
the same requirement, by adding an ISO-8601-compatible output option to
EncodeDateTime(). See map_sql_value_to_xml_value() in xml.c. You ought
to be able to reuse that. Seems easier than routing through to_char().

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#13)
Re: json casts

Peter Eisentraut <peter_e@gmx.net> writes:

This was solved back in the day for the xml type, which has essentially
the same requirement, by adding an ISO-8601-compatible output option to
EncodeDateTime(). See map_sql_value_to_xml_value() in xml.c. You ought
to be able to reuse that. Seems easier than routing through to_char().

I was wondering if we didn't have another way to do that. to_char() is
squirrely enough that I really hate having any other core functionality
depending on it. +1 for changing this.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: json casts

On 06/03/2014 04:45 PM, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

This was solved back in the day for the xml type, which has essentially
the same requirement, by adding an ISO-8601-compatible output option to
EncodeDateTime(). See map_sql_value_to_xml_value() in xml.c. You ought
to be able to reuse that. Seems easier than routing through to_char().

I was wondering if we didn't have another way to do that. to_char() is
squirrely enough that I really hate having any other core functionality
depending on it. +1 for changing this.

It's a bit of a pity neither of you spoke up in the 6 days since I
published the draft patch. And honestly, some of the other code invoked
by the XML code looks a bit squirrely too. But, OK, I will look at it. I
guess I can assume that the output won't contain anything that needs
escaping, so I can just add the leading and trailing quote marks without
needing to call escape_json().

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers