Document DateStyle effect on jsonpath string()
Hackers,
In fuzing around trying to work out what’s going on with the formatting of timestamptz values cast by the timestamp_tz() jsonpath method[1]/messages/by-id/7DE080CE-6D8C-4794-9BD1-7D9699172FAB@justatheory.com, I noticed that the formatting of the string() method applied to date and time objects was not fully tested, or how the output is determined by the DateStyle method.
The attached path aims to rectify this situation by adding tests that chain string() after the jsonpath date/time methods, both with the default testing “PostreSQL” DateStyle and “ISO”. It also mentions the impact of the DateStyle parameter in the string() documentation.
Also available to review as a pull request[2]https://github.com/theory/postgres/pull/7/files.
Best,
David
[1]: /messages/by-id/7DE080CE-6D8C-4794-9BD1-7D9699172FAB@justatheory.com
[2]: https://github.com/theory/postgres/pull/7/files
Attachments:
v1-0001-Document-impact-of-datestyle-on-jsonpath-string.patchapplication/octet-stream; name=v1-0001-Document-impact-of-datestyle-on-jsonpath-string.patch; x-unix-mode=0644Download+108-2
On Wed, Jul 3, 2024 at 12:51 AM David E. Wheeler <david@justatheory.com> wrote:
Hackers,
In fuzing around trying to work out what’s going on with the formatting of timestamptz values cast by the timestamp_tz() jsonpath method[1], I noticed that the formatting of the string() method applied to date and time objects was not fully tested, or how the output is determined by the DateStyle method.
The attached path aims to rectify this situation by adding tests that chain string() after the jsonpath date/time methods, both with the default testing “PostreSQL” DateStyle and “ISO”. It also mentions the impact of the DateStyle parameter in the string() documentation.
Also available to review as a pull request[2].
Best,
David
[1]: /messages/by-id/7DE080CE-6D8C-4794-9BD1-7D9699172FAB@justatheory.com
[2]: https://github.com/theory/postgres/pull/7/files
+set datestyle = 'ISO';
+select jsonb_path_query_tz('"2023-08-15 12:34:56"',
'$.timestamp_tz().string()'); -- should work
+ jsonb_path_query_tz
+--------------------------
+ "2023-08-15 12:34:56-07"
+(1 row)
Do you need to reset the datestyle?
also the above query is time zone sensitive, maybe the time zone is
set in another place, but that's not explicit?
<para>
- String value converted from a JSON boolean, number, string, or datetime
+ String value converted from a JSON boolean, number, string, or
+ datetime. Note that the string output of datetimes is determined by
+ the <xref linkend="guc-datestyle"/> parameter.
</para>
imho, your patch has just too many examples.
for explaining the above sentence, the following example should be enough.
begin;
set local time zone +1;
set local datestyle to postgres;
select jsonb_path_query_tz('"2023-08-15 12:34:56"',
'$.timestamp_tz().string()');
set local datestyle to iso;
select jsonb_path_query_tz('"2023-08-15 12:34:56"',
'$.timestamp_tz().string()');
commit;
On Jul 4, 2024, at 04:28, jian he <jian.universality@gmail.com> wrote:
Do you need to reset the datestyle?
Wouldn’t hurt but it’s not necessary, no. It’s set only for the execution of this file, and there are no more calls that rely on it.
also the above query is time zone sensitive, maybe the time zone is
set in another place, but that's not explicit?
It’s implicit in how PostgreSQL runs its test suite; other tests later change it.
<para> - String value converted from a JSON boolean, number, string, or datetime + String value converted from a JSON boolean, number, string, or + datetime. Note that the string output of datetimes is determined by + the <xref linkend="guc-datestyle"/> parameter. </para> imho, your patch has just too many examples.
I’m confused. There are no examples in my patch, or this bit you cite.
for explaining the above sentence, the following example should be enough.
Are you referring to the tests? I made them comprehensive so that we reliably demonstrate the behavior of the string() method on all the date/time data types. They are not examples, not in the documentation sense at least.
Best,
David
On Thu, Jul 4, 2024 at 10:45 PM David E. Wheeler <david@justatheory.com> wrote:
On Jul 4, 2024, at 04:28, jian he <jian.universality@gmail.com> wrote:
Do you need to reset the datestyle?
Wouldn’t hurt but it’s not necessary, no. It’s set only for the execution of this file, and there are no more calls that rely on it.
also the above query is time zone sensitive, maybe the time zone is
set in another place, but that's not explicit?It’s implicit in how PostgreSQL runs its test suite; other tests later change it.
I inserted these two commands into it.
show time zone;
show datestyle;
turns out in the test suite, the default data style is "Postgres,
MDY", and the default time zone is "PST8PDT".
These two implicit settings weren't mentioned anywhere.
your tests look ok to me.
one tiny complaint would be maybe we need `reset datestyle`.
Because we are in line 600 of src/test/regress/sql/jsonb_jsonpath.sql,
We want to make sure the following test is not influenced by guc datestyle.
<para> - String value converted from a JSON boolean, number, string, or datetime + String value converted from a JSON boolean, number, string, or + datetime. Note that the string output of datetimes is determined by + the <xref linkend="guc-datestyle"/> parameter. </para> imho, your patch has just too many examples.I’m confused. There are no examples in my patch, or this bit you cite.
for explaining the above sentence, the following example should be enough.
Are you referring to the tests? I made them comprehensive so that we reliably demonstrate the behavior of the string() method on all the date/time data types. They are not examples, not in the documentation sense at least.
I mean tests, sorry for the confusion. added several more tests should be fine.
overall looks good to me.
On Jul 9, 2024, at 10:35, jian he <jian.universality@gmail.com> wrote:
one tiny complaint would be maybe we need `reset datestyle`.
That’s fair. Done.
D
Attachments:
v2-0001-Document-impact-of-datestyle-on-jsonpath-string.patchapplication/octet-stream; name=v2-0001-Document-impact-of-datestyle-on-jsonpath-string.patch; x-unix-mode=0644Download+110-2
On Jul 9, 2024, at 10:45, David E. Wheeler <david@justatheory.com> wrote:
one tiny complaint would be maybe we need `reset datestyle`.
That’s fair. Done.
Here’s a rebase on 5784a49. I also updated the commitfest item[1]https://commitfest.postgresql.org/49/5101/ to link to a new pull request[2]https://github.com/theory/postgres/pull/8, since I seem to have turned the other one into the tz conversion bug fix.
Best,
David
[1]: https://commitfest.postgresql.org/49/5101/
[2]: https://github.com/theory/postgres/pull/8
Attachments:
v3-0001-Document-impact-of-datestyle-on-jsonpath-string.patchapplication/octet-stream; name=v3-0001-Document-impact-of-datestyle-on-jsonpath-string.patch; x-unix-mode=0644Download+110-2
On Jul 19, 2024, at 10:22, David E. Wheeler <david@justatheory.com> wrote:
Here’s a rebase on 5784a49. I also updated the commitfest item[1] to link to a new pull request[2], since I seem to have turned the other one into the tz conversion bug fix.
[1]: https://commitfest.postgresql.org/49/5101/
[2]: https://github.com/theory/postgres/pull/8
Rebase on 47c9803. I also changed the commitfest item[1] to “ready for committer”, since jian reviewed it, though I couldn’t see a way to add jian as a reviewer in the app. Hope that makes sense.
Best,
David
Attachments:
v4-0001-Document-impact-of-datestyle-on-jsonpath-string.patchapplication/octet-stream; name=v4-0001-Document-impact-of-datestyle-on-jsonpath-string.patch; x-unix-mode=0644Download+110-2
"David E. Wheeler" <david@justatheory.com> writes:
Rebase on 47c9803. I also changed the commitfest item[1] to “ready for committer”, since jian reviewed it, though I couldn’t see a way to add jian as a reviewer in the app. Hope that makes sense.
Pushed with a little additional polishing.
I thought the best way to address jian's complaint about DateStyle not
being clearly locked down was to change horology.sql to verify the
prevailing setting, as it has long done for TimeZone. That's the
lead test script for related stuff, so it makes the most sense to
do it there. Having done that, I don't feel a need to duplicate
that elsewhere.
regards, tom lane
On Sep 10, 2024, at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pushed with a little additional polishing.
Thank you! Do you think it’d be worthwhile to back port to 17?
I thought the best way to address jian's complaint about DateStyle not
being clearly locked down was to change horology.sql to verify the
prevailing setting, as it has long done for TimeZone. That's the
lead test script for related stuff, so it makes the most sense to
do it there. Having done that, I don't feel a need to duplicate
that elsewhere.
Yeah, that will help, but I still bet next time I go to figure out what it is I’ll stick that line in some test to make it fail with clear output for what it’s set to 😂.
D
Isn't this behavior actually a bug that should be fixed rather than
documented?
These JSON path functions are specified by the SQL standard, so they
shouldn't depend on PostgreSQL-specific settings. At least in new
functionality we should avoid that, no?
Show quoted text
On 10.09.24 21:43, David E. Wheeler wrote:
On Sep 10, 2024, at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pushed with a little additional polishing.
Thank you! Do you think it’d be worthwhile to back port to 17?
I thought the best way to address jian's complaint about DateStyle not
being clearly locked down was to change horology.sql to verify the
prevailing setting, as it has long done for TimeZone. That's the
lead test script for related stuff, so it makes the most sense to
do it there. Having done that, I don't feel a need to duplicate
that elsewhere.Yeah, that will help, but I still bet next time I go to figure out what it is I’ll stick that line in some test to make it fail with clear output for what it’s set to 😂.
Peter Eisentraut <peter@eisentraut.org> writes:
These JSON path functions are specified by the SQL standard, so they
shouldn't depend on PostgreSQL-specific settings. At least in new
functionality we should avoid that, no?
Hmm ... but does the standard precisely define the output format?
Since these conversions are built on our own timestamp I/O code,
I rather imagine there is quite a lot of behavior there that's
not to be found in the standard. That doesn't really trouble
me as long as the spec's behavior is a subset of it (i.e.,
reachable as long as you've got the right parameter settings).
regards, tom lane
"David E. Wheeler" <david@justatheory.com> writes:
On Sep 10, 2024, at 14:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pushed with a little additional polishing.
Thank you! Do you think it’d be worthwhile to back port to 17?
Not as things stand. If we adopt Peter's nearby position that
the current behavior is actually buggy, then probably back-patching
a corrected version would be worthwhile as a part of fixing it.
regards, tom lane
On Sep 10, 2024, at 16:10, Peter Eisentraut <peter@eisentraut.org> wrote:
These JSON path functions are specified by the SQL standard, so they shouldn't depend on PostgreSQL-specific settings. At least in new functionality we should avoid that, no?
Does that also apply to `datetime(template)`, where it uses the `to_timestamp()` templates? From the docs[1]https://www.postgresql.org/docs/devel/functions-json.html:
The datetime() and datetime(template) methods use the same parsing rules as the to_timestamp SQL function does (see Section 9.8[2]), with three exceptions. First, these methods don't allow unmatched template patterns. Second, only the following separators are allowed in the template string: minus sign, period, solidus (slash), comma, apostrophe, semicolon, colon and space. Third, separators in the template string must exactly match the input string.
Does the standard specify a formatting language?
Best,
David
[1]: https://www.postgresql.org/docs/devel/functions-json.html
[2]: https://www.postgresql.org/docs/devel/functions-formatting.html
On Sep 10, 2024, at 16:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Not as things stand. If we adopt Peter's nearby position that
the current behavior is actually buggy, then probably back-patching
a corrected version would be worthwhile as a part of fixing it.
Oh, I see now that my reply to him points out the same issue as yours.
So annoying that the standard is not publicly available for any one of us to go look.
D
On 10.09.24 22:16, Tom Lane wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
These JSON path functions are specified by the SQL standard, so they
shouldn't depend on PostgreSQL-specific settings. At least in new
functionality we should avoid that, no?Hmm ... but does the standard precisely define the output format?
Since these conversions are built on our own timestamp I/O code,
I rather imagine there is quite a lot of behavior there that's
not to be found in the standard. That doesn't really trouble
me as long as the spec's behavior is a subset of it (i.e.,
reachable as long as you've got the right parameter settings).
Actually, the standard prohibits this call:
"""
XV) If JM specifies string, then:
1) Forallj,1(one)≤j≤n,
Case:
a) If Ij is not a character string, number, or Boolean value,
then let ST be data exception — non-string SQL/JSON item (2202X).
b) Otherwise, let X be an SQL variable whose value is Ij. Let ML be an
implementation-defined (IL006) maximum
length of variable-length character strings. Let Vj be the result of
CAST (X AS CHARACTER VARYING(ML)
If this conversion results in an exception condition, then
let ST be that exception condition.
"""
So I guess we have extended this and the current behavior is consistent
with item b).
What I'm concerned about is that this makes the behavior of JSON_QUERY
non-immutable. Maybe there are other reasons for it to be
non-immutable, in which case this isn't important. But it might be
worth avoiding that?
Peter Eisentraut <peter@eisentraut.org> writes:
What I'm concerned about is that this makes the behavior of JSON_QUERY
non-immutable. Maybe there are other reasons for it to be
non-immutable, in which case this isn't important. But it might be
worth avoiding that?
Fair point, but haven't we already bit that bullet with respect
to timezones?
[ looks... ] Hmm, it looks like jsonb_path_exists_tz is marked
stable while jsonb_path_exists is claimed to be immutable.
So yeah, there's a problem here. I'm not 100% convinced that
jsonb_path_exists was truly immutable before, but for sure it
is not now, and that's bad.
regression=# select jsonb_path_query('"2023-08-15 12:34:56"', '$.timestamp().string()');
jsonb_path_query
-----------------------
"2023-08-15 12:34:56"
(1 row)
regression=# set datestyle = postgres;
SET
regression=# select jsonb_path_query('"2023-08-15 12:34:56"', '$.timestamp().string()');
jsonb_path_query
----------------------------
"Tue Aug 15 12:34:56 2023"
(1 row)
regards, tom lane
On Sep 11, 2024, at 10:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ looks... ] Hmm, it looks like jsonb_path_exists_tz is marked
stable while jsonb_path_exists is claimed to be immutable.
So yeah, there's a problem here. I'm not 100% convinced that
jsonb_path_exists was truly immutable before, but for sure it
is not now, and that's bad.regression=# select jsonb_path_query('"2023-08-15 12:34:56"', '$.timestamp().string()');
jsonb_path_query
-----------------------
"2023-08-15 12:34:56"
(1 row)regression=# set datestyle = postgres;
SET
regression=# select jsonb_path_query('"2023-08-15 12:34:56"', '$.timestamp().string()');
jsonb_path_query
----------------------------
"Tue Aug 15 12:34:56 2023"
(1 row)
I wonder, then, whether .string() should be modified to use the ISO format in UTC, and therefore be immutable. That’s the format you get if you omit .string() and let result be stringified from a date/time/timestamp.
FWIW, that’s how my Go port works, since I didn’t bother to replicate the DateStyle GUC (example[1]https://theory.github.io/sqljson/playground/?p=%2524.timestamp%28%29.string%28%29&j=%25222023-08-15%252012%253A34%253A56%2522&a=&o=1).
Best,
David
"David E. Wheeler" <david@justatheory.com> writes:
I wonder, then, whether .string() should be modified to use the ISO format in UTC, and therefore be immutable. That’s the format you get if you omit .string() and let result be stringified from a date/time/timestamp.
What "let result be stringified" behavior are you thinking of,
exactly? AFAICS there's not sensitivity to timezone unless you
use the _tz variant, otherwise it just regurgitates the input.
I agree that we should force ISO datestyle, but I'm not quite sure
about whether we're in the clear with timezone handling. We already
had a bunch of specialized rules about timezone handling in the _tz
and not-_tz variants of these functions. It seems to me that simply
forcing UTC would not be consistent with that pre-existing behavior.
However, I may not have absorbed enough caffeine yet.
regards, tom lane
On Sep 11, 2024, at 11:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What "let result be stringified" behavior are you thinking of,
exactly? AFAICS there's not sensitivity to timezone unless you
use the _tz variant, otherwise it just regurgitates the input.
There is stringification of a time, date, or timestamp value, which has no TZ, but is still affected by DateStyle. Then there is stringification of timetz or timestamptz, which can be created by the .time_tz() and .timstamp_tz() functions, and therefore are impacted by both the DateStyle and TimeZone configs, even when not using the _tz variant:
david=# set timezone = 'America/New_York';
SET
david=# select jsonb_path_query('"2023-08-15 12:34:56-09"', '$.timestamp_tz().string()');
jsonb_path_query
--------------------------
"2023-08-15 17:34:56-04"
david=# set timezone = 'America/Los_Angeles';
SET
david=# select jsonb_path_query('"2023-08-15 12:34:56-09"', '$.timestamp_tz().string()');
jsonb_path_query
--------------------------
"2023-08-15 14:34:56-07"
(1 row)
I agree that we should force ISO datestyle, but I'm not quite sure
about whether we're in the clear with timezone handling. We already
had a bunch of specialized rules about timezone handling in the _tz
and not-_tz variants of these functions. It seems to me that simply
forcing UTC would not be consistent with that pre-existing behavior.
However, I may not have absorbed enough caffeine yet.
True, it would not be consistent with the existing behaviors, but I believe these are all new in Postgres 17.
Best,
David
"David E. Wheeler" <david@justatheory.com> writes:
On Sep 11, 2024, at 11:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What "let result be stringified" behavior are you thinking of,
exactly? AFAICS there's not sensitivity to timezone unless you
use the _tz variant, otherwise it just regurgitates the input.
There is stringification of a time, date, or timestamp value, which
has no TZ, but is still affected by DateStyle.
What I understood you to be referencing is what happens without
string(), which AFAICS does not result in any timezone rotation:
regression=# set timezone = 'America/New_York';
SET
regression=# select jsonb_path_query('"2023-08-15 12:34:56-09"', '$.timestamp_tz()');
jsonb_path_query
-----------------------------
"2023-08-15T12:34:56-09:00"
(1 row)
I think I'd be content to have string() duplicate that behavior
--- in fact, it seems like it'd be odd if it doesn't match.
regards, tom lane