Re: pgsql: Implement jsonpath .datetime() method
On Thu, Sep 26, 2019 at 2:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
On Thu, Sep 26, 2019 at 2:12 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
The proximate problem seems to be that compareItems() is insufficiently
careful to ensure that both values are non-null before passing them
off to datatype-specific code. The code accidentally fails to crash
on 64-bit machines, but it's still giving garbage answers, I think.
I've found that the compareItems() code failed to apply the appropriate
casts to/from Datum. Fixed in 7881bb14f4. This makes the test pass on my
local 32-bit machine. I'll keep an eye on the buildfarm.
Hm. dromedary seems not to crash either with that fix, but I'm not
sure why not, because when I was running the previous tree by hand,
the stack trace showed pretty clearly that we were getting to
timestamp_cmp with one null and one non-null argument. So I don't
believe your argument that that's impossible, and even if it is,
I do not think it's sane for compareItems to depend on that ---
especially when one of its code paths *does* check for nulls.
I do not have a very good opinion about the quality of this code
upon my first glance at it. Just looking at compareDatetime:
The patch with compareDatetime() refactoring was posted in the original
thread in pgsql-hackers [1].
* The code is schizophrenic about whether it's allowed to pass a
null have_error pointer or not. It is not very sensible to have
some code doing
if (have_error && *have_error)
return 0;
when other code branches will dump core for null have_error.
Given that if this test actually was doing anything, what it
would be doing is failing to detect error conditions, I think
the only sensible design is to insist that have_error mustn't be
null, in which case these checks for null pointer should be removed,
because (a) they waste code and cycles and (b) they mislead the
reader as to what the API of compareDatetime actually is.
* At least some of the code paths will malfunction if the caller
didn't initialize *have_error to false. If that is an intended API
requirement, it is not acceptable for the function header comment
not to say so. (For bonus points, it'd be nice if the header
comment agreed with the code as to the name of the variable.)
If this isn't an intended requirement, you need to fix the code,
probably by initializing "*have_error = false;" up at the top.* This is silly:
if (*have_error)
return 0;*have_error = false;
* Also, given that you have that "if (*have_error)" where you do,
the have_error tests inside the switch are useless redundancy.
You might as well just remove them completely and let the final
test handle falling out if a conversion failed. Alternatively
you could drop the final test, because as the code stands right
now, it's visibly impossible to get there with *have_error true.
Yes, the oddities in have_error handling seem to have crept in during
numerous reworks of the patch. have_error is now required to be non-NULL,
and its handling was simplified in the patch.
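The convention being converged on here — a mandatory, callee-initialized
have_error out-parameter — can be sketched in self-contained C.
compare_values() below is a hypothetical stand-in for compareDatetime(),
not the actual PostgreSQL function:

```c
#include <stdbool.h>

/* Sketch of the API convention Tom describes: have_error must be
 * non-NULL, and the callee (not the caller) sets *have_error = false
 * on entry, so a stale value can neither mask nor fake an error. */
static int
compare_values(int a, int b, bool *have_error)
{
    *have_error = false;        /* initialize once, up at the top */

    if (a < 0 || b < 0)         /* stand-in for an uncomparable case */
    {
        *have_error = true;
        return 0;               /* return value is meaningless on error */
    }
    return (a > b) - (a < b);
}
```

With this shape there is no need for `if (have_error && ...)` null checks
inside the function, and no redundant error tests inside the comparison
logic itself.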
* It's a bit schizophrenic also that some of the switches
lack default:s (and rely on the if (!cmpfunc) below), while
the outer switch does have its own, completely redundant
default:. I'd get rid of that default: and instead add
a comment explaining that the !cmpfunc test substitutes for
default branches.
Default cases with elog()s were added to all the switches. Previously,
the default case in the outer switch was used to report an invalid type1,
and cmpfunc was used to report an invalid type2.
* OIDs are unsigned, so if you must print them, use %u not %d.
Fixed.
* The errhints don't follow project message style.
Fixed, but I'm not sure about "*_tz()". Maybe it's worth passing the
current jsonb_xxx function name to compareDatetime() through
JsonPathExecContext?
* The blank lines before "break"s aren't really per project
style either, IMO. They certainly aren't doing anything to
improve readability, and they help limit how much code you
can see at once.
Fixed. If I recall correctly, these lines were added by pgindent.
* More generally, it's completely unclear why some error conditions
are thrown as errors and others just result in returning *have_error.
In particular, it seems weird that some unsupported datatype combinations
cause hard errors while others do not. Maybe that's fine, but if so,
the function header comment is falling down on the job by not explaining
the reasoning.
All cast errors are caught by the jsonpath predicate. Comparison of
uncomparable datetime types (time[tz] to dated types) also returns Unknown.
A hard error is thrown only if the datatype conversion requires the current
timezone, which is not available in the immutable family of jsonb_xxx()
functions. This behavior is specific to our jsonpath implementation, but
I'm really not sure whether we should throw an error or return Unknown in
this case.
[1]: /messages/by-id/d9244568-08bb-5dcf-db25-540412e2e61f@postgrespro.ru
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Sep 27, 2019 at 6:58 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
On Thu, Sep 26, 2019 at 2:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
* More generally, it's completely unclear why some error conditions
are thrown as errors and others just result in returning *have_error.
In particular, it seems weird that some unsupported datatype combinations
cause hard errors while others do not. Maybe that's fine, but if so,
the function header comment is falling down on the job by not explaining
the reasoning.
All cast errors are caught by the jsonpath predicate. Comparison of
uncomparable datetime types (time[tz] to dated types) also returns Unknown.
A hard error is thrown only if the datatype conversion requires the current
timezone, which is not available in the immutable family of jsonb_xxx()
functions. This behavior is specific to our jsonpath implementation, but
I'm really not sure whether we should throw an error or return Unknown in
this case.
I'd like to share my further thoughts about errors. I think we should
suppress errors that are defined by the standard and that the user can
expect: a wrong date format causes an error, division by zero causes an
error, and so on. Those errors are defined by the standard.
However, when an error is caused by a limitation of our implementation,
suppressing it doesn't look right to me.
For instance:
# select jsonb_path_query('"1000000-01-01"', '$.datetime() >
"2020-01-01 12:00:00".datetime()'::jsonpath);
jsonb_path_query
------------------
null
(1 row)
# select '1000000-01-01'::date > '2020-01-01 12:00:00'::timestamp;
ERROR: date out of range for timestamp
So, jsonpath behaves as if 1000000 is not greater than 2020. This looks
plainly false, and the user can't expect it unless she is familiar with
our particular issues. So now my opinion is that such errors shouldn't be
suppressed. We can't suppress *every* error; if we tried, we would end up
suppressing OOM errors and returning garbage, which is obviously
ridiculous. Opinions?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sun, Sep 29, 2019 at 10:30 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
So, jsonpath behaves as if 1000000 is not greater than 2020. This looks
plainly false, and the user can't expect it unless she is familiar with
our particular issues. So now my opinion is that such errors shouldn't be
suppressed. We can't suppress *every* error; if we tried, we would end up
suppressing OOM errors and returning garbage, which is obviously
ridiculous. Opinions?
I don't know enough about jsonpath to have a view on specifically
which errors ought to be suppressed, but I agree that it's probably
not all of them. In fact, I'd go so far as to say that thinking about
it in terms of error suppression is probably not the right approach in
the first place. Rather, you want to ask what behavior you're trying
to create.
For example, if I'm trying to write a function that takes a string as
input and returns JSON, where the result is formatted as a number if
possible or a string otherwise, I might want access at the C level to
the guts of numeric_in, with all parsing errors returned rather than
thrown. But it would be silly to suppress an out-of-memory condition,
because that doesn't help the caller. The caller wants to know whether
the thing can be parsed as a number or not, and that has nothing to do
with whether we're out of memory, so an out-of-memory error should
still be thrown.
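Robert's distinction between data errors (reported to the caller) and
environmental errors (still thrown) can be illustrated with a
self-contained sketch. Here strtod is a stand-in for numeric_in's guts,
and both function names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A parser that answers "is this a number?" by return value instead
 * of throwing: a parsing failure is information the caller wants. */
static bool
parse_number(const char *s, double *out)
{
    char *end;

    errno = 0;
    *out = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE)
        return false;           /* data error: caller decides what to do */
    return true;
}

/* Format a string as a JSON scalar: a number if parseable,
 * a quoted string otherwise. */
static char *
to_json_scalar(const char *s)
{
    double num;
    char *buf = malloc(strlen(s) + 3);  /* room for quotes and NUL */

    if (buf == NULL)
        abort();    /* OOM is not a parsing question: fail hard */
    if (parse_number(s, &num))
        strcpy(buf, s);
    else
        sprintf(buf, "\"%s\"", s);
    return buf;
}
```

The point is that the two failure modes take different paths: a malformed
number flows back to the caller as an answer, while an out-of-memory
condition aborts, because it says nothing about the input.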
In this case here, it seems to me that you should similarly start by
defining the behavior you're trying to create. Unless that's clearly
defined, deciding which errors to suppress may be difficult.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Sep 30, 2019 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
In this case here, it seems to me that you should similarly start by
defining the behavior you're trying to create. Unless that's clearly
defined, deciding which errors to suppress may be difficult.
Making C functions return errors rather than throw them is what we're
implementing in our patchsets. In the big picture, the behavior we're
trying to create is that of the SQL:2016 standard. It defines error
handling as follows.
The SQL operators JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS provide
the following mechanisms to handle these errors:
1) The SQL/JSON path language traps any errors that occur during the evaluation
of a <JSON filter expression>. Depending on the precise <JSON path predicate>
contained in the <JSON filter expression>, the result may be Unknown, True, or
False, depending on the outcome of non-error tests evaluated in the <JSON path
predicate>.
2) The SQL/JSON path language has two modes, strict and lax, which govern
structural errors, as follows:
a) In lax mode:
i) If an operation requires an SQL/JSON array but the operand is not an SQL
JSON array, then the operand is first “wrapped” in an SQL/JSON array prior
to performing the operation.
ii) If an operation requires something other than an SQL/JSON array, but
the operand is an SQL/JSON array, then the operand is “unwrapped” by
converting its elements into an SQL/JSON sequence prior to performing the
operation.
iii) After applying the preceding resolutions to structural errors, if
there is still a structural error, the result is an empty SQL/JSON
sequence.
b) In strict mode, if the structural error occurs within a <JSON filter
expression>, then the error handling of <JSON filter expression> applies.
Otherwise, a structural error is an unhandled error.
3) Non-structural errors outside of a <JSON path predicate> are always
unhandled errors, resulting in an exception condition returned from the path
engine to the SQL/JSON query operator.
4) The SQL/JSON query operators provide an ON ERROR clause to specify the
behavior in case of an input conversion error, an unhandled structural error,
an unhandled non-structural error, or an output conversion error.
So, basically, the standard requires us to suppress any error happening in
a filter expression. But as I wrote before, suppression of errors in
datetime comparison may lead to surprising results. That happens only in
rare corner cases, but still, it makes for an uneasy choice between
consistent behavior and standard behavior.
However, Nikita Glukhov had a good idea about that. Instead of thinking
about whether we should suppress cast errors in datetime comparison, we
may just eliminate those errors. If we know that casting a date to
timestamp overflows the upper bound of finite timestamps, then we also
know that this date is greater than any finite timestamp. So, we are
still able to do a correct comparison. I'm going to implement this and
post a patch.
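The idea can be sketched in miniature with a toy cast that reports the
direction of overflow instead of erroring out. The type, function names,
and bounds below are all hypothetical stand-ins for the real PostgreSQL
cast machinery:

```c
#include <stdint.h>

typedef enum { CAST_OK, CAST_OVERFLOW_LOW, CAST_OVERFLOW_HIGH } CastStatus;

/* Hypothetical cast: instead of raising an error on overflow, report
 * which boundary was crossed, as in Alexander's proposal. */
static CastStatus
date_to_timestamp_safe(int64_t date, int64_t *ts)
{
    const int64_t TS_MIN = -1000;   /* toy finite-timestamp bounds */
    const int64_t TS_MAX = 1000;

    if (date < TS_MIN)
        return CAST_OVERFLOW_LOW;
    if (date > TS_MAX)
        return CAST_OVERFLOW_HIGH;
    *ts = date;
    return CAST_OK;
}

/* Compare a date with a timestamp. A value that overflows the cast is
 * known to lie below (or above) every finite timestamp, so the
 * comparison still yields a correct answer instead of an error. */
static int
cmp_date_timestamp(int64_t date, int64_t ts)
{
    int64_t date_ts = 0;

    switch (date_to_timestamp_safe(date, &date_ts))
    {
        case CAST_OVERFLOW_LOW:
            return -1;          /* below any finite timestamp */
        case CAST_OVERFLOW_HIGH:
            return 1;           /* above any finite timestamp */
        case CAST_OK:
            break;
    }
    return (date_ts > ts) - (date_ts < ts);
}
```

Under this scheme the '1000000-01-01' example above would compare greater
than any finite timestamp rather than yielding Unknown.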
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
So, basically, the standard requires us to suppress any error happening in
a filter expression.
Sounds like the standard is dumb, then. :-)
But as I wrote before, suppression of errors in datetime comparison may
lead to surprising results. That happens only in rare corner cases, but
still, it makes for an uneasy choice between consistent behavior and
standard behavior.
Yeah.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 3, 2019 at 4:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
So, basically, the standard requires us to suppress any error happening
in a filter expression.
Sounds like the standard is dumb, then. :-)
But as I wrote before, suppression of errors in datetime comparison may
lead to surprising results. That happens only in rare corner cases, but
still, it makes for an uneasy choice between consistent behavior and
standard behavior.
Yeah.
The proposed patch eliminates this dilemma in this particular case. It
provides correct cross-type comparison of datetime values even if one of
the values overflows during the cast. In order to do this, I made the
cast functions report whether the lower or upper boundary was overflowed.
We know that an overflowed value is lower (or higher) than any valid
value except infinity.
This patch also changes the way the timestamp to timestamptz cast works.
Previously it did timestamp2tm() and then tm2timestamp(). Instead, after
timestamp2tm() it now calculates the timezone offset and applies it to
the original timestamp value. I hope this is correct. If so, besides
making overflow handling easier, this refactoring saves some CPU cycles.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0002-Refactor-jsonpath-s-compareDatetime-2.patch (+245 -150)
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
This patch also changes the way timestamp to timestamptz cast works.
Previously it did timestamp2tm() then tm2timestamp(). Instead, after
timestamp2tm() it calculates timezone offset and applies it to
original timestamp value. I hope this is correct.
I'd wonder whether this gives the same answers near DST transitions,
where it's not real clear which offset applies.
Please *don't* wrap this sort of thing into an unrelated feature patch.
regards, tom lane
On Sun, Oct 13, 2019 at 5:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
This patch also changes the way timestamp to timestamptz cast works.
Previously it did timestamp2tm() then tm2timestamp(). Instead, after
timestamp2tm() it calculates timezone offset and applies it to
original timestamp value. I hope this is correct.I'd wonder whether this gives the same answers near DST transitions,
where it's not real clear which offset applies.
I will try this and share the results.
Please *don't* wrap this sort of thing into an unrelated feature patch.
Sure, thank you for noticing.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Oct 14, 2019 at 5:36 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
On Sun, Oct 13, 2019 at 5:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
This patch also changes the way timestamp to timestamptz cast works.
Previously it did timestamp2tm() then tm2timestamp(). Instead, after
timestamp2tm() it calculates timezone offset and applies it to
original timestamp value. I hope this is correct.
I'd wonder whether this gives the same answers near DST transitions,
where it's not real clear which offset applies.
I will try this and share the results.
I've split the refactoring of the timestamp to timestamptz cast into a
separate patch. The patchset is attached.
I've investigated the behavior near DST transitions in the Moscow
timezone. Its last two DST transitions were on 2010-03-28 and 2010-10-31.
It behaves the same with and without the patch. The tests are below.
# set timezone = 'Europe/Moscow';
# select '2010-03-28 01:59:59'::timestamp::timestamptz;
timestamptz
------------------------
2010-03-28 01:59:59+03
(1 row)
# select '2010-03-28 02:00:00'::timestamp::timestamptz;
timestamptz
------------------------
2010-03-28 03:00:00+04
(1 row)
# select '2010-03-28 02:59:59'::timestamp::timestamptz;
timestamptz
------------------------
2010-03-28 03:59:59+04
(1 row)
# select '2010-03-28 03:00:00'::timestamp::timestamptz;
timestamptz
------------------------
2010-03-28 03:00:00+04
(1 row)
# select '2010-10-31 01:59:59'::timestamp::timestamptz;
timestamptz
------------------------
2010-10-31 01:59:59+04
(1 row)
# select '2010-10-31 02:00:00'::timestamp::timestamptz;
timestamptz
------------------------
2010-10-31 02:00:00+03
(1 row)
BTW, I've noticed how ridiculously the cast behaves for values in the
range [2010-03-28 02:00:00, 2010-03-28 03:00:00). Now I think that a
timestamptz type which explicitly stored the timezone offset would have
some merit. At least, it would make it possible to preserve the same
local time value during casts.
I'm going to push these two patches if there are no objections.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company