SQL/JSON: FOR ORDINALITY bug

Started by Erik Rijkersover 3 years ago10 messages
#1Erik Rijkers
er@xs4all.nl
1 attachment(s)

Hi

I've copied some statements from the .pdf called:
"TECHNICAL REPORT ISO/IEC TR 19075-6 First edition 2017-03
Part SQL Notation support 6: (JSON) for JavaScript Object"
(not available anymore although there should be a similar replacement file)

In that pdf I found the data and statement (called 'table 15' in the
.pdf) as in the attached bash file. But the result is different: as
implemented by 15devel, the column rowseq is always 1. It seems to me
that that is wrong; it should count 1, 2, 3 as indeed the example-result
column in that pdf shows.

What do you think?

Erik Rijkers

Attachments:

table15bug.shapplication/x-shellscript; name=table15bug.shDownload
#2Andrew Dunstan
andrew@dunslane.net
In reply to: Erik Rijkers (#1)
Re: SQL/JSON: FOR ORDINALITY bug

On 2022-05-03 Tu 11:19, Erik Rijkers wrote:

Hi

I've copied some statements from the .pdf called:
"TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
Part SQL Notation support 6: (JSON) for JavaScript Object"
(not available anymore although there should be a similar replacement
file)

In that pdf I found the data and statement (called 'table 15' in the
.pdf) as in the attached bash file.  But the result is different: as
implemented by 15devel, the column rowseq is always 1.  It seems to me
that that is wrong; it should count 1, 2, 3 as indeed the
example-result column in that pdf shows.

What do you think?

Possibly. 

Here's what the standard says in section 7.11 in I think the relevant
bit of mindbogglingly impenetrable prose:

General Rules
1)
If a <table primary> simply contains a <JSON table primitive> JTP, then:
a) If the value of the <JSON context item> simply contained in the <JSON
API common syntax> is the null value, then the result of <JSON table
primitive> is an empty table and no further General Rules of this
Subclause are applied.
b) Let JACS be the <JSON API common syntax> simply contained in JTP.
c) Let JTEB be the <JSON table error behavior> simply contained in JTP.
d) The General Rules of Subclause 10.14, “<JSON API common syntax>”, are
applied with JACS as JSON API COMMON SYNTAX; let ROWST be the STATUS and
let ROWSEQ be the SQL/JSON SEQUENCE returned from the application of
those General Rules.
460
Foundation (SQL/Foundation)
e) Case:
i) If ROWST is an exception condition, then
Case:
1) If JTEB is ERROR, then the exception condition ROWST is raised.
2) Otherwise, the result of JTP is an empty table.
ii) Otherwise, let NI be the number of SQL/JSON items in ROWSEQ, let Ij,
1 (one) ≤ j ≤ NI, be those SQL/JSON items in order, let NCD be the
number of <JSON table primitive column definition>s contained in JTP,
and let JTCDi, 1 (one) ≤ i ≤ NCD, be those <JSON table primitive column
definition>s.
For all j, 1 (one) ≤ j ≤ NI, and for all i, 1 (one) ≤ i ≤ NCD, the value
of the i-th column of the j-th row in the result of JTP is determined as
follows:
Case:
1) If JTCDi is a <JSON table ordinality column definition>, then the
value of the i-th column
of the j-th row is j.

Maybe some language lawyer can turn that into comprehensible English.

This should probably be an open item for release 15, but I don't really
know what the precise behaviour should be, so it's hard to modify it.

If we can't get it right maybe we should disable the "WITH ORDINALITY"
clause, although that would be a pity.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Andrew Dunstan (#2)
Re: SQL/JSON: FOR ORDINALITY bug

On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-05-03 Tu 11:19, Erik Rijkers wrote:

Hi

I've copied some statements from the .pdf called:
"TECHNICAL REPORT ISO/IEC TR 19075-6 First edition 2017-03
Part SQL Notation support 6: (JSON) for JavaScript Object"
(not available anymore although there should be a similar replacement
file)

In that pdf I found the data and statement (called 'table 15' in the
.pdf) as in the attached bash file. But the result is different: as
implemented by 15devel, the column rowseq is always 1. It seems to me
that that is wrong; it should count 1, 2, 3 as indeed the
example-result column in that pdf shows.

What do you think?

Possibly.

I don't see how rowseq can be anything but 1. Each invocation of
json_table is given a single jsonb record via the lateral reference to
bookclub.jcol. It produces one result, having a rowseq 1. It does this
for all three outer lateral reference tuples and thus produces three output
rows each with one match numbered rowseq 1.

David J.

#4Andrew Dunstan
andrew@dunslane.net
In reply to: David G. Johnston (#3)
Re: SQL/JSON: FOR ORDINALITY bug

On 2022-05-03 Tu 20:39, David G. Johnston wrote:

On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-05-03 Tu 11:19, Erik Rijkers wrote:

Hi

I've copied some statements from the .pdf called:
"TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
Part SQL Notation support 6: (JSON) for JavaScript Object"
(not available anymore although there should be a similar

replacement

file)

In that pdf I found the data and statement (called 'table 15' in the
.pdf) as in the attached bash file.  But the result is different: as
implemented by 15devel, the column rowseq is always 1.  It seems

to me

that that is wrong; it should count 1, 2, 3 as indeed the
example-result column in that pdf shows.

What do you think?

Possibly. 

I don't see how rowseq can be anything but 1.  Each invocation of
json_table is given a single jsonb record via the lateral reference to
bookclub.jcol.  It produces one result, having a rowseq 1.  It does
this for all three outer lateral reference tuples and thus produces
three output rows each with one match numbered rowseq 1.

I imagine we could overcome that by stashing the sequence counter
somewhere it would survive across calls. The question really is what is
the right thing to do? I'm also a bit worried about how correct is
ordinal numbering with nested paths, e.g. (from the regression tests):

select
    jt.*
from
    jsonb_table_test jtt,
    json_table (
        jtt.js,'strict $[*]' as p
        columns (
            n for ordinality,
            a int path 'lax $.a' default -1 on empty,
            nested path 'strict $.b[*]' as pb columns ( b int path '$' ),
            nested path 'strict $.c[*]' as pc columns ( c int path '$' )
        )
    ) jt;
 n | a  | b | c  
---+----+---+----
 1 |  1 |   |   
 2 |  2 | 1 |   
 2 |  2 | 2 |   
 2 |  2 | 3 |   
 2 |  2 |   | 10
 2 |  2 |   |   
 2 |  2 |   | 20
 3 |  3 | 1 |   
 3 |  3 | 2 |   
 4 | -1 | 1 |   
 4 | -1 | 2 |   

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#5Erik Rijkers
er@xs4all.nl
In reply to: Andrew Dunstan (#4)
1 attachment(s)
Re: SQL/JSON: FOR ORDINALITY bug

Op 04-05-2022 om 13:55 schreef Andrew Dunstan:

On 2022-05-03 Tu 20:39, David G. Johnston wrote:

On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-05-03 Tu 11:19, Erik Rijkers wrote:

Hi

I've copied some statements from the .pdf called:
"TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
Part SQL Notation support 6: (JSON) for JavaScript Object"
(not available anymore although there should be a similar

replacement

file)

In that pdf I found the data and statement (called 'table 15' in the
.pdf) as in the attached bash file.  But the result is different: as
implemented by 15devel, the column rowseq is always 1.  It seems

to me

that that is wrong; it should count 1, 2, 3 as indeed the
example-result column in that pdf shows.

What do you think?

Possibly.

I don't see how rowseq can be anything but 1.  Each invocation of

After some further experimentation, I now think you must be right, David.

Also, looking at the DB2 docs:
https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
(see especially under 'Handling nested information')

There, I gathered some example data + statements where one is the case
at hand. I also made them runnable under postgres (attached).

I thought that was an instructive example, with those 'outer_ordinality'
and 'inner_ordinality' columns.

Erik

Show quoted text

json_table is given a single jsonb record via the lateral reference to
bookclub.jcol.  It produces one result, having a rowseq 1.  It does
this for all three outer lateral reference tuples and thus produces
three output rows each with one match numbered rowseq 1.

I imagine we could overcome that by stashing the sequence counter
somewhere it would survive across calls. The question really is what is
the right thing to do? I'm also a bit worried about how correct is
ordinal numbering with nested paths, e.g. (from the regression tests):

select
    jt.*
from
    jsonb_table_test jtt,
    json_table (
        jtt.js,'strict $[*]' as p
        columns (
            n for ordinality,
            a int path 'lax $.a' default -1 on empty,
            nested path 'strict $.b[*]' as pb columns ( b int path '$' ),
            nested path 'strict $.c[*]' as pc columns ( c int path '$' )
        )
    ) jt;
 n | a  | b | c
---+----+---+----
 1 |  1 |   |
 2 |  2 | 1 |
 2 |  2 | 2 |
 2 |  2 | 3 |
 2 |  2 |   | 10
 2 |  2 |   |
 2 |  2 |   | 20
 3 |  3 | 1 |
 3 |  3 | 2 |
 4 | -1 | 1 |
 4 | -1 | 2 |

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachments:

json_ordinality_db2.shapplication/x-shellscript; name=json_ordinality_db2.shDownload
#6Andrew Dunstan
andrew@dunslane.net
In reply to: Erik Rijkers (#5)
Re: SQL/JSON: FOR ORDINALITY bug

On 2022-05-04 We 10:39, Erik Rijkers wrote:

Op 04-05-2022 om 13:55 schreef Andrew Dunstan:

On 2022-05-03 Tu 20:39, David G. Johnston wrote:

On Tue, May 3, 2022 at 5:27 PM Andrew Dunstan <andrew@dunslane.net>
wrote:

     On 2022-05-03 Tu 11:19, Erik Rijkers wrote:
     > Hi
     >
     > I've copied some statements from the .pdf called:
     > "TECHNICAL REPORT ISO/IEC TR 19075-6   First edition 2017-03
     > Part SQL Notation support 6: (JSON) for JavaScript Object"
     > (not available anymore although there should be a similar
     replacement
     > file)
     >
     > In that pdf I found the data and statement (called 'table 15'
in the
     > .pdf) as in the attached bash file.  But the result is
different: as
     > implemented by 15devel, the column rowseq is always 1.  It seems
     to me
     > that that is wrong; it should count 1, 2, 3 as indeed the
     > example-result column in that pdf shows.
     >
     > What do you think?
     >
     >

     Possibly.

I don't see how rowseq can be anything but 1.  Each invocation of

After some further experimentation, I now think you must be right, David.

Also, looking at the DB2 docs:
  https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
    (see especially under 'Handling nested information')

There, I gathered some example data + statements where one is the case
at hand.  I also made them runnable under postgres (attached).

I thought that was an instructive example, with those
'outer_ordinality' and 'inner_ordinality' columns.

Yeah, I just reviewed the latest version of that page (7.5) and the
example seems fairly plain that we are doing the right thing, or if not
we're in pretty good company, so I guess this is probably a false alarm.
Looks like ordinality is for the number of the element produced by the
path expression. So a path of 'lax $' should just produce ordinality of
1 in each case, while a path of 'lax $[*]' will produce increasing
ordinality for each element of the root array.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#7Erik Rijkers
er@xs4all.nl
In reply to: Andrew Dunstan (#6)
Re: SQL/JSON: FOR ORDINALITY bug

Op 04-05-2022 om 21:12 schreef Andrew Dunstan:

I don't see how rowseq can be anything but 1.  Each invocation of

After some further experimentation, I now think you must be right, David.

Also, looking at the DB2 docs:
  https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
    (see especially under 'Handling nested information')

There, I gathered some example data + statements where one is the case
at hand.  I also made them runnable under postgres (attached).

I thought that was an instructive example, with those
'outer_ordinality' and 'inner_ordinality' columns.

Yeah, I just reviewed the latest version of that page (7.5) and the
example seems fairly plain that we are doing the right thing, or if not
we're in pretty good company, so I guess this is probably a false alarm.
Looks like ordinality is for the number of the element produced by the
path expression. So a path of 'lax $' should just produce ordinality of
1 in each case, while a path of 'lax $[*]' will produce increasing
ordinality for each element of the root array.

Agreed.

You've probably noticed then that on that same page under 'Sibling
Nesting' is a statement that gives a 13-row resultset on DB2 whereas in
15devel that statement yields just 10 rows. I don't know which is correct.

Erik

Show quoted text

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#8David G. Johnston
david.g.johnston@gmail.com
In reply to: Erik Rijkers (#7)
Re: SQL/JSON: FOR ORDINALITY bug

On Wed, May 4, 2022 at 1:09 PM Erik Rijkers <er@xs4all.nl> wrote:

Op 04-05-2022 om 21:12 schreef Andrew Dunstan:

I don't see how rowseq can be anything but 1. Each invocation of

After some further experimentation, I now think you must be right,

David.

Also, looking at the DB2 docs:
https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
(see especially under 'Handling nested information')

There, I gathered some example data + statements where one is the case
at hand. I also made them runnable under postgres (attached).

I thought that was an instructive example, with those
'outer_ordinality' and 'inner_ordinality' columns.

Yeah, I just reviewed the latest version of that page (7.5) and the
example seems fairly plain that we are doing the right thing, or if not
we're in pretty good company, so I guess this is probably a false alarm.
Looks like ordinality is for the number of the element produced by the
path expression. So a path of 'lax $' should just produce ordinality of
1 in each case, while a path of 'lax $[*]' will produce increasing
ordinality for each element of the root array.

Agreed.

You've probably noticed then that on that same page under 'Sibling
Nesting' is a statement that gives a 13-row resultset on DB2 whereas in
15devel that statement yields just 10 rows. I don't know which is correct.

There should be 12 results (minimum would be 8 - 5 of which are used for
real matches, plus 4 new row producing matches).

Our result seems internally inconsistent; conceptually there are two kinds
of nulls here and we cannot collapse them.

null-val: we are outputting the record from the nested path but there is no
actual value to output so we output null-val
null-union: we are not outputting the record for the nested path (we are
doing a different one) but we need to output something for this column so
we output null-union.

Sally, null-val, null-union
Sally, null-union, null-val

We only have one Sally but need both (11)

We are also missing:

Mary, null-union, null-val (12)

The fact that we agree on John means that we at least agree on UNION
meaning we output a pair of rows when there are two nested paths.

I point to relative comparisons for fear of reading the specification
here...

David J.

David J.

#9David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#8)
Re: SQL/JSON: FOR ORDINALITY bug

On Wed, May 4, 2022 at 1:43 PM David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Wed, May 4, 2022 at 1:09 PM Erik Rijkers <er@xs4all.nl> wrote:

Op 04-05-2022 om 21:12 schreef Andrew Dunstan:

I don't see how rowseq can be anything but 1. Each invocation of

After some further experimentation, I now think you must be right,

David.

Also, looking at the DB2 docs:
https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
(see especially under 'Handling nested information')

You've probably noticed then that on that same page under 'Sibling
Nesting' is a statement that gives a 13-row resultset on DB2 whereas in
15devel that statement yields just 10 rows. I don't know which is
correct.

There should be 12 results (minimum would be 8 - 5 of which are used for
real matches, plus 4 new row producing matches).

Our result seems internally inconsistent; conceptually there are two kinds
of nulls here and we cannot collapse them.

null-val: we are outputting the record from the nested path but there is
no actual value to output so we output null-val
null-union: we are not outputting the record for the nested path (we are
doing a different one) but we need to output something for this column so
we output null-union.

Thinking this over - I think the difference is we implemented a FULL OUTER
JOIN to combine the siblings - including the behavior of that construct and
the absence of rows. DB2 took the word "UNION" for the plan modifier
literally and unioned (actually union all) the two subpaths together using
the null concepts above (though somehow ensuring that at least one row was
produced from each subpath...).

Thus we are indeed back to seeing whether the standard defines sibling
combining as union or join, or some other special construct. I'm now
leaning toward what we've done as at least being the more sane option.

Even if our outer join process is correct the existing wording is odd.

"Use FULL OUTER JOIN ON FALSE, so that both parent and child rows are
included into the output, with NULL values inserted into both child and
parent columns for all missing values."

I don't think it helps to mention parent here. This aspect of plan doesn't
concern itself with the final output, only the output of the subplan which
is then combined with the parent using a join. I would probably want to
phrase the default more like:

"This is the default option for joining the combined child rows to the
parent."

David J.

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Erik Rijkers (#7)
Re: SQL/JSON: FOR ORDINALITY bug

On 2022-05-04 We 16:09, Erik Rijkers wrote:

Op 04-05-2022 om 21:12 schreef Andrew Dunstan:

I don't see how rowseq can be anything but 1.  Each invocation of

After some further experimentation, I now think you must be right,
David.

Also, looking at the DB2 docs:
   https://www.ibm.com/docs/en/i/7.2?topic=data-using-json-table
     (see especially under 'Handling nested information')

There, I gathered some example data + statements where one is the case
at hand.  I also made them runnable under postgres (attached).

I thought that was an instructive example, with those
'outer_ordinality' and 'inner_ordinality' columns.

Yeah, I just reviewed the latest version of that page (7.5) and the
example seems fairly plain that we are doing the right thing, or if not
we're in pretty good company, so I guess this is probably a false alarm.
Looks like ordinality is for the number of the element produced by the
path expression. So a path of 'lax $' should just produce ordinality of
1 in each case, while a path of 'lax $[*]' will produce increasing
ordinality for each element of the root array.

Agreed.

You've probably noticed then that on that same page under 'Sibling
Nesting' is a statement that gives a 13-row resultset on DB2 whereas
in 15devel that statement yields just 10 rows.  I don't know which is
correct.

Oracle also gives 10 rows for that query according to my testing, so I
suspect either DB2 and/or its docs are wrong.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com