Tackling JsonPath support

Started by Christian Convey over 9 years ago · 41 messages
#1Christian Convey
christian.convey@gmail.com

On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...

I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. But
there is one blocker - missing JsonPath support in our JSON implementation.

So one idea - implement JsonPath support and related JSON query functions.
This can help with better standard conformance.

Hi Pavel,

Are you still looking for someone to add the JsonPath support to the
JSON implementation? And if so, how urgently are people waiting for
it?

I'd be happy to start working on a patch, but since I'm new to PG
development, I'm probably not the fastest person to get it done.

Kind regards,
Christian

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Christian Convey (#1)
Re: Tackling JsonPath support

Hi

2016-09-15 18:05 GMT+02:00 Christian Convey <christian.convey@gmail.com>:

On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
...

I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. But
there is one blocker - missing JsonPath support in our JSON implementation.

So one idea - implement JsonPath support and related JSON query functions.
This can help with better standard conformance.

Hi Pavel,

Are you still looking for someone to add the JsonPath support to the
JSON implementation? And if so, how urgently are people waiting for
it?

yes - JsonPath support would be great. I hope the XMLTABLE patch will be
committed this or next commitfest, and with JsonPath I can start to work
on the JSON_TABLE function.

But JsonPath can be merged separately, without a dependency on JSON_TABLE.
There are more JSON searching functions, and these functions should
support JsonPath to be ANSI SQL compliant.

I'd be happy to start working on a patch, but since I'm new to PG
development, I'm probably not the fastest person to get it done.

That is not a problem. You should be able to do this work without deep
knowledge of PostgreSQL internals. Work with data types (and functions for
data types) is well isolated from the PostgreSQL engine.

You can learn from current searching on JSON -
postgresql/src/backend/utils/adt/json.c

And it is a good way to start as a PostgreSQL hacker - I started with the
implementation of my own data type and related functions.

Regards

Pavel


#3Christian Convey
christian.convey@gmail.com
In reply to: Pavel Stehule (#2)
Re: Tackling JsonPath support

Hi Pavel,

Can I check a few assumptions about what you're suggesting for this task?

* Our ultimate goal is to give Postgres an implementation of the functions
"JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
SQL standards.

* The best representation of those standards is found here: [1].

* When [1] mentions a "JSON path expression" or "JSON path language", it's
referring to the query language described here: [2].

* Even if other popular DBMS's deviate from [1], or other popular JSONPath
implementations deviate from [2], we remain committed to a faithful
implementation of [1].

* It's okay for my first commit to implement just two things: (a) a
PG-internal implementation of JsonPath, and (b) a user-visible
implementation of "JSON_QUERY" based on (a). Later commits could add
implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a).
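For what it's worth, my reading of [1] is that the three functions differ
mainly in what they return for the same set of path matches. A rough Python
sketch of that split (my own naming and simplifications - real SQL/JSON path
parsing, ON EMPTY / ON ERROR clauses, and wrapper behavior are all elided):

```python
import json

def match(doc, path_parts):
    """Return the items selected by a pre-parsed path.
    Path parsing is elided; parts are dict keys / list indexes."""
    items = [doc]
    for part in path_parts:
        nxt = []
        for item in items:
            if isinstance(item, dict) and part in item:
                nxt.append(item[part])
            elif isinstance(item, list) and isinstance(part, int) and part < len(item):
                nxt.append(item[part])
        items = nxt
    return items

def json_exists(doc, parts):          # JSON_EXISTS: did anything match?
    return len(match(doc, parts)) > 0

def json_value(doc, parts):           # JSON_VALUE: a single SQL scalar
    m = match(doc, parts)
    if len(m) == 1 and not isinstance(m[0], (dict, list)):
        return m[0]
    return None                       # ON EMPTY / ON ERROR behavior elided

def json_query(doc, parts):           # JSON_QUERY: a JSON fragment
    m = match(doc, parts)
    return json.dumps(m[0]) if len(m) == 1 else None

doc = {"store": {"book": [{"title": "Sayings"}]}}
print(json_exists(doc, ["store", "book", 0, "title"]))  # True
print(json_value(doc, ["store", "book", 0, "title"]))   # Sayings
print(json_query(doc, ["store", "book", 0]))            # {"title": "Sayings"}
```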

Thanks,
Christian

[1]: http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf

[2]: http://goessner.net/articles/JsonPath


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christian Convey (#3)
Re: Tackling JsonPath support

Christian Convey <christian.convey@gmail.com> writes:

* Our ultimate goal is to give Postgres an implementation of the functions
"JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
SQL standards.
* The best representation of those standards is found here: [1].
[1]
http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf

You're going to need to find a draft standard somewhere, as that
presentation is too thin on details to support writing an actual
implementation. In particular, it's far from clear that this is
true at all:

* When [1] mentions a "JSON path expression" or "JSON path language", it's
referring to the query language described here: [2].
[2] http://goessner.net/articles/JsonPath

The one slide they have on the path language mentions a lax/strict syntax
that I don't see either in the document you mention or in the Wikipedia
XPath article it links to. This does not give me a warm feeling. The SQL
committee is *fully* capable of inventing their own random path notation,
especially when there's no ISO-blessed precedent to bind them.

In general, the stuff I see in these WG3 slides strikes me as pretty
horribly designed. The committee is evidently still stuck on the idea
that every feature they invent should have a bunch of new bespoke syntax
for function calls, which is a direction we really don't want to go in
because of the parser overhead and need for more fully-reserved keywords.
For instance:
WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR)
Really? Who thought that was a better idea than a simple bool parameter?

I have no objection to providing some functions that implement XPath-like
tests for JSON, but I'm not sure that you ought to try to tie it to
whatever the SQL committee is going to do, especially when they've not
published a finished standard yet. You may be chasing a moving target.

As for whether JSONPath is the right spec to follow, I'm not sure.
The article you mention is from 2007 and I don't see all that many
other references in a Google search. I found this Wikipedia page:
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
which mentions half a dozen competitors, including "JSON Pointer"
which has at least gotten as far as being an RFC standard:
https://tools.ietf.org/html/rfc6901
I'm not up enough on the JSON ecosystem to know which of these has the
most traction, but I'm unconvinced that it's JSONPath.
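(For context, JSON Pointer is a far smaller language than JSONPath: just a
'/'-separated walk of keys and array indexes, with "~0"/"~1" escapes and no
wildcards or predicates. A minimal resolver per RFC 6901, sketched in Python
for illustration only:

```python
import json

def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return doc                      # "" addresses the whole document
    if not pointer.startswith("/"):
        raise ValueError("pointer must be empty or start with '/'")
    for token in pointer[1:].split("/"):
        # Unescape in this order per the RFC: ~1 -> '/', then ~0 -> '~'
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(doc, list):
            doc = doc[int(token)]       # list step: token is an index
        else:
            doc = doc[token]            # object step: token is a key
    return doc

doc = json.loads('{"foo": ["bar", "baz"], "a/b": 1, "m~n": 8}')
print(resolve_pointer(doc, "/foo/0"))   # bar
print(resolve_pointer(doc, "/a~1b"))    # 1
print(resolve_pointer(doc, "/m~0n"))    # 8
```

The test data above follows the examples in the RFC itself.)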

regards, tom lane


#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#4)
Re: Tackling JsonPath support


We can use some other databases with this implementation as references.

I have to agree that the people on the SQL committee are not too consistent -
and sometimes create too COBOLish syntax, but it is the standard - and it is
implemented by major vendors.

We don't need to implement the full API - not in the first step - the
important point is not to close the door to possible ANSI conformance. In the
first step we can take the best and most important parts from the standard. It
can be similar to our SQL/XML implementation - we implement maybe 75% - and
only XPath instead of XQuery, but I don't feel any weakness. I see the
"JSON_TABLE" function as very useful, and a good place to start.

Regards

Pavel


#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Christian Convey (#3)
Re: Tackling JsonPath support

Hi

2016-11-13 15:14 GMT+01:00 Christian Convey <christian.convey@gmail.com>:

Hi Pavel,

Can I check a few assumptions about what you're suggesting for this task?

* Our ultimate goal is to give Postgres an implementation of the functions
"JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
SQL standards.

* The best representation of those standards is found here: [1].

* When [1] mentions a "JSON path expression" or "JSON path language", it's
referring to the query language described here: [2].

* Even if other popular DBMS's deviate from [1], or other popular JSONPath
implementations deviate from [2], we remain committed to a faithful
implementation of [1].

* It's okay for my first commit to implement just two things: (a) a
PG-internal implementation of JsonPath, and (b) a user-visible
implementation of "JSON_QUERY" based on (a). Later commits could add
implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a).

My goal is the implementation of the JSON_TABLE function - this function can
be used instead of any other mentioned function (and it is really useful - it
is a usual task to transform JSON to a table). SQL/JSON is pretty new and too
big to implement in one step. Nobody knows it from the PostgreSQL world. Our
SQL/XML needed more than 10 years and still is not fully complete - and we
used the power and features of libxml2 (we have nothing similar for JSON).
But almost everything that is a daily need from SQL/XML we have. For
JSON_TABLE we need only basic features of JSONPath - the predicates are not
necessary in the first step.

http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/json/src/tpc/db2z_bif_jsontable.html

The vendors use the name "SQL/JSON path expressions" for this query language -
so an important source is SQL/JSON (this can be different from the original
JSONPath (your second source)).
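The core mapping JSON_TABLE performs can be sketched like this (a rough
Python model only, with my own names - the real function takes SQL/JSON path
expressions for the row source and a COLUMNS clause, none of which is modeled
here):

```python
def json_table(doc, row_path, columns):
    """Very rough JSON_TABLE sketch: `row_path` selects a JSON array,
    each element becomes one row; `columns` maps column name -> key."""
    rows_src = doc
    for key in row_path:                 # pre-parsed path, e.g. ["store", "book"]
        rows_src = rows_src[key]
    return [
        tuple(item.get(col_key) for col_key in columns.values())
        for item in rows_src
    ]

doc = {"store": {"book": [
    {"title": "Sayings", "price": 8.95},
    {"title": "Moby Dick", "price": 8.99},
]}}
rows = json_table(doc, ["store", "book"], {"title": "title", "price": "price"})
print(rows)  # [('Sayings', 8.95), ('Moby Dick', 8.99)]
```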

Regards

Pavel


#7Christian Convey
christian.convey@gmail.com
In reply to: Pavel Stehule (#5)
Re: Tackling JsonPath support

From looking at other databases' docs, it seems like the behavior of
various JSON-related operators / functions are described partially in terms
of a "json path expression":

* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)

If I'm correctly understanding the situation, it sounds like we have two
big unknowns:

(a) The exact syntax/semantics of JSON path searching, especially w.r.t.
corner cases and error handling, and

(b) The syntax/semantics of whatever SQL operators / functions are
currently defined in terms of (a). E.g., "JSON_TABLE".

If that's correct, then what do you guys think about us taking the
following incremental approach?

Step 1: I'll dig into the implementations described above, to see what's
similar and different between the JSON-path-expression syntax and semantics
offered by each. I'll then report my findings here, and we can hopefully
reach a consensus about the syntax/semantics of PG's json-path-expression
handling.

Step 2: I submit a patch for adding a new function to "contrib", which
implements the JSON-path-expression semantics chosen in Step 1. The
function will be named such that people won't confuse it with any
(eventual) SQL-standard equivalent.

Step 3: PG developers can, if they choose, start defining new JSON operator
/ functions, and/or port existing JSON-related functions, in terms of the
function created in Step 2.

I see the following pros / cons to this approach:

Pro: It gives us a concrete start on this functionality, even though we're
not sure what's happening with the SQL standard.

Pro: The risk of painting ourselves into a corner is relatively low,
because we're putting the functionality in "contrib" and avoiding function
names which conflict with likely upcoming standards.

Pro: It might permit us to give PG users access to JSONPath-like
functionality sooner than if we wait until we're clear on the ideal
long-term interface.

Con: "JSON path expression" is a recurring theme in the *grammars* of
user-facing operators in [1], [2], [3], and [4]. But it doesn't
necessarily follow that the function implemented in Step 2 will provide
useful infrastructure for PG's eventual implementations of "JSON_TABLE",
etc.

- Christian

[1]: https://docs.oracle.com/database/121/SQLRF/functions092.htm#SQLRF56973
[2]: https://dev.mysql.com/doc/refman/5.7/en/json-path-syntax.html
[3]: http://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/db2/rbafzjsonpath.htm
[4]: https://msdn.microsoft.com/en-us/library/mt577087.aspx

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Christian Convey (#7)
Re: Tackling JsonPath support

2016-11-27 17:50 GMT+01:00 Christian Convey <christian.convey@gmail.com>:

From looking at other databases' docs, it seems like the behavior of
various JSON-related operators / functions are described partially in terms
of a "json path expression":

* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)

If I'm correctly understanding the situation, It sounds like we have two
big unknowns:

(a) The exact syntax/semantics of JSON path searching, especially w.r.t.
corner cases and error handling, and

(b) The syntax/semantics of whatever SQL operators / functions are
currently defined in terms of (a). E.g., "JSON_TABLE".

If that's correct, then what do you guys think about us taking the
following incremental approach?

Step 1: I'll dig into the implementations described above, to see what's
similar and different between the JSON-path-expression syntax and semantics
offered by each. I then report my findings here, and we can hopefully
reach a consensus about the syntax/semantics of PG's json-path-expression
handling.

Step 2: I submit a patch for adding a new function to "contrib", which
implements the JSON-path-expression semantics chosen in Step 1. The
function will be named such that people won't confuse it with any
(eventual) SQL-standard equivalent.

Step 3: PG developers can, if they choose, start defining new JSON
operator / functions, and/or port existing JSON-related functions, in terms
of the function created in Step 2.

I see the following pros / cons to this approach:

Pro: It gives us a concrete start on this functionality, even though we're
not sure what's happening with the SQL standard.

Pro: The risk of painting ourselves into a corner is relatively low,
because we're putting the functionality in "contrib", and avoid function
names which conflict with likely upcoming standards.

Pro: It might permit us to give PG users access to JSONPath -like
functionality sooner than if we wait until we're clear on the ideal
long-term interface.

Incremental work is a great idea - I like this style. Instead of contrib,
you can use a public repository on github. At least for the first stage it is
better to live outside core - you are not restricted by the PostgreSQL
development process. When your code is stabilized, then you can go to
commitfest. I believe we need good JSON support. The XML support helped
PostgreSQL a lot; JSON will be great too.

Con: "JSON path expression" is a recurring theme in the *grammars* of
user-facing operators in [1], [2], [3], and [4]. But it doesn't
necessarily follow that the function implemented in Step 2 will provide
useful infrastructure for PG's eventual implementations of "JSON_TABLE",
etc.

We can implement a subset only - our XPath support based on libxml2 does
that too. A good target is supporting the usual examples on the net.

Regards

Pavel


#9David Fetter
david@fetter.org
In reply to: Christian Convey (#7)
Re: Tackling JsonPath support

On Sun, Nov 27, 2016 at 11:50:30AM -0500, Christian Convey wrote:

From looking at other databases' docs, it seems like the behavior of

various JSON-related operators / functions are described partially in terms
of a "json path expression":

* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)

There's another option we should also consider: jq
<https://stedolan.github.io/jq/>. It's available under a
PostgreSQL-compatible license, and has had a LOT of work put into
correctness and performance.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#9)
Re: Tackling JsonPath support

On 28. 11. 2016 at 17:26, "David Fetter" <david@fetter.org> wrote:

On Sun, Nov 27, 2016 at 11:50:30AM -0500, Christian Convey wrote:

From looking at other databases' docs, it seems like the behavior of
various JSON-related operators / functions are described partially in terms
of a "json path expression":

* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)

There's another option we should also consider: jq
<https://stedolan.github.io/jq/>. It's available under a
PostgreSQL-compatible license, and has had a LOT of work put into
correctness and performance.

We can use it for inspiration, but the syntax of this tool is a little bit
too complex and too original compared to JSONPath ... JSONPath is a
relatively simple adaptation of XPath to JSON.

We have one proprietary syntax already; two is maybe too much :-)


#11Christian Convey
christian.convey@gmail.com
In reply to: Pavel Stehule (#8)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 5:20 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
...

Con: "JSON path expression" is a recurring theme in the *grammars* of
user-facing operators in [1], [2], [3], and [4]. But it doesn't
necessarily follow that the function implemented in Step 2 will provide
useful infrastructure for PG's eventual implementations of "JSON_TABLE",
etc.

We can implement a subset only - our XPath support based on libxml2 does
that too. A good target is supporting the usual examples on the net.

Hi Pavel,

Can you clarify what you meant? I *think* you're saying:

* It's not important for me to match the syntax/semantics of the json-path
implementations found in MySQL / Oracle / DB2 / MS SQL Server, and

* Instead, I should just use examples / explanations on the web as my
guidance.

Thanks,
Christian

#12Christian Convey
christian.convey@gmail.com
In reply to: Pavel Stehule (#8)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 5:20 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
...

Incremental work is great idea - I like this this style. Instead contrib,
you can use public repository on github. Minimally for first stage is
better to live outside core - you are not restricted by PostgreSQL
development process. When your code will be stabilized, then you can go to
commitfest. I believe so we need good JSON support. The XML support helps
to PostgreSQL lot of, JSON will be great too.

Hi Pavel,

Thanks for the suggestion.

I am planning to use one of my own public github repos as the location for
my work.

I thought by adding my first implementation to "contrib", we could make
this functionality available to end-users, even before there was a
consensus about what PG's "official" JSON-related operators should have for
syntax and semantics.

Does my reasoning make sense?

- C

#13Christian Convey
christian.convey@gmail.com
In reply to: Christian Convey (#1)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 9:40 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
...

Hi Pavel,

Can you clarify what you meant? I *think* you're saying:

* It's not important for me to match the syntax/semantics of the json-path
implementations found in MySQL / Oracle / DB2 / MS SQL Server, and

oh no. the syntax is important. But for start we can have a subset. For
json table function .. json to relation mapping is important path
expression. some other features like predicates are nice, but can be
implemented later.

Im sorry. My English is bad.

Hi Pavel,

Your English is very good, actually. I think the confusion arises from
me speaking in vague terms. I apologize for that. Allow me to be more
specific about what I'm proposing to do.

I propose adding to "contrib" a function with the following characteristics:

* Its signature is "json_path( jsonb from_json, string
json_path_expression) --> jsonb".

* The function will hopefully be a useful building block for PG's
implementation of "official" JSON operators such as "JSON_TABLE", once the
PG community agrees on what those operators' syntax/semantics should be.

* The function will hopefully be immediately useful to PG users who want
JSONPath-like operations on their "jsonb" objects.
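To make the proposal concrete, here is a rough Python model of the intended
semantics: apply a path expression to a JSON document and return the matches
as a new JSON value. The tiny dollar/dot/bracket subset supported here is a
placeholder of my own, not a commitment to any particular dialect:

```python
import json
import re

# Recognize only ".name" and "[index]" steps after the leading "$".
TOKEN = re.compile(r"\.(\w+)|\[(\d+)\]")

def json_path(from_json, path_expr):
    """Sketch of the proposed json_path(jsonb, text) -> jsonb function.
    Supports only a tiny Goessner subset: $, .name, [index].
    Returns all matches wrapped in a JSON array."""
    if not path_expr.startswith("$"):
        raise ValueError("path must start with '$'")
    doc = json.loads(from_json)
    items = [doc]
    for name, index in TOKEN.findall(path_expr[1:]):
        nxt = []
        for item in items:
            if name and isinstance(item, dict) and name in item:
                nxt.append(item[name])
            elif index and isinstance(item, list) and int(index) < len(item):
                nxt.append(item[int(index)])
        items = nxt
    return json.dumps(items)

print(json_path('{"a": {"b": [10, 20]}}', "$.a.b[1]"))  # [20]
```

Whether non-matches should yield an empty array, NULL, or an error is exactly
the kind of corner case the cross-database survey would need to settle.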

- C

#14Christian Convey
christian.convey@gmail.com
In reply to: Christian Convey (#1)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 9:47 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote

I thought by adding my first implementation to "contrib", we could make

this functionality available to end-users, even before there was a
consensus about what PG's "official" JSON-related operators should have for
syntax and semantics.

these days the committers dislike the contrib dir. It is hard to push
anything there :-(. You can try it, but it may be lost time.

Thanks for the warning. I'm okay with my patch adding the "json_path"
function to the core PG code.

I would still suggest that we hold off on having my first patch implement
an official JSON-related operator such as "JSON_TABLE". I would prefer to
have my "json_path" function available to users even before we know how
"JSON_TABLE", etc. should behave.

Does that sound reasonable?

#15Nico Williams
nico@cryptonector.com
In reply to: Christian Convey (#7)
Re: Tackling JsonPath support

I wonder what it might take to integrate jq [1] (via libjq) with
PostgreSQL... The internal representation of JSON data is bound to be
completely different, no doubt, but jq is a fantastic language, akin to
XPath and XSLT combined, but with nice syntax.

[1]: https://stedolan.github.io/jq (note: jq is NOT jQuery)

Nico
--


#16Nico Williams
nico@cryptonector.com
In reply to: Pavel Stehule (#10)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 05:56:41PM +0100, Pavel Stehule wrote:

Dne 28. 11. 2016 17:26 napsal uživatel "David Fetter" <david@fetter.org>:

There's another option we should also consider: jq
<https://stedolan.github.io/jq/&gt;. It's available under a
PostgreSQL-compatible license, and has had a LOT of work put into
correctness and performance.

we can use it for inspiration. but the syntax of this tool is little bit
too complex and too original against Json path ... jsonpath is relative
simple implementation of xpath to json

we have one proprietary syntax already, two is maybe too much :-)

jq is hardly proprietary :)

JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

libjq has a number of desirable features, mostly its immutable/COW data
structures. In libjq data structures are only mutated when there's
only one reference to them, but libjq's jv API is built around
immutability, so jv values are always notionally immutable. For
example, one writes:

    jv a = jv_array();

    a = jv_array_append(a, jv_true()); // `a' is notionally new, but since
                                       // it had only one reference, its
                                       // memory is reused

and similarly for objects. One could instead write:

    jv a = jv_array_append(jv_array(), jv_true());

or

    jv a = JV_ARRAY(jv_true());

One of the nice things about libjq is that almost every function
consumes a reference of every jv value passed in, with very few
exceptions. This simplifies memory management, or at least avoidance of
double-free and use-after-free (it can be harder to track down leaks,
though, since tools like valgrind don't understand that jv_copy() call
sites can be like allocations).

Nico
--


#17Petr Jelinek
petr@2ndquadrant.com
In reply to: Christian Convey (#14)
Re: Tackling JsonPath support

On 28/11/16 18:57, Christian Convey wrote:

On Mon, Nov 28, 2016 at 9:47 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

I thought by adding my first implementation to "contrib", we could make this functionality available to end-users, even before there was a consensus about what PG's "official" JSON-related operators should have for syntax and semantics.

these days the committers dislike the contrib dir. It is hard to push
anything there :-(. You can try it, but it can be wasted time.

​Thanks for the warning. I'm okay with my patch adding the "json_path"
function to the core PG code.​

I would still suggest that we hold off on having my first patch
implement an official JSON-related operator such as "JSON_TABLE". I
would prefer to have my "json_path" function available to users even
before we know how "JSON_TABLE", etc. should behave.

​Does that sound reasonable?​

Hi,

just make it extension, not contrib module, there is not much difference
between those except contrib is included in distribution.

Extensions that provide just functions are easy to integrate into core
(that's how some of the existing json functions were added in the past
as well).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18Christian Convey
christian.convey@gmail.com
In reply to: Nico Williams (#16)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com>
wrote:
...

JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

​Hi Nico,

Could you please clarify what you mean by "not expressive enough"?

I ask because I've been struggling to identify clear requirements for the
json-path functionality I'm trying to provide. It sounds like perhaps you
have something concrete in mind.

Since I myself have no need currently for this functionality, I'm left
guessing about hypothetical users of it.​ My current mental model is:

(a) Backend web developers. AFAICT, their community has mostly settled on
the syntax/semantics proposed by Stefan Goessner. It would probably be
unkind for PG's implementation to deviate from that without a good reason.

(b) PG hackers who will eventually implement the ISO SQL standard
operators. In the standards-committee meeting notes I've seen, it seemed
to me that they were planning to define some operators in terms of
json-path expression. So it would probably be good if whatever json-path
function I implement turns out to comply with that standard, so that the
PG-hackers can use it as a building block for their work.

(c) Pavel. (I'm still somewhat unclear on what has him interested in this,
and what his specific constraints are.)

- Christian

#19Nico Williams
nico@cryptonector.com
In reply to: Christian Convey (#18)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 05:50:40PM -0800, Christian Convey wrote:

On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com>
wrote:
...

JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

Hi Nico,

Could you please clarify what you mean by "not expressive enough"?

jq is a functional language that has these and other features:

- recursion
- generators
- lazy evaluation (of sorts)
- path expressions
- math functionality (libm, basically)
- reduction
- functions
- and other things

(jq does not have higher-order functions in that functions cannot return
functions and functions are not values, though it does have closures.)

jq is and feels a lot like a SQL, but for JSON.

I ask because I've been struggling to identify clear requirements for the
json-path functionality I'm trying to provide. It sounds like perhaps you
have something concrete in mind.

SQL imposes structure on data. Recursion makes SQL structure looser in
the sense that it may not be easy or possible to express certain
desirable schema constraints in SQL terms without resorting to triggers,
say. Storing documents in XML, JSON, or other such recursion-friendly
formats (perhaps in semantically equivalent but query-optimized forms)
is also a way to avoid strict structure (thus one needs schema
validators for XML, for example).

Less rigid schema constraints do not and should not preclude powerful
query languages.

One could convert such documents to a SQL EAV schema, if one has an
RDBMS with an ANY type (e.g., something like SQLite3's duck typing), and
then use SQL to query them. But that may be more difficult to use than
a SQL with support for XML/JSON/... and query sub-languages for those.

SQL is very powerful. One might like to have similarly powerful,
format-specific query languages for documents stored in XML, JSON,
etcetera, in a SQL RDBMS. jq is such a language, for JSON documents.
Ditto XPath/XSLT, for XML. While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

Since I myself have no need currently for this functionality, I'm left
guessing about hypothetical users of it. My current mental model is:

That's a bit like asking what is the use for SQL :^) The point is that
SQL is a powerful query language, and so is jq. Each is appropriate to
its own domain; both could be used together.

(a) Backend web developers. AFAICT, their community has mostly settled on
the syntax/semantics proposed by Stefan Goessner. It would probably be
unkind for PG's implementation to deviate from that without a good reason.

I can't speak for the community. I wouldn't take it personally that jq
be not chosen, nor any other proposal of mine. If it's politically
easier, then do that.

(b) PG hackers who will eventually implement the ISO SQL standard
operators. In the standards-committee meeting notes I've seen, it seemed
to me that they were planning to define some operators in terms of
json-path expression. So it would probably be good if whatever json-path
function I implement turns out to comply with that standard, so that the
PG-hackers can use it as a building block for their work.

These could still be implemented (e.g., using jq itself).

(c) Pavel. (I'm still somewhat unclear on what has him interested in this,
and what his specific constraints are.)

Hmm?

Nico
--

#20Christian Convey
christian.convey@gmail.com
In reply to: Nico Williams (#19)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com>
wrote:

On Mon, Nov 28, 2016 at 05:50:40PM -0800, Christian Convey wrote:

Could you please clarify what you mean by "not expressive enough"?

jq is a functional language ... jq is and feels a lot like a SQL, but
for JSON.

...

That's a bit like asking what is the use for SQL :^) The point is that
SQL is a powerful query language, and so is jq. Each is appropriate to
its own domain; both could be used together.

Thanks for the explanation. It sounds like your original point was NOT
that json-path isn't sufficient for "${specific use X}".

Instead, your point was that jq seems to have many advantages over
json-path in general, and therefore PG should offer jq instead of, or in
addition to, json-path.

Is that what you're saying?

​...

(c) Pavel. (I'm still somewhat unclear on what has him interested in
this, and what his specific constraints are.)

Hmm?

​Context: The reason I'm trying to work on a json-path implementation is
that Pavel Stehule suggested it as a good first PG-hacking project for me.
At the time, it sounded like he had a use for the feature.

- C

#21Nico Williams
nico@cryptonector.com
In reply to: Christian Convey (#20)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 06:38:55PM -0800, Christian Convey wrote:

On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com>
wrote:

Thanks for the explanation. It sounds like your original point was NOT
that json-path isn't sufficient for "${specific use X}".

The only uses of SQL w/ JSON I've seen so far in live action are to
implement EAV schemas on PostgreSQL. Since PostgreSQL lacks an ANY
type... using the hstore or jsonb to store data that would otherwise
require an ANY type is the obvious thing to do. Naturally this use
doesn't need deeply nested JSON data structures, so even JSONPath is
overkill for it!

However, there are use cases I can imagine:

- generating complex JSON from complex (e.g., recursive) SQL data where
the desired JSON "schema" is not close to the SQL schema

I've used jq a *lot* to convert schemas. I've also used XSLT for the
same purpose. I've also used SQL RDBMSes and jq together a fair bit,
either having jq consume JSON documents to output INSERT and other
statements, or having a SQL application output JSON that I then
convert to an appropriate schema using jq.

Naturally I can keep using these two tools separately. There's not
much to gain from integrating them for this particular sort of
use-case.

- handling JSON documents with very loose schemata, perhaps arbitrary
JSON documents, embedded in a SQL DB

I've not needed to do this much, so I have no specific examples.
But, of course, one reason I've not needed to do this is that today
it kinda can't be done with enough expressivity.

There are many use-cases for general-purpose programming languages, and
even for very widely-applicable domain-specific programming language.

It's especially difficult to name a specific use-case for a language
that doesn't exist -- in this case that would be SQL + (jq and/or
JSONPath).

Instead, your point was that jq seems to have many advantages over
json-path in general, and therefore PG should offer jq instead of, or in
addition to, json-path.

Is that what you're saying?

Roughly, yes. The distinct advantage is that jq is much more general
and expressive, not unlike SQL itself.

Hmm?

Context: The reason I'm trying to work on a json-path implementation is
that Pavel Stehule suggested it as a good first PG-hacking project for me.
At the time, it sounded like he had a use for the feature.

I see. I understand that. If you've already made a significant
investment, then I don't blame you for not wanting to risk it. On the
other hand, if melding jsonb and jq happens to be easy, then you'll get
much more bang from it for your investment. Naturally, you do what you
prefer, and if the reality on the ground is JSONPath, then so be it. If
I had time and felt sufficiently strongly, I'd contribute jq
integration; as it is I don't, and beggars can't be choosers.

Nico
--

#22David G. Johnston
david.g.johnston@gmail.com
In reply to: Christian Convey (#20)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 7:38 PM, Christian Convey <
christian.convey@gmail.com> wrote:

On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com>
wrote:

While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

Instead, your point was that jq seems to have many advantages over
json-path in general, and therefore PG should offer jq instead or, or in
addition to, json-path.

IMO jq is considerably closer to XSLT than XPath - which leads me to figure
that, since XML has both, JSON can benefit from both jq and json-path. I'm
not inclined to dig too deep here, but I'd rather take jq in the form of
"pl/jq" and have json-path (abstractly) as something that you can use like
"pg_catalog.get_value(json, json-path)"

​David J.

#23Nico Williams
nico@cryptonector.com
In reply to: David G. Johnston (#22)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 08:00:46PM -0700, David G. Johnston wrote:

IMO jq is considerably closer to XSLT than XPath - which leads me to figure
that, since XML has both, JSON can benefit from both jq and json-path. I'm
not inclined to dig too deep here, but I'd rather take jq in the form of
"pl/jq" and have json-path (abstractly) as something that you can use like
"pg_catalog.get_value(json, json-path)"

JSONPath looks a lot like a small subset of jq. Here are some examples:

JSONPath                       | jq
-------------------------------------------------------------------
$.store.book[0].title          | .store.book[0].title
$['store']['book'][0]['title'] | .["store"]["book"][0]["title"]
$..author                      | ..|.author
$.store.*                      | .store[]
$.store..price                 | .store|..|.price?
$..book[2]                     | [..|.book?][2]
$..book[?(@.isbn)]             | ..|.book?|select(.isbn)
$..book[?(@.price<10)]         | ..|.book?|select(.price<10)
$..*                           | ..?

Of course, jq can do much more than this. E.g.,

# Output [<title>, <price>] of all books with an ISBN:
..|.book?|select(.isbn)|[.title,.price]

# Output the average price of books with ISBNs appearing anywhere in
# the input document:
reduce (..|.book?|select(.isbn)|.price) as $price
  ( # Initial reduction state:
    {price: 0, num: 0};
    # State update:
    .price = (.price * .num + $price) / (.num + 1) | .num += 1 ) |
# Extract average price
.price

Of course one could just wrap that with a function:

def avg(pathexp; cond; v):
  reduce (pathexp | select(cond) | v) as $v
    ({v: 0, c: 0};
     .v = (.v * .c + $v) / (.c + 1) | .c += 1)
  | .v;

# Average price of books with ISBNs:
avg(..|.book?; .isbn; .price)

# Average price of all books:
avg(..|.book?; true; .price)

There's much, much more.

Note that jq comes with a C implementation. It should be easy to make
bindings to it from other programming language run-times.

Nico
--

#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Christian Convey (#18)
Re: Tackling JsonPath support

2016-11-29 2:50 GMT+01:00 Christian Convey <christian.convey@gmail.com>:

...
(c) Pavel. (I'm still somewhat unclear on what has him interested in
this, and what his specific constraints are.)

My target is simple - 1. to have good ANSI/SQL support, 2. to have a good
JSON-to-relation mapping function - the ANSI/SQL JSON_TABLE does that.

We now support the XPath function - JSONPath is similar to XPath - it is
better for the user, who has to learn only one language.

Regards

Pavel

- Christian

#25Pavel Stehule
pavel.stehule@gmail.com
In reply to: David G. Johnston (#22)
Re: Tackling JsonPath support

2016-11-29 4:00 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:

On Mon, Nov 28, 2016 at 7:38 PM, Christian Convey <
christian.convey@gmail.com> wrote:

...
IMO jq is considerably closer to XSLT than XPath - which leads me to
figure that, since XML has both, JSON can benefit from both jq and
json-path. I'm not inclined to dig too deep here, but I'd rather take jq in
the form of "pl/jq" and have json-path (abstractly) as something that you
can use like "pg_catalog.get_value(json, json-path)"

I am not against this idea. jq and similar environments can make sense
in JSON NoSQL databases. Using it in a relational database's searching
functions is overkill.

Regards

Pavel

​David J.

#26Christian Convey
christian.convey@gmail.com
In reply to: Pavel Stehule (#24)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

We now support the XPath function - JSONPath is similar to XPath - it is
better for the user, who has to learn only one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for
json-path expressions?

​- C

#27Pavel Stehule
pavel.stehule@gmail.com
In reply to: Christian Convey (#26)
Re: Tackling JsonPath support

2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com>:

On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

We now support the XPath function - JSONPath is similar to XPath - it is
better for the user, who has to learn only one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for
json-path expressions?

surely not.

follow ANSI/SQL :)

Pavel

​- C

#28Christian Convey
christian.convey@gmail.com
In reply to: Pavel Stehule (#27)
Re: Tackling JsonPath support

On Mon, Nov 28, 2016 at 10:37 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com>:

On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

We now support the XPath function - JSONPath is similar to XPath - it is
better for the user, who has to learn only one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for
json-path expressions?

surely not.

follow ANSI/SQL :)

I see. Then I'm afraid I still don't understand what your main point was
when you wrote:

We now support the XPath function - JSONPath is similar to XPath - it is
better for the user, who has to learn only one language.

- C

#29Petr Jelinek
petr@2ndquadrant.com
In reply to: Pavel Stehule (#27)
Re: Tackling JsonPath support

On 29/11/16 07:37, Pavel Stehule wrote:

2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com
<mailto:christian.convey@gmail.com>>:

On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule
<pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>wrote:

We now support the XPath function - JSONPath is similar to XPath - it is
better for the user, who has to learn only one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language
for json-path expressions?

surely not.

follow ANSI/SQL :)

Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already is basically
subset of jsonpath (some proposals/implementations also include
lax/strict prefix keyword on top of that). I think that should give us
some hint on what the base functionality should look like.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#30Nico Williams
nico@cryptonector.com
In reply to: Petr Jelinek (#29)
Re: Tackling JsonPath support

On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:

Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already is basically
subset of jsonpath (some proposals/implementations also include
lax/strict prefix keyword on top of that). I think that should give us
some hint on what the base functionality should look like.

Yes, that'd be base functionality. You can go above and beyond.

I agree with Pavel that jq could be used as a user-defined function, but
proper integration would be better because it would avoid the need to
format and parse JSON around calls to jq, and also because PG could
compile jq programs when preparing SQL statements. Besides, the libjq
jv API is *very* nice.

Nico
--

#31Petr Jelinek
petr@2ndquadrant.com
In reply to: Nico Williams (#30)
Re: Tackling JsonPath support

On 29/11/16 17:28, Nico Williams wrote:

On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:

Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already is basically
subset of jsonpath (some proposals/implementations also include
lax/strict prefix keyword on top of that). I think that should give us
some hint on what the base functionality should look like.

Yes, that'd be base functionality. You can go above and beyond.

But let's just do the base thing first before going to much more
complicated endeavor, especially if this is supposed to be the first
patch for Christian. Also, one of the points of the SQL is the
compatibility so that's what we should strive for first, especially
given that the syntax of the jq is not compatible AFAICS.

I agree with Pavel that jq could be used as a user-defined function, but
proper integration would be better because it would avoid the need to
format and parse JSON around calls to jq, and also because PG could
compile jq programs when preparing SQL statements. Besides, the libjq
jv API is *very* nice.

I think this would be good as extension first and then we can see what
to do with it next (ie I agree with Pavel).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#32Christian Convey
christian.convey@gmail.com
In reply to: Petr Jelinek (#29)
Re: Tackling JsonPath support

On Tue, Nov 29, 2016 at 8:18 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
...

Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already is basically
subset of jsonpath (some proposals/implementations also include
lax/strict prefix keyword on top of that). I think that should give us
some hint on what the base functionality should look like.

I agree. My guess is that PG users would benefit most from:

(1) Conformance to whatever ISO standard regarding JSON operators
eventually makes it out of the working group.
(2) Compatibility with other widely-used DBMS's.
(3) Compatibility with the JSONPath functionality used by web developers.
(Although I don't currently have a grasp on which frameworks / libraries
this entails.)

I *think* that (1), (2), and (3) are in approximate agreement about the
syntax and semantics of the path-expression language: the language proposed
by Stefan Goessner, plus the strict vs. lax distinction.

I think I can satisfy (3) with a PG extension which provides a function
that approximately implements JSONPath. My short-term plans are to submit
such a patch.

Hopefully that patch's function will be a helpful starting point for
satisfying (1) and (2) as well. But that can be decided later.

Nico Williams has argued for using "jq". I don't think jq satisfies any of
(1), (2), or (3), so I don't see a good case for incorporating it in my
short-term plans. There *may* be a case for using jq internally to my
implementation; I'll try to look into that.

#33Robert Haas
robertmhaas@gmail.com
In reply to: Christian Convey (#32)
Re: Tackling JsonPath support

On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
<christian.convey@gmail.com> wrote:

I think I can satisfy (3) with a PG extension which provides a function that
approximately implements JSONPath. My short-term plans are to submit such a
patch.

FWIW, I think that's a fine plan. I don't really know whether
JSONPath is the right standard to pick for the task of extracting bits
of JSON from other bits of JSON, but I think there's some value in
picking something simple enough that we can implement it in our own
code and not have to rely on a third-party library. Of course, if
somebody feels like adding a configure option for --with-jq and
appropriate interfaces to integrate with JQ, we could consider that,
too, but that imposes a packaging requirement that a home-grown
implementation doesn't. I'd want to hear more than one vote for such
a course of action before embracing it. If JQ is a Turing-complete
query language, integrating it might be quite difficult -- for
example, we'd need a way to make sure that it does periodic
CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak resources or
crash if those calls decide longjmp() away due to an ERROR -- and
would we let people query database tables with it? Would that be
efficient? I think it's fine to have more limited objectives than
what a JQ implementation would apparently entail.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34Nico Williams
nico@cryptonector.com
In reply to: Robert Haas (#33)
Re: Tackling JsonPath support

On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:

On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
<christian.convey@gmail.com> wrote:

I think I can satisfy (3) with a PG extension which provides a function that
approximately implements JSONPath. My short-term plans are to submit such a
patch.

FWIW, I think that's a fine plan. I don't really know whether
JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard. Are there particular proposals that the ANSI
SQL working group is considering?

of JSON from other bits of JSON, but I think there's some value in
picking something simple enough that we can implement it in our own
code and not have to rely on a third-party library. Of course, if
somebody feels like adding a configure option for --with-jq and

Sure. My main concern is that I don't want to have to parse/format JSON
around every such call. I'd rather parsed JSON remain in an internal
form for as long as possible.

Speaking of which, you could use libjq's jv API and not support the jq
language itself.

appropriate interfaces to integrate with JQ, we could consider that,
too, but that imposes a packaging requirement that a home-grown
implementation doesn't. I'd want to hear more than one vote for such

What we do in Heimdal, OpenAFS, and other open source projects,
sometimes, is include a copy / git submodule / similar of some such external
dependencies. Naturally it's not possible to do this for all external
dependencies, but it works well enough. The jv API part of jq is small
and simple, and could be ripped out into a library that could be
included in PostgreSQL.

a course of action before embracing it. If JQ is a Turing-complete
query language, integrating it might be quite difficult -- for

Even if it weren't! (It is.)

Consider this expression using a builtin in jq:

[range(4503599627370496)]

That is, an array of integers from 0 to 4503599627370495, inclusive.
That will "halt" given a very, very large computer and a lot of time.

(Because jq is Turing-complete, range() can be coded in jq itself, and
some variants of range() are.)

example, we'd need a way to make sure that it does periodic
CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak resources or
crash if those calls decide longjmp() away due to an ERROR -- and
would we let people query database tables with it? Would that be
efficient? I think it's fine to have more limited objectives than
what a JQ implementation would apparently entail.

Agreed. I think this means that we need either or both of a variant of
the C jq_next() function that takes either a timeout parameter, or a
jq_intr() function that can cause a running jq_next() to stop.

(Tolerating longjmp() is harder to do and I'd rather not.)

Other projects, like, say, nginx or similar where there is a per-client
or per-connection memory pool to limit memory footprint, might want
libjq to get an allocator hook, so that's another enhancement to
consider. If that's something that PostgreSQL would need, please let me
know.

Nico
--

#35Christian Convey
christian.convey@gmail.com
In reply to: Nico Williams (#34)
Re: Tackling JsonPath support

On Fri, Dec 2, 2016 at 1:32 PM, Nico Williams <nico@cryptonector.com> wrote:
...
On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:

On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
<christian.convey@gmail.com> wrote:

I think I can satisfy (3) with a PG extension which provides a
function that approximately implements JSONPath. My short-term plans
are to submit such a patch.

FWIW, I think that's a fine plan. I don't really know whether
JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard. Are there particular proposals that the ANSI
SQL working group is considering?

​Hi Nico, it seems to be something in the works with the standards
committee. We were discussing it earlier in the thread: [1]http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg297732.html​

Kind regards,
Christian

[1]: http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg297732.html

#36Robert Haas
robertmhaas@gmail.com
In reply to: Nico Williams (#34)
Re: Tackling JsonPath support

On Fri, Dec 2, 2016 at 4:32 PM, Nico Williams <nico@cryptonector.com> wrote:

On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:

On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
<christian.convey@gmail.com> wrote:

I think I can satisfy (3) with a PG extension which provides a function that
approximately implements JSONPath. My short-term plans are to submit such a
patch.

FWIW, I think that's a fine plan. I don't really know whether
JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard. Are there particular proposals that the ANSI
SQL working group is considering?

I don't know.

of JSON from other bits of JSON, but I think there's some value in
picking something that is simple enough that we can implement it in our own
code and not have to rely on a third-party library. Of course, if
somebody feels like adding a configure option for --with-jq and

Sure. My main concern is that I don't want to have to parse/format JSON
around every such call. I'd rather parsed JSON remain in an internal
form for as long as possible.

Sure, but that seems like an orthogonal consideration.

Speaking of which, you could use libjq's jv API and not support the jq
language itself.

You can submit a patch for that if you like, but I don't think that's
a good reason to block what Christian wants to do, because it's a much
bigger change. Now if you whip that patch up in a short period of
time and everybody agrees that it gives us everything Christian wanted
to implement and other good stuff too, fine; we can reject Christian's
approach then. Or if Christian's patch is committed, we can rip it
back out again if and when somebody does this (or any other thing we
all agree is better). But I object strenuously to the idea that we
should reject the idea of drinking the bottle of beer we have in the
house because there's a liquor store down the road where we can buy an
entire keg. The possibility of awesome (with enough work) is not a
reason to reject good (via a straightforward approach).

appropriate interfaces to integrate with JQ, we could consider that,
too, but that imposes a packaging requirement that a home-grown
implementation doesn't. I'd want to hear more than one vote for such

What we do in Heimdal, OpenAFS, and other open source projects,
sometimes, is include a copy / git submodule / similar of some such external
dependencies. Naturally it's not possible to do this for all external
dependencies, but it works well enough. The jv API part of jq is small
and simple, and could be ripped out into a library that could be
included in PostgreSQL.

We typically avoid copying things into our repository because then
we become responsible for pulling in any subsequent fixes. The few
instances that we have right now (zic, snowball, Henry Spencer's
regexp stuff) have imposed a significant code maintenance burden. The
git submodules approach might avoid that problem, but it would still
be the case that any compile-breaking bugs in the upstream repository
immediately become compile breaks for all PostgreSQL developers, and
that any critical defects that force emergency releases by the
upstream project now force emergency releases of PostgreSQL as well.
If we merely link against the external project, then we avoid that.
So if we're going to use JQ at all, I think that's how we should do
it. And again, I'm not trying to prevent you or anybody else from
pursuing that. All I'm saying is that Christian can do what he wants
to do, too.

Other projects, like, say, nginx or similar where there is a per-client
or per-connection memory pool to limit memory footprint, might want
libjq to get an allocator hook, so that's another enhancement to
consider. If that's something that PostgreSQL would need, please let me
know.

The overall need is that it needs to be possible for PostgreSQL to
throw an ERROR, and thus longjmp, without leaking resources.
Sometimes those errors happen asynchronously due to query cancel or
replication conflicts or similar, and those don't have to be processed
at once but the delay can't be unbounded or more than some modest
fraction of a second. Integrating with PostgreSQL's memory-context
stuff might make that or other resource leak problems easier, or it
might not. To really get a clear notion of what would be involved, I
think you'd probably need to produce a prototype patch and submit it
here for review to really find out what issues people see with it.
Library integrations are tricky but, since you wrote JQ and seem
potentially willing to modify it to work with PostgreSQL better, this
one might be a lot less painful than some. I can't promise we'd
accept an integration even if you came up with a patch, but I see
looking back over this thread that there are several people cautiously
supporting the idea of using JQ in some form, which is promising.
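The hazard being described can be sketched in plain C (an illustration of the pattern only, not PostgreSQL's actual PG_TRY/PG_CATCH machinery): code that may longjmp() out must still release whatever it acquired, on both the normal and the error path.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Sketch of the error-handling discipline: acquire a resource, run
 * code that may "throw" via longjmp() (analogous to an ERROR in
 * PostgreSQL), and guarantee release either way. */
static jmp_buf err_env;
static int leaked = 0;

static void may_fail(int fail)
{
    if (fail)
        longjmp(err_env, 1);   /* the asynchronous-error analogue */
}

int run_guarded(int fail)
{
    void *resource = malloc(64);
    leaked = 1;                /* pessimistically assume a leak */
    if (setjmp(err_env) == 0) {
        may_fail(fail);        /* normal path */
        free(resource);
        leaked = 0;
        return 0;
    }
    free(resource);            /* error path: cleanup still runs */
    leaked = 0;
    return -1;
}
```

An embedded library either has to provide a safe unwind point like this itself, or be structured so the host's unwind can reclaim its allocations (which is where memory-context integration would help).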

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#37Nico Williams
nico@cryptonector.com
In reply to: Robert Haas (#36)
Re: Tackling JsonPath support

On Mon, Dec 05, 2016 at 11:28:31AM -0500, Robert Haas wrote:

The overall need is that it needs to be possible for PostgreSQL to
throw an ERROR, and thus longjmp, without leaking resources.

As long as one can interpose jump buffers, that should be possible.

Sometimes those errors happen asynchronously due to query cancel or
replication conflicts or similar, and those don't have to be processed
at once but the delay can't be unbounded or more than some modest
fraction of a second. Integrating with PostgreSQL's memory-context
stuff might make that or other resource leak problems easier, or it
might not. To really get a clear notion of what would be involved, I
think you'd probably need to produce a prototype patch and submit it
here for review to really find out what issues people see with it.

Understood. Thanks.

Library integrations are tricky but, since you wrote JQ and seem

Just to be clear, Stephen Dolan wrote jq. I've added to, and maintained
jq, to be sure, but I would not want to take credit from Stephen.

potentially willing to modify it to work with PostgreSQL better, this
one might be a lot less painful than some. I can't promise we'd
accept an integration even if you came up with a patch, but I see
looking back over this thread that there are several people cautiously
supporting the idea of using JQ in some form, which is promising.

Fair enough. I won't be able to work on an integration for a few more
months, so we'll see (and Stephen might well veto some such changes to
jq), and it is time for me to shut up about this for now. Thanks for
the very useful comments, and sorry for the noise.

Nico
--


#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
Re: Tackling JsonPath support

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Dec 2, 2016 at 4:32 PM, Nico Williams <nico@cryptonector.com> wrote:

What we do in Heimdal, OpenAFS, and other open source projects,
sometimes, is include a copy / git submodule / similar of some such external
dependencies. Naturally it's not possible to do this for all external
dependencies, but it works well enough. The jv API part of jq is small
and simple, and could be ripped out into a library that could be
included in PostgreSQL.

We typically avoid copying things into our repository because then
we become responsible for pulling in any subsequent fixes. The few
instances that we have right now (zic, snowball, Henry Spencer's
regexp stuff) have imposed a significant code maintenance burden. The
git submodules approach might avoid that problem, but it would still
be the case that any compile-breaking bugs in the upstream repository
immediately become compile breaks for all PostgreSQL developers, and
that any critical defects that force emergency releases by the
upstream project now force emergency releases of PostgreSQL as well.
If we merely link against the external project, then we avoid that.

Another point here is that packagers such as Red Hat strenuously dislike
such source-code-level wrapping of other projects, because that means that
they have to rebuild multiple packages to fix any bugs found in the
wrapped code. If I were still packaging Postgres for Red Hat, and such
a distribution landed in my inbox, the very first thing I'd be looking
to do is rip out the borrowed code and replace it with a runtime
shared-library dependency on the upstream project's official library.

Having said that ... we have a *really bad* track record of deciding which
outside projects we want to depend on, or maybe we've just outlived a lot
of them. Aside from Robert's examples, there's uuid-ossp and libxml2,
which are external code but have caused us headaches anyway. So I think
there's a lot to be said for avoiding dependencies on libraries that may
or may not still be getting actively maintained ten years from now.

regards, tom lane


#39Robert Haas
robertmhaas@gmail.com
In reply to: Nico Williams (#37)
Re: Tackling JsonPath support

On Mon, Dec 5, 2016 at 11:42 AM, Nico Williams <nico@cryptonector.com> wrote:

Library integrations are tricky but, since you wrote JQ and seem

Just to be clear, Stephen Dolan wrote jq. I've added to, and maintained
jq, to be sure, but I would not want to take credit from Stephen.

Ah, OK, sorry for misunderstanding.

potentially willing to modify it to work with PostgreSQL better, this
one might be a lot less painful than some. I can't promise we'd
accept an integration even if you came up with a patch, but I see
looking back over this thread that there are several people cautiously
supporting the idea of using JQ in some form, which is promising.

Fair enough. I won't be able to work on an integration for a few more
months, so we'll see (and Stephen might well veto some such changes to
jq), and it is time for me to shut up about this for now. Thanks for
the very useful comments, and sorry for the noise.

Not noise at all. Thanks for your interest. Bandwidth is a little
limited around here so it's hard to give everyone the feedback that
they deserve, but it's good that you're interested.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#40Nico Williams
nico@cryptonector.com
In reply to: Tom Lane (#38)
Re: Tackling JsonPath support

On Mon, Dec 05, 2016 at 11:52:57AM -0500, Tom Lane wrote:

Another point here is that packagers such as Red Hat strenuously dislike
such source-code-level wrapping of other projects, because that means that
they have to rebuild multiple packages to fix any bugs found in the
wrapped code. If I were still packaging Postgres for Red Hat, and such
a distribution landed in my inbox, the very first thing I'd be looking
to do is rip out the borrowed code and replace it with a runtime
shared-library dependency on the upstream project's official library.

I'm quite aware :( I used to work at Sun on Solaris. We too hated
duplication. OpenSSL was a particularly problematic case...

There is a real tension between the release trains of many distinct open
source projects and those of their consumers, and those of distros/OSes.

Some projects, such as SQLite3, explicitly recommend copying their
source or statically linking them into dependents; distros/vendors never
like this.

My best advice on this (PG might benefit from it), informed by years of
experience dealing with this, is that there's no perfect answer, but
that nonetheless library developers should always follow these best
practices so as to help those who end up having to deal with multiple
versions of those libraries:

- prefer dynamic linking (because dynamic linking semantics are
superior to static linking semantics)

- make libraries self-initialize and self-finalize! (pthread_once()
and Win32's ExecuteOnce* are your friends, as are atexit()/on_exit(),
pthread_key_create(), and DllMain() on Windows)

(otherwise calling your library from another library gets tricky)

- make it so that as long as you change SONAMEs you can have multiple
versions of the library loaded in one process, specifically:

- don't use POSIX file locking (but the new non-POSIX OFD locks are
OK) (or use them for files that wouldn't be shared across multiple
versions in one process)

(e.g., SQLite3 uses POSIX file locking safely, but it's not likely
that two consumers of different SQLite3 versions in one process
would access the same DB files, so it kinda works)

- be backwards- and forwards-compatible as to any config file
formats and other state that will be shared by multiple versions

- generally: mind backwards compatibility, both source and binary, so
as to make it easy to upgrade

- this means applying good API design best practices that I won't go
into here

- write thread-safe code, and preferably fork-safe code too

For example, I've seen OpenSSL built with different SONAMEs to support
multiple versions of OpenSSL coexisting in a single program/process.
That actually works.
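The self-initialization bullet above can be sketched with pthread_once() (a generic illustration, not code from any of the projects named here): the library initializes itself lazily on first use, so no caller -- including another library -- ever needs an explicit init entry point.

```c
#include <pthread.h>

/* Self-initializing library: global setup runs exactly once, on the
 * first call into the library, from whichever thread gets there
 * first.  pthread_once() serializes concurrent first calls. */
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;

static void lib_do_init(void)
{
    init_calls++;              /* set up global state exactly once */
}

int lib_operation(void)
{
    pthread_once(&lib_once, lib_do_init);  /* safe from any thread */
    return init_calls;
}
```

Finalization is the mirror image: register cleanup with atexit() (or DllMain() on Windows) from inside lib_do_init().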

Having said that ... we have a *really bad* track record of deciding which
outside projects we want to depend on, or maybe we've just outlived a lot
of them. Aside from Robert's examples, there's uuid-ossp and libxml2,
which are external code but have caused us headaches anyway. So I think
there's a lot to be said for avoiding dependencies on libraries that may
or may not still be getting actively maintained ten years from now.

I'm not at all surprised.

One codebase I help develop and maintain, Heimdal, includes SQLite3 and
libeditline, and parts of Heimdal should really be separate projects
(e.g., its ASN.1 compiler and library, and several supporting libraries
like libroken (a portability layer)) because they could be useful to
others outside Heimdal. Finding the right balance is not trivial.

Nico
--


#41Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Nico Williams (#15)
Re: Tackling JsonPath support

On 11/28/16 12:38 PM, Nico Williams wrote:

The internal representation of JSON data is bound to be
completely different, no doubt

Actually, that could be a good thing. The internal storage of JSONB is
optimized for compressibility, but that imposes a substantial overhead
on calls that search for a particular key in a document. This
gets *really* bad if you make nested expansion calls (i.e., json->'a'->'b').

ExpandedObject support means we're not stuck with the same
representation in-memory as on-disk though. While we could create our
own internal representation, it seems a bit silly to reinvent that wheel
if we don't need to. Bonus points if it would also throw an error if you
fed it duplicated object keys.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
