[PATCH] few fts functions for jsonb

Started by Dmitry Dolgovabout 9 years ago19 messageshackers
Jump to latest
#1Dmitry Dolgov
9erthalion6@gmail.com

Hi all

I would like to propose patch with a set of new small functions for fts in
case of
jsonb data type:

* to_tsvector(config, jsonb) - make a tsvector from all string values and
elements of jsonb object. To prevent the situation, when tsquery can find
a
phrase consisting of lexemes from two different values/elements, this
function will add an increment to position of each lexeme from every new
value/element.

* ts_headline(config, jsonb, tsquery, options) - generate a headline
directly
from jsonb object

Here are the examples how they work:

```
=# select to_tsvector('{"a": "aaa bbb", "b": ["ccc ddd"], "c": {"d": "eee
fff"}}'::jsonb);
to_tsvector
-------------------------------------------------
'aaa':1 'bbb':2 'ccc':4 'ddd':5 'eee':7 'fff':8
(1 row)

=# select ts_headline('english', '{"a": "aaa bbb", "b": {"c": "ccc
ddd"}}'::jsonb, tsquery('bbb & ddd & hhh'), 'StartSel = <, StopSel = >');
ts_headline
----------------------
aaa <bbb> ccc <ddd>
(1 row)
```

Any comments or suggestions?

Attachments:

jsonb_fts_v1.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_v1.patchDownload+247-0
#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Dmitry Dolgov (#1)
Re: [PATCH] few fts functions for jsonb

The proposed patch looks not very important, but I consider it as an
important feature, which Oracle and Microsoft already have, that's why I
asked Dmitry to work on this and made it before feature freeze. My comments
follows below the post.

On Tue, Feb 28, 2017 at 1:59 PM, Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

Hi all

I would like to propose patch with a set of new small functions for fts in
case of
jsonb data type:

* to_tsvector(config, jsonb) - make a tsvector from all string values and
elements of jsonb object. To prevent the situation, when tsquery can
find a
phrase consisting of lexemes from two different values/elements, this
function will add an increment to position of each lexeme from every new
value/element.

* ts_headline(config, jsonb, tsquery, options) - generate a headline
directly
from jsonb object

Here are the examples how they work:

```
=# select to_tsvector('{"a": "aaa bbb", "b": ["ccc ddd"], "c": {"d": "eee
fff"}}'::jsonb);
to_tsvector
-------------------------------------------------
'aaa':1 'bbb':2 'ccc':4 'ddd':5 'eee':7 'fff':8
(1 row)

=# select ts_headline('english', '{"a": "aaa bbb", "b": {"c": "ccc
ddd"}}'::jsonb, tsquery('bbb & ddd & hhh'), 'StartSel = <, StopSel = >');
ts_headline
----------------------
aaa <bbb> ccc <ddd>
(1 row)
```

Any comments or suggestions?

1. add json support
2. Its_headline should returns the original json with highlighting. As a
first try the proposed ts_headline could be ok, probably need special
option.

Show quoted text

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Oleg Bartunov (#2)
Re: [PATCH] few fts functions for jsonb

On 28 February 2017 at 19:21, Oleg Bartunov <obartunov@gmail.com> wrote:
1. add json support

I've added json support for all functions.

Its_headline should returns the original json with highlighting

Yes, I see now. I don't think it's worth it to add a special option for that
purpose, so I've just changed the implementation to return the original
json(b).

Attachments:

jsonb_fts_v2.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_v2.patchDownload+777-0
#4Andrew Dunstan
andrew@dunslane.net
In reply to: Dmitry Dolgov (#3)
Re: [PATCH] few fts functions for jsonb

On 03/10/2017 11:13 AM, Dmitry Dolgov wrote:

On 28 February 2017 at 19:21, Oleg Bartunov <obartunov@gmail.com

<mailto:obartunov@gmail.com>> wrote:

1. add json support

I've added json support for all functions.

Its_headline should returns the original json with highlighting

Yes, I see now. I don't think it's worth it to add a special option
for that
purpose, so I've just changed the implementation to return the
original json(b).

This is a pretty good idea.

However, I think it should probably be broken up into a couple of pieces
- one for the generic json/jsonb transforms infrastructure (which
probably needs some more comments) and one for the FTS functions that
will use it.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andrew Dunstan (#4)
Re: [PATCH] few fts functions for jsonb

On 21 March 2017 at 03:03, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>

wrote:

However, I think it should probably be broken up into a couple of pieces -
one for the generic json/jsonb transforms infrastructure (which probably
needs some more comments) and one for the FTS functions that will use it.

Sure, here are two patches with separated functionality and a bit more
commentaries for the transform functions.

Attachments:

jsonb_fts_support_v1.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_support_v1.patchDownload+247-0
jsonb_fts_functions_v1.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_functions_v1.patchDownload+543-0
#6Andrew Dunstan
andrew@dunslane.net
In reply to: Dmitry Dolgov (#5)
Re: [PATCH] few fts functions for jsonb

On 03/21/2017 06:28 PM, Dmitry Dolgov wrote:

On 21 March 2017 at 03:03, Andrew Dunstan

<andrew.dunstan@2ndquadrant.com
<mailto:andrew.dunstan@2ndquadrant.com>> wrote:

However, I think it should probably be broken up into a couple of

pieces -

one for the generic json/jsonb transforms infrastructure (which probably
needs some more comments) and one for the FTS functions that will

use it.

Sure, here are two patches with separated functionality and a bit more
commentaries for the transform functions.

I'm not through looking at this. However, here are a few preliminary
comments

* we might need to rationalize the header locations a bit
* iterate_json(b) and transform_json(b) are a bit too generally named.
Really what they do is iterate over or transform string values in
the json(b). They ignore / preserve the structure, keys, and
non-string scalar values in the json(b). A general iterate or
transform function would be called in effect with a stream of all
the elements in the json, not just scalar strings.
* Unless I'm missing something the iterate_json(b)_values return value
is ignored. Instead of returning the state it looks to me like it
should return nothing and be declared as void instead of void *
* transform_jsonb and transform_json are somewhat asymmetrical. The
latter should probably return a text* instead of a StringInfo, to be
consistent with the former.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andrew Dunstan (#6)
Re: [PATCH] few fts functions for jsonb

I'm not through looking at this. However, here are a few preliminary

comments

I've attached new versions of the patches with improvements related to
these commentaries.

Attachments:

jsonb_fts_functions_v2.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_functions_v2.patchDownload+543-0
jsonb_fts_support_v2.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_support_v2.patchDownload+243-0
#8Andrew Dunstan
andrew@dunslane.net
In reply to: Dmitry Dolgov (#7)
Re: [PATCH] few fts functions for jsonb

On 26 March 2017 at 17:57, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

I'm not through looking at this. However, here are a few preliminary
comments

I've attached new versions of the patches with improvements related to these
commentaries.

These patches seem fundamentally OK. But I'm still not happy with the
naming etc.

I think the header changes would probably be better placed in
jsonapi.h or in a new header file.

And the names still seem too general to me. e.g. transform_json_values
should probably be transform_json_string_values, and the static
support functions should be renamed to match. Also the
JsonIterateAction and JsonTransformAction funtion typedefs should
probably be renamed to match.

I'm not sure there is any great point in the is_jsonb_data macro,
which is only used in one spot. I would get rid of it and expand the
test in place.

I don't have much time this week to work on it, as there are one or
two other patches I also want to look at. If you clean these things
up I will commit it. The second patch looks fine.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andrew Dunstan (#8)
Re: [PATCH] few fts functions for jsonb

On 29 March 2017 at 18:28, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>

wrote:

These patches seem fundamentally OK. But I'm still not happy with the
naming etc.

I've changed names for all functions and action definitions, moved out the
changes in header file to `jsonapi.h` and removed `is_jsonb_data` macro. So
it
should be better now.

Attachments:

jsonb_fts_support_v3.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_support_v3.patchDownload+252-0
jsonb_fts_functions_v3.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_functions_v3.patchDownload+545-0
#10Andrew Dunstan
andrew@dunslane.net
In reply to: Dmitry Dolgov (#9)
Re: [PATCH] few fts functions for jsonb

On 29 March 2017 at 16:19, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 29 March 2017 at 18:28, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
wrote:

These patches seem fundamentally OK. But I'm still not happy with the
naming etc.

I've changed names for all functions and action definitions, moved out the
changes in header file to `jsonapi.h` and removed `is_jsonb_data` macro. So
it
should be better now.

I have just noticed as I was writing/testing the non-existent docs for
this patch that it doesn't supply variants of to_tsvector that take a
regconfig as the first argument. Is there a reason for that? Why
should the json(b) versions be different from the text versions?

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andrew Dunstan (#10)
Re: [PATCH] few fts functions for jsonb

On 31 March 2017 at 00:01, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
wrote:

I have just noticed as I was writing/testing the non-existent docs for
this patch that it doesn't supply variants of to_tsvector that take a
regconfig as the first argument. Is there a reason for that? Why
should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version of the
patch (only the functions part)
to add those variants.

Attachments:

jsonb_fts_functions_v4.patchtext/x-patch; charset=US-ASCII; name=jsonb_fts_functions_v4.patchDownload+595-0
#12Oleg Bartunov
oleg@sai.msu.su
In reply to: Dmitry Dolgov (#11)
Re: [PATCH] few fts functions for jsonb

On 30 Mar 2017 23:43, "Dmitry Dolgov" <9erthalion6@gmail.com> wrote:

On 31 March 2017 at 00:01, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
wrote:

I have just noticed as I was writing/testing the non-existent docs for
this patch that it doesn't supply variants of to_tsvector that take a
regconfig as the first argument. Is there a reason for that? Why
should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version of the
patch (only the functions part)
to add those variants.

Congratulations with patch committed, who will write an addition
documentation? I think we need to touch FTS and JSON parts.

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Oleg Bartunov (#12)
Re: [PATCH] few fts functions for jsonb

On 03/31/2017 03:17 PM, Oleg Bartunov wrote:

On 30 Mar 2017 23:43, "Dmitry Dolgov" <9erthalion6@gmail.com
<mailto:9erthalion6@gmail.com>> wrote:

On 31 March 2017 at 00:01, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com
<mailto:andrew.dunstan@2ndquadrant.com>> wrote:

I have just noticed as I was writing/testing the non-existent

docs for

this patch that it doesn't supply variants of to_tsvector that

take a

regconfig as the first argument. Is there a reason for that? Why
should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version
of the patch (only the functions part)
to add those variants.

Congratulations with patch committed, who will write an addition
documentation? I think we need to touch FTS and JSON parts.

I added documentation when I committed it for the new functions, in the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#13)
Re: [PATCH] few fts functions for jsonb

On 2017-04-01 16:20:46 -0400, Andrew Dunstan wrote:

On 03/31/2017 03:17 PM, Oleg Bartunov wrote:

On 30 Mar 2017 23:43, "Dmitry Dolgov" <9erthalion6@gmail.com
<mailto:9erthalion6@gmail.com>> wrote:

On 31 March 2017 at 00:01, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com
<mailto:andrew.dunstan@2ndquadrant.com>> wrote:

I have just noticed as I was writing/testing the non-existent

docs for

this patch that it doesn't supply variants of to_tsvector that

take a

regconfig as the first argument. Is there a reason for that? Why
should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version
of the patch (only the functions part)
to add those variants.

Congratulations with patch committed, who will write an addition
documentation? I think we need to touch FTS and JSON parts.

I added documentation when I committed it for the new functions, in the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

I see that the CF entry for this hasn't been marked as committed:
https://commitfest.postgresql.org/13/1054/
Is there anything left here?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Sven R. Kunze
srkunze@mail.de
In reply to: Andrew Dunstan (#13)
Re: [PATCH] few fts functions for jsonb

On 01.04.2017 22:20, Andrew Dunstan wrote:

I added documentation when I committed it for the new functions, in the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

Not sure, if this is related but the formatting of
https://www.postgresql.org/docs/devel/static/functions-textsearch.html
looks a bit strange.

Just 2 questions/notes:
1) in what order are the values of the JSON extracted?

2) Regarding the additional line:
to_tsvector([ config regconfig , ] document json(b)) tsvector reduce
document text to tsvector to_tsvector('english', '{"a": "The Fat
Rats"}'::json) 'fat':2 'rat':3

Maybe change "reduce document text to tsvector" to "extracting JSON
values <in what order> and reduce to tsvector"?

Sven

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#14)
Re: [PATCH] few fts functions for jsonb

On 04/03/2017 02:22 PM, Andres Freund wrote:

On 2017-04-01 16:20:46 -0400, Andrew Dunstan wrote:

On 03/31/2017 03:17 PM, Oleg Bartunov wrote:

On 30 Mar 2017 23:43, "Dmitry Dolgov" <9erthalion6@gmail.com
<mailto:9erthalion6@gmail.com>> wrote:

On 31 March 2017 at 00:01, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com
<mailto:andrew.dunstan@2ndquadrant.com>> wrote:

I have just noticed as I was writing/testing the non-existent

docs for

this patch that it doesn't supply variants of to_tsvector that

take a

regconfig as the first argument. Is there a reason for that? Why
should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version
of the patch (only the functions part)
to add those variants.

Congratulations with patch committed, who will write an addition
documentation? I think we need to touch FTS and JSON parts.

I added documentation when I committed it for the new functions, in the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

I see that the CF entry for this hasn't been marked as committed:
https://commitfest.postgresql.org/13/1054/
Is there anything left here?

Says "Status committed" for me. I fixed this in Sunday after Tom prodded me.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Sven R. Kunze (#15)
Re: [PATCH] few fts functions for jsonb

On 04/03/2017 02:44 PM, Sven R. Kunze wrote:

On 01.04.2017 22:20, Andrew Dunstan wrote:

I added documentation when I committed it for the new functions, in the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

Not sure, if this is related but the formatting of
https://www.postgresql.org/docs/devel/static/functions-textsearch.html
looks a bit strange.

Just 2 questions/notes:
1) in what order are the values of the JSON extracted?

In the order they exist in the underlying document.

2) Regarding the additional line:
to_tsvector([ config regconfig , ] document json(b)) tsvector
reduce document text to tsvector to_tsvector('english', '{"a": "The
Fat Rats"}'::json) 'fat':2 'rat':3

Maybe change "reduce document text to tsvector" to "extracting JSON
values <in what order> and reduce to tsvector"?

OK, I will do something along those lines.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Sven R. Kunze
srkunze@mail.de
In reply to: Andrew Dunstan (#17)
Re: [PATCH] few fts functions for jsonb

On 03.04.2017 21:30, Andrew Dunstan wrote:

On 04/03/2017 02:44 PM, Sven R. Kunze wrote:

On 01.04.2017 22:20, Andrew Dunstan wrote:

I added documentation when I committed it for the new functions, in the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

Not sure, if this is related but the formatting of
https://www.postgresql.org/docs/devel/static/functions-textsearch.html
looks a bit strange.

Just 2 questions/notes:
1) in what order are the values of the JSON extracted?

In the order they exist in the underlying document.

Just asking as the order can have implications for fulltext searches.
So, might be valuable for the docs.

Are these documents equally ordered in this sense?

srkunze=# select '{"a": "abc", "b": "def"}'::jsonb;
jsonb
--------------------------
{"a": "abc", "b": "def"}
(1 row)

srkunze=# select '{"b": "def", "a": "abc"}'::jsonb;
jsonb
--------------------------
{"a": "abc", "b": "def"}
(1 row)

Also what about non-ascii keys? Are they ordered by the default locale
of the PostgreSQL cluster (say de_DE.utf-8)?

2) Regarding the additional line:
to_tsvector([ config regconfig , ] document json(b)) tsvector
reduce document text to tsvector to_tsvector('english', '{"a": "The
Fat Rats"}'::json) 'fat':2 'rat':3

Maybe change "reduce document text to tsvector" to "extracting JSON
values <in what order> and reduce to tsvector"?

OK, I will do something along those lines.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Sven R. Kunze (#18)
Re: [PATCH] few fts functions for jsonb

On 04/03/2017 03:41 PM, Sven R. Kunze wrote:

On 03.04.2017 21:30, Andrew Dunstan wrote:

On 04/03/2017 02:44 PM, Sven R. Kunze wrote:

On 01.04.2017 22:20, Andrew Dunstan wrote:

I added documentation when I committed it for the new functions, in
the
FTS section. I'm not sure what we need to add to the JSON section if
anything.

Not sure, if this is related but the formatting of
https://www.postgresql.org/docs/devel/static/functions-textsearch.html
looks a bit strange.

Just 2 questions/notes:
1) in what order are the values of the JSON extracted?

In the order they exist in the underlying document.

Just asking as the order can have implications for fulltext searches.
So, might be valuable for the docs.

Are these documents equally ordered in this sense?

srkunze=# select '{"a": "abc", "b": "def"}'::jsonb;
jsonb
--------------------------
{"a": "abc", "b": "def"}
(1 row)

srkunze=# select '{"b": "def", "a": "abc"}'::jsonb;
jsonb
--------------------------
{"a": "abc", "b": "def"}
(1 row)

Yes, when converted to jsonb these two documents are identical.

Also what about non-ascii keys? Are they ordered by the default locale
of the PostgreSQL cluster (say de_DE.utf-8)?

Yes, I believe so.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers