JSON Path and GIN Questions
Greetings Hackers,
Been a while! I’m working on some experiments with JSONB columns and GIN indexes, and have operated on the assumption that JSON Path operations would take advantage of GIN indexes, with json_path_ops as a nice optimization. But I’ve run into what appear to be some inconsistencies and oddities I’m hoping to figure out with your help.
For the examples in this email, I’m using this simple table:
CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;
That gives me a simple table with around 3600 rows. Not a lot of data, but hopefully enough to demonstrate the issues.
Issue 1: @@ vs @?
-----------------
I have been confused as to the difference between @@ vs @?: Why do these return different results?
david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
id
----
(0 rows)
david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
id
----
10
(1 row)
I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), and from the suggestion I got there, it seems that @@ expects a boolean to be returned by the path query, while @? wraps it in an implicit exists(). Is that right?
If so, I’d like to submit a patch to the docs talking about this, and suggesting the use of jsonb_path_query() to test paths to see if they return a boolean or not.
Issue 2: @? Index Use
---------------------
From Oleg’s (happy belated birthday!) notes (https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):
Operators @? and @@ are interchangeable:
js @? '$.a' <=> js @@ 'exists($.a)’
js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’
For the purposes of the above example, this appears to hold true: if I wrap the path query in exists(), @@ returns a result:
david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life Rescue"))';
id
----
10
(1 row)
Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN index while @? does not.
Or, no, fiddling with it again just now, I think I have still been confusing these operators! @@ was using the index with an an explicit exists(), but @? was not…because I was still using an explicit exists.
In other words:
* @@ 'exists($ ?($.year == 1944))' Uses the index
* @? '$ ?(@.year == 1944)' Uses the index
* @? 'exists($ ?($.year == 1944))' Does not use the index
That last one presumably doesn’t work, because there is an implicit exists() around the exists(), making it `exists(exists($ ?($.year == 1944)))`, which returns true for every row (true and false both exists)! 🤦🏻♂️.
Anyway, if I have this right, I’d like to flesh out the docs a bit.
Issue 3: Index Use for Comparison
---------------------------------
From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path query would be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage of the GIN index, apparently the >= operator cannot:
david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
QUERY PLAN ---------------------------------------------------------------------------------------------------------
Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
Rows Removed by Filter: 36081
Planning Time: 1.864 ms
Execution Time: 36.338 ms
(5 rows)
Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of paths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is there perhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie -> 'year'`?
Thanks your your patience with my questions!
Best,
David
Hi David,
On 13/09/2023 02:16 CEST David E. Wheeler <david@justatheory.com> wrote:
CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;I have been confused as to the difference between @@ vs @?: Why do these
return different results?david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
id
----
(0 rows)david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
id
----
10
(1 row)I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
and from the suggestion I got there, it seems that @@ expects a boolean to be
returned by the path query, while @? wraps it in an implicit exists(). Is that
right?
That's also my understanding. We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1]/messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com. Maybe it's useful also.
If so, I’d like to submit a patch to the docs talking about this, and
suggesting the use of jsonb_path_query() to test paths to see if they return
a boolean or not.
+1
[1]: /messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com
--
Erik
Op 9/13/23 om 03:00 schreef Erik Wienhold:
Hi David,
On 13/09/2023 02:16 CEST David E. Wheeler <david@justatheory.com> wrote:
CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;I have been confused as to the difference between @@ vs @?: Why do these
return different results?david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
id
----
(0 rows)david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
id
----
10
(1 row)I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
and from the suggestion I got there, it seems that @@ expects a boolean to be
returned by the path query, while @? wraps it in an implicit exists(). Is that
right?That's also my understanding. We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1]. Maybe it's useful also.If so, I’d like to submit a patch to the docs talking about this, and
suggesting the use of jsonb_path_query() to test paths to see if they return
a boolean or not.+1
[1] /messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com
--
Erik
"All use of json*() functions preclude index usage."
That sentence is missing from the documentation.
Erik Rijkers
On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote:
"All use of json*() functions preclude index usage."
That sentence is missing from the documentation.
Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why would the corresponding functions behave the same?
D
p 9/13/23 om 22:01 schreef David E. Wheeler:
On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote:
"All use of json*() functions preclude index usage."
That sentence is missing from the documentation.
Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why would the corresponding functions behave the same?
D
Sorry, perhaps my reply was a bit off-topic.
But you mentioned perhaps touching the docs and
the not-use-of-index is just so unexpected.
Compare these two statements:
select count(id) from movies where
movie @? '$ ? (@.year == 2023)'
Time: 1.259 ms
(index used)
select count(id) from movies where
jsonb_path_match(movie, '$.year == 2023');
Time: 17.260 ms
(no index used - unexpectedly slower)
With these two indexes available:
using gin (movie);
using gin (movie jsonb_path_ops);
(REL_15_STABLE; but it's the same in HEAD and
the not-yet-committed SQL/JSON patches.)
Erik Rijkers
Erik Rijkers <er@xs4all.nl> writes:
p 9/13/23 om 22:01 schreef David E. Wheeler:
On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote:
"All use of json*() functions preclude index usage."
Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why would the corresponding functions behave the same?
Sorry, perhaps my reply was a bit off-topic.
But you mentioned perhaps touching the docs and
the not-use-of-index is just so unexpected.
Unexpected to who? I think the docs make it pretty plain that only
operators on indexed columns are considered as index qualifications.
Admittedly, 11.2 Index Types [1]https://www.postgresql.org/docs/current/indexes-types.html makes the point only by not
discussing any other case, but when you get to 11.10 Operator Classes
and Operator Families [2]https://www.postgresql.org/docs/current/indexes-opclass.html and discover that the entire index definition
mechanism is based around operators not functions, you should be able
to reach that conclusion. The point is made even more directly in
38.16 Interfacing Extensions to Indexes [3]https://www.postgresql.org/docs/current/xindex.html, though I'll concede
that that's not material I'd expect the average PG user to read.
As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
is pretty clear about what is or is not supported.
regards, tom lane
[1]: https://www.postgresql.org/docs/current/indexes-types.html
[2]: https://www.postgresql.org/docs/current/indexes-opclass.html
[3]: https://www.postgresql.org/docs/current/xindex.html
[4]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
On Sep 14, 2023, at 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
is pretty clear about what is or is not supported.
How do you feel about this note, then?
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index b6c2ddbf55..7dda727f0d 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -413,6 +413,13 @@ SELECT doc->'site_name' FROM websites
Two GIN <quote>operator classes</quote> are provided, offering different
performance and flexibility trade-offs.
</para>
+ <note>
+ <para>
+ As with all indexes, only operators on indexed columns are considered as
+ index qualifications. In other words, only <type>jsonb</type> operators can
+ take advantage of GIN indexes; <type>jsonb</type> functions cannot.
+ </para>
+ </note>
<para>
The default GIN operator class for <type>jsonb</type> supports queries with
the key-exists operators <literal>?</literal>, <literal>?|</literal>
Best,
David
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
That's also my understanding. We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1]. Maybe it's useful also.
Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be useful.
Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators.
On Sep 12, 2023, at 20:16, David E. Wheeler <david@justatheory.com> wrote:
Issue 3: Index Use for Comparison
---------------------------------From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path query would be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage of the GIN index, apparently the >= operator cannot:
david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
QUERY PLAN ---------------------------------------------------------------------------------------------------------
Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
Rows Removed by Filter: 36081
Planning Time: 1.864 ms
Execution Time: 36.338 ms
(5 rows)Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of paths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is there perhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie -> 'year'`?
Thanks,
David
"David E. Wheeler" <david@justatheory.com> writes:
On Sep 14, 2023, at 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
is pretty clear about what is or is not supported.
How do you feel about this note, then?
I think it's unnecessary. If we did consider it necessary,
why wouldn't just about every subsection in chapter 8 need
similar wording?
regards, tom lane
Op 9/15/23 om 22:27 schreef David E. Wheeler:
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
That's also my understanding. We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1]. Maybe it's useful also.Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be useful.
Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators.
On Sep 12, 2023, at 20:16, David E. Wheeler <david@justatheory.com> wrote:
Issue 3: Index Use for Comparison
---------------------------------From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path query would be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage of the GIN index, apparently the >= operator cannot:
david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
QUERY PLAN ---------------------------------------------------------------------------------------------------------
Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
Rows Removed by Filter: 36081
Planning Time: 1.864 ms
Execution Time: 36.338 ms
(5 rows)Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of paths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is there perhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie -> 'year'`?
movie @? '$ ?($.year >= 2023)'
I believe it is indeed not possible to have such a unequality-search use
the GIN index. It is another weakness of JSON that can be unexpected to
those not in the fullness of Knowledge of the manual. Yes, this too
would be good to explain in the doc where JSON indexes are explained.
Erik Rijkers
Show quoted text
Thanks,
David
On Sep 15, 2023, at 20:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think that that indicates that you're putting the info in the
wrong place. Perhaps the right answer is to insert something
more explicit in section 11.2, which is the first place where
we really spend any effort discussing what can be indexed.
Fair enough. How ’bout this?
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id);
B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension <link
linkend="bloom">bloom</link>.
Each index type uses a different
- algorithm that is best suited to different types of queries.
+ algorithm that is best suited to different types of queries and operators.
By default, the <link linkend="sql-createindex"><command>CREATE
INDEX</command></link> command creates
B-tree indexes, which fit the most common situations.
@@ -132,6 +132,14 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
</programlisting>
</para>
+ <note>
+ <para>
+ Only operators on indexed columns are considered as index qualifications.
+ Functions never qualify for index usage, aside from
+ <link linkend="indexes-expressional">indexes on expressions</link>.
+ </para>
+ </note>
+
<sect2 id="indexes-types-btree">
<title>B-Tree</title>
Import Notes
Reply to msg id not found: 3826235.1694824574@sss.pgh.pa.us
On Sep 15, 2023, at 23:59, Erik Rijkers <er@xs4all.nl> wrote:
movie @? '$ ?($.year >= 2023)'
I believe it is indeed not possible to have such a unequality-search use the GIN index. It is another weakness of JSON that can be unexpected to those not in the fullness of Knowledge of the manual. Yes, this too would be good to explain in the doc where JSON indexes are explained.
Is that a limitation of GIN indexes in general? Or could there be opclass improvements in the future that would enable such comparisons?
Thanks,
David
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
If so, I’d like to submit a patch to the docs talking about this, and
suggesting the use of jsonb_path_query() to test paths to see if they return
a boolean or not.+1
I’ve started work on this; there’s so much to learn! Here’s a new example that surprised me a bit. Using the GPS tracker example from the docs [1]https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH loaded into a `:json` psql variable, this output of this query makes perfect sense to me:
david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)');
jsonb_path_query
------------------
13.4034
13.2635
Because `[*]` selects all the values. This, however, I did not expect:
david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)');
jsonb_path_query
------------------
13.4034
13.2635
(2 rows)
I had expected it to return two single-value arrays, instead:
[13.4034]
[13.2635]
It appears that the filter expression is doing some sub-selection, too. Is that expected?
Best,
David
[1]: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH
On 16/09/2023 22:19 CEST David E. Wheeler <david@justatheory.com> wrote:
On Sep 15, 2023, at 23:59, Erik Rijkers <er@xs4all.nl> wrote:
movie @? '$ ?($.year >= 2023)'
I believe it is indeed not possible to have such a unequality-search use
the GIN index. It is another weakness of JSON that can be unexpected to
those not in the fullness of Knowledge of the manual. Yes, this too would
be good to explain in the doc where JSON indexes are explained.Is that a limitation of GIN indexes in general? Or could there be opclass
improvements in the future that would enable such comparisons?
This detail is mentioned in docs [1]https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING:
"For these operators, a GIN index extracts clauses of the form
**accessors_chain = constant** out of the jsonpath pattern, and does the
index search based on the keys and values mentioned in these clauses."
I don't know if this is a general limitation of GIN indexes or just how these
operators are implemented right now.
[1]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
--
Erik
On Sep 16, 2023, at 16:50, Erik Wienhold <ewie@ewie.name> wrote:
"For these operators, a GIN index extracts clauses of the form
**accessors_chain = constant** out of the jsonpath pattern, and does the
index search based on the keys and values mentioned in these clauses."I don't know if this is a general limitation of GIN indexes or just how these
operators are implemented right now.[1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
The detail that jumps out at me is this one on jsonb_path_ops:
“Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it”
Because jsonb_path_ops indexes hashes, I would assume it would only support path equality. But it’s not clear to me from these docs that jsonb_ops also indexes hashes. Does it?
Best,
D
On 16/09/2023 22:26 CEST David E. Wheeler <david@justatheory.com> wrote:
I’ve started work on this; there’s so much to learn! Here’s a new example
that surprised me a bit. Using the GPS tracker example from the docs [1]
loaded into a `:json` psql variable, this output of this query makes perfect
sense to me:david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)');
jsonb_path_query
------------------
13.4034
13.2635Because `[*]` selects all the values. This, however, I did not expect:
david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)');
jsonb_path_query
------------------
13.4034
13.2635
(2 rows)I had expected it to return two single-value arrays, instead:
[13.4034]
[13.2635]It appears that the filter expression is doing some sub-selection, too.
Is that expected?
Looks like the effect of lax mode which may unwrap arrays when necessary [1]https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES.
The array unwrapping looks like the result of jsonb_array_elements().
It kinda works in strict mode:
SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)');
jsonb_path_query
-----------------------
[47.763, 13.4034]
[47.706, 13.2635]
(2 rows)
But it does not remove elements from the matching arrays. Which I don't even
expect here because the path specifies the location array as the object to be
returned. The filter expression then only decides whether to return the
location array or not. Nowhere in the docs does it say that the filter
expression itself removes any elements from a matched array.
Here's a query that filter's out individual array elements. It's quite a
mouthful (especially to preserve the order of array elements):
WITH location AS (
SELECT loc, row_number() OVER () AS array_num
FROM jsonb_path_query(:'json', 'strict $.track.segments[*].location') loc
),
element AS (
SELECT array_num, e.num AS elem_num, e.elem
FROM location
CROSS JOIN jsonb_array_elements(loc) WITH ORDINALITY AS e (elem, num)
)
SELECT jsonb_agg(elem ORDER BY elem_num)
FROM element
WHERE jsonb_path_exists(elem, '$ ? (@ < 14)')
GROUP BY array_num;
jsonb_agg
---------------
[13.2635]
[13.4034]
(2 rows)
[1]: https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES
--
Erik
On Sep 16, 2023, at 18:13, Erik Wienhold <ewie@ewie.name> wrote:
Looks like the effect of lax mode which may unwrap arrays when necessary [1].
The array unwrapping looks like the result of jsonb_array_elements().It kinda works in strict mode:
SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)');
jsonb_path_query
-----------------------
[47.763, 13.4034]
[47.706, 13.2635]
(2 rows)But it does not remove elements from the matching arrays. Which I don't even
expect here because the path specifies the location array as the object to be
returned. The filter expression then only decides whether to return the
location array or not. Nowhere in the docs does it say that the filter
expression itself removes any elements from a matched array.
Yes, this is what I expected. It means “select the location array if any of its contents is less that 14.”
I don’t understand why it’s different in lax mode, though, as `@[*]` is not a structural error; it confirms to the schema, as the docs say. The flattening in this case seems weird.
Ah, here’s why:, from the docs:
"Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box.”
There follow some discussion of the need to specify `[*]` on segments in strict mode, but since that’s exactly what my example does (and the same for the locations array inside the filter), it doesn’t seem right to me that it would be unwrapped here.
Here's a query that filter's out individual array elements. It's quite a
mouthful (especially to preserve the order of array elements):
Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of thing!
Best,
David
"David E. Wheeler" <david@justatheory.com> writes:
On Sep 15, 2023, at 20:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think that that indicates that you're putting the info in the
wrong place. Perhaps the right answer is to insert something
more explicit in section 11.2, which is the first place where
we really spend any effort discussing what can be indexed.
Fair enough. How ’bout this?
After thinking about it for awhile, I think we need some more
discursive explanation of what's allowed, perhaps along the lines
of the attached. (I still can't shake the feeling that this is
duplicative; but I can't find anything comparable until you get
into the weeds in Section V.)
I put the new text at the end of section 11.1, but perhaps it
belongs a little further up in that section; it seems more
important than some of the preceding paras.
regards, tom lane
Attachments:
v1-document-indexable-clauses-better.patchtext/x-diff; charset=us-ascii; name=v1-document-indexable-clauses-better.patchDownload
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 55122129d5..1a0b003fb0 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -109,6 +109,39 @@ CREATE INDEX test1_id_index ON test1 (id);
Therefore indexes that are seldom or never used in queries
should be removed.
</para>
+
+ <para>
+ In general, <productname>PostgreSQL</productname> indexes can be used
+ to optimize queries that contain one or more <literal>WHERE</literal>
+ or <literal>JOIN</literal> clauses of the form
+
+<synopsis>
+<replaceable>indexed-column</replaceable> <replaceable>indexable-operator</replaceable> <replaceable>comparison-value</replaceable>
+</synopsis>
+
+ Here, the <replaceable>indexed-column</replaceable> is whatever
+ column or expression the index has been defined on.
+ The <replaceable>indexable-operator</replaceable> is an operator that
+ is a member of the index's <firstterm>operator class</firstterm> for
+ the indexed column. (More details about that appear below.)
+ And the <replaceable>comparison-value</replaceable> can be any
+ expression that is not volatile and does not reference the index's
+ table.
+ </para>
+
+ <para>
+ In some cases the query planner can extract an indexable clause of
+ this form from another SQL construct. A simple example is that if
+ the original clause was
+
+<synopsis>
+<replaceable>comparison-value</replaceable> <replaceable>operator</replaceable> <replaceable>indexed-column</replaceable>
+</synopsis>
+
+ then it can be flipped around into indexable form if the
+ original <replaceable>operator</replaceable> has a commutator
+ operator that is a member of the index's operator class.
+ </para>
</sect1>
@@ -120,7 +153,7 @@ CREATE INDEX test1_id_index ON test1 (id);
B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension <link
linkend="bloom">bloom</link>.
Each index type uses a different
- algorithm that is best suited to different types of queries.
+ algorithm that is best suited to different types of indexable clauses.
By default, the <link linkend="sql-createindex"><command>CREATE
INDEX</command></link> command creates
B-tree indexes, which fit the most common situations.
On Sep 17, 2023, at 12:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
After thinking about it for awhile, I think we need some more
discursive explanation of what's allowed, perhaps along the lines
of the attached. (I still can't shake the feeling that this is
duplicative; but I can't find anything comparable until you get
into the weeds in Section V.)I put the new text at the end of section 11.1, but perhaps it
belongs a little further up in that section; it seems more
important than some of the preceding paras.
I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable operators. True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar for a function call, they are in some sense the same thing.
A header might be useful, something like “What Counts as an indexable expression”.
Best,
David
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
and from the suggestion I got there, it seems that @@ expects a boolean to be
returned by the path query, while @? wraps it in an implicit exists(). Is that
right?That's also my understanding. We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1]. Maybe it's useful also.
Hi, finally getting back to this, still fiddling to figure out the differences. From the thread you reference [1]/messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com, is the point that @@ and jsonb_path_match() can only be properly used with a JSON Path expression that’s a predicate check?
If so, as far as I can tell, only exists() around the entire path query, or the deviation from the SQL standard that allows an expression to be a predicate?
This suggest to me that the "Only the first item of the result is taken into account” bit from the docs may not be quite right. Consider this example:
david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
jsonb_path_query
------------------
false
false
(2 rows)
david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
ERROR: single boolean result is expected
jsonb_path_match(), it turns out, only wants a single result. But furthermore perhaps the use of a filter predicate rather than a predicate expression for the entire path query is an error?
Curiously, @@ seems okay with it:
david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
?column?
----------
t
Not a predicate query, and somehow returns true even though the first item of the result is false? Is that how it should be?
Best,
David
[1]: /messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com
On 2023-10-09 01:13 +0200, David E. Wheeler write:
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote:
I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202),
and from the suggestion I got there, it seems that @@ expects a boolean to be
returned by the path query, while @? wraps it in an implicit exists(). Is that
right?That's also my understanding. We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1]. Maybe it's useful also.Hi, finally getting back to this, still fiddling to figure out the
differences. From the thread you reference [1], is the point that @@
and jsonb_path_match() can only be properly used with a JSON Path
expression that’s a predicate check?
I think so. That's also supported by the existing docs which only
mention "JSON path predicate" for @@ and jsonb_path_match().
If so, as far as I can tell, only exists() around the entire path
query, or the deviation from the SQL standard that allows an
expression to be a predicate?
Looks like that. But note that exists() is also a filter expression.
So wrapping the entire jsonpath in exists() is also a deviation from the
SQL standard which only allows predicates in filter expressions, i.e.
'<path> ? (<predicate>)'.
This suggest to me that the "Only the first item of the result is
taken into account” bit from the docs may not be quite right.
Yes, this was also the issue in the referenced thread[1]/messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com. I think my
suggesstion in [2]/messages/by-id/880194083.579916.1680598906819@office.mailbox.org explains it (as far as I understand it).
Consider this example:
david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
jsonb_path_query
------------------
false
false
(2 rows)david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
ERROR: single boolean result is expectedjsonb_path_match(), it turns out, only wants a single result. But
furthermore perhaps the use of a filter predicate rather than a
predicate expression for the entire path query is an error?
Yes, I think @@ and jsonb_path_match() should not be used with filter
expressions because the jsonpath returns whatever the path expression
yields (which may be an actual boolean value in the jsonb). The filter
expression only filters (as the name suggests) what the path expression
yields.
Curiously, @@ seems okay with it:
david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
?column?
----------
tNot a predicate query, and somehow returns true even though the first
item of the result is false? Is that how it should be?
Your example does a text search equivalent to:
select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)')
You forgot the cast to jsonb. jsonb @@ jsonpath actually returns null:
test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)';
?column?
----------
<null>
(1 row)
This matches the note right after the docs for @@:
"The jsonpath operators @? and @@ suppress the following errors: missing
object field or array element, unexpected JSON item type, datetime and
numeric errors. The jsonpath-related functions described below can also
be told to suppress these types of errors. This behavior might be
helpful when searching JSON document collections of varying structure."
That would be the silent argument of jsonb_path_match():
test=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ? (@[*] == false)', silent => true);
jsonb_path_match
------------------
<null>
(1 row)
[1]: /messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com
[2]: /messages/by-id/880194083.579916.1680598906819@office.mailbox.org
--
Erik
Thanks for the reply, Erik. Have appreciated collaborating with you on a few different things lately!
On Oct 13, 2023, at 22:50, Erik Wienhold <ewie@ewie.name> wrote:
Hi, finally getting back to this, still fiddling to figure out the
differences. From the thread you reference [1], is the point that @@
and jsonb_path_match() can only be properly used with a JSON Path
expression that’s a predicate check?I think so. That's also supported by the existing docs which only
mention "JSON path predicate" for @@ and jsonb_path_match().
Okay, good.
If so, as far as I can tell, only exists() around the entire path
query, or the deviation from the SQL standard that allows an
expression to be a predicate?Looks like that. But note that exists() is also a filter expression.
So wrapping the entire jsonpath in exists() is also a deviation from the
SQL standard which only allows predicates in filter expressions, i.e.
'<path> ? (<predicate>)'.
Yeah. I’m starting to get the sense that the Postgres extension of the standard to allow predicates without filters is almost a different thing, like there are two Pg SQL/JSON Path languages:
1. SQL Standard path language for selecting values and includes predicates. Returns the selected value(s). Supported by `@?` and jsonb_path_exists().
2. The Postgres predicate path language which returns a boolean, akin to a WHERE expression. Supported by `@@` and jsonb_path_match()
Both are supported by jsonb_path_query(), but if you use a standard path you get the values and if you use a predicate path you get a boolean. This feels a big overloaded to me, TBH; I find myself wanting them to be separate types since the behaviors vary quite a bit!
This suggest to me that the "Only the first item of the result is
taken into account” bit from the docs may not be quite right.Yes, this was also the issue in the referenced thread[1]. I think my
suggesstion in [2] explains it (as far as I understand it).
Yeah, lax vs. strict mode stuff definitely creates some added complexity. I see now I missed the rest of that thread; seeing the entire thread on one page[1]/messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com really helps. I’d like to take a stab at the doc improvements Tom suggests[2]/messages/by-id/1229727.1680535592@sss.pgh.pa.us.
jsonb_path_match(), it turns out, only wants a single result. But
furthermore perhaps the use of a filter predicate rather than a
predicate expression for the entire path query is an error?Yes, I think @@ and jsonb_path_match() should not be used with filter
expressions because the jsonpath returns whatever the path expression
yields (which may be an actual boolean value in the jsonb). The filter
expression only filters (as the name suggests) what the path expression
yields.
Agreed. It only gets worse with a filter expression that selects a single value:
david=# select jsonb_path_match('{"a":[false,true]}', '$.a ?(@[*] == false)');
jsonb_path_match
------------------
f
Presumably it returns false because the value selected is JSON `false`:
david=# select jsonb_path_query('{"a":[false,true]}', '$.a ?(@[*] == false)');
jsonb_path_query
------------------
false
Which seems misleading, frankly. Would it be possible to update jsonb_path_match and @@ to raise an error when the path expression is not a predicate?
Curiously, @@ seems okay with it:
david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
?column?
----------
tNot a predicate query, and somehow returns true even though the first
item of the result is false? Is that how it should be?Your example does a text search equivalent to:
select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)')
You forgot the cast to jsonb.
Oh good grief 🤦🏻♂️
jsonb @@ jsonpath actually returns null:
test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)';
?column?
----------
<null>
(1 row)
Yes, much better, though see the result above that returns a single `false` and confuses things.
This matches the note right after the docs for @@:
Yeah, that makes sense. But here’s a bit about lax mode[3]https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES that confuses me:
The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform to the expected schema. If an operand does not match the requirements of a particular operation, it can be automatically wrapped as an SQL/JSON array or unwrapped by converting its elements into an SQL/JSON sequence before performing this operation. Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box.
This automatic flattening in lax mode seems odd, because it means you get different results in strict and lax mode where there are no errors. In lax mode, you get a set:
david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)
But in strict mode, you get the array selected by `$.a`, which is more what I would expect:
david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)');
jsonb_path_query
------------------
[1, 2, 3, 4, 5]
This seems like an odd inconsistency in return values, but perhaps the standard calls for this? I don’t have access to it, but MSSQL docs[4]https://learn.microsoft.com/en-us/sql/relational-databases/json/json-path-expressions-sql-server?view=sql-server-ver16#PATHMODE, at least, say:
* In **lax** mode, the function returns empty values if the path expression contains an error. For example, if you request the value **$.name**, and the JSON text doesn't contain a **name** key, the function returns null, but does not raise an error.
* In **strict** mode, the function raises an error if the path expression contains an error.
No flattening, only error suppression. The Oracle docs[5]https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-path-expressions.html#GUID-8656CAB9-C293-4A99-BB62-F38F3CFC4C13 mention array flattening, but I don’t have it up and running to see if that means query *results* are flattened.
Best,
David
[1]: /messages/by-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA@mail.gmail.com
[2]: /messages/by-id/1229727.1680535592@sss.pgh.pa.us
[3]: https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES
On Sep 17, 2023, at 18:09, David E. Wheeler <david@justatheory.com> wrote:
I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable operators. True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar for a function call, they are in some sense the same thing.
A header might be useful, something like “What Counts as an indexable expression”.
Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it there.
Thanks,
David
"David E. Wheeler" <david@justatheory.com> writes:
Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it there.
I'd waited because the discussion was still active, and then it
kind of slipped off the radar. I'll take another look and push
some form of what I suggested. That doesn't really address the
jsonpath oddities you were on about, though.
regards, tom lane
On Dec 17, 2023, at 16:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'd waited because the discussion was still active, and then it
kind of slipped off the radar. I'll take another look and push
some form of what I suggested.
Right on.
That doesn't really address the
jsonpath oddities you were on about, though.
No, I attempted to address those in [a patch][1]https://commitfest.postgresql.org/45/4624/.
[1]: https://commitfest.postgresql.org/45/4624/
Best,
David