Patch: Add tsmatch JSONPath operator for granular Full Text Search
Hi,
in real-life I work a lot with json & fts search, here's a feature I've
always wished I had,
but never tackle it. Until yesterday that is.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');
This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB
structures—
solving issues with structural ambiguity and query complexity.
Currently, users must choose between two suboptimal paths for FTS-ing
nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key
(e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a
"Customer Review").
- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via
jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating
intermediate heap tuples.
One of the most significant advantages of tsmatch is its ability to
participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during
path traversal.
While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an
FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.
The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for
pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the
stability requirements of prepared statements.
This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to
like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery,
and phraseto_tsquery.
Attachments:
v1-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patchapplication/octet-stream; name=v1-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patchDownload+369-5
tsmatch_bench.outapplication/octet-stream; name=tsmatch_bench.outDownload
tsmatch_bench.sqlapplication/octet-stream; name=tsmatch_bench.sqlDownload+0-11
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com>
wrote:
Hi,
in real-life I work a lot with json & fts search, here's a feature I've
always wished I had,
but never tackle it. Until yesterday that is.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB
structures—
solving issues with structural ambiguity and query complexity.Currently, users must choose between two suboptimal paths for FTS-ing
nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key
(e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a
"Customer Review").- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via
jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating
intermediate heap tuples.One of the most significant advantages of tsmatch is its ability to
participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');In a flat vector, the association between "Alice" and "performance" is
lost.
tsmatch preserves this link by evaluating the FTS predicate in-place
during path traversal.While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an
FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for
pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the
stability requirements of prepared statements.This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to
like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery,
and phraseto_tsquery.
Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and
furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")
Attachments:
v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patchapplication/octet-stream; name=v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patchDownload+567-5
On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
Hi,in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
but never tackle it. Until yesterday that is.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB structures—
solving issues with structural ambiguity and query complexity.Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
Hi Florents,
Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com>
wrote:
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <
florents.tselai@gmail.com> wrote:
Hi,
in real-life I work a lot with json & fts search, here's a feature I've
always wished I had,
but never tackle it. Until yesterday that is.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" &&
@.body tsmatch "performance")');
This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONBstructures—
solving issues with structural ambiguity and query complexity.
Currently, users must choose between two suboptimal paths for FTS-ing
nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the samekey (e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a
"Customer Review").
- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document viajsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating
intermediate heap tuples.
One of the most significant advantages of tsmatch is its ability to
participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" &&
@.body tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is
lost.
tsmatch preserves this link by evaluating the FTS predicate in-place
during path traversal.
While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define
an FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.
The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching forpruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating thestability requirements of prepared statements.
This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar tolike_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery,
and phraseto_tsquery.
Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and
furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")
<v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>Hi Florents,
Grant pinged me about this. I can review it in coming days. Can you please
rebase it? I failed to apply to current master. Also, the CF reported a
failure test case, please take a look.
Hi Evan,
thanks for having a look. The conflict was due to the intro of
pg_fallthrough. Not related to this patch .
I noticed the failure too, but I'm having a hard time reproducing it tbh.
This fails for Debian Trixie with Meson. The same with Autoconf passes...
https://github.com/Florents-Tselai/postgres/runs/65098077968
Attachments:
v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patchapplication/octet-stream; name=v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patchDownload+567-5
On Feb 27, 2026, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:
On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
Hi,in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
but never tackle it. Until yesterday that is.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB structures—
solving issues with structural ambiguity and query complexity.Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
Hi Florents,
Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
Hi Evan,
thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .I noticed the failure too, but I'm having a hard time reproducing it tbh.
This fails for Debian Trixie with Meson. The same with Autoconf passes...https://github.com/Florents-Tselai/postgres/runs/65098077968
<v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
I have reviewed v3 and traced a few test cases. Here comes my review comments:
1
```
+ <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
```
For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional>
```
2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
```
Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column as the J in the first line.
3 - jsonpath_gram.y
```
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
```
Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version is a bit longer in terms of number of lines. So, up to you.
4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
```
makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR).
5 - jsonpath.h
```
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
```
tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in the same structure?
6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"
static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```
Why don’t put these includes to the header section together with other includes?
7 - jsonpath_exec.c
```
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser flag")));
+ }
```
This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().
8 - jsonpath_exec.c
```
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
```
While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
?column?
----------
(1 row)
```
I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across array items. Given the same tsconfig should apply to all array items, I think cxt should be reused.
9 - jsonpath_exec.c
```
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```
Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”.
10 - jsonpath_gram.y
```
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
```
When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR: invalid tsquery_parser value: "pss@"
LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
^
HINT: Valid values are "pl", "ph", and "w".
```
You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL terminated. I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
```
This change is also included in the attached diff file.
11 - jsonpath.c
```
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
```
Duplicate code. Looks like a copy-pasto.
12 - jsonpath.c
```
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
```
I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of appendBinaryStringInfo, it says that a trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/