Extract numeric filed in JSONB more effectively
Hi:
Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again. See
jsonb_object_field_text and JsonbValueAsText. However the binary format
of numeric in JSONB is compatible with the numeric in SQL, so I think we
can have an operator to extract the numeric directly. If the value of a
given
field is not a numeric data type, an error will be raised, this can be
documented.
In this patch, I added a new operator for this purpose, here is the
performance gain because of this.
create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;
current method:
select count(*) from tb where cast (a->>'a' as numeric) = 2;
167ms.
new method:
select count(*) from tb where a@->'a' = 2;
65ms.
Is this the right way to go? Testcase, document and catalog version are
updated.
--
Best Regards
Andy Fan
Attachments:
v1-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchapplication/octet-stream; name=v1-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchDownload+72-2
On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again.
Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
We have a cast from jsonb to numeric (jsonb_numeric in jsonb.c) that
does not require this additional (de)serialization through text.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:
On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn anumeric
to text first and then turn the text to numeric again.
Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
Thanks for this information! I didn't realize we have this function
already at [1].
https://www.postgresql.org/docs/15/functions-json.html
--
Best Regards
Andy Fan
Hi Matthias:
On Wed, Aug 2, 2023 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn anumeric
to text first and then turn the text to numeric again.
Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
Thanks for this information! I didn't realize we have this function
already at [1].
Hi:
I just found ((a->'a')::numeric) is not as effective as I expected.
First in the above expression we used jsonb_object_field which
returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
looks like a wastage.
Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.
Thirdly, I am not sure we need to do the NumericCopy automatically
in jsonb_numeric. an option in my mind is maybe we can leave this
to the caller? At least in the normal case (a->'a')::numeric, we don't
need this copy IIUC.
/*
* v.val.numeric points into jsonb body, so we need to make a copy to
* return
*/
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));
At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case. As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.
Here is the performance comparison (with -O3, my previous post is -O0).
select 1 from tb where (a->'a')::numeric = 2; 31ms.
select 1 from tb where (a@->'a') = 2; 15ms
--
Best Regards
Andy Fan
On Tue, Aug 1, 2023 at 12:39 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again. See
jsonb_object_field_text and JsonbValueAsText. However the binary format
of numeric in JSONB is compatible with the numeric in SQL, so I think we
can have an operator to extract the numeric directly. If the value of a given
field is not a numeric data type, an error will be raised, this can be
documented.In this patch, I added a new operator for this purpose, here is the
performance gain because of this.create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;current method:
select count(*) from tb where cast (a->>'a' as numeric) = 2;
167ms.new method:
select count(*) from tb where a@->'a' = 2;
65ms.Is this the right way to go? Testcase, document and catalog version are
updated.--
Best Regards
Andy Fan
return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?
Hi Jian:
return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?
Thanks for this reminder, a new patch is attached. and commitfest
entry is added as well[1]https://commitfest.postgresql.org/44/4476/. For recording purposes, I compared the
new operator with all the existing operators.
select 1 from tb where (a->'a')::numeric = 2; 30.56ms
select 1 from tb where (a->>'a')::numeric = 2; 29.43ms
select 1 from tb where (a@->'a') = 2; 14.80ms
[1]: https://commitfest.postgresql.org/44/4476/
--
Best Regards
Andy Fan
Attachments:
v2-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchapplication/octet-stream; name=v2-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchDownload+72-2
Hi
čt 3. 8. 2023 v 2:51 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi Jian:
return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?Thanks for this reminder, a new patch is attached. and commitfest
entry is added as well[1]. For recording purposes, I compared the
new operator with all the existing operators.select 1 from tb where (a->'a')::numeric = 2; 30.56ms
select 1 from tb where (a->>'a')::numeric = 2; 29.43ms
select 1 from tb where (a@->'a') = 2; 14.80ms
I don't like this solution because it is bloating operators and it is not
extra readable. For completeness you should implement cast for date, int,
boolean too. Next, the same problem is with XML or hstore type (probably
with any types that are containers).
It is strange so only casting is 2x slower. I don't like the idea so using
a special operator is 2x faster than common syntax for casting. It is a
signal, so there is a space for optimization. Black magic with special
operators is not user friendly for relatively common problems.
Maybe we can introduce some *internal operator* "extract to type", and in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)
Regards
Pavel
--
Show quoted text
Best Regards
Andy Fan
Hi Pavel:
Thanks for the feedback.
I don't like this solution because it is bloating operators and it is not
extra readable.
If we support it with cast, could we say we are bloating CAST? It is true
that it is not extra readable, if so how about a->>'a' return text?
Actually
I can't guess any meaning of the existing jsonb operations without
documentation.
For completeness you should implement cast for date, int, boolean too.
Next, the same problem is with XML or hstore type (probably with any types
that are containers).
I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance
reason, but not for all. At the same time, `int2/int4/int8` doesn't
have a binary compatibility type in jsonb. and the serialization
/deserialization for boolean is pretty cheap.
I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).
I don't like the idea so using a special operator is 2x faster than common
syntax for casting. It is a signal, so there is a space for optimization.
Black magic with special operators is not user friendly for relatively
common problems.
I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".
Maybe we can introduce some *internal operator* "extract to type", and in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)
Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?
After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, like
jsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');
it can save an operator and sloves the readable issue.
--
Best Regards
Andy Fan
Hi
čt 3. 8. 2023 v 9:53 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi Pavel:
Thanks for the feedback.
I don't like this solution because it is bloating operators and it is not
extra readable.
If we support it with cast, could we say we are bloating CAST? It is true
that it is not extra readable, if so how about a->>'a' return text?
Actually
I can't guess any meaning of the existing jsonb operations without
documentation.
yes, it can bloat CAST, but for usage we have already used syntax, and
these casts are cooked already:
(2023-08-03 11:04:51) postgres=# select castfunc::regprocedure from pg_cast
where castsource = 'jsonb'::regtype;
┌──────────────────┐
│ castfunc │
╞══════════════════╡
│ - │
│ bool(jsonb) │
│ "numeric"(jsonb) │
│ int2(jsonb) │
│ int4(jsonb) │
│ int8(jsonb) │
│ float4(jsonb) │
│ float8(jsonb) │
└──────────────────┘
(8 rows)
the operator ->> was a special case, the text type is special in postgres
as the most convertible type. And when you want to visualise a value or
display the value, you should convert value to text.
I can live with that because it is just one, but with your proposal opening
the doors for implementing tens of similar operators, I think it is bad.
Using ::target_type is common syntax and doesn't require reading
documentation.
More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.
Usage of cast is self documented.
For completeness you should implement cast for date, int, boolean too.
Next, the same problem is with XML or hstore type (probably with any types
that are containers).I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance
reason, but not for all. At the same time, `int2/int4/int8` doesn't
have a binary compatibility type in jsonb. and the serialization
/deserialization for boolean is pretty cheap.I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).I don't like the idea so using a special operator is 2x faster than
common syntax for casting. It is a signal, so there is a space for
optimization. Black magic with special operators is not user friendly for
relatively common problems.I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".
I used the term black magic, because nobody without reading documentation
can find this operator. It is used just for this special case, and the
functionality is the same as using cast (only with different performance).
The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on context
Can be nice to use some like `EXTRACT(YEAR FROM val->'field')` instead
`EXTRACT(YEAR FROM (val->>'field')::date)`
Maybe we can introduce some *internal operator* "extract to type", and in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?
It should be internal structure - it can be similar like COALESCE or IS
operator
After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, likejsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');it can save an operator and sloves the readable issue.
I don't like it too much, but it is better than introduction new operator
We already have the jsonb_extract_path and jsonb_extract_path_text
function.
I can imagine to usage "anyelement" type too. some like
`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`
Show quoted text
--
Best Regards
Andy Fan
On Wed, 2 Aug 2023 at 03:05, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi Matthias:
On Wed, Aug 2, 2023 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again.Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
Thanks for this information! I didn't realize we have this function
already at [1].Hi:
I just found ((a->'a')::numeric) is not as effective as I expected.
First in the above expression we used jsonb_object_field which
returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
looks like a wastage.
Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.
Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.
Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.
/*
* v.val.numeric points into jsonb body, so we need to make a copy to
* return
*/
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case.
But we don't have special cases for the other jsonb types - the one
that is available (text) is lossy and doesn't work reliably without
making sure the field we're accessing is actually a string, and not
any other type of value.
As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.
Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.
*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.
Here is the performance comparison (with -O3, my previous post is -O0).
select 1 from tb where (a->'a')::numeric = 2; 31ms.
select 1 from tb where (a@->'a') = 2; 15ms
What's tb here?
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Hi:
Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.
Are any people saying we would pre-specialize all possible operators?
I would say anything if adding operators will be expensive for binary and
initial database sizes. If so, how many per operator and how many
operators would be in your expectation?
Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.
If you take care about my context, I put this as a second factor for the
current strategy. and it is the side effects of factor 1. FWIW, that cost
is paid for every jsonb object, not something during the initial database.
As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.
Who is insisting on adding such an operator in your opinion?
*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.Here is the performance comparison (with -O3, my previous post is -O0).
select 1 from tb where (a->'a')::numeric = 2; 31ms.
select 1 from tb where (a@->'a') = 2; 15msWhat's tb here?
This is my first post. Copy it here again.
create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;
--
Best Regards
Andy Fan
On 2023-08-03 03:53, Andy Fan wrote:
I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).
Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.
Maybe we can introduce some *internal operator* "extract to type", and
in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?
Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something more efficient?
Regards,
-Chap
Hi:
More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.
This looks like a valid opinion IMO, but to rescue it, we have to do
something like "internal structure" and remove the existing cast.
But even we pay the effort, it still breaks some common knowledge,
since xx:numeric is not a cast. It is an "internal structure"!
I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".I used the term black magic, because nobody without reading documentation
can find this operator.
I think this is what document is used for..
It is used just for this special case, and the functionality is the same
as using cast (only with different performance).
This is not good, but I didn't see a better choice so far, see my first
graph.
The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on context
It would be cool but still I didn't see a way to do that without making
something else complex.
Maybe we can introduce some *internal operator* "extract to type", and
in rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?It should be internal structure - it can be similar like COALESCE or IS
operator
It may work, but see my answer in the first graph.
After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, likejsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');it can save an operator and sloves the readable issue.
I don't like it too much, but it is better than introduction new operator
Good to know it. Naming operators is a complex task if we add four.
We already have the jsonb_extract_path and jsonb_extract_path_text
function.
I can't follow this. jsonb_extract_path returns a jsonb, which is far
away from
our goal: return a numeric effectively?
I can imagine to usage "anyelement" type too. some like
`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`
Can you elaborate this please?
--
Best Regards
Andy Fan
Hi:
On Thu, Aug 3, 2023 at 8:34 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-08-03 03:53, Andy Fan wrote:
I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.
I don't think this is a key conflict so far. but I'd explain this in more
detail. If timestamp -> timestamptz or timestamptz -> timestamp is
binary compatible, we can only have 1 operator to return a timestamp.
then when we cast it to timestamptz, it will be a no-op during runtime.
however cast between timestamp and timestamptz is not binary
compatible. whose castmethod is 'f';
Maybe we can introduce some *internal operator* "extract to type", and
in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something efficient?
I didn't realize this before, 'planner support function' looks
amazing and SupportRequestSimplify looks promising, I will check it
more.
--
Best Regards
Andy Fan
Hi
čt 3. 8. 2023 v 15:23 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi:
More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.This looks like a valid opinion IMO, but to rescue it, we have to do
something like "internal structure" and remove the existing cast.
But even we pay the effort, it still breaks some common knowledge,
since xx:numeric is not a cast. It is an "internal structure"!
I didn't study jsonb function, but there is an xml function that extracts
value and next casts it to some target type. It does what is expected - for
known types use hard coded casts, for other ask system catalog for cast
function or does IO cast. This code is used for the XMLTABLE function. The
JSON_TABLE function is not implemented yet, but there should be similar
code. If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.
I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".I used the term black magic, because nobody without reading documentation
can find this operator.I think this is what document is used for..
It is used just for this special case, and the functionality is the same
as using cast (only with different performance).This is not good, but I didn't see a better choice so far, see my first
graph.The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on contextIt would be cool but still I didn't see a way to do that without making
something else complex.
sure - it is significantly more work, but it should be usable for all types
and just use common syntax. The custom @-> operator you can implement in
your own custom extension. Builtin solutions should be generic as it is
possible.
The things should be as simple as possible - mainly for users, that missing
knowledge, and any other possibility of how to do some task just increases
their confusion. Can be nice if users find one solution on stack overflow
and this solution should be great for performance too. It is worse if users
find more solutions, but it is not too bad, if these solutions have similar
performance. It is too bad if any solution has great performance and others
not too much. Users has not internal knowledge, and then don't understand
why sometimes should to use special operator and not common syntax.
Maybe we can introduce some *internal operator* "extract to type", and
in rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?It should be internal structure - it can be similar like COALESCE or IS
operatorIt may work, but see my answer in the first graph.
After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, likejsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');it can save an operator and sloves the readable issue.
I don't like it too much, but it is better than introduction new operator
Good to know it. Naming operators is a complex task if we add four.
We already have the jsonb_extract_path and jsonb_extract_path_text
function.I can't follow this. jsonb_extract_path returns a jsonb, which is far
away from
our goal: return a numeric effectively?
I proposed `jsonb_extract_path_type` that is of anyelement type.
Regards
Pavel
Show quoted text
I can imagine to usage "anyelement" type too. some like
`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`
Can you elaborate this please?
--
Best Regards
Andy Fan
Hi:
If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.
Can you point to me where the code is for the XML stuff? I thought
this is a bad idea but I may accept it if some existing code does
such a thing already. "such thing" is typeA:typeB is
converted something else but user can't find out an entry in
pg_cast for typeA to typeB.
It would be cool but still I didn't see a way to do that without making
something else complex.
The custom @-> operator you can implement in your own custom extension.
Builtin solutions should be generic as it is possible.
I agree, but actually I think there is no clean way to do it, at least I
dislike the conversion of typeA to typeB in a cast syntax but there
is no entry in pg_cast for it. Are you saying something like this
or I misunderstood you?
--
Best Regards
Andy Fan
Andy Fan <zhihui.fan1213@gmail.com> writes:
If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.
Can you point to me where the code is for the XML stuff?
I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
syntax --- but count on the SQL committee to do it that way :-(.
As far as the current discussion goes, I'm strongly against
introducing new functions or operators to do the same things that
we already have perfectly good syntax for. "There's more than one
way to do it" isn't necessarily a virtue, and for sure it isn't a
virtue if people have to rewrite their existing queries to make use
of your optimization.
Also, why stop at optimizing "(jsonbval->'fld')::sometype"? There are
many other extraction cases that people might wish were faster, such
as json array indexing, nested fields, etc. It certainly won't make
sense to introduce yet another set of functions for each pattern you
want to optimize --- or at least, we won't want to ask users to change
their queries to invoke those functions explicitly.
I do like the idea of attaching a Simplify planner support function
to jsonb_numeric (and any other ones that seem worth optimizing)
that can convert a stack of jsonb transformations into a bundled
operation that avoids unnecessary conversions. Then you get the
speedup without any need for people to change their existing queries.
We'd still have functions like jsonb_field_as_numeric() under the
hood, but there's not an expectation that users call them explicitly.
(Alternatively, the output of this Simplify could be a new kind of
expression node that bundles one or more jsonb extractions with a
type conversion. I don't have an opinion yet on which way is better.)
regards, tom lane
On Thu, Aug 3, 2023 at 6:04 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free./*
* v.val.numeric points into jsonb body, so we need to make a copy to
* return
*/
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case.What's tb here?
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).
also it works accurately either jsonb is in the page or in toast schema chunks.
I don't understand why we need to allocate an intermediate result
part. since you cannot directly update a jsonb value field.
This function is not easy to find out...
select numeric('{"a":11}'->'a'); --fail.
select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail
select "numeric"('{"a":11}'::jsonb->'a'); --ok
čt 3. 8. 2023 v 16:27 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi:
If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.Can you point to me where the code is for the XML stuff? I thought
this is a bad idea but I may accept it if some existing code does
such a thing already. "such thing" is typeA:typeB is
converted something else but user can't find out an entry in
pg_cast for typeA to typeB.
in XML there is src/backend/utils/adt/xml.c, the XmlTableGetValue routine.
It is not an internal transformation - and from XML type to some else.
you can look at parser - parse_expr, parse_func. You can watch the
lifecycle of :: operator. There are transformations of nodes to different
nodes
you can look to patches related to SQL/JSON (not fully committed yet) and
json_table
It would be cool but still I didn't see a way to do that without making
something else complex.
The custom @-> operator you can implement in your own custom extension.
Builtin solutions should be generic as it is possible.I agree, but actually I think there is no clean way to do it, at least I
dislike the conversion of typeA to typeB in a cast syntax but there
is no entry in pg_cast for it. Are you saying something like this
or I misunderstood you?
There is not any possibility of user level space. The conversions should
be supported by cast from pg_cast, where it is possible. When it is
impossible, then you can raise an exception in some strict mode, or you can
do IO cast. But this is not hard part
You should to teach parser to push type info deeper to some nodes about
expected result
(2023-08-04 05:28:36) postgres=# select ('{"a":2,
"b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│ 2 │
└─────────┘
(1 row)
(2023-08-04 05:28:36) postgres=# select ('{"a":2,
"b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│ 2 │
└─────────┘
(1 row)
(2023-08-04 05:28:41) postgres=# select ('{"a":2,
"b":"nazdar"}'::jsonb)['a']::int;
┌──────┐
│ int4 │
╞══════╡
│ 2 │
└──────┘
(1 row)
when the parser iterates over the expression, it crosses ::type node first,
so you have information about the target type. Currently this information
is used when the parser is going back and when the source type is the same
as the target type, the cast can be ignored. Probably it needs to add some
flag to the operator if they are able to use this. Maybe it can be a new
third argument with an expected type. So new kinds of op functions can look
like opfx("any", "any", anyelement) returns anyelement. Maybe you find
another possibility. It can be invisible for me (or for you) now.
It is much more work, but the benefits will be generic. I think this is an
important part for container types, so partial fix is not good, and it
requires a system solution. The performance is important, but without
generic solutions, the complexity increases, and this is a much bigger
problem.
Regards
Pavel
Show quoted text
--
Best Regards
Andy Fan
Hi:
can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).
Thanks for the confirmation.
This function is not easy to find out...
select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail
jsonb_numeric is a prosrc rather than a proname, that's why you
can't call them directly.
select * from pg_proc where prosrc = 'jsonb_numeric';
select * from pg_proc where proname = 'jsonb_numeric';
It is bound to "numeric"(jsonb) cast, so we can call it with
a->'a'::numeric.
select numeric('{"a":11}'->'a'); --fail.
select "numeric"('{"a":11}'::jsonb->'a'); --ok
The double quotes look weird to me. but it looks like a common situation.
select numeric('1'::int); -- failed.
select "numeric"('1'::int); -- ok.
--
Best Regards
Andy Fan