Extract numeric filed in JSONB more effectively

Started by Andy Fanover 2 years ago110 messageshackers
Jump to latest
#1Andy Fan
zhihui.fan1213@gmail.com

Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again. See
jsonb_object_field_text and JsonbValueAsText. However the binary format
of numeric in JSONB is compatible with the numeric in SQL, so I think we
can have an operator to extract the numeric directly. If the value of a
given
field is not a numeric data type, an error will be raised, this can be
documented.

In this patch, I added a new operator for this purpose, here is the
performance gain because of this.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;

current method:
select count(*) from tb where cast (a->>'a' as numeric) = 2;
167ms.

new method:
select count(*) from tb where a@->'a' = 2;
65ms.

Is this the right way to go? Testcase, document and catalog version are
updated.

--
Best Regards
Andy Fan

Attachments:

v1-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchapplication/octet-stream; name=v1-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchDownload+72-2
#2Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andy Fan (#1)
Re: Extract numeric filed in JSONB more effectively

On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
We have a cast from jsonb to numeric (jsonb_numeric in jsonb.c) that
does not require this additional (de)serialization through text.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#3Andy Fan
zhihui.fan1213@gmail.com
In reply to: Matthias van de Meent (#2)
Re: Extract numeric filed in JSONB more effectively

On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:

On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a

numeric

to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?

Thanks for this information! I didn't realize we have this function
already at [1].

https://www.postgresql.org/docs/15/functions-json.html

--
Best Regards
Andy Fan

#4Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#3)
Re: Extract numeric filed in JSONB more effectively

Hi Matthias:

On Wed, Aug 2, 2023 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:

On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a

numeric

to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?

Thanks for this information! I didn't realize we have this function
already at [1].

https://www.postgresql.org/docs/15/functions-json.html

Hi:

I just found ((a->'a')::numeric) is not as effective as I expected.

First in the above expression we used jsonb_object_field which
returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
looks like a wastage.

Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.

Thirdly, I am not sure we need to do the NumericCopy automatically
in jsonb_numeric. an option in my mind is maybe we can leave this
to the caller? At least in the normal case (a->'a')::numeric, we don't
need this copy IIUC.

/*
* v.val.numeric points into jsonb body, so we need to make a copy to
* return
*/
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));

At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case. As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.

Here is the performance comparison (with -O3, my previous post is -O0).

select 1 from tb where (a->'a')::numeric = 2; 31ms.
select 1 from tb where (a@->'a') = 2; 15ms

--
Best Regards
Andy Fan

#5jian he
jian.universality@gmail.com
In reply to: Andy Fan (#1)
Re: Extract numeric filed in JSONB more effectively

On Tue, Aug 1, 2023 at 12:39 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again. See
jsonb_object_field_text and JsonbValueAsText. However the binary format
of numeric in JSONB is compatible with the numeric in SQL, so I think we
can have an operator to extract the numeric directly. If the value of a given
field is not a numeric data type, an error will be raised, this can be
documented.

In this patch, I added a new operator for this purpose, here is the
performance gain because of this.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;

current method:
select count(*) from tb where cast (a->>'a' as numeric) = 2;
167ms.

new method:
select count(*) from tb where a@->'a' = 2;
65ms.

Is this the right way to go? Testcase, document and catalog version are
updated.

--
Best Regards
Andy Fan

return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?

#6Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#5)
Re: Extract numeric filed in JSONB more effectively

Hi Jian:

return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?

Thanks for this reminder, a new patch is attached. and commitfest
entry is added as well[1]https://commitfest.postgresql.org/44/4476/. For recording purposes, I compared the
new operator with all the existing operators.

select 1 from tb where (a->'a')::numeric = 2; 30.56ms
select 1 from tb where (a->>'a')::numeric = 2; 29.43ms
select 1 from tb where (a@->'a') = 2; 14.80ms

[1]: https://commitfest.postgresql.org/44/4476/

--
Best Regards
Andy Fan

Attachments:

v2-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchapplication/octet-stream; name=v2-0001-Add-jsonb-operator-to-return-a-numeric-directly.patchDownload+72-2
#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#6)
Re: Extract numeric filed in JSONB more effectively

Hi

čt 3. 8. 2023 v 2:51 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

Hi Jian:

return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?

Thanks for this reminder, a new patch is attached. and commitfest
entry is added as well[1]. For recording purposes, I compared the
new operator with all the existing operators.

select 1 from tb where (a->'a')::numeric = 2; 30.56ms
select 1 from tb where (a->>'a')::numeric = 2; 29.43ms
select 1 from tb where (a@->'a') = 2; 14.80ms

[1] https://commitfest.postgresql.org/44/4476/

I don't like this solution because it is bloating operators and it is not
extra readable. For completeness you should implement cast for date, int,
boolean too. Next, the same problem is with XML or hstore type (probably
with any types that are containers).

It is strange so only casting is 2x slower. I don't like the idea so using
a special operator is 2x faster than common syntax for casting. It is a
signal, so there is a space for optimization. Black magic with special
operators is not user friendly for relatively common problems.

Maybe we can introduce some *internal operator* "extract to type", and in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Regards

Pavel

--

Show quoted text

Best Regards
Andy Fan

#8Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#7)
Re: Extract numeric filed in JSONB more effectively

Hi Pavel:

Thanks for the feedback.

I don't like this solution because it is bloating operators and it is not

extra readable.

If we support it with cast, could we say we are bloating CAST? It is true
that it is not extra readable, if so how about a->>'a' return text?
Actually
I can't guess any meaning of the existing jsonb operations without
documentation.

For completeness you should implement cast for date, int, boolean too.

Next, the same problem is with XML or hstore type (probably with any types
that are containers).

I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance
reason, but not for all. At the same time, `int2/int4/int8` doesn't
have a binary compatibility type in jsonb. and the serialization
/deserialization for boolean is pretty cheap.

I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).

I don't like the idea so using a special operator is 2x faster than common
syntax for casting. It is a signal, so there is a space for optimization.
Black magic with special operators is not user friendly for relatively
common problems.

I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".

Maybe we can introduce some *internal operator* "extract to type", and in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?

After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');

it can save an operator and sloves the readable issue.

--
Best Regards
Andy Fan

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#8)
Re: Extract numeric filed in JSONB more effectively

Hi

čt 3. 8. 2023 v 9:53 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

Hi Pavel:

Thanks for the feedback.

I don't like this solution because it is bloating operators and it is not

extra readable.

If we support it with cast, could we say we are bloating CAST? It is true
that it is not extra readable, if so how about a->>'a' return text?
Actually
I can't guess any meaning of the existing jsonb operations without
documentation.

yes, it can bloat CAST, but for usage we have already used syntax, and
these casts are cooked already:

(2023-08-03 11:04:51) postgres=# select castfunc::regprocedure from pg_cast
where castsource = 'jsonb'::regtype;
┌──────────────────┐
│ castfunc │
╞══════════════════╡
│ - │
│ bool(jsonb) │
│ "numeric"(jsonb) │
│ int2(jsonb) │
│ int4(jsonb) │
│ int8(jsonb) │
│ float4(jsonb) │
│ float8(jsonb) │
└──────────────────┘
(8 rows)

the operator ->> was a special case, the text type is special in postgres
as the most convertible type. And when you want to visualise a value or
display the value, you should convert value to text.

I can live with that because it is just one, but with your proposal opening
the doors for implementing tens of similar operators, I think it is bad.
Using ::target_type is common syntax and doesn't require reading
documentation.

More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.

Usage of cast is self documented.

For completeness you should implement cast for date, int, boolean too.

Next, the same problem is with XML or hstore type (probably with any types
that are containers).

I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance
reason, but not for all. At the same time, `int2/int4/int8` doesn't
have a binary compatibility type in jsonb. and the serialization
/deserialization for boolean is pretty cheap.

I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).

I don't like the idea so using a special operator is 2x faster than
common syntax for casting. It is a signal, so there is a space for
optimization. Black magic with special operators is not user friendly for
relatively common problems.

I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".

I used the term black magic, because nobody without reading documentation
can find this operator. It is used just for this special case, and the
functionality is the same as using cast (only with different performance).

The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on context

Can be nice to use some like `EXTRACT(YEAR FROM val->'field')` instead
`EXTRACT(YEAR FROM (val->>'field')::date)`

Maybe we can introduce some *internal operator* "extract to type", and in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or IS
operator

After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');

it can save an operator and sloves the readable issue.

I don't like it too much, but it is better than introduction new operator

We already have the jsonb_extract_path and jsonb_extract_path_text
function.

I can imagine to usage "anyelement" type too. some like
`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`

Show quoted text

--
Best Regards
Andy Fan

#10Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andy Fan (#4)
Re: Extract numeric filed in JSONB more effectively

On Wed, 2 Aug 2023 at 03:05, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi Matthias:

On Wed, Aug 2, 2023 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression: cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?

Thanks for this information! I didn't realize we have this function
already at [1].

https://www.postgresql.org/docs/15/functions-json.html

Hi:

I just found ((a->'a')::numeric) is not as effective as I expected.

First in the above expression we used jsonb_object_field which
returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
looks like a wastage.

Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.

Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.

Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.

/*
* v.val.numeric points into jsonb body, so we need to make a copy to
* return
*/
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));

At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case.

But we don't have special cases for the other jsonb types - the one
that is available (text) is lossy and doesn't work reliably without
making sure the field we're accessing is actually a string, and not
any other type of value.

As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.

Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.

*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.

Here is the performance comparison (with -O3, my previous post is -O0).

select 1 from tb where (a->'a')::numeric = 2; 31ms.
select 1 from tb where (a@->'a') = 2; 15ms

What's tb here?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#11Andy Fan
zhihui.fan1213@gmail.com
In reply to: Matthias van de Meent (#10)
Re: Extract numeric filed in JSONB more effectively

Hi:

Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.

Are any people saying we would pre-specialize all possible operators?
I would say anything if adding operators will be expensive for binary and
initial database sizes. If so, how many per operator and how many
operators would be in your expectation?

Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.

Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.

If you take care about my context, I put this as a second factor for the
current strategy. and it is the side effects of factor 1. FWIW, that cost
is paid for every jsonb object, not something during the initial database.

As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.

Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.

Who is insisting on adding such an operator in your opinion?

*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.

Here is the performance comparison (with -O3, my previous post is -O0).

select 1 from tb where (a->'a')::numeric = 2; 31ms.
select 1 from tb where (a@->'a') = 2; 15ms

What's tb here?

This is my first post. Copy it here again.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;

--
Best Regards
Andy Fan

#12Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#8)
Re: Extract numeric filed in JSONB more effectively

On 2023-08-03 03:53, Andy Fan wrote:

I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).

Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.

Maybe we can introduce some *internal operator* "extract to type", and
in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?

Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something more efficient?

Regards,
-Chap

#13Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#9)
Re: Extract numeric filed in JSONB more effectively

Hi:

More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.

This looks like a valid opinion IMO, but to rescue it, we have to do
something like "internal structure" and remove the existing cast.
But even we pay the effort, it still breaks some common knowledge,
since xx:numeric is not a cast. It is an "internal structure"!

I don't think "Black magic" is a proper word here, since it is not much

different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".

I used the term black magic, because nobody without reading documentation
can find this operator.

I think this is what document is used for..

It is used just for this special case, and the functionality is the same
as using cast (only with different performance).

This is not good, but I didn't see a better choice so far, see my first
graph.

The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on context

It would be cool but still I didn't see a way to do that without making
something else complex.

Maybe we can introduce some *internal operator* "extract to type", and
in rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or IS
operator

It may work, but see my answer in the first graph.

After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');

it can save an operator and sloves the readable issue.

I don't like it too much, but it is better than introduction new operator

Good to know it. Naming operators is a complex task if we add four.

We already have the jsonb_extract_path and jsonb_extract_path_text
function.

I can't follow this. jsonb_extract_path returns a jsonb, which is far
away from
our goal: return a numeric effectively?

I can imagine to usage "anyelement" type too. some like

`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`

Can you elaborate this please?

--
Best Regards
Andy Fan

#14Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#12)
Re: Extract numeric filed in JSONB more effectively

Hi:

On Thu, Aug 3, 2023 at 8:34 PM Chapman Flack <chap@anastigmatix.net> wrote:

On 2023-08-03 03:53, Andy Fan wrote:

I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).

Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.

I don't think this is a key conflict so far. but I'd explain this in more
detail. If timestamp -> timestamptz or timestamptz -> timestamp is
binary compatible, we can only have 1 operator to return a timestamp.
then when we cast it to timestamptz, it will be a no-op during runtime.
however cast between timestamp and timestamptz is not binary
compatible. whose castmethod is 'f';

Maybe we can introduce some *internal operator* "extract to type", and
in
rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?

Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something efficient?

I didn't realize this before, 'planner support function' looks
amazing and SupportRequestSimplify looks promising, I will check it
more.

--
Best Regards
Andy Fan

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#13)
Re: Extract numeric filed in JSONB more effectively

Hi

čt 3. 8. 2023 v 15:23 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

Hi:

More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.

This looks like a valid opinion IMO, but to rescue it, we have to do
something like "internal structure" and remove the existing cast.
But even we pay the effort, it still breaks some common knowledge,
since xx:numeric is not a cast. It is an "internal structure"!

I didn't study jsonb function, but there is an xml function that extracts
value and next casts it to some target type. It does what is expected - for
known types use hard coded casts, for other ask system catalog for cast
function or does IO cast. This code is used for the XMLTABLE function. The
JSON_TABLE function is not implemented yet, but there should be similar
code. If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.

I don't think "Black magic" is a proper word here, since it is not much

different from ->> return a text. If you argue text can be cast to
most-of-types, that would be a reason, but I doubt this difference
should generate a "black magic".

I used the term black magic, because nobody without reading documentation
can find this operator.

I think this is what document is used for..

It is used just for this special case, and the functionality is the same
as using cast (only with different performance).

This is not good, but I didn't see a better choice so far, see my first
graph.

The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on context

It would be cool but still I didn't see a way to do that without making
something else complex.

sure - it is significantly more work, but it should be usable for all types
and just use common syntax. The custom @-> operator you can implement in
your own custom extension. Builtin solutions should be generic as it is
possible.

The things should be as simple as possible - mainly for users, that missing
knowledge, and any other possibility of how to do some task just increases
their confusion. Can be nice if users find one solution on stack overflow
and this solution should be great for performance too. It is worse if users
find more solutions, but it is not too bad, if these solutions have similar
performance. It is too bad if any solution has great performance and others
not too much. Users has not internal knowledge, and then don't understand
why sometimes should to use special operator and not common syntax.

Maybe we can introduce some *internal operator* "extract to type", and
in rewrite stage we can the pattern (x->'field')::type transform to OP(x,
'field', typid)

Not sure what the OP should be? If it is a function, what is the
return value? It looks to me like it is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or IS
operator

It may work, but see my answer in the first graph.

After all, if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname')
jsonb_field_as_timestamp(jsonb, 'filedname');
jsonb_field_as_timestamptz(jsonb, 'filedname');
jsonb_field_as_date(jsonb, 'filedname');

it can save an operator and sloves the readable issue.

I don't like it too much, but it is better than introduction new operator

Good to know it. Naming operators is a complex task if we add four.

We already have the jsonb_extract_path and jsonb_extract_path_text
function.

I can't follow this. jsonb_extract_path returns a jsonb, which is far
away from
our goal: return a numeric effectively?

I proposed `jsonb_extract_path_type` that is of anyelement type.

Regards

Pavel

Show quoted text

I can imagine to usage "anyelement" type too. some like

`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`

Can you elaborate this please?

--
Best Regards
Andy Fan

#16Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#15)
Re: Extract numeric filed in JSONB more effectively

Hi:

If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.

Can you point to me where the code is for the XML stuff? I thought
this is a bad idea but I may accept it if some existing code does
such a thing already. "such thing" is typeA:typeB is
converted something else but user can't find out an entry in
pg_cast for typeA to typeB.

It would be cool but still I didn't see a way to do that without making

something else complex.

The custom @-> operator you can implement in your own custom extension.
Builtin solutions should be generic as it is possible.

I agree, but actually I think there is no clean way to do it, at least I
dislike the conversion of typeA to typeB in a cast syntax but there
is no entry in pg_cast for it. Are you saying something like this
or I misunderstood you?

--
Best Regards
Andy Fan

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Fan (#16)
Re: Extract numeric filed in JSONB more effectively

Andy Fan <zhihui.fan1213@gmail.com> writes:

If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.

Can you point to me where the code is for the XML stuff?

I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
syntax --- but count on the SQL committee to do it that way :-(.

As far as the current discussion goes, I'm strongly against
introducing new functions or operators to do the same things that
we already have perfectly good syntax for. "There's more than one
way to do it" isn't necessarily a virtue, and for sure it isn't a
virtue if people have to rewrite their existing queries to make use
of your optimization.

Also, why stop at optimizing "(jsonbval->'fld')::sometype"? There are
many other extraction cases that people might wish were faster, such
as json array indexing, nested fields, etc. It certainly won't make
sense to introduce yet another set of functions for each pattern you
want to optimize --- or at least, we won't want to ask users to change
their queries to invoke those functions explicitly.

I do like the idea of attaching a Simplify planner support function
to jsonb_numeric (and any other ones that seem worth optimizing)
that can convert a stack of jsonb transformations into a bundled
operation that avoids unnecessary conversions. Then you get the
speedup without any need for people to change their existing queries.
We'd still have functions like jsonb_field_as_numeric() under the
hood, but there's not an expectation that users call them explicitly.
(Alternatively, the output of this Simplify could be a new kind of
expression node that bundles one or more jsonb extractions with a
type conversion. I don't have an opinion yet on which way is better.)

regards, tom lane

#18jian he
jian.universality@gmail.com
In reply to: Matthias van de Meent (#10)
Re: Extract numeric filed in JSONB more effectively

On Thu, Aug 3, 2023 at 6:04 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.

/*
* v.val.numeric points into jsonb body, so we need to make a copy to
* return
*/
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));

At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case.

What's tb here?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).
also it works accurately either jsonb is in the page or in toast schema chunks.
I don't understand why we need to allocate an intermediate result
part. since you cannot directly update a jsonb value field.

This function is not easy to find out...
select numeric('{"a":11}'->'a'); --fail.
select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail
select "numeric"('{"a":11}'::jsonb->'a'); --ok

#19Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#16)
Re: Extract numeric filed in JSONB more effectively

čt 3. 8. 2023 v 16:27 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

Hi:

If you use explicit cast, then the code should not be hard, in the
rewrite stage all information should be known.

Can you point to me where the code is for the XML stuff? I thought
this is a bad idea but I may accept it if some existing code does
such a thing already. "such thing" is typeA:typeB is
converted something else but user can't find out an entry in
pg_cast for typeA to typeB.

in XML there is src/backend/utils/adt/xml.c, the XmlTableGetValue routine.
It is not an internal transformation - and from XML type to some else.

you can look at parser - parse_expr, parse_func. You can watch the
lifecycle of :: operator. There are transformations of nodes to different
nodes

you can look to patches related to SQL/JSON (not fully committed yet) and
json_table

It would be cool but still I didn't see a way to do that without making

something else complex.

The custom @-> operator you can implement in your own custom extension.
Builtin solutions should be generic as it is possible.

I agree, but actually I think there is no clean way to do it, at least I
dislike the conversion of typeA to typeB in a cast syntax but there
is no entry in pg_cast for it. Are you saying something like this
or I misunderstood you?

There is not any possibility of user level space. The conversions should
be supported by cast from pg_cast, where it is possible. When it is
impossible, then you can raise an exception in some strict mode, or you can
do IO cast. But this is not hard part

You should to teach parser to push type info deeper to some nodes about
expected result

(2023-08-04 05:28:36) postgres=# select ('{"a":2,
"b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│ 2 │
└─────────┘
(1 row)

(2023-08-04 05:28:36) postgres=# select ('{"a":2,
"b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│ 2 │
└─────────┘
(1 row)

(2023-08-04 05:28:41) postgres=# select ('{"a":2,
"b":"nazdar"}'::jsonb)['a']::int;
┌──────┐
│ int4 │
╞══════╡
│ 2 │
└──────┘
(1 row)

when the parser iterates over the expression, it crosses ::type node first,
so you have information about the target type. Currently this information
is used when the parser is going back and when the source type is the same
as the target type, the cast can be ignored. Probably it needs to add some
flag to the operator if they are able to use this. Maybe it can be a new
third argument with an expected type. So new kinds of op functions can look
like opfx("any", "any", anyelement) returns anyelement. Maybe you find
another possibility. It can be invisible for me (or for you) now.

It is much more work, but the benefits will be generic. I think this is an
important part for container types, so partial fix is not good, and it
requires a system solution. The performance is important, but without
generic solutions, the complexity increases, and this is a much bigger
problem.

Regards

Pavel

Show quoted text

--
Best Regards
Andy Fan

#20Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#18)
Re: Extract numeric filed in JSONB more effectively

Hi:

can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).

Thanks for the confirmation.

This function is not easy to find out...

select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail

jsonb_numeric is a prosrc rather than a proname, that's why you
can't call them directly.

select * from pg_proc where prosrc = 'jsonb_numeric';
select * from pg_proc where proname = 'jsonb_numeric';

It is bound to "numeric"(jsonb) cast, so we can call it with
a->'a'::numeric.

select numeric('{"a":11}'->'a'); --fail.

select "numeric"('{"a":11}'::jsonb->'a'); --ok

The double quotes look weird to me. but it looks like a common situation.

select numeric('1'::int); -- failed.
select "numeric"('1'::int); -- ok.

--
Best Regards
Andy Fan

#21Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#17)
#22Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#20)
#23Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#21)
#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#23)
#25Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#24)
#26jian he
jian.universality@gmail.com
In reply to: Andy Fan (#25)
#27Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#26)
#28Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#27)
#29Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#27)
#30jian he
jian.universality@gmail.com
In reply to: Chapman Flack (#29)
#31Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#29)
#32Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#17)
#33Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#32)
#34Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#33)
#35Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#34)
#36Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#32)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chapman Flack (#36)
#38Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#37)
#39Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#38)
#40Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#39)
#41Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#40)
#42Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#40)
#43Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#42)
#44Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#43)
#45Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andy Fan (#44)
#46Andy Fan
zhihui.fan1213@gmail.com
In reply to: Pavel Stehule (#40)
#47Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#46)
#48jian he
jian.universality@gmail.com
In reply to: Andy Fan (#47)
#49Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#48)
#50Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#49)
#51Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#50)
#52Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#51)
#53jian he
jian.universality@gmail.com
In reply to: Chapman Flack (#52)
#54Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#52)
#55Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#54)
#56Chapman Flack
chap@anastigmatix.net
In reply to: Chapman Flack (#55)
#57Chapman Flack
chap@anastigmatix.net
In reply to: Chapman Flack (#56)
#58Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#56)
#59Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#58)
#60Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#57)
#61Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#1)
#62Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#61)
#63Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#62)
#64Chapman Flack
chap@anastigmatix.net
In reply to: Chapman Flack (#63)
#65Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#64)
#66Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#65)
#67Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#66)
#68Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#67)
#69jian he
jian.universality@gmail.com
In reply to: Andy Fan (#68)
#70Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#69)
#71Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#70)
#72jian he
jian.universality@gmail.com
In reply to: Andy Fan (#71)
#73Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#72)
#74Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#71)
#75Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#74)
#76Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#75)
#77Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#76)
#78Chapman Flack
chap@anastigmatix.net
In reply to: Andy Fan (#77)
#79John Naylor
john.naylor@enterprisedb.com
In reply to: Chapman Flack (#78)
#80Andy Fan
zhihui.fan1213@gmail.com
In reply to: Chapman Flack (#78)
#81jian he
jian.universality@gmail.com
In reply to: Andy Fan (#80)
#82Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#81)
#83jian he
jian.universality@gmail.com
In reply to: Andy Fan (#82)
#84Andy Fan
zhihui.fan1213@gmail.com
In reply to: jian he (#83)
#85Peter Eisentraut
peter_e@gmx.net
In reply to: Andy Fan (#84)
#86Andy Fan
zhihui.fan1213@gmail.com
In reply to: Peter Eisentraut (#85)
#87Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#86)
#88Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#87)
#89Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#88)
#90David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#89)
#91Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#90)
#92Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andy Fan (#91)
#93Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#92)
#94Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andy Fan (#93)
#95Haibo Yan
tristan.yim@gmail.com
In reply to: Dmitry Dolgov (#94)
#96Pavel Stehule
pavel.stehule@gmail.com
In reply to: Haibo Yan (#95)
#97David Rowley
dgrowleyml@gmail.com
In reply to: Haibo Yan (#95)
#98Haibo Yan
tristan.yim@gmail.com
In reply to: David Rowley (#97)
#99Haibo Yan
tristan.yim@gmail.com
In reply to: David Rowley (#97)
#100Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Haibo Yan (#99)
#101Haibo Yan
tristan.yim@gmail.com
In reply to: Dmitry Dolgov (#100)
#102Haibo Yan
tristan.yim@gmail.com
In reply to: Haibo Yan (#101)
#103Haibo Yan
tristan.yim@gmail.com
In reply to: Haibo Yan (#102)
#104Haibo Yan
tristan.yim@gmail.com
In reply to: Haibo Yan (#103)
#105Haibo Yan
tristan.yim@gmail.com
In reply to: Haibo Yan (#104)
#106Andy Fan
zhihui.fan1213@gmail.com
In reply to: Haibo Yan (#95)
#107Haibo Yan
tristan.yim@gmail.com
In reply to: Andy Fan (#106)
#108Andy Fan
zhihui.fan1213@gmail.com
In reply to: Haibo Yan (#107)
#109Haibo Yan
tristan.yim@gmail.com
In reply to: Andy Fan (#108)
#110Haibo Yan
tristan.yim@gmail.com
In reply to: Haibo Yan (#109)