jsonb_delete with arrays

Started by Magnus Haganderover 9 years ago8 messageshackers
Jump to latest
#1Magnus Hagander
magnus@hagander.net

Attached is an implantation of jsonb_delete that instead of taking a single
key to remove accepts an array of keys (it still does just keys, so it's
using the - operator, it's not like the path delete function that also
takes an array, but uses a different operator).

In some simple testing of working through a real world usecases where we
needed to delete 7 keys from jsonb data, it shows approximately a 9x
speedup over calling the - operator multiple times. I'm guessing since we
copy a lot less and don't have to re-traverse the structure.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

jsonb_delete_array.patchtext/x-patch; charset=US-ASCII; name=jsonb_delete_array.patchDownload+112-0
#2Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Magnus Hagander (#1)
Re: jsonb_delete with arrays

On 15 November 2016 at 22:53, Magnus Hagander <magnus@hagander.net> wrote:
Attached is an implantation of jsonb_delete that instead of taking a

single key to remove accepts an array of keys (it still does just keys, so
it's using the - operator, it's not like the path delete function that also
takes an array, but uses a different operator).

In some simple testing of working through a real world usecases where we

needed to delete 7 keys from jsonb data, it shows approximately a 9x
speedup over calling the - operator multiple times. I'm guessing since we
copy a lot less and don't have to re-traverse the structure.

I wonder, is it worth it to create some sort of helper function to handle
both
deleting one key and deleting an array of keys (like `setPath` for
`jsonb_set`,
`jsonb_insert` and `jsonb_delete_path`)? At first glance it looks like
`jsonb_delete` and `jsonb_delete_array` can reuse some code.

Speaking about the performance I believe it's the same problem as here [1]/messages/by-id/1566eab8731.10193ac585742.5467876610052746443@zohocorp.com,
since for each key the full jsonb will be decompressed. Looks like we need a
new set of functions to read/update/delete an array of elements at once.

[1]: /messages/by-id/1566eab8731.10193ac585742.5467876610052746443@zohocorp.com
/messages/by-id/1566eab8731.10193ac585742.5467876610052746443@zohocorp.com

#3Magnus Hagander
magnus@hagander.net
In reply to: Dmitry Dolgov (#2)
Re: jsonb_delete with arrays

On Mon, Nov 21, 2016 at 5:05 AM, Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

On 15 November 2016 at 22:53, Magnus Hagander <magnus@hagander.net>

wrote:

Attached is an implantation of jsonb_delete that instead of taking a

single key to remove accepts an array of keys (it still does just keys, so
it's using the - operator, it's not like the path delete function that also
takes an array, but uses a different operator).

In some simple testing of working through a real world usecases where we

needed to delete 7 keys from jsonb data, it shows approximately a 9x
speedup over calling the - operator multiple times. I'm guessing since we
copy a lot less and don't have to re-traverse the structure.

I wonder, is it worth it to create some sort of helper function to handle
both
deleting one key and deleting an array of keys (like `setPath` for
`jsonb_set`,
`jsonb_insert` and `jsonb_delete_path`)? At first glance it looks like
`jsonb_delete` and `jsonb_delete_array` can reuse some code.

Speaking about the performance I believe it's the same problem as here [1],
since for each key the full jsonb will be decompressed. Looks like we need
a
new set of functions to read/update/delete an array of elements at once.

[1]: /messages/by-id/1566eab8731.10193ac585742.
5467876610052746443%40zohocorp.com

It can be partially related, but the usecase itself had jsonb in memory
only and never stored on disk, so it's not the decompression itself.
Shouldn't be deep parsing either as we just copy the data over. But it's a
different angle on the same core problem, I think, which comes from the
fact that jsonb is just "one value".

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#4Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Magnus Hagander (#3)
Re: jsonb_delete with arrays

Attached is an implantation of jsonb_delete that instead of taking a

single key to remove accepts an array of keys

Since I already saw this patch, here is my small review.

Speaking about implementation of `jsonb_delete_array` - it's fine, but I
would like to suggest two modifications:

* create a separate helper function for jsonb delete operation, to use it
in both `jsonb_delete` and `jsonb_delete_array`. It will help to
concentrate related logic in one place.

* use variadic arguments for `jsonb_delete_array`. For rare cases, when
someone decides to use this function directly instead of corresponding
operator. It will be more consistent with `jsonb_delete` from my point of
view, because it's transition from `jsonb_delete(data, 'key')` to
`jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
`jsonb_delete(data, '{key1, key2}')`.

I've attached a patch with these modifications. What do you think?

Attachments:

jsonb_delete_worker.patchtext/x-patch; charset=US-ASCII; name=jsonb_delete_worker.patchDownload+114-15
#5Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#4)
Re: jsonb_delete with arrays

On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Speaking about implementation of `jsonb_delete_array` - it's fine, but I
would like to suggest two modifications:

* create a separate helper function for jsonb delete operation, to use it in
both `jsonb_delete` and `jsonb_delete_array`. It will help to concentrate
related logic in one place.

I am not sure that it is much a win as the code loses readability for
a minimal refactoring. What would have been nice is to group as well
jsonb_delete_idx but handling of the element deletion is really
different there.

* use variadic arguments for `jsonb_delete_array`. For rare cases, when
someone decides to use this function directly instead of corresponding
operator. It will be more consistent with `jsonb_delete` from my point of
view, because it's transition from `jsonb_delete(data, 'key')` to
`jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
`jsonb_delete(data, '{key1, key2}')`.

That's a good idea.

I've attached a patch with these modifications. What do you think?

Looking at both patches proposed, documentation is still missing in
the list of jsonb operators as '-' is missing for arrays. I am marking
this patch as waiting on author for now.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#5)
Re: jsonb_delete with arrays

On Tue, Jan 17, 2017 at 8:25 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

Speaking about implementation of `jsonb_delete_array` - it's fine, but I
would like to suggest two modifications:

* create a separate helper function for jsonb delete operation, to use

it in

both `jsonb_delete` and `jsonb_delete_array`. It will help to concentrate
related logic in one place.

I am not sure that it is much a win as the code loses readability for
a minimal refactoring. What would have been nice is to group as well
jsonb_delete_idx but handling of the element deletion is really
different there.

I agree with that. I agree with investigating it as an option, but I think
the lost readability is worse.

* use variadic arguments for `jsonb_delete_array`. For rare cases, when
someone decides to use this function directly instead of corresponding
operator. It will be more consistent with `jsonb_delete` from my point of
view, because it's transition from `jsonb_delete(data, 'key')` to
`jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
`jsonb_delete(data, '{key1, key2}')`.

That's a good idea.

I can see the point of that. In the particular usecase I built it for
originally though, the list of keys came from the application, in which
case binding them as an array was a lot more efficient (so as not to
require a whole lot of different prepared statements, one for each number
of parameters). But that should be workaround-able using the VARIADIC
keyword in the caller. Or by just using the operator.

I've attached a patch with these modifications. What do you think?

Looking at both patches proposed, documentation is still missing in
the list of jsonb operators as '-' is missing for arrays. I am marking
this patch as waiting on author for now.

Added in updated patch. Do you see that as enough, or do we need it in some
more places in the docs as well?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

jsonb_delete_array_2.patchtext/x-patch; charset=US-ASCII; name=jsonb_delete_array_2.patchDownload+120-0
#7Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#6)
Re: jsonb_delete with arrays

On Tue, Jan 17, 2017 at 8:45 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Jan 17, 2017 at 8:25 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

* use variadic arguments for `jsonb_delete_array`. For rare cases, when
someone decides to use this function directly instead of corresponding
operator. It will be more consistent with `jsonb_delete` from my point
of
view, because it's transition from `jsonb_delete(data, 'key')` to
`jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
`jsonb_delete(data, '{key1, key2}')`.

That's a good idea.

I can see the point of that. In the particular usecase I built it for
originally though, the list of keys came from the application, in which case
binding them as an array was a lot more efficient (so as not to require a
whole lot of different prepared statements, one for each number of
parameters). But that should be workaround-able using the VARIADIC keyword
in the caller. Or by just using the operator.

Yes that should be enough:
=# select jsonb_delete('{"a":1 , "b":2, "c":3}', 'a', 'b', 'c');
jsonb_delete
--------------
{}
(1 row)
=# select '{"a":1 , "b":2, "c":3}'::jsonb - '{a,b}'::text[];
?column?
----------
{"c": 3}
(1 row)
That's a nice bonus, perhaps that's not worth documenting as most
users will likely care only about the operator.

I've attached a patch with these modifications. What do you think?

Looking at both patches proposed, documentation is still missing in
the list of jsonb operators as '-' is missing for arrays. I am marking
this patch as waiting on author for now.

Added in updated patch. Do you see that as enough, or do we need it in some
more places in the docs as well?

I am not seeing other places to update, thanks.

Another victim of 352a24a... Your patch is failing to apply because
now the headers of the functions is generated automatically. And the
OIDs have been taken recently. I have fixed that to test your patch,
the result is attached. The patch is marked as ready for committer.
--
Michael

Attachments:

jsonb_delete_array_3.patchtext/plain; charset=US-ASCII; name=jsonb_delete_array_3.patchDownload+119-0
#8Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#7)
Re: jsonb_delete with arrays

On Wed, Jan 18, 2017 at 5:49 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Tue, Jan 17, 2017 at 8:45 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Tue, Jan 17, 2017 at 8:25 AM, Michael Paquier <

michael.paquier@gmail.com>

wrote:

On Sun, Dec 18, 2016 at 1:27 AM, Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

* use variadic arguments for `jsonb_delete_array`. For rare cases,

when

someone decides to use this function directly instead of corresponding
operator. It will be more consistent with `jsonb_delete` from my point
of
view, because it's transition from `jsonb_delete(data, 'key')` to
`jsonb_delete(data, 'key1', 'key2')` is more smooth, than to
`jsonb_delete(data, '{key1, key2}')`.

That's a good idea.

I can see the point of that. In the particular usecase I built it for
originally though, the list of keys came from the application, in which

case

binding them as an array was a lot more efficient (so as not to require a
whole lot of different prepared statements, one for each number of
parameters). But that should be workaround-able using the VARIADIC

keyword

in the caller. Or by just using the operator.

Yes that should be enough:
=# select jsonb_delete('{"a":1 , "b":2, "c":3}', 'a', 'b', 'c');
jsonb_delete
--------------
{}
(1 row)
=# select '{"a":1 , "b":2, "c":3}'::jsonb - '{a,b}'::text[];
?column?
----------
{"c": 3}
(1 row)
That's a nice bonus, perhaps that's not worth documenting as most
users will likely care only about the operator.

I've attached a patch with these modifications. What do you think?

Looking at both patches proposed, documentation is still missing in
the list of jsonb operators as '-' is missing for arrays. I am marking
this patch as waiting on author for now.

Added in updated patch. Do you see that as enough, or do we need it in

some

more places in the docs as well?

I am not seeing other places to update, thanks.

Another victim of 352a24a... Your patch is failing to apply because
now the headers of the functions is generated automatically. And the
OIDs have been taken recently. I have fixed that to test your patch,
the result is attached. The patch is marked as ready for committer.

Thanks! I had already rebased it, so I pushed that version (picked
different oids).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/