Re: Using JSONB directly from application

Started by Christian Ohler over 7 years ago, 3 messages
#1 Christian Ohler
ohler@shift.com

(continuing an old thread from
/messages/by-id/CAMsr+YEtamQYZ5EocsuthQCvyvmRnQrucDP6GZynPtf0gsMbuw@mail.gmail.com
)

Craig Ringer <craig(at)2ndquadrant(dot)com>, 2018-02-26:

On 26 February 2018 at 04:05, Anthony Communier
<anthony(dot)communier(at)gmail(dot)com> wrote:

It would be nice if applications connected to a PostgreSQL database could
send and receive JSONB in binary. It could save some useless text
conversion. All the work could be done on the application side, which is
often more scalable than the database itself.

To support this, you'd need to extract PostgreSQL's jsonb support into a C
library that could be used independently of backend server infrastructure
like 'palloc' and memory contexts, ereport(), etc. Or write a parallel
implementation.

At Shift, we also have use cases that would likely be sped up quite a bit
if we could avoid the conversion from JSONB to JSON, and instead pass
binary JSONB to the application side and parse it there (in Go). I doubt
we'd want to reuse any of Postgres's C code, and would instead go with your
"parallel implementation" idea; I can't imagine it being particularly
difficult to implement a JSONB parser from scratch.

All we need, I think, is a Postgres function raw_jsonb(jsonb) that returns
bytea but is the identity function at the byte level. (Or allow a cast
from jsonb to bytea.)

Our Go code would then send queries like SELECT col1, col2, raw_jsonb(col3)
FROM table1 WHERE ...; I haven't thought in depth about how we'd parse the
JSONB in Go, but perhaps we can synthesize a stream of JSON tokens from the
binary JSONB (one token at a time, to avoid copies and allocations) and
adapt the streaming parser https://github.com/json-iterator/go to turn it
into Go values.
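To make the idea concrete, a first step on the Go side would be decoding the root container header of a raw JSONB datum. The sketch below is hypothetical: it assumes raw_jsonb exists, takes the flag values from PostgreSQL's jsonb.h for the v1 on-disk format, and assumes a little-endian server (the internal format is in native byte order, unlike the wire protocol).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Flag bits of the JSONB root container header, as defined in PostgreSQL's
// jsonb.h for the v1 on-disk format (values assumed from that header).
const (
	jbCMask   uint32 = 0x0FFFFFFF // mask for the element/pair count
	jbFScalar uint32 = 0x10000000 // pseudo-array wrapping a single scalar
	jbFObject uint32 = 0x20000000 // container is an object
	jbFArray  uint32 = 0x40000000 // container is an array
)

// describeRoot inspects the 4-byte root container header of a raw JSONB
// datum, as a hypothetical raw_jsonb(jsonb) function might return it.
// JSONB is stored in the server's native byte order; little-endian is
// assumed here.
func describeRoot(raw []byte) (kind string, count int, err error) {
	if len(raw) < 4 {
		return "", 0, fmt.Errorf("short jsonb datum: %d bytes", len(raw))
	}
	header := binary.LittleEndian.Uint32(raw[:4])
	count = int(header & jbCMask)
	switch {
	case header&jbFObject != 0:
		kind = "object"
	case header&jbFArray != 0 && header&jbFScalar != 0:
		kind = "scalar"
	case header&jbFArray != 0:
		kind = "array"
	default:
		return "", 0, fmt.Errorf("unrecognized container header %#x", header)
	}
	return kind, count, nil
}

func main() {
	// Header 0x20000002 (little-endian): an object with 2 key/value pairs.
	kind, count, err := describeRoot([]byte{0x02, 0x00, 0x00, 0x20})
	if err != nil {
		panic(err)
	}
	fmt.Println(kind, count) // object 2
}
```

A real tokenizer would then walk the JEntry array that follows the header, but the header alone already tells the parser which kind of token stream to synthesize.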

Sending raw JSONB to Postgres might also be interesting, but I'd start with
receiving.

Would implementing raw_jsonb be as trivial as it sounds? What about usages
like SELECT raw_jsonb(col3->'foo'); does the subobject returned by '->'
share structure with the containing object, making the conversion to a
self-contained JSONB value less direct?

Can these conversions be implemented without copying the bytes?

An open question about the API contract would be how raw_jsonb would be
affected if Postgres introduces a version 2 of JSONB encoding. My
intuition is to punt that problem to the application, and define that
raw_jsonb returns whatever version of JSONB is most convenient for Postgres
for that particular datum; this minimizes conversion work on the Postgres
side, which is the purpose of the mechanism. Applications that want a
stable format can use the conventional textual JSON format. But I could
see a case for making the function raw_jsonb(int, jsonb) and allowing the
caller to specify what (maximum?) version of JSONB they want.

#2 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Christian Ohler (#1)

On 06/22/2018 11:52 PM, Christian Ohler wrote:

(continuing an old thread from
/messages/by-id/CAMsr+YEtamQYZ5EocsuthQCvyvmRnQrucDP6GZynPtf0gsMbuw@mail.gmail.com
)

Craig Ringer <craig(at)2ndquadrant(dot)com>, 2018-02-26:

On 26 February 2018 at 04:05, Anthony Communier
<anthony(dot)communier(at)gmail(dot)com> wrote:


It would be nice if applications connected to a PostgreSQL database could
send and receive JSONB in binary. It could save some useless text
conversion. All the work could be done on the application side, which is
often more scalable than the database itself.

To support this, you'd need to extract PostgreSQL's jsonb support into a C
library that could be used independently of backend server infrastructure
like 'palloc' and memory contexts, ereport(), etc. Or write a parallel
implementation.

At Shift, we also have use cases that would likely be sped up quite a
bit if we could avoid the conversion from JSONB to JSON, and instead
pass binary JSONB to the application side and parse it there (in Go).  I
doubt we'd want to reuse any of Postgres's C code, and would instead go
with your "parallel implementation" idea; I can't imagine it being
particularly difficult to implement a JSONB parser from scratch.

All we need, I think, is a Postgres function raw_jsonb(jsonb) that
returns bytea but is the identity function at the byte level.  (Or allow
a cast from jsonb to bytea.)

Our Go code would then send queries like SELECT col1, col2,
raw_jsonb(col3) FROM table1 WHERE ...; I haven't thought in depth about
how we'd parse the JSONB in Go, but perhaps we can synthesize a stream
of JSON tokens from the binary JSONB (one token at a time, to avoid
copies and allocations) and adapt the streaming parser
https://github.com/json-iterator/go to turn it into Go values.

Sending raw JSONB to Postgres might also be interesting, but I'd start
with receiving.

Would implementing raw_jsonb be as trivial as it sounds?  What about
usages like SELECT raw_jsonb(col3->'foo'); does the subobject returned
by '->' share structure with the containing object, making the
conversion to a self-contained JSONB value less direct?

Can these conversions be implemented without copying the bytes?

I don't think you need the function, actually. The PostgreSQL protocol
supports both text and binary modes - in text mode the server formats
everything as text before sending it to the client. I guess this is what
you mean by "convert to JSON".

But with the extended protocol you (or rather the connection library
you're using) can specify that the output should be returned in binary,
i.e. as an exact copy of the data. This happens at the "Bind" phase; see
the "Bind" message docs here:

https://www.postgresql.org/docs/current/static/protocol-message-formats.html

An open question about the API contract would be how raw_jsonb would be
affected if Postgres introduces a version 2 of JSONB encoding.  My
intuition is to punt that problem to the application, and define that
raw_jsonb returns whatever version of JSONB is most convenient for
Postgres for that particular datum; this minimizes conversion work on
the Postgres side, which is the purpose of the mechanism.  Applications
that want a stable format can use the conventional textual JSON format. 
But I could see a case for making the function raw_jsonb(int, jsonb) and
allowing the caller to specify what (maximum?) version of JSONB they want.

I doubt we'll introduce a new JSONB format any time soon, so I wouldn't
be particularly worried about this. If it eventually happens, you'll have
to adapt your parser to it, I think.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tomas Vondra (#2)

On 06/24/2018 12:42 PM, Tomas Vondra wrote:

Sending raw JSONB to Postgres might also be interesting, but I'd start
with receiving.

Would implementing raw_jsonb be as trivial as it sounds?  What about
usages like SELECT raw_jsonb(col3->'foo'); does the subobject returned
by '->' share structure with the containing object, making the
conversion to a self-contained JSONB value less direct?

Can these conversions be implemented without copying the bytes?

I don't think you need the function, actually. The PostgreSQL protocol
supports both text and binary modes - in text mode the server formats
everything as text before sending it to the client. I guess this is what
you mean by "convert to JSON".

But with the extended protocol you (or rather the connection library
you're using) can specify that the output should be returned in binary,
i.e. as an exact copy of the data. This happens at the "Bind" phase; see
the "Bind" message docs here:

https://www.postgresql.org/docs/current/static/protocol-message-formats.html

jsonb_send just sends a version byte (1) followed by the stringified
value. If you want real binary transmission, I think you'll need a new
output version, adjusting jsonb_send and jsonb_recv accordingly.
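This point is easy to verify client-side: the current "binary" jsonb is just versioned text, so decoding it takes nothing more than a version check. A minimal Go sketch (the function name is hypothetical):

```go
package main

import "fmt"

// decodeJsonbBinary decodes a binary-format jsonb wire value as emitted by
// the current jsonb_send: a one-byte version number (1) followed by the
// JSON text. No binary structure is transmitted today.
func decodeJsonbBinary(buf []byte) (string, error) {
	if len(buf) == 0 {
		return "", fmt.Errorf("empty jsonb value")
	}
	if buf[0] != 1 {
		return "", fmt.Errorf("unsupported jsonb binary version %d", buf[0])
	}
	return string(buf[1:]), nil
}

func main() {
	wire := append([]byte{1}, []byte(`{"a": 1}`)...)
	s, err := decodeJsonbBinary(wire)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"a": 1}
}
```

A future output version (a first byte other than 1) would fail the version check rather than be misparsed, which is what makes the versioned scheme extensible.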

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services