Protocol buffer support for Postgres

Started by 陈天舟over 9 years ago6 messages
#1陈天舟
tianzhouchen@gmail.com

I am interested in adding Protocol Buffer support for Postgres. Protocol
Buffer occupies less space than JSON. More importantly, it has schema and
is forward/backward compatible. All these make it a very good format for
persistency.

Here are two rough ideas I have right now:

Approach 1:

Creating a datatype "PROTOBUF" similar as "JSON" and implement all the
similar support (e.g. indexing) as done for "JSON" Type.
(1) Since each protocol buffer column requires a schema. I am not sure
where is the best place to store that schema info. Should it be in a
CONSTRAINT (but I am not able to find the doc referring any custom
constraint), or should it be in the COMMENT or somewhere else?
(2) The input/output will be already serialized protocol buffer. The
protocol buffer schema will be used to validate the column value when
storing/retrieving the data. It may be possible to skip the check upon
retrieving, instead the check can be done anytime the column schema is
changed.

Approach 2:

Still creating a datatype "PROTOBUF", but internally it uses "JSON" type.
So PROTOBUF is just a wrapper type doing the conversion and schema
validation on the fly. The benefit of this is it can leverage the JSON
support. The downside is it can only support the common features available
in both Protocol Buffer and JSON. e.g. Protocol Buffer annotation is a
quite useful feature which is not available in JSON.

I would like to hear some thoughts about having Protocol Buffer for
Postgres as well as the approach tackling it.

Thanks
Tianzhou

#2Craig Ringer
craig@2ndquadrant.com
In reply to: 陈天舟 (#1)
Re: Protocol buffer support for Postgres

On 26 April 2016 at 14:06, 陈天舟 <tianzhouchen@gmail.com> wrote:

I am interested in adding Protocol Buffer support for Postgres. Protocol
Buffer occupies less space than JSON. More importantly, it has schema and
is forward/backward compatible. All these make it a very good format for
persistency.

Here are two rough ideas I have right now:

Approach 1:

Creating a datatype "PROTOBUF" similar as "JSON" and implement all the
similar support (e.g. indexing) as done for "JSON" Type.
(1) Since each protocol buffer column requires a schema. I am not sure
where is the best place to store that schema info. Should it be in a
CONSTRAINT (but I am not able to find the doc referring any custom
constraint), or should it be in the COMMENT or somewhere else?

I can't really imagine how you'd do that without adding a new catalog like
we have for enum members. A typmod isn't sufficient since you need a whole
lot more than an integer, and typmods aren't tracked throughout the server
that well.

That'll make it hard to do it with an extension.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3José Luis Tallón
jltallon@adv-solutions.net
In reply to: 陈天舟 (#1)
Re: Protocol buffer support for Postgres

On 04/26/2016 08:06 AM, 陈天舟 wrote:

I am interested in adding Protocol Buffer support for Postgres.
Protocol Buffer occupies less space than JSON. More importantly, it
has schema and is forward/backward compatible. All these make it a
very good format for persistency.

Have you investigated JSONB vs ProtoBuf space usage ?
(the key being the "B" -- Postgres' own binary JSON implementation)

The "per-column schema" thing sounds difficult to do without major
changes to the core unless/until we have generalized user-defined
metadata for objects ....

Just my .02€

/ J.L.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#2)
Re: Protocol buffer support for Postgres

Craig Ringer <craig@2ndquadrant.com> writes:

On 26 April 2016 at 14:06, 陈天舟 <tianzhouchen@gmail.com> wrote:

(1) Since each protocol buffer column requires a schema. I am not sure
where is the best place to store that schema info. Should it be in a
CONSTRAINT (but I am not able to find the doc referring any custom
constraint), or should it be in the COMMENT or somewhere else?

I can't really imagine how you'd do that without adding a new catalog like
we have for enum members. A typmod isn't sufficient since you need a whole
lot more than an integer, and typmods aren't tracked throughout the server
that well.

That'll make it hard to do it with an extension.

PostGIS manages to reference quite a lot of schema-like information via
a geometry column's typmod. Maybe there's a reason why their approach
wouldn't be a good fit for this, but it'd be worth investigating.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Tom Lane (#4)
Re: Protocol buffer support for Postgres

On Tue, Apr 26, 2016 at 6:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig@2ndquadrant.com> writes:

On 26 April 2016 at 14:06, 陈天舟 <tianzhouchen@gmail.com> wrote:

(1) Since each protocol buffer column requires a schema. I am not sure
where is the best place to store that schema info. Should it be in a
CONSTRAINT (but I am not able to find the doc referring any custom
constraint), or should it be in the COMMENT or somewhere else?

I can't really imagine how you'd do that without adding a new catalog like
we have for enum members. A typmod isn't sufficient since you need a whole
lot more than an integer, and typmods aren't tracked throughout the server
that well.

That'll make it hard to do it with an extension.

PostGIS manages to reference quite a lot of schema-like information via
a geometry column's typmod. Maybe there's a reason why their approach
wouldn't be a good fit for this, but it'd be worth investigating.

We pack a short type number, two flags and 24 bits of SRID number into
an integer. The SRID number is in turn a foreign key into the
spatial_ref_sys table where the fully spelled out spatial reference
definition lives. It's not very nice, and it's quite breakable since
there's no foreign key integrity between the typmod and the
spatial_ref_sys pk.

P.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6David Fetter
david@fetter.org
In reply to: 陈天舟 (#1)
Re: Protocol buffer support for Postgres

On Mon, Apr 25, 2016 at 11:06:11PM -0700, 陈天舟 wrote:

I am interested in adding Protocol Buffer support for Postgres. Protocol
Buffer occupies less space than JSON. More importantly, it has schema and
is forward/backward compatible. All these make it a very good format for
persistency.

Thanks for the idea. There are many different serializations, some of
which are listed below, each with their own qualities, which can be
good or bad with a very strong dependency on context:

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

Should we see about making a more flexible serialization
infrastructure? What we have is mostly /ad hoc/, and has already
caused real pain to the PostGIS folks, this even after some pretty
significant and successful efforts were made in this direction.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers