Re: Protocol buffer support for Postgres
On Tue, Apr 26, 2016 at 2:40:49PM -0700, David Fetter wrote:
Should we see about making a more flexible serialization
infrastructure? What we have is mostly /ad hoc/, and has already
caused real pain to the PostGIS folks, this even after some pretty
significant and successful efforts were made in this direction.
Hi all. Is anybody working on this right now? I would like to pick
this task for the summer. First of all, what do you think about what
David said? Should we try and design a generic infrastructure for
similar serialization datatypes? If so, will we need to refactor some
pieces from the JSON/XML implementation? I looked over the code and it
seems nicely decoupled, but I am not sure what this would involve.
I've done this before for MySQL[1] (not yet completed), but I'd love
to try it for PostgreSQL too.
On Tue, Apr 26, 2016 at 11:23:11AM -0700, José Luis Tallón wrote:
Have you investigated JSONB vs ProtoBuf space usage ?
(the key being the "B" -- Postgres' own binary JSON
implementation)
This is something I can investigate further, but another (possibly
major) benefit of Protocol Buffers over JSON is that they *still*
have a schema. I think they combine advantages from both structured
and schemaless data.
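For what it's worth, the size difference largely comes from protobuf's wire format replacing key names with small integer field tags, while JSON(B) carries the keys in every document. A hand-rolled sketch of the varint encoding (the field number and key name are made up for the example; this is not the official protobuf library):

```python
import json

def encode_varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def encode_field(field_number, value):
    """Encode an integer field: tag byte is (field_number << 3) | wire_type 0."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

# Hypothetical message: field 1 is "user_id". The schema lives outside the data.
pb = encode_field(1, 150)
js = json.dumps({"user_id": 150}).encode()

print(len(pb), len(js))  # prints "3 16"
```

The three protobuf bytes are the tag (0x08) plus the varint for 150 (0x96 0x01); the key name costs nothing per row because it lives in the schema.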
My best guess is that we shouldn't focus on abstracting *any*
serialization paradigm, but only the ones that have a schema (like
Thrift or Protocol Buffers). Speaking of schemas, where is the best
place to keep that? For MySQL I opted for a plain text file similar to
.trg files (the ones used by MySQL for keeping triggers).
I'd love to talk more about this.
Thank you.
Flavius Anton
[1]: https://github.com/google/mysql-protobuf
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
I'd love to talk more about this.
I thought quite a bit about this a few years ago but never really
picked it up to work on.
There are different use cases where this could be useful and different
approaches that could be useful for the different use cases.
One idea was to have a protocol buffer data type which used the typmod
as an identifier to specify the schema. Perhaps as an oid primary key
from some other table, or something like how enums are handled. That
would be used much like JSON is used by languages where that's natural
but might be useful if you're in an environment where data is being
provided in protobufs and you need to send them on in protobufs.
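As a rough illustration of the typmod idea, the typmod could key into a catalog of message descriptors that the type's input/output functions consult. Everything below (the catalog layout, names, validation rule) is invented for the sketch:

```python
# Hypothetical catalog: schema id (the typmod) -> ordered field descriptors.
# In Postgres this might live in its own catalog table, or be handled
# similarly to how enum members are stored.
SCHEMA_CATALOG = {
    1001: [("user_id", "int"), ("name", "string")],
}

def validate(typmod, message):
    """Check a decoded message's keys against the schema for this typmod."""
    declared = {name for name, _ in SCHEMA_CATALOG[typmod]}
    return set(message) <= declared

print(validate(1001, {"user_id": 7}))           # True: subset of declared fields
print(validate(1001, {"user_id": 7, "x": 1}))   # False: "x" is not in the schema
```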
Another option would be to allow the output of your select query to be
read in a protobuf. That might be a feature to add in libpq rather
than in the server, or perhaps as a new protocol feature that libpq
would then just switch to, which might make it easier to use from other
languages. That might make it easier to use Postgres as a data store
for an environment where everything is in protobufs without losing the
full power of SQL schemas in the database.
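Done on the client side, that transformation could be sketched as below, using protobuf's varint wire encoding directly rather than any real driver or libpq API (the row shape and field-number mapping are invented):

```python
def rows_to_protobuf(rows, field_numbers):
    """Encode each row (a tuple of non-negative ints, for simplicity) as a
    length-delimited protobuf-style message of varint fields."""
    def varint(n):
        out = bytearray()
        while n > 0x7F:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
        return bytes(out)

    messages = []
    for row in rows:
        body = b"".join(varint((f << 3) | 0) + varint(v)
                        for f, v in zip(field_numbers, row))
        messages.append(varint(len(body)) + body)  # length-delimited framing
    return b"".join(messages)

# e.g. rows from "SELECT id, age FROM users" mapped to proto fields 1 and 2
blob = rows_to_protobuf([(1, 30), (2, 45)], [1, 2])
```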
As an aside, have you seen Cap’n Proto? It looks more interesting to
me than protobufs. It fixes a lot of the corner cases where protobufs
end up with unbounded computational complexity and seems a lot
cleaner. I haven't thought about it much but I wonder if it would be
possible to actually use Cap'n Proto capabilities to grant access to
Postgres RLS labels too.
--
greg
On Thu, Jun 23, 2016 at 1:50 PM, Greg Stark <stark@mit.edu> wrote:
On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
I'd love to talk more about this.
I thought quite a bit about this a few years ago but never really
picked it up to work on.

Another option would be to allow the output of your select query to be
read in a protobuf. That might be a feature to add in libpq rather
than in the server, or perhaps as a new protocol feature that libpq
would then just switch to, which might make it easier to use from other
languages. That might make it easier to use Postgres as a data store
for an environment where everything is in protobufs without losing the
full power of SQL schemas in the database.
I agree on this one, I think it's the most /natural/ way of doing things.
As an aside, have you seen Cap’n Proto? It looks more interesting to
me than protobufs. It fixes a lot of the corner cases where protobufs
end up with unbounded computational complexity and seems a lot
cleaner.
I've seen it around for quite some time, but my fear is that it is not
(yet?) widely adopted. Protocol Buffers themselves are not that
popular, let alone Cap'n Proto. By the way, there's also the newer
FlatBuffers[1] project, from Google too. They seem a lot like Cap'n
Proto, though. I think it's doable to provide some sort of abstraction
for these protocols in PostgreSQL, so that it can support all of them
eventually, with minimum effort for adding a new one. However, I am
skeptical about the practical advantages of having /all/ of them
supported.
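Such an abstraction could be little more than a common encode/decode interface that each format registers against; a rough sketch, with all names invented:

```python
import json
from abc import ABC, abstractmethod

class SerializationFormat(ABC):
    """Hypothetical interface each per-format plug-in would implement."""

    @abstractmethod
    def encode(self, record: dict) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> dict: ...

class JsonFormat(SerializationFormat):
    def encode(self, record):
        return json.dumps(record).encode()

    def decode(self, data):
        return json.loads(data)

# Protocol Buffers, FlatBuffers, Cap'n Proto, ... would register here too.
FORMATS = {"json": JsonFormat()}

codec = FORMATS["json"]
assert codec.decode(codec.encode({"a": 1})) == {"a": 1}
```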
--
Flavius
[1]: https://github.com/google/flatbuffers
On Thu, Jun 23, 2016 at 2:54 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
On Thu, Jun 23, 2016 at 1:50 PM, Greg Stark <stark@mit.edu> wrote:
On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
I'd love to talk more about this.
I thought quite a bit about this a few years ago but never really
picked it up to work on.
Any other thoughts on this? My guess is that it might be an important
addition to Postgres that can attract even more users, but I am not
sure if there's enough interest from the community. If I want to pick
this task, how should I move forward? Do I need to write a design
document or similar or should I come up with a patch that implements a
draft prototype? I am new to this community and don't know the code
yet, so I'd appreciate some guidance from an older, more experienced
member.
Thanks.
--
Flavius
On 24/06/16 14:23, Flavius Anton wrote:
On Thu, Jun 23, 2016 at 2:54 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
On Thu, Jun 23, 2016 at 1:50 PM, Greg Stark <stark@mit.edu> wrote:
On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
I'd love to talk more about this.
I thought quite a bit about this a few years ago but never really
picked it up to work on.

Any other thoughts on this? My guess is that it might be an important
addition to Postgres that can attract even more users, but I am not
sure if there's enough interest from the community. If I want to pick
this task, how should I move forward? Do I need to write a design
document or similar or should I come up with a patch that implements a
draft prototype? I am new to this community and don't know the code
yet, so I'd appreciate some guidance from an older, more experienced
member.
Other than protobuf, there are also other serialization formats
that might be worth considering. Not too long ago I suggested working
specifically on serialization formats for the json/jsonb types:
/messages/by-id/56CB8A62.40100@8kdata.com I
believe this effort is in the same boat.
Álvaro
--
Álvaro Hernández Tortosa
-----------
8Kdata
On Fri, Jun 24, 2016 at 11:35 AM, Álvaro Hernández Tortosa
<aht@8kdata.com> wrote:
On 24/06/16 14:23, Flavius Anton wrote:
On Thu, Jun 23, 2016 at 2:54 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
On Thu, Jun 23, 2016 at 1:50 PM, Greg Stark <stark@mit.edu> wrote:
On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
I'd love to talk more about this.
I thought quite a bit about this a few years ago but never really
picked it up to work on.

Any other thoughts on this? My guess is that it might be an important
addition to Postgres that can attract even more users, but I am not
sure if there's enough interest from the community. If I want to pick
this task, how should I move forward? Do I need to write a design
document or similar or should I come up with a patch that implements a
draft prototype? I am new to this community and don't know the code
yet, so I'd appreciate some guidance from an older, more experienced
member.

Other than protobuf, there are also other serialization formats that
might be worth considering. Not too long ago I suggested working
specifically on serialization formats for the json/jsonb types:
/messages/by-id/56CB8A62.40100@8kdata.com I believe
this effort is in the same boat.
Sure, there are a bunch of these, some of the most popular being:
* Cap'n Proto
* Flatbuffers
* Thrift
A longer list has already been compiled here[1].
Meanwhile, I came across another interesting idea. What if, for
starters, we don't introduce a completely new serialization format,
like Protocol Buffers, but rather make the existing JSON support
stronger and more interesting? What I am thinking of is a JSON schema
(optionally) associated with a JSON column. That way, the keys no
longer need to be stored on disk, and the database can check documents
for JSON validity at INSERT time. I think those are two big
advantages. What do you think about this one?
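The space argument can be made concrete: with a schema attached to the column, only the values need to go to disk, and the keys can be reattached from the schema at read time. A toy sketch (the schema representation here is invented for illustration):

```python
import json

SCHEMA = ["id", "email", "active"]  # hypothetical per-column JSON schema

def to_disk(doc):
    """Store only the values, in schema order; the keys are implied."""
    if set(doc) != set(SCHEMA):
        raise ValueError("document does not match the column schema")
    return json.dumps([doc[k] for k in SCHEMA])

def from_disk(raw):
    """Reattach the keys from the schema when reading back."""
    return dict(zip(SCHEMA, json.loads(raw)))

row = {"id": 1, "email": "a@b.c", "active": True}
stored = to_disk(row)
assert from_disk(stored) == row
print(len(stored), len(json.dumps(row)))  # the stored form is smaller
```

Validation at INSERT time falls out of the same check: a document with missing or extra keys is rejected before it is ever written.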
--
Flavius
[1]: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
On Fri, Jun 24, 2016 at 2:23 PM, Flavius Anton <f.v.anton@gmail.com> wrote:
Any other thoughts on this? My guess is that it might be an important
addition to Postgres that can attract even more users, but I am not
sure if there's enough interest from the community. If I want to pick
this task, how should I move forward? Do I need to write a design
document or similar or should I come up with a patch that implements a
draft prototype? I am new to this community and don't know the code
yet, so I'd appreciate some guidance from an older, more experienced
member.
It's not very clear to me from this discussion what you actually want
to implement, and I think your chances of implementing it successfully
and your chances of getting that committed depend heavily on what you
actually intend to do. If you want to implement a new datatype, that
can be done as a contrib module and, if it doesn't go into the core
distribution, it can always go up on GitHub and people can use it if
they like it. If you want to implement a tool for validating JSON
documents against a JSON schema, you can just write that as a function
in a loadable module and people who want to use it can add a CHECK
(json_validate_schema(mycolumn, 'schema or a reference thereunto goes
here')) constraint. Again, if core doesn't want it, you can still
distribute it and people can use it with unmodified PostgreSQL. And
similarly, if you want to convert records from PostgreSQL's native
format to some new serialization format, there is precedent for that
with JSON, and it can all be done as an extension to start.
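The CHECK-constraint route could be prototyped entirely outside the server first; for example, a minimal validator of the kind a json_validate_schema() function might wrap (the schema dialect below is a made-up subset for illustration, not JSON Schema proper):

```python
import json

def json_validate_schema(document, schema):
    """Return True if every key named in the (toy) schema is present in the
    document with a value of the declared type. Extra keys are allowed."""
    doc = json.loads(document)
    types = {"string": str, "number": (int, float), "boolean": bool}
    for key, typename in json.loads(schema).items():
        if key not in doc or not isinstance(doc[key], types[typename]):
            return False
    return True

schema = '{"id": "number", "name": "string"}'
print(json_validate_schema('{"id": 1, "name": "x"}', schema))  # True
print(json_validate_schema('{"id": "oops"}', schema))          # False
```

Wrapped in a loadable module, this is exactly the shape of function a CHECK (json_validate_schema(mycolumn, '...')) constraint would call.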
If you do something that touches the wire protocol or adds new formats
in which data can be returned to the client, I think your chances of
getting that committed to core are extremely low. We don't really
like to change the wire protocol, and every time we add some extension
to it, the amount of actual use that people get out of it is
contingent on all of the various reimplementations of that protocol in
various language-specific drivers also being extended to support that
same thing. It's not worth going through that kind of hassle very
often, and I don't think this would come close to having enough value
to make that level of pain worthwhile, given that there are so many
other ways of accomplishing the same thing: you can use a function in
the query to transform the data before sending it, or you can have the
client transform it afterwards. Moreover, if your work doesn't go
into core then it's probably going to die on the vine because who
wants a modified server version that breaks wire protocol
compatibility?
I'm not taking any position on the intrinsic value of protocol buffer
support beyond the broad statement that I don't personally have any
reason to care about it. That is, I neither support nor oppose the idea
of accepting code that does stuff related to protocol buffers. I am
only offering a general opinion that something that can exist as an
extension is likely (a) easier to implement, (b) easier to get
committed, and (c) still potentially useful if it doesn't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company