Proposal: http2 wire format

Started by Damir Simunic, about 8 years ago, 47 messages, hackers

damir.simunic@wa-research.ch

about 8 years ago

Hello hackers,

I’d like to propose the implementation of new wire protocol using http2 framing.

It appears to me that http2 solves many of the issues on the TODO list under “Wire Protocol Changes / v4 Protocol,” without any obvious downsides.

The implementation I have in mind has zero impact on existing clients. No changes to the format of existing v3 protocol. The new protocol works through a few small additions to postmaster.c to intercept TLS requests, and the rest in new source files, linked through PQcommMethods.

I’d like to emphasize that this proposal is emphatically NOT about “let’s handle REST in the database” or some such. It’s about upgrading the framing, where http2 offers many benefits: content negotiation, concurrent bidirectional streams, extensible frame types, metadata/data split into headers/trailers and data frames, flow control, etc. It’s at least as efficient as febe v3. A lot of research is going into it to make it even more efficient and latency friendly. The mechanisms it provides for content negotiation (and, with ALPN, protocol negotiation) offer us a future-friendly way to evolve without the burden of backward compatibility compromises.
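For readers less familiar with the framing being discussed, here is a minimal Python sketch of the fixed 9-byte HTTP/2 frame header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier). The function and class names are illustrative only and are not taken from libnghttp2 or the pg_h2 repository.

```python
import struct
from dataclasses import dataclass

# A few of the frame type codes defined by the HTTP/2 specification.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}


@dataclass
class FrameHeader:
    length: int     # payload length in bytes (24 bits on the wire)
    type: int       # frame type code
    flags: int      # type-specific flags
    stream_id: int  # 31-bit stream identifier; 0 means the connection itself


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the fixed 9-byte header that precedes every HTTP/2 frame."""
    if len(buf) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    hi, lo, ftype, flags, sid = struct.unpack(">BHBBI", buf[:9])
    # The top bit of the stream identifier is reserved and must be ignored.
    return FrameHeader((hi << 16) | lo, ftype, flags, sid & 0x7FFFFFFF)


def build_frame(ftype: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its 9-byte frame header."""
    n = len(payload)
    if n >= 1 << 24:
        raise ValueError("payload does not fit in a 24-bit length field")
    header = struct.pack(">BHBBI", n >> 16, n & 0xFFFF, ftype, flags,
                         stream_id & 0x7FFFFFFF)
    return header + payload
```

Because the type is a single byte that receivers can skip when they do not recognize it, new frame types can be introduced without changing this header layout.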

Before writing this proposal, I set out to create a proof of concept. My goal for the PoC is to be able to connect to the server using an existing http2 client and get json back:

curl -k https://localhost:5432/some_func \
--http2-prior-knowledge --tlsv1.2 \
-H 'pg-database: postgres' \
-H 'pg-user: web' \
-H 'authorization: ….' \
-H 'accept: application/json'

{ result: [ … ] }
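The request above carries the database and user as `pg-database` and `pg-user` headers. As a rough illustration of how such headers could feed the existing startup path, the Python sketch below turns them into the key/value parameters of a v3 StartupMessage. The header-to-parameter mapping and the function names are assumptions made for this example; they are not taken from the pg_h2 code.

```python
import struct

PROTOCOL_V3 = 3 << 16  # protocol version 3.0 as sent in a v3 startup packet

# Hypothetical mapping from request header names to startup parameter names.
HEADER_TO_PARAM = {"pg-database": "database", "pg-user": "user"}


def startup_params(headers: dict) -> dict:
    """Pick out the headers that correspond to startup parameters."""
    return {
        HEADER_TO_PARAM[name.lower()]: value
        for name, value in headers.items()
        if name.lower() in HEADER_TO_PARAM
    }


def build_startup_message(params: dict) -> bytes:
    """Serialize parameters as a v3 StartupMessage.

    Layout: int32 length (including itself), int32 protocol version,
    then NUL-terminated name/value pairs and a final NUL byte.
    """
    body = struct.pack(">I", PROTOCOL_V3)
    for name, value in params.items():
        body += name.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"
    return struct.pack(">I", len(body) + 4) + body
```

Headers such as `accept` have no v3 counterpart and are simply ignored by this mapping.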

After spending a week getting up to speed with C, libpq internals, http2 standard, libnghttp2 interface, etc., I’m fairly convinced that pg/http2 is feasible.

Sadly, my experience with C and Postgres internals is non-existent, and I am not yet able to finalize a live demo. The above curl request does establish the connection, receives the settings frame and queries the database, but I’m still struggling with writing code to return the http2 response. At this stage, it’s purely an issue of mechanically writing the code; I think I have solved how it all works in principle.

If anyone finds the idea of Postgres speaking http2 appealing, I’d welcome guidance/mentoring/coding help (or just plain taking over). I put up a repo with the results so far and a longer writeup: https://github.com/dsimunic/pg_h2

All changes I made to the codebase are in a single commit, hopefully easy to understand what is happening. You’ll need libnghttp2 and openssl 1.0.2 or newer to compile.

My hope is that this post leads to a conversation and gets a few people excited about the idea the way I am. Maybe even some of the GSoC students would take the implementation further?

Damir

David Fetter

david@fetter.org

about 8 years ago

In reply to: Damir Simunic (#1)

Re: Proposal: http2 wire format

On Sat, Mar 24, 2018 at 06:52:47PM +0100, Damir Simunic wrote:

Hello hackers,

I’d like to propose the implementation of new wire protocol using http2 framing.

Welcome to the PostgreSQL community! This is a very interesting idea.
Please send a patch to this mailing list on this thread.

In order to get and keep it on the radar, you should know about how
development works in PostgreSQL.

http://wiki.postgresql.org/wiki/Development_information

In particular, please look at: http://wiki.postgresql.org/wiki/Submitting_a_Patch

I notice that you patched 10. New features, and this is definitely
one, go against git master.

It appears to me that http2 solves many of the issues on the TODO
list under “Wire Protocol Changes / v4 Protocol,“ without any
obvious downsides.

Here are a few things to consider, at least from my perspective:

- Docs. Gotta have some: https://wiki.postgresql.org/wiki/Documentation_Tools

- Testing. Gotta have some in src/test/regress in the source tree.

- Tight coupling to OpenSSL, if that's actually what's happening.
We're actively trying to get away from this, so a TLS-neutral
implementation or at least one that's not specific to OpenSSL would
be good.

- Overhead for all clients. It may be tiny, but it needs to be
measured and that cost needs to be weighed against the benefits.
Maybe a cache miss in the context of a network connection is
negligible, but we do need to know.

- Dependency on a new external library. Fortunately, it's MIT
licensed, so it's PostgreSQL compatible, but what happens if it
becomes unmaintained? This has happened a couple of times, and it
causes overhead that needs to be taken into account.

My hope is that this post leads to a conversation and gets a few
people excited about the idea the way I am. Maybe even some of the
GSoC students would take the implementation further?

The conversation has started.

Again, welcome, and thanks for jumping in!

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: David Fetter (#2)

Re: Proposal: http2 wire format

On 25 Mar 2018, at 19:42, David Fetter <david@fetter.org> wrote:

On Sat, Mar 24, 2018 at 06:52:47PM +0100, Damir Simunic wrote:

Hello hackers,

I’d like to propose the implementation of new wire protocol using http2 framing.

Welcome to the PostgreSQL community! This is a very interesting idea.
Please send a patch to this mailing list on this thread.

Thanks David, very excited to be part of pgsql-hackers!

In order to get and keep it on the radar, you should know about how
development works in PostgreSQL.

http://wiki.postgresql.org/wiki/Development_information

In particular, please look at: http://wiki.postgresql.org/wiki/Submitting_a_Patch

To put it out front: my forte is product design, not C coding. (Also, I made a grammar error in the opening sentence: I’m not proposing “the implementation”, but “implementing h2 as new wire proto”)

I did study all of the resources you mentioned. And am voraciously reading up on Postgres internals, scouring its source, practicing C development, etc.

My email is the result of the first advice under “Brand new features” in “So you want to be a developer?”.

I notice that you patched 10. New features, and this is definitely
one, go against git master.

Let me figure out how to do that pronto. 10.2 tarball was easier to learn from as it was not a moving target. Whatever I did so far is not yet patch-worthy.

It appears to me that http2 solves many of the issues on the TODO
list under “Wire Protocol Changes / v4 Protocol,“ without any
obvious downsides.

Here are a few things to consider, at least from my perspective:

- Docs. Gotta have some: https://wiki.postgresql.org/wiki/Documentation_Tools

No worries about that—I love writing :)

- Testing. Gotta have some in src/test/regress in the source tree.

Before even getting to the patch stage, there will be a period of discussion about latency and other tradeoffs. Mandatory part of any conversation mentioning a wire protocol.

So the plan is to come up with a working prototype that we can plug into protocol testing tools and measure the heck out of it in context. Yet one more thing to figure out. BTW, are there any formal tests of that kind for v3 protocol?

By that time I do hope to learn how to write code tests to put into src/test/regress.

- Tight coupling to OpenSSL, if that's actually what's happening.
We're actively trying to get away from this, so a TLS-neutral
implementation or at least one that's not specific to OpenSSL would
be good.

Didn’t know that. Will ifdef the openssl-dependent code. It’s not hard to implement ALPN nego to cover all viable libraries. Do you know what alternatives are being considered?

- Overhead for all clients. It may be tiny, but it needs to be
measured and that cost needs to be weighed against the benefits.
Maybe a cache miss in the context of a network connection is
negligible, but we do need to know.

Important point. If h2 is to be seriously considered, then it must be an improvement in absolutely every aspect.

The core part of this proposal is that h2 is parallel to v3. Something one can opt into by compiling `--with_http2`.

Even if h2 finds its way already into PG12, it’s likely that the existing installed base would elect not to compile it in as there are no immediate benefits to them. The first wave of users will be web-facing apps. They already pay the penalty of conversion to/from v3, so in those scenarios the switch will be a gain.

Then again, if h2 becomes the new v4, then libpq-fe will support it, so we might find that the savings in one or two network round trips amply offset a one-byte socket peek, and everyone will eagerly upgrade. Who knows.

My PoC strategy is to touch existing code as little as possible. Yet if the ProcessStartupPacket can somehow return the consumed bytes back to the TLS lib for negotiation, then there’s zero cost to protocol detection for v2/v3 clients and only h2 clients pay the price of the extra check.
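The protocol check described here can be pictured as a look at the first bytes a client sends. The Python sketch below is an illustration under stated assumptions, not the PoC's code: a direct TLS connection starts with a handshake record (first byte 0x16), while v3 clients start with a length-prefixed packet that is either an SSLRequest or a StartupMessage. The function name and the returned labels are hypothetical.

```python
import struct

TLS_HANDSHAKE = 0x16                    # TLS record type of a ClientHello
SSL_REQUEST_CODE = (1234 << 16) | 5679  # 80877103, the v3 SSLRequest code
PROTOCOL_MAJOR_V3 = 3


def classify_first_bytes(buf: bytes) -> str:
    """Guess what kind of client is connecting from its first bytes."""
    if len(buf) >= 1 and buf[0] == TLS_HANDSHAKE:
        # A client that starts TLS immediately, as curl does.
        return "tls-client-hello"
    if len(buf) >= 8:
        length, code = struct.unpack(">II", buf[:8])
        if length == 8 and code == SSL_REQUEST_CODE:
            return "pg-ssl-request"
        if code >> 16 == PROTOCOL_MAJOR_V3:
            return "pg-v3-startup"
    return "unknown"
```

In this sketch the first branch needs only a single byte, which corresponds to the one-byte peek mentioned earlier in this message.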

- Dependency on a new external library. Fortunately, it's MIT
licensed, so it's PostgreSQL compatible, but what happens if it
becomes unmaintained? This has happened a couple of times, and it
causes overhead that needs to be taken into account.

I chose nghttp because it gave me a quick start, it’s well designed, a good fit for this kind of work, and fortunately indeed, the license is compatible. (Also, curl links to it as well, so am pretty confident it’ll be around). Very possible that over time h2 parsing code migrates into pg codebase. There are so many similarities to v3 architecture, we might find a way to generalize both into a single codebase. Then h2 frame parser/state machine becomes only a handful of .c files.

h2 is a standard; however you decide to parse it, your code will eventually converge to a stable state in the same manner that febe v3 code did. Once we master the protocol, I don’t think there’ll be much need to touch the framing code. IOW even if we just import what we need, it won’t be a big issue.

My hope is that this post leads to a conversation and gets a few
people excited about the idea the way I am. Maybe even some of the
GSoC students would take the implementation further?

The conversation has started.

Thanks so much for picking up the invitation!

There are a few points that I’d really like to discuss next:

* Is there merit in the idea of a completely new v4 protocol—one that freezes the v3 and takes a new path?

* What are the criteria for getting this into the core?

* Is it better to develop in an experimental fork until the architecture is stable and then patch onto the master, or are we supposed to keep proposing patches for inclusion in the master? Even if not all details are fully fleshed out?

Again, welcome, and thanks for jumping in!


Thanks,
Damir

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Damir Simunic (#3)

Re: Proposal: http2 wire format

On 26 March 2018 at 06:00, Damir Simunic <damir.simunic@wa-research.ch>
wrote:

- Overhead for all clients. It may be tiny, but it needs to be
measured and that cost needs to be weighed against the benefits.
Maybe a cache miss in the context of a network connection is
negligible, but we do need to know.

Important point. If h2 is to be seriously considered, then it must be an
improvement in absolutely every aspect.

The core part of this proposal is that h2 is parallel to v3. Something one
can opt into by compiling `--with_http2`.

IMO, a new protocol intended to supersede an old one must be a core,
non-optional feature. It won't reach critical mass of adoption if people
can't reasonably rely on it being there. There'll still be a multi-year
lead time as versions that support it become widespread enough to interest
non-libpq-based driver authors.

My PoC strategy is to touch existing code as little as possible. Yet if
the ProcessStartupPacket can somehow return the consumed bytes back to the
TLS lib for negotiation, then there’s zero cost to protocol detection for
v2/v3 clients and only h2 clients pay the price of the extra check.

As others have noted, you'll want to find a way to handle this in the least
SSL-implementation-specific manner possible. IMO if it can't work with
OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
non-starter.

- Dependency on a new external library. Fortunately, it's MIT
licensed, so it's PostgreSQL compatible, but what happens if it
becomes unmaintained? This has happened a couple of times, and it
causes overhead that needs to be taken into account.

I chose nghttp because it gave me a quick start, it’s well designed, a
good fit for this kind of work, and fortunately indeed, the license is
compatible. (Also, curl links to it as well, so am pretty confident it’ll
be around). Very possible that over time h2 parsing code migrates into pg
codebase. There are so many similarities to v3 architecture, we might find
a way to generalize both into a single codebase. Then h2 frame parser/state
machine becomes only a handful of .c files.

h2 is a standard; however you decide to parse it, your code will
eventually converge to a stable state in the same manner that febe v3 code
did. Once we master the protocol, I don’t think there’ll be much need to
touch the framing code. IOW even if we just import what we need, it won’t
be a big issue.

While I'm a big fan of code reuse and using existing libraries, I
understand others' hesitance here. Look at what happened with ossp-uuid;
that was painful and it was just a contrib.

It's a difficult balance between NIH and maintaining a stable core.

* Is there merit in the idea of a completely new v4 protocol—one that
freezes the v3 and takes a new path?

Likely so... but it has to be pretty compelling IMO. And more importantly,
offer a smooth backwards- and forwards-compatible path.

* What are the criteria for getting this into the core?

Mine would be:

- No new/separate port required. Works on existing port.

- Doesn't break old clients connecting to new servers

- Doesn't break new clients connecting to old servers

- No extra round trips for new client -> old server. I don't personally
care about old client -> new server so much, but should be able to offer a
pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round
trips in this case too.

- Offers significant, concrete benefits and solves the outstanding set of
issues with v3 comprehensively

- Offers a really strong extensibility path for client-requested and
server-requested optional protocol features as well as protocol version
negotiation, with no extra round trips whenever possible.

- Has a wireshark dissector

- Is practical to implement in connection pooler proxies like pgbouncer,
pgpool

- Can be made wholly transparent to clients of libpq, i.e. no extra headers
or libraries to link

- Works on windows and osx too

- Any libraries used are widespread enough that they're present in at least
RHEL7 and Debian Stable. We *can't* just bundle extras in our sources, and
packagers are unlikely to be at all happy packaging an extra lib or
backport for us. They'll probably just disable the new protocol.

- No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509 client
certs, various other auth methods.

Now, a protocol that cannot satisfy these is IMO not a complete
non-starter. It just has to be treated as an optional feature to help out
webapps, with quite different design criteria as a result, and cannot be
allowed to be as intrusive. Where changes to core protocol logic paths are
required it'd have to add plugin mechanisms/hooks instead of adding its own
new logic directly.

Make sense?

* Is it better to develop in an experimental fork until the architecture
is stable and then patch onto the master, or are we supposed to keep
proposing patches for inclusion in the master? Even if not all details are
fully fleshed out?

Protocol support doesn't change fast.

I strongly advise you to work on git master at all times, and become
familiar with:

- git rebase
- git cherry-pick
- git merge
- git reflog (for when you make mistakes with the above)

Consider maintaining a public git repo with the current working branch. Tag
versions if you refer to them in mailing list posts etc, so that people
know the exact code you were referring to.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Jacob Champion

jacob.champion@enterprisedb.com

about 8 years ago

In reply to: Craig Ringer (#4)

Re: Proposal: http2 wire format

On Sun, Mar 25, 2018 at 8:11 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

As others have noted, you'll want to find a way to handle this in the least
SSL-implementation-specific manner possible. IMO if it can't work with
OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
non-starter.

+1.

While I'm a big fan of code reuse and using existing libraries, I understand
others' hesitance here. Look at what happened with ossp-uuid; that was
painful and it was just a contrib.

It's a difficult balance between NIH and maintaining a stable core.

For whatever it's worth, I think libnghttp2 is an excellent choice for
an HTTP/2 implementation, even when taking into account the risks of
NIH. It's a well-designed library with mature clients (Curl and Apache
HTTP Server, among others), and it's authored by an HTTP/2 expert. (If
you're seriously considering HTTP/2, then you seriously need to avoid
not-invented-here syndrome. Don't roll your own unless you're
interested in becoming HTTP/2 protocol-layer security experts in
addition to SQL security experts.)

As you move forward with the PoC, consider: even if you decide not to
become protocol-layer experts, you'll still need to become familiar
with application-layer security in HTTP. You'll need to decide whether
the HTTP browser/server security model -- which is notoriously
unintuitive for many -- works well for Postgres. In particular, you'll
want to make sure that the new protocol doesn't put your browser-based
users in danger (I'm thinking primarily about cross-site request
forgeries here). Always remember that one of a web browser's core use
cases is the execution of untrusted code...

--Jacob

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Craig Ringer (#4)

Re: Proposal: http2 wire format

Hi,

On 26 Mar 2018, at 05:11, Craig Ringer <craig@2ndquadrant.com> wrote:

On 26 March 2018 at 06:00, Damir Simunic <damir.simunic@wa-research.ch> wrote:

- Overhead for all clients. It may be tiny, but it needs to be
measured and that cost needs to be weighed against the benefits.
Maybe a cache miss in the context of a network connection is
negligible, but we do need to know.

Important point. If h2 is to be seriously considered, then it must be an improvement in absolutely every aspect.

The core part of this proposal is that h2 is parallel to v3. Something one can opt into by compiling `--with_http2`.

IMO, a new protocol intended to supersede an old one must be a core, non-optional feature. It won't reach critical mass of adoption if people can't reasonably rely on it being there. There'll still be a multi-year lead time as versions that support it become widespread enough to interest non-libpq-based driver authors.

Agreed, it should be in core.

My PoC strategy is to touch existing code as little as possible. Yet if the ProcessStartupPacket can somehow return the consumed bytes back to the TLS lib for negotiation, then there’s zero cost to protocol detection for v2/v3 clients and only h2 clients pay the price of the extra check.

As others have noted, you'll want to find a way to handle this in the least SSL-implementation-specific manner possible. IMO if it can't work with OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a non-starter.

Understood.

Everyone that matters supports ALPN: https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation#Support

From the PoC standpoint, it’s now a straightforward chore to make sure it is supported for all possible build choices.
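As background for the ALPN point, the Python sketch below shows the shape of the negotiation: the client offers a list of protocol names, each encoded with a one-byte length prefix, and the server picks one it supports. This is a simplified illustration; the helper names are made up for the example, and a real implementation would rely on the TLS library's own ALPN hooks.

```python
def encode_alpn(protocols):
    """Encode protocol names as a sequence of length-prefixed strings."""
    out = bytearray()
    for name in protocols:
        raw = name.encode("ascii")
        if not 1 <= len(raw) <= 255:
            raise ValueError("protocol names must be 1..255 bytes long")
        out.append(len(raw))
        out += raw
    return bytes(out)


def decode_alpn(wire):
    """Decode a length-prefixed protocol list back into names."""
    names, i = [], 0
    while i < len(wire):
        n = wire[i]
        raw = wire[i + 1:i + 1 + n]
        if n == 0 or len(raw) != n:
            raise ValueError("malformed protocol list")
        names.append(raw.decode("ascii"))
        i += 1 + n
    return names


def select_protocol(server_prefs, client_offer):
    """Return the first server-preferred protocol the client also offered."""
    for name in server_prefs:
        if name in client_offer:
            return name
    return None  # no overlap; the caller decides how to proceed
```

The selection happens inside the TLS handshake, which is why protocol negotiation through ALPN does not need a round trip of its own.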

- Dependency on a new external library. Fortunately, it's MIT
licensed, so it's PostgreSQL compatible, but what happens if it
becomes unmaintained? This has happened a couple of times, and it
causes overhead that needs to be taken into account.

I chose nghttp because it gave me a quick start, it’s well designed, a good fit for this kind of work, and fortunately indeed, the license is compatible. (Also, curl links to it as well, so am pretty confident it’ll be around). Very possible that over time h2 parsing code migrates into pg codebase. There are so many similarities to v3 architecture, we might find a way to generalize both into a single codebase. Then h2 frame parser/state machine becomes only a handful of .c files.

h2 is a standard; however you decide to parse it, your code will eventually converge to a stable state in the same manner that febe v3 code did. Once we master the protocol, I don’t think there’ll be much need to touch the framing code. IOW even if we just import what we need, it won’t be a big issue.

While I'm a big fan of code reuse and using existing libraries, I understand others' hesitance here. Look at what happened with ossp-uuid; that was painful and it was just a contrib.

It's a difficult balance between NIH and maintaining a stable core.

Enough important projects depend on libnghttp that I don’t think it will go away any time soon. And http2 is big; as more and more tools want to talk that protocol they’ll turn to libnghttp, so the signs of any troubles will be visible very quickly.

* Is there merit in the idea of a completely new v4 protocol—one that freezes the v3 and takes a new path?

Likely so... but it has to be pretty compelling IMO. And more importantly, offer a smooth backwards- and forwards-compatible path.

* What are the criteria for getting this into the core?

Mine would be:

- No new/separate port required. Works on existing port.

Check.

- Doesn't break old clients connecting to new servers

Check.

- Doesn't break new clients connecting to old servers

Old server sends “Invalid startup packet” and closes the connection; client’s TLS layer reports an error. Does that count as not breaking new clients?

curl -v https://localhost:5432

...
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:5432
* stopped the pause stream!
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:5432

This applies to any TLS client (an h2-supporting libpq-fe will behave the same):

wget -v https://localhost:5432

Connecting to localhost|::1|:5432... connected.
Unable to establish SSL connection.

- No extra round trips for new client -> old server. I don't personally care about old client -> new server so much, but should be able to offer a pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round trips in this case too.

Can we talk about this more, please?

- Offers significant, concrete benefits and solves the outstanding set of issues with v3 comprehensively

This proposal aims to do exactly that. Work on the existing TODO list items is the way to stay on topic and demonstrate a strong case.

Once we can clear the todo list items, we can of course discuss many other benefits. As soon as I get enough of the framing working, I’ll dive into addressing each TODO item, and then scour the mailing list for more “I wish it could do…” remarks.

- Offers a really strong extensibility path for client-requested and server-requested optional protocol features as well as protocol version negotiation, with no extra round trips whenever possible.

Check.

Extensibility is the essence of h2, we’re getting this for free.

- Has a wireshark dissector

Check.

- Is practical to implement in connection pooler proxies like pgbouncer, pgpool

Something I’m planning to look into and address.

New connection poolers might become feasible, too: nginx, nghttpx, etc. (for non-web related scenarios as well). Opting into h2 lets us benefit from a much larger amount of time and resources being spent on improving things that matter. Reverse proxies face the same architectural challenges as pg-only connection poolers do.

- Can be made wholly transparent to clients of libpq, i.e. no extra headers or libraries to link

Check.

This proposal focuses on changes in framing in the following ways:

* client->server: packaging the startup packet into HEADERS, and optionally sending the query and parameters in a DATA frame.
* server->client: moving response packet tags into HEADERS frames and dropping length prefix. DATA frames still contain the usual v3 payload.
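The server-to-client half of this mapping can be sketched in Python as follows. A v3 backend message is a one-byte tag, an int32 length that counts itself but not the tag, and then the payload; the sketch splits a buffer into (tag, payload) pairs and shows one way a tag could travel as a header. The header name `pg-message-type` and both function names are hypothetical and are not part of the proposal or of the pg_h2 code.

```python
import struct


def split_v3_messages(buf: bytes):
    """Return (tag, payload) for each complete v3 message in buf."""
    messages, i = [], 0
    while i + 5 <= len(buf):
        tag = buf[i:i + 1].decode("ascii")
        (length,) = struct.unpack(">I", buf[i + 1:i + 5])
        end = i + 1 + length
        if length < 4 or end > len(buf):
            break  # malformed or incomplete message; stop here
        messages.append((tag, buf[i + 5:end]))
        i = end
    return messages


def to_h2_parts(tag: str, payload: bytes):
    """Split one message into header fields and a DATA payload.

    The length prefix is dropped because the frame header already
    carries the payload length.
    """
    headers = [("pg-message-type", tag)]  # hypothetical header name
    return headers, payload
```

For example, a CommandComplete message followed by ReadyForQuery would come out as two (tag, payload) pairs, `C` and `Z`, each of which could then be passed to `to_h2_parts`.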

Existing code linking against the new libpq client should not even notice the protocol change.

New code wanting to use more of the h2 benefits will of course have to be written differently. That should be a separate conversation, once h2 is in the core. (I’m talking about new features like feature negotiation, etc.—obviously there will have to be new features supported on the server before there’s anything to negotiate).

- Works on windows and osx too

Check.

The plan is to use the existing socket and TLS code that v3 uses. I think I can make it work elegantly through the existing PQcommMethods abstraction.

- Any libraries used are widespread enough that they're present in at least RHEL7 and Debian Stable. We *can't* just bundle extras in our sources, and packagers are unlikely to be at all happy packaging an extra lib or backport for us. They'll probably just disable the new protocol.

Check.

Let me see if I can make a table showing parallel availability of Postgres and libnghttp versions on mainstream platforms. If there are any gaps, I’m sure it is possible to lobby for inclusion of libnghttp where it matters. I see Debian has it for wheezy, jessie, and sid, while pg10 is on sid and buster.

- No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509 client certs, various other auth methods.

Check.

Adding new auth method keyword (“h2”) in pg_hba will give us a clean code path to work with.

Now, a protocol that cannot satisfy these is IMO not a complete non-starter. It just has to be treated as an optional feature to help out webapps, with quite different design criteria as a result, and cannot be allowed to be as intrusive. Where changes to core protocol logic paths are required it'd have to add plugin mechanisms/hooks instead of adding its own new logic directly.

While web-related scenarios are the first thing that comes to mind when talking about h2 (and that should not be disregarded), this proposal looks at the bigger picture of future-proofing the protocol. The headers/data/trailers split and feature/content negotiation are far bigger benefits than being web friendly.

Make sense?

Exactly what I was looking for, thanks! Hopefully we hear from more folks about the concerns with taking this path.

* Is it better to develop in an experimental fork until the architecture is stable and then patch onto the master, or are we supposed to keep proposing patches for inclusion in the master? Even if not all details are fully fleshed out?

Protocol support doesn't change fast.

I strongly advise you to work on git master at all times, and become familiar with:

- git rebase
- git cherry-pick
- git merge
- git reflog (for when you make mistakes with the above)

Consider maintaining a public git repo with the current working branch. Tag versions if you refer to them in mailing list posts etc, so that people know the exact code you were referring to.

Will do.

Thanks,
Damir

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 8 years ago

In reply to: Damir Simunic (#1)

Re: Proposal: http2 wire format

Hi,

If anyone finds the idea of Postgres speaking http2 appealing

HTTP/2 sounds interesting.
What do you think of https://grpc.io/ ?

Have you evaluated it?
It does sound like a ready RPC on top of HTTP/2 with support for lots of
languages.

The idea of reimplementing the protocol for multiple languages from scratch
does not sound too appealing.

Vladimir

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 8 years ago

In reply to: Craig Ringer (#4)

Re: Proposal: http2 wire format

Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

There's a relevant list as well:
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md

Vladimir

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Damir Simunic (#6)

Re: Proposal: http2 wire format

On 26 March 2018 at 17:01, Damir Simunic <damir.simunic@wa-research.ch>
wrote:

- Doesn't break new clients connecting to old servers

Old server sends “Invalid startup packet” and closes the connection;
client’s TLS layer reports an error. Does that count as not breaking new
clients?

libpq would have to do something like it does now for ssl connections,
falling back to non-ssl, and offering a connection option to make it try
the v3 protocol immediately without bothering with v4.
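The fallback behaviour suggested here might look roughly like the Python sketch below. Everything in it is hypothetical: the `protocol` option, its values, the exception type and the two connect callables are stand-ins for illustration, not existing libpq features.

```python
class ProtocolRejected(Exception):
    """Raised by a connect attempt when the server refuses the protocol."""


def connect_with_fallback(connect_v4, connect_v3, protocol="prefer-v4"):
    """Try the new protocol first unless the caller asked for v3 only.

    connect_v4 and connect_v3 are callables that return a connection
    object or raise ProtocolRejected.
    """
    if protocol == "v3":
        # The caller knows the server is old: skip the v4 attempt entirely.
        return connect_v3()
    try:
        return connect_v4()
    except ProtocolRejected:
        # An old server closed the connection; retry with v3.
        return connect_v3()
```

The retry costs an extra connection attempt against an old server, which is the overhead the explicit v3 option is meant to avoid.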

- No extra round trips for new client -> old server. I don't personally
care about old client -> new server so much, but should be able to offer a
pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round
trips in this case too.

Can we talk about this more, please?

As above. A newer libpq should not perform worse on an existing server than
an older libpq.

Check.

Extensibility is the essence of h2, we’re getting this for free.

Please elaborate somewhat for people not already strongly familiar with
HTTP2.

BTW, please stop saying "h2" when you mean HTTP2. It's really confusing,
because I keep thinking you are talking about H2, the database engine (
http://www.h2database.com/), which has PostgreSQL protocol and syntax
compatibility as well as its own wire protocol.

- Has a wireshark dissector

Check.

... including understanding of the PostgreSQL bits that are payload within
the protocol.

Look at what the current dissector does - capture some packets.

- Is practical to implement in connection pooler proxies like pgbouncer,
pgpool

Something I’m planning to look into and address.

New connection poolers might become feasible, too: nginx, nghttpx, etc.
(for non-web related scenarios as well). Opting into h2 lets us benefit
from a much larger amount of time and resources being spent on improving
things that matter. Reverse proxies face the same architectural challenges
as pg-only connection poolers do.

... which is nice, but doesn't change the fact that a protocol revision
that completely and unfixably breaks existing tools much of the community
relies on won't go far.

- Any libraries used are widespread enough that they're present in at
least RHEL7 and Debian Stable. We *can't* just bundle extras in our
sources, and packagers are unlikely to be at all happy packaging an extra
lib or backport for us. They'll probably just disable the new protocol.

Check.

Let me see if I can make a table showing parallel availability of Postgres
and libnghttp versions on mainstream platforms. If there are any gaps, I’m
sure it is possible to lobby for inclusion of libnghttp where it matters. I
see Debian has it for wheezy, jessie, and sid, while pg10 is on sid and
buster.

Good plan. But be clear that this is super experimental.

- No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509
client certs, various other auth methods.

Check.

Adding new auth method keyword (“h2”) in pg_hba will give us a clean code
path to work with.

I think you missed the point there entirely.

HTTP2 isn't an authentication method. It's a wire protocol. It will be
necessary to support authentication methods including, but not limited to,
GSSAPI, SSPI (windows), SCRAM, etc *on any new protocol*.

If you propose a new protocol, to replace the v3 protocol, and it doesn't
support SSPI or SCRAM I rate your chances as about zero of getting serious
interest. You'll be back in extension-for-webdevs town.

Now, a protocol that cannot satisfy these is IMO not a complete
non-starter. It just has to be treated as an optional feature to help out
webapps, with quite different design criteria as a result, and cannot be
allowed to be as intrusive. Where changes to core protocol logic paths are
required it'd have to add plugin mechanisms/hooks instead of adding its own
new logic directly.

While web-related scenarios are the first thing that comes to mind when
talking about h2 (and that should not be disregarded), this proposal looks
at the bigger picture of future-proofing the protocol.
Headers/data/trailers split, and feature/content negotiation are far
bigger benefits than being web friendly.

You mentioned something about bundling queries in the startup packet.
That's cool if your queries don't need to adapt to server version etc,
which will often be the case. But doesn't that imply rather high backend
startup/shutdown costs?

There's a reason everyone with high rates of small simple queries uses
poolers right now.

Such a protocol would help poolers a lot, but not gain a great deal for the
core server without some kind of backend pooling, which is a huge separate
topic.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Jacob Champion (#5)

Re: Proposal: http2 wire format

Hi,

On 26 Mar 2018, at 06:47, Jacob Champion <pchampion@pivotal.io> wrote:

On Sun, Mar 25, 2018 at 8:11 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

As others have noted, you'll want to find a way to handle this in the least
SSL-implementation-specific manner possible. IMO if it can't work with
OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
non-starter.

+1.

While I'm a big fan of code reuse and using existing libraries, I understand
others' hesitance here. Look at what happened with ossp-uuid; that was
painful and it was just a contrib.

It's a difficult balance between NIH and maintaining a stable core.

For whatever it's worth, I think libnghttp2 is an excellent choice for
an HTTP/2 implementation, even when taking into account the risks of
NIH. It's a well-designed library with mature clients (Curl and Apache
HTTP Server, among others), and it's authored by an HTTP/2 expert. (If
you're seriously considering HTTP/2, then you seriously need to avoid
not-invented-here syndrome. Don't roll your own unless you're
interested in becoming HTTP/2 protocol-layer security experts in
addition to SQL security experts.)

Agreed.

As you move forward with the PoC, consider: even if you decide not to
become protocol-layer experts, you'll still need to become familiar
with application-layer security in HTTP.

Good point. Application layer security is indeed a concern.

h2 has provisions for security by design, and a significant amount of research is going into this on a large scale. Adopting h2 instead of inventing our own v4 gets us all this research for free.

You'll need to decide whether
the HTTP browser/server security model -- which is notoriously
unintuitive for many -- works well for Postgres. In particular, you'll
want to make sure that the new protocol doesn't put your browser-based
users in danger (I'm thinking primarily about cross-site request
forgeries here). Always remember that one of a web browser's core use
cases is the execution of untrusted code…

Mentioning h2 does bring browsers to mind, but this proposal is not concerned with that. (Quick curl sketches are shown only because curl is an already available h2 client.) Present web-facing designs already deal with browsers and API clients; there will be no change to that. Existing Postgres deployment and security practices must remain unchanged whether we use v3 or h2. I don’t think anyone would want to expose Postgres to the open web without a connection pooler in front of it.

When you say “browser/server model,” presumably you have http1 in mind. h2 does not have much in common with http1 on the wire. In fact, h2 is architecturally closer to febe than to http1. Both h2 and febe deal with multiple request/response pairs over a single connection. Server initiated requests are covered through push_promise frames, and logical replication (being more of a subscription thing in my mind) is covered through stream multiplexing.

Let's keep the discussion focused on the wire protocol: the sooner we can get to stable h2 framing in the core, the sooner we’ll be able to experiment with new use cases and possibilities. Only then will it make sense to bring back this discussion about browsers, content negotiation, etc.

Thanks,
Damir

--Jacob

#11

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Craig Ringer (#9)

Re: Proposal: http2 wire format

On 26 Mar 2018, at 11:34, Craig Ringer <craig@2ndquadrant.com> wrote:

On 26 March 2018 at 17:01, Damir Simunic <damir.simunic@wa-research.ch> wrote:

- Doesn't break new clients connecting to old servers

Old server sends “Invalid startup packet” and closes the connection; client’s TLS layer reports an error. Does that count as not breaking new clients?

libpq would have to do something like it does now for ssl connections, falling back to non-ssl, and offering a connection option to make it try the v3 protocol immediately without bothering with v4.

- No extra round trips for new client -> old server. I don't personally care about old client -> new server so much, but should be able to offer a pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round trips in this case too.

Can we talk about this more, please?

As above. A newer libpq should not perform worse on an existing server than an older libpq.

Wouldn’t a newer libpq continue to support v3 as long as supported servers do? I’m confused by the “no extra round trips” part and the “pg_hba.conf option”. If I know I’m talking to an old server, I’ll just configure the client to talk febe v3 and not worry.

Anyway, I’ll document all the combinations to make it easier to discuss.

Check.

Extensibility is the essence of h2, we’re getting this for free.

Please elaborate somewhat for people not already strongly familiar with HTTP2.

BTW, please stop saying "h2" when you mean HTTP2. It's really confusing, because I keep thinking you are talking about H2, the database engine (http://www.h2database.com/), which has PostgreSQL protocol and syntax compatibility as well as its own wire protocol.

Haha, I didn’t know that! “h2” is the protocol identifier in the ALPN; in my mind, http2 has more of the web and http1 baggage that I’m trying to avoid here. But let’s stick to http2 and define it better.
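For readers unfamiliar with ALPN: during the TLS handshake the client offers a list of protocol identifiers such as "h2", and the server picks one. A minimal Python sketch of how those identifiers are laid out on the wire (each name preceded by a one-byte length, as in RFC 7301) might look like this. The function name encode_alpn_list is made up for illustration, and the outer two-byte list length that the TLS extension adds is left out.

```python
def encode_alpn_list(protocols):
    """Encode ALPN protocol names as length-prefixed byte strings.

    Each entry is one length byte followed by the ASCII name. This is the
    entry layout from RFC 7301, without the outer two-byte list length.
    """
    out = bytearray()
    for name in protocols:
        raw = name.encode("ascii")
        if not 1 <= len(raw) <= 255:
            raise ValueError("ALPN names must be 1..255 bytes long")
        out.append(len(raw))  # one-byte length prefix
        out += raw
    return bytes(out)


# A client offering HTTP/2 over TLS first, then HTTP/1.1 as a fallback.
offer = encode_alpn_list(["h2", "http/1.1"])
```

Note that "h2c" (cleartext HTTP2) never appears in such a list, since there is no TLS handshake to carry it.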

- Has a wireshark dissector

Check.

... including understanding of the PostgreSQL bits that are payload within the protocol.

Look at what the current dissector does - capture some packets.

- Is practical to implement in connection pooler proxies like pgbouncer, pgpool

Something I’m planning to look into and address.

New connection poolers might become feasible, too: nginx, nghttpx, etc. (for non-web related scenarios as well). Opting into h2 lets us benefit from a much larger amount of time and resources being spent on improving things that matter. Reverse proxies face the same architectural challenges as pg-only connection poolers do.

... which is nice, but doesn't change the fact that a protocol revision that completely and unfixably breaks existing tools much of the community relies on won't go far.

- Any libraries used are widespread enough that they're present in at least RHEL7 and Debian Stable. We *can't* just bundle extras in our sources, and packagers are unlikely to be at all happy packaging an extra lib or backport for us. They'll probably just disable the new protocol.

Check.

Let me see if I can make a table showing parallel availability of Postgres and libnghttp versions on mainstream platforms. If there are any gaps, I’m sure it is possible to lobby for inclusion of libnghttp where it matters. I see Debian has it for wheezy, jessie, and sid, while pg10 is on sid and buster.

Good plan. But be clear that this is super experimental.

- No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509 client certs, various other auth methods.

Check.

Adding new auth method keyword (“h2”) in pg_hba will give us a clean code path to work with.

I think you missed the point there entirely.

HTTP2 isn't an authentication method. It's a wire protocol. It will be necessary to support authentication methods including, but not limited to, GSSAPI, SSPI (windows), SCRAM, etc *on any new protocol*.

If you propose a new protocol, to replace the v3 protocol, and it doesn't support SSPI or SCRAM I rate your chances as about zero of getting serious interest. You'll be back in extension-for-webdevs town.

Great points. I need to be more clear on that. My main concern was how to bypass the v3 auth negotiation that is closely linked to existing methods. From PoC perspective, I didn’t want to touch that and was focusing on the fact that more can be done wrt authentication in the initial request packet.

Let me spend some time on this and come up with a good way to cover everything.

Now, a protocol that cannot satisfy these is IMO not a complete non-starter. It just has to be treated as an optional feature to help out webapps, with quite different design criteria as a result, and cannot be allowed to be as intrusive. Where changes to core protocol logic paths are required it'd have to add plugin mechanisms/hooks instead of adding its own new logic directly.

While web-related scenarios are the first thing that comes to mind when talking about h2 (and that should not be disregarded), this proposal looks at the bigger picture of future-proofing the protocol. Headers/data/trailers split, and feature/content negotiation are far bigger benefits than being web friendly.

You mentioned something about bundling queries in the startup packet. That's cool if your queries don't need to adapt to server version etc, which will often be the case. But doesn't that imply rather high backend startup/shutdown costs?

There's a reason everyone with high rates of small simple queries uses poolers right now.

Such a protocol would help poolers a lot, but not gain a great deal for the core server without some kind of backend pooling, which is a huge separate topic.

Yeah, let’s leave that for later. Nothing forces us to send the query in the first request. I think we can get http2 working the same way v3 works now. Then we can experiment and figure these things out.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Damir Simunic (#10)

Re: Proposal: http2 wire format

On 26 March 2018 at 17:34, Damir Simunic <damir.simunic@wa-research.ch>
wrote:

As you move forward with the PoC, consider: even if you decide not to
become protocol-layer experts, you'll still need to become familiar
with application-layer security in HTTP.

Good point. Application layer security is indeed a concern.

h2 has provisions for security by design, and a significant amount of
research going into this on a large scale. Adopting h2 instead of inventing
our own v4 gets us all this research for free.

HTTP2, please, not "h2".

It looks like HTTP2 does use the term "h2" to mean "http2 over TLS", to
differentiate it from "h2c" which is HTTP2-over-cleartext.

IMO, you'd have to support both. Mandating TLS is going to be a non-starter
for sites that use loopback connections or virtual switches on VMs, VLAN
isolation, or other features to render traffic largely unsniffable. They
won't want to pay the price for crypto on all traffic. So this needs to be
"HTTP2 support" not "HTTP2/TLS (h2) support" anyway.

Re Pg and security: By and large we don't invent our own security
protocols. We've adopted standard mechanisms like GSSAPI and SCRAM, and
vendor ones like SSPI. Some of the details of how they're implemented in
the protocol are of course protocol specific (and thus, opportunities for
bugs/design mistakes).

But you will get _nowhere_ in making this a new default protocol if you
just try to treat those as outdated and uninteresting.

In fact, part of extensibility considerations should be extensible
authentication.

Authentication and authorization (which any new protocol really should
separate) are crucial features, and there's no one-size-fits-all answer.

If you just assume, say, that everything happens over TLS with password
auth or x.509 client certs, you'll create a giant mess for all the sites
that use Kerberos or SSPI.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Craig Ringer (#12)

Re: Proposal: http2 wire format

On 26 Mar 2018, at 12:47, Craig Ringer <craig@2ndquadrant.com> wrote:

On 26 March 2018 at 17:34, Damir Simunic <damir.simunic@wa-research.ch> wrote:

As you move forward with the PoC, consider: even if you decide not to
become protocol-layer experts, you'll still need to become familiar
with application-layer security in HTTP.

Good point. Application layer security is indeed a concern.

h2 has provisions for security by design, and a significant amount of research going into this on a large scale. Adopting h2 instead of inventing our own v4 gets us all this research for free.

HTTP2, please, not "h2".

It looks like HTTP2 does use the term "h2" to mean "http2 over TLS", to differentiate it from "h2c" which is HTTP2-over-cleartext.

IMO, you'd have to support both. Mandating TLS is going to be a non-starter for sites that use loopback connections or virtual switches on VMs, VLAN isolation, or other features to render traffic largely unsniffable. They won't want to pay the price for crypto on all traffic. So this needs to be "HTTP2 support" not "HTTP2/TLS (h2) support" anyway.

Makes sense; I’ll update all wording and function names, etc. No difference to the substance of this proposal. The same code path handles both h2 and h2c. TLS is optional, a matter of detecting the first byte of the request and taking the appropriate action.

I think we can reliably and efficiently detect h2, h2c, and FEBE requests. Of course, the behavior needs to be configurable: which protocols to enable, and how to resolve the negotiation. In my mind this is self-evident.
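As a rough illustration of that detection step, here is a Python sketch (not the PoC code, which would be C inside the postmaster). It relies on three known facts: a TLS handshake record starts with byte 0x16, a cleartext HTTP2 client with prior knowledge starts with the fixed connection preface, and a v3 startup packet carries a 4-byte length followed by the protocol version 196608 (or 80877103 for SSLRequest). The function name sniff_protocol and the returned labels are made up for illustration.

```python
import struct

# HTTP/2 client connection preface (RFC 7540, section 3.5).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
TLS_HANDSHAKE = 0x16         # TLS record content type "handshake"
FEBE_V3 = 196608             # protocol version 3.0, i.e. 3 << 16
FEBE_SSL_REQUEST = 80877103  # SSLRequest code used by existing clients


def sniff_protocol(first_bytes):
    """Guess which protocol a client is speaking from its first bytes."""
    if not first_bytes:
        return "unknown"
    if first_bytes[0] == TLS_HANDSHAKE:
        # Direct TLS; h2 would then be selected via ALPN in the handshake.
        return "tls"
    if first_bytes.startswith(H2_PREFACE):
        return "h2c"
    if len(first_bytes) >= 8:
        length, code = struct.unpack("!II", first_bytes[:8])
        if length == 8 and code == FEBE_SSL_REQUEST:
            return "febe-sslrequest"
        if code == FEBE_V3:
            return "febe-v3"
    return "unknown"
```

In practice the first byte alone already separates the three cases (0x16 for TLS, "P" for the cleartext preface, 0x00 for any v3 packet shorter than 16 MB); the longer checks just make the guess harder to fool.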

Re Pg and security: By and large we don't invent our own security protocols. We've adopted standard mechanisms like GSSAPI and SCRAM, and vendor ones like SSPI. Some of the details of how they're implemented in the protocol are of course protocol specific (and thus, opportunities for bugs/design mistakes).

But you will get _nowhere_ in making this a new default protocol if you just try to treat those as outdated and uninteresting.

Agreed: a new default protocol must cover 100% of existing use cases, _and_ add more compelling capabilities on top.

If anything I wrote made it appear contrary to that goal, it is purely because of my current focus on getting to a PoC.

In fact, part of extensibility considerations should be extensible authentication.

Authentication and authorization (which any new protocol really should separate) are crucial features, and there's no one-size-fits-all answer.

I think that HTTP2 gets us much closer to that goal. My vision is to enable application-developer-defined authentication and/or authorization as well. This is something to research once the framing is in place.

If you just assume, say, that everything happens over TLS with password auth or x.509 client certs, you'll create a giant mess for all the sites that use Kerberos or SSPI.

100% agreed on everything you say, and thanks for taking the time to write this up.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Vladimir Sitnikov (#8)

Re: Proposal: http2 wire format

On 26 Mar 2018, at 11:13, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well: https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md

This is a great addition to the list, thanks!

Damir

#15

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Vladimir Sitnikov (#7)

Re: Proposal: http2 wire format

On 26 Mar 2018, at 11:06, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

Hi,

If anyone finds the idea of Postgres speaking http2 appealing

HTTP/2 sounds interesting.
What do you think of https://grpc.io/ ?

Have you evaluated it?
It does sound like a ready RPC on top of HTTP/2 with support for lots of languages.

The idea of reimplementing the protocol for multiple languages from scratch does not sound too appealing.

This proposal takes the stance that having HTTP2 wire protocol in place will enable wide experimentation with and implementation of many new features and content types, but is not concerned with the specifics of those.

---
Let me illustrate with an example how it would look if we already had HTTP2 as proposed.

Let’s say you have a building automation device on your network that happens to speak grpc, and you decide to use Postgres to store its published topics.

Your grpc-speaking device might connect to Postgres and issue a request like this:

HEADERS (flags = END_HEADERS)
:method = POST
:scheme = http
:path = /CreateTopic
pg-database = Publisher
content-type = application/grpc+proto
grpc-encoding = gzip
authorization = Bearer y235.wef315yfh138vh31hv93hv8h3v

DATA (flags = END_STREAM)
<Length-Prefixed Message>

(This is from grpc.io homepage; uppercase HEADERS and DATA are frame names from the HTTP2 specification).
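To make the framing concrete, here is a small Python sketch of how the DATA frame above could be serialized. Every HTTP2 frame starts with a 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream id), and the gRPC "Length-Prefixed Message" is a 1-byte compression flag plus a 4-byte big-endian length in front of the body. The helper names frame and grpc_message are made up for illustration; the HEADERS payload is left out because it has to be HPACK-encoded first.

```python
import struct

# Frame types and flags from RFC 7540, section 6.
TYPE_DATA, TYPE_HEADERS = 0x0, 0x1
FLAG_END_STREAM, FLAG_END_HEADERS = 0x1, 0x4


def frame(frame_type, flags, stream_id, payload):
    """Prefix payload with the 9-byte HTTP/2 frame header."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit in a 24-bit length")
    header = struct.pack("!I", len(payload))[1:]  # keep the low 3 bytes
    # Mask the stream id to 31 bits; the top bit is reserved.
    header += struct.pack("!BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def grpc_message(body, compressed=False):
    """gRPC length-prefixed message: flag byte, 4-byte length, body."""
    return struct.pack("!BI", int(compressed), len(body)) + body


# The DATA frame from the example, on stream 1, with a placeholder body.
data_frame = frame(TYPE_DATA, FLAG_END_STREAM, 1, grpc_message(b"topic"))
```

The point of the example is that the metadata (stream id, end-of-stream, compression flag) sits in fixed positions, so a server can pick it apart without understanding the payload.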

Postgres would take care of TLS negotiation, unpack the frames, decompress the headers (:method, :path, etc are transferred compressed with a lookup table) and copy the payload into memory and make it all available to the backend. If this was the first request, it would start the backend for you as well.

Postgres doesn’t know about grpc, so it would just conveniently return “406 Not Acceptable” to your client and close the stream (but not the connection). Still connected and authenticated, the device could retry the request with `content-type: application/json`, and if you somehow programmed a function that accepts json, the request would go through. (Let’s imagine we have some kind of mechanism to associate functions to requests and content types, maybe through some function attributes in the catalog).

Say that someone else took the time and programmed a plugin that knows how to talk grpc. Then the server would call that plugin for you, validate and insert the data in the right table, and return 200 OK or 204 or whatever is appropriate to return according to grpc protocol semantics.
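The dispatch described in the last two paragraphs can be sketched in a few lines of Python. Everything here is hypothetical: Postgres has no such registry, and the names HANDLERS, register and dispatch exist only to show the idea of looking up a handler by path and content type and answering 406 when none is registered.

```python
import json

# Hypothetical registry: (path, content type) -> handler function.
HANDLERS = {}


def register(path, content_type):
    """Decorator that associates a handler with a path and content type."""
    def wrap(func):
        HANDLERS[(path, content_type)] = func
        return func
    return wrap


@register("/CreateTopic", "application/json")
def create_topic_json(body):
    # Stand-in for "call into SPI and insert the row".
    topic = json.loads(body)
    return 200, json.dumps({"created": topic["name"]}).encode()


def dispatch(path, headers, body):
    """Return (status, response body) for one request on one stream."""
    handler = HANDLERS.get((path, headers.get("content-type")))
    if handler is None:
        return 406, b""  # stream ends; the connection stays open
    return handler(body)
```

A grpc plugin would then be one more entry registered under "application/grpc+proto", with no change to the dispatch itself.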

Obviously, someone has to implement a bunch of new code on the server side to ungzip, to interpret the content of the protobuf message and take action. But that someone doesn’t need to think of getting to all the metadata like compression type, payload format etc. Just somehow plug into the server at the right level, read the data and metadata from memory, and then call into SPI to do its thing. Similar to how application servers work today. (Or Postgres for that matter, except that it speaks only FEBE and there’s no content type negotiation).

The same goes for the ‘authorization’ header. Postgres does not support Bearer token authorization today. But maybe you’ll be able to define a function that knows how to deal with the token, and somehow signal to Postgres that you want it to call this function when it sees such a header. Or maybe someone wrote a plugin that does that, and you configure your server to use it.

Then when connecting to Postgres with the above request, it would start the backend and call the function/plugin for you to decide whether to authorize the request. (As a side note, subsequent requests within the same connection would have this header compressed on the wire; that’s also an HTTP2 feature).

---

That’s only one possible scenario. In this specific scenario, the benefit is that Postgres will give you content negotiation built in, and will talk to any HTTP2 conforming client. Like you said, you don’t want to reimplement the protocol over and over.

But whether that content is grpc or something else, that's for a future discussion.

Current focus is really on getting the framing and extensibility in the core. Admittedly, I haven’t yet figured out how to code all the details, but it is becoming clearer to me how this will work architecturally. Now it’s about putting lots of elbow grease into understanding the source, coding in C, and addressing all the issues to make sure the new protocol supports 100% of existing v3 use cases.

Beyond v3 use cases, top of mind for me are improvements like the ones you describe under “Binary transfer” in your “v4 wanted features” doc (and most of the other stuff you mention).

Damir

Vladimir

#16

Alvaro Hernandez

aht@ongres.com

about 8 years ago

In reply to: Damir Simunic (#14)

Re: Proposal: http2 wire format

On 26/03/18 13:11, Damir Simunic wrote:

On 26 Mar 2018, at 11:13, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:

Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well:
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md

This is a great addition to the list, thanks!

Damir

Hi Damir.

I'm interested in the idea. However, way before writing a PoC,
IMVHO I'd rather write a detailed document including:

- A brief summary of the main features of HTTP2 and why it might be a
good fit for PG (of course there's a lot of doc in the wild about
HTTP/2, so just a summary of the main relevant features and an analysis
of how it may fit Postgres).

- A more or less thorough description of how every feature in current
PostgreSQL protocol would be implemented on HTTP/2.

- Similar to the above, but applied to the v4 TODO feature list.

- A section for connection poolers, and auth, as these are very
important topics.

Hope this helps,

Álvaro

Alvaro Hernandez

-----------
OnGres

#17

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Damir Simunic (#15)

Re: Proposal: http2 wire format

Damir Simunic <damir.simunic@wa-research.ch> writes:

On 26 Mar 2018, at 11:06, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

If anyone finds the idea of Postgres speaking http2 appealing

TBH, this sounds like a proposal to expend a whole lot of work (much of it
outside the core server, and thus not under our control) in order to get
from a state of affairs where there are things we'd like to do but can't
because of protocol compatibility worries, to a different state of affairs
where there are things we'd like to do but can't because of protocol
compatibility worries. Why would forcing our data into a protocol
designed for a completely different purpose, and which we have no control
over, be a step forward? How would that address the fundamental issue of
inertia in multiple chunks of software (ie, client libraries and
applications as well as the server)?

This proposal takes the stance that having HTTP2 wire protocol in place will enable wide experimentation with and implementation of many new features and content types, but is not concerned with the specifics of those.

That reads to me as pie in the sky, and uninformed by any engineering
reality. As an example, it's not the protocol's fault that database
server processes are expensive to spin up; changing to a different
protocol will do nothing to make them more lightweight. We've thought
about various ways to amortize that cost, but they tend to fall foul of
the fact that sessions are associated with TCP connections, which we can't
transparently remake or reattach to a different endpoint process. HTTP2
is not going to fix that, because it's still TCP based. I realize that
webservers manage to have pretty lightweight sessions, but that's not a
property of the protocol they use, it's a property of their internal
architectures. We can't get there without a massive rewrite of the PG
server --- one that would be largely independent of any particular way of
representing data on the wire, anyway.

We've certainly got issues that can't be solved without protocol changes.
But starting from the assumption that HTTP2 solves our problems seems to
me to be "Here's a hammer. I'm sure your problem must be a nail, because
all problems are nails".

regards, tom lane

#18

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 8 years ago

In reply to: Damir Simunic (#15)

Re: Proposal: http2 wire format

Damir>Postgres doesn’t know about grpc, s

I'm afraid you are missing the point.
I would say PostgreSQL doesn't know about HTTP/2.
It is the same as "PostgreSQL doesn't know about grpc".

Here's a quote from your pg_h2 repo:

What we need is to really build a request object and correctly extract
the full payload and parameters from the request. For example,
maybe we want to implement a QUERY method, similar to POST or PUT,
and pass the query text as the body of the request, with parameters
in the query string or in the headers

It basically suggests implementing your own framing on top of HTTP/2.

When I say GRPC, I mean "implement PostgreSQL-specific protocol via GRPC
messages".

Let's take current message formats:
https://www.postgresql.org/docs/current/static/protocol-message-formats.html
If one defines those message formats via GRPC, then GRPC would autogenerate
parsers and serializers for lots of languages "for free".

For instance
Query (F)
    Byte1('Q')  Identifies the message as a simple query.
    Int32       Length of message contents in bytes, including self.
    String      The query string itself.

can be defined via GRPC as
message Query {
    string queryText = 1;
}

This is trivial to read, trivial to write, trivial to maintain, and it
automatically generates parsers/generators for lots of languages.
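For comparison, this is roughly what hand-written handling of that single v3 message looks like in Python. It is only a sketch: the function names are made up, and UTF-8 is assumed as the client encoding. The Int32 length counts itself and the NUL-terminated string, but not the leading type byte.

```python
import struct


def encode_query(sql):
    """Build a v3 Query message: 'Q', Int32 length, NUL-terminated text."""
    body = sql.encode("utf-8") + b"\x00"
    return b"Q" + struct.pack("!i", 4 + len(body)) + body


def decode_query(message):
    """Parse a v3 Query message and return the query text."""
    if message[:1] != b"Q":
        raise ValueError("not a Query message")
    (length,) = struct.unpack("!i", message[1:5])
    body = message[5:1 + length]
    if len(body) != length - 4 or not body.endswith(b"\x00"):
        raise ValueError("truncated or malformed Query message")
    return body[:-1].decode("utf-8")


wire = encode_query("SELECT 1")
```

Every client library carries its own version of these two functions for each message type, which is the duplication a generated serializer would remove.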

Parsing of the current v3 protocol has to be reimplemented for each and
every language, and it would be a pain to implement parsing for v4.
Are you going to create "http/2" clients for Java, C#, Ruby, Swift, Dart,
etc, etc?

I am not saying that a mere redefinition of v3 messages as GRPC would do
the trick. I am saying that you'd better consider frameworks that would
enable transparent implementation of client libraries.

Damir>and will talk to any HTTP2 conforming client

I do not see where you are heading.
Is "curl as PostgreSQL client" one of the key objectives for you?
True clients (the ones that are used by the majority of applications)
should support things like "prepared statements", "data types", "cursors"
(resultset streaming), etc. I can hardly imagine a case when one would use
"curl" and operate with prepared statements.
I think psql is a pretty good client, so I see no point in implementing
HTTP/2 for the mere purpose of using curl to fetch data from the DB.

Vladimir

#19

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 8 years ago

In reply to: Tom Lane (#17)

Re: Proposal: http2 wire format

Tom>But starting from the assumption that HTTP2 solves our problems seems
to me to be "Here's a hammer.

Agree.

Just a side note: if v4 is ever invented, I hope client language support
is considered.
It does take resources to implement message framing, and data parsing (e.g.
int, timestamp, struct, array, ...) for each language independently.

Vladimir

#20

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Tom Lane (#17)

Re: Proposal: http2 wire format

On 26 Mar 2018, at 16:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Damir Simunic <damir.simunic@wa-research.ch> writes:

On 26 Mar 2018, at 11:06, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

If anyone finds the idea of Postgres speaking http2 appealing

TBH, this sounds like a proposal to expend a whole lot of work (much of it
outside the core server, and thus not under our control) in order to get
from a state of affairs where there are things we'd like to do but can't
because of protocol compatibility worries, to a different state of affairs
where there are things we'd like to do but can't because of protocol
compatibility worries.

What do you mean by compatibility worries? Is it backward compatibility?

If so, I’m not suggesting we get rid of FEBE, but leave it as is and complement it with a widely understood and supported protocol, that in fact takes compatibility way more seriously than FEBE. Just leave v3 frozen. Seems like ultimate backward compatibility, no? Or am I missing something?

You likely know every possible use case for Postgres, which makes you believe that the status quo is the right way. Or maybe I didn’t flesh out my proposal enough for you to give it a chance. Either way, I just can’t figure out where HTTP2 would be the same as the status quo or a step backward compared to FEBE. I can see you’re super-busy and dedicated, but if you can find the time to enlighten me beyond just waving the “compatibility” and “engineering” banners, I’d appreciate it endlessly.

Why would forcing our data into a protocol
designed for a completely different purpose, and which we have no control
over, be a step forward?

What purpose do you see HTTP2 being designed for that is completely different from FEBE? Not being cynical, genuinely want to learn. (Oh, it’s my data, too; presently held hostage to the v3 protocol).

You mention loss of control twice. What exactly is the fear?

How would that address the fundamental issue of
inertia in multiple chunks of software (ie, client libraries and
applications as well as the server)?

Is this inertia as in "our TODO list is years old and nobody’s doing anything about it"? If so, I posit here that using HTTP2 as the v4 protocol will lead to a significant reduction of inertia. And that just because we’re talking HTTP2 and not some new obscure thing we invented.

The psychological and social aspects are not to be underestimated.

This proposal takes the stance that having HTTP2 wire protocol in place will enable wide experimentation with and implementation of many new features and content types, but is not concerned with the specifics of those.

That reads to me as pie in the sky, and uninformed by any engineering
reality. As an example, it's not the protocol's fault that database
server processes are expensive to spin up; changing to a different
protocol will do nothing to make them more lightweight. We've thought
about various ways to amortize that cost, but they tend to fall foul of
the fact that sessions are associated with TCP connections, which we can't
transparently remake or reattach to a different endpoint process. HTTP2
is not going to fix that, because it's still TCP based.

That reads to me as uninformed engineering reality. Just because you are encumbered with the worries of compatibility and stuck in the world of TCP doesn’t mean it can’t be done.

You know what? HTTP2 just might fix it. Getting a new protocol into the core will force enough adjustments to the code to open the door for the next protocol on the horizon: QUIC, which happens to be UDP based, and might just be the ticket. At a minimum it will get significantly more people thinking about the possibility of reattaching sessions and doing all kinds of other things. Allowing multiple protocols is not very different from allowing a multitude of pl implementations.
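A rough sketch of the "multiple protocols, like multiple PLs" idea, under assumed names: the server keeps a table of wire-protocol handlers and picks one from the protocol token negotiated via ALPN. The struct, its members, the `select_protocol` function and the "postgresql" token are all hypothetical and do not reflect the actual layout of PQcommMethods; only "h2" is the registered ALPN identifier for HTTP/2 over TLS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical handler record for one wire protocol. A real one would
 * carry send/receive/flush hooks; a single describe() hook stands in
 * for them here to keep the sketch short.
 */
typedef struct WireProtocol
{
    const char *alpn_name;            /* token negotiated during the TLS handshake */
    const char *(*describe)(void);    /* placeholder for the real I/O callbacks */
} WireProtocol;

static const char *describe_v3(void) { return "FEBE v3 over TCP"; }
static const char *describe_h2(void) { return "HTTP/2 framing over TLS/TCP"; }

/* Adding QUIC or anything else later would mean adding a row here. */
static const WireProtocol protocols[] = {
    {"postgresql", describe_v3},      /* placeholder token for the existing v3 protocol */
    {"h2", describe_h2},              /* registered ALPN ID for HTTP/2 over TLS */
};

/* Return the handler for the negotiated token, or NULL if none is registered. */
const WireProtocol *
select_protocol(const char *alpn)
{
    assert(alpn != NULL);

    for (size_t i = 0; i < sizeof(protocols) / sizeof(protocols[0]); i++)
    {
        if (strcmp(protocols[i].alpn_name, alpn) == 0)
            return &protocols[i];
    }
    return NULL;
}
```

The point of the table is only that the rest of the backend would call through the selected handler and never need to know which framing is on the wire. Whether that maps cleanly onto the existing code is exactly the part that needs review.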

Help me put HTTP2 in place, and I’ll bet you, within a few months someone will come up with a patch for QUIC. And then someone else will remember your paragraph above and say “hmm, let’s see…”

I realize that
webservers manage to have pretty lightweight sessions, but that's not a
property of the protocol they use, it's a property of their internal
architectures. We can't get there without a massive rewrite of the PG
server --- one that would be largely independent of any particular way of
representing data on the wire, anyway.

A smart outsider might come along, look at an ultra-fast web server, then look at Postgres and think, “Hmm, both speak HTTP2, but one is blazing fast, the other slow. Can I learn anything from the former to apply to the latter? Maybe I'll add another type of backend that serves only a very very narrow use case, but makes it blazing fast?” Pie in the sky? Maybe. But isn’t that how it works today: lots of smart people chipping away in small increments?

Let’s not underestimate the effect of possibilities on mobilizing minds. Innovation is fueled by the power of possibilities. “Engineering reality” is not enough. HTTP2 is at least as good as FEBE, but it has infinitely more cachet than anything we can come up with. And that is super-important.

We've certainly got issues that can't be solved without protocol changes.
But starting from the assumption that HTTP2 solves our problems seems to
me to be "Here's a hammer. I'm sure your problem must be a nail, because
all problems are nails".

Or maybe starting from the assumption that a small change will get a lot of people excited about solving those issues seems to me to be “ideas help start revolutions”?

Yes, I do happen to believe HTTP2 can solve a slice of the current problems, and open up possibilities you didn’t have the time to think of. And yes, a protocol designed to transport data happens to look like a good hammer to nail data transfer problems. What are the odds of coming up with a better one?

Look, I keep trying to limit this to the smallest possible increment that I could think of. The choice is simply pragmatic. That doesn’t make me a hipster fanboi of the protocol du jour, arguing that we should do it just because everyone else is.

There are three alternatives to the proposal: do nothing, make a few anemic changes to v3, or start a multiyear discussion on the design of the next protocol. And you’ll still converge on something like HTTP2 or QUIC.

It’s hard to move forward if you’re not focused. Doubly hard when you’re an outsider, and extra frustrating when you have the idea and the intuition, but it takes forever to learn everything. Someone with your experience and skills would get HTTP2 done in a couple of days, and have a ton of people well on their way to resolving all these issues that can’t be solved today. If I could have pulled off the coding all by myself, I would have done it already. But I need you and everyone else here to help.

What would it take to convince you, or at least to get you to lend enough support to the idea to give it a chance?

Thanks,
Damir

regards, tom lane

#21

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Vladimir Sitnikov (#19)

#22

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Alvaro Hernandez (#16)

#23

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Damir Simunic (#20)

#24

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Vladimir Sitnikov (#18)

#25

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 8 years ago

In reply to: Damir Simunic (#21)

#26

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 8 years ago

In reply to: Damir Simunic (#24)

#27

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Andres Freund (#23)

#28

Alvaro Hernandez

aht@ongres.com

about 8 years ago

In reply to: Damir Simunic (#22)

#29

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Vladimir Sitnikov (#25)

#30

David G. Johnston

david.g.johnston@gmail.com

about 8 years ago

In reply to: Damir Simunic (#24)

#31

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Damir Simunic (#15)

#32

Stephen Frost

sfrost@snowman.net

about 8 years ago

In reply to: Craig Ringer (#31)

#33

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Tom Lane (#17)

#34

Damir Simunic

damir.simunic@wa-research.ch

about 8 years ago

In reply to: Craig Ringer (#33)

#35

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Damir Simunic (#27)

#36

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Damir Simunic (#34)

#37

Craig Ringer

craig@2ndquadrant.com

about 8 years ago

In reply to: Andres Freund (#35)

#38

Tatsuo Ishii

t-ishii@sra.co.jp

about 8 years ago

In reply to: Andres Freund (#35)

#39

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Andres Freund (#35)

#40

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Craig Ringer (#37)

#41

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Tom Lane (#39)

#42

Peter Eisentraut

peter_e@gmx.net

about 8 years ago

In reply to: Andres Freund (#41)

#43

Hannu Krosing

hannu@tm.ee

about 8 years ago

In reply to: Andres Freund (#40)

#44

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Peter Eisentraut (#42)

#45

Peter Eisentraut

peter_e@gmx.net

about 8 years ago

In reply to: Andres Freund (#44)

#46

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Peter Eisentraut (#45)

#47

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Craig Ringer (#33)