PROXY protocol support

Started by Julien Riouover 6 years ago5 messages
#1Julien Riou
julien@riou.xyz

Hello,

Nowadays, PostgreSQL is often used behind proxies. Some are PostgreSQL
protocol aware (Pgpool, PgBouncer), some are pure TCP (HAProxy). From
the database instance point of view, all clients come from the proxy.

There are two major problems with this topology:

* It neutralizes the host based authentication. Every client shares
the same source. Either we allow this source or not but we cannot allow
clients on a more fine-grained basis, or not by the IP address.

* It makes debugging harder. If we have a DDL or a slow query logged, we
cannot use the source to identify who is responsible.

On one hand, we can move the authentication and logging mechanisms to
PostgreSQL based proxies but they will never be as complete as
PostgreSQL itself. And they don't have features like HTTP health checks
to redirect trafic to nodes (health, role, whatever behind the URL). On
the other hand, those features are not implemented at all because they
don't know the PostgreSQL protocol, they simply forward requests.

In the HTTP reverse proxies world, there's a "dirty hack" to identify
the source IP address: add an HTTP header "X-Forwared-For" to the
request. It's the destination duty to do whatever they want with this
information. With this feature in mind, someone from HAProxy has
implemented this mechanism at the protocol level. It's called the PROXY
protocol.

With this piece of logic at the beginning of the protocol, we could
implement a totally transparent proxy and benefit from the great
features of PostgreSQL regarding clients. Note that MariaDB support the
PROXY protocol in MaxScale (proxy) and MariaDB Server in recent
versions.

My question is, what do you think of this feature? Is it worth to spend
time implementing it in PostgreSQL or not?

Links:
- http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
- https://mariadb.com/kb/en/library/proxy-protocol-support/

Thanks,
Julien

PS: I've already sent this message to a wrong mailing list. Stephen
Frost said it's implemented in pgbouncer but all I can find is an open
issue: https://github.com/pgbouncer/pgbouncer/issues/241.

#2Stephen Frost
sfrost@snowman.net
In reply to: Julien Riou (#1)
Re: PROXY protocol support

Greetings,

* Julien Riou (julien@riou.xyz) wrote:

Nowadays, PostgreSQL is often used behind proxies. Some are PostgreSQL
protocol aware (Pgpool, PgBouncer), some are pure TCP (HAProxy). From
the database instance point of view, all clients come from the proxy.

There are two major problems with this topology:

* It neutralizes the host based authentication. Every client shares
the same source. Either we allow this source or not but we cannot allow
clients on a more fine-grained basis, or not by the IP address.

You can instead have the IP-based checking done at the pooler.

* It makes debugging harder. If we have a DDL or a slow query logged, we
cannot use the source to identify who is responsible.

Protocol-level poolers are able to do this, and pgbouncer does (see
application_name_add_host).

On one hand, we can move the authentication and logging mechanisms to
PostgreSQL based proxies but they will never be as complete as
PostgreSQL itself. And they don't have features like HTTP health checks
to redirect trafic to nodes (health, role, whatever behind the URL). On
the other hand, those features are not implemented at all because they
don't know the PostgreSQL protocol, they simply forward requests.

In the HTTP reverse proxies world, there's a "dirty hack" to identify
the source IP address: add an HTTP header "X-Forwared-For" to the
request. It's the destination duty to do whatever they want with this
information. With this feature in mind, someone from HAProxy has
implemented this mechanism at the protocol level. It's called the PROXY
protocol.

Someone from HAProxy could certainly implement something similar by
having HAProxy understand PostgreSQL's protocol.

With this piece of logic at the beginning of the protocol, we could
implement a totally transparent proxy and benefit from the great
features of PostgreSQL regarding clients. Note that MariaDB support the
PROXY protocol in MaxScale (proxy) and MariaDB Server in recent
versions.

pgbouncer is already a transparent proxy that understands the PG
protocol, and, even better, it has support for transaction-level pooling
(as well as connection-level), which is really critical for larger PG
deployments as PG backend startup is (relatively) expensive.

PS: I've already sent this message to a wrong mailing list. Stephen
Frost said it's implemented in pgbouncer but all I can find is an open
issue: https://github.com/pgbouncer/pgbouncer/issues/241.

That would be some *other* proxy system (Amazon's ELB) that apparently
also doesn't understand the PG protocol and therefore doesn't have a
feature similar to pgbouncer's application_name_add_host.

I haven't looked very closely at if it'd be possible to interpret the
PROXY protocol thing that Amazon's ELB can do without confusing it with
a regular PG authentication startup and I'm not sure if we'd really want
to wed ourselves to something like that. Certainly, what pgbouncer does
works quite well and is about as transparent to clients as possible.

You'd almost certainly want something like pgbouncer after the ELB
anyway to avoid having tons of connections to PG and avoid spinning up
new backends constantly.

Thanks,

Stephen

#3Julien Riou
julien@riou.xyz
In reply to: Stephen Frost (#2)
Re: PROXY protocol support

On May 19, 2019 5:59:04 PM GMT+02:00, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Julien Riou (julien@riou.xyz) wrote:

Nowadays, PostgreSQL is often used behind proxies. Some are

PostgreSQL

protocol aware (Pgpool, PgBouncer), some are pure TCP (HAProxy). From
the database instance point of view, all clients come from the proxy.

There are two major problems with this topology:

* It neutralizes the host based authentication. Every client shares
the same source. Either we allow this source or not but we cannot

allow

clients on a more fine-grained basis, or not by the IP address.

You can instead have the IP-based checking done at the pooler.

* It makes debugging harder. If we have a DDL or a slow query logged,

we

cannot use the source to identify who is responsible.

Protocol-level poolers are able to do this, and pgbouncer does (see
application_name_add_host).

On one hand, we can move the authentication and logging mechanisms to
PostgreSQL based proxies but they will never be as complete as
PostgreSQL itself. And they don't have features like HTTP health

checks

to redirect trafic to nodes (health, role, whatever behind the URL).

On

the other hand, those features are not implemented at all because

they

don't know the PostgreSQL protocol, they simply forward requests.

In the HTTP reverse proxies world, there's a "dirty hack" to identify
the source IP address: add an HTTP header "X-Forwared-For" to the
request. It's the destination duty to do whatever they want with this
information. With this feature in mind, someone from HAProxy has
implemented this mechanism at the protocol level. It's called the

PROXY

protocol.

Someone from HAProxy could certainly implement something similar by
having HAProxy understand PostgreSQL's protocol.

With this piece of logic at the beginning of the protocol, we could
implement a totally transparent proxy and benefit from the great
features of PostgreSQL regarding clients. Note that MariaDB support

the

PROXY protocol in MaxScale (proxy) and MariaDB Server in recent
versions.

pgbouncer is already a transparent proxy that understands the PG
protocol, and, even better, it has support for transaction-level
pooling
(as well as connection-level), which is really critical for larger PG
deployments as PG backend startup is (relatively) expensive.

PS: I've already sent this message to a wrong mailing list. Stephen
Frost said it's implemented in pgbouncer but all I can find is an

open

issue: https://github.com/pgbouncer/pgbouncer/issues/241.

That would be some *other* proxy system (Amazon's ELB) that apparently
also doesn't understand the PG protocol and therefore doesn't have a
feature similar to pgbouncer's application_name_add_host.

I haven't looked very closely at if it'd be possible to interpret the
PROXY protocol thing that Amazon's ELB can do without confusing it with
a regular PG authentication startup and I'm not sure if we'd really
want
to wed ourselves to something like that. Certainly, what pgbouncer
does
works quite well and is about as transparent to clients as possible.

You'd almost certainly want something like pgbouncer after the ELB
anyway to avoid having tons of connections to PG and avoid spinning up
new backends constantly.

Thanks,

Stephen

It could be proprietary Amazon load balancers I don't have experience with, or simple HAProxy coupled with a Patroni HTTP API to tell if a backend is healthy or not.

The PgBouncer approach is interesting. I'm already using the application name as a workaround to identify containerized applications but didn't used it for setting the source IP.

If we take a look at the MariaDB implementation, they check for errors in the startup packet then run the PROXY protocol decoding then return a real error if it doesn't work. As our bouncers are all behind a pool of HAProxy, and if we consider PgBouncer as a trusted extension of PostgreSQL, maybe implementing it in PgBouncer first will be easier.

Thanks for your insightful comments.
Julien

#4Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Julien Riou (#1)
Re: PROXY protocol support

On 19.05.2019 18:36, Julien Riou wrote:

Hello,

Nowadays, PostgreSQL is often used behind proxies. Some are PostgreSQL
protocol aware (Pgpool, PgBouncer), some are pure TCP (HAProxy). From
the database instance point of view, all clients come from the proxy.

There are two major problems with this topology:

* It neutralizes the host based authentication. Every client shares
the same source. Either we allow this source or not but we cannot allow
clients on a more fine-grained basis, or not by the IP address.

* It makes debugging harder. If we have a DDL or a slow query logged, we
cannot use the source to identify who is responsible.

On one hand, we can move the authentication and logging mechanisms to
PostgreSQL based proxies but they will never be as complete as
PostgreSQL itself. And they don't have features like HTTP health checks
to redirect trafic to nodes (health, role, whatever behind the URL). On
the other hand, those features are not implemented at all because they
don't know the PostgreSQL protocol, they simply forward requests.

In the HTTP reverse proxies world, there's a "dirty hack" to identify
the source IP address: add an HTTP header "X-Forwared-For" to the
request. It's the destination duty to do whatever they want with this
information. With this feature in mind, someone from HAProxy has
implemented this mechanism at the protocol level. It's called the PROXY
protocol.

With this piece of logic at the beginning of the protocol, we could
implement a totally transparent proxy and benefit from the great
features of PostgreSQL regarding clients. Note that MariaDB support the
PROXY protocol in MaxScale (proxy) and MariaDB Server in recent
versions.

My question is, what do you think of this feature? Is it worth to spend
time implementing it in PostgreSQL or not?

Links:
- http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
- https://mariadb.com/kb/en/library/proxy-protocol-support/

Thanks,
Julien

PS: I've already sent this message to a wrong mailing list. Stephen
Frost said it's implemented in pgbouncer but all I can find is an open
issue: https://github.com/pgbouncer/pgbouncer/issues/241.

Hi,
From my point of view it will be better to support embedded connection
pooler in Postgres.
In this case all mentioned problems can be more or less
straightforwardly solved without inventing new protocol.
There is my prototype implementation of built-in connection pooler on
commit-fest:
https://commitfest.postgresql.org/23/2067/

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#5Bruno Lavoie
bl@brunol.com
In reply to: Julien Riou (#1)
Re: PROXY protocol support

+1 on this one...

MySQL and derivatives support it very well.. it is a standard that can be
used with either haproxy or better, ProxySQL.

Would be nice to have it in core.

It is a show stopper for us to use proxying because of compliance and
tracability reasons.

Le dim. 19 mai 2019 11:36 AM, Julien Riou <julien@riou.xyz> a écrit :

Show quoted text

Hello,

Nowadays, PostgreSQL is often used behind proxies. Some are PostgreSQL
protocol aware (Pgpool, PgBouncer), some are pure TCP (HAProxy). From
the database instance point of view, all clients come from the proxy.

There are two major problems with this topology:

* It neutralizes the host based authentication. Every client shares
the same source. Either we allow this source or not but we cannot allow
clients on a more fine-grained basis, or not by the IP address.

* It makes debugging harder. If we have a DDL or a slow query logged, we
cannot use the source to identify who is responsible.

On one hand, we can move the authentication and logging mechanisms to
PostgreSQL based proxies but they will never be as complete as
PostgreSQL itself. And they don't have features like HTTP health checks
to redirect trafic to nodes (health, role, whatever behind the URL). On
the other hand, those features are not implemented at all because they
don't know the PostgreSQL protocol, they simply forward requests.

In the HTTP reverse proxies world, there's a "dirty hack" to identify
the source IP address: add an HTTP header "X-Forwared-For" to the
request. It's the destination duty to do whatever they want with this
information. With this feature in mind, someone from HAProxy has
implemented this mechanism at the protocol level. It's called the PROXY
protocol.

With this piece of logic at the beginning of the protocol, we could
implement a totally transparent proxy and benefit from the great
features of PostgreSQL regarding clients. Note that MariaDB support the
PROXY protocol in MaxScale (proxy) and MariaDB Server in recent
versions.

My question is, what do you think of this feature? Is it worth to spend
time implementing it in PostgreSQL or not?

Links:
- http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
- https://mariadb.com/kb/en/library/proxy-protocol-support/

Thanks,
Julien

PS: I've already sent this message to a wrong mailing list. Stephen
Frost said it's implemented in pgbouncer but all I can find is an open
issue: https://github.com/pgbouncer/pgbouncer/issues/241.