Question: merit / feasibility of compressing frontend <--> backend transfers w/ zlib
Hello,
I'm new to the list, and just started working as an intern at
commandprompt.com.
As one of my first projects I've been asked to compress with zlib
(www.gzip.org/zlib ) data flowing from postgres clients to and
especially from the backend server. Our first idea was to write a sort
of 'compression proxy' with a frontend and backend of its own. The
postgres client would connect to the compression frontend on their local
machine which would compress and transfer to the compression backend on
the server. Decompressed requests would be forwarded to the postgres
server. This idea was abandoned since: 1.) it means existing clients
would have to be reconfigured to talk to their local machine, and 2.) it
destroys host based authentication since all packets arriving at the
server would be from the local decompressor.
The current idea is to rewrite parts of postgres itself, both the
frontend libpq and the backend, so that a "compress" option could be
passed by the client. After the startup packet and authentication all
subsequent queries and responses would be compressed (and decompressed
when received).
My questions are: Is there any merit to this idea? i.e., would
compressing large result sets decrease the transfer time? And how
easy or difficult would it be to incorporate such a change into the
postgres frontend and backend source?
Any help appreciated,
Robert Flory
using psql-general@commandprompt.com
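A minimal sketch of the scheme Robert describes above (compress everything after startup and authentication, decompress on receipt), using Python's standard zlib module rather than libpq or backend code. The class name CompressedChannel and the sample query and rows are hypothetical; the only real API used is zlib.compressobj, zlib.decompressobj and the Z_SYNC_FLUSH flush mode. Each direction keeps one long-lived deflate stream, and every message is flushed so the peer can decode it immediately.

```python
import zlib


class CompressedChannel:
    """Hypothetical wrapper: one zlib stream per direction, kept open for the
    lifetime of the connection so later messages can reuse earlier history."""

    def __init__(self, level=6):
        self._deflate = zlib.compressobj(level)
        self._inflate = zlib.decompressobj()

    def pack(self, message: bytes) -> bytes:
        # Z_SYNC_FLUSH pushes out everything buffered so far, so the peer can
        # decode this message without waiting for more data to arrive.
        return self._deflate.compress(message) + self._deflate.flush(zlib.Z_SYNC_FLUSH)

    def unpack(self, wire_bytes: bytes) -> bytes:
        return self._inflate.decompress(wire_bytes)


# One endpoint object per side of the (already authenticated) connection.
client, server = CompressedChannel(), CompressedChannel()

query = b"SELECT id, name, email FROM customers WHERE region = 'west';"
rows = b"".join(
    b"%d|customer_%d|customer_%d@example.com\n" % (i, i, i) for i in range(2000)
)

# Request path: client compresses, server decompresses.
assert server.unpack(client.pack(query)) == query

# Response path: server compresses the result set, client decompresses.
wire = server.pack(rows)
assert client.unpack(wire) == rows
print(f"result set: {len(rows)} bytes plain, {len(wire)} bytes on the wire")
```

Whether the saved bytes outweigh the CPU cost depends on the link speed and the data, which is the question the rest of the thread debates.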
On Mon, Jul 15, 2002 at 12:01:03PM -0700, pgsql-general wrote:
As one of my first projects I've been asked to compress with zlib
(www.gzip.org/zlib ) data flowing from postgres clients to and
especially from the backend server. Our first idea was to write a sort
of 'compression proxy' with a frontend and backend of its own. The
postgres client would connect to the compression frontend on their local
machine which would compress and transfer to the compression backend on
the server. Decompressed requests would be forwarded to the postgres
server. This idea was abandoned since: 1.) it means existing clients
would have to be reconfigured to talk to their local machine, and 2.) it
destroys host based authentication since all packets arriving at the
server would be from the local decompressor.
It also strikes me as inefficient and unnecessarily complicated.
My questions are: Is there any merit to this idea? i.e., would
compressing large result sets decrease the transfer time?
I'm not too keen about it (as was Tom Lane when someone suggested it
earlier, IIRC). The vast majority of PostgreSQL installations place
both the clients and the RDBMS on the same LAN, so I'd expect
that few people would find it useful. And among those that would,
you can get that functionality in other ways (e.g. ssh forwarding,
a generic zlib tunnel if one exists -- similar to stunnel for SSL),
without needing to bloat PostgreSQL.
How easy or difficult would it be to incorporate such a change into the
postgres frontend and backend source?
Doesn't seem like it would be very difficult, IMHO.
Cheers,
Neil
--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
Does the ODBC or JDBC interface use compression? I think these
are more likely to be used over a non-LAN connection.
The other use for compression would be for a data sync between
two database installations that are geographically distributed. The idea
is that two offices would each have a local DBMS but the link
between them is slow. Compression could help in that case.
Compression is not all that hard to set up using port forwarding
proxies like you thought. In fact ssh can do it already if you specify
the "-C" option.
Chris Albertson
Hello,
Without getting into a huge debate over which implementation is
better, suffice it to say that we have seen significant demand for
this solution without the obnoxiousness of ssh. SSH is great for lots of
stuff, but you are adding an additional user-layer application to manage.
Our implementation will make it so that you literally just say
compression=yes in the connection string and boom.... it's compressed.
There is a real commercial need, when dealing with VPNs, remote
users, and web based distributed applications, for something like this.
Sincerely,
Joshua Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
There is a real commercial need, when dealing with VPN's, remote
users, and web based distributed applications for something like this.
This unsubstantiated opinion doesn't really do much to change my
opinion. We have seen maybe two or three prior requests for compression
(which does not qualify as a groundswell); furthermore they were all "it
would be nice if..." handwaving, with no backup data to convince anyone
that any real performance gain would emerge in common scenarios. So I'm
less than eager to buy into the portability and interoperability
pitfalls that are likely to emerge from requiring clients and servers to
have zlib.
regards, tom lane
Has anyone run any tests to see if it is faster/slower?
--
Bruce Momjian | http://candle.pha.pa.us
Hello,
With all due respect, Tom, I am not asking you to. We (CMD) have specific
instances of projects that will require this feature. I have also spoken
with others that have requested that we do something like this for their
projects, although we will not benefit from them. This is why I have
authorized my programmer to implement the feature.
We see a benefit in compressing result sets for transfer to clients. In
a lot of instances it would take less time to compress and decompress a
result set than to actually transfer the result set across the wire in
plain text.
If you are dealing with 1 meg of text, across a distributed application
where the client connects via a VPN at 56k, we are talking 4 minutes. If we
compress and send it across, that could be 30 seconds (mileage will vary).
Besides, we are not asking the PostgreSQL team to implement the feature,
just to help us understand the existing code a little better (which I
realize now, my budding programmer did not word very well), so that we may
implement it within our code base.
Sincerely,
Joshua D. Drake
We are not asking the PostgreSQL team to do so.
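The transfer-time estimate in the message above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes "1 meg" means 2**20 bytes and "56k" means a nominal 56,000 bit/s, and it ignores latency, protocol overhead and the CPU time spent compressing. The function names are made up for illustration. At the nominal rate the ideal uncompressed time comes out at roughly 150 seconds, so the quoted 4 minutes corresponds to a lower effective throughput, which is common on real modem links.

```python
def transfer_seconds(payload_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to push payload_bytes through a link, ignoring latency
    and protocol overhead."""
    return payload_bytes * 8 / link_bits_per_second


def compressed_seconds(payload_bytes, link_bits_per_second, ratio, codec_seconds=0.0):
    """Same transfer after shrinking the payload by `ratio`, plus an optional
    allowance for time spent compressing and decompressing."""
    return transfer_seconds(payload_bytes / ratio, link_bits_per_second) + codec_seconds


ONE_MEG = 2**20        # assumption: "1 meg" = 1 MiB
NOMINAL_56K = 56_000   # assumption: nominal modem rate in bit/s

plain = transfer_seconds(ONE_MEG, NOMINAL_56K)
print(f"uncompressed at nominal 56k: {plain:.0f} s")

# Effective link rate implied by the quoted figure of 4 minutes (240 s).
implied_rate = ONE_MEG * 8 / 240
print(f"effective rate implied by 4 minutes: {implied_rate:.0f} bit/s")

# Compression ratio needed to get from 4 minutes down to 30 seconds.
needed_ratio = 240 / 30
print(f"ratio implied by 4 min -> 30 s: {needed_ratio:.0f}:1")
print(f"at that ratio and the nominal rate: "
      f"{compressed_seconds(ONE_MEG, NOMINAL_56K, needed_ratio):.0f} s")
```

The ratio actually achieved depends entirely on the data; repetitive text result sets tend to compress well, already-compressed or binary columns do not.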
On Tue, Jul 16, 2002 at 01:59:10 -0700,
"Joshua D. Drake" <jd@commandprompt.com> wrote:
If you are dealing with 1 meg of text, across a distributed application
where the client connect via a VPN at 56k, we are talking 4 minutes. If we
compress and send it across that could be 30 seconds (mileage will vary).
Shouldn't the VPN be doing compression?
Bruno Wolff III <bruno@wolff.to> writes:
On Tue, Jul 16, 2002 at 01:59:10 -0700,
"Joshua D. Drake" <jd@commandprompt.com> wrote:
If you are dealing with 1 meg of text, across a distributed application
where the client connect via a VPN at 56k, we are talking 4 minutes. If we
compress and send it across that could be 30 seconds (mileage will vary).
Shouldn't the VPN be doing compression?
Most VPNs (eg ones based on IPsec) work at the IP packet level, with
no knowledge of the streams at higher levels. I don't think the IPsec
standard addresses compression at all--that's supposed to be handled
at the link layer (eg PPP) or at higher levels.
Even if it were there, packet-by-packet compression, or that provided
by a 56K modem link, isn't going to give you nearly as big a win as
compressing at the TCP stream level, where there is much more
redundancy to take advantage of, and you don't have things like packet
headers polluting the compression dictionary.
I'm not advocating zlib-in-PG, but it does seem that some people would
find it useful.
-Doug
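Doug's point about stream-level versus packet-by-packet compression can be illustrated with a small experiment. The sketch below is not a model of IPsec or PPP: it simply compresses the same synthetic payload two ways with Python's standard zlib module, once with every chunk compressed on its own (no shared history) and once as a single deflate stream flushed after each chunk. The row format and the 1400-byte chunk size are assumptions chosen for illustration, and packet headers are not modelled at all.

```python
import zlib

# Synthetic result set: many similar rows, as a query result often is.
payload = b"".join(
    b"%d|order_%d|2002-07-16|shipped|warehouse_%d\n" % (i, i, i % 7)
    for i in range(5000)
)

PACKET_SIZE = 1400  # assumption: roughly one TCP segment's worth of payload

packets = [payload[i:i + PACKET_SIZE] for i in range(0, len(payload), PACKET_SIZE)]

# Packet level: each packet is compressed independently, so the compressor
# starts with an empty history every time.
per_packet_total = sum(len(zlib.compress(p)) for p in packets)

# Stream level: one deflate stream for the whole transfer, flushed after each
# packet so the receiver can still decode data as it arrives.
stream = zlib.compressobj()
stream_total = sum(
    len(stream.compress(p) + stream.flush(zlib.Z_SYNC_FLUSH)) for p in packets
)

print(f"plain:       {len(payload)} bytes")
print(f"per packet:  {per_packet_total} bytes")
print(f"one stream:  {stream_total} bytes")
```

The single stream comes out smaller because each chunk can refer back to data sent earlier, which is the "much more redundancy" Doug refers to.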
On Tue, Jul 16, 2002 at 12:13:14 -0400,
Doug McNaught <doug@wireboard.com> wrote:
Most VPNs (eg ones based on IPsec) work at the IP packet level, with
no knowledge of the streams at higher levels. I don't think the IPsec
standard addresses compression at all--that's supposed to be handled
at the link layer (eg PPP) or at higher levels.
That can't be right. Once the data is encrypted, you won't be able to
compress it. That is why it is useful for the VPN software to be able
to do it.
Even if it were there, packet-by-packet compression, or that provided
by a 56K modem link, isn't going to give you nearly as big a win as
compressing at the TCP stream level, where there is much more
redundancy to take advantage of, and you don't have things like packet
headers polluting the compression dictionary.
Maybe a generic compression tool could be put into the path without having
to change either Postgres or your VPN software.
Bruno Wolff III <bruno@wolff.to> writes:
On Tue, Jul 16, 2002 at 12:13:14 -0400,
Doug McNaught <doug@wireboard.com> wrote:
Most VPNs (eg ones based on IPsec) work at the IP packet level, with
no knowledge of the streams at higher levels. I don't think the IPsec
standard addresses compression at all--that's supposed to be handled
at the link layer (eg PPP) or at higher levels.
That can't be right. Once the data is encrypted, you won't be able to
compress it. That is why it is useful for the VPN software to be able
to do it.
True enough, but my point below still stands--it just makes a lot more
sense to do it up at the stream level, if you have one.
Even if it were there, packet-by-packet compression, or that provided
by a 56K modem link, isn't going to give you nearly as big a win as
compressing at the TCP stream level, where there is much more
redundancy to take advantage of, and you don't have things like packet
headers polluting the compression dictionary.
Maybe a generic compression tool could be put into the path without having
to change either Postgres or your VPN software.
SSH with compression enabled works fairly well for this, but the OP
didn't see the point of using it when he already had a VPN going.
The idea of a generic "compression tunnel" (without the SSH overhead)
is nice, but I've never seen one. Wouldn't be that hard to write, I'd
think.
I think the big obstacle to putting compression into PG is needing to
extend the FE/BE protocol for negotiating compression, and the possible
client compatibility issues that raises. We already have SSL
negotiation working, though...
-Doug
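The compatibility concern Doug raises comes down to how a client and server agree on compression without breaking peers that know nothing about it. The sketch below is purely hypothetical and does not reflect the actual FE/BE protocol or the Command Prompt implementation: it only illustrates an opt-in flow in which the client asks for compression when its connection string contains compression=yes (the keyword mentioned earlier in the thread), and a server that cannot compress declines so the connection falls back to plain traffic. The function names and the example host are invented.

```python
def parse_conninfo(conninfo: str) -> dict:
    """Parse a simple space-separated key=value connection string.
    (No quoting support; enough for this illustration.)"""
    return dict(item.split("=", 1) for item in conninfo.split())


def negotiate_compression(client_options: dict, server_supports_zlib: bool) -> bool:
    """Return True only if both sides agree to compress.

    Hypothetical flow: the client asks only when 'compression=yes' is set,
    the server accepts or declines, and a decline means the session carries
    on uncompressed instead of failing."""
    client_wants = client_options.get("compression", "no") == "yes"
    if not client_wants:
        return False              # existing clients: nothing changes on the wire
    return server_supports_zlib   # a server built without zlib simply declines


opts = parse_conninfo("host=db.example.com dbname=sales compression=yes")
print(negotiate_compression(opts, server_supports_zlib=True))
print(negotiate_compression(opts, server_supports_zlib=False))
print(negotiate_compression(parse_conninfo("host=db.example.com"), True))
```

Keeping the option off by default is what lets old clients and servers interoperate unchanged, at the cost of one extra exchange when a client does ask.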
Doug McNaught <doug@wireboard.com> writes:
I think the big obstacle to putting compression into PG is needing to
extend the FE/BE protocol for negotiating compression, and the possible
client compatibility issues that raises. We already have SSL
negotiation working, though...
Yup. Seems like a more useful exercise would be to lobby the SSL people
to include compression as an option in SSL connections. That would
solve the problem not only for PG, but every other application that uses
SSL ...
regards, tom lane
Hi Tom,
Tom Lane wrote:
Doug McNaught <doug@wireboard.com> writes:
I think the big obstacle to putting compression into PG is needing to
extend the FE/BE protocol for negotiating compression, and the possible
client compatibility issues that raises. We already have SSL
negotiation working, though...
Yup. Seems like a more useful exercise would be to lobby the SSL people
to include compression as an option in SSL connections. That would
solve the problem not only for PG, but every other application that uses
SSL ...
We can all see the merits of having a compressed data stream, especially
in those situations where the byte count is more important than the CPU
cost.
However, I'd like to point out that SSL isn't feasible to use in all
situations, so having to enable SSL to gain compression would be a pain.
If someone's willing to put the time into this, then compression without
SSL feels like a good idea. Not everyone uses SSL. Bad network latency
has a very undesirable effect on the establishment of SSL connections,
and this is especially of interest in those cases where people need to
get short "bursty" amounts of SQL data across a connection as fast as
possible, e.g. a client using a frontend app to reach remote databases over a
modem, and not using persistent connections.
Establishment of an individual SSL session using OpenSSL can take over a
second in this case. Not consistently, but I had to time it (on
fast hardware too) for a contract recently when deciding on network
layer transports.
Hope this gives some decent food for thought.
:-)
Regards and best wishes,
Justin Clift
--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
If someone's willing to put the time into this, then compression without
SSL feels like a good idea. Not everyone uses SSL.
Well, we are already putting the time into it ;). I expect to have it
complete by the end of the week. If people like, we can keep in touch about
it.
Sincerely,
Joshua Drake
Hello,
We have successfully completed the rewrite of the connection functions
(frontend and backend) to enable compression. After testing (for which I
will provide numbers soon) we have found that compression is quite
usable and increases performance for most connections. In fact, unless you
are running on a 10Mb or faster link it will probably help you. We still
need to run some tests on connections that are above 384k, but it is
looking quite good.
We did not break compatibility and compression is a dynamic option
that can be used in the connection string.
Sincerely,
Joshua Drake
Command Prompt, Inc.