libpq compression (part 3)
Hello PG developers!
I would like to introduce an updated take on libpq protocol-level
compression, building off of the work in
/messages/by-id/aad16e41-b3f9-e89d-fa57-fb4c694bec25@postgrespro.ru
and the follow-on work in
/messages/by-id/ABAA09C6-BB95-47A5-890D-90353533F9AC@yandex-team.ru
along with all of the (nice and detailed) feedback and discussion therein.
The first patch in the stack replaces `secure_read` and `secure_write`
(and their frontend counterparts) with an "IO stream" abstraction,
which should address a lot of the concerns from all parties around
secure_read. The fundamental idea of an IO stream is that it is a
linked list of "IoStreamProcessor"s. The "base" processor is the
actual socket-backed API, and then SSL, GSSAPI, and compression can
all add layers on top of that base that add functionality and rely on
the layer below to read/write data. This structure makes it easy to
add compression on top of either plain or encrypted sockets through a
unified, unconditional API, and also makes it easy for callers to use
plain, plain-compressed, secure, and secure-compressed communication
channels equivalently.
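As a rough illustration of that layering (the class names and the toy byte-transforming layer below are invented for this sketch, not taken from the patch), the structure might look like:

```python
# Toy sketch of the "IO stream" idea: a linked list of processors where
# the base layer talks to the socket and each higher layer (SSL, GSSAPI,
# compression) transforms data and delegates actual I/O to the layer below.
class IoStreamProcessor:
    def __init__(self, below=None):
        self.below = below  # the next layer down; None only for the base

    def read(self, n):
        return self.below.read(n)

    def write(self, data):
        return self.below.write(data)


class BaseSocketLayer(IoStreamProcessor):
    """Stand-in for the socket-backed base processor (a loopback buffer here)."""

    def __init__(self):
        super().__init__()
        self.buf = b""

    def write(self, data):
        self.buf += data
        return len(data)

    def read(self, n):
        out, self.buf = self.buf[:n], self.buf[n:]
        return out


class XorLayer(IoStreamProcessor):
    """Toy stand-in for an SSL/compression layer: transforms bytes, then delegates."""

    def write(self, data):
        return self.below.write(bytes(b ^ 0x5A for b in data))

    def read(self, n):
        return bytes(b ^ 0x5A for b in self.below.read(n))


# Callers use the top of the stack identically whether or not extra layers exist:
plain = BaseSocketLayer()
layered = XorLayer(BaseSocketLayer())
```

The point of the design is the last two lines: a caller holding `plain` or `layered` issues the same reads and writes either way.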
The second patch is the refactored implementation of compression
itself, with ZSTD support merged into the main patch because the
configuration-level work is now already merged in master. There was a
good bit of rebasing, housekeeping, and bugfixing (including fixing
lz4 by explicitly buffering it inside ZStream), along
with taking into account a lot of the feedback from this mailing list.
I reworked the API to use the general compression processing types and
methods from `common/compression`. This change also refactors the
protocol to require the minimum number of new message types and
exchanges possible, while also enabling one-directional compression.
The compression "handshaking" process now looks as follows:
1. Client sends startup packet with `_pq_.libpq_compression = alg1;alg2`
2. At this point, the server can immediately begin compressing packets
to the client with any of the specified algorithms it supports if it
so chooses
3. Server includes `libpq_compression` in the automatically sent
`ParameterStatus` messages before handshaking
4. At this point, the client can immediately begin compressing packets
to the server with any of the supported algorithms
Both the server and client will prefer to compress using the first
algorithm in their list that the other side supports, and we
explicitly support `none` in the algorithm list. This allows e.g. a
client to use `none;gzip` and a server to use `zstd;gzip;lz4`, and
then the client will not compress its data but the server will send
its data using gzip. Each side uses its own configured compression
level (if set), since compression levels affect compression effort
much more than decompression effort. This change also allows
connections to succeed if compression was requested but not available
(most of the time, I imagine that a client would prefer to just not
use compression if the server doesn't support it; unlike SSL, it's a
nice-to-have, not an essential. If a client application actually
really *needs* compression, that can still be facilitated by
explicitly checking the negotiated compression methods.)
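The preference rule above can be sketched as follows (treating `none` as universally acceptable, per the explicit support for `none` in the algorithm list):

```python
def choose_compression(own: str, peer: str) -> str:
    """Pick the first algorithm in our own list that the peer also
    supports; 'none' (no compression) is always acceptable."""
    peer_set = set(peer.split(";")) | {"none"}
    for alg in own.split(";"):
        if alg in peer_set:
            return alg
    return "none"
```

With the example from the text, a client configured with `none;gzip` picks `none` for its own sending direction, while a server configured with `zstd;gzip;lz4` picks `gzip` for its sending direction.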
The third patch adds the traffic monitoring statistics that had been
in the main patch of the previous series. I've renamed them and
slightly changed where they are measured so that the view captures
both the actual raw network bytes and the "logical" protocol bytes,
which also means it can measure SSL/GSSAPI overhead (not that that's a
particularly *important* thing to measure, but it's worth noting what
the view will actually measure).
The fourth patch adds a TAP test that validates all of the compression
methods and compression negotiation. Ideally it would probably be
part of patch #2, but it uses the monitoring from #3 to be able to
validate that compression is actually working.
The fifth patch is just a placeholder to allow running the test suite
with compression maximally enabled to work out any kinks.
I believe this patch series is ready for detailed review/testing, with
one caveat: as can be seen here
https://cirrus-ci.com/build/6732518292979712 , the build is passing on
all platforms and all tests except for the primary SSL test on
Windows. After some network-level debugging, it appears that we are
bumping into a version of the issues seen here
/messages/by-id/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com
, where on Windows some SSL error messages end up getting swallowed up
by the process exiting and closing the socket with a RST rather
than a nice clean shutdown. I may have the cause/effect wrong here,
but the issues appear before compression is actually fully set up
in the client, and they would appear to be a side effect of timing
differences and/or possibly size differences in the startup packet.
Any pointers on how to resolve this would be appreciated. It does
reproduce on Windows fairly readily, though any one particular test
still sometimes succeeds, and the relevant SSL connection failure
message reliably shows up in Wireshark.
Also please let me know if I have made any notable mailing list/patch
etiquette/format/structure errors. This is my first time submitting a
patch to a mailing-list driven open source project and while I have
tried to carefully review the various wiki guides I'm sure I didn't
take everything in perfectly.
Thanks,
Jacob Burroughs
Attachments:
v1-0001-Add-IO-stream-abstraction.patch (+1070 −806)
v1-0002-Add-protocol-layer-compression-to-libpq.patch (+3016 −138)
v1-0003-Add-network-traffic-stats.patch (+297 −9)
v1-0004-Add-basic-test-for-compression-functionality.patch (+109 −4)
v1-0005-DO-NOT-MERGE-enable-compression-for-CI.patch (+6 −6)
I believe this patch series is ready for detailed review/testing, with one caveat: as can be seen here https://cirrus-ci.com/build/6732518292979712 , the build is passing on all platforms and all tests except for the primary SSL test on Windows.
One correction: I apparently missed a kerberos timeout failure on
freebsd with compression enabled (being color blind, the checkmark and
still-running colors look awfully similar, and I misread what I saw).
I haven't yet successfully reproduced that one, so I may or may not
need some pointers to sort it out, but I think whatever it is the fix
will be small enough that the patch overall is still reviewable.
I have now sorted out all of the non-Windows build issues (and removed
my stray misguided attempt at fixing the Windows issue that I hadn't
intended to post the first time around). The build that is *actually*
passing every platform except for the one Windows SSL test mentioned
in my original message can be seen here:
https://cirrus-ci.com/build/5924321042890752
Attachments:
v2-0001-Add-IO-stream-abstraction.patch (+1069 −805)
v2-0002-Add-protocol-layer-compression-to-libpq.patch (+3076 −148)
v2-0003-Add-network-traffic-stats.patch (+285 −9)
v2-0004-Add-basic-test-for-compression-functionality.patch (+109 −4)
v2-0005-DO-NOT-MERGE-enable-compression-for-CI.patch (+6 −6)
On Tue, Dec 19, 2023 at 11:41 AM Jacob Burroughs
<jburroughs@instructure.com> wrote:
The compression "handshaking" process now looks as follows:
1. Client sends startup packet with `_pq_.libpq_compression = alg1;alg2`
2. At this point, the server can immediately begin compressing packets
to the client with any of the specified algorithms it supports if it
so chooses
3. Server includes `libpq_compression` in the automatically sent
`ParameterStatus` messages before handshaking
4. At this point, the client can immediately begin compressing packets
to the server with any of the supported algorithms
Both the server and client will prefer to compress using the first
algorithm in their list that the other side supports, and we
explicitly support `none` in the algorithm list. This allows e.g. a
client to use `none;gzip` and a server to use `zstd;gzip;lz4`, and
then the client will not compress its data but the server will send
its data using gzip.
I'm having difficulty understanding the details of this handshaking
algorithm from this description. It seems good that the handshake
proceeds in each direction somewhat separately from the other, but I
don't quite understand how the whole thing fits together. If the
client tells the server that 'none,gzip' is supported, and I elect to
start using gzip, how does the client know that I picked gzip rather
than none? Are the compressed packets self-identifying?
It's also slightly odd to me that the same parameter seems to specify
both what we want to send, and what we're able to receive. I'm not
really sure we should have separate parameters for those things, but I
don't quite understand how this works without it. The "none" thing
seems like a bit of a hack. It lets you say "I'd like to receive
compressed data but send uncompressed data" ... but what about the
reverse? How do you say "don't bother compressing what you receive
from the server, but please lz4 everything you send to the server"? Or
how do you say "use zstd from server to client, but lz4 from client to
server"? It seems like you can't really say that kind of thing.
What if we had, on the server side, a GUC saying what compression to
accept and a GUC saying what compression to be willing to do? And then
let the client request whatever it wants for each direction.
Also please let me know if I have made any notable mailing list/patch
etiquette/format/structure errors. This is my first time submitting a
patch to a mailing-list driven open source project and while I have
tried to carefully review the various wiki guides I'm sure I didn't
take everything in perfectly.
Seems fine to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
I'm having difficulty understanding the details of this handshaking
algorithm from this description. It seems good that the handshake
proceeds in each direction somewhat separately from the other, but I
don't quite understand how the whole thing fits together. If the
client tells the server that 'none,gzip' is supported, and I elect to
start using gzip, how does the client know that I picked gzip rather
than none? Are the compressed packets self-identifying?
I agree I could have spelled this out more clearly. I forgot to
mention that I added a byte to the CompressedMessage message type that
specifies the chosen algorithm. So if the server receives
'none,gzip', it can either keep sending uncompressed regular messages,
or it can compress them in CompressedMessage packets which now look
like "z{len}{format}{data}" (format just being a member of the
pg_compress_algorithm enum, so `1` in the case of gzip). Overall the
intention is that both the client and the server can just start
sending CompressedMessages once they receive the list of ones other
party supports without any negotiation or agreement needed and without
an extra message type to first specify the compression algorithm. (One
byte per message seemed to me like a reasonable overhead for the
simplicity, but it wouldn't be hard to bring back SetCompressionMethod
if we prefer.)
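A rough sketch of that framing (using zlib purely as a stand-in for the real streaming compressors, and assuming the enum value `1` for gzip as described; the exact length-field convention here is an assumption modeled on other protocol messages):

```python
import struct
import zlib

GZIP = 1  # assumed pg_compress_algorithm value for gzip, per the text


def pack_compressed_message(payload: bytes, alg: int = GZIP) -> bytes:
    """Frame a CompressedMessage as 'z' + int32 length + format byte + data.
    Following the usual protocol convention, the length counts itself,
    the format byte, and the data, but not the 'z' type byte."""
    data = zlib.compress(payload)
    return b"z" + struct.pack("!IB", 4 + 1 + len(data), alg) + data


def unpack_compressed_message(msg: bytes):
    """Inverse of pack_compressed_message; returns (algorithm, payload)."""
    assert msg[0:1] == b"z"
    length, alg = struct.unpack("!IB", msg[1:6])
    data = msg[6:6 + (length - 5)]
    return alg, zlib.decompress(data)
```

The one-byte format field is what lets the receiver know which decompressor to run without any separate negotiation message.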
It's also slightly odd to me that the same parameter seems to specify
both what we want to send, and what we're able to receive. I'm not
really sure we should have separate parameters for those things, but I
don't quite understand how this works without it. The "none" thing
seems like a bit of a hack. It lets you say "I'd like to receive
compressed data but send uncompressed data" ... but what about the
reverse? How do you say "don't bother compressing what you receive
from the server, but please lz4 everything you send to the server"? Or
how do you say "use zstd from server to client, but lz4 from client to
server"? It seems like you can't really say that kind of thing.
When I came up with the protocol I was imagining that basically both
server admins and clients might want a decent bit more control over
the compression they do rather than the decompression they do, since
compression is generally much more computationally expensive than
decompression. Now that you point it out though, I don't think that
actually makes that much sense.
What if we had, on the server side, a GUC saying what compression to
accept and a GUC saying what compression to be willing to do? And then
let the client request whatever it wants for each direction.
Here's two proposals:
Option 1:
GUCs:
libpq_compression (default "off")
libpq_decompression (default "auto", which is defined to be equal to
libpq_compression)
Connection parameters:
compression (default "off")
decompression (default "auto", which is defined to be equal to compression)
I think we would only send the decompression fields over the wire to
the other side, to be used to filter for the first chosen compression
field. We would send the `_pq_.libpq_decompression` protocol
extension even if only compression was enabled and not decompression
so that the server knows to enable compression processing for the
connection (I think this would be the only place we would still use
`none`, and not as part of a list in this case.) I think we also
would want to add libpq functions to allow a client to check the
last-used compression algorithm in each direction for any
monitoring/inspection purposes (actually that's probably a good idea
regardless, so a client application that cares doesn't need to/try to
implement the intersection and assumption around choosing the first
algorithm in common). Also I'm open to better names than "auto", I
just would like it to avoid unnecessary verbosity for the common case
of "I just want to enable bidirectional compression with whatever
algorithms are available with default parameters".
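The "auto" default under Option 1 could resolve as simply as the following hypothetical sketch (not patch code):

```python
def resolve_settings(compression: str, decompression: str):
    """'auto' for the decompression setting means: mirror whatever the
    compression setting is, so the common case needs only one knob."""
    if decompression == "auto":
        decompression = compression
    return compression, decompression
```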
Option 2:
This one is even cleaner in the common case but a bit worse in the
uncommon case: just use one parameter and have
compression/decompression enabling be part of the compression detail
(e.g. "libpq_compression='gzip:no_decompress;lz4:level=2,no_compress;zstd'"
or something like that, in which case the "none,gzip" case would
become "'libpq_compression=gzip:no_compress'"). See
https://www.postgresql.org/docs/current/app-pgbasebackup.html ,
specifically the `--compress` flag, for how specifying compression
algorithms and details works.
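A sketch of how such a detail string might be parsed (the syntax is just this proposal, modeled on pg_basebackup's `--compress`, so the option names are illustrative):

```python
def parse_compression_spec(spec: str) -> dict:
    """Parse the proposed syntax: algorithms separated by ';', each
    optionally followed by ':' and comma-separated options, where a bare
    option is a boolean flag and key=value carries a value."""
    result = {}
    for entry in spec.split(";"):
        alg, _, detail = entry.partition(":")
        opts = {}
        if detail:
            for opt in detail.split(","):
                key, eq, value = opt.partition("=")
                opts[key] = value if eq else True
        result[alg] = opts
    return result
```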
I'm actually not sure which of the two I prefer; opinions are welcome :)
Thanks,
Jacob
Thanks for working on this!
One thing I'm wondering: should it be possible for the client to change the
compression it wants mid-connection? I can think of some scenarios where
that would be useful to connection poolers: if a pooler does plain
forwarding of the compressed messages, then it would need to be able to
disable/enable compression if it wants to multiplex client connections with
different compression settings over the same server connection.
On 21 Dec 2023, at 05:30, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
One thing I'm wondering: should it be possible for the client to change the compression it wants mid-connection?
This patchset allows sending a CompressionMethod message, which allows setting another codec/level picked from the set of codecs negotiated during startup.
Best regards, Andrey Borodin.
On Fri, 29 Dec 2023 at 11:02, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
This patchset allows sending a CompressionMethod message, which allows setting another codec/level picked from the set of codecs negotiated during startup.
Did you mean to attach a patchset? I don't see the CompressionMethod
message in the v2 patchset. Only a CompressedData one.
One thing I'm wondering: should it be possible for the client to change the compression it wants mid-connection? I can think of some scenarios where that would be useful to connection poolers: if a pooler does plain forwarding of the compressed messages, then it would need to be able to disable/enable compression if it wants to multiplex client connections with different compression settings over the same server connection.
I have reworked this patch series to make it easier to extend to
restart compression mid-connection once something in the vein of the
discussion in "Add new protocol message to change GUCs for usage with
future protocol-only GUCs" [1] happens. In particular, I have changed
the `CompressedMessage` protocol message to signal the current
compression algorithm any time the client should restart its streaming
decompressor, and otherwise to implicitly reuse whatever algorithm and
decompressor handled the previous `CompressedMessage`; future work can
leverage this to trigger such a restart on update of the
client-supported compression algorithms.
Option 2:
This one is even cleaner in the common case but a bit worse in the
uncommon case: just use one parameter and have
compression/decompression enabling be part of the compression detail
(e.g. "libpq_compression='gzip:no_decompress;lz4:level=2,no_compress;zstd'"
or something like that, in which case the "none,gzip" case would
become "'libpq_compression=gzip:no_compress'"). See
https://www.postgresql.org/docs/current/app-pgbasebackup.html ,
specifically the `--compress` flag, for how specifying compression
algorithms and details works.
I ended up reworking this to use a version of this option in place of
the `none` hackery, but naming the parameters `compress` and
`decompress`, so to disable compression but allow decompression you
would specify `libpq_compression=gzip:compress=off`.
Also my windows SSL test failures seem to have resolved themselves
with either these changes or a rebase, so I think things are truly in
a reviewable state now.
Attachments:
v3-0001-Add-IO-stream-abstraction.patch (+1069 −805)
v3-0002-Add-protocol-layer-compression-to-libpq.patch (+3175 −165)
v3-0003-Add-network-traffic-stats.patch (+285 −9)
v3-0004-Add-basic-test-for-compression-functionality.patch (+108 −4)
v3-0005-DO-NOT-MERGE-enable-compression-for-CI.patch (+8 −6)
On Sun, Dec 31, 2023 at 2:32 AM Jacob Burroughs
<jburroughs@instructure.com> wrote:
I ended up reworking this to use a version of this option in place of
the `none` hackery, but naming the parameters `compress` and
`decompress, so to disable compression but allow decompression you
would specify `libpq_compression=gzip:compress=off`.
I'm still a bit befuddled by this interface.
libpq_compression=gzip:compress=off looks a lot like it's saying "do
it, except don't". I guess that you're using "compress" and
"decompress" to distinguish the two directions - i.e. server to client
and client to server - but of course in both directions the sender
compresses and the receiver decompresses, so I don't find that very
clear.
I wonder if we could use "upstream" and "downstream" to be clearer? Or
some other terminology?
--
Robert Haas
EDB: http://www.enterprisedb.com
I wonder if we could use "upstream" and "downstream" to be clearer? Or
some other terminology?
What about `send` and `receive`?
On Fri, Jan 12, 2024 at 4:02 PM Jacob Burroughs
<jburroughs@instructure.com> wrote:
I wonder if we could use "upstream" and "downstream" to be clearer? Or
some other terminology?

What about `send` and `receive`?
I think that would definitely be better than "compress" and
"decompress," but I was worried that it might be unclear to the user
whether the parameter that they specified was from the point of view
of the client or the server. Perhaps that's a dumb thing to worry
about, though.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jan 12, 2024 at 4:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
I think that would definitely be better than "compress" and
"decompress," but I was worried that it might be unclear to the user
whether the parameter that they specified was from the point of view
of the client or the server. Perhaps that's a dumb thing to worry
about, though.
According to https://commitfest.postgresql.org/48/4746/ this patch set
needs review, but:
1. Considering that there have been no updates for 5 months, maybe
it's actually dead?
and
2. I still think it needs to be more clear how the interface is
supposed to work. I do not want to spend time reviewing a patch to see
whether it works without understanding how it is intended to work --
and I also think that reviewing the patch in detail before we've got
the user interface right makes a whole lot of sense.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 14, 2024 at 11:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
According to https://commitfest.postgresql.org/48/4746/ this patch set
needs review, but:

1. Considering that there have been no updates for 5 months, maybe
it's actually dead?
I've withdrawn this patch from the commitfest. I had been waiting for
some resolution on "Add new protocol message to change GUCs for usage
with future protocol-only GUCs" before I rebased/refactored this one,
because this would be introducing the first protocol extension so far,
and that discussion appeared to be working out some meaningful issues
on how GUCs and protocol parameters should interact. If you think it
is worthwhile to proceed here though, I am very happy to do so. (I
would love to see this feature actually make it into postgres; it
would almost certainly be a big efficiency and cost savings win for
how my company deploys postgres internally :) )
2. I still think it needs to be more clear how the interface is
supposed to work. I do not want to spend time reviewing a patch to see
whether it works without understanding how it is intended to work --
and I also think that reviewing the patch in detail before we've got
the user interface right makes a whole lot of sense.
Regarding the interface, what I had originally gone for was the idea
that the naming of the options was from the perspective of the side
you were setting them on. Therefore, when setting `libpq_compression`
as a server-level GUC, `compress` would control if the server would
compress (send compressed data) with the given algorithm, and
`decompress` would control if the server would decompress (receive
compressed data) with the given algorithm. And likewise on the client
side, when setting `compression` as a connection config option,
`compress` would control if the *client* would compress (send
compressed data) with the given algorithm, and `decompress` would
control if the *client* would decompress (receive compressed data)
with the given algorithm. So for a client to pick what to send, it
would choose from the intersection of its own `compress=true` and the
server's `decompress=true` algorithms sent in the `ParameterStatus`
message with `libpq_compression`. And likewise on the server side, it
would choose from the intersection of the server's `compress=true`
algorithms and the client's `decompress=true` algorithms sent in the
`_pq_.libpq_compression` startup option. If you have a better
suggestion I am very open to it though.
--
Jacob Burroughs | Staff Software Engineer
E: jburroughs@instructure.com
On Tue, May 14, 2024 at 12:30 PM Jacob Burroughs
<jburroughs@instructure.com> wrote:
I've withdrawn this patch from the commitfest. I had been waiting for
some resolution on "Add new protocol message to change GUCs for usage
with future protocol-only GUCs" before I rebased/refactored this one,
because this would be introducing the first protocol extension so far,
and that discussion appeared to be working out some meaningful issues
on how GUCs and protocol parameters should interact. If you think it
is worthwhile to proceed here though, I am very happy to do so. (I
would love to see this feature actually make it into postgres; it
would almost certainly be a big efficiency and cost savings win for
how my company deploys postgres internally :) )
I don't think you should wait for that to be resolved; IMHO, this
patch needs to inform that discussion more than the other way around.
2. I still think it needs to be more clear how the interface is
supposed to work. I do not want to spend time reviewing a patch to see
whether it works without understanding how it is intended to work --
and I also think that reviewing the patch in detail before we've got
the user interface right makes a whole lot of sense.

Regarding the interface, what I had originally gone for was the idea
that the naming of the options was from the perspective of the side
you were setting them on. Therefore, when setting `libpq_compression`
as a server-level GUC, `compress` would control if the server would
compress (send compressed data) with the given algorithm, and
`decompress` would control if the server would decompress (receive
compressed data) with the given algorithm. And likewise on the client
side, when setting `compression` as a connection config option,
`compress` would control if the *client* would compress (send
compressed data) with the given algorithm, and `decompress` would
control if the *client* would decompress (receive compressed data)
with the given algorithm. So for a client to pick what to send, it
would choose from the intersection of its own `compress=true` and the
server's `decompress=true` algorithms sent in the `ParameterStatus`
message with `libpq_compression`. And likewise on the server side, it
would choose from the intersection of the server's `compress=true`
algorithms and the client's `decompress=true` algorithms sent in the
`_pq_.libpq_compression` startup option. If you have a better
suggestion I am very open to it though.
Well, in my last response before the thread died, I complained that
libpq_compression=gzip:compress=off was confusing, and I stand by
that, because "compress" is used both in the name of the parameter and
as an option within the value of that parameter. I think there's more
than one acceptable way to resolve that problem, but I think leaving
it like that is unacceptable.
Even more broadly, I think there have been a couple of versions of
this patch now where I read the documentation and couldn't understand
how the feature was supposed to work, and I'm not really willing to
spend time trying to review a complex patch for conformity with a
design that I can't understand in the first place. I don't want to
pretend like I'm the smartest person on this mailing list, and in fact
I know that I'm definitely not, but I think I'm smart enough and
experienced enough with PostgreSQL that if I look at the description
of a parameter and say "I don't understand how the heck this is
supposed to work", probably a lot of users are going to have the same
reaction. That lack of understanding on my part may come either from
the explanation of the parameter not being as good as it needs to be,
or from the design itself not being as good as it needs to be, or from
some combination of the two, but whichever is the case, IMHO you or
somebody else has got to figure out how to fix it.
I do also admit that there is a possibility that everything is totally
fine and I've just been kinda dumb on the days when I've looked at the
patch. If a chorus of other hackers shows up and gives me a few whacks
with the cluestick and after that I look at the proposed options and
go "oh, yeah, these totally make sense, I was just being stupid," fair
enough! But right now that's not where I'm at. I don't want you to
explain to me how it works; I want you to change it in some way so
that when I or some end user looks at it, they go "I don't need an
explanation of how that works because it's extremely clear to me
already," or at least "hmm, this is a bit complicated but after a
quick glance at the documentation it makes sense".
I would really like to see this patch go forward, but IMHO these UI
questions are blockers.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 14, 2024 at 1:35 PM Robert Haas <robertmhaas@gmail.com> wrote:
Well, in my last response before the thread died, I complained that
libpq_compression=gzip:compress=off was confusing, and I stand by
that, because "compress" is used both in the name of the parameter and
as an option within the value of that parameter. I think there's more
than one acceptable way to resolve that problem, but I think leaving
it like that is unacceptable.
What if we went with:
Server side:
* `libpq_compression=on` (I just want everything the server supports
available; probably the most common case)
* `libpq_compression=off` (I don't want any compression ever with this server)
* `libpq_compression=lzma;gzip` (I only want these algorithms for
whatever reason)
* `libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off`
(I only want to send data with lzma and receive data with gzip)
Client side:
* `compression=on` (I just want compression; pick sane defaults
automatically for me; probably the most common case)
* `compression=off` (I don't want any compression)
* `compression=lzma;gzip` (I only want these algorithms for whatever reason)
* `compression=lzma:client_to_server=off;gzip:server_to_client=off` (I
only want to receive data with lzma and send data with gzip)
`client_to_server`/`server_to_client` is a bit verbose, but it's very
explicit in what it means, so you don't need to reason about who is
sending/receiving/etc in a given context, and a given config string
applied to the server or the client side has the same effect on the
connection.
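To make the directional filtering concrete, here is a sketch handling the list form of this proposed syntax (the `on`/`off` shorthands aside; all names here belong to the proposal, not to any committed behavior):

```python
def allowed_algorithms(spec: str, direction: str):
    """For the list form of the proposed syntax, return the algorithms
    left enabled for the given direction ('client_to_server' or
    'server_to_client'); each direction defaults to on unless set off."""
    if spec == "off":
        return []
    algs = []
    for entry in spec.split(";"):
        alg, _, detail = entry.partition(":")
        opts = {}
        if detail:
            for opt in detail.split(","):
                key, _, value = opt.partition("=")
                opts[key] = value
        if opts.get(direction) != "off":
            algs.append(alg)
    return algs
```

Because the option names describe the wire direction rather than a role, the same string means the same thing whether it appears in the server GUC or the client connection parameter.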
Even more broadly, I think there have been a couple of versions of
this patch now where I read the documentation and couldn't understand
how the feature was supposed to work, and I'm not really willing to
spend time trying to review a complex patch for conformity with a
design that I can't understand in the first place. I don't want to
pretend like I'm the smartest person on this mailing list, and in fact
I know that I'm definitely not, but I think I'm smart enough and
experienced enough with PostgreSQL that if I look at the description
of a parameter and say "I don't understand how the heck this is
supposed to work", probably a lot of users are going to have the same
reaction. That lack of understanding on my part may come either from
the explanation of the parameter not being as good as it needs to be,
or from the design itself not being as good as it needs to be, or from
some combination of the two, but whichever is the case, IMHO you or
somebody else has got to figure out how to fix it.
If the above proposal seems better to you I'll both rework the patch
and then also try to rewrite the relevant bits of documentation to
separate out "what knobs are there" and "how do I specify the flags to
turn the knobs", because I think those two being integrated is making
the parameter documentation less readable/followable.
--
Jacob Burroughs | Staff Software Engineer
E: jburroughs@instructure.com
On Tue, May 14, 2024 at 3:22 PM Jacob Burroughs
<jburroughs@instructure.com> wrote:
What if we went with:
Server side:
* `libpq_compression=on` (I just want everything the server supports
available; probably the most common case)
* `libpq_compression=off` (I don't want any compression ever with this server)
* `libpq_compression=lzma;gzip` (I only want these algorithms for
whatever reason)
* `libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off`
(I only want to send data with lzma and receive data with gzip)
Client side:
* `compression=on` (I just want compression; pick sane defaults
automatically for me; probably the most common case)
* `compression=off` (I don't want any compression)
* `compression=lzma;gzip` (I only want these algorithms for whatever reason)
* `compression=lzma:client_to_server=off;gzip:server_to_client=off` (I
only want to receive data with lzma and send data with gzip)
`client_to_server`/`server_to_client` is a bit verbose, but it's very
explicit in what it means, so you don't need to reason about who is
sending/receiving/etc in a given context, and a given config string
applied to the server or the client side has the same effect on the
connection.
IMHO, that's a HUGE improvement. But:
* I would probably change the name "libpq_compression", because
even though we have src/backend/libpq, we typically use libpq to refer
to the client library, not the server's implementation of the wire
protocol. I think we could call it connection_compression or
wire_protocol_compression or something like that, but I'm not a huge
fan of libpq_compression.
* I would use commas, not semicolons, to separate items in a list,
i.e. lzma,gzip not lzma;gzip. I think that convention is nearly
universal in PostgreSQL, but feel free to point out counterexamples if
you were modelling this on something.
* libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off
reads strangely to me. How about making it so that the syntax is like
this:
libpq_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS;client_to_server=OVERRIDE_FOR_THIS_DIRECTION;server_to_client=OVERRIDE_FOR_THIS_DIRECTION
With all components being optional. So this example could be written
in any of these ways:
libpq_compression=lzma;server_to_client=gzip
libpq_compression=gzip;client_to_server=lzma
libpq_compression=server_to_client=gzip;client_to_server=lzma
libpq_compression=client_to_server=lzma;server_to_client=gzip
And if I wrote libpq_compression=server_to_client=gzip that would mean
send data to the client using gzip and in the other direction use
whatever the default is.
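The default-plus-override resolution described above could be sketched like this in Python. This is illustrative only; the function and its fallback default are hypothetical, and it assumes a bare value (if any) comes before the per-direction overrides:

```python
def resolve_directions(value, default="<builtin default>"):
    """Resolve 'default;dir=override' into per-direction algorithms."""
    directions = {"client_to_server": default, "server_to_client": default}
    for part in value.split(";"):
        name, sep, override = part.partition("=")
        if sep and name in directions:
            directions[name] = override   # explicit per-direction override
        else:
            # a bare value sets the value for both directions
            directions = {d: part for d in directions}
    return directions

resolve_directions("lzma;server_to_client=gzip")
# -> {'client_to_server': 'lzma', 'server_to_client': 'gzip'}
```

With only `server_to_client=gzip` given, `client_to_server` would fall back to whatever the built-in default is, matching the last example above.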
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 14, 2024 at 3:24 PM Robert Haas <robertmhaas@gmail.com> wrote:
IMHO, that's a HUGE improvement. But:
* I would probably change the name "libpq_compression", because
even though we have src/backend/libpq, we typically use libpq to refer
to the client library, not the server's implementation of the wire
protocol. I think we could call it connection_compression or
wire_protocol_compression or something like that, but I'm not a huge
fan of libpq_compression.
I think connection_compression would seem like a good name to me.
* I would use commas, not semicolons, to separate items in a list,
i.e. lzma,gzip not lzma;gzip. I think that convention is nearly
universal in PostgreSQL, but feel free to point out counterexamples if
you were modelling this on something.
* libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off
reads strangely to me. How about making it so that the syntax is like
this:
libpq_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS;client_to_server=OVERRIDE_FOR_THIS_DIRECTION;server_to_client=OVERRIDE_FOR_THIS_DIRECTION
With all components being optional. So this example could be written
in any of these ways:
libpq_compression=lzma;server_to_client=gzip
libpq_compression=gzip;client_to_server=lzma
libpq_compression=server_to_client=gzip;client_to_server=lzma
libpq_compression=client_to_server=lzma;server_to_client=gzip
And if I wrote libpq_compression=server_to_client=gzip that would mean
send data to the client using gzip and in the other direction use
whatever the default is.
The reason for both the semicolons and for not doing this is related
to using the same specification structure as here:
https://www.postgresql.org/docs/current/app-pgbasebackup.html
(specifically the --compress argument). Reusing that specification
requires that we use commas to separate the flags for a compression
method, and therefore left me with semicolons as the leftover
separator character. I think we could go with something like your
proposal, and in a lot of ways I like it, but there's still the
possibility of e.g.
`libpq_compression=client_to_server=zstd:level=10,long=true,gzip;server_to_client=gzip`
and I'm not quite sure how to make the separator characters work
coherently if we need to treat `zstd:level=10,long=true` as a unit.
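The pg_basebackup-style specification is why three distinct separators are needed: commas already separate a method's options, so they cannot also separate methods. A purely illustrative Python sketch of the resulting three-level split (names mirror the examples in this thread, not any committed code):

```python
def parse_method_list(value):
    """Split 'alg:opt=val,opt=val;alg2' into (algorithm, options) pairs."""
    methods = []
    for spec in value.split(";"):              # ';' separates methods
        alg, _, optstr = spec.partition(":")   # ':' separates name from options
        opts = {}
        for opt in filter(None, optstr.split(",")):  # ',' separates options
            name, _, val = opt.partition("=")
            opts[name] = val
        methods.append((alg, opts))
    return methods

parse_method_list("zstd:level=10,long=true;gzip")
# -> [('zstd', {'level': '10', 'long': 'true'}), ('gzip', {})]
```

Here `zstd:level=10,long=true` survives as a unit because the method list is split on semicolons before any commas are interpreted.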
Alternatively, we could have `connection_compression`,
`connection_compression_server_to_client`, and
`connection_compression_client_to_server` as three separate GUCs (and
on the client side `compression`, `compression_server_to_client`, and
`compression_client_to_server` as three separate connection
parameters), where we would treat `connection_compression` as a
default that could be overridden by an explicit
client_to_server/server_to_client. That creates the slightly funky
case where if you specify all three then the base one ends up unused
because the two more specific ones are being used instead, but that
isn't necessarily terrible. On the server side we *could* go with
just the server_to_client and client_to_server ones, but I think we
want it to be easy to use this feature in the simple case with a
single libpq parameter.
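The precedence implied by the three-GUC idea is simple: a direction-specific setting wins when present, otherwise the base setting applies. A hypothetical sketch (the GUC names are the proposed ones, not real GUCs):

```python
def effective_setting(base, client_to_server=None, server_to_client=None):
    """Per-direction settings, with specific GUCs overriding the base one."""
    return {
        "client_to_server": client_to_server if client_to_server is not None else base,
        "server_to_client": server_to_client if server_to_client is not None else base,
    }

effective_setting("zstd", server_to_client="gzip")
# -> {'client_to_server': 'zstd', 'server_to_client': 'gzip'}
# If all three are set, the base value is never consulted, which is the
# "slightly funky" case noted above.
```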
--
Jacob Burroughs | Staff Software Engineer
E: jburroughs@instructure.com
On Tue, May 14, 2024 at 5:21 PM Jacob Burroughs
<jburroughs@instructure.com> wrote:
The reason for both the semicolons and for not doing this is related
to using the same specification structure as here:
https://www.postgresql.org/docs/current/app-pgbasebackup.html
(specifically the --compress argument).
I agree with that goal, but I'm somewhat confused by how your proposal
achieves it. You had
libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off,
so how do we parse that? Is that two completely separate
specifications, one for lzma and one for gzip, and each of those has
one option which is set to off? And then they are separated from each
other by a semicolon? That actually does make sense, and I think it
may do a better job allowing for compression options than my proposal,
but it also seems a bit weird, because client_to_server and
server_to_client are not really compression options at all. They're
framing when this compression specification applies, rather than what
it does when it applies. In a way it's a bit like the fact that you
can prefix a pg_basebackup's --compress option with client- or server-
to specify where the compression should happen. But we can't quite
reuse that idea here, because in that case there's no question of
doing it in both places, whereas here, you might want one thing for
upstream and another thing for downstream.
Alternatively, we could have `connection_compression`,
`connection_compression_server_to_client`, and
`connection_compression_client_to_server` as three separate GUCs (and
on the client side `compression`, `compression_server_to_client`, and
`compression_client_to_server` as three separate connection
parameters), where we would treat `connection_compression` as a
default that could be overridden by an explicit
client_to_server/server_to_client. That creates the slightly funky
case where if you specify all three then the base one ends up unused
because the two more specific ones are being used instead, but that
isn't necessarily terrible. On the server side we *could* go with
just the server_to_client and client_to_server ones, but I think we
want it to be easy to use this feature in the simple case with a
single libpq parameter.
I'm not a fan of three settings; I could go with two settings, one for
each direction, and if you want both you have to set both. Or, another
idea, what if we just separated the two directions with a slash,
SEND/RECEIVE, and if there's no slash, then it applies to both
directions. So you could say
connection_compression='gzip:level=9/lzma' or whatever.
But now I'm wondering whether these options should really be symmetric
on the client and server sides? Isn't it for the server just to
specify a list of acceptable algorithms, and the client to set the
compression options? If both sides are trying to set the compression
level, for example, who wins?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 8:38 AM Robert Haas <robertmhaas@gmail.com> wrote:
I agree with that goal, but I'm somewhat confused by how your proposal
achieves it. You had
libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off,
so how do we parse that? Is that two completely separate
specifications, one for lzma and one for gzip, and each of those has
one option which is set to off? And then they are separated from each
other by a semicolon? That actually does make sense, and I think it
may do a better job allowing for compression options than my proposal,
but it also seems a bit weird, because client_to_server and
server_to_client are not really compression options at all. They're
framing when this compression specification applies, rather than what
it does when it applies. In a way it's a bit like the fact that you
can prefix a pg_basebackup's --compress option with client- or server-
to specify where the compression should happen. But we can't quite
reuse that idea here, because in that case there's no question of
doing it in both places, whereas here, you might want one thing for
upstream and another thing for downstream.
Your interpretation is correct, but I don't disagree that it ends up
feeling confusing.
I'm not a fan of three settings; I could go with two settings, one for
each direction, and if you want both you have to set both. Or, another
idea, what if we just separated the two directions with a slash,
SEND/RECEIVE, and if there's no slash, then it applies to both
directions. So you could say
connection_compression='gzip:level=9/lzma' or whatever.
But now I'm wondering whether these options should really be symmetric
on the client and server sides? Isn't it for the server just to
specify a list of acceptable algorithms, and the client to set the
compression options? If both sides are trying to set the compression
level, for example, who wins?
Compression options really only ever apply to the side doing the
compressing, and at least as I had imagined things each party
(client/server) only used its own level/other compression params.
That leaves me thinking, maybe we really want two independent GUCs,
one for "what algorithms are enabled/negotiable" and one for "how
should I configure my compressors" and then we reduce the dimensions
we are trying to shove into one GUC and each one ends up with a very
clear purpose:
connection_compression=(yes|no|alg1,alg2:server_to_client=alg1,alg2:client_to_server=alg3)
connection_compression_opts=gzip:level=2
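A sketch of the negotiation this two-GUC split would enable, under the assumption (stated above) that compression options only ever apply to the side doing the compressing. The names and the negotiation function are hypothetical illustrations of the proposal:

```python
def negotiate(local_send_algs, peer_recv_algs, local_opts):
    """Pick the first algorithm both sides allow; attach only local options."""
    for alg in local_send_algs:        # local preference order wins
        if alg in peer_recv_algs:
            return alg, local_opts.get(alg, {})
    return None, {}                    # no common algorithm: stay uncompressed

negotiate(["zstd", "gzip"], ["gzip", "lzma"], {"gzip": {"level": "2"}})
# -> ('gzip', {'level': '2'})
```

Because each side applies only its own `connection_compression_opts` to the data it compresses, the "who wins if both set the level" question never arises.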
--
Jacob Burroughs | Staff Software Engineer
E: jburroughs@instructure.com