Toast compression method options

Started by Dilip Kumarover 4 years ago15 messages
#1Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
1 attachment(s)

We have already pushed the configurable lz4 toast compression code[1]/messages/by-id/E1lNKw9-0008DT-1L@gemulon.postgresql.org.
In the custom compression thread, we were already having the patch to
support the compression method options[2]/messages/by-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF+=WHu7jNvy7g@mail.gmail.com. But the design for the
base patches was heavily modified before commit but I never rebased
this patch based on the new design. Now, I have rebased this patch so
that we don't lose track and we can work on this for v15. This is
still a WIP patch.

For v15 I will work on improving the code and I will also work on
analyzing the usage of compression method options (compression
speed/ratio).

[1]: /messages/by-id/E1lNKw9-0008DT-1L@gemulon.postgresql.org
[2]: /messages/by-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF+=WHu7jNvy7g@mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

WIP-0001-Toast-compression-method-options.patchapplication/octet-stream; name=WIP-0001-Toast-compression-method-options.patch
#2Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#1)
1 attachment(s)
Re: Toast compression method options

On Mon, May 3, 2021 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

We have already pushed the configurable lz4 toast compression code[1].
In the custom compression thread, we were already having the patch to
support the compression method options[2]. But the design for the
base patches was heavily modified before commit but I never rebased
this patch based on the new design. Now, I have rebased this patch so
that we don't lose track and we can work on this for v15. This is
still a WIP patch.

For v15 I will work on improving the code and I will also work on
analyzing the usage of compression method options (compression
speed/ratio).

[1] /messages/by-id/E1lNKw9-0008DT-1L@gemulon.postgresql.org
[2] /messages/by-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF+=WHu7jNvy7g@mail.gmail.com

I have fixed some comments offlist reported by Justin. Apart from
that, I have also improved documentation and test case. Stil it has a
lot of cleanup to be done but I am not planning to do that
immediately.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

wip-v2-0001-Toast-compression-method-options.patchtext/x-patch; charset=US-ASCII; name=wip-v2-0001-Toast-compression-method-options.patch
#3Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#2)
Re: Toast compression method options

On Thu, May 06, 2021 at 07:23:48PM +0530, Dilip Kumar wrote:

I have fixed some comments offlist reported by Justin. Apart from
that, I have also improved documentation and test case. Stil it has a
lot of cleanup to be done but I am not planning to do that
immediately.

I was testing the various compression algos we touch in core, and I am
not really convinced that we need more code to control that. First,
pglz is living as-is in the backend code for a very long time and no
one has expressed an interest in controlling the compression strategy
used AFAIK. On top of that, LZ4 outclasses it easily, so if there is
a need to worry about performance with the TOAST data, users could
just move to use LZ4.

+       if (strcmp(def->defname, "acceleration") == 0)
+       {
+           int32 acceleration =
+               pg_atoi(defGetString(def), sizeof(acceleration), 0);
+
+           if (acceleration < INT32_MIN || acceleration > INT32_MAX)
+               ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("unexpected value for lz4 compression acceleration: \"%s\"",
+                               defGetString(def)),
+                    errhint("expected value between INT32_MIN and
INT32_MAX")
+                   ));

Then comes the part with LZ4 and its acceleration. The default
compression level used by LZ4 compresses data the most, and it is
already pretty cheap in CPU. Do you have cases where this would be
useful? Increasing the acceleration reduces the compression to be
close to zero, but if one cares about the compression cost, he/she
could fall back to the external storage for basically the same
effect. Is there really a use-case for something in-between?
--
Michael

#4Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#3)
Re: Toast compression method options

On Fri, Jun 18, 2021 at 12:13 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, May 06, 2021 at 07:23:48PM +0530, Dilip Kumar wrote:

I have fixed some comments offlist reported by Justin. Apart from
that, I have also improved documentation and test case. Stil it has a
lot of cleanup to be done but I am not planning to do that
immediately.

I was testing the various compression algos we touch in core, and I am
not really convinced that we need more code to control that. First,
pglz is living as-is in the backend code for a very long time and no
one has expressed an interest in controlling the compression strategy
used AFAIK. On top of that, LZ4 outclasses it easily, so if there is
a need to worry about performance with the TOAST data, users could
just move to use LZ4.

+       if (strcmp(def->defname, "acceleration") == 0)
+       {
+           int32 acceleration =
+               pg_atoi(defGetString(def), sizeof(acceleration), 0);
+
+           if (acceleration < INT32_MIN || acceleration > INT32_MAX)
+               ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("unexpected value for lz4 compression acceleration: \"%s\"",
+                               defGetString(def)),
+                    errhint("expected value between INT32_MIN and
INT32_MAX")
+                   ));

Then comes the part with LZ4 and its acceleration. The default
compression level used by LZ4 compresses data the most, and it is
already pretty cheap in CPU. Do you have cases where this would be
useful? Increasing the acceleration reduces the compression to be
close to zero, but if one cares about the compression cost, he/she
could fall back to the external storage for basically the same
effect. Is there really a use-case for something in-between?

IMHO there is certainly a use case, basically, if we compress the data
so that we can avoid storing it externally. Now suppose for some
data, with default LZ4 compression, the compression ratio is so high
that you are able to compress to the size which is way under the
limit. So for such data, the acceleration can be increased such that
compression is fast and compression ratio is good enough that it is
not going to the external storage. I agree it will be difficult for
the user to make such a decision and select the acceleration value but
based on the data pattern and their compressed length the admin can
make such a decision. So in short select the acceleration value such
that you can compress it fast and the compression ratio is good enough
to keep it from storing externally.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#5Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#4)
Re: Toast compression method options

On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:

IMHO there is certainly a use case, basically, if we compress the data
so that we can avoid storing it externally. Now suppose for some
data, with default LZ4 compression, the compression ratio is so high
that you are able to compress to the size which is way under the
limit. So for such data, the acceleration can be increased such that
compression is fast and compression ratio is good enough that it is
not going to the external storage. I agree it will be difficult for
the user to make such a decision and select the acceleration value but
based on the data pattern and their compressed length the admin can
make such a decision. So in short select the acceleration value such
that you can compress it fast and the compression ratio is good enough
to keep it from storing externally.

Theoritically, I agree that there could be a use case, and that was
the point I was trying to outline above. My point is more from a
practical point of view. LZ4 is designed to be fast and cheap in CPU
with a rather low compression ratio compared to other modern algos.

Could it be possible to think about some worst cases where one may
want to reduce its compression to save some CPU? The point, as you
say, to allow a tuning of the acceleration would be that one may want
to save a bit of CPU and does not care about the extra disk space it
takes. Still, I am wondering why one would not just store the values
externally in such cases and just save as much compression effort as
possible.
--
Michael

#6vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Dilip Kumar (#2)
Re: Toast compression method options

On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, May 3, 2021 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

We have already pushed the configurable lz4 toast compression code[1].
In the custom compression thread, we were already having the patch to
support the compression method options[2]. But the design for the
base patches was heavily modified before commit but I never rebased
this patch based on the new design. Now, I have rebased this patch so
that we don't lose track and we can work on this for v15. This is
still a WIP patch.

For v15 I will work on improving the code and I will also work on
analyzing the usage of compression method options (compression
speed/ratio).

[1] /messages/by-id/E1lNKw9-0008DT-1L@gemulon.postgresql.org
[2] /messages/by-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF+=WHu7jNvy7g@mail.gmail.com

I have fixed some comments offlist reported by Justin. Apart from
that, I have also improved documentation and test case. Stil it has a
lot of cleanup to be done but I am not planning to do that
immediately.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh

#7Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#5)
Re: Toast compression method options

On Tue, Jun 22, 2021 at 1:37 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:

IMHO there is certainly a use case, basically, if we compress the data
so that we can avoid storing it externally. Now suppose for some
data, with default LZ4 compression, the compression ratio is so high
that you are able to compress to the size which is way under the
limit. So for such data, the acceleration can be increased such that
compression is fast and compression ratio is good enough that it is
not going to the external storage. I agree it will be difficult for
the user to make such a decision and select the acceleration value but
based on the data pattern and their compressed length the admin can
make such a decision. So in short select the acceleration value such
that you can compress it fast and the compression ratio is good enough
to keep it from storing externally.

Theoritically, I agree that there could be a use case, and that was
the point I was trying to outline above. My point is more from a
practical point of view. LZ4 is designed to be fast and cheap in CPU
with a rather low compression ratio compared to other modern algos.

Could it be possible to think about some worst cases where one may
want to reduce its compression to save some CPU? The point, as you
say, to allow a tuning of the acceleration would be that one may want
to save a bit of CPU and does not care about the extra disk space it
takes. Still, I am wondering why one would not just store the values
externally in such cases and just save as much compression effort as
possible.

Well, that actually depends upon the data, basically, LZ4 acceleration
searches in wider increments, which may reduce the number of potential
matches but increase the speed. So based on the actual data pattern
it is highly possible that you get the speed benefit without losing
much or nothing on the compression ratio. So IMHO, this is user
exposed option so based on the user's data pattern why it is not wise
to provide an option for the user to give the acceration when the user
is sure that selecting a better speed will not harm anything on
compression ratio for their data pattern.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#8Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: vignesh C (#6)
Re: Toast compression method options

On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#9Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#8)
1 attachment(s)
Re: Toast compression method options

On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

I have rebased the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-Toast-compression-method-options.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Toast-compression-method-options.patch
#10Jaime Casanova
Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Dilip Kumar (#9)
Re: Toast compression method options

On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:

On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

I have rebased the patch.

Hi,

This doesn't apply cleanly nor compile.
Are you planning to send a rebased version?

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

#11Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Jaime Casanova (#10)
Re: Toast compression method options

On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <
jcasanov@systemguards.com.ec> wrote:

On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:

On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com>

wrote:

On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com>

wrote

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

I have rebased the patch.

Hi,

This doesn't apply cleanly nor compile.
Are you planning to send a rebased version?

I will do that early next week.

--

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12Jaime Casanova
Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Dilip Kumar (#11)
Re: Toast compression method options

On Fri, Sep 10, 2021 at 10:54:04AM +0530, Dilip Kumar wrote:

On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <
jcasanov@systemguards.com.ec> wrote:

On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:

On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com>

wrote:

On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com>

wrote

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

I have rebased the patch.

Hi,

This doesn't apply cleanly nor compile.
Are you planning to send a rebased version?

I will do that early next week.

Great! I'm marking the patch as "waiting on author".
Thanks for keep working on this.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

#13Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Jaime Casanova (#12)
1 attachment(s)
Re: Toast compression method options

On Fri, Sep 10, 2021 at 9:54 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

I will do that early next week.

Great! I'm marking the patch as "waiting on author".
Thanks for keep working on this.

I haved rebased the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v3-0001-Toast-compression-method-options.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Toast-compression-method-options.patch
#14Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#13)
Re: Toast compression method options

On Mon, Sep 13, 2021 at 05:10:22PM +0530, Dilip Kumar wrote:

I haved rebased the patch.

Please note that the patch does not apply. FWIW, I still don't think
that this is a good idea to have that. I don't recall seeing much on
this list that users would like to have such a level of tuning for
pglz, while lz4 only offers the option to reduce the compression rate
while being already very cheap in CPU so the impact is limited. On
top of that, this adds a new attribute to pg_attribute with much more
complexity into ALTER TABLE code paths in tablecmds.c..
--
Michael

#15Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#14)
Re: Toast compression method options

On Fri, Oct 1, 2021 at 12:42 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Sep 13, 2021 at 05:10:22PM +0530, Dilip Kumar wrote:

I haved rebased the patch.

Please note that the patch does not apply. FWIW, I still don't think
that this is a good idea to have that. I don't recall seeing much on
this list that users would like to have such a level of tuning for
pglz, while lz4 only offers the option to reduce the compression rate
while being already very cheap in CPU so the impact is limited. On
top of that, this adds a new attribute to pg_attribute with much more
complexity into ALTER TABLE code paths in tablecmds.c..

Thanks for the feedback, seeing no much interest from other hackers
and also as Michael pointed out that there is no much use case for
this, I am withdrawing this patch from the commitfest.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com