Compress ReorderBuffer spill files using LZ4

Started by Julien Tachoires, almost 2 years ago · 28 messages
#1 Julien Tachoires
julmon@gmail.com

Hi,

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, all the changes are written
to disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.
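
The wrapper structure itself is not shown in this message; as a rough
sketch (field names and layout are my assumptions, not taken from the
patch), the on-disk encapsulation could look like:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header written to the spill file in front of each
 * compressed ReorderBufferDiskChange.  It records which strategy was
 * used and both sizes, so the reader knows how many bytes to consume
 * from disk and how large a buffer to decompress into.
 */
typedef struct ReorderBufferCompressedChange
{
    uint32_t    compression;        /* strategy: streaming, regular, none */
    size_t      compressed_size;    /* bytes actually stored on disk */
    size_t      raw_size;           /* size after decompression */
    /* compressed payload follows immediately after the header */
} ReorderBufferCompressedChange;
```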

Three different compression strategies are implemented:

1. LZ4 streaming compression: the preferred strategy, which works
efficiently for small individual changes.
2. LZ4 regular compression: used when a change is too large for the
streaming API.
3. No compression: when compression fails, the change is stored
uncompressed.

When not using compression, the following case generates 1590MB of
spill files:

CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);
INSERT INTO t
SELECT i, 'Hello number n°'||i::TEXT
FROM generate_series(1, 10000000) as i;

With LZ4 compression, it creates 653MB of spill files: 58.9% less
disk space usage.

Open items:

1. The spill_bytes column from pg_stat_get_replication_slot() still returns
plain data size, not the compressed data size. Should we expose the
compressed data size when compression occurs?

2. Do we want a GUC to switch compression on/off?

Regards,

JT

Attachments:

v1-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch (application/octet-stream, +496 -53)
#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Tachoires (#1)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

--
With Regards,
Amit Kapila.

#3 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#2)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

I think it depends on the trade-off between the I/O savings from
reducing the data size and the performance cost of compressing and
decompressing the data. This balance is highly dependent on the
hardware. For example, if you have a very slow disk and a powerful
processor, compression could be advantageous. Conversely, if the disk
is very fast, the I/O savings might be minimal, and the compression
overhead could outweigh the benefits. Additionally, the effectiveness
of compression also depends on the compression ratio, which varies
with the type of data being compressed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#4 Julien Tachoires
julmon@gmail.com
In reply to: Amit Kapila (#2)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 04:13, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting the subscription's 'streaming' option to 'on'
moves the problem away from the publisher to the subscribers. This
patch tries to improve the default situation when 'streaming' is set
to 'off'.

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Table DDL:
CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);

Data generated with:
INSERT INTO t SELECT i, 'Text number n°'||i::TEXT FROM
generate_series(1, 10000000) as i;

Restoration duration measured using timestamps of log messages:
"DEBUG: restored XXXX/YYYY changes from disk"

HEAD: 25.54s, 25.94s, 25.516s, 26.267s, 26.11s / avg=25.874s
Patch: 26.872s, 26.311s, 25.753s, 26.003s, 25.843s / avg=26.156s

Regards,

JT

#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Tachoires (#4)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote:

On Thu, Jun 6, 2024 at 04:13, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting subscription's option 'streaming' to 'on' moves
the problem away from the publisher to the subscribers. This patch
tries to improve the default situation when 'streaming' is set to
'off'.

Can we think of changing the default to 'parallel'? BTW, it would be
better to use 'parallel' for the 'streaming' option, if the workload
has large transactions. Is there a reason to use a default value in
this case?

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Thanks. We probably need different types of data (say random data in
bytea column, etc.) for this.

--
With Regards,
Amit Kapila.

#6 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#2)
Re: Compress ReorderBuffer spill files using LZ4

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there. With the current #ifdef you propose, it's hard
to do that. Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)
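
A sketch of how such a [spill-]algorithm[:level] spec could be parsed
(hypothetical code, loosely modeled on the syntax proposed above;
PostgreSQL's real compression-spec parsing for pg_basebackup lives
elsewhere and differs in detail):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of a hypothetical compress_logical_decoding setting. */
typedef struct
{
    bool    spill_only;     /* "spill-" prefix: compress only when spilling */
    char    algorithm[16];  /* "none", "lz4", "zstd", ... */
    int     level;          /* -1 if no level was given */
} CompressSpec;

/*
 * Parse strings of the form [spill-]<algorithm>[:<level>], e.g.
 * "none", "lz4:42", "spill-zstd:99".  Returns false on malformed input.
 */
static bool
parse_compress_spec(const char *value, CompressSpec *spec)
{
    const char *colon;
    size_t      alglen;

    spec->spill_only = false;
    spec->level = -1;

    if (strncmp(value, "spill-", 6) == 0)
    {
        spec->spill_only = true;
        value += 6;
    }

    colon = strchr(value, ':');
    alglen = colon ? (size_t) (colon - value) : strlen(value);
    if (alglen == 0 || alglen >= sizeof(spec->algorithm))
        return false;
    memcpy(spec->algorithm, value, alglen);
    spec->algorithm[alglen] = '\0';

    if (colon)
    {
        char *end;

        spec->level = (int) strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0')
            return false;
    }
    return true;
}
```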

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#7 Julien Tachoires
julmon@gmail.com
In reply to: Amit Kapila (#5)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 06:40, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote:

On Thu, Jun 6, 2024 at 04:13, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting subscription's option 'streaming' to 'on' moves
the problem away from the publisher to the subscribers. This patch
tries to improve the default situation when 'streaming' is set to
'off'.

Can we think of changing the default to 'parallel'? BTW, it would be
better to use 'parallel' for the 'streaming' option, if the workload
has large transactions. Is there a reason to use a default value in
this case?

You're certainly right: if using the streaming API helps to avoid bad
situations and there is no downside, it could be used by default.

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Thanks. We probably need different types of data (say random data in
bytea column, etc.) for this.

Yes, good idea, will run new tests in that sense.

Thank you!

Regards,

JT

#8 Julien Tachoires
julmon@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 07:24, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

Interesting idea, will try to evaluate how to compress/decompress data
transiting via streaming and how good the compression ratio would be.

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

I agree, if the server was compiled with support of multiple
compression libraries, users should be able to choose which one they
want to use.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there. With the current #ifdef you propose, it's hard
to do that. Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)

Makes sense, will rework this patch in that way.

Thank you!

Regards,

JT

#9 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

+1

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dilip Kumar (#9)
Re: Compress ReorderBuffer spill files using LZ4

On 2024-Jun-07, Dilip Kumar wrote:

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

True. (I think we have some options that are in GUCs for the general
behavior and can be overridden by per-subscription options for specific
tailoring; would that make sense here? I think it does, considering
that what we mostly want is to save disk space in the publisher when
spilling to disk.)

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu. Five minutes later I realize that it's also talking
about food" (Donald Knuth)

#11 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#10)
Re: Compress ReorderBuffer spill files using LZ4

On Fri, Jun 7, 2024 at 2:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-07, Dilip Kumar wrote:

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

True. (I think we have some options that are in GUCs for the general
behavior and can be overridden by per-subscription options for specific
tailoring; would that make sense here? I think it does, considering
that what we mostly want is to save disk space in the publisher when
spilling to disk.)

Yeah, that makes sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12 Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

Fair enough. It would be an interesting feature if we see the wider
usefulness of compression/decompression of logical changes. For
example, if this can improve the performance of applying large
transactions (aka reduce the apply lag for them) even when the
'streaming' option is 'parallel' then it would have a much wider
impact.

--
With Regards,
Amit Kapila.

#13 Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#9)
Re: Compress ReorderBuffer spill files using LZ4

On Fri, Jun 7, 2024 at 2:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

Yes, that makes sense. However, we then need to provide this option
via SQL APIs as well for other plugins.

--
With Regards,
Amit Kapila.

#14 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On 6/6/24 16:24, Alvaro Herrera wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there. With the current #ifdef you propose, it's hard
to do that. Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)

I haven't been following the "libpq compression" thread, but wouldn't
that also do compression for the streaming case? That was my assumption,
at least, and it seems like the right way - we probably don't want to
patch every place that sends data over network independently, right?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Julien Tachoires (#1)
Re: Compress ReorderBuffer spill files using LZ4

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16 Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#15)
Re: Compress ReorderBuffer spill files using LZ4

On Fri, Jun 7, 2024 at 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Regards,

JT

#17 Julien Tachoires
julmon@gmail.com
In reply to: Julien Tachoires (#16)
Re: Compress ReorderBuffer spill files using LZ4

Hi,

On Fri, Jun 7, 2024 at 06:18, Julien Tachoires <julmon@gmail.com> wrote:

On Fri, Jun 7, 2024 at 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.
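
A name-to-method lookup for such an enum GUC could be sketched as
follows (illustrative only; the actual patch would register the GUC
through PostgreSQL's GUC machinery rather than a standalone table like
this):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative spill-compression methods, mirroring the GUC's options. */
typedef enum
{
    SPILL_COMPRESSION_NONE,
    SPILL_COMPRESSION_PGLZ,
    SPILL_COMPRESSION_LZ4,
    SPILL_COMPRESSION_ZSTD
} SpillCompressionMethod;

static const struct
{
    const char             *name;
    SpillCompressionMethod  method;
} spill_compression_options[] = {
    {"none", SPILL_COMPRESSION_NONE},
    {"pglz", SPILL_COMPRESSION_PGLZ},
    {"lz4",  SPILL_COMPRESSION_LZ4},
    {"zstd", SPILL_COMPRESSION_ZSTD},
};

/* Resolve a GUC string value to a method; returns -1 for unknown names. */
static int
lookup_spill_compression(const char *name)
{
    for (size_t i = 0;
         i < sizeof(spill_compression_options) / sizeof(spill_compression_options[0]);
         i++)
    {
        if (strcmp(spill_compression_options[i].name, name) == 0)
            return (int) spill_compression_options[i].method;
    }
    return -1;
}
```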

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thoughts?

Regards,

JT

Attachments:

v2-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch (application/octet-stream, +723 -84)
v2-0002-Add-GUC-logical_decoding_spill_compression.patch (application/octet-stream, +51 -5)
v2-0005-Compress-ReorderBuffer-spill-files-using-ZSTD.patch (application/octet-stream, +409 -2)
v2-0004-Compress-ReorderBuffer-spill-files-using-PGLZ.patch (application/octet-stream, +66 -2)
v2-0003-Fix-spill_bytes-counter.patch (application/octet-stream, +8 -5)
v2-0007-WIP-Add-the-subscription-option-spill_compression.patch (application/octet-stream, +166 -68)
v2-0006-Add-ReorderBuffer-ondisk-compression-tests.patch (application/octet-stream, +108 -2)
#18 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Julien Tachoires (#17)
Re: Compress ReorderBuffer spill files using LZ4

On 7/15/24 20:50, Julien Tachoires wrote:

Hi,

On Fri, Jun 7, 2024 at 06:18, Julien Tachoires <julmon@gmail.com> wrote:

On Fri, Jun 7, 2024 at 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.

Sounds reasonable. I wonder if it might be useful to allow specifying
the compression level in those places, but that's clearly not something
this patch needs to do.

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thought ?

I'd stick to only compressing the data spilled to disk. It might be
useful to compress the streamed data too, but why shouldn't we compress
the regular (non-streamed) transactions too? Yeah, it's more efficient
to compress larger chunks, but we can fit quite large transactions into
logical_decoding_work_mem without spilling.

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19 Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#18)
Re: Compress ReorderBuffer spill files using LZ4

On Mon, Jul 15, 2024 at 12:28, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 7/15/24 20:50, Julien Tachoires wrote:

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

You are right, it's not supported; the following error is raised in this case:
ERROR: replication slot "sub1" is active for PID 51735

I was misled by the fact that nothing prevents configuring multiple
subscriptions with the same replication slot.

Thanks,

JT

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#18)
Re: Compress ReorderBuffer spill files using LZ4

On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 7/15/24 20:50, Julien Tachoires wrote:

Hi,

On Fri, Jun 7, 2024 at 06:18, Julien Tachoires <julmon@gmail.com> wrote:

On Fri, Jun 7, 2024 at 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.

Sounds reasonable. I wonder if it might be useful to allow specifying
the compression level in those places, but that's clearly not something
this patch needs to do.

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thoughts?

I'd stick to only compressing the data spilled to disk. It might be
useful to compress the streamed data too, but why shouldn't we compress
the regular (non-streamed) transactions too? Yeah, it's more efficient
to compress larger chunks, but we can fit quite large transactions into
logical_decoding_work_mem without spilling.

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

Your point sounds reasonable to me. OTOH, if we want to support
compression for the spill case, shouldn't we ask how often such an
option would actually be needed? Users currently have the option to
stream large transactions (for parallel apply or otherwise), in which
case no spilling is required. I feel that sooner or later we will make
that behavior (streaming=parallel) the default, and then spilling
should happen in very few cases. Is it worth adding this new option
and GUC if that is true?

--
With Regards,
Amit Kapila.
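
Strategy 1 from the original proposal keeps a streaming compression context alive across consecutive small changes, so the history built from earlier changes helps compress later, similar ones. A rough Python analogy using zlib's streaming API (the patch itself uses LZ4's streaming functions in C; the class and method names here are invented for illustration):

```python
import zlib  # stand-in for LZ4's streaming API


class SpillStream:
    """One long-lived compression context shared by many small changes,
    so repetition across changes improves the overall ratio."""

    def __init__(self):
        self._c = zlib.compressobj()

    def add_change(self, change: bytes) -> bytes:
        # Z_SYNC_FLUSH emits a byte-aligned chunk for this change while
        # keeping the compression history for the next one.
        return self._c.compress(change) + self._c.flush(zlib.Z_SYNC_FLUSH)


stream = SpillStream()
chunks = [stream.add_change(b"Hello number %d" % i) for i in range(1000)]
```

Reading the spill file back then amounts to feeding the stored chunks, in order, into a single matching decompression context.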

#21Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Amit Kapila (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#21)
#23Julien Tachoires
julmon@gmail.com
In reply to: Amit Kapila (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Julien Tachoires (#23)
#25Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#24)
#26Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#25)
#27Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Julien Tachoires (#26)
#28Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#27)