ZStandard (with dictionaries) compression support for TOAST compression

Started by Nikhil Kumar Veldanda, 9 months ago, 46 messages
#1 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
1 attachment(s)

Hi all,

The ZStandard compression algorithm [1][2], though not currently used for
TOAST compression in PostgreSQL, offers significantly better compression
ratios than lz4/pglz in both dictionary-based and non-dictionary modes.
Attached for review is my patch to add ZStandard compression to Postgres.
In my tests, this patch used with a pre-trained dictionary achieved up to
four times the compression ratio of LZ4, while ZStandard without a
dictionary compressed about twice as well as LZ4/pglz.

Notably, this is the first compression algorithm for Postgres that can make
use of a dictionary to provide higher levels of compression, but
dictionaries have to be generated and maintained, so I’ve had to break new
ground in that regard. Using the dictionary support requires training and
storing a dictionary for a given variable-length column. To do this, a SQL
function is called on the column; it samples the column’s data and feeds it
into the ZStandard training API, which returns a dictionary. In the example
below, the column is of JSONB type. The SQL function takes the table name
and the attribute number as inputs and returns true if training succeeded,
false otherwise.

```
test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
 build_zstd_dict_for_attribute
-------------------------------
 t
(1 row)
```

The sampling logic and the data fed to the ZStandard training API can vary
by data type. The patch includes a mechanism for writing other
type-specific training functions and provides a default for JSONB, TEXT and
BYTEA. There is a new 'CREATE TYPE' option called 'build_zstd_dict' that
takes a function name as input. This way anyone can write their own
type-specific training function, handling the sampling logic and returning
the information needed by the ZStandard training API in "ZstdTrainingData"
format.

```
typedef struct ZstdTrainingData
{
    char       *sample_buffer;  /* Pointer to the raw sample buffer */
    size_t     *sample_sizes;   /* Array of sample sizes */
    int         nitems;         /* Number of sample sizes */
} ZstdTrainingData;
```
This information is fed into the ZStandard training API, which generates a
dictionary that is then inserted into the dictionary catalog table.
Additionally, we update the 'pg_attribute' attribute options to record the
unique dictionary ID for that specific attribute. During compression, based
on the available dictionary ID, we retrieve the dictionary and use it to
compress the documents. I’ve provided a standard training function
(`zstd_dictionary_builder`) for JSONB, TEXT, and BYTEA.
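
For reference, here is a minimal sketch (not code from the patch) of what the training step boils down to using zstd's public ZDICT API, assuming the ZstdTrainingData layout above and a target dictionary size taken from the attribute options:

```c
#include <stdlib.h>
#include <zdict.h>

typedef struct ZstdTrainingData
{
    char       *sample_buffer;  /* concatenated raw samples */
    size_t     *sample_sizes;   /* length of each sample */
    int         nitems;         /* number of samples */
} ZstdTrainingData;

/* Returns a malloc'd dictionary of at most dict_capacity bytes, or NULL. */
static void *
train_zstd_dictionary(const ZstdTrainingData *td, size_t dict_capacity,
                      size_t *dict_len)
{
    void       *dict = malloc(dict_capacity);
    size_t      ret;

    if (dict == NULL)
        return NULL;

    ret = ZDICT_trainFromBuffer(dict, dict_capacity,
                                td->sample_buffer,
                                td->sample_sizes,
                                (unsigned) td->nitems);
    if (ZDICT_isError(ret))
    {
        /* e.g. too few or too small samples */
        free(dict);
        return NULL;
    }

    *dict_len = ret;            /* actual dictionary size (<= capacity) */
    return dict;
}
```

The resulting bytes are what would end up in the dictionary catalog table.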

We store the dictionary and its dictid in the new catalog table
'pg_zstd_dictionaries':

```
test=# \d pg_zstd_dictionaries
     Table "pg_catalog.pg_zstd_dictionaries"
 Column | Type  | Collation | Nullable | Default
--------+-------+-----------+----------+---------
 dictid | oid   |           | not null |
 dict   | bytea |           | not null |
Indexes:
    "pg_zstd_dictionaries_dictid_index" PRIMARY KEY, btree (dictid)
```

This is the entire ZStandard dictionary infrastructure. A column can have
multiple dictionaries; the latest one is identified via the pg_attribute
attoptions. We never delete dictionaries once they are generated. If no
dictionary is available and attcompression is set to zstd, we compress with
ZStandard without a dictionary. For decompression, the zstd-compressed
frame contains the identifier (dictid) of the dictionary used for
compression; we retrieve this dictid from the frame, fetch the
corresponding dictionary, and decompress.
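
To illustrate the decompression path just described, here is a minimal sketch (not the patch's actual routine); lookup_dictionary() is a hypothetical helper standing in for the pg_zstd_dictionaries lookup:

```c
#include <zstd.h>

/* Hypothetical helper: fetch the dictionary bytes for a dictid. */
extern const void *lookup_dictionary(unsigned dictid, size_t *dict_len);

static size_t
zstd_decompress_datum(void *dst, size_t dst_capacity,
                      const void *src, size_t src_len)
{
    /* zstd records the dictionary id in the frame header */
    unsigned    dictid = ZSTD_getDictID_fromFrame(src, src_len);
    const void *dict = NULL;
    size_t      dict_len = 0;
    ZSTD_DCtx  *dctx;
    size_t      ret;

    if (dictid != 0)
        dict = lookup_dictionary(dictid, &dict_len);

    dctx = ZSTD_createDCtx();
    /* With dict == NULL this behaves like a plain decompression */
    ret = ZSTD_decompress_usingDict(dctx, dst, dst_capacity,
                                    src, src_len, dict, dict_len);
    ZSTD_freeDCtx(dctx);

    return ret;                 /* caller should check ZSTD_isError(ret) */
}
```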

#############################################################################

Now for the TOAST compression framework changes.

We identify a compressed datum's compression algorithm using the top two
bits of va_tcinfo (varattrib_4b.va_compressed), which allows for at most
four compression methods. However, in previous community discussion of
TOAST compression changes [3], using the remaining bit pattern for just one
new compression algorithm was rejected, and it was suggested to use it to
extend the format instead, which is what I’ve implemented in this patch.
This change necessitates an update to the 'varattrib_4b' and
'varatt_external' on-disk structures. I’ve made sure that these changes are
backward compatible.

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
    uint32      va_cmp_alg;     /* The additional compression algorithms
                                 * information. */
} varatt_external;
```

Since I needed to update these structs, I’ve adjusted the existing macros
and added the ZStandard compression and decompression routines as needed.
These are the major design changes in the patch to incorporate ZStandard
with dictionary compression.
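
To make the layout concrete, here is a small self-contained sketch of the kind of dispatch the adjusted macros perform. The names are illustrative rather than the patch's actual macros, but the 30-bit-size/2-bit-method split matches how va_tcinfo is encoded in current PostgreSQL:

```c
#include <stdint.h>

#define VARLENA_EXTSIZE_BITS    30
#define VARLENA_EXTSIZE_MASK    ((1U << VARLENA_EXTSIZE_BITS) - 1)

/* Existing 2-bit ids: 0 = pglz, 1 = lz4.  The value 3 ("11") would mean
 * "extended": the real algorithm id lives in the following va_cmp_alg. */
#define TOAST_CMPID_EXTENDED    3

typedef struct
{
    uint32_t    va_header;
    uint32_t    va_tcinfo;      /* 30-bit raw size + 2-bit method id */
    uint32_t    va_cmp_alg;     /* only present/meaningful when id == 3 */
    /* compressed data follows */
} toast_compress_ext_header;    /* illustrative only */

static inline uint32_t
sketch_compress_method(const toast_compress_ext_header *hdr)
{
    uint32_t    cmid = hdr->va_tcinfo >> VARLENA_EXTSIZE_BITS;

    /* legacy datums keep their old 2-bit id; "11" redirects to va_cmp_alg */
    return (cmid == TOAST_CMPID_EXTENDED) ? hdr->va_cmp_alg : cmid;
}

static inline uint32_t
sketch_compress_rawsize(const toast_compress_ext_header *hdr)
{
    return hdr->va_tcinfo & VARLENA_EXTSIZE_MASK;
}
```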

Please let me know what you think about all this. Are there any concerns
with my approach? In particular, I would appreciate your thoughts on the
on-disk changes that result from this.

kind regards,

Nikhil Veldanda
Amazon Web Services: https://aws.amazon.com

[1]: https://facebook.github.io/zstd/
[2]: https://github.com/facebook/zstd
[3]: /messages/by-id/YoMiNmkztrslDbNS@paquier.xyz

Attachments:

v1-0001-Add-ZStandard-with-dictionaries-compression-suppo.patch (application/octet-stream)
#2 Kirill Reshke
reshkekirill@gmail.com
In reply to: Nikhil Kumar Veldanda (#1)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi!
I generally love this idea, however I am not convinced that in-core support
is the right direction here. Maybe we can introduce some API
infrastructure here to allow delegating compression to extensions?
This is merely my opinion; perhaps dealing with a redo is not
worthwhile.

I took a brief look at patch v1. I feel like this is too much for a
single patch. Take, for example, this change:

```
-#define NO_LZ4_SUPPORT() \
+#define NO_METHOD_SUPPORT(method) \
  ereport(ERROR, \
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
- errmsg("compression method lz4 not supported"), \
- errdetail("This functionality requires the server to be built with lz4 support.")))
+ errmsg("compression method %s not supported", method), \
+ errdetail("This functionality requires the server to be built with %s support.", method)))
```

This could be a separate preliminary refactoring patch in the series.
Perhaps we need to divide the patch into smaller pieces if we follow
the suggested course of this thread (in-core support).

I will try to give another in-depth look here soon.

--
Best regards,
Kirill Reshke

#3 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Nikhil Kumar Veldanda (#1)
Re: ZStandard (with dictionaries) compression support for TOAST compression

The overall idea is great.

I just want to mention that LZ4 also has an API to use a dictionary. Its
dictionary is as simple as "virtually prepended" text (in contrast to the
complex ZStd dictionary format).

I mean, it would be great if "dictionary" were a common property across
different algorithms.

On the other hand, zstd has a "super fast" mode which is actually a bit
faster than LZ4 and compresses a bit better. So maybe support for
different algorithms is not essential. (But then we need a way to set the
compression level to that "super fast" mode.)

-------
regards
Yura Sokolov aka funny-falcon

#4 Aleksander Alekseev
aleksander@timescale.com
In reply to: Nikhil Kumar Veldanda (#1)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time
ago [1] but the overall feedback was somewhat mixed so I chose to
focus on something else. Thanks for picking this up.

test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
build_zstd_dict_for_attribute
-------------------------------
t
(1 row)

Did you have a chance to familiarize yourself with the corresponding
discussion [1] and probably the previous threads? Particularly it was
pointed out that dictionaries should be built automatically during
VACUUM. We also discussed a special syntax for the feature, besides
other things.

[1]: /messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

--
Best regards,
Aleksander Alekseev

#5 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Yura Sokolov (#3)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi,

Overall idea is great.

I just want to mention LZ4 also have API to use dictionary. Its dictionary
will be as simple as "virtually prepended" text (in contrast to complex
ZStd dictionary format).

I mean, it would be great if "dictionary" will be common property for
different algorithms.

On the other hand, zstd have "super fast" mode which is actually a bit
faster than LZ4 and compresses a bit better. So may be support for
different algos is not essential. (But then we need a way to change
compression level to that "super fast" mode.)

The zstd compression level and dictionary size are configurable at the
attribute level using ALTER TABLE. The default zstd level is 3 and the
default dictionary size is 4KB. For super-fast mode the level can be set to 1.

```
test=# alter table zstd alter column doc set compression zstd;
ALTER TABLE
test=# alter table zstd alter column doc set(zstd_cmp_level = 1);
ALTER TABLE
test=# select * from pg_attribute where attrelid = 'zstd'::regclass
and attname = 'doc';
 attrelid | attname | atttypid | attlen | attnum | atttypmod |
attndims | attbyval | attalign | attstorage | attcompre
ssion | attnotnull | atthasdef | atthasmissing | attidentity |
attgenerated | attisdropped | attislocal | attinhcount
| attcollation | attstattarget | attacl |            attoptions
    | attfdwoptions | attmissingval
----------+---------+----------+--------+--------+-----------+----------+----------+----------+------------+----------
------+------------+-----------+---------------+-------------+--------------+--------------+------------+-------------
+--------------+---------------+--------+----------------------------------+---------------+---------------
    16389 | doc     |     3802 |     -1 |      1 |        -1 |
0 | f        | i        | x          | z
      | f          | f         | f             |             |
     | f            | t          |           0
|            0 |               |        |
{zstd_dictid=1,zstd_cmp_level=1} |               |
(1 row)
```
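
For what it's worth, honoring such a per-attribute level in the compression routine is essentially a one-liner with zstd's simple API. A minimal sketch (not the patch's code), with the level taken from the zstd_cmp_level attoption:

```c
#include <stdlib.h>
#include <zstd.h>

/* Compress src at the given level; returns a malloc'd buffer (caller frees)
 * and sets *dst_len, or returns NULL on failure / no savings. */
static void *
zstd_compress_at_level(const void *src, size_t src_len, int level,
                       size_t *dst_len)
{
    size_t      bound = ZSTD_compressBound(src_len);
    void       *dst = malloc(bound);
    size_t      ret;

    if (dst == NULL)
        return NULL;

    ret = ZSTD_compress(dst, bound, src, src_len, level);
    if (ZSTD_isError(ret) || ret >= src_len)    /* give up if not smaller */
    {
        free(dst);
        return NULL;
    }

    *dst_len = ret;
    return dst;
}
```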
#6 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Aleksander Alekseev (#4)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi

On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:

Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time
ago [1] but the overall feedback was somewhat mixed so I chose to
focus on something else. Thanks for picking this up.

test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
build_zstd_dict_for_attribute
-------------------------------
t
(1 row)

Did you have a chance to familiarize yourself with the corresponding
discussion [1] and probably the previous threads? Particularly it was
pointed out that dictionaries should be built automatically during
VACUUM. We also discussed a special syntax for the feature, besides
other things.

[1]: /messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

Restricting dictionary generation to the vacuum process is not ideal
because it limits user control and flexibility. Compression efficiency
is highly dependent on data distribution, which can change
dynamically. By allowing users to generate dictionaries on demand via
an API, they can optimize compression when they detect inefficiencies
rather than waiting for a vacuum process, which may not align with
their needs.

Additionally, since all dictionaries are stored in the catalog table
anyway, users can generate and manage them independently without
interfering with the system’s automatic maintenance tasks. This
approach ensures better adaptability to real-world scenarios where
compression performance needs to be monitored and adjusted in real
time.

---
Nikhil Veldanda

#7 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Nikhil Kumar Veldanda (#5)
Re: ZStandard (with dictionaries) compression support for TOAST compression

06.03.2025 19:29, Nikhil Kumar Veldanda wrote:

Hi,

Overall idea is great.

I just want to mention LZ4 also have API to use dictionary. Its dictionary
will be as simple as "virtually prepended" text (in contrast to complex
ZStd dictionary format).

I mean, it would be great if "dictionary" will be common property for
different algorithms.

On the other hand, zstd have "super fast" mode which is actually a bit
faster than LZ4 and compresses a bit better. So may be support for
different algos is not essential. (But then we need a way to change
compression level to that "super fast" mode.)

The zstd compression level and dictionary size are configurable at the
attribute level using ALTER TABLE. The default zstd level is 3 and the
default dictionary size is 4KB. For super-fast mode the level can be set to 1.

No. Super-fast mode levels are negative. See the parsing of the "--fast"
parameter in `programs/zstdcli.c` in zstd's repository and the definition of
ZSTD_minCLevel().

So, to support "super-fast" mode you have to accept negative compression
levels. I didn't check; perhaps you already support them?
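
For reference, the valid range can be queried from the library itself, and negative levels are set the same way as positive ones; a small sketch:

```c
#include <stdio.h>
#include <zstd.h>

int
main(void)
{
    /* ZSTD_minCLevel() is negative; the CLI's --fast=N maps to level -N. */
    printf("supported zstd levels: %d .. %d\n",
           ZSTD_minCLevel(), ZSTD_maxCLevel());

    ZSTD_CCtx  *cctx = ZSTD_createCCtx();

    /* A negative level selects the "super fast" mode. */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, -5);

    /* ... then compress with ZSTD_compress2(cctx, dst, dstCap, src, srcLen) ... */

    ZSTD_freeCCtx(cctx);
    return 0;
}
```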

-------
regards
Yura Sokolov aka funny-falcon

#8 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Yura Sokolov (#7)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Yura,

So, to support "super-fast" mode you have to accept negative compression
levels. I didn't check; perhaps you already support them?

The key point I want to emphasize is that both the zstd compression level
and the dictionary size should be configurable based on user preferences
at the attribute level.

---
Nikhil Veldanda

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#1)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

--
Robert Haas
EDB: http://www.enterprisedb.com

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

BTW, this is hardly the first such attempt. See [1] for a prior
attempt at something fairly similar, which ended up going nowhere.
It'd be wise to understand why that failed before pressing forward.

Note that the thread title for [1] is pretty misleading, as the
original discussion about JSONB-specific compression soon migrated
to discussion of compressing TOAST data using dictionaries. At
least from a ten-thousand-foot viewpoint, that seems like exactly
what you're proposing here. I see that you dismissed [1] as
irrelevant upthread, but I think you'd better look closer.

regards, tom lane

[1]: /messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

#11 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Robert Haas (#9)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

With the latest patch I've shared:

Using a Kaggle dataset of Nintendo-related tweets [1], we leveraged
PostgreSQL's acquire_sample_rows function to quickly gather just 1,000
sample rows for a specific attribute out of 104,695 rows. These raw
samples were passed into Zstd's sampling buffer, generating a custom
dictionary. This dictionary was then used directly to compress the
documents, resulting in 62% space savings:

```
test=# \dt+
                                       List of tables
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------------+-------+----------+-------------+---------------+--------+-------------
 public | lz4            | table | nikhilkv | permanent   | heap          | 297 MB |
 public | pglz           | table | nikhilkv | permanent   | heap          | 259 MB |
 public | zstd_with_dict | table | nikhilkv | permanent   | heap          | 114 MB |
 public | zstd_wo_dict   | table | nikhilkv | permanent   | heap          | 210 MB |
(4 rows)
```

We've observed similarly strong results on other datasets as well when
using dictionaries.

[1]: https://www.kaggle.com/code/dcalambas/nintendo-tweets-analysis/data

---
Nikhil Veldanda

#12 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Tom Lane (#10)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Tom,

On Thu, Mar 6, 2025 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

BTW, this is hardly the first such attempt. See [1] for a prior
attempt at something fairly similar, which ended up going nowhere.
It'd be wise to understand why that failed before pressing forward.

Note that the thread title for [1] is pretty misleading, as the
original discussion about JSONB-specific compression soon migrated
to discussion of compressing TOAST data using dictionaries. At
least from a ten-thousand-foot viewpoint, that seems like exactly
what you're proposing here. I see that you dismissed [1] as
irrelevant upthread, but I think you'd better look closer.

regards, tom lane

[1] /messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

Thank you for highlighting the previous discussion; I reviewed [1]
closely. While both methods involve dictionary-based compression, the
approach I'm proposing differs significantly.

The previous method explicitly extracted string values from JSONB and
assigned unique OIDs to each entry, resulting in distinct dictionary
entries for every unique value. In contrast, this approach directly
leverages Zstandard's dictionary training API. We provide raw data
samples to Zstd, which generates a dictionary of a specified size.
This dictionary is then stored in a catalog table and used to compress
subsequent inserts for the specific attribute it was trained on.

Key differences include:

1. No new data types are required.
2. Attributes can optionally have multiple dictionaries; the latest
dictionary is used during compression, and the exact dictionary used
during compression is retrieved and applied for decompression.
3. Compression utilizes Zstandard's trained dictionaries when available.

Additionally, I have provided an option for users to define custom
sampling and training logic, as directly passing raw buffers to the
training API may not always yield optimal results, especially for
certain custom variable-length data types. This flexibility motivates
the necessary adjustments to `pg_type`.

I would greatly appreciate your feedback or any additional suggestions
you might have.

[1]: /messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

Best regards,
Nikhil Veldanda

#13 Aleksander Alekseev
aleksander@timescale.com
In reply to: Nikhil Kumar Veldanda (#12)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Nikhil,

Thank you for highlighting the previous discussion—I reviewed [1]
closely. While both methods involve dictionary-based compression, the
approach I'm proposing differs significantly.

The previous method explicitly extracted string values from JSONB and
assigned unique OIDs to each entry, resulting in distinct dictionary
entries for every unique value. In contrast, this approach directly
leverages Zstandard's dictionary training API. We provide raw data
samples to Zstd, which generates a dictionary of a specified size.
This dictionary is then stored in a catalog table and used to compress
subsequent inserts for the specific attribute it was trained on.

[...]

You didn't read closely enough, I'm afraid. As Tom pointed out, the
title of the thread is misleading. On top of that there are several
separate threads. I did my best to cross-reference them, but
apparently didn't do a good enough job.

Initially I proposed to add the ZSON extension [1][2] to the PostgreSQL
core. However, the idea evolved into TOAST improvements that don't
require a user to use special types. You may also find the related
"Pluggable TOASTer" discussion [3] interesting. The idea there was rather
different, but the discussion about extending TOAST pointers so that in
the future we can use something other than ZSTD is relevant.

You will find the recent summary of the reached agreements somewhere
around this message [4]; take a look at the thread a bit above and
below it.

I believe this effort is important. You can't, however, simply discard
everything that was discussed in this area for the past several years.
If you want to succeed of course. No one will look at your patch if it
doesn't account for all the previous discussions. I'm sorry, I know
it's disappointing. This being said you should have done better
research before submitting the code. You could just ask if anyone was
working on something like this before and save a lot of time.

Personally I would suggest starting with one little step toward
compression dictionaries, particularly focusing on the extendability of
TOAST pointers. You are going to need to store dictionary ids there
and allow using other compression algorithms in the future. This will
require something like a varint/utf8-like bitmask. See the previous
discussions.

[1]: https://github.com/afiskon/zson
[2]: /messages/by-id/CAJ7c6TP3fCC9TNKJBQAcEf4c=L7XQZ7QvuUayLgjhNQMD_5M_A@mail.gmail.com
[3]: /messages/by-id/224711f9-83b7-a307-b17f-4457ab73aa0a@sigaev.ru
[4]: /messages/by-id/CAJ7c6TPSN06C+5cYSkyLkQbwN1C+pUNGmx+VoGCA-SPLCszC8w@mail.gmail.com

--
Best regards,
Aleksander Alekseev

#14 Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#9)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

Compression dictionaries give a good compression ratio (~50%) and also
increase TPS a bit (5-10%) due to better buffer cache utilization. At
least that's according to synthetic and not entirely trustworthy benchmarks
I did some years ago [1]. The result may be very dependent on the actual
data of course, not to mention the particular implementation of the idea.

[1]: https://github.com/afiskon/zson/blob/master/docs/benchmark.md

--
Best regards,
Aleksander Alekseev

#15 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Aleksander Alekseev (#13)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi,

I reviewed the discussions, and while most agreements focused on
changes to the TOAST pointer, the design I propose requires no
modifications to it. I’ve carefully considered the design choices made
previously, and given Zstd’s clear advantages in compression
efficiency and performance over algorithms like PGLZ and LZ4, I believe
we can integrate it without altering the existing TOAST pointer
(varatt_external) structure.

By simply using the top two bits of the va_extinfo field (setting them
to '11') in `varatt_external`, we can signal an alternative
compression algorithm, clearly distinguishing new methods from legacy
ones. The specific algorithm used would then be recorded in the
va_cmp_alg field.
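
A tiny sketch of that signal, mirroring how the existing macros split va_extinfo into a 30-bit external size and a 2-bit method id (the names here are illustrative, not the patch's):

```c
#include <stdint.h>

#define EXTSIZE_BITS        30
#define EXTSIZE_MASK        ((1U << EXTSIZE_BITS) - 1)
#define CMPID_EXTENDED      3       /* the proposed "11" pattern */

static inline int
external_uses_extended_method(uint32_t va_extinfo)
{
    return (va_extinfo >> EXTSIZE_BITS) == CMPID_EXTENDED;
}

static inline uint32_t
external_saved_size(uint32_t va_extinfo)
{
    /* unchanged for both the legacy methods and the "11" pattern */
    return va_extinfo & EXTSIZE_MASK;
}
```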

This approach addresses the issues raised in the summarized thread [1]
and lets us leverage dictionaries for data that can stay in-line. While
my initial patch includes modifications to the TOAST pointer due to a
single dependency on it (pg_column_compression), those changes aren’t
strictly necessary; resolving that dependency separately would make
the overall design even less intrusive.

Here’s an illustrative structure:
```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Current compressed format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original size and compression method */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method; "11" indicates the
                                 * new compression methods */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```

The decompression flow remains straightforward: once a datum is identified
as external, we detoast it, then identify the compression algorithm
using the `TOAST_COMPRESS_METHOD` macro, which refers to a varattrib_4b
structure, not a TOAST pointer. We retrieve the compression algorithm from
either va_tcinfo or va_cmp_alg based on the adjusted macros, and decompress
accordingly.

In summary, integrating Zstandard into the TOAST framework in this
minimally invasive way should yield substantial benefits.

[1]: /messages/by-id/CAJ7c6TPSN06C+5cYSkyLkQbwN1C+pUNGmx+VoGCA-SPLCszC8w@mail.gmail.com

Best regards,
Nikhil Veldanda

#16 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Nikhil Kumar Veldanda (#15)
2 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi all,

Attached is an updated version of the patch. Specifically, I've removed
the changes related to the TOAST pointer structure. This proposal is
different from earlier discussions on this topic [1], where extending
the TOAST pointer was considered essential for enabling
dictionary-based compression.

Key improvements introduced in this proposal:

1. No Changes to TOAST Pointer: The existing TOAST pointer structure
remains untouched, simplifying integration and minimizing potential
disruptions.

2. Extensible Design: The solution is structured to seamlessly
incorporate future compression algorithms beyond zstd [2], providing
greater flexibility and future-proofing.

3. Inline Data Compression with Dictionary Support: Crucially, this
approach supports dictionary-based compression for inline data.
Dictionaries are highly effective for compressing small-sized
documents, providing substantial storage savings. Please refer to the
attached image from the zstd README [2] for supporting evidence.
Omitting dictionary-based compression for inline data would
significantly reduce these benefits. For example, under the previous
design constraints [3], if a 16KB document compressed down to 256
bytes using a dictionary, storing it inline would not have been
feasible. The current proposal addresses this limitation, thereby
fully leveraging dictionary-based compression (see the sketch after
this list).
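
As an illustration of point 3, compressing a short value with a pre-built dictionary is cheap with zstd's ZSTD_CDict API. A minimal sketch (not the patch's routine), where dict/dict_len would come from the attribute's latest entry in pg_zstd_dictionaries; in practice the CDict would be cached rather than rebuilt per datum:

```c
#include <zstd.h>

static size_t
zstd_compress_with_dict(void *dst, size_t dst_capacity,
                        const void *src, size_t src_len,
                        const void *dict, size_t dict_len, int level)
{
    ZSTD_CDict *cdict = ZSTD_createCDict(dict, dict_len, level);
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    size_t      ret;

    /* The produced frame records the dictionary's id, which is what the
     * decompression path later reads back via ZSTD_getDictID_fromFrame(). */
    ret = ZSTD_compress_usingCDict(cctx, dst, dst_capacity,
                                   src, src_len, cdict);

    ZSTD_freeCCtx(cctx);
    ZSTD_freeCDict(cdict);

    return ret;                 /* caller should check ZSTD_isError(ret) */
}
```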

I believe this solution effectively addresses the limitations
identified in our earlier discussions [1][3].

Feedback on this approach would be greatly appreciated; I welcome any
suggestions you might have.

References:
[1]: /messages/by-id/CAJ7c6TPSN06C+5cYSkyLkQbwN1C+pUNGmx+VoGCA-SPLCszC8w@mail.gmail.com
[2]: https://github.com/facebook/zstd
[3]: /messages/by-id/CAJ7c6TPSN06C+5cYSkyLkQbwN1C+pUNGmx+VoGCA-SPLCszC8w@mail.gmail.com

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;
```
The additional algorithm information and the dictid are stored in varattrib_4b.

Best regards,
Nikhil Veldanda

Attachments:

v2-0001-Add-ZStandard-with-dictionaries-compression-suppo.patch (application/octet-stream)
image.png (image/png)
#17 Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#15)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

struct /* Extended compression format */
{
uint32 va_header;
uint32 va_tcinfo;
uint32 va_cmp_alg;
uint32 va_cmp_dictid;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed_ext;
} varattrib_4b;

First, thanks for sending along the performance results. I agree that
those are promising. Second, thanks for sending these design details.

The idea of keeping dictionaries in pg_zstd_dictionaries literally
forever doesn't seem very appealing, but I'm not sure what the other
options are. I think we've established in previous work in this area
that compressed values can creep into unrelated tables and inside
records or other container types like ranges. Therefore, we have no
good way of knowing when a dictionary is unreferenced and can be
dropped. So in that sense your decision to keep them forever is
"right," but it's still unpleasant. It would even be necessary to make
pg_upgrade carry them over to new versions.

If we could make sure that compressed datums never leaked out into
other tables, then tables could depend on dictionaries and
dictionaries could be dropped when there were no longer any tables
depending on them. But like I say, previous work suggested that this
would be very difficult to achieve. However, without that, I imagine
users generating new dictionaries regularly as the data changes and
eventually getting frustrated that they can't get rid of the old ones.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18 Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Robert Haas (#17)
7 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

Thank you for your response, and apologies for the delay in getting
back to you. You raised some important concerns in your reply; I've
worked hard to understand and hopefully address these two:

* Dictionary Cleanup via Dependency Tracking
* Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ...
SELECT ...)

Dictionary Cleanup via Dependency Tracking:

To address your question on how we can safely clean up unused
dictionaries, I've implemented a mechanism based on PostgreSQL's
standard dependency system (pg_depend); permit me to explain.

When a Zstandard dictionary is created for a table, we record a
DEPENDENCY_NORMAL dependency from the table to the dictionary. This
ensures that when the table is dropped, the corresponding entry is
removed from the pg_depend catalog. Users can then call the
cleanup_unused_dictionaries() function to remove any dictionaries that
are no longer referenced by any table.

/* create dependency */
{
    ObjectAddress dictObj;
    ObjectAddress relation;

    ObjectAddressSet(dictObj, ZstdDictionariesRelationId, dictid);
    ObjectAddressSet(relation, RelationRelationId, relid);

    /* NORMAL dependency: relid → Dictionary */
    recordDependencyOn(&relation, &dictObj, DEPENDENCY_NORMAL);
}

Example: Consider two tables, each using its own Zstandard dictionary:

test=# \dt+
List of tables
Schema | Name | Type | Owner | Persistence | Access method |
Size | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
public | temp | table | nikhilkv | permanent | heap | 16 kB |
public | temp1 | table | nikhilkv | permanent | heap | 16 kB |
(2 rows)

// Dictionary dependencies
test=# select * from pg_depend where refclassid = 9946;
classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
1259 | 16389 | 0 | 9946 | 1 | 0 | n
1259 | 16394 | 0 | 9946 | 2 | 0 | n
(2 rows)

// the corresponding dictionaries:
test=# select * from pg_zstd_dictionaries ;
dictid |
dict
--------+----------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------
--------------------------------------
1 | \x37a430ec71451a10091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414
14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01
0100000004000000080000004c6f72656d20697073756d20646f6c6f722073697420616d65742c20636f6e73656374657475722061646970697363696
e6720656c69742e204c6f72656d2069
2 | \x37a430ec7d1a933a091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414
14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01
0100000004000000080000004e696b68696c206b756d616e722076656c64616e64612c206973206f6b61792063616e6469646174652c2068652069732
0696e2073656174746c65204e696b68696c20
(2 rows)

If cleanup_unused_dictionaries() is called while the dependencies
still exist, nothing is removed:

test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
0
(1 row)

After dropping temp1, the associated dictionary becomes eligible for cleanup:

test=# drop table temp1;
DROP TABLE

test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
1
(1 row)

________________________________
Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
method inheritZstdDictionaryDependencies. This method is invoked at
the end of such statements and ensures that any dictionary
dependencies from source tables are copied to the destination table.
We determine the set of source tables using the relationOids field in
PlannedStmt.

This guarantees that, if compressed datums reference a zstd
dictionary, the destination table is marked as dependent on the
dictionaries that the source tables depend on, preventing premature
cleanup by cleanup_unused_dictionaries.
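
To illustrate the mechanism only (this is a sketch, not the exact patch
code; get_dictionary_dependencies() is a hypothetical helper that would
look up a source relation's dictionary dependencies in pg_depend):

```
#include "postgres.h"

#include "catalog/dependency.h"
#include "catalog/objectaddress.h"
#include "catalog/pg_class_d.h"
#include "nodes/pg_list.h"

/* ZstdDictionariesRelationId comes from the patch's catalog header. */
extern List *get_dictionary_dependencies(Oid relid);	/* hypothetical */

/*
 * Sketch: for every source relation referenced by the finished statement,
 * copy its dictionary dependencies onto the destination relation.  Real
 * code would also need to skip dependencies that are already recorded.
 */
static void
inherit_zstd_dictionary_dependencies_sketch(Oid dstRelid, List *sourceRelids)
{
	ListCell   *lc;

	foreach(lc, sourceRelids)
	{
		Oid			srcRelid = lfirst_oid(lc);
		List	   *dictids = get_dictionary_dependencies(srcRelid);
		ListCell   *lc2;

		foreach(lc2, dictids)
		{
			ObjectAddress dictObj;
			ObjectAddress relObj;

			ObjectAddressSet(dictObj, ZstdDictionariesRelationId, lfirst_oid(lc2));
			ObjectAddressSet(relObj, RelationRelationId, dstRelid);

			/* Same NORMAL dependency as is recorded for the source table */
			recordDependencyOn(&relObj, &dictObj, DEPENDENCY_NORMAL);
		}
	}
}
```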

Example: Consider two tables, each with its own dictionary:

                                  List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)

Using CTAS (CREATE TABLE AS), one table is copied to another. In this
case, the compressed datums in the temp table are copied to copy_tbl.
Since the dictionary is shared between two tables, a dependency on
that dictionary is also established for the destination table. Even if
the original temp table is deleted and cleanup is triggered, the
dictionary will not be dropped because there remains an active
dependency.

test=# create table copy_tbl as select * from temp;
SELECT 20

// dictid 1 is shared between two tables.
test=# select * from pg_depend where refclassid = 9946;
classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
1259 | 16389 | 0 | 9946 | 1 | 0 | n
1259 | 16404 | 0 | 9946 | 1 | 0 | n
1259 | 16399 | 0 | 9946 | 3 | 0 | n
(3 rows)

// After dropping the temp table whose datums were compressed with dictid 1
test=# drop table temp;
DROP TABLE

// dependency for temp table is dropped.
test=# select * from pg_depend where refclassid = 9946;
classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
1259 | 16404 | 0 | 9946 | 1 | 0 | n
1259 | 16399 | 0 | 9946 | 3 | 0 | n
(2 rows)

// No dictionaries are being deleted.
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
0
(1 row)

Once the new copy_tbl is also deleted, the dictionary can be dropped
because no dependency exists on it:

test=# drop table copy_tbl;
DROP TABLE

// The dictionary is then deleted.
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
1
(1 row)

Another example using composite types, including a more complex
scenario involving two source tables.

// Create a base composite type with two text fields
test=# create type my_composite as (f1 text, f2 text);
CREATE TYPE

// Create a nested composite type that uses my_composite twice
test=# create type my_composite1 as (f1 my_composite, f2 my_composite);
CREATE TYPE

test=# \d my_composite
Composite type "public.my_composite"
Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
f1 | text | | |
f2 | text | | |

test=# \d my_composite1
Composite type "public.my_composite1"
Column | Type | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
f1 | my_composite | | |
f2 | my_composite | | |

// Sample table with ZSTD dictionary compression on text columns
test=# \d+ orders
                                             Table "public.orders"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 order_id    | integer |           |          |         | plain    |             |              |
 customer_id | integer |           |          |         | plain    |             |              |
 random1     | text    |           |          |         | extended | zstd        |              |
 random2     | text    |           |          |         | extended | zstd        |              |
Access method: heap

// Sample table with ZSTD dictionary compression on one of the text columns
test=# \d+ customers
                                            Table "public.customers"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 customer_id | integer |           |          |         | plain    |             |              |
 random3     | text    |           |          |         | extended | zstd        |              |
 random4     | text    |           |          |         | extended |             |              |
Access method: heap

// Check existing dictionaries: dictid 1 for random1, dictid 2 for
random2, dictid 3 for random3 attribute
test=# select dictid from pg_zstd_dictionaries;
dictid
--------
1
2
3
(3 rows)

// List all objects dependent on ZSTD dictionaries
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Create new table using nested composite type
// This copies compressed datums into temp1.
test=# create table temp1 as
select ROW(
ROW(random3, random4)::my_composite,
ROW(random1, random2)::my_composite
)::my_composite1
from customers full outer join orders using (customer_id);
SELECT 51

test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(6 rows)

// Drop the original source tables.
test=# drop table orders;
DROP TABLE

test=# drop table customers ;
DROP TABLE

// Even after dropping orders, customers table, temp1 still holds
references to the dictionaries.
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
temp1 | 1259 | 16423 | 0 | 9946 | 1 | 0 | n
temp1 | 1259 | 16423 | 0 | 9946 | 2 | 0 | n
temp1 | 1259 | 16423 | 0 | 9946 | 3 | 0 | n
(3 rows)

// Attempt cleanup, No cleanup occurs, because temp1 table still
depends on the dictionaries.
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
0
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
dictid
--------
1
2
3
(3 rows)

// Drop the destination table
test=# drop table temp1;
DROP TABLE

// Confirm no remaining dependencies
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
(0 rows)

// Cleanup now succeeds
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
3
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
dictid
--------
(0 rows)

This design ensures that:

* Dictionaries are deleted only when no table depends on them.
* We avoid the costly decompression/recompression that would otherwise be
  needed to prevent compressed-datum leakage.
* We don't retain dictionaries forever.

These changes are the core additions in this revision of the patch to
address concerns around long-lived dictionaries and compressed datum
leakage. Additionally, this update incorporates feedback by enabling
automatic zstd dictionary generation and cleanup during the VACUUM
process and includes changes to support copying ZSTD dictionaries
during pg_upgrade.

Patch summary:

v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
Refactors varattrib_4b structures and updates related macros to enable
ZSTD dictionary support.
v11-0002-Zstd-compression-and-decompression-routines-incl.patch
Adds ZSTD compression and decompression routines, and introduces a new
catalog to store dictionary metadata.
v11-0003-Zstd-dictionary-training-process.patch
Implements the dictionary training workflow. Includes built-in support
for text and jsonb types. Allows users to define custom sampling
functions per type by specifying a C function name in the
pg_type.typzstdsampling field.
v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
Introduces a dependency tracking mechanism using pg_depend to record
which ZSTD dictionaries a table depends on. When compressed datums
that rely on a dictionary are copied to unrelated target tables, the
corresponding dictionary dependencies from the source table are also
recorded for the target table, ensuring the dictionaries are not
prematurely cleaned up.
v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
Adds integration with VACUUM to automatically generate and clean up
ZSTD dictionaries.
v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
Extends pg_dump and pg_upgrade to support migrating ZSTD dictionaries
and their dependencies during pg_upgrade.
v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
Provides test coverage for ZSTD dictionary-based compression features,
including training, usage, and cleanup.

I hope that these changes address your concerns; any thoughts or
suggestions on this approach are welcome.

Best regards,
Nikhil Veldanda


On Mon, Mar 17, 2025 at 1:03 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

struct                          /* Extended compression format */
{
    uint32      va_header;
    uint32      va_tcinfo;
    uint32      va_cmp_alg;
    uint32      va_cmp_dictid;
    char        va_data[FLEXIBLE_ARRAY_MEMBER];
}           va_compressed_ext;
} varattrib_4b;

First, thanks for sending along the performance results. I agree that
those are promising. Second, thanks for sending these design details.

The idea of keeping dictionaries in pg_zstd_dictionaries literally
forever doesn't seem very appealing, but I'm not sure what the other
options are. I think we've established in previous work in this area
that compressed values can creep into unrelated tables and inside
records or other container types like ranges. Therefore, we have no
good way of knowing when a dictionary is unreferenced and can be
dropped. So in that sense your decision to keep them forever is
"right," but it's still unpleasant. It would even be necessary to make
pg_upgrade carry them over to new versions.

If we could make sure that compressed datums never leaked out into
other tables, then tables could depend on dictionaries and
dictionaries could be dropped when there were no longer any tables
depending on them. But like I say, previous work suggested that this
would be very difficult to achieve. However, without that, I imagine
users generating new dictionaries regularly as the data changes and
eventually getting frustrated that they can't get rid of the old ones.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
v11-0002-Zstd-compression-and-decompression-routines-incl.patch
v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
v11-0003-Zstd-dictionary-training-process.patch
v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
#19Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#18)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
method inheritZstdDictionaryDependencies. This method is invoked at
the end of such statements and ensures that any dictionary
dependencies from source tables are copied to the destination table.
We determine the set of source tables using the relationOids field in
PlannedStmt.

With the disclaimer that I haven't opened the patch or thought
terribly deeply about this issue, at least not yet, my fairly strong
suspicion is that this design is not going to work out, for multiple
reasons. In no particular order:

1. I don't think users will like it if dependencies on a zstd
dictionary spread like kudzu across all of their tables. I don't think
they'd like it even if it were 100% accurate, but presumably this is
going to add dependencies any time there MIGHT be a real dependency
rather than only when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock,
which is not even self-exclusive. I doubt that it's possible to change
system catalogs in a concurrency-safe way with such a weak lock. For
instance, if two sessions tried to do the same thing in concurrent
transactions, they could both try to add the same dependency at the
same time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
TABLE...EXECUTE are the only ways that datums can creep from one table
into another. For example, what if I create a plpgsql function that
gets a value from one table and stores it in a variable, and then use
that variable to drive an INSERT into another table? I seem to recall
there are complex cases involving records and range types and arrays,
too, where the compressed object gets wrapped inside of another
object; though maybe that wouldn't matter to your implementation if
INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
adding dependencies.

When Dilip and I were working on lz4 TOAST compression, my first
instinct was to not let LZ4-compressed datums leak out of a table by
forcing them to be decompressed (and then possibly recompressed). We
spent a long time trying to make that work before giving up. I think
this is approximately where things started to unravel, and I'd suggest
you read both this message and some of the discussion before and
after:

/messages/by-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

I'm now also curious to know whether Andres would agree that it's bad
if zstd dictionaries are un-droppable. After all, I thought it would
be bad if there was no way to eliminate a dependency on a compression
method, and he disagreed. So maybe he would also think undroppable
dictionaries are fine. But maybe not. It seems even worse to me than
undroppable compression methods, because you'll probably not have that
many compression methods ever, but you could have a large number of
dictionaries eventually.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#19)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

FWIW, this point resonates here. There is one thing that we have to
do anyway: we just have one bit left in the varlena headers as lz4 is
using the one before last. So we have to make it extensible, even if
it means that any compression method other than LZ4 and pglz would
consume one more byte in its header by default. And I think that this
has to happen at some point if we want flexibility in this area.

+    struct
+    {
+        uint32        va_header;
+        uint32        va_tcinfo;
+        uint32        va_cmp_alg;
+        uint32        va_cmp_dictid;
+        char        va_data[FLEXIBLE_ARRAY_MEMBER];
+    }            va_compressed_ext;

Speaking of which, I am confused by this abstraction choice in
varatt.h in the first patch. Are we sure that we are always going to
have a dictionary attached to a compressed data set or even a
va_cmp_alg? It seems to me that this could lead to a waste of data in
some cases because these fields may not be required depending on the
compression method used, as some fields may not care about these
details. This kind of data should be made optional, on a per-field
basis.

One thing that I've been wondering is how it would be possible to make
the area around varattrib_4b more readable while dealing with more
extensibility. It would be a good occasion to improve that, even if
I'm hand-waving here currently and that the majority of this code is
old enough to vote, with few modifications across the years.

The second thing that I'd love to see on top of the addition of the
extensibility is adding plain compression support for zstd, with
nothing fancy, just the compression and decompression bits. I've done
quite a few benchmarks with the two, and results kind of point in the
direction that zstd is more efficient than lz4 overall. Don't take me
wrong: lz4 can be better in some workloads as it can consume less CPU
than zstd while compressing less. However, a comparison of ratios
like (compression rate / cpu used) has always led me to see zstd as
superior in a large number of cases. lz4 is still very good if you
are CPU-bound and don't care about the extra space required. Both are
three classes better than pglz.

Once we have these three points incrementally built-in together (the
last bit extensibility, the potential varatt.h refactoring and the
zstd support), there may be a point in having support for more
advanced options with the compression methods in the shape of dicts or
more requirements linked to other compression methods, but I think the
topic is complex enough that we should make sure that these basics are
implemented in a way sane enough so as we'd be able to extend them
with all the use cases in mind.
--
Michael

#21Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Robert Haas (#19)
1 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

Thank you for your feedback on the patch. You're right that my
proposed design would introduce more dictionary dependencies as
dictionaries grow; I chose this path specifically to avoid changing
existing system behavior and to prevent performance regressions in
CTAS and related commands.

After reviewing the email thread you attached in your previous
response, I identified a natural choke point for both inserts and
updates: the call to "heap_toast_insert_or_update" inside
heap_prepare_insert/heap_update. In the current master branch, that
function only runs when HeapTupleHasExternal is true; my patch extends
it to HeapTupleHasVarWidth tuples as well. By decompressing every
nested compressed datum at this point, no matter how deeply it is
nested, we can prevent any leaked datum from propagating into
unrelated tables. This mirrors the existing inlining logic in
toast_tuple_init for external toasted datums, but takes it one step
further by fully flattening the datum (decompressing compressed datums
at every level, not just the top level).
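
To make the idea concrete, here is a very rough sketch of the
per-attribute step using only the existing detoast machinery; the
recursion into composite/array/range values is the part the real patch
still has to add, and the function name is mine:

```
#include "postgres.h"

#include "access/detoast.h"
#include "varatt.h"

/*
 * Sketch only: return an uncompressed, inline copy of a varlena datum.
 * A real flatten_datum would additionally walk composite, array and
 * range values and flatten any compressed members nested inside them.
 */
static Datum
flatten_varlena_sketch(Datum value)
{
	struct varlena *attr = (struct varlena *) DatumGetPointer(value);

	/* Fetch external data and/or decompress it into a plain varlena. */
	if (VARATT_IS_EXTERNAL(attr) || VARATT_IS_COMPRESSED(attr))
		attr = detoast_attr(attr);

	return PointerGetDatum(attr);
}
```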

On the performance side, my basic benchmarks show almost no regression
for simple INSERT … VALUES workloads. CTAS, however, does regress
noticeably: a CTAS completes in about 4 seconds before this patch, but
with this patch it takes roughly 24 seconds. (For reference, a normal
insert into the source table took about 58 seconds when using zstd
dictionary compression), I suspect the extra cost comes from the added
zstd decompression and PGLZ compression on the destination table.

I’ve attached v13-0008-initial-draft-to-address-datum-leak-problem.patch,
which implements this “flatten_datum” method.

I’d love to know your thoughts on this. Am I on the right track for
solving the problem?

Best regards,
Nikhil Veldanda


On Fri, Apr 18, 2025 at 9:22 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
method inheritZstdDictionaryDependencies. This method is invoked at
the end of such statements and ensures that any dictionary
dependencies from source tables are copied to the destination table.
We determine the set of source tables using the relationOids field in
PlannedStmt.

With the disclaimer that I haven't opened the patch or thought
terribly deeply about this issue, at least not yet, my fairly strong
suspicion is that this design is not going to work out, for multiple
reasons. In no particular order:

1. I don't think users will like it if dependencies on a zstd
dictionary spread like kudzu across all of their tables. I don't think
they'd like it even if it were 100% accurate, but presumably this is
going to add dependencies any time there MIGHT be a real dependency
rather than only when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock,
which is not even self-exclusive. I doubt that it's possible to change
system catalogs in a concurrency-safe way with such a weak lock. For
instance, if two sessions tried to do the same thing in concurrent
transactions, they could both try to add the same dependency at the
same time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
TABLE...EXECUTE are the only ways that datums can creep from one table
into another. For example, what if I create a plpgsql function that
gets a value from one table and stores it in a variable, and then use
that variable to drive an INSERT into another table? I seem to recall
there are complex cases involving records and range types and arrays,
too, where the compressed object gets wrapped inside of another
object; though maybe that wouldn't matter to your implementation if
INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
adding dependencies.

When Dilip and I were working on lz4 TOAST compression, my first
instinct was to not let LZ4-compressed datums leak out of a table by
forcing them to be decompressed (and then possibly recompressed). We
spent a long time trying to make that work before giving up. I think
this is approximately where things started to unravel, and I'd suggest
you read both this message and some of the discussion before and
after:

/messages/by-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

I'm now also curious to know whether Andres would agree that it's bad
if zstd dictionaries are un-droppable. After all, I thought it would
be bad if there was no way to eliminate a dependency on a compression
method, and he disagreed. So maybe he would also think undroppable
dictionaries are fine. But maybe not. It seems even worse to me than
undroppable compression methods, because you'll probably not have that
many compression methods ever, but you could have a large number of
dictionaries eventually.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v13-0008-initial-draft-to-address-datum-leak-problem.patch
#22Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Michael Paquier (#20)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Michael,

Thanks for the feedback and the suggested patch sequence. I completely
agree—we must minimize storage overhead when dictionaries aren’t used,
while ensuring varattrib_4b remains extensible enough to handle future
compression metadata beyond dictionary ID (for other algorithms). I’ll
explore design options that satisfy both goals and share my proposal.

Best regards,
Nikhil Veldanda


On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

FWIW, this point resonates here. There is one thing that we have to
do anyway: we just have one bit left in the varlena headers as lz4 is
using the one before last. So we have to make it extensible, even if
it means that any compression method other than LZ4 and pglz would
consume one more byte in its header by default. And I think that this
has to happen at some point if we want flexibility in this area.

+    struct
+    {
+        uint32        va_header;
+        uint32        va_tcinfo;
+        uint32        va_cmp_alg;
+        uint32        va_cmp_dictid;
+        char        va_data[FLEXIBLE_ARRAY_MEMBER];
+    }            va_compressed_ext;

Speaking of which, I am confused by this abstraction choice in
varatt.h in the first patch. Are we sure that we are always going to
have a dictionary attached to a compressed data set or even a
va_cmp_alg? It seems to me that this could lead to a waste of data in
some cases because these fields may not be required depending on the
compression method used, as some fields may not care about these
details. This kind of data should be made optional, on a per-field
basis.

One thing that I've been wondering is how it would be possible to make
the area around varattrib_4b more readable while dealing with more
extensibility. It would be a good occasion to improve that, even if
I'm hand-waving here currently and that the majority of this code is
old enough to vote, with few modifications across the years.

The second thing that I'd love to see on top of the addition of the
extensibility is adding plain compression support for zstd, with
nothing fancy, just the compression and decompression bits. I've done
quite a few benchmarks with the two, and results kind of point in the
direction that zstd is more efficient than lz4 overall. Don't take me
wrong: lz4 can be better in some workloads as it can consume less CPU
than zstd while compressing less. However, a comparison of ratios
like (compression rate / cpu used) has always led me to see zstd as
superior in a large number of cases. lz4 is still very good if you
are CPU-bound and don't care about the extra space required. Both are
three classes better than pglz.

Once we have these three points incrementally built-in together (the
last bit extensibility, the potential varatt.h refactoring and the
zstd support), there may be a point in having support for more
advanced options with the compression methods in the shape of dicts or
more requirements linked to other compression methods, but I think the
topic is complex enough that we should make sure that these basics are
implemented in a way sane enough so as we'd be able to extend them
with all the use cases in mind.
--
Michael

#23Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#19)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi,

On 2025-04-18 12:22:18 -0400, Robert Haas wrote:

On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
method inheritZstdDictionaryDependencies. This method is invoked at
the end of such statements and ensures that any dictionary
dependencies from source tables are copied to the destination table.
We determine the set of source tables using the relationOids field in
PlannedStmt.

With the disclaimer that I haven't opened the patch or thought
terribly deeply about this issue, at least not yet, my fairly strong
suspicion is that this design is not going to work out, for multiple
reasons. In no particular order:

1. I don't think users will like it if dependencies on a zstd
dictionary spread like kudzu across all of their tables. I don't think
they'd like it even if it were 100% accurate, but presumably this is
going to add dependencies any time there MIGHT be a real dependency
rather than only when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock,
which is not even self-exclusive. I doubt that it's possible to change
system catalogs in a concurrency-safe way with such a weak lock. For
instance, if two sessions tried to do the same thing in concurrent
transactions, they could both try to add the same dependency at the
same time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
TABLE...EXECUTE are the only ways that datums can creep from one table
into another. For example, what if I create a plpgsql function that
gets a value from one table and stores it in a variable, and then use
that variable to drive an INSERT into another table? I seem to recall
there are complex cases involving records and range types and arrays,
too, where the compressed object gets wrapped inside of another
object; though maybe that wouldn't matter to your implementation if
INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
adding dependencies.

+1 to all of these.

I think we could add plain-old zstd compression without really
tackling this issue

+1

I'm now also curious to know whether Andres would agree that it's bad
if zstd dictionaries are un-droppable. After all, I thought it would
be bad if there was no way to eliminate a dependency on a compression
method, and he disagreed.

I still am not too worried about that aspect. However:

So maybe he would also think undroppable dictionaries are fine.

I'm much less sanguine about this. Imagine a schema-based multi-tenancy setup,
where tenants come and go, and where a few of the tables use custom
dictionaries. Whereas not being able to get rid of lz4 at all has basically no
cost whatsoever, collecting more and more unusable dictionaries can imply a
fair amount of space usage after a while. I don't see any argument why that
would be ok, really.

But maybe not. It seems even worse to me than undroppable compression
methods, because you'll probably not have that many compression methods
ever, but you could have a large number of dictionaries eventually.

Agreed on the latter.

Greetings,

Andres Freund

#24Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#21)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

After reviewing the email thread you attached in your previous response, I
identified a natural choke point for both inserts and updates: the
call to "heap_toast_insert_or_update" inside
heap_prepare_insert/heap_update. In the current master branch, that
function only runs when HeapTupleHasExternal is true; my patch extends
it to HeapTupleHasVarWidth tuples as well.

Isn't that basically all tuples, though? I think that's where this gets painful.

On the performance side, my basic benchmarks show almost no regression
for simple INSERT … VALUES workloads. CTAS, however, does regress
noticeably: a CTAS completes in about 4 seconds before this patch, but
with this patch it takes roughly 24 seconds. (For reference, a normal
insert into the source table took about 58 seconds when using zstd
dictionary compression.) I suspect the extra cost comes from the added
zstd decompression and PGLZ compression on the destination table.

That's nice to know, but I think the key question is not so much what
the feature costs when it is used but what it costs when it isn't
used. If we implement a system where we don't let
dictionary-compressed zstd datums leak out of tables, that's bound to
slow down a CTAS from a table where this feature is used, but that's
kind of OK: the feature has pros and cons, and if you don't like those
tradeoffs, you don't have to use it. However, it sounds like this
could also slow down inserts and updates in some cases even for users
who are not making use of the feature, and that's going to be a major
problem unless it can be shown that there is no case where the impact
is at all significant. Users hate paying for features that they aren't
using.

I wonder if there's a possible design where we only allow
dictionary-compressed datums to exist as top-level attributes in
designated tables to which those dictionaries are attached; and any
time you try to bury that Datum inside a container object (row, range,
array, whatever) detoasting is forced. If there's a clean and
inexpensive way to implement that, then you could avoid having
heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.

--
Robert Haas
EDB: http://www.enterprisedb.com

#25Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#24)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Wed, Apr 23, 2025 at 11:59 AM Robert Haas <robertmhaas@gmail.com> wrote:

heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.

Care about HeapTupleHasVarWidth, rather.

--
Robert Haas
EDB: http://www.enterprisedb.com

#26Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#24)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Wed, Apr 23, 2025 at 11:59:26AM -0400, Robert Haas wrote:

That's nice to know, but I think the key question is not so much what
the feature costs when it is used but what it costs when it isn't
used. If we implement a system where we don't let
dictionary-compressed zstd datums leak out of tables, that's bound to
slow down a CTAS from a table where this feature is used, but that's
kind of OK: the feature has pros and cons, and if you don't like those
tradeoffs, you don't have to use it. However, it sounds like this
could also slow down inserts and updates in some cases even for users
who are not making use of the feature, and that's going to be a major
problem unless it can be shown that there is no case where the impact
is at all significant. Users hate paying for features that they aren't
using.

The cost of digesting a dictionary when decompressing sets of values
is also something I think we should worry about, FWIW (see [1]), as
digesting is documented as costly, so I think that there is also an
argument in making the feature efficient when it is used. That would
hurt if a sequential scan needs to detoast multiple blobs with the
same dict. If we attach that on a per-value basis, wouldn't it imply
that we need to digest the dictionary every time a blob is
decompressed? This information could be cached, but it seems a bit
weird to me to invent a new level of relation caching for what could
be attached as a relation attribute option in the relcache. If a
dictionary gets trained with a new sample of values, we could rely on
the invalidation to pass the new information.
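
For reference, on the libzstd side the expensive step is
ZSTD_createDDict(); the per-datum call should only reuse the digested
dictionary. A minimal sketch with plain libzstd, independent of where
PostgreSQL would cache the DDict (buffer management and names are
illustrative only):

```
#include <zstd.h>

/*
 * Digest the dictionary once, then reuse the ZSTD_DDict for many
 * decompressions instead of re-digesting it per datum.
 */
static void
decompress_many(const void *dict_buf, size_t dict_size,
                const void *const *src, const size_t *src_len,
                void *const *dst, const size_t *dst_cap, int n)
{
    ZSTD_DDict *ddict = ZSTD_createDDict(dict_buf, dict_size); /* costly, do once */
    ZSTD_DCtx  *dctx = ZSTD_createDCtx();

    for (int i = 0; i < n; i++)
    {
        size_t res = ZSTD_decompress_usingDDict(dctx, dst[i], dst_cap[i],
                                                src[i], src_len[i], ddict);

        if (ZSTD_isError(res))
            break;              /* real code would report the error */
    }

    ZSTD_freeDCtx(dctx);
    ZSTD_freeDDict(ddict);
}
```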

Based on what I'm reading (and I know very little about the topic, so
I may be wrong), does it even make sense to allow multiple
dictionaries to be used for a single attribute? Of course that may
depend on the JSON blob patterns a single attribute is dealing with,
but I'm not sure that this is worth the extra complexity it creates.

I wonder if there's a possible design where we only allow
dictionary-compressed datums to exist as top-level attributes in
designated tables to which those dictionaries are attached; and any
time you try to bury that Datum inside a container object (row, range,
array, whatever) detoasting is forced. If there's a clean and
inexpensive way to implement that, then you could avoid having
heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.

Interesting, not sure.

FWIW, I'd still try to focus on making varatt more extensible with
plain zstd support first, before diving into all these details. We
are going to need it anyway.

[1]: https://facebook.github.io/zstd/zstd_manual.html#Chapter10
--
Michael

#27Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Michael Paquier (#20)
2 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Michael,

Thanks for the suggestions. I agree that we should first solve the
"last free bit" problem in the varattrib_4b compression bits before
layering on any features. Below is the approach I've prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain ZSTD (no-dict) patch that sits cleanly on top of it.

1. Minimal but extensible header

/*
 * varatt_cmp_extended follows va_tcinfo when the upper two bits of
 * va_tcinfo are 11.  Compressed data starts immediately after
 * ext_data.  ext_hdr encodes both the compression algorithm and the
 * byte-length of the algorithm-specific metadata.
 */
typedef struct varatt_cmp_extended
{
    uint32      ext_hdr;        /* [ meta_size:24 | cmpr_id:8 ] */
    char        ext_data[FLEXIBLE_ARRAY_MEMBER];    /* optional metadata */
} varatt_cmp_extended;

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero overhead when unused → an algorithm that needs no per-datum
metadata (e.g., ZSTD-nodict) stores none. Possible accessor macros for
ext_hdr are sketched below.
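
For illustration only (the macro names below are mine, not part of the
patch), the ext_hdr packing described above could be accessed like this:

```
/* Illustrative accessors for ext_hdr = [ meta_size:24 | cmpr_id:8 ] */
#define VARATT_EXT_CMPR_ID(ext_hdr)     ((ext_hdr) & 0xFF)
#define VARATT_EXT_META_SIZE(ext_hdr)   (((ext_hdr) >> 8) & 0xFFFFFF)
#define VARATT_EXT_MAKE_HDR(meta_size, cmpr_id) \
    ((((uint32) (meta_size) & 0xFFFFFF) << 8) | ((uint32) (cmpr_id) & 0xFF))
```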

2. Algorithm registry
/*
 * TOAST compression methods enumeration.
 *
 * Each entry defines:
 *   - NAME : identifier for the compression algorithm
 *   - VALUE : numeric enum value
 *   - METADATA type : struct type holding extra info (void when none)
 *
 * The INVALID entry is a sentinel and must remain last.
 */
#define TOAST_COMPRESSION_LIST \
    X(PGLZ, 0, void)                    /* existing */ \
    X(LZ4, 1, void)                     /* existing */ \
    X(ZSTD_NODICT, 2, void)             /* new, no metadata */ \
    X(ZSTD_DICT, 3, zstd_dict_meta)     /* new, needs dict_id */ \
    X(INVALID, 4, void)                 /* sentinel */

typedef enum ToastCompressionId
{
#define X(name, val, meta) TOAST_##name##_COMPRESSION_ID = val,
    TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* Example of an algorithm-specific metadata block */
typedef struct
{
    uint32      dict_id;        /* dictionary Oid */
} zstd_dict_meta;

3. Resulting on-disk layouts for zstd

ZSTD no dict: datum on-disk layout:
+----------------------------------+
| va_header  (uint32)              |
+----------------------------------+
| va_tcinfo  (uint32)              |  (11 in the top two bits means "extended")
+----------------------------------+
| ext_hdr    (uint32)              |  <-- [ meta size:24 bits | compression id:8 bits ]
+----------------------------------+
| Compressed bytes ...             |  <-- zstd (no dictionary)
+----------------------------------+

ZSTD dict: datum on-disk layout:
+----------------------------------+
| va_header  (uint32)              |
+----------------------------------+
| va_tcinfo  (uint32)              |
+----------------------------------+
| ext_hdr    (uint32)              |  <-- [ meta size:24 bits | compression id:8 bits ]
+----------------------------------+
| dict_id    (uint32)              |  <-- zstd_dict_meta
+----------------------------------+
| Compressed bytes ...             |  <-- zstd (dictionary)
+----------------------------------+

4. How does this fit?

Flexibility: Each new algorithm that needs extra metadata simply
defines its own struct and allocates varatt_cmp_extended in
setup_compression_info.
Storage: Everything in varatt_cmp_extended is copied to the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata – only algorithms that need it pay for it.
Future-proof – new compression algorithms that require any kind of
metadata (a dict id or anything else) slot into the same ext_data
mechanism.

I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.

Please share your thoughts—and I’d love to hear feedback on the design. Thanks!

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

FWIW, this point resonates here. There is one thing that we have to
do anyway: we just have one bit left in the varlena headers as lz4 is
using the one before last. So we have to make it extensible, even if
it means that any compression method other than LZ4 and pglz would
consume one more byte in its header by default. And I think that this
has to happen at some point if we want flexibility in this area.

+    struct
+    {
+        uint32        va_header;
+        uint32        va_tcinfo;
+        uint32        va_cmp_alg;
+        uint32        va_cmp_dictid;
+        char        va_data[FLEXIBLE_ARRAY_MEMBER];
+    }            va_compressed_ext;

Speaking of which, I am confused by this abstraction choice in
varatt.h in the first patch. Are we sure that we are always going to
have a dictionary attached to a compressed data set or even a
va_cmp_alg? It seems to me that this could lead to a waste of data in
some cases because these fields may not be required depending on the
compression method used, as some fields may not care about these
details. This kind of data should be made optional, on a per-field
basis.

One thing that I've been wondering is how it would be possible to make
the area around varattrib_4b more readable while dealing with more
extensibility. It would be a good occasion to improve that, even if
I'm hand-waving here currently and that the majority of this code is
old enough to vote, with few modifications across the years.

The second thing that I'd love to see on top of the addition of the
extensibility is adding plain compression support for zstd, with
nothing fancy, just the compression and decompression bits. I've done
quite a few benchmarks with the two, and results kind of point in the
direction that zstd is more efficient than lz4 overall. Don't take me
wrong: lz4 can be better in some workloads as it can consume less CPU
than zstd while compressing less. However, a comparison of ratios
like (compression rate / cpu used) has always led me to see zstd as
superior in a large number of cases. lz4 is still very good if you
are CPU-bound and don't care about the extra space required. Both are
three classes better than pglz.

Once we have these three points incrementally built-in together (the
last bit extensibility, the potential varatt.h refactoring and the
zstd support), there may be a point in having support for more
advanced options with the compression methods in the shape of dicts or
more requirements linked to other compression methods, but I think the
topic is complex enough that we should make sure that these basics are
implemented in a way sane enough so as we'd be able to extend them
with all the use cases in mind.
--
Michael

--
Nikhil Veldanda

--
Nikhil Veldanda

Attachments:

v19-0002-zstd-nodict-support.patch
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch
#28Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#27)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero-overhead when unused if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict),

I don't understand why we need to spend 24 bits on a length header
here. I agree with the idea of adding a 1-byte quantity for algorithm
here, but I don't see why we need anything more than that. If the
compression method is zstd-with-a-dict, then the payload data
presumably needs to start with the OID of the dictionary, but it seems
like in your schema every single datum would use these 3 bytes to
store the fact that sizeof(Oid) = 4. The code that interprets
zstd-with-dict datums should already know the header length. Even if
generic code that works with all types of compression needs to be able
to obtain the header length on a per-compression-type basis, there can
be some kind of callback or table for that, rather than storing it in
every single datum.

--
Robert Haas
EDB: http://www.enterprisedb.com

#29Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Robert Haas (#28)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

Thanks for raising that question. The idea behind including a 24-bit
length field alongside the 1-byte algorithm ID is to ensure that each
compressed datum self-describes its metadata size. This allows any
compression algorithm to embed variable-length metadata (up to 16 MB)
without the need for hard-coding header sizes. For instance, an
algorithm in the future might require different metadata lengths for each
datum, and a fixed header size table wouldn’t work. By storing the
length in the header, we maintain a generic and future-proof design. I
would greatly appreciate any feedback on this design. Thanks!

On Mon, Apr 28, 2025 at 7:50 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero-overhead when unused if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict),

I don't understand why we need to spend 24 bits on a length header
here. I agree with the idea of adding a 1-byte quantity for algorithm
here, but I don't see why we need anything more than that. If the
compression method is zstd-with-a-dict, then the payload data
presumably needs to start with the OID of the dictionary, but it seems
like in your schema every single datum would use these 3 bytes to
store the fact that sizeof(Oid) = 4. The code that interprets
zstd-with-dict datums should already know the header length. Even if
generic code that works with all types of compression needs to be able
to obtain the header length on a per-compression-type basis, there can
be some kind of callback or table for that, rather than storing it in
every single datum.

--
Robert Haas
EDB: http://www.enterprisedb.com

--
Nikhil Veldanda

#30Nikita Malakhov
Nikita Malakhov
hukutoc@gmail.com
In reply to: Nikhil Kumar Veldanda (#29)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi,

Nikhil, please consider the existing discussions on using dictionaries
(mentioned above by Aleksander) and on extending the TOAST pointer [1];
it seems you did not check them.

The same question Robert asked above - it's unclear why the header
wastes so much space. You mentioned metadata length - what metadata
do you mean there?

Also, Robert pointed out some very questionable approaches in your
solution - new dependencies crawling around user tables, and a new
catalog table with a very unclear lifecycle (and, with a new catalog
table, immediate questions about pg_upgrade).
Currently I'm looking through the patch and could share my thoughts
later.

While reading this thread I've thought about storing a dictionary within
the table it is used for - IIUC one dictionary is used for just one
attribute, so it does not make sense to make it global.

Also, I have a question regarding the Zstd implementation you propose -
does it provide a possibility for partial decompression?

Thanks!

[1]: /messages/by-id/CAN-LCVMq2X=fhx7KLxfeDyb3P+BXuCkHC0g=9GF+JD4izfVa0Q@mail.gmail.com

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#31Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#29)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Mon, Apr 28, 2025 at 5:32 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

Thanks for raising that question. The idea behind including a 24-bit
length field alongside the 1-byte algorithm ID is to ensure that each
compressed datum self-describes its metadata size. This allows any
compression algorithm to embed variable-length metadata (up to 16 MB)
without the need for hard-coding header sizes. For instance, an
algorithm in the future might require different metadata lengths for each
datum, and a fixed header size table wouldn’t work. By storing the
length in the header, we maintain a generic and future-proof design. I
would greatly appreciate any feedback on this design. Thanks!

I feel like I gave you some feedback on the design already, which was
that it seems like a waste of 3 bytes to me.

Don't get me wrong: I'm quite impressed by the way you're working on
this problem and I hope you stick around and keep working on it and
figure something out. But I don't quite understand the point of this
response: it seems like you're just restating what the design does
without really justifying it. The question here isn't whether a 3-byte
header can describe a length up to 16MB; I think we all know our
powers of two well enough to agree on the answer to that question. The
question is whether it's a good use of 3 bytes, and I don't think it
is.

I did consider the fact that future compression algorithms might want
to use variable-length headers; but I couldn't see a reason why we
shouldn't let each of those compression algorithms decide for
themselves how to encode whatever information they need. If a
compression algorithm needs a variable-length header, then it just
needs to make that header self-describing. Worst case scenario, it can
make the first byte of that variable-length header a length byte, and
then go from there; but it's probably possible to be even smarter and
use less than a full byte. Say for example we store a dictionary ID
that in concept is a 32-bit quantity but we use a variable-length
integer representation for it. It's easy to see that we shouldn't ever
need more than 3 bits for that so a full length byte is overkill and,
in fact, would undermine the value of a variable-length representation
rather severely. (I suspect it's a bad idea anyway, but it's a worse
idea if you burn a full byte on a length header.)
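
Just as a toy illustration of the point, not a proposal: a 32-bit
dictionary ID could be encoded in one to five bytes, spending only three
bits of the first byte on "how many extra bytes follow":

```
#include <stdint.h>

/*
 * Toy example only: 5 payload bits plus a 3-bit "extra bytes" count in the
 * first byte, remaining payload bytes little-endian.  Small dictionary IDs
 * take a single byte; the worst case is five bytes.
 */
static int
encode_dictid(uint32_t dictid, unsigned char *out)
{
	int			extra = 0;
	uint32_t	rest = dictid >> 5;

	while (rest != 0)
	{
		extra++;
		rest >>= 8;
	}

	out[0] = (unsigned char) ((extra << 5) | (dictid & 0x1F));
	for (int i = 0; i < extra; i++)
		out[1 + i] = (unsigned char) ((dictid >> (5 + 8 * i)) & 0xFF);

	return 1 + extra;			/* total encoded length in bytes */
}
```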

But there's an even larger question here too, which is why we're
having some kind of discussion about generalized metadata when the
current project seemingly only requires a 4-byte dictionary OID. If
you have some other use of this space in mind, I don't think you've
told us what it is. If you don't, then I'm not sure why we're
designing around an up-to-16MB variable-length quantity when what we
have before us is a 4-byte fixed-length quantity.

Moreover, even if you do have some (undisclosed) idea about what else
might be stored in this metadata area, why would it be important or
even desirable to have the length of that area represented in some
uniform way across compression methods? There's no obvious need for
any code outside the compression method itself to be able to decompose
the Datum into a metadata portion and a payload portion. After all,
the metadata portion could be anything so there's no way for anything
but the compression method to interpret it usefully. If we do want to
have outside code be able to ask questions, we could design some kind
of callback interface - e.g. if we end up with multiple compression
methods that store dictionary OIDs and they maybe do it in different
ways, each could provide an
"extract-the-dictionary-OID-from-this-datum" callback and each
compression method can implement that however it likes.

Maybe you can argue that we will eventually end up with various
compression method callbacks each of which is capable of working on
the metadata, and so then we might want to take an initial slice of a
toasted datum that is just big enough to allow that to work. But that
is pretty hypothetical, and in practice the first chunk of the TOAST
value (~2k) seems like it'd probably work well for most cases.

So, again, if you want us to take seriously the idea of dedicating 3
bytes per Datum to something, you need to give us a really good reason
for so doing. The fact that a 24-bit metadata length can describe a
metadata header of up to 2^24 bytes isn't a reason, good or bad. It's
just math.

--
Robert Haas
EDB: http://www.enterprisedb.com

#32Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Robert Haas (#31)
2 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

But I don't quite understand the point of this
response: it seems like you're just restating what the design does
without really justifying it. The question here isn't whether a 3-byte
header can describe a length up to 16MB; I think we all know our
powers of two well enough to agree on the answer to that question. The
question is whether it's a good use of 3 bytes, and I don't think it
is.

My initial decision to include a 3‑byte length field was driven by two goals:
1. Avoid introducing separate callbacks for each algorithm.
2. Provide a single, algorithm-agnostic mechanism for handling
metadata length.

After re-evaluating based on your feedback, I agree that the fixed
overhead of a 3-byte length field outweighs its benefit; per-algorithm
callbacks deliver the same functionality while saving three bytes per
datum.

I did consider the fact that future compression algorithms might want
to use variable-length headers; but I couldn't see a reason why we
shouldn't let each of those compression algorithms decide for
themselves how to encode whatever information they need. If a
compression algorithm needs a variable-length header, then it just
needs to make that header self-describing. Worst case scenario, it can
make the first byte of that variable-length header a length byte, and
then go from there; but it's probably possible to be even smarter and
use less than a full byte. Say for example we store a dictionary ID
that in concept is a 32-bit quantity but we use a variable-length
integer representation for it. It's easy to see that we shouldn't ever
need more than 3 bits for that so a full length byte is overkill and,
in fact, would undermine the value of a variable-length representation
rather severely. (I suspect it's a bad idea anyway, but it's a worse
idea if you burn a full byte on a length header.)

I agree. Each compression algorithm can decide its own metadata size
overhead. Callbacks can provide this information as well, rather than
storing it in a fixed-length field (3 bytes). The revised patch introduces a
"toast_cmpid_meta_size(const varatt_cmp_extended *hdr)", which
calculates the metadata size.

But there's an even larger question here too, which is why we're
having some kind of discussion about generalized metadata when the
current project seemingly only requires a 4-byte dictionary OID. If
you have some other use of this space in mind, I don't think you've
told us what it is. If you don't, then I'm not sure why we're
designing around an up-to-16MB variable-length quantity when what we
have before us is a 4-byte fixed-length quantity.

This project only requires 4 bytes of fixed-size metadata to store the
dictionary ID.

Updated design for extending varattrib_4b compression

1. Extensible header

/*
 * varatt_cmp_extended: an optional per-datum header for extended
 * compression method.  Only used when va_tcinfo's top two bits are "11".
 */
typedef struct varatt_cmp_extended
{
    uint8       cmp_alg;
    char        cmp_meta[FLEXIBLE_ARRAY_MEMBER];    /* algorithm-specific
                                                     * metadata */
} varatt_cmp_extended;

2. Algorithm registry and metadata size dispatch

static inline uint32
unsupported_meta_size(const varatt_cmp_extended *hdr)
{
    elog(ERROR, "toast_cmpid_meta_size called for unsupported compression algorithm");
    return 0;                   /* unreachable */
}

/* no metadata for plain-ZSTD */
static inline uint32
zstd_nodict_meta_size(const varatt_cmp_extended *hdr)
{
    return 0;
}

static inline uint32
zstd_dict_meta_size(const varatt_cmp_extended *hdr)
{
    return sizeof(Oid);
}

/*
* TOAST compression methods enumeration.
*
* NAME : algorithm identifier
* VALUE : enum value
* META-SIZE-FN : Calculates algorithm metadata size.
*/
#define TOAST_COMPRESSION_LIST \
    X(PGLZ,        0, unsupported_meta_size) \
    X(LZ4,         1, unsupported_meta_size) \
    X(ZSTD_NODICT, 2, zstd_nodict_meta_size) \
    X(ZSTD_DICT,   3, zstd_dict_meta_size) \
    X(INVALID,     4, unsupported_meta_size) /* sentinel */

/* Compression algorithm identifiers */
typedef enum ToastCompressionId
{
#define X(name,val,fn) TOAST_##name##_COMPRESSION_ID = (val),
    TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* lookup table to check if compression method uses extended format */
static const bool toast_cmpid_extended[] = {
#define X(name,val,fn) \
    /* PGLZ, LZ4 don't use extended format */ \
    [TOAST_##name##_COMPRESSION_ID] = \
        ((val) != TOAST_PGLZ_COMPRESSION_ID && \
         (val) != TOAST_LZ4_COMPRESSION_ID && \
         (val) != TOAST_INVALID_COMPRESSION_ID),
    TOAST_COMPRESSION_LIST
#undef X
};

#define TOAST_CMPID_EXTENDED(alg) (toast_cmpid_extended[alg])

/*
 * Prototype for a per-datum metadata-size callback:
 * given a pointer to the extended header, return
 * how many metadata bytes follow it.
 */
typedef uint32 (*ToastMetaSizeFn) (const varatt_cmp_extended *hdr);

/* Callback table—indexed by ToastCompressionId */
static const ToastMetaSizeFn toast_meta_size_fns[] = {
#define X(name,val,fn) [TOAST_##name##_COMPRESSION_ID] = fn,
    TOAST_COMPRESSION_LIST
#undef X
};

/* Calculates algorithm metadata size */
static inline uint32
toast_cmpid_meta_size(const varatt_cmp_extended *hdr)
{
    Assert(hdr != NULL);
    return toast_meta_size_fns[hdr->cmp_alg] (hdr);
}

Each compression algorithm provides a static callback that returns the
size of its metadata, given a pointer to the varatt_cmp_extended
header. Algorithms with fixed-size metadata return a constant, while
algorithms with variable-length metadata are responsible for defining
and parsing their own internal headers to compute the metadata size.
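
For example, given those callbacks, locating the start of the
compressed payload for any algorithm becomes a one-liner (a sketch
based on the definitions above; the helper name is illustrative):

```
/* Sketch: skip the algorithm-specific metadata to reach the payload. */
static inline const char *
toast_cmpid_payload(const varatt_cmp_extended *hdr)
{
    return hdr->cmp_meta + toast_cmpid_meta_size(hdr);
}
```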

3. Resulting on-disk layouts for zstd

ZSTD (nodict) — datum on‑disk layout

+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) | ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg (uint8) | ← (ZSTD_NODICT)
+----------------------------------+
| compressed bytes … | ← ZSTD frame
+----------------------------------+

ZSTD(dict) — datum on‑disk layout

+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) | ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg (uint8) | ← (ZSTD_DICT)
+----------------------------------+
| dict_id (uint32) | ← dictionary OID
+----------------------------------+
| compressed bytes … | ← ZSTD frame
+----------------------------------+

I hope this updated design addresses your concerns. I would appreciate
any further feedback you may have. Thanks again for your guidance—it's
been very helpful.

v20-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, metadata size
dispatch and useful macros; behaviour unchanged.
v20-0002-zstd-nodict-compression.patch: Plain ZSTD (non dict) support.

--
Nikhil Veldanda

Attachments:

v20-0002-zstd-nodict-compression.patch (application/x-patch)
v20-0001-varattrib_4b-design-proposal-to-make-it-extended.patch (application/x-patch)
#33Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Kumar Veldanda (#32)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Sun, May 4, 2025 at 8:54 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:

I agree. Each compression algorithm can decide its own metadata size
overhead. Callbacks can provide this information as well, rather than
storing it in a fixed-length field (3 bytes). The revised patch introduces a
"toast_cmpid_meta_size(const varatt_cmp_extended *hdr)", which
calculates the metadata size.

I don't understand why we need this. I don't see why we need any sort
of generalized concept of metadata at all here. The zstd-dict
compression method needs to store a four-byte OID, so let it do that.
But we don't need to brand that as metadata; and we don't need a
method for other parts of the system to ask how much metadata exists.
At least, I don't think we do.

--
Robert Haas
EDB: http://www.enterprisedb.com

#34Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Nikhil Kumar Veldanda (#32)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Sun, May 04, 2025 at 05:54:34AM -0700, Nikhil Kumar Veldanda wrote:

3. Resulting on-disk layouts for zstd

ZSTD (nodict) — datum on‑disk layout

+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) | ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg (uint8) | ← (ZSTD_NODICT)
+----------------------------------+
| compressed bytes … | ← ZSTD frame
+----------------------------------+

This makes sense, yes. You are allocating an extra byte after
va_tcinfo that serves as a redirection if the three bits dedicated to
the compression method are set.

ZSTD(dict) — datum on‑disk layout
+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) | ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg (uint8) | ← (ZSTD_DICT)
+----------------------------------+
| dict_id (uint32) | ← dictionary OID
+----------------------------------+
| compressed bytes … | ← ZSTD frame
+----------------------------------+

I hope this updated design addresses your concerns. I would appreciate
any further feedback you may have. Thanks again for your guidance—it's
been very helpful.

That makes sense as well structurally if we include a dictionary for
each value. Not sure that we need that much space, for this purpose,
though. We are going to need the extra byte anyway AFAIK, so better
to start with that.

I have been reading 0001 and I'm finding that the integration does not
seem to fit much with the existing varatt_external, making the whole
result slightly confusing. A simple thing: the last bit that we can
use is in varatt_external's va_extinfo, where the patch is using
VARATT_4BCE_MASK to track that we need to go beyond varatt_external to
know what kind of compression information we should use. This is an
important point, and it is not documented around varatt_external which
still assumes that the last bit could be used for a compression
method. With what you are doing in 0001 (or even 0002), this becomes
wrong.

Shouldn't we have a new struct portion in varattrib_4b's union for
this purpose, at least? (I don't recall that we rely on varattrib_4b's
size, which would get larger with this extra byte for the new extended
data when the bits reserved for the compression method are set in
va_extinfo; correct me if I'm wrong here.)
--
Michael

#35Nikita Malakhov
Nikita Malakhov
hukutoc@gmail.com
In reply to: Michael Paquier (#34)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi!

Michael, what do you think of this approach (extending varatt_external)
vs extending varatt itself with a new tag and structure? The second
approach allows more flexibility and independence from the existing
structure, without modifying varatt_4b, and it is further extensible. I
mentioned it above (extending the TOAST pointer), and it could be
implemented more easily and in a less confusing way.

I'm +1 on storing the dictionary somewhere near the actual data (not
necessarily in the data storage area itself), but strongly against a new
catalog table with dictionaries - it involves a lot of side effects,
including locks taken while working with this table, resulting in
performance degradation, and so on.

--

Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#36Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Robert Haas (#33)
2 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Robert,

On Mon, May 5, 2025 at 8:07 AM Robert Haas <robertmhaas@gmail.com> wrote:

I don't understand why we need this. I don't see why we need any sort
of generalized concept of metadata at all here. The zstd-dict
compression method needs to store a four-byte OID, so let it do that.
But we don't need to brand that as metadata; and we don't need a
method for other parts of the system to ask how much metadata exists.
At least, I don't think we do.

Thank you for the feedback. My intention in introducing the
toast_cmpid_meta_size helper was to centralize header-size computation
across all compression algorithms and to provide generic macros that
can be applied to any extended compression method.

I agree that algorithm-specific metadata details or its sizes need not
be exposed beyond their own routines. Each compression method
inherently knows its layout requirements and should handle them
internally in their routines. I’ve removed the toast_cmpid_meta_size
helper and eliminated the metadata branding.

In the varatt_cmp_extended, the cmp_data field carries the algorithm
payload: for zstd-nodict, it’s a ZSTD frame; for zstd-dict, it’s a
four-byte dictionary OID followed by the ZSTD frame. This approach
ensures the algorithm's framing is fully self-contained in its
routines.

/*
 * varatt_cmp_extended: an optional per-datum header for extended
 * compression method.  Only used when va_tcinfo's top two bits are "11".
 */
typedef struct varatt_cmp_extended
{
    uint8       cmp_alg;
    char        cmp_data[FLEXIBLE_ARRAY_MEMBER];
} varatt_cmp_extended;
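
To illustrate, the zstd-dict decompression routine would then be the
only place that needs to know this layout. A rough sketch (not the
patch code: get_zstd_dictionary() is a hypothetical stand-in for the
pg_zstd_dictionaries lookup, and error handling is simplified):

```
#include "postgres.h"
#include <zstd.h>

/* hypothetical helper: fetch the trained dictionary bytes by OID */
extern const void *get_zstd_dictionary(Oid dictid, size_t *dict_len);

static void *
zstd_dict_decompress_sketch(const varatt_cmp_extended *hdr,
                            size_t cmp_data_len,
                            void *dst, size_t dst_capacity)
{
    Oid         dictid;
    const char *frame = hdr->cmp_data + sizeof(Oid);
    size_t      frame_len = cmp_data_len - sizeof(Oid);
    size_t      dict_len;
    const void *dict;
    ZSTD_DCtx  *dctx = ZSTD_createDCtx();
    size_t      result;

    /* cmp_data starts with the 4-byte dictionary OID, stored unaligned */
    memcpy(&dictid, hdr->cmp_data, sizeof(Oid));
    dict = get_zstd_dictionary(dictid, &dict_len);

    result = ZSTD_decompress_usingDict(dctx, dst, dst_capacity,
                                       frame, frame_len, dict, dict_len);
    ZSTD_freeDCtx(dctx);
    if (ZSTD_isError(result))
        elog(ERROR, "zstd decompression failed: %s",
             ZSTD_getErrorName(result));
    return dst;
}
```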

I've updated the patch to v21; please review it and let me know if you
have any questions or feedback. Thank you!

v21-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, useful macros;
behaviour unchanged.
v21-0002-zstd-nodict-compression.patch: Plain ZSTD (non dict) support
and few basic tests.

--
Nikhil Veldanda

Attachments:

v21-0002-zstd-nodict-compression.patch (application/octet-stream)
v21-0001-varattrib_4b-design-proposal-to-make-it-extended.patch (application/octet-stream)
#37Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Michael Paquier (#34)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Michael, Thanks for the feedback.

On Wed, May 7, 2025 at 12:49 AM Michael Paquier <michael@paquier.xyz> wrote:

I have been reading 0001 and I'm finding that the integration does not
seem to fit much with the existing varatt_external, making the whole
result slightly confusing. A simple thing: the last bit that we can
use is in varatt_external's va_extinfo, where the patch is using
VARATT_4BCE_MASK to track that we need to go beyond varatt_external to
know what kind of compression information we should use. This is an
important point, and it is not documented around varatt_external which
still assumes that the last bit could be used for a compression
method. With what you are doing in 0001 (or even 0002), this becomes
wrong.

This is the current logic used in the patch for varatt_external.

When a datum is compressed with an extended algorithm and must live in
external storage, we set the top two bits of
va_extinfo (varatt_external) to 0b11.

To figure out the compression method for an external TOAST datum:

1. Inspect the top two bits of va_extinfo.
2. If they equal 0b11(VARATT_4BCE_MASK), call
toast_get_compression_id, which invokes detoast_external_attr to fetch
the datum in its 4-byte varattrib form (no decompression) and then
reads its compression header to find the compression method.
3. Otherwise, fall back to the existing
VARATT_EXTERNAL_GET_COMPRESS_METHOD path to get the compression
method.

We use this macro VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED to
determine if the compression method is extended or not.
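
In code, the check described above looks roughly like this (a sketch
only; VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED is the patch's macro and
its exact shape may differ, detoast_external_attr and
toast_get_compression_id are the existing routines, and memory handling
is simplified):

```
static ToastCompressionId
toast_pointer_get_compression_id(struct varlena *attr,
                                 varatt_external toast_pointer)
{
    if (VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED(toast_pointer))
    {
        /*
         * Top two bits of va_extinfo are 0b11: fetch the datum in its
         * 4-byte varattrib form (no decompression) and read the extended
         * compression header to find the method.
         */
        struct varlena *tmp = detoast_external_attr(attr);
        ToastCompressionId cmid = toast_get_compression_id(tmp);

        pfree(tmp);
        return cmid;
    }

    /* Legacy path: the method is encoded directly in va_extinfo. */
    return VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer);
}
```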

Across the entire codebase, external TOAST‐pointer compression methods
are only inspected in the following functions:
1. pg_column_compression
2. check_tuple_attribute (verify_heapam pg function)
3. detoast_attr_slice (just to check pglz or not)

Could you please help me understand what’s incorrect about this approach?

Shouldn't we have a new struct portion in varattrib_4b's union for
this purpose, at least? (I don't recall that we rely on varattrib_4b's
size, which would get larger with this extra byte for the new extended
data when the bits reserved for the compression method are set in
va_extinfo; correct me if I'm wrong here.)
--

In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
adding it isn’t strictly necessary. If we do want to fold it into the
varattrib_4b union, we could define it like this:

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        uint8       cmp_alg;
        char        cmp_data[FLEXIBLE_ARRAY_MEMBER];
    }           varatt_cmp_extended;
} varattrib_4b;
```
We don't depend on varattrib_4b's size anywhere.

--
Nikhil Veldanda

#38Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Nikhil Kumar Veldanda (#37)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Wed, May 07, 2025 at 04:39:17PM -0700, Nikhil Kumar Veldanda wrote:

In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
adding it isn’t strictly necessary. If we do want to fold it into the
varattrib_4b union, we could define it like this:

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];     /* Compressed data */
    }           va_compressed;
    struct
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        uint8       cmp_alg;
        char        cmp_data[FLEXIBLE_ARRAY_MEMBER];
    }           varatt_cmp_extended;
} varattrib_4b;
```
we don't depend on varattrib_4b size anywhere.

Yes, I was wondering if this is not the most natural approach in terms
of structure once we plug an extra byte into the varlena header when
all the bits of va_extinfo for the compression information are used.
Having all the bits set does not necessarily mean that the information
would be cmp_data all the time, just that this is a natural option when
plugging in a new compression method in the newly available byte.

FWIW, I tested this exact change yesterday, wondering if we depend
on sizeof(varattrib_4b) after looking at the code and getting the
impression that we don't, even for some of the in-memory comparisons, and
noted two things:
- check-world was OK.
- a pg_upgrade'd instance with a regression database seems kind of
OK, but I've not done that much in-depth checking on this side so I
have less confidence about that.
--
Michael

#39Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Nikita Malakhov (#35)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Wed, May 07, 2025 at 11:40:14AM +0300, Nikita Malakhov wrote:

Michael, what do you think of this approach (extending varatt_external)
vs extending varatt itself by new tag and structure?

I'm reserved on that. What I'm afraid of here is more complications in
the backend code, because we have quite a few places where we do
varatt lookups to decide what should happen, like in PLs, so this
brings complications of its own for something that could be isolated
behind a varattrib_4b, where detoasting is under control. The patch
posted at [1] means that the custom area could be anything; how do you
make sure that the backend is able to understand what could be
anything? I guess that this also depends on the pluggable toast part,
of course, but I've not studied enough what's been proposed to have a
hard opinion. If you have very specific pointers, please feel free.

The second approach
allows more flexibility and independence from the existing structure,
without modifying varatt_4b, and it is further extensible. I mentioned
it above (extending the TOAST pointer), and it could be implemented
more easily and in a less confusing way.

If you mean [0], putting an "extended" flag into ToastCompressionId
which is something used now by the internals of TOAST for a
compression method, with ToastCompressionId being limited to have up
to 4 elements in its enum, does not feel right. In concept, once
extended, this may point to something more than a compression method,
as there's also metadata around the compression method added. At
least that's what I'm understanding as a possible scenario from all
the proposals in this area. There's some overlap with
common/compression.h, for example, even if we are never going to care
about gzip in this case; just saying that this has been bugging me in
the core code for some time.

One first thing I'd try to do here is to untangle this situation, by
allowing ToastCompressionId to have more extensibility so that we could
use it to track more compression methods, or perhaps just remove it
entirely in a smart way by keeping the information related to the
extra byte and the two bits of va_tcinfo for the compression method
isolated in varatt.h, shaping the code so that adding more compression
methods in the extra byte put after va_tcinfo would be easier once the
surroundings of varattrib_4b are extended. Without an agreement about
how to use the last bit we have, there's perhaps little point in
aiming for any of that now.

FWIW, extending the area around varattrib_4b feels like a natural thing to
do here, and it does not have to overlap with the possibilities around
the varatts.

I'm +1 on storing the dictionary somewhere near the actual data (not
necessarily in the data storage area itself), but strongly against a new
catalog table with dictionaries - it involves a lot of side effects,
including locks taken while working with this table, resulting in
performance degradation, and so on.

Just wondering. Have you looked at the potential overhead of doing
computation and decomputation of a dictionary? zstd mentions in its
docs that these can easily cause a lot of overhead, hence handling
this stuff without some kind of caching is going to be costly if
performing a lot of chunk decompressions. It's something that could
be decided later on, of course. If this area of the code is made
pluggable, then it's up to an extension to just do it.

[0]: /messages/by-id/CAN-LCVMq2X=fhx7KLxfeDyb3P+BXuCkHC0g=9GF+JD4izfVa0Q@mail.gmail.com
[1]: /messages/by-id/CAN-LCVNxbnpHh4PVUUc9g6dPibE8wZALiLtxcs3TjfivxDkCkA@mail.gmail.com
--
Michael

#40Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Michael Paquier (#38)
2 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Wed, May 7, 2025 at 5:38 PM Michael Paquier <michael@paquier.xyz> wrote:

Yes, I was wondering if this is not the most natural approach in terms
of structure once if we plug an extra byte into the varlena header if
all the bits of va_extinfo for the compression information are used.
Having all the bits may not mean that this necessarily means that the
information would be cmp_data all the time, just that this a natural
option when plugging in a new compression method in the new byte
available.

Thanks for reviewing and providing feedback on the patch. Regarding
the questions about varatt_external, specifically storing the compression
method in one byte for extended compression methods for an external
on-disk datum, here's the proposal for varatt_external. We check the
compression method of an external on-disk datum in 3 trivial places in
core; my previous proposal just marked 0b11 in the top bits of
va_extinfo, fetched the externally stored chunks and formed a
varattrib_4b to find the compression method id for extended compression
methods. However, I understand why embedding the method byte directly is
clearer.

```
typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID within TOAST table */
    Oid         va_toastrelid;  /* OID of TOAST table containing it */

    /* -------- optional trailer -------- */
    union
    {
        struct                  /* compression-method trailer */
        {
            uint8       va_ecinfo;  /* Extended-compression-method info */
        }           cmp;
    }           extended;       /* "extended" = optional byte */
} varatt_external;
```

I'm proposing not to store algorithm metadata exclusively at the
varatt_external level, because storing metadata within varatt_external
is not always appropriate: in scenarios where a datum initially
qualifies for out-of-line storage but becomes sufficiently small
after compression, specifically under the 2KB threshold (extended
storage type), it no longer meets the criteria for external storage.
Consequently, it cannot use a TOAST pointer and must instead be
stored in-line.
Given this behavior, it is more robust to store metadata at the
varattrib_4b level. This ensures that metadata remains accessible
regardless of whether the datum ends up stored in-line or externally.
Moreover, during detoasting it first fetches the external data,
reconstructs it into varattrib_4b, then decompresses—so keeping
metadata in varattrib_4b matches that flow.

This is the layout for extra 1 byte in both varatt_external and varattrib_4b.
```
bit 7 6 5 4 3 2 1 0
+---+---+---+---+---+---+---+---+
| cmid − 2 | F|
+---+---+---+---+---+---+---+---+

• Bits 7–1 (cmid − 2)
– 7-bit field holding compression IDs: raw ∈ [0…127] ⇒ cmid = raw +
2 ([2…129])
• Bit 0 (F)
– flag indicating whether the algorithm expects metadata
```
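
For clarity, the pack/unpack helpers this layout implies would look
something like this (names are illustrative, not necessarily what the
patch uses):

```
/* Sketch of pack/unpack helpers for the one-byte layout above. */
#define VARATT_EXT_CMID_PACK(cmid, has_meta) \
    ((uint8) ((((cmid) - 2) << 1) | ((has_meta) ? 1 : 0)))
#define VARATT_EXT_CMID_GET(b)       ((((uint8) (b)) >> 1) + 2)
#define VARATT_EXT_CMID_HAS_META(b)  (((uint8) (b)) & 0x01)
```

For example, VARATT_EXT_CMID_PACK(TOAST_ZSTD_COMPRESSION_ID, true)
yields the byte written for a dictionary-compressed zstd datum, and
VARATT_EXT_CMID_HAS_META() on that byte tells the decompression side to
expect a dictid.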

I introduced a metadata flag in the 1-byte layout to prevent zstd from
exposing dict or nodict types in ToastCompressionId. This metadata
flag indicates whether the algorithm expects any metadata. For
the ZSTD scenario, if the flag is set, a dictid is expected; otherwise,
no dictid is present.

```
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID = 1,
TOAST_ZSTD_COMPRESSION_ID = 2,
TOAST_INVALID_COMPRESSION_ID = 3,
} ToastCompressionId;

// varattrib_4b remains unchanged from the previous proposal
typedef union
{
struct /* Normal varlena (4-byte length) */
{
uint32 va_header;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_4byte;

struct /* Compressed in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size and method; see va_extinfo */
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed;

struct /* Extended compressed in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size and method; see va_extinfo */
uint8 va_ecinfo;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed_ext;
} varattrib_4b;
```

During compression, compression methods (zstd_compress_datum) will
determine whether to use metadata(dictionary) or not based on
CompressionInfo.meta.

Per-column ZSTD compression levels:
Since ZSTD supports compression levels (default = 3, up to
ZSTD_maxCLevel()—currently 22—and negative “fast” levels), I’m
proposing an option for users to choose their preferred level on a
per-column basis via pg_attribute.attoptions. If unset, we’ll use
ZSTD’s default:

```
typedef struct AttributeOpts
{
int32 vl_len_; /* varlena header (do not touch!) */
float8 n_distinct;
float8 n_distinct_inherited;
int zstd_level; /* user-specified ZSTD level */
} AttributeOpts;

ALTER TABLE tblname
ALTER COLUMN colname
SET (zstd_level = 5);
```
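
To make the idea concrete, the lookup on the compression path could be
as simple as this (a sketch only; get_attribute_options() is the
existing attoptcache lookup, and treating 0 as "unset" is an assumption
of this example):

```
/* Sketch: resolve the per-column ZSTD level, falling back to the default. */
static int
zstd_level_for_attribute(Oid relid, AttrNumber attnum)
{
    AttributeOpts *aopt = get_attribute_options(relid, attnum);

    if (aopt != NULL && aopt->zstd_level != 0)  /* assumes 0 means "unset" */
        return aopt->zstd_level;
    return ZSTD_CLEVEL_DEFAULT;
}
```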

Since PostgreSQL doesn’t currently expose LZ4 compression levels, I
propose adding per-column ZSTD compression level settings so users can
tune the speed/ratio trade-off. I’d like to hear thoughts on this
approach.

v24-0001-Design-to-extend-the-varattrib_4b-varatt_externa.patch -
Design proposal for varattrib_4b & varatt_external
v24-0002-zstd-nodict-compression.patch - ZSTD no dictionary implementation.

--
Nikhil Veldanda

Attachments:

v24-0002-zstd-nodict-compression.patch (application/octet-stream)
v24-0001-Design-to-extend-the-varattrib_4b-varatt_externa.patch (application/octet-stream)
#41Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Nikhil Kumar Veldanda (#40)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Tue, May 27, 2025 at 02:59:17AM -0700, Nikhil Kumar Veldanda wrote:

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID within TOAST table */
    Oid         va_toastrelid;  /* OID of TOAST table containing it */

    /* -------- optional trailer -------- */
    union
    {
        struct                  /* compression-method trailer */
        {
            uint8       va_ecinfo;  /* Extended-compression-method info */
        }           cmp;
    }           extended;       /* "extended" = optional byte */
} varatt_external;

Yeah, something like that does make sense to me. If the three bits of
va_extinfo are set, we'd look at the next one. I'll try to think a
bit harder about the structure of varatt.h; that's the important bit
and some of its stuff is outdated. That's not necessarily related to
the patch discussed here.

I'm proposing not to store any metadata exclusively at varatt_external
level because storing metadata within varatt_external is not always
appropriate because in scenarios where datum initially qualifies for
out-of-line storage but becomes sufficiently small in size after
compression—specifically under the 2KB threshold(extended storage
type)—it no longer meets the criteria for external storage.

By metadata, are you referring to the dictionary data, like an ID
pointing to a dictionary stored elsewhere or even the dictionary data
itself? It could be something else, of course. I think that it makes
sense, because we don't need the dictionary to know which code path to
take; the compression method is the only important information to be
able to redirect to the slice, compression or decompression routines.

Given this behavior, it is more robust to store metadata at the
varattrib_4b level. This ensures that metadata remains accessible
regardless of whether the datum ends up stored in-line or externally.
Moreover, during detoasting it first fetches the external data,
reconstructs it into varattrib_4b, then decompresses—so keeping
metadata in varattrib_4b matches that flow.

Okay.

This is the layout for extra 1 byte in both varatt_external and varattrib_4b.
```
bit 7 6 5 4 3 2 1 0
+---+---+---+---+---+---+---+---+
| cmid − 2 | F|
+---+---+---+---+---+---+---+---+

• Bits 7–1 (cmid − 2)
– 7-bit field holding compression IDs: raw ∈ [0…127] ⇒ cmid = raw +
2 ([2…129])
• Bit 0 (F)
– flag indicating whether the algorithm expects metadata
```

Yeah, dedicating one bit to this fact should be more than enough, and
the metadata associated to each compression method may differ. I
don't have a lot of imagination on the matter with any other
compression methods floating around in the industry, unfortunately, so
my imagination is limited.

I introduced a metadata flag in the 1-byte layout to prevent zstd from
exposing dict or nodict types in ToastCompressionId. This metadata
flag indicates whether the algorithm expects any metadata. For
the ZSTD scenario, if the flag is set, a dictid is expected; otherwise,
no dictid is present.
```
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID = 1,
TOAST_ZSTD_COMPRESSION_ID = 2,
TOAST_INVALID_COMPRESSION_ID = 3,
} ToastCompressionId;

This makes sense to me; we should try to untangle the dependency
between ToastCompressionId and what's stored on disk for the purpose of
extensibility.

struct /* Extended compressed in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size and method; see va_extinfo */
uint8 va_ecinfo; /* Algorithm ID (0–255) */
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed_ext;
} varattrib_4b;
```

Yep.

During compression, compression methods (zstd_compress_datum) will
determine whether to use metadata(dictionary) or not based on
CompressionInfo.meta.

Not sure about this one.

ALTER TABLE tblname
ALTER COLUMN colname
SET (zstd_level = 5);
```

Since PostgreSQL currently doesn’t expose LZ4 compression levels, I
propose adding per-column ZSTD compression level settings so users can
tune the speed/ratio trade-off. I’d like to hear thoughts on this
approach.

Specifying that as an attribute option makes sense here, but I don't
think that this has to be linked to the initial patch set that should
extend the toast data for the new compression method. It's a bit hard
to say how relevant that is, and IMV it's kind of hard for users to
know which level makes more sense. Setting up the wrong level can be
equally very costly in CPU. For now, my suggestion would be to focus
on the basics, and discard this part until we figure out the rest.

Anyway, I've read through the patches, and got a couple of comments.
This includes a few pieces that we are going to need to make the
implementation a bit easier that I've noticed while reading your
patch. Some of them can be implemented even before we add this extra
byte for the new compression methods in the varlena headers.

+CompressionInfo
+setup_cmp_info(char cmethod, Form_pg_attribute att)

This routine declares a Form_pg_attribute as argument, does not use
it. Due to that, it looks that attoptcache.h is pulled into
toast_compression.c.

Patch 0001 has the concept of metadata with various facilities, like
VARATT_4BCE_HAS_META(), CompressionInfo, etc. However at the current
stage we don't need that at all. Wouldn't it be better to delay this
kind of abstraction layer to happen after we discuss how (and if) the
dictionary part should be introduced rather than pay the cost of the
facility in the first step of the implementation? This is not
required as a first step. The toast zstd routines introduced in patch
0002 use !meta, discard meta=true as an error case.

+/* Helper: pack <flag, cmid> into a single byte: flag (b0), cmid-2
(b1..7) */

Having a one-liner here is far from enough? This is the kind of thing
where we should spend time describing how things are done and why they
are done this way. This is not sufficient, there's just too much to
guess. We have VARATT_4BCE_EXTFLAG, but there's
no real information about va_ecinfo and how it relates to the three
bits being set, for example.

+#define VARTAG_SIZE(PTR) \
[...]
UNALIGNED_U32()

This stuff feels magic. It's hard for someone to understand what's
going on here, and there is no explanation about why it's done this
way.

-toast_compress_datum(Datum value, char cmethod)
+toast_compress_datum(Datum value, CompressionInfo cmp)
[...]
-   /* If the compression method is not valid, use the current default */
-   if (!CompressionMethodIsValid(cmethod))
-       cmethod = default_toast_compression;

Removing the fallback to the default toast compression GUC if nothing
is valid does not look right. There could be extensions that depend
on that, and it's unclear what the benefits of setup_cmp_info() are,
because it is not documented, so it's hard for one to understand how
to use these changes.

-   result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
+   result = (struct varlena *) palloc(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE);
[...]
-   memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
sizeof(toast_pointer));
+   memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE - VARHDRSZ_EXTERNAL
: TOAST_POINTER_NOEXT_SIZE - VARHDRSZ_EXTERNAL) ; 

That looks, err... Hard to maintain to me. Okay, that's a
calculation for the extended compression part, but perhaps this is a
sign that we need to think harder about the surroundings of the
toast_pointer to ease such calculations.

+    {
+        {
+            "zstd_level",
+            "Set column's ZSTD compression level",
+            RELOPT_KIND_ATTRIBUTE,
+            ShareUpdateExclusiveLock
+        },
+        DEFAULT_ZSTD_LEVEL, MIN_ZSTD_LEVEL, MAX_ZSTD_LEVEL
+    },

This could be worth a patch on its own, once we get the basics sorted
out. I'm not even sure that we absolutely need that, TBH. The last
time I've had a discussion on the matter for WAL compression we
discarded the argument about the level because it's hard to understand
how to tune, and the default is enough to work magics. For WAL, we've
been using ZSTD_CLEVEL_DEFAULT in xloginsert.c, and I've not actually
heard much about people wanting to tune the compression level. That
was a few years ago, perhaps there are some more different opinions on
the matter.

+#define COMPRESSION_METHOD_NOT_SUPPORTED(method) \
     ereport(ERROR, \
             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
-             errmsg("compression method lz4 not supported"), \
-             errdetail("This functionality requires the server to be built with lz4 support.")))
+             errmsg("compression method %s not supported", method), \
+             errdetail("This functionality requires the server to be built with %s support.", method)))

Let's make that a first independent patch that applies on top of the
rest. My original zstd toast patch did the same, without a split.

Your patch introduces a new compression_zstd, touching very lightly
compression.sql. I think that we should and can do much better than
that in the long term. The coverage of compression.sql is quite good,
and what the zstd code is adding does not cover all of it. Let's
rework the tests of HEAD and split compression.sql for the LZ4 and
pglz parts. If one takes a diff between compression.out and
compression_1.out, he/she would notice that the only differences are
caused by the existence of the lz4 table. This is not the smartest
move we can do if we add more compression methods, so I'd suggest the
following:
- Add a new SQL function called pg_toast_compression_available(text)
or similar, able to return if a toast compression method is supported
or not. This would need two arguments once the initial support for
zstd is done: lz4 and zstd. For head, we only require one: lz4.
- Now, the actual reason why a function returning a boolean result is
useful is for the SQL tests. It is possible with \if to make the
tests conditional if LZ4 is supported or now, limiting the noise if
LZ4 is not supported. See for example the tricks we use for the UTF-8
encoding or NUMA.
- Move the tests related to lz4 into a separate file, outside
compression.sql, in a new file called compression_lz4.sql. With the
addition of zstd toast support, we would add a new file:
compression_zstd.sql. The new zstd suite would then just need to
copy-paste the original one, with few tweaks. It may be better to
parameterize that but we don't do that anymore these days with
input/output regression files.
--
Michael

#42Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Michael Paquier (#41)
3 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Thanks Michael, for providing feedback.

On Fri, May 30, 2025 at 12:21 AM Michael Paquier <michael@paquier.xyz> wrote:

During compression, compression methods (zstd_compress_datum) will
determine whether to use metadata(dictionary) or not based on
CompressionInfo.meta.

Not sure about this one.

I removed the meta field and the CompressionInfo struct. I originally
used CompressionInfo to carry the compression method, zstd_level, and
zstd dict ID downstream, but since we’re now using a default
compression level for zstd, it’s no longer needed.

ALTER TABLE tblname
ALTER COLUMN colname
SET (zstd_level = 5);
```

Specifying that as an attribute option makes sense here, but I don't
think that this has to be linked to the initial patch set that should
extend the toast data for the new compression method. It's a bit hard
to say how relevant that is, and IMV it's kind of hard for users to
know which level makes more sense. Setting up the wrong level can be
equally very costly in CPU. For now, my suggestion would be to focus
on the basics, and discard this part until we figure out the rest.

Ack. I’ve removed that option and will stick with ZSTD_CLEVEL_DEFAULT
as the compression level.

+CompressionInfo
+setup_cmp_info(char cmethod, Form_pg_attribute att)

Removed setup_cmp_info and its references.

This routine declares a Form_pg_attribute as argument, does not use
it. Due to that, it looks that attoptcache.h is pulled into
toast_compression.c.

Removed it.

Patch 0001 has the concept of metadata with various facilities, like
VARATT_4BCE_HAS_META(), CompressionInfo, etc. However at the current
stage we don't need that at all. Wouldn't it be better to delay this
kind of abstraction layer to happen after we discuss how (and if) the
dictionary part should be introduced rather than pay the cost of the
facility in the first step of the implementation? This is not
required as a first step. The toast zstd routines introduced in patch
0002 use !meta, discard meta=true as an error case.

Removed all metadata-related abstractions from patch 0001.

+/* Helper: pack <flag, cmid> into a single byte: flag (b0), cmid-2
(b1..7) */

Having a one-liner here is far from enough? This is the kind of thing
where we should spend time describing how things are done and why they
are done this way. This is not sufficient, there's just too much to
guess. We have VARATT_4BCE_EXTFLAG, but there's
no real information about va_ecinfo and how it relates to the three
bits being set, for example.

I’ve added a detailed comment explaining the one-byte layout.

+#define VARTAG_SIZE(PTR) \
[...]
UNALIGNED_U32()

This stuff feels magic. It's hard for someone to understand what's
going on here, and there is no explanation about why it's done this
way.

To clarify, we need to read a 32-bit value from an unaligned address
(specifically va_extinfo inside varatt_external) to determine the
toast_pointer size (by checking the top two bits to see if they equal
0b11, indicating an optional trailer). I wrote two versions of
READ_U32_UNALIGNED(ptr) that load four bytes individually and
reassemble them according to little- or big-endian order:

/**
* Safely read a 32-bit unsigned integer from *any* address, even when
* that address is **not** naturally aligned to 4 bytes. We do the load
* one byte at a time and re-assemble the word in *host* byte order.
* For LITTLE ENDIAN systems
*/
#define READ_U32_UNALIGNED(ptr) \
( (uint32) (((const uint8 *)(ptr))[0]) \
| ((uint32)(((const uint8 *)(ptr))[1]) << 8) \
| ((uint32)(((const uint8 *)(ptr))[2]) << 16) \
| ((uint32)(((const uint8 *)(ptr))[3]) << 24) )

/**
* For BIG ENDIAN systems.
*/
#define READ_U32_UNALIGNED(ptr) \
( (uint32) (((const uint8 *)(ptr))[3]) \
| ((uint32)(((const uint8 *)(ptr))[2]) << 8) \
| ((uint32)(((const uint8 *)(ptr))[1]) << 16) \
| ((uint32)(((const uint8 *)(ptr))[0]) << 24) )

Alternatively, one could use:

#define READ_U32_UNALIGNED(src) \
({ \
uint32 _tmp; \
memcpy(&_tmp, (src), sizeof(uint32)); \
_tmp; \
})

I chose the byte-by-byte version to avoid extra instructions in a hot path.
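
For context, the consumer of this macro is the toast-pointer sizing
path; roughly the following (a sketch; TOAST_POINTER_EXT_SIZE and
TOAST_POINTER_NOEXT_SIZE are the patch's constants):

```
/*
 * Sketch: read va_extinfo from a possibly unaligned toast pointer and
 * pick the pointer size based on its top two bits.
 */
static inline Size
toast_pointer_size_sketch(const char *ptr)  /* ptr = start of varatt_external */
{
    uint32      extinfo =
        READ_U32_UNALIGNED(ptr + offsetof(varatt_external, va_extinfo));

    /* top two bits == 0b11 means the optional trailer byte is present */
    if ((extinfo >> 30) == 0x3)
        return TOAST_POINTER_EXT_SIZE;
    return TOAST_POINTER_NOEXT_SIZE;
}
```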

-toast_compress_datum(Datum value, char cmethod)
+toast_compress_datum(Datum value, CompressionInfo cmp)
[...]
-   /* If the compression method is not valid, use the current default */
-   if (!CompressionMethodIsValid(cmethod))
-       cmethod = default_toast_compression;

Removing the fallback to the default toast compression GUC if nothing
is valid does not look right. There could be extensions that depend
on that, and it's unclear what the benefits of setup_cmp_info() are,
because it is not documented, so it's hard for one to understand how
to use these changes.

I removed setup_cmp_info, all related code has been deleted.

-   result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
+   result = (struct varlena *) palloc(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE);
[...]
-   memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
sizeof(toast_pointer));
+   memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE - VARHDRSZ_EXTERNAL
: TOAST_POINTER_NOEXT_SIZE - VARHDRSZ_EXTERNAL) ;

That looks, err... Hard to maintain to me. Okay, that's a
calculation for the extended compression part, but perhaps this is a
sign that we need to think harder about the surroundings of the
toast_pointer to ease such calculations.

I simplified it by introducing a helper macro: now both the palloc call
and the memcpy length calculation simply use TOAST_POINTER_SIZE(cm) and
TOAST_POINTER_SIZE(cm) - VARHDRSZ_EXTERNAL, respectively.

#define TOAST_POINTER_SIZE(cm) \
(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE)
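
With that, the caller that writes out the external TOAST pointer
becomes uniform for both pointer sizes; roughly (a sketch, with
make_toast_pointer() as an illustrative wrapper, not a function from
the patch):

```
/* Sketch: build the on-disk external pointer for either pointer size. */
static struct varlena *
make_toast_pointer(varatt_external toast_pointer, ToastCompressionId cm)
{
    struct varlena *result;

    result = (struct varlena *) palloc(TOAST_POINTER_SIZE(cm));
    SET_VARTAG_EXTERNAL(result, VARTAG_ONDISK);
    memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
           TOAST_POINTER_SIZE(cm) - VARHDRSZ_EXTERNAL);
    return result;
}
```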

+    {
+        {
+            "zstd_level",
+            "Set column's ZSTD compression level",
+            RELOPT_KIND_ATTRIBUTE,
+            ShareUpdateExclusiveLock
+        },
+        DEFAULT_ZSTD_LEVEL, MIN_ZSTD_LEVEL, MAX_ZSTD_LEVEL
+    },

This could be worth a patch on its own, once we get the basics sorted
out. I'm not even sure that we absolutely need that, TBH. The last
time I've had a discussion on the matter for WAL compression we
discarded the argument about the level because it's hard to understand
how to tune, and the default is enough to work magics. For WAL, we've
been using ZSTD_CLEVEL_DEFAULT in xloginsert.c, and I've not actually
heard much about people wanting to tune the compression level. That
was a few years ago, perhaps there are some more different opinions on
the matter.

Removed it.

Your patch introduces a new compression_zstd, touching very lightly
compression.sql. I think that we should and can do much better than
that in the long term. The coverage of compression.sql is quite good,
and what the zstd code is adding does not cover all of it. Let's
rework the tests of HEAD and split compression.sql for the LZ4 and
pglz parts. If one takes a diff between compression.out and
compression_1.out, he/she would notice that the only differences are
caused by the existence of the lz4 table. This is not the smartest
move we can do if we add more compression methods, so I'd suggest the
following:
- Add a new SQL function called pg_toast_compression_available(text)
or similar, able to return if a toast compression method is supported
or not. This would need two arguments once the initial support for
zstd is done: lz4 and zstd. For head, we only require one: lz4.
- Now, the actual reason why a function returning a boolean result is
useful is for the SQL tests. It is possible with \if to make the
tests conditional if LZ4 is supported or not, limiting the noise if
LZ4 is not supported. See for example the tricks we use for the UTF-8
encoding or NUMA.
- Move the tests related to lz4 into a separate file, outside
compression.sql, in a new file called compression_lz4.sql. With the
addition of zstd toast support, we would add a new file:
compression_zstd.sql. The new zstd suite would then just need to
copy-paste the original one, with few tweaks. It may be better to
parameterize that but we don't do that anymore these days with
input/output regression files.

Agreed. I introduced pg_compression_available(text) and refactored the
SQL tests accordingly. I split out LZ4 tests into compression_lz4.sql
and created compression_zstd.sql with the appropriate differences.

v25-0001-Add-pg_compression_available-and-split-sql-compr.patch -
Introduced pg_compression_available function and split sql tests
related to compression
v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch -
Design proposal for varattrib_4b & varatt_external
v25-0003-Implement-Zstd-compression-no-dictionary-support.patch - zstd
no dictionary compression implementation

--
Nikhil Veldanda

Attachments:

v25-0003-Implement-Zstd-compression-no-dictionary-support.patch (application/octet-stream)
v25-0001-Add-pg_compression_available-and-split-sql-compr.patch (application/octet-stream)
v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch (application/octet-stream)
#43Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Nikhil Kumar Veldanda (#42)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Thu, Jun 05, 2025 at 12:03:49AM -0700, Nikhil Kumar Veldanda wrote:

Agreed. I introduced pg_compression_available(text) and refactored the
SQL tests accordingly. I split out LZ4 tests into compression_lz4.sql
and created compression_zstd.sql with the appropriate differences.

v25-0001-Add-pg_compression_available-and-split-sql-compr.patch -
Introduced pg_compression_available function and split sql tests
related to compression

I like that as an independent piece because it's going to help a lot
in having new compression methods, so I'm looking forward to getting
that merged into the tree for v19. It can be split into two
independent pieces:
- One patch for the addition of the new function
pg_compression_available(), to detect which compression are supported
at binary level to skip the tests.
- One patch to split the LZ4-only tests into its own file.

The split of the tests is not completely clean as presented in your
patch, though. Your patch only does a copy-paste of the original
file. Some of the basic tests of compression.sql check the
interactions between the use of two compression methods, and the
"basic" compression.sql could just cut them and rely on the LZ4
scripts to do the job, because we want two active different
compression methods for these scenarios. For example, cmdata1
switched to use pglz has little uses. The trick is to have a minimal
set of tests to minimize the run time, while we don't lose in
coverage. Coverage report numbers are useful to compile when it comes
to such exercises, even if it can be an ant's work sometimes.

+ * pg_compression_available(text) → bool

Non-ASCII characters added in the code comments.

+#include "fmgr.h"
+#include "parser/scansup.h"
+#include "utils/builtins.h"

Include file order.

v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch -
Design proposal for varattrib_4b & varatt_external
v25-0003-Implement-Zstd-compression-no-dictionary-support.patch - zstd
no dictionary compression implementation

About this part, I am not sure yet. TBH, I've been working on
the code for a different proposal in this area, because I've been
reminded during pgconf.dev that we still depend on 4-byte OIDs for
toast values, and we have done nothing about that for a long time.

If I'm able to pull this off correctly, modernizing the code on the
way, it should make additions related to the handling of different
on-disk varatt_external easier; the compression handling is a part of
that. So yes, that's related to varatt_external, and how we handle
it in the core code in the toasting and detoasting layers. The
difficult part is finding out how a good layer should look like,
because there's a bunch of hardcoded knowledge related to on-disk
TOAST Datums and entries, like the maximum chunk size (control file)
that depends on the toast_pointer, pointer alignment when inserting
the TOAST datums, etc. A lot of these things are close to 20 years
old, we have to maintain on-disk compatibility while attempting to
extend the varatt_external compatibility and there have been many
proposals that did not make it. None of them were really mature
enough in terms of layer definition. Probably what I'm doing is going
to be flat-out rejected, but we'll see.
--
Michael

#44Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#43)
2 attachment(s)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Wed, Jun 11, 2025 at 11:42:02AM +0900, Michael Paquier wrote:

The split of the tests is not completely clean as presented in your
patch, though. Your patch only does a copy-paste of the original
file. Some of the basic tests of compression.sql check the
interactions between the use of two compression methods, and the
"basic" compression.sql could just cut them and rely on the LZ4
scripts to do the job, because we want two active different
compression methods for these scenarios. For example, cmdata1
switched to use pglz has little uses. The trick is to have a minimal
set of tests to minimize the run time, while we don't lose in
coverage. Coverage report numbers are useful to compile when it comes
to such exercises, even if it can be an ant's work sometimes.

I have no idea yet about the fate of the other TOAST patches I have
proposed for this commit fest, but let's move on with this part of the
refactoring by splitting the TOAST regression tests for LZ4 and pglz,
with the new pg_compression_available() that would reduce the diffs
with the alternate outputs.

This has required a bit more work than I suspected. Based on my
notes, first for pg_compression_available():
- Code moved to misc.c, with comments related to TOAST removed.
- Addition of gzip as an acceptable value.
- Error if the compression method is unknown.
- Some regression tests.
- Documentation should list the functions alphabetically.

Then for the refactoring of the tests, a few notes:
- There is no need for cmdata1 in compression.sql, using the same
compression method as cmdata, aka pglz. So we can trim down the
tests.
- In compression.sql, we can remove cmmove2, cmmove3 and cmdata2 which
have a compression method of pglz, and that we want to check where the
origin has LZ4 data. These should be only in compression_lz4.sql,
perhaps also in the zstd portion if needed later for your patch.
- The error cases with I_Do_Not_Exist_Compression at the bottom of
compression.sql can be kept, we don't need them in
compression_lz4.sql.
- It would be tempting to keep the test for LIKE INCLUDING COMPRESSION
in compression.sql, but we cannot do that as there is a dependency
with default_toast_compression so we want the GUC at pglz but the
table we are copying the data from at LZ4.
compression.sql, there is no need for it to depend on LZ4.
- The tests related to cmdata2 depend on LZ4 TOAST, which were a bit
duplicated.
- "test column type update varlena/non-varlena" was duplicated. Same
for "changing column storage should not impact the compression
method".
- The materialized view test in compression.sql depends on LZ4, can be
moved to compression_lz4.sql.
- The test with partitions and compression methods expects multiple
compression methods, can be moved to compression_lz4.sql
- "test alter compression method" expects two compression methods, can
be moved to compression_lz4.sql.
- The tests with SET default_toast_compression report a hint with the
list of values supported. This is not portable because the list of
values depends on what the build supports. We should use a trick
based on "\set VERBOSITY terse", removing the HINT to reduce the
noise.
- The tables specific to pglz and lz4 data are both still required in
compression_lz4.sql, for one test with inheritance. I have renamed
both to cmdata_pglz and cmdata_lz4, for clarity.

At the end, the gain in diffs is here per the following numbers in
the attached 0002 as we remove the alternate output of compression.sql
when lz4 is disabled:
7 files changed, 319 insertions(+), 724 deletions(-)

Attached are two patches for all that:
- 0001: Introduction of the new function pg_compression_available().
- 0002: Refactoring of the TOAST compression tests.

With this infrastructure in place, the addition of a new TOAST
compression method becomes easier for the test part: no more
cross-build specific diffs.

Thoughts, comments or objections?
--
Michael

Attachments:

0001-Add-function-pg_compression_available.patch (text/x-diff)
0002-Split-TOAST-compression-tests-into-two-files.patch (text/x-diff)
#45Nikhil Kumar Veldanda
Nikhil Kumar Veldanda
veldanda.nikhilkumar17@gmail.com
In reply to: Michael Paquier (#44)
Re: ZStandard (with dictionaries) compression support for TOAST compression

Hi Michael,

On Tue, Jul 15, 2025 at 9:44 PM Michael Paquier <michael@paquier.xyz> wrote:

[...]

Thoughts, comments or objections?

Thanks for driving this forward—both patches look good to me.

0001 – pg_compression_available()
pg_compression_available() in misc.c feels sensible.

0002 – test-suite split
The new compression.sql / compression_lz4.sql split makes the diffs
much easier to reason about.

--
Nikhil Veldanda

#46Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Nikhil Kumar Veldanda (#45)
Re: ZStandard (with dictionaries) compression support for TOAST compression

On Tue, Jul 15, 2025 at 10:37:02PM -0700, Nikhil Kumar Veldanda wrote:

0001 – pg_compression_available()
pg_compression_available() in misc.c feels sensible.

Actually, I have taken a step back on this one and recalled that the
list of values accepted by an enum GUC is already exposed in
pg_settings, so we can already get the same result without this
function:
+SELECT NOT(enumvals @> '{lz4}') AS skip_test FROM pg_settings WHERE
+  name = 'default_toast_compression' \gset
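
As a minimal sketch (not the exact patch contents), the skip_test
variable set by the \gset above could then gate the rest of the script:

```
-- Skip the whole file when the build does not support lz4.
\if :skip_test
   \echo 'skipping lz4 tests: not supported by this build'
   \quit
\endif
```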

0002 – test-suite split
The new compression.sql / compression_lz4.sql split makes the diffs
much easier to reason about.

Another thing I have spent a lot of time on today while taking a
second look is the code coverage after a make check. There was one
surprising result: the incompressible-data case of
lz4_compress_datum() now has some coverage.

A second thing is AdjustUpgrade.pm, which has the matview compressmv
with a qual based on cmdata1, but I think we're OK as this is an
adjustment of the upgrade dumps for 74a3fc36f314, which exists in
v16~. I'll keep an eye on the buildfarm anyway, in case something
shows up.
--
Michael