[PATCH] Add zstd compression for TOAST using extended header format

Started by Dharin Shah4 months ago19 messageshackers
Jump to latest
#1Dharin Shah
dharinshah95@gmail.com

Hello PG Hackers,

Want to submit a patch that implements zstd compression for TOAST data
using a 20-byte TOAST pointer format, directly addressing the concerns
raised in prior discussions [1
</messages/by-id/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO+ufq3FyoVyUa5AviFA@mail.gmail.com
][2
</messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com&gt;
][3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D.

A bit of a background in the 2022 thread [3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D,
Robert Haas suggested:
"we had better reserve the fourth bit pattern for something extensible e.g.
another byte or several to specify the actual method"

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte

Michael also asked whether we should have "something a bit more extensible
for the design of an extensible varlena header."

This patch implements that idea.
The format:

struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};

*A few notes:*

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the
20-byte format (defaults to off for compatibility, can enable it if you
want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to
determine which format to read.

The 4 extra bytes per pointer is negligible for typical TOAST data sizes,
and it gives us room to grow.

Regards,
Dharin

Attachments:

zstd-toast-compression-external.patchapplication/octet-stream; name=zstd-toast-compression-external.patchDownload+1538-92
#2Dharin Shah
dharinshah95@gmail.com
In reply to: Dharin Shah (#1)
Re: [PATCH] Add zstd compression for TOAST using extended header format

Hello,

Apologies for the spam, updated the patch with the tests corrected.

Thanks,
Dharin

On Sat, Dec 13, 2025 at 6:31 PM Dharin Shah <dharinshah95@gmail.com> wrote:

Show quoted text

Hello PG Hackers,

Want to submit a patch that implements zstd compression for TOAST data
using a 20-byte TOAST pointer format, directly addressing the concerns
raised in prior discussions [1
</messages/by-id/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO+ufq3FyoVyUa5AviFA@mail.gmail.com
][2
</messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com&gt;
][3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;
].

A bit of a background in the 2022 thread [3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D,
Robert Haas suggested:
"we had better reserve the fourth bit pattern for something extensible
e.g. another byte or several to specify the actual method"

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte

Michael also asked whether we should have "something a bit more extensible
for the design of an extensible varlena header."

This patch implements that idea.
The format:

struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};

*A few notes:*

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the
20-byte format (defaults to off for compatibility, can enable it if you
want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to
determine which format to read.

The 4 extra bytes per pointer is negligible for typical TOAST data sizes,
and it gives us room to grow.

Regards,
Dharin

Attachments:

zstd-toast-compression-external.patchapplication/octet-stream; name=zstd-toast-compression-external.patchDownload+1440-92
#3Dharin Shah
dharinshah95@gmail.com
In reply to: Dharin Shah (#1)

Hello PG Hackers,

Want to submit a patch that implements zstd compression for TOAST data
using a 20-byte TOAST pointer format, directly addressing the concerns
raised in prior discussions [1
</messages/by-id/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO+ufq3FyoVyUa5AviFA@mail.gmail.com
][2
</messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com&gt;
][3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D.

A bit of a background in the 2022 thread [3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D,
The overall suggestion was to have something extensible for the TOAST header

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte

This patch implements that idea.
The new header format:

struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};

*A few notes:*

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the
20-byte format (defaults to off for compatibility, can enable it if you
want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to
determine which format to read.

The 4 extra bytes per pointer is negligible for typical TOAST data sizes,
and it gives us room to grow.

Regards,
Dharin

Attachments:

zstd-toast-compression-external.patchapplication/x-patch; name=zstd-toast-compression-external.patchDownload+1440-92
#4Murtuza Zabuawala
murtuza.zabuawala@enterprisedb.com
In reply to: Dharin Shah (#3)
Re: [PATCH] Add zstd compression for TOAST using extended header format

Hello,

You may want to consider sending the patch to the pgsql-hackers mailing list.

Murtuza Zabuawala
enterprisedb.com <http://enterprisedb.com/&gt;

Show quoted text

On 16 Dec 2025, at 12:46 AM, Dharin Shah <dharinshah95@gmail.com> wrote:

Hello PG Hackers,

Want to submit a patch that implements zstd compression for TOAST data using a 20-byte TOAST pointer format, directly addressing the concerns raised in prior discussions [1 </messages/by-id/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO+ufq3FyoVyUa5AviFA@mail.gmail.com </messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com&gt;%5D%5B3 </messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D.

A bit of a background in the 2022 thread [3 </messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D, The overall suggestion was to have something extensible for the TOAST header

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte

This patch implements that idea.
The new header format:

struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};

A few notes:

- Zstd only applies to external TOAST, not inline compression. The 2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the 20-byte format (defaults to off for compatibility, can enable it if you want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to determine which format to read.

The 4 extra bytes per pointer is negligible for typical TOAST data sizes, and it gives us room to grow.

Regards,
Dharin
<zstd-toast-compression-external.patch>

#5Dharin Shah
dharinshah95@gmail.com
In reply to: Murtuza Zabuawala (#4)
Re: [PATCH] Add zstd compression for TOAST using extended header format

THanks Murtuza,

My bad, wrong email :(

Regards,
Dharin

On Tue, Dec 16, 2025 at 6:56 AM Murtuza Zabuawala <
murtuza.zabuawala@enterprisedb.com> wrote:

Show quoted text

Hello,

You may want to consider sending the patch to the pgsql-hackers mailing
list.

*Murtuza Zabuawala*
enterprisedb.com

On 16 Dec 2025, at 12:46 AM, Dharin Shah <dharinshah95@gmail.com> wrote:

Hello PG Hackers,

Want to submit a patch that implements zstd compression for TOAST data
using a 20-byte TOAST pointer format, directly addressing the concerns
raised in prior discussions [1
</messages/by-id/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO+ufq3FyoVyUa5AviFA@mail.gmail.com
][2
</messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com&gt;
][3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;
].

A bit of a background in the 2022 thread [3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D,
The overall suggestion was to have something extensible for the TOAST header

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte

This patch implements that idea.
The new header format:

struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};

*A few notes:*

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the
20-byte format (defaults to off for compatibility, can enable it if you
want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to
determine which format to read.

The 4 extra bytes per pointer is negligible for typical TOAST data sizes,
and it gives us room to grow.

Regards,
Dharin
<zstd-toast-compression-external.patch>

#6Dharin Shah
dharinshah95@gmail.com
In reply to: Dharin Shah (#1)
Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Hello PG Hackers,

Want to submit a patch that implements zstd compression for TOAST data
using a 20-byte TOAST pointer format, directly addressing the concerns
raised in prior discussions [1
</messages/by-id/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO+ufq3FyoVyUa5AviFA@mail.gmail.com
][2
</messages/by-id/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com&gt;
][3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D.

A bit of a background in the 2022 thread [3
</messages/by-id/YoMiNmkztrslDbNS@paquier.xyz&gt;%5D,
The overall suggestion was to have something extensible for the TOAST header

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte

This patch implements that idea.
The new header format:

struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};

*A few notes:*

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the
20-byte format (defaults to off for compatibility, can enable it if you
want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to
determine which format to read.

The 4 extra bytes per pointer is negligible for typical TOAST data sizes,
and it gives us room to grow.

Regards,
Dharin

Attachments:

zstd-toast-compression-external.patchapplication/octet-stream; name=zstd-toast-compression-external.patchDownload+1440-92
#7Peter Eisentraut
peter_e@gmx.net
In reply to: Dharin Shah (#6)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On 16.12.25 11:51, Dharin Shah wrote:

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.

This is a very complicated patch. To motivate it, you should show some
detailed performance measurements that show these wins.

#8Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#7)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Wed, Dec 17, 2025 at 04:11:38PM +0100, Peter Eisentraut wrote:

On 16.12.25 11:51, Dharin Shah wrote:

- Zstd only applies to external TOAST, not inline compression. The 2-bit
limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
anyway. Zstd's wins show up on larger values.

This is a very complicated patch. To motivate it, you should show some
detailed performance measurements that show these wins.

Yes, this is expected for any patch posted. Zstd is an improved
version of lz4, acting as a sort of industry standard these days, and
any byte sequences I have looked at points that zstd leads kind of
always to a better compression ratio for less or equivalent CPU cost
compared to LZ4. Not saying that numbers are not required, they are.
But I strongly suspect numbers among these lines.

FWIW, it's not a complicated patch, it is a large mechanical patch
that enforces a bunch of TOAST code paths to do what it wants. If we
are going to do something about that and agree on something, I think
that we should just use a new vartag_external for this matter
(spoiler: I think we should use a new vartag_external value), but
keep the toast structure at 16 bytes all the time, leaving alone the
extra bit in the existing varatt_external structure so as there is no
impact for heap relations if zstd is used, as long as the TOAST value
is 32 bits. The patch introduces a new vartag_external with
VARTAG_ONDISK_EXTENDED, so while it leads to a better compatibility,
it also means that all zstd entries have to pay an extra amount of
space in the main relation as an effect of a different
default_toast_compression. The difficulty is not in the
implementation, it would be on agreeing on what folks would be OK
with in terms if vartag and varatt structures, and that's one of the
oldest areas of the PG code, that has complications and assumptions of
its own.

The test implementation looks wrong to me. Why is there any need for
an extra test module test_toast_ext? You could just reuse the same
structure as compression_lz4.sql, but adapted to zstd. That's why a
split with compression.sql has been done in 74a3fc36f314, FYI.

You should also aim at splitting the patch more to make it easier to
review: one of the sticky point of this area of the code is to untie
completely DefaultCompressionId, its GUC and the TOAST code. The GUC
default_toast_compression accepts by design only 4 values. This needs
to go further, and should be refactored as a piece of its own.

And also, I would prefer if the 32-bit value issue is tackled first,
but that's a digression here, for a different thread. :)
--
Michael

#9Dharin Shah
dharinshah95@gmail.com
In reply to: Michael Paquier (#8)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Hi Michael (and Peter),

Thanks for the detailed feedback — this is really helpful.

I want to make sure I understand your main point: you're OK with a new
`vartag_external`, but prefer we avoid increasing the heap TOAST pointer
from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
the main heap tuple.
I also realize the "compatibility" of the extended header doesn't buy us
much — we'll need to support the existing 16-byte varatt_external forever
for backward compatibility. Adding a 20-byte structure just means two
formats to maintain indefinitely.

A couple clarifying questions if we go with new vartag (e.g.,
`VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
discriminator
1. How should we handle future methods beyond zstd? One tag per method, or
store a method id elsewhere (e.g., in TOAST chunk header)?
2. And re: "as long as the TOAST value is 32 bits" — are you referring to
the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
extsize for method encoding)?

Test

Rows

Uncompressed

PGLZ

LZ4

ZSTD

PGLZ/ZSTD

LZ4/ZSTD

T1: Large JSON (~18KB/row)

500

~9,000 KB

1496 KB

1528 KB

976 KB

1.53x

1.57x

T2: Repetitive Text (~246KB/row)

500

~123,000 KB

1672 KB

648 KB

248 KB

6.74x

2.61x

T3: MD5 Hash Data (~16KB/row)

500

~8,000 KB

8288 KB

8232 KB

4256 KB

1.95x

1.93x

T4: Server Logs (~3.5KB/row)

1000

~3,500 KB

400 KB

352 KB

456 KB

0.88x

0.77x

*Key findings (i guess well known at this point):*
- ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
- For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
- The T4 result showing zstd as "worse" is not about compression quality -
it's about missing inline storage support. ZSTD actually compresses better,
but pays unnecessary TOAST overhead.

I'll share the detailed benchmark script with the next patch revision. But
also a potential path forward could be that we could just fully replace
pglz (can bring it up later in different thread)

*On Testing and Patch Structure*
Agreed on both points:
- I'll use `compression_zstd.sql` following the `compression_lz4.sql`
pattern (removing the test_toast_ext module)
- I'll split the GUC refactoring into a separate preparatory patch

Once you confirm which representation you're advocating, I'll respin
accordingly.

Thanks,
Dharin

On Thu, Dec 18, 2025 at 7:35 AM Michael Paquier <michael@paquier.xyz> wrote:

Show quoted text

On Wed, Dec 17, 2025 at 04:11:38PM +0100, Peter Eisentraut wrote:

On 16.12.25 11:51, Dharin Shah wrote:

- Zstd only applies to external TOAST, not inline compression. The

2-bit

limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work

fine

anyway. Zstd's wins show up on larger values.

This is a very complicated patch. To motivate it, you should show some
detailed performance measurements that show these wins.

Yes, this is expected for any patch posted. Zstd is an improved
version of lz4, acting as a sort of industry standard these days, and
any byte sequences I have looked at points that zstd leads kind of
always to a better compression ratio for less or equivalent CPU cost
compared to LZ4. Not saying that numbers are not required, they are.
But I strongly suspect numbers among these lines.

FWIW, it's not a complicated patch, it is a large mechanical patch
that enforces a bunch of TOAST code paths to do what it wants. If we
are going to do something about that and agree on something, I think
that we should just use a new vartag_external for this matter
(spoiler: I think we should use a new vartag_external value), but
keep the toast structure at 16 bytes all the time, leaving alone the
extra bit in the existing varatt_external structure so as there is no
impact for heap relations if zstd is used, as long as the TOAST value
is 32 bits. The patch introduces a new vartag_external with
VARTAG_ONDISK_EXTENDED, so while it leads to a better compatibility,
it also means that all zstd entries have to pay an extra amount of
space in the main relation as an effect of a different
default_toast_compression. The difficulty is not in the
implementation, it would be on agreeing on what folks would be OK
with in terms if vartag and varatt structures, and that's one of the
oldest areas of the PG code, that has complications and assumptions of
its own.

The test implementation looks wrong to me. Why is there any need for
an extra test module test_toast_ext? You could just reuse the same
structure as compression_lz4.sql, but adapted to zstd. That's why a
split with compression.sql has been done in 74a3fc36f314, FYI.

You should also aim at splitting the patch more to make it easier to
review: one of the sticky point of this area of the code is to untie
completely DefaultCompressionId, its GUC and the TOAST code. The GUC
default_toast_compression accepts by design only 4 values. This needs
to go further, and should be refactored as a piece of its own.

And also, I would prefer if the 32-bit value issue is tackled first,
but that's a digression here, for a different thread. :)
--
Michael

#10Michael Paquier
michael@paquier.xyz
In reply to: Dharin Shah (#9)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:

I want to make sure I understand your main point: you're OK with a new
`vartag_external`, but prefer we avoid increasing the heap TOAST pointer
from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
the main heap tuple.

That would be my choice, yes. Not sure about the opinion of others on
this matter.

I also realize the "compatibility" of the extended header doesn't buy us
much — we'll need to support the existing 16-byte varatt_external forever
for backward compatibility. Adding a 20-byte structure just means two
formats to maintain indefinitely.

Yes. Patches have to maintain on-disk compatibility.

A couple clarifying questions if we go with new vartag (e.g.,
`VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
discriminator
1. How should we handle future methods beyond zstd? One tag per method, or
store a method id elsewhere (e.g., in TOAST chunk header)?

My suspicion would be that we could either use a new set of vartags in
the future for each compression method. When it comes to zstd there
is something that comes in play: we could set some bits related to
dictionnaries at tuple level. Not sure if this is the best design or
if using an attribute-level option is more adapted (for example a
JSONB blob could be applied as an attribute with common keys in a
dictionnary saving a lot of on-disk space even before compression),
but keeping some bits free in the 16-byte header leaves this option
open with a new vartag_external. Saying that, zstd is good enough
that I strongly suspect that we would not regret it for quite a few
years. One issue that has pushed towards the addition of lz4 as an
option for toast compression is that pglz was worse in terms of CPU
cost. zlib is also more expensive than lz4 or zstd, especially at
very high compression level for usually little compression gains.

2. And re: "as long as the TOAST value is 32 bits" — are you referring to
the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
extsize for method encoding)?

I mean extending the TOAST value to 8 bytes, as per the following
issues:
/messages/by-id/764273.1669674269@sss.pgh.pa.us
https://commitfest.postgresql.org/patch/5830/

*Key findings (i guess well known at this point):*
- ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
- For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
- The T4 result showing zstd as "worse" is not about compression quality -
it's about missing inline storage support. ZSTD actually compresses better,
but pays unnecessary TOAST overhead.

I'll share the detailed benchmark script with the next patch revision. But
also a potential path forward could be that we could just fully replace
pglz (can bring it up later in different thread)

I don't think that we will ever be able to remove pglz. It would be
nice, as final result of course, but I also expect that not being able
to decompress pglz data is going to lead to a lot of user pain. That
would be also very expensive to check at upgrade for large instances.

*On Testing and Patch Structure*
Agreed on both points:
- I'll use `compression_zstd.sql` following the `compression_lz4.sql`
pattern (removing the test_toast_ext module)

Okay.

- I'll split the GUC refactoring into a separate preparatory patch

This refactoring, if done nicely, is worth an independent piece. It's
something that I have actually done for the sake of the other thread,
though the result was not really much liked by others. Perhaps I'm
just lacking imagination with this abstraction, and I'd surely welcome
different ideas.
--
Michael

#11Dharin Shah
dharinshah95@gmail.com
In reply to: Michael Paquier (#10)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Hello,

Following up on my earlier patch submission, I've reworked the zstd TOAST
compression implementation based on our discussion here. The new patch now
avoids the 20-byte extended header.

Current Approach
- New `VARTAG_ONDISK_ZSTD` (value 19) for ZSTD external storage
- Maintains existing 16-byte varatt_external structure
- ZSTD external-only (no inline compression)

Note: Using a dedicated VARTAG_ONDISK_ZSTD keeps the on-disk TOAST pointer
payload at 16 bytes, but it is not a general extensible metadata carrier.
If PostgreSQL later adopts a more general extensible TOAST framework, this
change should not block it; VARTAG_ONDISK_ZSTD would remain as a supported
legacy encoding, while new toasted values could be written using the newer
framework and old values rewritten via normal table rewrites.

Storage (170 MB uncompressed):
ZSTD: 22 MB (7.60x) - 38.7% space savings vs LZ4
PGLZ: 36 MB (4.76x)
LZ4: 36 MB (4.66x)

Key findings:
- Large values (>50KB): ZSTD 33% better compression than PGLZ (~30% better
than LZ4)
- Low-entropy data: ZSTD compresses what LZ77 methods cannot
- Small values: ZSTD pays external overhead vs inline PGLZ/LZ4
While ZSTD uses slightly less space overall, the external storage mechanism
incurs a TOAST fetch overhead for small values, potentially impacting
performance.
Backwards Compatibility Tests
- Mixed compression: Rows with PGLZ, LZ4, and ZSTD coexist and decompress
correctly
- Lazy recompression: ALTER COLUMN ... SET COMPRESSION zstd affects new
data; existing data is lazily recompressed upon UPDATE or VACUUM FULL.
- Inline vs external: Small values remain inline; large values use
appropriate external compression.
Data integrity: All data decompresses correctly across all methods.

Trade-offs and Design Considerations

- External-only avoids consuming cmid=3 and extended header complexity

- Slice access: no ZSTD-specific optimization (follow-up area)

- Hybrid inline/external for small values: not in this patch (feedback
welcome)

Reviewer Questions - Is vartag-based external-only acceptable?
- Should compression level (currently 3) be configurable? - Is the external
storage overhead for small values acceptable, or is hybrid inline/external
behavior needed?
Thanks, Dharin

On Thu, Dec 18, 2025 at 11:44 PM Michael Paquier <michael@paquier.xyz>
wrote:

Show quoted text

On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:

I want to make sure I understand your main point: you're OK with a new
`vartag_external`, but prefer we avoid increasing the heap TOAST pointer
from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
the main heap tuple.

That would be my choice, yes. Not sure about the opinion of others on
this matter.

I also realize the "compatibility" of the extended header doesn't buy us
much — we'll need to support the existing 16-byte varatt_external forever
for backward compatibility. Adding a 20-byte structure just means two
formats to maintain indefinitely.

Yes. Patches have to maintain on-disk compatibility.

A couple clarifying questions if we go with new vartag (e.g.,
`VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
discriminator
1. How should we handle future methods beyond zstd? One tag per method,

or

store a method id elsewhere (e.g., in TOAST chunk header)?

My suspicion would be that we could either use a new set of vartags in
the future for each compression method. When it comes to zstd there
is something that comes in play: we could set some bits related to
dictionnaries at tuple level. Not sure if this is the best design or
if using an attribute-level option is more adapted (for example a
JSONB blob could be applied as an attribute with common keys in a
dictionnary saving a lot of on-disk space even before compression),
but keeping some bits free in the 16-byte header leaves this option
open with a new vartag_external. Saying that, zstd is good enough
that I strongly suspect that we would not regret it for quite a few
years. One issue that has pushed towards the addition of lz4 as an
option for toast compression is that pglz was worse in terms of CPU
cost. zlib is also more expensive than lz4 or zstd, especially at
very high compression level for usually little compression gains.

2. And re: "as long as the TOAST value is 32 bits" — are you referring to
the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
extsize for method encoding)?

I mean extending the TOAST value to 8 bytes, as per the following
issues:
/messages/by-id/764273.1669674269@sss.pgh.pa.us
https://commitfest.postgresql.org/patch/5830/

*Key findings (i guess well known at this point):*
- ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
- For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
- The T4 result showing zstd as "worse" is not about compression quality

-

it's about missing inline storage support. ZSTD actually compresses

better,

but pays unnecessary TOAST overhead.

I'll share the detailed benchmark script with the next patch revision.

But

also a potential path forward could be that we could just fully replace
pglz (can bring it up later in different thread)

I don't think that we will ever be able to remove pglz. It would be
nice, as final result of course, but I also expect that not being able
to decompress pglz data is going to lead to a lot of user pain. That
would be also very expensive to check at upgrade for large instances.

*On Testing and Patch Structure*
Agreed on both points:
- I'll use `compression_zstd.sql` following the `compression_lz4.sql`
pattern (removing the test_toast_ext module)

Okay.

- I'll split the GUC refactoring into a separate preparatory patch

This refactoring, if done nicely, is worth an independent piece. It's
something that I have actually done for the sake of the other thread,
though the result was not really much liked by others. Perhaps I'm
just lacking imagination with this abstraction, and I'd surely welcome
different ideas.
--
Michael

Attachments:

benchmark_toast_compression.sqlapplication/octet-stream; name=benchmark_toast_compression.sqlDownload
v3-0001-Add-ZSTD-TOAST-compression-using-VARTAG-ONDISK-ZSTD.patchapplication/octet-stream; name=v3-0001-Add-ZSTD-TOAST-compression-using-VARTAG-ONDISK-ZSTD.patchDownload+947-40
backwards_compatibility_test.sqlapplication/octet-stream; name=backwards_compatibility_test.sqlDownload
#12Robert Treat
xzilla@users.sourceforge.net
In reply to: Michael Paquier (#10)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Thu, Dec 18, 2025 at 5:44 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:

I'll share the detailed benchmark script with the next patch revision. But
also a potential path forward could be that we could just fully replace
pglz (can bring it up later in different thread)

I don't think that we will ever be able to remove pglz. It would be
nice, as final result of course, but I also expect that not being able
to decompress pglz data is going to lead to a lot of user pain. That
would be also very expensive to check at upgrade for large instances.

Agreed that I can't see pglz being removed any time soon, if ever.
Thinking through what a conversion process would look like seems
unwieldy at best, so I think we definitely need it for backwards
compatibility, plus I think it is useful to have a self-contained
option. I'd almost suggest we should look at replacing lz4, but I
don't think that is significantly easier, it just has a smaller, more
invested, blast radius. That said, I do suspect ztsd could quickly
become a popular recommendation and/or default among users /
consultants / service providers.

Robert Treat
https://xzilla.net

#13Michael Paquier
michael@paquier.xyz
In reply to: Robert Treat (#12)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:

Agreed that I can't see pglz being removed any time soon, if ever.
Thinking through what a conversion process would look like seems
unwieldy at best, so I think we definitely need it for backwards
compatibility, plus I think it is useful to have a self-contained
option. I'd almost suggest we should look at replacing lz4, but I
don't think that is significantly easier, it just has a smaller, more
invested, blast radius.

Backward-compatibility requirements make a replacement of LZ4
basically impossible to me, for the same reasons as pglz. We could
not replace the bit used in the va_extinfo to track if LZ4 compression
is used, either. One thing that I do wonder is if it would make
things simpler in the long-run if we introduced a new separated vartag
for LZ4-compressed external TOAST pointers as well. At least we'd
have a leaner design: it means that we have to keep the
varatt_external available on read, but we could update to the new
format when writing entries. Or perhaps that's not worth the
complication based on the last sentence you are writing..

That said, I do suspect ztsd could quickly
become a popular recommendation and/or default among users /
consultants / service providers.

.. Because I strongly suspect that this is going to be true, and that
zstd would just be a better replacement over lz4. That's a trend that
I see is already going on for wal_compression.

Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..
--
Michael

#14Dharin Shah
dharinshah95@gmail.com
In reply to: Michael Paquier (#13)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Thanks Michael & Robert,

Agreed — I don’t think it’s realistic or practical to talk about
deprecating or replacing pglz (or lz4) given on-disk compatibility
requirements.

Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..

On the design: I understand & share the same concerns that we’d end up with
multiple “sources of truth” for external compression method identification
(pglz/lz4 via va_extinfo bits, zstd via vartag), and that this pushes
method-specific shortcuts into detoast paths.

Would you be OK if I split this into two steps?

1.First, a refactor-only patch introducing a small decoded/in-memory
representation of an external TOAST pointer, so detoast/toast code paths
don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This
would be a cleanup with no on-disk format change and no behavioral change
for existing methods. Is this the same “translation structure” approach you
mentioned in the other thread? If you can point me to it, I’ll align with
that proposal.

2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD,
implemented on top of that abstraction to keep zstd handling centralized
and minimize special-casing in detoast.
If that direction matches what you had in mind, I can first post the
proposed translation structure/API for feedback before respinning the zstd
patch.

Thanks,
Dharin

On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <michael@paquier.xyz> wrote:

Show quoted text

On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:

Agreed that I can't see pglz being removed any time soon, if ever.
Thinking through what a conversion process would look like seems
unwieldy at best, so I think we definitely need it for backwards
compatibility, plus I think it is useful to have a self-contained
option. I'd almost suggest we should look at replacing lz4, but I
don't think that is significantly easier, it just has a smaller, more
invested, blast radius.

Backward-compatibility requirements make a replacement of LZ4
basically impossible to me, for the same reasons as pglz. We could
not replace the bit used in the va_extinfo to track if LZ4 compression
is used, either. One thing that I do wonder is if it would make
things simpler in the long-run if we introduced a new separated vartag
for LZ4-compressed external TOAST pointers as well. At least we'd
have a leaner design: it means that we have to keep the
varatt_external available on read, but we could update to the new
format when writing entries. Or perhaps that's not worth the
complication based on the last sentence you are writing..

That said, I do suspect ztsd could quickly
become a popular recommendation and/or default among users /
consultants / service providers.

.. Because I strongly suspect that this is going to be true, and that
zstd would just be a better replacement over lz4. That's a trend that
I see is already going on for wal_compression.

Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..
--
Michael

#15Dharin Shah
dharinshah95@gmail.com
In reply to: Dharin Shah (#14)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Hello Michael,

Following up on the discussion about avoiding method-specific shortcuts in
detoast paths, this patch is a refactor-only step: it introduces a small
decoded/in-memory representation of an on-disk external TOAST pointer, and
refactors detoast_attr() and detoast_attr_slice() to use it.

The goal is to centralize “how do we interpret an external datum?” so that
detoast code paths don’t have to reason directly about va_extinfo encoding
vs payload layout details. This is intended as groundwork for a follow-up
patch adding a new vartag-based method (e.g., zstd) without scattering
special cases in detoast paths.

Key changes
- Introduces DecodedExternalToast + ToastDecompressMethod +
TOAST_EXT_HAS_TCINFO in toast_internals.h.
- Adds a small static decoder in detoast.c (decode_external_toast_pointer())
- Refactors detoast_attr() and detoast_attr_slice() to use a decode ->
fetch -> decompress dispatch pattern
- No on-disk format changes; existing behavior preserved (including error
behavior for unsupported compression builds).

Why HAS_TCINFO?
- Previously, “is compressed?” was used as a proxy for whether the external
payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
captures payload layout, which is distinct from whether the value is
compressed. This separation is needed for future methods that may store
external compressed payloads without tcinfo.

Testing: Core regression suites pass

Performance: I ran a small detoast-focused benchmark that forces external
storage; results were within run-to-run variance, with no measurable
regression. (Benchmark script attached: benchmark_toast_detoast.sql for
reproduction)

Thanks,
Dharin

On Thu, Dec 25, 2025 at 1:54 AM Dharin Shah <dharinshah95@gmail.com> wrote:

Show quoted text

Thanks Michael & Robert,

Agreed — I don’t think it’s realistic or practical to talk about
deprecating or replacing pglz (or lz4) given on-disk compatibility
requirements.

Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..

On the design: I understand & share the same concerns that we’d end up
with multiple “sources of truth” for external compression method
identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that
this pushes method-specific shortcuts into detoast paths.

Would you be OK if I split this into two steps?

1.First, a refactor-only patch introducing a small decoded/in-memory
representation of an external TOAST pointer, so detoast/toast code paths
don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This
would be a cleanup with no on-disk format change and no behavioral change
for existing methods. Is this the same “translation structure” approach you
mentioned in the other thread? If you can point me to it, I’ll align with
that proposal.

2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD,
implemented on top of that abstraction to keep zstd handling centralized
and minimize special-casing in detoast.
If that direction matches what you had in mind, I can first post the
proposed translation structure/API for feedback before respinning the zstd
patch.

Thanks,
Dharin

On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <michael@paquier.xyz>
wrote:

On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:

Agreed that I can't see pglz being removed any time soon, if ever.
Thinking through what a conversion process would look like seems
unwieldy at best, so I think we definitely need it for backwards
compatibility, plus I think it is useful to have a self-contained
option. I'd almost suggest we should look at replacing lz4, but I
don't think that is significantly easier, it just has a smaller, more
invested, blast radius.

Backward-compatibility requirements make a replacement of LZ4
basically impossible to me, for the same reasons as pglz. We could
not replace the bit used in the va_extinfo to track if LZ4 compression
is used, either. One thing that I do wonder is if it would make
things simpler in the long-run if we introduced a new separated vartag
for LZ4-compressed external TOAST pointers as well. At least we'd
have a leaner design: it means that we have to keep the
varatt_external available on read, but we could update to the new
format when writing entries. Or perhaps that's not worth the
complication based on the last sentence you are writing..

That said, I do suspect ztsd could quickly
become a popular recommendation and/or default among users /
consultants / service providers.

.. Because I strongly suspect that this is going to be true, and that
zstd would just be a better replacement over lz4. That's a trend that
I see is already going on for wal_compression.

Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..
--
Michael

Attachments:

benchmark_toast_detoast.sqlapplication/octet-stream; name=benchmark_toast_detoast.sqlDownload
0001-refactor-detoast-to-use-decoded-external-toast-abstr.patchapplication/octet-stream; name=0001-refactor-detoast-to-use-decoded-external-toast-abstr.patchDownload+278-88
#16Michael Paquier
michael@paquier.xyz
In reply to: Dharin Shah (#15)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:

The goal is to centralize “how do we interpret an external datum?” so that
detoast code paths don’t have to reason directly about va_extinfo encoding
vs payload layout details. This is intended as groundwork for a follow-up
patch adding a new vartag-based method (e.g., zstd) without scattering
special cases in detoast paths.

+static bool
+decode_external_toast_pointer(const struct varlena *attr,
+						   DecodedExternalToast *decoded)
[...]
+typedef enum ToastDecompressMethod
+{
+	TOAST_DECOMP_NONE = 0,
+	TOAST_DECOMP_PGLZ = 1,
+	TOAST_DECOMP_LZ4 = 2
+} ToastDecompressMethod;
+
+typedef struct DecodedExternalToast
+{
+	Oid			toastrelid;
+	Oid			valueid;
+	uint32		rawsize;		/* Decompressed size; for future methods without tcinfo */
+	uint32		extsize;		/* On-disk payload size */
+	ToastDecompressMethod method;
+	uint8		flags;
+} DecodedExternalToast;

Yeah, honestly this is a layer I have been thinking about as well as
one option, but contrary to you I have been focusing on putting that
into varatt.h, with the exception of the value being an Oid8. I think
that you have an interesting point in focusing your implementation to
be stored in the detoast part, though. I'd need to spend a bit more
time to see the result this would lead at with the larger 8-byte issue
in mind, but this is something that would come at no real cost as it
has no function pointer redirection compared to what I was first
envisioning on the other thread. That's especially true if it makes
the CompressionId business easier to mold around when adding a new
vartag.

Why HAS_TCINFO?
- Previously, “is compressed?” was used as a proxy for whether the external
payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
captures payload layout, which is distinct from whether the value is
compressed. This separation is needed for future methods that may store
external compressed payloads without tcinfo.

It is possible to model the on-memory data as we want. This
suggestion would be OK with some flags.
--
Michael

#17Dharin Shah
dharinshah95@gmail.com
In reply to: Michael Paquier (#16)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Thanks Michael,

After looking more closely at your “8‑byte TOAST values / infinite loop”
thread and patch series, I see this is very much the same direction you
outlined there: introduce a normalized in-memory representation for
external pointers (toast_external_data) and keep most call sites from
having to reason about vartag_external/va_extinfo details directly [1]/messages/by-id/CAN-LCVNsE4x0k11ZRWvU4ySTbe98fwA16qzV7p8dxogWnD5Jng@mail.gmail.com.

For this refactor patch I kept the decoder local to detoast.c to minimize
scope and avoid committing to a broader API boundary too early. But if the
consensus heads toward a shared interface closer to the format definitions
(as in your toast_external approach), I’m happy to respin/rework this patch
to align with that direction, rather than working on parallel
abstractions. It should also be straightforward to mold this refactor in
the direction of the 8‑byte value-id work without changing the overall
detoast structure.

On HAS_TCINFO flag: the intent is to make payload layout explicit. In the
current code, “external is compressed” effectively implies “payload begins
with tcinfo”, which is wired into fetch/slice logic. For a vartag-based
follow-up (e.g., zstd), we may want compressed payloads without a tcinfo
prefix, so having an explicit flag keeps detoast paths uniform and avoids
method-specific shortcuts.

Let me know what you’d prefer for next steps: keep this patch as a
detoast-local refactor, or respin it to align more directly with a shared
decoded external-pointer interface in the direction of the 8‑byte work.

[1]: /messages/by-id/CAN-LCVNsE4x0k11ZRWvU4ySTbe98fwA16qzV7p8dxogWnD5Jng@mail.gmail.com
/messages/by-id/CAN-LCVNsE4x0k11ZRWvU4ySTbe98fwA16qzV7p8dxogWnD5Jng@mail.gmail.com

Thanks,
Dharin

On Tue, Dec 30, 2025 at 12:46 AM Michael Paquier <michael@paquier.xyz>
wrote:

Show quoted text

On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:

The goal is to centralize “how do we interpret an external datum?” so

that

detoast code paths don’t have to reason directly about va_extinfo

encoding

vs payload layout details. This is intended as groundwork for a follow-up
patch adding a new vartag-based method (e.g., zstd) without scattering
special cases in detoast paths.

+static bool
+decode_external_toast_pointer(const struct varlena *attr,
+                                                  DecodedExternalToast
*decoded)
[...]
+typedef enum ToastDecompressMethod
+{
+       TOAST_DECOMP_NONE = 0,
+       TOAST_DECOMP_PGLZ = 1,
+       TOAST_DECOMP_LZ4 = 2
+} ToastDecompressMethod;
+
+typedef struct DecodedExternalToast
+{
+       Oid                     toastrelid;
+       Oid                     valueid;
+       uint32          rawsize;                /* Decompressed size; for
future methods without tcinfo */
+       uint32          extsize;                /* On-disk payload size */
+       ToastDecompressMethod method;
+       uint8           flags;
+} DecodedExternalToast;

Yeah, honestly this is a layer I have been thinking about as well as
one option, but contrary to you I have been focusing on putting that
into varatt.h, with the exception of the value being an Oid8. I think
that you have an interesting point in focusing your implementation to
be stored in the detoast part, though. I'd need to spend a bit more
time to see the result this would lead at with the larger 8-byte issue
in mind, but this is something that would come at no real cost as it
has no function pointer redirection compared to what I was first
envisioning on the other thread. That's especially true if it makes
the CompressionId business easier to mold around when adding a new
vartag.

Why HAS_TCINFO?
- Previously, “is compressed?” was used as a proxy for whether the

external

payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
captures payload layout, which is distinct from whether the value is
compressed. This separation is needed for future methods that may store
external compressed payloads without tcinfo.

It is possible to model the on-memory data as we want. This
suggestion would be OK with some flags.
--
Michael

#18Michael Paquier
michael@paquier.xyz
In reply to: Dharin Shah (#17)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Wed, Dec 31, 2025 at 04:02:24PM +0100, Dharin Shah wrote:

Let me know what you’d prefer for next steps: keep this patch as a
detoast-local refactor, or respin it to align more directly with a shared
decoded external-pointer interface in the direction of the 8‑byte work.

My apologies for the rather long silence on this thread. As the next
step of this project, I am going to put my hands of what you are
suggesting here, and see how I can align it with the 64-bit toast
value patch:
/messages/by-id/CAOj6k6dEVi0NvLjMLDhyrJS_n_NZO5D_OU89AO1u53u6NCDDwQ@mail.gmail.com

What I am pretty sure about at this stage is that there is little love
for the patch set I have sent on the other thread where I have been
using pointer redirections for the TOAST function calls with
callbacks (perhaps I'll be able to apply some of the renaming patches
anyway, nobody would scream at me for that), at least nobody has put a
+1 on it or just ignored it, so this approach feels dead to me.  What
you are suggesting upthread, though, is a direction I'd like to dig
into and this comes down to how I can unify what you want to do for
zstd and what I want to do with Oid8.  Perhaps that you are right and
that it is just simpler to invest on an interface in the detoast code,
but I still see that there is nothing done for the logical decoding or
amcheck code paths, which is something my other patch is able to deal
with transparently.
--
Michael
#19Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#18)
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

On Tue, Jan 20, 2026 at 03:45:12PM +0900, Michael Paquier wrote:

What I am pretty sure about at this stage is that there is little love
for the patch set I have sent on the other thread where I have been
using pointer redirections for the TOAST function calls with
callbacks (perhaps I'll be able to apply some of the renaming patches
anyway, nobody would scream at me for that), at least nobody has put a
+1 on it or just ignored it, so this approach feels dead to me. What
you are suggesting upthread, though, is a direction I'd like to dig
into and this comes down to how I can unify what you want to do for
zstd and what I want to do with Oid8. Perhaps that you are right and
that it is just simpler to invest on an interface in the detoast code,
but I still see that there is nothing done for the logical decoding or
amcheck code paths, which is something my other patch is able to deal
with transparently.

(Added Nikhil in CC.)

While looking at what could be achieved for this release, I have
hacked on a patch that removes the limitation of the GUC enum for
defailt_toast_compression, which is currently limited at 4 values due
to the two bits allocated in the varlenas, and finished with the
attached.

This relies on the concept of a compression method "registry", saved
in toast_compression.c, that includes 4 fields for each compression
method supported in TOAST:
- The attcompression attribute in catalogs, a char.
- The compression method name.
- The compression GUC enum value.
- The on-disk varlena value, now moved to varatt.h.

What do you think about this approach?
--
Michael

Attachments:

0001-Refactor-code-logic-around-TOAST-compression.patchtext/plain; charset=us-asciiDownload+176-86