[PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

Started by SATYANARAYANA NARLAPURAMabout 2 months ago10 messageshackers
Jump to latest
#1SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com

Hi hackers,

restore_tuple() in repack.c uses SET_VARSIZE() to reconstruct the varlena
header when
reading back external attributes from the spill file. In this process,
looks like the flag
SET_VARSIZE_COMPRESSED is silently lost. Because of this, when REPACK
CONCURRENTLY
run any concurrently updated column whose value was TOAST-compressed ends
up with raw
compressed bytes behind an "uncompressed" header returning garbled data on
subsequent reads.
It appears that existing tests are using random chars which are
uncompressable.

Please find the
attached 0001-Fix-restore_tuple-losing-varlena-compression-flag.patch to
fix this.
Additionally I updated the existing repack_toast test to include the
scenario I was talking about.

Thanks,
Satya

Attachments:

0001-Fix-restore_tuple-losing-varlena-compression-flag.patchapplication/octet-stream; name=0001-Fix-restore_tuple-losing-varlena-compression-flag.patchDownload+4-1
0002-Add-compressed-TOAST-test-to-repack_toast.patchapplication/octet-stream; name=0002-Add-compressed-TOAST-test-to-repack_toast.patchDownload+116-0
#2Chao Li
li.evan.chao@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

On Apr 16, 2026, at 14:13, SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:

Hi hackers,

restore_tuple() in repack.c uses SET_VARSIZE() to reconstruct the varlena header when
reading back external attributes from the spill file. In this process, looks like the flag
SET_VARSIZE_COMPRESSED is silently lost. Because of this, when REPACK CONCURRENTLY
run any concurrently updated column whose value was TOAST-compressed ends up with raw
compressed bytes behind an "uncompressed" header returning garbled data on subsequent reads.
It appears that existing tests are using random chars which are uncompressable.

Please find the attached 0001-Fix-restore_tuple-losing-varlena-compression-flag.patch to fix this.
Additionally I updated the existing repack_toast test to include the scenario I was talking about.

Thanks,
Satya
<0001-Fix-restore_tuple-losing-varlena-compression-flag.patch><0002-Add-compressed-TOAST-test-to-repack_toast.patch>

I managed to reproduce the bug manually, and confirmed your fix to work for me. The repro is not simple, so I won’t put it here. If somebody is interested in it, then I can provide.

I didn’t review the test in 0002, I guess we don’t have to add the test because once fixed, the bug won’t be there anymore, thus it’s not worthy extending the test time.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#3Antonin Houska
ah@cybertec.at
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:

restore_tuple() in repack.c uses SET_VARSIZE() to reconstruct the varlena header when
reading back external attributes from the spill file. In this process, looks like the flag
SET_VARSIZE_COMPRESSED is silently lost. Because of this, when REPACK CONCURRENTLY
run any concurrently updated column whose value was TOAST-compressed ends up with raw
compressed bytes behind an "uncompressed" header returning garbled data on subsequent reads.
It appears that existing tests are using random chars which are uncompressable.

Please find the attached 0001-Fix-restore_tuple-losing-varlena-compression-flag.patch to fix this.
Additionally I updated the existing repack_toast test to include the scenario I was talking about.

Good catch, thanks!

I'd slightly prefer to fix it w/o checking the varlena type, as
attached. However, your test fails to reproduce the issue here, so I'm not
able to verify the fix. I'll take a closer look early next week.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

fix_restore_tuple.difftext/x-diffDownload+1-1
#4SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Antonin Houska (#3)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

Hi

On Fri, Apr 17, 2026 at 8:45 AM Antonin Houska <ah@cybertec.at> wrote:

SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:

restore_tuple() in repack.c uses SET_VARSIZE() to reconstruct the

varlena header when

reading back external attributes from the spill file. In this process,

looks like the flag

SET_VARSIZE_COMPRESSED is silently lost. Because of this, when REPACK

CONCURRENTLY

run any concurrently updated column whose value was TOAST-compressed

ends up with raw

compressed bytes behind an "uncompressed" header returning garbled data

on subsequent reads.

It appears that existing tests are using random chars which are

uncompressable.

Please find the attached

0001-Fix-restore_tuple-losing-varlena-compression-flag.patch to fix this.

Additionally I updated the existing repack_toast test to include the

scenario I was talking about.

Good catch, thanks!

I'd slightly prefer to fix it w/o checking the varlena type, as
attached. However, your test fails to reproduce the issue here, so I'm not
able to verify the fix. I'll take a closer look early next week.

I started with that but tried to follow the existing code pattern. This
LGTM.
Please add a comment as well.

Show quoted text

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#5Michael Paquier
michael@paquier.xyz
In reply to: SATYANARAYANA NARLAPURAM (#4)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

On Fri, Apr 17, 2026 at 10:40:39AM -0700, SATYANARAYANA NARLAPURAM wrote:

I started with that but tried to follow the existing code pattern. This
LGTM.
Please add a comment as well.

Hmm. I was reading restore_tuple(), and could it make sense to expand
a bit more the tests so as more types of varlena pointers could be
checked in this routine? I am taking about MAIN, EXTENDED and
EXTERNAL, so as we could check more patterns with in-line
[non-]compressed, and external [non-]compressed, counting for the four
different possible patterns that could lead to repacked data. See for
example strings.sql for such tests, that could be used as base
inspiration.
--
Michael

#6Antonin Houska
ah@cybertec.at
In reply to: Michael Paquier (#5)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 17, 2026 at 10:40:39AM -0700, SATYANARAYANA NARLAPURAM wrote:

I started with that but tried to follow the existing code pattern. This
LGTM.
Please add a comment as well.

Hmm. I was reading restore_tuple(), and could it make sense to expand
a bit more the tests so as more types of varlena pointers could be
checked in this routine? I am taking about MAIN, EXTENDED and
EXTERNAL, so as we could check more patterns with in-line
[non-]compressed, and external [non-]compressed, counting for the four
different possible patterns that could lead to repacked data. See for
example strings.sql for such tests, that could be used as base
inspiration.

ok, this is a patch to enhance the tests. It also simplifies
gen_compressible() a bit and adjusts it so it is (hopefully) more reliable in
generating compressible values. Now it can be used to reproduce the fix [1]/messages/by-id/52301.1776440752@localhost.

[1]: /messages/by-id/52301.1776440752@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

0001-Enhance-tests-for-TOAST-processing-by-REPACK.patchtext/x-diffDownload+63-20
#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#6)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

Hi Antonin,

On 2026-04-20, Antonin Houska wrote:

ok, this is a patch to enhance the tests. It also simplifies
gen_compressible() a bit and adjusts it so it is (hopefully) more reliable in
generating compressible values. Now it can be used to reproduce the fix [1].

[1] /messages/by-id/52301.1776440752@localhost

Thanks, I think it looks good, but unfortunately I don't see any problems when running this test without your fix (I was assuming that the test would fail). I didn't immediately understand why. Do you see any failures?

Regards,

--
Álvaro Herrera

#8Antonin Houska
ah@cybertec.at
In reply to: Alvaro Herrera (#7)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

Álvaro Herrera <alvherre@kurilemu.de> wrote:

ok, this is a patch to enhance the tests. It also simplifies
gen_compressible() a bit and adjusts it so it is (hopefully) more reliable in
generating compressible values. Now it can be used to reproduce the fix [1].

[1] /messages/by-id/52301.1776440752@localhost

Thanks, I think it looks good, but unfortunately I don't see any problems
when running this test without your fix (I was assuming that the test would
fail). I didn't immediately understand why. Do you see any failures?

The enhanced tests should reproduce it (please let me know if they dont):

/messages/by-id/44015.1776685220@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#9Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#8)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

Antonin Houska <ah@cybertec.at> wrote:

Álvaro Herrera <alvherre@kurilemu.de> wrote:

ok, this is a patch to enhance the tests. It also simplifies
gen_compressible() a bit and adjusts it so it is (hopefully) more reliable in
generating compressible values. Now it can be used to reproduce the fix [1].

[1] /messages/by-id/52301.1776440752@localhost

Thanks, I think it looks good, but unfortunately I don't see any problems
when running this test without your fix (I was assuming that the test would
fail). I didn't immediately understand why. Do you see any failures?

The enhanced tests should reproduce it (please let me know if they dont):

/messages/by-id/44015.1776685220@localhost

I see you actually refer to the correct tests. I see no failure now as
well. I'll try to make the tests more stable. Sorry for the confusion.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonin Houska (#9)
Re: [PATCH] Compressed TOAST data corruption with REPACK CONCURRENTLY

On 2026-Apr-26, Antonin Houska wrote:

I see you actually refer to the correct tests. I see no failure now as
well. I'll try to make the tests more stable. Sorry for the confusion.

So, the reason this wasn't failing for me is that I had default TOAST
compression lz4, and the test case wasn't reaching size for being made
external, as opposed to pglz. I changed the test to specify pglz
compression explicitly. I also added some dropped columns and one more
toast column. (On my machine, this expanded test fails for several
rows, not just one, when the fix is deactivated.) I fear the buildfarm
may end up showing other failures, but there's no way to know other than
to try.

Thanks, Satya, for reporting the bug.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)