astreamer_lz4: fix bug of output pointer advancement in decompressor

Started by Chao Li · about 1 month ago · 7 messages · pgsql-hackers
#1 Chao Li <li.evan.chao@gmail.com>

Hi,

There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().

Looking at the code snippet (omitting irrelevant code):
```
ret = LZ4F_decompress(mystreamer->dctx,
                      next_out, &out_size,
                      next_in, &read_size, NULL);

mystreamer->bytes_written += out_size;  // <== bytes_written already bumped

/*
 * If output buffer is full then forward the content to next streamer
 * and update the output buffer.
 */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
    astreamer_content(mystreamer->base.bbs_next, member,
                      mystreamer->base.bbs_buffer.data,
                      mystreamer->base.bbs_buffer.maxlen,
                      context);

    avail_out = mystreamer->base.bbs_buffer.maxlen;
    mystreamer->bytes_written = 0;
    next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
}
else
{
    avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
    next_out += mystreamer->bytes_written;  // <== The bug is here
}
```

To advance next_out, the code uses mystreamer->bytes_written. However, bytes_written has already been increased by out_size in the current iteration. As a result, next_out is advanced by the cumulative number of bytes written so far, instead of just the number of bytes produced in this iteration. Effectively, the pointer movement is double-counting the previous progress.

When I tried to design a test case to trigger this bug, I found it is actually not easy to hit in normal execution. Tracing into the function, I found that the default output buffer size is 1024 bytes, and in practice LZ4F_decompress() tends to fill the output buffer in one or two iterations. As a result, the problematic else branch is either not reached, or reached in a case where bytes_written == out_size, so the incorrect pointer increment does not manifest.

To reliably trigger the bug, I used a small hack: instead of letting LZ4F_decompress() use the full available out_size, I artificially limited out_size before the call, forcing LZ4F_decompress() to require one more iteration to fill the buffer. See the attached nocfbot_hack.diff for the hack.

With that hack in place, the bug can be reproduced using the following procedure:

1. initdb
2. Set "wal_level = replica" in postgresql.conf
3. Restart the instance
4. Create a database
5. Generate some WAL logs by psql
```
CREATE TABLE t AS SELECT generate_series(1, 100000) AS id;
CHECKPOINT;
```
6. Create a backup
```
% rm -rf /tmp/bkup_lz4
% pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
```
7. Verify the backup
```
% pg_verifybackup -F t -n /tmp/bkup_lz4
pg_verifybackup: error: zsh: trace trap pg_verifybackup -F t -n /tmp/bkup_lz4
```

With the fix applied (plus the hack), step 7 succeeds:
```
% pg_verifybackup -F t -n /tmp/bkup_lz4
backup successfully verified
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

- nocfbot_hack.diff (+8 −1)
- v1-0001-astreamer_lz4-fix-output-pointer-advancement-in-d.patch (+2 −3)
#2 Chao Li <li.evan.chao@gmail.com>
In reply to: Chao Li (#1)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

On Mar 2, 2026, at 17:17, Chao Li <li.evan.chao@gmail.com> wrote:

[...]

Added to CF for tracking https://commitfest.postgresql.org/patch/6561/

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#3 Xuneng Zhou <xunengzhou@gmail.com>
In reply to: Chao Li (#2)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

Hi,

On Tue, Mar 3, 2026 at 11:27 AM Chao Li <li.evan.chao@gmail.com> wrote:

[...]

Added to CF for tracking https://commitfest.postgresql.org/patch/6561/

I agree this is a logical issue. We should increment next_out by the
delta value (out_size) rather than the cumulative value
(mystreamer->bytes_written); otherwise, it will leave holes in the
output buffer. The proposed fix LGTM.

--
Best,
Xuneng

#4 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Chao Li (#1)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

Chao Li <li.evan.chao@gmail.com> writes:

There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().

Yup, that's clearly wrong. I failed to reproduce a crash with the
test hack you suggested, but no matter. Pushed with some cosmetic
editorialization.

The track record of all this client-side-compression logic is
really quite awful :-(. Another thing that I'm looking askance
at is the error handling, or rather lack of it:

```
ret = LZ4F_decompress(mystreamer->dctx,
                      next_out, &out_size,
                      next_in, &read_size, NULL);

if (LZ4F_isError(ret))
    pg_log_error("could not decompress data: %s",
                 LZ4F_getErrorName(ret));

... continue on our merry way ...
```

I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.

regards, tom lane

#5 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#4)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

I wrote:

I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.

After looking through fe_utils, pg_dump, pg_basebackup, and
pg_verifybackup, I found the attached places that seem to
need cleanup. There are a couple other places where we
are not treating failures as fatal, but those seem intentional,
eg not fatal'ing on close() failure for an input file.

regards, tom lane

Attachments:

- v1-exit-after-pg_log_error.patch (+31 −24)
#6 Chao Li <li.evan.chao@gmail.com>
In reply to: Tom Lane (#4)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

On Mar 5, 2026, at 01:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chao Li <li.evan.chao@gmail.com> writes:

There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().

Yup, that's clearly wrong. I failed to reproduce a crash with the
test hack you suggested, but no matter. Pushed with some cosmetic
editorialization.

Hmm... I just tried again. Applying nocfbot_hack.diff to an old branch, I can easily reproduce the bug:
```
chaol@ChaodeMacBook-Air cndb % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
2026-03-05 09:01:53.461 CST [72896] LOG: checkpoint starting: fast force wait time
2026-03-05 09:01:53.466 CST [72896] LOG: checkpoint complete: fast force wait time: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=29655 kB; lsn=0/14000080, redo lsn=0/14000028
chaol@ChaodeMacBook-Air cndb % pg_verifybackup -F t -n /tmp/bkup_lz4
pg_verifybackup: error: zsh: trace trap pg_verifybackup -F t -n /tmp/bkup_lz4
```

Then I switched to the latest master, again with nocfbot_hack.diff applied:
```
chaol@ChaodeMacBook-Air postgresql % git diff
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index e196fcc81e5..35fd564df9a 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -331,10 +331,15 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
                size_t          ret,
                                        read_size,
                                        out_size;
+               size_t hack_out_size;

read_size = avail_in;
out_size = avail_out;

+               if (out_size > 5)
+                       hack_out_size = out_size - 5;
+               else
+                       hack_out_size = out_size;
                /*
                 * This call decompresses the data starting at next_in and generates
                 * the output data starting at next_out. It expects the caller to
@@ -349,13 +354,15 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
                 * to out_size respectively.
                 */
                ret = LZ4F_decompress(mystreamer->dctx,
-                                                         next_out, &out_size,
+                                                         next_out, &hack_out_size,
                                                          next_in, &read_size, NULL);

if (LZ4F_isError(ret))
pg_log_error("could not decompress data: %s",
LZ4F_getErrorName(ret));

+               out_size = hack_out_size;
+
                /* Update input buffer based on number of bytes consumed */
                avail_in -= read_size;
                next_in += read_size;
```

Now, the bug goes away:
```
chaol@ChaodeMacBook-Air cndb % rm -rf /tmp/bkup_lz4
chaol@ChaodeMacBook-Air cndb % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
2026-03-05 09:05:57.632 CST [72896] LOG: checkpoint starting: fast force wait
2026-03-05 09:05:57.634 CST [72896] LOG: checkpoint complete: fast force wait: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB; lsn=0/16000080, redo lsn=0/16000028
chaol@ChaodeMacBook-Air cndb % pg_verifybackup -F t -n /tmp/bkup_lz4
backup successfully verified
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#7 Chao Li <li.evan.chao@gmail.com>
In reply to: Tom Lane (#5)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

On Thu, Mar 5, 2026 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.

After looking through fe_utils, pg_dump, pg_basebackup, and
pg_verifybackup, I found the attached places that seem to
need cleanup. There are a couple other places where we
are not treating failures as fatal, but those seem intentional,
eg not fatal'ing on close() failure for an input file.

regards, tom lane

I also thought pg_log_error behaves the same way as elog(ERROR). Noted now.

The cleanup looks good to me. I also did a broader search and didn't find
any more similar places to clean up.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/