astreamer_lz4: fix bug of output pointer advancement in decompressor

Started by Chao Li · about 1 month ago · 7 messages · pgsql-hackers
#1 Chao Li <li.evan.chao@gmail.com>

Hi,

There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().

Looking at the code snippet (omitting irrelevant code):
```
ret = LZ4F_decompress(mystreamer->dctx,
                      next_out, &out_size,
                      next_in, &read_size, NULL);

mystreamer->bytes_written += out_size;  // <== bytes_written already bumped

/*
 * If output buffer is full then forward the content to next streamer
 * and update the output buffer.
 */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
    astreamer_content(mystreamer->base.bbs_next, member,
                      mystreamer->base.bbs_buffer.data,
                      mystreamer->base.bbs_buffer.maxlen,
                      context);

    avail_out = mystreamer->base.bbs_buffer.maxlen;
    mystreamer->bytes_written = 0;
    next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
}
else
{
    avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
    next_out += mystreamer->bytes_written;  // <== The bug is here
}
```

To advance next_out, the code uses mystreamer->bytes_written. However, bytes_written has already been increased by out_size in the current iteration. As a result, next_out is advanced by the cumulative number of bytes written so far, instead of just the number of bytes produced in this iteration. Effectively, the pointer movement is double-counting the previous progress.

When I tried to design a test case to trigger this bug, I found it is actually not easy to hit in normal execution. Tracing into the function, I found that the default output buffer size is 1024 bytes, and in practice LZ4F_decompress() tends to fill the output buffer in one or two iterations. As a result, the problematic else branch is either not reached, or reached in a case where bytes_written == out_size, so the incorrect pointer increment does not manifest.

To reliably trigger the bug, I used a small hack: instead of letting LZ4F_decompress() use the full available out_size, I artificially limited out_size before the call, forcing LZ4F_decompress() to require one more iteration to fill the buffer. See the attached nocfbot_hack.diff for the hack.

With that hack in place, the bug can be reproduced using the following procedure:

1. initdb
2. Set "wal_level = replica" in postgresql.conf
3. Restart the instance
4. Create a database
5. Generate some WAL logs by psql
```
CREATE TABLE t AS SELECT generate_series(1, 100000) AS id;
CHECKPOINT;
```
6. Create a backup
```
% rm -rf /tmp/bkup_lz4
% pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
```
7. Verify the backup
```
% pg_verifybackup -F t -n /tmp/bkup_lz4
pg_verifybackup: error: zsh: trace trap pg_verifybackup -F t -n /tmp/bkup_lz4
```

With the fix applied (plus the hack), step 7 succeeds:
```
% pg_verifybackup -F t -n /tmp/bkup_lz4
backup successfully verified
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

- nocfbot_hack.diff (+8 −1)
- v1-0001-astreamer_lz4-fix-output-pointer-advancement-in-d.patch (+2 −3)
#2 Chao Li <li.evan.chao@gmail.com>
In reply to: Chao Li (#1)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

On Mar 2, 2026, at 17:17, Chao Li <li.evan.chao@gmail.com> wrote:

[...]

Added to CF for tracking https://commitfest.postgresql.org/patch/6561/

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#3 Xuneng Zhou <xunengzhou@gmail.com>
In reply to: Chao Li (#2)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

Hi,

On Tue, Mar 3, 2026 at 11:27 AM Chao Li <li.evan.chao@gmail.com> wrote:

[...]

Added to CF for tracking https://commitfest.postgresql.org/patch/6561/

I agree this is a logical issue. We should increment next_out by the
delta value (out_size) rather than the cumulative value
(mystreamer->bytes_written); otherwise, it will leave holes in the
output buffer. The proposed fix LGTM.

--
Best,
Xuneng

#4 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Chao Li (#1)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

Chao Li <li.evan.chao@gmail.com> writes:

There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().

Yup, that's clearly wrong. I failed to reproduce a crash with the
test hack you suggested, but no matter. Pushed with some cosmetic
editorialization.

The track record of all this client-side-compression logic is
really quite awful :-(. Another thing that I'm looking askance
at is the error handling, or rather lack of it:

```
ret = LZ4F_decompress(mystreamer->dctx,
                      next_out, &out_size,
                      next_in, &read_size, NULL);

if (LZ4F_isError(ret))
    pg_log_error("could not decompress data: %s",
                 LZ4F_getErrorName(ret));

... continue on our merry way ...
```

I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.

regards, tom lane

#5 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#4)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

I wrote:

I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.

After looking through fe_utils, pg_dump, pg_basebackup, and
pg_verifybackup, I found the attached places that seem to
need cleanup. There are a couple other places where we
are not treating failures as fatal, but those seem intentional,
eg not fatal'ing on close() failure for an input file.

regards, tom lane

Attachments:

- v1-exit-after-pg_log_error.patch (+31 −24)
#6 Chao Li <li.evan.chao@gmail.com>
In reply to: Tom Lane (#4)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

On Mar 5, 2026, at 01:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chao Li <li.evan.chao@gmail.com> writes:

There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().

Yup, that's clearly wrong. I failed to reproduce a crash with the
test hack you suggested, but no matter. Pushed with some cosmetic
editorialization.

Hmm... I just tried again. Applying nocfbot_hack.diff to an old branch, I can easily reproduce the bug:
```
chaol@ChaodeMacBook-Air cndb % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
2026-03-05 09:01:53.461 CST [72896] LOG: checkpoint starting: fast force wait time
2026-03-05 09:01:53.466 CST [72896] LOG: checkpoint complete: fast force wait time: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=29655 kB; lsn=0/14000080, redo lsn=0/14000028
chaol@ChaodeMacBook-Air cndb % pg_verifybackup -F t -n /tmp/bkup_lz4
pg_verifybackup: error: zsh: trace trap pg_verifybackup -F t -n /tmp/bkup_lz4
```

Then I switched to the latest master, again with nocfbot_hack.diff applied:
```
chaol@ChaodeMacBook-Air postgresql % git diff
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index e196fcc81e5..35fd564df9a 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -331,10 +331,15 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
                size_t          ret,
                                        read_size,
                                        out_size;
+               size_t hack_out_size;

read_size = avail_in;
out_size = avail_out;

+               if (out_size > 5)
+                       hack_out_size = out_size - 5;
+               else
+                       hack_out_size = out_size;
                /*
                 * This call decompresses the data starting at next_in and generates
                 * the output data starting at next_out. It expects the caller to
@@ -349,13 +354,15 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
                 * to out_size respectively.
                 */
                ret = LZ4F_decompress(mystreamer->dctx,
-                                                         next_out, &out_size,
+                                                         next_out, &hack_out_size,
                                                          next_in, &read_size, NULL);

if (LZ4F_isError(ret))
pg_log_error("could not decompress data: %s",
LZ4F_getErrorName(ret));

+               out_size = hack_out_size;
+
                /* Update input buffer based on number of bytes consumed */
                avail_in -= read_size;
                next_in += read_size;
```

Now, the bug goes away:
```
chaol@ChaodeMacBook-Air cndb % rm -rf /tmp/bkup_lz4
chaol@ChaodeMacBook-Air cndb % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
2026-03-05 09:05:57.632 CST [72896] LOG: checkpoint starting: fast force wait
2026-03-05 09:05:57.634 CST [72896] LOG: checkpoint complete: fast force wait: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB; lsn=0/16000080, redo lsn=0/16000028
chaol@ChaodeMacBook-Air cndb % pg_verifybackup -F t -n /tmp/bkup_lz4
backup successfully verified
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#7 Chao Li <li.evan.chao@gmail.com>
In reply to: Tom Lane (#5)
Re: astreamer_lz4: fix bug of output pointer advancement in decompressor

On Thu, Mar 5, 2026 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.

After looking through fe_utils, pg_dump, pg_basebackup, and
pg_verifybackup, I found the attached places that seem to
need cleanup. There are a couple other places where we
are not treating failures as fatal, but those seem intentional,
eg not fatal'ing on close() failure for an input file.

regards, tom lane

I also thought pg_log_error behaves the same way as elog(ERROR). Noted now.

The cleanup looks good to me. I also did a broader search and didn't find
any more similar places to clean up.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/