pg_basebackup: removed an unnecessary use of memset in FindStreamingStart
Hi Hackers,
When I read the FindStreamingStart function in pg_receivewal.c, I discovered an unnecessary use of memset. I removed it, which improves performance without affecting functionality.
The following is the detailed analysis of the reasons:
1. LZ4F_decompress will fully overwrite the output buffer:
When out_size is passed as an input parameter, it denotes the size of the output buffer (outbuf). The decompression operation writes the decompressed data to outbuf. Upon function return, out_size is updated to reflect the actual number of bytes written. Notably, even in cases of partial decompression, data is written starting from the initial position of outbuf.
2. Performance Overhead
In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since these memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumption of CPU resources.
Regards,
Yang Yuanzhuo
Attachments:
v1-0001-Removed-an-unnecessary-use-of-memset-in-FindStrea.patch (application/octet-stream, +0 -2)
On Feb 25, 2026, at 14:31, yangyz <1197620467@qq.com> wrote:
When I read the FindStreamingStart function in pg_receivewal.c, I discovered an unnecessary use of memset. So I removed it, optimizing the performance without affecting its functionality.
Looking at the code snippet:
```
while (readp < readend)
{
    size_t      out_size = LZ4_CHUNK_SZ;
    size_t      read_size = readend - readp;

    memset(outbuf, 0, LZ4_CHUNK_SZ);
    status = LZ4F_decompress(ctx, outbuf, &out_size,
                             readp, &read_size, &dec_opt);
    if (LZ4F_isError(status))
        pg_fatal("could not decompress file \"%s\": %s",
                 fullpath,
                 LZ4F_getErrorName(status));

    readp += read_size;
    uncompressed_size += out_size;
}
```
It’s trying to locate the start position, and the decoded bytes are not consumed (they’re effectively discarded). Given that LZ4F_decompress() reports the produced size via out_size, zeroing the whole output buffer beforehand doesn’t seem necessary here. Since this happens inside the loop, the extra memset() just amplifies the overhead.
Also, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero the output buffer at all:
```
while (readp < readend)
{
    size_t      out_size = DEFAULT_IO_BUFFER_SIZE;
    size_t      read_size = readend - readp;

    status = LZ4F_decompress(ctx, outbuf, &out_size,
                             readp, &read_size, &dec_opt);
    if (LZ4F_isError(status))
        pg_fatal("could not decompress: %s",
                 LZ4F_getErrorName(status));

    ahwrite(outbuf, 1, out_size, AH);
    readp += read_size;
}
```
So +1 for removing the memset.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 25 Feb 2026, at 07:31, yangyz <1197620467@qq.com> wrote:
2. Performance Overhead
In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since these memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumption of CPU resources.
When proposing a performance improvement it's important to provide some level
of benchmarks to show the improvement. Is removing this memset noticeable?
--
Daniel Gustafsson
On Feb 25, 2026, at 18:21, Daniel Gustafsson <daniel@yesql.se> wrote:
When proposing a performance improvement it's important to provide some level
of benchmarks to show the improvement. Is removing this memset noticeable?
I don’t think this patch is about performance. Although removing the memset might save a few CPU cycles, the real benefit seems to be cleanup and consistency. The memset appears unnecessary, and similar functions don’t use it, so I think this change mainly improves maintainability.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 25 Feb 2026, at 13:41, Chao Li <li.evan.chao@gmail.com> wrote:
I don’t think this patch is about performance. Although removing the memset might save a few CPU cycles, the real benefit seems to be cleanup and consistency. The memset appears unnecessary, and similar functions don’t use it, so I think this change mainly improves maintainability.
I would argue the opposite, clearing a buffer before passing it to an external
library function writing to it seems the right thing to do unless it can be
proven to regress performance too much. Also, "appears unnecessary" doesn't
instill enough confidence to perform a change IMO.
--
Daniel Gustafsson
On Feb 25, 2026, at 21:10, Daniel Gustafsson <daniel@yesql.se> wrote:
I would argue the opposite, clearing a buffer before passing it to an external
library function writing to it seems the right thing to do unless it can be
proven to regress performance too much. Also, "appears unnecessary" doesn't
instill enough confidence to perform a change IMO.
As I pointed out earlier, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero out the output buffer:
```
while (readp < readend)
{
    size_t      out_size = DEFAULT_IO_BUFFER_SIZE;
    size_t      read_size = readend - readp;

    status = LZ4F_decompress(ctx, outbuf, &out_size,
                             readp, &read_size, &dec_opt);
    if (LZ4F_isError(status))
        pg_fatal("could not decompress: %s",
                 LZ4F_getErrorName(status));

    ahwrite(outbuf, 1, out_size, AH);
    readp += read_size;
}
```
Do you think we should add a memset there? There are a couple more callers of LZ4F_decompress that don’t zero out the output buffer.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Feb 25, 2026, at 21:54, Chao Li <li.evan.chao@gmail.com> wrote:
As I pointed out earlier, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero out the output buffer.
Do you think we should add a memset there? There are a couple more callers of LZ4F_decompress that don’t zero out the output buffer.
Adding the original author to see if he still remembers what the intention of the memset was.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/