[PATCH] Fix memory leak in pgstat_progress_parallel_incr_param()

Started by Baji Shaik18 days ago6 messageshackers
Jump to latest
#1Baji Shaik
baji.pgdev@gmail.com

Hi,

While running parallel vacuum with track_cost_delay_timing=on, I
noticed memory in the parallel worker process keeps growing
proportionally to vacuum runtime, and is never reclaimed until the
worker exits.

I think pgstat_progress_parallel_incr_param() (backend_progress.c)
leaks memory on every call from a parallel worker.

The suspected block:

static StringInfoData progress_message;
initStringInfo(&progress_message); /* palloc -> A */
pq_beginmessage(&progress_message, PqMsg_Progress);
/* pq_beginmessage internally calls initStringInfo again ->
palloc -> B, A is orphaned */
pq_sendint32(&progress_message, index);
pq_sendint64(&progress_message, incr);
pq_endmessage(&progress_message); /* pfree(B), A leaked
*/

So one palloc(~1 kB) leaks per call, into the per-worker context.

This is an oversight of f1889729dd3 ("Add new parallel message type
to progress reporting"); track_cost_delay_timing just makes it more
visible. With that GUC enabled, a long-running parallel vacuum leaks
megabytes per worker (~232 MB observed in a 43-min vacuum at default
settings on a 15M-row, 30-index workload).

The proposed fix is in the attached patch which does a one-time init of the
static
buffer, then pq_beginmessage_reuse() / pq_endmessage_reuse() so the
buffer is allocated once and reused.

All 245 regression tests pass.

Thanks,
Baji Shaik.

Attachments:

0001-Fix-memory-leak-in-pgstat_progress_parallel_incr_par.patchapplication/octet-stream; name=0001-Fix-memory-leak-in-pgstat_progress_parallel_incr_par.patchDownload+15-6
#2Tristan Partin
tristan@partin.io
In reply to: Baji Shaik (#1)
Re: [PATCH] Fix memory leak in pgstat_progress_parallel_incr_param()

On Fri Jun 5, 2026 at 5:44 PM UTC, Baji Shaik wrote:

Hi,

While running parallel vacuum with track_cost_delay_timing=on, I
noticed memory in the parallel worker process keeps growing
proportionally to vacuum runtime, and is never reclaimed until the
worker exits.

I think pgstat_progress_parallel_incr_param() (backend_progress.c)
leaks memory on every call from a parallel worker.

The suspected block:

static StringInfoData progress_message;
initStringInfo(&progress_message); /* palloc -> A */
pq_beginmessage(&progress_message, PqMsg_Progress);
/* pq_beginmessage internally calls initStringInfo again ->
palloc -> B, A is orphaned */
pq_sendint32(&progress_message, index);
pq_sendint64(&progress_message, incr);
pq_endmessage(&progress_message); /* pfree(B), A leaked
*/

So one palloc(~1 kB) leaks per call, into the per-worker context.

This is an oversight of f1889729dd3 ("Add new parallel message type
to progress reporting"); track_cost_delay_timing just makes it more
visible. With that GUC enabled, a long-running parallel vacuum leaks
megabytes per worker (~232 MB observed in a 43-min vacuum at default
settings on a 15M-row, 30-index workload).

The proposed fix is in the attached patch which does a one-time init of the
static
buffer, then pq_beginmessage_reuse() / pq_endmessage_reuse() so the
buffer is allocated once and reused.

Hey Baji,

This looks pretty reasonable to me. Nice find. Did you think about
keeping the code path as is and just removing the first initStringInfo()
call? Removing the allocation per progress message seems like a good
idea to me. Maybe you could separate this change into two patches. One
to fix the memory leak and another to remove the allocation per message.
A committer could then decide for themselves if the second patch is
worth committing.

--
Tristan Partin
PostgreSQL Contributors Team
AWS (https://aws.amazon.com)

#3Baji Shaik
baji.pgdev@gmail.com
In reply to: Tristan Partin (#2)
Re: [PATCH] Fix memory leak in pgstat_progress_parallel_incr_param()

On Fri, Jun 5, 2026 at 4:29 PM Tristan Partin <tristan@partin.io> wrote:

This looks pretty reasonable to me. Nice find. Did you think about
keeping the code path as is and just removing the first initStringInfo()
call? Removing the allocation per progress message seems like a good
idea to me. Maybe you could separate this change into two patches. One
to fix the memory leak and another to remove the allocation per message.
A committer could then decide for themselves if the second patch is
worth committing.

Thank you for the review. I hadn't thought of splitting it, but it's
a good idea. I see f1889729dd3 itself is in PG17+, so the bug fix is
a backport candidate independently of the PG19 caller bb8dff9995f.

Patches attached:

0001: drop the redundant initStringInfo() call (backport candidate)
0002: allocate the static buffer once per process via
pq_beginmessage_reuse / pq_endmessage_reuse, to avoid the
per-call allocation (master only)

Thanks,
Baji Shaik

Attachments:

0001-Fix-memory-leak-in-pgstat_progress_parallel_incr_par.patchapplication/octet-stream; name=0001-Fix-memory-leak-in-pgstat_progress_parallel_incr_par.patchDownload+0-3
0002-Allocate-progress-message-buffer-once-per-parallel-w.patchapplication/octet-stream; name=0002-Allocate-progress-message-buffer-once-per-parallel-w.patchDownload+15-4
#4Sami Imseih
samimseih@gmail.com
In reply to: Baji Shaik (#3)
Re: [PATCH] Fix memory leak in pgstat_progress_parallel_incr_param()

Hi,

good find, and thanks for the patches!

0001: drop the redundant initStringInfo() call (backport candidate)

This one looks like an obvious fix to me.

0002: allocate the static buffer once per process via
pq_beginmessage_reuse / pq_endmessage_reuse, to avoid the
per-call allocation (master only)

I am less convinced this will have any benefits for the additional complexity.
The callers of pgstat_progress_parallel_incr_param() are not frequent enough
to make a measurable difference here. cost delay reporting for parallel workers
is throttled by PARALLEL_VACUUM_DELAY_REPORT_INTERVAL_NS and
index progress reporting does not happen very frequently either.

--
Sami Imseih
Amazon Web Services (AWS)

#5Baji Shaik
baji.pgdev@gmail.com
In reply to: Sami Imseih (#4)
Re: [PATCH] Fix memory leak in pgstat_progress_parallel_incr_param()

On Sat, Jun 6, 2026 at 9:55 AM Sami Imseih <samimseih@gmail.com> wrote:

0002: allocate the static buffer once per process via
pq_beginmessage_reuse / pq_endmessage_reuse, to avoid the
per-call allocation (master only)

I am less convinced this will have any benefits for the additional
complexity.
The callers of pgstat_progress_parallel_incr_param() are not frequent
enough
to make a measurable difference here. cost delay reporting for parallel
workers
is throttled by PARALLEL_VACUUM_DELAY_REPORT_INTERVAL_NS and
index progress reporting does not happen very frequently either.

Thanks for the review, Sami.

Agreed, at those call frequencies the per-call palloc is not worth
the added complexity. Dropping 0002.

v3 attached (just 0001, unchanged from previous).

Thanks,
Baji Shaik.

Attachments:

v3-0001-Fix-memory-leak-in-pgstat_progress_parallel_incr_.patchapplication/octet-stream; name=v3-0001-Fix-memory-leak-in-pgstat_progress_parallel_incr_.patchDownload+0-3
#6Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#4)
Re: [PATCH] Fix memory leak in pgstat_progress_parallel_incr_param()

On Sat, Jun 06, 2026 at 09:55:05AM -0500, Sami Imseih wrote:

Hi,

good find, and thanks for the patches!

0001: drop the redundant initStringInfo() call (backport candidate)

This one looks like an obvious fix to me.

And clearly something that should be backpatched down to v17, or we
could pile a lot of memory depending on how many calls we do in a
worker, with more piling over time.

Will process, thanks!

0002: allocate the static buffer once per process via
pq_beginmessage_reuse / pq_endmessage_reuse, to avoid the
per-call allocation (master only)

I am less convinced this will have any benefits for the additional complexity.
The callers of pgstat_progress_parallel_incr_param() are not frequent enough
to make a measurable difference here. cost delay reporting for parallel workers
is throttled by PARALLEL_VACUUM_DELAY_REPORT_INTERVAL_NS and
index progress reporting does not happen very frequently either.

I doubt that 0002 is worth doing, particularly seeing the code paths
where this is called.
--
Michael