BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18

Started by PG Bug reporting form · 2 months ago · 3 messages · bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 19400
Logged by: Raphael
Email address: raphael@atmotrack.fr
PostgreSQL version: 18.1
Operating system: Debian 13
Description:

Hello,

I recently migrated my cluster with 3 dedicated servers to a new cluster. I
was running on PG12 and I am now on PG18.1.
I noticed increasing memory usage on all of my 3 nodes, until at some
point there is no memory left and Patroni crashes on the leader, leaving the
cluster with no available primary.
The cluster is a Data Warehouse type using TimescaleDB, ingesting approx. 1M
time-series records a day.
It appears that the memory leak is affecting both the checkpointer and
startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
I never had such an issue on the old cluster with PG12, and the server
configuration and cluster usage are the same (apart from the PG upgrade).

SYMPTOMS:
- Checkpointer process grows to 5.6GB RSS after 24 hours
- Startup process on replicas grows to 3.9GB RSS
- Memory growth rate: approximately 160-200MB per hour
- Eventually causes out-of-memory conditions

CONFIGURATION:
- PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue persists
- Platform: Debian 13
- TimescaleDB: 2.23.0
- Deployment: 3-node Patroni cluster with streaming replication
- WAL level: logical
- Hot standby enabled

SYSTEM RESOURCES:
RAM: 32GB
CPU: 12 cores, Intel(R) Xeon(R) E-2386G @ 3.50GHz

KEY SETTINGS:
- wal_level: logical
- hot_standby: on
- max_wal_senders: 20
- max_replication_slots: 20
- wal_keep_size: 1GB
- shared_buffers: 8GB

WAL STATISTICS (over 7 days):
- Total WAL generated: 2.3TB (approximately 31GB/day)
- Replication lag: 0 bytes (replicas are caught up)
- No long-running transactions

MEMORY STATE AFTER 24 HOURS:
On primary:
postgres checkpointer: 3.9GB RSS

On replicas:
postgres checkpointer: 5.6GB RSS
postgres startup recovering: 3.9GB RSS <-- This is abnormal

TESTING PERFORMED:
- Disabled wal_log_hints: No effect
- Reduced timescaledb.max_background_workers from 16 to 4: No effect
- Increased checkpoint_timeout to 30min: No effect
- Increased max_wal_size to 4GB: No effect
- Checkpoint frequency is normal (5.99% requested vs. timed checkpoints)
- Upgraded from PostgreSQL 18.0 to 18.1: Same issue persists

The startup process consuming 3.9GB on replicas is particularly concerning,
as this process should not accumulate memory during normal WAL replay.

Here is some output 4 days after starting Postgres, on the primary:
# ps aux --sort=-%mem | head -20
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 38762 0.0 24.9 8648868 8182260 ? Ss Feb06 2:42 postgres: atmo_data: checkpointer
postgres 38763 0.0 8.3 8646992 2746188 ? Ss Feb06 1:08 postgres: atmo_data: background writer
postgres 38759 0.0 4.5 8646728 1501344 ? Ss Feb06 0:24 postgres: atmo_data: io worker 0
postgres 38760 0.0 2.9 8646728 976132 ? Ss Feb06 0:13 postgres: atmo_data: io worker 1
postgres 38761 0.0 2.3 8646728 755948 ? Ss Feb06 0:07 postgres: atmo_data: io worker 2

The same output on a replica:
# ps aux --sort=-%mem | head -20
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 2706473 0.0 25.2 8653612 8291200 ? Ss Feb06 5:09 postgres: atmo_data: checkpointer
postgres 2706475 0.2 23.8 8656016 7802324 ? Ss Feb06 14:41 postgres: atmo_data: startup recovering 0000001E0000024500000075

Replica:
=# SELECT
name,
pg_size_pretty(total_bytes) as total,
pg_size_pretty(used_bytes) as used,
pg_size_pretty(free_bytes) as free
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 30;
name | total | used | free
----------------------------------+------------+------------+------------
CacheMemoryContext | 512 kB | 483 kB | 29 kB
TopMemoryContext | 214 kB | 192 kB | 23 kB
MessageContext | 128 kB | 39 kB | 89 kB
Timezones | 102 kB | 99 kB | 2672 bytes
GUCMemoryContext | 64 kB | 55 kB | 9248 bytes
WAL record construction | 49 kB | 43 kB | 6400 bytes
ExecutorState | 48 kB | 39 kB | 9104 bytes
TupleSort main | 32 kB | 25 kB | 6848 bytes
ExprContext | 32 kB | 26 kB | 5832 bytes
TransactionAbortContext | 32 kB | 240 bytes | 32 kB
tuplestore tuples | 32 kB | 19 kB | 13 kB
GUC hash table | 32 kB | 24 kB | 8600 bytes
smgr relation table | 32 kB | 15 kB | 17 kB
Type information cache | 24 kB | 21 kB | 2672 bytes
Custom Scan Methods | 16 kB | 9664 bytes | 6720 bytes
PgStat Pending | 16 kB | 9112 bytes | 7272 bytes
pg_get_backend_memory_contexts | 16 kB | 10 kB | 5664 bytes
Relcache by OID | 16 kB | 12 kB | 3608 bytes
PgStat Shared Ref Hash | 9264 bytes | 8552 bytes | 712 bytes
Record information cache | 8192 bytes | 6552 bytes | 1640 bytes
RowDescriptionContext | 8192 bytes | 1272 bytes | 6920 bytes
Hypertable cache | 8192 bytes | 504 bytes | 7688 bytes
ErrorContext | 8192 bytes | 240 bytes | 7952 bytes
Rendezvous variable hash | 8192 bytes | 7576 bytes | 616 bytes
search_path processing cache | 8192 bytes | 2552 bytes | 5640 bytes
Cache pins | 8192 bytes | 240 bytes | 7952 bytes
fn telemetry local function hash | 8192 bytes | 6168 bytes | 2024 bytes
TopTransactionContext | 8192 bytes | 416 bytes | 7776 bytes
MdSmgr | 8192 bytes | 352 bytes | 7840 bytes
PrivateRefCount | 8192 bytes | 5520 bytes | 2672 bytes

My not-so-good workaround is to perform a daily restart of the cluster
members to avoid breaking Patroni on the primary.

Please let me know if you need any additional information to investigate
this issue.

#2 Andres Freund
andres@anarazel.de
In reply to: PG Bug reporting form (#1)
Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18

Hi,

On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:

> I recently migrated my cluster with 3 dedicated servers to a new cluster. I
> was running on PG12 and I am now on PG18.1.
> I noticed an increasing memory usage on all of my 3 node, until at some
> point there is no memory left and patroni crashes on the leader, leaving the
> cluster with no available primary.
> [...]
> MEMORY STATE AFTER 24 HOURS:
> On primary:
> postgres checkpointer: 3.9GB RSS
>
> On replicas:
> postgres checkpointer: 5.6GB RSS
> postgres startup recovering: 3.9GB RSS <-- This is abnormal

The RSS slowly increasing towards shared_buffers is normal if you're not using
huge_pages. The OS only counts pages in shared memory as part of RSS once a
page has been used in the process. Over time the checkpointer process touches
more and more of shared_buffers, thus increasing the RSS.

You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
actually shared memory.

To show this, here's a PS for a new backend:

ps:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
andres 2544694 0.0 0.0 8719956 25744 ? Ss 14:55 0:00 postgres: dev assert: andres postgres [local] idle

and then after reading in a 1.3GB relation:

andres 2544694 1.7 2.2 8720972 1403576 ? Ss 14:55 0:00 postgres: dev assert: andres postgres [local] idle

So you can see that RSS increased proportionally with the amount of touched
data.

Whereas with pmap:

pmap -d -p 2544694|tail -n 1
mapped: 8721924K writeable/private: 5196K shared: 8646284K

I think you would need to monitor the real memory usage of various processes
to know why you're OOMing.
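A minimal sketch of such monitoring, assuming a Linux host with procps `pmap`: the writeable/private figure in the `pmap -d` summary line is the process's non-shared memory, and is the number worth sampling over time. The parsing is shown against a sample summary line (taken from the output above) so it runs standalone:

```shell
# Sample summary line, as printed by: pmap -d -p "$PID" | tail -n 1
line='mapped: 8721924K writeable/private: 5196K shared: 8646284K'

# Field 4 is the writeable/private figure; strip the trailing K to get KB.
priv_kb=$(printf '%s\n' "$line" | awk '{sub(/K$/, "", $4); print $4}')
echo "$priv_kb"
```

In practice one would loop this over each postgres PID (e.g. from `pgrep -u postgres`) out of cron and graph the results; a process that is genuinely leaking shows growth here, not just in RSS.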

You can use pg_log_backend_memory_contexts() to get the memory usage
information of backend processes.
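The function takes the target PID and writes the dump to the server log at LOG level; the "Grand total" line is the figure to track over time. A sketch of extracting it, run here against a sample log line in the format PostgreSQL emits:

```shell
# A "Grand total" line as pg_log_backend_memory_contexts() writes it to the log:
line='LOG:  Grand total: 404296 bytes in 26 blocks; 101440 free (122 chunks); 302856 used'

# Pull out the total byte count.
total=$(printf '%s\n' "$line" | sed -n 's/.*Grand total: \([0-9][0-9]*\) bytes.*/\1/p')
echo "$total"
```

To trigger the dump for, say, the checkpointer, call `SELECT pg_log_backend_memory_contexts(<pid>);` from psql as a superuser, then grep the server log.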

Greetings,

Andres Freund

#3 Raphaël Perissat
raphael@atmotrack.fr
In reply to: Andres Freund (#2)
Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18

Hi, and thanks for the explanation.

Indeed, pmap seems to confirm that the private memory usage is normal:
# pmap -d -p $(pgrep -f "postgres.*checkpointer") | tail -n 1
mapped: 8647256K writeable/private: 2632K shared: 8574584K

I used pg_log_backend_memory_contexts() with the PID of the checkpointer
process, and I can't see much on this side either:
2026-02-11 08:58:13.794 UTC [1988622] LOG: logging memory contexts of PID 1988622
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 1; TopMemoryContext: 61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; smgr relation table: 32768 total in 3 blocks; 16904 free (9 chunks); 15864 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Checkpointer: 24576 total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; LOCALLOCK hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; WAL record construction: 50200 total in 2 blocks; 6400 free (0 chunks); 43800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; PrivateRefCount: 8192 total in 1 blocks; 2672 free (0 chunks); 5520 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; MdSmgr: 8192 total in 1 blocks; 7952 free (62 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Pending ops context: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 3; Pending Ops Table: 16384 total in 2 blocks; 6712 free (3 chunks); 9672 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Rendezvous variable hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; GUCMemoryContext: 32768 total in 3 blocks; 3264 free (19 chunks); 29504 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 3; GUC hash table: 32768 total in 3 blocks; 10664 free (6 chunks); 22104 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: level: 2; Timezones: 104112 total in 2 blocks; 2672 free (0 chunks); 101440 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: level: 2; ErrorContext: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: Grand total: 404296 bytes in 26 blocks; 101440 free (122 chunks); 302856 used

I monitor my server metrics in ELK (Grafana-like) and I can clearly see the
memory usage growing by approx. 100MB/hour until it reaches 8GB for the
postgres process, causing Patroni to crash on the primary.
The fact that this memory "leak" appears on both the primary and the 2
replicas makes me think that it is not caused by some ingest delay / index
creation, but maybe I'm wrong.

I restarted the 3 nodes yesterday and here is the output of the watch
command on the primary:
               total        used        free      shared  buff/cache   available
Mem:           31988        5861         670        4571       30530        26126
Swap:           4095          55        4040

The free column is only showing 670MB already, and I can see it going down
in real time like a countdown.
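(Parsing note, as a sketch: in procps `free`, page cache counts toward buff/cache, which is reclaimable, so the "available" column, not "free", estimates memory actually available before the OOM killer becomes a risk. Extracting both from the Mem: line above:)

```shell
# Mem: line from the free output above (values in MiB).
line='Mem: 31988 5861 670 4571 30530 26126'
set -- $line           # $1=Mem: $2=total $3=used $4=free $5=shared $6=buff/cache $7=available
free_mib=$4
avail_mib=$7
echo "free=${free_mib}MiB available=${avail_mib}MiB"
```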

Based on this output on the replicas:
:~# ps aux --sort=-%mem | head -20
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 62338 0.1 10.6 8652016 3492940 ? Ss Feb10 1:04 postgres: atmo_data: checkpointer
postgres 62340 0.2 9.8 8655736 3227356 ? Ss Feb10 2:36 postgres: atmo_data: startup recovering 0000001E00000246000000C9

Couldn't the startup recovery process be causing this? I run this command
on the same replica from time to time and I can clearly see that the %MEM
used by those 2 processes is growing over time.

Thanks for your help.
