Is pg_basebackup Performance Limited by Files Page Cache?

Started by Alexandru Lazarev | about 1 year ago | 3 messages | general
#1 Alexandru Lazarev
alexandru.lazarev@gmail.com

Hi everyone,
I'm a Java developer, so please bear with me if my description isn't
perfect. I hope the community can help me understand this issue
better.

I have PostgreSQL 15 running in a Kubernetes Pod using the Zalando
Spilo image, managed by Patroni. The container has a memory limit of
16 GB, and the database size is 40 GB (data dir on disk). When I
reinitialize a replica with Patroni, it runs 'basebackup.sh', which in
turn runs the 'pg_basebackup' command:
"/usr/lib/postgresql/15/bin/pg_basebackup
--pgdata=/home/postgres/pgdata/pgroot/data -X none
--dbname='dbname=postgres user=standby host=<Leader IP> port=5432' "

I noticed that the first 16 GB are copied quickly, but the process
slows down significantly afterward (that is, once the copied size is
approximately equal to the container's memory limit). Increasing
the container memory limit to 32 GB showed a similar pattern: the
first 32 GB were copied quickly, then it slowed down.

Running the command manually:
"/usr/lib/postgresql/15/bin/pg_basebackup
--pgdata=/home/postgres/pgdata/pgroot/dummy_dir -X none
--dbname='dbname=postgres user=standby host=<Leader IP> port=5432' "
reproduced the issue. Once the container's total memory usage
(including page cache, and in particular inactive file pages, which
make up the lion's share of the container's memory) reaches the limit,
'pg_basebackup' slows down. Other disk write operations (e.g.,
generating and copying large files with 'dd') are not affected by the
memory limit and remain fast.

When 'pg_basebackup' is slow and the container memory limit is
reached, I tried discarding the page cache with: "sync && echo 3 >
/proc/sys/vm/drop_caches" and this made 'pg_basebackup' fast again.
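
To watch how close the container is to its limit while 'pg_basebackup'
runs, something like the following could be used. It is only a sketch:
it assumes cgroup v2 is mounted at /sys/fs/cgroup inside the container,
and the function name 'cgroup_mem_summary' is made up for illustration.

```shell
#!/bin/sh
# Print usage, limit, and inactive page-cache bytes for a cgroup v2 directory.
# Assumption: the directory holds the standard cgroup v2 memory files
# (memory.current, memory.max, memory.stat). Defaults to /sys/fs/cgroup.
cgroup_mem_summary() {
    dir="${1:-/sys/fs/cgroup}"
    current=$(cat "$dir/memory.current")   # bytes currently charged to the cgroup
    max=$(cat "$dir/memory.max")           # limit in bytes, or "max" if unlimited
    # memory.stat is a list of "key value" lines; pick the inactive_file entry.
    inactive_file=$(awk '$1 == "inactive_file" { print $2 }' "$dir/memory.stat")
    echo "current=$current max=$max inactive_file=$inactive_file"
}
```

Calling it in a loop, e.g. "while sleep 5; do cgroup_mem_summary; done",
should show 'current' approaching 'max' with 'inactive_file' making up
most of it at the point where the copy slows down.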

Is 'pg_basebackup' performance limited by the file page cache? Do you
need any additional information from me? Any suggestions?
Thanks for your help!

Regards,
AlexL

#2 Alexandru Lazarev
alexandru.lazarev@gmail.com
In reply to: Alexandru Lazarev (#1)
Re: Is pg_basebackup Performance Limited by Files Page Cache?

In my case the culprit is the Linux kernel (Oracle Linux 9.2: with RHCK
5.14 the issue exists; after switching to the UEK 5.15 kernel it
disappears). I ran various tests with the PG container and with rsync,
even outside a container (launching rsync in a cgroup v2).
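
The message does not say how rsync was placed into the cgroup. One
possible way to run a command under a cgroup v2 memory limit without a
container, assuming a systemd-based host and sufficient privileges, is
'systemd-run' with the 'MemoryMax' property. The wrapper name
'run_with_mem_limit' and the DRY_RUN switch are made up for
illustration.

```shell
#!/bin/sh
# Run a command in a transient systemd scope with a memory limit.
# Assumption: systemd manages cgroup v2 on this host.
# Usage: run_with_mem_limit <limit> <command> [args...]
run_with_mem_limit() {
    limit="$1"
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be executed.
        echo systemd-run --scope -p "MemoryMax=$limit" "$@"
    else
        systemd-run --scope -p "MemoryMax=$limit" "$@"
    fi
}
```

For example, "run_with_mem_limit 16G rsync -a /path/to/src/ /path/to/dst/"
(both paths are placeholders) would copy a directory under a 16 GB limit,
which mirrors the container setup from the first message.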

So in my case I guess it is now a question for the Linux kernel community.
More details: https://www.reddit.com/r/linuxquestions/s/ADScolw76C

AlexL

On Wed, Mar 26, 2025, 21:16 kaido vaikla <kaido.vaikla@gmail.com> wrote:

Hi,
Did you get any feedback?
I have suspected something similar.

br
Kaido

#3 Alexandru Lazarev
alexandru.lazarev@gmail.com
In reply to: Alexandru Lazarev (#2)
Re: Is pg_basebackup Performance Limited by Files Page Cache?

I'll answer my own question here:
It looks like a bug in the Oracle Linux (OL) 9.2 RHCK kernel
"kernel-5.14.0-284.11.1" (see the table:
https://docs.oracle.com/en/operating-systems/oracle-linux/9/boot/oracle_linux9_kernel_version_matrix.html)

Switching to the UEK kernel of the same OL version (9.2) fixed the
issue, but UEK is not supported by some software (e.g. Vertica DB), so
I tried updating the RHCK kernel instead, installing only the kernel
and its dependencies from OL 9.4 (kernel-5.14.0-427) and from OL 9.5
(kernel-5.14.0-503.11.1). Both fixed the issue, with one remark: the
RHCK kernel from OL 9.5 is suspected of having some hardware
compatibility issues (under investigation).
Some more details are discussed here: "Performance Degradation with
rsync in container or cgroupv2 with MEM limit on Oracle Linux 9.2
(RHCK 5.14 vs. UEK 5.15)"
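
To check which kernel flavor a host is actually running, a small helper
along these lines might be enough. It is a sketch resting on one
assumption: that UEK release strings contain "uek" and anything else on
Oracle Linux is RHCK. The function name 'kernel_flavor' is made up for
illustration.

```shell
#!/bin/sh
# Classify a kernel release string as UEK or RHCK.
# Assumption: UEK releases contain "uek" in the string; everything else
# is treated as RHCK, which only makes sense on Oracle Linux.
# With no argument, the running kernel (uname -r) is checked.
kernel_flavor() {
    release="${1:-$(uname -r)}"
    case "$release" in
        *uek*) echo "UEK" ;;
        *)     echo "RHCK" ;;
    esac
}
```

Running "kernel_flavor" with no argument on the affected node, next to
"uname -r" itself, would show whether it is still on the 5.14.0-284.11.1
RHCK kernel or on one of the newer kernels mentioned above.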
