Assistance Needed: Issue with pg_upgrade and --link option

Started by Pradeep Kumar over 2 years ago, 7 messages
#1 Pradeep Kumar
spradeepkumar29@gmail.com

Dear Postgres Hackers,

I hope this email finds you well. I am currently facing an issue while
performing an upgrade using the pg_upgrade utility with the --link option.
I was under the impression that the --link option would create hard links
between the old and new cluster's data files, but it appears that the
entire old cluster data was copied to the new cluster, resulting in a
significant increase in the new cluster's size.

Here are the details of my scenario:
- PostgreSQL version: [Old Version: Postgres 11.4 | New Version: Postgres
14.0]
- Command used for pg_upgrade:
[~/pg_upgrade_testing/postgres_14/bin/pg_upgrade -b
~/pg_upgrade_testing/postgres_11.4/bin -B
~/pg_upgrade_testing/postgres_14/bin -d
~/pg_upgrade_testing/postgres_11.4/replica_db2 -D
~/pg_upgrade_testing/postgres_14/new_pg -r -k]
- Paths to the old and new data directories:
[~/pg_upgrade_testing/postgres_11.4/replica_db2]
[~/pg_upgrade_testing/postgres_14/new_pg]
- OS information: [Ubuntu 22.04.2 linux]

However, after executing the pg_upgrade command with the --link option, I
observed that the size of the new cluster is much larger than expected. I
expected the --link option to create hard links instead of duplicating the
data files.

I am seeking assistance to understand the following:
1. Is my understanding of the --link option correct?
2. Is there any additional configuration or step required to properly
utilize the --link option?
3. Are there any limitations or considerations specific to my PostgreSQL
version or file system that I should be aware of?

Any guidance, clarification, or troubleshooting steps you can provide would
be greatly appreciated. I want to ensure that I am using the --link option
correctly and optimizing the upgrade process.

Best regards,
Pradeep Kumar

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Pradeep Kumar (#1)
Re: Assistance Needed: Issue with pg_upgrade and --link option

On Wed, 2023-06-28 at 11:49 +0530, Pradeep Kumar wrote:

> I was under the impression that the --link option would create hard links between the
> old and new cluster's data files, but it appears that the entire old cluster data was
> copied to the new cluster, resulting in a significant increase in the new cluster's size.

Please provide some numbers, ideally

du -sk <old_data_directory> <new_data_directory>

Yours,
Laurenz Albe

#3 Peter Eisentraut
peter@eisentraut.org
In reply to: Laurenz Albe (#2)
Re: Assistance Needed: Issue with pg_upgrade and --link option

On 28.06.23 08:24, Laurenz Albe wrote:

> On Wed, 2023-06-28 at 11:49 +0530, Pradeep Kumar wrote:
>
> > I was under the impression that the --link option would create hard links between the
> > old and new cluster's data files, but it appears that the entire old cluster data was
> > copied to the new cluster, resulting in a significant increase in the new cluster's size.
>
> Please provide some numbers, ideally
>
>   du -sk <old_data_directory> <new_data_directory>

I don't think you can observe the effects of the --link option this way.
It would just give you the full size count for both directories, even
though they point to the same underlying inodes.

To see the effect, you could perhaps use `df` to see how much overall
disk space the upgrade step eats up.
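
Whatever the directory totals report, the links themselves can be inspected
per file: two hard-linked paths share one inode number and show a link count
of at least 2. A minimal sketch, assuming GNU coreutils `stat` (as on the
Ubuntu system described above). The scratch files `old_file` and `new_file`
are hypothetical stand-ins, not real cluster files:

```shell
#!/bin/sh
# Demo in a scratch directory; no PostgreSQL cluster is touched.
tmp=$(mktemp -d)
printf 'relation data\n' > "$tmp/old_file"

# Create a hard link, which is what pg_upgrade --link does for relation files.
ln "$tmp/old_file" "$tmp/new_file"

# GNU stat format sequences: %i = inode number, %h = number of hard links.
old_inode=$(stat -c %i "$tmp/old_file")
new_inode=$(stat -c %i "$tmp/new_file")
link_count=$(stat -c %h "$tmp/new_file")

echo "old inode: $old_inode, new inode: $new_inode, links: $link_count"

rm -rf "$tmp"
```

On a real upgraded cluster, something like
`find <new_data_directory> -type f -links +1` should list the files that
have more than one link, i.e. the ones shared with the old cluster.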

#4 Pradeep Kumar
spradeepkumar29@gmail.com
In reply to: Laurenz Albe (#2)
Re: Assistance Needed: Issue with pg_upgrade and --link option

Sure,
du -sk ~/pradeep_test/pg_upgrade_testing/postgres_11.4/master ~/pradeep_test/pg_upgrade_testing/postgres_14/new_pg
11224524 /home/test/pradeep_test/pg_upgrade_testing/postgres_11.4/master
41952 /home/test/pradeep_test/pg_upgrade_testing/postgres_14/new_pg

#5 Pradeep Kumar
spradeepkumar29@gmail.com
In reply to: Peter Eisentraut (#3)
Re: Assistance Needed: Issue with pg_upgrade and --link option

These are my numbers.
df ~/pradeep_test/pg_upgrade_testing/postgres_11.4/master ~/pradeep_test/pg_upgrade_testing/postgres_14/new_pg
Filesystem                  1K-blocks      Used Available Use% Mounted on
/dev/mapper/nvme0n1p4_crypt 375161856 102253040 270335920  28% /home
/dev/mapper/nvme0n1p4_crypt 375161856 102253040 270335920  28% /home
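
Both directories report the same device and mount point, which matters
because hard links cannot cross filesystems, so link mode needs the old and
new data directories on the same one. A single `df` snapshot cannot show how
much space the upgrade itself consumed (that would need a reading before and
after), but the same-filesystem precondition can be checked directly. A
minimal sketch, assuming GNU coreutils `stat`. The helper name
`same_filesystem` and the scratch directories are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when both paths live on the same filesystem.
same_filesystem() {
    # GNU stat: %d prints the device number of the filesystem holding the path.
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Scratch directories stand in for the old and new data directories.
tmp=$(mktemp -d)
mkdir "$tmp/old_data" "$tmp/new_data"

if same_filesystem "$tmp/old_data" "$tmp/new_data"; then
    result="same filesystem: hard links are possible"
else
    result="different filesystems: hard links are not possible"
fi
echo "$result"

rm -rf "$tmp"
```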

#6 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Pradeep Kumar (#4)
Re: Assistance Needed: Issue with pg_upgrade and --link option

On Wed, 2023-06-28 at 15:40 +0530, Pradeep Kumar wrote:

> > > I was under the impression that the --link option would create hard links between the
> > > old and new cluster's data files, but it appears that the entire old cluster data was
> > > copied to the new cluster, resulting in a significant increase in the new cluster's size.
> >
> > Please provide some numbers, ideally
> >
> >   du -sk <old_data_directory> <new_data_directory>
>
> du -sk ~/pradeep_test/pg_upgrade_testing/postgres_11.4/master ~/pradeep_test/pg_upgrade_testing/postgres_14/new_pg
> 11224524 /home/test/pradeep_test/pg_upgrade_testing/postgres_11.4/master
> 41952 /home/test/pradeep_test/pg_upgrade_testing/postgres_14/new_pg

That looks fine. The files exist only once, and the 41MB that only exist in
the new data directory are catalog data and other stuff that is different
on the new cluster.

Yours,
Laurenz Albe

#7 Peter Eisentraut
peter@eisentraut.org
In reply to: Laurenz Albe (#6)
Re: Assistance Needed: Issue with pg_upgrade and --link option

On 28.06.23 12:46, Laurenz Albe wrote:

> On Wed, 2023-06-28 at 15:40 +0530, Pradeep Kumar wrote:
>
> > > > I was under the impression that the --link option would create hard links between the
> > > > old and new cluster's data files, but it appears that the entire old cluster data was
> > > > copied to the new cluster, resulting in a significant increase in the new cluster's size.
> > >
> > > Please provide some numbers, ideally
> > >
> > >   du -sk <old_data_directory> <new_data_directory>
> >
> > du -sk ~/pradeep_test/pg_upgrade_testing/postgres_11.4/master ~/pradeep_test/pg_upgrade_testing/postgres_14/new_pg
> > 11224524 /home/test/pradeep_test/pg_upgrade_testing/postgres_11.4/master
> > 41952 /home/test/pradeep_test/pg_upgrade_testing/postgres_14/new_pg
>
> That looks fine. The files exist only once, and the 41MB that only exist in
> the new data directory are catalog data and other stuff that is different
> on the new cluster.

Interesting, so it actually does count files with multiple hardlinks
only once.
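
The behaviour discussed above can be reproduced on a small scale. GNU `du`
counts a file with several hard links only once per invocation, so the
directory named first absorbs the shared file and the second shows only what
is unique to it. The sketch below assumes GNU coreutils on Linux. The
directory names `old` and `new` and the 1 MiB test file are hypothetical
stand-ins for the two clusters:

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"

# Write 1 MiB of data into the "old" directory, then hard-link it into "new".
head -c 1048576 /dev/urandom > "$tmp/old/datafile"
ln "$tmp/old/datafile" "$tmp/new/datafile"
sync  # flush so that block counts are up to date on delayed-allocation filesystems

# Both directories in one invocation: the shared file is charged to "old" only.
together_new=$(du -sk "$tmp/old" "$tmp/new" | awk 'NR==2 {print $1}')

# "new" measured on its own: the shared file is counted in full.
separate_new=$(du -sk "$tmp/new" | awk '{print $1}')

echo "new (measured together with old): ${together_new}K"
echo "new (measured alone):             ${separate_new}K"

rm -rf "$tmp"
```

With GNU `du`, adding `-l` (`--count-links`) makes it count hard-linked files
every time they are encountered, which would show the full size for both
directories.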