pg_combinebackup: incorrect size of VM fork after combine
(new thread)
On Wed, Jan 7, 2026 at 8:20 PM Oleg Tkachenko <oatkachenko@gmail.com> wrote:
Hello Robert,
I checked the VM fork file and found that its incremental version has a
wrong block number in the header:

```
xxd -l 12 INCREMENTAL.16384_vm
0d1f aed3 0100 0000 0000 0200   <--- 131072 blocks (1 GB)
                    ^^^^ ^^^^
```

This value can only come from the WAL summaries, so I checked them too.
One of the summary files contains:

```
TS 1663, DB 5, REL 16384, FORK main: limit 131073
TS 1663, DB 5, REL 16384, FORK vm: limit 131073
TS 1663, DB 5, REL 16384, FORK vm: block 4
```

Both forks have the same limit, which looks wrong.
So I checked the WAL files to see what really happened with the VM fork.
I did not find any "truncate" records for the VM file. I only found this
record for the main fork (actually, the fork isn't mentioned at all):

```
rmgr: Storage len (rec/tot): 46/46, tx: 759, lsn: 0/4600D318,
prev 0/4600B2C8, desc: TRUNCATE base/5/16384 to 131073 blocks flags 7
```

This suggests that the WAL summarizer may be mixing up information between
relation forks.
$subject was found while discussing another bug in the incremental backup
feature [1].
The issue occurs when a relation spanning multiple segments (i.e., larger
than 1 GB) is truncated down to a single segment (or a smaller size) by
VACUUM. This generates an SMGR_TRUNCATE_ALL WAL record. When a subsequent
incremental backup is taken and then processed by pg_combinebackup, the
visibility map (VM) fork in the combined backup is reconstructed with an
incorrect, "insanely high" size -- equal to that of the main fork.
I have attached a small reproducer, made by modifying an existing test case
so that it fails and the file size can be checked. Apply it to the master
branch and run:

```
cd src/bin/pg_combinebackup/
make check PROVE_TESTS='t/011_ib_truncation.pl'
```

The backups used for testing will be in the
"tmp_check/t_011_ib_truncation_primary_data/backup/" directory, and the
combined backup result in "tmp_check/t_011_ib_truncation_node2_data/pgdata/".
If you inspect the relation forks in the final combined backup, you
will see the VM size discrepancy (16384 is the test relation oid):
```
ll -h tmp_check/t_011_ib_truncation_node2_data/pgdata/base/5/ | grep 16384
-rw-------. 1 amul 1.0G Feb 19 17:10 16384      <----- main fork file
-rw-------. 1 amul 8.0K Feb 19 17:10 16384.1
-rw-------. 1 amul 280K Feb 19 17:10 16384_fsm
-rw-------. 1 amul 1.0G Feb 19 17:10 16384_vm   <----- vm fork file (1 GB)
```
The reason, as Oleg explained in the same thread [1], is that the
summary file recorded an incorrect size limit for the VM fork due to a
truncation WAL record with the SMGR_TRUNCATE_ALL flag.
I think the fix will be to correct the wal summary entry that records
an incorrect truncation limit for the VM fork. Attached are the
patches: 0001 is a refactoring patch that moves the necessary macro
definitions from visibilitymap.c to visibilitymap.h to correctly
calculate the VM fork limit recorded in the wal summary file, and 0002
provides the actual fix.
[1] /messages/by-id/6897DAF7-B699-41BF-A6FB-B818FCFFD585@gmail.com
--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
Attachments:
- repro-incremental_backup.patch.no-cfbot
- v1-0001-Refactor-Expose-visibility-map-mapping-macros-in-.patch
- v1-0002-Fix-incorrect-VM-fork-truncation-limit-in-WAL-sum.patch
On Tue, Mar 3, 2026 at 11:50 PM Amul Sul <sulamul@gmail.com> wrote:
I think the fix will be to correct the wal summary entry that records
an incorrect truncation limit for the VM fork. Attached are the
patches: 0001 is a refactoring patch that moves the necessary macro
definitions from visibilitymap.c to visibilitymap.h to correctly
calculate the VM fork limit recorded in the wal summary file, and 0002
provides the actual fix.
I don't like exposing those macros to the rest of the system, because
they're not named very generically.
Also, I don't think that using HEAPBLK_TO_MAPBLOCK() directly is
correct. That macro returns the visibility map page that contains the
VM bit for the indicated heap block number, but that's not what we
need here. For example, if we truncate the heap to a length of 1,
HEAPBLK_TO_MAPBLOCK() will return 0, but if we set the limit block for
the VM to zero, that means the VM was truncated away entirely, which
in this scenario wouldn't be the case. So, instead of computing too
large a value for the VM's limit block, I think your patch would cause
us to compute too small a value for the VM's limit block.
The question we need to answer here is: if we remove all heap blocks
>= N, then for what value of M do we remove all visibility map blocks
>= M? The answer is that if N (the heap block limit) is a multiple of
HEAPBLOCKS_PER_PAGE, then it's just N/HEAPBLOCKS_PER_PAGE. Otherwise,
it's one more, because we need an extra VM page as soon as it's
necessary to use at least one bit on that page. So, basically, we need
to divide by HEAPBLOCKS_PER_PAGE and round up.
So here's my attempt at a patch. Instead of exposing
HEAPBLK_TO_MAPBLOCK() etc., I added a new helper function,
visibilitymap_truncation_length(). It's just a wrapper around a new
macro HEAPBLK_TO_MAPBLOCK_LIMIT(), but I think it's a good idea to
keep "visibilitymap" in the names of the exposed symbols. Also, I
added a test case, and I have verified that the test case passes with
the fix and fails without it.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
- v2-0001-Prevent-restore-of-incremental-backup-from-bloati.patch
On Sat, Mar 7, 2026 at 12:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
So here's my attempt at a patch. Instead of exposing
HEAPBLK_TO_MAPBLOCK() etc., I added a new helper function,
visibilitymap_truncation_length(). It's just a wrapper around a new
macro HEAPBLK_TO_MAPBLOCK_LIMIT(), but I think it's a good idea to
keep "visibilitymap" in the names of the exposed symbols. Also, I
added a test case, and I have verified that the test case passes with
the fix and fails without it.
Understood, the patch looks good to me, thanks.
Regards,
Amul
On Mon, Mar 9, 2026 at 2:20 AM Amul Sul <sulamul@gmail.com> wrote:
Understood, the patch looks good to me, thanks.
Committed.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 06/03/2026 20:55, Robert Haas wrote:
So here's my attempt at a patch. Instead of exposing
HEAPBLK_TO_MAPBLOCK() etc., I added a new helper function,
visibilitymap_truncation_length(). It's just a wrapper around a new
macro HEAPBLK_TO_MAPBLOCK_LIMIT(), but I think it's a good idea to
keep "visibilitymap" in the names of the exposed symbols. Also, I
added a test case, and I have verified that the test case passes with
the fix and fails without it.
For 'master', I wonder if we should change the 'xl_smgr_truncate' record
format to directly include the new sizes of all of the truncated forks.
It's always felt error-prone to me that the redo function recomputes
those. It's a little more complicated for the redo function, as it needs
to also clear out part of the last remaining page, but that information
could also be included in the WAL record directly.
- Heikki
On Mon, Mar 9, 2026 at 11:42 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
For 'master', I wonder if we should change the 'xl_smgr_truncate' record
format to directly include the new sizes of all of the truncated forks.
It's always felt error-prone to me that the redo function recomputes
those. It's a little more complicated for the redo function, as it needs
to also clear out part of the last remaining page, but that information
could also be included in the WAL record directly.
Possibly -- or at least improve the comments. I was very confused
about how this works for a long time. One advantage of the current
system is that it keeps the WAL record small, but it's unlikely that
the additional bytes in an infrequently-used record type would matter
to many people. An advantage of listing the sizes explicitly is that
it would accommodate hypothetical relation forks where the other-fork
size can't be trivially derived from the main-fork size, but I'm
unconvinced we're ever going to add such a relation fork. I don't
really know what the right thing to do is.
--
Robert Haas
EDB: http://www.enterprisedb.com