BUG #19519: REPACK can fail due to missing chunk for toast value
The following bug has been logged on the website:
Bug reference: 19519
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 19beta1
Operating system: Ubuntu 24.04
Description:
The following script:
createdb db1
createdb db2
cat << EOF | psql db1
SET default_statistics_target = 1000;
CREATE TABLE t(i int, t text);
INSERT INTO t SELECT 1, g::text FROM generate_series(1, 50000) g;
ANALYZE t;
EOF
cat << EOF | psql db2 &
BEGIN;
CREATE TABLE t(i int);
SELECT pg_sleep(3);
EOF
sleep 1
cat << EOF | psql db1
DROP TABLE t;
VACUUM pg_toast.pg_toast_2619;
REPACK;
EOF
wait
triggers:
ERROR: missing chunk number 0 for toast value 16393 in pg_toast_2619
Reproduced starting from ac58465e0.
Hi Alexander,
On Sun, Jun 14, 2026 at 10:37 PM PG Bug reporting form <
noreply@postgresql.org> wrote:
The following bug has been logged on the website:
The following script:
createdb db1
createdb db2cat << EOF | psql db1
SET default_statistics_target = 1000;
CREATE TABLE t(i int, t text);
INSERT INTO t SELECT 1, g::text FROM generate_series(1, 50000) g;
ANALYZE t;
EOFcat << EOF | psql db2 &
BEGIN;
CREATE TABLE t(i int);
SELECT pg_sleep(3);
EOFsleep 1
cat << EOF | psql db1
DROP TABLE t;
VACUUM pg_toast.pg_toast_2619;
REPACK;
EOF
waittriggers:
ERROR: missing chunk number 0 for toast value 16393 in pg_toast_2619
Thanks for the report and clean reproducer , I can also reproduce this.
I looked into this. It is not REPACK-specific, it reproduces with
plain VACUUM FULL on back branches too, so it predates
the REPACK command. AFAIU The root cause is a disagreement about
the removal horizon between a table rewrite and an independent vacuum
of that table's TOAST relation.
A rewrite (cluster_rel -> copy_table_data, copy_heap_data) preserves live
and RECENTLY_DEAD tuples, and detoasts a preserved tuple's external
values to move them into the new TOAST table.Its removal cutoff comes
from GetOldestNonRemovableTransactionId(), i.e. ComputeXidHorizons(),
which includes the rewriting backend's own snapshot xmin.
The rewrite holds an ordinary MVCC snapshot (taken so index expressions
can be evaluated) and deliberately does not set PROC_IN_VACUUM. So
The rewrite backend's snapshot xmin is the cluster-wide oldest
running xid, which the db2 transaction pins. Since that backend is
in db1 and lacks PROC_IN_VACUUM, its own xmin folds into the
relation's data/catalog horizon -> OldestXmin sits behind the
DROP, so the dead pg_statistic tuple is RECENTLY_DEAD and gets
copied.
The lazy vacuum of pg_toast_2619 sets PROC_IN_VACUUM, which excludes
its own xmin from the horizon, and the db2 transaction, being in another
database,
does not constrain a non-shared relation's horizon. Its cutoff advances
past the DROP
and the chunks are removed.The two operations thus disagree about the same
tuple,
and the rewrite ends up fetching TOAST the lazy vacuum already reclaimed.
Attached Fix:
The rewrite is simply too conservative,AFAIK it's snapshot is only there to
evaluate index expressions against other relations, and it is not a
reader of the rewritten relation's historical rows, so its own xmin
should not hold back that relation's removal horizon. Excluding it
yields exactly the horizon the lazy vacuum uses.
We cannot set PROC_IN_VACUUM on the rewrite, that would broadcast
"ignore my xmin" to every backend and let other vacuums remove tuples in
other relations that the index expressions may need (the documented
reason VACUUM FULL does not set it). The exclusion has to be local to
the rewrite's own horizon computation, so the patch adds
GetOldestNonRemovableTransactionIdForRewrite(): ComputeXidHorizons()
gains a flag to skip the calling backend (and to not publish the
resulting, more-aggressive horizons into the shared GlobalVisState), and
copy_table_data() uses it for OldestXmin.
--
Thanks :)
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/