Dead code with short varlenas in toast_save_datum()
Hi all,
Nikhil (in CC) has noticed, while looking at the 8B-value TOAST patch,
that this code block used for short varlenas in toast_save_datum() is
dead based on the current coverage:
if (VARATT_IS_SHORT(dval))
{
    data_p = VARDATA_SHORT(dval);
    data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
    toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
    toast_pointer.va_extinfo = data_todo;
}
Coverage link:
https://coverage.postgresql.org/src/backend/access/common/toast_internals.c.gcov.html
toast_save_datum() is reached only through toast_tuple_externalize(),
and I first assumed that an EXTERNAL attribute with a minimal
toast_tuple_target of 128B would be enough to trigger this. However,
even that minimal bound is not enough to trigger
heap_toast_insert_or_update(), which heap_prepare_insert() calls only
if:
- the hardcoded check on TOAST_TUPLE_THRESHOLD is reached, which of
course does not happen here.
- the tuple is marked with HEAP_HASEXTERNAL, something that happens
only if fill_val() finds an external varlena (aka VARATT_IS_EXTERNAL),
and that does not happen for short varlenas.
The insertion of short varlenas in TOAST is old enough to vote, and it
is assumed to be supported when reading external on-disk TOAST datums;
still, it's amazing to see that there is no in-core coverage for it.
Anyway, there is an action item here. I don't see a SQL pattern that
would trigger this code path on HEAD (perhaps I lack the imagination
to do so), so the only alternative I can think of is to introduce a C
function in regress.c to force our way through, calling
toast_save_datum() directly or something equivalent, so that
toast_save_datum() is tested with more varlena patterns. That would
be something similar to what we do for indirect TOAST tuples.
Another possibility is to assume that this code is dead. But I doubt
it is possible to claim that, as the TOAST code is assumed to be
usable by out-of-core code, so depending on how fancy things get, a
TOAST relation could be fed a short varatt for an insert.
Thoughts?
--
Michael
Hi!
Michael, as far as I understand, while externalizing a tuple we always
check the attribute sizes with toast_tuple_find_biggest_attribute(),
where the biggest attribute size is always >=
MAXALIGN(TOAST_POINTER_SIZE), so currently I cannot see how it is
possible to get into the VARATT_IS_SHORT branch.
I've commented this code out and run some tests, and did not see any
unexpected behavior. It certainly looks like some legacy code left
"just because".
On Mon, Aug 4, 2025 at 6:16 AM Michael Paquier <michael@paquier.xyz> wrote:
Nikhil (in CC) has noticed while looking at the 8B-value TOAST patch
that this code block used for short varlenas in toast_save_datum() is
dead based on the current coverage.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi all,
On Sun, Aug 3, 2025 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:
if (VARATT_IS_SHORT(dval))
{
    data_p = VARDATA_SHORT(dval);
    data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
    toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
    toast_pointer.va_extinfo = data_todo;
}
Coverage link:
https://coverage.postgresql.org/src/backend/access/common/toast_internals.c.gcov.html
This code path is currently not covered by tests. It can be exercised
with the following SQL pattern:
CREATE TABLE temp_tbl (a text, b text);
ALTER TABLE temp_tbl SET (toast_tuple_target = 128);
ALTER TABLE temp_tbl ALTER COLUMN a SET STORAGE EXTERNAL;
ALTER TABLE temp_tbl ALTER COLUMN b SET STORAGE EXTERNAL;
INSERT INTO temp_tbl values(repeat('a', 4000), repeat('a', 120));
--
Nikhil Veldanda
Hi,
Nikhil, thank you! This case should be added to the regression tests.
On Tue, Aug 5, 2025 at 8:26 PM Nikhil Kumar Veldanda <
veldanda.nikhilkumar17@gmail.com> wrote:
This code path is currently not covered by tests. It can be exercised
with the following SQL pattern.
--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
On Tue, Aug 05, 2025 at 10:26:03AM -0700, Nikhil Kumar Veldanda wrote:
This code path is currently not covered by tests. It can be exercised
with the following SQL pattern:
CREATE TABLE temp_tbl (a text, b text);
ALTER TABLE temp_tbl SET (toast_tuple_target = 128);
ALTER TABLE temp_tbl ALTER COLUMN a SET STORAGE EXTERNAL;
ALTER TABLE temp_tbl ALTER COLUMN b SET STORAGE EXTERNAL;
INSERT INTO temp_tbl values(repeat('a', 4000), repeat('a', 120));
Ah, thanks, nice one. I did not consider the trick of using two
attributes to bypass the check when externalizing the tuple. I'll go
add a test in strings.sql along these lines, with some TOAST slice
scans based on substr().
--
Michael
On Wed, Aug 06, 2025 at 10:30:54AM +0900, Michael Paquier wrote:
Ah, thanks, nice one. I did not consider the trick of using two
attributes to bypass the check when externalizing the tuple. I'll go
add a test in strings.sql along these lines, with some TOAST slice
scans based on substr().
Done that as of 225ebfe30a1a, with some additions:
- A check on pg_column_compression(), to make sure that the short
varlena path leads to an uncompressed on-disk Datum.
- Some substr() calls on the values, to cross-check slice reads.
- A query based on the TOAST relation's chunk_seq, making sure that we
go through toast_save_datum() twice when inserting a tuple with the
two toastable attributes.
The buildfarm is OK with it, as is the latest patch for the 8-byte
TOAST values posted here:
/messages/by-id/aIyCz4T858Kcm4UU@paquier.xyz
The patch posted there needs a rebase, following the changes in
varatt.h done in e035863c9a04. Will post a rebase very soon.
--
Michael
Hi Michael,
I’m sending a small test-only patch that increases test coverage for
toast_internals.c from 88.5% -> 95.8%.
--
Nikhil Veldanda
Attachments:
v1-0001-Add-tests-coverage-for-TOAST-value-reuse-and-OID-.patch (application/octet-stream)
From 22bcce5c8c7fd8cc866044cd6819e98db9bc91e3 Mon Sep 17 00:00:00 2001
From: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Date: Sun, 10 Aug 2025 09:16:22 +0000
Subject: [PATCH v1] Add tests coverage for TOAST value reuse and OID
allocation in toast_save_datum
Add isolation test for TOAST value reuse during table rewrite operations with
concurrent transactions, and regression tests for TOAST OID allocation
code paths in GetNewOidWithIndex(). These tests exercise edge cases in
toast_save_datum and TOAST chunk ID assignment during table rewrites.
---
.../expected/cluster-toast-value-reuse.out | 15 +++++
src/test/isolation/isolation_schedule | 1 +
.../specs/cluster-toast-value-reuse.spec | 58 +++++++++++++++++++
src/test/regress/expected/strings.out | 49 ++++++++++++++++
src/test/regress/sql/strings.sql | 27 +++++++++
5 files changed, 150 insertions(+)
create mode 100644 src/test/isolation/expected/cluster-toast-value-reuse.out
create mode 100644 src/test/isolation/specs/cluster-toast-value-reuse.spec
diff --git a/src/test/isolation/expected/cluster-toast-value-reuse.out b/src/test/isolation/expected/cluster-toast-value-reuse.out
new file mode 100644
index 00000000000..fe1af5c5753
--- /dev/null
+++ b/src/test/isolation/expected/cluster-toast-value-reuse.out
@@ -0,0 +1,15 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_update s2_cluster s1_commit
+step s1_begin: BEGIN;
+step s1_update:
+ UPDATE cluster_toast_value_reuse
+ SET flag = (random()*1000)::int
+ WHERE id IN (
+ SELECT 1 + (random()*999)::int
+ FROM generate_series(1,30)
+ );
+
+step s2_cluster: CLUSTER cluster_toast_value_reuse; <waiting ...>
+step s1_commit: COMMIT;
+step s2_cluster: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e3c669a29c7..67c67c83855 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -116,3 +116,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: cluster-toast-value-reuse
diff --git a/src/test/isolation/specs/cluster-toast-value-reuse.spec b/src/test/isolation/specs/cluster-toast-value-reuse.spec
new file mode 100644
index 00000000000..4549c05f690
--- /dev/null
+++ b/src/test/isolation/specs/cluster-toast-value-reuse.spec
@@ -0,0 +1,58 @@
+# Hold an UPDATE open, run CLUSTER in another session, then COMMIT.  This triggers the data_todo = 0 code path in toast_save_datum().
+
+# ---------- global setup ----------
+setup
+{
+ DROP TABLE IF EXISTS cluster_toast_value_reuse CASCADE;
+
+ CREATE TABLE cluster_toast_value_reuse
+ (
+ id serial PRIMARY KEY,
+ flag integer,
+ value text
+ );
+
+ -- Make sure 'value' is large enough to be TOASTed.
+ ALTER TABLE cluster_toast_value_reuse ALTER COLUMN value SET STORAGE EXTERNAL;
+
+ -- Define the clustering index.
+ CLUSTER "cluster_toast_value_reuse_pkey" ON cluster_toast_value_reuse;
+
+ -- Seed data: enough rows with big strings to force TOAST tuples.
+ INSERT INTO cluster_toast_value_reuse(flag, value)
+ SELECT 0,
+ repeat(md5(gs::text), 120) || repeat('x', 8000) -- ~11KB
+ FROM generate_series(1, 1000) AS gs;
+
+ ANALYZE cluster_toast_value_reuse;
+
+ -- Establish clustered state so plain "CLUSTER cluster_toast_value_reuse;" uses the pkey.
+ CLUSTER cluster_toast_value_reuse;
+}
+
+teardown
+{
+ DROP TABLE IF EXISTS cluster_toast_value_reuse;
+}
+
+# ---------- sessions ----------
+# Session 1: starts a txn and updates some rows, then commits later.
+session s1
+step s1_begin { BEGIN; }
+step s1_update {
+ UPDATE cluster_toast_value_reuse
+ SET flag = (random()*1000)::int
+ WHERE id IN (
+ SELECT 1 + (random()*999)::int
+ FROM generate_series(1,30)
+ );
+}
+step s1_commit { COMMIT; }
+
+# Session 2: runs CLUSTER while s1 holds locks.
+session s2
+step s2_cluster { CLUSTER cluster_toast_value_reuse; }
+
+# ---------- single interleaving ----------
+# Do the update in s1, then attempt CLUSTER in s2 (will wait), then commit s1.
+permutation s1_begin s1_update s2_cluster s1_commit
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index ba302da51e7..afc0cd6977e 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2124,6 +2124,55 @@ SELECT pg_column_compression(f1) AS f1_comp, pg_column_compression(f2) AS f2_com
|
(1 row)
+DROP TABLE toasttest;
+-- Test to trigger TOAST OID allocation code path in GetNewOidWithIndex()
+-- This exercises the toast_pointer.va_valueid assignment when creating new TOAST entries
+-- Create table with plain storage (forces inline storage initially)
+CREATE TABLE toasttest (f1 TEXT STORAGE plain);
+-- Insert large value to trigger TOAST storage
+INSERT INTO toasttest values (repeat('a', 7000));
+-- Switch to external storage to force TOAST table usage
+ALTER TABLE toasttest ALTER COLUMN f1 SET STORAGE EXTERNAL;
+-- Check initial TOAST chunk ID (should return NULL)
+SELECT pg_column_toast_chunk_id(f1) FROM toasttest;
+ pg_column_toast_chunk_id
+--------------------------
+
+(1 row)
+
+SELECT substr(f1, 5, 10) AS f1_data FROM toasttest;
+ f1_data
+------------
+ aaaaaaaaaa
+(1 row)
+
+SELECT pg_column_compression(f1) AS f1_comp FROM toasttest;
+ f1_comp
+---------
+
+(1 row)
+
+-- VACUUM FULL forces TOAST data rewrite
+vacuum full toasttest;
+-- Verify new TOAST chunk ID was assigned via GetNewOidWithIndex()
+SELECT pg_column_toast_chunk_id(f1) IS NOT NULL FROM toasttest;
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT substr(f1, 5, 10) AS f1_data FROM toasttest;
+ f1_data
+------------
+ aaaaaaaaaa
+(1 row)
+
+SELECT pg_column_compression(f1) AS f1_comp FROM toasttest;
+ f1_comp
+---------
+
+(1 row)
+
DROP TABLE toasttest;
--
-- test length
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b94004cc08c..ed0feaa0302 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -670,6 +670,33 @@ SELECT pg_column_compression(f1) AS f1_comp, pg_column_compression(f2) AS f2_com
FROM toasttest;
DROP TABLE toasttest;
+-- Test to trigger TOAST OID allocation code path in GetNewOidWithIndex()
+-- This exercises the toast_pointer.va_valueid assignment when creating new TOAST entries
+
+-- Create table with plain storage (forces inline storage initially)
+CREATE TABLE toasttest (f1 TEXT STORAGE plain);
+
+-- Insert large value to trigger TOAST storage
+INSERT INTO toasttest values (repeat('a', 7000));
+
+-- Switch to external storage to force TOAST table usage
+ALTER TABLE toasttest ALTER COLUMN f1 SET STORAGE EXTERNAL;
+
+-- Check initial TOAST chunk ID (should return NULL)
+SELECT pg_column_toast_chunk_id(f1) FROM toasttest;
+SELECT substr(f1, 5, 10) AS f1_data FROM toasttest;
+SELECT pg_column_compression(f1) AS f1_comp FROM toasttest;
+
+-- VACUUM FULL forces TOAST data rewrite
+vacuum full toasttest;
+
+-- Verify new TOAST chunk ID was assigned via GetNewOidWithIndex()
+SELECT pg_column_toast_chunk_id(f1) IS NOT NULL FROM toasttest;
+SELECT substr(f1, 5, 10) AS f1_data FROM toasttest;
+SELECT pg_column_compression(f1) AS f1_comp FROM toasttest;
+
+DROP TABLE toasttest;
+
--
-- test length
--
--
2.47.3
On Sun, Aug 10, 2025 at 03:08:14AM -0700, Nikhil Kumar Veldanda wrote:
I’m sending a small test-only patch that increases test coverage for
toast_internals.c from 88.5% -> 95.8%.
(Discussed these ones offline, while looking at the coverage of
toast_internals.c.)
Thanks for compiling a patch to close more of the coverage gap. I'll
look at what you have here.
--
Michael
On Sun, Aug 10, 2025 at 07:35:13PM +0900, Michael Paquier wrote:
Thanks for compiling a patch to close more of the coverage gap. I'll
look at what you have here.
First, thanks for working on that. That's going to help a lot to make
sure that messing with this area of the code will not break some of
the current assumptions.
So, we have two things here for the rewrite code paths in
toast_save_datum():
1) A SQL test for va_valueid == InvalidOid where we need a new value
that does not conflict with the new and old TOAST tables.
2) An isolation test for the much trickier case where we are going to
reuse a chunk_id.
First, the SQL test. The trick of using PLAIN storage so that no
chunk_id is allocated on the initial insert, with a value large enough
to be toasted on rewrite yet small enough to fit on a single page, is
a nice one. We could have used a \gset as well with a toasted value,
but that would not change the fact that we check for a newly allocated
value. The location in strings.sql feels incorrect because this is a
rewrite issue, so I have moved the test to vacuum.sql and applied a
slightly-tweaked result. A second thing I have added is a test to
make sure that the same chunk_id is reused after the rewrite. That's
also worth tracking, and cheap, covering the non-InvalidOid case.
With the isolation test, the case is different, and it looks like the
test is incomplete: we want to make sure that the chunk IDs are the
same before and after, but we cannot use \gset in this context. What
I would suggest is to create an intermediate table storing the
contents we want to compare after the CLUSTER, with a CTAS that stores
the primary key cluster_toast_value_reuse.id and the chunk_id
associated with each row. Then, after the CLUSTER, we join the pkey
values in the CTAS table and cluster_toast_value_reuse and compare
their chunk IDs; they should match. The test triggers the todo=0
code path 29 times, as far as I can see, but we should not need that
many tuples with generate_series(), no? If the test is written so
that we compare the old and new chunk IDs with the pkey values, the
number of tuples does not matter much, but it would be a bit cheaper
to run. Could you update the isolation test to do something along
these lines?
--
Michael
On Thu, Aug 14, 2025 at 8:32 PM Michael Paquier <michael@paquier.xyz> wrote:
First, the SQL test. The trick of using PLAIN storage so that no
chunk_id is allocated on the initial insert, with a value large enough
to be toasted on rewrite yet small enough to fit on a single page, is
a nice one. We could have used a \gset as well with a toasted value,
but that would not change the fact that we check for a newly allocated
value. The location in strings.sql feels incorrect because this is a
rewrite issue, so I have moved the test to vacuum.sql and applied a
slightly-tweaked result. A second thing I have added is a test to
make sure that the same chunk_id is reused after the rewrite. That's
also worth tracking, and cheap, covering the non-InvalidOid case.
Thank you, Michael, for adjusting the change and merging it.
With the isolation test, the case is different, and it looks like the
test is incomplete: we want to make sure that the chunk IDs are the
same before and after, but we cannot use \gset in this context. What
I would suggest is to create an intermediate table storing the
contents we want to compare after the CLUSTER, with a CTAS that stores
the primary key cluster_toast_value_reuse.id and the chunk_id
associated with each row. Then, after the CLUSTER, we join the pkey
values in the CTAS table and cluster_toast_value_reuse and compare
their chunk IDs; they should match. The test triggers the todo=0
code path 29 times, as far as I can see, but we should not need that
many tuples with generate_series(), no? If the test is written so
that we compare the old and new chunk IDs with the pkey values, the
number of tuples does not matter much, but it would be a bit cheaper
to run. Could you update the isolation test to do something along
these lines?
--
Thanks for the guidance. I’ve updated the isolation test to use a CTAS
capturing (id, chunk_id) pre-CLUSTER and added a post-CLUSTER join to
verify the chunk IDs match. I also reduced the number of tuples to 1.
Please find the attached patch for your review. Thanks
--
Nikhil Veldanda
Attachments:
v1-0001-Add-isolation-test-for-TOAST-value-deduplication-.patch (application/octet-stream)
From d434b39c83db5c3804aab4de657946324c083a2a Mon Sep 17 00:00:00 2001
From: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Date: Sat, 16 Aug 2025 09:12:32 +0000
Subject: [PATCH v1] Add isolation test for TOAST value deduplication during
CLUSTER
This test exercises the corner case in toast_save_datum() where CLUSTER operations encounter duplicate TOAST references and correctly reuse existing TOAST data instead of creating redundant copies.
During table rewrites like CLUSTER, both live and recently-dead versions of a row may reference the same TOAST value. When copying the second or later version of such a row, the system checks if the TOAST OID already exists in the new toast table using toastrel_valueid_exists(). If found, it sets data_todo = 0 to skip redundant data storage, ensuring only one copy of the TOAST value exists in the new table.
The test creates a scenario where:
- Session 1 updates rows while holding a transaction lock
- Session 2 attempts CLUSTER, which waits for the lock
- When CLUSTER proceeds, it encounters the duplicate TOAST references
- The test verifies TOAST chunk IDs are preserved via deduplication
---
.../expected/cluster-toast-value-reuse.out | 29 +++++++++
src/test/isolation/isolation_schedule | 1 +
.../specs/cluster-toast-value-reuse.spec | 64 +++++++++++++++++++
3 files changed, 94 insertions(+)
create mode 100644 src/test/isolation/expected/cluster-toast-value-reuse.out
create mode 100644 src/test/isolation/specs/cluster-toast-value-reuse.spec
diff --git a/src/test/isolation/expected/cluster-toast-value-reuse.out b/src/test/isolation/expected/cluster-toast-value-reuse.out
new file mode 100644
index 00000000000..84cfc00c84e
--- /dev/null
+++ b/src/test/isolation/expected/cluster-toast-value-reuse.out
@@ -0,0 +1,29 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_update s2_store_chunk_ids s2_cluster s1_commit s2_verify_chunk_ids
+step s1_begin: BEGIN;
+step s1_update:
+ UPDATE cluster_toast_value_reuse
+ SET flag = 1 WHERE TRUE;
+
+step s2_store_chunk_ids:
+ -- Store the primary keys and their associated chunk IDs before CLUSTER
+ CREATE TABLE chunk_id_comparison AS
+ SELECT c.id, pg_column_toast_chunk_id(c.value) as chunk_id
+ FROM cluster_toast_value_reuse c;
+
+step s2_cluster: CLUSTER cluster_toast_value_reuse; <waiting ...>
+step s1_commit: COMMIT;
+step s2_cluster: <... completed>
+step s2_verify_chunk_ids:
+ -- Verify that chunk IDs are the same before and after CLUSTER (indicating reuse)
+ SELECT COUNT(*) = 0 AS chunk_ids_preserved
+ FROM chunk_id_comparison orig
+ JOIN cluster_toast_value_reuse c ON orig.id = c.id
+ WHERE orig.chunk_id != pg_column_toast_chunk_id(c.value);
+
+chunk_ids_preserved
+-------------------
+t
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4411d3c86dd..cb8a3bfbcbf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -117,3 +117,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: cluster-toast-value-reuse
diff --git a/src/test/isolation/specs/cluster-toast-value-reuse.spec b/src/test/isolation/specs/cluster-toast-value-reuse.spec
new file mode 100644
index 00000000000..c123da9f720
--- /dev/null
+++ b/src/test/isolation/specs/cluster-toast-value-reuse.spec
@@ -0,0 +1,64 @@
+# Hold an UPDATE open, run CLUSTER in another session, then COMMIT.  This triggers the data_todo = 0 code path in toast_save_datum().
+
+# ---------- global setup ----------
+setup
+{
+ DROP TABLE IF EXISTS cluster_toast_value_reuse CASCADE;
+ DROP TABLE IF EXISTS chunk_id_comparison CASCADE;
+
+ CREATE TABLE cluster_toast_value_reuse
+ (
+ id serial PRIMARY KEY,
+ flag integer,
+ value text
+ );
+
+ -- Make sure 'value' is large enough to be TOASTed.
+ ALTER TABLE cluster_toast_value_reuse ALTER COLUMN value SET STORAGE EXTERNAL;
+
+ -- Define the clustering index.
+ CLUSTER "cluster_toast_value_reuse_pkey" ON cluster_toast_value_reuse;
+
+ -- Seed data: one row with big string to force TOAST tuple and trigger the todo=0 code path.
+ INSERT INTO cluster_toast_value_reuse(flag, value)
+ VALUES (0, repeat(md5('1'), 120) || repeat('x', 8000));
+
+ CLUSTER cluster_toast_value_reuse;
+}
+
+teardown
+{
+ DROP TABLE IF EXISTS cluster_toast_value_reuse;
+ DROP TABLE IF EXISTS chunk_id_comparison;
+}
+
+# ---------- sessions ----------
+# Session 1: starts a txn and updates some rows, then commits later.
+session s1
+step s1_begin { BEGIN; }
+step s1_update {
+ UPDATE cluster_toast_value_reuse
+ SET flag = 1 WHERE TRUE;
+}
+step s1_commit { COMMIT; }
+
+# Session 2: runs CLUSTER while s1 holds locks.
+session s2
+step s2_store_chunk_ids {
+ -- Store the primary keys and their associated chunk IDs before CLUSTER
+ CREATE TABLE chunk_id_comparison AS
+ SELECT c.id, pg_column_toast_chunk_id(c.value) as chunk_id
+ FROM cluster_toast_value_reuse c;
+}
+step s2_cluster { CLUSTER cluster_toast_value_reuse; }
+step s2_verify_chunk_ids {
+ -- Verify that chunk IDs are the same before and after CLUSTER (indicating reuse)
+ SELECT COUNT(*) = 0 AS chunk_ids_preserved
+ FROM chunk_id_comparison orig
+ JOIN cluster_toast_value_reuse c ON orig.id = c.id
+ WHERE orig.chunk_id != pg_column_toast_chunk_id(c.value);
+}
+
+# ---------- single interleaving ----------
+# Do the update in s1, store chunk IDs, then attempt CLUSTER in s2 (will wait), then commit s1, then verify chunk IDs.
+permutation s1_begin s1_update s2_store_chunk_ids s2_cluster s1_commit s2_verify_chunk_ids
--
2.47.3
On Sat, Aug 16, 2025 at 02:34:02AM -0700, Nikhil Kumar Veldanda wrote:
Thanks for the guidance. I’ve updated the isolation test to use a CTAS
capturing (id, chunk_id) pre-CLUSTER and added a post-CLUSTER join to
verify the chunk IDs match. I also reduced the number of tuples to 1.
Please find the attached patch for your review. Thanks.
Cool, thanks for the new patch. The structure looks OK. There were
two things that itched me a bit:
- Instead of reporting the number of tuples where the chunk IDs do not
match across the rewrite, I have rewritten the query to report the
serial IDs instead.
- Making sure that the CTAS holds some data, matching the number of
tuples inserted. A count(*) was enough for that.
--
Michael