BUG: SUBSTRING on TOASTed text returns "invalid byte sequence for encoding UTF8: 0xc3"
Disclosure: This bug reproduction and report were generated with theassistance of Claude Opus 4.6, and reviewed by me.
PostgreSQL version: 18.2-1 (Arch Linux package)
SELECT version();
PostgreSQL 18.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20260209, 64-bit
Platform: Arch Linux (rolling), kernel 6.18.9-arch1-2, x86_64
Encoding: UTF8, Collation: C.UTF-8, Ctype: C.UTF-8
Block size: 8192
Full reproduction (can be run through psql):
-- 1. Load the test data from the attached hex file (pg18_repro_hex.txt)
CREATE TABLE _pg18_repro (id INT PRIMARY KEY DEFAULT 1, hex_content TEXT);
\COPY _pg18_repro (hex_content) FROM '/tmp/pg18_repro_hex.txt'
ALTER TABLE _pg18_repro ADD COLUMN content TEXT;
UPDATE _pg18_repro SET content = convert_from(decode(replace(hex_content, '\x', ''), 'hex'), 'UTF8');
ALTER TABLE _pg18_repro DROP COLUMN hex_content;
-- 2. Verify the data is valid UTF-8 and is TOASTed
SELECT length(content) AS char_len,
octet_length(content) AS byte_len,
pg_column_size(content) AS stored_size
FROM _pg18_repro WHERE id = 1;
-- Returns: char_len=136485, byte_len=137165, stored_size=65412
-- (stored_size confirms the value is TOASTed and compressed)
-- 3. SUBSTRING fails at character position 1212
SELECT SUBSTRING(content FROM 1212 FOR 1) FROM _pg18_repro WHERE id = 1;
Actual output (with \set VERBOSITY verbose):
ERROR: 22021: invalid byte sequence for encoding "UTF8": 0xc3
LOCATION: report_invalid_encoding_int, mbutils.c:1847
Expected output:
substring
-----------
&
(1 row)
The character at position 1212 is U+0026 AMPERSAND (&), a single-byte ASCII
character. The error 0xc3 is the lead byte of a 2-byte UTF-8 sequence that
exists elsewhere in the data — SUBSTRING appears to be computing an incorrect
byte offset when fetching from the TOASTed value.
Evidence that the data is valid and the bug is TOAST-specific:
-- LEFT/RIGHT returns the correct character at the same position:
SELECT LEFT(RIGHT(content, -1211), 1) FROM _pg18_repro WHERE id = 1;
-- Returns: &
-- Adjacent positions work with SUBSTRING:
SELECT SUBSTRING(content FROM 1211 FOR 1) FROM _pg18_repro WHERE id = 1;
-- Returns: (newline)
SELECT SUBSTRING(content FROM 1213 FOR 1) FROM _pg18_repro WHERE id = 1;
-- Returns: S
-- Forcing a full detoast with || '' makes SUBSTRING work:
SELECT SUBSTRING((content || '') FROM 1212 FOR 1) FROM _pg18_repro WHERE id = 1;
-- Returns: &
-- An inline (non-TOASTed) copy of the same range works:
SELECT SUBSTRING(
(SELECT SUBSTRING(content FROM 1 FOR 1500) FROM _pg18_repro WHERE id = 1)
FROM 1212 FOR 1
);
-- Returns: &
Position 1212 is not the only affected position. Scanning the first 5000
character positions, 37 positions fail — all with the same 0xc3 error.
The full list of failing positions in the first 5000 characters:
18, 69, 75, 167, 308, 704, 876, 1183, 1212, 1235, 1557, 1758, 1770,
2034, 2077, 2091, 2576, 2742, 2878, 2895, 2918, 2944, 3020, 3126,
3181, 3226, 3267, 3462, 3548, 3551, 3554, 3556, 4068, 4154, 4269,
4332, 4964
The scanning query used:
DO $$
DECLARE
i INT;
ch TEXT;
BEGIN
FOR i IN 1..5000 LOOP
BEGIN
SELECT SUBSTRING(content FROM i FOR 1) INTO ch
FROM _pg18_repro WHERE id = 1;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'FAIL at position %: %', i, SQLERRM;
END;
END LOOP;
END;
$$;
No non-default configuration. The PostgreSQL installation is the stock Arch
Linux postgresql 18.2-1 package, started via systemd, with no changes to
postgresql.conf beyond the defaults.
Cleanup:
DROP TABLE _pg18_repro;
On Friday, February 20, 2026, ludovitkramar <ludovitkramar@proton.me> wrote:
Disclosure: This bug reproduction and report were generated with the
assistance of Claude Opus 4.6, and reviewed by me.PostgreSQL version: 18.2-1 (Arch Linux package)
SELECT version();
PostgreSQL 18.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1
20260209, 64-bit
You should modify your prompt to include looking for current news and bug
reports on the PostgreSQL website before writing up a new report.
https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-february-26-2026-3241/
David J.