Possible G2-item at SERIALIZABLE
Dear Postgres maintainers,
I think I may have found a violation of Serializability in PostgreSQL
18.4 (Debian 18.4-1.pgdg13+1); the version from the PostgreSQL repo.
## Schema
I create three tables, `txn0`, `txn1`, and `txn2`. Each has identical
structure: an integer primary key `id`, a secondary key `sk` (not used
here), and a text `val`.
create table if not exists txn0 (
id int not null primary key,
sk int not null,
val text
);
## Transactions
All transactions run at SERIALIZABLE isolation, using JDBC transactions
or explicit BEGIN/COMMIT; both seem to produce the problem. Transactions
are randomly generated, and perform a mix of either appends or reads by
primary key. Reads are simple:
select (val) from txn1 where id = ?;
We append unique integers to `val` using `INSERT ... ON CONFLICT` and
the `CONCAT` function, like so:
insert into txn1 as t (id, sk, val) values (?, ?, ?) on conflict (id) do
update set val = CONCAT(t.val, ',', ?) where t.id = ?;
## Phenomenon
Consider this test run:
https://s3.amazonaws.com/jepsen.io/analyses/postgres-18.4/g2-item.zip
This test performed two transactions concurrently, using separate clients.
T1: [[:r 491 [1 2 3]] [:r 597 [2 3 4]] [:append 633 3] [:r 632 nil]]
T2: [[:append 632 2] [:r 630 [1 2 3 4]] [:r 633 nil] [:r 632 [2]]]
T1 performed two unrelated reads, then appended 3 to id 633, and read id
632, finding nothing. T2 appended 2 to id 632, performed an unrelated
read, then read key 633, finding nothing, then read its own append to 632.
The problem is that there is no way these transactions could execute in
(apparent) total order, because neither observed the other's write. Each
has a read-write anti-dependency on the other. This is one of the
canonical "dangerous structures" that Postgres' SSI is supposed to prevent.
Here are the exact SQL statements and their responses, as observed by
the client, ordered by the time their responses arrived.
T2: ["insert into txn2 as t (id, sk, val) values (?, ?, ?) on conflict
(id) do update set val = CONCAT(t.val, ',', ?) where t.id = ?" 632 632
"2" "2" 632] -> [#:next.jdbc{:update-count 1}]
T2: ["select (val) from txn0 where id = ? " 630] -> [{:val "1,2,3,4"}]
T2: ["select (val) from txn0 where id = ? " 633] -> []
T1: ["select (val) from txn1 where id = ? " 491] -> [{:val "1,2,3"}]
T1: ["select (val) from txn2 where id = ? " 597] -> [{:val "2,3,4"}]
T2: ["select (val) from txn2 where id = ? " 632] -> [{:val "2"}]
T1: ["insert into txn0 as t (id, sk, val) values (?, ?, ?) on conflict
(id) do update set val = CONCAT(t.val, ',', ?) where t.id = ?" 633 633
"3" "3" 633] -> [#:next.jdbc{:update-count 1}]
T1: ["select (val) from txn2 where id = ? " 632] -> []
For detailed timing and process information, search for these strings in
`jepsen.log`.
## Manual Repro
I've tried these same commands, and several other anomalies, at the psql
shell, but I can't seem to reproduce it there--perhaps the order is
slightly different due to request-response delay, or I'm getting the
BEGIN/COMMIT timing (which is missing from this log) wrong, or maybe it
depends on other transactions (e.g. for the update vs insert state), or
it's a probabilistic bug. I am able to reproduce it at much slower
timings--e.g. with 100-millisecond delays between statements in each
transaction. However, of the examples I've investigated by hand, each
has involved COMMITs executed at just about the same time. That might be
a hint?
## Test
My test runs in Clojure, a Lisp on the JVM. I use the Postgres JDBC
driver, org.postgresql/postgresql "42.7.11", and next.jdbc "1.3.1093".
The test itself lives at:
https://github.com/jepsen-io/postgres.
With commit 6c2bcc3f43085d3b0f21a5d78ba2b0e0e559ea8f, you can run:
lein run test-all -n n1 -w append --time-limit 30 --concurrency 30 --
log-sql --isolation serializable --max-writes-per-key 4 --leave-db-
running --key-types primary --upsert-types on-conflict --test-count 20
--nemesis none
On my machine, this spits out roughly one G2-item anomaly every 20 seconds.
Its transactions are generated by jepsen.sql, which lives here:
https://github.com/jepsen-io/sql
This is essentially the same test I ran which found a Serializability
bug in 12.3: https://jepsen.io/analyses/postgresql-12.3. I've just
broken it up into two libraries and added a few complications.
The workload which produces these results is:
https://github.com/jepsen-io/sql/
blob/6e34b76e2ac6a1c3edb5a9cab1c835eda4ca4c5e/src/jepsen/sql/append.clj.
I am relatively confident that isolation levels and transactions are
being used correctly, because when I adjust the isolation level from
Serializable to (e.g.) Read Committed, I see vastly different anomalies.
I can also reproduce the problem both using JDBC's transaction calls,
and with explicit BEGIN; SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE; ...; COMMIT;, which makes me think that it's not a bug in
the way next.jdbc or the JDBC driver handle transactions.
## System Information
This is a basically stock Debian 13 LXC container.
$ uname -a
Linux n1 6.17.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon
May 11 10:30:58 UTC 2 x86_64 GNU/Linux
The host machine runs Linux Mint 22.3 Zena. It's a Threadripper 7980X,
if that's relevant--happy to provide additional hardware details.
I install `postgresql-common` from the usual Debian repos, then run `/
usr/share/postgresql-common/pgdg/apt.postgresql.org.sh` to add
Postgres's official repos, then `apt install postgresql-18 postgresql-
client-18`, which as of 2026-05-22, installs Postgresql 18.4 (Debian
18.4-1.pgdg13+1). I leave the configuration files essentially stock,
except for enabling network access and setting `autovacuum_naptime = 5s`.
--
If y'all have any luck reproducing this, I'd love to hear about it!
Yours truly,
--Kyle