Inconsistent increment of pg_stat_database.xact_rollback with logical replication
The xact_rollback column of pg_stat_database is incremented inconsistently
on the publisher side when logical replication is in use.
This is easy to reproduce with the latest code from the master branch:
- Publisher
postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres';
-[ RECORD 1 ]-+---
xact_commit | 20
xact_rollback | 0
postgres=# insert into t1 values (1);
INSERT 0 1
postgres=# insert into t1 values (2);
INSERT 0 1
postgres=# insert into t1 values (3);
INSERT 0 1
postgres=# insert into t1 values (4);
INSERT 0 1
postgres=# insert into t1 values (5);
INSERT 0 1
postgres=# insert into t1 values (6);
INSERT 0 1
postgres=# insert into t1 values (7);
INSERT 0 1
postgres=# insert into t1 values (8);
INSERT 0 1
postgres=# insert into t1 values (9);
INSERT 0 1
postgres=# insert into t1 values (10);
INSERT 0 1
postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres';
-[ RECORD 1 ]-+---
xact_commit | 33
xact_rollback | 0
- Subscriber
postgres=# alter subscription sub disable;
ALTER SUBSCRIPTION
- Publisher
postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres';
-[ RECORD 1 ]-+---
xact_commit | 36
xact_rollback | 10
What seems to be happening is that the number of transactions decoded by
the walsender is being added to pg_stat_database.xact_rollback, but these
changes are only flushed to global stats when the walsender terminates.
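
The behavior described above can be modeled with a small standalone C
sketch. It is not PostgreSQL code: the struct and function names are
hypothetical stand-ins, and the only thing it tries to show is how counts
that accumulate in process-local memory stay invisible in the shared view
until the process flushes them.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shared, per-database counters. */
typedef struct SharedDbStats
{
    int64_t xact_commit;
    int64_t xact_rollback;
} SharedDbStats;

/* Hypothetical stand-in for counts pending in one process. */
typedef struct LocalXactCounts
{
    int64_t commit;
    int64_t rollback;
} LocalXactCounts;

/* Called at the end of every top-level transaction in the process. */
void
local_end_of_xact(LocalXactCounts *local, int is_commit)
{
    if (is_commit)
        local->commit++;
    else
        local->rollback++;
}

/* Move the pending counts into the shared view and reset them. */
void
local_flush(LocalXactCounts *local, SharedDbStats *shared)
{
    shared->xact_commit += local->commit;
    shared->xact_rollback += local->rollback;
    local->commit = 0;
    local->rollback = 0;
}

/*
 * Model of the decoding loop: every decoded transaction ends with an
 * abort, so each one is counted locally as a rollback.
 */
void
walsender_decode(LocalXactCounts *local, int ntxns)
{
    for (int i = 0; i < ntxns; i++)
        local_end_of_xact(local, 0);
}
```

In this model, decoding ten transactions leaves the shared xact_rollback
unchanged until local_flush() runs, at which point it jumps by ten in one
step. That matches the 0 to 10 jump seen after disabling the subscription.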
From a quick look at the source, I suspect that the issue starts
here:
https://github.com/postgres/postgres/blob/master/src/backend/replication/logical/reorderbuffer.c#L2545
All decoded transactions are aborted for cleanup purposes. Following the
source code flow after calling AbortCurrentTransaction() we eventually
reach the part that increments rollback stats here:
https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_database.c#L249
This makes the TPS metric of a database inconsistent in monitoring: we
eventually see sudden TPS spikes on the order of millions.
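
To illustrate the effect on monitoring, here is a minimal C sketch of how
a TPS figure is commonly derived from two samples of these counters. The
formula (delta of xact_commit plus xact_rollback, divided by the sampling
interval) and the function name are assumptions about a typical
monitoring setup, not something taken from a specific tool.

```c
#include <stdint.h>

/*
 * Hypothetical TPS calculation: transactions finished between two
 * samples of pg_stat_database, divided by the elapsed seconds.
 */
double
tps_from_samples(int64_t commit0, int64_t rollback0,
                 int64_t commit1, int64_t rollback1,
                 double interval_sec)
{
    int64_t delta = (commit1 - commit0) + (rollback1 - rollback0);

    return (double) delta / interval_sec;
}
```

With the numbers from the reproduction (33/0 before disabling the
subscription, 36/10 after) and an assumed one-second interval, the
rollback jump accounts for 10 of the 13 transactions in the computed
rate. A long-lived walsender that decoded millions of transactions would
add all of them to a single sampling interval in the same way.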
Regards,
Rafael Castro.
On Thu, Apr 16, 2026 at 7:19 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
> Column xact_rollback from pg_stat_database gets inconsistently incremented when logical replication is being used (on publisher side).
> ...
> This is causing inconsistency in monitoring TPS metric of a database where we eventually see sudden spikes of TPS in the order of millions.

This still reproduces on master.
I agree on the root cause: ReorderBufferProcessTXN() ends each decoded
transaction with AbortCurrentTransaction() for catalog cleanup; in the
walsender that is a top-level abort, so
AtEOXact_PgStat_Database(isCommit=false) increments the backend-local
pgStatXactRollback.
The counts are flushed to shared stats only when the walsender exits,
producing an acute spike. As a result, on production systems with tight
alerting on xact_rollback, routine logical-replication operations
(disabling a subscription, dropping a slot, restarting a walsender) turn
into false-positive pages. This has also been experienced at GitLab [1][2][3].
Attaching a simple patch that adds a backend-local flag pgStatXactSkipCounters
in pgstat_database.c that AtEOXact_PgStat_Database() honors to skip
the counter bump.
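
For readers without the attachment, the idea can be sketched in
standalone C as follows. This is not the attached patch: all names are
hypothetical (sketch_skip_counters stands in for pgStatXactSkipCounters,
sketch_at_eoxact() for AtEOXact_PgStat_Database()), and where exactly the
flag is set and cleared is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backend-local pending counters. */
int64_t sketch_pending_commit = 0;
int64_t sketch_pending_rollback = 0;

/* Hypothetical stand-in for the backend-local skip flag. */
bool    sketch_skip_counters = false;

/* End-of-transaction hook: counts the transaction unless told to skip. */
void
sketch_at_eoxact(bool is_commit)
{
    if (sketch_skip_counters)
        return;

    if (is_commit)
        sketch_pending_commit++;
    else
        sketch_pending_rollback++;
}

/*
 * Model of decoding one transaction: the cleanup abort at the end runs
 * with the flag set, and the previous flag value is restored afterwards.
 */
void
sketch_decode_one_transaction(void)
{
    bool    save = sketch_skip_counters;

    sketch_skip_counters = true;
    sketch_at_eoxact(false);    /* cleanup abort, not a user rollback */
    sketch_skip_counters = save;
}
```

In this sketch, decoding any number of transactions leaves the pending
rollback count untouched, while an ordinary abort outside decoding is
still counted.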
Included a TAP test that fails on master with 5/0 and passes with the patch.
If there is agreement on this shape, happy to send patches for all
supported branches. Let me know what you think.
[1]: https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/8290
[2]: https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/work_items/39
[3]: https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/406
Nik