Inconsistent increment of pg_stat_database.xact_rollback with logical replication

Started by Rafael Thofehrn Castro, almost 2 years ago; 2 messages; hackers, bugs
#1 Rafael Thofehrn Castro
rafaelthca@gmail.com
bugs

Column xact_rollback from pg_stat_database gets inconsistently incremented
when logical replication is being used (on publisher side).

This can be easily reproduced with the latest code from the master branch:

- Publisher

postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres';
-[ RECORD 1 ]-+---
xact_commit   | 20
xact_rollback | 0

postgres=# insert into t1 values (1);
INSERT 0 1
postgres=# insert into t1 values (2);
INSERT 0 1
postgres=# insert into t1 values (3);
INSERT 0 1
postgres=# insert into t1 values (4);
INSERT 0 1
postgres=# insert into t1 values (5);
INSERT 0 1
postgres=# insert into t1 values (6);
INSERT 0 1
postgres=# insert into t1 values (7);
INSERT 0 1
postgres=# insert into t1 values (8);
INSERT 0 1
postgres=# insert into t1 values (9);
INSERT 0 1
postgres=# insert into t1 values (10);
INSERT 0 1

postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres';
-[ RECORD 1 ]-+---
xact_commit   | 33
xact_rollback | 0

- Subscriber

postgres=# alter subscription sub disable;
ALTER SUBSCRIPTION

- Publisher

postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres';
-[ RECORD 1 ]-+---
xact_commit   | 36
xact_rollback | 10

What seems to be happening is that the number of transactions decoded by
the walsender is being added to pg_stat_database.xact_rollback. But these
changes are only flushed to the global stats when the walsender terminates.
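
As a rough illustration of that behavior, here is a toy C model. It is not PostgreSQL source code: the types and functions (SharedDbStats, LocalPending, toy_*) are invented for this sketch, and it only assumes that counts accumulate in process-local memory and reach the shared totals when the process exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared, database-wide counters. */
typedef struct SharedDbStats {
    int64_t xact_commit;
    int64_t xact_rollback;
} SharedDbStats;

/* Hypothetical stand-in for counts held locally by one process. */
typedef struct LocalPending {
    int64_t commits;
    int64_t rollbacks;
} LocalPending;

/* Each decoded transaction ends in an abort, so it is counted as a rollback. */
void toy_decode_transactions(LocalPending *pending, int n)
{
    assert(n >= 0);
    pending->rollbacks += n;
}

/* Add the pending counts to the shared totals and reset them. */
void toy_flush_on_exit(LocalPending *pending, SharedDbStats *shared)
{
    shared->xact_commit += pending->commits;
    shared->xact_rollback += pending->rollbacks;
    pending->commits = 0;
    pending->rollbacks = 0;
}

/* Shared rollback count visible after decoding n transactions. */
int64_t toy_visible_rollbacks(int n, bool walsender_exited)
{
    SharedDbStats shared = {0, 0};
    LocalPending pending = {0, 0};

    toy_decode_transactions(&pending, n);
    if (walsender_exited)
        toy_flush_on_exit(&pending, &shared);
    return shared.xact_rollback;
}
```

In this model, toy_visible_rollbacks(10, false) returns 0 and toy_visible_rollbacks(10, true) returns 10, which mirrors the 0 and then 10 seen in the reproduction above.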

From a quick look at the source, I suspect that the issue starts
here:
https://github.com/postgres/postgres/blob/master/src/backend/replication/logical/reorderbuffer.c#L2545

All decoded transactions are aborted for cleanup purposes. Following the
source code flow after calling AbortCurrentTransaction() we eventually
reach the part that increments rollback stats here:
https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_database.c#L249

This causes inconsistency in the monitored TPS metric of a database, where
we eventually see sudden spikes of TPS on the order of millions.
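
To make the spike concrete, here is a small C sketch of how a monitoring tool might derive TPS from two samples of pg_stat_database. The formula (commit delta plus rollback delta, divided by elapsed seconds) is an assumption about the monitoring setup, and the function name tps_from_deltas is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TPS over one sampling interval, counting both commits and rollbacks.
 * Assumed formula: (delta xact_commit + delta xact_rollback) / seconds.
 */
double tps_from_deltas(int64_t delta_commit, int64_t delta_rollback,
                       int64_t interval_seconds)
{
    assert(interval_seconds > 0);
    return (double) (delta_commit + delta_rollback)
           / (double) interval_seconds;
}
```

With made-up numbers: a 60-second interval with 6000 commits and no rollbacks gives 100 TPS. If a walsender exits during that interval and flushes 60,000,000 accumulated rollbacks, the same formula reports 1,000,100 TPS, although the real workload did not change.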

Regards,

Rafael Castro.

#2 Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Rafael Thofehrn Castro (#1)
bugs, hackers
Re: Inconsistent increment of pg_stat_database.xact_rollback with logical replication

On Thu, Apr 16, 2026 at 7:19 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:

> Column xact_rollback from pg_stat_database gets inconsistently incremented
> when logical replication is being used (on publisher side).
>
> ...
>
> This is causing inconsistency in monitoring TPS metric of a database where
> we eventually see sudden spikes of TPS in the order of millions.

This still reproduces on master.

I agree on the root cause: ReorderBufferProcessTXN() ends each decoded
transaction with AbortCurrentTransaction() for catalog cleanup; in the
walsender that is a top-level abort, so
AtEOXact_PgStat_Database(isCommit=false) increments the backend-local
pgStatXactRollback.

The counts are flushed to shared stats on walsender exit, producing
an acute spike. Result: for production systems with tight alerting on
xact_rollback, this turns routine logical-replication operations
(disabling a subscription, dropping a slot, walsender restart) into
false-positive pages. Also experienced at GitLab [1][2][3].

Attaching a simple patch that adds a backend-local flag pgStatXactSkipCounters
in pgstat_database.c that AtEOXact_PgStat_Database() honors to skip
the counter bump.
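
The general shape of such a flag can be sketched in standalone C. This is a toy model, not the attached patch: only the name pgStatXactSkipCounters comes from this message, while the toy_* functions, the pending counters, and the way the flag is set and restored around the cleanup abort are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy process-local state; the flag name is the only one taken from the thread. */
int64_t toy_pending_commit = 0;
int64_t toy_pending_rollback = 0;
bool pgStatXactSkipCounters = false;

/* Models the end-of-transaction hook: bump a counter unless asked to skip. */
void toy_at_eoxact(bool is_commit)
{
    if (pgStatXactSkipCounters)
        return;
    if (is_commit)
        toy_pending_commit++;
    else
        toy_pending_rollback++;
}

/* Models decoding one transaction, which ends with a cleanup abort. */
void toy_decode_one_txn(bool skip_enabled)
{
    bool saved = pgStatXactSkipCounters;

    pgStatXactSkipCounters = skip_enabled;  /* assumed placement of the flag */
    toy_at_eoxact(false);                   /* cleanup abort */
    pgStatXactSkipCounters = saved;         /* restore for ordinary aborts */
}

/* Rollbacks counted after some decoded transactions and some real aborts. */
int64_t toy_rollbacks_after(int decoded, int real_aborts, bool skip_enabled)
{
    assert(decoded >= 0 && real_aborts >= 0);
    toy_pending_commit = 0;
    toy_pending_rollback = 0;
    for (int i = 0; i < decoded; i++)
        toy_decode_one_txn(skip_enabled);
    for (int i = 0; i < real_aborts; i++)
        toy_at_eoxact(false);
    return toy_pending_rollback;
}
```

In this model, toy_rollbacks_after(5, 2, false) returns 7 because the five cleanup aborts are counted, while toy_rollbacks_after(5, 2, true) returns 2, so only the genuine aborts remain.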

Included a TAP test that fails on master with 5/0 and passes with the patch.

If there is agreement on this shape, happy to send patches for all
supported branches. Let me know what you think.

[1]: https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/8290
[2]: https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/work_items/39
[3]: https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/406

Nik

Attachments:

v1-xact-rollback-decoding.patch (application/octet-stream, +115 -15)