BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
The following bug has been logged on the website:
Bug reference: 19439
Logged by: klemen kobau
Email address: klemen.kobau@gmail.com
PostgreSQL version: 18.0
Operating system: Linux (EndeavorOS)
Description:
I am running postgres:18.0 in a docker container, the configuration is as
follows:
postgres:
image: postgres:18.0
command: [
"postgres",
"-N", "200",
"-c", "max_prepared_transactions=100",
"-c", "wal_level=logical",
"-c", "max_wal_senders=10",
"-c", "max_replication_slots=20",
"-c", "wal_keep_size=10",
"-c", "max_slot_wal_keep_size=1024"
]
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
TZ: UTC
PGTZ: UTC
ports:
- 5432:5432
volumes:
- postgres_data:/var/lib/postgresql
I use psql version 18.3.
I run the following:
psql -h localhost -p 5432 -U postgres -d postgres -c "
-- Transaction 1: insert 1 row, check stats, commit
BEGIN;
CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
INSERT INTO temp.xact_test (val) VALUES ('a');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
COMMIT;
-- Transaction 2: insert 1 row, check stats
BEGIN;
INSERT INTO temp.xact_test (val) VALUES ('b');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
ROLLBACK;
-- Cleanup
DROP TABLE temp.xact_test;
"
And the output is
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)
COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)
ROLLBACK
DROP TABLE
This is not what I would expect from reading
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
where it states
A transaction can also see its own statistics (not yet flushed out to the
shared memory statistics) in the views pg_stat_xact_all_tables,
pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
pg_stat_xact_user_functions. These numbers do not act as stated above;
instead they update continuously throughout the transaction.
based on this, I would expect that the numbers are updated each time an
insert happens and that they are isolated per transaction.
Kind regards
Klemen Kobau
Hi klemen,
Thanks for the report.
On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 19439
Logged by: klemen kobau
Email address: klemen.kobau@gmail.com
PostgreSQL version: 18.0
Operating system: Linux (EndeavorOS)
Description:I am running postgres:18.0 in a docker container, the configuration is as
follows:postgres:
image: postgres:18.0
command: [
"postgres",
"-N", "200",
"-c", "max_prepared_transactions=100",
"-c", "wal_level=logical",
"-c", "max_wal_senders=10",
"-c", "max_replication_slots=20",
"-c", "wal_keep_size=10",
"-c", "max_slot_wal_keep_size=1024"
]
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
TZ: UTC
PGTZ: UTC
ports:
- 5432:5432
volumes:
- postgres_data:/var/lib/postgresqlI use psql version 18.3.
I run the following:
psql -h localhost -p 5432 -U postgres -d postgres -c "
-- Transaction 1: insert 1 row, check stats, commit
BEGIN;
CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
INSERT INTO temp.xact_test (val) VALUES ('a');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
COMMIT;-- Transaction 2: insert 1 row, check stats
BEGIN;
INSERT INTO temp.xact_test (val) VALUES ('b');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
ROLLBACK;-- Cleanup
DROP TABLE temp.xact_test;
"And the output is
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)ROLLBACK
DROP TABLE
I can also reproduce this behavior on HEAD.
This is not what I would expect from reading
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
where it statesA transaction can also see its own statistics (not yet flushed out to the
shared memory statistics) in the views pg_stat_xact_all_tables,
pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
pg_stat_xact_user_functions. These numbers do not act as stated above;
instead they update continuously throughout the transaction.based on this, I would expect that the numbers are updated each time an
insert happens and that they are isolated per transaction.
This snippet of the doc feels somewhat ambiguous. The current behavior
does not seem fully consistent with wording such as “throughout the
transaction.” It also seems more appropriate for these table views to
reflect stats within their own scopes, rather than cumulative values
spanning txn boundaries.
--
Best,
Xuneng
On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi klemen,
Thanks for the report.
On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
<noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19439
Logged by: klemen kobau
Email address: klemen.kobau@gmail.com
PostgreSQL version: 18.0
Operating system: Linux (EndeavorOS)
Description:I am running postgres:18.0 in a docker container, the configuration is as
follows:postgres:
image: postgres:18.0
command: [
"postgres",
"-N", "200",
"-c", "max_prepared_transactions=100",
"-c", "wal_level=logical",
"-c", "max_wal_senders=10",
"-c", "max_replication_slots=20",
"-c", "wal_keep_size=10",
"-c", "max_slot_wal_keep_size=1024"
]
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
TZ: UTC
PGTZ: UTC
ports:
- 5432:5432
volumes:
- postgres_data:/var/lib/postgresqlI use psql version 18.3.
I run the following:
psql -h localhost -p 5432 -U postgres -d postgres -c "
-- Transaction 1: insert 1 row, check stats, commit
BEGIN;
CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
INSERT INTO temp.xact_test (val) VALUES ('a');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
COMMIT;-- Transaction 2: insert 1 row, check stats
BEGIN;
INSERT INTO temp.xact_test (val) VALUES ('b');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
ROLLBACK;-- Cleanup
DROP TABLE temp.xact_test;
"And the output is
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)ROLLBACK
DROP TABLEI can also reproduce this behavior on HEAD.
This is not what I would expect from reading
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
where it statesA transaction can also see its own statistics (not yet flushed out to the
shared memory statistics) in the views pg_stat_xact_all_tables,
pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
pg_stat_xact_user_functions. These numbers do not act as stated above;
instead they update continuously throughout the transaction.based on this, I would expect that the numbers are updated each time an
insert happens and that they are isolated per transaction.This snippet of the doc feels somewhat ambiguous. The current behavior
does not seem fully consistent with wording such as “throughout the
transaction.” It also seems more appropriate for these table views to
reflect stats within their own scopes, rather than cumulative values
spanning txn boundaries.
I’ve looked into this issue and 'd like to propose a patch to address
it. Feedback is very welcome.
--- Root cause
Since the stats subsystem was rewritten (commit 5891c7a8), each
backend keeps a hash of pending stats entries that persist across
transaction boundaries. Entries that flush successfully are deleted
from the backend-local pending list; entries not flushed remain
pending with their accumulated counters. Flushing is driven by
pgstat_report_stat(), called from the ReadyForQuery path, and subject
to a rate limiter.
The pg_stat_xact_* views read these pending entries directly via
find_tabstat_entry() / find_funcstat_entry(). Both functions return
the raw accumulated counters without per-transaction scoping.
The bug is deterministic when multiple top-level transactions are
processed within a single simple-query message, because there is no
ReadyForQuery boundary between the transactions and therefore no
opportunity to flush and remove the pending entry.
--- Fix
The patch introduces a per-entry "transaction baseline" — a lazy
snapshot of the counters taken the first time an entry is touched in
each new top-level transaction. The accessor functions
(find_tabstat_entry, find_funcstat_entry) then subtract the baseline
from the current counters, yielding only the current transaction's
delta.
The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
pgstat_ensure_xact_baseline(), is called at every nontransactional
counter-increment site (the 7 event-counter macros in pgstat.h and the
4 non-inline counter functions in pgstat_relation.c). After the first
call per entry per transaction, the check reduces to a single integer
comparison with a predictably not-taken branch.
For function stats, a new PgStat_FunctionPending wrapper struct embeds
PgStat_FunctionCounts at offset 0 with the baseline fields appended,
so the flush callback requires only a trivial cast change.
--- Testing
The regression test is a TAP test under src/bin/psql/t/ that uses psql
-c to send multi-statement strings as single simple-query messages.
src/bin/psql/t look like the right existing harness for the primary
regression because psql -c sends the whole multi-statement string as
one simple-query message via simple_action_list dispatch. By contrast,
ordinary pg_regress SQL files are executed by psql in file/stdin mode,
which runs through MainLoop() and dispatches top-level statements one
at a time via SendQuery(), creating ReadyForQuery boundaries that
could mask the bug.
The test covers three scenarios: table n_tup_ins and seq_scan
counters, and function calls — each verifying isolation across
consecutive top-level transactions within a single message.
--
Best,
Xuneng
Attachments:
v1-0001-Fix-pg_stat_xact_-views-leaking-stats-across-tran.patchapplication/octet-stream; name=v1-0001-Fix-pg_stat_xact_-views-leaking-stats-across-tran.patchDownload+249-10
On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi klemen,
Thanks for the report.
On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
<noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19439
Logged by: klemen kobau
Email address: klemen.kobau@gmail.com
PostgreSQL version: 18.0
Operating system: Linux (EndeavorOS)
Description:I am running postgres:18.0 in a docker container, the configuration is as
follows:postgres:
image: postgres:18.0
command: [
"postgres",
"-N", "200",
"-c", "max_prepared_transactions=100",
"-c", "wal_level=logical",
"-c", "max_wal_senders=10",
"-c", "max_replication_slots=20",
"-c", "wal_keep_size=10",
"-c", "max_slot_wal_keep_size=1024"
]
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
TZ: UTC
PGTZ: UTC
ports:
- 5432:5432
volumes:
- postgres_data:/var/lib/postgresqlI use psql version 18.3.
I run the following:
psql -h localhost -p 5432 -U postgres -d postgres -c "
-- Transaction 1: insert 1 row, check stats, commit
BEGIN;
CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
INSERT INTO temp.xact_test (val) VALUES ('a');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
COMMIT;-- Transaction 2: insert 1 row, check stats
BEGIN;
INSERT INTO temp.xact_test (val) VALUES ('b');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
ROLLBACK;-- Cleanup
DROP TABLE temp.xact_test;
"And the output is
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)ROLLBACK
DROP TABLEI can also reproduce this behavior on HEAD.
This is not what I would expect from reading
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
where it statesA transaction can also see its own statistics (not yet flushed out to the
shared memory statistics) in the views pg_stat_xact_all_tables,
pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
pg_stat_xact_user_functions. These numbers do not act as stated above;
instead they update continuously throughout the transaction.based on this, I would expect that the numbers are updated each time an
insert happens and that they are isolated per transaction.This snippet of the doc feels somewhat ambiguous. The current behavior
does not seem fully consistent with wording such as “throughout the
transaction.” It also seems more appropriate for these table views to
reflect stats within their own scopes, rather than cumulative values
spanning txn boundaries.I’ve looked into this issue and 'd like to propose a patch to address
it. Feedback is very welcome.--- Root causeSince the stats subsystem was rewritten (commit 5891c7a8), each
backend keeps a hash of pending stats entries that persist across
transaction boundaries. Entries that flush successfully are deleted
from the backend-local pending list; entries not flushed remain
pending with their accumulated counters. Flushing is driven by
pgstat_report_stat(), called from the ReadyForQuery path, and subject
to a rate limiter.The pg_stat_xact_* views read these pending entries directly via
find_tabstat_entry() / find_funcstat_entry(). Both functions return
the raw accumulated counters without per-transaction scoping.The bug is deterministic when multiple top-level transactions are
processed within a single simple-query message, because there is no
ReadyForQuery boundary between the transactions and therefore no
opportunity to flush and remove the pending entry.--- FixThe patch introduces a per-entry "transaction baseline" — a lazy
snapshot of the counters taken the first time an entry is touched in
each new top-level transaction. The accessor functions
(find_tabstat_entry, find_funcstat_entry) then subtract the baseline
from the current counters, yielding only the current transaction's
delta.The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
pgstat_ensure_xact_baseline(), is called at every nontransactional
counter-increment site (the 7 event-counter macros in pgstat.h and the
4 non-inline counter functions in pgstat_relation.c). After the first
call per entry per transaction, the check reduces to a single integer
comparison with a predictably not-taken branch.For function stats, a new PgStat_FunctionPending wrapper struct embeds
PgStat_FunctionCounts at offset 0 with the baseline fields appended,
so the flush callback requires only a trivial cast change.--- TestingThe regression test is a TAP test under src/bin/psql/t/ that uses psql
-c to send multi-statement strings as single simple-query messages.
src/bin/psql/t look like the right existing harness for the primary
regression because psql -c sends the whole multi-statement string as
one simple-query message via simple_action_list dispatch. By contrast,
ordinary pg_regress SQL files are executed by psql in file/stdin mode,
which runs through MainLoop() and dispatches top-level statements one
at a time via SendQuery(), creating ReadyForQuery boundaries that
could mask the bug.The test covers three scenarios: table n_tup_ins and seq_scan
counters, and function calls — each verifying isolation across
consecutive top-level transactions within a single message.
postgres % cat output.txt
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)
COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)
ROLLBACK
DROP TABLE
I can also reproduce this behavior in v13. So this issue pre-existed
the major refactor 5891c7a8.
--
Best,
Xuneng
On Sun, Mar 29, 2026 at 11:17 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi klemen,
Thanks for the report.
On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
<noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19439
Logged by: klemen kobau
Email address: klemen.kobau@gmail.com
PostgreSQL version: 18.0
Operating system: Linux (EndeavorOS)
Description:I am running postgres:18.0 in a docker container, the configuration is as
follows:postgres:
image: postgres:18.0
command: [
"postgres",
"-N", "200",
"-c", "max_prepared_transactions=100",
"-c", "wal_level=logical",
"-c", "max_wal_senders=10",
"-c", "max_replication_slots=20",
"-c", "wal_keep_size=10",
"-c", "max_slot_wal_keep_size=1024"
]
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
TZ: UTC
PGTZ: UTC
ports:
- 5432:5432
volumes:
- postgres_data:/var/lib/postgresqlI use psql version 18.3.
I run the following:
psql -h localhost -p 5432 -U postgres -d postgres -c "
-- Transaction 1: insert 1 row, check stats, commit
BEGIN;
CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
INSERT INTO temp.xact_test (val) VALUES ('a');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
COMMIT;-- Transaction 2: insert 1 row, check stats
BEGIN;
INSERT INTO temp.xact_test (val) VALUES ('b');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
ROLLBACK;-- Cleanup
DROP TABLE temp.xact_test;
"And the output is
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)ROLLBACK
DROP TABLEI can also reproduce this behavior on HEAD.
This is not what I would expect from reading
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
where it statesA transaction can also see its own statistics (not yet flushed out to the
shared memory statistics) in the views pg_stat_xact_all_tables,
pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
pg_stat_xact_user_functions. These numbers do not act as stated above;
instead they update continuously throughout the transaction.based on this, I would expect that the numbers are updated each time an
insert happens and that they are isolated per transaction.This snippet of the doc feels somewhat ambiguous. The current behavior
does not seem fully consistent with wording such as “throughout the
transaction.” It also seems more appropriate for these table views to
reflect stats within their own scopes, rather than cumulative values
spanning txn boundaries.I’ve looked into this issue and 'd like to propose a patch to address
it. Feedback is very welcome.--- Root causeSince the stats subsystem was rewritten (commit 5891c7a8), each
backend keeps a hash of pending stats entries that persist across
transaction boundaries. Entries that flush successfully are deleted
from the backend-local pending list; entries not flushed remain
pending with their accumulated counters. Flushing is driven by
pgstat_report_stat(), called from the ReadyForQuery path, and subject
to a rate limiter.The pg_stat_xact_* views read these pending entries directly via
find_tabstat_entry() / find_funcstat_entry(). Both functions return
the raw accumulated counters without per-transaction scoping.The bug is deterministic when multiple top-level transactions are
processed within a single simple-query message, because there is no
ReadyForQuery boundary between the transactions and therefore no
opportunity to flush and remove the pending entry.--- FixThe patch introduces a per-entry "transaction baseline" — a lazy
snapshot of the counters taken the first time an entry is touched in
each new top-level transaction. The accessor functions
(find_tabstat_entry, find_funcstat_entry) then subtract the baseline
from the current counters, yielding only the current transaction's
delta.The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
pgstat_ensure_xact_baseline(), is called at every nontransactional
counter-increment site (the 7 event-counter macros in pgstat.h and the
4 non-inline counter functions in pgstat_relation.c). After the first
call per entry per transaction, the check reduces to a single integer
comparison with a predictably not-taken branch.For function stats, a new PgStat_FunctionPending wrapper struct embeds
PgStat_FunctionCounts at offset 0 with the baseline fields appended,
so the flush callback requires only a trivial cast change.--- TestingThe regression test is a TAP test under src/bin/psql/t/ that uses psql
-c to send multi-statement strings as single simple-query messages.
src/bin/psql/t look like the right existing harness for the primary
regression because psql -c sends the whole multi-statement string as
one simple-query message via simple_action_list dispatch. By contrast,
ordinary pg_regress SQL files are executed by psql in file/stdin mode,
which runs through MainLoop() and dispatches top-level statements one
at a time via SendQuery(), creating ReadyForQuery boundaries that
could mask the bug.The test covers three scenarios: table n_tup_ins and seq_scan
counters, and function calls — each verifying isolation across
consecutive top-level transactions within a single message.postgres % cat output.txt
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)ROLLBACK
DROP TABLEI can also reproduce this behavior in v13. So this issue pre-existed
the major refactor 5891c7a8.
After further thought of v1, I'd like to also propose an alternative
approach. The root cause and testing rationale are the same as
described upthread; what changes is how the baseline is established.
--- Problems with the lazy (v1) approach
The v1 patch snapshots the baseline lazily -- on every
nontransactional counter increment, it checks whether baseline_lxid
matches the current
transaction and snapshots if not. This has two drawbacks:
1) Hot-path overhead
Every pgstat_count_heap_scan(), pgstat_count_buffer_read_block(), etc.
acquires a branch. After the first call per entry per transaction, the
branch is predictably not-taken, but it could still present on some of
the most frequently executed paths in the backend.
2) Maintenance burden
Every new counter added to PgStat_TableCounts requires a corresponding
baseline snapshot at its increment site. Miss one, and that counter
silently leaks across transactions
--- eager baseline sweep
The attached patch records the baseline eagerly at transaction
boundaries instead of lazily at counter-increment sites.
pgstat_set_pending_baselines() iterates the pgStatPending list and
snapshots each entry's current counts into an xact_baseline field via
struct assignment. It is called from AtEOXact_PgStat() (after folding
transactional counts and removing dropped entries) and from
PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
and PREPARE TRANSACTION. The view accessors unconditionally subtract
the baseline. For entries created in the current transaction,
xact_baseline is zero-initialized, so the subtraction is a no-op.
I don’t have a clear preference between the two approaches; both are
presented for review.
--
Best,
Xuneng