n_ins_since_vacuum stats for aborted transactions

Started by Sami Imseih11 months ago34 messages
Jump to latest
#1Sami Imseih
samimseih@gmail.com

Hi,

I came across what appears to be incorrect behavior in the
pg_stat_all_tables.n_ins_since_vacuum
counter after a rollback.

As shown below, the first two inserts of 1,000 rows were rolled back,
yet they are still counted toward
n_ins_since_vacuum.Consequently, they influence the vacuum insert
threshold calculation—even though
such rolled-back rows are dead tuples and should only affect the
vacuum threshold calculation.

Notice that the n_mod_since_analyze actually does the correct thing
here and does
not take into account the rolledback inserts.

Rollbacks are not common, so this may go unnoticed, but I think it should be
corrected, which means that only committed inserts should count
towards n_ins_since_vacuum.

the n_tup_ins|del|upd should continue to track both committed and rolledback
rows, but I also think the documentation [0]https://www.postgresql.org/docs/current/monitoring-stats.html for these fields could be improved
to clarify this point, i.e. n_tup_ins should be documented as
"Total number of rows inserted, including those from aborted transactions"
instead of just "Total number of rows inserted"

```
DROP TABLE t;
CREATE TABLE t (id INT);
ALTER TABLE t SET (autovacuum_enabled = OFF);
BEGIN;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
ROLLBACK;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
SELECT
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
n_mod_since_analyze,
n_ins_since_vacuum
FROM
pg_stat_all_tables
WHERE
relname = 't';

n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+---------------------+--------------------
5000 | 0 | 0 | 3000 | 2000 |
3000 | 5000
(1 row)

```

Thoughts? before I prepare patches for this.

[0]: https://www.postgresql.org/docs/current/monitoring-stats.html

--
Sami Imseih
Amazon Web Services (AWS)

#2Euler Taveira
euler@eulerto.com
In reply to: Sami Imseih (#1)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025, at 1:57 PM, Sami Imseih wrote:

I came across what appears to be incorrect behavior in the
pg_stat_all_tables.n_ins_since_vacuum
counter after a rollback.

This is *not* an oversight. It is by design. See commit b07642dbcd8d. The
documentation says

Estimated number of rows inserted since this table was last vacuumed

Those rows were actually inserted. They are physically in the data files. And
that's what matters for the autovacuum algorithm.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#3Sami Imseih
samimseih@gmail.com
In reply to: Euler Taveira (#2)
Re: n_ins_since_vacuum stats for aborted transactions

This is *not* an oversight. It is by design. See commit b07642dbcd8d. The
documentation says

I don't see in the commit message why inserts in an aborted transaction
must count towards n_ins_since_vacuum.

Estimated number of rows inserted since this table was last vacuumed

Those rows were actually inserted. They are physically in the data files. And
that's what matters for the autovacuum algorithm.

Correct,but they are dead tuples that are physically in the files, and
are accounted for through
n_dead_tup and are then cleaned up based on the
autovacuum_vacuum_scale_factor|threshold
calculation.

The purpose of b07642dbcd8d is to trigger autovacuum for append only/mainly
workloads that don't generate dead tuples.

--
Sami Imseih
Amazon Web Services (AWS)

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#3)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025 at 11:32 AM Sami Imseih <samimseih@gmail.com> wrote:

The purpose of b07642dbcd8d is to trigger autovacuum for append only/mainly
workloads that don't generate dead tuples.

Why does counting or not counting the dead tuples matter? Forget original
purpose, is there presently a bug or not? Between the two options the one
where we count dead tuples makes more sense on its face.

David J.

#5Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#4)
Re: n_ins_since_vacuum stats for aborted transactions

Forget original purpose, is there presently a bug or not?

Yes, there is a bug. Accounting rows inserted as part of an aborted
transaction in
n_ins_since_vacuum is not correct, since the same rows are being
accounted for with n_dead_tup.

Between the two options the one where we count dead tuples makes more sense on its face.

IIUC, I am saying the same thing, if an inserted row is rolled back,
it should only count as a
dead tuple (n_dead_tup) only, and not in n_ins_since_vacuum.

--
Sami Imseih
Amazon Web Services (AWS)

#6Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Sami Imseih (#5)
Re: n_ins_since_vacuum stats for aborted transactions

Yes, there is a bug. Accounting rows inserted as part of an aborted
transaction in
n_ins_since_vacuum is not correct, since the same rows are being
accounted for with n_dead_tup.

If I create a table with autovacuum_enabled=false, insert rows (some of
which abort), and check the stats, surely the n_ins_tup and the
n_ins_since_vacuum should be the same, because all the insertions (however
we count them) have happened since the nonexistent last vacuum:

CREATE TABLE n_insert_test (
i INTEGER NOT NULL PRIMARY KEY
) WITH (autovacuum_enabled = false);
INSERT INTO n_insert_test (i) VALUES (1);
INSERT INTO n_insert_test
(SELECT 1 FROM generate_series(1,100000))
ON CONFLICT
DO NOTHING;
SELECT pg_sleep(1);
pg_sleep
----------

(1 row)

SELECT n_live_tup, n_dead_tup, n_tup_ins, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 'n_insert_test';
n_live_tup | n_dead_tup | n_tup_ins | n_ins_since_vacuum
------------+------------+-----------+--------------------
1 | 0 | 1 | 1
(1 row)

INSERT INTO n_insert_test
(SELECT 2 FROM generate_series(1,100000))
ON CONFLICT
DO NOTHING;
SELECT pg_sleep(1);
pg_sleep
----------

(1 row)

SELECT n_live_tup, n_dead_tup, n_tup_ins, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 'n_insert_test';
n_live_tup | n_dead_tup | n_tup_ins | n_ins_since_vacuum
------------+------------+-----------+--------------------
2 | 0 | 2 | 2
(1 row)

BEGIN;
INSERT INTO n_insert_test
(SELECT * FROM generate_series(3,100000));
ROLLBACK;
SELECT pg_sleep(1);
pg_sleep
----------

(1 row)

SELECT n_live_tup, n_dead_tup, n_tup_ins, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 'n_insert_test';
n_live_tup | n_dead_tup | n_tup_ins | n_ins_since_vacuum
------------+------------+-----------+--------------------
2 | 99998 | 100000 | 100000
(1 row)

If we went with your suggestion, I think the final n_ins_since_vacuum
column would be 2. Do you think the n_tup_ins should also be 2? Should
those two columns differ? If so, why?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#5)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025, 11:57 Sami Imseih <samimseih@gmail.com> wrote:

Forget original purpose, is there presently a bug or not?

Yes, there is a bug. Accounting rows inserted as part of an aborted
transaction in
n_ins_since_vacuum is not correct, since the same rows are being
accounted for with n_dead_tup.

So why is it important we not account for the aborted insert in both
n_ins_since_vacuum and n_dead_tup?

When would you ever add them together so that an actual double-counting
would reflect in some total.

You aren't upset that n_live_tup and this both include the non-aborted
inserts.

David J.

#8Sami Imseih
samimseih@gmail.com
In reply to: Mark Dilger (#6)
Re: n_ins_since_vacuum stats for aborted transactions

If we went with your suggestion, I think the final n_ins_since_vacuum column would be 2. Do you think the n_tup_ins should also be 2?

n_ins_since_vacuum should be 2 and n_tup_ins should be 100000.

A user tracks how many inserts they performed with n_tup_ins
to measure load/activity on the database. It's important to also
include aborted transactions in this metric,

n_ins_since_vacuum however is not used to measure database activity,
but is used to drive autovacuum decisions. So, it has a different purpose.

Should those two columns differ? If so, why?

They will differ because n_tup_ins keeps increasing, while n_ins_since_vacuum is
reset after a vacuum. The issue I see is that n_ins_since_vacuum should only
reflect the number of newly inserted rows that are eligible for
freezing, as described
in pgstat_report_vacuum [0]https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_relation.c#L238-L247

[0]: https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_relation.c#L238-L247

--
Sami Imseih
Amazon Web Services (AWS)

#9Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#7)
Re: n_ins_since_vacuum stats for aborted transactions

So why is it important we not account for the aborted insert in both n_ins_since_vacuum and n_dead_tup?

I am saying n_dead_tup should continue to account for n_dead_tup. I am
not saying it should not.
What I am saying is n_ins_since_vacuum should not account for aborted inserts.

When would you ever add them together so that an actual double-counting would reflect in some total.

I would never add them together. n_ins_since_vacuum uses this value
for vacuuming purposes.

You aren't upset that n_live_tup and this both include the non-aborted inserts.

n_live_tup only shows the non-aborted inserts. I will put a better formatted
version of the repro below. IN the exampleI have 2k dead tuples from the rolled
back transactions and 3k live tuples from the committed transactions.

```
DROP TABLE t;
CREATE TABLE t (id INT);
ALTER TABLE t SET (autovacuum_enabled = OFF);
BEGIN;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
ROLLBACK;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
\x
SELECT
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
n_mod_since_analyze,
n_ins_since_vacuum
FROM
pg_stat_all_tables
WHERE
relname = 't';

-[ RECORD 1 ]-------+-----
n_tup_ins | 5000
n_tup_upd | 0
n_tup_del | 0
n_live_tup | 3000
n_dead_tup | 2000
n_mod_since_analyze | 3000
n_ins_since_vacuum | 5000
```

--
Sami Imseih
Amazon Web Services (AWS)

#10Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Sami Imseih (#8)
Re: n_ins_since_vacuum stats for aborted transactions

The issue I see is that n_ins_since_vacuum should only
reflect the number of newly inserted rows that are eligible for
freezing, as described
in pgstat_report_vacuum [0]

[0]
https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_relation.c#L238-L247

For the archives, that paragraph reads:

/*
* It is quite possible that a non-aggressive VACUUM ended up skipping
* various pages, however, we'll zero the insert counter here regardless.
* It's currently used only to track when we need to perform an "insert"
* autovacuum, which are mainly intended to freeze newly inserted tuples.
* Zeroing this may just mean we'll not try to vacuum the table again
* until enough tuples have been inserted to trigger another insert
* autovacuum. An anti-wraparound autovacuum will catch any persistent
* stragglers.
*/

What work do you believe the word "mainly" does in that paragraph? The
presence of the word "mainly" rather than "only" somewhat cuts against your
argument that we should only be counting tuples that get inserted without
aborting.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#9)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025, 12:56 Sami Imseih <samimseih@gmail.com> wrote:

What I am saying is n_ins_since_vacuum should not account for aborted
inserts.

It does and from what I can see it should. You need to explain why it
should not. More importantly, convincingly enough to change five year old
behavior.

You should get away from showing counts and talk about system behavior.
Or, if this is accounting related, explain the math.

David J.

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#8)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025, 12:39 Sami Imseih <samimseih@gmail.com> wrote:

They will differ because n_tup_ins keeps increasing, while
n_ins_since_vacuum is
reset after a vacuum. The issue I see is that n_ins_since_vacuum should
only
reflect the number of newly inserted rows that are eligible for
freezing, as described
in pgstat_report_vacuum [0]

Vacuuming them into oblivion is a form of freezing. It also removes the
aging xid from the table.

David J.

Show quoted text
#13Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#11)
Re: n_ins_since_vacuum stats for aborted transactions

What I am saying is n_ins_since_vacuum should not account for aborted inserts.

It does and from what I can see it should. You need to explain why it should not. More importantly, convincingly enough to change five year old behavior.

n_ins_since_vacuum was introduced to trigger autovacuum based on the #
of inserts
committed, and does not care about about dead tuples in this formula.

If I have a transaction that rolledback an insert of a million rows,
I expect autovacuum to kick in based on the fact there are now 1 million
n_dead_tup. n_ins_since_vacuumm is not relevant to the formula
for this case.

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it intends
to do.

--
Sami Imseih

#14David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#13)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025 at 1:30 PM Sami Imseih <samimseih@gmail.com> wrote:

What I am saying is n_ins_since_vacuum should not account for aborted

inserts.

It does and from what I can see it should. You need to explain why it

should not. More importantly, convincingly enough to change five year old
behavior.

n_ins_since_vacuum was introduced to trigger autovacuum based on the #
of inserts
committed, and does not care about about dead tuples in this formula.

If I have a transaction that rolledback an insert of a million rows,
I expect autovacuum to kick in based on the fact there are now 1 million
n_dead_tup. n_ins_since_vacuumm is not relevant to the formula
for this case.

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it
intends
to do.

Well, then the authors failed to implement what they intended. Which seems
unlikely, but even accepting that to be true, you still haven't shown that
the current behavior is undesirable. Just unexpected. Which means we
haven't documented this well enough so people reading the docs/code get an
expectation that matches reality.

So, please, go ahead with some documentation/code comment changes. But,
for my part, the existing behavior seems quite reasonable: "yes, please,
get rid of my bloat on this otherwise infrequently updated table"; thus
leave the existing behavior and the tracking alone.

David J.

#15Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Sami Imseih (#13)
Re: n_ins_since_vacuum stats for aborted transactions

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it
intends
to do.

Wouldn't that result in the rather strange behavior that 1 million dead
rows might trigger a vacuum due to one threshold, 1 million inserted live
rows might trigger a vacuum due to another threshold, while half a million
dead plus half a million live fails to meet either threshold and fails to
trigger a vacuum? What is the use case for that behavior? Perhaps you
have one, but until you make it explicit, it is hard for others to get
behind your proposal.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#13)
Re: n_ins_since_vacuum stats for aborted transactions

On Wednesday, April 9, 2025, Sami Imseih <samimseih@gmail.com> wrote:

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it
intends
to do.

n_ins_since_vacuum was introduced to indicate how many tuples a vacuum
would touch on an insert-only table should vacuum be run now. Autovacuum
uses this value when determining whether a given relation should be
vacuumed.

David J.

#17Sami Imseih
samimseih@gmail.com
In reply to: Mark Dilger (#15)
Re: n_ins_since_vacuum stats for aborted transactions

On Wed, Apr 9, 2025 at 4:23 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it intends
to do.

Wouldn't that result in the rather strange behavior that 1 million dead rows might trigger a vacuum due to one threshold,
1 million inserted live rows might trigger a vacuum due to another threshold,
while half a million dead plus half a million live fails to meet either threshold and fails to trigger a vacuum?

Vacuum works based on different thresholds already, right? A user is
able to configure different thresholds:
autovacuum_vacuum_scale_factor|threshold
autovacuum_vacuum_insert_scale_factor|threshold

What is the use case for that behavior? Perhaps you have one, but until you make it explicit, it is hard for others to get behind your proposal.

The point is to ensure that the pg_stats fields that autovacuum uses
are supplied the correct values
for the different thresholds they need to calculate, as described here [0]/messages/by-id/CAA5RZ0uDyGW1omWqWkxyW8NB1qzsKmXhnoMtzTBeRzSd4DMatQ@mail.gmail.com

[0]: /messages/by-id/CAA5RZ0uDyGW1omWqWkxyW8NB1qzsKmXhnoMtzTBeRzSd4DMatQ@mail.gmail.com

--
Sami Imseih

#18Andres Freund
andres@anarazel.de
In reply to: Sami Imseih (#17)
Re: n_ins_since_vacuum stats for aborted transactions

Hi,

On 2025-04-09 17:05:39 -0500, Sami Imseih wrote:

On Wed, Apr 9, 2025 at 4:23 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it intends
to do.

Wouldn't that result in the rather strange behavior that 1 million dead rows might trigger a vacuum due to one threshold,
1 million inserted live rows might trigger a vacuum due to another threshold,
while half a million dead plus half a million live fails to meet either threshold and fails to trigger a vacuum?

Vacuum works based on different thresholds already, right? A user is
able to configure different thresholds:
autovacuum_vacuum_scale_factor|threshold
autovacuum_vacuum_insert_scale_factor|threshold

What is the use case for that behavior? Perhaps you have one, but until you make it explicit, it is hard for others to get behind your proposal.

The point is to ensure that the pg_stats fields that autovacuum uses
are supplied the correct values
for the different thresholds they need to calculate, as described here [0]

You so far have not outlined a single scenario where the current behaviour
causes an issue.

Greetings,

Andres Freund

#19Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#14)
Re: n_ins_since_vacuum stats for aborted transactions

What I am saying is n_ins_since_vacuum should not account for aborted inserts.

It does and from what I can see it should. You need to explain why it should not. More importantly, convincingly enough to change five year old behavior.

n_ins_since_vacuum was introduced to trigger autovacuum based on the #
of inserts
committed, and does not care about about dead tuples in this formula.

If I have a transaction that rolledback an insert of a million rows,
I expect autovacuum to kick in based on the fact there are now 1 million
n_dead_tup. n_ins_since_vacuumm is not relevant to the formula
for this case.

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it intends
to do.

Well, then the authors failed to implement what they intended. Which seems unlikely, but even accepting that to be true,

At least, the reasoning behind this is neither explained in code comments or
in public docs.

you still haven't shown that the current behavior is undesirable. Just unexpected.

It's not expected for sure.

As far as being undesirable, I don't think it is because rollbacks are not
that common and this is not preventing or delaying autovacuum.

I do think it's probably a good idea to add code comment that it is OK
to include
dead tuples from a aborted insert into the n_ins_since_vacuum counter,
for the above reasons.

I will also update the public documentation for the counters to mention that
they include rows from aborted transactions.

--
Sami Imseih

#20David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#17)
Re: n_ins_since_vacuum stats for aborted transactions

On Wednesday, April 9, 2025, Sami Imseih <samimseih@gmail.com> wrote:

What is the use case for that behavior? Perhaps you have one, but until

you make it explicit, it is hard for others to get behind your proposal.

The point is to ensure that the pg_stats fields that autovacuum uses
are supplied the correct values
for the different thresholds they need to calculate, as described here [0]

[0] /messages/by-id/CAA5RZ0uDyGW1omWqWkxyW8NB1qzsK
mXhnoMtzTBeRzSd4DMatQ%40mail.gmail.com

Except there isn’t some singular provably correct value here. Today’s
behavior (considering dead tuples) is not intrinsically wrong nor correct,
and neither is what you propose (ignoring the dead tuples). The fact that
those dead tuples get removed regardless is a point in favor of counting
them when deciding what to do. And it’s also the long-standing behavior.
You need to make a compelling argument to change to your preference.

Inserting aborted dead tuples moves the counter closer to both autovacuum
thresholds. There is no reason why that should be prohibited. I can see
the argument for why one threshold should be dead tuples only and the other
live tuples only - but I don’t favor that design.

David J.

#21Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#20)
#22David Rowley
dgrowleyml@gmail.com
In reply to: Sami Imseih (#21)
#23Sami Imseih
samimseih@gmail.com
In reply to: David Rowley (#22)
#24Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#23)
#25David Rowley
dgrowleyml@gmail.com
In reply to: Sami Imseih (#24)
#26Sami Imseih
samimseih@gmail.com
In reply to: David Rowley (#25)
#27David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#26)
#28Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#27)
#29David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#28)
#30Sami Imseih
samimseih@gmail.com
In reply to: David G. Johnston (#29)
#31David G. Johnston
david.g.johnston@gmail.com
In reply to: Sami Imseih (#30)
#32David Rowley
dgrowleyml@gmail.com
In reply to: Sami Imseih (#28)
#33Sami Imseih
samimseih@gmail.com
In reply to: David Rowley (#32)
#34Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#33)