BUG #17399: Dead tuple number stats not updated on long running queries

Started by PG Bug reporting formover 4 years ago6 messagesbugs

noreply@postgresql.org

over 4 years ago

The following bug has been logged on the website:

Bug reference: 17399
Logged by: Soni
Email address: diptatapa@gmail.com
PostgreSQL version: 13.5
Operating system: Red Hat Enterprise Linux release 8.5 (Ootpa)
Description:

Hello All,
I think I found a bug.

While there are long running queries, a vacuum that start and end during the
long running queries, the stats of pg_stat_user_tables.n_dead_tup not
updated. The real dead tuple on the table is cleaned up, but not the
stats.
So, if dead tuple percentage on pg_stat_user_tables is above
autovacuum_vacuum_scale_factor, then the autovacuum keeps triggered during
the long running queries.

Soni M

diptatapa@gmail.com

over 4 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #17399: Dead tuple number stats not updated on long running queries

Some Update
After several times repeatedly autovacuum is launched on the table,
then it stops. After the long running queries finished, the postgres
service restarted, analyzed the table, the n_dead_tup still the same, then
vacuum again, but now the vacuum process detects again the dead row it has
cleaned previously.

It seems the concurrency control in the vacuum process.

On Tue, Feb 8, 2022 at 8:41 AM PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 17399
Logged by: Soni
Email address: diptatapa@gmail.com
PostgreSQL version: 13.5
Operating system: Red Hat Enterprise Linux release 8.5 (Ootpa)
Description:

Hello All,
I think I found a bug.

While there are long running queries, a vacuum that start and end during
the
long running queries, the stats of pg_stat_user_tables.n_dead_tup not
updated. The real dead tuple on the table is cleaned up, but not the
stats.
So, if dead tuple percentage on pg_stat_user_tables is above
autovacuum_vacuum_scale_factor, then the autovacuum keeps triggered during
the long running queries.

--
Regards,

Soni Maula Harriz

Andres Freund

andres@anarazel.de

over 4 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #17399: Dead tuple number stats not updated on long running queries

Hi,

On 2022-02-08 01:40:32 +0000, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 17399
Logged by: Soni
Email address: diptatapa@gmail.com
PostgreSQL version: 13.5
Operating system: Red Hat Enterprise Linux release 8.5 (Ootpa)
Description:

Hello All,
I think I found a bug.

While there are long running queries, a vacuum that start and end during the
long running queries, the stats of pg_stat_user_tables.n_dead_tup not
updated. The real dead tuple on the table is cleaned up, but not the
stats.

Are you saying that pg_stat_user_tables.n_dead_tup is not updated for changes
done by the long running transactions, or that it is not getting updated at
all, even if other transactions that performed modifications commit?

It is correct and expected that the changes by currently running transactions
are not reflected in the stats subsystem - and they can't really. Whether
changes by such transactions end up as "live" or "dead" rows differs between
that transaction committing and aborting.

I just verified that indeed n_dead_tup gets updated after a transaction
commits, even if there are other long running transactions.

So, if dead tuple percentage on pg_stat_user_tables is above
autovacuum_vacuum_scale_factor, then the autovacuum keeps triggered during
the long running queries.

The reason for this likely is not that pg_stat_user_tables.n_dead_tup is not
updated, but that existing dead tuples cannot be yet be removed, because the
longrunning transaction might still see them.

If you enable autovacuum logging, you can see (output differs a bit in older
versions, but the concept is similar) something like this:

2022-02-10 16:27:47.927 PST [61256][autovacuum worker][6/161:0] LOG: automatic vacuum of table "postgres.public.large": index scans: 0
pages: 0 removed, 15610 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 2400000 remain, 1400000 are dead but not yet removable, oldest xmin: 739
...

Note the "1400000 are dead but not yet removable" bit. In this case there are
1.4M dead rows that can't be cleaned up due to the longrunning transaction /
query. Because of that autovacuum will be started again and again, until the
dead rows can actually be cleaned up.

Once the longrunning transaction ends, autovacuum will run once more:

2022-02-10 16:32:04.292 PST [61703][autovacuum worker][6/261:749] LOG: automatic vacuum of table "postgres.public.large": index scans: 1
pages: 2335 removed, 13275 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1400000 removed, 2000000 remain, 0 are dead but not yet removable, oldest xmin: 749

and because 1.4M tuples were removed, n_dead_tup changes to 0, and everyone is
happy again.

Greetings,

Andres Freund

Soni M

diptatapa@gmail.com

over 4 years ago

In reply to: Andres Freund (#3)

Re: BUG #17399: Dead tuple number stats not updated on long running queries

Ah yes, it is the dead tuple cannot be removed, because long running
transactions still see them.
Something that lead to my confusion above is pg_stat_progress_vacuum, each
time autovacuum/manual vacuum launch during long running transaction, it
reports that num_dead_tuples

On Fri, Feb 11, 2022 at 7:32 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-02-08 01:40:32 +0000, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 17399
Logged by: Soni
Email address: diptatapa@gmail.com
PostgreSQL version: 13.5
Operating system: Red Hat Enterprise Linux release 8.5 (Ootpa)
Description:

Hello All,
I think I found a bug.

While there are long running queries, a vacuum that start and end during

the

long running queries, the stats of pg_stat_user_tables.n_dead_tup not
updated. The real dead tuple on the table is cleaned up, but not the
stats.

Are you saying that pg_stat_user_tables.n_dead_tup is not updated for
changes
done by the long running transactions, or that it is not getting updated at
all, even if other transactions that performed modifications commit?

It is correct and expected that the changes by currently running
transactions
are not reflected in the stats subsystem - and they can't really. Whether
changes by such transactions end up as "live" or "dead" rows differs
between
that transaction committing and aborting.

I just verified that indeed n_dead_tup gets updated after a transaction
commits, even if there are other long running transactions.

So, if dead tuple percentage on pg_stat_user_tables is above
autovacuum_vacuum_scale_factor, then the autovacuum keeps triggered

during

the long running queries.

The reason for this likely is not that pg_stat_user_tables.n_dead_tup is
not
updated, but that existing dead tuples cannot be yet be removed, because
the
longrunning transaction might still see them.

If you enable autovacuum logging, you can see (output differs a bit in
older
versions, but the concept is similar) something like this:

2022-02-10 16:27:47.927 PST [61256][autovacuum worker][6/161:0] LOG:
automatic vacuum of table "postgres.public.large": index scans: 0
pages: 0 removed, 15610 remain, 0 skipped due to pins, 0 skipped
frozen
tuples: 0 removed, 2400000 remain, 1400000 are dead but not yet
removable, oldest xmin: 739
...

Note the "1400000 are dead but not yet removable" bit. In this case there
are
1.4M dead rows that can't be cleaned up due to the longrunning transaction
/
query. Because of that autovacuum will be started again and again, until
the
dead rows can actually be cleaned up.

Once the longrunning transaction ends, autovacuum will run once more:

2022-02-10 16:32:04.292 PST [61703][autovacuum worker][6/261:749] LOG:
automatic vacuum of table "postgres.public.large": index scans: 1
pages: 2335 removed, 13275 remain, 0 skipped due to pins, 0
skipped frozen
tuples: 1400000 removed, 2000000 remain, 0 are dead but not yet
removable, oldest xmin: 749

and because 1.4M tuples were removed, n_dead_tup changes to 0, and
everyone is
happy again.

Greetings,

Andres Freund

--
Regards,

Soni Maula Harriz

Soni M

diptatapa@gmail.com

over 4 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #17399: Dead tuple number stats not updated on long running queries

Sorry, accidentally send the message before finishing it.

It is false reporting from pg_stat_progress_vacuum that lead to my
confusion.

Thanks.

On Fri, Feb 11, 2022 at 11:33 AM Soni M <diptatapa@gmail.com> wrote:

Ah yes, it is the dead tuple cannot be removed, because long running
transactions still see them.
Something that lead to my confusion above is pg_stat_progress_vacuum, each
time autovacuum/manual vacuum launch during long running transaction, it
reports that num_dead_tuples

On Fri, Feb 11, 2022 at 7:32 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-02-08 01:40:32 +0000, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 17399
Logged by: Soni
Email address: diptatapa@gmail.com
PostgreSQL version: 13.5
Operating system: Red Hat Enterprise Linux release 8.5 (Ootpa)
Description:

Hello All,
I think I found a bug.

While there are long running queries, a vacuum that start and end

during the

long running queries, the stats of pg_stat_user_tables.n_dead_tup not
updated. The real dead tuple on the table is cleaned up, but not the
stats.

Are you saying that pg_stat_user_tables.n_dead_tup is not updated for
changes
done by the long running transactions, or that it is not getting updated
at
all, even if other transactions that performed modifications commit?

It is correct and expected that the changes by currently running
transactions
are not reflected in the stats subsystem - and they can't really. Whether
changes by such transactions end up as "live" or "dead" rows differs
between
that transaction committing and aborting.

I just verified that indeed n_dead_tup gets updated after a transaction
commits, even if there are other long running transactions.

So, if dead tuple percentage on pg_stat_user_tables is above
autovacuum_vacuum_scale_factor, then the autovacuum keeps triggered

during

the long running queries.

The reason for this likely is not that pg_stat_user_tables.n_dead_tup is
not
updated, but that existing dead tuples cannot be yet be removed, because
the
longrunning transaction might still see them.

If you enable autovacuum logging, you can see (output differs a bit in
older
versions, but the concept is similar) something like this:

2022-02-10 16:27:47.927 PST [61256][autovacuum worker][6/161:0] LOG:
automatic vacuum of table "postgres.public.large": index scans: 0
pages: 0 removed, 15610 remain, 0 skipped due to pins, 0 skipped
frozen
tuples: 0 removed, 2400000 remain, 1400000 are dead but not yet
removable, oldest xmin: 739
...

Note the "1400000 are dead but not yet removable" bit. In this case there
are
1.4M dead rows that can't be cleaned up due to the longrunning
transaction /
query. Because of that autovacuum will be started again and again, until
the
dead rows can actually be cleaned up.

Once the longrunning transaction ends, autovacuum will run once more:

2022-02-10 16:32:04.292 PST [61703][autovacuum worker][6/261:749] LOG:
automatic vacuum of table "postgres.public.large": index scans: 1
pages: 2335 removed, 13275 remain, 0 skipped due to pins, 0
skipped frozen
tuples: 1400000 removed, 2000000 remain, 0 are dead but not yet
removable, oldest xmin: 749

and because 1.4M tuples were removed, n_dead_tup changes to 0, and
everyone is
happy again.

Greetings,

Andres Freund

--
Regards,

Soni Maula Harriz

--
Regards,

Soni Maula Harriz

Andres Freund

andres@anarazel.de

over 4 years ago

In reply to: Soni M (#5)

Re: BUG #17399: Dead tuple number stats not updated on long running queries

Hi,

On February 10, 2022 8:37:10 PM PST, Soni M <diptatapa@gmail.com> wrote:

Sorry, accidentally send the message before finishing it.

Ah yes, it is the dead tuple cannot be removed, because long running
transactions still see them.
Something that lead to my confusion above is pg_stat_progress_vacuum, each
time autovacuum/manual vacuum launch during long running transaction, it
reports that num_dead_tuples is 0, while the
pg_stat_user_tables.n_dead_tuple still have the number of dead tuples.

It is false reporting from pg_stat_progress_vacuum that lead to my
confusion.

That just reports the number of dead tuples that it is tracking for removal. Not the number of tuples that could not yet be removed.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.