POC: track vacuum/analyze cumulative time per relation

Started by Sami Imseihover 1 year ago22 messageshackers

samimseih@gmail.com

over 1 year ago

Hi,

After a recent question regarding tracking vacuum start_time in
pg_stat_all_tables [1]/messages/by-id/CAGjGUAKQ4UBNdkjunH2qLsdUVG-3F9gCuG0Kb0hToo+uMmSteQ@mail.gmail.com, it dawned on me that this view is missing
an important cumulative metric, which is how much time is spent
performing vacuum per table.

Currently, the only way a user can get this
information is if they enable autovacuum logging or track timing
for manual vacuums. Even then, if a user wants to trend
the time spent vacuuming over time, they must store the
timing data somewhere and perform the calculations.

Also, unless autovacuum logging is enabled for all a/v
operations, they could have gaps in their analysis.

Having the total (auto)vacuum elapsed time
along side the existing (auto)vaccum_count
allows a user to track the average time an
operating overtime and to find vacuum tuning
opportunities.

The same can also be said for (auto)analyze.

attached a patch ( without doc changes)
that adds 4 new columns:

total_autovacuum_time
total_vacuum_time
total_autoanalyze_time
total_analyze_time

Below is an example of output and how it
can be used to derive the average vacuum
operation time.

postgres=# select
relname,
autovacuum_count,
total_autovacuum_time,
total_autovacuum_time/NULLIF(autovacuum_count,0) average_autovac_time,
vacuum_count,
total_vacuum_time,
total_vacuum_time/NULLIF(vacuum_count,0) average_vac_time
from pg_catalog.pg_stat_all_tables
where relname = 'pgbench_history';
-[ RECORD 1 ]---------+-----------------
relname | pgbench_history
autovacuum_count | 3
total_autovacuum_time | 1689
average_autovac_time | 563
vacuum_count | 1
total_vacuum_time | 1
average_vac_time | 1

It should be noted that the timing is only tracked
when the vacuum or analyze operation completes,
as is the case with the other metrics.

Also, there is another discussion in-flight [2]/messages/by-id/b68ab452-c41f-4d04-893f-eaab84f1855b@vondra.me regarding
tracking vacuum run history in a view, but this serves a
different purpose as this will provide all the metrics
that are other wise exposed in vacuum logging
via sql. This history will also be required to drop
entries using some criteria to keep the cache from
growing infinitely.

Feedback for the attached patch is appreciated!

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1]: /messages/by-id/CAGjGUAKQ4UBNdkjunH2qLsdUVG-3F9gCuG0Kb0hToo+uMmSteQ@mail.gmail.com
[2]: /messages/by-id/b68ab452-c41f-4d04-893f-eaab84f1855b@vondra.me

wenhui qiu

qiuwenhuifx@gmail.com

over 1 year ago

In reply to: Sami Imseih (#1)

Re: POC: track vacuum/analyze cumulative time per relation

Hi Sami
Thank you for your path，it seems some path monitor vacuum status，Can we
synthesize their good ideas together。

On Fri, 3 Jan 2025 at 02:24, Sami Imseih <samimseih@gmail.com> wrote:

Show quoted text

Hi,

After a recent question regarding tracking vacuum start_time in
pg_stat_all_tables [1], it dawned on me that this view is missing
an important cumulative metric, which is how much time is spent
performing vacuum per table.

Currently, the only way a user can get this
information is if they enable autovacuum logging or track timing
for manual vacuums. Even then, if a user wants to trend
the time spent vacuuming over time, they must store the
timing data somewhere and perform the calculations.

Also, unless autovacuum logging is enabled for all a/v
operations, they could have gaps in their analysis.

Having the total (auto)vacuum elapsed time
along side the existing (auto)vaccum_count
allows a user to track the average time an
operating overtime and to find vacuum tuning
opportunities.

The same can also be said for (auto)analyze.

attached a patch ( without doc changes)
that adds 4 new columns:

total_autovacuum_time
total_vacuum_time
total_autoanalyze_time
total_analyze_time

Below is an example of output and how it
can be used to derive the average vacuum
operation time.

postgres=# select
relname,
autovacuum_count,
total_autovacuum_time,
total_autovacuum_time/NULLIF(autovacuum_count,0) average_autovac_time,
vacuum_count,
total_vacuum_time,
total_vacuum_time/NULLIF(vacuum_count,0) average_vac_time
from pg_catalog.pg_stat_all_tables
where relname = 'pgbench_history';
-[ RECORD 1 ]---------+-----------------
relname | pgbench_history
autovacuum_count | 3
total_autovacuum_time | 1689
average_autovac_time | 563
vacuum_count | 1
total_vacuum_time | 1
average_vac_time | 1

It should be noted that the timing is only tracked
when the vacuum or analyze operation completes,
as is the case with the other metrics.

Also, there is another discussion in-flight [2] regarding
tracking vacuum run history in a view, but this serves a
different purpose as this will provide all the metrics
that are other wise exposed in vacuum logging
via sql. This history will also be required to drop
entries using some criteria to keep the cache from
growing infinitely.

Feedback for the attached patch is appreciated!

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1]
/messages/by-id/CAGjGUAKQ4UBNdkjunH2qLsdUVG-3F9gCuG0Kb0hToo+uMmSteQ@mail.gmail.com
[2]
/messages/by-id/b68ab452-c41f-4d04-893f-eaab84f1855b@vondra.me

Alena Rybakina

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: wenhui qiu (#2)

Re: POC: track vacuum/analyze cumulative time per relation

Hi!

On 04.01.2025 06:41, wenhui qiu wrote:

Hi Sami
Thank you for your path，it seems some path monitor vacuum status，Can
we synthesize their good ideas together。

I've been working on a patch that collects vacuum statistics since May
[0]: https://commitfest.postgresql.org/51/5012/
course, database vacuum statistics. We welcome reviews and any feedback!

It collects a bit more information than total_time, such as delay_time,
buffer statistics, wal statistics, number of processed, skipped pages
and tuples during vacuum operations, so
this information gives a complete picture of vacuum efficiency, because
analyzing only total_time statistics does not give a complete picture of
what vacuum did: was it cleaning almost huge index, cleaning tables or
just sleeping.

[0]: https://commitfest.postgresql.org/51/5012/

--
Regards,
Alena Rybakina
Postgres Professional

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 1 year ago

In reply to: Sami Imseih (#1)

Re: POC: track vacuum/analyze cumulative time per relation

Hi,

On Thu, Jan 02, 2025 at 12:24:06PM -0600, Sami Imseih wrote:

Having the total (auto)vacuum elapsed time
along side the existing (auto)vaccum_count
allows a user to track the average time an
operating overtime and to find vacuum tuning
opportunities.

The same can also be said for (auto)analyze.

I think that makes sense to expose those metrics through SQL as you're proposing
here. The more we expose through SQL the better I think.

attached a patch ( without doc changes)
that adds 4 new columns:

Thanks!

A few random comments:

=== 1

+       endtime = GetCurrentTimestamp();
        pgstat_report_vacuum(RelationGetRelid(rel),
                                                 rel->rd_rel->relisshared,
                                                 Max(vacrel->new_live_tuples, 0),
                                                 vacrel->recently_dead_tuples +
-                                                vacrel->missed_dead_tuples);
+                                                vacrel->missed_dead_tuples,
+                                                starttime,
+                                                endtime);
        pgstat_progress_end_command();

if (instrument)
{
- TimestampTz endtime = GetCurrentTimestamp();

What about keeping the endtime assignment after the pgstat_progress_end_command()
call? I think that it makes more sense that way.

=== 2

        pgstat_report_vacuum(RelationGetRelid(rel),
                                                 rel->rd_rel->relisshared,
                                                 Max(vacrel->new_live_tuples, 0),
                                                 vacrel->recently_dead_tuples +
-                                                vacrel->missed_dead_tuples);
+                                                vacrel->missed_dead_tuples,
+                                                starttime,
+                                                endtime);

What about doing the elapsedtime computation prior the this call and passed
it as a parameter (then remove the starttime one and keep the endtime as it
is needed)?

=== 3

        pgstat_report_analyze(onerel, totalrows, totaldeadrows,
-                             (va_cols == NIL));
+                             (va_cols == NIL), starttime, endtime);

Same as 2 for pgstat_report_analyze() except that the endtime could be removed
too.

=== 4

+/* pg_stat_get_vacuum_time */
+PG_STAT_GET_RELENTRY_INT64(total_vacuum_time)
+
+/* pg_stat_get_autovacuum_time */
+PG_STAT_GET_RELENTRY_INT64(total_autovacuum_time)
+
+/* pg_stat_get_analyze_time */
+PG_STAT_GET_RELENTRY_INT64(total_analyze_time)
+
+/* pg_stat_get_autoanalyze_time */
+PG_STAT_GET_RELENTRY_INT64(total_autoanalyze_time)

I wonder if it wouldn't be better to use FLOAT8 here (to match things like
pg_stat_get_checkpointer_write_time(), pg_stat_get_checkpointer_sync_time() among
others).

=== 5

+
+       PgStat_Counter total_vacuum_time;       /* user initiated vacuum */
+       PgStat_Counter total_autovacuum_time;   /* autovacuum initiated */
+       PgStat_Counter total_analyze_time;      /* user initiated vacuum */
+       PgStat_Counter total_autoanalyze_time;  /* autovacuum initiated */

Those comments look weird to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Sami Imseih

samimseih@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#3)

Re: POC: track vacuum/analyze cumulative time per relation

this information gives a complete picture of vacuum efficiency, because
analyzing only total_time statistics does not give a complete picture of
what vacuum did: was it cleaning almost huge index, cleaning tables or
just sleeping.

The purpose of total_time is to be able to calculate the average
time of a vacuum per relation from pg_stat_all_tables (by
total_time/vacuum_count) This is useful on its own to be able to
trend slow downs of vacuum ( or improvements after vacuum tuning )
for specific relations.

The work going into [1]https://commitfest.postgresql.org/51/5012/ is more detailed and will be useful
for deep dive investigations.In short, I think both proposals
have merit.

[1]: https://commitfest.postgresql.org/51/5012/

Regards,

Sami

Sami Imseih

samimseih@gmail.com

over 1 year ago

In reply to: Sami Imseih (#5)

Re: POC: track vacuum/analyze cumulative time per relation

this information gives a complete picture of vacuum efficiency, because
analyzing only total_time statistics does not give a complete picture of
what vacuum did: was it cleaning almost huge index, cleaning tables or
just sleeping.

The purpose of total_time is to be able to calculate the average
time of a vacuum per relation from pg_stat_all_tables (by
total_time/vacuum_count) This is useful on its own to be able to
trend slow downs of vacuum ( or improvements after vacuum tuning )
for specific relations.

The work going into [1] is more detailed and will be useful
for deep dive investigations.In short, I think both proposals
have merit.

Missed this point in my last reply. This patch also tracks
(auto)analyze per relation cumulative timing which can
be useful along with (auto)analyze_count to derive
the average.

Regards,

Sami

Sami Imseih

samimseih@gmail.com

over 1 year ago

In reply to: Bertrand Drouvot (#4)

Re: POC: track vacuum/analyze cumulative time per relation

Thanks for the review!

=== 1
+       endtime = GetCurrentTimestamp();
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
vacrel->recently_dead_tuples +
-                                                vacrel->missed_dead_tuples);
+                                                vacrel->missed_dead_tuples,
+                                                starttime,
+                                                endtime);
pgstat_progress_end_command();
if (instrument)
{
- TimestampTz endtime = GetCurrentTimestamp();

What about keeping the endtime assignment after the pgstat_progress_end_command()
call? I think that it makes more sense that way.

Yes, that can be done. I see a few more examples in
which we report stats after end_command, i.e.
AbortTransaction

What about doing the elapsedtime computation prior the this call and passed
it as a parameter (then remove the starttime one and keep the endtime as it
is needed)?

Yes, makes sense. It will keep the changes to pgstat_report_ to a minimum.

=== 4
+/* pg_stat_get_vacuum_time */
+PG_STAT_GET_RELENTRY_INT64(total_vacuum_time)
+
+/* pg_stat_get_autovacuum_time */
+PG_STAT_GET_RELENTRY_INT64(total_autovacuum_time)
+
+/* pg_stat_get_analyze_time */
+PG_STAT_GET_RELENTRY_INT64(total_analyze_time)
+
+/* pg_stat_get_autoanalyze_time */
+PG_STAT_GET_RELENTRY_INT64(total_autoanalyze_time)
I wonder if it wouldn't be better to use FLOAT8 here (to match things like
pg_stat_get_checkpointer_write_time(), pg_stat_get_checkpointer_sync_time() among
others).

=== 5

I see the checkpointer write_time for example is converted
to double "for presentation". I am not sure why, expect for maybe
historical reasons. I will do the same for this new cumulative
time field for the sake of consistency. There is no macro for
relentry to return a stat as FLOAT8, so adding it as part of this
patch.

Datum
pg_stat_get_checkpointer_write_time(PG_FUNCTION_ARGS)
{
/* time is already in msec, just convert to double for presentation */
PG_RETURN_FLOAT8((double)
pgstat_fetch_stat_checkpointer()->write_time);
}

+
+       PgStat_Counter total_vacuum_time;       /* user initiated vacuum */
+       PgStat_Counter total_autovacuum_time;   /* autovacuum initiated */
+       PgStat_Counter total_analyze_time;      /* user initiated vacuum */
+       PgStat_Counter total_autoanalyze_time;  /* autovacuum initiated */

Those comments look weird to me.

Ok, Will remove these.

I also updated the comments for the instrument code path to reflect the
fact starttime is now set for all cases.Also, added documentation.

See the attached v2.

Regards,

Sami

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 1 year ago

In reply to: Sami Imseih (#7)

Re: POC: track vacuum/analyze cumulative time per relation

Hi,

On Fri, Jan 10, 2025 at 04:26:05PM -0600, Sami Imseih wrote:

{
/* time is already in msec, just convert to double for presentation */
PG_RETURN_FLOAT8((double)
pgstat_fetch_stat_checkpointer()->write_time);
}
+
+       PgStat_Counter total_vacuum_time;       /* user initiated vacuum */
+       PgStat_Counter total_autovacuum_time;   /* autovacuum initiated */
+       PgStat_Counter total_analyze_time;      /* user initiated vacuum */
+       PgStat_Counter total_autoanalyze_time;  /* autovacuum initiated */
Those comments look weird to me.
Ok, Will remove these.

I also updated the comments for the instrument code path to reflect the
fact starttime is now set for all cases.Also, added documentation.

See the attached v2.

Thanks for the patch update!

A few random comments:

=== 1

+/* pg_stat_get_vacuum_time */
+PG_STAT_GET_RELENTRY_FLOAT8(total_vacuum_time)
+
+/* pg_stat_get_autovacuum_time */
+PG_STAT_GET_RELENTRY_FLOAT8(total_autovacuum_time)
+
+/* pg_stat_get_analyze_time */
+PG_STAT_GET_RELENTRY_FLOAT8(total_analyze_time)
+
+/* pg_stat_get_autoanalyze_time */
+PG_STAT_GET_RELENTRY_FLOAT8(total_autoanalyze_time)
+

The comments do not reflect the function names ("total" is missing to give
pg_stat_get_total_vacuum_time() and such).

=== 2

+#define PG_STAT_GET_RELENTRY_FLOAT8(stat)
+Datum                                        
+CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)
+{                                                       
+       Oid                     relid = PG_GETARG_OID(0); 
+       int64           result;
+       PgStat_StatTabEntry *tabentry;
+                                               
+       if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+               result = 0;
+       else              
+               result = (float8) (tabentry->stat);
+                                      
		+       PG_RETURN_FLOAT8(result);
+}
+
 /* pg_stat_get_analyze_count */
 PG_STAT_GET_RELENTRY_INT64(analyze_count

I think it's better to define the macro just before its first usage (meaning
just after pg_stat_get_vacuum_count()): that would be consistent with the places
the other macros are defined.

=== 3

+ int64 result;

s/int64/double/?

=== 4

+       Total time this table has spent in manual vacuum
+      </para></entry>

Mention the unit?

=== 5

+       /*
+        * When verbose or autovacuum logging is used, initialize a resource usage
+        * snapshot and optionally track I/O timing.
+        */
        if (instrument)
        {

Out of curiosity, why this extra comment? To be somehow consistent with
do_analyze_rel()?

=== 6

@@ -343,6 +349,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM,
RelationGetRelid(rel));

+ starttime = GetCurrentTimestamp();

I wonder if it wouldn't make more sense to put the GetCurrentTimestamp() call
before the pgstat_progress_start_command() call. That would be aligned with the
"endtime" being after the pgstat_progress_end_command() and where it was before
the patch.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Sami Imseih

samimseih@gmail.com

over 1 year ago

In reply to: Bertrand Drouvot (#8)

Re: POC: track vacuum/analyze cumulative time per relation

thanks for the review!

The comments do not reflect the function names ("total" is missing to give
pg_stat_get_total_vacuum_time() and such).

fixed

I think it's better to define the macro just before its first usage (meaning
just after pg_stat_get_vacuum_count()): that would be consistent with the places
the other macros are defined.

correct. I fixed this to match the placement of similar macros.

s/int64/double/?

fixed

Mention the unit?

fixed

Out of curiosity, why this extra comment? To be somehow consistent with
do_analyze_rel()?

Yes, I added it for consistency, but since the comment was not there
before, I just decided to remove it.

I wonder if it wouldn't make more sense to put the GetCurrentTimestamp() call
before the pgstat_progress_start_command() call. That would be aligned with the
"endtime" being after the pgstat_progress_end_command() and where it was before
the patch.

I agree. Fixed.

Please see the attached v3.

Regards,

Sami

#10

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 1 year ago

In reply to: Sami Imseih (#9)

Re: POC: track vacuum/analyze cumulative time per relation

Hi,

On Tue, Jan 14, 2025 at 05:01:52PM -0600, Sami Imseih wrote:

Please see the attached v3.

Thanks!

A few comments:

=== 1

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>total_vacuum_time</structfield> <type>bigint</type>
+      </para>

Those new fields should be documented as "double precision".

=== 2

+#define PG_STAT_GET_RELENTRY_FLOAT8(stat)                                              \
+Datum                                                                                                                  \
+CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)                                 \
+{                                                                                                                              \
+       Oid                     relid = PG_GETARG_OID(0);                                               \
+       float8          result;                                                                                 \
+       PgStat_StatTabEntry *tabentry;                                                          \
+                                                                                                                               \
+       if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)     \
+               result = 0;                                                                                             \
+       else                                                                                                            \
+               result = (float8) (tabentry->stat);                                             \
+                                                                                                                               \
+       PG_RETURN_FLOAT8(result);                                                                       \
+}

I did propose "double" up-thread to be consistent with the code around. That
would mean to also cast to "double". That's just for consistency purpose. What do
you think?

Appart from the above that LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11

Sami Imseih

samimseih@gmail.com

over 1 year ago

In reply to: Bertrand Drouvot (#10)

Re: POC: track vacuum/analyze cumulative time per relation

Appart from the above that LGTM.

thanks! I took care of your comments. I was not sure
about the need to cast "double" but as you mention this
is consistent with other parts of pgstatfuncs.c

v4 attached.

Regards,

Sami

#12

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

about 1 year ago

In reply to: Sami Imseih (#11)

Re: POC: track vacuum/analyze cumulative time per relation

Hi,

On Wed, Jan 15, 2025 at 11:09:00AM -0600, Sami Imseih wrote:

Appart from the above that LGTM.

thanks! I took care of your comments.

Thanks!

I was not sure
about the need to cast "double" but as you mention this
is consistent with other parts of pgstatfuncs.c

Yup.

v4 attached.

One comment:

+ PG_RETURN_FLOAT8(result); \

The "\" indentation looks wrong for this line (was ok in v3).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#13

Sami Imseih

samimseih@gmail.com

about 1 year ago

In reply to: Bertrand Drouvot (#12)

Re: POC: track vacuum/analyze cumulative time per relation

One comment:

+ PG_RETURN_FLOAT8(result); \

The "\" indentation looks wrong for this line (was ok in v3).

Regards,

Indeed. Corrected in v5

Thanks!

Sami

#14

Michael Paquier

michael@paquier.xyz

about 1 year ago

In reply to: Sami Imseih (#13)

Re: POC: track vacuum/analyze cumulative time per relation

On Fri, Jan 17, 2025 at 09:42:13AM -0600, Sami Imseih wrote:

Indeed. Corrected in v5

I can get behind the idea of the patch.

+	PgStat_Counter total_vacuum_time;
+	PgStat_Counter total_autovacuum_time;
+	PgStat_Counter total_analyze_time;
+	PgStat_Counter total_autoanalyze_time;
 } PgStat_StatTabEntry;

These are time values in milliseconds. It would be good to document
that as a comment with these fields, like PgStat_CheckpointerStats and
others. Perhaps they should be grouped with their related counters
within PgStat_StatTabEntry? Or not.

The patch makes sense here in terms of passing down to
pgstat_report_vacuum() the elapsed time and the end time to avoid an
extra GetCurrentTimestamp() when calculating if the logging should
happen. However in analyze.c..

+++ b/src/backend/commands/analyze.c
[...]
+    TimestampTz endtime = 0;
[...]
-        TimestampTz endtime = GetCurrentTimestamp();
-
         if (verbose || params->log_min_duration == 0 ||
             TimestampDifferenceExceeds(starttime, endtime,

This is incorrect, endtime is never set. I would give priority to the
symmetry of the code here, using for the analyze reporting the same
arguments as the vacuum bits, so as we don't call GetCurrentTimestamp
in the proposed changes for pgstat_report_analyze().

+	if (AmAutoVacuumWorkerProcess())
+		tabentry->total_autovacuum_time += elapsedtime;
+	else
+		tabentry->total_vacuum_time += elapsedtime;
[...]
+	if (AmAutoVacuumWorkerProcess())
+		tabentry->total_autoanalyze_time += elapsedtime;
+	else
+		tabentry->total_analyze_time += elapsedtime;

These could be grouped with their previous blocks.
--
Michael

#15

Sami Imseih

samimseih@gmail.com

about 1 year ago

In reply to: Michael Paquier (#14)

Re: POC: track vacuum/analyze cumulative time per relation

I can get behind the idea of the patch.

Thanks for the review!

+       PgStat_Counter total_vacuum_time;
+       PgStat_Counter total_autovacuum_time;
+       PgStat_Counter total_analyze_time;
+       PgStat_Counter total_autoanalyze_time;
} PgStat_StatTabEntry;
These are time values in milliseconds.  It would be good to document
that as a comment with these fields, like PgStat_CheckpointerStats and
others.  Perhaps they should be grouped with their related counters
within PgStat_StatTabEntry?  Or not.

done, including grouping all the vacuum counters together.

This is incorrect, endtime is never set.

indeed. fixed.

I would give priority to the
symmetry of the code here, using for the analyze reporting the same
arguments as the vacuum bits, so as we don't call GetCurrentTimestamp
in the proposed changes for pgstat_report_analyze().

I agree.
pgstat_report_analyze and pgstat_report_vacuum match
will now take endtime and elapsedtime and we can remove
the GetCurrentTimestamp inside pgstat_report_analyze.

+       if (AmAutoVacuumWorkerProcess())
+               tabentry->total_autovacuum_time += elapsedtime;
+       else
+               tabentry->total_vacuum_time += elapsedtime;
[...]
+       if (AmAutoVacuumWorkerProcess())
+               tabentry->total_autoanalyze_time += elapsedtime;
+       else
+               tabentry->total_analyze_time += elapsedtime;

Correct. fixed as well.

See the attached v6.

Thanks!

Regards,

Sami

#16

Michael Paquier

michael@paquier.xyz

about 1 year ago

In reply to: Sami Imseih (#15)

Re: POC: track vacuum/analyze cumulative time per relation

On Mon, Jan 20, 2025 at 11:04:40AM -0600, Sami Imseih wrote:

+       PgStat_Counter total_vacuum_time;
+       PgStat_Counter total_autovacuum_time;
+       PgStat_Counter total_analyze_time;
+       PgStat_Counter total_autoanalyze_time;
} PgStat_StatTabEntry;
These are time values in milliseconds.  It would be good to document
that as a comment with these fields, like PgStat_CheckpointerStats and
others.  Perhaps they should be grouped with their related counters
within PgStat_StatTabEntry?  Or not.

done, including grouping all the vacuum counters together.

I was referring to the order of the fields in the structure itself,
but that's no big deal one way or the other.

See the attached v6.

+    PgStat_Counter total_vacuum_time; /* times in milliseconds */
+    PgStat_Counter total_autovacuum_time; /* times in milliseconds */
+    PgStat_Counter total_analyze_time; /* times in milliseconds */
+    PgStat_Counter total_autoanalyze_time; /* times in milliseconds */

This should be one comment for the whole block, or this should use the
singular for each comment.

instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
params->log_min_duration >= 0));
+
if (instrument)

Some noise.

if (instrument)
{
- TimestampTz endtime = GetCurrentTimestamp();

On HEAD in the ANALYZE path, the endtime is calculated after
index_vacuum_cleanup(). Your patch calculates it before
index_vacuum_cleanup(), which would result in an incorrect calculation
if the index cleanup takes a long time with less logs generated, no?
Sorry for not noticing that earlier on the thread.. Perhaps it would
be just better to pass the start time to pgstat_report_vacuum() and
pgstat_report_analyze() then let their internals do the elapsed time
calculations. Consistency of the arguments for both functions is
something worth having, IMO, even if it means a bit more
GetCurrentTimestamp() in this case.
--
Michael

#17

Sami Imseih

samimseih@gmail.com

about 1 year ago

In reply to: Michael Paquier (#16)

Re: POC: track vacuum/analyze cumulative time per relation

I was referring to the order of the fields in the structure itself,
but that's no big deal one way or the other.

I understand your point now. I will group them with the
related counters in the next rev and will use

This should be one comment for the whole block, or this should use the
singular for each comment.

I will use a singular "/* time in milliseconds */" comment for each
new field.

This existing write_time field is plural, but I will leave
that one alone for now.

PgStat_Counter write_time; /* times in milliseconds */

On HEAD in the ANALYZE path, the endtime is calculated after
index_vacuum_cleanup(). Your patch calculates it before
index_vacuum_cleanup(), which would result in an incorrect calculation
if the index cleanup takes a long time with less logs generated, no?
Sorry for not noticing that earlier on the thread.. Perhaps it would
be just better to pass the start time to pgstat_report_vacuum() and
pgstat_report_analyze() then let their internals do the elapsed time
calculations. Consistency of the arguments for both functions is
something worth having, IMO, even if it means a bit more
GetCurrentTimestamp() in this case.

So currently, the report of the last_(autoanalyze|analyze)_time
is set before the index_vacuum_cleanup, but for logging purposes
the elapsed time is calculated afterwards. Most users will not notice
this, but I think that is wrong as well.

I think we should calculate endtime and elapsedtime and call
pgstat_report_analyze after the index_vacuum_cleanup; and
before vac_close_indexes. This is more accurate and will
avoid incurring the extra GetCurrentTimestamp() call.

What do you think?

Regards,

Sami

#18

Michael Paquier

michael@paquier.xyz

about 1 year ago

In reply to: Sami Imseih (#17)

Re: POC: track vacuum/analyze cumulative time per relation

On Fri, Jan 24, 2025 at 02:19:59PM -0600, Sami Imseih wrote:

So currently, the report of the last_(autoanalyze|analyze)_time
is set before the index_vacuum_cleanup, but for logging purposes
the elapsed time is calculated afterwards. Most users will not notice
this, but I think that is wrong as well.

I think we should calculate endtime and elapsedtime and call
pgstat_report_analyze after the index_vacuum_cleanup; and
before vac_close_indexes. This is more accurate and will
avoid incurring the extra GetCurrentTimestamp() call.

What do you think?

I think that this is hiding a behavior change while adding new
counters, and both changes are independent. Another thing that is
slightly incorrect to do if we take the argument of only adding the
counters is moving around the call of pgstat_progress_end_command(),
because it's not really wrong as-is, either. I'd suggest to make all
that a separate discussion.

The addition of the extra GetCurrentTimestamp() in the report path
does not stress me much, FWIW, and keeping the tracking of the end
time within the report routines for [auto]analyze and [auto]vacuum has
the merit that the caller does not need to worry about more
TimestampDifferenceMilliseconds() calculations.

I have put my hands on this patch, and at the end I think that we
should just do the attached, which is simpler and addresses your
use-case. Note also that the end time is acquired while the entries
are not locked in the report routines, and some tweaks in the docs and
comments.
--
Michael

#19

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

about 1 year ago

In reply to: Michael Paquier (#18)

Re: POC: track vacuum/analyze cumulative time per relation

Hi,

On Mon, Jan 27, 2025 at 09:30:16AM +0900, Michael Paquier wrote:

The addition of the extra GetCurrentTimestamp() in the report path
does not stress me much,

Reading at the previous messages I see how you reached this state. I also think
that makes sense and that's not an issue as we are not in a hot path here.

should just do the attached, which is simpler and addresses your
use-case. Note also that the end time is acquired while the entries
are not locked in the report routines, and some tweaks in the docs and
comments.

LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20

Sami Imseih

samimseih@gmail.com

about 1 year ago

In reply to: Bertrand Drouvot (#19)

Re: POC: track vacuum/analyze cumulative time per relation

I think that this is hiding a behavior change while adding new
counters, and both changes are independent.

That's a fair point.

Another thing that is
slightly incorrect to do if we take the argument of only adding the
counters is moving around the call of pgstat_progress_end_command(),
because it's not really wrong as-is, either. I'd suggest to make all
that a separate discussion.

Yeah, we have inconsistency between when vacuum and analyze
command ends as well as inconsistency between the time
reported to pg_stats vs the logs. I will start a follow-up thread for
this.

I have put my hands on this patch, and at the end I think that we
should just do the attached, which is simpler and addresses your
use-case. Note also that the end time is acquired while the entries
are not locked in the report routines, and some tweaks in the docs and
comments.

Thanks for the update. This LGTM.

Regards,

Sami

#21

Michael Paquier

michael@paquier.xyz

about 1 year ago

In reply to: Sami Imseih (#20)

#22

Alena Rybakina

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Michael Paquier (#21)

POC: track vacuum/analyze cumulative time per relation

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: