Vacuum statistics

Started by Alena Rybakinaalmost 2 years ago122 messages

lena.ribackina@yandex.ru

almost 2 years ago

Hello, everyone!

I think we don't have enough information to analyze vacuum functionality.

Needless to say that the vacuum is the most important process for a
database system. It prevents problems like table and index bloating and
emergency freezing if we have a wraparound problem. Furthermore, it
keeps the visibility map up to date. On the other hand, because of
incorrectly adjusted aggressive settings of autovacuum it can consume a
lot of computing resources that lead to all queries to the system
running longer.

Nowadays the vacuum gathers statistical information about tables, but it
is important not for optimizer only.

Because the vacuum is an automation process, there are a lot of settings
that determine their aggressive functionality to other objects of the
database. Besides, sometimes it is important to set a correct parameter
for the specified table, because of its dynamic changes.

An administrator of a database needs to set the settings of autovacuum
to have a balance between the vacuum's useful action in the database
system on the one hand, and the overhead of its workload on the other.
However, it is not enough for him to decide on vacuum functionality
through statistical information about the number of vacuum passes
through tables and operational data from progress_vacuum, because it is
available only during vacuum operation and does not provide a strategic
overview over the considered period.

To sum up, an automation vacuum has a strategic behavior because the
frequency of its functionality and resource consumption depends on the
workload of the database. Its workload on the database is minimal for an
append-only table and it is a maximum for the table with a
high-frequency updating. Furthermore, there is a high dependence of the
vacuum load on the number and volume of indexes. Because of the absence
of the visibility map for indexes, the vacuum scans the index
completely, and the worst situation when it needs to do it during a
bloating index situation in a small table.

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this information
already, but this valuable information doesn't store it.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

lena.ribackina@yandex.ru

almost 2 years ago

In reply to: Alena Rybakina (#1)

Re: Vacuum statistics

On 30.05.2024 10:33, Alena Rybakina wrote:

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information already, but this valuable information doesn't store it.

My colleagues and I have prepared a patch that can help to solve this
problem.

We are open to feedback.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

zubkov@moonset.ru

almost 2 years ago

In reply to: Alena Rybakina (#1)

Re: Vacuum statistics

Hi,

Th, 30/05/2024 at 10:33 -0700, Alena Rybakina wrote:

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how
well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information
already, but this valuable information doesn't store it.

It seems a little bit unclear to me, so let me explain a little the
point of a proposition.

As the vacuum process is a backend it has a workload instrumentation.
We have all the basic counters available such as a number of blocks
read, hit and written, time spent on I/O, WAL stats and so on.. Also,
we can easily get some statistics specific to vacuum activity i.e.
number of tuples removed, number of blocks removed, number of VM marks
set and, of course the most important metric - time spent on vacuum
operation.

All those statistics must be stored by the Cumulative Statistics System
on per-relation basis. I mean individual cumulative counters for every
table and every index in the database.

Such counters will provide us a clear view about vacuum workload on
individual objects of the database, providing means to measure the
efficiency of performed vacuum fine tuning.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

dilipbalaut@gmail.com

almost 2 years ago

In reply to: Alena Rybakina (#2)

Re: Vacuum statistics

On Thu, May 30, 2024 at 11:57 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:

On 30.05.2024 10:33, Alena Rybakina wrote:

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information already, but this valuable information doesn't store it.

My colleagues and I have prepared a patch that can help to solve this
problem.

We are open to feedback.

I was reading through the patch here are some initial comments.

--
+typedef struct LVExtStatCounters
+{
+ TimestampTz time;
+ PGRUsage ru;
+ WalUsage walusage;
+ BufferUsage bufusage;
+ int64 VacuumPageMiss;
+ int64 VacuumPageHit;
+ int64 VacuumPageDirty;
+ double VacuumDelayTime;
+ PgStat_Counter blocks_fetched;
+ PgStat_Counter blocks_hit;
+} LVExtStatCounters;

I noticed that you are storing both pgBufferUsage and
VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same?
It seems they both exist in the system because some code, like
heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still
relies on the old counters. And there is already a patch to remove
those old counters.

--
+static Datum
+pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
+{

I don't think you need this last parameter (ncolumns) we can anyway
fetch that from tupledesc, so adding an additional parameter
just for checking doesn't look good to me.

--
+ /* Tricky turn here: enforce pgstat to think that our database us dbid */
+
+ MyDatabaseId = dbid;

typo
/think that our database us dbid/think that our database has dbid

Also, remove the blank line between the comment and the next code
block that is related to that comment.

--
  VacuumPageDirty = 0;
+ VacuumDelayTime = 0.;

There is an extra "." after 0

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

lena.ribackina@yandex.ru

almost 2 years ago

In reply to: Dilip Kumar (#4)

Re: Vacuum statistics

Hi! Thank you for your interest in this topic!

On 07.06.2024 09:46, Dilip Kumar wrote:

On Thu, May 30, 2024 at 11:57 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:

On 30.05.2024 10:33, Alena Rybakina wrote:

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information already, but this valuable information doesn't store it.

My colleagues and I have prepared a patch that can help to solve this
problem.

We are open to feedback.

I was reading through the patch here are some initial comments.
--
+typedef struct LVExtStatCounters
+{
+ TimestampTz time;
+ PGRUsage ru;
+ WalUsage walusage;
+ BufferUsage bufusage;
+ int64 VacuumPageMiss;
+ int64 VacuumPageHit;
+ int64 VacuumPageDirty;
+ double VacuumDelayTime;
+ PgStat_Counter blocks_fetched;
+ PgStat_Counter blocks_hit;
+} LVExtStatCounters;
I noticed that you are storing both pgBufferUsage and
VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same?
It seems they both exist in the system because some code, like
heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still
relies on the old counters. And there is already a patch to remove
those old counters.

I agree with you and I have fixed it.

--
+static Datum
+pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
+{
I don't think you need this last parameter (ncolumns) we can anyway
fetch that from tupledesc, so adding an additional parameter
just for checking doesn't look good to me.

To be honest,I'm notsureifncolumns shouldbe deletedat
all,becausethepg_stats_vacuum
functionisusedtodisplaythreedifferenttypesof
statistics:fortables,indexes,
anddatabases.Weusethisparametertopassinformationaboutthe numberof
parameters(orhowmany statisticsweexpect)dependingonthe typeof
statistics.For example,table
vacuumstatisticscontain27parameters,whileindexesanddatabasescontain19and15parameters,
respectively.Youcanseethatthe pg_stats_vacuum functioncontainsan
Assertthatchecksthatthe expectednumberof tupledesc
parametersmatchestheactualnumber.

Assert(tupdesc->natts == ncolumns);

PerhapsIcanconvertitto alocalparameteranddetermineitsvaluealreadyinthe
function,for example:

pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int
ncolumns)
{

int columns = 0;

switch (type)

{

case PGSTAT_EXTVAC_HEAP:

ncolumns = EXTVACHEAPSTAT_COLUMNS;

break;

case PGSTAT_EXTVAC_INDEX:

ncolumns = EXTVACINDEXSTAT_COLUMNS;

break;

case PGSTAT_EXTVAC_DB:

ncolumns = EXTVACDBSTAT_COLUMNS;

break;

}

...

}

What do you think?

--
+ /* Tricky turn here: enforce pgstat to think that our database us dbid */
+
+ MyDatabaseId = dbid;
typo
/think that our database us dbid/think that our database has dbid

Also, remove the blank line between the comment and the next code
block that is related to that comment.
--
VacuumPageDirty = 0;
+ VacuumDelayTime = 0.;
There is an extra "." after 0

Thank you, I fixed it.

In additionto thesechanges,Ifixedthe
problemwithdisplayingvacuumstatisticsfordatabases:Ifoundan
errorindefiningthe pg_stats_vacuum_database systemview.In
addition,Irewrotethe testsandconvertedthemintoa regressiontest.In
addition,Ihave dividedthe testtotestthe functionalityof the outputof
vacuumstatisticsintotwotests:oneofthemchecksthe functionalityof
tablesanddatabases,andthe other-indexes.Thisis causedby aproblemwiththe
vacuumfunctionalitywhenthe tablecontainsan
index.Youcanfindmoreinformationaboutthishere:[0]/messages/by-id/d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd@yandex.ruand[1]/messages/by-id/CAH2-Wznv94Q_Td8OS8bAN7fYLpfU6CGgjn6Xau5eJ_sDxEGeBA@mail.gmail.com.

I attached the diff to this letter.

[0]: /messages/by-id/d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd@yandex.ru
/messages/by-id/d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd@yandex.ru

[1]: /messages/by-id/CAH2-Wznv94Q_Td8OS8bAN7fYLpfU6CGgjn6Xau5eJ_sDxEGeBA@mail.gmail.com
/messages/by-id/CAH2-Wznv94Q_Td8OS8bAN7fYLpfU6CGgjn6Xau5eJ_sDxEGeBA@mail.gmail.com

Iam currentlyworkingondividingthispatchintothreepartstosimplifythe
reviewprocess:oneofthemwillcontaincodeforcollectingvacuumstatisticsontables,the
secondonindexesandthe lastondatabases.I alsowritethe documentation.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

lena.ribackina@yandex.ru

over 1 year ago

In reply to: Alena Rybakina (#1)

Re: Vacuum statistics

Hi!

On 11.06.2024 16:09, Alena Rybakina wrote:

On 08.06.2024 09:30, Alena Rybakina wrote:

Iam currentlyworkingondividingthispatchintothreepartstosimplifythe
reviewprocess:oneofthemwillcontaincodeforcollectingvacuumstatisticsontables,the
secondonindexesandthe lastondatabases.

I have divided the patch into three: the first patch containscodeforthe
functionalityof collecting and storage for tables, the second one for
indexes and the last one for databases.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Import Notes

Reply to msg id not found: 06c68056-5c4d-48e0-926d-ff9c3ec681b4@yandex.ru

lena.ribackina@yandex.ru

over 1 year ago

In reply to: Alena Rybakina (#1)

Re: Vacuum statistics

Hi!

On 11.06.2024 16:09, Alena Rybakina wrote:

On 08.06.2024 09:30, Alena Rybakina wrote:

Iam currentlyworkingondividingthispatchintothreepartstosimplifythe
reviewprocess:oneofthemwillcontaincodeforcollectingvacuumstatisticsontables,the
secondonindexesandthe lastondatabases.

I have divided the patch into three: the first patch containscodeforthe
functionalityof collecting and storage for tables, the second one for
indexes and the last one for databases.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Import Notes

Reply to msg id not found: 06c68056-5c4d-48e0-926d-ff9c3ec681b4@yandex.ru

lena.ribackina@yandex.ru

over 1 year ago

In reply to: Alena Rybakina (#5)

Re: Vacuum statistics

I have written the documentary and attached the patch.

On 08.06.2024 09:30, Alena Rybakina wrote:

Iam currentlyworkingondividingthispatchintothreepartstosimplifythe
reviewprocess:oneofthemwillcontaincodeforcollectingvacuumstatisticsontables,the
secondonindexesandthe lastondatabases.I alsowritethe documentation.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Masahiko Sawada

sawada.mshk@gmail.com

over 1 year ago

In reply to: Andrei Zubkov (#3)

Re: Vacuum statistics

On Fri, May 31, 2024 at 4:19 AM Andrei Zubkov <zubkov@moonset.ru> wrote:

Hi,

Th, 30/05/2024 at 10:33 -0700, Alena Rybakina wrote:

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how
well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information
already, but this valuable information doesn't store it.

It seems a little bit unclear to me, so let me explain a little the
point of a proposition.

As the vacuum process is a backend it has a workload instrumentation.
We have all the basic counters available such as a number of blocks
read, hit and written, time spent on I/O, WAL stats and so on.. Also,
we can easily get some statistics specific to vacuum activity i.e.
number of tuples removed, number of blocks removed, number of VM marks
set and, of course the most important metric - time spent on vacuum
operation.

I've not reviewed the patch closely but it sounds helpful for users. I
would like to add a statistic, the high-water mark of memory usage of
dead tuple TIDs. Since the amount of memory used by TidStore is hard
to predict, I think showing the high-water mark would help users to
predict how much memory they set to maintenance_work_mem.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

zubkov@moonset.ru

over 1 year ago

In reply to: Masahiko Sawada (#9)

Re: Vacuum statistics

Hello!

On Thu, 27/06/2024 at 10:39 +0900, Masahiko Sawada:

On Fri, May 31, 2024 at 4:19 AM Andrei Zubkov <zubkov@moonset.ru>
wrote:

As the vacuum process is a backend it has a workload
instrumentation.
We have all the basic counters available such as a number of blocks
read, hit and written, time spent on I/O, WAL stats and so on..
Also,
we can easily get some statistics specific to vacuum activity i.e.
number of tuples removed, number of blocks removed, number of VM
marks
set and, of course the most important metric - time spent on vacuum
operation.

I've not reviewed the patch closely but it sounds helpful for users.
I
would like to add a statistic, the high-water mark of memory usage of
dead tuple TIDs. Since the amount of memory used by TidStore is hard
to predict, I think showing the high-water mark would help users to
predict how much memory they set to maintenance_work_mem.

Thank you for your interest on this patch. I've understand your idea.
The obvious goal of it is to avoid expensive index multi processing
during vacuum of the heap. Provided statistics in the patch contain the
index_vacuum_count counter for each table which can be compared to the
pg_stat_all_tables.vacuum_count to detect specific relation index
multi-passes. Previous setting of maintenance_work_mem is known. Usage
of TidStore should be proportional to the amount of dead-tuples vacuum
workload on the table, so as the first evaluation we can take the
number of index passes per one heap pass as a maintenance_work_mem
multiplier.

But there is a better way. Once we detected the index multiprocessing
we can lower the vacuum workload for the heap pass making vacuum a
little bit more aggressive for this particular relation. I mean, in
such case increasing maintenance_work_mem is not only decision.

Suggested high-water mark statistic can't be used as cumulative
statistic - any high-water mark statistic as maximim-like statistic is
valid for certain time period thus should be reset on some kind of
schedule. Without resets it should reach 100% once under the heavy load
and stay there forever.

Said that such high-water mark seems a little bit unclear and
complicated for the DBA. It seems redundant to me right now. I can see
the main value of such statistic is to avoid too large
maintenance_work_mem setting. But I can't see really dramatic
consequences of that. Maybe I've miss something..

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#1)

Re: Vacuum statistics

Hello, everyone!

Thank you for your interesting patch with extended information
statistics about autovacuum.

Do you consider not to create new table in pg_catalog but to save
statistics in existing table? I mean pg_class or
pg_stat_progress_analyze, pg_stat_progress_vacuum?

P.S. If I sent this mail twice, I'm sorry :)

Regards

Ilia Evdokimov,

Tantor Labs.

zubkov@moonset.ru

over 1 year ago

In reply to: Alena Rybakina (#1)

Re: Vacuum statistics

Hi, Ilia!

Do you consider not to create new table in pg_catalog but to save
statistics in existing table? I mean pg_class or
pg_stat_progress_analyze, pg_stat_progress_vacuum?

Thank you for your interest on our patch!

*_progress views is not our case. They hold online statistics while
vacuum is in progress. Once work is done on a table the entry is gone
from those views. Idea of this patch is the opposite - it doesn't
provide online statistics but it accumulates statistics about rosources
consumed by all vacuum passes over all relations. It's much closer to
the pg_stat_all_tables than pg_stat_progress_vacuum.

It seems pg_class is not the right place because it is not a statistic
view - it holds the current relation state and haven't anything about
the relation workload.

Maybe the pg_stat_all_tables is the right place but I have several
thoughts about why it is not:
- Some statistics provided by this patch is really vacuum specific. I
don't think we want them in the relation statistics view.
- Postgres is extreamly extensible. I'm sure someday there will be
table AMs that does not need the vacuum at all.

Right now vacuum specific workload views seems optimal choice to me.

Regards,

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Import Notes

Reply to msg id not found: ff99283d-f9d4-4b37-b4fa-3a433fb408a4@tantorlabs.com

reshkekirill@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#7)

Re: Vacuum statistics

On Wed, 12 Jun 2024 at 11:38, Alena Rybakina <lena.ribackina@yandex.ru> wrote:

Hi!

On 11.06.2024 16:09, Alena Rybakina wrote:

On 08.06.2024 09:30, Alena Rybakina wrote:

I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain code for collecting vacuum statistics on tables, the second on indexes and the last on databases.

I have divided the patch into three: the first patch contains code for the functionality of collecting and storage for tables, the second one for indexes and the last one for databases.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Hi!
Few suggestions on this patch-set

1)

+{ oid => '4701',
+  descr => 'pg_stats_vacuum_tables return stats values',
+  proname => 'pg_stats_vacuum_tables', provolatile => 's', prorettype => 'record',proisstrict => 'f',
+  proretset => 't',
+  proargtypes => 'oid oid',
+  proallargtypes =>

During development, OIDs should be picked up from range 8000-9999.
Same for pg_stats_vacuum_database & pg_stats_vacuum_indexes

Also, why are these function naming schemes like
pg_stats_vacuum_*something*, not pg_stat_vacuum_*something*, like
pg_stat_replication etc?

2) In 0003:

+ proargnames => '{dboid,dboid,db_blks_read,db_blks_hit,total_blks_dirtied,total_blks_written,wal_records,wal_fpi,wal_bytes,blk_read_time,blk_write_time,delay_time,system_time,user_time,total_time,interrupts}',

Repeated dboid arg name is strange. Is it done this way to make
pg_stats_vacuum function call in more unified fashion? I don't see any
other place within postgresql core with similar approach, so I doubt
it is correct.

3) 0001 patch vacuum_tables_statistics test creates
statistic_vacuum_database1, but does not use it. 0003 do.
Also I'm not sure if these additional checks on the second database
adds much value. Can you justify this please?

Other places look more or less fine to me.
However, I'll maybe post some additional nit-picky comments on the
next patch version.

--
Best regards,
Kirill Reshke

lena.ribackina@yandex.ru

over 1 year ago

In reply to: Kirill Reshke (#13)

Re: Vacuum statistics

Hi! Thank you for inquiring about our topic!

On 10.08.2024 23:57, Kirill Reshke wrote:

Hi!
Few suggestions on this patch-set

1)
+{ oid => '4701',
+  descr => 'pg_stats_vacuum_tables return stats values',
+  proname => 'pg_stats_vacuum_tables', provolatile => 's', prorettype => 'record',proisstrict => 'f',
+  proretset => 't',
+  proargtypes => 'oid oid',
+  proallargtypes =>
During development, OIDs should be picked up from range 8000-9999.
Same for pg_stats_vacuum_database & pg_stats_vacuum_indexes

Also, why are these function naming schemes like
pg_stats_vacuum_*something*, not pg_stat_vacuum_*something*, like
pg_stat_replication etc?

To be honest, when I named it, I missed this aspect. I thought about the
plural vacuum statistics we show, so I named them. I fixed it.

2) In 0003:

+ proargnames => '{dboid,dboid,db_blks_read,db_blks_hit,total_blks_dirtied,total_blks_written,wal_records,wal_fpi,wal_bytes,blk_read_time,blk_write_time,delay_time,system_time,user_time,total_time,interrupts}',

Repeated dboid arg name is strange. Is it done this way to make
pg_stats_vacuum function call in more unified fashion? I don't see any
other place within postgresql core with similar approach, so I doubt
it is correct.

Both parameters are required for input and output. We are trying to find
statistics for a specific database if the database oid was specified by
the user or display statistics for all objects, but we need to display
which database these statistics are for. I corrected the name of the
first parameter.

3) 0001 patch vacuum_tables_statistics test creates
statistic_vacuum_database1, but does not use it. 0003 do.
Also I'm not sure if these additional checks on the second database
adds much value. Can you justify this please?

The statistic_vacuum_database1 needs us to check the visible of
statistics from another database (statistic_vacuum_database) as they are
after the manipulation with tables in another database, and after
deleting the vestat table . In the latter case, we need to be sure that
all the table statistics are not visible to us.

So, I agree that it should be added only in the latest version of the
patch, where we add vacuum statistics for databases. I fixed it.

Other places look more or less fine to me.
However, I'll maybe post some additional nit-picky comments on the
next patch version.

We are glad any feedback and review, so feel free to do it)

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Andrei Zubkov (#12)

Re: Vacuum statistics

On 10.8.24 22:37, Andrei Zubkov wrote:

Hi, Ilia!

Do you consider not to create new table in pg_catalog but to save
statistics in existing table? I mean pg_class or
pg_stat_progress_analyze, pg_stat_progress_vacuum?

Thank you for your interest on our patch!

*_progress views is not our case. They hold online statistics while
vacuum is in progress. Once work is done on a table the entry is gone
from those views. Idea of this patch is the opposite - it doesn't
provide online statistics but it accumulates statistics about rosources
consumed by all vacuum passes over all relations. It's much closer to
the pg_stat_all_tables than pg_stat_progress_vacuum.

It seems pg_class is not the right place because it is not a statistic
view - it holds the current relation state and haven't anything about
the relation workload.

Maybe the pg_stat_all_tables is the right place but I have several
thoughts about why it is not:
- Some statistics provided by this patch is really vacuum specific. I
don't think we want them in the relation statistics view.
- Postgres is extreamly extensible. I'm sure someday there will be
table AMs that does not need the vacuum at all.

Right now vacuum specific workload views seems optimal choice to me.

Regards,

Agreed. They are not god places to store such statistics.

I have some suggestions:

1. pgstatfuncs.c in functions tuplestore_put_for_database() and
tuplestore_put_for_relation you can remove 'nulls' array if you're
sure that columns cannot be NULL.
2. These functions are almost the same and I would think of writing one
function depending of type 'ExtVacReportType'

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Ilia Evdokimov (#15)

Re: Vacuum statistics

And I have one suggestion for pg_stat_vacuum_database: I suppose we
should add database's name column after 'dboid' column because it is
difficult to read statistics without database's name. We could call it
'datname' just like in 'pg_stat_database' view.

Regards,

Ilia Evdokimov,
Tantor Labs LCC.

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ilia Evdokimov (#16)

Re: Vacuum statistics

Hi!

On 13.08.2024 16:18, Ilia Evdokimov wrote:

On 10.8.24 22:37, Andrei Zubkov wrote:

Hi, Ilia!

Do you consider not to create new table in pg_catalog but to save
statistics in existing table? I mean pg_class or
pg_stat_progress_analyze, pg_stat_progress_vacuum?

Thank you for your interest on our patch!

*_progress views is not our case. They hold online statistics while
vacuum is in progress. Once work is done on a table the entry is gone
from those views. Idea of this patch is the opposite - it doesn't
provide online statistics but it accumulates statistics about rosources
consumed by all vacuum passes over all relations. It's much closer to
the pg_stat_all_tables than pg_stat_progress_vacuum.

It seems pg_class is not the right place because it is not a statistic
view - it holds the current relation state and haven't anything about
the relation workload.

Maybe the pg_stat_all_tables is the right place but I have several
thoughts about why it is not:
- Some statistics provided by this patch is really vacuum specific. I
don't think we want them in the relation statistics view.
- Postgres is extreamly extensible. I'm sure someday there will be
table AMs that does not need the vacuum at all.

Right now vacuum specific workload views seems optimal choice to me.

Regards,

Agreed. They are not god places to store such statistics.

I have some suggestions:

1. pgstatfuncs.c in functions tuplestore_put_for_database() and
tuplestore_put_for_relation you can remove 'nulls' array if you're
sure that columns cannot be NULL.

We need to use this for tuplestore_putvalues function. With this
function, we fill the table with the values of the statistics.

1.

2. These functions are almost the same and I would think of writing
one function depending of type 'ExtVacReportType'

I'm not sure that I fully understand what you mean. Can you explain it
more clearly, please?

On 13.08.2024 16:37, Ilia Evdokimov wrote:

And I have one suggestion for pg_stat_vacuum_database: I suppose we
should add database's name column after 'dboid' column because it is
difficult to read statistics without database's name. We could call it
'datname' just like in 'pg_stat_database' view.

Thank you. Fixed.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#17)

Re: Vacuum statistics

On 15.8.24 11:49, Alena Rybakina wrote:

I have some suggestions:

1. pgstatfuncs.c in functions tuplestore_put_for_database() and
tuplestore_put_for_relation you can remove 'nulls' array if
you're sure that columns cannot be NULL.

We need to use this for tuplestore_putvalues function. With this
function, we fill the table with the values of the statistics.

Ah, right! I'm sorry.

1.

2. These functions are almost the same and I would think of writing
one function depending of type 'ExtVacReportType'

I'm not sure that I fully understand what you mean. Can you explain it
more clearly, please?

Ah, I didn't notice that the size of all three tables is different.
Therefore, it won't be possible to write one function instead of two to
avoid code duplication. My mistake.

jian.universality@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#17)

Re: Vacuum statistics

On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

Hi!

I've applied all the v5 patches.
0002 and 0003 have white space errors.

+      <para>
+        Number of times blocks of this index were already found
+        in the buffer cache by vacuum operations, so that a read was
not necessary
+        (this only includes hits in the
+        &project; buffer cache, not the operating system's file system cache)
+      </para></entry>

+        Number of times blocks of this table were already found
+        in the buffer cache by vacuum operations, so that a read was
not necessary
+        (this only includes hits in the
+        &project; buffer cache, not the operating system's file system cache)
+      </para></entry>

"&project;"
represents a sgml file placeholder name as "project" and puts all the
content of "project.sgml" to system-views.sgml.
but you don't have "project.sgml". you may check
doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml
for usage of "&place_holder;".
so you can change it to "project", otherwise doc cannot build.

src/backend/commands/dbcommands.c
we have:
/*
* If built with appropriate switch, whine when regression-testing
* conventions for database names are violated. But don't complain during
* initdb.
*/
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
if (IsUnderPostmaster && strstr(dbname, "regression") == NULL)
elog(WARNING, "databases created by regression test cases
should have names including \"regression\"");
#endif
so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you
need to change dbname:
CREATE DATABASE statistic_vacuum_database;
CREATE DATABASE statistic_vacuum_database1;

+  <para>
+   The view <structname>pg_stat_vacuum_indexes</structname> will contain
+   one row for each index in the current database (including TOAST
+   table indexes), showing statistics about vacuuming that specific index.
+  </para>
TOAST should
<acronym>TOAST</acronym>

+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
maybe change to
            ereport(ERROR,
                    (errcode(ERRCODE_DATATYPE_MISMATCH),
                     errmsg("return type must be a row type")));
Later I found out "InitMaterializedSRF(fcinfo, 0);" already did all
the work. much of the code can be gotten rid of.
please check attached.

#define EXTVACHEAPSTAT_COLUMNS 27
#define EXTVACIDXSTAT_COLUMNS 19
#define EXTVACDBSTAT_COLUMNS 15
#define EXTVACSTAT_COLUMNS Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS)

static Oid CurrentDatabaseId = InvalidOid;

we already defined MyDatabaseId in src/include/miscadmin.h,
Why do we need "static Oid CurrentDatabaseId = InvalidOid;"?
also src/backend/utils/adt/pgstatfuncs.c already included "miscadmin.h".

the following code one function has 2 return statements?
------------------------------------------------------------------------
/*
* Get the vacuum statistics for the heap tables.
*/
Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{
return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);

PG_RETURN_NULL();
}

/*
* Get the vacuum statistics for the indexes.
*/
Datum
pg_stat_vacuum_indexes(PG_FUNCTION_ARGS)
{
return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS);

PG_RETURN_NULL();
}

/*
* Get the vacuum statistics for the database.
*/
Datum
pg_stat_vacuum_database(PG_FUNCTION_ARGS)
{
return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS);

PG_RETURN_NULL();
}
------------------------------------------------------------------------
in pg_stats_vacuum:
if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
Oid relid = PG_GETARG_OID(1);

/* Load table statistics for specified database. */
if (OidIsValid(relid))
{
tabentry = fetch_dbstat_tabentry(dbid, relid);
if (tabentry == NULL || tabentry->vacuum_ext.type != type)
/* Table don't exists or isn't an heap relation. */
PG_RETURN_NULL();

tuplestore_put_for_relation(relid, tupstore, tupdesc,
tabentry, ncolumns);
}
else
{
...
}
}
I don't understand the ELSE branch. the IF branch means the input
dboid, heap/index oid is correct.
the ELSE branch means table reloid is invalid = 0.
I am not sure 100% what the ELSE Branch means.
Also there are no comments explaining why.
experiments seem to show that when reloid is 0, it will print out all
the vacuum statistics
for all the tables in the current database. If so, then only super
users can call pg_stats_vacuum?
but the table owner should be able to call pg_stats_vacuum for that
specific table.

/* Type of ExtVacReport */
typedef enum ExtVacReportType
{
PGSTAT_EXTVAC_INVALID = 0,
PGSTAT_EXTVAC_HEAP = 1,
PGSTAT_EXTVAC_INDEX = 2,
PGSTAT_EXTVAC_DB = 3,
} ExtVacReportType;
generally "HEAP" means table and index, maybe "PGSTAT_EXTVAC_HEAP" would be term

jian.universality@gmail.com

over 1 year ago

In reply to: jian he (#19)

Re: Vacuum statistics

in pg_stats_vacuum
if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
Oid relid = PG_GETARG_OID(1);

/* Load table statistics for specified database. */
if (OidIsValid(relid))
{
tabentry = fetch_dbstat_tabentry(dbid, relid);
if (tabentry == NULL || tabentry->vacuum_ext.type != type)
/* Table don't exists or isn't an heap relation. */
PG_RETURN_NULL();

tuplestore_put_for_relation(relid, rsinfo, tabentry);
}
else
{
}

So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
it seems you didn't check "relid" 's relkind,
you may need to use get_rel_relkind.

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: jian he (#20)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: jian he (#19)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: jian he (#20)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ilia Evdokimov (#21)

jian.universality@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#23)

reshkekirill@gmail.com

over 1 year ago

In reply to: jian he (#25)

Alexander Korotkov

aekorotkov@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#24)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alexander Korotkov (#27)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: jian he (#25)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Kirill Reshke (#26)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alena Rybakina (#28)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alena Rybakina (#28)

jian.universality@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#32)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: jian he (#33)

Masahiko Sawada

sawada.mshk@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#34)

Melanie Plageman

melanieplageman@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#35)

zubkov@moonset.ru

over 1 year ago

In reply to: Masahiko Sawada (#35)

Masahiko Sawada

sawada.mshk@gmail.com

over 1 year ago

In reply to: Melanie Plageman (#36)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Masahiko Sawada (#38)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alena Rybakina (#34)

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#40)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ilia Evdokimov (#41)

zubkov@moonset.ru

over 1 year ago

In reply to: Ilia Evdokimov (#41)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alena Rybakina (#42)

Alexander Korotkov

aekorotkov@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#28)

Alexander Korotkov

aekorotkov@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#39)

Jim.Nasby@BlueTreble.com

over 1 year ago

In reply to: Alexander Korotkov (#46)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alexander Korotkov (#45)

zubkov@moonset.ru

over 1 year ago

In reply to: Jim Nasby (#47)

Jim.Nasby@BlueTreble.com

over 1 year ago

In reply to: Andrei Zubkov (#49)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Jim Nasby (#50)

lena.ribackina@yandex.ru

over 1 year ago

In reply to: Alena Rybakina (#48)

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#44)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ilia Evdokimov (#53)

Jim.Nasby@BlueTreble.com

over 1 year ago

In reply to: Alena Rybakina (#51)

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Alena Rybakina (#54)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Jim Nasby (#55)

Jim.Nasby@BlueTreble.com

over 1 year ago

In reply to: Alena Rybakina (#57)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Jim Nasby (#58)

reshkekirill@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#59)

Alexander Korotkov

aekorotkov@gmail.com

over 1 year ago

In reply to: Alena Rybakina (#59)

ilya.evdokimov@tantorlabs.com

over 1 year ago

In reply to: Alexander Korotkov (#61)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Alexander Korotkov (#61)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Kirill Reshke (#60)

a.rybakina@postgrespro.ru

over 1 year ago

In reply to: Ilia Evdokimov (#62)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#63)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Ilia Evdokimov (#62)

samimseih@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#67)

Jim.Nasby@BlueTreble.com

about 1 year ago

In reply to: Sami Imseih (#68)

samimseih@gmail.com

about 1 year ago

In reply to: Jim Nasby (#69)

Greg Sabino Mullane

greg@turnstep.com

about 1 year ago

In reply to: Jim Nasby (#69)

Jim.Nasby@BlueTreble.com

about 1 year ago

In reply to: Sami Imseih (#70)

samimseih@gmail.com

about 1 year ago

In reply to: Jim Nasby (#72)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Sami Imseih (#68)

samimseih@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#74)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Sami Imseih (#75)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#63)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#76)

zubkov@moonset.ru

about 1 year ago

In reply to: Sami Imseih (#75)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#66)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#77)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#80)

Alexander Korotkov

aekorotkov@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#82)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alexander Korotkov (#83)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#84)

Alexander Korotkov

aekorotkov@gmail.com

about 1 year ago

In reply to: Alena Rybakina (#84)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alexander Korotkov (#86)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#84)

a.rybakina@postgrespro.ru

about 1 year ago

In reply to: Alena Rybakina (#88)

ilya.evdokimov@tantorlabs.com

12 months ago

In reply to: Alena Rybakina (#89)

reshkekirill@gmail.com

12 months ago

In reply to: Alena Rybakina (#89)

a.rybakina@postgrespro.ru

12 months ago

In reply to: Kirill Reshke (#91)

a.rybakina@postgrespro.ru

12 months ago

In reply to: Ilia Evdokimov (#90)

Jim.Nasby@BlueTreble.com

12 months ago

In reply to: Alena Rybakina (#93)

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

12 months ago

In reply to: Jim Nasby (#94)

vignesh21@gmail.com

12 months ago

In reply to: Alena Rybakina (#89)

a.rybakina@postgrespro.ru

12 months ago

In reply to: vignesh C (#96)

a.rybakina@postgrespro.ru

12 months ago

In reply to: Bertrand Drouvot (#95)

a.rybakina@postgrespro.ru

12 months ago

In reply to: Alena Rybakina (#98)

Jim.Nasby@BlueTreble.com

12 months ago

In reply to: Alena Rybakina (#98)

a.rybakina@postgrespro.ru

12 months ago

In reply to: Alena Rybakina (#98)

Andrei Lepikhov

lepihov@gmail.com

11 months ago

In reply to: Alexander Korotkov (#45)

a.rybakina@postgrespro.ru

10 months ago

In reply to: Alena Rybakina (#101)

a.rybakina@postgrespro.ru

10 months ago

In reply to: Andrei Lepikhov (#102)

amit.kapila16@gmail.com

10 months ago

In reply to: Alena Rybakina (#103)

a.rybakina@postgrespro.ru

10 months ago

In reply to: Amit Kapila (#105)

amit.kapila16@gmail.com

10 months ago

In reply to: Alena Rybakina (#106)

Alexander Korotkov

aekorotkov@gmail.com

9 months ago

In reply to: Alena Rybakina (#106)

a.rybakina@postgrespro.ru

9 months ago

In reply to: Alexander Korotkov (#108)

a.rybakina@postgrespro.ru

9 months ago

In reply to: Alexander Korotkov (#108)

a.rybakina@postgrespro.ru

9 months ago

In reply to: Alena Rybakina (#110)

a.rybakina@postgrespro.ru

6 months ago

In reply to: Alena Rybakina (#111)

a.rybakina@postgrespro.ru

6 months ago

In reply to: Alena Rybakina (#109)

a.rybakina@postgrespro.ru

6 months ago

In reply to: Alena Rybakina (#113)

ilya.evdokimov@tantorlabs.com

6 months ago

In reply to: Alena Rybakina (#112)

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

6 months ago

In reply to: Amit Kapila (#105)

lena.ribackina@yandex.ru

6 months ago

In reply to: Ilia Evdokimov (#115)

lena.ribackina@yandex.ru

6 months ago

In reply to: Bharath Rupireddy (#116)

a.rybakina@postgrespro.ru

3 months ago

In reply to: Alena Rybakina (#117)

lena.ribackina@yandex.ru

9 days ago

In reply to: Alena Rybakina (#119)

lena.ribackina@yandex.ru

about 15 hours ago

In reply to: Alena Rybakina (#120)

lena.ribackina@yandex.ru

about 13 hours ago

In reply to: Alena Rybakina (#121)