autovac issue with large number of tables

Started by Nasby, Jim over 5 years ago · 36 messages
#1 Nasby, Jim
nasbyj@amazon.com
1 attachment(s)

A database with a very large number of tables eligible for autovacuum can result in autovacuum workers “stuck” in a tight loop of table_recheck_autovac() constantly reporting nothing to do on the table. This is because, with that many tables, it takes a while to search the statistics hash to verify that a table still needs to be processed[1]. If a worker spends some time processing a table, when it’s done it can spend a significant amount of time rechecking each table that it identified at launch (I’ve seen a worker in this state for over an hour). A simple work-around in this scenario is to kill the worker; the launcher will quickly fire up a new worker on the same database, and that worker will build a new list of tables.

That’s not a complete solution though… if the database contains a large number of very small tables, you can end up in a state where 1 or 2 workers are busy chugging through those small tables so quickly that any additional workers spend all their time in table_recheck_autovac(), because that takes long enough that the additional workers are never able to “leapfrog” the workers that are doing useful work.

PoC patch attached.

1: top hits from `perf top -p xxx` on an affected worker
Samples: 72K of event 'cycles', Event count (approx.): 17131910436
Overhead  Shared Object     Symbol
  42.62%  postgres          [.] hash_search_with_hash_value
  10.34%  libc-2.17.so      [.] __memcpy_sse2
   6.99%  [kernel]          [k] copy_user_enhanced_fast_string
   4.73%  libc-2.17.so      [.] _IO_fread
   3.91%  postgres          [.] 0x00000000002d6478
   2.95%  libc-2.17.so      [.] _IO_getc
   2.44%  libc-2.17.so      [.] _IO_file_xsgetn
   1.73%  postgres          [.] hash_search
   1.65%  [kernel]          [k] find_get_entry
   1.10%  postgres          [.] hash_uint32
   0.99%  libc-2.17.so      [.] __memcpy_ssse3_back

Attachments:

autovac.patch (application/octet-stream)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index a8d4dfdd7c..3c2027e77c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1947,6 +1947,8 @@ do_autovacuum(void)
 	bool		did_vacuum = false;
 	bool		found_concurrent_worker = false;
 	int			i;
+	int			skipped, modulo, to_skip;
+	skipped = modulo = to_skip = 0;
 
 	/*
 	 * StartTransactionCommand and CommitTransactionCommand will automatically
@@ -2325,6 +2327,66 @@ do_autovacuum(void)
 			 */
 		}
 
+		/*
+		 * If we've had to skip a large number of tables consecutively, that's
+		 * an indication that either our initial list is now very out-of-date
+		 * with reality, or we are competing with other workers to process a
+		 * lot of really small tables. Both cases are problematic because on a
+		 * database with enough tables to trigger this the hash search to find
+		 * updated stats for the table we're looking at (in
+		 * table_recheck_autovac) will be non-trivially expensive. When this
+		 * happens, we need to take some defensive measures to avoid hopelessly
+		 * spinning our wheels.
+		 */
+		if (skipped > 1000)
+		{
+			if (modulo == 0)
+			{
+				/*
+				 * The first time we get here, we just assume it's because we're
+				 * competing with other workers over the same set of tables, not
+				 * because our list is very out of date. If it turns out our list
+				 * is way out of date we'll quickly max out skipped again and exit
+				 * the loop.
+				 *
+				 * Figure out how many other workers are handling this database and
+				 * start skipping enough records to "stay ahead of" them. This
+				 * doesn't need to be perfect; the goal is simply to try and get
+				 * real work done. If it turns out there are no competing workers
+				 * we'll break out soon enough anyway.
+				 */
+				LWLockAcquire(AutovacuumLock, LW_SHARED);
+				dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
+				{
+					WorkerInfo	worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
+
+					/* we intentionally count ourselves to ensure modulo > 0 */
+
+					/* ignore workers in other databases */
+					if (worker->wi_dboid != MyDatabaseId)
+						continue;
+
+					modulo++;
+				}
+				LWLockRelease(AutovacuumLock);
+				to_skip = modulo;
+			}
+			/*
+			 * Handle the case of our list being hopelessly out of date. In
+			 * this scenario we built a very large initial list, then spent
+			 * enough time processing a table (while other workers carried on)
+			 * that our list is hopelessly out of date, so just exit and let
+			 * the launcher fire up a new worker.
+			 */
+			else
+				break;
+		}
+
+		if (--to_skip > 0)
+			continue;
+		else
+			to_skip = modulo;
+
 		/*
 		 * Find out whether the table is shared or not.  (It's slightly
 		 * annoying to fetch the syscache entry just for this, but in typical
@@ -2408,8 +2470,11 @@ do_autovacuum(void)
 			MyWorkerInfo->wi_tableoid = InvalidOid;
 			MyWorkerInfo->wi_sharedrel = false;
 			LWLockRelease(AutovacuumScheduleLock);
+			skipped++;
 			continue;
 		}
+		else
+			skipped = 0;
 
 		/*
 		 * Remember the prevailing values of the vacuum cost GUCs.  We have to
#2 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Nasby, Jim (#1)
Re: autovac issue with large number of tables

On Mon, 27 Jul 2020 at 06:43, Nasby, Jim <nasbyj@amazon.com> wrote:

A database with a very large number of tables eligible for autovacuum can result in autovacuum workers “stuck” in a tight loop of table_recheck_autovac() constantly reporting nothing to do on the table. This is because, with that many tables, it takes a while to search the statistics hash to verify that a table still needs to be processed[1]. If a worker spends some time processing a table, when it’s done it can spend a significant amount of time rechecking each table that it identified at launch (I’ve seen a worker in this state for over an hour). A simple work-around in this scenario is to kill the worker; the launcher will quickly fire up a new worker on the same database, and that worker will build a new list of tables.

That’s not a complete solution though… if the database contains a large number of very small tables, you can end up in a state where 1 or 2 workers are busy chugging through those small tables so quickly that any additional workers spend all their time in table_recheck_autovac(), because that takes long enough that the additional workers are never able to “leapfrog” the workers that are doing useful work.

As another solution, I've been considering adding a queue of table
OIDs that need to be vacuumed/analyzed in shared memory (e.g. on
DSA). Since all autovacuum workers running on the same database would
see a consistent queue, the issue explained above won't happen, and
it would probably also make it easier to implement prioritization of the
tables being vacuumed, which is sometimes discussed on pgsql-hackers.
I guess it might be worth discussing this idea.
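
To make the shape of that idea concrete, a minimal sketch assuming
PostgreSQL's existing LWLock machinery; the struct and the pop function
here are hypothetical illustrations, not an existing API:

/*
 * Hypothetical shared work queue of table OIDs, living in shared
 * memory (e.g. DSA) and protected by an LWLock.
 */
typedef struct AutoVacWorkQueue
{
	LWLock		lock;			/* protects head and nitems */
	int			head;			/* index of the next item to hand out */
	int			nitems;			/* total number of queued table OIDs */
	Oid			items[FLEXIBLE_ARRAY_MEMBER];
} AutoVacWorkQueue;

/*
 * Pop the next table to process. Because each OID is handed to exactly
 * one worker, no worker ever rechecks a table another worker took.
 */
static Oid
workqueue_pop(AutoVacWorkQueue *queue)
{
	Oid			relid = InvalidOid;

	LWLockAcquire(&queue->lock, LW_EXCLUSIVE);
	if (queue->head < queue->nitems)
		relid = queue->items[queue->head++];
	LWLockRelease(&queue->lock);
	return relid;
}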

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Nasby, Jim
nasbyj@amazon.com
In reply to: Nasby, Jim (#1)
1 attachment(s)
FW: autovac issue with large number of tables

#4 Jim Nasby
nasbyj@amazon.com
In reply to: Nasby, Jim (#3)
Re: [UNVERIFIED SENDER] FW: autovac issue with large number of tables

Sorry, please ignore this duplicate!


#5 Jim Nasby
nasbyj@amazon.com
In reply to: Masahiko Sawada (#2)
Re: autovac issue with large number of tables

On 7/27/20 1:51 AM, Masahiko Sawada wrote:


As another solution, I've been considering adding a queue of table
OIDs that need to be vacuumed/analyzed in shared memory (e.g. on
DSA). Since all autovacuum workers running on the same database would
see a consistent queue, the issue explained above won't happen, and
it would probably also make it easier to implement prioritization of the
tables being vacuumed, which is sometimes discussed on pgsql-hackers.
I guess it might be worth discussing this idea.

I'm in favor of trying to improve scheduling (especially allowing users
to control how things are scheduled), but that's a far more invasive
patch. I'd like to get something like this patch in without waiting on a
significantly larger effort.

#6 Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Jim Nasby (#5)
Re: autovac issue with large number of tables

Hi,

On Tue, Jul 28, 2020 at 3:49 AM Jim Nasby <nasbyj@amazon.com> wrote:

I'm in favor of trying to improve scheduling (especially allowing users
to control how things are scheduled), but that's a far more invasive
patch. I'd like to get something like this patch in without waiting on a
significantly larger effort.

BTW, have you tried the patch suggested in the thread below?

/messages/by-id/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp

The above is a suggestion to manage statistics in shared memory rather
than in a file, and I think this feature may mitigate your problem.
I think that feature has yet another performance challenge, but it
might be worth a try.
The above patch will also require a great deal of effort to get into
PostgreSQL core, but I'm curious to see how well it works for this
problem.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

#7 Jim Nasby
nasbyj@amazon.com
In reply to: Kasahara Tatsuhito (#6)
Re: autovac issue with large number of tables

On 7/31/20 1:26 AM, Kasahara Tatsuhito wrote:

BTW, have you tried the patch suggested in the thread below?

/messages/by-id/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp

The above is a suggestion to manage statistics in shared memory rather
than in a file, and I think this feature may mitigate your problem.
I think that feature has yet another performance challenge, but it
might be worth a try.
The above patch will also require a great deal of effort to get into
PostgreSQL core, but I'm curious to see how well it works for this
problem.

Without reading the 100+ emails or the 260k patch, I'm guessing that it
won't help, because the problem I observed was spending most of its time in

  42.62% postgres          [.] hash_search_with_hash_value

I don't see how moving things to shared memory would help that at all.

BTW, when it comes to getting away from using files to store stats, IMHO
the best first pass on that is to put hooks in place to allow an
extension to replace/supplement different parts of the existing stats
infrastructure.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#7)
Re: autovac issue with large number of tables

Jim Nasby <nasbyj@amazon.com> writes:

Without reading the 100+ emails or the 260k patch, I'm guessing that it
won't help, because the problem I observed was spending most of its time in
  42.62% postgres          [.] hash_search_with_hash_value
I don't see how moving things to shared memory would help that at all.

So I'm a bit mystified as to why that would show up as the primary cost.
It looks to me like we force a re-read of the pgstats data each time
through table_recheck_autovac(), and it seems like the costs associated
with that would swamp everything else in the case you're worried about.

I suspect that the bulk of the hash_search_with_hash_value costs are
HASH_ENTER calls caused by repopulating the pgstats hash table, rather
than the single read probe that table_recheck_autovac itself will do.
It's still surprising that that would dominate the other costs of reading
the data, but maybe those costs just aren't as well localized in the code.

So I think Kasahara-san's point is that the shared memory stats collector
might wipe out those costs, depending on how it's implemented. (I've not
looked at that patch in a long time either, so I don't know how much it'd
cut the reader-side costs. But maybe it'd be substantial.)

In the meantime, though, do we want to do something else to alleviate
the issue? I realize you only described your patch as a PoC, but I
can't say I like it much:

* Giving up after we've wasted 1000 pgstats re-reads seems like locking
the barn door only after the horse is well across the state line.

* I'm not convinced that the business with skipping N entries at a time
buys anything. You'd have to make pretty strong assumptions about the
workers all processing tables at about the same rate to believe it will
help. In the worst case, it might lead to all the workers ignoring the
same table(s).

I think the real issue here is autovac_refresh_stats's insistence that it
shouldn't throttle pgstats re-reads in workers. I see the point about not
wanting to repeat vacuum work on the basis of stale data, but still ...
I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.
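
A minimal sketch of that two-probe flow, assuming a hypothetical helper
needs_vacanalyze_from_snapshot() that wraps the existing
relation_needs_vacanalyze() checks against the current stats snapshot:

static bool
recheck_with_two_probes(Oid relid)
{
	/* Probe 1: trust the stats snapshot we already have. */
	if (!needs_vacanalyze_from_snapshot(relid))
		return false;		/* table already vacuumed; skip the re-read */

	/* Probe 2: only now pay for a forced re-read of the stats file. */
	autovac_refresh_stats();
	return needs_vacanalyze_from_snapshot(relid);
}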

BTW, can you provide a test script that reproduces the problem you're
looking at? The rest of us are kind of guessing at what's happening.

regards, tom lane

#9 Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Tom Lane (#8)
Re: autovac issue with large number of tables

Hi,

On Wed, Aug 12, 2020 at 2:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So I think Kasahara-san's point is that the shared memory stats collector
might wipe out those costs, depending on how it's implemented. (I've not
looked at that patch in a long time either, so I don't know how much it'd
cut the reader-side costs. But maybe it'd be substantial.)

Thanks for your clarification, that's what I wanted to say.
Sorry for the lack of explanation.

I think the real issue here is autovac_refresh_stats's insistence that it
shouldn't throttle pgstats re-reads in workers.

I agree with that.

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

Does the above mean that the second and subsequent table_recheck_autovac()
calls will be improved to first check using the previously refreshed
statistics? I think that would certainly work.

If that's correct, I'll try to create a patch for the PoC.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

#10 Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Kasahara Tatsuhito (#9)
3 attachment(s)
Re: autovac issue with large number of tables

Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

Does the above mean that the second and subsequent table_recheck_autovac()
calls will be improved to first check using the previously refreshed
statistics? I think that would certainly work.

If that's correct, I'll try to create a patch for the PoC.

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems more likely to occur when you have a large number of
tables, i.e. a large amount of stats, and many small tables need VACUUM
at the same time.

So I followed Tom's advice and created a patch for the PoC.
This patch sets a flag in table_recheck_autovac() when the table being
checked turns out to have already been vacuumed (or analyzed) by another
worker after the stats refresh; while the flag is set, the next check
consults the existing stats first. If a table still requires VACUUM
according to the existing stats, a refresh is performed rather than
relying on them.

I did a simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed a few configuration settings; see the attached scripts.)

1. Normal VACUUM case
- SET autovacuum = off
- CREATE tables with 100 rows each
- DELETE 90 rows from each table
- SET autovacuum = on and restart PostgreSQL
- Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
- CREATE blank tables
- SELECT from all of these tables (to generate stats)
- SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
- Consume a lot of XIDs using txid_current()
- Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing
autovacuum_max_workers to 1, 2, 3 (default), 5 and 10, and the number
of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 vCPUs / 4 GB), the results are a little
unstable, but I think they are enough to show the trend.
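
For reference, a minimal sketch of test case 1 (the table names, table
count, and polling query are illustrative; the actual scripts are in the
attachments):

psql -c "ALTER SYSTEM SET autovacuum = off"
pg_ctl restart -D "$PGDATA"
for i in $(seq 1 10000); do
    psql -qc "CREATE TABLE t_$i AS SELECT g AS id FROM generate_series(1,100) g"
    psql -qc "DELETE FROM t_$i WHERE id <= 90"
done
psql -c "ALTER SYSTEM SET autovacuum = on"
pg_ctl restart -D "$PGDATA"
# Wait (and time how long it takes) until every table has been autovacuumed.
while [ "$(psql -Atc "SELECT count(*) FROM pg_stat_user_tables WHERE last_autovacuum IS NULL")" -gt 0 ]; do
    sleep 1
done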

===========================================================================
[1. Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2. Anti-wraparound VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
===========================================================================

Without the patch, worker scalability decreases as the number of
tables increases; in fact, the more workers there are, the longer it
takes to complete VACUUM on all tables.
With the patch, scalability with respect to the number of workers is good.

Note that perf top results showed that hash_search_with_hash_value,
hash_seq_search and pgstat_read_statsfiles are dominant during VACUUM
in all patterns, with or without the patch.

Therefore, there is still a need to find ways to optimize the reading
of large amounts of stats.
However, this patch is effective in its own right, and since it
modifies only a few parts, I think it should be applicable to
current (preferably pre-v13) PostgreSQL.

The patch and reproduction scripts are attached.

Thoughts ?

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v1_mod_table_recheck_autovac.patch (application/octet-stream)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1b8cd7b..f42f858 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -153,6 +153,9 @@ static int	default_freeze_table_age;
 static int	default_multixact_freeze_min_age;
 static int	default_multixact_freeze_table_age;
 
+/* Flag to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+
 /* Memory context for long-lived data */
 static MemoryContext AutovacMemCxt;
 
@@ -2787,18 +2790,68 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		wraparound;
 	AutoVacOpts *avopts;
 
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
-
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(classTup))
 		return NULL;
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
 
+	/*
+	 * Before refreshing the stats, check the existing stats to avoid
+	 * frequent reloading of pgstats. With a very large number of tables,
+	 * the cost of re-reading the stats file can be significant, and
+	 * frequent calls to autovac_refresh_stats() can leave some autovacuum
+	 * workers unable to do useful work. So if the table we checked last
+	 * time had already been vacuumed after the stats refresh, check the
+	 * current stats before refreshing them.
+	 */
+	if (use_existing_stats)
+	{
+		shared = pgstat_fetch_stat_dbentry(InvalidOid);
+		dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+		/*
+	 	 * Get the applicable reloptions.  If it is a TOAST table, try to get the
+	 	 * main table reloptions if the toast table itself doesn't have.
+	 	 */
+		avopts = extract_autovac_opts(classTup, pg_class_desc);
+		if (classForm->relkind == RELKIND_TOASTVALUE &&
+			avopts == NULL && table_toast_map != NULL)
+		{
+			av_relation *hentry;
+			bool		found;
+
+			hentry = hash_search(table_toast_map, &relid, HASH_FIND, &found);
+			if (found && hentry->ar_hasrelopts)
+				avopts = &hentry->ar_reloptions;
+		}
+
+		/* fetch the pgstat table entry */
+		tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+											 shared, dbentry);
+
+		relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+								  effective_multixact_freeze_max_age,
+								  &dovacuum, &doanalyze, &wraparound);
+
+		/* ignore ANALYZE for toast tables */
+		if (classForm->relkind == RELKIND_TOASTVALUE)
+			doanalyze = false;
+
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
+
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
 	/*
 	 * Get the applicable reloptions.  If it is a TOAST table, try to get the
 	 * main table reloptions if the toast table itself doesn't have.
@@ -2913,9 +2966,17 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/* We might be better off refreshing the stats */
+		use_existing_stats = false;
 	}
+	else
+	{
 
-	heap_freetuple(classTup);
+		heap_freetuple(classTup);
+		/* The relid has already been vacuumed, so we might be better off using existing stats */
+		use_existing_stats = true;
+	}
 
 	return tab;
 }
normal_vacuum_case_sample.sh (application/octet-stream)
wrap-round_vacuum_case_sample.sh (application/octet-stream)
#11 Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Kasahara Tatsuhito (#10)
Re: autovac issue with large number of tables

Hi,

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

The patch and reproduction scripts are attached.

Thoughts ?

I ran the same test with a patch[1] that manages the statistics in
shared memory.
This patch is expected to reduce the burden of refreshing large
amounts of stats.

And the following results were obtained.
(The results for HEAD are the same as in my last post.)

========================================================================================
[1. Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with shared_base_stats patch) 9 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with shared_base_stats patch) 9 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with shared_base_stats patch) 13 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with shared_base_stats patch) 12 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with shared_base_stats patch) 13 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with shared_base_stats patch) 12 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with shared_base_stats patch) 12 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with shared_base_stats patch) 18 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with shared_base_stats patch) 25 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with shared_base_stats patch) 28 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with shared_base_stats patch) 28 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with shared_base_stats patch) 29 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with shared_base_stats patch) 27 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with shared_base_stats patch) 54 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with shared_base_stats patch) 67 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with shared_base_stats patch) 75 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with shared_base_stats patch) 83 sec

[2. Anti-wraparound VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with shared_base_stats patch) 6 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with shared_base_stats patch) 7 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with shared_base_stats patch) 6 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with shared_base_stats patch) 6 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with shared_base_stats patch) 7 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with shared_base_stats patch) 9 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with shared_base_stats patch) 8 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with shared_base_stats patch) 9 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with shared_base_stats patch) 9 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with shared_base_stats patch) 9 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with shared_base_stats patch) 8 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with shared_base_stats patch) 9 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with shared_base_stats patch) 12 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with shared_base_stats patch) 12 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with shared_base_stats patch) 12 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with shared_base_stats patch) 11 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with shared_base_stats patch) 12 sec
========================================================================================

This patch provided a very nice speedup in both cases.
However, in case 1, when the number of tables is large, the time
required increases as the number of workers increases.
Whether this is due to CPU and I/O contention or to characteristics of
the patch is not yet known.
Nevertheless, at least the problems associated with
table_recheck_autovac() appear to have been resolved.

So, I hope that this patch [1] will be committed for its original purpose,
as well as to improve autovacuum in v14 and later.

The other patch I submitted (v1_mod_table_recheck_autovac.patch) is
useful for slightly improving autovacuum in PostgreSQL 13 and before.
Is it worth backporting it to current PostgreSQL 13 and earlier?

Best regards,

[1]: /messages/by-id/20200908.175557.617150409868541587.horikyota.ntt@gmail.com

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

#12 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kasahara Tatsuhito (#10)
Re: autovac issue with large number of tables

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Without the patch, worker scalability decreases as the number of
tables increases; in fact, the more workers there are, the longer it
takes to complete VACUUM on all tables.
With the patch, scalability with respect to the number of workers is good.

It seems like a good performance improvement even without the
shared-memory-based stats collector patch.

Note that perf top results showed that hash_search_with_hash_value,
hash_seq_search and pgstat_read_statsfiles are dominant during VACUUM
in all patterns, with or without the patch.

Therefore, there is still a need to find ways to optimize the reading
of large amounts of stats.
However, this patch is effective in its own right, and since it
modifies only a few parts, I think it should be applicable to
current (preferably pre-v13) PostgreSQL.

+1

+
+       /* We might be better off refreshing the stats */
+       use_existing_stats = false;
    }
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already been vacuumed, so we might be better off using existing stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats on the
next check if it finds that the table still needs to be vacuumed.
But I guess that's not necessarily right, because the next table might
already be vacuumed. So I think we might want to always use the
existing stats for the first check. What do you think?

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#13 Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Masahiko Sawada (#12)
1 attachment(s)
Re: autovac issue with large number of tables

Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

+
+       /* We might be better to refresh stats */
+       use_existing_stats = false;
}
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already vacuumed, so we might be better to use exiting stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats at the next
check if it finds that the table still needs to be vacuumed. But I guess
that's not necessarily appropriate, because the next table might already
have been vacuumed. So I think we might want to always use the existing
stats for the first check. What do you think?

Thanks for your comment.

If we assume a case where some workers are vacuuming large tables and a
single worker is vacuuming small tables, the processing performance of
the single worker will be slightly lower if the existing statistics are
checked every time.

In fact, at first I tried to check the existing stats every time, but
the performance was slightly worse in cases with a small number of
workers. (Checking the existing stats is lightweight, but at high
frequency it affects processing performance.) Therefore, after
refreshing the statistics, the patch decides whether autovacuum should
use the existing statistics next time.
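
To make the two strategies concrete, here is a minimal sketch of the
recheck control flow (the helpers relation_needs_vacuum_cached() and
build_autovac_table() are hypothetical names used only for illustration;
this is not the actual patch):

static bool use_existing_stats = false;

static autovac_table *
recheck_sketch(Oid relid)
{
	/*
	 * PoC behaviour: probe the cached stats, without reloading the
	 * stats file, only when the previous recheck found its table
	 * already vacuumed by another worker.  The "always use the
	 * existing stats for the first check" variant discussed above
	 * would simply drop the use_existing_stats condition here.
	 */
	if (use_existing_stats && !relation_needs_vacuum_cached(relid))
		return NULL;			/* someone else already did the work */

	/* Expensive path: may force a re-read of the whole stats file. */
	autovac_refresh_stats();

	if (!relation_needs_vacuum_cached(relid))
	{
		/* Already vacuumed: probe the cached stats first next time. */
		use_existing_stats = true;
		return NULL;
	}

	/* The table really needs work; prefer fresh stats next time. */
	use_existing_stats = false;
	return build_autovac_table(relid);	/* hypothetical */
}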

BTW, I found some typos in the comments, so I attached a fixed version.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v2_mod_table_recheck_autovac.patchapplication/octet-stream; name=v2_mod_table_recheck_autovac.patchDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..ec74b60 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -153,6 +153,9 @@ static int	default_freeze_table_age;
 static int	default_multixact_freeze_min_age;
 static int	default_multixact_freeze_table_age;
 
+/* Flag to determine whether the statistics should be refreshed */
+static bool use_existing_stats = false;
+
 /* Memory context for long-lived data */
 static MemoryContext AutovacMemCxt;
 
@@ -2803,18 +2806,68 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		wraparound;
 	AutoVacOpts *avopts;
 
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
-
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(classTup))
 		return NULL;
 	classForm = (Form_pg_class) GETSTRUCT(classTup);
 
+	/*
+	 * Before refreshing the stats, check the existing stats to avoid
+	 * frequent reloading of pgstats.  With a very large number of tables,
+	 * the cost of re-reading the stats file can be significant, and the
+	 * frequent calls to autovac_refresh_stats() can leave some autovacuum
+	 * workers unable to make progress.  So if the last table we checked
+	 * had already been vacuumed after refreshing the stats, check the
+	 * current statistics before refreshing them.
+	 */
+	if (use_existing_stats)
+	{
+		shared = pgstat_fetch_stat_dbentry(InvalidOid);
+		dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+		/*
+		 * Get the applicable reloptions.  If it is a TOAST table, try to get the
+		 * main table reloptions if the toast table itself doesn't have.
+		 */
+		avopts = extract_autovac_opts(classTup, pg_class_desc);
+		if (classForm->relkind == RELKIND_TOASTVALUE &&
+			avopts == NULL && table_toast_map != NULL)
+		{
+			av_relation *hentry;
+			bool		found;
+
+			hentry = hash_search(table_toast_map, &relid, HASH_FIND, &found);
+			if (found && hentry->ar_hasrelopts)
+				avopts = &hentry->ar_reloptions;
+		}
+
+		/* fetch the pgstat table entry */
+		tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+											 shared, dbentry);
+
+		relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+								  effective_multixact_freeze_max_age,
+								  &dovacuum, &doanalyze, &wraparound);
+
+		/* ignore ANALYZE for toast tables */
+		if (classForm->relkind == RELKIND_TOASTVALUE)
+			doanalyze = false;
+
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
+
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
 	/*
 	 * Get the applicable reloptions.  If it is a TOAST table, try to get the
 	 * main table reloptions if the toast table itself doesn't have.
@@ -2929,9 +2982,18 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/* We may be better off refreshing the stats */
+		use_existing_stats = false;
 	}
+	else
+	{
 
-	heap_freetuple(classTup);
+		heap_freetuple(classTup);
+
+		/* The relid has already been vacuumed, so we may be better off using the existing stats */
+		use_existing_stats = true;
+	}
 
 	return tab;
 }
#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kasahara Tatsuhito (#13)
Re: autovac issue with large number of tables

On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

In fact, at first I tried to check the existing stats every time, but
the performance was slightly worse in cases with a small number of
workers. (Checking the existing stats is lightweight, but at high
frequency it affects processing performance.) Therefore, after
refreshing the statistics, the patch decides whether autovacuum should
use the existing statistics next time.

Yeah, since the test you used involves a lot of small tables, if there
are only a few workers, checking the existing stats is unlikely to
return true (no need to vacuum), so the cost of the existing-stats
check ends up being pure overhead. I'm not sure how much slower always
checking the existing stats was, but given that the shared-memory-based
stats collector patch could improve the performance of refreshing
stats, it might be better not to check the existing stats as frequently
as the patch does. Anyway, I think it's better to evaluate the
performance improvement with other cases too.

BTW, I found some typos in the comments, so I attached a fixed version.

Thank you for updating the patch! I'll also run the performance test
you shared with the latest version patch.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#15Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Masahiko Sawada (#14)
Re: autovac issue with large number of tables

On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Anyway, I think it's better to evaluate the performance improvement
with other cases too.

Yeah, I would like to see how much the performance changes in other cases.
In addition, once the shared-memory-based stats patch is applied, we
won't need to reload a huge stats file; we will just have to check the
stats in shared memory every time. Perhaps the logic of
table_recheck_autovac could then be simpler.

BTW, I found some typos in the comments, so I attached a fixed version.

Thank you for updating the patch! I'll also run the performance test
you shared with the latest version patch.

Thank you!
It's very helpful.

Best regards,
--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

#16Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kasahara Tatsuhito (#15)
Re: autovac issue with large number of tables

On 2020/11/26 10:41, Kasahara Tatsuhito wrote:

It seems like a good performance improvement even without the
shared-memory-based stats collector patch.

Sounds great!

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

BTW, I found some typos in the comments, so I attached a fixed version.

The patch adds some duplicated code into table_recheck_autovac().
It would be better to extract that code into a common function and have
table_recheck_autovac() call it, to simplify the code.

+		/*
+	 	 * Get the applicable reloptions.  If it is a TOAST table, try to get the
+	 	 * main table reloptions if the toast table itself doesn't have.
+	 	 */
+		avopts = extract_autovac_opts(classTup, pg_class_desc);
+		if (classForm->relkind == RELKIND_TOASTVALUE &&
+			avopts == NULL && table_toast_map != NULL)
+		{
+			av_relation *hentry;
+			bool		found;
+
+			hentry = hash_search(table_toast_map, &relid, HASH_FIND, &found);
+			if (found && hentry->ar_hasrelopts)
+				avopts = &hentry->ar_reloptions;
+		}

The above is performed both when using the existing stats and also when
the stats are refreshed. But isn't it actually required only once?

-	heap_freetuple(classTup);
+		heap_freetuple(classTup);

With the patch, heap_freetuple() is not called when either doanalyze or
dovacuum is true. But it should be called even in that case, as it was
originally?

Thank you for updating the patch! I'll also run the performance test
you shared with the latest version patch.

+1

Thank you!
It's very helpful.

Agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#17Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#16)
Re: autovac issue with large number of tables

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

FWIW, I'd like to share the benchmark results of the same test that
Kasahara-san ran, in my environment. In this performance evaluation, I
measured the execution time of the loop in do_autovacuum() (line 2318
in autovacuum.c), where autovacuum spends most of its time. So it
measures how long an autovacuum worker took to process the list of all
collected tables, including refreshing and checking the stats,
vacuuming tables, and checking the existing stats. Since all tables are
the same size (only 1 page), there is no big difference in execution
time between concurrent autovacuum workers. The following results show
the maximum execution time among the autovacuum workers; from left to
right, the execution time in seconds on current HEAD, with
Kasahara-san's patch, and with the method of always checking the
existing stats. The results show a similar trend to what Kasahara-san
mentioned.

1000 tables:
autovac_workers 1 : 13s, 13s, 13s
autovac_workers 2 : 6s, 4s, 5s
autovac_workers 3 : 3s, 4s, 4s
autovac_workers 5 : 3s, 3s, 3s
autovac_workers 10: 2s, 3s, 3s

5000 tables:
autovac_workers 1 : 71s, 71s, 132s
autovac_workers 2 : 37s, 32s, 48s
autovac_workers 3 : 29s, 26s, 38s
autovac_workers 5 : 20s, 19s, 19s
autovac_workers 10: 13s, 8s, 9s

10000 tables:
autovac_workers 1 : 158s,157s, 290s
autovac_workers 2 : 80s, 53s, 151s
autovac_workers 3 : 75s, 67s, 89s
autovac_workers 5 : 61s, 42s, 53s
autovac_workers 10: 69s, 26s, 33s

20000 tables:
autovac_workers 1 : 379s, 380s, 695s
autovac_workers 2 : 236s, 232s, 369s
autovac_workers 3 : 222s, 181s, 238s
autovac_workers 5 : 212s, 132s, 167s
autovac_workers 10: 317s, 91s, 117s
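
(For reference, the measurement amounts to instrumentation like the
following around the per-table loop in do_autovacuum(), using the
backend's instr_time macros; this is only a sketch of the assumed
instrumentation, not part of any posted patch.)

	/* requires "portability/instr_time.h" */
	instr_time	start_time,
				elapsed;
	ListCell   *cell;

	INSTR_TIME_SET_CURRENT(start_time);

	foreach(cell, table_oids)
	{
		/* ... existing per-table processing, including the recheck ... */
	}

	INSTR_TIME_SET_CURRENT(elapsed);
	INSTR_TIME_SUBTRACT(elapsed, start_time);
	elog(LOG, "autovacuum worker processed its table list in %.3f s",
		 INSTR_TIME_GET_DOUBLE(elapsed));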

I'm also benchmarking the performance improvement by the patch on other
workloads. I'll share those results.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#18Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Fujii Masao (#16)
1 attachment(s)
Re: autovac issue with large number of tables

Hi,

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

The patch adds some duplicated code into table_recheck_autovac().
It would be better to extract that code into a common function and have
table_recheck_autovac() call it, to simplify the code.

Thanks for your comment.
Hmm, I've factored out the duplicated part; the patch is attached.
Could you confirm that it matches what you expected?

+               /*
+                * Get the applicable reloptions.  If it is a TOAST table, try to get the
+                * main table reloptions if the toast table itself doesn't have.
+                */
+               avopts = extract_autovac_opts(classTup, pg_class_desc);
+               if (classForm->relkind == RELKIND_TOASTVALUE &&
+                       avopts == NULL && table_toast_map != NULL)
+               {
+                       av_relation *hentry;
+                       bool            found;
+
+                       hentry = hash_search(table_toast_map, &relid, HASH_FIND, &found);
+                       if (found && hentry->ar_hasrelopts)
+                               avopts = &hentry->ar_reloptions;
+               }

The above is performed both when using the existing stats and also when
the stats are refreshed. But isn't it actually required only once?

Yeah right. Fixed.

-       heap_freetuple(classTup);
+               heap_freetuple(classTup);

With the patch, heap_freetuple() is not called when either doanalyze or
dovacuum is true. But it should be called even in that case, as it was
originally?

Yeah right. Fixed.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v3_mod_table_recheck_autovac.patchapplication/octet-stream; name=v3_mod_table_recheck_autovac.patchDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..c41514d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -153,6 +153,9 @@ static int	default_freeze_table_age;
 static int	default_multixact_freeze_min_age;
 static int	default_multixact_freeze_table_age;
 
+/* Flag to determine whether the statistics should be refreshed */
+static bool use_existing_stats = false;
+
 /* Memory context for long-lived data */
 static MemoryContext AutovacMemCxt;
 
@@ -2779,6 +2782,35 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
 	return tabentry;
 }
 
+static void
+recheck_relation_needs_vacanalyze(Oid relid,
+							   Form_pg_class classForm,
+							   AutoVacOpts *avopts,
+							   int effective_multixact_freeze_max_age,
+							   bool *dovacuum,
+							   bool *doanalyze,
+							   bool *wraparound)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatDBEntry *shared;
+	PgStat_StatDBEntry *dbentry;
+
+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+	/* fetch the pgstat table entry */
+	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+										 shared, dbentry);
+
+	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+							  effective_multixact_freeze_max_age,
+							  dovacuum, doanalyze, wraparound);
+
+	/* ignore ANALYZE for toast tables */
+	if (classForm->relkind == RELKIND_TOASTVALUE)
+		*doanalyze = false;
+}
+
 /*
  * table_recheck_autovac
  *
@@ -2797,18 +2829,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatDBEntry *shared;
-	PgStat_StatDBEntry *dbentry;
 	bool		wraparound;
 	AutoVacOpts *avopts;
 
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
-
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(classTup))
@@ -2831,17 +2854,35 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			avopts = &hentry->ar_reloptions;
 	}
 
-	/* fetch the pgstat table entry */
-	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
-										 shared, dbentry);
+	/*
+	 * Before refreshing the stats, check the existing stats to avoid
+	 * frequent reloading of pgstats.  With a very large number of tables,
+	 * the cost of re-reading the stats file can be significant, and the
+	 * frequent calls to autovac_refresh_stats() can leave some autovacuum
+	 * workers unable to make progress.  So if the last table we checked
+	 * had already been vacuumed after refreshing the stats, check the
+	 * current statistics before refreshing them.
+	 */
+	if (use_existing_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+									   effective_multixact_freeze_max_age,
+									   &dovacuum, &doanalyze, &wraparound);
 
-	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
-							  effective_multixact_freeze_max_age,
-							  &dovacuum, &doanalyze, &wraparound);
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
 
-	/* ignore ANALYZE for toast tables */
-	if (classForm->relkind == RELKIND_TOASTVALUE)
-		doanalyze = false;
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+								   effective_multixact_freeze_max_age,
+								   &dovacuum, &doanalyze, &wraparound);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2929,10 +2970,17 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/* We may be better off refreshing the stats */
+		use_existing_stats = false;
+	}
+	else
+	{
+		/* The relid has already vacuumed, so we might be better to use existing stats */
+		use_existing_stats = true;
 	}
 
 	heap_freetuple(classTup);
-
 	return tab;
 }
 
#19Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Masahiko Sawada (#17)
Re: autovac issue with large number of tables

On Fri, Nov 27, 2020 at 5:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/26 10:41, Kasahara Tatsuhito wrote:

On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

Does the above mean that the second and subsequent table_recheck_autovac()
calls would be improved to first check using the previously refreshed
statistics? I think that would certainly work.

If that's correct, I'll try to create a patch for the PoC.

I still don't know how to reproduce Jim's troubles, but I was able to
reproduce what was probably a very similar problem.

This problem seems more likely to occur when you have a large number of
tables, i.e., a large amount of stats, and many small tables needing VACUUM
at the same time.

So I followed Tom's advice and created a PoC patch.
The patch sets a flag in table_recheck_autovac() so that the next call uses
the existing stats when the check made right after a stats refresh found
that VACUUM (or ANALYZE) had already been done by another worker. If a table
still requires VACUUM after the refresh, the next call refreshes the stats
again instead of using the existing statistics.
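
In outline, the recheck flow with the patch looks like this (a condensed
paraphrase of the attached patch, not new code):

    if (use_existing_stats)
    {
        /* probe 1: check against the stats snapshot we already have */
        recheck_relation_needs_vacanalyze(relid, classForm, avopts,
                                          effective_multixact_freeze_max_age,
                                          &dovacuum, &doanalyze, &wraparound);
        /* someone else already vacuumed it; exit quickly */
        if (!dovacuum && !doanalyze)
            return NULL;
    }

    /* probe 2: re-read the stats file and check again */
    autovac_refresh_stats();
    recheck_relation_needs_vacanalyze(relid, classForm, avopts,
                                      effective_multixact_freeze_max_age,
                                      &dovacuum, &doanalyze, &wraparound);

    /* reuse the snapshot next time only if this table needed nothing */
    use_existing_stats = !(dovacuum || doanalyze);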

I did a simple test with HEAD and with HEAD + this PoC patch.
The tests covered two cases.
(I changed a few configuration parameters; see the attached scripts.)

1. Normal VACUUM case
- SET autovacuum = off
- CREATE tables with 100 rows each
- DELETE 90 rows from each table
- SET autovacuum = on and restart PostgreSQL
- Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
- CREATE blank tables
- SELECT from all of these tables (to generate stats)
- SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
- Consume a lot of XIDs by using txid_current()
- Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained with
autovacuum_max_workers set to 1, 2, 3 (the default), 5, and 10,
and with the number of tables set to 1000, 5000, 10000, and 20000.

Due to the modest VM environment (2 vCPU / 4 GB), the results are a little
unstable, but I think they are enough to show the trend.

===========================================================================
[1.Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2. Anti-wraparound VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
===========================================================================

Without the patch, worker scalability degrades as the number of tables
increases; in fact, the more workers there are, the longer it takes to
complete VACUUM of all the tables. With the patch, scalability with respect
to the number of workers is good.

It seems like a good performance improvement even without the
shared-memory-based stats collector patch.

Sounds great!

Note that perf top showed hash_search_with_hash_value, hash_seq_search and
pgstat_read_statsfiles to be dominant during VACUUM in all patterns, with or
without the patch.

Therefore, there is still a need to find ways to optimize the reading of
large amounts of stats. However, this patch is effective in its own right,
and since it modifies only a few parts, I think it should be applicable to
current (and preferably pre-v13) PostgreSQL.

+1

+
+       /* We might be better to refresh stats */
+       use_existing_stats = false;
 }
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already vacuumed, so we might be better to use existing stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats in the next
check if it finds out that the table still needs to be vacuumed. But I guess
that's not necessarily right, because the next table might already have been
vacuumed. So I think we might want to always use the existing stats for the
first check. What do you think?

Thanks for your comment.

If we assume a case where some workers vacuum large tables and a single
worker vacuums small tables, the processing performance of the single worker
will be slightly lower if the existing statistics are checked every time.

In fact, at first I tried checking the existing stats every time, but the
performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

FWIW I'd like to share the benchmark results of the same test in my
environment as Kasahara-san did. In this evaluation, I measured the
execution time of the loop in do_autovacuum() (line 2318 in autovacuum.c),
where autovacuum spends most of its time. So it measures how long an
autovacuum worker took to process the list of all the collected tables,
including refreshing and checking the stats, vacuuming tables, and checking
the existing stats. Since all tables are the same size (only 1 page) there
is no big difference in execution time between concurrent autovacuum
workers. The following results show the maximum execution time among the
autovacuum workers. From left to right: the execution time of current HEAD,
of Kasahara-san's patch, and of the method of always checking the existing
stats, in seconds. The results show a similar trend to what Kasahara-san
mentioned.

Thanks!
Yes, I think the results are as expected.

1000 tables:
autovac_workers 1 : 13s, 13s, 13s
autovac_workers 2 : 6s, 4s, 5s
autovac_workers 3 : 3s, 4s, 4s
autovac_workers 5 : 3s, 3s, 3s
autovac_workers 10: 2s, 3s, 3s

5000 tables:
autovac_workers 1 : 71s, 71s, 132s
autovac_workers 2 : 37s, 32s, 48s
autovac_workers 3 : 29s, 26s, 38s
autovac_workers 5 : 20s, 19s, 19s
autovac_workers 10: 13s, 8s, 9s

10000 tables:
autovac_workers 1 : 158s,157s, 290s
autovac_workers 2 : 80s, 53s, 151s
autovac_workers 3 : 75s, 67s, 89s
autovac_workers 5 : 61s, 42s, 53s
autovac_workers 10: 69s, 26s, 33s

20000 tables:
autovac_workers 1 : 379s, 380s, 695s
autovac_workers 2 : 236s, 232s, 369s
autovac_workers 3 : 222s, 181s, 238s
autovac_workers 5 : 212s, 132s, 167s
autovac_workers 10: 317s, 91s, 117s

I'm benchmarking the performance improvement by the patch on other
workloads. I'll share that result.

+1
If you would like to try the patch I just posted, it would be very helpful.

Best regards,


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

#20Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kasahara Tatsuhito (#18)
Re: autovac issue with large number of tables

On 2020/11/27 18:38, Kasahara Tatsuhito wrote:

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

(Checking the existing stats is lightweight, but at high frequency it
affects processing performance.) Therefore, right after refreshing the
statistics, we determine whether autovacuum should use the existing
statistics next time.

Yeah, since the test you used has a lot of small tables, if there are only
a few workers, checking the existing stats is unlikely to return true (no
need to vacuum). So the cost of the existing-stats check ends up being pure
overhead. I'm not sure how slow always checking the existing stats was, but
given that the shared-memory-based stats collector patch could improve the
performance of refreshing stats, it might be better not to check the
existing stats frequently like the patch does. Anyway, I think it's better
to evaluate the performance improvement with other cases too.

Yeah, I would like to see how much the performance changes in other cases.
In addition, if the shared-memory-based stats patch is applied, we won't
need to reload a huge stats file, so we will just have to check the stats in
shared memory every time. Perhaps the logic of table_recheck_autovac() could
then be simpler.
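
As a purely hypothetical sketch of that simplification (not a proposal; it
assumes the stats read from shared memory are always current, so there is no
snapshot to refresh):

    /* single check; no refresh step, no use_existing_stats flag */
    recheck_relation_needs_vacanalyze(relid, classForm, avopts,
                                      effective_multixact_freeze_max_age,
                                      &dovacuum, &doanalyze, &wraparound);
    if (!dovacuum && !doanalyze)
        return NULL;    /* another worker already handled it */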

BTW, I found some typos in comments, so I've attached a fixed version.

The patch adds some duplicated code into table_recheck_autovac().
It would be better to factor the common parts into a function and have
table_recheck_autovac() call that function, to simplify the code.

Thanks for your comment.
Hmm.. I've cut out the duplicated part.
The patch is attached.
Could you confirm that it matches your expectation?

Yes, thanks for updating the patch! Here are some more review comments.

+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

When using the existing stats, ISTM that these are not necessary and
we can reuse "shared" and "dbentry" obtained before. Right?

+		/* We might be better to refresh stats */
+		use_existing_stats = false;

I think that we should add more comments about why it's better to
refresh the stats in this case.

+		/* The relid has already vacuumed, so we might be better to use existing stats */
+		use_existing_stats = true;

I think that we should add more comments about why it's better to
reuse the stats in this case.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#21Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Fujii Masao (#20)
1 attachment(s)
Re: autovac issue with large number of tables

Hi, thanks for your comments.

On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

+       shared = pgstat_fetch_stat_dbentry(InvalidOid);
+       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

When using the existing stats, ISTM that these are not necessary and
we can reuse "shared" and "dbentry" obtained before. Right?

Yeah, but unless autovac_refresh_stats() is called, these functions read the
information from the local hash table without re-reading the stats file, so
they are very cheap. Therefore, I think it is better to keep the current
logic, to keep the code simple.
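
For reference, the short-circuit being relied on looks roughly like this
(paraphrased from pgstat.c of that era; not part of this patch):

    /* pgstat_fetch_stat_dbentry() goes through backend_read_statsfile(),
     * which returns immediately once a snapshot has been taken: */
    static void
    backend_read_statsfile(void)
    {
        /* already read it? */
        if (pgStatDBHash)
            return;
        /* ... otherwise read the stats file into local hash tables ... */
    }

So repeated fetches between snapshot resets cost only a local hash lookup.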

+               /* We might be better to refresh stats */
+               use_existing_stats = false;

I think that we should add more comments about why it's better to
refresh the stats in this case.

+               /* The relid has already vacuumed, so we might be better to use existing stats */
+               use_existing_stats = true;

I think that we should add more comments about why it's better to
reuse the stats in this case.

I added comments.

The updated patch is attached.

Best regards,


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v4_mod_table_recheck_autovac.patchapplication/octet-stream; name=v4_mod_table_recheck_autovac.patchDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..cf42422 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -153,6 +153,9 @@ static int	default_freeze_table_age;
 static int	default_multixact_freeze_min_age;
 static int	default_multixact_freeze_table_age;
 
+/* Counter to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+
 /* Memory context for long-lived data */
 static MemoryContext AutovacMemCxt;
 
@@ -325,6 +328,10 @@ static void autovac_balance_cost(void);
 static void do_autovacuum(void);
 static void FreeWorkerInfo(int code, Datum arg);
 
+static void recheck_relation_needs_vacanalyze(Oid relid, Form_pg_class classForm,
+							 AutoVacOpts *avopts,
+							 int effective_multixact_freeze_max_age,
+							 bool *dovacuum, bool *doanalyze, bool *wraparound);
 static autovac_table *table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 											TupleDesc pg_class_desc,
 											int effective_multixact_freeze_max_age);
@@ -2779,6 +2786,38 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
 	return tabentry;
 }
 
+/*
+ * Subroutine of table_recheck_autovac.
+ */
+static void
+recheck_relation_needs_vacanalyze(Oid relid,
+							   Form_pg_class classForm,
+							   AutoVacOpts *avopts,
+							   int effective_multixact_freeze_max_age,
+							   bool *dovacuum,
+							   bool *doanalyze,
+							   bool *wraparound)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatDBEntry *shared;
+	PgStat_StatDBEntry *dbentry;
+
+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+	/* fetch the pgstat table entry */
+	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+										 shared, dbentry);
+
+	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+							  effective_multixact_freeze_max_age,
+							  dovacuum, doanalyze, wraparound);
+
+	/* ignore ANALYZE for toast tables */
+	if (classForm->relkind == RELKIND_TOASTVALUE)
+		*doanalyze = false;
+}
+
 /*
  * table_recheck_autovac
  *
@@ -2797,18 +2836,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatDBEntry *shared;
-	PgStat_StatDBEntry *dbentry;
 	bool		wraparound;
 	AutoVacOpts *avopts;
 
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
-
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(classTup))
@@ -2831,17 +2861,35 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			avopts = &hentry->ar_reloptions;
 	}
 
-	/* fetch the pgstat table entry */
-	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
-										 shared, dbentry);
+	/*
+	 * Before stats refresh, check existing stats for avoiding
+	 * frequent reloading of pgstats.
+	 * In the case of very large numbers of tables, the cost of re-reading
+	 * the stats file can be significant, and the frequent calls to
+	 * autovac_refresh_stats() can make certain autovacuum workers unable to work.
+	 * So if the last time we checked a table that was already vacuumed after
+	 * refres stats, check the current statistics before refreshing it.
+	 */
+	if (use_existing_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+									   effective_multixact_freeze_max_age,
+									   &dovacuum, &doanalyze, &wraparound);
 
-	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
-							  effective_multixact_freeze_max_age,
-							  &dovacuum, &doanalyze, &wraparound);
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
 
-	/* ignore ANALYZE for toast tables */
-	if (classForm->relkind == RELKIND_TOASTVALUE)
-		doanalyze = false;
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+								   effective_multixact_freeze_max_age,
+								   &dovacuum, &doanalyze, &wraparound);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2929,10 +2977,27 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/*
+		 * The relid had not yet been vacuumed. That means, it is unlikely that the
+		 * stats that this worker currently has are updated by other worker's.
+		 * So we might be better to refresh the stats in the next this recheck.
+		 */
+		use_existing_stats = false;
+	}
+	else
+	{
+		/*
+		 * The relid had already vacuumed. That means, that for the stats that this
+		 * worker currently has, the info of tables that this worker will process may
+		 * have been updated by other workers with information that has already been
+		 * vacuumed or analyzed.
+		 * So we might be better to reuse the existing stats in the next this recheck.
+		 */
+		use_existing_stats = true;
 	}
 
 	heap_freetuple(classTup);
-
 	return tab;
 }
 
#22Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kasahara Tatsuhito (#21)
Re: autovac issue with large number of tables

On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I added comments.

The updated patch is attached.

Thank you for updating the patch. Here are some small comments on the
latest (v4) patch.

+    * So if the last time we checked a table that was already vacuumed after
+    * refres stats, check the current statistics before refreshing it.
+    */

s/refres/refresh/

-----
+/* Counter to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+

I think 'use_existing_stats' can be declared within table_recheck_autovac().
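
Something like this, as a sketch of that suggestion (hypothetical, not the
actual revision; body abbreviated):

    static autovac_table *
    table_recheck_autovac(Oid relid, HTAB *table_toast_map,
                          TupleDesc pg_class_desc,
                          int effective_multixact_freeze_max_age)
    {
        /* function-local, but static so the value survives across calls */
        static bool use_existing_stats = false;
        ...
    }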

-----
While testing the performance, I realized that the statistics are reset
every time one table is vacuumed, leading to re-reading the stats file even
if 'use_existing_stats' is true. Note that vacuum() eventually calls
AtEOXact_PgStat(), which calls pgstat_clear_snapshot(). I believe that's why
the performance of the method of always checking the existing stats wasn't
as good as expected. So if we save the statistics somewhere and reuse them
for rechecking, the benchmark results would differ between these two
methods.
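
The call path in question, condensed (paraphrasing the above, not new
behavior):

    /*
     * vacuum()
     *   -> transaction commit at the end of each table's vacuum
     *        -> AtEOXact_PgStat()
     *             -> pgstat_clear_snapshot()  -- frees the local stats
     *                snapshot, so the next fetch re-reads the stats file
     */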

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#23Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#22)
Re: autovac issue with large number of tables

On 2020/11/30 10:43, Masahiko Sawada wrote:

On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi, Thanks for you comments.

On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/27 18:38, Kasahara Tatsuhito wrote:

Hi,

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/26 10:41, Kasahara Tatsuhito wrote:

On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

Does the above mean that the second and subsequent table_recheck_autovac()
will be improved to first check using the previous refreshed statistics?
I think that certainly works.

If that's correct, I'll try to create a patch for the PoC

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems to be more likely to occur in cases where you have
a large number of tables,
i.e., a large amount of stats, and many small tables need VACUUM at
the same time.

So I followed Tom's advice and created a patch for the PoC.
This patch will enable a flag in the table_recheck_autovac function to use
the existing stats next time if VACUUM (or ANALYZE) has already been done
by another worker on the check after the stats have been updated.
If the tables continue to require VACUUM after the refresh, then a refresh
will be required instead of using the existing statistics.

I did simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed few configurations. see attached scripts)

1. Normal VACUUM case
- SET autovacuum = off
- CREATE tables with 100 rows
- DELETE 90 rows for each tables
- SET autovacuum = on and restart PostgreSQL
- Measure the time it takes for all tables to be VACUUMed

2. Anti wrap round VACUUM case
- CREATE brank tables
- SELECT all of these tables (for generate stats)
- SET autovacuum_freeze_max_age to low values and restart PostgreSQL
- Consumes a lot of XIDs by using txid_curent()
- Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing
autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
Also changing num of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable,
but I think it's enough to ask for a trend.

===========================================================================
[1.Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2.Anti wrap round VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
===========================================================================

The cases without patch, the scalability of the worker has decreased
as the number of tables has increased.
In fact, the more workers there are, the longer it takes to complete
VACUUM to all tables.
The cases with patch, it shows good scalability with respect to the
number of workers.

It seems a good performance improvement even without the patch of
shared memory based stats collector.

Sounds great!

Note that perf top results showed that hash_search_with_hash_value,
hash_seq_search and
pgstat_read_statsfiles are dominant during VACUUM in all patterns,
with or without the patch.

Therefore, there is still a need to find ways to optimize the reading
of large amounts of stats.
However, this patch is effective in its own right, and since there are
only a few parts to modify,
I think it should be able to be applied to current (preferably
pre-v13) PostgreSQL.

+1

+
+       /* We might be better to refresh stats */
+       use_existing_stats = false;
}
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already vacuumed, so we might be better to
use exiting stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats in the
next check if it finds out that the table still needs to be vacuumed.
But I guess it's not necessarily true because the next table might be
vacuumed already. So I think we might want to always use the existing
for the first check. What do you think?

Thanks for your comment.

If we assume the case where some workers vacuum on large tables
and a single worker vacuum on small tables, the processing
performance of the single worker will be slightly lower if the
existing statistics are checked every time.

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

(Checking the existing stats is lightweight , but at high frequency,
it affects processing performance.)
Therefore, at after refresh statistics, determine whether autovac
should use the existing statistics.

Yeah, since the test you used uses a lot of small tables, if there are
a few workers, checking the existing stats is unlikely to return true
(no need to vacuum). So the cost of existing stats check ends up being
overhead. Not sure how slow always checking the existing stats was,
but given that the shared memory based stats collector patch could
improve the performance of refreshing stats, it might be better not to
check the existing stats frequently like the patch does. Anyway, I
think it’s better to evaluate the performance improvement with other
cases too.

Yeah, I would like to see how much the performance changes in other cases.
In addition, if the shared-based-stats patch is applied, we won't need to reload
a huge stats file, so we will just have to check the stats on
shared-mem every time.
Perhaps the logic of table_recheck_autovac could be simpler.

BTW, I found some typos in comments, so attache a fixed version.

The patch adds some duplicated codes into table_recheck_autovac().
It's better to make the common function performing them and make
table_recheck_autovac() call that common function, to simplify the code.

Thanks for your comment.
Hmm.. I've cut out the duplicate part.
Attach the patch.
Could you confirm that it fits your expecting?

Yes, thanks for updating the patch! Here are some more review comments.

+       shared = pgstat_fetch_stat_dbentry(InvalidOid);
+       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

When using the existing stats, ISTM that these are not necessary and
we can reuse "shared" and "dbentry" obtained before. Right?

Yeah, but unless autovac_refresh_stats() is called, these functions read the
information from the local hash table without re-reading the stats file, so
the process is very light.
Therefore, I think it is better to keep the current logic, to keep the code
simple.

+               /* We might be better to refresh stats */
+               use_existing_stats = false;

I think that we should add more comments about why it's better to
refresh the stats in this case.

+               /* The relid has already vacuumed, so we might be better to use existing stats */
+               use_existing_stats = true;

I think that we should add more comments about why it's better to
reuse the stats in this case.

I added comments.

Attached the patch.

Thank you for updating the patch. Here are some small comments on the
latest (v4) patch.

+    * So if the last time we checked a table that was already vacuumed after
+    * refres stats, check the current statistics before refreshing it.
+    */

s/refres/refresh/

-----
+/* Counter to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+

I think 'use_existing_stats' can be declared within table_recheck_autovac().

-----
While testing the performance, I realized that the statistics are reset every
time one table is vacuumed, leading to re-reading the stats file even if
'use_existing_stats' is true. Note that vacuum() eventually calls
AtEOXact_PgStat(), which calls pgstat_clear_snapshot().

Good catch!

I believe that's why the performance of the method of always checking the
existing stats wasn't as good as expected.
So if we save the statistics somewhere and use them for rechecking, the
results of the performance benchmark will differ between these two methods.

Or would it be simpler to make the autovacuum worker skip calling
pgstat_clear_snapshot() in AtEOXact_PgStat()?
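
Something like the following, as a rough sketch (the surrounding pgstat.c
code is from memory and the guard is hypothetical):

--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ AtEOXact_PgStat
-	/* Make sure any stats snapshot is thrown away */
-	pgstat_clear_snapshot();
+	/*
+	 * Keep the stats snapshot across transactions in autovacuum workers
+	 * so that table_recheck_autovac() can reuse it.
+	 */
+	if (!IsAutoVacuumWorkerProcess())
+		pgstat_clear_snapshot();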

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#24Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Fujii Masao (#23)
1 attachment(s)
Re: autovac issue with large number of tables

Hi,

On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Thank you for updating the patch. Here are some small comments on the
latest (v4) patch.

+    * So if the last time we checked a table that was already vacuumed after
+    * refres stats, check the current statistics before refreshing it.
+    */

s/refres/refresh/

Thanks! fixed.
Attached the patch.

-----
+/* Counter to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+

I think 'use_existing_stats' can be declared within table_recheck_autovac().

-----
While testing the performance, I realized that the statistics are reset every
time one table is vacuumed, leading to re-reading the stats file even if
'use_existing_stats' is true. Note that vacuum() eventually calls
AtEOXact_PgStat(), which calls pgstat_clear_snapshot().

Good catch!

I believe that's why the performance of the method of always checking the
existing stats wasn't as good as expected.
So if we save the statistics somewhere and use them for rechecking, the
results of the performance benchmark will differ between these two methods.

Thanks for your checks.
But if a worker did vacuum(), that means this worker had determined in
table_recheck_autovac() that the table needed vacuuming. So
use_existing_stats is set to false and, next time, the stats are refreshed.
Therefore I think the current patch is fine, as we want to avoid unnecessary
refreshing of statistics before the actual vacuum(), right?

Or would it be simpler to make the autovacuum worker skip calling
pgstat_clear_snapshot() in AtEOXact_PgStat()?

Hmm. IMO the side effects are a bit scary, so I think it's fine the way it is.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v5_mod_table_recheck_autovac.patchapplication/octet-stream; name=v5_mod_table_recheck_autovac.patchDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..4ebe45f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -325,6 +325,10 @@ static void autovac_balance_cost(void);
 static void do_autovacuum(void);
 static void FreeWorkerInfo(int code, Datum arg);
 
+static void recheck_relation_needs_vacanalyze(Oid relid, Form_pg_class classForm,
+							 AutoVacOpts *avopts,
+							 int effective_multixact_freeze_max_age,
+							 bool *dovacuum, bool *doanalyze, bool *wraparound);
 static autovac_table *table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 											TupleDesc pg_class_desc,
 											int effective_multixact_freeze_max_age);
@@ -2780,6 +2784,38 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
 }
 
 /*
+ * Subroutine of table_recheck_autovac.
+ */
+static void
+recheck_relation_needs_vacanalyze(Oid relid,
+							   Form_pg_class classForm,
+							   AutoVacOpts *avopts,
+							   int effective_multixact_freeze_max_age,
+							   bool *dovacuum,
+							   bool *doanalyze,
+							   bool *wraparound)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatDBEntry *shared;
+	PgStat_StatDBEntry *dbentry;
+
+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+	/* fetch the pgstat table entry */
+	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+										 shared, dbentry);
+
+	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+							  effective_multixact_freeze_max_age,
+							  dovacuum, doanalyze, wraparound);
+
+	/* ignore ANALYZE for toast tables */
+	if (classForm->relkind == RELKIND_TOASTVALUE)
+		*doanalyze = false;
+}
+
+/*
  * table_recheck_autovac
  *
  * Recheck whether a table still needs vacuum or analyze.  Return value is a
@@ -2797,17 +2833,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatDBEntry *shared;
-	PgStat_StatDBEntry *dbentry;
 	bool		wraparound;
 	AutoVacOpts *avopts;
-
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+	static bool use_existing_stats = false;
 
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2831,17 +2859,35 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			avopts = &hentry->ar_reloptions;
 	}
 
-	/* fetch the pgstat table entry */
-	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
-										 shared, dbentry);
+	/*
+	 * Before refreshing the stats, check the existing stats to avoid
+	 * frequently reloading pgstats.
+	 * With a very large number of tables, the cost of re-reading the stats
+	 * file can be significant, and frequent calls to autovac_refresh_stats()
+	 * can leave some autovacuum workers unable to make progress.
+	 * So if the table we checked last time had already been vacuumed after
+	 * the stats refresh, check the current statistics before refreshing them.
+	 */
+	if (use_existing_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+									   effective_multixact_freeze_max_age,
+									   &dovacuum, &doanalyze, &wraparound);
 
-	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
-							  effective_multixact_freeze_max_age,
-							  &dovacuum, &doanalyze, &wraparound);
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
 
-	/* ignore ANALYZE for toast tables */
-	if (classForm->relkind == RELKIND_TOASTVALUE)
-		doanalyze = false;
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+								   effective_multixact_freeze_max_age,
+								   &dovacuum, &doanalyze, &wraparound);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2929,10 +2975,27 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/*
+		 * The relid had not yet been vacuumed. That means it is unlikely
+		 * that the stats this worker currently has have been updated by
+		 * other workers, so we are probably better off refreshing the
+		 * stats at the next recheck.
+		 */
+		use_existing_stats = false;
+	}
+	else
+	{
+		/*
+		 * The relid had already been vacuumed. That means that, in the
+		 * stats this worker currently has, the entries for the tables this
+		 * worker will process may already have been updated by other
+		 * workers that vacuumed or analyzed them, so we are probably
+		 * better off reusing the existing stats at the next recheck.
+		 */
+		use_existing_stats = true;
 	}
 
 	heap_freetuple(classTup);
-
 	return tab;
 }
 
#25Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kasahara Tatsuhito (#24)
Re: autovac issue with large number of tables

On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Thanks for your checks.
But if a worker did vacuum(), that means this worker had determined in
table_recheck_autovac() that the table needed vacuuming. So
use_existing_stats is set to false and, next time, the stats are refreshed.
Therefore I think the current patch is fine, as we want to avoid unnecessary
refreshing of statistics before the actual vacuum(), right?

Yes, you're right.

When I benchmarked the performance of the method of always checking the
existing stats, I edited your patch so that it sets use_existing_stats = true
even if the second check is false (i.e., vacuum is needed). And the result I
got was worse than expected, especially in the case of a few autovacuum
workers. But that doesn't evaluate the performance of the method fairly, as
the stats snapshot is cleared on every vacuum. Given you had similar results,
I guess you used a similar way when evaluating it, is that right? If so, it's
better to fix this issue and see how the performance benchmark results will
differ.
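
Concretely, the edit was something like this against the v5 patch
(reconstructed for illustration, not the exact diff):

 	if (doanalyze || dovacuum)
 	{
 		...
-		use_existing_stats = false;
+		/* variant under test: always probe the existing stats first */
+		use_existing_stats = true;
 	}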

For example, the results of the test case with 10000 tables and 1 autovacuum
worker that I reported before were:

10000 tables:
autovac_workers 1 : 158s, 157s, 290s

But after fixing that issue in the third method (always checking the existing
stats), the results are:

10000 tables:
autovac_workers 1 : 157s, 157s, 160s

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#26Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#25)
Re: autovac issue with large number of tables

On 2020/12/01 16:23, Masahiko Sawada wrote:

Yes, you're right.

When I benchmarked the performance of the method of always checking the
existing stats, I edited your patch so that it sets use_existing_stats = true
even if the second check is false (i.e., vacuum is needed). And the result I
got was worse than expected, especially in the case of a few autovacuum
workers. But that doesn't evaluate the performance of the method fairly, as
the stats snapshot is cleared on every vacuum. Given you had similar results,
I guess you used a similar way when evaluating it, is that right? If so, it's
better to fix this issue and see how the performance benchmark results will
differ.

For example, the results of the test case with 10000 tables and 1 autovacuum
worker that I reported before were:

10000 tables:
autovac_workers 1 : 158s, 157s, 290s

But after fixing that issue in the third method (always checking the existing
stats), the results are:

Could you tell me how you fixed that issue? Did you copy the stats somewhere,
as you suggested, or skip pgstat_clear_snapshot(), as I suggested?

Kasahara-san seems not to like the latter idea because it might cause bad
side effects. So should we use the former idea?

10000 tables:
autovac_workers 1 : 157s, 157s, 160s

Looks like a good number!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#27Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#26)
Re: autovac issue with large number of tables

On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/01 16:23, Masahiko Sawada wrote:

On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/30 10:43, Masahiko Sawada wrote:

On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi, Thanks for you comments.

On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/27 18:38, Kasahara Tatsuhito wrote:

Hi,

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/26 10:41, Kasahara Tatsuhito wrote:

On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

Does the above mean that the second and subsequent table_recheck_autovac()
will be improved to first check using the previous refreshed statistics?
I think that certainly works.

If that's correct, I'll try to create a patch for the PoC

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems to be more likely to occur in cases where you have
a large number of tables,
i.e., a large amount of stats, and many small tables need VACUUM at
the same time.

So I followed Tom's advice and created a patch for the PoC.
This patch will enable a flag in the table_recheck_autovac function to use
the existing stats next time if VACUUM (or ANALYZE) has already been done
by another worker on the check after the stats have been updated.
If the tables continue to require VACUUM after the refresh, then a refresh
will be required instead of using the existing statistics.

I did simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed few configurations. see attached scripts)

1. Normal VACUUM case
- SET autovacuum = off
- CREATE tables with 100 rows
- DELETE 90 rows for each tables
- SET autovacuum = on and restart PostgreSQL
- Measure the time it takes for all tables to be VACUUMed

2. Anti wrap round VACUUM case
- CREATE brank tables
- SELECT all of these tables (for generate stats)
- SET autovacuum_freeze_max_age to low values and restart PostgreSQL
- Consumes a lot of XIDs by using txid_curent()
- Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing
autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
Also changing num of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable,
but I think it's enough to ask for a trend.

===========================================================================
[1.Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2.Anti wrap round VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
===========================================================================

The cases without patch, the scalability of the worker has decreased
as the number of tables has increased.
In fact, the more workers there are, the longer it takes to complete
VACUUM to all tables.
The cases with patch, it shows good scalability with respect to the
number of workers.

It seems a good performance improvement even without the patch of
shared memory based stats collector.

Sounds great!

Note that perf top results showed that hash_search_with_hash_value,
hash_seq_search and
pgstat_read_statsfiles are dominant during VACUUM in all patterns,
with or without the patch.

Therefore, there is still a need to find ways to optimize the reading
of large amounts of stats.
However, this patch is effective in its own right, and since there are
only a few parts to modify,
I think it should be able to be applied to current (preferably
pre-v13) PostgreSQL.

+1

+
+       /* We might be better to refresh stats */
+       use_existing_stats = false;
}
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already vacuumed, so we might be better to
use exiting stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats in the
next check if it finds out that the table still needs to be vacuumed.
But I guess it's not necessarily true because the next table might be
vacuumed already. So I think we might want to always use the existing
for the first check. What do you think?

Thanks for your comment.

If we assume the case where some workers vacuum on large tables
and a single worker vacuum on small tables, the processing
performance of the single worker will be slightly lower if the
existing statistics are checked every time.

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

(Checking the existing stats is lightweight , but at high frequency,
it affects processing performance.)
Therefore, at after refresh statistics, determine whether autovac
should use the existing statistics.

Yeah, since the test you used uses a lot of small tables, if there are
a few workers, checking the existing stats is unlikely to return true
(no need to vacuum). So the cost of existing stats check ends up being
overhead. Not sure how slow always checking the existing stats was,
but given that the shared memory based stats collector patch could
improve the performance of refreshing stats, it might be better not to
check the existing stats frequently like the patch does. Anyway, I
think it’s better to evaluate the performance improvement with other
cases too.

Yeah, I would like to see how much the performance changes in other cases.
In addition, if the shared-based-stats patch is applied, we won't need to reload
a huge stats file, so we will just have to check the stats on
shared-mem every time.
Perhaps the logic of table_recheck_autovac could be simpler.

BTW, I found some typos in comments, so attache a fixed version.

The patch adds some duplicated codes into table_recheck_autovac().
It's better to make the common function performing them and make
table_recheck_autovac() call that common function, to simplify the code.

Thanks for your comment.
Hmm.. I've cut out the duplicate part.
Attach the patch.
Could you confirm that it fits your expecting?

Yes, thanks for updataing the patch! Here are another review comments.

+       shared = pgstat_fetch_stat_dbentry(InvalidOid);
+       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

When using the existing stats, ISTM that these are not necessary and
we can reuse "shared" and "dbentry" obtained before. Right?

Yeah, but unless autovac_refresh_stats() is called, these functions
read the information from the
local hash table without re-read the stats file, so the process is very light.
Therefore, I think, it is better to keep the current logic to keep the
code simple.

+               /* We might be better to refresh stats */
+               use_existing_stats = false;

I think that we should add more comments about why it's better to
refresh the stats in this case.

+               /* The relid has already been vacuumed, so we might be better to use existing stats */
+               use_existing_stats = true;

I think that we should add more comments about why it's better to
reuse the stats in this case.

I added comments.

Attached the patch.

Thank you for updating the patch. Here are some small comments on the
latest (v4) patch.

+    * So if the last time we checked a table that was already vacuumed after
+    * refres stats, check the current statistics before refreshing it.
+    */

s/refres/refresh/

Thanks! fixed.
Attached the patch.

-----
+/* Flag to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+

I think 'use_existing_stats' can be declared within table_recheck_autovac().

-----
While testing the performance, I realized that the statistics are
reset every time one table is vacuumed, leading to re-reading the
stats file even if 'use_existing_stats' is true. Note that vacuum()
eventually calls AtEOXact_PgStat(), which calls
pgstat_clear_snapshot().

Good catch!

I believe that's why the performance of the method of always checking
the existing stats wasn't as good as expected.
So if we save the statistics somewhere and use them for rechecking,
the results of the performance benchmark will differ between these
two methods.

Thanks for your checks.
But if a worker did vacuum(), that means this worker had determined
in table_recheck_autovac() that a vacuum was needed. So
use_existing_stats is set to false, and next time the stats are
refreshed.
Therefore I think the current patch is fine, as we want to avoid
unnecessary refreshing of statistics before the actual vacuum(),
right?

Yes, you're right.

When I benchmarked the performance of the method of always checking
existing stats, I edited your patch so that it sets use_existing_stats
= true even if the second check is false (i.e., vacuum is needed).
And the result I got was worse than expected, especially in the case
of a few autovacuum workers. But it doesn't evaluate the performance
of that method rightly, as the stats snapshot is cleared every time
we vacuum. Given you had similar results, I guess you used a similar
way when evaluating it, is that right? If so, it's better to fix this
issue and see how the performance benchmark results will differ.

For example, the results of the test case with 10000 tables and 1
autovacuum worker I reported before were:

10000 tables:
autovac_workers 1 : 158s,157s, 290s

But after fixing that issue in the third method (always checking the
existing stats), the results are:

Could you tell me how you fixed that issue? You copied the stats to
somewhere as you suggested or skipped pgstat_clear_snapshot() as
I suggested?

I used the way you suggested in this quick test; skipped
pgstat_clear_snapshot().
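
For illustration only, such a quick-test tweak could be as small as
guarding that call in AtEOXact_PgStat(); the flag name here is
hypothetical and this is not a posted patch:

 	/* at the end of AtEOXact_PgStat(), roughly: */
-	pgstat_clear_snapshot();
+	if (!keep_stats_snapshot)	/* hypothetical flag, benchmarking only */
+		pgstat_clear_snapshot();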

Kasahara-san seems not to like the latter idea because it might
cause a bad side effect. So should we use the former idea?

Not sure. I'm also concerned about the side effect but I've not checked yet.

Since probably there is no big difference between the two ways in
terms of performance I'm going to see how the performance benchmark
result will change first. Maybe meanwhile we can discuss on these two
choices.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#28Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#27)
Re: autovac issue with large number of tables

On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since probably there is no big difference between the two ways in
terms of performance I'm going to see how the performance benchmark
result will change first.

I've tested the performance improvement again. From the left: the
execution time of the current HEAD, Kasahara-san's patch, and the
method of always checking the existing stats (using the approach
suggested by Fujii-san), in seconds.

1000 tables:
autovac_workers 1 : 13s, 13s, 13s
autovac_workers 2 : 6s, 4s, 4s
autovac_workers 3 : 3s, 4s, 3s
autovac_workers 5 : 3s, 3s, 2s
autovac_workers 10: 2s, 3s, 2s

5000 tables:
autovac_workers 1 : 71s, 71s, 72s
autovac_workers 2 : 37s, 32s, 32s
autovac_workers 3 : 29s, 26s, 26s
autovac_workers 5 : 20s, 19s, 18s
autovac_workers 10: 13s, 8s, 8s

10000 tables:
autovac_workers 1 : 158s,157s, 159s
autovac_workers 2 : 80s, 53s, 78s
autovac_workers 3 : 75s, 67s, 67s
autovac_workers 5 : 61s, 42s, 42s
autovac_workers 10: 69s, 26s, 25s

20000 tables:
autovac_workers 1 : 379s, 380s, 389s
autovac_workers 2 : 236s, 232s, 233s
autovac_workers 3 : 222s, 181s, 182s
autovac_workers 5 : 212s, 132s, 139s
autovac_workers 10: 317s, 91s, 89s

I don't see a big difference between Kasahara-san's patch and the
method of always checking the existing stats.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#29Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#28)
Re: autovac issue with large number of tables

On 2020/12/02 12:53, Masahiko Sawada wrote:

I don't see a big difference between Kasahara-san's patch and the
method of always checking the existing stats.

Thanks for doing the benchmark!

This benchmark result makes me think that we don't need to tweak
AtEOXact_PgStat() and can use Kasahara-san's approach.
That's good news :)

+		/*
+		 * The relid had not yet been vacuumed. That means, it is unlikely that the
+		 * stats that this worker currently has are updated by other worker's.
+		 * So we might be better to refresh the stats in the next this recheck.
+		 */
+		use_existing_stats = false;

I think that this comment should be changed to something like
the following. Thought?

When we decide to do vacuum or analyze, the existing stats cannot
be reused in the next cycle because it's cleared at the end of vacuum
or analyze (by AtEOXact_PgStat()).

+		/*
+		 * The relid had already vacuumed. That means, that for the stats that this
+		 * worker currently has, the info of tables that this worker will process may
+		 * have been updated by other workers with information that has already been
+		 * vacuumed or analyzed.
+		 * So we might be better to reuse the existing stats in the next this recheck.
+		 */
+		use_existing_stats = true;

Maybe it's better to change this comment to something like the following?

If neither vacuum nor analyze is necessary, the existing stats is
not cleared and can be reused in the next cycle.

+	if (use_existing_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+									   effective_multixact_freeze_max_age,
+									   &dovacuum, &doanalyze, &wraparound);

Personally I'd like to add an assertion test checking "pgStatDBHash != NULL"
here, to guarantee that there are existing stats to reuse when
use_existing_stats == true. Because if future changes to the autovacuum
code break that assumption, it's not easy to detect that breakage
without the assertion test. Thought?

+	shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

If classForm->relisshared is true, only the former needs to be executed.
Otherwise, only the latter needs to be executed. Right?
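
That is, something like the following (matching what the next patch
version ends up doing):

	if (classForm->relisshared)
		shared = pgstat_fetch_stat_dbentry(InvalidOid);
	else
		dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);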

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#30Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Fujii Masao (#29)
1 attachment(s)
Re: autovac issue with large number of tables

Hi

On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/02 12:53, Masahiko Sawada wrote:

I don't see a big difference between Kasahara-san's patch and the
method of always checking the existing stats.

Thanks!

Thanks for doing the benchmark!

This benchmark result makes me think that we don't need to tweak
AtEOXact_PgStat() and can use Kasahara-san's approach.
That's good news :)

+               /*
+                * The relid had not yet been vacuumed. That means, it is unlikely that the
+                * stats that this worker currently has are updated by other worker's.
+                * So we might be better to refresh the stats in the next this recheck.
+                */
+               use_existing_stats = false;

I think that this comment should be changed to something like
the following. Thought?

I think your comment is more reasonable.
I replaced the comments.

When we decide to do vacuum or analyze, the existing stats cannot
be reused in the next cycle because it's cleared at the end of vacuum
or analyze (by AtEOXact_PgStat()).

+               /*
+                * The relid had already vacuumed. That means, that for the stats that this
+                * worker currently has, the info of tables that this worker will process may
+                * have been updated by other workers with information that has already been
+                * vacuumed or analyzed.
+                * So we might be better to reuse the existing stats in the next this recheck.
+                */
+               use_existing_stats = true;

Maybe it's better to change this comment to something like the following?

I replaced the comments.

If neither vacuum nor analyze is necessary, the existing stats is
not cleared and can be reused in the next cycle.

+       if (use_existing_stats)
+       {
+               recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+                                                                          effective_multixact_freeze_max_age,
+                                                                          &dovacuum, &doanalyze, &wraparound);

Personally I'd like to add an assertion test checking "pgStatDBHash != NULL"
here, to guarantee that there are existing stats to reuse when
use_existing_stats == true. Because if future changes to the autovacuum
code break that assumption, it's not easy to detect that breakage
without the assertion test. Thought?

I think it's nice to have.
But if we do so, we have to add a new function to pgstat.c to check
whether pgStatDBHash is NULL or not.
I'm not sure that's a reasonable change.
And, if pgStatDBHash is NULL here, it is not a critical issue, so I'm
forgoing the addition of the Assert for now.
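
For illustration, such a helper and its use could look like the
following (hypothetical, not part of the posted patch):

	/* in pgstat.c, where pgStatDBHash is visible */
	bool
	pgstat_stats_snapshot_exists(void)
	{
		return pgStatDBHash != NULL;
	}

	/* in table_recheck_autovac(), before reusing the existing stats */
	Assert(pgstat_stats_snapshot_exists());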

+       shared = pgstat_fetch_stat_dbentry(InvalidOid);
+       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

If classForm->relisshared is true, only the former needs to be executed.
Otherwise, only the latter needs to be executed. Right?

Right.
I modified it to check classForm->relisshared and execute only one of them.

Attached the patch.

Best regards,


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v6_mod_table_recheck_autovac.patchapplication/octet-stream; name=v6_mod_table_recheck_autovac.patchDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..aa95513 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -325,6 +325,10 @@ static void autovac_balance_cost(void);
 static void do_autovacuum(void);
 static void FreeWorkerInfo(int code, Datum arg);
 
+static void recheck_relation_needs_vacanalyze(Oid relid, Form_pg_class classForm,
+							 AutoVacOpts *avopts,
+							 int effective_multixact_freeze_max_age,
+							 bool *dovacuum, bool *doanalyze, bool *wraparound);
 static autovac_table *table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 											TupleDesc pg_class_desc,
 											int effective_multixact_freeze_max_age);
@@ -2780,6 +2784,40 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
 }
 
 /*
+ * Subroutine of table_recheck_autovac.
+ */
+static void
+recheck_relation_needs_vacanalyze(Oid relid,
+							   Form_pg_class classForm,
+							   AutoVacOpts *avopts,
+							   int effective_multixact_freeze_max_age,
+							   bool *dovacuum,
+							   bool *doanalyze,
+							   bool *wraparound)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatDBEntry *shared = NULL;
+	PgStat_StatDBEntry *dbentry = NULL;
+
+	if (classForm->relisshared)
+		shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	else 
+		dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+	/* fetch the pgstat table entry */
+	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+										 shared, dbentry);
+
+	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+							  effective_multixact_freeze_max_age,
+							  dovacuum, doanalyze, wraparound);
+
+	/* ignore ANALYZE for toast tables */
+	if (classForm->relkind == RELKIND_TOASTVALUE)
+		*doanalyze = false;
+}
+
+/*
  * table_recheck_autovac
  *
  * Recheck whether a table still needs vacuum or analyze.  Return value is a
@@ -2797,17 +2835,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatDBEntry *shared;
-	PgStat_StatDBEntry *dbentry;
 	bool		wraparound;
 	AutoVacOpts *avopts;
-
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+	static bool use_existing_stats = false;
 
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2831,17 +2861,33 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			avopts = &hentry->ar_reloptions;
 	}
 
-	/* fetch the pgstat table entry */
-	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
-										 shared, dbentry);
+	/*
+	 * Before refreshing the stats, check the existing stats to avoid
+	 * frequent reloading of pgstats where possible.
+	 * With very large numbers of tables, the cost of re-reading the stats
+	 * file can be significant, and frequent calls to autovac_refresh_stats()
+	 * can leave some autovacuum workers unable to make progress.
+	 */
+	if (use_existing_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+									   effective_multixact_freeze_max_age,
+									   &dovacuum, &doanalyze, &wraparound);
 
-	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
-							  effective_multixact_freeze_max_age,
-							  &dovacuum, &doanalyze, &wraparound);
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
 
-	/* ignore ANALYZE for toast tables */
-	if (classForm->relkind == RELKIND_TOASTVALUE)
-		doanalyze = false;
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+								   effective_multixact_freeze_max_age,
+								   &dovacuum, &doanalyze, &wraparound);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2929,10 +2975,23 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/* When we decide to do vacuum or analyze, the existing stats cannot
+ 		 * be reused in the next cycle because it's cleared at the end of vacuum
+ 		 * or analyze (by AtEOXact_PgStat()).
+ 		 */
+		use_existing_stats = false;
+	}
+	else
+	{
+		/*
+		 * If neither vacuum nor analyze is necessary, the existing stats is
+		 * not cleared and can be reused in the next cycle.
+		 */
+		use_existing_stats = true;
 	}
 
 	heap_freetuple(classTup);
-
 	return tab;
 }
 
#31Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#29)
Re: autovac issue with large number of tables

On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/02 12:53, Masahiko Sawada wrote:

On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/01 16:23, Masahiko Sawada wrote:

On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/30 10:43, Masahiko Sawada wrote:

On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi, thanks for your comments.

On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/27 18:38, Kasahara Tatsuhito wrote:

Hi,

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/26 10:41, Kasahara Tatsuhito wrote:

On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

Does the above mean that the second and subsequent table_recheck_autovac()
calls will be improved to first check using the previously refreshed statistics?
I think that certainly works.

If that's correct, I'll try to create a patch for the PoC.

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems to be more likely to occur in cases where you have
a large number of tables,
i.e., a large amount of stats, and many small tables need VACUUM at
the same time.

So I followed Tom's advice and created a patch for the PoC.
This patch enables a flag in the table_recheck_autovac function so that
the existing stats are used next time if, on the check made after the
stats were refreshed, VACUUM (or ANALYZE) turned out to have already
been done by another worker.
If a table still requires VACUUM after the refresh, the stats will be
refreshed again on the next check instead of using the existing
statistics.
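
For readers following the thread, here is a minimal sketch of the
two-probe flow the PoC implements. This is not the actual patch:
check_needs_vacanalyze() and build_autovac_table() are hypothetical
stand-ins for the real helpers, and error handling plus the
toast-table special case are elided; autovac_refresh_stats() is the
real function that forces a stats re-read.

static autovac_table *
table_recheck_sketch(Oid relid)
{
	static bool use_existing_stats = false;	/* survives across calls */
	bool		dovacuum;
	bool		doanalyze;
	bool		wraparound;

	if (use_existing_stats)
	{
		/* Probe 1: cheap check against the snapshot we already have */
		check_needs_vacanalyze(relid, &dovacuum, &doanalyze, &wraparound);

		/* another worker already handled the table, so exit quickly */
		if (!dovacuum && !doanalyze)
			return NULL;
	}

	/* Probe 2: force a stats re-read and check again */
	autovac_refresh_stats();
	check_needs_vacanalyze(relid, &dovacuum, &doanalyze, &wraparound);

	/*
	 * A vacuum/analyze clears the stats snapshot at end of transaction,
	 * so the snapshot can only be reused next time if nothing needs to
	 * be done now.
	 */
	use_existing_stats = !(dovacuum || doanalyze);

	return (dovacuum || doanalyze) ? build_autovac_table(relid) : NULL;
}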

I did a simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed a few configuration settings; see the attached scripts.)

1. Normal VACUUM case
- SET autovacuum = off
- CREATE tables with 100 rows
- DELETE 90 rows from each table
- SET autovacuum = on and restart PostgreSQL
- Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
- CREATE blank tables
- SELECT from all of these tables (to generate stats)
- SET autovacuum_freeze_max_age to low values and restart PostgreSQL
- Consume a lot of XIDs by using txid_current()
- Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing the
autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10,
and the number of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 vCPU/4 GB), the results are a little unstable,
but I think they are enough to show a trend.

===========================================================================
[1.Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2.Anti wrap round VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
===========================================================================

In the cases without the patch, worker scalability decreased as the
number of tables increased.
In fact, the more workers there were, the longer it took to complete
VACUUM on all tables.
The cases with the patch show good scalability with respect to the
number of workers.

It seems a good performance improvement even without the
shared-memory-based stats collector patch.

Sounds great!

Note that perf top results showed that hash_search_with_hash_value,
hash_seq_search and
pgstat_read_statsfiles are dominant during VACUUM in all patterns,
with or without the patch.

Therefore, there is still a need to find ways to optimize the reading
of large amounts of stats.
However, this patch is effective in its own right, and since there are
only a few parts to modify, I think it should be applicable to current
(and preferably pre-v13) PostgreSQL.

+1

+
+       /* We might be better to refresh stats */
+       use_existing_stats = false;
}
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already vacuumed, so we might be better to
use exiting stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats in the
next check if it finds out that the table still needs to be vacuumed.
But I guess that's not necessarily true, because the next table might be
vacuumed already. So I think we might want to always use the existing
stats for the first check. What do you think?

Thanks for your comment.

If we assume a case where some workers vacuum large tables and a
single worker vacuums small tables, the processing performance of the
single worker will be slightly lower if the existing statistics are
checked every time.

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

(Checking the existing stats is lightweight, but at high frequency it
affects processing performance.)
Therefore, only after refreshing the statistics do we determine whether
autovacuum should use the existing statistics.

Yeah, since your test uses a lot of small tables, if there are only a
few workers, checking the existing stats is unlikely to return true
(no need to vacuum). So the cost of the existing stats check ends up
being overhead. Not sure how slow always checking the existing stats was,
but given that the shared memory based stats collector patch could
improve the performance of refreshing stats, it might be better not to
check the existing stats frequently like the patch does. Anyway, I
think it’s better to evaluate the performance improvement with other
cases too.

Yeah, I would like to see how much the performance changes in other cases.
In addition, if the shared-memory-based stats patch is applied, we won't
need to reload a huge stats file; we will just have to check the stats
in shared memory every time.
Perhaps the logic of table_recheck_autovac could be simpler.

BTW, I found some typos in comments, so I attached a fixed version.

The patch adds some duplicated code into table_recheck_autovac().
It would be better to factor that out into a common function and have
table_recheck_autovac() call it, to simplify the code.

Thanks for your comment.
Hmm... I've factored out the duplicated part.
The patch is attached.
Could you confirm that it matches your expectation?

Yes, thanks for updating the patch! Here are some more review comments.

+       shared = pgstat_fetch_stat_dbentry(InvalidOid);
+       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

When using the existing stats, ISTM that these are not necessary and
we can reuse "shared" and "dbentry" obtained before. Right?

Yeah, but unless autovac_refresh_stats() is called, these functions
read the information from the local hash table without re-reading the
stats file, so the process is very light.
Therefore, I think it is better to keep the current logic to keep the
code simple.
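
As a minimal illustration of this point (assuming the PG13-era pgstat
interface; pgstat_fetch_stat_dbentry() and autovac_refresh_stats() are
the real functions, while the wrapper itself is hypothetical):

static PgStat_StatDBEntry *
fetch_db_stats_sketch(bool force_fresh)
{
	/*
	 * Only this call may re-read the on-disk stats file, which is
	 * expensive with a very large number of tables.
	 */
	if (force_fresh)
		autovac_refresh_stats();

	/* Without a refresh, this is just a local hash lookup: very light. */
	return pgstat_fetch_stat_dbentry(MyDatabaseId);
}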

+               /* We might be better to refresh stats */
+               use_existing_stats = false;

I think that we should add more comments about why it's better to
refresh the stats in this case.

+               /* The relid has already vacuumed, so we might be better to use existing stats */
+               use_existing_stats = true;

I think that we should add more comments about why it's better to
reuse the stats in this case.

I added comments.

Attached is the patch.

Thank you for updating the patch. Here are some small comments on the
latest (v4) patch.

+    * So if the last time we checked a table that was already vacuumed after
+    * refres stats, check the current statistics before refreshing it.
+    */

s/refres/refresh/

Thanks! Fixed.
Attached is the patch.

-----
+/* Counter to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+

I think 'use_existing_stats' can be declared within table_recheck_autovac().

-----
While testing the performance, I realized that the statistics are
reset every time one table is vacuumed, leading to re-reading the stats
file even if 'use_existing_stats' is true. Note that vacuum()
eventually calls AtEOXact_PgStat(), which calls
pgstat_clear_snapshot().
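
To sketch why (hedged; the call chain below follows PG13-era sources
and is simplified): autovacuum processes each table in its own
transaction, and committing that transaction wipes the worker's stats
snapshot.

static void
vacuum_one_table_sketch(Oid relid)
{
	StartTransactionCommand();
	/* ... vacuum or analyze relid here ... */
	CommitTransactionCommand();	/* -> AtEOXact_PgStat()
								 *    -> pgstat_clear_snapshot() */

	/*
	 * Any pgstat_fetch_*() call after this point has to rebuild the
	 * snapshot, i.e. re-read the stats file, so the existing stats
	 * cannot be reused right after a real vacuum or analyze.
	 */
}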

Good catch!

I believe that's why the performance of the
method of always checking the existing stats wasn't as good as expected.
So if we save the statistics somewhere and use them for rechecking, the
results of the performance benchmark will differ between these two
methods.

Thanks for your checks.
But if a worker did vacuum(), that means the worker had determined in
table_recheck_autovac() that a vacuum was needed. So use_existing_stats
is set to false, and next time the stats are refreshed.
Therefore I think the current patch is fine, as we want to avoid
unnecessary refreshing of statistics before the actual vacuum(), right?

Yes, you're right.

When I benchmarked the performance of the method of always checking
existing stats, I edited your patch so that it sets use_existing_stats
= true even if the second check is false (i.e., vacuum is needed).
And the result I got was worse than expected, especially in the case of
a few autovacuum workers. But it doesn't evaluate the performance of
that method correctly, as the stats snapshot is cleared on every
vacuum. Given you had similar results, I guess you used a similar way
when evaluating it, is that right? If so, it's better to fix this issue
and see how the performance benchmark results will differ.

For example, the results of the test case with 10000 tables and 1
autovacuum worker I reported before were:

10000 tables:
autovac_workers 1 : 158s, 157s, 290s

But after fixing that issue in the third method (always checking the
existing stats), the results are:

Could you tell me how you fixed that issue? Did you copy the stats
somewhere as you suggested, or skip pgstat_clear_snapshot() as
I suggested?

I used the way you suggested in this quick test; skipped
pgstat_clear_snapshot().

Kasahara-san seems not to like the latter idea because it might
cause a bad side effect. So we should use the former idea?

Not sure. I'm also concerned about the side effect but I've not checked yet.

Since there is probably no big difference between the two ways in
terms of performance, I'm going to see how the performance benchmark
result changes first.

I've tested the performance improvement again. From left to right: the
execution time of current HEAD, Kasahara-san's patch, and the method of
always checking the existing stats (using the approach suggested by
Fujii-san), in seconds.

1000 tables:
autovac_workers 1 : 13s, 13s, 13s
autovac_workers 2 : 6s, 4s, 4s
autovac_workers 3 : 3s, 4s, 3s
autovac_workers 5 : 3s, 3s, 2s
autovac_workers 10: 2s, 3s, 2s

5000 tables:
autovac_workers 1 : 71s, 71s, 72s
autovac_workers 2 : 37s, 32s, 32s
autovac_workers 3 : 29s, 26s, 26s
autovac_workers 5 : 20s, 19s, 18s
autovac_workers 10: 13s, 8s, 8s

10000 tables:
autovac_workers 1 : 158s, 157s, 159s
autovac_workers 2 : 80s, 53s, 78s
autovac_workers 3 : 75s, 67s, 67s
autovac_workers 5 : 61s, 42s, 42s
autovac_workers 10: 69s, 26s, 25s

20000 tables:
autovac_workers 1 : 379s, 380s, 389s
autovac_workers 2 : 236s, 232s, 233s
autovac_workers 3 : 222s, 181s, 182s
autovac_workers 5 : 212s, 132s, 139s
autovac_workers 10: 317s, 91s, 89s

I don't see a big difference between Kasahara-san's patch and the
method of always checking the existing stats.

Thanks for doing the benchmark!

This benchmark result makes me think that we don't need to tweak
AtEOXact_PgStat() and can use Kasahara-san's approach.
That's good news :)

Yeah, given that all autovacuum workers have the list of tables to
vacuum in the same order in most cases, the assumption in
Kasahara-san's patch (that if a worker needs to vacuum a table, it is
unlikely to be able to skip the next table using the current snapshot
of stats) makes sense to me.

One small comment on v6 patch:

+ /* When we decide to do vacuum or analyze, the existing stats cannot
+ * be reused in the next cycle because it's cleared at the end of vacuum
+ * or analyze (by AtEOXact_PgStat()).
+ */
+ use_existing_stats = false;

I think the comment should start on the second line (i.e., a newline is
needed after /*).

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#32Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Masahiko Sawada (#31)
1 attachment(s)
Re: autovac issue with large number of tables

On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

One small comment on v6 patch:

+ /* When we decide to do vacuum or analyze, the existing stats cannot
+ * be reused in the next cycle because it's cleared at the end of vacuum
+ * or analyze (by AtEOXact_PgStat()).
+ */
+ use_existing_stats = false;

I think the comment should start on the second line (i.e., a newline is
needed after /*).

Oops, thanks.
Fixed.

Best regards,


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Attachments:

v7_mod_table_recheck_autovac.patchapplication/octet-stream; name=v7_mod_table_recheck_autovac.patchDownload
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..ac3982c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -325,6 +325,10 @@ static void autovac_balance_cost(void);
 static void do_autovacuum(void);
 static void FreeWorkerInfo(int code, Datum arg);
 
+static void recheck_relation_needs_vacanalyze(Oid relid, Form_pg_class classForm,
+							 AutoVacOpts *avopts,
+							 int effective_multixact_freeze_max_age,
+							 bool *dovacuum, bool *doanalyze, bool *wraparound);
 static autovac_table *table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 											TupleDesc pg_class_desc,
 											int effective_multixact_freeze_max_age);
@@ -2780,6 +2784,40 @@ get_pgstat_tabentry_relid(Oid relid, bool isshared, PgStat_StatDBEntry *shared,
 }
 
 /*
+ * Subroutine of table_recheck_autovac.
+ */
+static void
+recheck_relation_needs_vacanalyze(Oid relid,
+							   Form_pg_class classForm,
+							   AutoVacOpts *avopts,
+							   int effective_multixact_freeze_max_age,
+							   bool *dovacuum,
+							   bool *doanalyze,
+							   bool *wraparound)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatDBEntry *shared = NULL;
+	PgStat_StatDBEntry *dbentry = NULL;
+
+	if (classForm->relisshared)
+		shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	else 
+		dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+	/* fetch the pgstat table entry */
+	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+										 shared, dbentry);
+
+	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+							  effective_multixact_freeze_max_age,
+							  dovacuum, doanalyze, wraparound);
+
+	/* ignore ANALYZE for toast tables */
+	if (classForm->relkind == RELKIND_TOASTVALUE)
+		*doanalyze = false;
+}
+
+/*
  * table_recheck_autovac
  *
  * Recheck whether a table still needs vacuum or analyze.  Return value is a
@@ -2797,17 +2835,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatDBEntry *shared;
-	PgStat_StatDBEntry *dbentry;
 	bool		wraparound;
 	AutoVacOpts *avopts;
-
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+	static bool use_existing_stats = false;
 
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2831,17 +2861,33 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			avopts = &hentry->ar_reloptions;
 	}
 
-	/* fetch the pgstat table entry */
-	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
-										 shared, dbentry);
+	/*
+	 * Before stats refresh, check existing stats for avoiding frequent reloading
+	 * of pgstats if possible.
+	 * In the case of very large numbers of tables, the cost of re-reading
+	 * the stats file can be significant, and the frequent calls to
+	 * autovac_refresh_stats() can make certain autovacuum workers unable to work.
+	 */
+	if (use_existing_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+									   effective_multixact_freeze_max_age,
+									   &dovacuum, &doanalyze, &wraparound);
 
-	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
-							  effective_multixact_freeze_max_age,
-							  &dovacuum, &doanalyze, &wraparound);
+		/* someone has already issued vacuum, so exit quickly */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
 
-	/* ignore ANALYZE for toast tables */
-	if (classForm->relkind == RELKIND_TOASTVALUE)
-		doanalyze = false;
+	/* use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	recheck_relation_needs_vacanalyze(relid, classForm, avopts,
+								   effective_multixact_freeze_max_age,
+								   &dovacuum, &doanalyze, &wraparound);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2929,10 +2975,24 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/* 
+ 		 * When we decide to do vacuum or analyze, the existing stats cannot
+ 		 * be reused in the next cycle because it's cleared at the end of vacuum
+ 		 * or analyze (by AtEOXact_PgStat()).
+ 		 */
+		use_existing_stats = false;
+	}
+	else
+	{
+		/*
+		 * If neither vacuum nor analyze is necessary, the existing stats is
+		 * not cleared and can be reused in the next cycle.
+		 */
+		use_existing_stats = true;
 	}
 
 	heap_freetuple(classTup);
-
 	return tab;
 }
 
#33Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kasahara Tatsuhito (#32)
1 attachment(s)
Re: autovac issue with large number of tables

On 2020/12/03 11:46, Kasahara Tatsuhito wrote:

One small comment on v6 patch:

+ /* When we decide to do vacuum or analyze, the existing stats cannot
+ * be reused in the next cycle because it's cleared at the end of vacuum
+ * or analyze (by AtEOXact_PgStat()).
+ */
+ use_existing_stats = false;

I think the comment should start on the second line (i.e., a newline is
needed after /*).

Oops, thanks.
Fixed.

Thanks for updating the patch!

I applied the following cosmetic changes to the patch.
Attached is the updated version of the patch.
Could you review this version?

- Ran pgindent to fix some warnings that "git diff --check"
reported on the patch.
- Made the order of arguments consistent between
recheck_relation_needs_vacanalyze and relation_needs_vacanalyze.
- Renamed the variable use_existing_stats to reuse_stats for simplicity.
- Added more comments.

Barring any objection, I'm thinking of committing this version.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v8_mod_table_recheck_autovac.patchtext/plain; charset=UTF-8; name=v8_mod_table_recheck_autovac.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97fbac..7e28944d2f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -328,6 +328,10 @@ static void FreeWorkerInfo(int code, Datum arg);
 static autovac_table *table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 											TupleDesc pg_class_desc,
 											int effective_multixact_freeze_max_age);
+static void recheck_relation_needs_vacanalyze(Oid relid, AutoVacOpts *avopts,
+											  Form_pg_class classForm,
+											  int effective_multixact_freeze_max_age,
+											  bool *dovacuum, bool *doanalyze, bool *wraparound);
 static void relation_needs_vacanalyze(Oid relid, AutoVacOpts *relopts,
 									  Form_pg_class classForm,
 									  PgStat_StatTabEntry *tabentry,
@@ -2797,17 +2801,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatDBEntry *shared;
-	PgStat_StatDBEntry *dbentry;
 	bool		wraparound;
 	AutoVacOpts *avopts;
-
-	/* use fresh stats */
-	autovac_refresh_stats();
-
-	shared = pgstat_fetch_stat_dbentry(InvalidOid);
-	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+	static bool reuse_stats = false;
 
 	/* fetch the relation's relcache entry */
 	classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2831,17 +2827,38 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			avopts = &hentry->ar_reloptions;
 	}
 
-	/* fetch the pgstat table entry */
-	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
-										 shared, dbentry);
+	/*
+	 * Reuse the stats to recheck whether a relation needs to be vacuumed or
+	 * analyzed if it was reloaded before and has not been cleared yet. This
+	 * is necessary to avoid frequent refresh of stats, especially when there
+	 * are very large number of relations and the refresh can cause lots of
+	 * overhead.
+	 *
+	 * If we determined that a relation needs to be vacuumed or analyzed,
+	 * based on the old stats, we refresh stats and recheck the necessity
+	 * again. Because a relation may have already been vacuumed or analyzed by
+	 * someone since the last reload of stats.
+	 */
+	if (reuse_stats)
+	{
+		recheck_relation_needs_vacanalyze(relid, avopts, classForm,
+										  effective_multixact_freeze_max_age,
+										  &dovacuum, &doanalyze, &wraparound);
 
-	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
-							  effective_multixact_freeze_max_age,
-							  &dovacuum, &doanalyze, &wraparound);
+		/* Quick exit if a relation doesn't need to be vacuumed or analyzed */
+		if (!doanalyze && !dovacuum)
+		{
+			heap_freetuple(classTup);
+			return NULL;
+		}
+	}
 
-	/* ignore ANALYZE for toast tables */
-	if (classForm->relkind == RELKIND_TOASTVALUE)
-		doanalyze = false;
+	/* Use fresh stats and recheck again */
+	autovac_refresh_stats();
+
+	recheck_relation_needs_vacanalyze(relid, avopts, classForm,
+									  effective_multixact_freeze_max_age,
+									  &dovacuum, &doanalyze, &wraparound);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2929,13 +2946,66 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_dobalance =
 			!(avopts && (avopts->vacuum_cost_limit > 0 ||
 						 avopts->vacuum_cost_delay > 0));
+
+		/*
+		 * When we decide to do vacuum or analyze, the existing stats cannot
+		 * be reused in the next cycle because it's cleared at the end of
+		 * vacuum or analyze (by AtEOXact_PgStat()).
+		 */
+		reuse_stats = false;
+	}
+	else
+	{
+		/*
+		 * If neither vacuum nor analyze is necessary, the existing stats is
+		 * not cleared and can be reused in the next cycle.
+		 */
+		reuse_stats = true;
 	}
 
 	heap_freetuple(classTup);
-
 	return tab;
 }
 
+/*
+ * recheck_relation_needs_vacanalyze
+ *
+ * Subroutine for table_recheck_autovac.
+ *
+ * Fetch the pgstat of a relation and recheck whether a relation
+ * needs to be vacuumed or analyzed.
+ */
+static void
+recheck_relation_needs_vacanalyze(Oid relid,
+								  AutoVacOpts *avopts,
+								  Form_pg_class classForm,
+								  int effective_multixact_freeze_max_age,
+								  bool *dovacuum,
+								  bool *doanalyze,
+								  bool *wraparound)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatDBEntry *shared = NULL;
+	PgStat_StatDBEntry *dbentry = NULL;
+
+	if (classForm->relisshared)
+		shared = pgstat_fetch_stat_dbentry(InvalidOid);
+	else
+		dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+
+	/* fetch the pgstat table entry */
+	tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
+										 shared, dbentry);
+
+	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
+							  effective_multixact_freeze_max_age,
+							  dovacuum, doanalyze, wraparound);
+
+	/* ignore ANALYZE for toast tables */
+	if (classForm->relkind == RELKIND_TOASTVALUE)
+		*doanalyze = false;
+}
+
 /*
  * relation_needs_vacanalyze
  *
#34Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Fujii Masao (#33)
Re: autovac issue with large number of tables

Hi,

On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/03 11:46, Kasahara Tatsuhito wrote:

On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/02 12:53, Masahiko Sawada wrote:

On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/12/01 16:23, Masahiko Sawada wrote:

On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/30 10:43, Masahiko Sawada wrote:

On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi, thanks for your comments.

On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/27 18:38, Kasahara Tatsuhito wrote:

Hi,

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/11/26 10:41, Kasahara Tatsuhito wrote:

On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.
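
(In code terms, the two-probe idea looks roughly like the following sketch,
using the helper names from the v8 patch shown above; this is a condensed
illustration of the control flow, not the committed code.)

	if (reuse_stats)
	{
		/* First probe: recheck against the stats snapshot we already have. */
		recheck_relation_needs_vacanalyze(relid, avopts, classForm,
										  effective_multixact_freeze_max_age,
										  &dovacuum, &doanalyze, &wraparound);

		/* Another worker already handled it? Then skip without a re-read. */
		if (!doanalyze && !dovacuum)
			return NULL;
	}

	/* Second probe: force a stats re-read, then recheck. */
	autovac_refresh_stats();
	recheck_relation_needs_vacanalyze(relid, avopts, classForm,
									  effective_multixact_freeze_max_age,
									  &dovacuum, &doanalyze, &wraparound);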

Does the above mean that the second and subsequent table_recheck_autovac()
calls will be improved to first check using the previously refreshed statistics?
I think that certainly works.

If that's correct, I'll try to create a patch for the PoC.

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems more likely to occur in cases where you have a large
number of tables (i.e., a large amount of stats) and many small tables
need VACUUM at the same time.

So I followed Tom's advice and created a PoC patch.
This patch sets a flag in the table_recheck_autovac function so that the
existing stats are used next time if, on the check after the stats were
refreshed, VACUUM (or ANALYZE) had already been done by another worker.
If the tables still require VACUUM after the refresh, then a refresh is
performed instead of using the existing statistics.

I did a simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed a few configuration parameters; see the attached scripts.)

1. Normal VACUUM case
- SET autovacuum = off
- CREATE tables with 100 rows
- DELETE 90 rows from each table
- SET autovacuum = on and restart PostgreSQL
- Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
- CREATE blank tables
- SELECT all of these tables (to generate stats)
- SET autovacuum_freeze_max_age to low values and restart PostgreSQL
- Consume a lot of XIDs by using txid_current()
- Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing the
autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10,
and the number of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 vCPU/4 GB), the results are a little unstable,
but I think they are enough to show a trend.

===========================================================================
[1. Normal VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2. Anti-wraparound VACUUM case]
tables:1000
autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec

tables:5000
autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec

tables:10000
autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec

tables:20000
autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
===========================================================================

In the cases without the patch, worker scalability decreased as the number
of tables increased; in fact, the more workers there were, the longer it
took to complete VACUUM on all tables.
In the cases with the patch, scalability with respect to the number of
workers is good.

It seems a good performance improvement even without the
shared-memory-based stats collector patch.

Sounds great!

Note that perf top results showed that hash_search_with_hash_value,
hash_seq_search and pgstat_read_statsfiles are dominant during VACUUM
in all patterns, with or without the patch.

Therefore, there is still a need to find ways to optimize the reading
of large amounts of stats. However, this patch is effective in its own
right, and since there are only a few parts to modify, I think it should
be applicable to current PostgreSQL (and preferably back to pre-v13 releases).

+1

+
+       /* We might be better to refresh stats */
+       use_existing_stats = false;
}
+   else
+   {
-   heap_freetuple(classTup);
+       heap_freetuple(classTup);
+       /* The relid has already vacuumed, so we might be better to
use exiting stats */
+       use_existing_stats = true;
+   }

With that patch, the autovacuum process refreshes the stats in the
next check if it finds out that the table still needs to be vacuumed.
But I guess it's not necessarily true because the next table might be
vacuumed already. So I think we might want to always use the existing
stats for the first check. What do you think?

Thanks for your comment.

If we assume the case where some workers vacuum large tables
and a single worker vacuums small tables, the processing
performance of the single worker will be slightly lower if the
existing statistics are checked every time.

In fact, at first I tried to check the existing stats every time,
but the performance was slightly worse in cases with a small number of workers.

Do you have this benchmark result?

(Checking the existing stats is lightweight, but at high frequency
it affects processing performance.)
Therefore, after refreshing the statistics, we determine whether autovacuum
should use the existing statistics next time.

Yeah, since the test you used uses a lot of small tables, if there are
a few workers, checking the existing stats is unlikely to return true
(no need to vacuum). So the cost of existing stats check ends up being
overhead. Not sure how slow always checking the existing stats was,
but given that the shared memory based stats collector patch could
improve the performance of refreshing stats, it might be better not to
check the existing stats frequently like the patch does. Anyway, I
think it’s better to evaluate the performance improvement with other
cases too.

Yeah, I would like to see how much the performance changes in other cases.
In addition, if the shared-memory-based stats patch is applied, we won't need
to reload a huge stats file, so we will just have to check the stats in
shared memory every time.
Perhaps the logic of table_recheck_autovac could be simpler.

BTW, I found some typos in comments, so I've attached a fixed version.

The patch adds some duplicated code into table_recheck_autovac().
It's better to factor that out into a common function and have
table_recheck_autovac() call it, to simplify the code.

Thanks for your comment.
Hmm.. I've cut out the duplicated part.
Attached the patch.
Could you confirm that it matches your expectation?

Yes, thanks for updating the patch! Here are some more review comments.

+       shared = pgstat_fetch_stat_dbentry(InvalidOid);
+       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);

When using the existing stats, ISTM that these are not necessary and
we can reuse "shared" and "dbentry" obtained before. Right?

Yeah, but unless autovac_refresh_stats() is called, these functions read
the information from the local hash table without re-reading the stats file,
so they are very cheap.
Therefore I think it is better to keep the current logic, to keep the
code simple.
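
(For reference, pgstat_fetch_stat_dbentry() in the 13-era pgstat.c is
essentially the following; backend_read_statsfile() returns immediately
when a stats snapshot is already loaded, so repeated fetches only pay for
a local hash lookup. A simplified sketch from memory, not the verbatim
source.)

	PgStat_StatDBEntry *
	pgstat_fetch_stat_dbentry(Oid dbid)
	{
		/* Reads the stats file only if no snapshot is loaded yet. */
		backend_read_statsfile();

		/* Otherwise this is just a lookup in the local snapshot hash. */
		return (PgStat_StatDBEntry *) hash_search(pgStatDBHash,
												  (void *) &dbid,
												  HASH_FIND, NULL);
	}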

+               /* We might be better to refresh stats */
+               use_existing_stats = false;

I think that we should add more comments about why it's better to
refresh the stats in this case.

+               /* The relid has already vacuumed, so we might be better to use existing stats */
+               use_existing_stats = true;

I think that we should add more comments about why it's better to
reuse the stats in this case.

I added comments.

Attached the patch.

Thank you for updating the patch. Here are some small comments on the
latest (v4) patch.

+    * So if the last time we checked a table that was already vacuumed after
+    * refres stats, check the current statistics before refreshing it.
+    */

s/refres/refresh/

Thanks! Fixed.
Attached the patch.

-----
+/* Counter to determine if statistics should be refreshed */
+static bool use_existing_stats = false;
+

I think 'use_existing_stats' can be declared within table_recheck_autovac().

-----
While testing the performance, I realized that the statistics are
reset every time one table is vacuumed, leading to re-reading the stats
file even if 'use_existing_stats' is true. Note that vacuum()
eventually calls AtEOXact_PgStat(), which calls
pgstat_clear_snapshot().
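
(For context: "refreshing" the stats is itself lazy. autovac_refresh_stats()
only drops the current snapshot, much as AtEOXact_PgStat() does at end of
transaction, and the next pgstat_fetch_* call re-reads the stats file.
A simplified sketch of the 13-era function, from memory:)

	static void
	autovac_refresh_stats(void)
	{
		/* The launcher throttles re-reads; workers always force one. */
		if (IsAutoVacuumLauncherProcess())
		{
			static TimestampTz last_read = 0;
			TimestampTz current_time = GetCurrentTimestamp();

			if (!TimestampDifferenceExceeds(last_read, current_time,
											STATS_READ_DELAY))
				return;
			last_read = current_time;
		}

		/* Discard the snapshot; the next fetch re-reads the stats file. */
		pgstat_clear_snapshot();
	}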

Good catch!

I believe that's why the performance of the method of always checking
the existing stats wasn't as good as expected.
So if we save the statistics somewhere and use them for rechecking, the
performance benchmark results will differ between these two methods.

Thanks for your checks.
But if a worker did vacuum(), that means the worker had determined in
table_recheck_autovac() that a vacuum was needed. So use_existing_stats
was set to false, and next time the stats will be refreshed.
Therefore I think the current patch is fine, as we want to avoid
unnecessary refreshing of statistics before the actual vacuum(), right?

Yes, you're right.

When I benchmarked the performance of the method of always checking
existing stats, I edited your patch so that it sets use_existing_stats
= true even if the second check is false (i.e., vacuum is needed).
And the result I got was worse than expected, especially in the case of
a few autovacuum workers. But it doesn't evaluate the performance of
that method rightly, as the stats snapshot is cleared on every vacuum.
Given you had similar results, I guess you used a similar way
when evaluating it, is that right? If so, it's better to fix this issue
and see how the performance benchmark results will differ.

For example, the results of the test case with 10000 tables and 1
autovacuum worker that I reported before were:

10000 tables:
autovac_workers 1 : 158s, 157s, 290s

But after fixing that issue in the third method (always checking the
existing stats), the results are:

Could you tell me how you fixed that issue? Did you copy the stats
somewhere, as you suggested, or skip pgstat_clear_snapshot(), as
I suggested?

I used the way you suggested in this quick test; skipped
pgstat_clear_snapshot().

Kasahara-san seems not to like the latter idea because it might
cause bad side effects. So we should use the former idea?

Not sure. I'm also concerned about the side effect but I've not checked yet.

Since probably there is no big difference between the two ways in
terms of performance I'm going to see how the performance benchmark
result will change first.

I've tested the performance improvement again. From the left: the execution
time of current HEAD, Kasahara-san's patch, and the method of always
checking the existing stats (using the approach suggested by Fujii-san),
in seconds.

1000 tables:
autovac_workers 1 : 13s, 13s, 13s
autovac_workers 2 : 6s, 4s, 4s
autovac_workers 3 : 3s, 4s, 3s
autovac_workers 5 : 3s, 3s, 2s
autovac_workers 10: 2s, 3s, 2s

5000 tables:
autovac_workers 1 : 71s, 71s, 72s
autovac_workers 2 : 37s, 32s, 32s
autovac_workers 3 : 29s, 26s, 26s
autovac_workers 5 : 20s, 19s, 18s
autovac_workers 10: 13s, 8s, 8s

10000 tables:
autovac_workers 1 : 158s, 157s, 159s
autovac_workers 2 : 80s, 53s, 78s
autovac_workers 3 : 75s, 67s, 67s
autovac_workers 5 : 61s, 42s, 42s
autovac_workers 10: 69s, 26s, 25s

20000 tables:
autovac_workers 1 : 379s, 380s, 389s
autovac_workers 2 : 236s, 232s, 233s
autovac_workers 3 : 222s, 181s, 182s
autovac_workers 5 : 212s, 132s, 139s
autovac_workers 10: 317s, 91s, 89s

I don't see a big difference between Kasahara-san's patch and the
method of always checking the existing stats.

Thanks for doing the benchmark!

This benchmark result makes me think that we don't need to tweak
AtEOXact_PgStat() and can use Kasahara-san's approach.
That's good news :)

Yeah, given that all autovacuum workers have the list of tables to
vacuum in the same order in most cases, the assumption in
Kasahara-san's patch (that if a worker needs to vacuum a table, it's
unlikely to be able to skip the next table using the current
snapshot of stats) makes sense to me.

One small comment on v6 patch:

+ /* When we decide to do vacuum or analyze, the existing stats cannot
+ * be reused in the next cycle because it's cleared at the end of vacuum
+ * or analyze (by AtEOXact_PgStat()).
+ */
+ use_existing_stats = false;

I think the comment should start on the second line (i.e., a newline is
needed after /*).

Oops, thanks.
Fixed.

Thanks for updating the patch!

I applied the following cosmetic changes to the patch.
Attached is the updated version of the patch.
Could you review this version?

Thanks for tweaking the patch.

- Ran pgindent to fix some warnings that "git diff --check"
reported on the patch.
- Made the order of arguments consistent between
recheck_relation_needs_vacanalyze and relation_needs_vacanalyze.
- Renamed the variable use_existing_stats to reuse_stats for simplicity.
- Added more comments.

I think it's no problem.
The patch passed make check, and I benchmarked the "Anti-wraparound VACUUM
case" (only 20000 tables) just in case.

From the left, the execution time of current HEAD and the v8 patch.
tables 20000:
autovac workers 1: 319sec, 315sec
autovac workers 2: 301sec, 190sec
autovac workers 3: 270sec, 133sec
autovac workers 5: 211sec, 86sec
autovac workers 10: 376sec, 68sec

It's as expected.

Barring any objection, I'm thinking to commit this version.

+1

Best regards,


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

#35Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Kasahara Tatsuhito (#34)
Re: autovac issue with large number of tables

On 2020/12/04 12:21, Kasahara Tatsuhito wrote:

I think it's no problem.
The patch passed make check, and I benchmarked the "Anti-wraparound VACUUM
case" (only 20000 tables) just in case.

From the left, the execution time of current HEAD and the v8 patch.
tables 20000:
autovac workers 1: 319sec, 315sec
autovac workers 2: 301sec, 190sec
autovac workers 3: 270sec, 133sec
autovac workers 5: 211sec, 86sec
autovac workers 10: 376sec, 68sec

It's as expected.

Thanks!

Barring any objection, I'm thinking to commit this version.

+1

Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#36Kasahara Tatsuhito
kasahara.tatsuhito@gmail.com
In reply to: Fujii Masao (#35)
Re: autovac issue with large number of tables

On Wed, Dec 9, 2020 at 12:01 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Pushed.

Thanks!


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com