Autovacuum launcher process launches worker process at high frequency
Hi all,
I found some strange behaviour of the autovacuum launcher process
during XID anti-wraparound vacuum.
Suppose that a database (say test_db) whose datfrozenxid age is about
to reach autovacuum_freeze_max_age has two tables, T1 and T2.
T1 is very large and frequently updated, so vacuuming it takes a long
time.
T2 is a static, already-frozen table, so vacuum can skip the whole
table.
Anti-wraparound vacuum has already been executed on the other databases.
Once the age of datfrozenxid of test_db exceeds
autovacuum_freeze_max_age, the autovacuum launcher launches a worker
process to do an anti-wraparound vacuum on test_db.
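The trigger condition described above can be sketched as follows. This is only an illustration, not the code in autovacuum.c: the names xid_t, xid_age and needs_wraparound_vacuum are hypothetical, and the sketch assumes that XIDs are 32-bit counters compared modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t xid_t;

/* Age of frozen_xid relative to next_xid; the subtraction wraps modulo 2^32. */
static uint32_t
xid_age(xid_t next_xid, xid_t frozen_xid)
{
    return (uint32_t) (next_xid - frozen_xid);
}

/* True once the database's datfrozenxid age exceeds the configured limit. */
static bool
needs_wraparound_vacuum(xid_t next_xid, xid_t datfrozenxid,
                        uint32_t freeze_max_age)
{
    assert(freeze_max_age > 0);
    return xid_age(next_xid, datfrozenxid) > freeze_max_age;
}
```

With autovacuum_freeze_max_age at its default of 200 million, a database whose datfrozenxid is 200,000,001 transactions old would qualify, while one exactly at the limit would not.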
A worker process assigned to test_db begins to vacuum T1, which takes a
long time.
Meanwhile, another worker process is assigned to test_db, completes its
vacuum of T2, and exits.
After a while, the autovacuum launcher launches a new worker again, and
that worker is assigned to test_db again.
But that worker exits quickly because there is no table left for it to
vacuum (T1 is being vacuumed by another worker process).
When a new worker process starts, it sends a SIGUSR2 signal to the
launcher process to wake it up.
Although the launcher process calls WaitLatch() after launching the new
worker, it is woken up and soon launches yet another worker process.
As a result, the launcher process launches new worker processes at an
extremely high frequency regardless of autovacuum_naptime, which
increases CPU usage.
Why does an autovacuum worker need to wake up the launcher process after
it has started?
autovacuum.c:L1604
    /* wake up the launcher */
    if (AutoVacuumShmem->av_launcherpid != 0)
        kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
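To make the effect concrete, here is a toy model of the scenario, not the real launcher loop. It assumes that every wakeup leads to a new launch for test_db (as it would while the database stays over the wraparound limit) and that a worker with nothing to do lives for worker_lifetime_ms. The function count_launches and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count how many workers a simplified launcher starts within duration_ms.
 * If wake_on_worker_signal is true, the launcher's sleep is cut short as
 * soon as the short-lived worker signals it; otherwise it sleeps for the
 * full naptime between launches.
 */
static long
count_launches(long duration_ms, long naptime_ms, long worker_lifetime_ms,
               bool wake_on_worker_signal)
{
    long launches = 0;
    long now_ms = 0;

    assert(naptime_ms > 0 && worker_lifetime_ms > 0);

    while (now_ms < duration_ms)
    {
        launches++;                         /* start a worker for test_db */
        if (wake_on_worker_signal)
            now_ms += worker_lifetime_ms;   /* woken by the worker's signal */
        else
            now_ms += naptime_ms;           /* sleep the whole naptime */
    }
    return launches;
}
```

Under these assumptions, count_launches(60000, 60000, 10, true) gives 6000 launches in one minute, against a single launch when the launcher sleeps the full naptime.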
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 5, 2016 at 7:28 AM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
> When a new worker process starts, it sends a SIGUSR2 signal to the
> launcher process to wake it up.
> Although the launcher process calls WaitLatch() after launching the new
> worker, it is woken up and soon launches yet another worker process.
See also this thread, which was never resolved:
/messages/by-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com
> Why does an autovacuum worker need to wake up the launcher process after
> it has started?
> autovacuum.c:L1604
>     /* wake up the launcher */
>     if (AutoVacuumShmem->av_launcherpid != 0)
>         kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
I think that is so that the launcher can launch multiple workers in
quick succession if it has fallen behind schedule. It can't launch them in
a tight loop, because its signals to the postmaster would get merged into
one signal, so it has to wait for one worker to get mostly set up before
launching the next.
But it doesn't make any real difference to your scenario, as the
short-lived worker will wake the launcher up a few microseconds later
anyway, when it realizes it has no work to do and so exits.
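The merging of signals mentioned above can be reproduced at the OS level with a small standalone sketch. It is not PostgreSQL code, and the postmaster's own signalling may coalesce requests for further reasons. It assumes a POSIX system such as Linux, where multiple pending instances of a standard signal are not queued. The names on_sigusr1 and deliveries_after_burst are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

/* Count each time the handler actually runs. */
static void
on_sigusr1(int signo)
{
    (void) signo;
    deliveries++;
}

/*
 * Send SIGUSR1 to ourselves n times while it is blocked, then unblock it
 * and report how many times the handler ran.
 */
static int
deliveries_after_burst(int n)
{
    struct sigaction sa;
    sigset_t    usr1;

    assert(n > 0);
    deliveries = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_BLOCK, &usr1, NULL);

    for (int i = 0; i < n; i++)
        raise(SIGUSR1);                   /* stays pending while blocked */

    sigprocmask(SIG_UNBLOCK, &usr1, NULL);  /* pending signal is delivered */
    return (int) deliveries;
}
```

On Linux, three signals raised this way should arrive as a single handler invocation, which is why a sender that needs each request honoured has to wait for some acknowledgement before sending the next one.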
Cheers,
Jeff
On Thu, Oct 6, 2016 at 12:11 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I think that is so that the launcher can launch multiple workers in
> quick succession if it has fallen behind schedule. It can't launch them
> in a tight loop, because its signals to the postmaster would get merged
> into one signal, so it has to wait for one worker to get mostly set up
> before launching the next.
> But it doesn't make any real difference to your scenario, as the
> short-lived worker will wake the launcher up a few microseconds later
> anyway, when it realizes it has no work to do and so exits.
Thank you for the reply.
I also thought that it would be better to have information about how
many tables in each database have not been vacuumed yet.
But I'm not sure how to implement that, and the current optimistic
logic is safer in most situations.
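As a rough illustration of that idea only, and not a proposed patch: the launcher could consult a per-database counter before starting another worker. The struct DbVacuumProgress, its fields, and worth_launching_worker are all hypothetical; nothing like this exists in autovacuum.c as far as this thread shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-database bookkeeping kept by the launcher. */
typedef struct DbVacuumProgress
{
    unsigned    db_oid;                 /* database identifier */
    int         tables_needing_vacuum;  /* tables still requiring vacuum */
    int         tables_in_progress;     /* of those, claimed by a worker */
} DbVacuumProgress;

/* Launch another worker only if some table is not already being handled. */
static bool
worth_launching_worker(const DbVacuumProgress *db)
{
    assert(db->tables_in_progress >= 0);
    assert(db->tables_in_progress <= db->tables_needing_vacuum);
    return db->tables_needing_vacuum - db->tables_in_progress > 0;
}
```

In the scenario above, once T2 is done and T1 is being vacuumed, the counters would read 1 and 1 and no further worker would be started for test_db. The cost is that the counters can go stale, which is one reason the optimistic approach may still be the safer default.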
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center