Return of the Solaris vacuum polling problem -- anyone remember this?
All,
This is something I'd swear we fixed around 8.3.2. However, I'm seeing
it again in production, and was wondering if anyone could remember what
the root cause was and how we fixed it.
The problem is that sometimes (but not the majority of times) autovacuum
with cost_delay is going into a pathological cycle where it polls the
system clock after reading every single disk page of a table. On large
tables, this results in vacuum not completing within the lifetime of the
server. In most cases, killing autovacuum and restarting it will cause
it to behave normally.
Below is truss output from the exhibited issue on 8.3.11 on Solaris
10u7, compiled with Sun cc:
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
lseek(39, 0x28F88000, SEEK_SET) = 0x28F88000
write(39, "0E10\0\0E0 CB5C101\001\0".., 8192) = 8192
lseek(39, 0x28FCA000, SEEK_SET) = 0x28FCA000
read(39, " q\r\0\0 `9CD2B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " F0E\0\090A888 H01\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " q\r\0\0C819D3B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
lseek(39, 0x28F90000, SEEK_SET) = 0x28F90000
write(39, "0E10\0\0 0 gB7C101\001\0".., 8192) = 8192
lseek(39, 0x28FD0000, SEEK_SET) = 0x28FD0000
read(39, " q\r\0\0 X 8D3B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " t0F\0\0 H +8F !01\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " q\r\0\0F0 sD3B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " F0E\0\0 0C888 H01\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
lseek(39, 0x28FDA000, SEEK_SET) = 0x28FDA000
read(39, " q\r\0\0C0D1D3B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " q\r\0\0D8F0D3B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " F0E\0\0800189 H01\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " q0F\0\0D0 ^A9F701\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " F0E\0\010 ?89 H01\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " q\r\0\0 x mD4B001\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " F0E\0\0 X _89 H01\001\0".., 8192) = 8192
pollsys(0xFFFFFD7FFFDF9030, 0, 0xFFFFFD7FFFDF90C0, 0x00000000) = 0
read(39, " q\r\0\0 @ADD4B001\001\0".., 8192) = 8192
For contrast, this is normal behavior:
read(10, " }\0\0\0 X82 >E301\0\0\0".., 8192) = 8192
read(10, " }\0\0\018 4 ME301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0E881 NE301\0\0\0".., 8192) = 8192
semop(16777221, 0xFFFFFD7FFFDF8FB8, 1) = 0
read(10, " }\0\0\0 PEE \E301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0\b k ^E301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 8E0 jE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 P07 nE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0D885 xE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 8D }E301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 xD280E301\0\0\0".., 8192) = 8192
read(10, " }\0\0\010DF8CE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0E09E8EE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0C8E29CE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\080889EE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0B0 UADE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0C0E4BCE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 !C0E301\0\0\0".., 8192) = 8192
read(10, " }\0\0\010 UCDE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0F8EBCEE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\08092DDE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0A8 QDFE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 x cEDE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0D8 "EFE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 P15FAE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0C8C0FDE301\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 PFC\tE401\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 8C3\rE401\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 @890FE401\0\0\0".., 8192) = 8192
read(10, " }\0\0\0 r11E401\0\0\0".., 8192) = 8192
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 08/16/2010 11:24 AM, Josh Berkus wrote:
All,
This is something I'd swear we fixed around 8.3.2. However, I'm seeing
it again in production, and was wondering if anyone could remember what
the root cause was and how we fixed it.
I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
Sparc. Any chance you can get a backtrace from a build with debug symbols?
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
Sparc. Any chance you can get a backtrace from a build with debug symbols?
The problem is that we haven't been able to reproduce the bug in
testing. Like I said, it only seems to happen occasionally ... like
maybe once in 10 or 20 (or more?) autovacuums. We've never seen it
with a manual vacuum at all.
And we can't rebuild the production servers.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
This is something I'd swear we fixed around 8.3.2. However, I'm seeing
it again in production, and was wondering if anyone could remember what
the root cause was and how we fixed it.
Hmm, I can't find anything in the 8.3-series CVS logs suggesting that
there was a post-8.3.0 fix related to vacuum delays.
The problem is that sometimes (but not the majority of times) autovacuum
with cost_delay is going into a pathological cycle where it polls the
system clock after reading every single disk page of a table.
What I find interesting about that trace is the large proportion of
writes. That appears to me to indicate that it's *not* a matter of
vacuum delays, or at least not just a matter of that. The process seems
to be getting involved in having to dump dirty buffers to disk. Perhaps
the background writer is malfunctioning?
Another idea that comes to mind is that you have vacuum_cost_page_dirty
set to an unreasonably large value, so that autovac is blocking whenever
it has to write even one page.
regards, tom lane
On 08/16/2010 12:12 PM, Josh Berkus wrote:
I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
Sparc. Any chance you can get a backtrace from a build with debug symbols?

The problem is that we haven't been able to reproduce the bug in
testing. Like I said, it only seems to happen occasionally ... like
maybe once in 10 or 20 (or more?) autovacuums. We've never seen it
with a manual vacuum at all.

And we can't rebuild the production servers.
Hmmm, well I don't know how to reproduce it on demand either -- I'll try
to get a backtrace from the wild if possible. I'll keep you posted...
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
Excerpts from Joe Conway's message of Mon Aug 16 16:47:19 -0400 2010:
On 08/16/2010 12:12 PM, Josh Berkus wrote:
I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
Sparc. Any chance you can get a backtrace from a build with debug symbols?

The problem is that we haven't been able to reproduce the bug in
testing. Like I said, it only seems to happen occasionally ... like
maybe once in 10 or 20 (or more?) autovacuums. We've never seen it
with a manual vacuum at all.

And we can't rebuild the production servers.
Hmmm, well I don't know how to reproduce it on demand either -- I'll try
to get a backtrace from the wild if possible. I'll keep you posted...
FWIW there's also a report of it hanging in FreeBSD, but sadly when the
process is inspected under truss, it dies because of its "parent PID"
attribute changing underneath and thus triggering the safety feature
that makes it die if the parent postmaster disappears.
I suspect that the problem may lie in the "cost_delay rebalance" code in
autovacuum.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Alvaro Herrera's message of Mon Aug 16 16:58:31 -0400 2010:
I suspect that the problem may lie in the "cost_delay rebalance" code in
autovacuum.
Hmm, so we have this code:
void
AutoVacuumUpdateDelay(void)
{
    if (MyWorkerInfo)
    {
        VacuumCostDelay = MyWorkerInfo->wi_cost_delay;
        VacuumCostLimit = MyWorkerInfo->wi_cost_limit;
    }
}
where the MyWorkerInfo bits come from shared memory and can be modified
by other autovac worker processes. We could read an incomplete value
into our variables. But this only makes sense if an "int" variable can
be subject to a partial read/write, which we already assume not to be so
(c.f. GetNewTransactionId).
In any case, if you happen to see this reoccur, could you please attach
GDB to the misbehaving worker and see what VacuumCostDelay and
VacuumCostLimit print out as?
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Another idea that comes to mind is that you have vacuum_cost_page_dirty
set to an unreasonably large value, so that autovac is blocking whenever
it has to write even one page.
Nope. Default. And total cost was raised to 1000.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
What I find interesting about that trace is the large proportion of
writes. That appears to me to indicate that it's *not* a matter of
vacuum delays, or at least not just a matter of that. The process seems
to be getting involved in having to dump dirty buffers to disk. Perhaps
the background writer is malfunctioning?
You appear to be correct in that it's write-related. Will be testing to
see what specifically is producing it.
Note that this is one of two ostensibly duplicate servers, and the issue
has never appeared on the other server.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
What I find interesting about that trace is the large proportion of
writes. That appears to me to indicate that it's *not* a matter of
vacuum delays, or at least not just a matter of that. The process seems
to be getting involved in having to dump dirty buffers to disk. Perhaps
the background writer is malfunctioning?
You appear to be correct in that it's write-related. Will be testing to
see what specifically is producing it.
Note that this is one of two ostensibly duplicate servers, and the issue
has never appeared on the other server.
On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be
very much involved in writing pages that are dirtied by VACUUM. VACUUM
runs in a small ring of buffers and is supposed to have to clean its own
dirt most of the time. So it's wrong to blame this on the bgwriter not
holding up its end. Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.
regards, tom lane
On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be
very much involved in writing pages that are dirtied by VACUUM. VACUUM
runs in a small ring of buffers and is supposed to have to clean its own
dirt most of the time. So it's wrong to blame this on the bgwriter not
holding up its end. Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.
This is an anti-wraparound vacuum, so it could have something to do with
the hint bits. Maybe it's setting the freeze bit on every page, and
writing them one page at a time? Still don't understand the call to
pollsys, even so, though.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.
This is an anti-wraparound vacuum, so it could have something to do with
the hint bits. Maybe it's setting the freeze bit on every page, and
writing them one page at a time?
That would explain all the writes, but it doesn't seem to explain why
your two servers aren't behaving similarly.
Still don't understand the call to pollsys, even so, though.
Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay. I'm still suspicious that the writes are eating
more cost_delay points than you think.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
This is an anti-wraparound vacuum, so it could have something to
do with the hint bits. Maybe it's setting the freeze bit on
every page, and writing them one page at a time?

That would explain all the writes, but it doesn't seem to explain
why your two servers aren't behaving similarly.
One was bulk-loaded from the other, or they were bulk-loaded at
different times? Or one had some other activity that boosted the
xid count, possibly in another database?
-Kevin
That would explain all the writes, but it doesn't seem to explain why
your two servers aren't behaving similarly.
Well, that's why I said "ostensibly duplicate". There may in fact be
differences, not just in the databases but in some OS libs as well.
These servers have been in production for quite a while, and the owner
has a messy deployment process.
Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay. I'm still suspicious that the writes are eating
more cost_delay points than you think.
Tested that. It does look like if I increase vacuum_cost_limit to 10000
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
2-3 before each pollsys. The math seems completely wrong on that,
though -- it should be 50 and 30 pages, or similar. If I can, I'll test
a vacuum without cost_delay and make sure the pollsys() are connected to
the cost delay and not something else.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Tested that. It does look like if I increase vacuum_cost_limit to 10000
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
2-3 before each pollsys. The math seems completely wrong on that,
though -- it should be 50 and 30 pages, or similar. If I can, I'll test
a vacuum without cost_delay and make sure the pollsys() are connected to
the cost delay and not something else.
Hmmm. Looks like, at least in 8.3, running a manual vacuum on a table
doesn't prevent anti-wraparound vacuum from restarting. So I can't do
any further testing until we can restart the server.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay. I'm still suspicious that the writes are eating
more cost_delay points than you think.
Tested that. It does look like if I increase vacuum_cost_limit to 10000
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
2-3 before each pollsys. The math seems completely wrong on that,
though -- it should be 50 and 30 pages, or similar.
I think there could be a lot of cost_delay points getting expended
without any effects visible at the level of strace. Maybe try fooling
with vacuum_cost_page_hit and vacuum_cost_page_miss, too.
regards, tom lane
Josh Berkus wrote:
On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be
very much involved in writing pages that are dirtied by VACUUM. VACUUM
runs in a small ring of buffers and is supposed to have to clean its own
dirt most of the time. So it's wrong to blame this on the bgwriter not
holding up its end. Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.

This is an anti-wraparound vacuum, so it could have something to do with
the hint bits. Maybe it's setting the freeze bit on every page, and
writing them one page at a time? Still don't understand the call to
pollsys, even so, though.
We often mention that we do vacuum freeze for anti-wraparound vacuum,
but not for pg_clog file removal, which is the primary trigger for
autovacuum vacuum freezing. I have added the attached documentation
patch for autovacuum_freeze_max_age; back-patched to 9.0.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
For a documentation patch should this not be back ported to all
relevant versions?
On 8/21/10, Bruce Momjian <bruce@momjian.us> wrote:
Josh Berkus wrote:
On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be
very much involved in writing pages that are dirtied by VACUUM. VACUUM
runs in a small ring of buffers and is supposed to have to clean its own
dirt most of the time. So it's wrong to blame this on the bgwriter not
holding up its end. Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.

This is an anti-wraparound vacuum, so it could have something to do with
the hint bits. Maybe it's setting the freeze bit on every page, and
writing them one page at a time? Still don't understand the call to
pollsys, even so, though.

We often mention that we do vacuum freeze for anti-wraparound vacuum,
but not for pg_clog file removal, which is the primary trigger for
autovacuum vacuum freezing. I have added the attached documentation
patch for autovacuum_freeze_max_age; back-patched to 9.0.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Rob Wultsch
wultsch@gmail.com
Rob Wultsch wrote:
For a documentation patch should this not be back ported to all
relevant versions?
It is only a minor adjustment and I normally don't backpatch that.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes:
We often mention that we do vacuum freeze for anti-wraparound vacuum,
but not for pg_clog file removal, which is the primary trigger for
autovacuum vacuum freezing. I have added the attached documentation
patch for autovacuum_freeze_max_age; back-patched to 9.0.
This patch does not actually seem like an improvement. The paragraph is
all about transaction age, but you've inserted something entirely
unrelated, and not only that but chosen to make the unrelated thing seem
like the major consequence and anti-wraparound an afterthought.
regards, tom lane