autovacuum truncate exclusive lock round two
This problem has been discussed before. Those familiar with the subject
please skip the next paragraph.
When autovacuum finds a substantial number of empty pages at the end of
a relation, it attempts to truncate it in lazy_truncate_heap(). Because
all the scanning has been done in parallel with normal DB activity, it
needs to verify that all those blocks are still empty. To do that,
autovacuum grabs an AccessExclusiveLock on the relation, then scans
backwards to the last non-empty page. If any other backend needs to
access the table during this time, the deadlock detection code will
kill the autovacuum, which by default happens after a 1000ms timeout.
The autovacuum launcher will start another vacuum after (by default) 60
seconds, which most likely gets killed again, and again, and again. The
net result is that the table is never truncated and every 60 seconds
there is a 1 second hiccup before the autovacuum is killed.
Proposal:
Add functions to lmgr that are derived from the lock release code, but
instead of releasing the lock and waking up waiters, just return a
boolean telling whether there are any waiters that would be woken up if
this lock were released.
Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
to periodically check whether there is a conflicting lock request
waiting. If not, keep going. If there is a waiter, truncate the relation
to the point checked so far, release the AccessExclusiveLock, then loop
back to where we acquired this lock in the first place and continue
checking/truncating.
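The proposed loop can be sketched in isolation like this; the waiter
check, the empty-page test, and the truncation are all stubbed out, so
every name below is an illustrative stand-in rather than the actual
patch code:

```c
#include <stdbool.h>

/* Standalone simulation of the proposed loop; every name below is an
 * illustrative stand-in, not the actual PostgreSQL code. */

#define CHECK_EVERY_N_BLOCKS 32        /* stand-in for the timed check */

static int  waiter_after_block = -1;   /* block at which a waiter appears */
static bool lock_held = false;

/* Would releasing our AccessExclusiveLock wake anyone up? */
static bool lock_has_waiters(int blkno)
{
    return blkno == waiter_after_block;
}

/* In the real code this re-reads the page to prove it is still empty. */
static bool page_is_empty(int blkno, int last_nonempty)
{
    return blkno > last_nonempty;
}

/*
 * Scan backwards from the end of the relation, verifying pages are
 * empty. Whenever a conflicting lock request shows up, truncate the
 * tail verified so far, release the lock, and start over from the new
 * end. Returns the final number of blocks in the relation.
 */
static int truncate_with_lock_yield(int nblocks, int last_nonempty)
{
    int end = nblocks;
    for (;;)
    {
        bool yielded = false;
        int  blkno = end;

        lock_held = true;              /* AccessExclusiveLock acquired */
        while (blkno > 0 && page_is_empty(blkno - 1, last_nonempty))
        {
            blkno--;
            if (blkno % CHECK_EVERY_N_BLOCKS == 0 &&
                lock_has_waiters(blkno))
            {
                yielded = true;
                break;
            }
        }
        end = blkno;                   /* truncate the verified tail  */
        lock_held = false;             /* release, let the waiter in  */
        if (!yielded)
            return end;
        waiter_after_block = -1;       /* the waiter has been served  */
    }
}
```

Nothing verified is wasted: each yield first truncates the pages
already confirmed empty, so the relation shrinks monotonically even
under constant access.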
I have a working patch here:
https://github.com/wieck/postgres/tree/autovacuum-truncate-lock
This patch does introduce three new postgresql.conf parameters, which I
would be happy to get rid of if we could derive them from something
else. Something based on the deadlock timeout may be possible.
autovacuum_truncate_lock_check = 100ms # how frequent to check
# for conflicting locks
autovacuum_truncate_lock_retry = 50 # how often to try acquiring
# the exclusive lock
autovacuum_truncate_lock_wait = 20ms # nap in between attempts
With these settings, I see the truncate of a bloated table progressing
at a rate of 3 minutes per GB, while that table is accessed 20 times per
second.
The original "kill autovacuum" mechanism in the deadlock code is still
there. All this code really adds is 10 lmgr lookups per second, plus
releasing the AccessExclusiveLock if there are any waiters. I don't
think it can get any cheaper than this.
I am attaching a script that uses pgbench to demonstrate the actual
problem of a bloated table with significant empty pages at the end.
Comments?
Jan
--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
Attachments:
Here is the patch for it.
Jan
Attachments:
autovacuum-truncate-lock.diff (text/x-patch)
Jan,
* Jan Wieck (JanWieck@Yahoo.com) wrote:
This problem has been discussed before. Those familiar with the
subject please skip the next paragraph.
Apologies if this was already thought-of and ruled out for some reason,
but...
Because all the scanning had been done in parallel to normal DB
activity, it needs to verify that all those blocks are still empty.
Would it be possible to use the FSM to figure out if things have changed
since the last scan..? Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there? Or do we consider the FSM to be completely
untrustworthy wrt this (and if so, I don't suppose there's any hope to
using the visibility map...)?
The notion of having to double-scan and the AccessExclusiveLock on the
relation are telling me this work-around, while completely possible,
isn't exactly ideal...
Perhaps another option would be a page-level lock, or something which is
larger than per-row (per-row strikes me as a lot of overhead for this
and it's not clear how we'd do it) but smaller than the entire relation,
though there are certainly pain points there too.
Thanks,
Stephen
Stephen,
On 10/24/2012 10:46 PM, Stephen Frost wrote:
Jan,
* Jan Wieck (JanWieck@Yahoo.com) wrote:
This problem has been discussed before. Those familiar with the
subject please skip the next paragraph.

Apologies if this was already thought-of and ruled out for some reason,
but...

Because all the scanning had been done in parallel to normal DB
activity, it needs to verify that all those blocks are still empty.

Would it be possible to use the FSM to figure out if things have changed
since the last scan..? Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there? Or do we consider the FSM to be completely
untrustworthy wrt this (and if so, I don't suppose there's any hope to
using the visibility map...)?
I honestly don't know if we can trust the FSM enough when it comes to
throwing away heap pages. Can we?
The notion of having to double-scan and the AccessExclusiveLock on the
relation are telling me this work-around, while completely possible,
isn't exactly ideal...
Under normal circumstances with just a few pages to trim off the end
this is no problem. Those pages were the last pages just scanned by this
very autovacuum, so they are found in the shared buffers anyway. All the
second scan does in that case is to fetch the page once more from shared
buffers to be 100% sure we are not truncating off new tuples. We
definitely need the AccessExclusiveLock to prevent someone from
extending the relation at the end between our check for relation size
and the truncate. Fetching 50 empty blocks from the buffer cache while
at it isn't that big of a deal and that is what it normally looks like.
The problem case this patch is dealing with is rolling window tables
that experienced some bloat. The typical example is a log table that
has new data constantly added and the oldest data constantly purged out.
This data normally rotates through some blocks like a rolling window. If
for some reason (purging turned off, for example) this table bloats by
several GB and later shrinks back to its normal content, soon all the
used blocks are at the beginning of the heap and we find tens of
thousands of empty pages at the end. Only now does the second scan take
more than 1000ms, and autovacuum is at risk of getting killed while at it.
Since we have experienced this problem several times now on our
production systems, something clearly needs to be done. But IMHO it
doesn't happen often enough to take any risk here.
Jan
Jan Wieck <JanWieck@Yahoo.com> writes:
On 10/24/2012 10:46 PM, Stephen Frost wrote:
Would it be possible to use the FSM to figure out if things have changed
since the last scan..? Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there? Or do we consider the FSM to be completely
untrustworthy wrt this (and if so, I don't suppose there's any hope to
using the visibility map...)?
I honestly don't know if we can trust the FSM enough when it comes to
throwing away heap pages. Can we?
No. Backends are under no obligation to update FSM for each individual
tuple insertion, and typically don't do so.
More to the point, you have to take AccessExclusiveLock *anyway*,
because this is interlocking not only against new insertions but plain
read-only seqscans: if a seqscan falls off the end of the table it will
be very unhappy. So I don't see where we'd buy anything by consulting
the FSM.
regards, tom lane
On 10/25/2012 9:45 AM, Tom Lane wrote:
Jan Wieck <JanWieck@Yahoo.com> writes:
[...]
I honestly don't know if we can trust the FSM enough when it comes to
throwing away heap pages. Can we?

No. Backends are under no obligation to update FSM for each individual
tuple insertion, and typically don't do so.

More to the point, you have to take AccessExclusiveLock *anyway*,
because this is interlocking not only against new insertions but plain
read-only seqscans: if a seqscan falls off the end of the table it will
be very unhappy. So I don't see where we'd buy anything by consulting
the FSM.
Thank you.
One thing that I haven't mentioned yet is that with this patch, we could
actually insert a vacuum_delay_point() into the loop in
count_nondeletable_pages(). We no longer cling to the exclusive lock but
rather get out of the way as soon as somebody needs the table. Under
this condition we no longer need to do the second scan full bore.
Jan
Jan,
* Jan Wieck (JanWieck@Yahoo.com) wrote:
The problem case this patch is dealing with is rolling window tables
that experienced some bloat. The typical example is a log table,
that has new data constantly added and the oldest data constantly
purged out. This data normally rotates through some blocks like a
rolling window. If for some reason (purging turned off for example)
this table bloats by several GB and later shrinks back to its normal
content, soon all the used blocks are at the beginning of the heap
and we find tens of thousands of empty pages at the end. Only now
does the second scan take more than 1000ms and autovacuum is at risk
to get killed while at it.
My concern is that this could certainly also happen to a heavily updated
table in an OLTP type of environment, where the requirement to take a
heavy lock to clean it up might prevent it from ever happening. I was
simply hoping we could find a mechanism to lock just those pages we're
getting ready to nuke rather than the entire relation. Perhaps we can
consider how to make those changes alongside changes to eliminate or
reduce the extent locking that has been painful (for me at least) when
doing massive parallel loads into a table.
Since we have experienced this problem several times now on our
production systems, something clearly needs to be done. But IMHO it
doesn't happen often enough to take any risk here.
I'm not advocating a 'do-nothing' approach, was just looking for another
option that might allow for this work to happen on the heap in parallel
with regular access. Since we haven't got any way to do that currently,
+1 for moving forward with this as it clearly improves the current
situation.
Thanks,
Stephen
On 10/25/2012 10:12 AM, Stephen Frost wrote:
Jan,
* Jan Wieck (JanWieck@Yahoo.com) wrote:
The problem case this patch is dealing with is rolling window tables
that experienced some bloat. [...] Only now does the second scan take
more than 1000ms and autovacuum is at risk to get killed while at it.

My concern is that this could certainly also happen to a heavily updated
table in an OLTP type of environment where the requirement to take a
heavy lock to clean it up might prevent it from ever happening. I was
simply hoping we could find a mechanism to lock just those pages we're
getting ready to nuke rather than the entire relation. Perhaps we can
consider how to make those changes alongside changes to eliminate or
reduce the extent locking that has been painful (for me at least) when
doing massive parallel loads into a table.
I've been testing this with loads of 20 writes/s to that bloated table.
Preventing not only the cleanup, but the following ANALYZE as well, is
precisely what happens. There may be multiple ways to get into this
situation, but once you're there the symptoms are the same. Vacuum fails
to truncate the table, causing a 1 second hiccup every minute while it
holds the exclusive lock, until the deadlock detection code of another
transaction kills it.
My patch doesn't change the logic by which we ensure that we don't zap
any data by accident with the truncate, and Tom's comments suggest we
should stick to it. It only makes autovacuum check frequently whether
the AccessExclusiveLock is actually blocking anyone, and then get out of
the way.
I would rather like to discuss ideas for how to do all this without 3
new GUCs.
In the original code, the maximum delay that autovacuum can cause by
holding the exclusive lock is one deadlock_timeout (default 1s). It
would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
the interval to check for a conflicting lock request. For another
transaction that needs to access the table this is 10 times faster than
it is now and still guarantees that autovacuum will make some progress
with the truncate.
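That derivation is simple enough to state in code; a hypothetical
helper (illustrative, not patch code) could be:

```c
/* Hypothetical derivation of the waiter-check interval from
 * deadlock_timeout (both in milliseconds), as suggested above;
 * this helper is illustrative, not actual patch code. */
static int truncate_lock_check_interval(int deadlock_timeout_ms)
{
    int interval = deadlock_timeout_ms / 10;
    return interval < 10 ? 10 : interval;   /* max(dt/10, 10ms) */
}
```

With the 1s default deadlock_timeout this yields the 100ms check
interval the patch currently exposes as a GUC.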
The other two GUCs control how often and how fast autovacuum tries to
acquire the exclusive lock in the first place. Since we actively release
the lock *because someone needs it*, it is pretty much guaranteed that
the immediate next lock attempt fails. We do a ConditionalLockRelation()
on purpose because there is a chance to deadlock. The current code tries
only one lock attempt and gives up immediately. I don't know what to
derive a good retry duration from, but the nap time between tries could
be a hardcoded 20ms, or the cost based vacuum nap time (which defaults
to 20ms).
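A sketch of that acquire side, with the lmgr and the nap stubbed out
(all names here are illustrative stand-ins, not the patch's):

```c
#include <stdbool.h>

/* Stand-ins for the real lmgr/vacuum machinery; every name here is
 * illustrative, not taken from the actual patch. */
static int  lock_free_after_attempt = 3;   /* simulated contention */
static int  naps_taken = 0;

static bool conditional_lock_relation(int attempt)
{
    /* pretend the conflicting backend is done after a few attempts */
    return attempt >= lock_free_after_attempt;
}

static void vacuum_nap_20ms(void)
{
    naps_taken++;              /* would sleep ~20ms in real life */
}

/*
 * Try to (re)acquire the AccessExclusiveLock without risking a
 * deadlock: attempt a conditional lock up to max_retries times,
 * napping in between. Returns true if the lock was obtained.
 */
static bool acquire_truncate_lock(int max_retries)
{
    for (int attempt = 0; attempt < max_retries; attempt++)
    {
        if (conditional_lock_relation(attempt))
            return true;       /* got it, go on truncating */
        vacuum_nap_20ms();     /* autovacuum_truncate_lock_wait */
    }
    return false;              /* give up; leave the tail for the
                                * next autovacuum run */
}
```

Giving up is safe here: the empty tail is not lost, only left for the
next autovacuum run to truncate.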
Any other ideas are welcome.
Thanks,
Jan
Jan Wieck wrote:
In the original code, the maximum delay that autovacuum can cause by
holding the exclusive lock is one deadlock_timeout (default 1s). It
would appear reasonable to me to use max(deadlock_timeout/10,10ms)
as the interval to check for a conflicting lock request. For another
transaction that needs to access the table this is 10 times faster
than it is now and still guarantees that autovacuum will make some
progress with the truncate.
So you would be calling GetCurrentTimestamp() continuously? Since you
mentioned adding a vacuum delay point I wonder if it would make sense to
test for lockers each time it would consider going to sleep, instead.
(One hazard to keep in mind is the case where no vacuum delay is
configured.)
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/25/2012 12:24 PM, Alvaro Herrera wrote:
Jan Wieck wrote:
In the original code, the maximum delay that autovacuum can cause by
holding the exclusive lock is one deadlock_timeout (default 1s). It
would appear reasonable to me to use max(deadlock_timeout/10,10ms)
as the interval to check for a conflicting lock request. [...]

So you would be calling GetCurrentTimestamp() continuously? Since you
mentioned adding a vacuum delay point I wonder if it would make sense to
test for lockers each time it would consider going to sleep, instead.
(One hazard to keep in mind is the case where no vacuum delay is
configured.)
Depends on your definition of "continuously". If doing one
INSTR_TIME_SET_CURRENT(), which on Unix boils down to a gettimeofday(),
every 32 ReadBufferExtended() calls counts as continuously, then yes.
Adding a vacuum_delay_point() is something we should consider. However,
the vacuum_delay_point() call simply naps when enough cost has been
racked up. You don't know if the next call will nap or not. We would
have to extend that functionality with some vacuum_delay_would_nap()
call to do what you suggest.
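Such a vacuum_delay_would_nap() might look like the following sketch;
the variable names mimic the real cost-based vacuum state, but this is
hypothetical, not actual PostgreSQL code:

```c
#include <stdbool.h>

/* Mirrors of the cost-based vacuum delay state; the names mimic the
 * real GUC-backed variables, but this sketch is not PostgreSQL code. */
static int VacuumCostBalance = 0;      /* cost racked up so far     */
static int VacuumCostLimit   = 200;    /* vacuum_cost_limit default */
static int VacuumCostDelay   = 20;     /* ms; 0 disables the delay  */

/* Would the next vacuum_delay_point() actually sleep? */
static bool vacuum_delay_would_nap(void)
{
    /* The hazard noted above: with no vacuum delay configured we
     * would never nap, so the caller must fall back to time-based
     * waiter checks instead. */
    if (VacuumCostDelay <= 0)
        return false;
    return VacuumCostBalance >= VacuumCostLimit;
}
```

The zero-delay branch is exactly the configuration hazard Alvaro points
out: a predicate like this returns false forever when no delay is set.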
Jan
On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
[...]

In the original code, the maximum delay that autovacuum can cause by
holding the exclusive lock is one deadlock_timeout (default 1s). It
would appear reasonable to me to use max(deadlock_timeout/10,10ms) as
the interval to check for a conflicting lock request. For another
transaction that needs to access the table this is 10 times faster than
it is now and still guarantees that autovacuum will make some progress
with the truncate.
One other way could be to check after every few pages for a conflicting
lock request.
The other two GUCs control how often and how fast autovacuum tries to
acquire the exclusive lock in the first place. Since we actively release
the lock *because someone needs it* it is pretty much guaranteed that
the immediate next lock attempt fails. We on purpose do a
ConditionalLockRelation() because there is a chance to deadlock. The
current code only tries one lock attempt and gives up immediately. I
don't know from what to derive a good value for how long to retry,
Can't we do something like: after the nap, check for a conditional lock,
and if we didn't get it, then take the lock unconditionally?
The reason it might be okay to take the lock unconditionally after one
try under your implementation is that after every few pages, or after a
short time, it will release the lock anyway if there is any waiter.
but
the nap time in between tries could be a hardcoded 20ms or using the
cost based vacuum nap time (which defaults to 20ms).
I think using the cost based vacuum nap time or a default value is good.
Adding new parameters creates user/administrator overhead; it is always
better if the database can decide intelligently by itself. However, if
you feel these are parameters that can vary based on different kinds of
usage, then it is better to expose them to users as configuration
parameters.
With Regards,
Amit Kapila.
On 10/26/2012 1:29 AM, Amit Kapila wrote:
On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
[...]

One other way could be to check after every few pages for a conflicting
lock request.
How is this any different from what my patch does? Did you even look at
the code?
The other two GUCs control how often and how fast autovacuum tries to
acquire the exclusive lock in the first place. Since we actively release
the lock *because someone needs it* it is pretty much guaranteed that
the immediate next lock attempt fails. We on purpose do a
ConditionalLockRelation() because there is a chance to deadlock. The
current code only tries one lock attempt and gives up immediately. I
don't know from what to derive a good value for how long to retry.

Can't we do something like: after the nap, check for a conditional lock,
and if we didn't get it, then take the lock unconditionally?
No, we cannot. This is also well documented in the code.
Jan
On Friday, October 26, 2012 10:59 AM Amit Kapila wrote:
On Thursday, October 25, 2012 9:46 PM Jan Wieck wrote:
[...]

Can't we do something like: after the nap, check for a conditional lock,
and if we didn't get it, then take the lock unconditionally? The reason
it might be okay to take the lock unconditionally after one try under
your implementation is that after every few pages, or after a short
time, it will release the lock anyway if there is any waiter.
I am sorry, at this point trying to take unconditional X lock can lead to
deadlock, so above is not possible.
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Friday, October 26, 2012 11:50 AM Jan Wieck wrote:
On 10/26/2012 1:29 AM, Amit Kapila wrote:
[...]

One other way could be to check after every few pages for a conflicting
lock request.
How is this any different from what my patch does?
The difference is that the patch checks for waiters using two controls,
autovacuum_truncate_lock_check and blkno % 32, and what I had mentioned
was to check based only on blkno.
Will it affect too much if we directly check for waiters after every 32
(or any feasible number of) blocks?
Did you even look at the code?
I hadn't looked at the code when I replied to your previous mail, but
now I have checked it.
With Regards,
Amit Kapila.
On 10/26/2012 6:35 AM, Amit Kapila wrote:
On Friday, October 26, 2012 11:50 AM Jan Wieck wrote:
On 10/26/2012 1:29 AM, Amit Kapila wrote:
One other way could be to check after every few pages for a conflicting
lock request.
How is this any different from what my patch does?
The difference is that the patch checks for waiters using two controls,
autovacuum_truncate_lock_check and blkno % 32, and what I had mentioned
was to check based only on blkno.
Will it affect too much if we directly check for waiters after every 32
(or any feasible number of) blocks?
The blkno%32 is there so we don't do the gettimeofday() call too often.
But relying on blkno alone is IMHO not a good idea. It would have to be
a number small enough that, even on a busy system and when the pages
have to be read from disk, vacuum checks for and releases the lock
quickly, but large enough that it doesn't cause a significant number of
spinlock calls in the lmgr. We would end up with another parameter, a
number of blocks, that is a lot harder to estimate a good value for.
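The two-level throttle Jan describes could look roughly like the sketch
below. The function, its arguments, and the plain-millisecond clock are
illustrative stand-ins; the real patch uses gettimeofday() and the lmgr
waiter check inside count_nondeletable_pages().

```c
#include <stdbool.h>

/* Only every 32nd block do we pay for a clock read, and only once the
 * configured check interval has elapsed do we actually ask the lock
 * manager whether anyone is waiting on our AccessExclusiveLock. */
static bool
should_check_for_waiters(long blkno, long now_ms,
                         long *last_check_ms, long check_interval_ms)
{
    if (blkno % 32 != 0)
        return false;               /* skip the clock read entirely */

    if (now_ms - *last_check_ms < check_interval_ms)
        return false;               /* too soon since the last check */

    *last_check_ms = now_ms;
    return true;                    /* time to probe the lock queue */
}
```

The cheap modulo test bounds the cost of the frequent path, while the time
check keeps the actual lmgr probes at the configured rate regardless of
how fast pages are being scanned.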
Jan
--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
On Wed, Oct 24, 2012 at 4:20 PM, Jan Wieck <JanWieck@yahoo.com> wrote:
This patch does introduce three new postgresql.conf parameters, which I
would be happy to get rid of if we could derive them from something else.
Something based on the deadlock timeout may be possible.

autovacuum_truncate_lock_check = 100ms # how frequent to check
# for conflicting locks
autovacuum_truncate_lock_retry = 50 # how often to try acquiring
# the exclusive lock
autovacuum_truncate_lock_wait = 20ms # nap in between attempts
+1 for this general approach.
As you suggested downthread, I think that hard-coding
autovacuum_truncate_lock_check to one-tenth of the deadlock timeout
should be just fine. For the other two parameters, I doubt we need to
make them configurable at all. It's not exactly clear what to set
them to, but it does seem clear that the down side of setting them
incorrectly isn't very much as long as the defaults are roughly sane.
Personally, I'd be inclined to retry less frequently but over a
slightly longer time period - say twenty retries, one after every
100ms. But I wouldn't be upset if we settled on what you've got here,
either. We just don't want to let the total time we spend waiting for
the lock get too long, because that means pinning down an auto-vacuum
worker that might be critically needed elsewhere. So the product of
autovacuum_truncate_lock_retry and autovacuum_truncate_lock_wait
probably should not be more than a couple of seconds.
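The constraint Robert states is simply a bound on the product of the two
settings. A toy check, with the 2000ms cap standing in for his "couple of
seconds" (not a value from the patch):

```c
#include <stdbool.h>

/* Whatever the retry count and nap interval, their product bounds how
 * long an autovacuum worker can be pinned waiting for the lock. */
static bool
retry_budget_ok(int retries, int wait_ms)
{
    return (long) retries * wait_ms <= 2000;
}
```

Both proposals fit the budget: Jan's 50 retries at 20ms and Robert's 20
retries at 100ms each total 2 seconds of worst-case waiting.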
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Jan Wieck <JanWieck@Yahoo.com> writes:
Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
to periodically check if there is a conflicting lock request waiting. If
not, keep going. If there is a waiter, truncate the relation to the point
checked thus far, release the AccessExclusiveLock, then loop back to
where we acquire this lock in the first place and continue
checking/truncating.
I think that maybe we could just bail out after releasing the
AccessExclusiveLock and trust autovacuum to get back to truncating that
relation later. That would allow removing 2 of the 3 GUCs below:
autovacuum_truncate_lock_check = 100ms # how frequent to check
# for conflicting locks
This is the one remaining. Could we maybe check for a lock conflict
after every page we move backward, or some multiple thereof? The goal
would be to ensure that progress is made while also being aware of
concurrent activity, a la CHECK_FOR_INTERRUPTS().
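A toy model of the loop under discussion, stripped of all buffer and lock
manager details: scan backwards from the end of the relation, and when a
conflicting waiter shows up, stop at the progress made so far. The
function and the waiter_at parameter (which simulates the block at which
a waiter first appears, -1 for never) are purely illustrative.

```c
/* Returns the new relation size in blocks. In the real proposal, hitting
 * a waiter means truncating to this point, releasing the
 * AccessExclusiveLock, and looping back to reacquire it; this toy model
 * just returns the truncation point. */
static long
count_nondeletable_sketch(long rel_size, long last_nonempty, long waiter_at)
{
    for (long blkno = rel_size - 1; blkno >= last_nonempty; blkno--)
    {
        /* "verify this page is still empty" would happen here */
        if (blkno == waiter_at)
            return blkno + 1;       /* waiter: keep progress so far */
    }
    return last_nonempty;           /* truncate to first non-empty page */
}
```

Even when interrupted, each pass shrinks the relation by the pages already
verified, which is the "progress is made" property being discussed.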
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine wrote:
Jan Wieck <JanWieck@Yahoo.com> writes:
Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
to periodically check if there is a conflicting lock request waiting. If
not, keep going. If there is a waiter, truncate the relation to the point
checked thus far, release the AccessExclusiveLock, then loop back to
where we acquire this lock in the first place and continue
checking/truncating.

I think that maybe we could just bail out after releasing the
AccessExclusiveLock and trust autovacuum to get back to truncating that
relation later.
That doesn't work, because the truncating code is not reached unless
vacuuming actually took place. So if you interrupt it, it will just not
get called again later. Maybe we could have autovacuum somehow invoke
that separately, but that would require that the fact that truncation
was aborted is kept track of somewhere.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Friday, November 16, 2012 4:09 AM Alvaro Herrera wrote:
Dimitri Fontaine wrote:
Jan Wieck <JanWieck@Yahoo.com> writes:
Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c
to periodically check if there is a conflicting lock request waiting. If
not, keep going. If there is a waiter, truncate the relation to the point
checked thus far, release the AccessExclusiveLock, then loop back to
where we acquire this lock in the first place and continue
checking/truncating.

I think that maybe we could just bail out after releasing the
AccessExclusiveLock and trust autovacuum to get back to truncating that
relation later.
That doesn't work, because the truncating code is not reached unless
vacuuming actually took place. So if you interrupt it, it will just not
get called again later. Maybe we could have autovacuum somehow invoke
that separately, but that would require that the fact that truncation
was aborted is kept track of somewhere.
Won't it have a chance to be handled the next time vacuum triggers due
to updates/deletes on some other pages?
OTOH, maybe the same thing happens again next time and it is still
unable to complete the truncate.
So I think it's better to complete it the first time, but perhaps using
some heuristic wait and retry times rather than configuration variables.
With Regards,
Amit Kapila
Jan,
Are you posting an updated patch?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services