heap vacuum & cleanup locks
We've occasionally seen problems with VACUUM getting stuck for failure
to acquire a cleanup lock due to, for example, a cursor holding a pin
on the buffer page. In the worst case, this can cause an undetected
deadlock, if the backend holding the buffer pin blocks trying to
acquire a heavyweight lock that is in turn blocked by VACUUM. A while
back, someone (Greg Stark? me?) floated the idea of not waiting for
the cleanup lock. If we can't get it immediately, or within some
short period of time, then we just skip the page and continue on.
Today I had what might be a better idea: don't try to acquire a
cleanup lock at all. Instead, acquire an exclusive lock. After
having done so, observe the pin count. If there are no other buffer
pins, that means our exclusive lock is actually a cleanup lock, and we
proceed as now. If other buffer pins do exist, then we can't
defragment the page, but that doesn't mean no useful work can be done:
we can still mark used line pointers dead, or dead line pointers
unused. We cannot defragment, but that can be done either by the next
VACUUM or by a HOT cleanup. We can even arrange - using existing
mechanism - to leave behind a hint that the page is a good candidate
for a HOT cleanup, by setting pd_prune_xid to, say, FrozenXID.
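In rough outline, the per-block logic (imagine it sitting inside
lazy_scan_heap() in vacuumlazy.c) would look something like the sketch below.
To be clear, this is hand-waving rather than a patch: BufferPinCountIsOne(),
heap_page_kill_items(), and lazy_scan_page_with_cleanup_lock() are invented
names standing in for code that doesn't exist yet, and WAL logging is ignored
entirely.

/* Invented helpers -- illustration only, these do not exist today. */
extern bool BufferPinCountIsOne(Buffer buf);
extern void heap_page_kill_items(Relation onerel, Buffer buf);
extern void lazy_scan_page_with_cleanup_lock(Relation onerel, Buffer buf);

static void
lazy_scan_block_sketch(Relation onerel, BlockNumber blkno,
                       BufferAccessStrategy vac_strategy)
{
    Buffer      buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                                         RBM_NORMAL, vac_strategy);

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    if (BufferPinCountIsOne(buf))
    {
        /*
         * Nobody else has the page pinned, so our exclusive lock is
         * effectively a cleanup lock: proceed exactly as we do today,
         * including defragmenting the page.
         */
        lazy_scan_page_with_cleanup_lock(onerel, buf);
    }
    else
    {
        /*
         * Someone else holds a pin.  We can't move tuples around, but we
         * can still flip line pointer states: used -> dead, dead -> unused.
         */
        heap_page_kill_items(onerel, buf);

        /* Hint that the page is a good candidate for a later HOT prune. */
        ((PageHeader) BufferGetPage(buf))->pd_prune_xid = FrozenTransactionId;
        MarkBufferDirty(buf);
    }

    UnlockReleaseBuffer(buf);
}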
Like the idea of skipping pages on which we can't acquire a cleanup
lock altogether, this should prevent VACUUM from getting stuck trying
to lock a heap page. While buffer pins can be held for extended
periods of time, I don't think there is any operation that holds a
buffer content lock more than very briefly. Furthermore, unlike the
idea of skipping the page altogether, we could use this approach even
during an anti-wraparound vacuum.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Jun 5, 2011 at 12:03, Robert Haas <robertmhaas@gmail.com> wrote:
If other buffer pins do exist, then we can't
defragment the page, but that doesn't mean no useful work can be done:
we can still mark used line pointers dead, or dead line pointers
unused. We cannot defragment, but that can be done either by the next
VACUUM or by a HOT cleanup.
This is just an idea -- Is it possible to have copy-on-write techniques?
VACUUM allocates a duplicated page for the pinned page, and copy valid
tuples into the new page. Following buffer readers after the VACUUM will
see the cloned page instead of the old pinned one.
Of course, copy-on-write is more complex than skipping pinned pages,
but I wonder whether we might be unable to vacuum at all in some edge
cases with the skipping method.
--
Itagaki Takahiro
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
On Sun, Jun 5, 2011 at 12:03, Robert Haas <robertmhaas@gmail.com> wrote:
If other buffer pins do exist, then we can't
defragment the page, but that doesn't mean no useful work can be done:
we can still mark used line pointers dead, or dead line pointers
unused. We cannot defragment, but that can be done either by the next
VACUUM or by a HOT cleanup.
This is just an idea -- Is it possible to have copy-on-write techniques?
VACUUM allocates a duplicated page for the pinned page, and copy valid
tuples into the new page. Following buffer readers after the VACUUM will
see the cloned page instead of the old pinned one.
Heikki suggested the same thing, and it's not a bad idea, but I think
it would be more work to implement than what I proposed. The caller
would need to be aware that, if it tries to re-acquire a content lock
on the same page, the offset of the tuple within the page might
change. I'm not sure how much work would be required to cope with
that possibility.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jun 6, 2011, at 1:00 AM, Robert Haas wrote:
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
On Sun, Jun 5, 2011 at 12:03, Robert Haas <robertmhaas@gmail.com> wrote:
If other buffer pins do exist, then we can't
defragment the page, but that doesn't mean no useful work can be done:
we can still mark used line pointers dead, or dead line pointers
unused. We cannot defragment, but that can be done either by the next
VACUUM or by a HOT cleanup.
This is just an idea -- Is it possible to have copy-on-write techniques?
VACUUM allocates a duplicated page for the pinned page, and copy valid
tuples into the new page. Following buffer readers after the VACUUM will
see the cloned page instead of the old pinned one.
Heikki suggested the same thing, and it's not a bad idea, but I think
it would be more work to implement than what I proposed. The caller
would need to be aware that, if it tries to re-acquire a content lock
on the same page, the offset of the tuple within the page might
change. I'm not sure how much work would be required to cope with
that possibility.
I've had a related idea that I haven't looked into... if you're scanning a relation (ie: index scan, seq scan) I've wondered if it would be more efficient to deal with the entire page at once, possibly by making a copy of it. This would reduce the number of times you pin the page (often quite dramatically). I realize that means copying the entire page, but I suspect that would occur entirely in the L1 cache, which would be fast.
So perhaps instead of copy on write we should try for copy on read on all appropriate plan nodes.
On a related note, I've also wondered if it would be useful to allow nodes to deal with more than one tuple at a time; the idea being that it's better to execute a smaller chunk of code over a bigger chunk of data instead of dribbling tuples through an entire execution tree one at a time. Perhaps that will only be useful if nodes are executing in parallel...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Sun, Jun 5, 2011 at 8:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've occasionally seen problems with VACUUM getting stuck for failure
to acquire a cleanup lock due to, for example, a cursor holding a pin
on the buffer page. In the worst case, this can cause an undetected
deadlock, if the backend holding the buffer pin blocks trying to
acquire a heavyweight lock that is in turn blocked by VACUUM. A while
back, someone (Greg Stark? me?) floated the idea of not waiting for
the cleanup lock. If we can't get it immediately, or within some
short period of time, then we just skip the page and continue on.
Do we know if this is really a problem though? The deadlock, for
example, can happen only when a backend tries to get a conflicting
table-level lock while holding the buffer pin, and I am not sure if we
do that.
The contention issue probably only matters for small tables, because
for large to very large tables the probability that a backend and
vacuum would process the same page is quite low. With the current
default for vac_threshold, small tables can get vacuumed very
frequently, and if they are also heavily accessed, the cleanup lock
can become a bottleneck.
Another issue that might be worth paying attention to is the single-pass
vacuum that I am currently working on. The design that we agreed
upon assumes that the index vacuum must clear index pointers to all
the dead line pointers. If we skip any page, we must at least collect
the existing dead line pointers and remove those index pointers. If we
create dead line pointers and want to vacuum them later, we store
the LSN in the page, and that may require defragmentation. Of course, we can
work around that, but I think it will be useful to do some tests to
show that the cleanup lock is indeed a major bottleneck.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On 06.06.2011 09:35, Jim Nasby wrote:
I've had a related idea that I haven't looked into... if you're scanning a relation (ie: index scan, seq scan) I've wondered if it would be more efficient to deal with the entire page at once, possibly by making a copy of it. This would reduce the number of times you pin the page (often quite dramatically). I realize that means copying the entire page, but I suspect that would occur entirely in the L1 cache, which would be fast.
We already do that. When an index scan moves to an index page, the heap
tid pointers of all the matching index tuples are copied to
backend-private memory in one go, and the lock is released. And for a
seqscan, the visibility of all the tuples on the page is checked in one
go while holding the lock, then the lock is released but the pin is
kept. The pin is only released after all the tuples have been read.
There's no repeated pin-unpin for each tuple.
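Condensed to its essentials, the seqscan side works roughly like this (a
stripped-down paraphrase of heapgetpage() in heapam.c, with pruning,
serializability checks and syncscan reporting left out; the function name
below is mine, not the real one):

static void
heapgetpage_sketch(HeapScanDesc scan, Buffer buf)
{
    Page        page;
    OffsetNumber lineoff;
    OffsetNumber maxoff;
    int         ntup = 0;

    /* All visibility checks for the page happen under one share lock. */
    LockBuffer(buf, BUFFER_LOCK_SHARE);

    page = BufferGetPage(buf);
    maxoff = PageGetMaxOffsetNumber(page);

    for (lineoff = FirstOffsetNumber; lineoff <= maxoff; lineoff++)
    {
        ItemId          lpp = PageGetItemId(page, lineoff);
        HeapTupleData   loctup;

        if (!ItemIdIsNormal(lpp))
            continue;

        loctup.t_data = (HeapTupleHeader) PageGetItem(page, lpp);
        loctup.t_len = ItemIdGetLength(lpp);
        ItemPointerSet(&(loctup.t_self), scan->rs_cblock, lineoff);

        if (HeapTupleSatisfiesVisibility(&loctup, scan->rs_snapshot, buf))
            scan->rs_vistuples[ntup++] = lineoff;
    }

    /*
     * Drop the lock but keep the pin; the executor then consumes the
     * remembered offsets one at a time without touching the lock again.
     */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    scan->rs_ntuples = ntup;
}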
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've occasionally seen problems with VACUUM getting stuck for failure
to acquire a cleanup lock due to, for example, a cursor holding a pin
on the buffer page. In the worst case, this can cause an undetected
deadlock, if the backend holding the buffer pin blocks trying to
acquire a heavyweight lock that is in turn blocked by VACUUM. A while
back, someone (Greg Stark? me?) floated the idea of not waiting for
the cleanup lock. If we can't get it immediately, or within some
short period of time, then we just skip the page and continue on.
Today I had what might be a better idea: don't try to acquire a
cleanup lock at all. Instead, acquire an exclusive lock. After
having done so, observe the pin count. If there are no other buffer
pins, that means our exclusive lock is actually a cleanup lock, and we
proceed as now. If other buffer pins do exist, then we can't
defragment the page, but that doesn't mean no useful work can be done:
we can still mark used line pointers dead, or dead line pointers
unused. We cannot defragment, but that can be done either by the next
VACUUM or by a HOT cleanup. We can even arrange - using existing
mechanism - to leave behind a hint that the page is a good candidate
for a HOT cleanup, by setting pd_prune_xid to, say, FrozenXID.
Like the idea of skipping pages on which we can't acquire a cleanup
lock altogether, this should prevent VACUUM from getting stuck trying
to lock a heap page. While buffer pins can be held for extended
periods of time, I don't think there is any operation that holds a
buffer content lock more than very briefly. Furthermore, unlike the
idea of skipping the page altogether, we could use this approach even
during an anti-wraparound vacuum.
Thoughts?
Not waiting seems like a good idea.
Not returning to the block while it is in RAM, or not cleaning the
block at all, would cause different performance issues, which I would
wish to avoid.
Hot Standby has specific code to avoid this situation. Perhaps you
could copy that, not sure.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 6, 2011 at 2:36 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
Do we know if this is really a problem though? The deadlock, for
example, can happen only when a backend tries to get a conflicting
table-level lock while holding the buffer pin, and I am not sure if we
do that.
The deadlock isn't terribly common, because, as you say, you need the
process holding the buffer pin to try to take a lock on the relation
being vacuumed that is strong enough to conflict with
ShareUpdateExclusiveLock. That's a slightly unusual thing to do.
But the problem of vacuum stalling out because it can't get the
cleanup lock is a very real one. I've seen at least one customer hit
this in production, and it was pretty painful. Now, granted, you need
some bad application design, too: you have to leave a cursor lying
around instead of running it to completion and then stopping. But
supposing you do make that mistake, you might hope that it wouldn't
cause VACUUM starvation, which is what happens today. IOW, I'm less
worried about whether the cleanup lock is slowing vacuum down than I
am about eliminating the pathological cases where an autovacuum
worker gets pinned down, stuck waiting for a cleanup lock that never
arrives. Now the table doesn't get vacuumed (bad) and the system as a
whole is one AV worker short of what it's supposed to have (also bad).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Mon Jun 06 08:06:10 -0400 2011:
But the problem of vacuum stalling out because it can't get the
cleanup lock is a very real one. I've seen at least one customer hit
this in production, and it was pretty painful. Now, granted, you need
some bad application design, too: you have to leave a cursor lying
around instead of running it to completion and then stopping. But
supposing you do make that mistake, you might hope that it wouldn't
cause VACUUM starvation, which is what happens today. IOW, I'm less
worried about whether the cleanup lock is slowing vacuum down than I
am about eliminating the pathological cases where an autovacuum
worker gets pinned down, stuck waiting for a cleanup lock that never
arrives. Now the table doesn't get vacuumed (bad) and the system as a
whole is one AV worker short of what it's supposed to have (also bad).
One of the good things about your proposal is that (AFAICS) you can
freeze tuples without the cleanup lock, so the antiwraparound cleanup
would still work.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jun 6, 2011 at 12:49 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Robert Haas's message of Mon Jun 06 08:06:10 -0400 2011:
But the problem of vacuum stalling out because it can't get the
cleanup lock is a very real one. I've seen at least one customer hit
this in production, and it was pretty painful. Now, granted, you need
some bad application design, too: you have to leave a cursor lying
around instead of running it to completion and then stopping. But
supposing you do make that mistake, you might hope that it wouldn't
cause VACUUM starvation, which is what happens today. IOW, I'm less
worried about whether the cleanup lock is slowing vacuum down than I
am about eliminating the pathological cases where an autovacuum
worker gets pinned down, stuck waiting for a cleanup lock that never
arrives. Now the table doesn't get vacuumed (bad) and the system as a
whole is one AV worker short of what it's supposed to have (also bad).
One of the good things about your proposal is that (AFAICS) you can
freeze tuples without the cleanup lock, so the antiwraparound cleanup
would still work.
Yeah, I think that's a major selling point. VACUUM getting stuck is
Bad. Anti-wraparound VACUUM getting stuck is Really Bad.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 6, 2011 at 8:02 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 06.06.2011 09:35, Jim Nasby wrote:
I've had a related idea that I haven't looked into... if you're scanning a
relation (ie: index scan, seq scan) I've wondered if it would be more
efficient to deal with the entire page at once, possibly by making a copy of
it. This would reduce the number of times you pin the page (often quite
dramatically). I realize that means copying the entire page, but I suspect
that would occur entirely in the L1 cache, which would be fast.
We already do that. When an index scan moves to an index page, the heap tid
pointers of all the matching index tuples are copied to backend-private
memory in one go, and the lock is released. And for a seqscan, the
visibility of all the tuples on the page is checked in one go while holding
the lock, then the lock is released but the pin is kept. The pin is only
released after all the tuples have been read. There's no repeated pin-unpin
for each tuple.
But I think you've hit the important point here. The problem is not
whether VACUUM waits for the pin, it's that the pins can be held for
extended periods.
It makes more sense to try to limit pin hold times than it does to
come up with pin avoidance techniques.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
But I think you've hit the important point here. The problem is not
whether VACUUM waits for the pin, it's that the pins can be held for
extended periods.
Yes
It makes more sense to try to limit pin hold times than it does to
come up with pin avoidance techniques.
Well, these are super-exclusive-vacuum-lock avoidance techniques. Why
wouldn't it make more sense to try to reduce the frequency and impact
of the single-purpose outlier in a non-critical path, instead of
burdening every other data reader with extra overhead?
I think Robert's plan is exactly right though I would phrase it
differently. We should get the exclusive lock, freeze/kill any xids
and line pointers, then if the pin-count is 1 do the compaction.
I'm really wishing we had more bits in the vm. It looks like we could use:
- contains not-all-visible tuples
- contains not-frozen xids
- in need of compaction
I'm sure we could find a use for one more page-level vm bit too.
--
greg
On Tue, Jun 7, 2011 at 8:24 PM, Greg Stark <gsstark@mit.edu> wrote:
On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
But I think you've hit the important point here. The problem is not
whether VACUUM waits for the pin, its that the pins can be held for
extended periods.
Yes
It makes more sense to try to limit pin hold times than it does to
come up with pin avoidance techniques.
Well it's super-exclusive-vacuum-lock avoidance techniques. Why
shouldn't it make more sense to try to reduce the frequency and impact
of the single-purpose outlier in a non-critical-path instead of
burdening every other data reader with extra overhead?
I think Robert's plan is exactly right though I would phrase it
differently. We should get the exclusive lock, freeze/kill any xids
and line pointers, then if the pin-count is 1 do the compaction.
Would that also be possible during recovery?
A similar problem exists with Hot Standby, so I'm worried fixing just
VACUUMs would be a kluge.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 7, 2011 at 3:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Tue, Jun 7, 2011 at 8:24 PM, Greg Stark <gsstark@mit.edu> wrote:
On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
But I think you've hit the important point here. The problem is not
whether VACUUM waits for the pin, its that the pins can be held for
extended periods.
Yes
It makes more sense to try to limit pin hold times than it does to
come up with pin avoidance techniques.
Well it's super-exclusive-vacuum-lock avoidance techniques. Why
shouldn't it make more sense to try to reduce the frequency and impact
of the single-purpose outlier in a non-critical-path instead of
burdening every other data reader with extra overhead?
I think Robert's plan is exactly right though I would phrase it
differently. We should get the exclusive lock, freeze/kill any xids
and line pointers, then if the pin-count is 1 do the compaction.
Would that also be possible during recovery?
A similar problem exists with Hot Standby, so I'm worried fixing just
VACUUMs would be a kluge.
We have to do the same operation on both the master and standby, so if
the master decides to skip the compaction then the slave will skip it
as well (and need not worry about waiting for pin-count 1). But if
the master does the compaction then the slave will have to get a
matching cleanup lock, just as now.
Your idea of somehow adjusting things so that we don't hold the buffer
pin for a long period of time would be better in that regard, but I'm
not sure how to do it. Presumably we could rejigger things to copy
the tuples instead of holding a pin, but that would carry a
performance penalty for the (very common) case where there is no
conflict with VACUUM.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark <gsstark@mit.edu> wrote:
Well it's super-exclusive-vacuum-lock avoidance techniques. Why
shouldn't it make more sense to try to reduce the frequency and impact
of the single-purpose outlier in a non-critical-path instead of
burdening every other data reader with extra overhead?
I think Robert's plan is exactly right though I would phrase it
differently. We should get the exclusive lock, freeze/kill any xids
and line pointers, then if the pin-count is 1 do the compaction.
I wrote a really neat patch to do this today... and then, as I
thought about it some more, I started to think that it's probably
unsafe. Here's the trouble: with this approach, we assume that it's
OK to change the contents of the line pointer while holding only an
exclusive lock on the buffer. But there is a good deal of code out
there that thinks it's OK to examine a line pointer with only a pin on
the buffer (no lock). You need a content lock to scan the item
pointer array, but once you've identified an item of interest, you're
entitled to assume that it won't be modified while you hold a buffer
pin. Now, if you've identified a particular tuple as being visible to
your scan, then you might think that VACUUM shouldn't be removing it
anyway. But I think that's only true for MVCC scans - for example,
what happens under SnapshotNow semantics? But then, on third
thought, if you've also got an MVCC snapshot taken before the start of
the SnapshotNow scan, you are probably OK, because your advertised
xmin should prevent anyone from removing anything anyway, so how do
you actually provoke a failure?
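To spell out the pattern I'm worried about, existing callers essentially do
something like this (schematic only, not lifted from any one place in the
code; the function name is made up):

static HeapTupleHeader
latch_onto_tuple(Buffer buf, OffsetNumber offnum)
{
    Page        page;
    ItemId      itemid;
    HeapTupleHeader htup;

    /* The content lock is needed to scan the line pointer array ... */
    LockBuffer(buf, BUFFER_LOCK_SHARE);

    page = BufferGetPage(buf);
    itemid = PageGetItemId(page, offnum);
    Assert(ItemIdIsNormal(itemid));
    htup = (HeapTupleHeader) PageGetItem(page, itemid);

    /* ... and any visibility checks happen here, still under the lock. */

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    /*
     * From here on the caller holds only the pin, and assumes that the
     * line pointer it latched onto won't change state underneath it.
     * That is exactly the assumption the attached patch breaks by
     * flipping line pointers to dead/unused under a mere exclusive lock.
     */
    return htup;
}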
Anyway, I'm attaching the patch, in case anyone has any ideas on where
to go with this.
I'm really wishing we had more bits in the vm. It looks like we could use:
- contains not-all-visible tuples
- contains not-frozen xids
- in need of compaction
I'm sure we could find a use for one more page-level vm bit too.
We've got plenty of room for more page-level bits, if we need them.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
vacuum-unstick-v1.patch (application/octet-stream, +147 -44)
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
We've occasionally seen problems with VACUUM getting stuck for failure
to acquire a cleanup lock due to, for example, a cursor holding a pin
on the buffer page. In the worst case, this can cause an undetected
deadlock, if the backend holding the buffer pin blocks trying to
acquire a heavyweight lock that is in turn blocked by VACUUM.
Those deadlocks can be detected in exactly the same way as is used for
Hot Standby.
The cleanup waiter registers its interest in the pin; anyone with a
lock request that must wait checks to see whether they hold a pin that
would cause a deadlock.
I'll look at doing a patch for that. Shouldn't take long.
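Roughly, by analogy with ResolveRecoveryConflictWithBufferPin() and
CheckRecoveryConflictDeadlock() on the standby side, something like the
sketch below; all the names here are invented, it's only the shape that
matters:

/* VACUUM side: advertise which buffer we're about to sleep on. */
static void
VacuumLockBufferForCleanup(Buffer buf)
{
    AdvertiseVacuumCleanupWait(buf);    /* invented */
    LockBufferForCleanup(buf);          /* existing API */
    ClearVacuumCleanupWait();           /* invented */
}

/*
 * Backend side: called from the lock-wait path, just as
 * CheckRecoveryConflictDeadlock() is during recovery.
 */
static void
CheckVacuumCleanupDeadlock(void)
{
    if (!HoldingPinThatDelaysVacuum())  /* invented */
        return;

    ereport(ERROR,
            (errcode(ERRCODE_T_R_DEADLOCK_DETECTED),
             errmsg("canceling statement due to deadlock with VACUUM"),
             errdetail("This backend holds a buffer pin that VACUUM is "
                       "waiting for, while itself waiting for a lock "
                       "held by VACUUM.")));
}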
A while
back, someone (Greg Stark? me?) floated the idea of not waiting for
the cleanup lock. If we can't get it immediately, or within some
short period of time, then we just skip the page and continue on.
Separately, that sounds like a great idea and it's simple to implement
- patch attached.
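Schematically, the non-waiting path inside vacuum's main block loop amounts
to something like this (a sketch, not the patch text itself;
ConditionalLockBufferForCleanup() stands in for whatever new non-waiting
primitive gets added alongside LockBufferForCleanup()):

        /*
         * ConditionalLockBufferForCleanup(): take the exclusive lock, but
         * return false instead of sleeping if anyone else holds a pin.
         */
        buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vac_strategy);

        if (!ConditionalLockBufferForCleanup(buf))
        {
            /* Page is pinned by someone else: don't wait, just move on. */
            ReleaseBuffer(buf);
            continue;
        }

        /* Got the cleanup lock without waiting; vacuum the page as usual. */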
As an enhancement, I don't see any particular reason why
the heap pages need to be vacuumed in exactly sequential order. If
they are on disk, reading sequentially is useful, in which case nobody
has a pin and so we will continue. But if the blocks are already in
shared_buffers, then the sequential order doesn't matter at all. So we
could skip pages and then return to them later on.
Also, ISTM that LockBufferForCleanup() waits for just a single buffer,
but it could equally well wait for multiple buffers at the same time.
By this, we would then be able to simply register our interest in
multiple buffers and get woken as soon as one of them becomes free. That
way we could read the blocks sequentially, but lock and clean them out
of sequence if necessary. Do this in chunks, so it plays nicely with
buffer strategy. (Patch doesn't do that yet).
(Not sure if the patch handles vacuum map correctly if we skip the
page, but it's a reasonable prototype for discussion).
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
vacuum_skip_busy_pages.v1.patch (application/octet-stream, +132 -7)
On Thu, Nov 3, 2011 at 7:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
A while
back, someone (Greg Stark? me?) floated the idea of not waiting for
the cleanup lock. If we can't get it immediately, or within some
short period of time, then we just skip the page and continue on.
Separately, that sounds like a great idea and it's simple to implement
- patch attached.
Oh, that's kind of clever. I was thinking that you'd have to disable
this entirely for anti-wraparound vacuum, but the way you've done it
avoids that. You'll still have to wait if there's a competing pin on
a buffer that contains tuples actually in need of freezing, but that
should be relatively rare.
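In other words, I imagine the logic shakes out roughly like this (a sketch
only: scan_all is vacuum's existing must-visit-every-page flag, FreezeLimit
the usual cutoff, ConditionalLockBufferForCleanup() again a stand-in for the
non-waiting primitive, and heap_page_needs_freeze() a made-up name for
whatever per-page test the patch ends up using):

    if (!ConditionalLockBufferForCleanup(buf))
    {
        bool        needs_freeze = false;

        if (scan_all)
        {
            /*
             * Peek at the page under a share lock to see whether any xids
             * are old enough that we really must freeze them here and now.
             */
            LockBuffer(buf, BUFFER_LOCK_SHARE);
            needs_freeze = heap_page_needs_freeze(onerel, buf, FreezeLimit);
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        }

        if (!needs_freeze)
        {
            /* Nothing forces us to process this page now: skip it. */
            ReleaseBuffer(buf);
            continue;
        }

        /* Anti-wraparound case with old xids: wait for the cleanup lock. */
        LockBufferForCleanup(buf);
    }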
Enhancements to that are that I don't see any particular reason why
Also, ISTM that LockBufferForCleanup() waits for just a single buffer,
but it could equally well wait for multiple buffers at the same time.
By this, we would then be able to simply register our interest in
multiple buffers and get woken as soon as one of them were free. That
way we could read the blocks sequentially, but lock and clean them out
of sequence if necessary. Do this in chunks, so it plays nicely with
buffer strategy. (Patch doesn't do that yet).
I doubt this would help much. The real issue is with open cursors,
and those can easily be left open for long enough that those
optimizations won't help. I think the patch as it stands is probably
gets just about all of the benefit that can be had from this approach
while still being reasonably simple.
(Not sure if the patch handles vacuum map correctly if we skip the
page, but it's a reasonable prototype for discussion).
Yeah. I think that should be OK, but:
- It looks to me like you haven't done anything about the second heap
pass. That should probably get a similar fix.
- I think that this is going to screw up the reltuples calculation
unless we fudge it somehow. The number of scanned pages has already
been incremented by the time your new code is reached.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that should be OK, but:
- It looks to me like you haven't done anything about the second heap
pass. That should probably get a similar fix.
I was assuming this worked with Pavan's patch to remove the second pass.
Not in any rush to commit this, so will wait till that is through.
- I think that this is going to screw up the reltuples calculation
unless we fudge it somehow. The number of scanned pages has already
been incremented by the time your new code is reached.
Yeh, I'll have a look at that in more detail. Thanks for the review.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that should be OK, but:
- It looks to me like you haven't done anything about the second heap
pass. That should probably get a similar fix.
I was assuming this worked with Pavan's patch to remove second pass.
It's not entirely certain that will make it into 9.2, so I would
rather get this done first. If you want I can pick up what you've
done and send you back a version that addresses this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 3, 2011 at 2:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that should be OK, but:
- It looks to me like you haven't done anything about the second heap
pass. That should probably get a similar fix.
I was assuming this worked with Pavan's patch to remove second pass.
It's not entirely certain that will make it into 9.2, so I would
rather get this done first. If you want I can pick up what you've
done and send you back a version that addresses this.
OK, that seems efficient. Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services