buffer locking fix for lazy_scan_heap
I discovered when researching the issue of index-only scans and Hot
Standby that there's a bug (for which I'm responsible) in
lazy_scan_heap[1]. Actually, the code has been like this for a long
time, but I needed to change it when I did the crash-safe visibility
map work, and I didn't. The problem is that lazy_scan_heap() releases
the lock on the buffer it's whacking around before it sets the
visibility map bit. Thus, it's possible for the page-level bit to be
cleared by some other backend before the visibility map bit gets set.
In previous releases that was arguably OK, since the worst thing that
could happen is a postponement of vacuuming on that page until the
next anti-wraparound cycle; but now that we have index-only scans, it
can cause queries to return wrong answers.
Attached is a patch to fix the problem, which rearranges things so
that we set the bit in the visibility map before releasing the buffer
lock. Similar work has already been done for inserts, updates, and
deletes, which in previous releases would *clear* the visibility map
bit after releasing the page lock, but no longer do; the vacuum case,
which *sets* the bit, was overlooked. Our convention in those related
cases is that it's acceptable to hold the buffer lock while setting
the visibility map bit and emitting the WAL record, but there must be
no possibility of I/O to read in the visibility map page while the
buffer lock is held. This turned out to be pretty simple because, as
it turns out, lazy_scan_heap() is almost always holding a pin on the
correct visibility map page anyway, so I only had to tweak
things slightly to guarantee it. As a pleasant side effect, the new
code is actually quite a bit simpler and more readable than the old
code, at least IMHO.
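Condensed, the new ordering looks like this (again a sketch, not a
verbatim excerpt; the cleanup-lock and pruning steps are elided):

    /* pin the VM page first, so any I/O happens before we lock the
     * heap page */
    visibilitymap_pin(onerel, blkno, &vmbuffer);

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);
    /* ... acquire cleanup lock on buf, prune, compute all_visible ... */

    if (all_visible && !all_visible_according_to_vm)
    {
        if (!PageIsAllVisible(page))
        {
            PageSetAllVisible(page);
            MarkBufferDirty(buf);
        }
        /* still holding the heap page lock; vmbuffer is already
         * pinned, so this cannot block on a visibility map read */
        visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer);
    }
    UnlockReleaseBuffer(buf);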
While I was at it, I made a couple of minor, related changes which I
believe to be improvements. First, I arranged things so that, when we
pause the first vacuum pass to do an index vac cycle, we release any
buffer pin we're holding on the visibility map page. The fact that we
haven't done that in the past is probably harmless, but I think
there's something to be said for not holding onto buffer pins for long
periods of time when it isn't really necessary. Second, I adjusted
things so that we print a warning if the visibility map bit is set and
the page-level bit is clear, since that should never happen in 9.2+.
It is similar to the existing warning which catches the case where a
page is marked all-visible despite containing dead tuples.
Comments?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1] http://archives.postgresql.org/pgsql-hackers/2012-04/msg00682.php
Attachment: lazy-scan-heap-locking-fix.patch
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index d2cfcc0..6017147 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -490,6 +490,18 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
vacrelstats->num_dead_tuples > 0)
{
+ /*
+ * Before beginning index vacuuming, we release any pin we may hold
+ * on the visibility map page. This isn't necessary for correctness,
+ * but we do it anyway to avoid holding the pin across a lengthy,
+ * unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -510,6 +522,16 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
vacrelstats->num_index_scans++;
}
+ /*
+ * Pin the visibility map page in case we need to mark the page
+ * all-visible. In most cases this will be very cheap, because we'll
+ * already have the correct page pinned anyway. However, it's possible
+ * that (a) next_not_all_visible_block is covered by a different VM page
+ * than the current block or (b) we released our pin and did a cycle of
+ * index vacuuming.
+ */
+ visibilitymap_pin(onerel, blkno, &vmbuffer);
+
buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
@@ -600,26 +622,15 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
empty_pages++;
freespace = PageGetHeapFreeSpace(page);
+ /* empty pages are always all-visible */
if (!PageIsAllVisible(page))
{
PageSetAllVisible(page);
MarkBufferDirty(buf);
+ visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer);
}
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
- /* Update the visibility map */
- if (!all_visible_according_to_vm)
- {
- visibilitymap_pin(onerel, blkno, &vmbuffer);
- LockBuffer(buf, BUFFER_LOCK_SHARE);
- if (PageIsAllVisible(page))
- visibilitymap_set(onerel, blkno, InvalidXLogRecPtr,
- vmbuffer);
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
- }
-
- ReleaseBuffer(buf);
+ UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(onerel, blkno, freespace);
continue;
}
@@ -834,11 +845,26 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
freespace = PageGetHeapFreeSpace(page);
- /* Update the all-visible flag on the page */
- if (!PageIsAllVisible(page) && all_visible)
+ /* mark page all-visible, if appropriate */
+ if (all_visible && !all_visible_according_to_vm)
{
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
+ if (!PageIsAllVisible(page))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ }
+ visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer);
+ }
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if
+ * the page-level bit is clear.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(page))
+ {
+ elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ relname, blkno);
+ visibilitymap_clear(onerel, blkno, vmbuffer);
}
/*
@@ -859,30 +885,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
relname, blkno);
PageClearAllVisible(page);
- SetBufferCommitInfoNeedsSave(buf);
-
- /*
- * Normally, we would drop the lock on the heap page before
- * updating the visibility map, but since this case shouldn't
- * happen anyway, don't worry about that.
- */
- visibilitymap_pin(onerel, blkno, &vmbuffer);
+ MarkBufferDirty(buf);
visibilitymap_clear(onerel, blkno, vmbuffer);
}
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
- /* Update the visibility map */
- if (!all_visible_according_to_vm && all_visible)
- {
- visibilitymap_pin(onerel, blkno, &vmbuffer);
- LockBuffer(buf, BUFFER_LOCK_SHARE);
- if (PageIsAllVisible(page))
- visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer);
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
- }
-
- ReleaseBuffer(buf);
+ UnlockReleaseBuffer(buf);
/* Remember the location of the last page with nonremovable tuples */
if (hastup)
Hi,
sorry for asking about this really old commit.
I could not understand why "releases the lock on the buffer" is
a problem, since vacuum will re-lock the page and check the
page-level bit again before setting the VM bit. Did I miss something?
Thanks for your help.
--
GaoZengqi
pgf00a@gmail.com
zengqigao@gmail.com
On Mon, Aug 3, 2015 at 12:03 AM, 高增琦 <pgf00a@gmail.com> wrote:
> sorry for asking about this really old commit.
> I could not understand why "releases the lock on the buffer" is
> a problem, since vacuum will re-lock the page and check the
> page-level bit again before setting the VM bit. Did I miss something?
Hmm, you might be right. It certainly seems safer to do all the work
without releasing the buffer lock temporarily in the middle, but maybe
the old way wasn't actually broken.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company