Proposal: Log inability to lock pages during vacuum

Started by Jim Nasby about 11 years ago, 53 messages
#1 Jim Nasby
Jim.Nasby@BlueTreble.com

Currently, a non-freeze vacuum will punt on any page it can't get a cleanup lock on, with no retry. Presumably this should be a rare occurrence, but I think it's bad that we just assume that and won't warn the user if something bad is going on.

My thought is that if we skip any pages, we elog(LOG) how many we skipped. If we skip more than 1% of the pages we visited (not relpages), then elog(WARNING) instead.
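To make the thresholds concrete, here is a minimal standalone sketch of that decision (fprintf stands in for elog, and the function and variable names are invented for illustration):

#include <stdio.h>

/*
 * Illustration only: the proposed end-of-vacuum report.  "pages_skipped" is
 * how many pages were punted on because the cleanup lock was unavailable;
 * "pages_visited" is how many pages this vacuum actually visited (not
 * relpages).
 */
static void
report_skipped_pages(double pages_skipped, double pages_visited)
{
	if (pages_skipped <= 0)
		return;					/* nothing to report */

	if (pages_visited > 0 && pages_skipped / pages_visited > 0.01)
		fprintf(stderr, "WARNING:  skipped %.0f of %.0f visited pages (cleanup lock not available)\n",
				pages_skipped, pages_visited);
	else
		fprintf(stderr, "LOG:  skipped %.0f of %.0f visited pages (cleanup lock not available)\n",
				pages_skipped, pages_visited);
}

int
main(void)
{
	report_skipped_pages(3, 1000);		/* 0.3% -> LOG */
	report_skipped_pages(30, 1000);		/* 3% -> WARNING */
	return 0;
}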

Comments?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#2 Greg Stark
stark@mit.edu
In reply to: Jim Nasby (#1)
Re: Proposal: Log inability to lock pages during vacuum

On Mon, Oct 20, 2014 at 2:57 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
lock on, with no retry. Presumably this should be a rare occurrence, but I
think it's bad that we just assume that and won't warn the user if something
bad is going on.

My thought is that if we skip any pages elog(LOG) how many we skipped. If we
skip more than 1% of the pages we visited (not relpages) then elog(WARNING)
instead.

Is there some specific failure you've run into where a page was stuck
in a pinned state and never got vacuumed?

I would like to see a more systematic way of going about this. What
LSN or timestamp is associated with the oldest unvacuumed page? How
many times have we tried to visit it? What do those numbers look like
overall -- i.e. what's the median number of times it takes to vacuum a
page and what does the distribution look like of the unvacuumed ages?

With that data it should be possible to determine if the behaviour is
actually working well and where to draw the line to determine outliers
that might represent bugs.
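Purely as a hypothetical shape for that bookkeeping (nothing like this exists in the tree), one entry per page that vacuum failed to cleanup-lock might look like:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef uint32_t BlockNumber;	/* stand-in for PostgreSQL's BlockNumber */

/*
 * Hypothetical per-page record, enough to answer "how old is the oldest
 * unvacuumed page?" and "how many attempts does a page typically need?".
 */
typedef struct SkippedPage
{
	BlockNumber	blkno;			/* the page we failed to cleanup-lock */
	uint64_t	first_skip_lsn;	/* WAL position when it was first skipped */
	time_t		first_skip_time;	/* or a timestamp, as suggested above */
	int			attempts;		/* vacuums that have tried and failed */
} SkippedPage;

int
main(void)
{
	printf("one entry costs %zu bytes\n", sizeof(SkippedPage));
	return 0;
}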

--
greg


#3 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Greg Stark (#2)
Re: Proposal: Log inability to lock pages during vacuum

On 10/20/14, 10:29 AM, Greg Stark wrote:

On Mon, Oct 20, 2014 at 2:57 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
lock on, with no retry. Presumably this should be a rare occurrence, but I
think it's bad that we just assume that and won't warn the user if something
bad is going on.

My thought is that if we skip any pages elog(LOG) how many we skipped. If we
skip more than 1% of the pages we visited (not relpages) then elog(WARNING)
instead.

Is there some specific failure you've run into where a page was stuck
in a pinned state and never got vacuumed?

Not that I know of... but how would I actually know? Having that info available is the point of my proposal. :)

I would like to see a more systematic way of going about this. What
LSN or timestamp is associated with the oldest unvacuumed page? How
many times have we tried to visit it? What do those numbers look like
overall -- i.e. what's the median number of times it takes to vacuum a
page and what does the distribution look like of the unvacuumed ages?

With that data it should be possible to determine if the behaviour is
actually working well and where to draw the line to determine outliers
that might represent bugs.

I agree we could use better data about/for vacuum (see /messages/by-id/544468C1.6050101@BlueTreble.com).

In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#4 Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#3)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-10-20 19:18:31 -0500, Jim Nasby wrote:

In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :)

It does happen, and not infrequently. Just not enough pages to normally
cause significant bloat. The most likely place where it happens is very
small tables that all connections hit with a high frequency. Starting to
issue high-volume log spew for a nonexistent problem isn't helping.

If you're super convinced this is urgent then add it as a *single*
datapoint inside the existing messages. But I think there's loads of
stuff in vacuum logging that's more important than this.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#4)
Re: Proposal: Log inability to lock pages during vacuum

On 10/20/14, 7:31 PM, Andres Freund wrote:

On 2014-10-20 19:18:31 -0500, Jim Nasby wrote:

In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :)

It does happen, and not infrequently. Just not enough pages to normally
cause significant bloat. The most likely place where it happens is very
small tables that all connections hit with a high frequency. Starting to
issue high-volume log spew for a nonexistent problem isn't helping.

How'd you determine that? Is there some way to measure this? I'm not doubting you; I just don't want to work on a problem that's already solved.

If you're super convinced this is urgent then add it as a *single*
datapoint inside the existing messages. But I think there's loads of
stuff in vacuum logging that's more important than this.

See my original proposal; at its most intrusive this would issue one warning per (auto)vacuum if it was over a certain threshold. I would think that a DBA would really like to know when this happens, but if we think that's too much spew we can limit it to normal vacuum logging.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#6 Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#5)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-10-20 19:43:38 -0500, Jim Nasby wrote:

On 10/20/14, 7:31 PM, Andres Freund wrote:

On 2014-10-20 19:18:31 -0500, Jim Nasby wrote:

In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :)

It does happen, and not infrequently. Just not enough pages to normally
cause significant bloat. The most likely place where it happens is very
small tables that all connections hit with a high frequency. Starting to
issue high-volume log spew for a nonexistent problem isn't helping.

How'd you determine that? Is there some way to measure this?

You'd see individual pages with too-old dead rows in them.

If you're super convinced this is urgent then add it as a *single*
datapoint inside the existing messages. But I think there's loads of
stuff in vacuum logging that's more important than this.

See my original proposal; at its most intrusive this would issue one
warning per (auto)vacuum if it was over a certain threshold.

Which would vastly increase the log output for setups with small tables
and a nonzero log_autovacuum_min_duration.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#1)
Re: Proposal: Log inability to lock pages during vacuum

Jim Nasby wrote:

Currently, a non-freeze vacuum will punt on any page it can't get a
cleanup lock on, with no retry. Presumably this should be a rare
occurrence, but I think it's bad that we just assume that and won't
warn the user if something bad is going on.

I think if you really want to attack this problem, rather than just
being noisy about it, what you could do is to keep a record of which
page numbers you had to skip, and then once you're done with your first
scan you go back and retry the lock on the pages you skipped.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#8 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#7)
Re: Proposal: Log inability to lock pages during vacuum

On 10/21/14, 5:39 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

Currently, a non-freeze vacuum will punt on any page it can't get a
cleanup lock on, with no retry. Presumably this should be a rare
occurrence, but I think it's bad that we just assume that and won't
warn the user if something bad is going on.

I think if you really want to attack this problem, rather than just
being noisy about it, what you could do is to keep a record of which
page numbers you had to skip, and then once you're done with your first
scan you go back and retry the lock on the pages you skipped.

I'm OK with that if the community is; I was just trying for minimum invasiveness.

If I go this route, I'd like some input though...

- How to handle storing the block IDs. A fixed-size array or something fancier? What should we limit it to, especially since we're already allocating maintenance_work_mem for the TID array.

- What happens if we run out of space to remember skipped blocks? I could do something like what we do for running out of space in the dead_tuples array, but I'm worried that will add a serious amount of complexity, especially since re-processing these blocks could be what actually pushes us over the limit.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#8)
Re: Proposal: Log inability to lock pages during vacuum

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

- What happens if we run out of space to remember skipped blocks?

You forget some, and are no worse off than today. (This might be an
event worthy of logging, if the array is large enough that we don't
expect it to happen often ...)

regards, tom lane


#10 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#9)
Re: Proposal: Log inability to lock pages during vacuum

On 10/21/14, 6:05 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

- What happens if we run out of space to remember skipped blocks?

You forget some, and are no worse off than today. (This might be an
event worthy of logging, if the array is large enough that we don't
expect it to happen often ...)

Makes sense. I'll see if there's some reasonable way to retry pages when the array fills up.

I'll make the array 2k in size; that allows for 512 pages without spending a bunch of memory.
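The arithmetic behind that, as a standalone sketch (PostgreSQL block numbers are 32-bit unsigned integers, so 512 entries fit in 2 kB):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;	/* 4 bytes per block number */

#define VACUUM_MAX_RETRY_PAGES 512

int
main(void)
{
	BlockNumber retry_pages[VACUUM_MAX_RETRY_PAGES];

	/* 512 entries * 4 bytes = 2048 bytes, i.e. the "2k" above */
	printf("retry_pages uses %zu bytes\n", sizeof(retry_pages));
	return 0;
}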
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#11 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#10)
1 attachment(s)
Re: Proposal: Log inability to lock pages during vacuum

On 10/29/14, 11:49 AM, Jim Nasby wrote:

On 10/21/14, 6:05 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

- What happens if we run out of space to remember skipped blocks?

You forget some, and are no worse off than today. (This might be an
event worthy of logging, if the array is large enough that we don't
expect it to happen often ...)

Makes sense. I'll see if there's some reasonable way to retry pages when the array fills up.

I'll make the array 2k in size; that allows for 512 pages without spending a bunch of memory.

Attached is a patch for this. It also adds logging of unobtainable cleanup locks, and refactors scanning a page for vacuum into its own function.

Anyone reviewing this might want to look at https://github.com/decibel/postgres/commit/69ab22f703d577cbb3d8036e4e42563977bcf74b, which is the refactor with no whitespace changes.

I've verified this works correctly by connecting to a backend with gdb and halting it with a page pinned. Both vacuum and vacuum freeze on that table do what's expected, but I also get this warning (which AFAICT is a false positive):

decibel@decina.local=# vacuum verbose i;
INFO: vacuuming "public.i"
INFO: "i": found 0 removable, 399774 nonremovable row versions in 1769 out of 1770 pages
DETAIL: 200000 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
Retried cleanup lock on 0 pages, retry failed on 1, skipped retry on 0.
CPU 0.00s/0.06u sec elapsed 12.89 sec.
WARNING: buffer refcount leak: [105] (rel=base/16384/16385, blockNum=0, flags=0x106, refcount=2 1)
VACUUM

I am doing a simple static allocation of retry_pages[]; my understanding is that it will only exist for the duration of this function, so it's OK. If not I'll palloc it. If it is OK then I'll do the same for the freeze array.
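For reference, a small sketch of the two allocation choices being weighed here (malloc/free stand in for palloc/pfree so the example compiles on its own; this is not the patch's code):

#include <stdlib.h>

typedef unsigned int BlockNumber;

#define VACUUM_MAX_RETRY_PAGES 512

static void
scan_heap_sketch(void)
{
	/*
	 * Automatic (stack) storage: lives exactly as long as this function,
	 * which is all the retry list needs, and costs only 2 kB of stack.
	 */
	BlockNumber retry_stack[VACUUM_MAX_RETRY_PAGES];

	/*
	 * The alternative: allocate it dynamically.  In the backend this would
	 * be palloc()/pfree() in the current memory context.
	 */
	BlockNumber *retry_heap =
		malloc(VACUUM_MAX_RETRY_PAGES * sizeof(BlockNumber));

	(void) retry_stack;
	free(retry_heap);
}

int
main(void)
{
	scan_heap_sketch();
	return 0;
}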
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

0001-Vacuum-cleanup-lock-retry.patch (text/plain, charset=UTF-8)
From 1752751903a8d51b7b3b618072b6b0687f9f141c Mon Sep 17 00:00:00 2001
From: Jim Nasby <Jim.Nasby@BlueTreble.com>
Date: Thu, 6 Nov 2014 14:42:52 -0600
Subject: [PATCH] Vacuum cleanup lock retry

This patch will retry failed attempts to obtain the cleanup lock on a
buffer. It remembers failed block numbers in an array and retries after
vacuuming the relation. The array is currently fixed at 512 entries;
additional lock failures will not be re-attempted.

This patch also adds counters to report on failures, as well as
refactoring the guts of page vacuum scans into it's own function.
---
 src/backend/commands/vacuumlazy.c | 964 +++++++++++++++++++++-----------------
 1 file changed, 541 insertions(+), 423 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 3778d9d..240113f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -96,6 +96,14 @@
  */
 #define SKIP_PAGES_THRESHOLD	((BlockNumber) 32)
 
+/*
+ * Instead of blindly skipping pages that we can't immediately acquire a
+ * cleanup lock for (assuming we're not freezing), we keep a list of pages we
+ * initially skipped, up to VACUUM_MAX_RETRY_PAGES. We retry those pages at the
+ * end of vacuuming.
+ */
+#define VACUUM_MAX_RETRY_PAGES	512
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -143,6 +151,10 @@ static void lazy_vacuum_index(Relation indrel,
 static void lazy_cleanup_index(Relation indrel,
 				   IndexBulkDeleteResult *stats,
 				   LVRelStats *vacrelstats);
+static void lazy_scan_page(Relation onerel, LVRelStats *vacrelstats,
+				BlockNumber blkno, Buffer buf, Buffer vmbuffer, xl_heap_freeze_tuple *frozen,
+				int nindexes, bool all_visible_according_to_vm,
+				BlockNumber *empty_pages, BlockNumber *vacuumed_pages, double *nunused);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
@@ -422,13 +434,15 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 {
 	BlockNumber nblocks,
 				blkno;
-	HeapTupleData tuple;
 	char	   *relname;
 	BlockNumber empty_pages,
-				vacuumed_pages;
-	double		num_tuples,
-				tups_vacuumed,
-				nkeep,
+				vacuumed_pages,
+				retry_pages[VACUUM_MAX_RETRY_PAGES];
+	int			retry_pages_insert_ptr;
+	double		retry_page_count,
+				retry_fail_count,
+				retry_pages_skipped,
+				cleanup_lock_waits,
 				nunused;
 	IndexBulkDeleteResult **indstats;
 	int			i;
@@ -446,8 +460,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 					get_namespace_name(RelationGetNamespace(onerel)),
 					relname)));
 
-	empty_pages = vacuumed_pages = 0;
-	num_tuples = tups_vacuumed = nkeep = nunused = 0;
+	empty_pages = vacuumed_pages = retry_pages_insert_ptr = retry_page_count =
+		retry_fail_count = retry_pages_skipped = cleanup_lock_waits = nunused = 0;
 
 	indstats = (IndexBulkDeleteResult **)
 		palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
@@ -508,18 +522,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
-		Page		page;
-		OffsetNumber offnum,
-					maxoff;
-		bool		tupgone,
-					hastup;
-		int			prev_dead_count;
-		int			nfrozen;
-		Size		freespace;
 		bool		all_visible_according_to_vm;
-		bool		all_visible;
-		bool		has_dead_tuples;
-		TransactionId visibility_cutoff_xid = InvalidTransactionId;
 
 		if (blkno == next_not_all_visible_block)
 		{
@@ -617,6 +620,19 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			 */
 			if (!scan_all)
 			{
+				/*
+				 * Remember the page that we're skipping, but only if there's
+				 * still room.
+				 *
+				 * XXX it would be even better if we retried as soon as we
+				 * filled retry_pages, but we should get very few retry pages
+				 * anyway so lets not go overboard.
+				 */
+				if (retry_pages_insert_ptr<VACUUM_MAX_RETRY_PAGES)
+					retry_pages[retry_pages_insert_ptr++] = blkno;
+				else
+					retry_pages_skipped++;
+
 				ReleaseBuffer(buf);
 				continue;
 			}
@@ -641,420 +657,55 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			}
 			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 			LockBufferForCleanup(buf);
+			cleanup_lock_waits++;
 			/* drop through to normal processing */
 		}
 
-		vacrelstats->scanned_pages++;
-
-		page = BufferGetPage(buf);
-
-		if (PageIsNew(page))
-		{
-			/*
-			 * An all-zeroes page could be left over if a backend extends the
-			 * relation but crashes before initializing the page. Reclaim such
-			 * pages for use.
-			 *
-			 * We have to be careful here because we could be looking at a
-			 * page that someone has just added to the relation and not yet
-			 * been able to initialize (see RelationGetBufferForTuple). To
-			 * protect against that, release the buffer lock, grab the
-			 * relation extension lock momentarily, and re-lock the buffer. If
-			 * the page is still uninitialized by then, it must be left over
-			 * from a crashed backend, and we can initialize it.
-			 *
-			 * We don't really need the relation lock when this is a new or
-			 * temp relation, but it's probably not worth the code space to
-			 * check that, since this surely isn't a critical path.
-			 *
-			 * Note: the comparable code in vacuum.c need not worry because
-			 * it's got exclusive lock on the whole relation.
-			 */
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			LockRelationForExtension(onerel, ExclusiveLock);
-			UnlockRelationForExtension(onerel, ExclusiveLock);
-			LockBufferForCleanup(buf);
-			if (PageIsNew(page))
-			{
-				ereport(WARNING,
-				(errmsg("relation \"%s\" page %u is uninitialized --- fixing",
-						relname, blkno)));
-				PageInit(page, BufferGetPageSize(buf), 0);
-				empty_pages++;
-			}
-			freespace = PageGetHeapFreeSpace(page);
-			MarkBufferDirty(buf);
-			UnlockReleaseBuffer(buf);
-
-			RecordPageWithFreeSpace(onerel, blkno, freespace);
-			continue;
-		}
-
-		if (PageIsEmpty(page))
-		{
-			empty_pages++;
-			freespace = PageGetHeapFreeSpace(page);
-
-			/* empty pages are always all-visible */
-			if (!PageIsAllVisible(page))
-			{
-				START_CRIT_SECTION();
-
-				/* mark buffer dirty before writing a WAL record */
-				MarkBufferDirty(buf);
-
-				/*
-				 * It's possible that another backend has extended the heap,
-				 * initialized the page, and then failed to WAL-log the page
-				 * due to an ERROR.  Since heap extension is not WAL-logged,
-				 * recovery might try to replay our record setting the page
-				 * all-visible and find that the page isn't initialized, which
-				 * will cause a PANIC.  To prevent that, check whether the
-				 * page has been previously WAL-logged, and if not, do that
-				 * now.
-				 */
-				if (RelationNeedsWAL(onerel) &&
-					PageGetLSN(page) == InvalidXLogRecPtr)
-					log_newpage_buffer(buf, true);
-
-				PageSetAllVisible(page);
-				visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
-								  vmbuffer, InvalidTransactionId);
-				END_CRIT_SECTION();
-			}
-
-			UnlockReleaseBuffer(buf);
-			RecordPageWithFreeSpace(onerel, blkno, freespace);
-			continue;
-		}
-
-		/*
-		 * Prune all HOT-update chains in this page.
-		 *
-		 * We count tuples removed by the pruning step as removed by VACUUM.
-		 */
-		tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
-										 &vacrelstats->latestRemovedXid);
+		lazy_scan_page(onerel, vacrelstats, blkno, buf, vmbuffer, frozen,
+						nindexes, all_visible_according_to_vm,
+						&empty_pages, &vacuumed_pages, &nunused);
+	}
 
-		/*
-		 * Now scan the page to collect vacuumable items and check for tuples
-		 * requiring freezing.
-		 */
-		all_visible = true;
-		has_dead_tuples = false;
-		nfrozen = 0;
-		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
-		maxoff = PageGetMaxOffsetNumber(page);
+	/*
+	 * Make a second attempt to acquire the cleanup lock on pages we skipped.
+	 * Note that we don't have to worry about !scan_all here.
+	 */
 
-		/*
-		 * Note: If you change anything in the loop below, also look at
-		 * heap_page_is_all_visible to see if that needs to be changed.
-		 */
-		for (offnum = FirstOffsetNumber;
-			 offnum <= maxoff;
-			 offnum = OffsetNumberNext(offnum))
+	if (retry_pages_insert_ptr)
+	{
+		for (i = 0; i < retry_pages_insert_ptr; i++)
 		{
-			ItemId		itemid;
-
-			itemid = PageGetItemId(page, offnum);
-
-			/* Unused items require no processing, but we count 'em */
-			if (!ItemIdIsUsed(itemid))
-			{
-				nunused += 1;
-				continue;
-			}
-
-			/* Redirect items mustn't be touched */
-			if (ItemIdIsRedirected(itemid))
-			{
-				hastup = true;	/* this page won't be truncatable */
-				continue;
-			}
-
-			ItemPointerSet(&(tuple.t_self), blkno, offnum);
-
-			/*
-			 * DEAD item pointers are to be vacuumed normally; but we don't
-			 * count them in tups_vacuumed, else we'd be double-counting (at
-			 * least in the common case where heap_page_prune() just freed up
-			 * a non-HOT tuple).
-			 */
-			if (ItemIdIsDead(itemid))
-			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
-				all_visible = false;
-				continue;
-			}
-
-			Assert(ItemIdIsNormal(itemid));
-
-			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
-			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(onerel);
-
-			tupgone = false;
-
-			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
-			{
-				case HEAPTUPLE_DEAD:
-
-					/*
-					 * Ordinarily, DEAD tuples would have been removed by
-					 * heap_page_prune(), but it's possible that the tuple
-					 * state changed since heap_page_prune() looked.  In
-					 * particular an INSERT_IN_PROGRESS tuple could have
-					 * changed to DEAD if the inserter aborted.  So this
-					 * cannot be considered an error condition.
-					 *
-					 * If the tuple is HOT-updated then it must only be
-					 * removed by a prune operation; so we keep it just as if
-					 * it were RECENTLY_DEAD.  Also, if it's a heap-only
-					 * tuple, we choose to keep it, because it'll be a lot
-					 * cheaper to get rid of it in the next pruning pass than
-					 * to treat it like an indexed tuple.
-					 */
-					if (HeapTupleIsHotUpdated(&tuple) ||
-						HeapTupleIsHeapOnly(&tuple))
-						nkeep += 1;
-					else
-						tupgone = true; /* we can delete the tuple */
-					all_visible = false;
-					break;
-				case HEAPTUPLE_LIVE:
-					/* Tuple is good --- but let's do some validity checks */
-					if (onerel->rd_rel->relhasoids &&
-						!OidIsValid(HeapTupleGetOid(&tuple)))
-						elog(WARNING, "relation \"%s\" TID %u/%u: OID is invalid",
-							 relname, blkno, offnum);
-
-					/*
-					 * Is the tuple definitely visible to all transactions?
-					 *
-					 * NB: Like with per-tuple hint bits, we can't set the
-					 * PD_ALL_VISIBLE flag if the inserter committed
-					 * asynchronously. See SetHintBits for more info. Check
-					 * that the tuple is hinted xmin-committed because of
-					 * that.
-					 */
-					if (all_visible)
-					{
-						TransactionId xmin;
-
-						if (!HeapTupleHeaderXminCommitted(tuple.t_data))
-						{
-							all_visible = false;
-							break;
-						}
-
-						/*
-						 * The inserter definitely committed. But is it old
-						 * enough that everyone sees it as committed?
-						 */
-						xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-						if (!TransactionIdPrecedes(xmin, OldestXmin))
-						{
-							all_visible = false;
-							break;
-						}
-
-						/* Track newest xmin on page. */
-						if (TransactionIdFollows(xmin, visibility_cutoff_xid))
-							visibility_cutoff_xid = xmin;
-					}
-					break;
-				case HEAPTUPLE_RECENTLY_DEAD:
+			Buffer		buf;
+			blkno = retry_pages[i];
 
-					/*
-					 * If tuple is recently deleted then we must not remove it
-					 * from relation.
-					 */
-					nkeep += 1;
-					all_visible = false;
-					break;
-				case HEAPTUPLE_INSERT_IN_PROGRESS:
-					/* This is an expected case during concurrent vacuum */
-					all_visible = false;
-					break;
-				case HEAPTUPLE_DELETE_IN_PROGRESS:
-					/* This is an expected case during concurrent vacuum */
-					all_visible = false;
-					break;
-				default:
-					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-					break;
-			}
-
-			if (tupgone)
-			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
-				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
-											 &vacrelstats->latestRemovedXid);
-				tups_vacuumed += 1;
-				has_dead_tuples = true;
-			}
-			else
-			{
-				num_tuples += 1;
-				hastup = true;
-
-				/*
-				 * Each non-removable tuple must be checked to see if it needs
-				 * freezing.  Note we already have exclusive buffer lock.
-				 */
-				if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
-										  MultiXactCutoff, &frozen[nfrozen]))
-					frozen[nfrozen++].offset = offnum;
-			}
-		}						/* scan along page */
+			visibilitymap_pin(onerel, blkno, &vmbuffer);
 
-		/*
-		 * If we froze any tuples, mark the buffer dirty, and write a WAL
-		 * record recording the changes.  We must log the changes to be
-		 * crash-safe against future truncation of CLOG.
-		 */
-		if (nfrozen > 0)
-		{
-			START_CRIT_SECTION();
-
-			MarkBufferDirty(buf);
+			buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
+									 RBM_NORMAL, vac_strategy);
 
-			/* execute collected freezes */
-			for (i = 0; i < nfrozen; i++)
+			/* We need buffer cleanup lock so that we can prune HOT chains. */
+			if (ConditionalLockBufferForCleanup(buf))
 			{
-				ItemId		itemid;
-				HeapTupleHeader htup;
+				retry_page_count++;
 
-				itemid = PageGetItemId(page, frozen[i].offset);
-				htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
-				heap_execute_freeze_tuple(htup, &frozen[i]);
-			}
-
-			/* Now WAL-log freezing if neccessary */
-			if (RelationNeedsWAL(onerel))
+				lazy_scan_page(onerel, vacrelstats, blkno, buf, vmbuffer, frozen,
+								nindexes, visibilitymap_test(onerel, blkno, &vmbuffer),
+								&empty_pages, &vacuumed_pages, &nunused);
+			} else
 			{
-				XLogRecPtr	recptr;
-
-				recptr = log_heap_freeze(onerel, buf, FreezeLimit,
-										 frozen, nfrozen);
-				PageSetLSN(page, recptr);
+				retry_fail_count++;
 			}
-
-			END_CRIT_SECTION();
-		}
-
-		/*
-		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
-		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
-		{
-			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
-			has_dead_tuples = false;
-
-			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
-			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacuumed_pages++;
-		}
-
-		freespace = PageGetHeapFreeSpace(page);
-
-		/* mark page all-visible, if appropriate */
-		if (all_visible && !all_visible_according_to_vm)
-		{
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set the both bits
-			 * so that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, visibility_cutoff_xid);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after we checked it and before we took the buffer
-		 * content lock, so we must recheck before jumping to the conclusion
-		 * that something bad has happened.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page)
-				 && visibilitymap_test(onerel, blkno, &vmbuffer))
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 relname, blkno);
-			visibilitymap_clear(onerel, blkno, vmbuffer);
 		}
-
-		/*
-		 * It's possible for the value returned by GetOldestXmin() to move
-		 * backwards, so it's not wrong for us to see tuples that appear to
-		 * not be visible to everyone yet, while PD_ALL_VISIBLE is already
-		 * set. The real safe xmin value never moves backwards, but
-		 * GetOldestXmin() is conservative and sometimes returns a value
-		 * that's unnecessarily small, so if we see that contradiction it just
-		 * means that the tuples that we think are not visible to everyone yet
-		 * actually are, and the PD_ALL_VISIBLE flag is correct.
-		 *
-		 * There should never be dead tuples on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (PageIsAllVisible(page) && has_dead_tuples)
-		{
-			elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
-				 relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(onerel, blkno, vmbuffer);
-		}
-
-		UnlockReleaseBuffer(buf);
-
-		/* Remember the location of the last page with nonremovable tuples */
-		if (hastup)
-			vacrelstats->nonempty_pages = blkno + 1;
-
-		/*
-		 * If we remembered any tuples for deletion, then the page will be
-		 * visited again by lazy_vacuum_heap, which will compute and record
-		 * its post-compaction free space.  If not, then we're done with this
-		 * page, so remember its free space as-is.  (This path will always be
-		 * taken if there are no indexes.)
-		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
-			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
 	pfree(frozen);
 
-	/* save stats for use later */
-	vacrelstats->scanned_tuples = num_tuples;
-	vacrelstats->tuples_deleted = tups_vacuumed;
-	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
 	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
 														 nblocks,
 												  vacrelstats->scanned_pages,
-														 num_tuples);
+														 vacrelstats->scanned_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1077,6 +728,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			lazy_vacuum_index(Irel[i],
 							  &indstats[i],
 							  vacrelstats);
+		
 		/* Remove tuples from heap */
 		lazy_vacuum_heap(onerel, vacrelstats);
 		vacrelstats->num_index_scans++;
@@ -1091,21 +743,57 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		ereport(elevel,
 				(errmsg("\"%s\": removed %.0f row versions in %u pages",
 						RelationGetRelationName(onerel),
-						tups_vacuumed, vacuumed_pages)));
+						vacrelstats->tuples_deleted, vacuumed_pages)));
 
-	ereport(elevel,
-			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
-					RelationGetRelationName(onerel),
-					tups_vacuumed, num_tuples,
-					vacrelstats->scanned_pages, nblocks),
-			 errdetail("%.0f dead row versions cannot be removed yet.\n"
-					   "There were %.0f unused item pointers.\n"
-					   "%u pages are entirely empty.\n"
-					   "%s.",
-					   nkeep,
-					   nunused,
-					   empty_pages,
-					   pg_rusage_show(&ru0))));
+	if (retry_page_count || retry_fail_count || retry_pages_skipped)
+		ereport(elevel,
+				(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+						RelationGetRelationName(onerel),
+						vacrelstats->tuples_deleted, vacrelstats->scanned_tuples,
+						vacrelstats->scanned_pages, nblocks),
+				 errdetail("%.0f dead row versions cannot be removed yet.\n"
+						   "There were %.0f unused item pointers.\n"
+						   "%u pages are entirely empty.\n"
+						   "Retried cleanup lock on %.0f pages, retry failed on %.0f, skipped retry on %.0f.\n"
+						   "%s.",
+						   vacrelstats->new_dead_tuples,
+						   nunused,
+						   empty_pages,
+						   retry_page_count, retry_fail_count, retry_pages_skipped,
+						   pg_rusage_show(&ru0))
+				 ));
+	else if (cleanup_lock_waits)
+		ereport(elevel,
+				(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+						RelationGetRelationName(onerel),
+						vacrelstats->tuples_deleted, vacrelstats->scanned_tuples,
+						vacrelstats->scanned_pages, nblocks),
+				 errdetail("%.0f dead row versions cannot be removed yet.\n"
+						   "There were %.0f unused item pointers.\n"
+						   "%u pages are entirely empty.\n"
+						   "Waited for cleanup lock on %.0f pages.\n"
+						   "%s.",
+						   vacrelstats->new_dead_tuples,
+						   nunused,
+						   empty_pages,
+						   cleanup_lock_waits,
+						   pg_rusage_show(&ru0))
+				 ));
+	else
+		ereport(elevel,
+				(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+						RelationGetRelationName(onerel),
+						vacrelstats->tuples_deleted, vacrelstats->scanned_tuples,
+						vacrelstats->scanned_pages, nblocks),
+				 errdetail("%.0f dead row versions cannot be removed yet.\n"
+						   "There were %.0f unused item pointers.\n"
+						   "%u pages are entirely empty.\n"
+						   "%s.",
+						   vacrelstats->new_dead_tuples,
+						   nunused,
+						   empty_pages,
+						   pg_rusage_show(&ru0))
+				 ));
 }
 
 
@@ -1175,6 +863,436 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 			 errdetail("%s.",
 					   pg_rusage_show(&ru0))));
 }
+/*
+ * lazy_scan_page() - scan a single page for dead tuples
+ *
+ * This is broken out from lazy_scan_heap() so that we can retry cleaning pages
+ * that we couldn't get the cleanup lock on. Caller must have a cleanup lock on
+ * the heap buffer (buf), and have the appropriate visibility map buffer
+ * (vmbuffer) pinned.
+ *
+ */
+static void
+lazy_scan_page(Relation onerel, LVRelStats *vacrelstats,
+				BlockNumber blkno, Buffer buf, Buffer vmbuffer, xl_heap_freeze_tuple *frozen,
+				int nindexes, bool all_visible_according_to_vm,
+				BlockNumber *empty_pages, BlockNumber *vacuumed_pages, double *nunused)
+{
+	int				nfrozen = 0;
+	int				i;
+	Page			page;
+	OffsetNumber 	offnum,
+					maxoff;
+	HeapTupleData 	tuple;
+	bool			all_visible = true;
+	bool			has_dead_tuples = false;
+	bool			hastup = false;
+	bool			tupgone;
+	char	   		*relname = RelationGetRelationName(onerel);
+	Size			freespace;
+	TransactionId	visibility_cutoff_xid = InvalidTransactionId;
+	int				prev_dead_count = vacrelstats->num_dead_tuples;
+
+	/*
+	 * I don't see a way to check onerel against buf or vmbuffer without
+	 * BufferGetTag, which seems like overkill.
+	 */
+	Assert(BufferGetBlockNumber(buf) == blkno);
+	Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+
+	vacrelstats->scanned_pages++;
+
+	page = BufferGetPage(buf);
+
+	if (PageIsNew(page))
+	{
+		/*
+		 * An all-zeroes page could be left over if a backend extends the
+		 * relation but crashes before initializing the page. Reclaim such
+		 * pages for use.
+		 *
+		 * We have to be careful here because we could be looking at a
+		 * page that someone has just added to the relation and not yet
+		 * been able to initialize (see RelationGetBufferForTuple). To
+		 * protect against that, release the buffer lock, grab the
+		 * relation extension lock momentarily, and re-lock the buffer. If
+		 * the page is still uninitialized by then, it must be left over
+		 * from a crashed backend, and we can initialize it.
+		 *
+		 * We don't really need the relation lock when this is a new or
+		 * temp relation, but it's probably not worth the code space to
+		 * check that, since this surely isn't a critical path.
+		 *
+		 * Note: the comparable code in vacuum.c need not worry because
+		 * it's got exclusive lock on the whole relation.
+		 */
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		LockRelationForExtension(onerel, ExclusiveLock);
+		UnlockRelationForExtension(onerel, ExclusiveLock);
+		LockBufferForCleanup(buf);
+		if (PageIsNew(page))
+		{
+			ereport(WARNING,
+			(errmsg("relation \"%s\" page %u is uninitialized --- fixing",
+					relname, blkno)));
+			PageInit(page, BufferGetPageSize(buf), 0);
+			empty_pages++;
+		}
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buf);
+		UnlockReleaseBuffer(buf);
+
+		RecordPageWithFreeSpace(onerel, blkno, freespace);
+		return;
+	}
+
+	if (PageIsEmpty(page))
+	{
+		empty_pages++;
+		freespace = PageGetHeapFreeSpace(page);
+
+		/* empty pages are always all-visible */
+		if (!PageIsAllVisible(page))
+		{
+			START_CRIT_SECTION();
+
+			/* mark buffer dirty before writing a WAL record */
+			MarkBufferDirty(buf);
+
+			/*
+			 * It's possible that another backend has extended the heap,
+			 * initialized the page, and then failed to WAL-log the page
+			 * due to an ERROR.  Since heap extension is not WAL-logged,
+			 * recovery might try to replay our record setting the page
+			 * all-visible and find that the page isn't initialized, which
+			 * will cause a PANIC.  To prevent that, check whether the
+			 * page has been previously WAL-logged, and if not, do that
+			 * now.
+			 */
+			if (RelationNeedsWAL(onerel) &&
+				PageGetLSN(page) == InvalidXLogRecPtr)
+				log_newpage_buffer(buf, true);
+
+			PageSetAllVisible(page);
+			visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
+							  vmbuffer, InvalidTransactionId);
+			END_CRIT_SECTION();
+		}
+
+		UnlockReleaseBuffer(buf);
+		RecordPageWithFreeSpace(onerel, blkno, freespace);
+		return;
+	}
+
+	/*
+	 * Prune all HOT-update chains in this page.
+	 *
+	 * We count tuples removed by the pruning step as removed by VACUUM.
+	 */
+	vacrelstats->tuples_deleted += heap_page_prune(onerel, buf, OldestXmin, false,
+									 &vacrelstats->latestRemovedXid);
+
+	/*
+	 * Now scan the page to collect vacuumable items and check for tuples
+	 * requiring freezing.
+	 */
+
+	/*
+	 * Note: If you change anything in the loop below, also look at
+	 * heap_page_is_all_visible to see if that needs to be changed.
+	 */
+	maxoff = PageGetMaxOffsetNumber(page);
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemId		itemid;
+
+		itemid = PageGetItemId(page, offnum);
+
+		/* Unused items require no processing, but we count 'em */
+		if (!ItemIdIsUsed(itemid))
+		{
+			nunused += 1;
+			continue;
+		}
+
+		/* Redirect items mustn't be touched */
+		if (ItemIdIsRedirected(itemid))
+		{
+			hastup = true;	/* this page won't be truncatable */
+			continue;
+		}
+
+		ItemPointerSet(&(tuple.t_self), blkno, offnum);
+
+		/*
+		 * DEAD item pointers are to be vacuumed normally; but we don't
+		 * count them in vacrelstats->tuples_deleted, else we'd be double-counting (at
+		 * least in the common case where heap_page_prune() just freed up
+		 * a non-HOT tuple).
+		 */
+		if (ItemIdIsDead(itemid))
+		{
+			lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+			all_visible = false;
+			continue;
+		}
+
+		Assert(ItemIdIsNormal(itemid));
+
+		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+		tuple.t_len = ItemIdGetLength(itemid);
+		tuple.t_tableOid = RelationGetRelid(onerel);
+
+		tupgone = false;
+
+		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+		{
+			case HEAPTUPLE_DEAD:
+
+				/*
+				 * Ordinarily, DEAD tuples would have been removed by
+				 * heap_page_prune(), but it's possible that the tuple
+				 * state changed since heap_page_prune() looked.  In
+				 * particular an INSERT_IN_PROGRESS tuple could have
+				 * changed to DEAD if the inserter aborted.  So this
+				 * cannot be considered an error condition.
+				 *
+				 * If the tuple is HOT-updated then it must only be
+				 * removed by a prune operation; so we keep it just as if
+				 * it were RECENTLY_DEAD.  Also, if it's a heap-only
+				 * tuple, we choose to keep it, because it'll be a lot
+				 * cheaper to get rid of it in the next pruning pass than
+				 * to treat it like an indexed tuple.
+				 */
+				if (HeapTupleIsHotUpdated(&tuple) ||
+					HeapTupleIsHeapOnly(&tuple))
+					vacrelstats->new_dead_tuples += 1;
+				else
+					tupgone = true; /* we can delete the tuple */
+				all_visible = false;
+				break;
+			case HEAPTUPLE_LIVE:
+				/* Tuple is good --- but let's do some validity checks */
+				if (onerel->rd_rel->relhasoids &&
+					!OidIsValid(HeapTupleGetOid(&tuple)))
+					elog(WARNING, "relation \"%s\" TID %u/%u: OID is invalid",
+						 relname, blkno, offnum);
+
+				/*
+				 * Is the tuple definitely visible to all transactions?
+				 *
+				 * NB: Like with per-tuple hint bits, we can't set the
+				 * PD_ALL_VISIBLE flag if the inserter committed
+				 * asynchronously. See SetHintBits for more info. Check
+				 * that the tuple is hinted xmin-committed because of
+				 * that.
+				 */
+				if (all_visible)
+				{
+					TransactionId xmin;
+
+					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+					{
+						all_visible = false;
+						break;
+					}
+
+					/*
+					 * The inserter definitely committed. But is it old
+					 * enough that everyone sees it as committed?
+					 */
+					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+					if (!TransactionIdPrecedes(xmin, OldestXmin))
+					{
+						all_visible = false;
+						break;
+					}
+
+					/* Track newest xmin on page. */
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid))
+						visibility_cutoff_xid = xmin;
+				}
+				break;
+			case HEAPTUPLE_RECENTLY_DEAD:
+
+				/*
+				 * If tuple is recently deleted then we must not remove it
+				 * from relation.
+				 */
+				vacrelstats->new_dead_tuples += 1;
+				all_visible = false;
+				break;
+			case HEAPTUPLE_INSERT_IN_PROGRESS:
+				/* This is an expected case during concurrent vacuum */
+				all_visible = false;
+				break;
+			case HEAPTUPLE_DELETE_IN_PROGRESS:
+				/* This is an expected case during concurrent vacuum */
+				all_visible = false;
+				break;
+			default:
+				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+				break;
+		}
+
+		if (tupgone)
+		{
+			lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+			HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
+										 &vacrelstats->latestRemovedXid);
+			vacrelstats->tuples_deleted += 1;
+			has_dead_tuples = true;
+		}
+		else
+		{
+			vacrelstats->scanned_tuples += 1;
+			hastup = true;
+
+			/*
+			 * Each non-removable tuple must be checked to see if it needs
+			 * freezing.  Note we already have exclusive buffer lock.
+			 */
+			if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
+									  MultiXactCutoff, &frozen[nfrozen]))
+				frozen[nfrozen++].offset = offnum;
+		}
+	}						/* scan along page */
+
+	/*
+	 * If we froze any tuples, mark the buffer dirty, and write a WAL
+	 * record recording the changes.  We must log the changes to be
+	 * crash-safe against future truncation of CLOG.
+	 */
+	if (nfrozen > 0)
+	{
+		START_CRIT_SECTION();
+
+		MarkBufferDirty(buf);
+
+		/* execute collected freezes */
+		for (i = 0; i < nfrozen; i++)
+		{
+			ItemId		itemid;
+			HeapTupleHeader htup;
+
+			itemid = PageGetItemId(page, frozen[i].offset);
+			htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+			heap_execute_freeze_tuple(htup, &frozen[i]);
+		}
+
+		/* Now WAL-log freezing if neccessary */
+		if (RelationNeedsWAL(onerel))
+		{
+			XLogRecPtr	recptr;
+
+			recptr = log_heap_freeze(onerel, buf, FreezeLimit,
+									 frozen, nfrozen);
+			PageSetLSN(page, recptr);
+		}
+
+		END_CRIT_SECTION();
+	}
+
+	/*
+	 * If there are no indexes then we can vacuum the page right now
+	 * instead of doing a second scan.
+	 */
+	if (nindexes == 0 &&
+		vacrelstats->num_dead_tuples > 0)
+	{
+		/* Remove tuples from heap */
+		lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+		has_dead_tuples = false;
+
+		/*
+		 * Forget the now-vacuumed tuples, and press on, but be careful
+		 * not to reset latestRemovedXid since we want that value to be
+		 * valid.
+		 */
+		vacrelstats->num_dead_tuples = 0;
+		vacuumed_pages++;
+	}
+
+	freespace = PageGetHeapFreeSpace(page);
+
+	/* mark page all-visible, if appropriate */
+	if (all_visible && !all_visible_according_to_vm)
+	{
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed
+		 * (if checksums are not enabled).  Regardless, set the both bits
+		 * so that we get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set,
+		 * we don't need to dirty the heap page.  However, if checksums
+		 * are enabled, we do need to make sure that the heap page is
+		 * dirtied before passing it to visibilitymap_set(), because it
+		 * may be logged.  Given that this situation should only happen in
+		 * rare cases after a crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, visibility_cutoff_xid);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+	 * the page-level bit is clear.  However, it's possible that the bit
+	 * got cleared after we checked it and before we took the buffer
+	 * content lock, so we must recheck before jumping to the conclusion
+	 * that something bad has happened.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page)
+			 && visibilitymap_test(onerel, blkno, &vmbuffer))
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 relname, blkno);
+		visibilitymap_clear(onerel, blkno, vmbuffer);
+	}
+
+	/*
+	 * It's possible for the value returned by GetOldestXmin() to move
+	 * backwards, so it's not wrong for us to see tuples that appear to
+	 * not be visible to everyone yet, while PD_ALL_VISIBLE is already
+	 * set. The real safe xmin value never moves backwards, but
+	 * GetOldestXmin() is conservative and sometimes returns a value
+	 * that's unnecessarily small, so if we see that contradiction it just
+	 * means that the tuples that we think are not visible to everyone yet
+	 * actually are, and the PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be dead tuples on a page with PD_ALL_VISIBLE
+	 * set, however.
+	 */
+	else if (PageIsAllVisible(page) && has_dead_tuples)
+	{
+		elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
+			 relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(onerel, blkno, vmbuffer);
+	}
+
+	UnlockReleaseBuffer(buf);
+
+	/* Remember the location of the last page with nonremovable tuples */
+	if (hastup)
+		vacrelstats->nonempty_pages = blkno + 1;
+
+	/*
+	 * If we remembered any tuples for deletion, then the page will be
+	 * visited again by lazy_vacuum_heap, which will compute and record
+	 * its post-compaction free space.  If not, then we're done with this
+	 * page, so remember its free space as-is.  (This path will always be
+	 * taken if there are no indexes.)
+	 */
+	if (vacrelstats->num_dead_tuples == prev_dead_count)
+		RecordPageWithFreeSpace(onerel, blkno, freespace);
+}
 
 /*
  *	lazy_vacuum_page() -- free dead tuples on a page
-- 
2.1.2

#12 Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#11)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-06 14:55:37 -0600, Jim Nasby wrote:

On 10/29/14, 11:49 AM, Jim Nasby wrote:

On 10/21/14, 6:05 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

- What happens if we run out of space to remember skipped blocks?

You forget some, and are no worse off than today. (This might be an
event worthy of logging, if the array is large enough that we don't
expect it to happen often ...)

Makes sense. I'll see if there's some reasonable way to retry pages when the array fills up.

I'll make the array 2k in size; that allows for 512 pages without spending a bunch of memory.

Attached is a patch for this. It also adds logging of unobtainable cleanup locks, and refactors scanning a page for vacuum into it's own function.

Anyone reviewing this might want to look at https://github.com/decibel/postgres/commit/69ab22f703d577cbb3d8036e4e42563977bcf74b, which is the refactor with no whitespace changes.

I've verified this works correctly by connecting to a backend with gdb and halting it with a page pinned. Both vacuum and vacuum freeze on that table do what's expected, but I also get this waring (which AFAICT is a false positive):

I think the retry logic is a largely pointless complication of already
complex enough code. You're fixing a problem for which there is
absolutely no evidence of its existence. Yes, this happens
occasionally. But it's going to be so absolutely minor in comparison to
just about every other source of bloat.

So, I'm pretty strongly against retrying. I could live with adding logging
of the number of pages skipped due to not being able to acquire the
cleanup lock. I don't think that's going to help many people, but the
cost is pretty low.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Greg Stark
stark@mit.edu
In reply to: Andres Freund (#12)
Re: Proposal: Log inability to lock pages during vacuum

On Thu, Nov 6, 2014 at 9:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I think the retry logic is a largely pointless complication of already
complex enough code. You're fixing a problem for which there is
absolutely no evidence of its existence. Yes, this happens
occasionally. But it's going to be so absolutely minor in comparison to
just about every other source of bloat.

I agree bloat isn't really a threat, but what about the relfrozenxid?
If we skip even one page we don't get to advance it and retrying could
eliminate those skipped pages and allow us to avoid a vacuum freeze
which can be really painful. Of course that only works if you can be
sure you haven't overflowed and forgotten any skipped pages and if you
don't find the page still pinned every time until you eventually give
up on it.

--
greg


#14 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Greg Stark (#13)
Re: Proposal: Log inability to lock pages during vacuum

On 11/6/14, 5:40 PM, Greg Stark wrote:

On Thu, Nov 6, 2014 at 9:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I think the retry logic is a largely pointless complication of already
complex enough code. You're fixing a problem for which there is
absolutely no evidence of its existence. Yes, this happens
occasionally. But it's going to be so absolutely minor in comparison to
just about every other source of bloat.

For some reason I don't have Andres' original email, so I'll reply here: I agree with you, and my original proposal was simply to log how many pages were skipped, but that was objected to. Simply logging this extra information would be a patch of a dozen lines or less.

The problem right now is there's no way to actually obtain evidence that this is (or isn't) something to worry about, because we just silently skip pages. If we had any kind of tracking on this we could stop guessing. :(

I agree bloat isn't really a threat, but what about the relfrozenxid?
If we skip even one page we don't get to advance it and retrying could
eliminate those skipped pages and allow us to avoid a vacuum freeze
which can be really painful. Of course that only works if you can be
sure you haven't overflowed and forgotten any skipped pages and if you
don't find the page still pinned every time until you eventually give
up on it.

The overflow part shouldn't be that big a deal. Either we just bump the array size up some more, or worst-case we scan it whenever it fills (like we do when we fill vacrelstats->dead_tuples).
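A standalone sketch of that fallback -- drain the array and start over whenever it fills, much as lazy vacuum does when dead_tuples fills up (the helper names are hypothetical; this is not what the posted patch does):

#include <stdio.h>

#define MAX_RETRY_PAGES 512

typedef unsigned int BlockNumber;

static BlockNumber retry_pages[MAX_RETRY_PAGES];
static int	n_retry_pages = 0;

/* Stand-in for re-attempting the cleanup lock and scanning one page. */
static void
retry_one_page(BlockNumber blkno)
{
	printf("retrying block %u\n", blkno);
}

/*
 * Instead of forgetting blocks once the array is full, process the pending
 * retries and reset the array, then remember the new block.
 */
static void
remember_skipped_block(BlockNumber blkno)
{
	if (n_retry_pages >= MAX_RETRY_PAGES)
	{
		for (int i = 0; i < n_retry_pages; i++)
			retry_one_page(retry_pages[i]);
		n_retry_pages = 0;
	}
	retry_pages[n_retry_pages++] = blkno;
}

int
main(void)
{
	for (BlockNumber b = 0; b < 1200; b++)
		remember_skipped_block(b);	/* the array drains twice along the way */
	return 0;
}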

But like I said above, I think this is already making a mountain out of a mole-hill, until we have evidence there's a real problem.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#15 Andres Freund
andres@2ndquadrant.com
In reply to: Greg Stark (#13)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-06 23:40:18 +0000, Greg Stark wrote:

On Thu, Nov 6, 2014 at 9:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I think the retry logic is a largely pointless complication of already
complex enough code. You're fixing a problem for which there is
absolutely no evidence of its existence. Yes, this happens
occasionally. But it's going to be so absolutely minor in comparison to
just about every other source of bloat.

I agree bloat isn't really a threat, but what about the relfrozenxid?
If we skip even one page we don't get to advance it and retrying could
eliminate those skipped pages and allow us to avoid a vacuum freeze
which can be really painful. Of course that only works if you can be
sure you haven't overflowed and forgotten any skipped pages and if you
don't find the page still pinned every time until you eventually give
up on it.

I don't buy this argument. Either you're constantly vacuuming the whole
relation anyway - in which case we'll acquire the cleanup lock
unconditionally - or we're doing partial vacuums via the visibility map
anyway.
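
For reference, a simplified sketch (not a verbatim excerpt) of the branch in lazy_scan_heap() this refers to; the real code also consults lazy_check_needs_freeze() before deciding to wait:

if (!ConditionalLockBufferForCleanup(buf))
{
    if (!scan_all)
    {
        /* Partial vacuum: OK to skip this page; a later vacuum gets it. */
        ReleaseBuffer(buf);
        continue;
    }

    /* Whole-table (anti-wraparound) vacuum: the page must be processed. */
    LockBufferForCleanup(buf);
}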

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#14)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-06 19:03:20 -0600, Jim Nasby wrote:

On 11/6/14, 5:40 PM, Greg Stark wrote:

On Thu, Nov 6, 2014 at 9:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I think the retry logic is a largely pointless complication of already
complex enough code. You're fixing a problem for which there is
absolutely no evidence of its existence. Yes, this happens
occasionally. But it's going to be so absolutely minor in comparison to
just about every other source of bloat.

For some reason I don't have Andres' original email, so I'll reply
here: I agree with you, and my original proposal was simply to log how
many pages were skipped, but that was objected to. Simply logging this
extra information would be a patch of a dozen lines or less.

The objection was that it's unnecessary complexity. So you made the
patch a magnitude more complex *and* added logging? That doesn't make
much sense.

The problem right now is there's no way to actually obtain evidence
that this is (or isn't) something to worry about, because we just
silently skip pages. If we had any kind of tracking on this we could
stop guessing. :(

What's the worst consequence this could have? A couple pages not marked
all visible and not immediately cleaned up. That's not particularly
harmful.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Greg Stark (#13)
Re: Proposal: Log inability to lock pages during vacuum

Greg Stark wrote:

I agree bloat isn't really a threat, but what about the relfrozenxid?
If we skip even one page we don't get to advance it and retrying could
eliminate those skipped pages and allow us to avoid a vacuum freeze
which can be really painful. Of course that only works if you can be
sure you haven't overflowed and forgotten any skipped pages and if you
don't find the page still pinned every time until you eventually give
up on it.

We never advance relfrozenxid nowadays unless it's a whole-table scan;
and once we commit to doing those (vacuum_freeze_table_age is past), we
don't skip pages anymore.
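
For illustration, a short sketch of that behaviour (approximating what lazy_vacuum_rel() passes to vac_update_relstats(); not a verbatim excerpt):

/* Only a scan that visited every page may advance relfrozenxid. */
TransactionId new_frozen_xid = FreezeLimit;

if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
    new_frozen_xid = InvalidTransactionId;  /* leave relfrozenxid alone */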

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#14)
Re: Proposal: Log inability to lock pages during vacuum

On Thu, Nov 6, 2014 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The problem right now is there's no way to actually obtain evidence that
this is (or isn't) something to worry about, because we just silently skip
pages. If we had any kind of tracking on this we could stop guessing. :(

I could see logging it, but I agree with Andres and Alvaro that the
odds are strongly against there being any actual problem here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#18)
Re: Proposal: Log inability to lock pages during vacuum

On 11/7/14, 8:21 PM, Robert Haas wrote:

On Thu, Nov 6, 2014 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The problem right now is there's no way to actually obtain evidence that
this is (or isn't) something to worry about, because we just silently skip
pages. If we had any kind of tracking on this we could stop guessing. :(

I could see logging it, but I agree with Andres and Alvaro that the
odds are strongly against there being any actual problem here.

I'm fine with that. Any other objections? Andres?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#19)
Re: Proposal: Log inability to lock pages during vacuum

Jim Nasby wrote:

On 11/7/14, 8:21 PM, Robert Haas wrote:

On Thu, Nov 6, 2014 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The problem right now is there's no way to actually obtain evidence that
this is (or isn't) something to worry about, because we just silently skip
pages. If we had any kind of tracking on this we could stop guessing. :(

I could see logging it, but I agree with Andres and Alvaro that the
odds are strongly against there being any actual problem here.

I'm fine with that. Any other objections? Andres?

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#20)
Re: Proposal: Log inability to lock pages during vacuum

On 11/10/14, 11:28 AM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 11/7/14, 8:21 PM, Robert Haas wrote:

On Thu, Nov 6, 2014 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The problem right now is there's no way to actually obtain evidence that
this is (or isn't) something to worry about, because we just silently skip
pages. If we had any kind of tracking on this we could stop guessing. :(

I could see logging it, but I agree with Andres and Alvaro that the
odds are strongly against there being any actual problem here.

I'm fine with that. Any other objections? Andres?

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

IMHO that would be ideal, but I think Tom was leery of using more space for every table. If we go this route, I'm guessing we should only log pages we skip, and not log pages we had to wait for the lock on (in the case of a freeze). Also, should we still ereport this even if we are putting it in stats?

Is there a way to avoid duplicating the entire ereport call? I see I could call errstart & friends manually, but currently that's only done in elog.c.
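
For illustration, one hedged sketch of avoiding the duplication (hypothetical variable names): build the variable fragment first and feed it to a single ereport() through errdetail_internal(), so the call itself appears only once:

char *pinmsg;

if (scan_all)
    pinmsg = psprintf(_("waited for cleanup lock on %u pages"), pinned_pages);
else
    pinmsg = psprintf(_("could not acquire cleanup lock on %u pages"), pinned_pages);

ereport(elevel,
        (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions",
                relname, tups_vacuumed, num_tuples),
         errdetail_internal("%s", pinmsg)));

pfree(pinmsg);
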
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Andres Freund
andres@2ndquadrant.com
In reply to: Alvaro Herrera (#20)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-10 14:28:30 -0300, Alvaro Herrera wrote:

Jim Nasby wrote:

On 11/7/14, 8:21 PM, Robert Haas wrote:

On Thu, Nov 6, 2014 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

The problem right now is there's no way to actually obtain evidence that
this is (or isn't) something to worry about, because we just silently skip
pages. If we had any kind of tracking on this we could stop guessing. :(

I could see logging it, but I agree with Andres and Alvaro that the
odds are strongly against there being any actual problem here.

I'm fine with that. Any other objections? Andres?

If you feel that strong about the need for logging, go ahead.

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#22)
Re: Proposal: Log inability to lock pages during vacuum

Andres Freund wrote:

On 2014-11-10 14:28:30 -0300, Alvaro Herrera wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already have "last_autovacuum" columns and such in pg_stat_tables et
al, which only record the last value. My thinking regarding such
numbers is that you would save histories and put them in a chart, see
how they evolve with time. I doubt the individual numbers are worth
much, but the trends might show something interesting. As far as I
know, this is already true for most other pgstat values, with exception
of things such as live_tuples which are absolute numbers rather than
running counters.

I agree having more vacuuming data in general is a worthwhile project,
much larger than this one. Wasn't Greg Smith working on that?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#22)
Re: Proposal: Log inability to lock pages during vacuum

On 11/10/14, 12:15 PM, Andres Freund wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already report statistics on vacuums (lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding 1 or 2 counters to that. Should we add the other counters from vacuum? That would be significantly more data.

Semi-related, we should probably be reporting stats from heap truncation.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Andres Freund
andres@2ndquadrant.com
In reply to: Alvaro Herrera (#23)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-10 15:36:55 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-11-10 14:28:30 -0300, Alvaro Herrera wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already have "last_autovacuum" columns and such in pg_stat_tables et
al, which only record the last value. My thinking regarding such
numbers is that you would save histories and put them in a chart, see
how they evolve with time. I doubt the individual numbers are worth
much, but the trends might show something interesting.

I don't think they mean anything without also reporting the number of
buffers actually scanned and other related stats.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#24)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-10 12:37:29 -0600, Jim Nasby wrote:

On 11/10/14, 12:15 PM, Andres Freund wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already report statistics on vacuums
(lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding
1 or 2 counters to that. Should we add the other counters from vacuum?
That would be significantly more data.

At the very least it'd require:
* The number of buffers skipped due to the vm
* The number of buffers actually scanned
* The number of full table in contrast to partial vacuums

I think it'd require a fair amount of thinking about which values are
required to make sense of the number of skipped buffers due to not being
able to acquire the cleanup lock.

If you want to do this - and I sure don't want to stop you from it - you
should look at it from a general perspective, not from the perspective
of how skipped cleanup locks are logged.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#26)
Re: Proposal: Log inability to lock pages during vacuum

On 11/10/14, 12:56 PM, Andres Freund wrote:

On 2014-11-10 12:37:29 -0600, Jim Nasby wrote:

On 11/10/14, 12:15 PM, Andres Freund wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already report statistics on vacuums
(lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding
1 or 2 counters to that. Should we add the other counters from vacuum?
That would be significantly more data.

At the very least it'd require:
* The number of buffers skipped due to the vm
* The number of buffers actually scanned
* The number of full table in contrast to partial vacuums

If we're going to track full scan vacuums separately, I think we'd need two separate scan counters. I think (for pgstats) it'd make more sense to just count initial failure to acquire the lock in a full scan in the 'skipped page' counter. In terms of answering the question "how common is it not to get the lock", it's really the same event.

I think it'd require a fair amount of thinking about which values are
required to make sense of the number of skipped buffers due to not being
able to acquire the cleanup lock.

If you want to do this - and I sure don't want to stop you from it - you
should look at it from a general perspective, not from the perspective
of how skipped cleanup locks are logged.

Honestly, my desire at this point is just to see if there's actually a problem. Many people are asserting that this should be a very rare occurrence, but there's no way to know.

Towards that simple end, I'm a bit torn. My preference would be to simply log, and throw a warning if it's over some threshold. I believe that would give the best odds of getting feedback from users if this isn't as uncommon as we think.

I agree that ideally this would be tracked as another stat, but from that standpoint I think there are other, much more important metrics to track, and AFAIK the only reason we don't have them is that busy systems already push pgstats to its limits. We should try and fix that, but that's a much bigger issue.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#27)
Re: Proposal: Log inability to lock pages during vacuum

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

On 11/10/14, 12:56 PM, Andres Freund wrote:

If you want to do this - and I sure don't want to stop you from it - you
should look at it from a general perspective, not from the perspective
of how skipped cleanup locks are logged.

Honestly, my desire at this point is just to see if there's actually a problem. Many people are asserting that this should be a very rare occurrence, but there's no way to know.

Towards that simple end, I'm a bit torn. My preference would be to simply log, and throw a warning if it's over some threshold. I believe that would give the best odds of getting feedback from users if this isn't as uncommon as we think.

I agree that ideally this would be tracked as another stat, but from that standpoint I think there are other, much more important metrics to track, and AFAIK the only reason we don't have them is that busy systems already push pgstats to its limits. We should try and fix that, but that's a much bigger issue.

Yeah. We know that per-table pgstat counters are a pretty expensive thing
in databases with many tables. We should absolutely not be adding them on
mere speculation that the number might be interesting.

Now, that objection would not apply to a per *database* counter, but I'm
not sure if tracking the numbers at that granularity would help anyone.

On the whole, I'm +1 for just logging the events and seeing what we learn
that way. That seems like an appropriate amount of effort for finding out
whether there is really an issue.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#27)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-10 19:36:18 -0600, Jim Nasby wrote:

On 11/10/14, 12:56 PM, Andres Freund wrote:

On 2014-11-10 12:37:29 -0600, Jim Nasby wrote:

On 11/10/14, 12:15 PM, Andres Freund wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already report statistics on vacuums
(lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding
1 or 2 counters to that. Should we add the other counters from vacuum?
That would be significantly more data.

At the very least it'd require:
* The number of buffers skipped due to the vm
* The number of buffers actually scanned
* The number of full table in contrast to partial vacuums

If we're going to track full scan vacuums separately, I think we'd
need two separate scan counters.

Well, we already have the entire number of vacuums, so we'd have that.

I think (for pgstats) it'd make more sense to just count initial
failure to acquire the lock in a full scan in the 'skipped page'
counter. In terms of answering the question "how common is it not to
get the lock", it's really the same event.

It's absolutely not. You need to correlate the number of skipped pages
to the number of vacuumed pages. If you have 100k skipped in 10 billion
total scanned pages it's something entirely different than 100k in 200k
pages.

Honestly, my desire at this point is just to see if there's actually a
problem. Many people are asserting that this should be a very rare
occurrence, but there's no way to know.

Ok.

Towards that simple end, I'm a bit torn. My preference would be to
simply log, and throw a warning if it's over some threshold. I believe
that would give the best odds of getting feedback from users if this
isn't as uncommon as we think.

I'm strongly against a warning. We have absolutely no sane way of tuning
that. We'll just create a pointless warning that people will get
confused about and that they'll have to live with till the next release.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#29)
Re: Proposal: Log inability to lock pages during vacuum

On 11/11/14, 2:01 AM, Andres Freund wrote:

On 2014-11-10 19:36:18 -0600, Jim Nasby wrote:

On 11/10/14, 12:56 PM, Andres Freund wrote:

On 2014-11-10 12:37:29 -0600, Jim Nasby wrote:

On 11/10/14, 12:15 PM, Andres Freund wrote:

If what we want is to quantify the extent of the issue, would it be more
convenient to save counters to pgstat? Vacuum already sends pgstat
messages, so there's no additional traffic there.

I'm pretty strongly against that one in isolation. They'd need to be
stored somewhere and they'd need to be queryable somewhere with enough
context to make sense. To actually make sense of the numbers we'd also
need to report all the other datapoints of vacuum in some form. That's
quite a worthwhile project imo - but *much* *much* more work than this.

We already report statistics on vacuums
(lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding
1 or 2 counters to that. Should we add the other counters from vacuum?
That would be significantly more data.

At the very least it'd require:
* The number of buffers skipped due to the vm
* The number of buffers actually scanned
* The number of full table in contrast to partial vacuums

If we're going to track full scan vacuums separately, I think we'd
need two separate scan counters.

Well, we already have the entire number of vacuums, so we'd have that.

I mean number of pages scanned, but as I said below I don't think that's really necessary.

I think (for pgstats) it'd make more sense to just count initial
failure to acquire the lock in a full scan in the 'skipped page'
counter. In terms of answering the question "how common is it not to
get the lock", it's really the same event.

It's absolutely not. You need to correlate the number of skipped pages
to the number of vacuumed pages. If you have 100k skipped in 10 billion
total scanned pages it's something entirely different than 100k in 200k
pages.

If the goal here is to find out if this even is a problem then I think the critical question is not "did we vacuum", but "were we able to acquire the lock on the first try".

Obviously users will care much more about the vacuuming and not so much about the lock; but if this really is a non-issue as most tend to believe I don't think it's worth worrying about any of this (except perhaps putting dtrace/system tap probes in).

Honestly, my desire at this point is just to see if there's actually a
problem. Many people are asserting that this should be a very rare
occurrence, but there's no way to know.

Ok.

Towards that simple end, I'm a bit torn. My preference would be to
simply log, and throw a warning if it's over some threshold. I believe
that would give the best odds of getting feedback from users if this
isn't as uncommon as we think.

I'm strongly against a warning. We have absolutely no sane way of tuning
that. We'll just create a pointless warning that people will get
confused about and that they'll have to live with till the next release.

To clarify: I'm only suggesting we issue a warning if we have to skip some significant number of pages; say 5 or 0.01% of the table, whichever is greater. That's aimed directly at the goal of letting us know if this is actually a problem or not.
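
For illustration only, a sketch of that threshold check with hypothetical counter names (5 pages or 0.01% of the table, whichever is greater):

BlockNumber threshold = Max(5, vacrelstats->rel_pages / 10000);

if (!scan_all && vacrelstats->pinned_pages > threshold)
    ereport(WARNING,
            (errmsg("vacuum of \"%s\" skipped %u pages due to held buffer pins",
                    RelationGetRelationName(onerel),
                    vacrelstats->pinned_pages)));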

The reason I'm inclined to do the warning is because I don't think people will notice this otherwise. If this really isn't a problem then it won't matter; if it's a *big* problem then we'll at least know about it.

I'm thinking of an undocumented GUC to control the threshold, but I assume no one else would be on board with that?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#30)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-11-12 11:34:04 -0600, Jim Nasby wrote:

On 11/11/14, 2:01 AM, Andres Freund wrote:

On 2014-11-10 19:36:18 -0600, Jim Nasby wrote:

Towards that simple end, I'm a bit torn. My preference would be to
simply log, and throw a warning if it's over some threshold. I believe
that would give the best odds of getting feedback from users if this
isn't as uncommon as we think.

I'm strongly against a warning. We have absolutely no sane way of tuning
that. We'll just create a pointless warning that people will get
confused about and that they'll have to live with till the next release.

To clarify: I'm only suggesting we issue a warning if we have to skip some significant number of pages; say 5 or 0.01% of the table, whichever is greater. That's aimed directly at the goal of letting us know if this is actually a problem or not.

Meh. You have a 5 page relation and it'll trigger quite easily. And it's
absolutely harmless.

The reason I'm inclined to do the warning is because I don't think people will notice this otherwise. If this really isn't a problem then it won't matter; if it's a *big* problem then we'll at least know about it.

I'm thinking of an undocumented GUC to control the threshold, but I assume no one else would be on board with that?

Stop overdesigning this.

Add it to the existing message and let us be done with this. This thread
has already wasted far too much time.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#31)
Re: Proposal: Log inability to lock pages during vacuum

On Wed, Nov 12, 2014 at 12:37 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Stop overdesigning this.

Add it to the existing mesage and let us be done with this. This thread
has already wasted far too much time.

That's a little harsh, but I agree. Producing a warning here is just
going to be log-spam. We've had this behavior for years and - to my
knowledge - we have not got one complaint that can be attributed to
this feature. What used to happen is that VACUUM would get stuck for
minutes, hours, or days trying to vacuum a table because of an open
cursor, and people did complain about that. It was a serious nuisance
that is now gone. The entire argument in favor of some change here is
"even though a careful theoretical analysis indicates that this is
safe, even though it solved a real problem that was hurting our users,
and even though we have no evidence whatsoever that the change is
hurting anybody in any way whatsoever, the lack of instrumentation
means that it could possibly be hurting somebody and we wouldn't
know."

That is, of course, quite true, but you could apply the same argument
to lots of patches. It would usually be a waste of time because most
of the patches we think are good actually ARE good - but of course,
every once in a while you would find a real problem that had been
overlooked.

Maybe the right thing here is not to make any change to the source
code *at all* but for Jim to patch his own copy and try to come up with a
reproducible test case where this is actually a problem. Then, if
there is a problem, we could actually fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#28)
1 attachment(s)
Re: Proposal: Log inability to lock pages during vacuum

On 11/10/14, 7:52 PM, Tom Lane wrote:

On the whole, I'm +1 for just logging the events and seeing what we learn
that way. That seems like an appropriate amount of effort for finding out
whether there is really an issue.

Attached is a patch that does this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

0001-Log-cleanup-lock-acquisition-failures-in-vacuum.patch (text/plain)
From a8e824900d7c68e2c242b28c9c06c854f01b770a Mon Sep 17 00:00:00 2001
From: Jim Nasby <Jim.Nasby@BlueTreble.com>
Date: Sun, 30 Nov 2014 20:43:47 -0600
Subject: [PATCH] Log cleanup lock acquisition failures in vacuum

---

Notes:
    Count how many times we fail to grab the page cleanup lock on the first try,
    logging it with different wording depending on whether scan_all is true.

 doc/src/sgml/ref/vacuum.sgml      | 1 +
 src/backend/commands/vacuumlazy.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 450c94f..1272c1c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -252,6 +252,7 @@ DETAIL:  CPU 0.01s/0.06u sec elapsed 0.07 sec.
 INFO:  "onek": found 3000 removable, 1000 nonremovable tuples in 143 pages
 DETAIL:  0 dead tuples cannot be removed yet.
 There were 0 unused item pointers.
+Could not acquire cleanup lock on 0 pages.
 0 pages are entirely empty.
 CPU 0.07s/0.39u sec elapsed 1.56 sec.
 INFO:  analyzing "public.onek"
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6db6c5c..8f22ed2 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -105,6 +105,8 @@ typedef struct LVRelStats
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
+	/* number of pages we could not initially get lock on */
+	BlockNumber	nolock;
 	double		scanned_tuples; /* counts only tuples on scanned pages */
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
@@ -346,6 +348,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			ereport(LOG,
 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
 							"pages: %d removed, %d remain\n"
+							"%s cleanup lock on %u pages.\n"
 							"tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"
 							"buffer usage: %d hits, %d misses, %d dirtied\n"
 					  "avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"
@@ -356,6 +359,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 							vacrelstats->num_index_scans,
 							vacrelstats->pages_removed,
 							vacrelstats->rel_pages,
+							scan_all ? "Waited for" : "Could not acquire", vacrelstats->nolock,
 							vacrelstats->tuples_deleted,
 							vacrelstats->new_rel_tuples,
 							vacrelstats->new_dead_tuples,
@@ -611,6 +615,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		/* We need buffer cleanup lock so that we can prune HOT chains. */
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
+			vacrelstats->nolock++;
+
 			/*
 			 * If we're not scanning the whole relation to guard against XID
 			 * wraparound, it's OK to skip vacuuming a page.  The next vacuum
@@ -1101,10 +1107,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 					vacrelstats->scanned_pages, nblocks),
 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
 					   "There were %.0f unused item pointers.\n"
+					   "%s cleanup lock on %u pages.\n"
 					   "%u pages are entirely empty.\n"
 					   "%s.",
 					   nkeep,
 					   nunused,
+					   scan_all ? "Waited for" : "Could not acquire", vacrelstats->nolock,
 					   empty_pages,
 					   pg_rusage_show(&ru0))));
 }
-- 
2.1.2

#34Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#14)
Re: Proposal: Log inability to lock pages during vacuum

On 12/1/14, 11:57 AM, Andres Freund wrote:

On 2014-11-30 20:46:51 -0600, Jim Nasby wrote:

On 11/10/14, 7:52 PM, Tom Lane wrote:

On the whole, I'm +1 for just logging the events and seeing what we learn
that way. That seems like an appropriate amount of effort for finding out
whether there is really an issue.

Attached is a patch that does this.

Unless somebody protests I plan to push this soon. I'll change two
things:

* Always use the same message, independent of scan_all. For one Jim's
version would be untranslatable, for another it's not actually
correct. In most cases we'll *not* wait, even if scan_all is
true as we'll often just balk after !lazy_check_needs_freeze().

Good point.

* Change the new bit in the errdetail. "could not acquire cleanup lock"
sounds too much like an error to me. "skipped %u pinned pages" maybe?

Seems reasonable.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35Simon Riggs
simon@2ndQuadrant.com
In reply to: Jim Nasby (#1)
Re: Proposal: Log inability to lock pages during vacuum

On 20 October 2014 at 10:57, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
lock on, with no retry. Presumably this should be a rare occurrence, but I
think it's bad that we just assume that and won't warn the user if something
bad is going on.

(I'm having email problems, so I can't see later mails on this thread,
so replying here.)

Logging patch looks fine, but I would rather not add a line of text
for each VACUUM, just in case this is non-zero. I think we should add
that log line only if the blocks skipped > 0.

What I'm more interested in is what you plan to do with the
information once we get it?

The assumption that skipping blocks is something bad is strange. I
added it because VACUUM could and did regularly hang on busy tables,
which resulted in bloat because other blocks that needed cleaning
didn't get any attention.

Which is better, spend time obsessively trying to vacuum particular
blocks, or to spend the time on other blocks that are in need of
cleaning and are available to be cleaned?

Which is better, have autovacuum or system wide vacuum progress on to
other tables that need cleaning, or spend lots of effort retrying?

How do we know what is the best next action?

I'd really want to see some analysis of those things before we spend
even more cycles on this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Simon Riggs (#35)
Re: Proposal: Log inability to lock pages during vacuum

On 12/7/14, 6:16 PM, Simon Riggs wrote:

On 20 October 2014 at 10:57, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
lock on, with no retry. Presumably this should be a rare occurrence, but I
think it's bad that we just assume that and won't warn the user if something
bad is going on.

(I'm having email problems, so I can't see later mails on this thread,
so replying here.)

Logging patch looks fine, but I would rather not add a line of text
for each VACUUM, just in case this is non-zero. I think we should add
that log line only if the blocks skipped > 0.

I thought about doing that, but I'm loath to duplicate a rather large ereport call. Happy to make the change if that's the consensus though.

What I'm more interested in is what you plan to do with the
information once we get it?

The assumption that skipping blocks is something bad is strange. I
added it because VACUUM could and did regularly hang on busy tables,
which resulted in bloat because other blocks that needed cleaning
didn't get any attention.

Which is better, spend time obsessively trying to vacuum particular
blocks, or to spend the time on other blocks that are in need of
cleaning and are available to be cleaned?

Which is better, have autovacuum or system wide vacuum progress on to
other tables that need cleaning, or spend lots of effort retrying?

How do we know what is the best next action?

I'd really want to see some analysis of those things before we spend
even more cycles on this.

That's the entire point of logging this information. There is an underlying assumption that we won't actually skip many pages, but there's no data to back that up, nor is there currently any way to get that data.

My hope is that the logging shows that there isn't anything more that needs to be done here. If this is something that causes problems, at least now DBAs will be aware of it and hopefully we'll be able to identify specific problem scenarios and find a solution.

BTW, my initial proposal[1] was strictly logging. The only difference was raising it to a warning if a significant portion of the table was skipped. I only investigated retrying locks at the suggestion of others. I never intended this to become a big time sink.

[1]: "Currently, a non-freeze vacuum will punt on any page it can't get a cleanup lock on, with no retry. Presumably this should be a rare occurrence, but I think it's bad that we just assume that and won't warn the user if something bad is going on.

"My thought is that if we skip any pages elog(LOG) how many we skipped. If we skip more than 1% of the pages we visited (not relpages) then elog(WARNING) instead."
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37Simon Riggs
simon@2ndQuadrant.com
In reply to: Jim Nasby (#36)
Re: Proposal: Log inability to lock pages during vacuum

On 9 December 2014 at 06:28, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/7/14, 6:16 PM, Simon Riggs wrote:

What I'm more interested in is what you plan to do with the
information once we get it?

The assumption that skipping blocks is something bad is strange. I
added it because VACUUM could and did regularly hang on busy tables,
which resulted in bloat because other blocks that needed cleaning
didn't get any attention.

Which is better, spend time obsessively trying to vacuum particular
blocks, or to spend the time on other blocks that are in need of
cleaning and are available to be cleaned?

Which is better, have autovacuum or system wide vacuum progress on to
other tables that need cleaning, or spend lots of effort retrying?

How do we know what is the best next action?

I'd really want to see some analysis of those things before we spend
even more cycles on this.

That's the entire point of logging this information. There is an underlying
assumption that we won't actually skip many pages, but there's no data to
back that up, nor is there currently any way to get that data.

There is no such underlying assumption. You assumed there was one, but
there isn't one.

All I can say for certain is that waiting on a lock for long periods
was literally a waste of time. Now it no longer wastes time, it gets
on with vacuuming the pages it can.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Jim Nasby (#34)
1 attachment(s)
Re: Proposal: Log inability to lock pages during vacuum

On 12/01/2014 08:55 PM, Jim Nasby wrote:

On 12/1/14, 11:57 AM, Andres Freund wrote:

On 2014-11-30 20:46:51 -0600, Jim Nasby wrote:

On 11/10/14, 7:52 PM, Tom Lane wrote:

On the whole, I'm +1 for just logging the events and seeing what we learn
that way. That seems like an appropriate amount of effort for finding out
whether there is really an issue.

Attached is a patch that does this.

Unless somebody protests I plan to push this soon. I'll change two
things:

* Always use the same message, independent of scan_all. For one Jim's
version would be untranslatable, for another it's not actually
correct. In most cases we'll *not* wait, even if scan_all is
true as we'll often just balk after !lazy_check_needs_freeze().

Good point.

* Change the new bit in the errdetail. "could not acquire cleanup lock"
sounds too much like an error to me. "skipped %u pinned pages" maybe?

Seems reasonable.

Well, that's not always true either; when freezing, it doesn't skip the
pinned pages, it waits for them.

How about the attached (I also edited the comments a bit)? It looks like
this:

postgres=# vacuum verbose foo;
INFO: vacuuming "public.foo"
INFO: "foo": found 0 removable, 0 nonremovable row versions in 0 out of
7256 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1 pages were pinned.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "foo": stopping truncate due to conflicting lock request
INFO: vacuuming "pg_toast.pg_toast_16384"
INFO: index "pg_toast_16384_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

and for autovacuum log:

LOG: automatic vacuum of table "postgres.public.foo": index scans: 0
pages: 0 removed, 7256 remain, 0 pinned
tuples: 79415 removed, 513156 remain, 0 are dead but not yet removable
buffer usage: 14532 hits, 6 misses, 6241 dirtied
avg read rate: 0.003 MB/s, avg write rate: 3.413 MB/s
system usage: CPU 0.00s/0.30u sec elapsed 14.28 sec

I.e. this just says how many pages were pinned, without saying what was
done about them. That's not very meaningful to an average DBA, but
that's true for many of the numbers printed in vacuum verbose.

- Heikki

Attachments:

log-cleanup-lock-acquisition-failures-in-vacuum-2.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 450c94f..5755753 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -252,6 +252,7 @@ DETAIL:  CPU 0.01s/0.06u sec elapsed 0.07 sec.
 INFO:  "onek": found 3000 removable, 1000 nonremovable tuples in 143 pages
 DETAIL:  0 dead tuples cannot be removed yet.
 There were 0 unused item pointers.
+0 pages were pinned.
 0 pages are entirely empty.
 CPU 0.07s/0.39u sec elapsed 1.56 sec.
 INFO:  analyzing "public.onek"
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6db6c5c..af36486 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -105,6 +105,7 @@ typedef struct LVRelStats
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
+	BlockNumber	pinned_pages;	/* # of pages we could not initially lock */
 	double		scanned_tuples; /* counts only tuples on scanned pages */
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
@@ -345,7 +346,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			}
 			ereport(LOG,
 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
-							"pages: %d removed, %d remain\n"
+							"pages: %d removed, %d remain, %d pinned\n"
 							"tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"
 							"buffer usage: %d hits, %d misses, %d dirtied\n"
 					  "avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"
@@ -356,6 +357,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 							vacrelstats->num_index_scans,
 							vacrelstats->pages_removed,
 							vacrelstats->rel_pages,
+							vacrelstats->pinned_pages,
 							vacrelstats->tuples_deleted,
 							vacrelstats->new_rel_tuples,
 							vacrelstats->new_dead_tuples,
@@ -611,6 +613,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		/* We need buffer cleanup lock so that we can prune HOT chains. */
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
+			vacrelstats->pinned_pages++;
+
 			/*
 			 * If we're not scanning the whole relation to guard against XID
 			 * wraparound, it's OK to skip vacuuming a page.  The next vacuum
@@ -1101,10 +1105,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 					vacrelstats->scanned_pages, nblocks),
 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
 					   "There were %.0f unused item pointers.\n"
+					   "%u pages were pinned.\n"
 					   "%u pages are entirely empty.\n"
 					   "%s.",
 					   nkeep,
 					   nunused,
+					   vacrelstats->pinned_pages,
 					   empty_pages,
 					   pg_rusage_show(&ru0))));
 }
#39Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Heikki Linnakangas (#38)
Re: Proposal: Log inability to lock pages during vacuum

On 12/17/14, 10:20 AM, Heikki Linnakangas wrote:

* Change the new bit in the errdetail. "could not acquire cleanup lock"
sounds too much like an error to me. "skipped %u pinned pages" maybe?

Seems reasonable.

Well, that's not always true either; when freezing, it doesn't skip the pinned pages, it waits for them.

Oops. :(

I.e. this just says how many pages were pinned, without saying what was done about them. That's not very meaningful to an average DBA, but that's true for many of the numbers printed in vacuum verbose.

In this case it'll mean people will be less likely to report that this is happening, but maybe that's OK. At least if someone comes to us with a problem we'll be able to get some info from them. I'll separately look into the vacuum docs and see if we can do a better job explaining the verbose output.

BTW, what is it about a dynamic message that makes it untranslatable? Doesn't the translation happen down-stream, so that at most we'd just need two translation messages? Or worst case we could have two separate elog calls, if we wanted to go that route.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Jim Nasby (#39)
Re: Proposal: Log inability to lock pages during vacuum

On 12/17/2014 08:02 PM, Jim Nasby wrote:

BTW, what is it about a dynamic message that makes it untranslatable?
Doesn't the translation happen down-stream, so that at most we'd just
need two translation messages?

I'm not sure what you mean by down-stream. There is some explanation on
this in the localization guide in the manual:

http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

printf("Files were %s.\n", flag ? "copied" : "removed");

The translator will get three strings to translate: "Files were %s.\n",
"copied", and "removed".

Or worst case we could have two
separate elog calls, if we wanted to go that route.

Yeah, that's one option. I.e:

if (flag)
printf("Files were copied.\n");
else
printf("Files were removed.\n");

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#39)
Re: Proposal: Log inability to lock pages during vacuum

Jim Nasby wrote:

On 12/17/14, 10:20 AM, Heikki Linnakangas wrote:

* Change the new bit in the errdetail. "could not acquire cleanup lock"
sounds too much like an error to me. "skipped %u pinned pages" maybe?

Seems reasonable.

Well, that's not always true either; when freezing, it doesn't skip the pinned pages, it waits for them.

Oops. :(

I.e. this just says how many pages were pinned, without saying what was done about them. That's not very meaningful to an average DBA, but that's true for many of the numbers printed in vacuum verbose.

In this case it'll mean people will be less likely to report that this is happening, but maybe that's OK. At least if someone comes to us with a problem we'll be able to get some info from them. I'll separately look into the vacuum docs and see if we can do a better job explaining the verbose output.

BTW, what is it about a dynamic message that makes it untranslatable? Doesn't the translation happen down-stream, so that at most we'd just need two translation messages? Or worst case we could have two separate elog calls, if we wanted to go that route.

Since each part is a complete sentence, you can paste them together.
For example you could do something like

msg1 = _("Could not acquire cleanup lock.");
msg2 = _("Skipped %u pinned pages.");

ereport(INFO,
(errcode(ERRCODE_SOME_STUFF),
errmsg_internal("%s\n%s", msg1, msg2)));

The use of errmsg_internal() is to avoid having it be translated again.

FWIW I think the vacuum summary INFO message is pretty terrible already.
I would support a patch that changed it to be separately translatable
units (as long as each unit is a complete sentence.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#38)
Re: Proposal: Log inability to lock pages during vacuum

On Wed, Dec 17, 2014 at 11:20 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

LOG: automatic vacuum of table "postgres.public.foo": index scans: 0
pages: 0 removed, 7256 remain, 0 pinned
tuples: 79415 removed, 513156 remain, 0 are dead but not yet
removable
buffer usage: 14532 hits, 6 misses, 6241 dirtied
avg read rate: 0.003 MB/s, avg write rate: 3.413 MB/s
system usage: CPU 0.00s/0.30u sec elapsed 14.28 sec

I.e. this just says how many pages were pinned, without saying what was done
about them. That's not very meaningful to an average DBA, but that's true
for many of the numbers printed in vacuum verbose.

That message is extremely confusing, to my eyes. If you want to say
"pages: 0 removed, 7256 remain, 0 skipped due to pins", that would
work for me, but just saying "0 pinned" is totally wrong, because vacuum
pinned every page in the table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#42)
Re: Proposal: Log inability to lock pages during vacuum

On 12/18/14, 7:56 AM, Robert Haas wrote:

On Wed, Dec 17, 2014 at 11:20 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

LOG: automatic vacuum of table "postgres.public.foo": index scans: 0
pages: 0 removed, 7256 remain, 0 pinned
tuples: 79415 removed, 513156 remain, 0 are dead but not yet
removable
buffer usage: 14532 hits, 6 misses, 6241 dirtied
avg read rate: 0.003 MB/s, avg write rate: 3.413 MB/s
system usage: CPU 0.00s/0.30u sec elapsed 14.28 sec

I.e. this just says how many pages were pinned, without saying what was done
about them. That's not very meaningful to an average DBA, but that's true
for many of the numbers printed in vacuum verbose.

That message is extremely confusing, to my eyes. If you want to say
"pages: 0 removed, 7256 remain, 0 skipped due to pins", that would
work for me, but just say "0 pinned" is totally wrong, because vacuum
pinned every page in the table.

We have to decide on a tradeoff here. Either we end up with two different log messages (depending on scan_all) that require two different translations, or we end up with a generic message that isn't as clear.

The best option I can think of for the latter is something like "failed initial lock attempt". That's the only thing that will be true in all cases.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#43)
1 attachment(s)
Re: Proposal: Log inability to lock pages during vacuum

Jim Nasby wrote:

We have to decide on a tradeoff here. Either we end up with two
different log messages (depending on scan_all) that require two
different translations, or we end up with a generic message that isn't
as clear.

The best option I can think of for the later is something like "failed
initial lock attempt". That's the only thing that will be true in all
cases.

Here's my proposal. Instead of punting, I split the message in
separately translatable units, and emit only the ones that apply. The
code is messier this way, but I think we can live with that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

log-cleanup-lock-acquisition-failures-in-vacuum-3.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 450c94f..19fd748 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -252,6 +252,7 @@ DETAIL:  CPU 0.01s/0.06u sec elapsed 0.07 sec.
 INFO:  "onek": found 3000 removable, 1000 nonremovable tuples in 143 pages
 DETAIL:  0 dead tuples cannot be removed yet.
 There were 0 unused item pointers.
+Skipped 0 pages due to buffer pins.
 0 pages are entirely empty.
 CPU 0.07s/0.39u sec elapsed 1.56 sec.
 INFO:  analyzing "public.onek"
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6db6c5c..592ef95 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -105,6 +105,7 @@ typedef struct LVRelStats
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
+	BlockNumber	pinned_pages;	/* # of pages we could not initially lock */
 	double		scanned_tuples; /* counts only tuples on scanned pages */
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
@@ -332,6 +333,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			TimestampDifferenceExceeds(starttime, endtime,
 									   Log_autovacuum_min_duration))
 		{
+			StringInfoData	buf;
 			TimestampDifference(starttime, endtime, &secs, &usecs);
 
 			read_rate = 0;
@@ -343,27 +345,44 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 				write_rate = (double) BLCKSZ *VacuumPageDirty / (1024 * 1024) /
 							(secs + usecs / 1000000.0);
 			}
+
+			/*
+			 * This is pretty messy, but we split it up so that we can skip emitting
+			 * individual parts of the message when not applicable.
+			 */
+			initStringInfo(&buf);
+			appendStringInfo(&buf, _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"),
+							 get_database_name(MyDatabaseId),
+							 get_namespace_name(RelationGetNamespace(onerel)),
+							 RelationGetRelationName(onerel),
+							 vacrelstats->num_index_scans);
+			appendStringInfo(&buf, _("pages: %d removed, %d remain\n"),
+							 vacrelstats->pages_removed,
+							 vacrelstats->rel_pages);
+			if (scan_all)
+				appendStringInfo(&buf, _("waited for %d buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			else
+				appendStringInfo(&buf,
+								 _("skipped %d pages due to buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			appendStringInfo(&buf,
+							 _("tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"),
+							 vacrelstats->tuples_deleted,
+							 vacrelstats->new_rel_tuples,
+							 vacrelstats->new_dead_tuples);
+			appendStringInfo(&buf,
+							 _("buffer usage: %d hits, %d misses, %d dirtied\n"),
+							 VacuumPageHit,
+							 VacuumPageMiss,
+							 VacuumPageDirty);
+			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
+							 read_rate, write_rate);
+			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+
 			ereport(LOG,
-					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
-							"pages: %d removed, %d remain\n"
-							"tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"
-							"buffer usage: %d hits, %d misses, %d dirtied\n"
-					  "avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"
-							"system usage: %s",
-							get_database_name(MyDatabaseId),
-							get_namespace_name(RelationGetNamespace(onerel)),
-							RelationGetRelationName(onerel),
-							vacrelstats->num_index_scans,
-							vacrelstats->pages_removed,
-							vacrelstats->rel_pages,
-							vacrelstats->tuples_deleted,
-							vacrelstats->new_rel_tuples,
-							vacrelstats->new_dead_tuples,
-							VacuumPageHit,
-							VacuumPageMiss,
-							VacuumPageDirty,
-							read_rate, write_rate,
-							pg_rusage_show(&ru0))));
+					(errmsg_internal("%s", buf.data)));
+			pfree(buf.data);
 		}
 	}
 }
@@ -438,6 +457,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	BlockNumber next_not_all_visible_block;
 	bool		skipping_all_visible_blocks;
 	xl_heap_freeze_tuple *frozen;
+	StringInfoData	buf;
 
 	pg_rusage_init(&ru0);
 
@@ -611,6 +631,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		/* We need buffer cleanup lock so that we can prune HOT chains. */
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
+			vacrelstats->pinned_pages++;
+
 			/*
 			 * If we're not scanning the whole relation to guard against XID
 			 * wraparound, it's OK to skip vacuuming a page.  The next vacuum
@@ -1094,19 +1116,27 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 _("%.0f dead row versions cannot be removed yet.\n"),
+					 nkeep);
+	appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
+					 nunused);
+	if (!scan_all)
+		appendStringInfo(&buf, _("Skipped %u pages due to buffer pins.\n"),
+						 vacrelstats->pinned_pages);
+	appendStringInfo(&buf, _("%u pages are entirely empty.\n"),
+					 empty_pages);
+	appendStringInfo(&buf, _("%s."),
+					 pg_rusage_show(&ru0));
+
 	ereport(elevel,
 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
 					RelationGetRelationName(onerel),
 					tups_vacuumed, num_tuples,
 					vacrelstats->scanned_pages, nblocks),
-			 errdetail("%.0f dead row versions cannot be removed yet.\n"
-					   "There were %.0f unused item pointers.\n"
-					   "%u pages are entirely empty.\n"
-					   "%s.",
-					   nkeep,
-					   nunused,
-					   empty_pages,
-					   pg_rusage_show(&ru0))));
+			 errdetail_internal("%s", buf.data)));
+	pfree(buf.data);
 }
 
 
#45Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Alvaro Herrera (#44)
Re: Proposal: Log inability to lock pages during vacuum

On 12/18/2014 09:41 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

We have to decide on a tradeoff here. Either we end up with two
different log messages (depending on scan_all) that require two
different translations, or we end up with a generic message that isn't
as clear.

The best option I can think of for the latter is something like "failed
initial lock attempt". That's the only thing that will be true in all
cases.

Here's my proposal. Instead of punting, I split the message in
separately translatable units, and emit only the ones that apply. The
code is messier this way, but I think we can live with that.

Works for me.

- Heikki


#46Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#45)
Re: Proposal: Log inability to lock pages during vacuum

Heikki Linnakangas wrote:

On 12/18/2014 09:41 PM, Alvaro Herrera wrote:

Here's my proposal. Instead of punting, I split the message in
separately translatable units, and emit only the ones that apply. The
code is messier this way, but I think we can live with that.

Works for me.

Great, thanks, pushed. I tweaked it a bit more, so that it would say
either "skipped N pages" or "waited N pins" in both autovacuum and
vacuum verbose cases, but only if N > 0.
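
In sketch form (not the exact committed diff, and reusing the field names from the patch above), "only if N > 0" just means guarding the detail line on the counter:

	/*
	 * Sketch: emit the pin detail only when at least one page was affected.
	 * %u because pinned_pages is an unsigned BlockNumber (see the note
	 * downthread).
	 */
	if (vacrelstats->pinned_pages > 0)
	{
		if (scan_all)
			appendStringInfo(&buf, _("waited for %u buffer pins\n"),
							 vacrelstats->pinned_pages);
		else
			appendStringInfo(&buf,
							 _("skipped %u pages due to buffer pins\n"),
							 vacrelstats->pinned_pages);
	}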

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#46)
Re: Proposal: Log inability to lock pages during vacuum

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Great, thanks, pushed. I tweaked it a bit more, so that it would say
either "skipped N pages" or "waited N pins" in both autovacuum and
vacuum verbose cases, but only if N > 0.

Not directly relevant but ... I think probably all those BlockNumber
counters should be printed with %u not %d. A relation with between
2G and 4G pages is possible, and it wouldn't look right with %d.
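
A tiny standalone illustration (not from the tree, just a demo of the sign misprint) of why that matters:

	#include <stdio.h>
	#include <stdint.h>

	typedef uint32_t BlockNumber;	/* stand-in for PostgreSQL's 32-bit BlockNumber */

	int
	main(void)
	{
		BlockNumber rel_pages = 3000000000U;	/* a relation with ~3 billion pages */

		/* Passing the unsigned value to %d misprints it on typical platforms. */
		printf("%%d prints %d\n", rel_pages);	/* -1294967296 */
		printf("%%u prints %u\n", rel_pages);	/* 3000000000 */
		return 0;
	}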

regards, tom lane


#48Andres Freund
andres@2ndquadrant.com
In reply to: Alvaro Herrera (#44)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-12-18 16:41:04 -0300, Alvaro Herrera wrote:

+			if (scan_all)
+				appendStringInfo(&buf, _("waited for %d buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			else
+				appendStringInfo(&buf,
+								 _("skipped %d pages due to buffer pins\n"),
+								 vacrelstats->pinned_pages);

Unless I miss something this is, as mentioned before, not
correct. scan_all doesn't imply at all that we waited for buffer
pins. We only do so if lazy_check_needs_freeze(buf). Which usually won't
be true for a *significant* number of pages.
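
For reference, the relevant flow in lazy_scan_heap() is roughly this (paraphrased, not the exact source):

	/* When the cleanup lock isn't immediately available... */
	if (!ConditionalLockBufferForCleanup(buf))
	{
		if (!scan_all)
		{
			/* regular vacuum: just skip the page */
			ReleaseBuffer(buf);
			continue;
		}

		/*
		 * Wraparound (scan_all) vacuum: peek at the page under a share lock
		 * and only wait for the cleanup lock if something needs freezing.
		 */
		LockBuffer(buf, BUFFER_LOCK_SHARE);
		if (!lazy_check_needs_freeze(buf))
		{
			UnlockReleaseBuffer(buf);
			continue;			/* skipped, despite scan_all */
		}
		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
		LockBufferForCleanup(buf);	/* the only case where we actually wait */
	}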

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#48)
Re: Proposal: Log inability to lock pages during vacuum

Andres Freund wrote:

On 2014-12-18 16:41:04 -0300, Alvaro Herrera wrote:

+			if (scan_all)
+				appendStringInfo(&buf, _("waited for %d buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			else
+				appendStringInfo(&buf,
+								 _("skipped %d pages due to buffer pins\n"),
+								 vacrelstats->pinned_pages);

Unless I miss something this is, as mentioned before, not
correct. scan_all doesn't imply at all that we waited for buffer
pins. We only do so if lazy_check_needs_freeze(buf). Which usually won't
be true for a *significant* number of pages.

Ah, interesting, I didn't remember we had that. I guess one possible
tweak is to discount the pages we skip from pinned_pages; or we could
keep a separate count of pages waited for. Jim, up for a patch?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#50Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#49)
Re: Proposal: Log inability to lock pages during vacuum

On 12/18/14, 3:02 PM, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-12-18 16:41:04 -0300, Alvaro Herrera wrote:

+			if (scan_all)
+				appendStringInfo(&buf, _("waited for %d buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			else
+				appendStringInfo(&buf,
+								 _("skipped %d pages due to buffer pins\n"),
+								 vacrelstats->pinned_pages);

Unless I miss something this is, as mentioned before, not
correct. scan_all doesn't imply at all that we waited for buffer
pins. We only do so if lazy_check_needs_freeze(buf). Which usually won't
be true for a *significant* number of pages.

Ah, interesting, I didn't remember we had that. I guess one possible
tweak is to discount the pages we skip from pinned_pages; or we could
keep a separate count of pages waited for. Jim, up for a patch?

I would prefer that we at least count the cases where we don't initially get the lock; presumably that number is always low, and in that case I think we're done with this. If it turns out it is common to initially miss the pin, then we could do something fancier.

So how about if in the scan_all case we say something like "unable to initially acquire pin on %d buffers\n"?

(Happy to do the patch either way, but I'd like us to decide what we're doing first. ;)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#51Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#50)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-12-18 16:05:23 -0600, Jim Nasby wrote:

On 12/18/14, 3:02 PM, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-12-18 16:41:04 -0300, Alvaro Herrera wrote:

+			if (scan_all)
+				appendStringInfo(&buf, _("waited for %d buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			else
+				appendStringInfo(&buf,
+								 _("skipped %d pages due to buffer pins\n"),
+								 vacrelstats->pinned_pages);

Unless I miss something this is, as mentioned before, not
correct. scan_all doesn't imply at all that we waited for buffer
pins. We only do so if lazy_check_needs_freeze(buf). Which usually won't
be true for a *significant* number of pages.

Ah, interesting, I didn't remember we had that. I guess one possible
tweak is to discount the pages we skip from pinned_pages; or we could
keep a separate count of pages waited for. Jim, up for a patch?

I would prefer that we at least count if we initially don't get the lock; presumably that number is always low anyway and in that case I think we're done with this. If it turns out it is common to initially miss the pin then we could do something fancier.

So how about if in the scan_all case we say something like "unable to initially acquire pin on %d buffers\n"?

I'd just do away with the difference between scan_all/!scan_all and
always say the above.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#52Andres Freund
andres@2ndquadrant.com
In reply to: Jim Nasby (#50)
Re: Proposal: Log inability to lock pages during vacuum

On 2014-12-18 16:05:23 -0600, Jim Nasby wrote:

On 12/18/14, 3:02 PM, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-12-18 16:41:04 -0300, Alvaro Herrera wrote:

+			if (scan_all)
+				appendStringInfo(&buf, _("waited for %d buffer pins\n"),
+								 vacrelstats->pinned_pages);
+			else
+				appendStringInfo(&buf,
+								 _("skipped %d pages due to buffer pins\n"),
+								 vacrelstats->pinned_pages);

Unless I miss something this is, as mentioned before, not
correct. scan_all doesn't imply at all that we waited for buffer
pins. We only do so if lazy_check_needs_freeze(buf). Which usually won't
be true for a *significant* number of pages.

Also, naming the number of pages we could *not* pin, pinned_pages?
Really?

pinskipped_pages, skipped_pages, unpinned_pages, ...

Ah, interesting, I didn't remember we had that. I guess one possible
tweak is to discount the pages we skip from pinned_pages; or we could
keep a separate count of pages waited for. Jim, up for a patch?

This is still wrong. I think just counting skipped pages, without
distinct messages for waiting/not waiting, is good enough for
now. Everything else would only be meaningful if we actually
tracked the waiting time.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#53Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#52)
Re: Proposal: Log inability to lock pages during vacuum

On 2015-01-04 01:53:24 +0100, Andres Freund wrote:

Ah, interesting, I didn't remember we had that. I guess one possible
tweak is to discount the pages we skip from pinned_pages; or we could
keep a separate count of pages waited for. Jim, up for a patch?

This is still wrong. I think just counting skipped pages, without
distinct messages for waiting/not waiting, is good enough for
now. Everything else would only be actually meaningful if we actually
tracked the waiting time.

Pushed a commit for this, with additional improvements to autovacuum's
log output from:
LOG: automatic vacuum of table "postgres.public.frak": index scans: 0
pages: 0 removed, 1672 remain
skipped 1 pages due to buffer pins
tuples: 0 removed, 309959 remain, 309774 are dead but not yet removable
buffer usage: 4258 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU 0.00s/0.04u sec elapsed 0.46 sec
to:
LOG: automatic vacuum of table "postgres.public.frak": index scans: 0
pages: 0 removed, 1672 remain, 1 skipped due to pins
tuples: 0 removed, 309959 remain, 309774 are dead but not yet removable
buffer usage: 4258 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU 0.00s/0.04u sec elapsed 0.46 sec
as the 'skipped ...' line didn't really look in line with the rest.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
