error context for vacuum to include block number

Started by Justin Pryzbyover 6 years ago139 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

On Wed, Aug 07, 2019 at 04:51:54PM -0700, Andres Freund wrote:
/messages/by-id/20190807235154.erbmr4o4bo6vgnjv@alap3.anarazel.de
| Ugh :(
|
| We really need to add a error context to vacuumlazy that shows which
| block is being processed.

I eeked out a minimal patch.

I renamed "StringInfoData buf", since it wasn't nice to mask it by
"Buffer buf".

postgres=# SET statement_timeout=99;vacuum t;
SET
2019-11-20 14:52:49.521 CST [6319] ERROR: canceling statement due to statement timeout
2019-11-20 14:52:49.521 CST [6319] CONTEXT: block 596
2019-11-20 14:52:49.521 CST [6319] STATEMENT: vacuum t;

Justin

Attachments:

v1-0001-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+33-11
#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#1)
Re: error context for vacuum to include block number

Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".

2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;

I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.

Attachments:

v2-0001-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+40-1
v2-0002-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
#3Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#2)
Re: error context for vacuum to include block number

On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:

Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".

2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;

I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.

No problem from me to add more context directly in lazy_scan_heap().

+ // errcallback.arg = (void *) &buf;
The first patch is full of that, please make sure to clean it up.

Let's keep also the message simple, still I think that it should be a
bit more explicative:
1) Let's the schema name, and quote the relation name.
2) Let's mention the scanning (or vacuuming) operation.

So I would suggest the following instead:
"while scanning block %u of relation \"%s.%s\""
--
Michael

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#3)
Re: error context for vacuum to include block number

On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:

On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:

Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".

2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;

I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.

No problem from me to add more context directly in lazy_scan_heap().

Do you mean without a callback ? I think that's necessary, since the IO errors
would happen within ReadBufferExtended, but we don't want to polute that with
errcontext. And cannot call errcontext on its own:
FATAL: errstart was not called

So I would suggest the following instead:
"while scanning block %u of relation \"%s.%s\""

Done in the attached.

Attachments:

v3-0001-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+35-2
#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#4)
Re: error context for vacuum to include block number

On 2019-Dec-11, Justin Pryzby wrote:

@@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;

+	/* Setup error traceback support for ereport() */
+	errcallback.callback = vacuum_error_callback;
+	cbarg.relname = relname;
+	cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	cbarg.blkno = 0; /* Not known yet */

Shouldn't you use InvalidBlockNumber for this initialization?

@@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,

pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);

+ cbarg.blkno = blkno;

I would put this before pgstat_progress_update_param, just out of
paranoia.

@@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,

buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
-
/* We need buffer cleanup lock so that we can prune HOT chains. */
if (!ConditionalLockBufferForCleanup(buf))
{

Lose this hunk?

@@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,

return all_visible;
}
+
+/*
+ * Error context callback for errors occurring during vacuum.
+ */
+static void
+vacuum_error_callback(void *arg)
+{
+	vacuum_error_callback_arg *cbarg = arg;
+
+	errcontext("while scanning block %u of relation \"%s.%s\"",
+			cbarg->blkno, cbarg->relnamespace, cbarg->relname);
+}

I would put this function around line 1512 (just after lazy_scan_heap)
rather than at bottom of file. (And move its prototype accordingly, to
line 156.) Or do you intend that this is going to be used for
lazy_vacuum_heap too? Maybe it should.

Patch looks good to me otherwise.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#4)
Re: error context for vacuum to include block number

Hi,

Thanks for working on this!

On 2019-12-11 08:36:48 -0600, Justin Pryzby wrote:

On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:

On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:

Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".

2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;

I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.

The tag will not add all that informative details, because the
relfilenode isn't easily mappable to the table name or such.

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 043ebb4..9376989 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -138,6 +138,12 @@ typedef struct LVRelStats
bool		lock_waiter_detected;
} LVRelStats;
+typedef struct
+{
+	char *relname;
+	char *relnamespace;
+	BlockNumber blkno;
+} vacuum_error_callback_arg;

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.

@@ -524,6 +531,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg cbarg;

Not a fan of "cbarg", too generic.

pg_rusage_init(&ru0);

@@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;

+	/* Setup error traceback support for ereport() */
+	errcallback.callback = vacuum_error_callback;
+	cbarg.relname = relname;
+	cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	cbarg.blkno = 0; /* Not known yet */
+	errcallback.arg = (void *) &cbarg;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
for (blkno = 0; blkno < nblocks; blkno++)
{
Buffer		buf;
@@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,

pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);

+		cbarg.blkno = blkno;
+
if (blkno == next_unskippable_block)
{
/* Time to advance next_unskippable_block */
@@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,

buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
-
/* We need buffer cleanup lock so that we can prune HOT chains. */
if (!ConditionalLockBufferForCleanup(buf))
{
@@ -1388,6 +1407,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
RecordPageWithFreeSpace(onerel, blkno, freespace);
}

+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
/* report that everything is scanned and vacuumed */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);

@@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,

return all_visible;
}

I think this will misattribute errors that happen when in the:
/*
* If we are close to overrunning the available space for dead-tuple
* TIDs, pause and do a cycle of vacuuming before we tackle this page.
*/
section of lazy_scan_heap(). That will

a) scan the index, during which we presumably don't want the same error
context, as it'd be quite misleading: The block that was just scanned
in the loop isn't actually likely to be the culprit for an index
problem. And we'd not mention the fact that the problem is occurring
in the index.
b) will report the wrong block, when in lazy_vacuum_heap().

Greetings,

Andres Freund

#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#6)
Re: error context for vacuum to include block number

On Wed, Dec 11, 2019 at 12:33:53PM -0300, Alvaro Herrera wrote:

On 2019-Dec-11, Justin Pryzby wrote:

+ cbarg.blkno = 0; /* Not known yet */

Shouldn't you use InvalidBlockNumber for this initialization?

..

pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ cbarg.blkno = blkno;

I would put this before pgstat_progress_update_param, just out of
paranoia.

..

Lose this hunk?

Addressed those.

Or do you intend that this is going to be used for lazy_vacuum_heap too?
Maybe it should.

Done in a separate patch.

On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.

Didn't find a better struct to use yet.

+ vacuum_error_callback_arg cbarg;

Not a fan of "cbarg", too generic.

..

I think this will misattribute errors that happen when in the:

Probably right. Attached should address it.

On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:

+typedef struct
+{
+	char *relname;
+	char *relnamespace;
+	BlockNumber blkno;
+} vacuum_error_callback_arg;

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.

Not a fan of "cbarg", too generic.

I think this will misattribute errors that happen when in the:

I think that's addressed after deduplicating in attached.

Deduplication revealed 2nd progress call, which seems to have been included
redundantly at c16dc1aca.

- /* Remove tuples from heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

Justin

Attachments:

v4-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v4-0002-Remove-redundant-call-to-vacuum-progress.patchtext/x-diff; charset=us-asciiDownload+0-4
v4-0003-deduplication.patchtext/x-diff; charset=us-asciiDownload+48-66
v4-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+39-1
v4-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
#8Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#7)
Re: error context for vacuum to include block number

On Thu, Dec 12, 2019 at 09:08:31PM -0600, Justin Pryzby wrote:

On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.

Didn't find a better struct to use yet.

Yes, I am too wondering what Andres has in mind here. It is not like
you can use VacuumRelation as the OID of the relation may not have
been stored.

On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:>
I think that's addressed after deduplicating in attached.

Deduplication revealed 2nd progress call, which seems to have been included
redundantly at c16dc1aca.

- /* Remove tuples from heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

What is the purpose of 0001 in the context of this thread? One could
say the same about 0002 and 0003. Anyway, you are right with 0002 as
the progress value for PROGRESS_VACUUM_PHASE gets updated twice in a
row with the same value. So let's clean up that.

The refactoring in 0003 is interesting, so I would be tempted to merge
it.  Now you have one small issue in it:
-    /*
-     * Forget the now-vacuumed tuples, and press on, but be careful
-     * not to reset latestRemovedXid since we want that value to be
-     * valid.
-     */
+     lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
      vacrelstats->num_dead_tuples = 0;
-     vacrelstats->num_index_scans++;
You are moving this comment within lazy_vacuum_heap_index, but it
applies to num_dead_tuples and not num_index_scans, no?  You should
keep the incrementation of num_index_scans within the routine though.
--
Michael
#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#8)
Re: error context for vacuum to include block number

On Fri, Dec 13, 2019 at 10:28:50PM +0900, Michael Paquier wrote:

v4-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch
v4-0002-Remove-redundant-call-to-vacuum-progress.patch
v4-0003-deduplication.patch
v4-0004-vacuum-errcontext-to-show-block-being-processed.patch
v4-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch

What is the purpose of 0001 in the context of this thread? One could
say the same about 0002 and 0003. Anyway, you are right with 0002 as
the progress value for PROGRESS_VACUUM_PHASE gets updated twice in a
row with the same value. So let's clean up that.

It's related code which I cleaned up before adding new stuff. Not essential,
thus separate (0002 should be backpatched).

The refactoring in 0003 is interesting, so I would be tempted to merge
it.  Now you have one small issue in it:
-    /*
-     * Forget the now-vacuumed tuples, and press on, but be careful
-     * not to reset latestRemovedXid since we want that value to be
-     * valid.
-     */
+     lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
vacrelstats->num_dead_tuples = 0;
-     vacrelstats->num_index_scans++;
You are moving this comment within lazy_vacuum_heap_index, but it
applies to num_dead_tuples and not num_index_scans, no?  You should
keep the incrementation of num_index_scans within the routine though.

Thank you, fixed.

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

Attachments:

v5-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v5-0002-Remove-redundant-call-to-vacuum-progress.patchtext/x-diff; charset=us-asciiDownload+0-4
v5-0003-deduplication.patchtext/x-diff; charset=us-asciiDownload+43-60
v5-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+38-1
v5-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
#10Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#9)
Re: error context for vacuum to include block number

On Fri, Dec 13, 2019 at 04:47:35PM -0600, Justin Pryzby wrote:

It's related code which I cleaned up before adding new stuff. Not essential,
thus separate (0002 should be backpatched).

The issue just causes some extra work and that's not a bug, so applied
without a backpatch.

The refactoring in 0003 is interesting, so I would be tempted to merge
it.  Now you have one small issue in it:
-    /*
-     * Forget the now-vacuumed tuples, and press on, but be careful
-     * not to reset latestRemovedXid since we want that value to be
-     * valid.
-     */
+     lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
vacrelstats->num_dead_tuples = 0;
-     vacrelstats->num_index_scans++;
You are moving this comment within lazy_vacuum_heap_index, but it
applies to num_dead_tuples and not num_index_scans, no?  You should
keep the incrementation of num_index_scans within the routine though.

Thank you, fixed.

For 0003, I think that lazy_vacuum_heap_index() can be confusing as
those indexes are unrelated to heap. Why not naming it just
lazy_vacuum_all_indexes()? The routine should also have a header
describing it.
--
Michael

#11Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#10)
Re: error context for vacuum to include block number

On Sun, Dec 15, 2019 at 10:07:08PM +0900, Michael Paquier wrote:

On Fri, Dec 13, 2019 at 04:47:35PM -0600, Justin Pryzby wrote:

It's related code which I cleaned up before adding new stuff. Not essential,
thus separate (0002 should be backpatched).

The issue just causes some extra work and that's not a bug, so applied
without a backpatch.

Thanks

For 0003, I think that lazy_vacuum_heap_index() can be confusing as
those indexes are unrelated to heap. Why not naming it just
lazy_vacuum_all_indexes()? The routine should also have a header
describing it.

I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")

I suppose you don't think the other way around is better?
lazy_vacuum_index_heap

Justin

#12Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#11)
Re: error context for vacuum to include block number

On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:

I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")

I suppose you don't think the other way around is better?
lazy_vacuum_index_heap

Dunno. Let's see if others have other thoughts on the matter. FWIW,
I have a long history at naming things in a way others don't like :)
--
Michael

#13Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#12)
Re: error context for vacuum to include block number

At Mon, 16 Dec 2019 11:49:56 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:

I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")

I suppose you don't think the other way around is better?
lazy_vacuum_index_heap

Dunno. Let's see if others have other thoughts on the matter. FWIW,
I have a long history at naming things in a way others don't like :)

lazy_vacuum_heap_index() seems confusing to me. I read the name as
Michael did before looking the above explanation.

lazy_vacuum_heap_and_index() is clearer to me.
lazy_vacuum_heap_with_index() could also work but I'm not sure it's
further better.

I see some function names like that, and some others that have two
verbs bonded by "_and_".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#14Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#12)
Re: error context for vacuum to include block number

On Mon, Dec 16, 2019 at 11:49:56AM +0900, Michael Paquier wrote:

On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:

I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")

I suppose you don't think the other way around is better?
lazy_vacuum_index_heap

Dunno. Let's see if others have other thoughts on the matter. FWIW,
I have a long history at naming things in a way others don't like :)

I renamed.

And deduplicated two more hunks into a 2nd function.

(I'm also including the changes I mentioned here ... in case anyone cares to
comment or review).
/messages/by-id/20191220171132.GB30414@telsasoft.com

Attachments:

v6-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+37-1
v6-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
v6-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v6-0002-deduplication.patchtext/x-diff; charset=us-asciiDownload+43-61
v6-0003-dedup2-skip_blocks.patchtext/x-diff; charset=us-asciiDownload+84-104
v6-0006-Print-debug-line-before-starting-each-vacuum-step.patchtext/x-diff; charset=us-asciiDownload+10-1
#15Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#14)
Re: error context for vacuum to include block number

On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:

I renamed.

Hmm. I have found what was partially itching me for patch 0002, and
that's actually the fact that we don't do the error reporting for heap
within lazy_vacuum_heap() because the code relies too much on updating
two progress parameters at the same time, on top of the fact that you
are mixing multiple concepts with this refactoring. One problem is
that if this code is refactored in the future, future callers of
lazy_vacuum_heap() would miss the update of the progress reporting.
Splitting things improves also the readability of the code, so
attached is the refactoring I would do for this portion of the set.
It is also more natural to increment num_index_scans when the
reporting happens on consistency grounds.

(Please note that I have not indented yet the patch.)
--
Michael

Attachments:

vacuumlazy-index-refactor-v7.patchtext/x-diff; charset=us-asciiDownload+43-53
#16Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#15)
Re: error context for vacuum to include block number

On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:

(Please note that I have not indented yet the patch.)

And one indentation later, committed this one after an extra lookup as
of 1ab41a3.
--
Michael

#17Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#15)
Re: error context for vacuum to include block number

On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:

On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:

I renamed.

Hmm. I have found what was partially itching me for patch 0002, and
that's actually the fact that we don't do the error reporting for heap
within lazy_vacuum_heap() because the code relies too much on updating
two progress parameters at the same time, on top of the fact that you
are mixing multiple concepts with this refactoring. One problem is
that if this code is refactored in the future, future callers of
lazy_vacuum_heap() would miss the update of the progress reporting.
Splitting things improves also the readability of the code, so
attached is the refactoring I would do for this portion of the set.
It is also more natural to increment num_index_scans when the

I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.

Attachments:

v7-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v7-0002-michael-dedup.patchtext/x-diff; charset=us-asciiDownload+43-54
v7-0003-dedup2-skip_blocks.patchtext/x-diff; charset=us-asciiDownload+84-104
v7-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+37-1
v7-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
v7-0006-Print-debug-line-before-starting-each-vacuum-step.patchtext/x-diff; charset=us-asciiDownload+10-1
#18Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#17)
Re: error context for vacuum to include block number

On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.

I am not sure whether it's important enough to make a stink about, but
it bothers me a bit that this is being dismissed as unimportant. The
problem is that, if the updates are not atomic, then somebody might
see the data after one has been updated and the other has not yet been
updated. The result is that when the phase is
PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
can't tell whether the number of index scans reported is the number
*previously* performed or the number performed including the one that
just finished. The race to see the latter state is narrow, so it
probably wouldn't come up often, but it does seem like it would be
confusing if it did happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#18)
Re: error context for vacuum to include block number (atomic progress update)

On Sat, Dec 28, 2019 at 07:21:31PM -0500, Robert Haas wrote:

On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.

I am not sure whether it's important enough to make a stink about, but
it bothers me a bit that this is being dismissed as unimportant. The
problem is that, if the updates are not atomic, then somebody might
see the data after one has been updated and the other has not yet been
updated. The result is that when the phase is
PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
can't tell whether the number of index scans reported is the number
*previously* performed or the number performed including the one that
just finished. The race to see the latter state is narrow, so it
probably wouldn't come up often, but it does seem like it would be
confusing if it did happen.

What used to be atomic was this:

- hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
- hvp_val[1] = vacrelstats->num_index_scans + 1;

=> switch from PROGRESS_VACUUM_PHASE_VACUUM INDEX to HEAP and increment
index_vacuum_count, which is documented as the "Number of completed index
vacuum cycles."

Now, it 1) increments the number of completed scans; and, 2) then progresses
phase to HEAP, so there's a window where the number of completed scans is
incremented, and it still says VACUUM_INDEX.

Previously, if it said VACUUM_INDEX, one could assume that index_vacuum_count
would increase at least once more, and that's no longer true. If someone sees
VACUUM_INDEX and some NUM_INDEX_VACUUMS, and then later sees VACUUM_HEAP or
other later stage, with same (maybe final) value of NUM_INDEX_VACUUMS, that's
different than previous behavior.

It seems to me that a someone or their tool monitoring pg_stat shouldn't be
confused by this change, since:
1) there's no promise about how high NUM_INDEX_VACUUMS will or won't go; and,
2) index_vacuum_count didn't do anything strange like decreasing, or increased
before the scans were done; and,
3) the vacuum can finish at any time, and the monitoring process presumably
knows that when the PID is gone, it's finished, even if it missed intermediate
updates;

The behavior is different from before, but I think that's ok: the number of
scans is accurate, and the PHASE is accurate, even though it'll change a moment
later.

I see there's similar case here:
| /* report all blocks vacuumed; and that we're cleaning up */
| pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
| pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
| PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);

heap_blks_scanned is documented as "Number of heap blocks SCANNED", and it
increments exactly to heap_blks_total. Would someone be confused if
heap_blks_scanned==heap_blks_total AND phase=='scanning heap' ? I think they'd
just expect PHASE to be updated a moment later. (And if it wasn't, I agree they
should then be legitimately confused or concerned).

Actually, the doc says:
|If heap_blks_scanned is less than heap_blks_total, the system will return to
|scanning the heap after this phase is completed; otherwise, it will begin
|cleaning up indexes AFTER THIS PHASE IS COMPLETED.

I read that to mean that it's okay if heap_blks_scanned==heap_blks_total when
scanning/vacuuming heap.

Justin

#20Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#17)
Re: error context for vacuum to include block number

On Thu, Dec 26, 2019 at 09:57:04AM -0600, Justin Pryzby wrote:

So rebasified against your patch.

Rebased against your patch in master this time.

Attachments:

v8-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v8-0002-dedup2-skip_blocks.patchtext/x-diff; charset=us-asciiDownload+84-104
v8-0003-vacuumlazy-typos-in-comments.patchtext/x-diff; charset=us-asciiDownload+2-3
v8-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+37-1
v8-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
#21Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#19)
#22Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#20)
#23Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#22)
#24Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#7)
#25Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#23)
#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#25)
#27Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#26)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#27)
#29Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#28)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#29)
#31Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#29)
#32Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#23)
#33Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#31)
#34Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#32)
#35Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#25)
#36Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#35)
#37Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#36)
#38Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#36)
#39Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#38)
#40Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#39)
#41Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#40)
#42Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#41)
#43Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#42)
#44Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#43)
#45Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#44)
#46Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#23)
#47Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#45)
#48Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#47)
#49Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#48)
#50Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#49)
#51Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#50)
#52Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#51)
#53Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#52)
#54Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#52)
#55Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#54)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#55)
#57Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#55)
#58Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#57)
#59Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alvaro Herrera (#55)
#60Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#58)
#61Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#60)
#62Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#60)
#63Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#62)
#64Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#61)
#65Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#63)
#66Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#65)
#67Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#66)
#68Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#67)
#69Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#68)
#70Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#69)
#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#70)
#72Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#71)
#73Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#72)
#74Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#73)
#75Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#74)
#76Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#75)
#77Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#76)
#78Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#77)
#79Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#78)
#80Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#79)
#81Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#80)
#82Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#80)
#83Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#82)
#84Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#83)
#85Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#81)
#86Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#84)
#87Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#86)
#88Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#87)
#89Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#88)
#90Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#89)
#91Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#90)
#92Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#91)
#93Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#92)
#94Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#91)
#95Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#90)
#96Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#94)
#97Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#95)
#98Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#96)
#99Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#98)
#100Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#98)
#101Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#100)
#102Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#101)
#103Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#102)
#104Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#103)
#105Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#104)
#106Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#104)
#107Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#106)
#108Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#107)
#109Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#108)
#110Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#109)
#111Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#110)
#112Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#111)
#113Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#112)
#114Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#113)
#115Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#114)
#116Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#115)
#117Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#116)
#118Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#114)
#119Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#118)
#120Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#119)
#121Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#120)
#122Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#114)
#123Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#122)
#124Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#121)
#125Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#124)
#126Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#125)
#127Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#126)
#128Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#127)
#129Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#128)
#130Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#129)
#131Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#130)
#132Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#131)
#133Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#116)
#134Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#133)
#135Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#134)
#136Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#135)
#137Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#136)
#138Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#137)
#139Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#138)