error context for vacuum to include block number
On Wed, Aug 07, 2019 at 04:51:54PM -0700, Andres Freund wrote:
/messages/by-id/20190807235154.erbmr4o4bo6vgnjv@alap3.anarazel.de
| Ugh :(
|
| We really need to add a error context to vacuumlazy that shows which
| block is being processed.
I eeked out a minimal patch.
I renamed "StringInfoData buf", since it wasn't nice to mask it by
"Buffer buf".
postgres=# SET statement_timeout=99;vacuum t;
SET
2019-11-20 14:52:49.521 CST [6319] ERROR: canceling statement due to statement timeout
2019-11-20 14:52:49.521 CST [6319] CONTEXT: block 596
2019-11-20 14:52:49.521 CST [6319] STATEMENT: vacuum t;
Justin
Attachments:
v1-0001-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+33-11
Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".
2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;
I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.
On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.
No problem from me to add more context directly in lazy_scan_heap().
+ // errcallback.arg = (void *) &buf;
The first patch is full of that, please make sure to clean it up.
Let's keep also the message simple, still I think that it should be a
bit more explicative:
1) Let's the schema name, and quote the relation name.
2) Let's mention the scanning (or vacuuming) operation.
So I would suggest the following instead:
"while scanning block %u of relation \"%s.%s\""
--
Michael
On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:
On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.No problem from me to add more context directly in lazy_scan_heap().
Do you mean without a callback ? I think that's necessary, since the IO errors
would happen within ReadBufferExtended, but we don't want to polute that with
errcontext. And cannot call errcontext on its own:
FATAL: errstart was not called
So I would suggest the following instead:
"while scanning block %u of relation \"%s.%s\""
Done in the attached.
Attachments:
v3-0001-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+35-2
On 2019-Dec-11, Justin Pryzby wrote:
@@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;+ /* Setup error traceback support for ereport() */ + errcallback.callback = vacuum_error_callback; + cbarg.relname = relname; + cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel)); + cbarg.blkno = 0; /* Not known yet */
Shouldn't you use InvalidBlockNumber for this initialization?
@@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ cbarg.blkno = blkno;
I would put this before pgstat_progress_update_param, just out of
paranoia.
@@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
-
/* We need buffer cleanup lock so that we can prune HOT chains. */
if (!ConditionalLockBufferForCleanup(buf))
{
Lose this hunk?
@@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible; } + +/* + * Error context callback for errors occurring during vacuum. + */ +static void +vacuum_error_callback(void *arg) +{ + vacuum_error_callback_arg *cbarg = arg; + + errcontext("while scanning block %u of relation \"%s.%s\"", + cbarg->blkno, cbarg->relnamespace, cbarg->relname); +}
I would put this function around line 1512 (just after lazy_scan_heap)
rather than at bottom of file. (And move its prototype accordingly, to
line 156.) Or do you intend that this is going to be used for
lazy_vacuum_heap too? Maybe it should.
Patch looks good to me otherwise.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
Thanks for working on this!
On 2019-12-11 08:36:48 -0600, Justin Pryzby wrote:
On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:
On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
Find attached updated patch:
. Use structure to include relation name.
. Split into a separate patch rename of "StringInfoData buf".2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
the buffer is not pinned.
The tag will not add all that informative details, because the
relfilenode isn't easily mappable to the table name or such.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 043ebb4..9376989 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -138,6 +138,12 @@ typedef struct LVRelStats bool lock_waiter_detected; } LVRelStats;+typedef struct +{ + char *relname; + char *relnamespace; + BlockNumber blkno; +} vacuum_error_callback_arg;
Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.
@@ -524,6 +531,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, PROGRESS_VACUUM_MAX_DEAD_TUPLES }; int64 initprog_val[3]; + ErrorContextCallback errcallback; + vacuum_error_callback_arg cbarg;
Not a fan of "cbarg", too generic.
pg_rusage_init(&ru0);
@@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;+ /* Setup error traceback support for ereport() */ + errcallback.callback = vacuum_error_callback; + cbarg.relname = relname; + cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel)); + cbarg.blkno = 0; /* Not known yet */ + errcallback.arg = (void *) &cbarg; + errcallback.previous = error_context_stack; + error_context_stack = &errcallback; + for (blkno = 0; blkno < nblocks; blkno++) { Buffer buf; @@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ cbarg.blkno = blkno; + if (blkno == next_unskippable_block) { /* Time to advance next_unskippable_block */ @@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vac_strategy);
-
/* We need buffer cleanup lock so that we can prune HOT chains. */
if (!ConditionalLockBufferForCleanup(buf))
{
@@ -1388,6 +1407,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
RecordPageWithFreeSpace(onerel, blkno, freespace);
}+ /* Pop the error context stack */ + error_context_stack = errcallback.previous; + /* report that everything is scanned and vacuumed */ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);@@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible;
}
I think this will misattribute errors that happen when in the:
/*
* If we are close to overrunning the available space for dead-tuple
* TIDs, pause and do a cycle of vacuuming before we tackle this page.
*/
section of lazy_scan_heap(). That will
a) scan the index, during which we presumably don't want the same error
context, as it'd be quite misleading: The block that was just scanned
in the loop isn't actually likely to be the culprit for an index
problem. And we'd not mention the fact that the problem is occurring
in the index.
b) will report the wrong block, when in lazy_vacuum_heap().
Greetings,
Andres Freund
On Wed, Dec 11, 2019 at 12:33:53PM -0300, Alvaro Herrera wrote:
On 2019-Dec-11, Justin Pryzby wrote:
+ cbarg.blkno = 0; /* Not known yet */
Shouldn't you use InvalidBlockNumber for this initialization?
..
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ cbarg.blkno = blkno;I would put this before pgstat_progress_update_param, just out of
paranoia.
..
Lose this hunk?
Addressed those.
Or do you intend that this is going to be used for lazy_vacuum_heap too?
Maybe it should.
Done in a separate patch.
On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:
Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.
Didn't find a better struct to use yet.
+ vacuum_error_callback_arg cbarg;
Not a fan of "cbarg", too generic.
..
I think this will misattribute errors that happen when in the:
Probably right. Attached should address it.
On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:
+typedef struct +{ + char *relname; + char *relnamespace; + BlockNumber blkno; +} vacuum_error_callback_arg;Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.
Not a fan of "cbarg", too generic.
I think this will misattribute errors that happen when in the:
I think that's addressed after deduplicating in attached.
Deduplication revealed 2nd progress call, which seems to have been included
redundantly at c16dc1aca.
- /* Remove tuples from heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
Justin
Attachments:
v4-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v4-0002-Remove-redundant-call-to-vacuum-progress.patchtext/x-diff; charset=us-asciiDownload+0-4
v4-0003-deduplication.patchtext/x-diff; charset=us-asciiDownload+48-66
v4-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+39-1
v4-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
Import Notes
Reply to msg id not found: 20191211165425.4ewww2s5k5cafi4l@alap3.anarazel.de20191211153353.GA23008@alvherre.pgsql | Resolved by subject fallback
On Thu, Dec 12, 2019 at 09:08:31PM -0600, Justin Pryzby wrote:
On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:
Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.Didn't find a better struct to use yet.
Yes, I am too wondering what Andres has in mind here. It is not like
you can use VacuumRelation as the OID of the relation may not have
been stored.
On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:>
I think that's addressed after deduplicating in attached.Deduplication revealed 2nd progress call, which seems to have been included
redundantly at c16dc1aca.- /* Remove tuples from heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
What is the purpose of 0001 in the context of this thread? One could
say the same about 0002 and 0003. Anyway, you are right with 0002 as
the progress value for PROGRESS_VACUUM_PHASE gets updated twice in a
row with the same value. So let's clean up that.
The refactoring in 0003 is interesting, so I would be tempted to merge
it. Now you have one small issue in it:
- /*
- * Forget the now-vacuumed tuples, and press on, but be careful
- * not to reset latestRemovedXid since we want that value to be
- * valid.
- */
+ lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats);
vacrelstats->num_dead_tuples = 0;
- vacrelstats->num_index_scans++;
You are moving this comment within lazy_vacuum_heap_index, but it
applies to num_dead_tuples and not num_index_scans, no? You should
keep the incrementation of num_index_scans within the routine though.
--
Michael
On Fri, Dec 13, 2019 at 10:28:50PM +0900, Michael Paquier wrote:
v4-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patch
v4-0002-Remove-redundant-call-to-vacuum-progress.patch
v4-0003-deduplication.patch
v4-0004-vacuum-errcontext-to-show-block-being-processed.patch
v4-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patch
What is the purpose of 0001 in the context of this thread? One could
say the same about 0002 and 0003. Anyway, you are right with 0002 as
the progress value for PROGRESS_VACUUM_PHASE gets updated twice in a
row with the same value. So let's clean up that.
It's related code which I cleaned up before adding new stuff. Not essential,
thus separate (0002 should be backpatched).
The refactoring in 0003 is interesting, so I would be tempted to merge it. Now you have one small issue in it: - /* - * Forget the now-vacuumed tuples, and press on, but be careful - * not to reset latestRemovedXid since we want that value to be - * valid. - */ + lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats); vacrelstats->num_dead_tuples = 0; - vacrelstats->num_index_scans++; You are moving this comment within lazy_vacuum_heap_index, but it applies to num_dead_tuples and not num_index_scans, no? You should keep the incrementation of num_index_scans within the routine though.
Thank you, fixed.
--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581
Attachments:
v5-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v5-0002-Remove-redundant-call-to-vacuum-progress.patchtext/x-diff; charset=us-asciiDownload+0-4
v5-0003-deduplication.patchtext/x-diff; charset=us-asciiDownload+43-60
v5-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+38-1
v5-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
On Fri, Dec 13, 2019 at 04:47:35PM -0600, Justin Pryzby wrote:
It's related code which I cleaned up before adding new stuff. Not essential,
thus separate (0002 should be backpatched).
The issue just causes some extra work and that's not a bug, so applied
without a backpatch.
The refactoring in 0003 is interesting, so I would be tempted to merge it. Now you have one small issue in it: - /* - * Forget the now-vacuumed tuples, and press on, but be careful - * not to reset latestRemovedXid since we want that value to be - * valid. - */ + lazy_vacuum_heap_index(onerel, vacrelstats, Irel, nindexes, indstats); vacrelstats->num_dead_tuples = 0; - vacrelstats->num_index_scans++; You are moving this comment within lazy_vacuum_heap_index, but it applies to num_dead_tuples and not num_index_scans, no? You should keep the incrementation of num_index_scans within the routine though.Thank you, fixed.
For 0003, I think that lazy_vacuum_heap_index() can be confusing as
those indexes are unrelated to heap. Why not naming it just
lazy_vacuum_all_indexes()? The routine should also have a header
describing it.
--
Michael
On Sun, Dec 15, 2019 at 10:07:08PM +0900, Michael Paquier wrote:
On Fri, Dec 13, 2019 at 04:47:35PM -0600, Justin Pryzby wrote:
It's related code which I cleaned up before adding new stuff. Not essential,
thus separate (0002 should be backpatched).The issue just causes some extra work and that's not a bug, so applied
without a backpatch.
Thanks
For 0003, I think that lazy_vacuum_heap_index() can be confusing as
those indexes are unrelated to heap. Why not naming it just
lazy_vacuum_all_indexes()? The routine should also have a header
describing it.
I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")
I suppose you don't think the other way around is better?
lazy_vacuum_index_heap
Justin
On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")I suppose you don't think the other way around is better?
lazy_vacuum_index_heap
Dunno. Let's see if others have other thoughts on the matter. FWIW,
I have a long history at naming things in a way others don't like :)
--
Michael
At Mon, 16 Dec 2019 11:49:56 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")I suppose you don't think the other way around is better?
lazy_vacuum_index_heapDunno. Let's see if others have other thoughts on the matter. FWIW,
I have a long history at naming things in a way others don't like :)
lazy_vacuum_heap_index() seems confusing to me. I read the name as
Michael did before looking the above explanation.
lazy_vacuum_heap_and_index() is clearer to me.
lazy_vacuum_heap_with_index() could also work but I'm not sure it's
further better.
I see some function names like that, and some others that have two
verbs bonded by "_and_".
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Dec 16, 2019 at 11:49:56AM +0900, Michael Paquier wrote:
On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
I named it so because it calls both lazy_vacuum_index
("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")I suppose you don't think the other way around is better?
lazy_vacuum_index_heapDunno. Let's see if others have other thoughts on the matter. FWIW,
I have a long history at naming things in a way others don't like :)
I renamed.
And deduplicated two more hunks into a 2nd function.
(I'm also including the changes I mentioned here ... in case anyone cares to
comment or review).
/messages/by-id/20191220171132.GB30414@telsasoft.com
Attachments:
v6-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+37-1
v6-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
v6-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v6-0002-deduplication.patchtext/x-diff; charset=us-asciiDownload+43-61
v6-0003-dedup2-skip_blocks.patchtext/x-diff; charset=us-asciiDownload+84-104
v6-0006-Print-debug-line-before-starting-each-vacuum-step.patchtext/x-diff; charset=us-asciiDownload+10-1
On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:
I renamed.
Hmm. I have found what was partially itching me for patch 0002, and
that's actually the fact that we don't do the error reporting for heap
within lazy_vacuum_heap() because the code relies too much on updating
two progress parameters at the same time, on top of the fact that you
are mixing multiple concepts with this refactoring. One problem is
that if this code is refactored in the future, future callers of
lazy_vacuum_heap() would miss the update of the progress reporting.
Splitting things improves also the readability of the code, so
attached is the refactoring I would do for this portion of the set.
It is also more natural to increment num_index_scans when the
reporting happens on consistency grounds.
(Please note that I have not indented yet the patch.)
--
Michael
Attachments:
vacuumlazy-index-refactor-v7.patchtext/x-diff; charset=us-asciiDownload+43-53
On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:
(Please note that I have not indented yet the patch.)
And one indentation later, committed this one after an extra lookup as
of 1ab41a3.
--
Michael
On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:
On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:
I renamed.
Hmm. I have found what was partially itching me for patch 0002, and
that's actually the fact that we don't do the error reporting for heap
within lazy_vacuum_heap() because the code relies too much on updating
two progress parameters at the same time, on top of the fact that you
are mixing multiple concepts with this refactoring. One problem is
that if this code is refactored in the future, future callers of
lazy_vacuum_heap() would miss the update of the progress reporting.
Splitting things improves also the readability of the code, so
attached is the refactoring I would do for this portion of the set.
It is also more natural to increment num_index_scans when the
I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.
Attachments:
v7-0001-Rename-buf-to-avoid-shadowing-buf-of-another-type.patchtext/x-diff; charset=us-asciiDownload+10-11
v7-0002-michael-dedup.patchtext/x-diff; charset=us-asciiDownload+43-54
v7-0003-dedup2-skip_blocks.patchtext/x-diff; charset=us-asciiDownload+84-104
v7-0004-vacuum-errcontext-to-show-block-being-processed.patchtext/x-diff; charset=us-asciiDownload+37-1
v7-0005-add-errcontext-callback-in-lazy_vacuum_heap-too.patchtext/x-diff; charset=us-asciiDownload+14-1
v7-0006-Print-debug-line-before-starting-each-vacuum-step.patchtext/x-diff; charset=us-asciiDownload+10-1
On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.
I am not sure whether it's important enough to make a stink about, but
it bothers me a bit that this is being dismissed as unimportant. The
problem is that, if the updates are not atomic, then somebody might
see the data after one has been updated and the other has not yet been
updated. The result is that when the phase is
PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
can't tell whether the number of index scans reported is the number
*previously* performed or the number performed including the one that
just finished. The race to see the latter state is narrow, so it
probably wouldn't come up often, but it does seem like it would be
confusing if it did happen.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Dec 28, 2019 at 07:21:31PM -0500, Robert Haas wrote:
On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.I am not sure whether it's important enough to make a stink about, but
it bothers me a bit that this is being dismissed as unimportant. The
problem is that, if the updates are not atomic, then somebody might
see the data after one has been updated and the other has not yet been
updated. The result is that when the phase is
PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
can't tell whether the number of index scans reported is the number
*previously* performed or the number performed including the one that
just finished. The race to see the latter state is narrow, so it
probably wouldn't come up often, but it does seem like it would be
confusing if it did happen.
What used to be atomic was this:
- hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
- hvp_val[1] = vacrelstats->num_index_scans + 1;
=> switch from PROGRESS_VACUUM_PHASE_VACUUM INDEX to HEAP and increment
index_vacuum_count, which is documented as the "Number of completed index
vacuum cycles."
Now, it 1) increments the number of completed scans; and, 2) then progresses
phase to HEAP, so there's a window where the number of completed scans is
incremented, and it still says VACUUM_INDEX.
Previously, if it said VACUUM_INDEX, one could assume that index_vacuum_count
would increase at least once more, and that's no longer true. If someone sees
VACUUM_INDEX and some NUM_INDEX_VACUUMS, and then later sees VACUUM_HEAP or
other later stage, with same (maybe final) value of NUM_INDEX_VACUUMS, that's
different than previous behavior.
It seems to me that a someone or their tool monitoring pg_stat shouldn't be
confused by this change, since:
1) there's no promise about how high NUM_INDEX_VACUUMS will or won't go; and,
2) index_vacuum_count didn't do anything strange like decreasing, or increased
before the scans were done; and,
3) the vacuum can finish at any time, and the monitoring process presumably
knows that when the PID is gone, it's finished, even if it missed intermediate
updates;
The behavior is different from before, but I think that's ok: the number of
scans is accurate, and the PHASE is accurate, even though it'll change a moment
later.
I see there's similar case here:
| /* report all blocks vacuumed; and that we're cleaning up */
| pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
| pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
| PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
heap_blks_scanned is documented as "Number of heap blocks SCANNED", and it
increments exactly to heap_blks_total. Would someone be confused if
heap_blks_scanned==heap_blks_total AND phase=='scanning heap' ? I think they'd
just expect PHASE to be updated a moment later. (And if it wasn't, I agree they
should then be legitimately confused or concerned).
Actually, the doc says:
|If heap_blks_scanned is less than heap_blks_total, the system will return to
|scanning the heap after this phase is completed; otherwise, it will begin
|cleaning up indexes AFTER THIS PHASE IS COMPLETED.
I read that to mean that it's okay if heap_blks_scanned==heap_blks_total when
scanning/vacuuming heap.
Justin
On Thu, Dec 26, 2019 at 09:57:04AM -0600, Justin Pryzby wrote:
So rebasified against your patch.
Rebased against your patch in master this time.