display offset along with block number in vacuum errors

Started by Mahendra Singh Thalorover 5 years ago39 messageshackers
Jump to latest
#1Mahendra Singh Thalor
mahi6run@gmail.com

Hi hackers,
We discussed in another email thread[1]/messages/by-id/20200713223822.az6fo3m2x4t42xz2@alap3.anarazel.de -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com, that it will be helpful if we can
display offset along with block number in vacuum error. Here, proposing a
patch to add offset along with block number in vacuum errors.

In commit b61d161(Introduce vacuum errcontext to display additional
information), we added vacuum errcontext to display additional
information(block number) so that in case of vacuum error, we can identify
which block we are getting error. Addition to block number, if we can
display offset, then it will be more helpful for users. So to display
offset, here proposing two different methods(Thanks Robert for suggesting
these 2 methods):

*Method 1:* We can report the TID as well as the block number in
errcontext.
- errcontext("while scanning block %u of relation \"%s.%s\"",
-   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+   errinfo->blkno, errinfo->offnum, errinfo->relnamespace,
errinfo->relname);

Above fix requires more calls to update_vacuum_error_info(). Attaching
v01_0001 patch for this method.

*Method 2: *We can improve the error messages by passing the relevant TID
to heap_prepare_freeze_tuple and having it report the TID as part of the
error message or in the error detail.
  ereport(ERROR,
  (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("found xmin %u from before relfrozenxid %u",
+ errmsg_internal("for block %u and offnum %u, found xmin %u from before
relfrozenxid %u",
+ ItemPointerGetBlockNumber(tid),
+ ItemPointerGetOffsetNumber(tid),
  xid, relfrozenxid)));

Attaching v01_0002 patch for this method.

Please let me know your thoughts.

[1]: /messages/by-id/20200713223822.az6fo3m2x4t42xz2@alap3.anarazel.de -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v01_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchapplication/octet-stream; name=v01_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchDownload+32-13
v01_0002-Added-block-and-offset-to-errors-of-heap_prepare_fre.patchapplication/octet-stream; name=v01_0002-Added-block-and-offset-to-errors-of-heap_prepare_fre.patchDownload+30-14
#2Michael Paquier
michael@paquier.xyz
In reply to: Mahendra Singh Thalor (#1)
Re: display offset along with block number in vacuum errors

On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:

In commit b61d161(Introduce vacuum errcontext to display additional
information), we added vacuum errcontext to display additional
information(block number) so that in case of vacuum error, we can identify
which block we are getting error. Addition to block number, if we can
display offset, then it will be more helpful for users. So to display
offset, here proposing two different methods(Thanks Robert for suggesting
these 2 methods):

new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
vacrelstats->blkno = new_rel_pages;
+ vacrelstats->offnum = InvalidOffsetNumber;

Adding more context would be interesting for some cases, but not all
contrary to what your patch does in some code paths like
lazy_truncate_heap() as you would show up an offset of 0 for an
invalid value. This could confuse some users. Note that we are
careful enough to not print a context message if the block number is
invalid.
--
Michael

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Mahendra Singh Thalor (#1)
Re: display offset along with block number in vacuum errors

On Sat, 25 Jul 2020 at 02:49, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:

Hi hackers,
We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in vacuum error. Here, proposing a patch to add offset along with block number in vacuum errors.

In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to display additional information(block number) so that in case of vacuum error, we can identify which block we are getting error. Addition to block number, if we can display offset, then it will be more helpful for users. So to display offset, here proposing two different methods(Thanks Robert for suggesting these 2 methods):

Method 1: We can report the TID as well as the block number in  errcontext.
- errcontext("while scanning block %u of relation \"%s.%s\"",
-   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+   errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);

Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method.

Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it report the TID as part of the error message or in the error detail.
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("found xmin %u from before relfrozenxid %u",
+ errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
+ ItemPointerGetBlockNumber(tid),
+ ItemPointerGetOffsetNumber(tid),
xid, relfrozenxid)));

Attaching v01_0002 patch for this method.

Please let me know your thoughts.

+1 for adding offset in error messages.

I had a look at 0001 patch. You've set the vacuum error info but I
think an error won't happen during setting itemids unused:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber
blkno, Buffer buffer,
                BlockNumber tblk;
                OffsetNumber toff;
                ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
                tblk =
ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
                if (tblk != blkno)
                        break;                          /* past end of
tuples for this block */
                toff =
ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats,
&loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
                itemid = PageGetItemId(page, toff);
                ItemIdSetUnused(itemid);
                unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error
traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
        }

PageRepairFragmentation(page);

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Michael Paquier (#2)
Re: display offset along with block number in vacuum errors

Thanks Michael for looking into this.

On Sat, 25 Jul 2020 at 15:02, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:

In commit b61d161(Introduce vacuum errcontext to display additional
information), we added vacuum errcontext to display additional
information(block number) so that in case of vacuum error, we can identify
which block we are getting error. Addition to block number, if we can
display offset, then it will be more helpful for users. So to display
offset, here proposing two different methods(Thanks Robert for suggesting
these 2 methods):

new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
vacrelstats->blkno = new_rel_pages;
+ vacrelstats->offnum = InvalidOffsetNumber;

Adding more context would be interesting for some cases, but not all
contrary to what your patch does in some code paths like
lazy_truncate_heap() as you would show up an offset of 0 for an
invalid value. This could confuse some users. Note that we are
careful enough to not print a context message if the block number is
invalid.

Okay. I agree with you. In case of inavlid offset, we can skip offset
printing. I will do this change in the next patch.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Mahendra Singh Thalor (#1)
Re: display offset along with block number in vacuum errors

On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:

Hi hackers,
We discussed in another email thread[1], that it will be helpful if we can
display offset along with block number in vacuum error. Here, proposing a
patch to add offset along with block number in vacuum errors.

Thanks. I happened to see both threads, only by chance.

I'd already started writing the same as your 0001, which is essentially the
same as yours.

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId itemid;
+ LVSavedErrInfo loc_saved_err_info;

 		tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
 		toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+		/* Update error traceback information */
+		update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+								 blkno, toff);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
+
+		/* Revert to the previous phase information for error traceback */
+		restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
 	}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

I think you should also do:

@@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
ItemId itemid;
HeapTupleData tuple;

+ vacrelstats->offset = offnum;

I'm not sure, but maybe you'd also want to do the same in more places:

@@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
@@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)

--
Justin

#6Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Justin Pryzby (#5)
Re: display offset along with block number in vacuum errors

Thanks Justin, Sawada and Michael for reviewing.

On Mon, 27 Jul 2020 at 16:43, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:

Hi hackers,
We discussed in another email thread[1], that it will be helpful if we can
display offset along with block number in vacuum error. Here, proposing a
patch to add offset along with block number in vacuum errors.

Thanks. I happened to see both threads, only by chance.

I'd already started writing the same as your 0001, which is essentially the
same as yours.

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
if (tblk != blkno)
break;                          /* past end of tuples for this block */
toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

Fixed.

I think you should also do:

@@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
ItemId itemid;
HeapTupleData tuple;

+ vacrelstats->offset = offnum;

Agreed and fixed.

I'm not sure, but maybe you'd also want to do the same in more places:

@@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

Fixed.

@@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)

I checked the code and I think there is no need to do similar changes
in count_nondeletable_pages. I will look again and will verify again.

Apart from these, I fixed comments given by Sawada and Michael in the
latest patch. Attaching v2 patch for review.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v02_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchapplication/octet-stream; name=v02_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchDownload+46-17
#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Mahendra Singh Thalor (#6)
Re: display offset along with block number in vacuum errors

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Apart from these, I fixed comments given by Sawada and Michael in the
latest patch. Attaching v2 patch for review.

Thanks.

lazy_check_needs_freeze iterates over blocks and this patch changes it to
update vacrelstats. I think it should do what
lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
its beginning (even though only the offset is changed), and then
restore_vacuum_error_info() at its end (to "revert back" to the item number it
started with).

The same is true of heap_page_is_all_visible(), except it's only called by
lazy_vacuum_page(), which already calls restore_vacuum_error_info() a few lines
later.

As for the message:

+                               if (OffsetNumberIsValid(errinfo->offnum))
+                                       errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+                                                          errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
+                               else
+                                       errcontext("while scanning block %u of relation \"%s.%s\"",
+                                                          errinfo->blkno, errinfo->relnamespace, errinfo->relname);

I think that may be confusing. A DBA should know what a "block" is, but
"offset" sounds like a byte offset, rather than an item number. Here's what
I'd written. Maybe it should say "offset number".

+ errcontext("while vacuuming block %u of relation \"%s.%s\", item offset %u",

--
Justin

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Mahendra Singh Thalor (#6)
Re: display offset along with block number in vacuum errors

Bcc:
Subject: Re: display offset along with block number in vacuum errors
Reply-To:
In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
if (tblk != blkno)
break;                          /* past end of tuples for this block */
toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right. The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk. On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean(). An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

I added this at:
https://commitfest.postgresql.org/29/2662/

If anyone is considering this patch for v13, I guess it should be completed by
next week.

--
Justin

#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#8)
Re: display offset along with block number in vacuum errors

On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:

Bcc:
Subject: Re: display offset along with block number in vacuum errors
Reply-To:
In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

whoops

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
if (tblk != blkno)
break;                          /* past end of tuples for this block */
toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right. The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk. On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean(). An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

This makes me question whether offset numbers are ever useful during
VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
internal functions that don't update vacrelstats->offno. Note that my initial
problem report that led to the errcontext implementation was an ERROR in heap
*scan* (not vacuum). So an offset number at that point would've been
sufficient.
/messages/by-id/20190808012436.GG11185@telsasoft.com

I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
so an error in heap_page_prune() (for example) doesn't get the wrong offset
associated with it.

So see the attached changes on top of your v2 patch.

postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999); VACUUM tt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int + (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt;
ERROR: found xmin 1961 from before relfrozenxid 1073743785
CONTEXT: while scanning block 345 of relation "public.tt", item offset 205

Hmm.. is it confusing that the block number is 0-indexed but the offset is
1-indexed ?

--
Justin

Attachments:

0001-Added-offset-with-block-number-in-vacuum-errcontext.patchtext/x-diff; charset=us-asciiDownload+46-17
0002-fix.patchtext/x-diff; charset=us-asciiDownload+16-14
#10Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Justin Pryzby (#9)
Re: display offset along with block number in vacuum errors

Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:

Bcc:
Subject: Re: display offset along with block number in vacuum errors
Reply-To:
In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

whoops

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
if (tblk != blkno)
break;                          /* past end of tuples for this block */
toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right. The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk. On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean(). An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

This makes me question whether offset numbers are ever useful during
VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
internal functions that don't update vacrelstats->offno. Note that my initial
problem report that led to the errcontext implementation was an ERROR in heap
*scan* (not vacuum). So an offset number at that point would've been
sufficient.
/messages/by-id/20190808012436.GG11185@telsasoft.com

I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
so an error in heap_page_prune() (for example) doesn't get the wrong offset
associated with it.

So see the attached changes on top of your v2 patch.

Actually I was waiting for review comments from committer and other
people also and was planning to send a patch after that. I already
fixed your comments in my offline patch and was waiting for more
comments. Anyway, thanks for delta patch.

Here, attaching v3 patch for review.

postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999); VACUUM tt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int + (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt;
ERROR: found xmin 1961 from before relfrozenxid 1073743785
CONTEXT: while scanning block 345 of relation "public.tt", item offset 205

Hmm.. is it confusing that the block number is 0-indexed but the offset is
1-indexed ?

--
Justin

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.c

Attachments:

v03_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchapplication/octet-stream; name=v03_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchDownload+50-18
#11Justin Pryzby
pryzby@telsasoft.com
In reply to: Mahendra Singh Thalor (#10)
Re: display offset along with block number in vacuum errors

On Sat, Aug 01, 2020 at 12:31:53PM +0530, Mahendra Singh Thalor wrote:

Actually I was waiting for review comments from committer and other
people also and was planning to send a patch after that. I already
fixed your comments in my offline patch and was waiting for more
comments. Anyway, thanks for delta patch.

Here, attaching v3 patch for review.

I wasn't being impatient but I spent enough time thinking about this that it
made sense to put it in patch form. Your patch has a couple extaneous changes:

case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
if (BlockNumberIsValid(errinfo->blkno))
+ {
errcontext("while vacuuming block %u of relation \"%s.%s\"",
errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ }
break;

case VACUUM_ERRCB_PHASE_VACUUM_INDEX:
@@ -3589,6 +3618,7 @@ vacuum_error_callback(void *arg)
errinfo->indname, errinfo->relnamespace, errinfo->relname);
break;

+
case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",

I would get rid of these by doing like: git reset -p HEAD~1 (say "n" to most
hunks and "y" to reset just the two, above), then git commit --amend (without
-a and without pathnames), then git diff will show local changes (including
those no-longer-committed hunks), which you can git checkout -p (or similar).
I'd be interested to hear if there's a better way.

--
Justin

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Mahendra Singh Thalor (#10)
Re: display offset along with block number in vacuum errors

On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:

Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:

Bcc:
Subject: Re: display offset along with block number in vacuum errors
Reply-To:
In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

whoops

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
if (tblk != blkno)
break;                          /* past end of tuples for this block */
toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right. The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk. On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean(). An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

This makes me question whether offset numbers are ever useful during
VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
internal functions that don't update vacrelstats->offno. Note that my initial
problem report that led to the errcontext implementation was an ERROR in heap
*scan* (not vacuum). So an offset number at that point would've been
sufficient.
/messages/by-id/20190808012436.GG11185@telsasoft.com

I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
so an error in heap_page_prune() (for example) doesn't get the wrong offset
associated with it.

So see the attached changes on top of your v2 patch.

Actually I was waiting for review comments from committer and other
people also and was planning to send a patch after that. I already
fixed your comments in my offline patch and was waiting for more
comments. Anyway, thanks for delta patch.

Here, attaching v3 patch for review.

Thank you for updating the patch!

Here are my comments on v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
if (PageIsNew(page) || PageIsEmpty(page))
return false;

+   /* Update error traceback information */
+   update_vacuum_error_info(vacrelstats, &saved_err_info,
+           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+           InvalidOffsetNumber);
+
    maxoff = PageGetMaxOffsetNumber(page);
    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
but I think we're already in that phase. I'm okay with explicitly
setting it but on the other hand, we don't set the phase in
heap_page_is_all_visible(). Is there any reason for that?

Also, since we don't reset vacrelstats->offnum at the end of
heap_page_is_all_visible(), if an error occurs by the end of
lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we
report the error context with the last offset number in the page,
making the users confused.

---
@@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
MultiXactCutoff, buf))
- return true;
+ break;
} /* scan along page */

-   return false;
+   /* Revert to the previous phase information for error traceback */
+   restore_vacuum_error_info(vacrelstats, &saved_err_info);
+
+   return offnum <= maxoff ? true : false;
 }

I think we can write just "return (offnum <= maxoff)".

---
-   /* Revert back to the old phase information for error traceback */
+   /* Revert to the old phase information for error traceback */

If we want to modify this comment how about the following phrase for
consistency with other places?

/* Revert to the previous phase information for error traceback */

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Justin Pryzby
pryzby@telsasoft.com
In reply to: Masahiko Sawada (#12)
Re: display offset along with block number in vacuum errors

On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:

Thank you for updating the patch!

Here are my comments on v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
if (PageIsNew(page) || PageIsEmpty(page))
return false;

+   /* Update error traceback information */
+   update_vacuum_error_info(vacrelstats, &saved_err_info,
+           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+           InvalidOffsetNumber);
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
but I think we're already in that phase. I'm okay with explicitly
setting it but on the other hand, we don't set the phase in
heap_page_is_all_visible(). Is there any reason for that?

That part was my suggestion, so I can answer that. I added
update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
call restore_vacuum_error_info().

Also, since we don't reset vacrelstats->offnum at the end of
heap_page_is_all_visible(), if an error occurs by the end of
lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we
report the error context with the last offset number in the page,
making the users confused.

So it looks like heap_page_is_all_visible() should also call the update and
restore functions.

Do you agree with my suggestion that the VACUUM phase should never try to
report an offset ?

How do you think we can phrase the message to avoid confusion due to 0-based
block number and 1-based offset ?

Thanks,
--
Justin

#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Justin Pryzby (#13)
Re: display offset along with block number in vacuum errors

On Sun, 2 Aug 2020 at 13:24, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:

Thank you for updating the patch!

Here are my comments on v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
if (PageIsNew(page) || PageIsEmpty(page))
return false;

+   /* Update error traceback information */
+   update_vacuum_error_info(vacrelstats, &saved_err_info,
+           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+           InvalidOffsetNumber);
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
but I think we're already in that phase. I'm okay with explicitly
setting it but on the other hand, we don't set the phase in
heap_page_is_all_visible(). Is there any reason for that?

That part was my suggestion, so I can answer that. I added
update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
call restore_vacuum_error_info().

Also, since we don't reset vacrelstats->offnum at the end of
heap_page_is_all_visible(), if an error occurs by the end of
lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we
report the error context with the last offset number in the page,
making the users confused.

So it looks like heap_page_is_all_visible() should also call the update and
restore functions.

Do you agree with my suggestion that the VACUUM phase should never try to
report an offset ?

Yeah, given the current heap vacuum implementation, I agree that
setting the offset number during VACUUM_HEAP phase doesn't help
anything. But setting the offset number during checking tuples'
visibility in heap_page_is_all_visible() might be useful, although it
might be unlikely to find a problem in heap_page_is_all_visible() as
the tuple visibility checking is already done in lazy_scan_heap(). I
wonder if we can set SCAN_HEAP phase and update the offset number in
heap_page_is_all_visible().

How do you think we can phrase the message to avoid confusion due to 0-based
block number and 1-based offset ?

I think that since the user who uses this errcontext information is
likely to know more or less the internal of PostgreSQL I think 0-based
block number and 1-based offset will not be a problem. However, I
expected these are documented but couldn't find. If not yet, I think
it’s a good time to document that.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Masahiko Sawada (#12)
Re: display offset along with block number in vacuum errors

Thanks Sawada and Justin.

On Sun, 2 Aug 2020 at 09:33, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:

Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:

Bcc:
Subject: Re: display offset along with block number in vacuum errors
Reply-To:
In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

whoops

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
BlockNumber tblk;
OffsetNumber toff;
ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;
tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
if (tblk != blkno)
break;                          /* past end of tuples for this block */
toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
}

I'm not sure why you use restore_vacuum_error_info() at all. It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
vacrelstats->blkno = tblk;

Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right. The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk. On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean(). An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

This makes me question whether offset numbers are ever useful during
VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
internal functions that don't update vacrelstats->offno. Note that my initial
problem report that led to the errcontext implementation was an ERROR in heap
*scan* (not vacuum). So an offset number at that point would've been
sufficient.
/messages/by-id/20190808012436.GG11185@telsasoft.com

I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
so an error in heap_page_prune() (for example) doesn't get the wrong offset
associated with it.

So see the attached changes on top of your v2 patch.

Actually I was waiting for review comments from committer and other
people also and was planning to send a patch after that. I already
fixed your comments in my offline patch and was waiting for more
comments. Anyway, thanks for delta patch.

Here, attaching v3 patch for review.

Thank you for updating the patch!

Here are my comments on v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
if (PageIsNew(page) || PageIsEmpty(page))
return false;

+   /* Update error traceback information */
+   update_vacuum_error_info(vacrelstats, &saved_err_info,
+           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+           InvalidOffsetNumber);
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
but I think we're already in that phase. I'm okay with explicitly
setting it but on the other hand, we don't set the phase in
heap_page_is_all_visible(). Is there any reason for that?

Also, since we don't reset vacrelstats->offnum at the end of
heap_page_is_all_visible(), if an error occurs by the end of
lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we
report the error context with the last offset number in the page,
making the users confused.

Your point is valid. Added update and restore functions in
heap_page_is_all_visible in the latest patch.

---
@@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
MultiXactCutoff, buf))
- return true;
+ break;
} /* scan along page */

-   return false;
+   /* Revert to the previous phase information for error traceback */
+   restore_vacuum_error_info(vacrelstats, &saved_err_info);
+
+   return offnum <= maxoff ? true : false;
}

I think we can write just "return (offnum <= maxoff)".

Fixed this.

---
-   /* Revert back to the old phase information for error traceback */
+   /* Revert to the old phase information for error traceback */

If we want to modify this comment how about the following phrase for
consistency with other places?

Fixed this.

/* Revert to the previous phase information for error traceback */

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Apart from these, I fixed Justin's comment of extra brackets(That was
due to "patch -p 1 < file", as 002_fix was not applying directly). I
haven't updated the document for this(Sawada's comment). I will try in
the next patch.
Attaching v4 patch for review.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v04_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchapplication/octet-stream; name=v04_0001-Added-offset-with-block-number-in-vacuum-errcontext.patchDownload+57-18
#16Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#14)
Re: display offset along with block number in vacuum errors

On Sun, Aug 2, 2020 at 10:43 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I think that since the user who uses this errcontext information is
likely to know more or less the internal of PostgreSQL I think 0-based
block number and 1-based offset will not be a problem. However, I
expected these are documented but couldn't find. If not yet, I think
it’s a good time to document that.

I agree. That's just how TIDs are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#7)
Re: display offset along with block number in vacuum errors

On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Apart from these, I fixed comments given by Sawada and Michael in the
latest patch. Attaching v2 patch for review.

Thanks.

lazy_check_needs_freeze iterates over blocks and this patch changes it to
update vacrelstats. I think it should do what
lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
its beginning (even though only the offset is changed), and then
restore_vacuum_error_info() at its end (to "revert back" to the item number it
started with).

I see that Mahendra has changed patch as per this suggestion but I am
not convinced that it is a good idea to sprinkle
update_vacuum_error_info()/restore_vacuum_error_info() at places more
than required. I see that it might look a bit clean from the
perspective that if tomorrow we use the function
lazy_check_needs_freeze() for a different purpose then we don't need
to worry about the wrong phase information. If we are worried about
that then we should have an assert in that function to ensure that the
current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

--
With Regards,
Amit Kapila.

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Mahendra Singh Thalor (#15)
Re: display offset along with block number in vacuum errors

On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

Apart from these, I fixed Justin's comment of extra brackets(That was
due to "patch -p 1 < file", as 002_fix was not applying directly). I
haven't updated the document for this(Sawada's comment). I will try in
the next patch.
Attaching v4 patch for review.

Few comments on the latest patch:
1.
@@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats
*vacrelstats)
  */
  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
  vacrelstats->blkno = new_rel_pages;
+ vacrelstats->offnum = InvalidOffsetNumber;

Do we really need to update the 'vacrelstats->offnum' here when we
have already set it to InvalidOffsetNumber in the caller?

2.
@@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg)
  {
  case VACUUM_ERRCB_PHASE_SCAN_HEAP:
  if (BlockNumberIsValid(errinfo->blkno))
- errcontext("while scanning block %u of relation \"%s.%s\"",
-    errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ {
+ if (OffsetNumberIsValid(errinfo->offnum))
+ errcontext("while scanning block %u of relation \"%s.%s\", item offset %u",
+

I am not completely sure if this error message is an improvement over
what you have in the initial version of patch "while scanning block %u
and offset %u of relation \"%s.%s\"",...". I see that Justin has
raised a concern above that whether users will be aware of 'offset'
but I also see that we have used it in a few other messages in the
code. For example:

PageIndexTupleDeleteNoCompact()
{
..
nline = PageGetMaxOffsetNumber(page);
if ((int) offnum <= 0 || (int) offnum > nline)
elog(ERROR, "invalid index offnum: %u", offnum);
..
}

hash_desc
{
..
case XLOG_HASH_INSERT:
{
xl_hash_insert *xlrec = (xl_hash_insert *) rec;

appendStringInfo(buf, "off %u", xlrec->offnum);
break;
}

Similarly in other desc functions, we have used off or offnum.

I find the message in your initial patch better.

--
With Regards,
Amit Kapila.

#19Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#17)
Re: display offset along with block number in vacuum errors

On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:

On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:

Apart from these, I fixed comments given by Sawada and Michael in the
latest patch. Attaching v2 patch for review.

Thanks.

lazy_check_needs_freeze iterates over blocks and this patch changes it to
update vacrelstats. I think it should do what
lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
its beginning (even though only the offset is changed), and then
restore_vacuum_error_info() at its end (to "revert back" to the item number it
started with).

I see that Mahendra has changed patch as per this suggestion but I am
not convinced that it is a good idea to sprinkle
update_vacuum_error_info()/restore_vacuum_error_info() at places more
than required. I see that it might look a bit clean from the
perspective that if tomorrow we use the function
lazy_check_needs_freeze() for a different purpose then we don't need
to worry about the wrong phase information. If we are worried about
that then we should have an assert in that function to ensure that the
current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

The motivation was to restore the offnum, which is set to Invalid at the start
of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
should be restored or re-set to Invalid when returns to lazy_scan_heap(). If
you think it's important, we could just set vacrelstats->offnum = Invalid
before returning, but that's what the restore function was built for. We do
direct assignment in 2 places to avoid a function call within a loop.

lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
Relation *Irel, int nindexes, bool aggressive)

...
for (blkno = 0; blkno < nblocks; blkno++)
{
...
update_vacuum_error_info(vacrelstats, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
blkno, InvalidOffsetNumber);
if (!ConditionalLockBufferForCleanup(buf))
{
...
if (!lazy_check_needs_freeze(buf, &hastup, vacrelstats))
{
...
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))

--
Justin

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#19)
Re: display offset along with block number in vacuum errors

On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:

On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

lazy_check_needs_freeze iterates over blocks and this patch changes it to
update vacrelstats. I think it should do what
lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
its beginning (even though only the offset is changed), and then
restore_vacuum_error_info() at its end (to "revert back" to the item number it
started with).

I see that Mahendra has changed patch as per this suggestion but I am
not convinced that it is a good idea to sprinkle
update_vacuum_error_info()/restore_vacuum_error_info() at places more
than required. I see that it might look a bit clean from the
perspective that if tomorrow we use the function
lazy_check_needs_freeze() for a different purpose then we don't need
to worry about the wrong phase information. If we are worried about
that then we should have an assert in that function to ensure that the
current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

The motivation was to restore the offnum, which is set to Invalid at the start
of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
should be restored or re-set to Invalid when returns to lazy_scan_heap(). If
you think it's important, we could just set vacrelstats->offnum = Invalid
before returning,

Yeah, I would prefer that and probably a comment to indicate why we
are doing that.

but that's what the restore function was built for.

I think it would be better to call restore wherever we call update. I
see your point that there is some value doing it via update/restore
but I think we should try to avoid that at many places unless it is
required and we already update blockno information without
update/restore at few places.

--
With Regards,
Amit Kapila.

#21Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#21)
#23Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#22)
#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#23)
#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#20)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#18)
#27Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#24)
#28Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#27)
#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#28)
#30Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#30)
#32Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#31)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#32)
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#33)
#35Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#34)
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#35)
#37Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Amit Kapila (#34)
#38Amit Kapila
amit.kapila16@gmail.com
In reply to: Mahendra Singh Thalor (#37)
#39Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#38)