HOT chain validation in verify_heapam()

Started by Himanshu Upadhyaya · over 3 years ago · 100 messages · pgsql-hackers
#1 Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com

Hi,

On Mon, Apr 4, 2022 at 11:46 PM Andres Freund <andres@anarazel.de> wrote:

I think there's a few more things that'd be good to check. For example,
amcheck doesn't verify that HOT chains are reasonable, which can often be
spotted looking at an individual page. Which is a bit unfortunate, given
how many bugs we had in that area.

Stuff to check around that:
- target of redirect has HEAP_ONLY_TUPLE, HEAP_UPDATED set
- In a valid ctid chain within a page (i.e. one tuple's xmax = the next
tuple's xmin):
  - tuples have HEAP_UPDATED set
  - HEAP_ONLY_TUPLE / HEAP_HOT_UPDATED matches across chain elements

(I changed the subject because the attached patch is related to HOT chain
validation).

Please find attached the patch with the above idea of HOT chain
validation (within a page) and a few more validations, as below.

* If the predecessor's xmin is aborted or in progress, the current tuple's
xmin should be aborted or in progress respectively. Also, both xmins must
be equal.
* If the predecessor's xmin is not frozen, then the current tuple's
shouldn't be either.
* If the predecessor's xmin is equal to the current tuple's xmin, the
current tuple's cmin should be greater than the predecessor's cmin.
* If the current tuple is not HOT, then its predecessor must not be
HEAP_HOT_UPDATED.
* If the current tuple is HOT, then its predecessor must be
HEAP_HOT_UPDATED, and vice versa.
* If xmax is 0, which means it is the last tuple in the chain, then it
must not be HEAP_HOT_UPDATED.
* If the current tuple is the last tuple in the HOT chain, then the next
tuple should not be HOT.

I am looking into the process of adding TAP tests for these changes and
finding a way to corrupt a page in the TAP test. I will try to include
these test cases in my upcoming version of the patch.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-HOT-chain-validation-in-verify_heapam.patch (application/x-patch, +144 -1)
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Himanshu Upadhyaya (#1)
Re: HOT chain validation in verify_heapam()

Hi,

Thanks for working on this.

+                htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem);
+                if (!(HeapTupleHeaderIsHeapOnly(htup) &&
+                      htup->t_infomask & HEAP_UPDATED))
+                    report_corruption(&ctx,
+                                      psprintf("Redirected Tuple at line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple",
+                                               (unsigned) rdoffnum));

This isn't safe because the line pointer referenced by rditem may not
have been sanity-checked yet. Refer to the comment just below where it
says "Sanity-check the line pointer's offset and length values".

There are multiple problems with this error message. First, if you
take a look at the existing messages - which is always a good thing to
do when adding new ones - you will see that they are capitalized
differently. Try to match the existing style. Second, we made a real
effort with the existing messages to avoid referring to the names of
identifiers that only exist at the C level. For example, just above
you will see a message which says "line pointer redirection to item at
offset %u precedes minimum offset %u". It deliberately does not say
"line pointer redirection to item at offset %u is less than
FirstOffsetNumber" even though that would be an equally correct
statement of the problem. The intent here is to make the messages at
least somewhat accessible to users who are somewhat familiar with how
PostgreSQL storage works but may not read the C code. These comments
apply to every message you add in the patch.

The message also does not match the code. The code tests for
HEAP_UPDATED, but the message claims that the code is testing for
either HEAP_ONLY_TUPLE or HEAP_UPDATED. As a general rule, it is best
not to club related tests together in cases like this, because it
enables better and more specific error messages.

It would be clearer to make an explicit comparison to 0, like
(htup->t_infomask & HEAP_UPDATED) != 0, rather than relying on 0 being
false and non-zero being true. It doesn't matter to the compiler, but
it may help human readers.

+            /*
+             * Add line pointer offset to predecessor array.
+             * 1) If xmax is matching with xmin of next
Tuple(reaching via its t_ctid).
+             * 2) If next tuple is in the same page.
+             * Raise corruption if:
+             * We have two tuples having same predecessor.
+             *
+             * We add offset to predecessor irrespective of
transaction(t_xmin) status. We will
+             * do validation related to transaction status(and also
all other validations)
+             * when we loop over predecessor array.
+             */

The formatting of this comment will, I think, be mangled if pgindent
is run against the file. You can use ----- markers to prevent that, I
believe, or (better) write this as a paragraph without relying on the
lines ending up uneven in length.

+                if (predecessor[nextTupOffnum] != 0)
+                {
+                    report_corruption(&ctx,
+                                      psprintf("Tuple at offset %u is reachable from two or more updated tuple",
+                                               (unsigned) nextTupOffnum));
+                    continue;
+                }

You need to do this test after xmin/xmax matching. Otherwise you might
get false positives. Also, the message should be more specific and
match the style of the existing messages. ctx.offnum is already going
to be reported in another column, but we can report both nextTupOffnum
and predecessor[nextTupOffnum] here e.g. "updated version at offset %u
is also the updated version of tuple at offset %u".

+                currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr);
+                lp   = PageGetItemId(ctx.page, nextTupOffnum);
+
+                htup = (HeapTupleHeader) PageGetItem(ctx.page, lp);

This has the same problem I mentioned in my first comment above,
namely, we haven't necessarily sanity-checked the length and offset
values for nextTupOffnum yet. Saying that another way, if the contents
of lp are corrupt and point off the page, we want that to be reported
as corruption (which the current code will already do) and we want
this check to be skipped so that we don't crash or access random
memory addresses. You need to think about how to rearrange the code so
that we only perform checks that are known to be safe.

+        /* Now loop over offset and validate data in predecessor array.*/
+        for ( ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+             ctx.offnum = OffsetNumberNext(ctx.offnum))

Please take the time to format your code according to the PostgreSQL
standard practice. If you don't know what that looks like, use
pgindent.

+        {
+            HeapTupleHeader pred_htup, curr_htup;
+            TransactionId   pred_xmin, curr_xmin, curr_xmax;
+            ItemId      pred_lp, curr_lp;

Same here.

Within this loop, you need to think about what to include in the
columns of the output other than 'msg' and what to include in the
message itself. There's no reason to include ctx.offnum in the message
text because it's already included in the 'offnum' column of the
output.

I think it would actually be a good idea to set ctx.offnum to the
predecessor's offset number, and use a separate variable for the
current offset number. The reason why I think this is that I believe
it will make it easier to phrase the messages appropriately. For
example, if ctx.offnum is the predecessor tuple, then we can issue
complaints like this:

tuple with uncommitted xmin %u was updated to produce a tuple at
offset %u with differing xmin %u
unfrozen tuple was updated to produce a tuple at offset %u which is not frozen
tuple with uncommitted xmin %u has cmin %u, but was updated to produce
a tuple with cmin %u
non-heap-only update produced a heap-only tuple at offset %u
heap-only update produced a non-heap only tuple at offset %u

+            if (!TransactionIdIsValid(curr_xmax) &&
+                HeapTupleHeaderIsHotUpdated(curr_htup))
+            {
+                report_corruption(&ctx,
+                                  psprintf("Current tuple at offset %u is HOT but is last tuple in the HOT chain.",
+                                           (unsigned) ctx.offnum));
+            }

This check has nothing to do with the predecessor[] array, so it seems
like it belongs in check_tuple() rather than here. Also, the message
is rather confused, because the test is checking whether the tuple has
been HOT-updated, while the message is talking about whether the tuple
was *itself* created by a HOT update. Also, when we're dealing with
corruption, statements like "is last tuple in the HOT chain" are
pretty ambiguous. Also, isn't this an issue for both HOT-updated
tuples and also just regular updated tuples? i.e. maybe what we should
be complaining about here is something like "tuple has been updated,
but xmax is 0" and then make the test check exactly that.

+            if (!HeapTupleHeaderIsHotUpdated(pred_htup) &&
+                HeapTupleHeaderIsHeapOnly(pred_htup) &&
+                HeapTupleHeaderIsHeapOnly(curr_htup))
+            {
+                report_corruption(&ctx,
+                                  psprintf("Current tuple at offset %u is HOT but it is next updated tuple of last Tuple in HOT chain.",
+                                           (unsigned) ctx.offnum));
+            }

Three if-statements up, you tested two out of these three conditions
and complained if they were met. So any time this fires, that will
have also fired.

...Robert

#3 Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Robert Haas (#2)
Re: HOT chain validation in verify_heapam()

Hi Robert,

Thanks for sharing the feedback.

On Sat, Aug 27, 2022 at 1:47 AM Robert Haas <robertmhaas@gmail.com> wrote:

+                htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem);
+                if (!(HeapTupleHeaderIsHeapOnly(htup) &&
+                      htup->t_infomask & HEAP_UPDATED))
+                    report_corruption(&ctx,
+                                      psprintf("Redirected Tuple at line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple",
+                                               (unsigned) rdoffnum));

This isn't safe because the line pointer referenced by rditem may not
have been sanity-checked yet. Refer to the comment just below where it
says "Sanity-check the line pointer's offset and length values".

Handled by creating a new function check_lp() and calling it before
accessing the redirected tuple.

+            /*
+             * Add line pointer offset to predecessor array.
+             * 1) If xmax is matching with xmin of next
Tuple(reaching via its t_ctid).
+             * 2) If next tuple is in the same page.
+             * Raise corruption if:
+             * We have two tuples having same predecessor.
+             *
+             * We add offset to predecessor irrespective of
transaction(t_xmin) status. We will
+             * do validation related to transaction status(and also
all other validations)
+             * when we loop over predecessor array.
+             */

The formatting of this comment will, I think, be mangled if pgindent
is run against the file. You can use ----- markers to prevent that, I
believe, or (better) write this as a paragraph without relying on the
lines ending up uneven in length.

Done, reformatted using pgindent.

+                if (predecessor[nextTupOffnum] != 0)
+                {
+                    report_corruption(&ctx,
+                                      psprintf("Tuple at offset %u is reachable from two or more updated tuple",
+                                               (unsigned) nextTupOffnum));
+                    continue;
+                }

You need to do this test after xmin/xmax matching. Otherwise you might
get false positives. Also, the message should be more specific and
match the style of the existing messages. ctx.offnum is already going
to be reported in another column, but we can report both nextTupOffnum
and predecessor[nextTupOffnum] here e.g. "updated version at offset %u
is also the updated version of tuple at offset %u".

agree, done.

+                currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr);
+                lp = PageGetItemId(ctx.page, nextTupOffnum);
+
+                htup = (HeapTupleHeader) PageGetItem(ctx.page, lp);

This has the same problem I mentioned in my first comment above,
namely, we haven't necessarily sanity-checked the length and offset
values for nextTupOffnum yet. Saying that another way, if the contents
of lp are corrupt and point off the page, we want that to be reported
as corruption (which the current code will already do) and we want
this check to be skipped so that we don't crash or access random
memory addresses. You need to think about how to rearrange the code so
that we only perform checks that are known to be safe.

Moved the sanity-check logic to a new function check_lp(), which is called
before accessing the next tuple while populating the predecessor array.

Please take the time to format your code according to the PostgreSQL
standard practice. If you don't know what that looks like, use
pgindent.

+        {
+            HeapTupleHeader pred_htup, curr_htup;
+            TransactionId   pred_xmin, curr_xmin, curr_xmax;
+            ItemId      pred_lp, curr_lp;

Same here.

Done.

I think it would actually be a good idea to set ctx.offnum to the
predecessor's offset number, and use a separate variable for the
current offset number. The reason why I think this is that I believe
it will make it easier to phrase the messages appropriately. For
example, if ctx.offnum is the predecessor tuple, then we can issue
complaints like this:

tuple with uncommitted xmin %u was updated to produce a tuple at
offset %u with differing xmin %u
unfrozen tuple was updated to produce a tuple at offset %u which is not
frozen
tuple with uncommitted xmin %u has cmin %u, but was updated to produce
a tuple with cmin %u
non-heap-only update produced a heap-only tuple at offset %u
heap-only update produced a non-heap only tuple at offset %u

Agree, Done.

+            if (!TransactionIdIsValid(curr_xmax) &&
+                HeapTupleHeaderIsHotUpdated(curr_htup))
+            {
+                report_corruption(&ctx,
+                                  psprintf("Current tuple at offset %u is HOT but is last tuple in the HOT chain.",
+                                           (unsigned) ctx.offnum));
+            }

This check has nothing to do with the predecessor[] array, so it seems
like it belongs in check_tuple() rather than here. Also, the message
is rather confused, because the test is checking whether the tuple has
been HOT-updated, while the message is talking about whether the tuple
was *itself* created by a HOT update. Also, when we're dealing with
corruption, statements like "is last tuple in the HOT chain" are
pretty ambiguous. Also, isn't this an issue for both HOT-updated
tuples and also just regular updated tuples? i.e. maybe what we should
be complaining about here is something like "tuple has been updated,
but xmax is 0" and then make the test check exactly that.

Moved to check_tuple_header(). This should be applicable to both HOT and
normal updates, but even the last tuple in a normal update chain has
HEAP_UPDATED set, so I am not sure how we can apply this check to a normal
update.

+            if (!HeapTupleHeaderIsHotUpdated(pred_htup) &&
+                HeapTupleHeaderIsHeapOnly(pred_htup) &&
+                HeapTupleHeaderIsHeapOnly(curr_htup))
+            {
+                report_corruption(&ctx,
+                                  psprintf("Current tuple at offset %u is HOT but it is next updated tuple of last Tuple in HOT chain.",
+                                           (unsigned) ctx.offnum));
+            }

Three if-statements up, you tested two out of these three conditions
and complained if they were met. So any time this fires, that will
have also fired.

Yes, the above condition is not required. Now removed.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-HOT-chain-validation-in-verify_heapam.patch (text/x-patch, +206 -23)
#4 Aleksander Alekseev
aleksander@timescale.com
In reply to: Himanshu Upadhyaya (#3)
Re: HOT chain validation in verify_heapam()

Hi Himanshu,

Many thanks for working on this!

Please find attached the patch with the above idea of HOT chain's validation

Please correct me if I'm wrong, but don't we have a race condition here:

```
+            if ((TransactionIdDidAbort(pred_xmin) ||
TransactionIdIsInProgress(pred_xmin))
+                && !TransactionIdEquals(pred_xmin, curr_xmin))
             {
```

The scenario that concerns me is the following:

1. TransactionIdDidAbort(pred_xmin) returns false
2. The transaction aborts
3. TransactionIdIsInProgress(pred_xmin) returns false
4. (false || false) gives us false. An error is reported, although
actually the condition should have been true.

--
Best regards,
Aleksander Alekseev

#5 Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#4)
Re: HOT chain validation in verify_heapam()

Hi again,

I am looking into the process of adding the TAP test for these changes and finding a way to corrupt a page in the tap test

Please note that currently the patch breaks many existing tests. I
suggest fixing these first.

For the details please see the cfbot report [1] or execute the tests
locally. Personally I'm using a little script for this [2].

[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/blob/master/full-build.sh

--
Best regards,
Aleksander Alekseev

#6 Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#4)
Re: HOT chain validation in verify_heapam()

Hi hackers,

Please correct me if I'm wrong, but don't we have a race condition here:

```
+            if ((TransactionIdDidAbort(pred_xmin) ||
TransactionIdIsInProgress(pred_xmin))
+                && !TransactionIdEquals(pred_xmin, curr_xmin))
{
```

The scenario that concerns me is the following:

1. TransactionIdDidAbort(pred_xmin) returns false
2. The transaction aborts
3. TransactionIdIsInProgress(pred_xmin) returns false
4. (false || false) gives us false. An error is reported, although
actually the condition should have been true.

It looks like I had a slight brain fade here.

In order to report a false error either TransactionIdDidAbort() or
TransactionIdIsInProgress() should return true and
TransactionIdEquals() should be false. So actually under rare
conditions the error will NOT be reported while it should. Other than
that we seem to be safe from the concurrency perspective, unless I'm
missing something again.

Personally I don't have a strong opinion on whether we should bother
about this scenario. Probably an explicit comment will not hurt.

Also, I suggest checking TransactionIdEquals() first, since it's cheaper.

--
Best regards,
Aleksander Alekseev

#7 Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#6)
Re: HOT chain validation in verify_heapam()

On Tue, Sep 6, 2022 at 9:38 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:

Please correct me if I'm wrong, but don't we have a race condition here:

```
+            if ((TransactionIdDidAbort(pred_xmin) ||
TransactionIdIsInProgress(pred_xmin))
+                && !TransactionIdEquals(pred_xmin, curr_xmin))
{
```

It looks like I had a slight brain fade here.

In order to report a false error either TransactionIdDidAbort() or
TransactionIdIsInProgress() should return true and
TransactionIdEquals() should be false. So actually under rare
conditions the error will NOT be reported while it should. Other than
that we seem to be safe from the concurrency perspective, unless I'm
missing something again.

Personally I don't have a strong opinion on whether we should bother
about this scenario. Probably an explicit comment will not hurt.

Also I suggest checking TransactionIdEquals() first though since it's cheaper.

I think the check should be written like this:

!TransactionIdEquals(pred_xmin, curr_xmin) && !TransactionIdDidCommit(pred_xmin)

The TransactionIdEquals check should be done first for the reason you
state: it's cheaper.

I think that we shouldn't be using TransactionIdDidAbort() at all,
because it can sometimes return false even when the transaction
actually did abort. See test_lockmode_for_conflict() and
TransactionIdIsInProgress() for examples of logic that copes with
this. What's happening here is that TransactionIdDidAbort doesn't ever
get called if the system crashes while a transaction is running. So we
can use TransactionIdDidAbort() only as an optimization: if it returns
true, the transaction is definitely aborted, but if it returns false,
we have to check whether it's still running. If not, it aborted
anyway.

TransactionIdDidCommit() does not have the same issue. A transaction
can abort without updating CLOG if the system crashes, but it can
never commit without updating CLOG. If the transaction didn't commit,
then it is either aborted or still in progress (and we don't care
which, because neither is an error here).

As to whether the existing formulation of the test has an error
condition, you're generally right that we should test
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort,
because during commit or abort, we first set the status in CLOG -
which is queried by TransactionIdDidCommit/Abort - and only afterward
update the procarray - which is queried by TransactionIdIsInProgress.
So normally TransactionIdIsInProgress should be checked first, and
TransactionIdDidCommit/Abort should only be checked if it returns
false, at which point we know that the return values of the latter
calls can't ever change. Possibly there is an argument for including
the TransactionIdInProgress check here too:

!TransactionIdEquals(pred_xmin, curr_xmin) &&
(TransactionIdIsInProgress(pred_xmin) ||
!TransactionIdDidCommit(pred_xmin))

...but I don't think it could change the answer. Most places that
check TransactionIdIsInProgress() first are concerned with MVCC
semantics, and here we are not. I think the only effects of including
or excluding the TransactionIdIsInProgress() test are (1) performance,
in that searching the procarray might avoid expense if it's cheaper
than searching clog, or add expense if the reverse is true and (2)
slightly changing the time at which we're first able to detect this
form of corruption. I am inclined to prefer the simpler form of the
test without TransactionIdIsInProgress() unless someone can say why we
shouldn't go that route.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Himanshu Upadhyaya (#3)
Re: HOT chain validation in verify_heapam()

On Tue, Sep 6, 2022 at 6:34 AM Himanshu Upadhyaya
<upadhyaya.himanshu@gmail.com> wrote:

This isn't safe because the line pointer referenced by rditem may not
have been sanity-checked yet. Refer to the comment just below where it
says "Sanity-check the line pointer's offset and length values".

handled by creating a new function check_lp and calling it before accessing the redirected tuple.

I think this is going to result in duplicate error messages, because
if A redirects to B, what keeps us from calling check_lp(B) once when
we reach A and again when we reach B?

I am kind of generally suspicious of the idea that, both for redirects
and for ctid links, you just have it check_lp() on the target line
pointer and then maybe try to skip doing that again later when we get
there. That feels error-prone to me. I think we should try to find a
way of organizing the code where we do the check_lp() checks on all
line pointers in order without skipping around or backing up. It's not
100% clear to me what the best way of accomplishing that is, though.

But here's one random idea: add a successor[] array and an lp_valid[]
array. In the first loop, set lp_valid[offset] = true if it passes the
check_lp() checks, and set successor[A] = B if A redirects to B or has
a CTID link to B, without matching xmin/xmax. Then, in a second loop,
iterate over the successor[] array. If successor[A] = B && lp_valid[A]
&& lp_valid[B], then check whether A.xmax = B.xmin; if so, then
complain if predecessor[B] is already set, else set predecessor[B] =
A. Then, in the third loop, iterate over the predecessor array just as
you're doing now. Then it's clear that we do the lp_valid checks
exactly once for every offset that might need them, and in order. And
it's also clear that the predecessor-based checks can never happen
unless the lp_valid checks passed for both of the offsets involved.

Done, reformatted using pgindent.

Thanks, but the new check_lp() function's declaration is not formatted
according to pgindent guidelines. It's not enough to fix the problems
once, you have to avoid reintroducing them.

+            if (!TransactionIdIsValid(curr_xmax) &&
+                HeapTupleHeaderIsHotUpdated(curr_htup))
+            {
+                report_corruption(&ctx,
+                                  psprintf("Current tuple at offset %u is HOT but is last tuple in the HOT chain.",
+                                           (unsigned) ctx.offnum));
+            }

This check has nothing to do with the predecessor[] array, so it seems
like it belongs in check_tuple() rather than here. Also, the message
is rather confused, because the test is checking whether the tuple has
been HOT-updated, while the message is talking about whether the tuple
was *itself* created by a HOT update. Also, when we're dealing with
corruption, statements like "is last tuple in the HOT chain" are
pretty ambiguous. Also, isn't this an issue for both HOT-updated
tuples and also just regular updated tuples? i.e. maybe what we should
be complaining about here is something like "tuple has been updated,
but xmax is 0" and then make the test check exactly that.

Moved to check_tuple_header. This should be applicable for both HOT and normal updates but even the last updated tuple in the normal update is HEAP_UPDATED so not sure how we can apply this check for a normal update?

Oh, yeah. You're right. I was thinking that HEAP_UPDATED was like
HEAP_HOT_UPDATED, but it's not: HEAP_UPDATED gets set on the new
tuple, while HEAP_HOT_UPDATED gets set on the old tuple.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9 Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Robert Haas (#8)
Re: HOT chain validation in verify_heapam()

On Wed, Sep 7, 2022 at 2:49 AM Robert Haas <robertmhaas@gmail.com> wrote:

But here's one random idea: add a successor[] array and an lp_valid[]
array. In the first loop, set lp_valid[offset] = true if it passes the
check_lp() checks, and set successor[A] = B if A redirects to B or has
a CTID link to B, without matching xmin/xmax. Then, in a second loop,
iterate over the successor[] array. If successor[A] = B && lp_valid[A]
&& lp_valid[B], then check whether A.xmax = B.xmin; if so, then
complain if predecessor[B] is already set, else set predecessor[B] =
A. Then, in the third loop, iterate over the predecessor array just as
you're doing now. Then it's clear that we do the lp_valid checks
exactly once for every offset that might need them, and in order. And
it's also clear that the predecessor-based checks can never happen
unless the lp_valid checks passed for both of the offsets involved.

The approach of introducing a successor array is good, but I see one
overhead with having both successor and predecessor arrays: we will
traverse each offset on the page thrice (one more pass than the original
loop over offsets), and for each offset we have to retrieve/reach an
ItemId (PageGetItemId) and the Item (PageGetItem) itself. This is not much
overhead, since these are all preprocessor macros, but there will be some.
How about having a new array (initialised with LP_NOT_CHECKED) of enum
LPStatus, as below,

typedef enum LPStatus
{
    LP_NOT_CHECKED,
    LP_VALID,
    LP_NOT_VALID
} LPStatus;

and validating and setting the proper status at three places:
1) while validating the redirected tuple,
2) while populating the predecessor array, and
3) at the original place of the "sanity check",

something like:

    if (lpStatus[rdoffnum] == LP_NOT_CHECKED)
    {
        ctx.offnum = rdoffnum;
        if (!check_lp(&ctx, ItemIdGetLength(rditem), ItemIdGetOffset(rditem)))
        {
            lpStatus[rdoffnum] = LP_NOT_VALID;
            continue;
        }
        lpStatus[rdoffnum] = LP_VALID;
    }
    else if (lpStatus[rdoffnum] == LP_NOT_VALID)
        continue;

thoughts?
--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Himanshu Upadhyaya (#9)
Re: HOT chain validation in verify_heapam()

On Fri, Sep 9, 2022 at 10:00 AM Himanshu Upadhyaya
<upadhyaya.himanshu@gmail.com> wrote:

Approach of introducing a successor array is good but I see one overhead with having both successor and predecessor array, that is, we will traverse each offset on page thrice(one more for original loop on offset) and with each offset we have to retrieve
/reach an ItemID(PageGetItemId) and Item(PageGetItem) itself. This is not much overhead as they are all preprocessors but there will be some overhead.
How about having new array(initialised with LP_NOT_CHECKED) of enum LPStatus as below

typedef enum LPStatus
{
LP_NOT_CHECKED,
LP_VALID,
LP_NOT_VALID
}LPStatus;

and validating and setting with proper status at three places
1) while validating Redirect Tuple
2) while validating populating predecessor array and
3) at original place of "sanity check"

Well, having to duplicate the logic in three places doesn't seem all
that clean to me. Admittedly, I haven't tried implementing my
proposal, so maybe that doesn't come out very clean either. I don't
know. But I think having the code be clearly correct here is the most
important thing, not shaving a few CPU cycles here or there. It's not
even clear to me that your way would be cheaper, because an "if"
statement is certainly not free, and in fact is probably more
expensive than an extra call to PageGetItem() or PageGetItemId().
Branches are usually more expensive than math. But actually I doubt
that it matters much either way. I think the way to figure out whether
we have a performance problem is to write the code in the
stylistically best way and then test it. There may be no problem at
all, and if there is a problem, it may not be where we think it will
be.

In short, let's apply Knuth's optimization principle.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11 Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Robert Haas (#8)
Re: HOT chain validation in verify_heapam()

On Wed, Sep 7, 2022 at 2:49 AM Robert Haas <robertmhaas@gmail.com> wrote:

But here's one random idea: add a successor[] array and an lp_valid[]
array. In the first loop, set lp_valid[offset] = true if it passes the
check_lp() checks, and set successor[A] = B if A redirects to B or has
a CTID link to B, without matching xmin/xmax. Then, in a second loop,
iterate over the successor[] array. If successor[A] = B && lp_valid[A]
&& lp_valid[B], then check whether A.xmax = B.xmin; if so, then
complain if predecessor[B] is already set, else set predecessor[B] =
A. Then, in the third loop, iterate over the predecessor array just as
you're doing now. Then it's clear that we do the lp_valid checks
exactly once for every offset that might need them, and in order. And
it's also clear that the predecessor-based checks can never happen
unless the lp_valid checks passed for both of the offsets involved.

ok, I have introduced a new approach to first construct a successor array
and then loop over the successor array to construct a predecessor array.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v3-0001-HOT-chain-validation-in-verify_heapam.patchtext/x-patch; charset=UTF-8; name=v3-0001-HOT-chain-validation-in-verify_heapam.patchDownload+209-1
#12Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Robert Haas (#7)
Re: HOT chain validation in verify_heapam()

On Wed, Sep 7, 2022 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:

I think the check should be written like this:

!TransactionIdEquals(pred_xmin, curr_xmin) &&
!TransactionIdDidCommit(pred_xmin)

The TransactionIdEquals check should be done first for the reason you
state: it's cheaper.

I think that we shouldn't be using TransactionIdDidAbort() at all,
because it can sometimes return false even when the transaction
actually did abort. See test_lockmode_for_conflict() and
TransactionIdIsInProgress() for examples of logic that copes with
this. What's happening here is that TransactionIdDidAbort doesn't ever
get called if the system crashes while a transaction is running. So we
can use TransactionIdDidAbort() only as an optimization: if it returns
true, the transaction is definitely aborted, but if it returns false,
we have to check whether it's still running. If not, it aborted
anyway.

TransactionIdDidCommit() does not have the same issue. A transaction
can abort without updating CLOG if the system crashes, but it can
never commit without updating CLOG. If the transaction didn't commit,
then it is either aborted or still in progress (and we don't care
which, because neither is an error here).

As to whether the existing formulation of the test has an error
condition, you're generally right that we should test
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort,
because we during commit or abort, we first set the status in CLOG -
which is queried by TransactionIdDidCommit/Abort - and only afterward
update the procarray - which is queried by TransactionIdIsInProgress.
So normally TransactionIdIsInProgress should be checked first, and
TransactionIdDidCommit/Abort should only be checked if it returns
false, at which point we know that the return values of the latter
calls can't ever change. Possibly there is an argument for including
the TransactionIdInProgress check here too:

!TransactionIdEquals(pred_xmin, curr_xmin) &&
(TransactionIdIsInProgress(pred_xmin) ||
!TransactionIdDidCommit(pred_xmin))

...but I don't think it could change the answer. Most places that
check TransactionIdIsInProgress() first are concerned with MVCC
semantics, and here we are not. I think the only effects of including
or excluding the TransactionIdIsInProgress() test are (1) performance,
in that searching the procarray might avoid expense if it's cheaper
than searching clog, or add expense if the reverse is true and (2)
slightly changing the time at which we're first able to detect this
form of corruption. I am inclined to prefer the simpler form of the
test without TransactionIdIsInProgress() unless someone can say why we
shouldn't go that route.

Done, updated in the v3 patch.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

#13Aleksander Alekseev
aleksander@timescale.com
In reply to: Himanshu Upadhyaya (#12)
Re: HOT chain validation in verify_heapam()

Hi Himanshu,

Done, updated in the v3 patch.

Thanks for the updated patch.

Here is v4 with fixed compiler warnings and some minor tweaks from me.

I didn't put too much thought into the algorithm but I already see
something strange. At verify_heapam.c:553 you declared curr_xmax and
next_xmin. However the variables are not used/initialized until you
do:

```
if (lp_valid[nextoffnum] && lp_valid[ctx.offnum] &&
TransactionIdIsValid(curr_xmax) &&
TransactionIdEquals(curr_xmax, next_xmin)) {
/* ... */
```

In v4 I elected to initialize both curr_xmax and next_xmin with
InvalidTransactionId for safety and in order to silence the compiler
but still there is no way this condition can succeed.

Please make sure there is no logic missing.

--
Best regards,
Aleksander Alekseev

Attachments:

v4-0001-Implement-HOT-chain-validation-in-verify_heapam.patchapplication/octet-stream; name=v4-0001-Implement-HOT-chain-validation-in-verify_heapam.patchDownload+213-1
#14Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Aleksander Alekseev (#13)
Re: HOT chain validation in verify_heapam()

On Mon, Sep 19, 2022 at 8:27 PM Aleksander Alekseev <
aleksander@timescale.com> wrote:

Hi Himanshu,

Done, updated in the v3 patch.

Thanks for the updated patch.

Here is v4 with fixed compiler warnings and some minor tweaks from me.

I didn't put too much thought into the algorithm but I already see
something strange. At verify_heapam.c:553 you declared curr_xmax and
next_xmin. However the variables are not used/initialized until you
do:

```
if (lp_valid[nextoffnum] && lp_valid[ctx.offnum] &&
TransactionIdIsValid(curr_xmax) &&
TransactionIdEquals(curr_xmax, next_xmin)) {
/* ... */
```

In v4 I elected to initialize both curr_xmax and next_xmin with
InvalidTransactionId for safety and in order to silence the compiler
but still there is no way this condition can succeed.

Please make sure there is no logic missing.

Hi Aleksander,

Thanks for sharing the feedback.
It's my mistake, sorry about that: I was trying to merge two if conditions
and forgot to move the initialization of xmin and xmax. I now think it is
better to use a nested if, with the inner condition testing that xmax and
xmin match. That way we retrieve and populate xmin and xmax only when the
inner condition actually needs them.
I have changed this in the attached patch.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0001-HOT-chain-validation-in-verify_heapam.patchtext/x-patch; charset=UTF-8; name=v4-0001-HOT-chain-validation-in-verify_heapam.patchDownload+211-1
#15Aleksander Alekseev
aleksander@timescale.com
In reply to: Himanshu Upadhyaya (#14)
Re: HOT chain validation in verify_heapam()

Hi Himanshu,

I have changed this in the attached patch.

If it's not too much trouble could you please base your changes on v4
that I submitted? I put some effort into writing a proper commit
message, editing the comments, etc. The easiest way of doing this is
using `git am` and `git format-patch`.

--
Best regards,
Aleksander Alekseev

#16Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Aleksander Alekseev (#15)
Re: HOT chain validation in verify_heapam()

On Mon, Sep 19, 2022 at 10:04 PM Aleksander Alekseev <
aleksander@timescale.com> wrote:

Hi Himanshu,

I have changed this in the attached patch.

If it's not too much trouble could you please base your changes on v4
that I submitted? I put some effort into writing a proper commit
message, editing the comments, etc. The easiest way of doing this is
using `git am` and `git format-patch`.

Please find it attached.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0001-Implement-HOT-chain-validation-in-verify_heapam.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Implement-HOT-chain-validation-in-verify_heapam.patchDownload+214-1
#17Robert Haas
robertmhaas@gmail.com
In reply to: Himanshu Upadhyaya (#16)
Re: HOT chain validation in verify_heapam()

On Tue, Sep 20, 2022 at 5:00 AM Himanshu Upadhyaya
<upadhyaya.himanshu@gmail.com> wrote:

Please find it attached.

This patch still has no test cases. Just as we have test cases for the
existing corruption checks, we should have test cases for these new
corruption checks, showing cases where they actually fire.

I think I would be inclined to set lp_valid[x] = true in both the
redirected and non-redirected case, and then have the very first thing
that the second loop does be if (nextoffnum == 0 ||
!lp_valid[ctx.offnum]) continue. I think that would be more clear
about the intent to ignore line pointers that failed validation. Also,
if you did it that way, then you could catch the case of a redirected
line pointer pointing to another redirected line pointer, which is a
corruption condition that the current code does not appear to check.

+            /*
+             * Validation via the predecessor array. 1) If the predecessor's
+             * xmin is aborted or in progress, the current tuples xmin should
+             * be aborted or in progress respectively. Also both xmin's must
+             * be equal. 2) If the predecessor's xmin is not frozen, then
+             * current tuple's shouldn't be either. 3) If the predecessor's
+             * xmin is equal to the current tuple's xmin, the current tuple's
+             * cmin should be greater than the predecessor's cmin. 4) If the
+             * current tuple is not HOT then its predecessor's tuple must not
+             * be HEAP_HOT_UPDATED. 5) If the current tuple is HOT then its
+             * predecessor's tuple must be HEAP_HOT_UPDATED.
+             */

This comment needs to be split up into pieces and the pieces need to
be moved closer to the tests to which they correspond.

+ psprintf("unfrozen tuple was
updated to produce a tuple at offset %u which is not frozen",

Shouldn't this say "which is frozen"?

+             * Not a corruption if current tuple is updated/deleted by a
+             * different transaction, means t_cid will point to cmax (that is
+             * command id of deleting transaction) and cid of predecessor not
+             * necessarily will be smaller than cid of current tuple. t_cid

I think that the next person who reads this code is likely to
understand that the CIDs of different transactions are numerically
unrelated. What's less obvious is that if the XID is the same, the
newer update must have a higher CID.

+             * can hold combo command id but we are not worrying here since
+             * combo command id of the next updated tuple (if present) must be
+             * greater than combo command id of the current tuple. So here we
+             * are not checking HEAP_COMBOCID flag and simply doing t_cid
+             * comparison.

I disapprove of ignoring the HEAP_COMBOCID flag. Emitting a message
claiming that the CID has a certain value when that's actually a combo
CID is misleading, so at least a different message wording is needed
in such cases. But it's also not clear to me that the newer update has
to have a higher combo CID, because combo CIDs can be reused. If you
have multiple cursors open in the same transaction, the updates can be
interleaved, and it seems to me that it might be possible for an older
CID to have created a certain combo CID after a newer CID, and then
both cursors could update the same page in succession and end up with
combo CIDs out of numerical order. Unless somebody can make a
convincing argument that this isn't possible, I think we should just
skip this check for cases where either tuple has a combo CID.

+            if (TransactionIdEquals(pred_xmin, curr_xmin) &&
+                (TransactionIdEquals(curr_xmin, curr_xmax) ||
+                 !TransactionIdIsValid(curr_xmax)) && pred_cmin >= curr_cmin)

I don't understand the reason for the middle part of this condition --
TransactionIdEquals(curr_xmin, curr_xmax) ||
!TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
explain this, but I still don't get it. If a tuple with XMIN 12345
CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
corruption, regardless of what the XMAX of the second tuple may happen
to be.

+            if (HeapTupleHeaderIsHeapOnly(curr_htup) &&
+                !HeapTupleHeaderIsHotUpdated(pred_htup))
+            if (!HeapTupleHeaderIsHeapOnly(curr_htup) &&
+                HeapTupleHeaderIsHotUpdated(pred_htup))

I think it would be slightly clearer to write these tests the other
way around i.e. check the previous tuple's state first.

+    if (!TransactionIdIsValid(curr_xmax) &&
HeapTupleHeaderIsHotUpdated(tuphdr))
+    {
+        report_corruption(ctx,
+                          psprintf("tuple has been updated, but xmax is 0"));
+        result = false;
+    }

I guess this message needs to say "tuple has been HOT updated, but
xmax is 0" or something like that.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Robert Haas (#17)
Re: HOT chain validation in verify_heapam()

On Tue, Sep 20, 2022 at 6:43 PM Robert Haas <robertmhaas@gmail.com> wrote:

I disapprove of ignoring the HEAP_COMBOCID flag. Emitting a message
claiming that the CID has a certain value when that's actually a combo
CID is misleading, so at least a different message wording is needed
in such cases. But it's also not clear to me that the newer update has
to have a higher combo CID, because combo CIDs can be reused. If you
have multiple cursors open in the same transaction, the updates can be
interleaved, and it seems to me that it might be possible for an older
CID to have created a certain combo CID after a newer CID, and then
both cursors could update the same page in succession and end up with
combo CIDs out of numerical order. Unless somebody can make a
convincing argument that this isn't possible, I think we should just
skip this check for cases where either tuple has a combo CID.

Here our objective is to validate that if both the predecessor's xmin and
the current tuple's xmin are the same, then the predecessor's cmin must be
less than the current tuple's cmin. When both tuples' xmins are the same,
I think the predecessor's t_cid will always hold a combo CID.
So one or both tuples will always have a combo CID, and skipping this
check whenever "either tuple has a combo CID" would make the if condition
always evaluate to false.

+            if (TransactionIdEquals(pred_xmin, curr_xmin) &&
+                (TransactionIdEquals(curr_xmin, curr_xmax) ||
+                 !TransactionIdIsValid(curr_xmax)) && pred_cmin >=
curr_cmin)

I don't understand the reason for the middle part of this condition --
TransactionIdEquals(curr_xmin, curr_xmax) ||
!TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
explain this, but I still don't get it. If a tuple with XMIN 12345
CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
corruption, regardless of what the XMAX of the second tuple may happen
to be.

tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split                            | heap_tuple_infomask_flags
------+--------+--------+-------+--------+---------------------------------------------+---------------------------
    1 |    971 |    971 |     0 | (0,3)  | {"\\x1774657374312020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{})
    2 |    971 |      0 |     1 | (0,2)  | {"\\x1774657374322020202020","\\x02000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
    3 |    971 |    971 |     1 | (0,4)  | {"\\x1774657374322020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
    4 |    971 |    972 |     0 | (0,5)  | {"\\x1774657374332020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
    5 |    972 |      0 |     0 | (0,5)  | {"\\x1774657374342020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})

In the above case, tuples 1->3->4 were inserted and updated by xid 971, and
tuple 4 was next updated by xid 972. Here the t_cid of tuple 4 is 0 whereas
its predecessor's t_cid is 1, because tuple 4's t_cid holds the command ID
of the deleting transaction (cmax). That is why we need to check the
tuple's xmax.

Please correct me if I am missing anything here?
--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

#19Robert Haas
robertmhaas@gmail.com
In reply to: Himanshu Upadhyaya (#18)
Re: HOT chain validation in verify_heapam()

On Sat, Sep 24, 2022 at 8:45 AM Himanshu Upadhyaya
<upadhyaya.himanshu@gmail.com> wrote:

Here our objective is to validate that if both the predecessor's xmin and the current tuple's xmin are the same, then the predecessor's cmin must be less than the current tuple's cmin. When both tuples' xmins are the same, I think the predecessor's t_cid will always hold a combo CID.
So one or both tuples will always have a combo CID, and skipping this check whenever "either tuple has a combo CID" would make the if condition always evaluate to false.

Fair point. I think maybe we should just remove the CID validation
altogether. I thought initially that it would be possible to have a
newer update with a numerically lower combo CID, but after some
experimentation I don't see a way to do it. However, it also doesn't
seem very useful to me to check that the combo CIDs are in ascending
order. I mean, even if that's not supposed to happen and does anyway,
there aren't really any enduring consequences, because command IDs are
ignored completely outside of the transaction that performed the
operation originally. So even if the combo CIDs were set to completely
random values, is that really corruption? At most it messes things up
for the duration of one transaction. And if we just have plain CIDs
rather than combo CIDs, the same thing is true: they could be totally
messed up and it wouldn't really matter beyond the lifetime of that
one transaction.

Also, it would be a lot more tempting to check this if we could check
it in all cases, but we can't. If a tuple is inserted in transaction
T1 and then updated twice in transaction T2, we'll have only one combo
CID and nothing to compare it against, nor any way to decode what CMIN
and CMAX it originally represented. And this is probably a pretty
common type of case.

+            if (TransactionIdEquals(pred_xmin, curr_xmin) &&
+                (TransactionIdEquals(curr_xmin, curr_xmax) ||
+                 !TransactionIdIsValid(curr_xmax)) && pred_cmin >= curr_cmin)

I don't understand the reason for the middle part of this condition --
TransactionIdEquals(curr_xmin, curr_xmax) ||
!TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
explain this, but I still don't get it. If a tuple with XMIN 12345
CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
corruption, regardless of what the XMAX of the second tuple may happen
to be.

tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split                            | heap_tuple_infomask_flags
------+--------+--------+-------+--------+---------------------------------------------+---------------------------
    1 |    971 |    971 |     0 | (0,3)  | {"\\x1774657374312020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{})
    2 |    971 |      0 |     1 | (0,2)  | {"\\x1774657374322020202020","\\x02000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
    3 |    971 |    971 |     1 | (0,4)  | {"\\x1774657374322020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
    4 |    971 |    972 |     0 | (0,5)  | {"\\x1774657374332020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
    5 |    972 |      0 |     0 | (0,5)  | {"\\x1774657374342020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})

In the above case, tuples 1->3->4 were inserted and updated by xid 971, and tuple 4 was next updated by xid 972. Here the t_cid of tuple 4 is 0 whereas its predecessor's t_cid is 1, because tuple 4's t_cid holds the command ID of the deleting transaction (cmax). That is why we need to check the tuple's xmax.

Please correct me if I am missing anything here?

Hmm, I see, so basically you're trying to check whether the CID field
contains a CMIN as opposed to a CMAX. But I'm not sure this test is
entirely reliable, because heap_prepare/execute_freeze_tuple() can set
a tuple's xmax to InvalidTransactionId even after it's had some other
value, and that won't do anything to the contents of the CID field.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Himanshu Upadhyaya
upadhyaya.himanshu@gmail.com
In reply to: Robert Haas (#19)
Re: HOT chain validation in verify_heapam()

On Tue, Sep 27, 2022 at 1:35 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Sep 24, 2022 at 8:45 AM Himanshu Upadhyaya
<upadhyaya.himanshu@gmail.com> wrote:

Here our objective is to validate that if both the predecessor's xmin and
the current tuple's xmin are the same, then the predecessor's cmin must be
less than the current tuple's cmin. When both tuples' xmins are the same,
I think the predecessor's t_cid will always hold a combo CID.

So one or both tuples will always have a combo CID, and skipping this
check whenever "either tuple has a combo CID" would make the if condition
always evaluate to false.

Fair point. I think maybe we should just remove the CID validation
altogether. I thought initially that it would be possible to have a
newer update with a numerically lower combo CID, but after some
experimentation I don't see a way to do it. However, it also doesn't
seem very useful to me to check that the combo CIDs are in ascending
order. I mean, even if that's not supposed to happen and does anyway,
there aren't really any enduring consequences, because command IDs are
ignored completely outside of the transaction that performed the
operation originally. So even if the combo CIDs were set to completely
random values, is that really corruption? At most it messes things up
for the duration of one transaction. And if we just have plain CIDs
rather than combo CIDs, the same thing is true: they could be totally
messed up and it wouldn't really matter beyond the lifetime of that
one transaction.

Also, it would be a lot more tempting to check this if we could check
it in all cases, but we can't. If a tuple is inserted in transaction
T1 and ten updated twice in transaction T2, we'll have only one combo
CID and nothing to compare it against, nor any way to decode what CMIN
and CMAX it originally represented. And this is probably a pretty
common type of case.

ok, I will be removing this entire validation of cmin/cid in my next patch.

+            if (TransactionIdEquals(pred_xmin, curr_xmin) &&
+                (TransactionIdEquals(curr_xmin, curr_xmax) ||
+                 !TransactionIdIsValid(curr_xmax)) && pred_cmin >=

curr_cmin)

I don't understand the reason for the middle part of this condition --
TransactionIdEquals(curr_xmin, curr_xmax) ||
!TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
explain this, but I still don't get it. If a tuple with XMIN 12345
CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
corruption, regardless of what the XMAX of the second tuple may happen
to be.

tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split                            | heap_tuple_infomask_flags
------+--------+--------+-------+--------+---------------------------------------------+---------------------------
    1 |    971 |    971 |     0 | (0,3)  | {"\\x1774657374312020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{})
    2 |    971 |      0 |     1 | (0,2)  | {"\\x1774657374322020202020","\\x02000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
    3 |    971 |    971 |     1 | (0,4)  | {"\\x1774657374322020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
    4 |    971 |    972 |     0 | (0,5)  | {"\\x1774657374332020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
    5 |    972 |      0 |     0 | (0,5)  | {"\\x1774657374342020202020","\\x01000000"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})

In the above case, tuples 1->3->4 were inserted and updated by xid 971, and
tuple 4 was next updated by xid 972. Here the t_cid of tuple 4 is 0 whereas
its predecessor's t_cid is 1, because tuple 4's t_cid holds the command ID
of the deleting transaction (cmax). That is why we need to check the
tuple's xmax.

Please correct me if I am missing anything here?

Hmm, I see, so basically you're trying to check whether the CID field
contains a CMIN as opposed to a CMAX. But I'm not sure this test is
entirely reliable, because heap_prepare/execute_freeze_tuple() can set
a tuple's xmax to InvalidTransactionId even after it's had some other
value, and that won't do anything to the contents of the CID field.

OK, got it. As we are removing this cmin/cid validation, we don't need
any change here.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

[Messages #21 through #100 of this thread are collapsed in this archive view; their bodies are not included here.]