Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)
On 8/26/20, 2:13 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
On 8/26/20, 12:16 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
On 2020-Aug-20, Jeremy Schneider wrote:
Alternatively, if we don't want to take this approach, then I'd argue
that we should update README.tuplock to explicitly state that
XMAX_LOCK_ONLY and XMAX_COMMITTED are incompatible (just as it already
states for HEAP_XMAX_IS_MULTI and HEAP_XMAX_COMMITTED) and clean up the
code in heapam_visibility.c for consistency.Yeah, I like this approach better for the master branch; not just clean
up as in remove the cases that handle it, but also actively elog(ERROR)
if the condition ever occurs (hopefully with some known way to fix the
problem; maybe by "WITH tup AS (DELETE FROM tab WHERE .. RETURNING *)
INSERT * INTO tab FROM tup" or similar.)+1. I wouldn't mind picking this up, but it might be some time before
I can get to it.
I've finally gotten started on this, and I've attached a work-in-
progress patch to gather some early feedback. I'm specifically
wondering if there are other places it'd be good to check for these
unsupported combinations and whether we should use the
HEAP_XMAX_IS_LOCKED_ONLY macro or just look for the
HEAP_XMAX_LOCK_ONLY bit. IIUC HEAP_XMAX_IS_LOCKED_ONLY is intended to
handle some edge cases after pg_upgrade, but AFAICT
HEAP_XMAX_COMMITTED was not used for those previous bit combinations,
either. Therefore, I've used the HEAP_XMAX_IS_LOCKED_ONLY macro in
the attached patch, but I would not be surprised to learn that this is
wrong for some reason.
Nathan
Attachments:
v1-0001-Disallow-HEAP_XMAX_COMMITTED-and-HEAP_XMAX_IS_LOC.patchapplication/octet-stream; name=v1-0001-Disallow-HEAP_XMAX_COMMITTED-and-HEAP_XMAX_IS_LOC.patchDownload+85-31
The archives seem unhappy with my attempt to revive this old thread,
so here is a link to it in case anyone is looking for more context:
/messages/by-id/3476708e-7919-20cb-ca45-6603470565f7@amazon.com
Nathan
On Nov 23, 2021, at 4:26 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
I've finally gotten started on this, and I've attached a work-in-
progress patch to gather some early feedback. I'm specifically
wondering if there are other places it'd be good to check for these
unsupported combinations and whether we should use the
HEAP_XMAX_IS_LOCKED_ONLY macro or just look for the
HEAP_XMAX_LOCK_ONLY bit.
I have to wonder if, when corruption is reported for conditions like this:
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ HEAP_XMAX_IS_LOCKED_ONLY(ctx->tuphdr->t_infomask))
if the first thing we're going to want to know is which branch of the HEAP_XMAX_IS_LOCKED_ONLY macro evaluated true? Should we split this check to do each branch of the macro separately, such as:
if (ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED)
{
if (ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY)
... report something ...
else if ((ctx->tuphdr->t_infomask & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK)
... report a different thing ...
}
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/23/21, 4:36 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
I have to wonder if, when corruption is reported for conditions like this:
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) && + HEAP_XMAX_IS_LOCKED_ONLY(ctx->tuphdr->t_infomask))if the first thing we're going to want to know is which branch of the HEAP_XMAX_IS_LOCKED_ONLY macro evaluated true? Should we split this check to do each branch of the macro separately, such as:
if (ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED)
{
if (ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY)
... report something ...
else if ((ctx->tuphdr->t_infomask & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK)
... report a different thing ...
}
This is a good point. Right now, you'd have to manually inspect the
infomask field to determine the exact form of the invalid combination.
My only worry with this is that we'd want to make sure these checks
stayed consistent with the definition of HEAP_XMAX_IS_LOCKED_ONLY in
htup_details.h. I'm guessing HEAP_XMAX_IS_LOCKED_ONLY is unlikely to
change all that often, though.
Nathan
On Nov 23, 2021, at 4:51 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
This is a good point. Right now, you'd have to manually inspect the
infomask field to determine the exact form of the invalid combination.
My only worry with this is that we'd want to make sure these checks
stayed consistent with the definition of HEAP_XMAX_IS_LOCKED_ONLY in
htup_details.h. I'm guessing HEAP_XMAX_IS_LOCKED_ONLY is unlikely to
change all that often, though.
I expect that your patch will contain some addition to the amcheck (or pg_amcheck) tests, so if we ever allow some currently disallowed bit combination, we'd be reminded of the need to update amcheck. So I'm not too worried about that aspect of this.
I prefer not to have to get a page (or full file) from a customer when the check reports corruption. Even assuming they are comfortable giving that, which they may not be, it is an extra round trip to the customer asking for stuff.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/23/21, 4:59 PM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
On Nov 23, 2021, at 4:51 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
This is a good point. Right now, you'd have to manually inspect the
infomask field to determine the exact form of the invalid combination.
My only worry with this is that we'd want to make sure these checks
stayed consistent with the definition of HEAP_XMAX_IS_LOCKED_ONLY in
htup_details.h. I'm guessing HEAP_XMAX_IS_LOCKED_ONLY is unlikely to
change all that often, though.I expect that your patch will contain some addition to the amcheck (or pg_amcheck) tests, so if we ever allow some currently disallowed bit combination, we'd be reminded of the need to update amcheck. So I'm not too worried about that aspect of this.
I prefer not to have to get a page (or full file) from a customer when the check reports corruption. Even assuming they are comfortable giving that, which they may not be, it is an extra round trip to the customer asking for stuff.
Another option we might consider is only checking for the
HEAP_XMAX_LOCK_ONLY bit instead of everything in
HEAP_XMAX_IS_LOCKED_ONLY. IIUC everything else is only expected to
happen for upgrades from v9.2 or earlier, so it might be pretty rare
at this point. Otherwise, I'll extract the exact bit pattern for the
error message as you suggest.
Nathan
On Nov 24, 2021, at 12:53 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
Another option we might consider is only checking for the
HEAP_XMAX_LOCK_ONLY bit instead of everything in
HEAP_XMAX_IS_LOCKED_ONLY. IIUC everything else is only expected to
happen for upgrades from v9.2 or earlier, so it might be pretty rare
at this point. Otherwise, I'll extract the exact bit pattern for the
error message as you suggest.
I would prefer to detect and report any "can't happen" bit patterns without regard for how likely the pattern may be. The difficulty is in proving that a bit pattern is disallowed. Just because you can't find a code path in the current code base that would create a pattern doesn't mean it won't have legitimately been created by some past release or upgrade path. As such, any prohibitions explicitly in the backend, such as Asserts around a condition, are really valuable. You can know that the pattern is disallowed, since the server would Assert on it if encountered.
Aside from that, I don't really buy the argument that databases upgraded from v9.2 or earlier are rare. Even if servers *running* v9.2 or earlier are (or become) rare, servers initialized that far back which have been upgraded one or more times since then may be common.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/25/21, 9:16 AM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
On Nov 24, 2021, at 12:53 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
Another option we might consider is only checking for the
HEAP_XMAX_LOCK_ONLY bit instead of everything in
HEAP_XMAX_IS_LOCKED_ONLY. IIUC everything else is only expected to
happen for upgrades from v9.2 or earlier, so it might be pretty rare
at this point. Otherwise, I'll extract the exact bit pattern for the
error message as you suggest.I would prefer to detect and report any "can't happen" bit patterns without regard for how likely the pattern may be. The difficulty is in proving that a bit pattern is disallowed. Just because you can't find a code path in the current code base that would create a pattern doesn't mean it won't have legitimately been created by some past release or upgrade path. As such, any prohibitions explicitly in the backend, such as Asserts around a condition, are really valuable. You can know that the pattern is disallowed, since the server would Assert on it if encountered.
Aside from that, I don't really buy the argument that databases upgraded from v9.2 or earlier are rare. Even if servers *running* v9.2 or earlier are (or become) rare, servers initialized that far back which have been upgraded one or more times since then may be common.
Okay, I'll do it that way in the next revision.
Nathan
On 11/29/21, 10:10 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
Okay, I'll do it that way in the next revision.
v2 attached.
Nathan
Attachments:
v2-0001-Disallow-HEAP_XMAX_COMMITTED-and-HEAP_XMAX_IS_LOC.patchapplication/octet-stream; name=v2-0001-Disallow-HEAP_XMAX_COMMITTED-and-HEAP_XMAX_IS_LOC.patchDownload+148-31
On 11/30/21, 4:54 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
v2 attached.
I accidentally left a redundant check in v2, so here is a v3 without
it.
My proposed patch adds a few checks for the unsupported bit patterns
in the visibility code, but it is far from exhaustive. I'm wondering
if it might be better just to add a function or macro that everything
exported from heapam_visibility.c is expected to call. My guess is
the main argument against that would be the possible performance
impact.
Nathan
Attachments:
v3-0001-Disallow-HEAP_XMAX_COMMITTED-and-HEAP_XMAX_IS_LOC.patchapplication/octet-stream; name=v3-0001-Disallow-HEAP_XMAX_COMMITTED-and-HEAP_XMAX_IS_LOC.patchDownload+142-31
On Dec 1, 2021, at 10:59 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
here is a v3
It took a while for me to get to this....
@@ -1304,33 +1370,46 @@ HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer, TransactionId *de
if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
{
+ if (unlikely(tuple->t_infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (tuple->t_infomask & HEAP_XMAX_LOCK_ONLY)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found tuple with HEAP_XMAX_COMMITTED "
+ "and HEAP_XMAX_LOCK_ONLY")));
+
+ /* pre-v9.3 lock-only bit pattern */
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found tuple with HEAP_XMAX_COMMITTED and"
+ "HEAP_XMAX_EXCL_LOCK set and "
+ "HEAP_XMAX_IS_MULTI unset")));
+ }
+
I find this bit hard to understand. Does the comment mean to suggest that the *upgrade* process should have eliminated all pre-v9.3 bit patterns, and therefore any such existing patterns are certainly corruption, or does it mean that data written by pre-v9.3 servers (and not subsequently updated) is defined as corrupt, or .... ?
I am not complaining that the logic is wrong, just trying to wrap my head around what the comment means.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/21/21, 11:42 AM, "Mark Dilger" <mark.dilger@enterprisedb.com> wrote:
+ /* pre-v9.3 lock-only bit pattern */ + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("found tuple with HEAP_XMAX_COMMITTED and" + "HEAP_XMAX_EXCL_LOCK set and " + "HEAP_XMAX_IS_MULTI unset"))); + } +I find this bit hard to understand. Does the comment mean to suggest that the *upgrade* process should have eliminated all pre-v9.3 bit patterns, and therefore any such existing patterns are certainly corruption, or does it mean that data written by pre-v9.3 servers (and not subsequently updated) is defined as corrupt, or .... ?
I am not complaining that the logic is wrong, just trying to wrap my head around what the comment means.
This is just another way that a tuple may be marked locked-only, and
we want to explicitly disallow locked-only + xmax-committed. This bit
pattern may be present on servers that were pg_upgraded from pre-v9.3
versions. See commits 0ac5ad5 and 74ebba8 for more detail.
Nathan
I think this one requires some more work, and it needn't be a priority for
v15, so I've adjusted the commitfest entry to v16 and moved it to the next
commitfest.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Mar 17, 2022 at 04:45:28PM -0700, Nathan Bossart wrote:
I think this one requires some more work, and it needn't be a priority for
v15, so I've adjusted the commitfest entry to v16 and moved it to the next
commitfest.
Here is a new patch. The main differences from v3 are in
heapam_visibility.c. Specifically, instead of trying to work the infomask
checks into the visibility logic, I added a new function that does a couple
of assertions. This function is called at the beginning of each visibility
function.
What do folks think? The options I've considered are 1) not adding any
such checks to heapam_visibility.c, 2) only adding assertions like the
attached patch, or 3) actually using elog(ERROR, ...) when the invalid bit
patterns are detected. AFAICT (1) is more in line with existing invalid
bit patterns (e.g., XMAX_COMMITTED + XMAX_IS_MULTI). There are a couple of
scattered assertions, but most code paths don't check for it. (2) adds
additional checks, but only for --enable-cassert builds. (3) would add
checks even for non-assert builds, but there would presumably be some
performance cost involved.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-disallow-XMAX_COMMITTED-XMAX_LOCK_ONLY.patchtext/x-diff; charset=us-asciiDownload+96-35
Hi,
On 2022-03-22 16:06:40 -0700, Nathan Bossart wrote:
On Thu, Mar 17, 2022 at 04:45:28PM -0700, Nathan Bossart wrote:
I think this one requires some more work, and it needn't be a priority for
v15, so I've adjusted the commitfest entry to v16 and moved it to the next
commitfest.Here is a new patch. The main differences from v3 are in
heapam_visibility.c. Specifically, instead of trying to work the infomask
checks into the visibility logic, I added a new function that does a couple
of assertions. This function is called at the beginning of each visibility
function.What do folks think? The options I've considered are 1) not adding any
such checks to heapam_visibility.c, 2) only adding assertions like the
attached patch, or 3) actually using elog(ERROR, ...) when the invalid bit
patterns are detected. AFAICT (1) is more in line with existing invalid
bit patterns (e.g., XMAX_COMMITTED + XMAX_IS_MULTI). There are a couple of
scattered assertions, but most code paths don't check for it. (2) adds
additional checks, but only for --enable-cassert builds. (3) would add
checks even for non-assert builds, but there would presumably be some
performance cost involved.
From 2d6b372cf61782e0fd52590b57b1c914b0ed7a4c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Tue, 22 Mar 2022 15:35:34 -0700
Subject: [PATCH v4 1/1] disallow XMAX_COMMITTED + XMAX_LOCK_ONLY
Just skimming this thread quickly, I really have no idea what this is trying
to achieve and the commit message doesn't help either... I didn't read the
referenced thread, but I shouldn't have to, to get a basic idea.
Greetings,
Andres Freund
On Tue, Mar 22, 2022 at 04:13:47PM -0700, Andres Freund wrote:
Just skimming this thread quickly, I really have no idea what this is trying
to achieve and the commit message doesn't help either... I didn't read the
referenced thread, but I shouldn't have to, to get a basic idea.
Ah, my bad. I should've made sure the context was carried over better. I
updated the commit message with some basic information about the intent.
Please let me know if there is anything else that needs to be cleared up.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0001-Disallow-setting-XMAX_COMMITTED-and-XMAX_LOCK_ONL.patchtext/x-diff; charset=us-asciiDownload+96-35
Here is a rebased patch for cfbot.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v6-0001-Disallow-setting-XMAX_COMMITTED-and-XMAX_LOCK_ONL.patchtext/x-diff; charset=us-asciiDownload+96-35
On Tue, Sep 20, 2022 at 2:32 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
Here is a rebased patch for cfbot.
Applies, passes make check world.
Patch is straightforward, but the previous code is less so. It purported to
set XMAX_COMMITTED _or_ XMAX_INVALID, but never seemed to un-set
XMAX_COMMITTED, was that the source of the double-setting?
Hi,
On 2022-09-20 11:32:02 -0700, Nathan Bossart wrote:
Note that this change also disallows XMAX_COMMITTED together with
the special pre-v9.3 locked-only bit pattern that
HEAP_XMAX_IS_LOCKED_ONLY checks for. This locked-only bit pattern
may still be present on servers pg_upgraded from pre-v9.3 versions.
Given that fact, that aspect at least seems to be not viable?
Greetings,
Andres Freund
On Thu, Feb 02, 2023 at 06:59:51AM -0800, Andres Freund wrote:
On 2022-09-20 11:32:02 -0700, Nathan Bossart wrote:
Note that this change also disallows XMAX_COMMITTED together with
the special pre-v9.3 locked-only bit pattern that
HEAP_XMAX_IS_LOCKED_ONLY checks for. This locked-only bit pattern
may still be present on servers pg_upgraded from pre-v9.3 versions.Given that fact, that aspect at least seems to be not viable?
AFAICT from looking at the v9.2 code, the same idea holds true for this
special bit pattern. I only see HEAP_XMAX_INVALID set when one of the
infomask lock bits is set, and those bits now correspond to
HEAP_XMAX_LOCK_ONLY and HEAP_XMAX_EXCL_LOCK (which are both covered by the
HEAP_XMAX_IS_LOCKED_ONLY macro). Of course, I could be missing something.
Do you think we should limit this to the v9.3+ bit pattern?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com