XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

Started by Jeremy Schneiderover 5 years ago8 messages
#1Jeremy Schneider
schnjere@amazon.com

While working with Nathan Bossart on an extension, we came across an
interesting quirk and possible inconsistency in the PostgreSQL code
around infomask flags. I'd like to know if there's something I'm
misunderstanding here or if this really is a correctness/robustness gap
in the code.

At the root of it is the relationship between the XMAX_LOCK_ONLY and
XMAX_COMMITTED infomask bits.

One of the things in that all-important foreign key patch from 2013
(0ac5ad51) was to tweak the UpdateXmaxHintBits() function to always set
the INVALID bit if the transaction was a locker only (even if the
locking transaction committed).

https://github.com/postgres/postgres/blob/9168793d7275b4b318c153d607fba55d14098c19/src/backend/access/heap/heapam.c#L1748

However it seems pretty clear from pretty much all of the visibility
code that while it may not be the usual case, it is considered a valid
state to have the XMAX_LOCK_ONLY and XMAX_COMMITTED bits set at the same
time. This combination is handled correctly throughout heapam_visibility.c:

https://github.com/postgres/postgres/blob/7559d8ebfa11d98728e816f6b655582ce41150f3/src/backend/access/heap/heapam_visibility.c#L273
https://github.com/postgres/postgres/blob/7559d8ebfa11d98728e816f6b655582ce41150f3/src/backend/access/heap/heapam_visibility.c#L606
https://github.com/postgres/postgres/blob/7559d8ebfa11d98728e816f6b655582ce41150f3/src/backend/access/heap/heapam_visibility.c#L871
https://github.com/postgres/postgres/blob/7559d8ebfa11d98728e816f6b655582ce41150f3/src/backend/access/heap/heapam_visibility.c#L1271
https://github.com/postgres/postgres/blob/7559d8ebfa11d98728e816f6b655582ce41150f3/src/backend/access/heap/heapam_visibility.c#L1447

But then there's one place in heapam.c where an assumption appears that
this state will never happen - the compute_new_xmax_infomask() function:

https://github.com/postgres/postgres/blob/9168793d7275b4b318c153d607fba55d14098c19/src/backend/access/heap/heapam.c#L4800

else if (old_infomask & HEAP_XMAX_COMMITTED)
{
...
if (old_infomask2 & HEAP_KEYS_UPDATED)
status = MultiXactStatusUpdate;
else
status = MultiXactStatusNoKeyUpdate;
new_status = get_mxact_status_for_lock(mode, is_update);
...
new_xmax = MultiXactIdCreate(xmax, status, add_to_xmax, new_status);

When that code sees XMAX_COMMITTED, it assumes the xmax can't possibly
be LOCK_ONLY and it sets the status to something sufficiently high
enough to guarantee that ISUPDATE_from_mxstatus() returns true. That
means that when you try to update this tuple and
compute_new_xmax_infomask() is called with an "update" status, two
"update" statuses are sent to MultiXactIdCreate() which then fails
(thank goodness) with the error "new multixact has more than one
updating member".

https://github.com/postgres/postgres/blob/cd5e82256de5895595cdd99ecb03aea15b346f71/src/backend/access/transam/multixact.c#L784

The UpdateXmaxHintBits() code to always set the INVALID bit wasn't in
any patches on the mailing list but it was committed and it seems to
have worked very well. As a result it seems nearly impossible to get
into the state where you have both XMAX_LOCK_ONLY and XMAX_COMMITTED
bits set; thus it seems we've avoided problems in
compute_new_xmax_infomask().

But nonetheless it seems worth making the code more robust by having the
compute_new_xmax_infomask() function handle this state correctly just as
the visibility code does. It should only require a simple patch like
this (credit to Nathan Bossart):

diff --git a/src/backend/access/heap/heapam.c
b/src/backend/access/heap/heapam.c
index d881f4cd46..371e3e4f61 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4695,7 +4695,9 @@ compute_new_xmax_infomask(TransactionId xmax,
uint16 old_infomask,
 l5:
        new_infomask = 0;
        new_infomask2 = 0;
-       if (old_infomask & HEAP_XMAX_INVALID)
+       if (old_infomask & HEAP_XMAX_INVALID ||
+               (old_infomask & HEAP_XMAX_COMMITTED &&
+                HEAP_XMAX_IS_LOCKED_ONLY(old_infomask)))
        {
                /*
                 * No previous locker; we just insert our own TransactionId.

Alternatively, if we don't want to take this approach, then I'd argue
that we should update README.tuplock to explicitly state that
XMAX_LOCK_ONLY and XMAX_COMMITTED are incompatible (just as it already
states for HEAP_XMAX_IS_MULTI and HEAP_XMAX_COMMITTED) and clean up the
code in heapam_visibility.c for consistency.

Might be worth adding a note to README.tuplock either way about
valid/invalid combinations of infomask flags. Might help avoid some
confusion as people approach the code base.

What do others think about this?

Thanks,
Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeremy Schneider (#1)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On 2020-Aug-20, Jeremy Schneider wrote:

While working with Nathan Bossart on an extension, we came across an
interesting quirk and possible inconsistency in the PostgreSQL code
around infomask flags. I'd like to know if there's something I'm
misunderstanding here or if this really is a correctness/robustness gap
in the code.

At the root of it is the relationship between the XMAX_LOCK_ONLY and
XMAX_COMMITTED infomask bits.

I spent a lot of time wondering about XMAX_COMMITTED. It seemed to me
that it would make no sense to have xacts that were lock-only yet have
XMAX_COMMITTED set; so I looked hard to make sure no place would ever
cause such a situation to arise. However, I still made my best effort
to make the code cope with such a combination correctly if it ever
showed up.

And I may have missed assumptions such as this one, even if they seem
obvious in retrospect, such as in compute_new_xmax_infomask (which, as
you'll notice, changed considerably from what was initially committed):

But then there's one place in heapam.c where an assumption appears that
this state will never happen - the compute_new_xmax_infomask() function:

https://github.com/postgres/postgres/blob/9168793d7275b4b318c153d607fba55d14098c19/src/backend/access/heap/heapam.c#L4800

else if (old_infomask & HEAP_XMAX_COMMITTED)
{
...
if (old_infomask2 & HEAP_KEYS_UPDATED)
status = MultiXactStatusUpdate;
else
status = MultiXactStatusNoKeyUpdate;
new_status = get_mxact_status_for_lock(mode, is_update);
...
new_xmax = MultiXactIdCreate(xmax, status, add_to_xmax, new_status);

When that code sees XMAX_COMMITTED, it assumes the xmax can't possibly
be LOCK_ONLY and it sets the status to something sufficiently high
enough to guarantee that ISUPDATE_from_mxstatus() returns true. That
means that when you try to update this tuple and
compute_new_xmax_infomask() is called with an "update" status, two
"update" statuses are sent to MultiXactIdCreate() which then fails
(thank goodness) with the error "new multixact has more than one
updating member".

https://github.com/postgres/postgres/blob/cd5e82256de5895595cdd99ecb03aea15b346f71/src/backend/access/transam/multixact.c#L784

Have you ever observed this error case hit? I've never seen a report of
that.

The UpdateXmaxHintBits() code to always set the INVALID bit wasn't in
any patches on the mailing list but it was committed and it seems to
have worked very well. As a result it seems nearly impossible to get
into the state where you have both XMAX_LOCK_ONLY and XMAX_COMMITTED
bits set; thus it seems we've avoided problems in
compute_new_xmax_infomask().

Phew.

(I guess I should fully expect that somebody, given sufficient time,
would carefully inspect the committed code against submitted patches ...
especially given that I do such inspections myself.)

But nonetheless it seems worth making the code more robust by having the
compute_new_xmax_infomask() function handle this state correctly just as
the visibility code does. It should only require a simple patch like
this (credit to Nathan Bossart):

diff --git a/src/backend/access/heap/heapam.c
b/src/backend/access/heap/heapam.c
index d881f4cd46..371e3e4f61 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4695,7 +4695,9 @@ compute_new_xmax_infomask(TransactionId xmax,
uint16 old_infomask,
l5:
new_infomask = 0;
new_infomask2 = 0;
-       if (old_infomask & HEAP_XMAX_INVALID)
+       if (old_infomask & HEAP_XMAX_INVALID ||
+               (old_infomask & HEAP_XMAX_COMMITTED &&
+                HEAP_XMAX_IS_LOCKED_ONLY(old_infomask)))
{
/*
* No previous locker; we just insert our own TransactionId.

We could do this in stable branches, if there were any reports that
this inconsistency is happening in real world databases.

Alternatively, if we don't want to take this approach, then I'd argue
that we should update README.tuplock to explicitly state that
XMAX_LOCK_ONLY and XMAX_COMMITTED are incompatible (just as it already
states for HEAP_XMAX_IS_MULTI and HEAP_XMAX_COMMITTED) and clean up the
code in heapam_visibility.c for consistency.

Yeah, I like this approach better for the master branch; not just clean
up as in remove the cases that handle it, but also actively elog(ERROR)
if the condition ever occurs (hopefully with some known way to fix the
problem; maybe by "WITH tup AS (DELETE FROM tab WHERE .. RETURNING *)
INSERT * INTO tab FROM tup" or similar.)

Might be worth adding a note to README.tuplock either way about
valid/invalid combinations of infomask flags. Might help avoid some
confusion as people approach the code base.

Absolutely.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Bossart, Nathan
bossartn@amazon.com
In reply to: Alvaro Herrera (#2)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On 8/26/20, 12:16 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-20, Jeremy Schneider wrote:

Alternatively, if we don't want to take this approach, then I'd argue
that we should update README.tuplock to explicitly state that
XMAX_LOCK_ONLY and XMAX_COMMITTED are incompatible (just as it already
states for HEAP_XMAX_IS_MULTI and HEAP_XMAX_COMMITTED) and clean up the
code in heapam_visibility.c for consistency.

Yeah, I like this approach better for the master branch; not just clean
up as in remove the cases that handle it, but also actively elog(ERROR)
if the condition ever occurs (hopefully with some known way to fix the
problem; maybe by "WITH tup AS (DELETE FROM tab WHERE .. RETURNING *)
INSERT * INTO tab FROM tup" or similar.)

+1. I wouldn't mind picking this up, but it might be some time before
I can get to it.

Nathan

In reply to: Alvaro Herrera (#2)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On Wed, Aug 26, 2020 at 12:16 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

We could do this in stable branches, if there were any reports that
this inconsistency is happening in real world databases.

I hope that the new heapam amcheck stuff eventually leads to our
having total (or near total) certainty about what correct on-disk
states are possible, regardless of the exact pg_upgrade + minor
version paths. We should take a strict line on this stuff where
possible. If that turns out to be wrong in some detail, then it's
relatively easy to fix as a bug in amcheck itself.

There is a high cost to allowing ambiguity about what heapam states
are truly legal/possible. It makes future development projects harder.

--
Peter Geoghegan

#5Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Peter Geoghegan (#4)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On Aug 27, 2020, at 4:47 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Aug 26, 2020 at 12:16 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

We could do this in stable branches, if there were any reports that
this inconsistency is happening in real world databases.

I hope that the new heapam amcheck stuff eventually leads to our
having total (or near total) certainty about what correct on-disk
states are possible, regardless of the exact pg_upgrade + minor
version paths. We should take a strict line on this stuff where
possible. If that turns out to be wrong in some detail, then it's
relatively easy to fix as a bug in amcheck itself.

There is a high cost to allowing ambiguity about what heapam states
are truly legal/possible. It makes future development projects harder.

The amcheck patch has Asserts in hio.c that purport to disallow writing invalid header bits to disk. The combination being discussed here is not disallowed, but if there is consensus that it is an illegal combination, we could certainly add it:

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
         */
        Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+       /*
+        * Do not allow tuples with invalid combinations of hint bits to be placed
+        * on a page.  These combinations are detected as corruption by the
+        * contrib/amcheck logic, so if you disable one or both of these
+        * assertions, make corresponding changes there.
+        */
+       Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+                        (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+       Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+                        (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
        /* Add the tuple to the page */
        pageHeader = BufferGetPage(buffer);


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In reply to: Mark Dilger (#5)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On Thu, Aug 27, 2020 at 4:57 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

The amcheck patch has Asserts in hio.c that purport to disallow writing invalid header bits to disk.

Can it also be a runtime check for the verification process? I think
that we can easily afford to be fairly exhaustive about stuff like
this.

--
Peter Geoghegan

#7Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Peter Geoghegan (#6)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On Aug 27, 2020, at 4:58 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Aug 27, 2020 at 4:57 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

The amcheck patch has Asserts in hio.c that purport to disallow writing invalid header bits to disk.

Can it also be a runtime check for the verification process? I think
that we can easily afford to be fairly exhaustive about stuff like
this.

These two are both checked in verify_heapam.c. The point is that the system will also refuse to write out pages that have this corruption. The Asserts could be converted to panics or whatever, but that has other more serious consequences. Did you have something else in mind?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In reply to: Mark Dilger (#7)
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

On Thu, Aug 27, 2020 at 5:06 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

These two are both checked in verify_heapam.c. The point is that the system will also refuse to write out pages that have this corruption. The Asserts could be converted to panics or whatever, but that has other more serious consequences. Did you have something else in mind?

Oh, I see -- you propose to add both an assert to hio.c, as well as a
check to amcheck itself. That seems like the right thing to do.

Thanks

--
Peter Geoghegan