MultiXactId error after upgrade to 9.3.4

sfrost@snowman.net

almost 12 years ago

In reply to: Stephen Frost (#1)

Re: MultiXactId error after upgrade to 9.3.4

All,

* Stephen Frost (sfrost@snowman.net) wrote:

Looks like we might not be entirely out of the woods yet regarding
MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the
following:

ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound

While trying to get the production system back in order, I was able to
simply do:

select * from table for update;

Which happily updated the xmax for all of the rows- evidently without
any care that the MultiXactId in one of the tuples was considered
invalid (by at least some parts of the code).

I have the pre-upgrade database and can upgrade/rollback/etc that pretty
easily. Note that the table contents weren't changed during the
upgrade, of course, and so the 9.2.6 instance has HEAP_XMAX_IS_MULTI set
while t_xmax is 6849409 for the tuple in question- even though
pg_controldata reports NextMultiXactId as 1601462 (and it seems very
unlikely that there's been a wraparound on that in this database..).

Perhaps something screwed up xmax/HEAP_XMAX_IS_MULTI under 9.2 and the
9.3 instance now detects that something is wrong? Or is this a case
which was previously allowed and it's just in 9.3 that we don't like it?
Hard for me to see why that would be the case, but this really feels
like HEAP_XMAX_IS_MULTI was incorrectly set on the old cluster and the
xmax in the table was actually a regular xid.. That would have come
from 9.2 though.

Thanks,

Stephen

sfrost@snowman.net

almost 12 years ago

In reply to: Stephen Frost (#2)

Re: MultiXactId error after upgrade to 9.3.4

* Stephen Frost (sfrost@snowman.net) wrote:

I have the pre-upgrade database and can upgrade/rollback/etc that pretty
easily. Note that the table contents weren't changed during the
upgrade, of course, and so the 9.2.6 instance has HEAP_XMAX_IS_MULTI set
while t_xmax is 6849409 for the tuple in question- even though
pg_controldata reports NextMultiXactId as 1601462 (and it seems very
unlikely that there's been a wraparound on that in this database..).

Further review leads me to notice that both HEAP_XMAX_IS_MULTI and
HEAP_XMAX_INVALID are set:

t_infomask | 6528

6528 decimal -> 0x1980

0001 1001 1000 0000

Which gives us:

0000 0000 1000 0000 - HEAP_XMAX_LOCK_ONLY
0000 0001 0000 0000 - HEAP_XMIN_COMMITTED
0000 1000 0000 0000 - HEAP_XMAX_INVALID
0001 0000 0000 0000 - HEAP_XMAX_IS_MULTI

Which shows that both HEAP_XMAX_INVALID and HEAP_XMAX_IS_MULTI are set.
Of some interest is that HEAP_XMAX_LOCK_ONLY is also set..

Perhaps something screwed up xmax/HEAP_XMAX_IS_MULTI under 9.2 and the
9.3 instance now detects that something is wrong? Or is this a case
which was previously allowed and it's just in 9.3 that we don't like it?

The 'improve concurrency of FK locking' patch included a change to
UpdateXmaxHintBits():

- * [...] Hence callers should look
- * only at XMAX_INVALID.

...

+ * Hence callers should look only at XMAX_INVALID.
+ *
+ * Note this is not allowed for tuples whose xmax is a multixact.

[...]

+ Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));

What isn't clear to me is if this restriction was supposed to always be
there and something pre-9.3 screwed this up, or if this is a *new*
restriction on what's allowed, in which case it's an on-disk format
change that needs to be accounted for.

One other thing to mention is that this system was originally a 9.0
system and the last update to this tuple that we believe happened was
when it was on 9.0, prior to the 9.2 upgrade (which happened about a
year ago), so it's possible the issue is from the 9.0 era.

Thanks,

Stephen

andres@2ndquadrant.com

almost 12 years ago

In reply to: Stephen Frost (#1)

Re: MultiXactId error after upgrade to 9.3.4

Hi,

On 2014-03-30 00:00:30 -0400, Stephen Frost wrote:

Greetings,

Looks like we might not be entirely out of the woods yet regarding
MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the
following:

ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound

The table contents can be select'd out and match the pre-upgrade
backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails,
unsurprisingly.

Without having looked at the code, IIRC this looks like some place
misses passing allow_old=true where it's actually required. Any chance
you can get a backtrace for the error message? I know you said somewhere
below that you'd worked around the problem, but maybe you have a copy of
the database somewhere?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Stephen Frost (#3)

Re: MultiXactId error after upgrade to 9.3.4

Stephen Frost wrote:

* Stephen Frost (sfrost@snowman.net) wrote:

I have the pre-upgrade database and can upgrade/rollback/etc that pretty
easily. Note that the table contents weren't changed during the
upgrade, of course, and so the 9.2.6 instance has HEAP_XMAX_IS_MULTI set
while t_xmax is 6849409 for the tuple in question- even though
pg_controldata reports NextMultiXactId as 1601462 (and it seems very
unlikely that there's been a wraparound on that in this database..).

Further review leads me to notice that both HEAP_XMAX_IS_MULTI and
HEAP_XMAX_INVALID are set:

t_infomask | 6528

6528 decimal -> 0x1980

0001 1001 1000 0000

Which gives us:

0000 0000 1000 0000 - HEAP_XMAX_LOCK_ONLY
0000 0001 0000 0000 - HEAP_XMIN_COMMITTED
0000 1000 0000 0000 - HEAP_XMAX_INVALID
0001 0000 0000 0000 - HEAP_XMAX_IS_MULTI

Which shows that both HEAP_XMAX_INVALID and HEAP_XMAX_IS_MULTI are set.

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@2ndquadrant.com

almost 12 years ago

In reply to: Alvaro Herrera (#5)

Re: MultiXactId error after upgrade to 9.3.4

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Andres Freund (#6)

Re: MultiXactId error after upgrade to 9.3.4

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

Ah, you're right. It even says so on the comment at the top (no
caffeine yet.) But what it's doing is still buggy, per this report, so
we need to do *something* ...

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@2ndquadrant.com

almost 12 years ago

In reply to: Alvaro Herrera (#7)

Re: MultiXactId error after upgrade to 9.3.4

On 2014-03-31 09:19:12 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

Ah, you're right. It even says so on the comment at the top (no
caffeine yet.) But what it's doing is still buggy, per this report, so
we need to do *something* ...

Are you sure needs_freeze() is the problem here?

IIRC it already does some checks for allow_old? Why is the check for
that not sufficient?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Andres Freund (#8)

Re: MultiXactId error after upgrade to 9.3.4

Andres Freund wrote:

On 2014-03-31 09:19:12 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

Ah, you're right. It even says so on the comment at the top (no
caffeine yet.) But what it's doing is still buggy, per this report, so
we need to do *something* ...

Are you sure needs_freeze() is the problem here?

IIRC it already does some checks for allow_old? Why is the check for
that not sufficient?

GetMultiXactIdMembers has this:

if (MultiXactIdPrecedes(multi, oldestMXact))
{
ereport(allow_old ? DEBUG1 : ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
multi)));
return -1;
}

if (!MultiXactIdPrecedes(multi, nextMXact))
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("MultiXactId %u has not been created yet -- apparent wraparound",
multi)));

I guess I wasn't expecting that too-old values would last longer than a
full wraparound cycle. Maybe the right fix is just to have the second
check also conditional on allow_old.

Anyway, it's not clear to me why this database has a multixact value of
6 million when the next multixact value is barely above one million.
Stephen said a wraparound here is not likely.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

sfrost@snowman.net

almost 12 years ago

In reply to: Alvaro Herrera (#9)

Re: MultiXactId error after upgrade to 9.3.4

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

I guess I wasn't expecting that too-old values would last longer than a
full wraparound cycle. Maybe the right fix is just to have the second
check also conditional on allow_old.

I don't believe this was a wraparound case.

Anyway, it's not clear to me why this database has a multixact value of
6 million when the next multixact value is barely above one million.
Stephen said a wraparound here is not likely.

I don't think the xmax value is a multixact at all- I think it's
actually a regular xid, but everyone is expected to ignore it because
XMAX_IS_INVALID, yet somehow the MULTIXACT bit was also set and the new
code expects to be able to look at the xmax field even though it's
marked as invalid..

I'm going through the upgrade process again from 9.2 and will get a
stack trace.

Thanks,

Stephen

#11

andres@2ndquadrant.com

almost 12 years ago

In reply to: Stephen Frost (#10)

Re: MultiXactId error after upgrade to 9.3.4

On 2014-03-31 09:09:08 -0400, Stephen Frost wrote:

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

I guess I wasn't expecting that too-old values would last longer than a
full wraparound cycle. Maybe the right fix is just to have the second
check also conditional on allow_old.

I don't believe this was a wraparound case.

Could you show pg_controldata from the old cluster?

Anyway, it's not clear to me why this database has a multixact value of
6 million when the next multixact value is barely above one million.
Stephen said a wraparound here is not likely.

I don't think the xmax value is a multixact at all- I think it's
actually a regular xid, but everyone is expected to ignore it because
XMAX_IS_INVALID, yet somehow the MULTIXACT bit was also set and the new
code expects to be able to look at the xmax field even though it's
marked as invalid..

XMAX_IS_INVALID was never allowed to be ignored for vacuum AFAICS. So I
don't think this is a valid explanation.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

sfrost@snowman.net

almost 12 years ago

In reply to: Andres Freund (#4)

Re: MultiXactId error after upgrade to 9.3.4

Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:

Without having looked at the code, IIRC this looks like some place
misses passing allow_old=true where it's actually required. Any chance
you can get a backtrace for the error message? I know you said somewhere
below that you'd worked around the problem, but maybe you have a copy of
the database somewhere?

#0 GetMultiXactIdMembers (allow_old=<optimized out>, members=0x7fff78200ca0, multi=6849409) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/transam/multixact.c:1130
#1 GetMultiXactIdMembers (multi=6849409, members=0x7fff78200ca0, allow_old=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/transam/multixact.c:1056
#2 0x00007f46e9e707f8 in FreezeMultiXactId (flags=<synthetic pointer>, cutoff_multi=<optimized out>, cutoff_xid=4285178915, t_infomask=<optimized out>, multi=6849409)
at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/heap/heapam.c:5355
#3 heap_prepare_freeze_tuple (tuple=0x7f46e2d9a328, cutoff_xid=4285178915, cutoff_multi=<optimized out>, frz=0x7f46ec4d31b0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/access/heap/heapam.c:5593
#4 0x00007f46e9f72aca in lazy_scan_heap (scan_all=0 '\000', nindexes=2, Irel=<optimized out>, vacrelstats=<optimized out>, onerel=0x7f46e9d65ca0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuumlazy.c:899
#5 lazy_vacuum_rel (onerel=0x7f46e9d65ca0, vacstmt=<optimized out>, bstrategy=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuumlazy.c:236
#6 0x00007f46e9f70755 in vacuum_rel (relid=288284, vacstmt=0x7f46ec59f780, do_toast=1 '\001', for_wraparound=0 '\000') at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuum.c:1205
#7 0x00007f46e9f71445 in vacuum (vacstmt=0x7f46ec59f780, relid=<optimized out>, do_toast=1 '\001', bstrategy=<optimized out>, for_wraparound=0 '\000', isTopLevel=<optimized out>)
at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/commands/vacuum.c:234
#8 0x00007f46ea05e773 in standard_ProcessUtility (parsetree=0x7f46ec59f780, queryString=0x7f46ec59ec90 "vacuum common.wave_supplemental;", context=<optimized out>, params=0x0, dest=<optimized out>, completionTag=0x7fff782016e0 "")
at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/utility.c:639
#9 0x00007f46ea05b5a3 in PortalRunUtility (portal=0x7f46ec4f82e0, utilityStmt=0x7f46ec59f780, isTopLevel=1 '\001', dest=0x7f46ec59fae0, completionTag=0x7fff782016e0 "")
at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/pquery.c:1187
#10 0x00007f46ea05c1d6 in PortalRunMulti (portal=0x7f46ec4f82e0, isTopLevel=1 '\001', dest=0x7f46ec59fae0, altdest=0x7f46ec59fae0, completionTag=0x7fff782016e0 "")
at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/pquery.c:1318
#11 0x00007f46ea05ce6d in PortalRun (portal=0x7f46ec4f82e0, count=9223372036854775807, isTopLevel=1 '\001', dest=0x7f46ec59fae0, altdest=0x7f46ec59fae0, completionTag=0x7fff782016e0 "")
at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/pquery.c:816
#12 0x00007f46ea05908c in exec_simple_query (query_string=0x7f46ec59ec90 "vacuum common.wave_supplemental;") at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/postgres.c:1048
#13 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x7f46ec4b9010 "mine", username=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/tcop/postgres.c:3997
#14 0x00007f46ea01487d in BackendRun (port=0x7f46ec4fc2c0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:3996
#15 BackendStartup (port=0x7f46ec4fc2c0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:3685
#16 ServerLoop () at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:1586
#17 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/postmaster.c:1253
#18 0x00007f46e9e4950c in main (argc=5, argv=0x7f46ec4b81f0) at /tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/main/main.c:206

Looks like your idea that is has to do w/ freezeing is accurate...

Thanks,

Stephen

#13

sfrost@snowman.net

almost 12 years ago

In reply to: Andres Freund (#11)

Re: MultiXactId error after upgrade to 9.3.4

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2014-03-31 09:09:08 -0400, Stephen Frost wrote:

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

I guess I wasn't expecting that too-old values would last longer than a
full wraparound cycle. Maybe the right fix is just to have the second
check also conditional on allow_old.

I don't believe this was a wraparound case.

Could you show pg_controldata from the old cluster?

Per the original email-

The *OLD* (9.2.6) control data had:

pg_control version number: 922
Catalog version number: 201204301

Latest checkpoint's NextXID: 0/40195831
Latest checkpoint's NextOID: 53757891

Latest checkpoint's NextMultiXactId: 1601462
Latest checkpoint's NextMultiOffset: 4503112

Latest checkpoint's oldestXID: 654
Latest checkpoint's oldestXID's DB: 12870
Latest checkpoint's oldestActiveXID: 0

(It doesn't report the oldestMulti info under 9.2.6).

Was there something else you were looking for?

I don't think the xmax value is a multixact at all- I think it's
actually a regular xid, but everyone is expected to ignore it because
XMAX_IS_INVALID, yet somehow the MULTIXACT bit was also set and the new
code expects to be able to look at the xmax field even though it's
marked as invalid..

XMAX_IS_INVALID was never allowed to be ignored for vacuum AFAICS. So I
don't think this is a valid explanation.

The old 9.2 cluster certainly had no issue w/ vacuum'ing this
table/tuple. Unfortunately, I can't have the 9.2 debug packages
installed concurrently w/ the 9.3 debug packages, so it's a bit awkward
to go back and forth between the two. Anything else of interest while I
have the 9.3 instance running under gdb? Sent the requested backtrace
in another email.

Thanks,

Stephen

#14

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Stephen Frost (#3)

Re: MultiXactId error after upgrade to 9.3.4

Stephen Frost wrote:

Further review leads me to notice that both HEAP_XMAX_IS_MULTI and
HEAP_XMAX_INVALID are set:

t_infomask | 6528

6528 decimal -> 0x1980

0001 1001 1000 0000

Which gives us:

0000 0000 1000 0000 - HEAP_XMAX_LOCK_ONLY
0000 0001 0000 0000 - HEAP_XMIN_COMMITTED
0000 1000 0000 0000 - HEAP_XMAX_INVALID
0001 0000 0000 0000 - HEAP_XMAX_IS_MULTI

Which shows that both HEAP_XMAX_INVALID and HEAP_XMAX_IS_MULTI are set.
Of some interest is that HEAP_XMAX_LOCK_ONLY is also set..

This combination seems reasonable. This tuple had two FOR SHARE
lockers, so it was marked HEAP_XMAX_SHARED_LOCK|HEAP_XMAX_IS_MULTI
(0x1080). Then those lockers finished, and somebody else checked the
tuple with a tqual.c routine (say HeapTupleSatisfiesUpdate), which saw
the lockers were gone and marked it as HEAP_XMAX_INVALID (0x800),
without removing the Xmax value and without removing the other bits.

This is all per spec, so we must cope.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Andres Freund (#6)

Re: MultiXactId error after upgrade to 9.3.4

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

I think this rule is wrong. I think the rule ought to be something like
"if the XMAX_INVALID bit is set, then reset whatever is there if there
is something; if the bit is not set, proceed as today". Otherwise we
risk reading garbage, which is what is happening in this case.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Alvaro Herrera (#15)

Re: MultiXactId error after upgrade to 9.3.4

Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

I think this rule is wrong. I think the rule ought to be something like
"if the XMAX_INVALID bit is set, then reset whatever is there if there
is something; if the bit is not set, proceed as today". Otherwise we
risk reading garbage, which is what is happening in this case.

Andres asks on IM: How come there is garbage there in the first place?
I have to admit I have no idea.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

sfrost@snowman.net

almost 12 years ago

In reply to: Alvaro Herrera (#16)

Re: MultiXactId error after upgrade to 9.3.4

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

I think this rule is wrong. I think the rule ought to be something like
"if the XMAX_INVALID bit is set, then reset whatever is there if there
is something; if the bit is not set, proceed as today". Otherwise we
risk reading garbage, which is what is happening in this case.

Andres asks on IM: How come there is garbage there in the first place?
I have to admit I have no idea.

I haven't got any great explanation for that either. I continue to feel
that it's much more likely that it's an xid than a multixid. Is it
possible that it was stamped with a real xmax through some code path
which ignored the IS_MULTI flag? This could have been from as far back
as 9.0-era. On this over-7TB database, only this one tuple had the
issue. I have another set of databases which add up to ~20TB that I'm
currently testing an upgrade from 9.2 to 9.3 on and will certainly let
everyone know if I run into a similar situation there.

On our smallest DB (which we upgraded first) which is much more OLTP
instead of OLAP, we didn't run into this.

This is all on physical gear and we've seen no indications that there
has been any corruption. Hard to rule it out completely, but it seems
pretty unlikely.

Thanks,

Stephen

#18

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Stephen Frost (#12)

Re: MultiXactId error after upgrade to 9.3.4

On Mon, Mar 31, 2014 at 09:36:03AM -0400, Stephen Frost wrote:

Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:

Without having looked at the code, IIRC this looks like some place
misses passing allow_old=true where it's actually required. Any chance
you can get a backtrace for the error message? I know you said somewhere
below that you'd worked around the problem, but maybe you have a copy of
the database somewhere?

Looks like your idea that is has to do w/ freezeing is accurate...

Where are we on this?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

alvherre@2ndquadrant.com

over 11 years ago

In reply to: Andres Freund (#6)

1 attachment(s)

Re: MultiXactId error after upgrade to 9.3.4

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

I think what we should do here is that if we see that XMAX_INVALID is
set, we just reset everything to zero without checking the multixact
contents. Something like the attached (warning: hand-edited, line
numbers might be bogus)

I still don't know under what circumstances this situation could arise.
This seems most strange to me. I would wonder about this to be just
papering over a different bug elsewhere, except that we know this tuple
comes from a pg_upgraded table and so I think the only real solution is
to cope.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

no-freeze-invalid-multi.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9283b70..72602fd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5585,7 +5602,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	 */
 	xid = HeapTupleHeaderGetRawXmax(tuple);
 
-	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
+	if ((tuple->t_infomask & HEAP_XMAX_IS_MULTI) &&
+		(tuple->t_infomask & HEAP_XMAX_INVALID))
+	{
+		freeze_xmax = true;
+	}
+	else if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 	{
 		TransactionId newxmax;
 		uint16		flags;

#20

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Alvaro Herrera (#19)

Re: MultiXactId error after upgrade to 9.3.4

On Wed, Apr 23, 2014 at 03:01:02PM -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

I think what we should do here is that if we see that XMAX_INVALID is
set, we just reset everything to zero without checking the multixact
contents. Something like the attached (warning: hand-edited, line
numbers might be bogus)

I still don't know under what circumstances this situation could arise.
This seems most strange to me. I would wonder about this to be just
papering over a different bug elsewhere, except that we know this tuple
comes from a pg_upgraded table and so I think the only real solution is
to cope.

Shouldn't we log something at least if we are unsure of the cause?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

alvherre@2ndquadrant.com

over 11 years ago

In reply to: Bruce Momjian (#20)

Re: MultiXactId error after upgrade to 9.3.4

Bruce Momjian wrote:

On Wed, Apr 23, 2014 at 03:01:02PM -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:

My conclusion here is that some part of the code is failing to examine
XMAX_INVALID before looking at the value stored in xmax itself. There
ought to be a short-circuit. Fortunately, this bug should be pretty
harmless.

.. and after looking, I'm fairly sure the bug is in
heap_tuple_needs_freeze.

heap_tuple_needs_freeze() isn't *allowed* to look at
XMAX_INVALID. Otherwise it could miss freezing something still visible
on a standby or after an eventual crash.

I think what we should do here is that if we see that XMAX_INVALID is
set, we just reset everything to zero without checking the multixact
contents. Something like the attached (warning: hand-edited, line
numbers might be bogus)

I still don't know under what circumstances this situation could arise.
This seems most strange to me. I would wonder about this to be just
papering over a different bug elsewhere, except that we know this tuple
comes from a pg_upgraded table and so I think the only real solution is
to cope.

Shouldn't we log something at least if we are unsure of the cause?

I don't know. Is it possible that XMAX_IS_MULTI got set because of
cosmic rays? At this point that's the only explanation that makes sense
to me. And I'm not sure what to do about this until we know more --
more user reports of this problem, for instance.

I don't see any reasonable way to distinguish this particular kind of
multixact-out-of-bounds situation from any other, so not sure what else
to log either (you can see that we already emit an error message.)

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Alvaro Herrera (#21)

Re: MultiXactId error after upgrade to 9.3.4

On Wed, Apr 23, 2014 at 03:42:14PM -0300, Alvaro Herrera wrote:

I still don't know under what circumstances this situation could arise.
This seems most strange to me. I would wonder about this to be just
papering over a different bug elsewhere, except that we know this tuple
comes from a pg_upgraded table and so I think the only real solution is
to cope.

Shouldn't we log something at least if we are unsure of the cause?

I don't know. Is it possible that XMAX_IS_MULTI got set because of
cosmic rays? At this point that's the only explanation that makes sense
to me. And I'm not sure what to do about this until we know more --
more user reports of this problem, for instance.

I don't see any reasonable way to distinguish this particular kind of
multixact-out-of-bounds situation from any other, so not sure what else
to log either (you can see that we already emit an error message.)

I guess I am lost then. I thought it supressed the error. What does
the patch do?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

alvherre@2ndquadrant.com

over 11 years ago

In reply to: Bruce Momjian (#22)

Re: MultiXactId error after upgrade to 9.3.4

Bruce Momjian wrote:

On Wed, Apr 23, 2014 at 03:42:14PM -0300, Alvaro Herrera wrote:

I still don't know under what circumstances this situation could arise.
This seems most strange to me. I would wonder about this to be just
papering over a different bug elsewhere, except that we know this tuple
comes from a pg_upgraded table and so I think the only real solution is
to cope.

Shouldn't we log something at least if we are unsure of the cause?

I don't know. Is it possible that XMAX_IS_MULTI got set because of
cosmic rays? At this point that's the only explanation that makes sense
to me. And I'm not sure what to do about this until we know more --
more user reports of this problem, for instance.

I don't see any reasonable way to distinguish this particular kind of
multixact-out-of-bounds situation from any other, so not sure what else
to log either (you can see that we already emit an error message.)

I guess I am lost then. I thought it supressed the error. What does
the patch do?

You're right, it does. I am not sure I would apply it.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

andres@2ndquadrant.com

over 11 years ago

In reply to: Alvaro Herrera (#23)

Re: MultiXactId error after upgrade to 9.3.4

On April 23, 2014 8:51:21 PM CEST, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Bruce Momjian wrote:

On Wed, Apr 23, 2014 at 03:42:14PM -0300, Alvaro Herrera wrote:

I still don't know under what circumstances this situation

could arise.

This seems most strange to me. I would wonder about this to be

just

papering over a different bug elsewhere, except that we know

this tuple

comes from a pg_upgraded table and so I think the only real

solution is

to cope.

Shouldn't we log something at least if we are unsure of the

cause?

I don't know. Is it possible that XMAX_IS_MULTI got set because of
cosmic rays? At this point that's the only explanation that makes

sense

to me. And I'm not sure what to do about this until we know more

--

more user reports of this problem, for instance.

I don't see any reasonable way to distinguish this particular kind

of

multixact-out-of-bounds situation from any other, so not sure what

else

to log either (you can see that we already emit an error message.)

I guess I am lost then. I thought it supressed the error. What does
the patch do?

You're right, it does. I am not sure I would apply it.

I think this patch is a seriously bad idea. For one, it's not actually doing anything about the problem - the tuple can be accessed without freezing getting involved. For another, it will increase wall traffic for freezing of short lived tuples considerably, without any benefit.

Andres

--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

alvherre@2ndquadrant.com

over 11 years ago

In reply to: Andres Freund (#24)

Re: MultiXactId error after upgrade to 9.3.4

Andres Freund wrote:

I think this patch is a seriously bad idea. For one, it's not actually
doing anything about the problem - the tuple can be accessed without
freezing getting involved.

Normal access other than freeze is not a problem, because other code
paths do check for HEAP_XMAX_INVALID and avoid access to Xmax if that's
set.

For another, it will increase wall traffic for freezing of short lived
tuples considerably, without any benefit.

True. I didn't actually try to run this; it was just for demonstration
purposes. It'd need some cooperation from heap_tuple_needs_freeze in
order to work at all, for one thing.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

andres@2ndquadrant.com

over 11 years ago

In reply to: Alvaro Herrera (#25)

Re: MultiXactId error after upgrade to 9.3.4

On 2014-04-23 16:30:05 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

I think this patch is a seriously bad idea. For one, it's not actually
doing anything about the problem - the tuple can be accessed without
freezing getting involved.

Normal access other than freeze is not a problem, because other code
paths do check for HEAP_XMAX_INVALID and avoid access to Xmax if that's
set.

For another, it will increase wall traffic for freezing of short lived
tuples considerably, without any benefit.

True. I didn't actually try to run this; it was just for demonstration
purposes. It'd need some cooperation from heap_tuple_needs_freeze in
order to work at all, for one thing.

I think if you want to add something like this it should be added
*inside* the if (MultiXactIdPrecedes(multi, cutoff_multi)) block in
FreezeMultiXactId(). There it seems to make quite a bit of sense.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

/messages/by-id/20140330040029.GY4582@tamriel.snowman.net
/messages/by-id/538F3D70.6080902@publicrelay.com
/messages/by-id/556439CF.7070109@pscs.co.uk
/messages/by-id/20160614173150.GA443784@alvherre.pgsql
/messages/by-id/20160615203829.5798.4594@wrigleys.postgresql.org

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Stephen Frost (#1)

Re: MultiXactId error after upgrade to 9.3.4

Stephen Frost wrote:

Greetings,

Looks like we might not be entirely out of the woods yet regarding
MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the
following:

ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound

The table contents can be select'd out and match the pre-upgrade
backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails,
unsurprisingly.

I finally figured what is going on here, though I don't yet have a
patch.

This has been reported a number of times:

We theorised that we were missing some place that was failing to pass
the "allow_old" flag to GetMultiXactIdMembers; and since we couldn't
find any and the problem was worked around simply (by doing SELECT FOR
UPDATE or equivalent on the affected tuples), there was no further
research. (The allow_old flag is passed for tuples that match an
infomask bit pattern that can only come from tuples locked in 9.2 and
prior, i.e. one that is never set by 9.3ff).

Yesterday I had to deal with it and quickly found what is going wrong:
the problem is that 9.2 and earlier it was acceptable (and common) to
leave tuples with very old multixacts in xmax, even after multixact
counter wraparound. When one such value was found in a live tuple,
GetMultiXactIdMembers() would notice that it was out of range and simply
return "no members", at which point heap_update and siblings would
consider the tuple as not locked and move on.

When pg_upgrading a database containing tuples marked like that, the new
code would error out, because during 9.3 multixact we considered that it
was dangerous to silently allow tuples to be marked by values we didn't
keep track of, so we made it an error instead, per
/messages/by-id/20111204122027.GA10035@tornado.leadboat.com
Some cases are allowed to be downgraded to DEBUG, when allow_old is
true.

I think that was a good choice in general so that possibly-data-eating
bugs could be reported, but there's a problem in the specific case of
tuples carried over by pg_upgrade whose Multixact is "further in the
future" compared to the nextMultiXactId counter. I think it's pretty
clear that we should let that error be downgraded to DEBUG too, like the
other checks.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Andrew Gierth

andrew@tao11.riddles.org.uk

over 9 years ago

In reply to: Alvaro Herrera (#27)

Re: MultiXactId error after upgrade to 9.3.4

"Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Alvaro> I think that was a good choice in general so that
Alvaro> possibly-data-eating bugs could be reported, but there's a
Alvaro> problem in the specific case of tuples carried over by
Alvaro> pg_upgrade whose Multixact is "further in the future" compared
Alvaro> to the nextMultiXactId counter. I think it's pretty clear that
Alvaro> we should let that error be downgraded to DEBUG too, like the
Alvaro> other checks.

But that leaves an obvious third issue: it's all very well to ignore the
pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
future, but what if it's actually inside the currently valid range?
Looking it up as though it were a valid mxid in that case seems to be
completely wrong and could introduce more subtle errors.

(It can, AFAICT, be inside the currently valid range due to wraparound,
i.e. without there being a valid pg_multixact entry for it, because
AFAICT in 9.2, once the mxid is hinted dead it is never again either
looked up or cleared, so it can sit in the tuple xmax forever, even
through multiple wraparounds.)

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Andrew Gierth (#28)

1 attachment(s)

Re: MultiXactId error after upgrade to 9.3.4

Andrew Gierth wrote:

But that leaves an obvious third issue: it's all very well to ignore the
pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
future, but what if it's actually inside the currently valid range?
Looking it up as though it were a valid mxid in that case seems to be
completely wrong and could introduce more subtle errors.

You're right, we should not try to resolve a multixact coming from the
old install in any case.

(It can, AFAICT, be inside the currently valid range due to wraparound,
i.e. without there being a valid pg_multixact entry for it, because
AFAICT in 9.2, once the mxid is hinted dead it is never again either
looked up or cleared, so it can sit in the tuple xmax forever, even
through multiple wraparounds.)

HeapTupleSatisfiesVacuum removes very old multixacts; see the
HEAP_IS_LOCKED block starting in line 1162 where we set
HEAP_XMAX_INVALID for multixacts that are not running by falling
through. It's not a strict guarantee but this probably explains why
this problem is not more common.

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

I propose something like the attached patch, which implements that idea.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

upgraded-locks.patchtext/x-diff; charset=us-asciiDownload

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 88f7137..e20e7f8 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -158,8 +158,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 
 				values[Atnum_ismulti] = pstrdup("true");
 
-				allow_old = !(infomask & HEAP_LOCK_MASK) &&
-					(infomask & HEAP_XMAX_LOCK_ONLY);
+				allow_old = HEAP_LOCKED_UPGRADED(infomask);
 				nmembers = GetMultiXactIdMembers(xmax, &members, allow_old,
 												 false);
 				if (nmembers == -1)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 22b3f5f..9d2de7f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5225,8 +5225,7 @@ l5:
 		 * pg_upgrade; both MultiXactIdIsRunning and MultiXactIdExpand assume
 		 * that such multis are never passed.
 		 */
-		if (!(old_infomask & HEAP_LOCK_MASK) &&
-			HEAP_XMAX_IS_LOCKED_ONLY(old_infomask))
+		if (HEAP_LOCKED_UPGRADED(old_infomask))
 		{
 			old_infomask &= ~HEAP_XMAX_IS_MULTI;
 			old_infomask |= HEAP_XMAX_INVALID;
@@ -5586,6 +5585,7 @@ l4:
 				int			i;
 				MultiXactMember *members;
 
+				/* XXX do we need a HEAP_LOCKED_UPGRADED test? */
 				nmembers = GetMultiXactIdMembers(rawxmax, &members, false,
 									 HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));
 				for (i = 0; i < nmembers; i++)
@@ -6144,14 +6144,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	bool		allow_old;
 
 	*flags = 0;
 
 	/* We should only be called in Multis */
 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) ||
+		HEAP_LOCKED_UPGRADED(t_infomask))
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
@@ -6164,14 +6164,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
-		 *
-		 * Don't assert MultiXactIdIsRunning if the multi came from a
-		 * pg_upgrade'd share-locked tuple, though, as doing that causes an
-		 * error to be raised unnecessarily.
 		 */
-		Assert((!(t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) ||
-			   !MultiXactIdIsRunning(multi,
+		Assert(!MultiXactIdIsRunning(multi,
 									 HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)));
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
@@ -6213,10 +6207,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * anything.
 	 */
 
-	allow_old = !(t_infomask & HEAP_LOCK_MASK) &&
-		HEAP_XMAX_IS_LOCKED_ONLY(t_infomask);
 	nmembers =
-		GetMultiXactIdMembers(multi, &members, allow_old,
+		GetMultiXactIdMembers(multi, &members, false,
 							  HEAP_XMAX_IS_LOCKED_ONLY(t_infomask));
 	if (nmembers <= 0)
 	{
@@ -6777,14 +6769,15 @@ static bool
 DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
 						LockTupleMode lockmode)
 {
-	bool		allow_old;
 	int			nmembers;
 	MultiXactMember *members;
 	bool		result = false;
 	LOCKMODE	wanted = tupleLockExtraInfo[lockmode].hwlock;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
+	if (HEAP_LOCKED_UPGRADED(infomask))
+		return false;
+
+	nmembers = GetMultiXactIdMembers(multi, &members, false,
 									 HEAP_XMAX_IS_LOCKED_ONLY(infomask));
 	if (nmembers >= 0)
 	{
@@ -6867,15 +6860,15 @@ Do_MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
 				   Relation rel, ItemPointer ctid, XLTW_Oper oper,
 				   int *remaining)
 {
-	bool		allow_old;
 	bool		result = true;
 	MultiXactMember *members;
 	int			nmembers;
 	int			remain = 0;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
-									 HEAP_XMAX_IS_LOCKED_ONLY(infomask));
+	/* for pre-pg_upgrade tuples, no need to sleep at all */
+	nmembers = HEAP_LOCKED_UPGRADED(infomask) ? -1 :
+		GetMultiXactIdMembers(multi, &members, false,
+							  HEAP_XMAX_IS_LOCKED_ONLY(infomask));
 
 	if (nmembers >= 0)
 	{
@@ -7053,9 +7046,11 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		multi = HeapTupleHeaderGetRawXmax(tuple);
 		if (!MultiXactIdIsValid(multi))
 		{
-			/* no xmax set, ignore */
+			/* no valid xmax set, ignore */
 			;
 		}
+		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return true;
 		else if (MultiXactIdPrecedes(multi, cutoff_multi))
 			return true;
 		else
@@ -7063,13 +7058,10 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			MultiXactMember *members;
 			int			nmembers;
 			int			i;
-			bool		allow_old;
 
 			/* need to check whether any member of the mxact is too old */
 
-			allow_old = !(tuple->t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask);
-			nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
+			nmembers = GetMultiXactIdMembers(multi, &members, false,
 								HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
 
 			for (i = 0; i < nmembers; i++)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7bccca8..ee2b7ab 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1175,12 +1175,17 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  * GetMultiXactIdMembers
  *		Returns the set of MultiXactMembers that make up a MultiXactId
  *
- * If the given MultiXactId is older than the value we know to be oldest, we
- * return -1.  The caller is expected to allow that only in permissible cases,
- * i.e. when the infomask lets it presuppose that the tuple had been
- * share-locked before a pg_upgrade; this means that the HEAP_XMAX_LOCK_ONLY
- * needs to be set, but HEAP_XMAX_KEYSHR_LOCK and HEAP_XMAX_EXCL_LOCK are not
- * set.
+ * from_pgupgrade should be set to true when a multixact corresponds to a
+ * value from a tuple that was locked in a 9.2-or-older install and later
+ * pg_upgrade'd.  In this case, we now for certain that no members can still
+ * be running, so we return -1 just like for an empty multixact without
+ * further any further checking.  It would be wrong to try to resolve such
+ * multixacts, because they may be pointing to any part of the multixact
+ * space, both within the current valid range in which case we could return
+ * bogus results, or outside it, which would raise errors.  This flag should
+ * only be passed true when the multixact is attached to a tuple with
+ * HEAP_XMAX_LOCK_ONLY set, but neither of HEAP_XMAX_KEYSHR_LOCK and
+ * HEAP_XMAX_EXCL_LOCK.
  *
  * Other border conditions, such as trying to read a value that's larger than
  * the value currently known as the next to assign, raise an error.  Previously
@@ -1194,7 +1199,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  */
 int
 GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
-					  bool allow_old, bool onlyLock)
+					  bool from_pgupgrade, bool onlyLock)
 {
 	int			pageno;
 	int			prev_pageno;
@@ -1213,7 +1218,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) || from_pgupgrade)
 		return -1;
 
 	/* See if the MultiXactId is in the local cache */
@@ -1273,7 +1278,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	if (MultiXactIdPrecedes(multi, oldestMXact))
 	{
-		ereport(allow_old ? DEBUG1 : ERROR,
+		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 		 errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
 				multi)));
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 0e815a9..a08dae1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -620,15 +620,12 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	{
 		TransactionId xmax;
 
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return HeapTupleMayBeUpdated;
+
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 		{
-			/*
-			 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK is
-			 * set, it cannot possibly be running.  Otherwise need to check.
-			 */
-			if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-									  HEAP_XMAX_KEYSHR_LOCK)) &&
-				MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), true))
+			if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), true))
 				return HeapTupleBeingUpdated;
 
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
@@ -1279,26 +1276,21 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 		 * "Deleting" xact really only locked it, so the tuple is live in any
 		 * case.  However, we should make sure that either XMAX_COMMITTED or
 		 * XMAX_INVALID gets set once the xact is gone, to reduce the costs of
-		 * examining the tuple for future xacts.  Also, marking dead
-		 * MultiXacts as invalid here provides defense against MultiXactId
-		 * wraparound (see also comments in heap_freeze_tuple()).
+		 * examining the tuple for future xacts.
 		 */
 		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
 		{
 			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 			{
 				/*
-				 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK
-				 * are set, it cannot possibly be running; otherwise have to
-				 * check.
+				 * If it's a pre-pg_upgrade tuple, the multixact cannot
+				 * possibly be running; otherwise have to check.
 				 */
-				if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-										  HEAP_XMAX_KEYSHR_LOCK)) &&
+				if (!HEAP_LOCKED_UPGRADED(tuple->t_infomask) &&
 					MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple),
 										 true))
 					return HEAPTUPLE_LIVE;
 				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
-
 			}
 			else
 			{
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index 9f9cf2a..74da10c 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -218,6 +218,20 @@ struct HeapTupleHeaderData
 	 (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))
 
 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * We must not try to resolve such multixacts locally, because the result would
+ * be bogus, regardless of where they stand with respect to the current valid
+ * range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+	(((infomask) & HEAP_XMAX_IS_MULTI) && \
+	 ((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+	 (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0))
+
+/*
  * Use these to test whether a particular lock is applied to a tuple
  */
 #define HEAP_XMAX_IS_SHR_LOCKED(infomask) \

#30

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Alvaro Herrera (#29)

Re: MultiXactId error after upgrade to 9.3.4

Alvaro Herrera wrote:

Andrew Gierth wrote:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

I propose something like the attached patch, which implements that idea.

Hm, this doesn't apply cleanly to 9.4. I'll need to come up with a
merge tomorrow.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Andrew Gierth

andrew@tao11.riddles.org.uk

over 9 years ago

In reply to: Alvaro Herrera (#29)

Re: MultiXactId error after upgrade to 9.3.4

"Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

(It can, AFAICT, be inside the currently valid range due to
wraparound, i.e. without there being a valid pg_multixact entry for
it, because AFAICT in 9.2, once the mxid is hinted dead it is never
again either looked up or cleared, so it can sit in the tuple xmax
forever, even through multiple wraparounds.)

Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts

It does nothing of the kind; it only marks them HEAP_XMAX_INVALID. The
actual mxid remains in the tuple xmax field.

The failing mxids in the case I analyzed on -bugs are failing _in spite
of_ being already hinted HEAP_XMAX_INVALID, because the code path in
question doesn't check that.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Andrew Gierth (#28)

Re: MultiXactId error after upgrade to 9.3.4

On Thu, Jun 16, 2016 at 4:50 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

I entirely believe that's the correct rule, but doesn't implementing
it require a crystal balll?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33

Andrew Gierth

andrew@tao11.riddles.org.uk

over 9 years ago

In reply to: Robert Haas (#32)

Re: MultiXactId error after upgrade to 9.3.4

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

Robert> I entirely believe that's the correct rule, but doesn't
Robert> implementing it require a crystal balll?

Why would it? Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Andrew Gierth (#31)

Re: MultiXactId error after upgrade to 9.3.4

Andrew Gierth wrote:

"Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

(It can, AFAICT, be inside the currently valid range due to
wraparound, i.e. without there being a valid pg_multixact entry for
it, because AFAICT in 9.2, once the mxid is hinted dead it is never
again either looked up or cleared, so it can sit in the tuple xmax
forever, even through multiple wraparounds.)

Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts

It does nothing of the kind; it only marks them HEAP_XMAX_INVALID. The
actual mxid remains in the tuple xmax field.

The failing mxids in the case I analyzed on -bugs are failing _in spite
of_ being already hinted HEAP_XMAX_INVALID, because the code path in
question doesn't check that.

Ah, right. I had some code to reset HEAP_XMAX_IS_MULTI early on but
somebody didn't like it and we removed it; and we also removed some of
the checks for HEAP_XMAX_INVALID we had, or perhaps didn't extend them
to every place that needed them. It's not critical now anyway; the
patch I posted (or some variation thereof) should suffice as a fix.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Alvaro Herrera (#29)

1 attachment(s)

Re: MultiXactId error after upgrade to 9.3.4

Alvaro Herrera wrote:

Andrew Gierth wrote:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

I propose something like the attached patch, which implements that idea.

Here's a backpatch of that to 9.3 and 9.4.

I tested this by creating a 9.2 installation with an out-of-range
multixact, and verified that after upgrading this to 9.3 it fails with
the "apparent wraparound" message in VACUUM. With this patch applied,
it silently removes the multixact.

I will clean this up and push to all branches after the tagging of
9.6beta2 on Monday.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

upgraded-locks-93.patchtext/x-diff; charset=us-asciiDownload

commit 5a57daa6ef9784c42f3cb2ec6e3a58d1e4004593[m
Author:     Alvaro Herrera <alvherre@alvh.no-ip.org>
AuthorDate: Thu Jun 16 23:33:20 2016 -0400
CommitDate: Fri Jun 17 18:46:04 2016 -0400

    Fix upgraded multixact problem

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 075d781..ee5b241 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -165,8 +165,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 
 				values[Atnum_ismulti] = pstrdup("true");
 
-				allow_old = !(infomask & HEAP_LOCK_MASK) &&
-					(infomask & HEAP_XMAX_LOCK_ONLY);
+				allow_old = HEAP_LOCKED_UPGRADED(infomask);
 				nmembers = GetMultiXactIdMembers(xmax, &members, allow_old);
 				if (nmembers == -1)
 				{
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1f1bac5..44aa495 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4641,8 +4641,7 @@ l5:
 		 * pg_upgrade; both MultiXactIdIsRunning and MultiXactIdExpand assume
 		 * that such multis are never passed.
 		 */
-		if (!(old_infomask & HEAP_LOCK_MASK) &&
-			HEAP_XMAX_IS_LOCKED_ONLY(old_infomask))
+		if (HEAP_LOCKED_UPGRADED(old_infomask))
 		{
 			old_infomask &= ~HEAP_XMAX_IS_MULTI;
 			old_infomask |= HEAP_XMAX_INVALID;
@@ -5001,6 +5000,7 @@ l4:
 				int		i;
 				MultiXactMember *members;
 
+				/* XXX do we need a HEAP_LOCKED_UPGRADED test? */
 				nmembers = GetMultiXactIdMembers(rawxmax, &members, false);
 				for (i = 0; i < nmembers; i++)
 				{
@@ -5329,14 +5329,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	bool		allow_old;
 
 	*flags = 0;
 
 	/* We should only be called in Multis */
 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) ||
+		HEAP_LOCKED_UPGRADED(t_infomask))
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
@@ -5349,14 +5349,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
-		 *
-		 * Don't assert MultiXactIdIsRunning if the multi came from a
-		 * pg_upgrade'd share-locked tuple, though, as doing that causes an
-		 * error to be raised unnecessarily.
 		 */
-		Assert((!(t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) ||
-			   !MultiXactIdIsRunning(multi));
+		Assert(!MultiXactIdIsRunning(multi));
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
 			*flags |= FRM_INVALIDATE_XMAX;
@@ -5397,9 +5391,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * anything.
 	 */
 
-	allow_old = !(t_infomask & HEAP_LOCK_MASK) &&
-		HEAP_XMAX_IS_LOCKED_ONLY(t_infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	nmembers =
+		GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers <= 0)
 	{
 		/* Nothing worth keeping */
@@ -5959,14 +5952,15 @@ static bool
 DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
 						LockTupleMode lockmode)
 {
-	bool	allow_old;
 	int		nmembers;
 	MultiXactMember *members;
 	bool	result = false;
 	LOCKMODE wanted = tupleLockExtraInfo[lockmode].hwlock;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	if (HEAP_LOCKED_UPGRADED(infomask))
+		return false;
+
+	nmembers = GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers >= 0)
 	{
 		int		i;
@@ -6037,14 +6031,14 @@ static bool
 Do_MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
 				   int *remaining, uint16 infomask, bool nowait)
 {
-	bool		allow_old;
 	bool		result = true;
 	MultiXactMember *members;
 	int			nmembers;
 	int			remain = 0;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	/* for pre-pg_upgrade tuples, no need to sleep at all */
+	nmembers = HEAP_LOCKED_UPGRADED(infomask) ? -1 :
+		GetMultiXactIdMembers(multi, &members, false);
 
 	if (nmembers >= 0)
 	{
@@ -6165,9 +6159,11 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		multi = HeapTupleHeaderGetRawXmax(tuple);
 		if (!MultiXactIdIsValid(multi))
 		{
-			/* no xmax set, ignore */
+			/* no valid xmax set, ignore */
 			;
 		}
+		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return true;
 		else if (MultiXactIdPrecedes(multi, cutoff_multi))
 			return true;
 		else
@@ -6175,13 +6171,9 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			MultiXactMember *members;
 			int			nmembers;
 			int			i;
-			bool		allow_old;
 
 			/* need to check whether any member of the mxact is too old */
-
-			allow_old = !(tuple->t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask);
-			nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+			nmembers = GetMultiXactIdMembers(multi, &members, false);
 
 			for (i = 0; i < nmembers; i++)
 			{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9b10496..15de62d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1191,12 +1191,17 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  * GetMultiXactIdMembers
  *		Returns the set of MultiXactMembers that make up a MultiXactId
  *
- * If the given MultiXactId is older than the value we know to be oldest, we
- * return -1.  The caller is expected to allow that only in permissible cases,
- * i.e. when the infomask lets it presuppose that the tuple had been
- * share-locked before a pg_upgrade; this means that the HEAP_XMAX_LOCK_ONLY
- * needs to be set, but HEAP_XMAX_KEYSHR_LOCK and HEAP_XMAX_EXCL_LOCK are not
- * set.
+ * from_pgupgrade should be set to true when a multixact corresponds to a
+ * value from a tuple that was locked in a 9.2-or-older install and later
+ * pg_upgrade'd.  In this case, we now for certain that no members can still
+ * be running, so we return -1 just like for an empty multixact without
+ * further any further checking.  It would be wrong to try to resolve such
+ * multixacts, because they may be pointing to any part of the multixact
+ * space, both within the current valid range in which case we could return
+ * bogus results, or outside it, which would raise errors.  This flag should
+ * only be passed true when the multixact is attached to a tuple with
+ * HEAP_XMAX_LOCK_ONLY set, but neither of HEAP_XMAX_KEYSHR_LOCK and
+ * HEAP_XMAX_EXCL_LOCK.
  *
  * Other border conditions, such as trying to read a value that's larger than
  * the value currently known as the next to assign, raise an error.  Previously
@@ -1205,7 +1210,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  */
 int
 GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
-					  bool allow_old)
+					  bool from_pgupgrade)
 {
 	int			pageno;
 	int			prev_pageno;
@@ -1224,7 +1229,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) || from_pgupgrade)
 		return -1;
 
 	/* See if the MultiXactId is in the local cache */
@@ -1271,7 +1276,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	if (MultiXactIdPrecedes(multi, oldestMXact))
 	{
-		ereport(allow_old ? DEBUG1 : ERROR,
+		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 		 errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
 				multi)));
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index ccbbf62..9d7050a 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -787,15 +787,12 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
 	{
 		TransactionId xmax;
 
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return HeapTupleMayBeUpdated;
+
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 		{
-			/*
-			 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK is
-			 * set, it cannot possibly be running.  Otherwise need to check.
-			 */
-			if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-									  HEAP_XMAX_KEYSHR_LOCK)) &&
-				MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
+			if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
 				return HeapTupleBeingUpdated;
 
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
@@ -1401,25 +1398,20 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
 		 * "Deleting" xact really only locked it, so the tuple is live in any
 		 * case.  However, we should make sure that either XMAX_COMMITTED or
 		 * XMAX_INVALID gets set once the xact is gone, to reduce the costs of
-		 * examining the tuple for future xacts.  Also, marking dead
-		 * MultiXacts as invalid here provides defense against MultiXactId
-		 * wraparound (see also comments in heap_freeze_tuple()).
+		 * examining the tuple for future xacts.
 		 */
 		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
 		{
 			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 			{
 				/*
-				 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK
-				 * are set, it cannot possibly be running; otherwise have to
-				 * check.
+				 * If it's a pre-pg_upgrade tuple, the multixact cannot
+				 * possibly be running; otherwise have to check.
 				 */
-				if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-										  HEAP_XMAX_KEYSHR_LOCK)) &&
+				if (!HEAP_LOCKED_UPGRADED(tuple->t_infomask) &&
 					MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
 					return HEAPTUPLE_LIVE;
 				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
-
 			}
 			else
 			{
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index f7af83c..1cdbc64 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -204,6 +204,20 @@ struct HeapTupleHeaderData
 	 (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))
 
 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * We must not try to resolve such multixacts locally, because the result would
+ * be bogus, regardless of where they stand with respect to the current valid
+ * range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+	(((infomask) & HEAP_XMAX_IS_MULTI) && \
+	 ((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+	 (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0))
+
+/*
  * Use these to test whether a particular lock is applied to a tuple
  */
 #define HEAP_XMAX_IS_SHR_LOCKED(infomask) \

#36

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Andrew Gierth (#33)

Re: MultiXactId error after upgrade to 9.3.4

On Fri, Jun 17, 2016 at 9:33 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

Robert> I entirely believe that's the correct rule, but doesn't
Robert> implementing it require a crystal balll?

Why would it? Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.

I see the patch, but I don't see much explanation of why the patch is
correct, which I think is pretty scary in view of the number of
mistakes we've already made in this area. The comments just say:

+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.

Why must that be true?

+ * We must not try to resolve such multixacts locally, because the result would
+ * be bogus, regardless of where they stand with respect to the current valid
+ * range.

What about other pre-upgrade mxacts that don't have this exact bit
pattern? Why can't we try to resolve them and end up in trouble just
as easily?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Robert Haas (#36)

Re: MultiXactId error after upgrade to 9.3.4

Robert Haas wrote:

On Fri, Jun 17, 2016 at 9:33 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

Robert> I entirely believe that's the correct rule, but doesn't
Robert> implementing it require a crystal balll?

Why would it? Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.

I see the patch, but I don't see much explanation of why the patch is
correct, which I think is pretty scary in view of the number of
mistakes we've already made in this area.

... and actually the patch fails one isolation tests in 9.3, which is
why I haven't pushed (I haven't tested 9.4 but I suppose it should be
the same). I'm looking into that now.

The comments just say:

+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.

Why must that be true?

The reason this must be true is that in 9.2 and earlier multixacts were only
used to lock tuples FOR SHARE, which had that specific bit pattern. I suppose
I need to make this comment more explicit.

+ * We must not try to resolve such multixacts locally, because the result would
+ * be bogus, regardless of where they stand with respect to the current valid
+ * range.
What about other pre-upgrade mxacts that don't have this exact bit
pattern? Why can't we try to resolve them and end up in trouble just
as easily?

There shouldn't be any. Back then, it was not possible to have tuples
locked and updated at the same time; FOR UPDATE (the only other locking
mode back then) didn't allow other lockers, so the only possibility was
FOR SHARE with that bit pattern.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Alvaro Herrera (#37)

1 attachment(s)

Re: MultiXactId error after upgrade to 9.3.4

Alvaro Herrera wrote:

Robert Haas wrote:

On Fri, Jun 17, 2016 at 9:33 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

Robert> I entirely believe that's the correct rule, but doesn't
Robert> implementing it require a crystal balll?

Why would it? Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.

I see the patch, but I don't see much explanation of why the patch is
correct, which I think is pretty scary in view of the number of
mistakes we've already made in this area.

... and actually the patch fails one isolation tests in 9.3, which is
why I haven't pushed (I haven't tested 9.4 but I suppose it should be
the same). I'm looking into that now.

Ah, it should have been obvious; the reason it's failing is because 9.3
and 9.4 lack commit 27846f02c176 which removed
MultiXactHasRunningRemoteMembers(), so the straight backpatch plus
conflict fixes left one GetMultiXactIdMembers call with the allow_old
flag to true. The attached patch fixes that omission.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

locked-upgraded-93-fixup.patchtext/x-diff; charset=us-asciiDownload

commit 17507b2a04b8c381997410d1fe3fbaacc34f5a31[m
Author:     Alvaro Herrera <alvherre@alvh.no-ip.org>
AuthorDate: Tue Jun 21 18:07:49 2016 -0400
CommitDate: Tue Jun 21 18:07:49 2016 -0400

    fixup MultiXactHasRunningRemoteMembers

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 15de62d..efbca6f 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1448,7 +1448,7 @@ MultiXactHasRunningRemoteMembers(MultiXactId multi)
 	int			nmembers;
 	int			i;
 
-	nmembers = GetMultiXactIdMembers(multi, &members, true);
+	nmembers = GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers <= 0)
 		return false;
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 9d7050a..931e2fb 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -701,7 +701,9 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
 
 				if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 				{
-					if (MultiXactHasRunningRemoteMembers(xmax))
+					if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+						return HeapTupleMayBeUpdated;
+					else if (MultiXactHasRunningRemoteMembers(xmax))
 						return HeapTupleBeingUpdated;
 					else
 						return HeapTupleMayBeUpdated;
@@ -725,6 +727,7 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
 
 				/* not LOCKED_ONLY, so it has to have an xmax */
 				Assert(TransactionIdIsValid(xmax));
+				Assert(!HEAP_LOCKED_UPGRADED(tuple->t_infomask));
 
 				/* updating subtransaction must have aborted */
 				if (!TransactionIdIsCurrentTransactionId(xmax))

#39

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Robert Haas (#36)

2 attachment(s)

Re: MultiXactId error after upgrade to 9.3.4

Robert Haas wrote:

I see the patch, but I don't see much explanation of why the patch is
correct, which I think is pretty scary in view of the number of
mistakes we've already made in this area. The comments just say:
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
Why must that be true?
+ * We must not try to resolve such multixacts locally, because the result would
+ * be bogus, regardless of where they stand with respect to the current valid
+ * range.
What about other pre-upgrade mxacts that don't have this exact bit
pattern? Why can't we try to resolve them and end up in trouble just
as easily?

Attached are two patches, one for 9.3 and 9.4 and the other for 9.5 and
master. Pretty much the same as before, but with answers to the above
concerns. In particular,

 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * In 9.2 and prior, HEAP_XMAX_IS_MULTI was only set when there were multiple
+ * FOR SHARE lockers of that tuple.  That set HEAP_XMAX_LOCK_ONLY (with a
+ * different name back then) but neither of HEAP_XMAX_EXCL_LOCK and
+ * HEAP_XMAX_KEYSHR_LOCK.  That combination is no longer possible in 9.3 and
+ * up, so if we see that combination we know for certain that the tuple was
+ * locked in an earlier release; since all such lockers are gone (they cannot
+ * survive through pg_upgrade), such tuples can safely be considered not
+ * locked.
+ *
+ * We must not resolve such multixacts locally, because the result would be
+ * bogus, regardless of where they stand with respect to the current valid
+ * multixact range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+( \
+    ((infomask) & HEAP_XMAX_IS_MULTI) && \
+    ((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+    (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0) \
+)

One place that had an XXX comment in the previous patch
(heap_lock_updated_tuple_rec) now contains this:

+               /*
+                * We don't need a test for pg_upgrade'd tuples: this is only
+                * applied to tuples after the first in an update chain.  Said
+                * first tuple in the chain may well be locked-in-9.2-and-
+                * pg_upgraded, but that one was already locked by our caller,
+                * not us; and any subsequent ones cannot be because our
+                * caller must necessarily have obtained a snapshot later than
+                * the pg_upgrade itself.
+                */
+               Assert(!HEAP_LOCKED_UPGRADED(mytup.t_data->t_infomask));

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

locked-upgraded-2-93.patchtext/x-diff; charset=iso-8859-1Download

commit 7235c0b120208d73d53d3929fe8243d5e487dca8[m
Author:     Alvaro Herrera <alvherre@alvh.no-ip.org>
AuthorDate: Thu Jun 16 23:33:20 2016 -0400
CommitDate: Wed Jun 22 17:17:15 2016 -0400

    Fix handling of multixacts predating pg_upgrade
    
    After pg_upgrade, it is possible that some tuples' Xmax have multixacts
    corresponding to the old installation; such multixacts cannot have
    running members anymore.  In many code sites we already know not to read
    them and clobber them silently, but at least when VACUUM tries to freeze
    a multixact or determine if one needs freezing, there's an attempt to
    resolve it to its member transactions by calling GetMultiXactIdMembers,
    and if the multixact value is "in the future" with regards to the
    current valid multixact range, an error like this is raised:
        ERROR:  MultiXactId 123 has not been created yet -- apparent wraparound
    and vacuuming fails.  Per discussion with Andrew Gierth, it is completely
    bogus to try to resolve multixacts coming from before a pg_upgrade,
    regardless of where they stand with regards to the current valid
    multixact range.
    
    It's possible to get from under this problem by doing SELECT FOR UPDATE
    of the problem tuples, but if tables are large, this is slow and
    tedious, so a more thorough solution is desirable.
    
    To fix, we realize that multixacts in xmax created in 9.2 and previous
    have a specific bit pattern that is never used in 9.3 and later.
    Whenever the infomask of the tuple matches that bit pattern, we just
    ignore the multixact completely as if Xmax wasn't set; or, in the case
    of tuple freezing, we act as if an unwanted value is set and clobber it
    without decoding.  This guarantees that no errors will be raised, and
    that the values will be progressively removed until all tables are
    clean.  Most callers of GetMultiXactIdMembers are patched to recognize
    directly that the value is a removable "empty" multixact and avoid
    calling GetMultiXactIdMembers altogether.
    
    To avoid changing the signature of GetMultiXactIdMembers() in back
    branches, we keep the "allow_old" boolean flag but rename it to
    "from_pgupgrade"; if the flag is true, we always return an empty set
    instead of looking up the multixact.  (We could remove the argument in
    the master branch, but choose not to do so in this commit).
    
    Bug analysis by Andrew Gierth and Álvaro Herrera.
    
    A number of public reports match this bug:
      https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net
      https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com
      https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk
      https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com
      https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 075d781..ee5b241 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -165,8 +165,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 
 				values[Atnum_ismulti] = pstrdup("true");
 
-				allow_old = !(infomask & HEAP_LOCK_MASK) &&
-					(infomask & HEAP_XMAX_LOCK_ONLY);
+				allow_old = HEAP_LOCKED_UPGRADED(infomask);
 				nmembers = GetMultiXactIdMembers(xmax, &members, allow_old);
 				if (nmembers == -1)
 				{
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1f1bac5..89a9335 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4641,8 +4641,7 @@ l5:
 		 * pg_upgrade; both MultiXactIdIsRunning and MultiXactIdExpand assume
 		 * that such multis are never passed.
 		 */
-		if (!(old_infomask & HEAP_LOCK_MASK) &&
-			HEAP_XMAX_IS_LOCKED_ONLY(old_infomask))
+		if (HEAP_LOCKED_UPGRADED(old_infomask))
 		{
 			old_infomask &= ~HEAP_XMAX_IS_MULTI;
 			old_infomask |= HEAP_XMAX_INVALID;
@@ -5001,6 +5000,16 @@ l4:
 				int		i;
 				MultiXactMember *members;
 
+				/*
+				 * We don't need a test for pg_upgrade'd tuples: this is only
+				 * applied to tuples after the first in an update chain.  Said
+				 * first tuple in the chain may well be locked-in-9.2-and-
+				 * pg_upgraded, but that one was already locked by our caller,
+				 * not us; and any subsequent ones cannot be because our
+				 * caller must necessarily have obtained a snapshot later than
+				 * the pg_upgrade itself.
+				 */
+				Assert(!HEAP_LOCKED_UPGRADED(mytup.t_data->t_infomask));
 				nmembers = GetMultiXactIdMembers(rawxmax, &members, false);
 				for (i = 0; i < nmembers; i++)
 				{
@@ -5329,14 +5338,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	bool		allow_old;
 
 	*flags = 0;
 
 	/* We should only be called in Multis */
 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) ||
+		HEAP_LOCKED_UPGRADED(t_infomask))
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
@@ -5349,14 +5358,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
-		 *
-		 * Don't assert MultiXactIdIsRunning if the multi came from a
-		 * pg_upgrade'd share-locked tuple, though, as doing that causes an
-		 * error to be raised unnecessarily.
 		 */
-		Assert((!(t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) ||
-			   !MultiXactIdIsRunning(multi));
+		Assert(!MultiXactIdIsRunning(multi));
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
 			*flags |= FRM_INVALIDATE_XMAX;
@@ -5397,9 +5400,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * anything.
 	 */
 
-	allow_old = !(t_infomask & HEAP_LOCK_MASK) &&
-		HEAP_XMAX_IS_LOCKED_ONLY(t_infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	nmembers =
+		GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers <= 0)
 	{
 		/* Nothing worth keeping */
@@ -5959,14 +5961,15 @@ static bool
 DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
 						LockTupleMode lockmode)
 {
-	bool	allow_old;
 	int		nmembers;
 	MultiXactMember *members;
 	bool	result = false;
 	LOCKMODE wanted = tupleLockExtraInfo[lockmode].hwlock;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	if (HEAP_LOCKED_UPGRADED(infomask))
+		return false;
+
+	nmembers = GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers >= 0)
 	{
 		int		i;
@@ -6037,14 +6040,14 @@ static bool
 Do_MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
 				   int *remaining, uint16 infomask, bool nowait)
 {
-	bool		allow_old;
 	bool		result = true;
 	MultiXactMember *members;
 	int			nmembers;
 	int			remain = 0;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	/* for pre-pg_upgrade tuples, no need to sleep at all */
+	nmembers = HEAP_LOCKED_UPGRADED(infomask) ? -1 :
+		GetMultiXactIdMembers(multi, &members, false);
 
 	if (nmembers >= 0)
 	{
@@ -6165,9 +6168,11 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		multi = HeapTupleHeaderGetRawXmax(tuple);
 		if (!MultiXactIdIsValid(multi))
 		{
-			/* no xmax set, ignore */
+			/* no valid xmax set, ignore */
 			;
 		}
+		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return true;
 		else if (MultiXactIdPrecedes(multi, cutoff_multi))
 			return true;
 		else
@@ -6175,13 +6180,9 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			MultiXactMember *members;
 			int			nmembers;
 			int			i;
-			bool		allow_old;
 
 			/* need to check whether any member of the mxact is too old */
-
-			allow_old = !(tuple->t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask);
-			nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+			nmembers = GetMultiXactIdMembers(multi, &members, false);
 
 			for (i = 0; i < nmembers; i++)
 			{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9b10496..efbca6f 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1191,12 +1191,17 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  * GetMultiXactIdMembers
  *		Returns the set of MultiXactMembers that make up a MultiXactId
  *
- * If the given MultiXactId is older than the value we know to be oldest, we
- * return -1.  The caller is expected to allow that only in permissible cases,
- * i.e. when the infomask lets it presuppose that the tuple had been
- * share-locked before a pg_upgrade; this means that the HEAP_XMAX_LOCK_ONLY
- * needs to be set, but HEAP_XMAX_KEYSHR_LOCK and HEAP_XMAX_EXCL_LOCK are not
- * set.
+ * from_pgupgrade should be set to true when a multixact corresponds to a
+ * value from a tuple that was locked in a 9.2-or-older install and later
+ * pg_upgrade'd.  In this case, we now for certain that no members can still
+ * be running, so we return -1 just like for an empty multixact without
+ * further any further checking.  It would be wrong to try to resolve such
+ * multixacts, because they may be pointing to any part of the multixact
+ * space, both within the current valid range in which case we could return
+ * bogus results, or outside it, which would raise errors.  This flag should
+ * only be passed true when the multixact is attached to a tuple with
+ * HEAP_XMAX_LOCK_ONLY set, but neither of HEAP_XMAX_KEYSHR_LOCK and
+ * HEAP_XMAX_EXCL_LOCK.
  *
  * Other border conditions, such as trying to read a value that's larger than
  * the value currently known as the next to assign, raise an error.  Previously
@@ -1205,7 +1210,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  */
 int
 GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
-					  bool allow_old)
+					  bool from_pgupgrade)
 {
 	int			pageno;
 	int			prev_pageno;
@@ -1224,7 +1229,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) || from_pgupgrade)
 		return -1;
 
 	/* See if the MultiXactId is in the local cache */
@@ -1271,7 +1276,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	if (MultiXactIdPrecedes(multi, oldestMXact))
 	{
-		ereport(allow_old ? DEBUG1 : ERROR,
+		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 		 errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
 				multi)));
@@ -1443,7 +1448,7 @@ MultiXactHasRunningRemoteMembers(MultiXactId multi)
 	int			nmembers;
 	int			i;
 
-	nmembers = GetMultiXactIdMembers(multi, &members, true);
+	nmembers = GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers <= 0)
 		return false;
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index ccbbf62..931e2fb 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -701,7 +701,9 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
 
 				if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 				{
-					if (MultiXactHasRunningRemoteMembers(xmax))
+					if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+						return HeapTupleMayBeUpdated;
+					else if (MultiXactHasRunningRemoteMembers(xmax))
 						return HeapTupleBeingUpdated;
 					else
 						return HeapTupleMayBeUpdated;
@@ -725,6 +727,7 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
 
 				/* not LOCKED_ONLY, so it has to have an xmax */
 				Assert(TransactionIdIsValid(xmax));
+				Assert(!HEAP_LOCKED_UPGRADED(tuple->t_infomask));
 
 				/* updating subtransaction must have aborted */
 				if (!TransactionIdIsCurrentTransactionId(xmax))
@@ -787,15 +790,12 @@ HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
 	{
 		TransactionId xmax;
 
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return HeapTupleMayBeUpdated;
+
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 		{
-			/*
-			 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK is
-			 * set, it cannot possibly be running.  Otherwise need to check.
-			 */
-			if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-									  HEAP_XMAX_KEYSHR_LOCK)) &&
-				MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
+			if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
 				return HeapTupleBeingUpdated;
 
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
@@ -1401,25 +1401,20 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
 		 * "Deleting" xact really only locked it, so the tuple is live in any
 		 * case.  However, we should make sure that either XMAX_COMMITTED or
 		 * XMAX_INVALID gets set once the xact is gone, to reduce the costs of
-		 * examining the tuple for future xacts.  Also, marking dead
-		 * MultiXacts as invalid here provides defense against MultiXactId
-		 * wraparound (see also comments in heap_freeze_tuple()).
+		 * examining the tuple for future xacts.
 		 */
 		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
 		{
 			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 			{
 				/*
-				 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK
-				 * are set, it cannot possibly be running; otherwise have to
-				 * check.
+				 * If it's a pre-pg_upgrade tuple, the multixact cannot
+				 * possibly be running; otherwise have to check.
 				 */
-				if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-										  HEAP_XMAX_KEYSHR_LOCK)) &&
+				if (!HEAP_LOCKED_UPGRADED(tuple->t_infomask) &&
 					MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
 					return HEAPTUPLE_LIVE;
 				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
-
 			}
 			else
 			{
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index f7af83c..e5d3098 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -204,6 +204,31 @@ struct HeapTupleHeaderData
 	 (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))
 
 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * In 9.2 and prior, HEAP_XMAX_IS_MULTI was only set when there were multiple
+ * FOR SHARE lockers of that tuple.  That set HEAP_XMAX_LOCK_ONLY (with a
+ * different name back then) but neither of HEAP_XMAX_EXCL_LOCK and
+ * HEAP_XMAX_KEYSHR_LOCK.  That combination is no longer possible in 9.3 and
+ * up, so if we see that combination we know for certain that the tuple was
+ * locked in an earlier release; since all such lockers are gone (they cannot
+ * survive through pg_upgrade), such tuples can safely be considered not
+ * locked.
+ *
+ * We must not resolve such multixacts locally, because the result would be
+ * bogus, regardless of where they stand with respect to the current valid
+ * multixact range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+( \
+	 ((infomask) & HEAP_XMAX_IS_MULTI) && \
+	 ((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+	 (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0) \
+)
+
+/*
  * Use these to test whether a particular lock is applied to a tuple
  */
 #define HEAP_XMAX_IS_SHR_LOCKED(infomask) \

locked-upgraded-2-master.patchtext/x-diff; charset=iso-8859-1Download

commit a97b3b0300442eb0125f0cb3dd179b420b411b92[m
Author:     Alvaro Herrera <alvherre@alvh.no-ip.org>
AuthorDate: Thu Jun 16 23:33:20 2016 -0400
CommitDate: Wed Jun 22 15:25:07 2016 -0400

    Fix handling of multixacts predating pg_upgrade
    
    After pg_upgrade, it is possible that some tuples' Xmax have multixacts
    corresponding to the old installation; such multixacts cannot have
    running members anymore.  In many code sites we already know not to read
    them and clobber them silently, but at least when VACUUM tries to freeze
    a multixact or determine if one needs freezing, there's an attempt to
    resolve it to its member transactions by calling GetMultiXactIdMembers,
    and if the multixact value is "in the future" with regards to the
    current valid multixact range, an error like this is raised:
        ERROR:  MultiXactId 123 has not been created yet -- apparent wraparound
    and vacuuming fails.  Per discussion with Andrew Gierth, it is completely
    bogus to try to resolve multixacts coming from before a pg_upgrade,
    regardless of where they stand with regards to the current valid
    multixact range.
    
    It's possible to get from under this problem by doing SELECT FOR UPDATE
    of the problem tuples, but if tables are large, this is slow and
    tedious, so a more thorough solution is desirable.
    
    To fix, we realize that multixacts in xmax created in 9.2 and previous
    have a specific bit pattern that is never used in 9.3 and later.
    Whenever the infomask of the tuple matches that bit pattern, we just
    ignore the multixact completely as if Xmax wasn't set; or, in the case
    of tuple freezing, we act as if an unwanted value is set and clobber it
    without decoding.  This guarantees that no errors will be raised, and
    that the values will be progressively removed until all tables are
    clean.  Most callers of GetMultiXactIdMembers are patched to recognize
    directly that the value is a removable "empty" multixact and avoid
    calling GetMultiXactIdMembers altogether.
    
    To avoid changing the signature of GetMultiXactIdMembers() in back
    branches, we keep the "allow_old" boolean flag but rename it to
    "from_pgupgrade"; if the flag is true, we always return an empty set
    instead of looking up the multixact.  (We could remove the argument in
    the master branch, but choose not to do so in this commit).
    
    Bug analysis by Andrew Gierth and Álvaro Herrera.
    
    A number of public reports match this bug:
      https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net
      https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com
      https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk
      https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com
      https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 88f7137..e20e7f8 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -158,8 +158,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 
 				values[Atnum_ismulti] = pstrdup("true");
 
-				allow_old = !(infomask & HEAP_LOCK_MASK) &&
-					(infomask & HEAP_XMAX_LOCK_ONLY);
+				allow_old = HEAP_LOCKED_UPGRADED(infomask);
 				nmembers = GetMultiXactIdMembers(xmax, &members, allow_old,
 												 false);
 				if (nmembers == -1)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 80e9594..24416c6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5225,8 +5225,7 @@ l5:
 		 * pg_upgrade; both MultiXactIdIsRunning and MultiXactIdExpand assume
 		 * that such multis are never passed.
 		 */
-		if (!(old_infomask & HEAP_LOCK_MASK) &&
-			HEAP_XMAX_IS_LOCKED_ONLY(old_infomask))
+		if (HEAP_LOCKED_UPGRADED(old_infomask))
 		{
 			old_infomask &= ~HEAP_XMAX_IS_MULTI;
 			old_infomask |= HEAP_XMAX_INVALID;
@@ -5586,6 +5585,17 @@ l4:
 				int			i;
 				MultiXactMember *members;
 
+				/*
+				 * We don't need a test for pg_upgrade'd tuples: this is only
+				 * applied to tuples after the first in an update chain.  Said
+				 * first tuple in the chain may well be locked-in-9.2-and-
+				 * pg_upgraded, but that one was already locked by our caller,
+				 * not us; and any subsequent ones cannot be because our
+				 * caller must necessarily have obtained a snapshot later than
+				 * the pg_upgrade itself.
+				 */
+				Assert(!HEAP_LOCKED_UPGRADED(mytup.t_data->t_infomask));
+
 				nmembers = GetMultiXactIdMembers(rawxmax, &members, false,
 									 HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));
 				for (i = 0; i < nmembers; i++)
@@ -6144,14 +6154,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	bool		allow_old;
 
 	*flags = 0;
 
 	/* We should only be called in Multis */
 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) ||
+		HEAP_LOCKED_UPGRADED(t_infomask))
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
@@ -6164,14 +6174,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
-		 *
-		 * Don't assert MultiXactIdIsRunning if the multi came from a
-		 * pg_upgrade'd share-locked tuple, though, as doing that causes an
-		 * error to be raised unnecessarily.
 		 */
-		Assert((!(t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) ||
-			   !MultiXactIdIsRunning(multi,
+		Assert(!MultiXactIdIsRunning(multi,
 									 HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)));
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
@@ -6213,10 +6217,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * anything.
 	 */
 
-	allow_old = !(t_infomask & HEAP_LOCK_MASK) &&
-		HEAP_XMAX_IS_LOCKED_ONLY(t_infomask);
 	nmembers =
-		GetMultiXactIdMembers(multi, &members, allow_old,
+		GetMultiXactIdMembers(multi, &members, false,
 							  HEAP_XMAX_IS_LOCKED_ONLY(t_infomask));
 	if (nmembers <= 0)
 	{
@@ -6777,14 +6779,15 @@ static bool
 DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
 						LockTupleMode lockmode)
 {
-	bool		allow_old;
 	int			nmembers;
 	MultiXactMember *members;
 	bool		result = false;
 	LOCKMODE	wanted = tupleLockExtraInfo[lockmode].hwlock;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
+	if (HEAP_LOCKED_UPGRADED(infomask))
+		return false;
+
+	nmembers = GetMultiXactIdMembers(multi, &members, false,
 									 HEAP_XMAX_IS_LOCKED_ONLY(infomask));
 	if (nmembers >= 0)
 	{
@@ -6867,15 +6870,15 @@ Do_MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
 				   Relation rel, ItemPointer ctid, XLTW_Oper oper,
 				   int *remaining)
 {
-	bool		allow_old;
 	bool		result = true;
 	MultiXactMember *members;
 	int			nmembers;
 	int			remain = 0;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
-									 HEAP_XMAX_IS_LOCKED_ONLY(infomask));
+	/* for pre-pg_upgrade tuples, no need to sleep at all */
+	nmembers = HEAP_LOCKED_UPGRADED(infomask) ? -1 :
+		GetMultiXactIdMembers(multi, &members, false,
+							  HEAP_XMAX_IS_LOCKED_ONLY(infomask));
 
 	if (nmembers >= 0)
 	{
@@ -7053,9 +7056,11 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		multi = HeapTupleHeaderGetRawXmax(tuple);
 		if (!MultiXactIdIsValid(multi))
 		{
-			/* no xmax set, ignore */
+			/* no valid xmax set, ignore */
 			;
 		}
+		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return true;
 		else if (MultiXactIdPrecedes(multi, cutoff_multi))
 			return true;
 		else
@@ -7063,13 +7068,10 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			MultiXactMember *members;
 			int			nmembers;
 			int			i;
-			bool		allow_old;
 
 			/* need to check whether any member of the mxact is too old */
 
-			allow_old = !(tuple->t_infomask & HEAP_LOCK_MASK) &&
-				HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask);
-			nmembers = GetMultiXactIdMembers(multi, &members, allow_old,
+			nmembers = GetMultiXactIdMembers(multi, &members, false,
 								HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
 
 			for (i = 0; i < nmembers; i++)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7bccca8..ee2b7ab 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1175,12 +1175,17 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  * GetMultiXactIdMembers
  *		Returns the set of MultiXactMembers that make up a MultiXactId
  *
- * If the given MultiXactId is older than the value we know to be oldest, we
- * return -1.  The caller is expected to allow that only in permissible cases,
- * i.e. when the infomask lets it presuppose that the tuple had been
- * share-locked before a pg_upgrade; this means that the HEAP_XMAX_LOCK_ONLY
- * needs to be set, but HEAP_XMAX_KEYSHR_LOCK and HEAP_XMAX_EXCL_LOCK are not
- * set.
+ * from_pgupgrade should be set to true when a multixact corresponds to a
+ * value from a tuple that was locked in a 9.2-or-older install and later
+ * pg_upgrade'd.  In this case, we now for certain that no members can still
+ * be running, so we return -1 just like for an empty multixact without
+ * further any further checking.  It would be wrong to try to resolve such
+ * multixacts, because they may be pointing to any part of the multixact
+ * space, both within the current valid range in which case we could return
+ * bogus results, or outside it, which would raise errors.  This flag should
+ * only be passed true when the multixact is attached to a tuple with
+ * HEAP_XMAX_LOCK_ONLY set, but neither of HEAP_XMAX_KEYSHR_LOCK and
+ * HEAP_XMAX_EXCL_LOCK.
  *
  * Other border conditions, such as trying to read a value that's larger than
  * the value currently known as the next to assign, raise an error.  Previously
@@ -1194,7 +1199,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  */
 int
 GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
-					  bool allow_old, bool onlyLock)
+					  bool from_pgupgrade, bool onlyLock)
 {
 	int			pageno;
 	int			prev_pageno;
@@ -1213,7 +1218,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) || from_pgupgrade)
 		return -1;
 
 	/* See if the MultiXactId is in the local cache */
@@ -1273,7 +1278,7 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 
 	if (MultiXactIdPrecedes(multi, oldestMXact))
 	{
-		ereport(allow_old ? DEBUG1 : ERROR,
+		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 		 errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
 				multi)));
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 0e815a9..a08dae1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -620,15 +620,12 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	{
 		TransactionId xmax;
 
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+			return HeapTupleMayBeUpdated;
+
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 		{
-			/*
-			 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK is
-			 * set, it cannot possibly be running.  Otherwise need to check.
-			 */
-			if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-									  HEAP_XMAX_KEYSHR_LOCK)) &&
-				MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), true))
+			if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), true))
 				return HeapTupleBeingUpdated;
 
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
@@ -1279,26 +1276,21 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 		 * "Deleting" xact really only locked it, so the tuple is live in any
 		 * case.  However, we should make sure that either XMAX_COMMITTED or
 		 * XMAX_INVALID gets set once the xact is gone, to reduce the costs of
-		 * examining the tuple for future xacts.  Also, marking dead
-		 * MultiXacts as invalid here provides defense against MultiXactId
-		 * wraparound (see also comments in heap_freeze_tuple()).
+		 * examining the tuple for future xacts.
 		 */
 		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
 		{
 			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 			{
 				/*
-				 * If it's only locked but neither EXCL_LOCK nor KEYSHR_LOCK
-				 * are set, it cannot possibly be running; otherwise have to
-				 * check.
+				 * If it's a pre-pg_upgrade tuple, the multixact cannot
+				 * possibly be running; otherwise have to check.
 				 */
-				if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
-										  HEAP_XMAX_KEYSHR_LOCK)) &&
+				if (!HEAP_LOCKED_UPGRADED(tuple->t_infomask) &&
 					MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple),
 										 true))
 					return HEAPTUPLE_LIVE;
 				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
-
 			}
 			else
 			{
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index 9f9cf2a..d7e5fad 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -218,6 +218,31 @@ struct HeapTupleHeaderData
 	 (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))
 
 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * In 9.2 and prior, HEAP_XMAX_IS_MULTI was only set when there were multiple
+ * FOR SHARE lockers of that tuple.  That set HEAP_XMAX_LOCK_ONLY (with a
+ * different name back then) but neither of HEAP_XMAX_EXCL_LOCK and
+ * HEAP_XMAX_KEYSHR_LOCK.  That combination is no longer possible in 9.3 and
+ * up, so if we see that combination we know for certain that the tuple was
+ * locked in an earlier release; since all such lockers are gone (they cannot
+ * survive through pg_upgrade), such tuples can safely be considered not
+ * locked.
+ *
+ * We must not resolve such multixacts locally, because the result would be
+ * bogus, regardless of where they stand with respect to the current valid
+ * multixact range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+( \
+	 ((infomask) & HEAP_XMAX_IS_MULTI) && \
+	 ((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+	 (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0) \
+)
+
+/*
  * Use these to test whether a particular lock is applied to a tuple
  */
 #define HEAP_XMAX_IS_SHR_LOCKED(infomask) \

#40